Decoding the American Scholar: Towards a Distant Computational Reading of Emerson’s Prose

The following entry discusses some ideas that I plan to explore in a research paper that I will write for a course titled “Knowledge, Belief, and Science in Melville’s America,” which is being offered by Dr. Laura Dassow Walls at the University of Notre Dame during the fall semester of 2012.

During my last semester of coursework, I became fascinated with the concept of hybridity. One thing that became extremely apparent during my readings was that the humanities and the sciences are not as opposed as we might initially assume. I also became aware of the tantalizing possibilities of approaching humanistic studies in a scientific, quantitative fashion (possibilities that are multiplying now that I am taking a course in Digital Humanities/Humanities Computing). This research project will be my first attempt to approach a collection of literary texts from a scientific and quantitative perspective using the tools that I’ve encountered in humanities computing. My hope is that this approach will help me understand the ever-elusive Ralph Waldo Emerson and the overall patterns and systems at work in his prose.

As readers of my website are well aware by now, Emerson has been an extremely difficult scholar to understand (at least in my opinion). I tend to feel a strange mix of fascination and utter confusion when I read his prose. I also find close readings of his essays tedious, mainly because he posits ideas that are at times contradictory and difficult to reconcile (my past posts on Emerson illustrate this point). This is arguably because Emerson wrote from an extremely subjective point of view, but even more so because he was grappling with notions that are both abstract and elusive: god, nature, humanity, science, religion, and methods. It can also be argued that Emerson had difficulty separating the objectivity of his idea(l)s from the subjectivity of his personal experiences. This is evident in essays such as “Experience,” in which he argues that grief is pointless and futile in the vast scope of the universe, yet it is blatantly obvious that the death of his child created an existential chasm in his life (read the letters he sent after his child’s death if you don’t believe me).

How do we even begin to understand such a complex and obviously tormented individual? In order to hypothesize answers to this question, I am going to suggest a rather Thoreauvian move: rather than trying to immerse myself in the text, and rather than trying to figure out Emerson through close readings, I suggest that we take a step back and try to piece together the mystery of Emerson through a distant reading.

What is distant reading? Franco Moretti did much to advance this practice when he argued that the problem with close reading is that scholars are only able to study a very small number of texts, while virtually ignoring the influence of the other texts within a collection or canon. In a distant reading, then, close readings of individual texts are set aside, and the scholar instead focuses on the systems, patterns, themes, and tropes that exist across a collection of texts in order to understand that system in its entirety. Moretti is well aware that a distant reading inevitably loses particularities and ideas. This is an extremely pressing issue, especially when dealing with authors such as Emerson, whose prose and poetry were infused with countless political, religious, and social ideologies that are likely lost when the text is approached from a distance. However, Moretti argues that this is perhaps the only way to make the unmanageable and invisible forces behind literature visible:

Distant reading: where distance, let me repeat it, is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems. And if, between the very small and the very large, the text itself disappears, well, it is one of those cases when one can justifiably say, Less is more. If we want to understand the system in its entirety, we must accept losing something. We always pay a price for theoretical knowledge: reality is infinitely rich; concepts are abstract, are poor. But it’s precisely this ‘poverty’ that makes it possible to handle them, and therefore to know. This is why less is actually more. (Conjectures…)

How will this notion of distant reading take place within my research? Simple. I created a database of Emerson’s major prose works in digitized format (using an archive of Emerson’s texts in HTML format), including a selection of his early addresses and lectures, his first series of essays, and his second series of essays. This database, adapted from the prose selections in the Norton Critical Edition of Emerson’s prose and poetry, was organized in chronological order and saved within the same archive.
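For readers curious about what assembling such a corpus involves, here is a minimal Python sketch using only the standard library. It assumes each text has already been saved as an HTML string paired with a publication year and title; the function names are hypothetical, and the sample entries are placeholders rather than Emerson’s actual texts.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        # The default convert_charrefs=True turns entities like &rsquo; into characters.
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip the markup and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps words from adjacent tags apart.
    return " ".join(" ".join(parser.parts).split())


def build_corpus(documents):
    """documents: iterable of (year, title, html) tuples in any order.
    Returns (year, title, plain_text) tuples sorted chronologically."""
    texts = [(year, title, html_to_text(html)) for year, title, html in documents]
    return sorted(texts, key=lambda doc: doc[0])


# Placeholder entries: real titles and years, but not Emerson's actual text.
sample = [
    (1841, "Self-Reliance", "<p>Placeholder text for the essay.</p>"),
    (1836, "Nature", "<p>Placeholder <em>text</em></p><script>var x = 1;</script>"),
]
for year, title, text in build_corpus(sample):
    print(year, title, "|", text)
```

Sorting on the year alone keeps texts from the same year (for instance, the essays of a single series) in the order they were supplied, because Python’s sort is stable.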

I then used a suite of online text-analysis applications known as “Voyant Tools” (which I discuss at length in another post). Its algorithms allow me to approach Emerson’s works in a distant, quantitative fashion: the program reports the frequency and distribution of every word in the uploaded corpus, and it can even graph the trend of each word across all of the texts I uploaded. Since the database contains the texts in chronological order, this allows me to observe patterns of word usage from Emerson’s earlier works to his later ones.
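The per-text trend that Voyant Tools graphs can be approximated in a few lines of Python. This is not Voyant’s own implementation; it is a sketch that assumes a simple, hypothetical tokenizer (lowercased runs of letters, allowing internal apostrophes) and counts raw occurrences of one word in each text, in corpus order.

```python
import re

# Hypothetical tokenizer: lowercase runs of letters, allowing internal apostrophes.
TOKEN = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def tokenize(text):
    return TOKEN.findall(text.lower())


def term_trend(texts, term):
    """Raw count of `term` in each text, in the order the texts are given."""
    term = term.lower()
    return [tokenize(text).count(term) for text in texts]


# Toy texts in chronological order (placeholders, not Emerson's prose).
texts = ["The soul is the soul.", "Man and nature.", "The Over-Soul speaks."]
print(term_trend(texts, "soul"))  # [2, 0, 1]
```

Under this tokenizer a hyphenated title such as “Over-Soul” splits into “over” and “soul”, so a count of ‘soul’ would include the essay’s own title.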

I have already tested the program on a tentative collection of Emerson’s most famous prose works, and the results have indeed been interesting. I set Voyant Tools to remove stopwords, meaning that grammatical, non-content words (such as “the,” “of,” and “and”) were excluded from the results. The application then produced a frequency list of the words in the entire corpus. The most frequent words were as follows (keep in mind that this list was generated using Emerson’s early addresses and lectures, his first and second series of essays, and his essay on Nature):

I think it is unsurprising to see that ‘man’ and ‘nature’ are the most common words in Emerson’s prose, but what did spark my curiosity was the abstract and conceptual nature of the words on this list. Not only does this provide evidence that Emerson was indeed an abstract writer, but it also highlights an important issue: most, if not all, of these words have various shades of meaning that can shift immensely according to the context in which they are used, and they are closely tied to subjective, ideological views. Also, note that most of the words on this list are concepts that tend to be associated with positive feelings and optimistic attitudes (god, truth, love, mind, great, good, new, life, world, nature, men, etc.). I think this says a great deal about the rhetorical nature of Emerson’s prose, and about how the abundance of these positive terms is meant to build emotional rapport with an audience.
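The stopword filtering and frequency list described earlier can be sketched in Python as follows. The stopword set here is a deliberately tiny, hypothetical stand-in; Voyant Tools uses its own, much longer lists, so counts from this sketch would not match its output exactly.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
    "he", "his", "i", "in", "is", "it", "not", "of", "on", "or", "so",
    "that", "the", "their", "this", "to", "was", "we", "which", "with",
}


def top_words(texts, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopwords across all texts, with counts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower()):
            if token not in stopwords:
                counts[token] += 1
    # Ties keep the order in which words were first encountered.
    return counts.most_common(n)


texts = ["The man and the nature of man.", "Nature is new; man is old."]
print(top_words(texts, n=3))  # [('man', 3), ('nature', 2), ('new', 1)]
```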

Even more fascinating were the trend graphs I was able to generate, which chart the usage of words across Emerson’s texts in chronological order. Here is a slideshow of the graphs that I generated:

I think the graphs demonstrate some very insightful trends. For instance, Emerson uses the word ‘soul’ particularly often in his earlier addresses and lectures (where the term usually appears, on average, more than 50 times per text), whereas his use of the term drops noticeably after the publication of his essay “The Over-Soul.” Usage of the term ‘god’ starts off particularly strong in his earlier prose works and declines steadily as he continues to publish essays; then, suddenly, around his essay “Nature,” its use skyrockets. What prompted this sudden interest in god? What led to this dramatic spike in the data?
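Because raw counts grow with the length of a text, a long address can show more occurrences of ‘soul’ than a short essay without using the word any more intensively. A minimal sketch, assuming the texts are already plain strings, computes both the mean raw count across a group of texts and each text’s rate per 10,000 words (the 10,000 base is an arbitrary choice for readability, and the function names are hypothetical):

```python
import re
from statistics import mean


def counts_and_rates(texts, term, per=10_000):
    """For each text, return (raw count of term, occurrences per `per` words)."""
    term = term.lower()
    results = []
    for text in texts:
        tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
        count = tokens.count(term)
        rate = count / len(tokens) * per if tokens else 0.0
        results.append((count, rate))
    return results


def average_count(texts, term):
    """Mean raw count of `term` per text, e.g. across a group of early addresses."""
    return mean(count for count, _ in counts_and_rates(texts, term))


# Toy placeholder texts of different lengths.
texts = ["soul soul body mind", "soul"]
print(counts_and_rates(texts, "soul"))  # [(2, 5000.0), (1, 10000.0)]
print(average_count(texts, "soul"))     # 1.5
```

In the toy example the second text has fewer raw occurrences but uses the word at twice the rate, which is the kind of difference a raw-count graph can hide.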

I found the graph tracing the words ‘new’ and ‘old’ very intriguing: not only is ‘new’ used much more frequently than ‘old,’ but both terms tend to follow the same rises and falls throughout Emerson’s work, suggesting that the concepts are frequently contrasted and perhaps presented as a binary. Notice how consistently these words appear across the entire collection of prose works. I never realized how constant a presence “newness” and “oldness” were in Emerson’s prose!
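Whether two words “follow the same rises and falls” can be summarized in a single number with a Pearson correlation over their per-text counts. The sketch below is a plain-Python version with a hypothetical function name; the counts in the example are invented for illustration, not taken from Emerson’s texts. A value near +1 means the two series move together, though by itself it says nothing about whether the words are contrasted in the same passages, and correlating raw counts partly reflects text length, so per-word rates may be the safer input.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (+1: they rise and fall together)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Invented per-text counts, for illustration only.
new_counts = [12, 20, 8, 15]
old_counts = [5, 9, 3, 7]
print(round(pearson(new_counts, old_counts), 3))  # 0.995
```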

The graph comparing ‘man’ with ‘men’ also intrigues me: both terms show a similar degree of fluctuation throughout Emerson’s works, but the lines diverge noticeably as they approach his later works. Around his essay on nature, the plural ‘men’ appears about 40 times, whereas the singular ‘man’ appears nearly 150 times (roughly 275% more often, or nearly four times as frequently). Perhaps this in some way reflects his growing belief in the self-reliance of human beings, and his increasing concern with the perils of subjectivity.
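As a quick check on this comparison, using the approximate counts given (about 150 occurrences of ‘man’ and 40 of ‘men’):

```latex
\frac{150}{40} = 3.75,
\qquad
\frac{150 - 40}{40} = \frac{110}{40} = 2.75 = 275\%
```

That is, ‘man’ appears about 3.75 times as often as ‘men’ at that point in the corpus.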

I think there is something worth studying here. The graphs have certainly opened up questions, but the challenge now is to arrive at concrete answers and interpretations. I wonder how these graphs will change when I add more of Emerson’s prose to the database. I am also unsure whether I will be able to develop a full-fledged research project based on this quantitative data. My guess is that I will ultimately resort to close readings to better understand the trends and word frequencies produced by the program, but that in itself is a problem: I simply do not have the time to closely read every essay in the database (especially since I am currently teaching, taking graduate courses, and working on annotations for a book series).
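One common middle ground between distant and close reading is a keyword-in-context (KWIC) concordance, which gathers every occurrence of a word together with its immediate surroundings so that only those passages need a close look. The sketch below is offered as one possible approach, not as part of the project described here; the function name and the default five-word window are assumptions, and punctuation is dropped from the context lines.

```python
import re


def kwic(text, term, width=5):
    """Return one line per occurrence of `term`, with `width` words of context on each side."""
    tokens = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    target = term.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>40} [{token}] {right}")
    return lines


# Placeholder sentence, not a quotation from Emerson.
for line in kwic("The new and the old. Old things pass, and new things come.", "new", width=3):
    print(line)
```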

Do you have any thoughts or suggestions for this project? Does it seem somewhat feasible and worthwhile? Any and all feedback will be greatly appreciated!


  1. Joel says:

    An interesting post, Angel, and I can see how making these graphs is fun and intriguing. I have to admit I am antagonistic to distant reading, and Moretti is quite harrowing here: “But it’s precisely this ‘poverty’ [of concepts] that makes it possible to handle them, and therefore to know.” First of all, what does Emerson himself say of objects in ‘Experience’? “I take this evanescence and lubricity of all objects, which lets them slip through our fingers then when we clutch hardest, to be the most unhandsome part of our condition.” In short, Emerson, like Nietzsche after him, does not trust in an absolute ability to know, and I don’t think he would settle for the kind of paltry empiricism Moretti seems to find consoling here. Granted, Moretti is talking about concepts, but I think objects are parallel to concepts for Emerson and Thoreau: they both open out, from their finitude, unto a cosmos. That the abstract terms Emerson employs shift with his mood is part of what makes Emerson difficult and intriguing, and is, as you write here, what you got simply from reading him. Distant reading only seems to confirm the same questions (good old circular certainty!), and so I remain unconvinced that there are new questions here, and that this method doesn't put Emerson into a test tube – which one might only hope will drop!

    • Angel D. Matos says:

      Thanks for your reply, Joel. I think you hit the heart of what is at stake here: too much is lost when Emerson is stripped of the context of his work, especially considering that his aims transcended the realm of facts (which makes him all the harder to understand through a simple reading). I find the notion of distant reading very tantalizing, but I am also deeply skeptical about the results it yields… indeed, at times, the results are very tautological! To some extent, the charts reveal things we already know. Of course, as you pointed out, Emerson is probably rolling in his grave as we discuss the possibility of conducting this research; I was just intrigued by the possibility of applying quantification to a scholar who is, in essence, qualitative in every imaginable way. I can’t help but wonder whether the data may reveal something that I’m not quite seeing at this point. Thank you for your insightful response!

