Adventure, History, Letters, and Memoir: Mapping Title to Text in the 18th Century Novel
Eighteenth-century fictions often announce their genre in their titles: adventures, memoirs, etc. But what, if anything, do these “genre keywords” in titles actually indicate about the texts? Because researchers in the digital humanities frequently use metadata, like these titles, as a representation of a full work, it is important to investigate the connection between these two elements – here, title and text. To begin to analyze connections between title metadata and full-text data, I focused on the 1760s, using a small dataset of texts from 1760-1770. These texts are all of the “genred” texts I could find by searching clean full-text databases for the titles on a full list of 1760s fiction in English, itself created by gathering titles from ESTC and Raven’s bibliography of English fiction 1750-1770. I encountered significant data problems here – see this blogpost about those. My analysis of the four genres I ultimately worked with (adventure, history, letters, and memoir) suggests some very preliminary conclusions about the connection, in the 1760s, between the genre a title announces and the text it labels.
I collected as many titles as I could that were published from 1760-1770 and included one of four “genre keywords,” counting both titles first published in that decade and reprints. I chose these four keywords after making a wordcloud using the full list of 1760s fiction titles compiled by END;1 the wordcloud helped me determine some of the major, repeated keywords that might plausibly indicate some kind of category for the books they were labeling. Novel was originally a fifth keyword that I planned to use as one of my genres, but (see separate END post on this) I had to eliminate it due to a lack of available clean full texts of “novels” from this period. Some of the works I found fell into multiple genre categories; I assigned each of them to a single category, choosing according to which categories had fewer or more works.2
I ran a topic model across all my files,3 which resulted in two pieces of output that I used in this analysis: a “composition” file, which shows the percentage of each individual piece of text in the corpus that falls into each “topic”; and a “topic key,” which shows the topics, or sets of words that probabilistically appear together throughout the corpus, and the relative “weight” or prominence of each. By going through this composition file, I was able to take the individual texts from each genre and re-calculate the percentage with which each of the genres appears in each topic. These are the percentages that appear throughout the analysis. The topics are each actually very large, as all the words (minus stop words) in the entire corpus are divided amongst them, but what appears for each “topic” in the key is its twenty most significant words – my analysis focuses on these words as indicative of the full topic. The “topics” in the topic key each have a numerical label, but because the computer generates them using probability, without any understanding of their “meaning,” it is up to the researcher to determine what the topic output actually means.
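As a rough illustration of the recalculation described above, here is a minimal Python sketch. It rests on several assumptions that are not in the original analysis: the composition file has already been read into `(document name, list of topic proportions)` rows (MALLET's exact file layout varies by version, so parsing is left out), each file name starts with a hypothetical genre prefix, and a genre's percentage in a topic is taken to be the plain average over that genre's documents. The rows themselves are invented toy values.

```python
from collections import defaultdict


def genre_topic_percentages(rows, genre_of):
    """Average per-document topic proportions within each genre.

    rows: iterable of (doc_name, [proportion for topic 0, topic 1, ...])
    genre_of: function mapping a doc_name to a genre label (hypothetical)
    Returns {genre: [rounded percentage per topic]}.
    """
    sums = {}
    counts = defaultdict(int)
    for doc_name, proportions in rows:
        genre = genre_of(doc_name)
        if genre not in sums:
            sums[genre] = [0.0] * len(proportions)
        for topic, proportion in enumerate(proportions):
            sums[genre][topic] += proportion
        counts[genre] += 1
    return {
        genre: [round(100 * total / counts[genre]) for total in totals]
        for genre, totals in sums.items()
    }


# Invented toy rows: three topics, file names prefixed with the genre.
rows = [
    ("adventure_a_001.txt", [0.20, 0.70, 0.10]),
    ("adventure_a_002.txt", [0.40, 0.50, 0.10]),
    ("history_b_001.txt", [0.20, 0.10, 0.70]),
]
percentages = genre_topic_percentages(rows, lambda name: name.split("_")[0])
print(percentages)
```

Averaging over documents means that a work split into many pieces counts more heavily than a short one; weighting each work equally instead would be a different, equally defensible choice.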
This meaning-assignment stage of analysis is one where the researcher’s subjective interpretations can get confused with the computer’s “objective” output. Throughout the analysis, I try to refer to the contents of each topic and explain the logic behind the “working titles” I used to compare the different topics and by extension the genres. But these working titles inevitably limited (even as they enabled) my analysis. For example, I called topic 1 “positive present human world,” certainly an interpretive leap from the computer-generated cluster of words in my topic key. For this and many of the other topics, I could have interpreted the key differently, or even given the topic a slightly different “name” and thus worked with the corpus at large differently.
Looking through the differences between these title-defined genres immediately revealed some skewing of the data in both the memoir and letter categories. 29% of memoir appeared in a single topic in which no other genre appeared, and 35% of letters likewise appeared in a topic with only a 1% showing for each of the other genres. Each of these topics seemed to correlate with a single work (in letters, Pamela, and in memoirs, The adventures of Peregrine Pickle. In which are included, Memoirs of a lady of quality). Both works were originally published before the 1760s, both were very long compared to the other works in their genre sections, and one included two genre keywords in its title. All of these factors may have contributed to their skewing of the data; to deal with the problem, I recalculated the percentages in each topic for the two genres without these particular works. All further analysis was done with these altered percentages in mind. The problem with this solution is that it makes my corpus of texts in both these categories smaller, and more subject to the particularities of other individual works, something I tried to bear in mind as I did my analysis. Another note on the particularities of this data as a product of topic modeling: some of the topics should more accurately be split into two topics that happen to frequently appear together, while some pairs of topics are practically identical and could easily be combined into one; this, too, I tried to account for in the analysis.
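The effect of one very long work can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are invented, the work titles are placeholders, `exclude` is a hypothetical helper argument, and a genre's share of a topic is assumed to be the mean over that genre's document pieces (the weighting actually used is not specified here).

```python
def genre_topic_share(chunks, topic, exclude=None):
    """Mean proportion of `topic` over a genre's chunks, as a rounded percent.

    chunks: list of (work_title, {topic_id: proportion}) for ONE genre
    exclude: optional work title to leave out (hypothetical helper argument)
    """
    kept = [props.get(topic, 0.0) for work, props in chunks if work != exclude]
    if not kept:
        raise ValueError("no chunks left after exclusion")
    return round(100 * sum(kept) / len(kept))


# Invented toy data: one long work (6 chunks) and two short ones (2 chunks each).
letters = (
    [("Long Work", {3: 0.75})] * 6
    + [("Short Work A", {3: 0.0}), ("Short Work B", {3: 0.0})] * 2
)
with_long = genre_topic_share(letters, topic=3)
without_long = genre_topic_share(letters, topic=3, exclude="Long Work")
print(with_long, without_long)
```

In this toy case the topic's share of the genre is carried entirely by the one long work, which is the pattern that motivates recalculating without it.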
Although the majority of my analysis is genre-based, there are a few interesting categories that showed up with similar rates in all of the genres, some of them in very large quantities. The first, and largest, of these is topic 1, which I gave the working title “positive present human world.”
This is what that looks like from MALLET:
1 1.23702 make give time good great life world present part mind thought person reason find pleasure manner love till kind heart
Broken down by genre, every category of texts shows between 19% and 22% in this topic. This seems to suggest that all the texts approach the present human life as something that is ultimately positive: pleasurable, kind, reasonable. The verbs here are make, find, and give, which suggest that action in this positive world is generative and creative, with a purpose that propels into the future and is positive in the moment. More on this specifically in the analysis of the histories.
Another much weaker but still notably evenly distributed topic (9% in letters and 4% in each of the other genres) was number 11, which I dubbed “English people and especially men as intelligent and powerful.”
11 0.31762 man country men people letter great nature learning genius english proper beauty public human found author wisdom history taste china
This topic seems to locate wisdom, genius, and perhaps a history of those things with nature and English people, particularly men. It presents a nationalistic, patriarchal and (again) positive understanding of the world, so long as the world is premised on those nationalistic, patriarchal terms. This is a weak topic, but its relatively even distribution suggests that the “weakness” of the topic reflects its lack of centrality in any particular text – it isn’t the “point” of any of the narratives, but rather a simply accepted fact in all of them, always present but never prominent, just as the kinds of implicit assumptions the topic suggests tend to be.
The adventure texts, along with the history texts, grouped very clearly in a certain set of topics (perhaps because they are the categories I have the best data on, perhaps for other reasons – more on that when I get to the more ambiguous “letter” and “memoir” categories).
By far, adventure had its highest percentage in topic 19, at 25%, and (leaving out topic 1) topic 16, at 14%. Topic 19, which I named “public masculine economic activity,” looks like this:
19 0.10232 master guinea adventures sir made directly proper general chapter moment success power service business gave raised nature make human money
It suggests that “adventures” show up around business and money, and that they yield success, with power mixed in somewhere, perhaps in the conditions or the yield of adventures. All of this is (or at least shows up around things that are) natural, human, and proper. The topic is “masculine” in that the few person-indicators here are masculine, but probably primarily because the public economic world that “adventures” seem to take place in is predominantly masculine in this period.
Topic 16 is structured similarly but set in a different location – if 19 is a public economic world for adventures to inhabit, 16 is the social world they inhabit:
16 0.65228 made man time gentleman place money company day young put house immediately gave master set honour friend people good gentlemen
As with topic 19, 16 is masculine-specific, but it focuses on the home; where the economic movements of the adventure meet the social world, they result in this “masculine domestic” topic, interesting in the context of debates that often focus on hard-defined edges between male-female, public-private binaries. There aren’t a lot of verbs visible in the topic here, which implies a contrast to the economic activity that dominated topic 19. But the non-verb words that do appear in the topic suggest movement and action (e.g. day and immediately), and the kind of action they suggest shows how the masculine domestic plays into the masculine public of topic 19. Immediately and day imply an outward focus, the possibility of motion in the future pointed towards windows signaling morning; even the word “company” suggests a porous boundary between the home and outside world. These promises of motion and references to the outside, without the overt action of verbs, suggest that the masculine-domestic allows adventures a connection to, and perhaps purpose within, the social schema it largely eschews, while still decentralizing those things from its narrative (this may function similarly to the socially-defined positive qualities like “proper” that appear with adventures in topic 19). Topic 2, which references family positions and roles of both genders, is adventure’s third-largest topic at 11%, but this is actually the lowest showing of any genre in that topic. And adventure is the only genre category with 0% in all the other topics that suggest social/familial roles and relationships.
Adventure is focused on a world that lies outside of the social world of family and women in general, and it is perhaps for that reason that it seems to take characters that embody the norms of the social world – they are “gentlemen” (although this could refer to their social position rather than their social behavior); they display “honour”; and they are significantly attached to their homes and households.
It is surprising that these proper, active men seem to be openly pursuing economic activity; markedly absent are words that signify glory. Although adventure shows up with 6% in a maritime-focused topic, #7, the words there seem to signify means towards an end rather than the “ends” we associate with military narratives and often with adventure, e.g. commodore, hatchway, and consequence rather than glory, justice, freedom, etc. 19 is also one of two topics (the other is concentrated in history) that include both the past and present tense of the verb “make.” It is worth noting that “make,” in its present and past tenses combined, is the most frequent verb to appear in the top 20 words across all topics (what we see in the “topic key”), with 7 instances. But this particular combination of made and make in one topic suggests something generative in whatever else is happening in the topic, something that is creative in the past tense and moves forward into the present. It isn’t completely clear here what is being “generated” or made in this topic; perhaps it is money, perhaps it is the kind of proper rather than dangerous masculinity that adventures seem to rely on. Or perhaps it is the social order that adventure stakes a strong, if inattentive, claim to.
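The check for topics containing both “make” and “made” can be reproduced mechanically. The Python sketch below parses only the six topic-key lines quoted in this post, so its tally of make/made covers those six topics rather than all twenty; the function name and the assumption that each line is laid out as “number, weight, words” are illustrative choices, not a description of the original workflow.

```python
# The six topic-key lines quoted in this post (number, weight, top words).
TOPIC_KEYS = """\
1 1.23702 make give time good great life world present part mind thought person reason find pleasure manner love till kind heart
11 0.31762 man country men people letter great nature learning genius english proper beauty public human found author wisdom history taste china
19 0.10232 master guinea adventures sir made directly proper general chapter moment success power service business gave raised nature make human money
16 0.65228 made man time gentleman place money company day young put house immediately gave master set honour friend people good gentlemen
9 0.07818 sir man dear miss lady madam lord love charles good heart harriet lucy woman brother made hand hope make tho
15 0.69551 hand eyes night head face replied hands door found began room time heard soul bed lay left stood fell cried
"""


def parse_topic_keys(text):
    """Return {topic number: (weight, list of top words)}."""
    topics = {}
    for line in text.strip().splitlines():
        topic_id, weight, *words = line.split()
        topics[int(topic_id)] = (float(weight), words)
    return topics


topics = parse_topic_keys(TOPIC_KEYS)

# Topics whose top words include both tenses of the verb.
both_forms = sorted(
    number for number, (_, words) in topics.items()
    if {"make", "made"} <= set(words)
)

# Occurrences of either tense within these six topics only.
form_count = sum(
    words.count("make") + words.count("made") for _, words in topics.values()
)
print(both_forms, form_count)
```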
When I broke down the topic model I ran across all the texts by genre, the group of genre texts with the widest distribution amongst the topics was the history group (history texts appear in 16 of the 20 topics, followed by memoir at 15 and letters and adventure at 13). This means that “high” instances in history are comparatively lower than in other genres. The topics within which histories cluster most significantly, however, could all be grouped together as social category-focused. Aside from topic 1 (21%), the top topics for history are 2, 4, 9, 12, and 15. Topic 2 is the family role topic that was comparatively weak in adventures, at 15% for histories. Topic 12 is a similar family-specific topic, but with a focus on a male-led family household (father, master, etc., without female equivalents like mother or madam). Topic 4 (8%) seems to delineate social roles (sir, gentleman, madam) in combination with speech and social qualifiers like age (“young”) and “manner,” but eschews family-specific social roles (mother, father); it seems to present the public face rather than private face of social interaction. Topic 8 is very similar to this one, at 4%, also noting social (but not family) roles, conversation, and youth. It is interesting that the specific age that gets mentioned in these two topics is youth – perhaps this is because that is the age most worth noting in a character, or perhaps because the focus of histories is on the youth (within the context of their social world and families).
This plays into several interesting qualities in topic 9, which is almost exclusive to histories, at 8% (1% from both letters and memoir, 0% from adventures). It is the only topic that seems to focus on romantic relationships – not, notably, on the emotions of romance, but on its formal social elements; this appears in words like “love,” “hand,” “dear,” and “hope.” But this topic is also, along with adventure-heavy topic 19, one of the two topics that include both the past and present forms of the verb “to make.” What is being “generated” in this topic is more suggestive than in topic 19:
9 0.07818 sir man dear miss lady madam lord love charles good heart harriet lucy woman brother made hand hope make tho
The combination of a socially sanctioned and protected romance with a generative quality, in the wider context of the social-role and perhaps youth-focused histories, suggests that what is generated in these histories, the aim of the socialization of youths through their families, is the regeneration of the social structure from the past to the present.
Topic 15, also notably strong in history (10%), is the only other topic that seems to approach physical, bedroom-located relationships – it probably denotes either romantic/sexual relationships or emotional lying-on-the-bed-crying-or-praying scenes.
15 0.69551 hand eyes night head face replied hands door found began room time heard soul bed lay left stood fell cried
If it denotes the former, this topic, distinct from topic 9, is not generative, and it is not positive. “Good” and “hope” can accompany “hand” in topic 9, and even “love” is present, but in topic 15, where the body is expanded through four body signifiers and the verb “lay,” urgency and negative emotion replace positivity – “cried” is combined with “left” and “fell.” The high level of motion here isn’t directed in the way the simple “make” is in topic 9 – “began” is combined with “left” and falling and finding and crying. 15 is slightly stronger than topic 9 amongst the histories, but it is also evident in all of the other genres, while topic 9 is almost exclusive to histories. 15, then, might represent implicit negative associations with undirected, uncontrolled sexuality or emotions, while 9, and the social history, represent a means of controlling it. If the topic is more representative of religious emotion, the chaotic motion and physicality of the topic are actually socially harnessed and controlled. In that case, the topic might rather indicate the otherwise negative and dangerous passions and directionless energies that religion contains.
The topics that letters cluster in, excluding the general topic 1, are 5 (politics and war from the social position of the aristocracy), 11 (men and intelligence) and 14 (a topic focused on you in different variations, for example “thee” and “thou,” and on proper names). They also have a notably low presence of 7% in topic 2 (family roles).
Because the corpus of data, with one text removed, was very small for the letters, I am wary of assuming these results are due to qualities of letters in general rather than to specific texts. Topics 5 and 11 particularly seem like they may have been skewed towards their high percentages (23% and 9%, respectively) by particular texts: one of the “letters” is a history of England from the perspective of letters between a nobleman and his son, hence 5, and two are letters between two men, so perhaps their mutual compliments led to 11.
The skewing here, first the extreme skewing from one text and then the possibility that the remaining texts are still skewing the results, may be a product of the fact that a letter is a form as well as a “genre.” As a form, “letters” can be filled with different kinds of content, across the 1760s and certainly over time.
If taken as a form that is the basis for something I decided to call “genre,” letters’ focus on “you” and on proper names (topic 14), in combination with their diminished focus on the family and generally low showings in all the social role/relationship topics, may suggest something about a particular perspective inherent in the (mostly second person) letter.
Letters approach topics through the filtered lens of a particular personal relationship, addressing “thou” and using lots of personal names. If the names stand in for individuals other than “thou,” then perhaps this personal relationship on which a letter is based leads to a more personal focus on other individuals immediately present in that filtering relationship. This personal focus doesn’t mean that letters are “emotional” or intimate in the way we might imagine a personal relationship, because the text itself is not about the personal relationship (it can be about anything). It simply means that the approach to the topic is through a particular relationship and the particularities of that personal relationship rather than a larger social schema.
The memoirs, like the histories, were pretty evenly distributed, with low percentages across the topics. But they were distinctively strong in two topics: 2 (28%), the “family positions and roles” topic, and 14 (10%), the “you/proper names” topic. This is interesting in that the only other genre with a significant showing in 14 is letters, and in the case of letters, high percentage in 14 is paired with a distinctly low showing in topic 2 and in general across all the social category topics, both family-centric and more general or public. Memoir, in comparison, has a fairly even and high distribution across social category topics, second to history and ahead of the low letter showing and the almost-absent adventure showing. This is an interesting pairing, then: unlike letters, which perhaps privilege the one-to-one personal relationship over familial and social relationships, memoir seems to privilege one-to-one relationships in addition to social and, especially, familial relationships (re: high presence in topic 2). If a memoir is generally expected to focus on the life of an individual, whatever that individual’s life might contain, this is perhaps surprising. But it makes sense that immediate relationships, to “you”s and to the family, take extra precedence, followed by broader social relationships. This might place memoir somewhere in between the “form” that unites letters and the more clearly “genred” histories and adventures. Any life can be recorded in a memoir, but that life, at least in 1760s memoirs, seems to start with one-to-one relationships, expand to familial relationships, and generalize into social relationships.
1 I also made wordclouds of the 1760s titles in the END database and of all the titles END has catalogued thus far, which span the 18th century. The results [available in this public file] for specifically END 1760s titles and all 1760s titles are approximately the same, which suggests that the END database is a fairly representative sample of all 1760s titles! The 18th century results are unsurprisingly different, with, among other things, notably higher rates of romances and tales. ↩
2 I chose not to double count these texts in an effort to keep my categories as even as possible with the texts available, so as not to overweight certain genres in the topic modeling output, but with a better dataset I would have preferred to double count these multi-genred texts. ↩
3 The program I used for this topic modeling was MALLET, an open source program created at UMass Amherst, using their automatic settings (stop word list, 1000 iterations, etc.). It runs best with a large number of shorter pieces of text, so I split all the books I was working with into 500-line documents before feeding them into the program. The data that I got from MALLET and used in this analysis is available [here.] ↩
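The 500-line split described in note 3 could be done in many ways; the following is one minimal Python sketch, not the script actually used. The function name and the stand-in “book” are hypothetical, and the last piece is simply whatever lines remain.

```python
def split_into_chunks(lines, size=500):
    """Split a list of lines into consecutive chunks of at most `size` lines."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


# Invented stand-in for a book of 1,200 lines.
book = [f"line {n}" for n in range(1, 1201)]
chunks = split_into_chunks(book)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own file, so that a longer book yields proportionally more documents in the model.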