19  Evaluation in Literary History

Artjoms Šeļa (Kraków)

19.1 Introduction

How can evaluation be framed in the context of literary history? All method-specific evaluation techniques for supervised classification (Underwood 2019), clustering (Šeļa, Plecháč, and Lassche 2022), or information retrieval (Manjavacas, Long, and Kestemont 2019) are used in historical research. Often, evaluation metrics are recycled to model a perspective: they serve as a measure of label or class distinctiveness, and the change in their values is traced over time as a supervised model is exposed to different parts of the data. Are library-assigned ‘detective’ labels for books more recognizable than ‘gothic’ ones when only 19th-century training data is used (Underwood 2019)? Are future states of poetic traditions recognizable from the past (Šeļa, Plecháč, and Lassche 2022)? This so-called ‘perspectival modeling’ is not interested in maximizing model accuracy; instead, it uses the change (or, indeed, the continuity) of accuracy scores to argue about the distinctiveness of periods (Broadwell and Tangherlini 2017), genre groups (Calvo Tello 2021), or the pace of change in literary judgement (Underwood and Sellers 2016).
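To make the logic of perspectival modeling concrete, here is a minimal, purely illustrative sketch (not drawn from any of the cited studies): a nearest-centroid classifier is fitted on feature vectors from an ‘early’ sample of two genre labels, and its accuracy is then read off on a ‘later’ sample, so that a drop in accuracy would signal decreasing genre distinctiveness over time. All data, labels, and function names here are invented for the example.

```python
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_accuracy(train, test):
    """train/test: dict mapping label -> list of feature vectors.
    Fits one centroid per label on train, reports accuracy on test."""
    centroids = {lab: centroid(vs) for lab, vs in train.items()}
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    correct = total = 0
    for lab, vs in test.items():
        for v in vs:
            pred = min(centroids, key=lambda c: dist(centroids[c], v))
            correct += pred == lab
            total += 1
    return correct / total

# Toy 'genre recognizability' setup: two genres drawn from shifted
# Gaussians; train on an 'early' sample, evaluate on a 'later' one.
rng = random.Random(1)
def sample(mu, n):
    return [[rng.gauss(m, 0.5) for m in mu] for _ in range(n)]

train = {"gothic": sample([0.0, 1.0], 40), "detective": sample([1.0, 0.0], 40)}
test = {"gothic": sample([0.2, 0.9], 40), "detective": sample([0.9, 0.2], 40)}
acc = nearest_centroid_accuracy(train, test)
```

Tracking `acc` across successive ‘later’ samples, rather than maximizing it, is what distinguishes the perspectival use of a classifier from an ordinary predictive one.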

In this survey, I want to focus on an evaluation process that is specific to questions of history: the observed change and continuity itself. Statistical modeling against causal hypotheses is one way to approach this, but, as follows from the chapter on “Analysis in Literary History” (Chapter 18), literary history is only starting to engage with causal inference. Another evaluative path is empirical validation of the observation, or robustness checks. These can come in many forms: running replication analyses on corpora of different designs, varying sampling strategies, establishing random baselines, or performing bootstrapping and permutation tests.
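As an illustration of the simplest of these robustness checks, the sketch below implements a two-sample permutation test in plain Python; the ‘abstractness scores’ and group sizes are invented toy numbers, not data from any study mentioned here.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=2000, seed=42):
    """Two-sample permutation test for a difference in means.

    Returns the observed difference and the fraction of shuffled label
    assignments that produce an absolute difference at least as large
    (an estimate of a two-sided p-value)."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Toy data: mean 'abstractness' per novel in an early vs. a late period.
early = [0.31, 0.29, 0.35, 0.33, 0.30, 0.34, 0.32, 0.36]
late = [0.24, 0.27, 0.22, 0.26, 0.25, 0.23, 0.28, 0.21]
obs, p = permutation_test(early, late)
```

The same shuffle-and-recompute pattern generalizes to most of the checks listed above: the quantity of interest is recomputed under a randomized or resampled version of the data, and the original result is judged against that distribution.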

19.2 A case in point: decrease of abstract lexical items

The decrease of abstract lexical items in 18th- and 19th-century Anglophone fiction is an informative example of how corpus selection and replication are coupled with empirical evaluation and evidence accumulation. Heuser and Le-Khac (2012) first find a ‘show, don’t tell’ effect in the development of fictional language: using the Chadwyck-Healey corpus (around 3,000 books at the time), they show a decline in abstract diction and an increase in concrete words and action verbs. The authors link this to a change in social space: novels stop being about private, inner worlds and open up to public surroundings. Underwood and Sellers (2016), using another corpus of 4,000 works assembled for the occasion, find a similar effect, but link it to a divergence of fictional language from non-fiction, which is further corroborated by Underwood (2019) using vast HathiTrust data. In parallel, the same trends are observed when novels are stratified by canonicity (Algee-Hewitt et al. 2016) and, independently, by Kao and Jurafsky (2012) in their study of Imagist language and influence in poetry, where the authors attribute the drop in abstractness to the specifics of the movement.

The observed trend is robust and ubiquitous. We know it is there, at least for English-language literature, but why is it there? There are multiple candidate explanations: social change, diction divergence (the emerging autonomy of literature?), literary movements. One of the problems with computational inquiry in literary history is that the answers we seek may very well not be in our data (or metadata). Luckily, there are ways to deal with that, too.

19.3 Towards evaluation in literary history: generative models and historical simulations

Research in computational literary studies, and in the broader field of Digital Humanities, is mostly descriptive and data-driven. I use ‘descriptive’ here as the opposite of ‘generative’. Scholars describe the outcomes of literary history (corpora, collections, trends, aligned texts, pathways of citations) and try to understand the causes and drivers of observed patterns a posteriori. These claims might sound plausible, or counter-intuitive, but how can we tell which of them are more likely than others? The usual answer, in some parts of the cognitive and social sciences, is: controlled experiments. For complex systems and their macro-historical dynamics, however, fully-fledged experiments are unrealistic (what would an experiment in the evolution of dramatic structure look like? How could one on the rise and diversification of the European novel be designed?). In these cases, simulations and formal modeling are often the only way to directly engage with the mechanics of history.

Simulations and generative models are, primarily, tools for understanding, and they are mostly ignored in CLS/DH, despite recently gaining traction in adjacent fields, like archaeology, that deal with similar problems of having an incomplete record of the outcomes of cultural processes (Acerbi, Mesoudi, and Smolla 2022). Formal models allow us to explicitly define the parts of the systems we are interested in and the relationships between those parts (see Smaldino 2017 for a defense of formal modeling). The main advantage of simulations is that they put our verbal, informal models at risk by forcing out our basic, deeply ingrained assumptions about the important features of the objects and processes we study. In turn, comparing the observed with the assumed behavior of simulations allows a better understanding and refinement of theories and their causality claims. In the last few decades, the modeling scene has seen a rise of agent-based simulations (ABS): models based not on deterministic equations, but on individual interactions between agents, which can lead to complex, often not immediately intuitive, outcomes (classic models implemented as ABS include sampling error or random drift in small populations, murmuration behavior in birds, and patterns of racial/social segregation).
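To show how little machinery such a model needs, here is a minimal sketch of one of the classic examples just mentioned, random drift in small populations: a simple Wright–Fisher-style copying model in which each generation of agents copies a cultural variant from a randomly chosen member of the previous generation. The population sizes and the seed are arbitrary choices for illustration.

```python
import random

def drift(pop_size, p0=0.5, generations=200, seed=3):
    """Neutral drift of a binary cultural variant.

    Each generation, every agent copies the variant of a randomly chosen
    member of the previous generation; returns the trajectory of the
    variant's frequency, stopping early if it is fixed or lost."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # equivalent to a binomial draw of the variant's carriers
        carriers = sum(rng.random() < p for _ in range(pop_size))
        p = carriers / pop_size
        trajectory.append(p)
        if p in (0.0, 1.0):  # variant fixed or lost: drift is over
            break
    return trajectory

# Small populations wander far from p0 and tend to fix one variant
# quickly; large populations stay close to the starting frequency.
small = drift(pop_size=20)
large = drift(pop_size=20000)
```

Nothing in the model prefers one variant over the other, yet small populations reliably lose diversity: exactly the kind of not-immediately-intuitive outcome that makes such baselines useful before invoking richer explanations.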

There are only a few examples of formal modeling and agent-based simulations known to us that address questions of literary history. Early attempts focused on simulating communication circuits in book history, starting from Darnton’s informal model of the book market (Throne 2014). Gavin (2014), relying on Throne’s model, tried in his essay to incorporate social simulations into the humanities. However, almost a decade later, a CLS INFRA survey (Van Rossum and Šeļa 2022) noted both a near-total absence of simulations in current DH training and a lack of interest in the topic among practitioners. This suggests that formal modeling remains an alien, unknown approach with no immediately perceived usability.

Specifically important to literary history is a relatively unknown paper by Sack (2013) that directly challenges Moretti’s claim that the diversity of novels in the 19th century was driven by the rapid growth of the readers’ market. The model uses authors as agents; authors try to figure out the preferences of readers (navigating differently configured ‘preference landscapes’); they do so by ‘writing’ novels (represented as binary strings); each new novel can be a recombination, a mutation, or a direct copy of previous, relatively successful texts; under some conditions, there is also a feedback loop between the produced texts and the readership market. Since simulated novels have explicit traits, their diversity can be measured and compared at the start and at the end of each simulated run of history. The simulations show that, under the presented conditions, population size by itself does not guarantee growth in novel diversity (in the case of a homogeneous preference landscape, diversity will actually decrease). The researchers reframe the problem as a general ‘product diversification’ process and suggest that an individual ‘creativity factor’ might be important for making novels explore diverse stylistic directions. This complicates the relationship between consumer population size, diversity, and innovation, and can drive further research towards the study of the very conditions that alter innovative behavior itself (e.g. competition between elites in the cultural field), which can be used to refine the model, which can in turn be used to update the theory, and so on.
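A schematic (and much simplified) reimplementation of the core loop of such a model might look like this; the parameter values, the copy-with-mutation rule, and the pairwise Hamming-distance diversity measure are my own illustrative choices, not Sack’s actual specification.

```python
import random

def diversity(novels):
    """Mean pairwise Hamming distance between binary-string 'novels'."""
    n = len(novels)
    total = sum(sum(a != b for a, b in zip(novels[i], novels[j]))
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def simulate(n_authors=50, length=16, steps=200, mutation=0.02, seed=0):
    """Authors as agents: each step, one author replaces their novel with
    a (possibly mutated) copy of a randomly chosen existing novel."""
    rng = random.Random(seed)
    novels = [[rng.randint(0, 1) for _ in range(length)]
              for _ in range(n_authors)]
    start = diversity(novels)
    for _ in range(steps):
        author = rng.randrange(n_authors)
        model = rng.choice(novels)  # stand-in for a 'successful' text
        # per-bit mutation while copying
        novels[author] = [b ^ (rng.random() < mutation) for b in model]
    return start, diversity(novels)

start, end = simulate()
```

Comparing `start` and `end` across runs with different parameter settings (population size, mutation rate, selection rule) is precisely the kind of before/after diversity measurement the paper relies on.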

More recent examples of formal simulations for historical questions revolve around medieval manuscripts (paleography thus has all three components of a fully functioning scientific engine: empirical studies, experiments, and generative models). Kestemont and Karsdorp (2020) simulate the historical loss of manuscripts under very simple conditions to test their estimation of missing or undiscovered copies. They first simulate the process of loss itself, where the ‘true’ population is known, and then, using the remains of the simulated data, test how well the ‘unseen species’ estimator points to the true value. This direction was recently radically expanded by Camps and Randon-Furling (2022), who simulate the whole process of manuscript writing, transmission, and loss, not only to interrogate the discipline’s assumptions, but also to show the emergence of frequent features of manuscript histories observed in stemmata reconstructions (e.g. the famous binary fork at the root).
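The logic of that first step can be sketched in a few lines: simulate copy loss from a known ‘true’ population and then compare an unseen-species estimate against the known truth. I use the Chao1 lower bound here as one common choice of estimator; Kestemont and Karsdorp’s actual model and parameters differ, and the numbers below are purely illustrative.

```python
import random

def chao1(counts):
    """Chao1 lower-bound estimate of total richness from abundance counts."""
    s_obs = len(counts)
    f1 = sum(c == 1 for c in counts)  # works surviving in exactly one copy
    f2 = sum(c == 2 for c in counts)  # works surviving in exactly two copies
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected variant
    return s_obs + f1 * f1 / (2 * f2)

def simulate_loss(n_works=1000, copies_per_work=8, survival=0.2, seed=1):
    """Each 'work' exists in several manuscript copies; every copy
    independently survives with probability `survival`. Returns the
    surviving copy counts of the works that were not entirely lost."""
    rng = random.Random(seed)
    surviving = []
    for _ in range(n_works):
        k = sum(rng.random() < survival for _ in range(copies_per_work))
        if k > 0:
            surviving.append(k)
    return surviving

counts = simulate_loss()  # true population: 1000 works
estimate = chao1(counts)
```

Because the simulation fixes the true number of works, the gap between `estimate` and 1000 directly measures how well the estimator recovers what was lost, which is exactly the kind of validation a real, incomplete archive cannot provide.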

Fernand Braudel, having a different kind of model in mind (the explanatory, informal models of historical change), wrote: “Once the ship [the model] is built, what interests me is to launch it, to see if it floats, then to make it sail, as I wish, up and down the waters of time. A shipwreck always constitutes the most significant moment” (Braudel and Wallerstein 2009 [1958], 194–195). Simulations allow historians to launch thousands of ships, design shipwrecks, and gain information that is hardly accessible otherwise. The current use of simulations in CLS demonstrates the potential of counterfactual literary history.


See works cited and further readings on Zotero.

Citation suggestion

Artjoms Šeļa (2023): “Evaluation in Literary History”. In: Survey of Methods in Computational Literary Studies (= D 3.2: Series of Five Short Survey Papers on Methodological Issues). Edited by Christof Schöch, Julia Dudar, Evegniia Fileva. Trier: CLS INFRA. URL: https://methods.clsinfra.io/evaluation-lithist.html, DOI: 10.5281/zenodo.7892112.

License: Creative Commons Attribution 4.0 International (CC BY).