General Introduction

Christof Schöch (Trier)

Welcome to this methodological survey of Computational Literary Studies (CLS). The aim of this publication is to document and describe current, widespread research practices in CLS, based on a large collection of publications that have appeared in this field over approximately the last ten years. The perspective of this survey is primarily descriptive: it records these practices as the authors were able to observe them in the published literature. In this sense, the survey can also serve as an annotated bibliography of sorts and as a guide to further reading. Although it is not intended as an introductory textbook, it can nevertheless serve as an introduction to several research areas or issues that are prominent within CLS, as well as to several key methodological concerns that matter when performing research in CLS.

Areas of research and steps in the research process

The research areas or issues covered by this survey are the following, and correspond to significant strands of CLS research:

  • Stylometric authorship attribution, that is the identification of possible authors of a text of anonymous or disputed authorship using quantitative indicators (for an introduction, see Chapter 5);
  • Literary genre analysis, that is the analysis of the similarities and differences between groups of texts belonging to related but distinct literary genres or subgenres (see Chapter 10);
  • Literary history, that is the investigation of patterns and developments that are determined by factors such as literary periods and movements or the development of literature over time (see Chapter 15);
  • Analysis of gender, that is the investigation of patterns, differentiations and other phenomena related either to the biological sex or social gender of authors or to the ascribed sex and gender identities of fictional characters in literary works (see Chapter 20);
  • Finally, the analysis of canonicity and prestige and the ways these attributions to authors and texts are related both to textual and extra-textual factors and processes (see Chapter 25).

The key methodological concerns covered by this survey, in turn, are the following, and are directly related to a certain number of key steps in the research process:

  • Corpus building, that is the design and composition of corpora in such a way that they best support the investigation of one or several research questions (for an introduction, see Chapter 1);
  • Preprocessing and annotation, that is the process of preparing texts selected for inclusion in a corpus, or belonging to a previously designed corpus, by way of text encoding, data cleaning, token-level annotation and document-level metadata collection (see Chapter 2);
  • Analysis, that is the application of qualitative or (in particular) quantitative methods for operationalizing or formalizing specific literary phenomena and the investigation of the nature, prevalence, and distribution of such phenomena in literary corpora (see Chapter 3);
  • Finally, the formal evaluation of the robustness, generalizability, explainability, performance and/or significance of the analyses performed in the previous step (see Chapter 4).

Some practical matters

The two perspectives described above, the research areas or issues, on the one hand, and the methodological concerns or steps in the research process, on the other, structure this survey, which is presented as a two-dimensional grid of short texts. As a consequence, this survey can be read in at least three ways. Readers who are primarily interested in different aspects of a given research issue, such as authorship attribution or genre analysis, may want to read all the texts in the relevant column, from top to bottom. Readers who are instead interested in different aspects of a given step in the research process or methodological concern, such as corpus building or evaluation, may want to read all the texts in the relevant row, from left to right. A set of short introductions to each research issue and each methodological concern provides orientation to readers adopting either approach. Finally, readers are of course welcome to dive right in and read the texts in any order they prefer.

Please note that bibliographic references are included in each text using the author-date system, in brackets within the text. The full references for a given section, including both works cited and additional references recommended for further reading, can be obtained by clicking on the link provided at the end of each individual chapter. A list of all cited references is also provided for convenience. In addition, all cited references as well as further readings are available in the CLS Bibliography, which also documents the corpus of publications that is the foundation of this survey.

Citation suggestion

Christof Schöch (2023): “General Introduction”. In: Survey of Methods in Computational Literary Studies (= D 3.2: Series of Five Short Survey Papers on Methodological Issues). Edited by Christof Schöch, Julia Dudar, Evgeniia Fileva. Trier: CLS INFRA. URL:, DOI: 10.5281/zenodo.7892112.

License: Creative Commons Attribution 4.0 International (CC BY).