Post-Scriptum

Schöch, Christof

doi:10.5281/zenodo.7892112

Christof Schöch (Trier)

Introduction

With the preparation, conceptualization, writing and publication phases behind us now, in April 2023, we as editors and coordinators have felt a need (and seen potential usefulness to others) for a brief look back on the challenges and benefits of the intense collaborative process of creating this Survey of Methodological Concerns in Computational Literary Studies.

We would like to structure this reflection into two concerns, despite the fact that they are of course closely related: the collaborative writing experience as a process of developing and realizing a shared understanding of the survey’s goals, structure and format, on the one hand; and the collaborative experiment in writing technology that we conducted at the same time, on the other hand.

Collaborative Writing, or: Towards a shared vision

Although this work was originally conceived as five separate and independent surveys, we quickly realized that a different approach would make more of the idea of creating a Survey of Methods in CLS research: to think of the survey as a whole as a grid structured by two axes, namely the key areas of research we wanted to address (such as authorship attribution or canonicity) and the key steps in the research process (such as corpus building or data analysis).

This decision had a number of consequences. For instance, in the grid structure, each area of research, and each step in the research process, would receive an introduction that would be valid and useful to the survey as a whole, thus reducing redundancy and increasing readability. Also, instead of five surveys of medium length, we would end up with a set of 30 short sections forming one coherent whole.

However, many challenges in the implementation of this idea remained. First of all, even though a wiki allows, in principle, writing and revising right in the online editor, this is clearly not how most people like to write: they prefer to use their own writing environments and to upload only finalized texts into the wiki at the end. This, in turn, created challenges for the coherence of approach and style between sections. In addition, in order to minimize overlap between sections, a clear and shared understanding of the scope of each research area and each step in the research process was essential, and needed to be developed in virtual meetings by the geographically distributed team. Also, with the general and conceptual introductory chapters regarding each research area and step in the research process came a need to agree on a sensible balance between introductory, easily readable textbook prose providing an overview, on the one hand, and more specifically survey-like, detailed documentation of recent practice in CLS research, on the other hand.

Connected to this question is of course the question of coverage: with over 1600 publications in the domain of CLS documented in our Zotero database for the period 2011-2020 alone, not every publication could be cited and described in the survey. Still, around 400 publications ended up being mentioned once or multiple times in the survey.

An Experiment in Collaborative Writing Technology

The conceptual work went hand in hand with an open-minded and experimental approach to the question of writing and publication tools. Again, this came with advantages and challenges. Our setup, which may look simple and logical after the fact but took time to settle on, can be described as follows:

Writing with a wiki: Quite early on, we decided that we wanted to use a writing tool that would clearly support a collaborative writing experience, with the possibility of continuous mutual reading and feedback, in order to make sure our shared vision for the survey would become visible in a unified approach to the texts in each section. Therefore, we opted to use a wiki for writing, which thankfully was directly available in our existing Gitlab infrastructure. This is a wiki that is Markdown-based, which turned out to be a good match with our publication framework (see below). It also has a friendly visual editor for a smooth writing experience.

References in Zotero: It was also quite obvious that we wanted to take advantage of the bibliography and corpus of CLS research that we had already assembled for an earlier deliverable (D3.1: Baseline Methodological User Needs Analysis, 2022), in order to give a large and well-documented empirical basis to our survey. The corresponding CLS Bibliography was already publicly available on Zotero and was therefore easy to re-use and expand for the survey.

Document transformation using Quarto: An initially less obvious choice was the publication environment. With the grid structure came the desire to publish the survey in a format that would encourage non-linear reading strategies, and we felt that a browser-based reading experience would have clear advantages here over a PDF, which we see more as a derivative, static format for offline reading. Therefore, we wanted to be able to easily create both PDF and HTML versions of our text. The high density and importance of references that a survey inevitably brings with it also meant we needed a robust integration of bibliographic data into the workflow. The solution we found for these requirements was Quarto, an elegant and flexible single-source publication environment. Here, our texts in Markdown from the wiki can be combined with several other files to generate formatted, interactive texts, using the Quarto ‘book’ format.¹

Publication on Zenodo and clsinfra.io: Finally, we decided to publish our outputs in two places: on Zenodo, for long-term archiving of all materials, from the Markdown source files to the PDF produced by Quarto, including the metadata, configuration, BibTeX and HTML files as well; and on the CLS INFRA project website, where the set of static HTML files can easily be placed and provides an intuitive, interactive reading experience.²

While we can generally report that Markdown in a wiki, BibTeX from a Zotero collection, and Quarto for bringing it all together make a great combination, there were of course also technical challenges.

For example, we needed to develop a tagging system to help us track our own work and allow us to generate per-section bibliographies for use by our readers. Because of the combinatorial nature of our grid, this posed challenges for efficiency and precision. Also, at some point our Zotero library would not sync correctly anymore, presumably because of a mismatch between Zotero versions installed on the various computers in our group, making the database instances incompatible with each other. We had to reduce the group of collaborators to the bare minimum and create a new library in order to resolve this issue; clearly not a good solution.
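To illustrate the idea of tag-based, per-section bibliographies, here is a minimal sketch in Python. It does not reproduce our actual tagging scheme: the tag prefixes (area:, step:), the Reference class and the placeholder citation keys are all assumptions made for the purpose of illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """One bibliography entry with its tags (hypothetical structure)."""
    key: str
    tags: set[str] = field(default_factory=set)


def section_bibliography(references, area, step):
    """Return the sorted citation keys tagged with both the given area and step."""
    wanted = {f"area:{area}", f"step:{step}"}
    # An entry qualifies only if it carries both tags of the grid cell.
    return sorted(ref.key for ref in references if wanted <= ref.tags)


# Placeholder entries, invented for illustration only.
library = [
    Reference("example2015a", {"area:authorship", "step:corpus"}),
    Reference("example2018b", {"area:authorship", "step:analysis"}),
    Reference("example2019c", {"area:authorship", "area:canonicity", "step:corpus"}),
]

print(section_bibliography(library, "authorship", "corpus"))
# → ['example2015a', 'example2019c']
```

In this sketch, each entry is tagged once per axis rather than once per grid cell, so that a publication relevant to several sections needs only a handful of tags. Whether this is precise enough depends on how often an entry is relevant to an area and a step, but not to their combination.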

Another example is the way the Markdown files from the Gitlab wiki come together in Quarto. Despite the fact that each wiki on Gitlab is also a repository and can be cloned, making the Markdown files readily available in bulk, it was necessary to move them to a separate repository for production, if only because the file naming conventions in the wiki (based on the title of each page) did not correspond to the way we wanted to design the URLs in the final publication. This is an area where more experiments with a closer integration would probably have been beneficial, because creating duplicates of final versions of section files inevitably, despite our best efforts, led to occasional parallel revisions on both versions of some of the files.
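The renaming step could, in principle, be automated so that the wiki clone remains the only place where files are edited. The following Python sketch shows one possible approach under stated assumptions: the slug rules, the function names and the directory layout are hypothetical and do not describe the procedure we actually followed.

```python
import re
import unicodedata
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a wiki page title into a lowercase, URL-friendly slug."""
    # Decompose accented characters, then drop everything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def plan_renames(wiki_dir: Path, target_dir: Path) -> dict[Path, Path]:
    """Map each Markdown file in the wiki clone to its production file name."""
    return {
        source: target_dir / f"{slugify(source.stem)}.md"
        for source in sorted(wiki_dir.glob("*.md"))
    }


print(slugify("Corpus-Building-for-Authorship-Attribution"))
# → corpus-building-for-authorship-attribution
```

If the production files were regenerated from the wiki clone with such a mapping before each build, revisions would only ever be made in one place, which would avoid parallel edits on duplicated files.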

Conclusion

In conclusion, it seems to us that, beyond the publicly available result of this collaborative effort, and despite (or perhaps even because of) the many challenges we encountered, whether regarding the concept, the technical setup, or the teamwork aspects of the project, the most important outcome is probably the learning experience of the entire team. Performing such a collaborative writing process together from the initial idea to the finished product taught us innumerable things. Next time we write a book with a group of eight people, we will be able to anticipate challenges and pitfalls earlier and to create an even smoother experience.

Would we undertake such a collaborative writing project again with a shared understanding of the conceptual structure emerging only in the process? Maybe. Would we do it again using a Gitlab Wiki, a Zotero library and Quarto? Absolutely!

Citation suggestion

Christof Schöch (2023): “Post-Scriptum”. In: Survey of Methods in Computational Literary Studies (= D 3.2: Series of Five Short Survey Papers on Methodological Issues). Edited by Christof Schöch, Julia Dudar, Evgeniia Fileva. Trier: CLS INFRA. URL: https://methods.clsinfra.io/postscriptum.html, DOI: 10.5281/zenodo.7892112.

License: Creative Commons Attribution 4.0 International (CC BY).

¹ A few more details for the technically-minded: Quarto is built on top of the document transformation tool pandoc and can be used in conjunction with plugins for editors such as Visual Studio Code. It relies on a YAML file (_quarto.yml) to provide a number of parameters for the book, including metadata, the structure of the book (parts and chapters), and layout parameters for the various output formats (e.g. template file or page format for PDF; theme or base font size for HTML). The bibliographic data is exported from Zotero as a BibTeX file (references-cited.bib) and rendered by Quarto in a citation style defined in a CSL file (chicago-author-date.csl). Some more display parameters can be defined using an SCSS stylesheet (custom.scss), including things like the banner background or the relative font sizes for headings. For the HTML version, we also embed additional metadata using meta tags in the header; there is helpful information on this on the Zotero website (“Exposing your metadata”) and on David I. Verrelli’s site (“Metadata tags for academic publications”). A Python script (metadata_embed.py) is used to generate and embed the section-level metadata into the <head> of each rendered HTML file, based on a pre-populated template file (metadata_template.md) and a TSV file (metadata_source.tsv) that contains section-level metadata. The PDF file, in turn, is prefixed with a CLS INFRA coversheet for consistency of our deliverables. All of these files are included in the Zenodo upload for documentation.
² Gitlab pages, directly from the CLS INFRA Gitlab instance, would be an alternative and very simple publication channel for the HTML version, given that there are ready-to-use template files for static HTML available in Gitlab.
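The metadata embedding step mentioned in the first note can be sketched as follows. This is not the metadata_embed.py script included in the Zenodo upload, but a simplified illustration under stated assumptions: the TSV column names (file, title, author, doi), the selection of meta tags and the inline template (standing in for metadata_template.md) are all hypothetical.

```python
import csv
import html
import io
from string import Template

# Hypothetical inline template; the real workflow reads a template file.
META_TEMPLATE = Template(
    '<meta name="citation_title" content="$title">\n'
    '<meta name="citation_author" content="$author">\n'
    '<meta name="citation_doi" content="$doi">\n'
)


def read_metadata(tsv_text: str) -> dict[str, dict[str, str]]:
    """Index the TSV rows by the name of the HTML file they describe."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["file"]: row for row in reader}


def embed_metadata(page: str, row: dict[str, str]) -> str:
    """Insert the filled-in meta tags just before the closing </head> tag."""
    # Escape the values so that quotes or ampersands cannot break the attributes.
    safe = {key: html.escape(value, quote=True) for key, value in row.items()}
    tags = META_TEMPLATE.substitute(safe)
    return page.replace("</head>", tags + "</head>", 1)


# Example with the metadata of this very section.
tsv = (
    "file\ttitle\tauthor\tdoi\n"
    "postscriptum.html\tPost-Scriptum\tSchöch, Christof\t10.5281/zenodo.7892112\n"
)
page = "<html><head><title>Post-Scriptum</title></head><body></body></html>"
result = embed_metadata(page, read_metadata(tsv)["postscriptum.html"])
print(result)
```

In a real run, the script would loop over the rendered HTML files and write each modified page back to disk; the sketch works on strings only to remain self-contained.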