6 Analysing Texts - SunoikisisDC/SunoikisisDC-2024-2025 GitHub Wiki
Analysing and visualising texts
SunoikisisDC Digital Classics: Session 6
Date: Thursday February 27, 2025. 16:00-17:30 GMT.
Convenors: Kaspar Beelen (University of London), Gabriel Bodard (University of London), Megan Bushnell (Oxford Text Archive)
Youtube link: https://youtu.be/xNihExxxOy0
Slides: Combined slides (PDF)
Outline
This session introduces the theory and practice of various digital methods for the exploration, analysis and visualisation of historical texts. We begin with a theoretical discussion of quantitative, stylistic and computational linguistic approaches to text analysis, defining terms, sketching some history of the discipline, and surveying the tools and codebases available. The second half of the session is a practical demonstration of the Voyant Tools reading and analysis environment, showing examples in English, Latin and Greek and some of the visualisation modules in Voyant. We end with a suggested exercise for students to take away and try in their own time, and a general discussion.
Required readings
- Hawkins, Laura F. 2018. “Computational Models for Analyzing Data Collected from Reconstructed Cuneiform Syllabaries”, Digital Humanities Quarterly 12.1. Available: http://digitalhumanities.org:8081/dhq/vol/12/1/000368/000368.html.
- Rodda, Martina Astrid, and Barbara McGillivray. 2024. “Computational Valency Lexica and Homeric Formularity.” Journal of Greek Linguistics 24.2. Pre-print: https://arxiv.org/abs/2208.10795.
Further readings
- Broadwell, Peter, Jack W. Chen, and David Shepard. 2019. "Reading the Quan Tang shi: Literary History, Topic Modeling, Divergence Measures." Digital Humanities Quarterly 13.4. Available: https://digitalhumanities.org/dhq/vol/13/4/000434/000434.html.
- Fendel, Victoria Beatrix, and Matthew T. Ireland. 2023. "Discourse cohesion in Xenophon’s On Horsemanship through Sketch Engine." Digital Humanities Quarterly 17.3. Available: https://digitalhumanities.org/dhq/vol/17/3/000683/000683.html.
- Field, Anjalie. 2016. "An Automated Approach to Syntax-based Analysis of Classical Latin." Digital Classics Online 2.3. Available: https://doi.org/10.11588/dco.2016.0.32315.
- Gorman, Robert. 2022. "Universal Dependencies and Author Attribution of Short Texts with Syntax Alone." Digital Humanities Quarterly 16.2. Available: https://digitalhumanities.org/dhq/vol/16/2/000606/000606.html.
- McGillivray, Barbara, Thierry Poibeau, and Pablo Ruiz Fabo. 2020. "Digital Humanities and Natural Language Processing: “Je t’aime... Moi non plus”." Digital Humanities Quarterly 14.2. Available: https://digitalhumanities.org/dhq/vol/14/2/000454/000454.html.
- Rockwell, Geoffrey, and Stéfan Sinclair. 2016. Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge, MA: MIT Press. Companion website: http://hermeneuti.ca/.
- Stover, Justin A., and Mike Kestemont. 2016. "The Authorship of the Historia Augusta: Two New Computational Studies." BICS 59.2, pp. 140–157. Available: https://doi.org/10.1111/j.2041-5370.2016.12043.
- Mahlberg, Michaela, and Catherine Smith. 2012. "Dickens, the suspended quotation and the corpus." Language and Literature: International Journal of Stylistics 21.1, pp. 51–65. Available: https://journals.sagepub.com/doi/pdf/10.1177/0963947011432058.
- Ribary, Marton, and Barbara McGillivray. 2020. "A Corpus Approach to Roman Law Based on Justinian's Digest." Informatics 7.4. Available: https://www.mdpi.com/2227-9709/7/4/44.
- Rodda, Martina Astrid. 2024. "Reconsidering the computer’s role in literary studies through Levison 1964." Bulletin of the Institute of Classical Studies 67.1 (June 2024), pp. 3–8. Available: https://doi.org/10.1093/bics/qbae013.
- Wynne, Martin (ed.). c. 2005. Developing Linguistic Corpora: A Guide to Good Practice. AHDS Guides to Good Practice. Oxford: Oxbow Books. Available: https://icar.cnrs.fr/ecole_thematique/contaci/documents/Baude/wynne.pdf.
- Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
Resources
Exercise
Try it out for yourself now in Voyant Tools!
- First pick a text or group of texts to work with. Your text(s) must be digital and either in plain text, HTML, XML, PDF, RTF, or Word format. You might consider using the Diorisis Ancient Greek Corpus, or texts from the Oxford Text Archive (not from the OTA Legacy Collection!), or texts of your own. Voyant also has some texts available to load in. Keep in mind that distant reading tools like Voyant work best with corpora made up of many texts, so ideally pick several texts or an especially long, segmented one. Try to choose texts by the same author, or texts written in the same language in the same period, or texts of a similar type (poetry, prose, sermons, legal texts, etc.).
- Once you have selected your texts, download them, and then load them into Voyant. Try experimenting with the various analysis tools to answer the questions below:
- What are the most common words in your text(s)? Are they what you expected? If you provide stop words, do your results change?
- If you uploaded several texts (or a segmented text), do you notice any patterns in the distinctive words discovered for each text or segment?
- What are the differences between the Collocates, Links, and TermsBerry tools?
- Are there any repeated segments across your text(s)? Why do you think they might be repeated? If there are none, why might that be?
- Which is the most readable text (or segment)? How is this readability metric calculated?
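Voyant does its counting for you, but the first few questions are easier to reason about if you see the logic spelled out. The short Python sketch below (illustrative only; these function names are not Voyant's API, and Voyant's actual tokenisation and tools are more sophisticated) shows raw word frequencies, the effect of a stop-word list, and a simple collocate count, i.e. words occurring within a fixed window of a target term, roughly what the Terms and Collocates tools report:

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def top_words(tokens, n=5, stopwords=frozenset()):
    """Return the n most frequent tokens, skipping any stop words."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

def collocates(tokens, target, window=2):
    """Count words appearing within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

toks = tokenize("The cat sat on the mat and the cat saw the dog.")
print(top_words(toks, 3))                                   # function words dominate
print(top_words(toks, 3, stopwords={"the", "on", "and"}))   # content words surface
print(collocates(toks, "cat").most_common(3))
```

Even on this toy sentence, "the" tops the raw list; with a stop-word list applied, "cat" does, which is the pattern behind the first exercise question.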
- Now reflect on this process. Consider the questions below:
- What did distant reading tell you that close reading could not?
- Are there any types of texts that you think Voyant could not handle? Are there any research questions where Voyant might not be useful?
- Which tool or visualization did you personally find the most useful and why?
- Did you have any difficulties using any of the tools? Were you able to determine why?
- Do you feel equipped to understand the metrics in Voyant Tools?