Papyrological texts and linguistic research
SunoikisisDC Digital Classics: Session 9
Date: Thursday March 20, 2025. 16:00-17:30 GMT.
Convenors: Marja Vierros (University of Helsinki), Polina Yordanova (University of Helsinki)
Youtube link: https://youtu.be/A4Pc2cm3bjA
Slides: tba
Outline
This session discusses linguistic approaches to studying the language and text of ancient papyri, in particular the use of treebanks (texts digitally encoded with morphological and syntactic annotations). We begin with a theoretical overview of the topic, the kinds of research questions these methods can address, and some examples of recent projects and tools for the digital analysis of Ancient Greek treebanks. We then focus on two case studies: a project examining variation in post-classical Greek morphology and syntax, and a project on word order in documentary papyri. We end with a demonstration and a proposed exercise involving two major recent tools: PapyGreek Search and KilnTreebank.
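To make the idea of a treebank concrete, here is a minimal sketch of one annotated sentence in the XML layout used by the Ancient Greek Dependency Treebank (AGDT), read with Python's standard `xml.etree.ElementTree`. The sentence ("I write a letter") and its annotation are invented for illustration, and the attribute names (`form`, `lemma`, `postag`, `head`, `relation`) assume the AGDT-style format; other treebanks may store the same information differently.

```python
import xml.etree.ElementTree as ET

# A toy sentence in AGDT-style XML, invented for illustration (not from a real papyrus).
# Each <word> has a form, a lemma, a 9-character morphological tag (postag),
# the id of its syntactic head (0 = artificial root) and a dependency relation.
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="ἐγὼ" lemma="ἐγώ" postag="p1s---mn-" head="2" relation="SBJ"/>
    <word id="2" form="γράφω" lemma="γράφω" postag="v1spia---" head="0" relation="PRED"/>
    <word id="3" form="ἐπιστολήν" lemma="ἐπιστολή" postag="n-s---fa-" head="2" relation="OBJ"/>
  </sentence>
</treebank>
"""


def read_words(xml_text):
    """Return one dict per <word>, with the numeric attributes converted to int."""
    root = ET.fromstring(xml_text.strip())
    words = []
    for sentence in root.iter("sentence"):
        for w in sentence.iter("word"):
            words.append({
                "sentence": sentence.get("id"),
                "id": int(w.get("id")),
                "form": w.get("form"),
                "lemma": w.get("lemma"),
                "postag": w.get("postag"),
                "head": int(w.get("head")),
                "relation": w.get("relation"),
            })
    return words


for w in read_words(SAMPLE):
    # Each token points to the id of the word it depends on (0 = root).
    print(f'{w["form"]:<10} {w["relation"]:<5} head={w["head"]}')
```

The verb is the root of the tree (`head="0"`, `PRED`), and the subject and object both point to it; this head/relation pairing is what treebank queries search over.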
Required readings
- Henriksson, E. and Vierros, M. 2025. "PapyGreek Search: Exploring the Language of Greek Papyri." In: Reggiani, N. ed. Digital Papyrology III: The Digital Critical Edition of Greek Papyri: Issues, Projects, and Perspectives. Berlin, Boston: De Gruyter, pp. 163-184. https://doi.org/10.1515/9783111070162-011
- Mambrini, F. 2016. "The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment." In: Romanello, M. and Bodard, G. eds. Digital Classics Outside the Echo-Chamber. London: Ubiquity Press. Available: https://doi.org/10.5334/bat.f
Further readings
- Berkes, L. 2018. "Perspectives and Challenges in Editing Documentary Papyri Online: A Report on Born-Digital Editions through Papyri.info." In: Reggiani, N. ed. Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri. Berlin, Boston: De Gruyter, pp. 75-86. DOI: https://doi.org/10.1515/9783110547450-004
- Biagetti, E., Inglese, G., Zanchi, C., & Luraghi, S. 2023. "Reconstructing variation in Indo-European word order: A treebank-based quantitative study." Language Dynamics and Change, 13(2), 198-231. https://doi.org/10.1163/22105832-bja10025
- Chronopoulos, Stelios, Felix K. Maier, and Anna Novokhatko, eds. 2022. Digital Text Analysis of Greek and Latin Sources: Methods, Tools, Perspectives. Classics@ 20. Available: https://classics-at.chs.harvard.edu/volume/classics20-digital-text-analysis-of-greek-and-latin-sources/
- Dell'Oro, Francesca, Helena Bermúdez Sabel & Paola Marongiu. 2020. “Implemented to Be Shared: the WoPoss Annotation of Semantic Modality in a Latin Diachronic Corpus.” Sharing the Experience: Workflows for the Digital Humanities. Proceedings of the DARIAH-CH Workshop 2019. Available: https://zenodo.org/record/3739440#.XzqoTZMzZTZ
- Keersmaekers, Alek. 2021. “The GLAUx corpus: methodological issues in designing a long-term, diverse, multi-layered corpus of Ancient Greek.” Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 39–50. Association for Computational Linguistics. Available: https://aclanthology.org/2021.lchange-1.6
- Mambrini, F., & Passarotti, M. 2016. "Subject-Verb Agreement with Coordinated Subjects in Ancient Greek: A Treebank-Based Study." Journal of Greek Linguistics, 16(1), 87-116. https://doi.org/10.1163/15699846-01601003
- Passarotti, Marco. 2019. "The Project of the Index Thomisticus Treebank." In Monica Berti (ed), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter. Pp. 299–320. Available: https://doi.org/10.1515/9783110599572-017
- Polis, S. & S. Rosmorduc. 2013. 'Building a Construction-Based Treebank of Late Egyptian: The Syntactic Layer in Ramses'. In Polis, S. & J. Winand (eds.), Texts, languages & information technology in egyptology: selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010. Liège: Presses Universitaires de Liège. 45–59. Available: https://orbi.uliege.be/bitstream/2268/110297/1/AegLeod9_03_Ramses2.pdf
- Reggiani, N. 2017. Digital Papyrology I: Methods, Tools and Trends. Berlin, Boston: De Gruyter. DOI: https://doi.org/10.1515/9783110547474
- Reggiani, N. ed. 2018. Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri. Berlin, Boston: De Gruyter. DOI: https://doi.org/10.1515/9783110547450
- Reggiani, N. ed. 2025. Digital Papyrology III: The Digital Critical Edition of Greek Papyri: Issues, Projects, and Perspectives. Berlin, Boston: De Gruyter. DOI: https://doi.org/10.1515/9783111070162
- Stolk, J. 2018. "Encoding Linguistic Variation in Greek Documentary Papyri: The Past, Present and Future of Editorial Regularization." In: Reggiani, N. ed. Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri. Berlin, Boston: De Gruyter, pp. 119-138. DOI: https://doi.org/10.1515/9783110547450-007
- Vannini, L. 2022. "Online availability, impact and sustainability of digital papyrological resources." Digital Classics Online 8. Available: https://doi.org/10.11588/dco.2022.8.87562
- Vierros, M. 2018. "Linguistic Annotation of the Digital Papyrological Corpus: Sematia." In: Reggiani, N. ed. Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri. Berlin, Boston: De Gruyter, pp. 105-118. Available: https://doi.org/10.1515/9783110547450-006
- Vierros, M. & Yordanova, P. 2022. "Querying Syntactic Constructions in Ancient Greek Parsed Corpora: A Case Study on the Genitive Absolute in Literature and Documentary Papyri." Classics@ 20. Available: https://classics-at.chs.harvard.edu/querying-syntactic-constructions-in-ancient-greek-parsed-corpora-a-case-study-on-the-genitive-absolute-in-literature-and-documentary-papyri/
- Yordanova, P. Forthcoming 2025. "'Tolerable fluency and grace… and occasionally an interesting word order': Quantifying language proficiency for the study of word order variation in documentary papyri." Submitted to Trends in Classics - Supplementary Volumes, De Gruyter. Author's copy available here
Treebanking Guidelines:
- Bamman, David et al. 2008. Guidelines for the Syntactic Annotation of Latin Treebanks (v. 1.3). Available: https://github.com/PerseusDL/treebank_data/blob/master/v1/latin/docs/guidelines.pdf (only pp. 3-21, 24, 26)
- Celano, Giuseppe G.A. 2014. Guidelines for the annotation of the Ancient Greek Dependency Treebank 2.0. Available: https://github.com/PerseusDL/treebank_data/edit/master/AGDT2/guidelines (only Chapter 3, including analysis of the hyperlinked examples)
Resources
- Treebanking overview
- Pedalion project
- PapyGreek
- Arethusa at Perseids
- Glaux
- KTB-lite
Exercise
I PapyGreek Search
In Ancient Greek, nouns, pronouns and adjectives have gender: they are masculine, feminine or neuter. Personal pronouns, however, express gender only in the third person (‘he’, ‘she’ or ‘it’, the neuter also being used for children; unlike English ‘they’, the plural also distinguishes gender). In their article, Henriksson & Vierros (2025) examined feminine first-person pronouns using PapyGreek Search. This is only possible with the manually annotated papyri, where the annotator has added the gender information even though Greek does not mark gender on this pronoun. They used the following saved search, which you can run to browse the results: https://papygreek.com/search/190
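In AGDT-style treebanks, the gender an annotator adds to a first-person pronoun sits in the seventh position of the 9-character morphological tag (`postag`), next to person (position 2) and number (position 3), the same features you change in the search form. The sketch below decodes such a tag; the position order follows the AGDT tagset as we understand it, and the helper names are hypothetical, not part of PapyGreek Search.

```python
# Positions of the 9-character AGDT-style postag, in order.
FIELDS = ("pos", "person", "number", "tense", "mood", "voice", "gender", "case", "degree")


def decode_postag(tag):
    """Map each position of a postag to its feature; '-' (not applicable) becomes None."""
    if len(tag) != len(FIELDS):
        raise ValueError(f"expected a {len(FIELDS)}-character postag, got {tag!r}")
    return {name: (None if ch == "-" else ch) for name, ch in zip(FIELDS, tag)}


def is_feminine_first_person_pronoun(tag):
    """True for pronouns ('p') annotated as first person ('1') and feminine ('f')."""
    features = decode_postag(tag)
    return (features["pos"], features["person"], features["gender"]) == ("p", "1", "f")


print(decode_postag("p1s---fn-"))
print(is_feminine_first_person_pronoun("p1s---fn-"))  # True
print(is_feminine_first_person_pronoun("p1s---mn-"))  # False
```

Here `p1s---fn-` reads as pronoun, first person, singular, feminine, nominative; changing the `f` to `m` gives the masculine counterpart.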
- By modifying the above search, compare the number of feminine and masculine first-person pronouns (the article explains the reason for the large difference). The question mark next to ‘Search’ opens a user guide.
- By further modifying the search, find the second-person pronouns in the singular and plural (hint: first remove the lemma ἐγώ ‘I’, then change the person from ‘1’ to ‘2’; ‘s’ means singular and can be changed to ‘p’ for plural). Make a small table of the frequencies of the personal pronouns by person, number and gender. [NB: The third person does not give many hits, because a separate pronoun, lemma=αὐτός, is usually used for this purpose, and it is not marked for person.]
- Add the syntactic relation feature to the same search and see how many of the pronouns appear as objects (relation=OBJ) and how many as subjects (relation=SBJ).
- Also try the above queries without any gender distinction.
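The frequency table asked for above is a simple cross-tabulation. The sketch below assumes each search hit has been reduced to a hypothetical (postag, relation) pair in the AGDT-style tag format; the sample data are invented, so the counts say nothing about the actual papyri. It produces the three views from the steps: by gender, without gender, and by syntactic relation.

```python
from collections import Counter

# Invented (postag, relation) pairs standing in for exported search hits;
# these are NOT real PapyGreek results.
HITS = [
    ("p1s---mn-", "SBJ"),
    ("p1s---fn-", "SBJ"),
    ("p1s---ma-", "OBJ"),
    ("p2s---ma-", "OBJ"),
    ("p2p---mn-", "SBJ"),
    ("p2s---fa-", "OBJ"),
    ("p1p---mg-", "ATR"),
]


def tabulate(hits):
    """Count personal pronouns by (person, number, gender), by (person, number)
    ignoring gender, and by (person, number, relation)."""
    by_gender = Counter()
    no_gender = Counter()
    by_relation = Counter()
    for tag, relation in hits:
        pos, person, number, gender = tag[0], tag[1], tag[2], tag[6]
        if pos != "p" or person == "-":  # keep only pronouns marked for person
            continue
        by_gender[(person, number, gender)] += 1
        no_gender[(person, number)] += 1
        by_relation[(person, number, relation)] += 1
    return by_gender, no_gender, by_relation


by_gender, no_gender, by_relation = tabulate(HITS)
for (person, number, gender), n in sorted(by_gender.items()):
    print(f"person={person} number={number} gender={gender}: {n}")
```

Keeping the three counters separate makes it easy to compare the gendered table with the gender-free one from the last step.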
II Optional KTB search
Ancient Greek word order is very free: subject, object and verb can appear in any order in the sentence. KTB has facets that record the position of these elements and count the sentences exhibiting each ordering pattern. Using the platform, we can observe whether the position of the subject relative to the verb differs noticeably between literary Greek texts on the one hand and papyrological material on the other.
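One way an order pattern like this could be derived from a dependency treebank is sketched below: for each subject (relation `SBJ`) whose head is a verb, compare the two word ids. This is not KTB's implementation; it assumes AGDT-style XML, assumes that word ids follow the linear order of the sentence, and uses invented sample sentences. The optional `subject_pos` filter (a hypothetical parameter) mimics restricting the subject's part of speech to pronouns (`p`) or nouns (`n`).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two invented sentences: "I write" (subject before verb) and
# "the brother writes" with the verb first (verb before subject).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="ἐγὼ" lemma="ἐγώ" postag="p1s---mn-" head="2" relation="SBJ"/>
    <word id="2" form="γράφω" lemma="γράφω" postag="v1spia---" head="0" relation="PRED"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="γράφει" lemma="γράφω" postag="v3spia---" head="0" relation="PRED"/>
    <word id="2" form="ὁ" lemma="ὁ" postag="l-s---mn-" head="3" relation="ATR"/>
    <word id="3" form="ἀδελφός" lemma="ἀδελφός" postag="n-s---mn-" head="1" relation="SBJ"/>
  </sentence>
</treebank>"""


def subject_verb_orders(xml_text, subject_pos=None):
    """Count 'SV' and 'VS' orders; optionally keep only subjects whose postag
    starts with subject_pos (e.g. 'p' for pronoun, 'n' for noun)."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        words = {int(w.get("id")): w for w in sentence.iter("word")}
        for word_id, word in words.items():
            if word.get("relation") != "SBJ":
                continue
            if subject_pos and not word.get("postag", "").startswith(subject_pos):
                continue
            head_id = int(word.get("head"))
            head = words.get(head_id)
            if head is None or not head.get("postag", "").startswith("v"):
                continue  # only subjects that depend directly on a verb
            # Lower id = earlier in the sentence (assumed linear ordering of ids).
            counts["SV" if word_id < head_id else "VS"] += 1
    return counts


print(subject_verb_orders(SAMPLE))
print(subject_verb_orders(SAMPLE, subject_pos="p"))
print(subject_verb_orders(SAMPLE, subject_pos="n"))
```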
- From the Metadata section, select the corpus “papygreek_letters”.
- Expand the Subject section and select “Pronoun” from the B W O Subjects PoS facet
- Once you have applied this facet and the results have loaded, expand the Order patterns section and look at the counts in the Subject Verb facet. Use them to calculate the percentage of sentences in which the subject precedes the verb and the percentage in which it follows the verb.
- Repeat the same for the corpus “literature_prose” (hint: you can deselect “papygreek_letters” and select the other corpus from the Metadata section without changing the other selected facets). Are the percentages different? Why do you think that is?
- Now try changing the part of speech of the subject to “Noun”. How do the results change?
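The percentage calculation in the steps above is simple arithmetic: each order's count divided by the sum of the two counts, times 100. A minimal sketch, with a hypothetical helper name and invented counts rather than actual KTB figures:

```python
def order_percentages(sv_count, vs_count):
    """Return (percent subject-verb, percent verb-subject), rounded to one decimal."""
    total = sv_count + vs_count
    if total == 0:
        raise ValueError("no sentences with both a subject and a verb")
    return round(100 * sv_count / total, 1), round(100 * vs_count / total, 1)


# Invented counts for two corpora, for illustration only.
print(order_percentages(30, 10))  # (75.0, 25.0)
print(order_percentages(12, 18))  # (40.0, 60.0)
```

Comparing percentages rather than raw counts matters here because the two corpora differ in size.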