Text Alignment
SunoikisisDC Digital Classics: Session 8
Date: Thursday March 13, 2025. 16:00-17:30 GMT.
Convenors: Megan Bushnell (Oxford Text Archive), Chiara Palladino (Furman University), Farnoosh Shamsian (University of Leipzig)
YouTube link: https://youtu.be/qpT-TgyqSVw
Slides: Combined slides (PDF)
Outline
This session introduces Text Alignment, focusing on the use of alignment to explore unknown texts or languages. In the first part, we cover the general principles of Text Alignment as a manual and an automatic task, and describe the challenges involved in establishing cross-linguistic equivalences. We then give an overview of the Ugarit Project and its Translation Alignment Editor, a tool for creating word-based alignments of texts in different languages, and provide examples of how translation alignment has been used to explore ancient texts and their reception in research, teaching, and Machine Learning. In the second part of the session, we explore different ways in which alignment can support the exploration of unknown texts or languages: for example, by aligning equivalent text chunks, by studying how concepts or proper names are rendered in different languages, or by aligning and comparing three different translations.
Required readings
- de Pedro, R. (1999). The Translatability of Texts: A Historical Overview. Meta, XLIV, 4. Available: http://www3.uji.es/~aferna/EA0921/4a-Translatability.pdf
- G. Crane, A. Babeu, L.M. Cerrato, et al. Beyond translation: engaging with foreign languages in a digital library. Int J Digit Libr 24, 163–176 (2023). https://doi.org/10.1007/s00799-023-00349-2.
- Panou, D. (2013). “Equivalence in Translation Theories: A Critical Evaluation.” Theory and Practice in Language Studies 3.1, pp. 1-6. Available: http://www.academypublication.com/issues/past/tpls/vol03/01/01.pdf
Further readings
- Tariq Yousef, Chiara Palladino, and Farnoosh Shamsian. 2023. Classical Philology in the Time of AI: Exploring the Potential of Parallel Corpora in Ancient Language. In Proceedings of the Ancient Language Processing Workshop, pages 179–192, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Palladino, C., Shamsian, F. & Yousef, T., (2022) “Using Parallel Corpora to Evaluate Translations of Ancient Greek Literary Texts. An Application of Text Alignment for Digital Philology Research”, Journal of Computational Literary Studies 1(1). Available: https://doi.org/10.48694/jcls.100
- Bushnell, M. (2021). "Reconstructing Gavin Douglas’s Translation Practice in the Eneados Using a Corpus Linguistic-Based Method", DHBenelux 3, pp. 1-25. Available: https://journal.dhbenelux.org/journal/issues/003/article-16-Bushnell.pdf.
- T. Yousef, C. Palladino, F. Shamsian, A. d’Orange Ferreira, M. Ferreira dos Reis (2022). "An automatic model and Gold Standard for translation alignment of Ancient Greek". Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), Marseille, 20-25 June, 5894–5905. URL: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.634.pdf
- C. Palladino, M. Foradi, T. Yousef (2021). "Translation alignment for Historical Language Learning. A case study." DHQ 15.3. Available: http://www.digitalhumanities.org/dhq/vol/15/3/000563/000563.html
- T. Yousef, Translation Alignment Applied to Classical Languages (2023). PhD Thesis, University of Leipzig. Available: http://dx.doi.org/10.13140/RG.2.2.15623.57764.
- Yousef, Tariq, and Stefan Jänicke. 2020. “A Survey of Text Alignment Visualization.” IEEE Transactions on Visualization and Computer Graphics PP (October):1–1. https://doi.org/10.1109/TVCG.2020.3028975.
Other Resources
- "Translation Alignment" in the Digital Classicist Wiki. https://wiki.digitalclassicist.org/Translation_alignment
- Dictionary of Ancient Greek and Latin: https://logeion.uchicago.edu/
- AntConc: https://www.laurenceanthony.net/software/antconc/
- Alpheios Standalone Text Alignment: https://alignment.alpheios.net/
- Ugarit Editor: https://ugarit.ialigner.com/
- Ugarit automatic alignment (demo): http://ugarit-aligner.com/
- iAligner: https://ialigner.com/
- Beyond Translation: https://beyond-translation.scaife-viewer.org/
Available aligned texts
- Iliad aligned by word against an English Translation in the Beyond Translation Project
- Iliad aligned by sentence against an English Translation in the Beyond Translation Project
- Iliad in Persian and Kurdish (alignment in spreadsheet): https://zenodo.org/records/8318111.
- Book 1 of the Odyssey aligned by word on Alpheios
- Propertius Elegy 1, aligned by word on Alpheios
- De Bello Alexandrino, aligned by word by Valeria Irene Boano on Ugarit
- Hafez aligned by word against English in Beyond Translation
- Hafez aligned by word against German in Beyond Translation
- Eneados aligned by word with Aeneid on Ugarit
Exercise
Option 1
Use translation alignment to see if you can read a text in a language you don’t know very well (or at all!).
- Start by recognizing sentence/line overlaps, perhaps using a spreadsheet to record your alignments.
- Then, upload the text into Ugarit and start aligning word by word. Try first with proper nouns of people and places.
- Use the dictionary and other context clues to align words at a finer granularity, e.g. words that are repeated often, conjunctions, verbs, etc.
- Try to recognize and align larger chunks, e.g. expressions or sentences.
- See how far you can get!
You may use the texts recommended on the session page, and then check your results against their aligned versions.
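If you want a starting point for the proper-noun step, the following is a minimal Python sketch, not part of Ugarit or any tool listed above. It assumes that both texts are in (or have been transliterated into) the Latin alphabet, that capitalized words are a rough proxy for proper nouns, and that names look similar across the two languages. The function names, the similarity threshold, and the English rendering are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def capitalized_words(text):
    """Return the distinct capitalized words in text, in order of first appearance."""
    seen = []
    for word in re.findall(r"[^\W\d_]+", text):
        if word[0].isupper() and word not in seen:
            seen.append(word)
    return seen


def name_candidates(source, translation, threshold=0.5):
    """Pair each capitalized source word with its most similar capitalized
    translation word, keeping only pairs at or above the threshold.
    The threshold of 0.5 is an arbitrary assumption; adjust it per language pair."""
    pairs = []
    for s in capitalized_words(source):
        best, best_score = None, 0.0
        for t in capitalized_words(translation):
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            pairs.append((s, best, round(best_score, 2)))
    return pairs


# Opening of the Aeneid, with a plain illustrative rendering (not a scholarly translation).
latin = ("Arma virumque cano, Troiae qui primus ab oris "
         "Italiam fato profugus Laviniaque venit litora")
english = ("I sing of arms and the man who first came from the shores of Troy "
           "to Italy and the Lavinian shores, exiled by fate")

pairs = name_candidates(latin, english)
for source_word, target_word, score in pairs:
    print(source_word, "->", target_word, score)
```

The suggestions are only candidates to check by hand: a sentence-initial word such as "Arma" is capitalized without being a name, and names that are translated rather than transliterated will not be found this way.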
Think about these questions:
- What level of granularity in your alignments did you manage to achieve? Line or word, or larger chunks? Why do you think that is?
- What sort of context clues were you able to use, to manage the alignments?
- Was there always a perfect overlap between proper nouns in the original and the translation?
- Did the dictionary provide enough help to better understand words that you could not figure out? Why/why not?
Option 2
Take the first five lines of a text whose language and alphabet you don’t know. You can use some of the texts recommended on the session page, or any other text from an online repository.
Paste the original text into ChatGPT or another GenAI tool in smaller chunks (lines or sentences). Use the following prompt to process each chunk: “Transliterate this sentence. For each word, provide the meaning. Do not translate the sentence in full.” (Be aware that ChatGPT may not work very well for some languages!)
Then, with the help of the output generated by ChatGPT, align your text word-by-word against a scholarly translation using Ugarit.
At the end, you can use the already aligned texts to see how you did.
Compare the scholarly translation with the vocabulary provided by ChatGPT: where do they overlap, and where does the translation diverge from ChatGPT? Why do you think that is?
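One way to get a first overview for this comparison is sketched below in Python. It assumes you have typed the word-by-word meanings into a dictionary yourself; the glosses and the English rendering shown here are hand-written for illustration and are not real ChatGPT output or a scholarly translation. The function names are hypothetical, and matching is by exact lowercase word only, so inflected forms and synonyms will be reported as divergences.

```python
import re


def tokens(text):
    """Return the set of lowercase words in text."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def compare_glosses(glosses, translation):
    """Split source words into those with at least one gloss that appears
    verbatim in the translation (overlap) and those with none (divergent)."""
    translation_words = tokens(translation)
    overlap, divergent = {}, []
    for word, meanings in glosses.items():
        found = [m for m in meanings if m.lower() in translation_words]
        if found:
            overlap[word] = found
        else:
            divergent.append(word)
    return overlap, divergent


# Illustrative, hand-written glosses for a few words of Aeneid 1.1-2.
glosses = {
    "arma": ["arms", "weapons"],
    "virum": ["man"],
    "cano": ["sing"],
    "profugus": ["fugitive"],
}
translation = ("I sing of arms and the man who first came from the shores of Troy "
               "to Italy and the Lavinian shores, exiled by fate")

overlap, divergent = compare_glosses(glosses, translation)
print("Overlap:", overlap)
print("Divergent:", divergent)
```

The words listed as divergent are the interesting ones to inspect in Ugarit: here "profugus" is glossed as "fugitive" while the rendering uses "exiled", which is a difference of wording rather than of meaning.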