Text Alignment

SunoikisisDC Digital Classics: Session 8

Date: Thursday, March 13, 2025, 16:00-17:30 GMT.

Convenors: Megan Bushnell (Oxford Text Archive), Chiara Palladino (Furman University), Farnoosh Shamsian (University of Leipzig)

YouTube link: https://youtu.be/qpT-TgyqSVw

Slides: Combined slides (PDF)

Outline

This session introduces the topic of Text Alignment, focusing in particular on the use of alignment to explore unknown texts or languages. In the first part, we cover the general principles of Text Alignment as a manual and automatic task, and describe the challenges involved in establishing cross-linguistic equivalences. We then give an overview of the Ugarit Project and Translation Alignment Editor, a tool for creating word-based alignments of texts in different languages, and provide some examples of how translation alignment has been used to explore ancient texts and their reception in research, teaching, and machine learning. In the second part of the session, we explore different ways in which alignment can support the exploration of unknown texts or languages: for example, by aligning equivalent text chunks, by studying how concepts or proper names are rendered in different languages, or by aligning and comparing three different translations.
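One way to picture a word-based alignment is as a list of links, each joining one or more tokens in the source to one or more tokens in the translation. The Python sketch below is only an illustration: the `AlignmentLink` class, the two helper functions, and the English rendering of the opening words of the Iliad are assumptions made for this example, and they do not reflect how Ugarit or any other tool stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentLink:
    """One link: token positions in the source and in the translation."""
    source: tuple[int, ...]
    target: tuple[int, ...]


# Illustrative tokens; the English wording is a hand-made example.
source_tokens = ["μῆνιν", "ἄειδε", "θεὰ"]
target_tokens = ["Sing", "O", "goddess", "the", "wrath"]

links = [
    AlignmentLink(source=(0,), target=(3, 4)),  # one word rendered by two
    AlignmentLink(source=(1,), target=(0,)),
    AlignmentLink(source=(2,), target=(2,)),
]


def render(link, src, tgt):
    """Return the linked words as a (source phrase, target phrase) pair."""
    return (
        " ".join(src[i] for i in link.source),
        " ".join(tgt[j] for j in link.target),
    )


def coverage(links, token_count, side):
    """Share of tokens on one side ("source" or "target") that have a link."""
    aligned = {i for link in links for i in getattr(link, side)}
    return len(aligned) / token_count


for link in links:
    print(render(link, source_tokens, target_tokens))
print("target coverage:", coverage(links, len(target_tokens), "target"))
```

In this toy example the English "O" has no counterpart in the source, so it stays unaligned and lowers the coverage on the translation side. Gaps of this kind are often where the questions about cross-linguistic equivalence begin.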

Required readings

de Pedro, R. (1999). “The Translatability of Texts: A Historical Overview.” Meta 44.4. Available: http://www3.uji.es/~aferna/EA0921/4a-Translatability.pdf
Crane, G., Babeu, A., Cerrato, L.M., et al. (2023). “Beyond translation: engaging with foreign languages in a digital library.” International Journal on Digital Libraries 24, pp. 163–176. Available: https://doi.org/10.1007/s00799-023-00349-2
Panou, D. (2013). “Equivalence in Translation Theories: A Critical Evaluation.” Theory and Practice in Language Studies 3.1, pp. 1-6. Available: http://www.academypublication.com/issues/past/tpls/vol03/01/01.pdf

Other Resources

"Translation Alignment" in the Digital Classicist Wiki. https://wiki.digitalclassicist.org/Translation_alignment
Dictionary of Ancient Greek and Latin: https://logeion.uchicago.edu/
AntConc: https://www.laurenceanthony.net/software/antconc/
Alpheios Standalone Text Alignment: https://alignment.alpheios.net/
Ugarit Editor: https://ugarit.ialigner.com/
Ugarit automatic alignment (demo): http://ugarit-aligner.com/
iAligner: https://ialigner.com/
Beyond Translation: https://beyond-translation.scaife-viewer.org/

Available aligned texts

Iliad aligned by word against an English Translation in the Beyond Translation Project
Iliad aligned by sentence against an English Translation in the Beyond Translation Project
Iliad in Persian and Kurdish (alignment in spreadsheet): https://zenodo.org/records/8318111
Book 1 of the Odyssey aligned by word on Alpheios
Propertius Elegy 1, aligned by word on Alpheios
De Bello Alexandrino, aligned by word by Valeria Irene Boano on Ugarit
Hafez aligned by word against English in Beyond Translation
Hafez aligned by word against German in Beyond Translation
Eneados aligned by word with Aeneid on Ugarit

Exercise

Option 1

Use translation alignment to see if you can read a text in a language you don’t know very well (or at all!).

Start by recognizing sentence/line overlaps, perhaps using a spreadsheet to record your alignments.
Then, upload the text into Ugarit and start aligning word by word. Begin with the proper names of people and places.
Use the dictionary and other context clues to align words at a finer granularity, e.g. words that are repeated often, conjunctions, verbs, etc.
Try to recognize and align larger chunks, e.g. expressions or sentences.
See how far you can get!
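If you would rather keep your line-level record in a file than in a spreadsheet application, the first two steps can be sketched in a few lines of Python. Everything here is an assumption for illustration: the column names, the helper functions, and the sample English line are made up, and the proper-name check is a deliberately crude heuristic (any capitalised word that is not the first word of the line), so it only suits translations in scripts that use capital letters.

```python
import csv
import io
import re


def proper_noun_candidates(line):
    """Capitalised words other than the first word of the line (crude heuristic)."""
    words = re.findall(r"[^\W\d_]+", line)
    return [w for w in words[1:] if w[0].isupper()]


def alignment_csv(rows):
    """Build CSV text from (source_line, target_line, notes) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(
        ["source_line", "target_line", "proper_noun_candidates", "notes"]
    )
    for source_line, target_line, notes in rows:
        candidates = "; ".join(proper_noun_candidates(target_line))
        writer.writerow([source_line, target_line, candidates, notes])
    return buffer.getvalue()


# Hand-made sample row; replace it with your own line pairs.
sheet = alignment_csv(
    [
        (
            "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος",
            "Sing, goddess, the wrath of Achilles, son of Peleus",
            "line 1",
        )
    ]
)
print(sheet)
```

The resulting text can be saved with a `.csv` extension and opened in any spreadsheet program. The candidate names give you a first set of anchors to look for in the original before you move on to Ugarit.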

You may use the texts recommended on the session page, and then check your results against their aligned versions.

Think about these questions:

What level of granularity in your alignments did you manage to achieve? Line or word, or larger chunks? Why do you think that is?
What sort of context clues were you able to use to manage the alignments?
Was there always a perfect overlap between proper nouns in the original and the translation?
Did the dictionary provide enough help to better understand words that you could not figure out? Why/why not?

Option 2

Take the first five lines of a text whose language and alphabet you don’t know. You can use some of the texts recommended on the session page, or any other text from an online repository.

Paste the original text into ChatGPT or another GenAI tool in smaller chunks (lines or sentences). Use the following prompt to process each chunk: “Transliterate this sentence. For each word, provide the meaning. Do not translate the sentence in full.” (Be aware that ChatGPT may not work very well for some languages!)
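If you want to prepare the chunks in advance, a small script can split the passage into lines and put the prompt in front of each one, ready to paste into the tool of your choice. This is a minimal sketch: the function name and the `max_chunks` parameter are assumptions for this example, and the script does not contact any GenAI service.

```python
PROMPT = (
    "Transliterate this sentence. For each word, provide the meaning. "
    "Do not translate the sentence in full."
)


def build_prompts(text, max_chunks=5):
    """Return one prompt per non-empty line, up to max_chunks lines."""
    chunks = [line.strip() for line in text.splitlines() if line.strip()]
    return [f"{PROMPT}\n\n{chunk}" for chunk in chunks[:max_chunks]]


# Placeholder passage; paste the first lines of your own text here.
passage = """first line of the text
second line of the text
"""

for number, prompt in enumerate(build_prompts(passage), start=1):
    print(f"--- chunk {number} ---")
    print(prompt)
```

Splitting by line suits verse; for prose you may prefer to split the passage into sentences by hand before running the script.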

Then, with the help of the output generated by ChatGPT, align your text word-by-word against a scholarly translation using Ugarit.

At the end, you can use the already aligned texts to see how you did.

Compare the scholarly translation with the vocabulary provided by ChatGPT: where do they overlap, and where does the translation diverge from ChatGPT? Why do you think that is?
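For a first rough pass at this comparison, you could check which of the word meanings also appear in the scholarly translation. The sketch below assumes that you have copied the meanings by hand into a Python dictionary; the sample glosses and the English line are invented placeholders, not real ChatGPT output or a published translation. The match is purely literal, so a synonym counts as a divergence and still needs your own judgement.

```python
import re


def words(text):
    """Lower-cased set of the words in a piece of text."""
    return {w.lower() for w in re.findall(r"[^\W\d_]+", text)}


def compare_glosses(glosses, translation):
    """For each source word, list the gloss words found in the translation."""
    translation_words = words(translation)
    return {
        source_word: sorted(words(gloss) & translation_words)
        for source_word, gloss in glosses.items()
    }


# Invented sample data; replace it with your own notes.
glosses = {
    "μῆνιν": "rage, anger",
    "ἄειδε": "sing",
    "θεὰ": "goddess",
}
translation = "Sing, goddess, the wrath of Achilles"

report = compare_glosses(glosses, translation)
for source_word, shared in report.items():
    status = ", ".join(shared) if shared else "no literal overlap"
    print(f"{source_word}: {status}")
```

Source words reported with no literal overlap are good places to start asking why the translator chose a different rendering.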