8 Text Alignment - SunoikisisDC/SunoikisisDC-2024-2025 GitHub Wiki

Text Alignment

SunoikisisDC Digital Classics: Session 8

Date: Thursday March 13, 2025. 16:00-17:30 GMT.

Convenors: Megan Bushnell (Oxford Text Archive), Chiara Palladino (Furman University), Farnoosh Shamsian (University of Leipzig)

YouTube link: https://youtu.be/qpT-TgyqSVw

Slides: Combined slides (PDF)

Outline

This session introduces Text Alignment, with a particular focus on using alignment to explore unfamiliar texts or languages. In the first part, we cover the general principles of Text Alignment as both a manual and an automatic task, and describe the challenges involved in establishing cross-linguistic equivalences. We then give an overview of the Ugarit Project and its Translation Alignment Editor, a tool for creating word-based alignments of texts in different languages. We also provide examples of how translation alignment has been used to explore ancient texts and their reception in research, in teaching, and in machine learning. In the second part of the session, we explore different ways in which alignment can support the exploration of unknown texts or languages: for example, by aligning equivalent text chunks, by studying how concepts or proper names are rendered in different languages, or by aligning and comparing three different translations.

Required readings

Further readings

Other Resources

Available aligned texts

Exercise

Option 1

Use translation alignment to see if you can read a text in a language you don’t know very well (or at all!).

  1. Start by recognizing sentence/line overlaps, perhaps using a spreadsheet to record your alignments.
  2. Then, upload the text into Ugarit and start aligning word by word. Begin with the proper nouns of people and places.
  3. Use the dictionary and other context clues to align words at a finer granularity, e.g., frequently repeated words, conjunctions, and verbs.
  4. Try to recognize and align larger chunks, e.g. expressions or sentences.
  5. See how far you can get!
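Step 1 suggests recording your first-pass line or sentence correspondences in a spreadsheet. The following is a minimal Python sketch of that bookkeeping. It assumes the original and the translation are plain-text strings with one line or sentence per line. The function names `pair_lines` and `to_csv` and the column names are hypothetical choices for this example and are not part of Ugarit. The script only pre-fills rows in order. The actual alignment work of checking and correcting each row is still done by hand.

```python
import csv
import io

def pair_lines(original: str, translation: str) -> list[tuple[int, str, str]]:
    """Pair the i-th non-empty line of the original with the i-th line of the translation.

    This is only a first draft: texts rarely correspond line for line,
    so every row should be checked and corrected by hand.
    """
    src = [line.strip() for line in original.splitlines() if line.strip()]
    tgt = [line.strip() for line in translation.splitlines() if line.strip()]
    rows = []
    for i in range(max(len(src), len(tgt))):
        # Leave a cell empty when one side has run out of lines.
        s = src[i] if i < len(src) else ""
        t = tgt[i] if i < len(tgt) else ""
        rows.append((i + 1, s, t))
    return rows

def to_csv(rows: list[tuple[int, str, str]]) -> str:
    """Render the rows as CSV text that a spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["line", "original", "translation"])
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    original = "Arma virumque cano, Troiae qui primus ab oris\nItaliam, fato profugus, Laviniaque venit"
    translation = "I sing of arms and the man who first from the shores of Troy\ncame to Italy, exiled by fate, and to the Lavinian shores"
    print(to_csv(pair_lines(original, translation)))
```

Saving the output to a `.csv` file lets you open it in any spreadsheet program. There you can merge, split, or shift rows wherever the correspondence is not one to one.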

You may use the texts recommended on the session page, and then check your results against their aligned versions.

Think about these questions:

  1. What level of granularity in your alignments did you manage to achieve? Line or word, or larger chunks? Why do you think that is?
  2. What context clues were you able to use to make the alignments?
  3. Was there always a perfect overlap between proper nouns in the original and the translation?
  4. Did the dictionary help you understand words you could not otherwise figure out? Why or why not?

Option 2

Take the first five lines of a text whose language and alphabet you don’t know. You can use some of the texts recommended on the session page, or any other text from an online repository.

Paste the original text into ChatGPT or another GenAI tool in smaller chunks (lines or sentences). Use the following prompt for each chunk: “Transliterate this sentence. For each word, provide the meaning. Do not translate the sentence in full.” (Be aware that ChatGPT may not work well for some languages!)
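Because the prompt is applied one chunk at a time, it can help to prepare the chunks in advance. This is a minimal Python sketch that assumes the text is plain text with one line or sentence per line. It only builds the prompt strings for you to paste into the chat interface and does not call any GenAI service or API. The name `build_prompts` is hypothetical.

```python
# The prompt text is quoted from the exercise instructions.
PROMPT = (
    "Transliterate this sentence. For each word, provide the meaning. "
    "Do not translate the sentence in full."
)

def build_prompts(text: str, max_lines: int = 5) -> list[str]:
    """Return one prompt per non-empty line, covering only the first max_lines lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [f"{PROMPT}\n\n{line}" for line in lines[:max_lines]]

if __name__ == "__main__":
    # Replace this placeholder with the first lines of your chosen text.
    sample = "first line of the original\nsecond line of the original"
    for prompt in build_prompts(sample):
        print(prompt)
        print("-" * 40)
```

Keeping each chunk to a single line makes it easier to match the tool's word-by-word glosses back to the right place in the text.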

Then, with the help of the output generated by ChatGPT, align your text word-by-word against a scholarly translation using Ugarit.

At the end, you can use the already aligned texts to see how you did.

Compare the scholarly translation with the vocabulary provided by ChatGPT: where do they overlap, and where does the translation diverge from ChatGPT’s glosses? Why do you think that is?
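One rough way to start this comparison is to set each ChatGPT gloss next to the translation word(s) you aligned it to in Ugarit, and flag where they share no words at all. The following is a minimal Python sketch. The two input dictionaries are a hypothetical format you would fill in yourself. Shared-word overlap is a crude proxy that ignores synonyms and inflection, so a "diverges" label is only a prompt to look more closely.

```python
import re

def tokens(s: str) -> set[str]:
    """Lowercase word tokens (letters only), ignoring punctuation and digits."""
    return set(re.findall(r"[^\W\d_]+", s.lower()))

def compare(glosses: dict[str, str], aligned: dict[str, str]) -> dict[str, str]:
    """Label each original word as 'overlap', 'diverges', or 'unaligned'.

    glosses: original word -> ChatGPT gloss
    aligned: original word -> translation word(s) you aligned it to in Ugarit
    """
    result = {}
    for word, gloss in glosses.items():
        if word not in aligned:
            result[word] = "unaligned"
        elif tokens(gloss) & tokens(aligned[word]):
            result[word] = "overlap"
        else:
            result[word] = "diverges"
    return result

if __name__ == "__main__":
    glosses = {"arma": "weapons, arms", "profugus": "fugitive"}
    aligned = {"arma": "arms", "profugus": "an exile"}
    print(compare(glosses, aligned))
    # prints {'arma': 'overlap', 'profugus': 'diverges'}
```

In this example, "fugitive" and "an exile" share no words, yet they are close in meaning. Cases like this are exactly the divergences worth discussing.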