1 Finding Texts - SunoikisisDC/SunoikisisDC-2024-2025 GitHub Wiki

Finding Free Classical Texts

SunoikisisDC Digital Classics: Session 1

Date: Thursday, January 16, 2025. 16:00-17:30 GMT.

Convenors: Monica Berti (University of Leipzig), Gabriel Bodard (University of London), Katharine Shields (King's College London)

YouTube link: https://youtu.be/mbrGPcciIU0

Slides: TBA

Outline

This session introduces some of the main sources of free and open classical texts in digital formats. Resources range from curated, highly structured and encoded corpora such as the Scaife Viewer and the Diorisis Corpus, through catalogues of available texts and archives of digitised public domain books, to sophisticated workflows for optical character recognition (OCR) of ancient languages. We outline the scope and limitations of some of the main collections, consider the potential for improving and analysing these texts (covered in more depth in future sessions), and suggest an exercise that compares versions of a text found on multiple sites.

Required readings

  • Alison Babeu. 2019. "The Perseus Catalog: of FRBR, Finding Aids, Linked Data, and Open Greek and Latin". In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 53-72. DOI: https://doi.org/10.1515/9783110599572-005
  • Bruce Robertson. 2019. "Optical Character Recognition for Classical Philology." In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 117–136. DOI: https://doi.org/10.1515/9783110599572-008

Further readings

Resources

Exercise

Part 1

  1. Choose a classical text you are interested in (in the original or translation, as you prefer), and try to find copies of it on at least two or three of the repositories discussed in this session or in the list of resources above (Perseus, Wikisource, Gutenberg, etc.).
  2. What versions of the text are available on each site? How much information do they give you about publication, date, translation, digitisation, copyright, etc.?
  3. How much "noise" do you find in each text? Look for line numbers, citations, footnotes or inline references, headers and any other intervening text or data that you would not want included in a digital analysis of the text. Can you figure out what they all mean, how they got there, and why they might have been left in this version, deliberately or otherwise?
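
For those who want to try steps 2 and 3 programmatically, the following is a minimal Python sketch of one way to strip some common kinds of noise and then measure how closely two versions agree. The noise patterns, the function names `strip_noise` and `similarity`, and the noise markers in the two samples are all assumptions made for illustration. Real repositories use different conventions, and these patterns would also delete genuine numerals at the start or end of a line, so inspect your own texts before adapting them.

```python
import difflib
import re

# Hypothetical noise patterns; each source will need its own rules.
NOISE_PATTERNS = [
    re.compile(r"\[\d+\]"),            # bracketed section or footnote markers, e.g. [12]
    re.compile(r"^\s*\d+\s+", re.M),   # numbers at the start of a line
    re.compile(r"\s+\d+\s*$", re.M),   # numbers at the end of a line
]


def strip_noise(text: str) -> str:
    """Remove the marker patterns above and collapse all whitespace."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two cleaned versions (0.0 to 1.0)."""
    words_a = strip_noise(a).split()
    words_b = strip_noise(b).split()
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()


# The same passage with invented noise, standing in for copies from two sites.
version_a = "[1] Arma virumque cano, Troiae qui primus ab oris\n2 Italiam fato profugus"
version_b = "Arma virumque cano, Troiae qui primus ab oris 1\nItaliam fato profugus"

print(strip_noise(version_a))
print(similarity(version_a, version_b))
```

A ratio well below 1.0 after cleaning suggests either genuine textual differences (different editions or translations) or noise that the patterns did not catch, and both are worth recording for the discussion.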

Part 2

  1. Select another author and text, and find its catalogue entry in the main databases (you will find it in more places if you pick a Latin text). For example, compare the records for Cicero’s De Natura Deorum in DLL, Perseus and Wikidata.
  2. Once you have the work identifier (e.g. phi0474.phi050), you can also find the MODS file in the Perseus Catalog GitHub repo.
  3. What metadata does each record hold about the (i) author, (ii) work, (iii) edition? What can you find out about the internal structure of the work? Can you find any metadata or paradata about the digital surrogate itself?
  4. How does any of this help you find texts in this database and/or elsewhere on the web? Could any of this be useful in a processing/analysis context? How "machine readable" is this content?
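
As a rough illustration of what "machine readable" can mean in step 4, the sketch below splits a work identifier into its parts and reads a few fields from a MODS record using Python's standard library. The sample record is invented and heavily simplified, not copied from the Perseus Catalog, and the `ctsurn` identifier type and `urn:cts:latinLit:` prefix shown in it are assumptions. The function names `split_work_id` and `read_mods` are likewise hypothetical. Check the real MODS files for the elements and attributes they actually use.

```python
import xml.etree.ElementTree as ET

# Namespace of the MODS version 3 schema.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented, simplified record for illustration only.
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>De Natura Deorum</mods:title></mods:titleInfo>
  <mods:name><mods:namePart>Cicero</mods:namePart></mods:name>
  <mods:identifier type="ctsurn">urn:cts:latinLit:phi0474.phi050</mods:identifier>
</mods:mods>"""


def split_work_id(work_id: str) -> dict:
    """Split an identifier such as 'phi0474.phi050' at the first dot."""
    textgroup, work = work_id.split(".", 1)
    return {"textgroup": textgroup, "work": work}


def read_mods(xml_text: str) -> dict:
    """Extract title, author name and typed identifiers from a MODS string."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS),
        "author": root.findtext("mods:name/mods:namePart", namespaces=MODS_NS),
        "identifiers": {
            el.get("type"): el.text
            for el in root.findall("mods:identifier", MODS_NS)
        },
    }


print(split_work_id("phi0474.phi050"))
print(read_mods(SAMPLE_MODS))
```

If a record can be read this way without hand-written exceptions, that is a reasonable sign the metadata would be usable in a processing or analysis pipeline.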

Bring your findings and any other questions that arise for discussion with your class or colleagues next week.