1 Finding Texts - SunoikisisDC/SunoikisisDC-2024-2025 GitHub Wiki
Finding Free Classical Texts
SunoikisisDC Digital Classics: Session 1
Date: Thursday, January 16, 2025. 16:00-17:30 GMT.
Convenors: Monica Berti (University of Leipzig), Gabriel Bodard (University of London), Katharine Shields (King's College London)
YouTube link: https://youtu.be/mbrGPcciIU0
Slides: tba
Outline
This session introduces some of the main sources of free and open classical texts in digital formats. Resources range from curated, highly structured and encoded corpora such as the Scaife Viewer and Diorisis Corpus, through catalogues of available texts and archives of digitised public domain books, to sophisticated workflows for optical character recognition (OCR) of ancient languages. We outline the scope and limitations of some of the main collections, consider the potential for improving and analysing these texts (a topic that future sessions will take up in more detail), and suggest an exercise that involves comparing versions of a text found on multiple sites.
Required readings
- Alison Babeu. 2019. "The Perseus Catalog: of FRBR, Finding Aids, Linked Data, and Open Greek and Latin". In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 53-72. DOI: https://doi.org/10.1515/9783110599572-005
- Bruce Robertson. 2019. "Optical Character Recognition for Classical Philology." In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 117–136. DOI: https://doi.org/10.1515/9783110599572-008
Further readings
- B. Almas & M.-C. Beaulieu. 2016. "The Perseids Platform: Scholarship for all!" In M. Romanello & G. Bodard (eds.), Digital Classics Outside the Echo-Chamber. London: Ubiquity Press. DOI: https://doi.org/10.5334/bat.j
- Gabriel Bodard & Juan Garcés. 2009. "Open Source Critical Editions: A Rationale." In M. Deegan & K. Sutherland (eds.), Text Editing, Print, and the Digital World. Routledge. Pp. 84-98. Available: https://blog.stoa.org/files/2010/09/Bodard-Garces_2009_Open-source-digital-editions.pdf
- Gregory Crane, Alison Babeu, David Bamman. 2009. "Classics in the Million Book Library." Digital Humanities Quarterly 3.1. Available: http://www.digitalhumanities.org/dhq/vol/003/1/000034/000034.html
- Andrew Hardie. 2014. "Modest XML for Corpora: Not a standard, but a suggestion." ICAME Journal 38. DOI: https://doi.org/10.2478/icame-2014-0004
- Samuel J. Huskey. 2019. "The Digital Latin Library: Cataloging and Publishing Critical Editions of Latin Texts." In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 19–33. DOI: https://doi.org/10.1515/9783110599572-003
- Leonard Muellner. 2019. "The Free First Thousand Years of Greek". In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 7-18. DOI: https://doi.org/10.1515/9783110599572-002
- James K. Tauber. 2019. "Character Encoding of Classical Languages". In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 137-158. DOI: https://doi.org/10.1515/9783110599572-009
Resources
- PerseusDL
- The Perseus Catalog
- Scaife Viewer Library
- Digital Latin Library
- Diorisis Ancient Greek Corpus
- LACE (OCRed Greek texts)
- Loebolus (public domain Loebs)
- Vicifons (Latin Wikisource); Βικιθήκη (Ancient Greek texts in Wikisource)
- Gutenberg (Latin) and Gutenberg (Ancient Greek)
- Oxford Text Archive (Latin) and OTA (Ancient Greek) (in ASCII transcription)
- Latin texts in HathiTrust
- Latin e-books in Internet Archive
Exercise
Part 1
- Choose a classical text you are interested in (in the original or translation, as you prefer), and try to find copies of it on at least two or three of the repositories discussed in this session or in the list of resources above (Perseus, Wikisource, Gutenberg, etc.).
- What versions of the text are available on each site? How much information do they give you about publication, date, translation, digitisation, copyright, etc.?
- How much "noise" do you find in each text? Look for line numbers, citations, footnotes or inline references, headers and any other intervening text or data that you would not want included in a digital analysis of the text. Can you work out what each of these means, how it got there, and why it might have been left in this version, deliberately or otherwise?
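If you want to make the search for noise more systematic, a short script can flag suspicious lines for you to inspect by hand. The following is only a minimal sketch: the three patterns, the `flag_noise` function and the sample text (including its invented header, footnote marker and line number) are illustrative assumptions, and each repository will need its own patterns.

```python
import re

# Hypothetical patterns for common kinds of noise; adjust them for each source.
NOISE_PATTERNS = {
    # A line that contains nothing but a number (often a line or page number).
    "line_number": re.compile(r"^\s*\d+\s*$"),
    # An inline marker such as [12] or [3a] (often a footnote reference).
    "bracketed_ref": re.compile(r"\[\d+[a-z]?\]"),
    # An all-capitals running header followed by a page number.
    "page_header": re.compile(r"^[A-Z][A-Z .,]+\s+\d+$"),
}


def flag_noise(text):
    """Return (line_index, label, line) for every line matching a noise pattern."""
    hits = []
    for index, line in enumerate(text.splitlines()):
        for label, pattern in NOISE_PATTERNS.items():
            if pattern.search(line):
                hits.append((index, label, line))
                break  # report only the first matching label per line
    return hits


# Invented sample: the header, the [1] marker and the bare "5" are made up.
sample = """DE NATURA DEORUM 3
Cum multae res in philosophia nequaquam satis adhuc explicatae sint,[1]
5
tum perdifficilis, Brute, quod tu minime ignoras, et perobscura quaestio est de natura deorum.
"""

for hit in flag_noise(sample):
    print(hit)
```

The script only reports candidates and does not delete anything, so you can still decide case by case whether a flagged item is noise or meaningful structure (for example, a canonical section number).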
Part 2
- Select another author and text, and find its catalogue entry in the main databases (you will find it in more places if you pick a Latin text). For example, compare the records for Cicero’s De Natura Deorum in DLL, Perseus and Wikidata.
- Once you have the work identifier (e.g. phi0474.phi050), you can also find the MODS file in the Perseus Catalog GitHub repo.
- What metadata does each record hold about the (i) author, (ii) work, (iii) edition? What can you find out about the internal structure of the work? Can you find any metadata or paradata about the digital surrogate itself?
- How does any of this help you find texts in this database and/or elsewhere on the web? Could any of this be useful in a processing/analysis context? How "machine readable" is this content?
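As one small illustration of what "machine readable" can mean here, the sketch below splits a work identifier such as phi0474.phi050 into its two parts and assembles a URN-style string from them. It assumes that the identifier follows a two-part `textgroup.work` pattern and that `latinLit` is the appropriate namespace; the names `WorkId`, `parse_work_id` and `to_cts_urn` are hypothetical, so check the catalogue's own documentation before relying on any of this.

```python
from typing import NamedTuple


class WorkId(NamedTuple):
    textgroup: str  # author-level part, e.g. "phi0474"
    work: str       # work-level part, e.g. "phi050"


def parse_work_id(identifier: str) -> WorkId:
    """Split an identifier of the assumed form 'textgroup.work' into two parts."""
    parts = identifier.strip().split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'textgroup.work', got {identifier!r}")
    return WorkId(textgroup=parts[0], work=parts[1])


def to_cts_urn(work_id: WorkId, namespace: str = "latinLit") -> str:
    """Build a URN-style string; the default namespace is an assumption."""
    return f"urn:cts:{namespace}:{work_id.textgroup}.{work_id.work}"


work_id = parse_work_id("phi0474.phi050")
print(work_id.textgroup, work_id.work)
print(to_cts_urn(work_id))
```

A structured identifier like this can be validated, split and recombined by a program without human interpretation, which is a useful point of comparison when you assess the other metadata fields in each record.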
Bring your findings and any other questions that arise for discussion with your class or colleagues next week.