20140908 ancient greek ocr - plembo/onemoretech GitHub Wiki

title: Ancient Greek OCR
link: https://onemoretech.wordpress.com/2014/09/08/ancient-greek-ocr/
author: phil2nc
description:
post_id: 8394
created: 2014/09/08 21:16:53
created_gmt: 2014/09/09 01:16:53
comment_status: closed
post_name: ancient-greek-ocr
status: publish
post_type: post

Ancient Greek OCR

Yes, you can scan documents written in ancient Greek (or use images of such documents) and then copy the recognized text wherever you need it, for example into an editor where you can make an interlinear translation. It's easy to set up on Linux, as described below.

I found this on the single-page site Ancient Greek OCR. The solution recommended there is built on the well-known Tesseract OCR engine. Fedora 20 includes a package for ancient Greek, tesseract-langpack-grc. I did not use the grc.traineddata file from the site, since I was fairly sure the one shipped with the package would be up to date and probably more compatible with what I'd installed.

The page author recommends OCRFeeder as a graphical frontend on Linux, but since that isn't available for Fedora 20, I installed gImageReader from the main repo instead (gImageReader is also the recommendation for Windows).

Once everything was installed, I gave it a try using a high-resolution .tiff file from the source image library on the Textkit site. The results were very impressive, even when working with mixed English and ancient Greek, something that Google Translate, among others, has a lot of difficulty with. One word of warning to those running slower CPUs, like the i3 in my machine, without a high-end graphics card: the software will take quite a while to process a page.

This is a solution I can recommend to those of my liberal arts brothers and sisters who have occasion to work with ancient Greek text. As Thales is reputed to have once said, τί εὔκολον; Τὸ ἄλλῳ ὑποτίθεσθαι. ("What is easy? To give advice to another.")
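For anyone who would rather script the OCR step than click through a graphical frontend, here is a minimal Python sketch that calls the Tesseract command line directly. It is not what I used for this post. It assumes the tesseract binary is on the PATH and that the ancient Greek and English language data are installed (on Fedora, the tesseract-langpack-grc package mentioned above supplies the Greek data). The function names and the file names page.tif and page are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, languages=("grc", "eng")):
    """Return the argument list for a Tesseract run; nothing is executed here."""
    # Tesseract accepts several languages joined with "+", e.g. "grc+eng",
    # which is how mixed Greek and English pages can be handled.
    return ["tesseract", str(image), str(output_base), "-l", "+".join(languages)]


def ocr_page(image, output_base, languages=("grc", "eng")):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(build_tesseract_command(image, output_base, languages), check=True)
    # Tesseract appends ".txt" to the output base name it is given.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling ocr_page("page.tif", "page") would write page.txt next to the script and return its contents as a string. Building the command in a separate function makes it easy to print the command line and check it before running a long job on a slow machine.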