Computer Vision - fcrimins/fcrimins.github.io GitHub Wiki

Google's Optical Character Recognition (OCR) software works for 248+ languages (5/11/17)

  • Google's OCR is probably built on Tesseract, an OCR engine released as free software, or on OCRopus, a free document analysis and OCR system that is primarily used in Google Books.
  • Document layout analysis
    • "It is a common assumption in both document layout analysis algorithms and optical character recognition algorithms that the characters in the document image are oriented so that text lines are horizontal. Therefore, if there is skew present then it is important to rotate the document image so as to remove it. It follows that the first steps in any document layout analysis code are to remove image noise and to come up with an estimate for the skew angle of the document."
  • Tesseract (Tesseract installation and examples)
    • "Since version 3.00 Tesseract has supported output text formatting, hOCR[9] positional information and page-layout analysis."
    • "Tesseract is suitable for use as a backend and can be used for more complicated OCR tasks including layout analysis by using a frontend such as OCRopus."
    • "Tesseract's output will have very poor quality if the input images are not preprocessed to suit it: Images (especially screenshots) must be scaled up such that the text x-height is at least 20 pixels,[13] any rotation or skew must be corrected or no text will be recognized, low-frequency changes in brightness must be high-pass filtered, or Tesseract's binarization stage will destroy much of the page, and dark borders must be manually removed, or they will be misinterpreted as characters"
  • OCRopus (OCRopus installation and examples)
    • "A free document layout analysis and OCR system, implemented in C++ and Python and for FreeBSD, Linux, and Mac OS X. This software supports a plug-in architecture which allows the user to select from a variety of different document layout analysis and OCR algorithms"
  • OCRFeeder
    • "OCRFeeder is an optical character recognition suite for GNOME, which also supports virtually any command-line OCR engine, such as CuneiForm, GOCR, Ocrad and Tesseract."
  • pyocr
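
Two of the preprocessing steps quoted above (scaling text to an x-height of at least 20 pixels, and estimating the skew angle before rotating) can be sketched in plain Python. This is only an illustrative sketch, not how Tesseract or OCRopus do it: the function names `upscale_factor` and `estimate_skew_degrees` are hypothetical, the skew estimate uses a simple projection-profile search chosen here for brevity, and the input is assumed to be a list of (x, y) coordinates of dark pixels from an already binarized page.

```python
import math

MIN_X_HEIGHT_PX = 20  # threshold from the Tesseract note quoted above


def upscale_factor(x_height_px, min_x_height_px=MIN_X_HEIGHT_PX):
    """Factor by which to enlarge an image so its text x-height reaches
    the minimum. Never shrinks, so the result is always >= 1.0."""
    if x_height_px <= 0:
        raise ValueError("x_height_px must be positive")
    return max(1.0, min_x_height_px / x_height_px)


def estimate_skew_degrees(points, max_angle=10.0, step=0.5):
    """Estimate the skew angle from foreground pixel coordinates.

    Assumed approach (projection profile): for each candidate angle, rotate
    the points back by that angle and count points per row. When the text
    lines become horizontal, the points pile up in few rows, so the sum of
    squared row counts is largest. The returned angle is the one whose
    tangent matches the slope dy/dx of the text lines in the coordinate
    system of the input points.
    """
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        cos_t, sin_t = math.cos(t), math.sin(t)
        rows = {}
        for x, y in points:
            # Row index of the point after rotating it by -angle.
            row = round(-x * sin_t + y * cos_t)
            rows[row] = rows.get(row, 0) + 1
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


if __name__ == "__main__":
    # Synthetic page: five text lines, 20 px apart, skewed by 3 degrees.
    slope = math.tan(math.radians(3.0))
    page = [(float(x), 20.0 * k + x * slope)
            for k in range(1, 6) for x in range(200)]
    print("estimated skew (degrees):", estimate_skew_degrees(page))
    print("scale factor for a 10 px x-height:", upscale_factor(10))
```

Once a skew angle is estimated, the page would be rotated by the opposite angle and enlarged by the scale factor before being handed to an OCR engine. The search range and step size are arbitrary defaults and would need tuning for real scans.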

A short guide to learn NNs