Using free OCR in Ubuntu - hpaluch/hpaluch.github.io GitHub Wiki
The problem: you have an image containing text (called input-image.png
in this example) and you want to extract that text into an ordinary plain-text file using OCR.
Tested on Ubuntu 16.04.1 LTS
Setup
Install the following packages (support for English and Czech languages):
sudo apt-get install tesseract-ocr-ces tesseract-ocr tesseract-ocr-eng
Verify the list of recognized languages:
tesseract --list-langs
List of available languages (4):
equ
ces
eng
osd
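The check above can also be scripted. The following is a minimal sketch, not part of the original setup: missing_langs is a hypothetical helper name, and it assumes the listing has the same format as shown above (one language code per line). Some Tesseract versions print the list on stderr, so both streams are merged before filtering.

```shell
#!/bin/sh
# missing_langs (hypothetical helper): reads `tesseract --list-langs`
# output on stdin and prints every language code given as an argument
# that does not appear in that list.
missing_langs() {
    installed=$(cat)
    for lang in "$@"; do
        # -x matches whole lines only, -F treats the code as a literal string.
        printf '%s\n' "$installed" | grep -qxF -- "$lang" || printf '%s\n' "$lang"
    done
}

# Only query tesseract when it is actually installed.
if command -v tesseract >/dev/null 2>&1; then
    # Merge stderr into stdout because the list may be printed on either.
    tesseract --list-langs 2>&1 | missing_langs ces eng
fi
```

If the script prints nothing, both ces and eng are available. Any code it prints still needs the matching tesseract-ocr-* package.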
Run OCR
Use this example to process input-image.png
containing Czech characters and output the
results into the /tmp/output.txt
file (standard UTF-8 encoding). Tesseract appends .txt to the output base name itself, so pass /tmp/output without the extension:
tesseract input-image.png /tmp/output -l ces
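To process several images at once, the same command can be wrapped in a loop. This is a rough sketch under stated assumptions: ocr_dir is a hypothetical helper name, the images are assumed to be PNG files in a single directory, and the language defaults to ces as in the example above. Because tesseract appends .txt to the output base name, the helper strips the .png extension and passes the remainder as the output base.

```shell
#!/bin/sh
# ocr_dir DIR [LANG] (hypothetical helper): run tesseract on every PNG in DIR.
# For each NAME.png the recognized text ends up in NAME.txt next to the image.
ocr_dir() {
    dir=$1
    lang=${2:-ces}
    for img in "$dir"/*.png; do
        # Skip the unexpanded pattern when DIR contains no PNG files.
        [ -e "$img" ] || continue
        # Strip ".png"; tesseract adds ".txt" to the output base itself.
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}

# Example call (the directory name is an assumption):
#   ocr_dir ./scans ces+eng
```

The ces+eng form in the example call asks tesseract to use both language models, which may help with images that mix Czech and English text.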