OCRopus installation and examples - fcrimins/fcrimins.github.io GitHub Wiki

Back to Computer Vision


$ git clone https://github.com/tmbdev/ocropy.git
$ cd ocropy
$ virtualenv -p /usr/bin/python2 venv-ocropus
$ source venv-ocropus/bin/activate
$ pip install -r requirements.txt
$ wget -nd http://www.tmbdev.net/en-default.pyrnn.gz
$ mv en-default.pyrnn.gz models/
$ python setup.py install  # requires python2
$ ./run-test
ImportError: No module named _tkinter, please install the python-tk package
$ sudo apt-get install python-tk
$ ./run-test

$ mkdir fred
$ mv ~/Pictures/fred_ocropy_test.png fred
$ ocropus-nlbin fred/fred_ocropy_test.png -o fred

$ ocropus-gpageseg 'fred/????.bin.png'
ERROR:  fred/0001.bin.png SKIPPED too many connnected components for a page image (3889 > 1021) (use -n to disable this check)

$ ocropus-gpageseg -n 'fred/????.bin.png'
INFO:  scale 7.48331
ERROR:  fred/0001.bin.png: scale (7.48331) less than --minscale; skipping

$ identify fred_ocropy_test.png
fred_ocropy_test.png PNG 918x1001 918x1001+0+0 8-bit sRGB 180KB 0.000u 0:00.000

$ convert fred_ocropy_test.png -resize 1836x2002 bigger.png
    # exact proportional resizing not required: "Resize will fit the
    # image into the requested size. It does NOT fill, the requested
    # box size." [http://www.imagemagick.org/Usage/resize/]

$ ocropus-nlbin fred/bigger.png -o fred

$ ocropus-gpageseg 'fred/????.bin.png'
INFO:  scale 14.966630
INFO:  number of lines 53
INFO:  finding reading order
INFO:  writing lines
INFO:      33  fred/0001.bin.png 15.0 34

$ ocropus-rpred -Q 4 -m models/en-default.pyrnn.gz 'fred/????/??????.bin.png'
INFO:  
INFO:  ########## /home/fred/Documents/code/ocropy/venv-ocropus/bin/ocropus-rp
INFO:  
INFO:  #inputs: 34
# loading object ././models/en-default.pyrnn.gz
INFO:  fred/0001/010002.bin.png:=.
INFO:  fred/0001/010003.bin.png:ocropy
INFO:  fred/0001/010001.bin.png:READMEAmd
INFO:  fred/0001/010005.bin.png:you may need to do some image preprocessing, and possibly also traIn new models.
INFO:  fred/0001/010006.bin.png:In addhton to the recogntlon scnpts themsehes, there are a number of scnpts for ground truth edting and correcton,
INFO:  fred/0001/010004.bin.png:OCRopus s a collecton of document analysis programs, not a turn-key OCR system. In order to apply t to your documents,
INFO:  fred/0001/010009.bin.png:since t seems to confuse too many users).
INFO:  fred/0001/010007.bin.png:measuring error rates, determming confuson matnsces, etc. OCRopus commands will generally pnint a stack trace akong wth
INFO:  fred/0001/01000a.bin.png:Installing

$ ocropus-hocr 'fred/????.bin.png' -o fred/out.html
writing to fred/out.html
median_xheight 16.0
=== fred/0001.bin.png

$ firefox !$ &