AddOns - nickjwhite/tesseract GitHub Wiki
External tools, wrappers and training projects for Tesseract.
Tesseract box editors and training tools
Platform support depends on the language each tool is written in and on the user's experience.
For Tesseract-OCR 3.0x
Box file editors
Name | Last update | Language | Multipage support |
---|---|---|---|
jTessBoxEditor | March 2015 | Java | yes |
QT Box Editor | July 2013 | C++, Qt4/Qt5 | yes |
tesseract-box-editor | Feb 2013 | .NET 4 | yes |
Tesseract-OCR boxfile AJAX editor | March 2012 | online tool | |
cowboxer | Jan 2012 | C++, Qt4 | no |
moshPyTT | Feb 2011 | Python, GTK2 | no |
pytesseracttrainer | Jan 2011 | Python, GTK2 | no |
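
The editors above all work on plain-text box files with one symbol per line. As a rough illustration, here is a minimal Python sketch. It assumes the 3.0x layout `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image. It also assumes that a line with no page field belongs to page 0. The `Box` type and the `parse_box_line` helper are hypothetical names used only for this example and do not come from any of the listed tools.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One line of a box file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse a single box-file line into a Box.

    Assumed layout: symbol left bottom right top [page].
    A missing page field is treated as page 0 (an assumption).
    """
    parts = line.split()
    if len(parts) == 5:
        parts.append("0")  # no page field present: assume page 0
    if len(parts) != 6:
        raise ValueError(f"unexpected box line: {line!r}")
    symbol = parts[0]
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return Box(symbol, left, bottom, right, top, page)
```

For example, `parse_box_line("T 36 92 53 115 0")` yields a `Box` whose `symbol` is `"T"` and whose `page` is `0`. An editor with multipage support would group the parsed boxes by their `page` value.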
For Tesseract-OCR 2.0x
Box file editors
Name | Last update | Language |
---|---|---|
Tesseract-OCR boxfile AJAX editor | March 2012 | online tool |
owlboxer | Jul 2010 | C++, Qt4 |
Tessboxer | Jun 2009 | .NET |
boxfilereader.php | Mar 2009 | PHP |
tessboxes | Nov 2008 | C |
JTesseract | Oct 2008 | C# |
wx-tetra | Sep 2008 | Perl, wx |
bbtesseract | Jul 2008 | VB.NET 2008 |
Other Training Tools
- jTessBoxEditor - box editor and training tool
- MzTesseract - MS Windows program that can train a new language from top to bottom
- FrankenPlus - tool for creating font training for the Tesseract OCR engine from page images. More information about Franken+ is at IT'S ALIVE! and the Franken+ homepage.
- python-tesseract-3.02-training - script to automate the generation of Tesseract 3.02 training files
- tesseract-box-file - AutoIt script to make editing the box file easier
- Serak Tesseract Trainer for Tesseract 3.02 - a front-end GUI for training Tesseract 3.02
- BoxMaker - online tool for generating image and box file pairs. An offline version is available in the download section of the PersianOCR project.
- boxFactory - tool for quickly creating box files to train the Tesseract OCR engine. You can identify characters in the image by simply drawing boxes around them.
- https://github.com/BaltoRouberol/TesseractTrainer - TesseractTrainer is a simple Python API that takes over the tedious process of manually training Tesseract 3
- tess_school - a set of handy scripts to make the Tesseract training process a bit easier
- txt2img: Qt GUI application that generates an image and box file from text input
- DangAmbigs Generator: creates a DangAmbigs file automatically, given a set of OCR text output and the correct text. Requirements: Python
- train.ps1: Windows PowerShell script for automating the Tesseract 3.01 language data pack generation process
- Update unicharambigs.exe: a small (Windows) C# program for editing the "lang.unicharambigs" file
- train_tess.pl: Perl script to facilitate training
Community training projects
- Tesseract-MICR-OCR: https://github.com/BigPino67/Tesseract-MICR-OCR
- Latin: https://github.com/ryanfb/latinocr-lattraining
- tesseract-georgian: https://github.com/ddohler/tesseract-georgian
- Polish Fraktur: training produced by the IMPACT project, trained dataset
- Ancient Greek: http://ancientgreekocr.org
- Chinese (Simplified): see issue 296; download http://ubuntuone.com/p/3Yw/
- Indic: http://code.google.com/p/tesseractindic/, https://github.com/debayan/Tesseract-Indic-OCR/, http://code.google.com/p/parichit/ (all are obsolete)
- Irish uncial: https://github.com/jimregan/tesseract-gle-uncial
- Polish: http://code.google.com/p/tesseract-polish/
- Fraktur (dan, deu, swe): https://github.com/paalberti/tesseract-dan-fraktur
- Hebrew: http://roidayan.com/wordpress/?p=26, http://code.google.com/p/tesseract-ocr/issues/detail?id=432
- Myanmar: http://code.google.com/p/myaocr/
- Persian (Farsi): https://github.com/reza1615/PersianOcr
- 7 segments font: https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital
Commercial Training Services
Ports
- tesseract-emscripten: fork of tesseract for emscripten - part of Project Naptha
Tesseract wrappers
Tesseract 3.0x
C wrapper:
- 3.02 includes a C API
- for 3.00 (and 3.01?) see http://code.google.com/p/ocrivist/source/browse/#svn%2Ftessintf, which contains a C wrapper with Free Pascal bindings
.Net:
- https://github.com/charlesw/tesseract - the project also offers a 64-bit Windows build of the tesseract-ocr library
- http://code.google.com/p/tesseractdotnet/
Python:
- http://code.google.com/p/python-tesseract/ - a SWIG-based wrapper class for Tesseract OCR that reads conventional image files
- https://github.com/virtuald/python-tesseract-sip - A python SIP wrapper for libtesseract (Apache license)
- https://github.com/jflesch/pyocr - A python wrapper for Tesseract and Cuneiform
- https://github.com/gregjurman/tesserwrap - Python bindings to the Tesseract API
- http://code.google.com/p/pytess/ - A simple SWIG-based interface to Tesseract
Ruby:
- https://github.com/meh/ruby-tesseract-ocr/ - wrapper for tesseract 3.0x using the C++ API
- https://github.com/dannnylo/rtesseract
Clojure: https://github.com/antoniogarrote/clj-tesseract
Java: http://tess4j.sourceforge.net/ (JNA wrapper)
PHP: https://code.google.com/p/php-tesseract/
Objective-C: https://github.com/ldiqual/tesseract-ios
Tesseract 2.0x
Python:
- https://github.com/hoffstaetter/python-tesseract/wiki
- http://code.google.com/p/pytesser/
- http://code.google.com/p/tesseract-python (pytesser clone)
- http://pokerai.org/pf3/viewtopic.php?f=3&t=2677&start=0&st=0&sk=t&sd=a
- patches of SWIG wrapper for python
.NET: http://www.pixel-technology.com/freeware/tessnet2/
Java:
- http://code.google.com/p/tesjeract (JNI wrapper)
- http://tess4j.sourceforge.net/ (JNA wrapper)