AddOns - kana112233/tesseract GitHub Wiki
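For GUI front ends to Tesseract and other third-party projects, see User Projects - 3rdParty.
# External tools, wrappers and training projects for Tesseract
## Tesseract box editors and training tools
Platform support depends on the implementation language and on user experience.
### For Tesseract-OCR 3.0x
Box file editors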
Name | Last update | Language | Multipage support |
---|---|---|---|
jTessBoxEditor | 2018 | Java | yes |
QT Box Editor | 2018 | C++, Qt4/Qt5 | yes |
tesseract-box-editor | 2013 | .NET 4 | yes |
Tesseract-OCR boxfile AJAX editor | 2012 | online tool | |
cowboxer | 2012 | C++, Qt4 | no |
moshPyTT | 2011 | Python, GTK2 | no |
pytesseracttrainer | 2011 | Python, GTK2 | no |
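The editors above all read and write Tesseract box files. The following minimal Python sketch illustrates the data they work with. It assumes the 3.0x layout of one glyph per line as `symbol left bottom right top page`, with pixel coordinates measured from the bottom-left corner of the image and a page index starting at 0. Box files for 2.0x omit the page field. The helper names `BoxEntry`, `parse_box_line`, `parse_box_file`, `to_top_left` and `format_box_line` are hypothetical and do not come from any of the listed tools.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoxEntry:
    """One glyph record from a box file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0  # 0-based page index; 2.0x box files have no page column


def parse_box_line(line: str) -> BoxEntry:
    # Assumes whitespace-separated fields and a symbol without embedded spaces.
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}: {line!r}")
    left, bottom, right, top = (int(p) for p in parts[1:5])
    page = int(parts[5]) if len(parts) == 6 else 0
    return BoxEntry(parts[0], left, bottom, right, top, page)


def parse_box_file(text: str) -> List[BoxEntry]:
    # Skip blank lines; every other line must be a valid box record.
    return [parse_box_line(ln) for ln in text.splitlines() if ln.strip()]


def to_top_left(entry: BoxEntry, image_height: int) -> Tuple[int, int, int, int]:
    # Box coordinates have their origin at the bottom-left of the image;
    # GUI toolkits usually want (x, y, width, height) with y measured downward.
    return (
        entry.left,
        image_height - entry.top,
        entry.right - entry.left,
        entry.top - entry.bottom,
    )


def format_box_line(entry: BoxEntry) -> str:
    # Write the record back in the six-field 3.0x layout.
    return f"{entry.symbol} {entry.left} {entry.bottom} {entry.right} {entry.top} {entry.page}"


if __name__ == "__main__":
    sample = "T 10 20 30 60 0\nh 32 20 50 62 0\n"
    boxes = parse_box_file(sample)
    print(to_top_left(boxes[0], image_height=100))  # (10, 40, 20, 40)
    print(format_box_line(boxes[1]))  # h 32 20 50 62 0
```

A GUI editor must flip the y axis before it can draw box rectangles over the image. The `page` field lets a single box file describe every page of a multipage TIFF.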
### For Tesseract-OCR 2.0x
Box file editors
Name | Last update | Language |
---|---|---|
Tesseract-OCR boxfile AJAX editor | 2012 | online tool |
owlboxer | 2010 | C++, Qt4 |
Tessboxer | 2009 | .NET |
boxfilereader.php | 2009 | PHP |
tessboxes | 2008 | C |
JTesseract | 2008 | C# |
wx-tetra | 2008 | Perl, wx |
bbtesseract | 2008 | VB.NET 2008 |
Other Training Tools
- jTessBoxEditor - box editor and training tool
- MzTesseract - MS Windows program that can train a new language from top to bottom
- FrankenPlus - tool for creating font training for the Tesseract OCR engine from page images. More information about Franken+ is available at IT'S ALIVE! and on the Franken+ homepage.
- python-tesseract-3.02-training - script to automate the generation of Tesseract 3.02 training files
- tesseract-box-file - AutoIt script to make editing the box file easier
- Serak Tesseract Trainer for Tesseract 3.02 - a front-end GUI for training Tesseract 3.02
- BoxMaker - online tool for generating image and box file pairs. An offline version is available in the download section of the PersianOCR project
- boxFactory - a tool for quickly creating box files to train the Tesseract OCR engine. You identify characters in the image by simply drawing boxes around them.
- https://github.com/BaltoRouberol/TesseractTrainer - TesseractTrainer is a simple Python API that takes over the tedious process of manually training Tesseract 3
- tess_school - a set of handy scripts to make the Tesseract training process a bit easier
- txt2img - Qt GUI application that generates an image and box file from text input
- DangAmbigs Generator - creates a DangAmbigs file automatically from a set of OCR text output and the correct text. Requirements: Python
- train.ps1 - Windows PowerShell script that automates the Tesseract 3.01 language data pack generation process
- Update unicharambigs.exe - a small Windows C# program for editing the "lang.unicharambigs" file
- train_tess.pl - Perl script to facilitate training
- boxedit - a web-based editor for Tesseract box files
- TrainYourTesseract - free online "no-hassle" TTF-to-traineddata converter
## Community training projects
- **Tesseract-MICR-OCR**: https://github.com/BigPino67/Tesseract-MICR-OCR
- **MRZ**: https://groups.google.com/group/tesseract-ocr/attach/10d7c711c9cc80/mrz.traineddata
- **Latin**: https://github.com/ryanfb/latinocr-lattraining
- **tesseract-georgian**: https://github.com/ddohler/tesseract-georgian
- **Polish Fraktur**: trained as a [result of the IMPACT project](http://dl.psnc.pl/activities/projekty/impact/results/), [training dataset](http://dl.psnc.pl/download/tesseract_traineddata.zip)
- **Ancient Greek**: http://ancientgreekocr.org
- **Indic**: http://code.google.com/p/tesseractindic/, https://github.com/debayan/Tesseract-Indic-OCR/, http://code.google.com/p/parichit/ (all outdated)
- **Indic OCR**: http://indic-ocr.github.io/tessdata/
- **Irish uncial**: https://github.com/jimregan/tesseract-gle-uncial
- **Polish**: http://code.google.com/p/tesseract-polish/
- **Fraktur** (dan, deu, swe): https://github.com/paalberti/tesseract-dan-fraktur
- **Myanmar**: http://code.google.com/p/myaocr/
- **Persian (Farsi)**: https://github.com/reza1615/PersianOcr
- **7-segment font**: https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital
## Ports
- Project Naptha
- tesseract.js-core - an Emscripten port of the Tesseract C++ API
- tesseract.js - pure JavaScript OCR
## Tesseract wrappers
### Tesseract 4.0x
**Java**
- bytedeco - Java configuration and interface classes for Tesseract, based on the "JavaCPP-Presets" library from https://bytedeco.org - https://github.com/bytedeco/javacpp-presets
### Tesseract 3.0x
**C**
- Tesseract 3.02 and later include a C API
**.NET**
- charlesw/tesseract - the project provides the [tesseract-ocr 64-bit Windows library](https://github.com/charlesw/tesseract/tree/master/src/lib/TesseractOcr/x64)
- http://code.google.com/p/tesseractdotnet/
**Python**
- tesserocr - a Python wrapper around Tesseract's C++ API
- pyocr - a Python wrapper for Tesseract (and Cuneiform)
- tesserwrap - Python bindings for the Tesseract API
- tesseract-sip - a Python SIP wrapper for libtesseract (Apache License)
- pytesseract - a wrapper class for Tesseract OCR
- python-tesseract (alternative link) - a wrapper class for Tesseract OCR that accepts any common image file format (SWIG-based)
- http://code.google.com/p/pytess/ - a simple SWIG-based interface to Tesseract
**R**
- tesseract - bindings to the Tesseract C++ API for the R programming language
**Ruby**
- ruby-tesseract-ocr - a Tesseract 3.0x wrapper using the C++ API
- rtesseract
**Java**
- bytedeco - Java configuration and interface classes for Tesseract, based on the "JavaCPP-Presets" library from https://bytedeco.org - https://github.com/bytedeco/javacpp-presets
- tess4j - JNA wrapper. Documentation and discussion: http://tess4j.sourceforge.net/
**Node.js**
- penteract - native Node.js bindings to the Tesseract OCR project
**PHP**
**Objective-C**
**Go**
**Clojure**
### Tesseract 2.0x
**Python**
- https://github.com/hoffstaetter/python-tesseract/wiki
- http://code.google.com/p/pytesser/
- http://code.google.com/p/tesseract-python (pytesser clone)
- http://pokerai.org/pf3/viewtopic.php?f=3&t=2677&start=0&st=0&sk=t&sd=a
- patch for a SWIG-based Python wrapper
**.NET**
**Java**
- tess4j (0.4) - JNA wrapper. Documentation and discussion: http://tess4j.sourceforge.net/