ControlParams - kana112233/tesseract GitHub Wiki

# List of useful control parameters and config files

# Introduction

Tesseract is extremely flexible if you know how to control it. There are a large number of control parameters that modify its behaviour. While these change from time to time, most of them are fairly stable. A list of all parameters, with their default values and short descriptions, can be retrieved with:

    tesseract --print-parameters
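To work with that listing programmatically, the output can be parsed into a dictionary. The following is a minimal sketch, assuming each parameter line is tab-separated as name, value, description; that layout may differ between Tesseract versions, so check the output of your own build. The helper names `parse_parameters` and `load_parameters` are hypothetical, and the sample text is illustrative rather than captured output.

```python
import subprocess


def parse_parameters(text):
    """Parse `tesseract --print-parameters` output into {name: (value, description)}.

    Assumes each parameter line is tab-separated: name, value, description.
    Lines without a tab (such as a heading line) are skipped.
    """
    params = {}
    for line in text.splitlines():
        fields = line.split("\t", 2)
        if len(fields) < 2:
            continue
        name, value = fields[0].strip(), fields[1]
        description = fields[2] if len(fields) == 3 else ""
        params[name] = (value, description)
    return params


def load_parameters(executable="tesseract"):
    """Run the command shown above and parse what it prints."""
    result = subprocess.run(
        [executable, "--print-parameters"],
        capture_output=True, text=True, check=True,
    )
    return parse_parameters(result.stdout)


# Illustrative sample only, so the sketch runs without Tesseract installed.
sample = (
    "Tesseract parameters:\n"
    "load_system_dawg\t1\tLoad system word dawg.\n"
    "user_words_suffix\t\tA suffix of user-provided words located in tessdata.\n"
)
params = parse_parameters(sample)
print(params["load_system_dawg"][0])
```

With Tesseract installed, `load_parameters()` would return the same structure for the real listing, which makes it easy to check whether a parameter exists in your version before relying on it.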

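There are 3 different kinds of parameters:

### Init only

Characterized by **INIT** in their initialization macro.

These parameters can only be set in the TessBaseAPI::Init function, which takes a list of config files.

Note: you cannot change init-only parameters with the tesseract executable option -c.

The rest can be set with TessBaseAPI::SetVariable and fall into 2 more groups: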

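### General parameters

Control many different aspects of Tesseract functionality.

### Debug parameters

Have debug in their name and control masses of optional debug text and graphical output as Tesseract works.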
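The split between init-only parameters and the rest matters when assembling a command line: init-only parameters have to travel in a config file, while the others can be passed with -c. Below is a minimal sketch of that split. It assumes the usual argument order of image, output base, options, then config files; `build_command` and the file names `page.png`, `out` and `no_dict.cfg` are hypothetical.

```python
def build_command(image, output_base, init_only_config=None,
                  runtime_params=None, executable="tesseract"):
    """Build an argv list for the tesseract executable.

    runtime_params: dict of non-init-only parameters, passed as `-c name=value`.
    init_only_config: name or path of a config file holding init-only
        parameters; it is appended after the options.
    """
    argv = [executable, image, output_base]
    for name, value in (runtime_params or {}).items():
        argv += ["-c", f"{name}={value}"]
    if init_only_config:
        argv.append(init_only_config)
    return argv


# no_dict.cfg is a hypothetical config file that would contain a line such as
# "load_system_dawg 0", because that parameter cannot be set with -c.
argv = build_command(
    "page.png",
    "out",
    init_only_config="no_dict.cfg",
    runtime_params={"language_model_penalty_non_dict_word": 0.2},
)
print(argv)
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a machine where Tesseract is installed.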

# Useful parameters

Note that default values may change; check the source code if you need to be sure of them.

| Name | Type | Default value | Init only | Description |
|------|------|---------------|-----------|-------------|
| load_system_dawg | boolean (0/1) | 1 | Yes | Controls whether or not to load the main dictionary for the selected language. |
| user_words_suffix | string | "" | Yes | The extension of the user-words word list file. If non-empty, Tesseract will attempt to load the relevant list of words to add to the dictionary for the selected language. E.g. if set to user-words, Tesseract will attempt to load eng.user-words from the tessdata directory at initialization time. |
| language_model_penalty_non_dict_word | double (0-1) | 0.15 | No | The penalty to apply to words not in the word_dawg / user_words wordlists. |
| language_model_penalty_non_freq_dict_word | double (0-1) | 0.1 | No | The penalty to apply to words not in the freq_dawg wordlist. |
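As a worked example of the user_words_suffix naming rule, the sketch below writes a word list under the file name Tesseract would look for. It assumes the list is plain text with one word per line, and it uses a temporary directory as a stand-in for the real tessdata directory; `write_user_words` is a hypothetical helper.

```python
import os
import tempfile


def write_user_words(tessdata_dir, lang, suffix, words):
    """Write one word per line to <tessdata_dir>/<lang>.<suffix>; return the path."""
    path = os.path.join(tessdata_dir, f"{lang}.{suffix}")
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")
    return path


# Demonstration in a temporary directory standing in for tessdata.
with tempfile.TemporaryDirectory() as tessdata_dir:
    path = write_user_words(tessdata_dir, "eng", "user-words", ["Tesseract", "dawg"])
    file_name = os.path.basename(path)
    with open(path, encoding="utf-8") as handle:
        written = handle.read().splitlines()

# The matching init-only setting, in the "name value" form used by config files.
config_line = "user_words_suffix user-words"
print(file_name)
```

Because user_words_suffix is init-only, the `config_line` above belongs in a config file rather than in a -c option.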

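### Useful parameters for Japanese and Chinese

Some Japanese tesseract users have found these parameters helpful for improving the accuracy of tesseract-ocr (3.02) on Japanese text: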

| Name | Suggested value | Description |
|------|-----------------|-------------|
| chop_enable | T | Chop enable. |
| use_new_state_cost | F | Use new state cost heuristics for segmentation state evaluation. |
| segment_segcost_rating | F | Incorporate segmentation cost in word rating? |
| enable_new_segsearch | 0 | Enable new segmentation search path. It could solve the problem of dividing one character into two characters. |
| language_model_ngram_on | 0 | Turn on/off the use of character ngram model. |
| textord_force_make_prop_words | F | Force proportional word segmentation on all rows. |
| edges_max_children_per_outline | 40 | Max number of children inside a character outline. Increase this value if some KANJI characters are not recognized (rejected). |
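These suggestions can be collected into a config file so they are applied together. The sketch below renders them in the "name value" per line form that Tesseract config files use. The parameter names come from the 3.02-era list above and may not exist in other versions, so compare them against `tesseract --print-parameters` first; `render_config` and `JAPANESE_SUGGESTIONS` are hypothetical names.

```python
# Suggested values copied from the table above (reported for tesseract-ocr 3.02).
JAPANESE_SUGGESTIONS = {
    "chop_enable": "T",
    "use_new_state_cost": "F",
    "segment_segcost_rating": "F",
    "enable_new_segsearch": "0",
    "language_model_ngram_on": "0",
    "textord_force_make_prop_words": "F",
    "edges_max_children_per_outline": "40",
}


def render_config(params):
    """Render parameters as config file text, one "name value" pair per line."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


config_text = render_config(JAPANESE_SUGGESTIONS)
print(config_text, end="")
```

Saving `config_text` to a file and naming that file after the other arguments on the tesseract command line would apply all seven settings in one step.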