Training Comparison - UB-Mannheim/AustrianNewspapers GitHub Wiki

Comparison of different OCR engines

The data set is relatively small, which may make it well suited for a comparison of different OCR engines (like Calamari, Kraken, Tesseract).

For a fair comparison, all engines should use identical parameters as far as possible:

Each training must use the same ground truth pairs (line images and text).
Image preprocessing (binarization, ...) must be identical.
The order of the images used for training must be identical.
The internally used height of the images must be identical.
The number of training epochs / iterations must be identical.
The network specification must be identical.
...
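
The first and third requirements (same ground truth pairs, same order) could be enforced by generating one shared, fixed-order list of pairs and feeding that same list to every engine. The following is only a minimal sketch under stated assumptions: the file naming (`*.png` line images with a matching `*.gt.txt` transcription), the seed value, and the helper names `collect_pairs`, `fixed_order` and `manifest_digest` are hypothetical and not taken from this repository or from any of the engines.

```python
import hashlib
import random
from pathlib import Path


def collect_pairs(gt_dir, image_suffix=".png", text_suffix=".gt.txt"):
    """Return sorted (image, text) path pairs found below gt_dir.

    Assumption: each line image has a transcription with the same
    base name. Images without a transcription are skipped.
    """
    pairs = []
    for image in sorted(Path(gt_dir).rglob(f"*{image_suffix}")):
        text = image.with_name(image.name[: -len(image_suffix)] + text_suffix)
        if text.is_file():
            pairs.append((image, text))
    return pairs


def fixed_order(pairs, seed=0):
    """Shuffle a copy of pairs with a fixed seed so the order is reproducible."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def manifest_digest(pairs):
    """Hash file names and transcriptions in list order.

    Two training runs that report the same digest used the same
    transcriptions in the same order.
    """
    digest = hashlib.sha256()
    for image, text in pairs:
        digest.update(image.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(text.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Writing the result of `fixed_order(collect_pairs(...))` to a file list once, and recording its digest next to each engine's results, would make it possible to check afterwards that every training run saw the same data in the same order. Whether a given engine honours the order of such a list (or shuffles internally) has to be checked per engine.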