Training Comparison - UB-Mannheim/AustrianNewspapers GitHub Wiki
Comparison of different OCR engines
The data set is relatively small, which may make it well suited for comparing different OCR engines such as Calamari, Kraken, and Tesseract.
To compare different engines, all should use identical parameters as far as possible.
- Each training run must use the same ground truth pairs (line images and text).
- Image preprocessing (binarization, ...) must be identical.
- The order of the images used for training must be identical.
- The internally used height of the images must be identical.
- The number of training epochs / iterations must be identical.
- The network specification must be identical.
- ...
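
The requirements above could be checked with a small engine-independent script before any training starts. The following is only a minimal sketch: the `SharedSettings` fields and their default values, the `.png` / `.gt.txt` file naming, and all function names are assumptions for illustration, not part of this repository or of any OCR engine's API. It collects the ground truth pairs, fixes their order with a seed, and computes a digest that each training run can log to show it used the same input.

```python
import hashlib
import random
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass(frozen=True)
class SharedSettings:
    """Hypothetical settings that every engine run should share."""
    line_height: int = 48   # internal image height in pixels (assumed value)
    epochs: int = 100       # number of training epochs (assumed value)
    binarize: bool = True   # whether preprocessing binarizes the images
    seed: int = 42          # seed that fixes the training order


def collect_pairs(root, image_suffix=".png", text_suffix=".gt.txt"):
    """Return sorted (image, text) pairs; images without a transcription are skipped."""
    pairs = []
    for image in sorted(Path(root).rglob("*" + image_suffix)):
        text = image.with_name(image.name[: -len(image_suffix)] + text_suffix)
        if text.exists():
            pairs.append((image, text))
    return pairs


def fixed_order(pairs, seed):
    """Shuffle a copy of the pairs with a fixed seed so every engine sees the same order."""
    ordered = list(pairs)
    random.Random(seed).shuffle(ordered)
    return ordered


def manifest_digest(pairs, settings):
    """Hash settings, file contents, and order so two runs can be compared."""
    digest = hashlib.sha256()
    digest.update(repr(sorted(asdict(settings).items())).encode("utf-8"))
    for image, text in pairs:
        digest.update(image.read_bytes())
        digest.update(text.read_bytes())
    return digest.hexdigest()
```

A run for each engine would call `fixed_order(collect_pairs(root), settings.seed)` and record `manifest_digest(...)` next to its results. Matching digests indicate the same pairs, order, and shared settings. The digest does not verify that an engine actually applies those settings internally, so the network specification and preprocessing still need to be checked per engine.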