Training Comparison - UB-Mannheim/AustrianNewspapers GitHub Wiki

Comparison of different OCR engines

The data set is relatively small, which may make it ideal for comparing different OCR engines (Calamari, Kraken, Tesseract).

To make the engines comparable, all of them should be trained with identical parameters wherever possible.

  • Each training run must use the same ground truth pairs (line images and their transcriptions).
  • Image preprocessing (binarization, ...) must be identical.
  • The order of the images used for training must be identical.
  • The line height to which the images are scaled internally must be identical.
  • The number of training epochs / iterations must be identical.
  • The network specification must be identical.
  • ...
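Parts of this checklist can be checked mechanically. The following is a minimal Python sketch, assuming every engine is trained from the same list of already preprocessed (image, text) pairs. It derives one seeded training order that all engines share, then hashes the shared settings together with that order, so a run that was configured differently stands out. The `SharedSettings` fields, their default values and the `.gt.txt` file names are hypothetical, and translating the settings into each engine's own training options is not shown.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SharedSettings:
    """Settings every engine's training run should share (hypothetical field names)."""
    preprocessing: str = "binarized"  # all engines get the same preprocessed images
    line_height: int = 48             # internal line height in pixels (assumed value)
    epochs: int = 50                  # same number of epochs / iterations (assumed value)
    network: str = "placeholder"      # network specification, described comparably
    seed: int = 42                    # seed for the shared training order


def training_order(pairs, seed):
    """Return one fixed order of (image, text) ground truth pairs for all engines."""
    ordered = sorted(pairs)               # canonical start, independent of listing order
    random.Random(seed).shuffle(ordered)  # private RNG: same seed gives same order
    return ordered


def fingerprint(settings, order):
    """Hash the settings plus the training order of one run."""
    payload = {"settings": asdict(settings), "order": [list(p) for p in order]}
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def mismatched_engines(fingerprints):
    """Return the engines whose fingerprint differs from the first engine listed."""
    engines = list(fingerprints)
    if not engines:
        return []
    reference = fingerprints[engines[0]]
    return [e for e in engines[1:] if fingerprints[e] != reference]


if __name__ == "__main__":
    pairs = [("line_002.png", "line_002.gt.txt"),
             ("line_001.png", "line_001.gt.txt"),
             ("line_003.png", "line_003.gt.txt")]
    settings = SharedSettings()
    order = training_order(pairs, settings.seed)
    runs = {
        "calamari": fingerprint(settings, order),
        "kraken": fingerprint(settings, order),
        # deliberately different line height to show a mismatch
        "tesseract": fingerprint(SharedSettings(line_height=36), order),
    }
    print(mismatched_engines(runs))  # prints ['tesseract']
```

Sorting before shuffling means the order does not depend on how the file system happens to list the files, and using a private `random.Random` instance keeps the order independent of any other code that touches the global random state.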