Tesseract - sksenthilkumar/NumberPlateExtractor GitHub Wiki

Tesseract was installed on Ubuntu 18.04 following the instructions at:

https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

Reported accuracy of Tesseract 4.0: https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance

Accuracy test for this project

Data

To test the performance of Tesseract, the images available in the folder "images/singapore_numberPlates/" were used.

The images were taken from the Wikipedia page on Singapore number plates: https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Singapore

Image 01:

image 01

Image 02:

image 02

Image 03:

image 03

Source code:

https://github.com/sksenthilkumar/NumberPlateExtractor/blob/master/test_teserract.py

Results:

Tesseract's configuration has two parameters that have to be set according to the project:

OEM value:

  • There are four possible OEM values (0 to 3), but only 1 and 3 produced any results.
  • Experiments run with OEM 1 and OEM 3 showed no difference in the results.

PSM value:
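As a rough illustration of how the two parameters are passed, the sketch below builds the option string and hands it to Tesseract. It assumes the pytesseract wrapper is used (the test script may call Tesseract differently). `build_config` and `read_plate` are hypothetical helper names, and the defaults OEM 1 and PSM 8 (treat the image as a single word) are taken from the conclusion of this page.

```python
def build_config(oem=1, psm=8):
    """Return the Tesseract option string for the given OEM and PSM values."""
    if oem not in range(4):
        raise ValueError("OEM must be 0, 1, 2 or 3")
    return f"--oem {oem} --psm {psm}"


def read_plate(image, oem=1, psm=8):
    """Run OCR on an image (file path, PIL image or NumPy array).

    Assumption: pytesseract is installed and the tesseract binary is on PATH.
    """
    import pytesseract  # imported lazily so build_config works without it

    text = pytesseract.image_to_string(image, config=build_config(oem, psm))
    return text.strip()
```

Calling `read_plate` with the path to one of the test images would return the recognised text. Comparing OEM 1 against OEM 3 is then a one-argument change.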

Preprocessing the input images:

  • When the images were given as they are, half of them never produced an output, especially the ones with a coloured background.
  • When the images were converted to black and white, the results improved.
  • When adaptive thresholding was applied, Tesseract was able to predict many more characters correctly.
  • However, removing noise with histogram equalization and the Canny edge detector did not help.
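To make the two helpful steps concrete, here is a minimal pure-Python sketch of grayscale conversion followed by mean adaptive thresholding. It assumes that "black and white" means grayscale. The function names, the 3x3 block size and the offset `c` are illustrative choices, not the values used in this project. In practice a library routine such as OpenCV's `cv2.adaptiveThreshold` would likely be used instead, and it handles image borders differently from this sketch.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 grey levels."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def adaptive_threshold(gray, block_size=3, c=2):
    """Mean adaptive threshold.

    A pixel becomes white (255) when it is brighter than the mean of its
    block_size x block_size neighbourhood minus c, otherwise black (0).
    Neighbourhoods are clipped at the image border.
    """
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(255 if gray[y][x] > local_mean - c else 0)
        result.append(row)
    return result
```

Because each pixel is compared with its own neighbourhood rather than with one global cut-off, dark characters can stay separable from the plate even when the background colour or lighting varies across the image, which may explain why this step helped with the coloured plates.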

Existing problem:

  • Tesseract consistently confuses the digit 1 with the letter i.

Conclusion:

  • Best configuration: OEM 1 or 3, PSM 8 or 14.
  • Preprocessing: convert the image to black and white, then apply adaptive thresholding.