Tesseract - sksenthilkumar/NumberPlateExtractor GitHub Wiki
Tesseract was installed on Ubuntu 18.04 following the instructions at:
https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/
Published accuracy figures for Tesseract: https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance
Accuracy test for this project
Data
To test the performance of Tesseract, the images available in the folder "images/singapore_numberPlates/" were used.
Images were taken from the wiki page of Singapore number plates: https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Singapore
Image 01:
Image 02:
Image 03:
Source code:
https://github.com/sksenthilkumar/NumberPlateExtractor/blob/master/test_teserract.py
Results:
The Tesseract configuration has two parameters that have to be tuned for the project:
OEM (OCR engine mode) value:
- There are four possible OEM values (0 to 3), but only 1 and 3 produced any results.
- In experiments with OEM 1 and OEM 3, the two values made no difference to the results.
PSM (page segmentation mode) value:
- What a PSM value is: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
- PSM = 2 always throws an error, and PSM = 5 never gave a single correct answer.
- PSM = 8 and PSM = 13 gave the best results.
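The OEM and PSM values are passed to Tesseract as the command-line flags `--oem` and `--psm`. The sketch below shows one way to build that configuration string and hand it to Tesseract from Python. It assumes the `pytesseract` wrapper is used; the helper names `tesseract_config`, `read_plate` and `candidate_configs` are made up for illustration and are not taken from the project's script.

```python
from itertools import product

# Values reported as working on this page; the actual sweep in the
# project's test script may have covered more combinations.
OEM_VALUES = (1, 3)
PSM_VALUES = (8, 13)


def tesseract_config(oem: int, psm: int) -> str:
    """Build the command-line style config string for Tesseract."""
    if oem not in range(4):
        raise ValueError(f"OEM must be between 0 and 3, got {oem}")
    if psm not in range(14):
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    return f"--oem {oem} --psm {psm}"


def candidate_configs():
    """Every OEM/PSM combination worth comparing, as config strings."""
    return [tesseract_config(o, p) for o, p in product(OEM_VALUES, PSM_VALUES)]


def read_plate(image, oem: int = 1, psm: int = 8) -> str:
    """Run OCR on an image (PIL image, NumPy array or file path)."""
    # Assumption: pytesseract is installed. It is imported lazily so the
    # helpers above can be used without it.
    import pytesseract

    text = pytesseract.image_to_string(image, config=tesseract_config(oem, psm))
    return text.strip()
```

Looping over `candidate_configs()` and calling `pytesseract.image_to_string` with each string is a simple way to repeat the comparison on a new set of images.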
Preprocessing the input images:
- When the images were given as they are, half of them never produced any output, especially the ones with a coloured background.
- When the images were converted to black and white, the results improved.
- When adaptive thresholding was applied, Tesseract predicted many more characters correctly.
- However, removing noise using histogram equalization and the Canny edge detector did not help.
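To make clear why adaptive thresholding helps on coloured or unevenly lit backgrounds, here is a minimal plain-Python sketch of grayscale conversion followed by adaptive mean thresholding. It is an illustration of the technique, not the project's code: the function names, the default `block_size` and the offset `c` are assumptions, and the window is simply clipped at the image border.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in rgb_rows
    ]


def adaptive_mean_threshold(gray, block_size=3, c=2):
    """Binarise a grayscale image against a per-pixel local threshold.

    A pixel becomes 255 if it is brighter than the mean of its
    block_size x block_size neighbourhood minus c, otherwise 0.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    half = block_size // 2
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood around (x, y), clipped at the image border.
            window = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(255 if gray[y][x] > local_mean - c else 0)
        result.append(row)
    return result
```

For a dim patch such as `[[100, 100, 100], [100, 20, 100], [100, 100, 100]]`, a fixed global threshold of 128 would turn every pixel black, while the local threshold keeps the background white and only the dark centre pixel black. With OpenCV the same two steps are usually written as `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` followed by `cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, block_size, c)`, although the border handling there may differ from this sketch.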
Existing problem:
- The algorithm always confuses the digit 1 with the letter i.
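One possible workaround, not tested in this project, is to clean up the OCR output afterwards. The sketch below assumes that plates contain only uppercase letters and digits and never the letter I, so any character that looks like a 1 can safely be mapped to the digit. That assumption should be checked against the plate format before relying on it; `normalise_plate` is a hypothetical helper name.

```python
import re

# Characters that OCR output may contain in place of the digit 1.
# Assumption: the plate alphabet has no letter I, so this mapping
# cannot destroy a legitimate character.
LOOKALIKES_FOR_ONE = str.maketrans({"i": "1", "I": "1", "l": "1", "|": "1"})


def normalise_plate(raw: str) -> str:
    """Map 1-lookalikes to '1', uppercase, and drop everything else."""
    text = raw.translate(LOOKALIKES_FOR_ONE).upper()
    # Keep only the characters a plate is assumed to contain.
    return re.sub(r"[^A-Z0-9]", "", text)
```

For example, `normalise_plate("SBA i234 A\n")` returns `"SBA1234A"`.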
Conclusion:
- Best configuration: OEM 1 or 3, PSM 8 or 13.
- Preprocessing the image: convert it to black and white and apply adaptive thresholding.