Possible Improvements - PPilger/text-detection GitHub Wiki
General
Automated parameterization
Some parameters could be set automatically. Analyzing the histogram may be a good approach. The analysis done so far is stored in the folder "analysis".
BackgroundProcessor (parameter distance):
- Calculating the average area per object (number of white pixels / number of objects, where the latter can be determined by counting the contours) after processing the image looks very promising: it worked for 2 of the 3 test images (Portolan Atlas failed completely). The image has to be processed many times to find the optimal value.
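The measure above can be sketched in a few lines. This is a minimal illustration, not the tool's code: it assumes the processed image is a 2D list of 0/1 values, counts 8-connected white regions as a stand-in for counting contours, and uses a hypothetical `process(image, distance)` callback in place of the BackgroundProcessor. How the optimal distance is read off the resulting scores is left open here.

```python
from collections import deque


def count_objects(mask):
    """Count 8-connected groups of white (truthy) pixels in a 2D list."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            objects += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # flood fill the current object
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return objects


def average_area_per_object(mask):
    """Number of white pixels divided by number of objects (0.0 if none)."""
    white = sum(1 for row in mask for px in row if px)
    objects = count_objects(mask)
    return white / objects if objects else 0.0


def score_distances(process, image, candidates):
    """Score each candidate distance; `process` is a hypothetical callback
    that returns the processed binary mask for a given distance."""
    return {d: average_area_per_object(process(image, d)) for d in candidates}
```

Each candidate value requires one full processing pass, which matches the note above that the image has to be processed many times.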
Failed attempts:
- Using the histogram of the difference image (variable img before validating): the histogram has no "keypoints" near the manually determined optimal parameter.
Providing a GUI
A GUI would simplify parameterization.
Parallelization
Parallelization would improve the performance of the tool. It may also be a good idea to look at OpenCV's GPU computing abilities.
Image Processing
BinaryChromaticityProcessor
As the ChromaticityProcessor does quite a good job, this technique could also be used to create a binary processor.
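One way such a binary processor could look is sketched below. The details are assumptions, since this page does not describe how the ChromaticityProcessor works internally: chromaticity is taken to be the normalized (r, g) coordinates of an RGB pixel, and a pixel becomes foreground when its chromaticity lies within a hypothetical `max_distance` of a hypothetical `reference` color.

```python
def chromaticity(pixel):
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity coordinates."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        # Black has no defined chromaticity; treat it as neutral (assumption).
        return (1 / 3, 1 / 3)
    return (r / total, g / total)


def binarize_by_chromaticity(image, reference, max_distance):
    """Return a 0/255 mask: 255 where a pixel's chromaticity is within
    max_distance (Euclidean) of the reference color's chromaticity."""
    ref_r, ref_g = chromaticity(reference)
    result = []
    for row in image:
        out_row = []
        for pixel in row:
            r, g = chromaticity(pixel)
            distance = ((r - ref_r) ** 2 + (g - ref_g) ** 2) ** 0.5
            out_row.append(255 if distance <= max_distance else 0)
        result.append(out_row)
    return result
```

Because the coordinates are normalized by the pixel sum, a dark red and a bright red pixel map to the same chromaticity, so the result depends on color rather than brightness.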
ThicknessVarianceProcessor
Remove objects whose thickness varies strongly (in most fonts, the lines of a text have roughly the same thickness along their whole length).
SkelettonProcessor can be used as a basis for this processor.
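The filtering step could be sketched as follows. It assumes that thickness samples per object are already available (for example measured at skeleton pixels, which is not shown here) and uses the coefficient of variation as the measure of "varies strongly". The names `objects` and `max_variation` are hypothetical.

```python
from statistics import mean, pstdev


def thickness_variation(samples):
    """Coefficient of variation (standard deviation / mean) of the
    thickness samples of one object. Expects a non-empty sequence."""
    avg = mean(samples)
    return pstdev(samples) / avg if avg > 0 else 0.0


def filter_by_thickness(objects, max_variation):
    """Keep only objects whose thickness variation is at most max_variation.

    `objects` maps an object id to its list of thickness samples.
    """
    return {
        obj_id: samples
        for obj_id, samples in objects.items()
        if thickness_variation(samples) <= max_variation
    }
```

Dividing by the mean makes the measure independent of the absolute stroke width, so thin and bold text are treated alike.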
SeparationProcessor
Separate objects that have a weak connection.
Possible approaches:
- analyzing the contour of the object
- analyzing the thickness distribution inside the object
- analyzing the color/chromaticity distribution inside the object
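As a rough illustration of the thickness-based approach, the sketch below detects a weak connection by eroding the object with a 3x3 neighborhood and checking whether it falls apart. Erosion is a simple stand-in chosen here, not a method taken from the project, and the sketch only detects the weak connection; actually splitting the object would need a further step. The mask is assumed to be a 2D list of 0/1 values.

```python
from collections import deque


def erode(mask):
    """3x3 erosion: a pixel survives only if it and all 8 neighbors are
    white. Pixels on the image border are always removed."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if all(mask[r + dy][c + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[r][c] = 1
    return out


def count_objects(mask):
    """Count 8-connected groups of white pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            objects += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return objects


def has_weak_connection(mask):
    """True if a single object breaks into several parts after erosion."""
    return count_objects(mask) == 1 and count_objects(erode(mask)) >= 2
```

Two 3x3 blocks joined by a one-pixel-wide bridge are reported as weakly connected, while a solid rectangle is not.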
Feature Linking
Improve linked features
Linked features may be improved by removing single features that are unlikely to be part of the text. The ranking could perhaps be used to do so. This may also become part of the feature filtering (after feature linking).
Feature Filtering
Remove strongly varying features
It is assumed that most of the features are correctly detected as text.
Find the values of feature attributes that most features have in common, and remove all features whose values differ strongly.
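A minimal sketch of this idea, under assumptions: features are represented as dictionaries, the "common" value of an attribute is taken to be its median, and "strongly differing" means more than `max_deviation` times the median absolute deviation away from it. The attribute name and the threshold are hypothetical.

```python
from statistics import median


def filter_outliers(features, attribute, max_deviation=3.0):
    """Remove features whose attribute value lies far from the median.

    Distance is measured in units of the median absolute deviation (MAD),
    which stays stable as long as most features are correct.
    """
    values = [feature[attribute] for feature in features]
    center = median(values)
    spread = median([abs(value - center) for value in values])
    if spread == 0:
        # No usable spread estimate; keep everything (a design choice).
        return list(features)
    return [
        feature for feature in features
        if abs(feature[attribute] - center) <= max_deviation * spread
    ]
```

The median and MAD are used instead of the mean and standard deviation because a few wrongly detected features would otherwise shift the reference value themselves.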