Possible Improvements - PPilger/text-detection GitHub Wiki

General

Automated parameterization

Some parameters could be set automatically. Analysing the histogram may be a good approach. The analysis done so far is stored in the folder "analysis".

BackgroundProcessor (parameter distance):

Calculating the average area per object after processing the image proves very promising for 2 of the 3 test images (it failed completely for the Portolan Atlas). The average area is the number of white pixels divided by the number of objects, where the latter can be determined by counting the contours. The image has to be processed many times, once per candidate value, to find the optimal value.
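The metric can be sketched in a few lines. This is a minimal illustration in plain Python, not the tool's code: the binary image is assumed to be a list of rows of 0/1 values, objects are counted as 8-connected components (standing in for contour counting), and `process` is a hypothetical callable representing the BackgroundProcessor with a given distance. How the optimal distance is read off the resulting curve is left open here.

```python
from collections import deque


def average_object_area(binary):
    """Return (number of white pixels) / (number of objects).

    `binary` is a list of rows containing 0 (black) or 1 (white).
    Objects are 8-connected components of white pixels.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    white = 0
    objects = 0
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # New object found: flood-fill it and count its pixels.
            objects += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                white += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return white / objects if objects else 0.0


def sweep_distance(image, process, distances):
    """Process the image once per candidate distance.

    `process(image, distance)` is a hypothetical stand-in for the
    BackgroundProcessor; it must return a binary image.
    Returns {distance: average object area}.
    """
    return {d: average_object_area(process(image, d)) for d in distances}
```

The sweep makes the cost visible: the image is processed once for every candidate distance.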

Failed attempts:

Using the histogram of the difference image (variable img before validating) did not work, as the histogram has no "keypoints" near the manually determined optimal parameter.

Providing a GUI

A GUI would simplify parameterization.

Parallelization

Parallelization would improve the performance of the tool. It may also be a good idea to look at OpenCV's GPU computing capabilities.

Image Processing

BinaryChromaticityProcessor

As the ChromaticityProcessor does quite a good job, this technique could also be used to create a binary processor.

ThicknessVarianceProcessor

Remove objects whose thickness varies strongly (in most fonts, the lines of a text have roughly the same thickness along the whole line). SkelettonProcessor can be used as a basis for this processor.
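One way to quantify "varies strongly" is the coefficient of variation of the thickness. The sketch below assumes that thickness samples per object are already available, for example measured along the object's skeleton. The function names, the input format and the threshold of 0.5 are illustrative assumptions, not part of the tool.

```python
from statistics import mean, pstdev


def thickness_variation(samples):
    """Coefficient of variation (standard deviation / mean) of the samples."""
    m = mean(samples)
    return pstdev(samples) / m if m > 0 else 0.0


def filter_by_thickness(objects, max_variation=0.5):
    """Keep objects whose thickness is roughly constant.

    `objects` maps an object id to a list of thickness samples
    (assumed input format). Objects without samples are dropped.
    `max_variation` is an arbitrary example threshold.
    """
    return {
        key: samples
        for key, samples in objects.items()
        if samples and thickness_variation(samples) <= max_variation
    }
```

For example, an object with samples `[3, 3, 4, 3]` would be kept, while one with samples `[1, 9, 2, 12]` would be removed at the default threshold.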

SeparationProcessor

Separate objects that have a weak connection.

Possible approaches:

analysing the contour of the object
analysing the thickness distribution inside the object
analysing the color/chromaticity distribution inside the object
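As an illustration of the thickness-based approach, a weak connection could show up as a point where the object is much thinner than usual. The sketch below assumes a list of thickness samples taken in order along a path through the object. The function name and the ratio of 0.4 are hypothetical, and actually splitting the object at the returned positions is not covered.

```python
from statistics import median


def weak_points(thickness, ratio=0.4):
    """Return the indices where the thickness drops below ratio * median.

    `thickness` is a list of samples along a path through the object
    (assumed input). `ratio` is an arbitrary example threshold.
    """
    if not thickness:
        return []
    limit = ratio * median(thickness)
    return [i for i, t in enumerate(thickness) if t < limit]
```

With the samples `[5, 5, 6, 1, 5, 6, 5]`, only the narrow point at index 3 would be reported as a candidate for separation.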

Feature Linking

Improve linked features

Linked features may be improved by removing individual features that are unlikely to be part of the text. The ranking could perhaps be used to do so. This may also be done as part of the feature filtering (after feature linking).

Feature Filtering

Remove strongly varying features

It is assumed that most of the features are detected correctly as text.

Find the values of feature attributes that most features have in common. Remove all features with strongly differing values.
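A robust way to find the "common" value is the median, with the median absolute deviation (MAD) as the measure of spread. This is one possible choice, not something the tool prescribes. In the sketch below, features are assumed to be dictionaries of numeric attributes, and both the attribute name and the limit of 3 MADs are illustrative.

```python
from statistics import median


def remove_outliers(features, attribute, max_deviation=3.0):
    """Drop features whose attribute is far from the common value.

    `features` is a list of dicts (assumed format). A feature is kept
    if its value lies within `max_deviation` MADs of the median.
    """
    values = [f[attribute] for f in features]
    if not values:
        return []
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No usable spread estimate, so nothing is removed.
        return list(features)
    return [
        f for f in features
        if abs(f[attribute] - center) <= max_deviation * mad
    ]
```

Because the median and the MAD are barely affected by a few extreme values, this matches the assumption that most features are correct and only a minority needs to be removed.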