FeatureRule - PPilger/text-detection GitHub Wiki

These rules let you define the validity of features based on certain attributes. They can be used in [feature detection](feature detection) (where only valid features are stored) and to remove features from a FeatureSet (e.g. after feature linking).
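The core idea, a rule as a validity test applied to each feature, can be sketched as a predicate filter. This is a minimal Python sketch, not the project's API: the `Feature` class, the `keep_valid` helper and the 10,000 px limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a detected feature, described by its bounding rectangle."""
    x: int
    y: int
    w: int
    h: int


# A rule is modeled as a predicate: it returns True if the feature is valid.
Rule = Callable[[Feature], bool]


def keep_valid(features: Iterable[Feature], rules: List[Rule]) -> List[Feature]:
    """Return only the features that satisfy every rule (invalid ones are dropped)."""
    return [f for f in features if all(rule(f) for rule in rules)]


def not_too_big(f: Feature) -> bool:
    """Hypothetical rule: reject features whose bounding-box area exceeds 10,000 px."""
    return f.w * f.h <= 10_000


features = [Feature(0, 0, 3, 12), Feature(10, 0, 200, 150)]
print(keep_valid(features, [not_too_big]))  # [Feature(x=0, y=0, w=3, h=12)]
```

With no rules at all, every feature is kept, since each feature trivially satisfies an empty set of conditions.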

Usage

  • Use AreaFeatureRule to remove features that are too large or too small. Note that some letters have a small area because of their shape (e.g. 'i').
  • Use RankingFeatureRule to remove linked features with a low ranking (such features are unlikely to be a word or a text). Because every ContourFeature currently has a ranking of 1 (this may change in future versions), this rule should not be used for feature detection.
  • Use SizeFeatureRule to remove features that are too wide/tall or too small/short.

The rules should be used both in feature detection and after feature linking. After feature linking, stricter rules can be set, because single letters should no longer be present.
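A sketch of using a lenient rule set during feature detection and a stricter one after feature linking, assuming rules are simple predicates. The `min_area` helper and both thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature described by the sides of its bounding rectangle."""
    w: int
    h: int


def min_area(limit: int) -> Callable[[Feature], bool]:
    """Build a rule accepting features whose bounding-box area is at least `limit`."""
    return lambda f: f.w * f.h >= limit


def keep_valid(features: List[Feature], rules: List[Callable[[Feature], bool]]) -> List[Feature]:
    """Keep only the features that satisfy every rule."""
    return [f for f in features if all(rule(f) for rule in rules)]


# Hypothetical thresholds: lenient during detection, stricter after linking,
# because single letters should no longer be present once features are linked.
detection_rules = [min_area(30)]
linking_rules = [min_area(400)]

letter = Feature(w=6, h=10)  # area 60: a single letter
word = Feature(w=60, h=12)   # area 720: a linked word

print(keep_valid([letter, word], detection_rules))  # both are kept
print(keep_valid([letter, word], linking_rules))    # only the word is kept
```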

Parameters

area

The area is defined by the bounding rectangle, not by the real shape.
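To see why the bounding rectangle can differ from the real shape, here is a small sketch comparing the pixel count of a shape with the area of its bounding rectangle. It assumes an axis-aligned rectangle built from pixel coordinates; the project may compute the rectangle differently (e.g. rotated), and the helper name is hypothetical.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]  # (x, y)


def bounding_box_area(pixels: Iterable[Pixel]) -> int:
    """Area of the axis-aligned bounding rectangle around the given pixels."""
    pts = list(pixels)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


# An 'L'-shaped blob: a 1-pixel-wide vertical bar of height 5 plus a foot,
# 4 pixels long in total along the bottom row.
shape = [(0, y) for y in range(5)] + [(x, 4) for x in range(1, 4)]
print(len(shape))                # 8: pixels in the real shape
print(bounding_box_area(shape))  # 20: area of the 4 x 5 bounding rectangle
```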

Reasonable values for feature detection:

  • minimum: the area of the smallest letter.
  • maximum: the area of the biggest detected letter or word-fragment.

and after feature linking:

  • minimum: the area of the smallest text in the image.
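The suggested values can be derived from a few hand-measured areas. A minimal sketch, assuming the areas were measured from bounding rectangles in a sample image; the `area_rule` and `area_limits_for_detection` helpers and all numbers are hypothetical. As in the text, only a minimum is set after feature linking.

```python
from typing import List, Optional, Tuple


def area_limits_for_detection(letter_areas: List[int]) -> Tuple[int, int]:
    """Suggested (minimum, maximum): smallest letter and biggest letter or word fragment."""
    return min(letter_areas), max(letter_areas)


def area_rule(minimum: int, maximum: Optional[int] = None):
    """Hypothetical area rule: accept areas in [minimum, maximum]; no upper bound if maximum is None."""
    def is_valid(area: int) -> bool:
        return area >= minimum and (maximum is None or area <= maximum)
    return is_valid


# Hypothetical bounding-box areas measured by hand in a sample image.
letters = [36, 80, 96, 120, 450]  # smallest is a thin 'i', largest a word fragment
texts = [900, 1500, 4200]         # linked texts

low, high = area_limits_for_detection(letters)
detection_rule = area_rule(low, high)
linking_rule = area_rule(min(texts))  # minimum only after linking

print([a for a in [20, 36, 450, 5000] if detection_rule(a)])  # [36, 450]
print([a for a in [450, 900, 5000] if linking_rule(a)])       # [900, 5000]
```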

ranking

The optimal minimum value is somewhere between 0 and the smallest number of letters in a word.

1 is a reasonable minimum value to start with. If words or texts are removed, decrease the value; otherwise, increase it.
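A sketch of a minimum-ranking check together with the tuning advice, expressed as a single adjustment step. The `LinkedFeature` type, the rankings and the step size of 0.5 are hypothetical; only the starting value of 1 and the lower bound of 0 come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedFeature:
    """Hypothetical linked feature with a ranking (higher means more likely text)."""
    label: str
    ranking: float


def ranking_rule(minimum: float):
    """Accept features whose ranking is at least `minimum`."""
    return lambda f: f.ranking >= minimum


def next_minimum(minimum: float, text_was_removed: bool, step: float = 0.5) -> float:
    """One manual tuning step: lower the minimum (not below 0) if text was removed, else raise it."""
    return max(0.0, minimum - step) if text_was_removed else minimum + step


features = [LinkedFeature("noise", 0.5), LinkedFeature("word", 3.0)]
minimum = 1.0  # suggested starting value
kept = [f.label for f in features if ranking_rule(minimum)(f)]
print(kept)  # ['word']
```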

width

The width is the longer side of the bounding rectangle.
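Because the width is always the longer side, a tall feature and a wide feature with the same side lengths get the same width. A minimal sketch, assuming the rectangle is given as two side lengths; the helper name is hypothetical.

```python
from typing import Tuple


def width_and_height(w: int, h: int) -> Tuple[int, int]:
    """Return (width, height), where width is the longer side of the bounding rectangle."""
    return max(w, h), min(w, h)


print(width_and_height(3, 12))   # (12, 3): a tall 'i'
print(width_and_height(40, 10))  # (40, 10): a wide word
```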

Reasonable values for feature detection:

  • minimum: the height of the smallest letter (since 'i' is tall, its height is the longer side and is therefore stored as the width).
  • maximum: the greatest dimension of the biggest detected letter or word-fragment.

and after feature linking:

  • minimum: the width of the smallest text in the image.
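The width limits can be sketched as a predicate on the longer side of the bounding rectangle. The `size_rule` helper and the pixel values are hypothetical; the sketch covers only the width parameter described here, with minimum and maximum during detection and a minimum only after linking.

```python
from typing import Optional


def size_rule(min_width: int, max_width: Optional[int] = None):
    """Hypothetical size rule on the longer side of the bounding rectangle (the 'width')."""
    def is_valid(w: int, h: int) -> bool:
        width = max(w, h)  # the longer side, so a tall 'i' is measured by its height
        return width >= min_width and (max_width is None or width <= max_width)
    return is_valid


# Hypothetical values read off a sample image.
detection = size_rule(min_width=10, max_width=80)  # smallest letter height, biggest fragment
linking = size_rule(min_width=50)                  # width of the smallest text

print(detection(3, 12), detection(2, 6), detection(120, 20))  # True False False
print(linking(40, 12), linking(60, 14))                       # False True
```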