
Simple Object detection

1. Abstract

Simple Objects are the core of this solution. A Simple Object is a set of attributes that together form an integral object. Each attribute can be recognized by one computer-vision method or another. This structure allows various attributes to be combined with each other, increasing the set of objects that the solution can recognize. Attribute detectors can operate in three modes: Detect, Check and Extract. In Detect mode, an image is fed to the input, and the output is a set of areas corresponding to the positions of the object in the image. In Check mode, a sub-area of the image is fed to the input, and the output is a conclusion about whether this area satisfies the attribute or not. Extract mode does not directly affect the recognition process or its result; it only records some information about the object, but this information can be used by other attributes in Check mode.
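The distinction between the three modes can be summarized as an interface. Below is a minimal Python sketch of that interface; the package itself is a ROS/C++ node, so the class and method names here are purely illustrative, not its real API.

```python
class AttributeDetector:
    """Illustrative sketch of the three attribute modes (hypothetical names)."""

    def detect(self, image):
        """Detect mode: full image in, list of candidate areas out."""
        raise NotImplementedError

    def check(self, image, area):
        """Check mode: one sub-area in, True/False out
        (does the area satisfy the attribute?)."""
        raise NotImplementedError

    def extract(self, image, area, info):
        """Extract mode: records extra data about the area into `info`
        (e.g. a distance); it neither accepts nor rejects areas itself."""
        raise NotImplementedError
```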

2. Hard and soft detection

All attributes are defined separately from each other. The task of determining which attributes together form a simple object is solved in two ways. In hard detection mode, all attributes must be recognized or successfully checked, depending on the detector mode. Accordingly, the absence of any of them makes the object unrecognizable as a whole.

*(figure: hard mode)*

The image above shows an example of the recognition process for an object described by three detection attributes and one check attribute; the order matters in this case. At the first stage, the attribute 1 detector returns a list of N1 areas that satisfy it. The same happens with attribute 2, producing N2 areas. Both sets of areas are then fed to the input of the matcher, which uses IoU (intersection over union) as the measure of similarity between areas. In hard recognition mode, the matcher removes the areas of each attribute that do not have sufficient IoU with any area of the other attribute. The areas that were not removed are combined with each other, forming a new set of N3 areas. N3 will not exceed the size of the smaller of the two primary sets, since each area from one set can be combined with only one area from the other, namely the one with which it has the largest IoU. Next comes attribute 3, which operates in check mode: each area obtained at the previous stage is checked for compliance with this attribute, and areas that fail the check are removed from the set. Then attribute 4, a detection attribute, generates a set of N5 areas. These areas are fed to the IoU matcher together with the areas that passed the check, forming the final list of areas, which correspond to the object in the image.
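For illustration, here is a minimal Python sketch of the IoU measure and of the hard-mode pairing step described above. The greedy best-match pairing and the `(x1, y1, x2, y2)` box representation are simplifying assumptions, not the package's actual implementation.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def hard_match(regions1, regions2, iou_threshold=0.75):
    """Hard mode: keep only pairs with sufficient IoU; each area is paired
    with at most one area from the other set (its best-IoU match), so the
    result has at most min(len(regions1), len(regions2)) areas."""
    matched, used = [], set()
    for r1 in regions1:
        best_j, best_iou = None, iou_threshold
        for j, r2 in enumerate(regions2):
            if j in used:
                continue
            score = iou(r1, r2)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            r2 = regions2[best_j]
            # combine the pair, here by intersection (cf. MergingPolicy, section 3.2.1)
            matched.append((max(r1[0], r2[0]), max(r1[1], r2[1]),
                            min(r1[2], r2[2]), min(r1[3], r2[3])))
    return matched
```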

When the object has been recognized, its final confidence factor is calculated using the formula:

$$P = \frac{\sum_{i \in D} k_i \, p_i}{\sum_{i \in O} k_i}$$

where $D$ is the set of recognized attributes, $O$ is the set of attributes that describe the object, $k_i$ is the weight of the attribute specified by the user, and $p_i$ is the attribute's contribution, normalized to one.
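A minimal Python sketch of this calculation, assuming the weighted-average reading of the formula above (with $p_i = 0$ for unrecognized attributes, so that summing over $D$ and over $O$ in the numerator coincide):

```python
def confidence(weighted_contributions):
    """Confidence of one candidate object.
    `weighted_contributions` is a list of (k_i, p_i) pairs over all attributes
    O in the object description; p_i is 0 for attributes that were not
    recognized (possible in soft mode)."""
    total = sum(k for k, _ in weighted_contributions)
    return sum(k * p for k, p in weighted_contributions) / total if total else 0.0
```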

In soft recognition mode, some of the attributes may be missing. However, this affects the final recognition confidence factor for the object.

*(figure: soft mode)*

The image above shows an example of the process of recognizing the same object as in the previous example, but in soft mode. In the first steps, the attributes again generate sets of areas. However, when these enter the input of the IoU matcher, the areas that have no "pair" in the list of the other attribute are not removed. Instead, they also pass through the matcher, but the contribution of the missing attribute is set to 0. The areas that do have a "pair" are combined in the same way as in hard mode. When the list of areas reaches the check attribute, the areas that fail the check are likewise not deleted and receive a contribution of 0 for this attribute. Similarly, the resulting areas pass through the matcher with the output of attribute 4. Finally, the confidence coefficient is calculated for each area; if it is less than the specified threshold, the object is removed from the list, and all the remaining areas are recognized objects.
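Continuing the sketch from the hard-mode section, soft-mode pairing might look like this: areas without a pair are kept, with a zero contribution recorded for the missing attribute. This is again an illustrative simplification, reusing `iou()` from the hard-mode sketch above.

```python
def merge(a, b):
    """Combine a matched pair of boxes; the intersection variant, for brevity."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def soft_match(regions1, regions2, iou_threshold=0.75):
    """Soft mode: unmatched areas are kept, but the missing attribute's
    contribution is set to 0. Returns (area, (p1, p2)) pairs, where p1/p2
    are the contributions of attributes 1 and 2 for that area."""
    results, used = [], set()
    for r1 in regions1:
        best_j, best_iou = None, iou_threshold
        for j, r2 in enumerate(regions2):
            score = iou(r1, r2)
            if j not in used and score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            results.append((merge(r1, regions2[best_j]), (1.0, 1.0)))
        else:
            results.append((r1, (1.0, 0.0)))  # no pair: attribute 2 contributes 0
    for j, r2 in enumerate(regions2):
        if j not in used:
            results.append((r2, (0.0, 1.0)))  # no pair: attribute 1 contributes 0
    return results
```

The final confidence of each area is then computed from these contributions (see `confidence()` above), and areas below the object's threshold are discarded.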

3. XML-description

To set up recognition of a Simple Object in the configuration file, you need to put the description of all its attributes in the AttributeLib tag, and the description of the object itself in the SimpleObjectBase tag, referring to the attributes by name.

```xml
<AttributeLib>

    <Attribute Name="HSVColorBrightYellow" Type="HSVColor" Hmin="35" Hmax="103" Smin="104" Smax="255" Vmin="0" Vmax="255"/>

</AttributeLib>

<SimpleObjectBase>

    <SimpleObject Name="YellowSticker" ID="1">
        <Attribute Type="Detect">HSVColorBrightYellow</Attribute>
    </SimpleObject>

</SimpleObjectBase>
```

Detailed examples of creating a configuration file are given in the descriptions of specific attributes. The package also contains a configuration file containing the examples provided in this documentation.

3.1. Attributes description

3.1.1. Attribute tag parameters

  1. Name (string, must be set) The unique name of the attribute to which the simple object description will refer.
  2. Type (string, must be set) The attribute type; must be one of the types in the table below.
| Type | Description | Modes* | Possibility to set accuracy* | 3D-pose estimation | Contour extraction | Keypoint extraction |
|---|---|---|---|---|---|---|
| HSVColor | Threshold color filtering | DC | C | ❌ | ✔ | ❌ |
| HaarCascade | Haar cascade detection | D | - | ❌ | ❌ | ❌ |
| Size | Checking the size in the image | C | - | ❌ | ❌ | ❌ |
| HistColor | Histogram color filtering | DC | C | ❌ | ✔ | ❌ |
| Hough | Hough transform detection | D | - | ❌ | ❌ | ❌ |
| Dimension | Checking the aspect ratio of objects | C | - | ❌ | ❌ | ❌ |
| BasicMotion | Simple motion detector | D | - | ❌ | ✔ | ❌ |
| Aruco | ArUco-marker detector | D | - | ✔ | ✔ | ❌ |
| Feature | Key points method detector | D | - | ✔ | ✔ | ✔ |
| Pose | Checking object pose on image | C | - | ❌ | ❌ | ❌ |
| DNN | Detection by convolutional neural networks imported into OpenCV | D | D | ❌ | ❌ | ❌ |
| QR | Detecting QR codes using OpenCV | D | - | ❌ | ✔ | ❌ |
| QR_Zbar | Detection of QR codes using the ZBar library | D | - | ✔ | ✔ | ❌ |
| LogicAnd | Logical AND over two attributes | DC | - | ❌ | ❌ | ❌ |
| LogicNot | Logical NOT over an attribute | DC | - | ❌ | ❌ | ❌ |
| LogicOR | Logical OR over two attributes | DC | - | ❌ | ❌ | ❌ |
| Blob | Blob detector | D | - | ❌ | ❌ | ❌ |
| Depth | Extracting distance to object using depth camera | E | - | ✔ | ❌ | ❌ |
| RoughDist | Rough determination of the distance to an object with known geometric characteristics | E | - | ✔ | ❌ | ❌ |
| Dist | Object distance checking | C | - | ❌ | ❌ | ❌ |
| FaceDlib | Face detection and identification | DE | DE | ❌ | ❌ | ❌ |
| ExractedInfoId | Checking identifiers in extracted information | C | - | ❌ | ❌ | ❌ |
| ExractedInfoString | Checking strings in extracted information | C | - | ❌ | ❌ | ❌ |
| UnitTranslation | Unit translation extraction | E | - | ✔ | ❌ | ❌ |
| SquareObjectDistance | Distance extraction for square objects | E | - | ✔ | ✔ | ❌ |
| TorchYOLOv7Attribute | YOLOv7 CNN detector | D | D | ❌ | ❌ | ❌ |
| TorchYOLOv7KeypointAttribute | YOLOv7 keypoint CNN detector | D | D | ❌ | ❌ | ✔ |
| ROSSubcriberOpenPoseRaw | OpenPose raw output decoder | D | D | ❌ | ❌ | ✔ |
| KeypointPoseAttribute | 3D-pose extraction for keypoints | E | - | ✔ | ❌ | ❌ |
| DummyHumanPose | Hard-coded human poses | E | - | ❌ | ❌ | ❌ |

* - D - Detect mode, C - Check mode, E - Extract mode.

  3. Probability (double, default: 0.75) The lower threshold of the attribute-recognition confidence coefficient, if the detector supports it.
  4. Contour (bool, default: true) Return the contour when the detector is able to provide one.
  5. The remaining parameters are specific to the different attribute types; see the sections dedicated to specific attributes.

3.1.2. Inner tags

  1. Clusterization - allows you to specify the clustering method for the attribute's output. Read more in the Clusterization section.
  2. Filter - allows you to set an additional filter for the attribute. Read more in the Filtering section.

3.2. Object description

3.2.1. SimpleObject tag parameters

  1. ID (int, must be set) The unique identifier of the object. At the moment, automatic correction of duplicate identifiers is not implemented, so watch for error messages.
  2. Name (string, must be set) The unique object name.
  3. Probability (double, default: 0.75) The lower threshold for the confidence coefficient.
  4. IOU (double, default: 0.75) The lower threshold for intersection over union (IoU).
  5. Mode (string, default: Hard) Recognition mode (see above); can take the values Hard or Soft.
  6. MergingPolicy (string, default: Intersection) Merging policy for regions and contours. Accepts the values Intersection and Union; see the sketch after this list.
  7. Weight (double, default: 1) Attribute weight. See the confidence factor formula above.
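As a geometric illustration of the two MergingPolicy values, the sketch below merges a matched pair of boxes either by their overlap or by their common bounding box. Treating Union as the bounding box of both is a simplifying assumption here, since the union of two rectangles is not itself a rectangle.

```python
def merge_regions(a, b, policy="Intersection"):
    """MergingPolicy semantics for a matched pair of boxes (x1, y1, x2, y2):
    Intersection keeps their overlap, Union their common bounding box."""
    if policy == "Intersection":
        return (max(a[0], b[0]), max(a[1], b[1]),
                min(a[2], b[2]), min(a[3], b[3]))
    else:  # "Union"
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
```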

3.2.2. Inner tags

  1. Attribute The main tag; its content is an attribute name from the AttributeLib. This tag has a number of parameters.
  • Type accepts the values Detect (by default), Check or Extract.
  • Channel accepts the value RGB (by default) for working with a color image, or DEPTH for working with a depth map.
  2. Tracker Allows you to configure tracking of an object; see the tracking section for details.

3.3. Extended example

Below is an example of the description of a "red cup" object. It starts with the HistDarkRed attribute, which detects red areas and declares an Insider filter that removes detections lying inside other detections of the same attribute. Next comes the NotFractal size attribute, which filters out the small areas found by the color attribute. After that, the CupDnn attribute is declared; it uses the DNN module to detect cups in the image, and an NMS filter is added to it. At this point, the description is already sufficient to recognize red cups. Next, the DepthKinect extraction attribute is declared, which localizes the object in three-dimensional space using a depth camera. The object description ends with the declaration of a tracker, which correlates objects across successive frames and tries to find the object when the detector cannot.

```xml
<AttributeLib>

    <Attribute Name="HistDarkRed" Type="HistColor" Histogram="histograms/DarkRed.yaml">
        <Filter Type="Insider"/>
    </Attribute>

    <Attribute Name="NotFractal" Type="Size" MinAreaPc="0.5" MaxAreaPc="100"/>

    <Attribute Name="CupDnn" Type="Dnn" framework="tensorflow" weights="ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb" config="ssd_mobilenet_v1_coco_2017_11_17/config.pbtxt" labels="ssd_mobilenet_v1_coco_2017_11_17/mscoco_label_map.pbtxt" inputWidth="300" inputHeight="300" Probability="0.75" obj_id="47">
        <Filter Type="NMS" threshold="0.5"/>
    </Attribute>

    <Attribute Name="DepthKinect" Type="Depth" depthScale="0.001"/>

</AttributeLib>

<SimpleObjectBase>

    <SimpleObject Name="RedCup" ID="61" Mode="Soft" MergingPolicy="Union">
        <Attribute Type="Detect">HistDarkRed</Attribute>
        <Attribute Type="Check">NotFractal</Attribute>
        <Attribute Type="Detect">CupDnn</Attribute>
        <Attribute Type="Extract" Channel="DEPTH">DepthKinect</Attribute>
        <Tracker IOU="0.25" decay="0.01">MOSSE</Tracker>
    </SimpleObject>

</SimpleObjectBase>
```