
Simple Object detection

1. Abstract

Simple Objects are the core of this solution. A Simple Object is a set of attributes that together form an integral object. Each attribute can be recognized by one computer-vision method or another. This structure allows various attributes to be combined with each other, increasing the set of objects that the solution can recognize. Attribute detectors can operate in three modes: Detect, Check and Extract. In Detect mode, an image is fed to the input, and the output is a set of areas corresponding to the positions of the object in the image. In Check mode, a sub-area of the image is fed to the input, and the output is a conclusion about whether this area satisfies the attribute or not. Extract mode does not directly affect the recognition process or its result; it only records some information about the object, but this information can be used by other attributes in Check mode.
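The distinction between the three modes can be summarized as an interface. Below is a minimal Python sketch of that interface; the package itself is a ROS/C++ node, so the class and method names here are purely illustrative, not its real API.

```python
class AttributeDetector:
    """Illustrative sketch of the three attribute modes (hypothetical names)."""

    def detect(self, image):
        """Detect mode: full image in, list of candidate areas out."""
        raise NotImplementedError

    def check(self, image, area):
        """Check mode: one sub-area in, True/False out
        (does the area satisfy the attribute?)."""
        raise NotImplementedError

    def extract(self, image, area, info):
        """Extract mode: records extra data about the area into `info`
        (e.g. a distance); it neither accepts nor rejects areas itself."""
        raise NotImplementedError
```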

2. Hard and soft detection

All attributes are defined separately from each other. The task of determining which attributes together form a simple object is solved in two ways. In hard detection mode, all attributes must be recognized or successfully checked, depending on the detector mode. Accordingly, the absence of any of them makes the object unrecognizable as a whole.

*(figure: hard mode)*

The image above shows an example of the recognition process for an object described by three detection attributes and one check attribute; the order matters in this case. At the first stage, the attribute 1 detector returns a list of N1 areas that satisfy it. The same happens with attribute 2, producing N2 areas. Both sets of areas are then fed to the input of the matcher, which uses IoU (intersection over union) as the measure of similarity between areas. In hard recognition mode, the matcher removes the areas of each attribute that do not have sufficient IoU with any area of the other attribute. The areas that were not removed are combined with each other, forming a new set of N3 areas. N3 will not exceed the size of the smaller of the two primary sets, since each area from one set can be combined with only one area from the other, namely the one with which it has the largest IoU. Next comes attribute 3, which operates in check mode: each area obtained at the previous stage is checked for compliance with this attribute, and areas that fail the check are removed from the set. Then attribute 4, a detection attribute, generates a set of N5 areas. These areas are fed to the IoU matcher together with the areas that passed the check, forming the final list of areas, which correspond to the object in the image.
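For illustration, here is a minimal Python sketch of the IoU measure and of the hard-mode pairing step described above. The greedy best-match pairing and the `(x1, y1, x2, y2)` box representation are simplifying assumptions, not the package's actual implementation.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def hard_match(regions1, regions2, iou_threshold=0.75):
    """Hard mode: keep only pairs with sufficient IoU; each area is paired
    with at most one area from the other set (its best-IoU match), so the
    result has at most min(len(regions1), len(regions2)) areas."""
    matched, used = [], set()
    for r1 in regions1:
        best_j, best_iou = None, iou_threshold
        for j, r2 in enumerate(regions2):
            if j in used:
                continue
            score = iou(r1, r2)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            r2 = regions2[best_j]
            # combine the pair, here by intersection (cf. MergingPolicy, section 3.2.1)
            matched.append((max(r1[0], r2[0]), max(r1[1], r2[1]),
                            min(r1[2], r2[2]), min(r1[3], r2[3])))
    return matched
```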

When the object has been recognized, its final confidence factor is calculated using the formula:

$$P = \frac{\sum_{i \in D} k_i \, p_i}{\sum_{i \in O} k_i}$$

where $D$ is the set of recognized attributes, $O$ is the set of attributes that describe the object, $k_i$ is the weight of the attribute specified by the user, and $p_i$ is the attribute's contribution, normalized to one.
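A minimal Python sketch of this calculation, assuming the weighted-average reading of the formula above (with $p_i = 0$ for unrecognized attributes, so that summing over $D$ and over $O$ in the numerator coincide):

```python
def confidence(weighted_contributions):
    """Confidence of one candidate object.
    `weighted_contributions` is a list of (k_i, p_i) pairs over all attributes
    O in the object description; p_i is 0 for attributes that were not
    recognized (possible in soft mode)."""
    total = sum(k for k, _ in weighted_contributions)
    return sum(k * p for k, p in weighted_contributions) / total if total else 0.0
```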

In soft recognition mode, some of the attributes may be missing. However, this affects the final recognition confidence factor for the object.

*(figure: soft mode)*

The image above shows an example of the process of recognizing the same object as in the previous example, but in soft mode. In the first steps, the attributes again generate sets of areas. However, when these enter the input of the IoU matcher, the areas that have no "pair" in the list of the other attribute are not removed. Instead, they also pass through the matcher, but the contribution of the missing attribute is set to 0. The areas that do have a "pair" are combined in the same way as in hard mode. When the list of areas reaches the check attribute, the areas that fail the check are likewise not deleted and receive a contribution of 0 for this attribute. Similarly, the resulting areas pass through the matcher with the output of attribute 4. Finally, the confidence coefficient is calculated for each area; if it is less than the specified threshold, the object is removed from the list, and all the remaining areas are recognized objects.
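Continuing the sketch from the hard-mode section, soft-mode pairing might look like this: areas without a pair are kept, with a zero contribution recorded for the missing attribute. This is again an illustrative simplification, reusing `iou()` from the hard-mode sketch above.

```python
def merge(a, b):
    """Combine a matched pair of boxes; the intersection variant, for brevity."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def soft_match(regions1, regions2, iou_threshold=0.75):
    """Soft mode: unmatched areas are kept, but the missing attribute's
    contribution is set to 0. Returns (area, (p1, p2)) pairs, where p1/p2
    are the contributions of attributes 1 and 2 for that area."""
    results, used = [], set()
    for r1 in regions1:
        best_j, best_iou = None, iou_threshold
        for j, r2 in enumerate(regions2):
            score = iou(r1, r2)
            if j not in used and score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            results.append((merge(r1, regions2[best_j]), (1.0, 1.0)))
        else:
            results.append((r1, (1.0, 0.0)))  # no pair: attribute 2 contributes 0
    for j, r2 in enumerate(regions2):
        if j not in used:
            results.append((r2, (0.0, 1.0)))  # no pair: attribute 1 contributes 0
    return results
```

The final confidence of each area is then computed from these contributions (see `confidence()` above), and areas below the object's threshold are discarded.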

3. XML-description

To set up recognition of a Simple Object in the configuration file, you need to put the description of all its attributes in the AttributeLib tag, and the description of the object itself in the SimpleObjectBase tag, referring to the attributes by name.

```xml
<AttributeLib>

    <Attribute Name="HSVColorBrightYellow" Type="HSVColor" Hmin="35" Hmax="103" Smin="104" Smax="255" Vmin="0" Vmax="255"/>

</AttributeLib>

<SimpleObjectBase>

    <SimpleObject Name="YellowSticker" ID="1">
        <Attribute Type="Detect">HSVColorBrightYellow</Attribute>
    </SimpleObject>

</SimpleObjectBase>
```

Detailed examples of creating a configuration file are given in the descriptions of specific attributes. The package also contains a configuration file containing the examples provided in this documentation.

3.1. Attributes description

3.1.1. Attribute tag parameters

  1. Name (string, must be set) The unique name of the attribute to which the simple object description will refer.
  2. Type (string, must be set) The attribute type; must be one of the types in the table below.
| Type | Description | Modes* | Possibility to set accuracy* | 3D-pose estimation | Contour extraction | Keypoint extraction |
|---|---|---|---|---|---|---|
| HSVColor | Threshold color filtering | DC | C | ❌ | ✔ | ❌ |
| HaarCascade | Haar cascade detection | D | - | ❌ | ❌ | ❌ |
| Size | Checking the size in the image | C | - | ❌ | ❌ | ❌ |
| HistColor | Histogram color filtering | DC | C | ❌ | ✔ | ❌ |
| Hough | Hough transform detection | D | - | ❌ | ❌ | ❌ |
| Dimension | Checking the aspect ratio of objects | C | - | ❌ | ❌ | ❌ |
| BasicMotion | Simple motion detector | D | - | ❌ | ✔ | ❌ |
| Aruco | ArUco-marker detector | D | - | ✔ | ✔ | ❌ |
| Feature | Key points method detector | D | - | ✔ | ✔ | ✔ |
| Pose | Checking object pose on image | C | - | ❌ | ❌ | ❌ |
| DNN | Detection by convolutional neural networks imported into OpenCV | D | D | ❌ | ❌ | ❌ |
| QR | Detecting QR codes using OpenCV | D | - | ❌ | ✔ | ❌ |
| QR_Zbar | Detection of QR codes using the ZBar library | D | - | ✔ | ✔ | ❌ |
| LogicAnd | Logical AND over two attributes | DC | - | ❌ | ❌ | ❌ |
| LogicNot | Logical NOT over an attribute | DC | - | ❌ | ❌ | ❌ |
| LogicOR | Logical OR over two attributes | DC | - | ❌ | ❌ | ❌ |
| Blob | Blob detector | D | - | ❌ | ❌ | ❌ |
| Depth | Extracting distance to object using depth camera | E | - | ✔ | ❌ | ❌ |
| RoughDist | Rough determination of the distance to an object with known geometric characteristics | E | - | ✔ | ❌ | ❌ |
| Dist | Object distance checking | C | - | ❌ | ❌ | ❌ |
| FaceDlib | Face detection and identification | DE | DE | ❌ | ❌ | ❌ |
| ExractedInfoId | Checking identifiers in extracted information | C | - | ❌ | ❌ | ❌ |
| ExractedInfoString | Checking strings in extracted information | C | - | ❌ | ❌ | ❌ |
| UnitTranslation | Unit translation extraction | E | - | ✔ | ❌ | ❌ |
| SquareObjectDistance | Distance extraction for square objects | E | - | ✔ | ✔ | ❌ |
| TorchYOLOv7Attribute | YOLOv7 CNN detector | D | D | ❌ | ❌ | ❌ |
| TorchYOLOv7KeypointAttribute | YOLOv7 keypoint CNN detector | D | D | ❌ | ❌ | ✔ |
| ROSSubcriberOpenPoseRaw | OpenPose raw output decoder | D | D | ❌ | ❌ | ✔ |
| KeypointPoseAttribute | 3D-pose extraction for keypoints | E | - | ✔ | ❌ | ❌ |
| DummyHumanPose | Hard-coded human poses | E | - | ❌ | ❌ | ❌ |

* - D - Detect mode, C - Check mode, E - Extract mode.

  3. Probability (double, default: 0.75) The lower threshold of the attribute-recognition confidence coefficient, if the detector supports it.
  4. Contour (bool, default: true) Return the contour when the detector is able to provide one.
  5. The remaining parameters are specific to the different attribute types; see the sections dedicated to specific attributes.

3.1.2. Inner tags

  1. Clusterization - allows you to specify the clustering method for the attribute's output. Read more in the Clusterization section.
  2. Filter - allows you to set an additional filter for the attribute. Read more in the Filtering section.

3.2. Object description

3.2.1. SimpleObject tag parameters

  1. ID (int, must be set) The unique identifier of the object. At the moment, automatic correction of duplicate identifiers is not implemented, so watch for error messages.
  2. Name (string, must be set) The unique object name.
  3. Probability (double, default: 0.75) The lower threshold for the confidence coefficient.
  4. IOU (double, default: 0.75) The lower threshold for intersection over union (IoU).
  5. Mode (string, default: Hard) Recognition mode (see above); can take the values Hard or Soft.
  6. MergingPolicy (string, default: Intersection) Merging policy for regions and contours. Accepts the values Intersection and Union; see the sketch after this list.
  7. Weight (double, default: 1) Attribute weight. See the confidence factor formula above.
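As a geometric illustration of the two MergingPolicy values, the sketch below merges a matched pair of boxes either by their overlap or by their common bounding box. Treating Union as the bounding box of both is a simplifying assumption here, since the union of two rectangles is not itself a rectangle.

```python
def merge_regions(a, b, policy="Intersection"):
    """MergingPolicy semantics for a matched pair of boxes (x1, y1, x2, y2):
    Intersection keeps their overlap, Union their common bounding box."""
    if policy == "Intersection":
        return (max(a[0], b[0]), max(a[1], b[1]),
                min(a[2], b[2]), min(a[3], b[3]))
    else:  # "Union"
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
```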

3.2.2. Inner tags

  1. Attribute The main tag; its content is an attribute name from the AttributeLib. This tag has a number of parameters.
  • Type accepts the values Detect (by default), Check or Extract.
  • Channel accepts the value RGB (by default) for working with a color image, or DEPTH for working with a depth map.
  2. Tracker Allows you to configure tracking of an object; see the tracking section for details.

3.3. Extended example

Below is an example of the description of a "red cup" object. It starts with the HistDarkRed attribute, which detects red areas and declares an Insider filter that removes detections lying inside other detections of the same attribute. Next comes the NotFractal size attribute, which filters out the small areas found by the color attribute. After that, the CupDnn attribute is declared; it uses the DNN module to detect cups in the image, and an NMS filter is added to it. At this point, the description is already sufficient to recognize red cups. Next, the DepthKinect extraction attribute is declared, which localizes the object in three-dimensional space using a depth camera. The object description ends with the declaration of a tracker, which correlates objects across successive frames and tries to find the object when the detector cannot.

```xml
<AttributeLib>

    <Attribute Name="HistDarkRed" Type="HistColor" Histogram="histograms/DarkRed.yaml">
        <Filter Type="Insider"/>
    </Attribute>

    <Attribute Name="NotFractal" Type="Size" MinAreaPc="0.5" MaxAreaPc="100"/>

    <Attribute Name="CupDnn" Type="Dnn" framework="tensorflow" weights="ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb" config="ssd_mobilenet_v1_coco_2017_11_17/config.pbtxt" labels="ssd_mobilenet_v1_coco_2017_11_17/mscoco_label_map.pbtxt" inputWidth="300" inputHeight="300" Probability="0.75" obj_id="47">
        <Filter Type="NMS" threshold="0.5"/>
    </Attribute>

    <Attribute Name="DepthKinect" Type="Depth" depthScale="0.001"/>

</AttributeLib>

<SimpleObjectBase>

    <SimpleObject Name="RedCup" ID="61" Mode="Soft" MergingPolicy="Union">
        <Attribute Type="Detect">HistDarkRed</Attribute>
        <Attribute Type="Check">NotFractal</Attribute>
        <Attribute Type="Detect">CupDnn</Attribute>
        <Attribute Type="Extract" Channel="DEPTH">DepthKinect</Attribute>
        <Tracker IOU="0.25" decay="0.01">MOSSE</Tracker>
    </SimpleObject>

</SimpleObjectBase>
```