Detection

Detection Metrics

1. Intersection Over Union

Intersection over Union (IoU) is a metric that quantifies the degree of overlap between two regions. The IoU metric evaluates the correctness of a prediction; its value ranges from 0 to 1. With the help of an IoU threshold value, we can decide whether a prediction is a True Positive, a False Positive, or a False Negative.

IoU is useful through thresholding: given a threshold (α, say), we can decide whether a detection is correct or not, as in the sketch below.
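A minimal sketch of computing IoU for two axis-aligned boxes and applying a threshold. The `(x1, y1, x2, y2)` box format and the threshold value are assumptions for illustration:

```python
def iou(box_a, box_b):
    """Compute IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ALPHA = 0.5  # example IoU threshold
print(iou((0, 0, 10, 10), (5, 5, 15, 15)) >= ALPHA)  # False: IoU = 25 / 175 ≈ 0.14
```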


2. Confusion Matrix

In object detection, the correctness of a prediction (TP, FP, or FN) is decided with the help of the IoU threshold, whereas in image segmentation it is decided by referencing the ground-truth pixels (the ground truth being the known, labeled objects); a matching sketch follows the figure captions below.

Figure: object detection predictions based on IoU

Figure: image segmentation predictions based on the ground-truth mask
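A hedged sketch of how the IoU threshold turns detections into TP/FP/FN counts, reusing the `iou` helper above. The greedy one-to-one matching is an assumption; real evaluators usually also sort predictions by confidence first:

```python
def classify_detections(pred_boxes, gt_boxes, alpha=0.5):
    """Greedy matching: each ground-truth box may be matched at most once.
    Returns counts of true positives, false positives, and false negatives."""
    matched = set()
    tp = fp = 0
    for pred in pred_boxes:
        # Find the best-overlapping, not-yet-matched ground-truth box
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(gt_boxes):
            if i in matched:
                continue
            v = iou(pred, gt)
            if v > best_iou:
                best_iou, best_idx = v, i
        if best_iou >= alpha:
            tp += 1
            matched.add(best_idx)
        else:
            fp += 1  # no ground truth overlaps enough
    fn = len(gt_boxes) - len(matched)  # unmatched ground truths are misses
    return tp, fp, fn
```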


3. Precision

Precision is the fraction of predictions that are correct:

P = TP / (TP + FP)


4. Recall

Recall is the fraction of ground-truth objects that are detected:

R = TP / (TP + FN)
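For example, with hypothetical counts taken from the matching sketch above:

```python
tp, fp, fn = 7, 3, 2        # hypothetical counts from the matching step above
precision = tp / (tp + fp)  # 7 / 10 = 0.70
recall = tp / (tp + fn)     # 7 / 9  ≈ 0.78
```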


5. Average Precision (AP)

AP is the area under the precision-recall curve. APα means that AP is evaluated at IoU threshold α. AP is calculated individually for each class.


6. Mean Average Precision (mAP)

Mean Average Precision, or mAP, is the average of AP over all detected classes.

mAP = 1/n * sum(AP), where n is the number of classes.
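A minimal sketch of AP and mAP, assuming the precision-recall points have already been computed per class. The class names and values are hypothetical, and real benchmarks such as COCO use an interpolated 101-point variant rather than a raw trapezoidal area:

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve (trapezoidal rule).
    `recalls` must be sorted in increasing order."""
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
    return ap

# Hypothetical per-class precision-recall points at some IoU threshold alpha
ap_per_class = {
    "glass":   average_precision([0.0, 0.5, 1.0], [1.0, 0.8, 0.5]),
    "plastic": average_precision([0.0, 0.5, 1.0], [1.0, 0.9, 0.6]),
}
mAP = sum(ap_per_class.values()) / len(ap_per_class)  # mean over the n classes
```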


7. Average recall

Average recall (AR) is also a numerical metric that can be used to compare detector performance. In essence, AR is the recall averaged over all IoU ∈ [0.5, 1.0] and can be computed as two times the area under the recall-IoU curve:

AR = 2 * ∫ recall(o) do, where o is IoU
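A numerical sketch of that integral, assuming we already have, for each ground-truth box, the IoU of its best-matching prediction (a trapezoidal approximation over sampled thresholds):

```python
def average_recall(best_ious, steps=51):
    """AR = 2 * area under the recall-vs-IoU curve over IoU in [0.5, 1.0].
    best_ious: for each ground-truth box, the IoU of its best-matching prediction."""
    thresholds = [0.5 + 0.5 * i / (steps - 1) for i in range(steps)]
    recalls = [sum(v >= t for v in best_ious) / len(best_ious) for t in thresholds]
    # Trapezoidal approximation of the integral; the factor 2 rescales the
    # 0.5-wide IoU interval so that a perfect detector scores AR = 1.
    width = thresholds[1] - thresholds[0]
    area = sum((recalls[i] + recalls[i + 1]) / 2 * width for i in range(steps - 1))
    return 2 * area

print(average_recall([0.9, 0.7, 0.55, 0.3]))  # hypothetical best-match IoUs
```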


8. Mean average recall

Mean average recall is defined as the mean of AR across all K classes.


9. Inference Time

The inference time is how long it takes for one forward propagation (a single forward pass of the network). To get the number of frames per second, we compute FPS = 1 / inference time.
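A sketch of measuring inference time and FPS with PyTorch. The yolov5s hub model is just an example (it downloads weights on first use); any torch.nn.Module could be timed the same way:

```python
import time
import torch

# Example model: YOLOv5s via torch.hub (downloads weights on first use)
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.eval()
dummy = torch.zeros(1, 3, 640, 640)  # one blank 640x640 RGB frame

with torch.no_grad():
    model(dummy)  # warm-up pass so one-time setup does not skew the timing
    n = 20
    start = time.perf_counter()
    for _ in range(n):
        model(dummy)
    inference_time = (time.perf_counter() - start) / n  # seconds per forward pass

fps = 1.0 / inference_time
print(f"{inference_time * 1000:.1f} ms per image, {fps:.1f} FPS")
```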

Detection Models

The state-of-the-art object detection methods can be categorized into two main types: one-stage vs. two-stage object detectors. In general, deep-learning-based object detectors extract features from the input image or video frame.

An object detector solves two subsequent tasks:

  1. Find an arbitrary number of objects (possibly even zero).

  2. Classify every single object and estimate its size with a bounding box.

1. Two-stage detectors

  • The two-stage architecture involves (1) object region proposal with conventional Computer Vision methods or deep networks, followed by (2) object classification based on features extracted from the proposed region with bounding-box regression.
  • Two-stage methods achieve the highest detection accuracy but are typically slower; because of the many inference steps per image, their performance (frames per second) is not as good as that of one-stage detectors.
  • Various two-stage detectors include the region-based convolutional neural network (R-CNN), with evolutions such as Faster R-CNN and Mask R-CNN. The latest evolution is the granulated R-CNN (G-RCNN).

2. One-stage detectors

  • One-stage object detectors prioritize inference speed and are very fast, but not as good at recognizing irregularly shaped objects or groups of small objects.
  • The most popular one-stage detectors include YOLO, SSD, and RetinaNet. The latest real-time detectors are Scaled-YOLOv4 (2020) and YOLOR (2021). View the benchmark comparisons below.
  • The main advantage of single-stage detectors is that they are generally faster than multi-stage detectors and structurally simpler.

3. Comparison of different types of detection algorithms

Figures (not reproduced here) compared: frames per second, accuracy, models with the lowest inference time, models with the highest mAP, memory usage, and GPU time.

Comparison of two-stage and one-stage detection models: https://www.researchgate.net/figure/Performance-comparison-of-various-object-detection-models_fig5_349960212

Detection Model Specifications

YOLOv7

by https://github.com/WongKinYiu

| Model | Test Size | AP^test | AP50^test | AP75^test | batch 1 fps | batch 32 average time |
|---|---|---|---|---|---|---|
| YOLOv7 | 640 | 51.4% | 69.7% | 55.9% | 161 fps | 2.8 ms |
| YOLOv7-X | 640 | 53.1% | 71.2% | 57.8% | 114 fps | 4.3 ms |
| YOLOv7-W6 | 1280 | 54.9% | 72.6% | 60.1% | 84 fps | 7.6 ms |
| YOLOv7-E6 | 1280 | 56.0% | 73.5% | 61.2% | 56 fps | 12.3 ms |
| YOLOv7-D6 | 1280 | 56.6% | 74.0% | 61.8% | 44 fps | 15.0 ms |
| YOLOv7-E6E | 1280 | 56.8% | 74.4% | 62.1% | 36 fps | 18.7 ms |

YOLOv5

| Model | size (pixels) | mAP^val 0.5:0.95 | mAP^val 0.5 | Speed CPU b1 (ms) | Speed V100 b1 (ms) | Speed V100 b32 (ms) | params (M) | FLOPs @640 (B) |
|---|---|---|---|---|---|---|---|---|
| YOLOv5n | 640 | 28.0 | 45.7 | 45 | 6.3 | 0.6 | 1.9 | 4.5 |
| YOLOv5s | 640 | 37.4 | 56.8 | 98 | 6.4 | 0.9 | 7.2 | 16.5 |
| YOLOv5m | 640 | 45.4 | 64.1 | 224 | 8.2 | 1.7 | 21.2 | 49.0 |
| YOLOv5l | 640 | 49.0 | 67.3 | 430 | 10.1 | 2.7 | 46.5 | 109.1 |
| YOLOv5x | 640 | 50.7 | 68.9 | 766 | 12.1 | 4.8 | 86.7 | 205.7 |
| YOLOv5n6 | 1280 | 36.0 | 54.4 | 153 | 8.1 | 2.1 | 3.2 | 4.6 |
| YOLOv5s6 | 1280 | 44.8 | 63.7 | 385 | 8.2 | 3.6 | 12.6 | 16.8 |
| YOLOv5m6 | 1280 | 51.3 | 69.3 | 887 | 11.1 | 6.8 | 35.7 | 50.0 |
| YOLOv5l6 | 1280 | 53.7 | 71.3 | 1784 | 15.8 | 10.5 | 76.8 | 111.4 |
| YOLOv5x6 | 1280 | 55.0 | 72.7 | 3136 | 26.2 | 19.4 | 140.7 | 209.8 |
| YOLOv5x6 + TTA | 1536 | 55.8 | 72.7 | - | - | - | - | - |

YOLOR

| Model | Test Size | AP^val | AP50^val | AP75^val | APS^val | APM^val | APL^val | FLOPs | weights | batch 1 throughput |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOR-P6 | 1280 | 52.5% | 70.6% | 57.4% | 37.4% | 57.3% | 65.2% | 326G | yolor-p6.pt | 76 fps |
| YOLOR-W6 | 1280 | 54.0% | 72.1% | 59.1% | 38.1% | 58.8% | 67.0% | 454G | yolor-w6.pt | 66 fps |
| YOLOR-E6 | 1280 | 54.6% | 72.5% | 59.8% | 39.9% | 59.0% | 67.9% | 684G | yolor-e6.pt | 45 fps |
| YOLOR-D6 | 1280 | 55.4% | 73.5% | 60.6% | 40.4% | 60.1% | 68.7% | 937G | yolor-d6.pt | 34 fps |
| YOLOR-S | 640 | 40.7% | 59.8% | 44.2% | 24.3% | 45.7% | 53.6% | 21G | - | - |
| YOLOR-SDWT | 640 | 40.6% | 59.4% | 43.8% | 23.4% | 45.8% | 53.4% | 21G | - | - |
| YOLOR-S2DWT | 640 | 39.9% | 58.7% | 43.3% | 21.7% | 44.9% | 53.4% | 20G | - | - |
| YOLOR-S3S2D | 640 | 39.3% | 58.2% | 42.4% | 21.3% | 44.6% | 52.6% | 18G | - | - |
| YOLOR-S3DWT | 640 | 39.4% | 58.3% | 42.5% | 21.7% | 44.3% | 53.0% | 18G | - | - |
| YOLOR-S4S2D | 640 | 36.9% | 55.3% | 39.7% | 18.1% | 41.9% | 50.4% | 16G | weights | - |
| YOLOR-S4DWT | 640 | 37.0% | 55.3% | 39.9% | 18.4% | 41.9% | 51.0% | 16G | weights | - |

Comparison of all YOLO models

Label and image format of YOLO models

YOLOv7 and YOLOv5 labels should be plain-text (.txt) files, one per image, and images should be in JPEG (.jpg) format.
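Each label file has the same name as its image and contains one line per object: a class index followed by the box center and size, all normalized to [0, 1]. The values below are illustrative:

```
# <class_id> <x_center> <y_center> <width> <height>
0 0.481 0.634 0.690 0.713
2 0.127 0.400 0.055 0.120
```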
