Card Detection research - dnum-mi/basegun-ml GitHub Wiki

Objectives

To measure the length of the weapon, a reference object of known size is placed next to it in the image. After interviewing the end users (French police forces), we selected a card as the reference object. Any card will do, as long as it is the size of a credit card.
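As a rough illustration of the reference-object idea, the sketch below converts a length in pixels into millimetres using the card's long side. It assumes the ID-1 card format (85.60 mm × 53.98 mm, the standard credit-card size) and that the card and the weapon lie in the same plane, viewed roughly from above. The function names and pixel values are hypothetical and not taken from the project code.

```python
# ID-1 card dimensions in millimetres (standard credit-card format).
CARD_LONG_SIDE_MM = 85.60
CARD_SHORT_SIDE_MM = 53.98


def mm_per_pixel(card_long_side_px: float) -> float:
    """Scale factor derived from the card's long side measured in pixels."""
    if card_long_side_px <= 0:
        raise ValueError("card_long_side_px must be positive")
    return CARD_LONG_SIDE_MM / card_long_side_px


def weapon_length_mm(weapon_length_px: float, card_long_side_px: float) -> float:
    """Convert a weapon length in pixels to millimetres.

    Assumes the card and the weapon lie in the same plane, so that a
    single scale factor applies to both.
    """
    return weapon_length_px * mm_per_pixel(card_long_side_px)


# Hypothetical measurements: the card spans 428 px, the weapon 1000 px.
print(round(weapon_length_mm(1000, 428), 1))  # 200.0
```

The accuracy of the result depends directly on how precisely the card's edges are located, which is why the choice of detection method matters.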

Consequently, we need to develop a card detection algorithm. There are several ways to achieve this, and we'll explore some of them in this section.

Keypoint Detection

Our initial idea for card detection was to use the same method as before, namely keypoint detection. The model would detect the card and pinpoint each corner for greater precision.

Figure: Keypoint on card

However, the results were not satisfactory, as shown in the image. We hypothesize that the poor performance comes from the way keypoint detection models work: they need to distinguish each keypoint from the others, which is not possible for the four corners of a card because they all look alike.

Contour Detection

Another method is contour detection, a computer vision technique used to detect the borders of objects in images. This method relies on classical image processing rather than a learned model, so it does not require any training. However, its performance depends heavily on the image's background.

After some experimentation, we rejected this method: the results varied too much with the background and with other objects in the image, such as the weapon itself.

Semantic Segmentation

Semantic segmentation assigns each pixel of the image to an object, using the surrounding context to separate the different objects. One of the best-known segmentation models is SAM (Segment Anything Model) from Meta. However, segmentation models are quite heavy and generally require a GPU for training and inference, whereas only a CPU will be available in our use case. In our experiments, segmenting a card took more than a minute on our machine, far longer than the defined limit of 2 seconds. Some recent work has been published on "light" segmentation models, which are faster.
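The comparison against the 2-second limit can be sketched with a small timing helper. This is only a minimal example: `fake_segmentation` is a hypothetical stand-in for a real model call, and taking the median over a few runs is our assumption, not the protocol used in the experiments above.

```python
import statistics
import time
from typing import Callable

LATENCY_BUDGET_S = 2.0  # inference time limit stated above


def median_latency(fn: Callable[[], object], runs: int = 5) -> float:
    """Median wall-clock duration of fn() over several runs, in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def within_budget(fn: Callable[[], object], runs: int = 5,
                  budget_s: float = LATENCY_BUDGET_S) -> bool:
    """True if the median latency of fn() fits in the budget."""
    return median_latency(fn, runs) <= budget_s


def fake_segmentation() -> None:
    # Hypothetical placeholder: replace with the real inference call.
    time.sleep(0.01)


print(within_budget(fake_segmentation))
```

Running the same helper on the target CPU, rather than on a development machine, gives a number that can be compared directly with the limit.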

Oriented Bounding Box

Oriented bounding box (OBB) detection is similar to simple bounding box detection. An object is detected in an image, and the prediction is a set of coordinates describing a box that surrounds the object. For OBB, a rotation angle is added to the coordinates, so the box can follow the orientation of the object and fit it more tightly.

Figure: BB vs OBB
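To make the difference concrete, the sketch below turns an oriented box into its four corners and compares its area with that of the axis-aligned box enclosing the same corners. The `(cx, cy, w, h, angle)` parameterization is one common convention and is an assumption here, since the exact output format depends on the model used. All values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def obb_corners(cx: float, cy: float, w: float, h: float,
                angle_rad: float) -> List[Point]:
    """Corners of a w x h box centred on (cx, cy), rotated by angle_rad."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    # Offsets of the four corners from the centre, before rotation.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Standard 2D rotation of the offset, then translation to the centre.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners


def enclosing_box_area(points: List[Point]) -> float:
    """Area of the smallest axis-aligned box containing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical detection: a card-shaped box rotated by 30 degrees.
w, h = 85.6, 53.98
corners = obb_corners(200.0, 150.0, w, h, math.radians(30))
print("OBB area:", w * h)
print("Axis-aligned area:", enclosing_box_area(corners))
```

For a rotated card, the axis-aligned box is noticeably larger than the card itself, while the oriented box keeps the card's true side lengths, which is what the length measurement needs.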

OBB models are lighter than segmentation models, and more efficient and consistent than contour detection algorithms. However, OBB models can be difficult to train and may require a large amount of data.

Conclusion

After experimenting with these methods, we've decided to use OBB models to detect and measure the cards in the image.