Building a Loss Prevention Application for Retail Self-Checkout with Computer Vision and DLStreamer: A Beginner's Guide


Introduction:

Computer vision is changing the way we interact with the world. But beyond the buzzwords, what does it actually take to bring a vision-based system to life? And how can developers start building real-world applications without reinventing the wheel?

In this article, we'll walk you through a practical use case in retail: loss prevention at self-checkout using computer vision and deep learning. This tutorial is beginner-friendly and built around an open-source project and the Intel® Distribution of OpenVINO™ toolkit. The solution also leverages Intel® DLStreamer, a powerful tool designed to make AI inference at the edge fast and scalable.

We'll break down the key components you need to know:

  • What is Computer Vision? We'll explain how machines "see" using deep learning models like YOLO for object detection and EfficientNet for image classification.

  • Using DLStreamer for Real-Time Inference: Learn how to deploy these models efficiently using DLStreamer, a streaming pipeline toolkit optimized for edge devices.

  • Extracting Inference Metadata: Discover how to pull out actionable insights, like tracking objects inside a region of interest (ROI), to build smart business logic such as detecting theft or item misplacement.

Whether you're just starting out in AI or looking for a real-world computer vision project to dig into, this article will give you a solid foundation and hands-on tools to start building.

What is Computer Vision?

Computer vision is a field of artificial intelligence that enables computers to understand and interpret visual information from the world, much like humans do.

For example, imagine a camera looking at a store shelf. With computer vision, the system can tell if there are any missing products, whether someone picked up an item, or even if an item was returned to the wrong place. This ability comes from training models on thousands (or millions) of labeled images so they can learn to recognize patterns, objects, and actions.

To make this possible, deep learning plays a key role. Deep learning models, especially those based on neural networks, are excellent at identifying complex visual patterns. These models are trained to spot things like people, shopping baskets, or specific products by learning from massive datasets.

Two of the most popular types of models used in computer vision are:

YOLO (You Only Look Once)

YOLO is an open-source object detection model that can detect and locate multiple objects in a single image very quickly. It draws bounding boxes around things like people, bags, or products and labels them with a class name and confidence score. YOLO is perfect for real-time applications because it's both fast and accurate.

EfficientNet

EfficientNet is a deep learning model used for image classification. Instead of detecting where an object is, it's used to determine what the object is. For example, it can take an image of a product and classify it as a "soda bottle" or "water bottle" with high accuracy. It's called "efficient" because it delivers strong results without needing a ton of computing power.

But recognizing things in a video is only part of the puzzle. The next step is figuring out how to run these models efficiently, especially when you're dealing with real-time video feeds. That’s where DLStreamer comes in.

Setting Up the Models

Before we jump into running DLStreamer pipelines, we need to make sure the required AI models are downloaded and ready to use. In our project, we use:

  • YOLOv8 for object detection

  • EfficientNet for image classification

These models have been converted to OpenVINO format so they can run efficiently on Intel hardware using DLStreamer.

To make setup easier, the repository includes a Makefile with a target that will download both models automatically.

Step 1: Download the Models

From the root of the repository, simply run:

make download-models

This command will execute the script located at download_models/downloadModels.sh, which pulls the pre-converted models into the appropriate directory structure so they're ready to be used in the pipeline.
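
If you prefer not to use make, you can run the same script directly. This is a minimal sketch assuming the script is executable from the repository root and that the models land in a models/ directory (the one referenced by the pipeline examples later in this article); the exact layout produced by the script may differ:

# Equivalent to `make download-models`: run the download script directly from the repo root
./download_models/downloadModels.sh

# Check that the model files are in place (directory layout is illustrative)
ls models/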

Running AI Models Efficiently with DLStreamer

Once you have a trained model like YOLO or EfficientNet, the next big question is: how do you actually run it in a real application, especially one that needs to process video in real time?

This is where DLStreamer comes into play.

DLStreamer is an open-source framework developed by Intel to simplify and accelerate the deployment of deep learning pipelines. It's built on top of GStreamer, a multimedia framework widely used for handling video and audio streams. What DLStreamer adds is a set of smart plugins that can handle AI inference, video decoding, tracking, post-processing, and more.

DLStreamer pipelines are built using GStreamer commands that define how video flows through a series of processing elements, from decoding the input and running inference to publishing metadata.

Let's take a look at a real example and break it down step by step:

gst-launch-1.0 souphttpsrc location=http://example.com/video.mp4 ! decodebin ! \
gvadetect name=detection model=models/object_detection/yolov5s/FP16-INT8/yolov5s.xml \
model-proc=models/object_detection/yolov5s/yolov5s.json threshold=0.5 device=CPU ! \
gvawatermark ! videoconvert ! autovideosink

This pipeline uses gst-launch-1.0 to:

  • stream a video from a URL (http://example.com/video.mp4) using souphttpsrc
  • decode the video using decodebin
  • run object detection on each frame using the gvadetect element
  • draw the detections on each frame with gvawatermark and display the annotated video with autovideosink

The detection model is a YOLOv5s network in OpenVINO format, specified by the .xml file, along with a model-proc JSON file (yolov5s.json) that defines how to interpret the model's inputs and outputs. The pipeline filters out detections below a confidence threshold of 0.5 and runs inference on the CPU. This setup performs real-time object detection directly on an online video source using DLStreamer.

We have built a more complex pipeline that uses regions of interest, classification, and MQTT to publish inference metadata:

🔗 https://github.com/intel-retail/loss-prevention/blob/main/src/pipelines/yolov8s_roi.sh
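
To give a rough idea of what that script does, here is a simplified sketch of a detection + classification + MQTT pipeline. The element names (gvadetect, gvatrack, gvaclassify, gvametaconvert, gvametapublish) are standard DLStreamer elements, but the model paths and properties shown here are illustrative and the region-of-interest handling is omitted; refer to yolov8s_roi.sh for the exact pipeline used in the project:

# Illustrative sketch only: paths and properties differ from the actual yolov8s_roi.sh script
gst-launch-1.0 souphttpsrc location=http://example.com/video.mp4 ! decodebin ! \
gvadetect model=models/object_detection/yolov8s/FP16-INT8/yolov8s.xml threshold=0.5 device=CPU ! \
gvatrack tracking-type=short-term-imageless ! \
gvaclassify model=models/object_classification/efficientnet-b0/FP16-INT8/efficientnet-b0.xml device=CPU ! \
gvametaconvert format=json ! \
gvametapublish method=mqtt address=localhost:1883 topic=event/detection mqtt-client-id=yolopipeline ! \
fakesink sync=false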

Extracting Inference Metadata

Once an object detection model runs on a video frame, it produces valuable metadata such as the type of object detected, where it is in the frame, and how confident the model is. But to make this useful for real-world applications, we need a way to access that metadata outside of the video pipeline.

This is where gvametapublish comes in. It's a DLStreamer element that allows you to publish inference results to external systems, and one of the most useful publishing methods is MQTT.

Here's a simplified version of how it works:

gvametapublish method=mqtt address=localhost:1883 topic=event/detection mqtt-client-id=yolopipeline

In this example:

  • method=mqtt tells gvametapublish to use MQTT as the transport.
  • address is the address of your MQTT broker (local or cloud-based).
  • topic defines where the metadata will be published.
  • mqtt-client-id is a unique ID for the pipeline instance.

Once enabled, every detection result is published as a JSON message to the specified MQTT topic.
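
Note that gvametapublish is normally preceded by gvametaconvert format=json in the pipeline, which serializes the inference results before they are published. A quick way to verify that messages are flowing, assuming the Mosquitto command-line clients are installed and the broker is running locally, is to subscribe to the topic from a terminal:

# Print every message published to the detection topic
mosquitto_sub -h localhost -p 1883 -t event/detection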

Build your Business Application

This setup allows you to decouple the computer vision pipeline from the decision-making logic. For example:

  • Count how many people entered a certain area.
  • Flag suspicious behavior, like picking up an item but not placing it in the basket.

You can write a lightweight MQTT subscriber in Python or any language with MQTT support, and trigger custom logic based on the incoming metadata.
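
For example, a minimal subscriber might look like the sketch below. It assumes the paho-mqtt package is installed, a broker is running on localhost:1883, and the pipeline publishes gvametaconvert-style JSON to the event/detection topic; the exact field names depend on your pipeline configuration:

# A minimal MQTT subscriber sketch (topic and JSON field names are illustrative)
import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    data = json.loads(msg.payload)
    for obj in data.get("objects", []):
        detection = obj.get("detection", {})
        label = detection.get("label", "unknown")
        confidence = detection.get("confidence", 0.0)
        print(f"Detected {label} ({confidence:.2f})")
        # Trigger your custom business logic here, e.g. counting items in a zone.

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x also takes a CallbackAPIVersion argument
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("event/detection")
client.loop_forever()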

For reference, we have provided a business logic application that receives data from MQTT, tracks which item is in each region of interest, and flags suspicious behavior, such as an item being removed from the bagging area:

🔗 https://github.com/intel-retail/loss-prevention/blob/main/src/app/loss_prevention.py

Try It Yourself

To run the loss prevention application, follow these simple steps:

Step 1: Clone the repository

git clone https://github.com/intel-retail/loss-prevention.git

Step 2: Navigate to the folder and run the following commands

cd loss-prevention
RTSP=1 make run-demo

Step 3: Open Grafana Dashboard

🔗 Grafana

Step 4: Analyze the Running Containers

docker ps

Step 5: Stop the App

make down