Inference - mariodiasbatista/object-detection GitHub Wiki

🔍 What is Inference in YOLO?

Inference in YOLO (You Only Look Once) is the process of using a trained model to make predictions: the model analyzes new images or video frames and identifies the objects in them, often fast enough for real-time use.

Inference happens after training: the model's weights are already learned, and it now applies them to data it has not seen before.

🧠 How It Works

During inference, YOLO takes in an input (image or video frame) and:

Divides the image into a grid.
Predicts bounding boxes and confidence scores for each grid cell.
Identifies the object class for each box.
Applies non-maximum suppression (NMS): among boxes that overlap heavily, it keeps the highest-scoring one and discards the rest.
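
The NMS step above can be sketched in plain Python. This is a minimal illustration, not the code any particular YOLO release uses: the `(box, score, label)` tuple layout, the `iou` and `nms` function names, the per-class suppression, and the 0.5 overlap threshold are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring box among heavily overlapping ones.

    detections: list of (box, score, label) tuples (assumed layout).
    Boxes only suppress other boxes with the same label.
    """
    kept = []
    # Visit candidates from most to least confident.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, label = det
        overlaps_kept = any(
            label == k[2] and iou(box, k[0]) > iou_threshold for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


candidates = [
    ((0, 0, 10, 10), 0.9, "dog"),
    ((1, 1, 11, 11), 0.8, "dog"),    # overlaps the first dog box heavily
    ((50, 50, 60, 60), 0.7, "dog"),  # far away, so it survives
    ((1, 1, 11, 11), 0.6, "cat"),    # different class, so it survives
]
final = nms(candidates)
```

Here the second "dog" box is dropped because its overlap with the higher-scoring first box exceeds the threshold, while the distant "dog" box and the "cat" box are retained.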

🗂️ What Does It Output?

YOLO returns:

🟩 Bounding Boxes — coordinates of detected objects.
🏷️ Class Labels — e.g., “person”, “car”, “dog”.
📊 Confidence Scores — how sure the model is about its predictions.

This data can be used to draw boxes on the image or for further processing, such as counting objects or triggering events.
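
As a rough sketch of that further processing, the snippet below counts detections per class and decides whether to trigger an event. The dictionary layout (`label`, `score`, `box`), the sample values, and the helper names `count_by_label` and `should_trigger` are hypothetical, chosen only for illustration; a real YOLO library will return its own result objects.

```python
from collections import Counter

# Hypothetical detections, shaped like the three outputs listed above.
detections = [
    {"label": "person", "score": 0.91, "box": (34, 50, 120, 300)},
    {"label": "person", "score": 0.78, "box": (200, 48, 290, 310)},
    {"label": "car", "score": 0.66, "box": (310, 180, 620, 400)},
]


def count_by_label(dets):
    """Count how many detections were reported for each class label."""
    return Counter(d["label"] for d in dets)


def should_trigger(dets, label, min_count=1):
    """Return True when at least min_count objects of a class are present."""
    return count_by_label(dets)[label] >= min_count


counts = count_by_label(detections)
alert = should_trigger(detections, "person", min_count=2)
```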

⚙️ Inference Flow

INPUT (Image/Frame)
      ↓
Preprocessing (resize, normalize)
      ↓
Model Prediction (YOLO forward pass)
      ↓
Postprocessing (NMS, thresholding)
      ↓
OUTPUT (Detected objects with boxes, labels, and scores)
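
The preprocessing and postprocessing stages in the flow above can be illustrated with a few small helpers. This is a sketch under stated assumptions: images are nested lists of 8-bit values, the resize is a plain stretch to the model input size (no letterbox padding), and the function names and the 0.25 default threshold are made up for the example. The model's forward pass itself is left out.

```python
def normalize(pixels):
    """Preprocessing: scale 8-bit pixel values (0-255) to floats in [0, 1]."""
    return [[value / 255.0 for value in row] for row in pixels]


def filter_by_confidence(detections, threshold=0.25):
    """Postprocessing: drop detections scoring below the threshold."""
    return [d for d in detections if d["score"] >= threshold]


def scale_box(box, model_size, image_size):
    """Postprocessing: map a box from model-input coordinates back to the
    original image, assuming the image was stretched to model_size."""
    x1, y1, x2, y2 = box
    model_w, model_h = model_size
    image_w, image_h = image_size
    sx, sy = image_w / model_w, image_h / model_h
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


# Example: a 1280x720 frame that was resized to 640x640 for the model.
raw = [
    {"label": "car", "score": 0.90, "box": (64, 64, 320, 320)},
    {"label": "dog", "score": 0.10, "box": (10, 10, 20, 20)},
]
confident = filter_by_confidence(raw)
restored = scale_box(confident[0]["box"], (640, 640), (1280, 720))
```

Mapping boxes back matters because the model predicts coordinates in the resized input, not in the original frame.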