Inference - mariodiasbatista/object-detection GitHub Wiki

πŸ” What is Inference in YOLO?

Inference in YOLO (You Only Look Once) is the stage where a trained model makes predictions: it analyzes new images or video frames and detects the objects in them, typically fast enough for real-time use.

It's what happens after training: the model has already learned, and now it applies that knowledge to data it has not seen before.


🧠 How It Works

During inference, YOLO takes in an input (image or video frame) and:

  1. Divides the image into a grid.
  2. Predicts bounding boxes and confidence scores for each grid cell.
  3. Identifies the object class for each box.
  4. Applies non-max suppression (NMS) to remove overlapping boxes and retain the best predictions.
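Step 4 can be sketched in a few lines of plain Python. This is a minimal greedy NMS, not the code of any particular YOLO release: the `(x1, y1, x2, y2)` box format, the function names `iou` and `nms`, and the 0.5 IoU threshold are all assumptions chosen for illustration. Many implementations also run NMS separately per class, which this sketch skips.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical predictions: the first two boxes cover the same object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.75]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap anything and survives.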

🗂️ What Does It Output?

YOLO returns:

  • 🟩 Bounding Boxes: coordinates of detected objects.
  • 🏷️ Class Labels: e.g., "person", "car", "dog".
  • 📊 Confidence Scores: how sure the model is about its predictions.

This data can be used to draw boxes on the image or for further processing, such as counting objects or triggering events.
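As an illustration of that further processing, the sketch below counts detections per class and fires a simple event. The tuple layout `(x1, y1, x2, y2, label, confidence)`, the sample values, the helper name `count_by_class`, and the 0.25 confidence cut-off are all hypothetical; a real YOLO library will return its own result objects.

```python
from collections import Counter

# Hypothetical detections: (x1, y1, x2, y2, label, confidence)
detections = [
    (34, 50, 120, 300, "person", 0.91),
    (200, 80, 260, 310, "person", 0.47),
    (300, 150, 520, 290, "car", 0.88),
    (40, 320, 110, 380, "dog", 0.12),
]


def count_by_class(dets, min_confidence=0.25):
    """Count detections per label, ignoring low-confidence ones."""
    return Counter(label for *_, label, conf in dets if conf >= min_confidence)


counts = count_by_class(detections)

# Example trigger: react when at least two people are in the frame.
if counts["person"] >= 2:
    print("event: at least two people in frame")
```

The low-confidence "dog" detection is dropped before counting, which is why the confidence score matters for anything built on top of the raw output.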


⚙️ Inference Flow

INPUT (Image/Frame)
      ↓
Preprocessing (resize, normalize)
      ↓
Model Prediction (YOLO forward pass)
      ↓
Postprocessing (NMS, thresholding)
      ↓
OUTPUT (Detected objects with boxes, labels, and scores)
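The flow above can be mirrored as three small functions. This is only a sketch of the stages' shape, under loud assumptions: `stub_model` is a hypothetical stand-in that returns fixed predictions instead of running a network, the nearest-neighbour resize replaces whatever resizing a real pipeline uses, and NMS is left out of `postprocess` to keep the example short.

```python
def preprocess(image, size=2):
    """Nearest-neighbour resize to size x size, then scale pixels to [0, 1]."""
    h, w = len(image), len(image[0])
    return [
        [image[r * h // size][c * w // size] / 255.0 for c in range(size)]
        for r in range(size)
    ]


def stub_model(tensor):
    """Hypothetical stand-in for the YOLO forward pass.

    A real model would compute its predictions from `tensor`; this stub
    returns fixed values so the rest of the flow can be shown.
    """
    return [
        {"box": (10, 10, 60, 90), "label": "person", "score": 0.92},
        {"box": (55, 40, 95, 80), "label": "dog", "score": 0.08},
    ]


def postprocess(raw, conf_threshold=0.25):
    """Thresholding only: drop predictions below the confidence cut-off."""
    return [det for det in raw if det["score"] >= conf_threshold]


# INPUT: a tiny 4x4 grayscale "image" with pixel values 0..255.
image = [
    [0, 64, 128, 255],
    [255, 128, 64, 0],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
]

tensor = preprocess(image)               # Preprocessing
raw = stub_model(tensor)                 # Model prediction
detections = postprocess(raw)            # Postprocessing

# OUTPUT: boxes, labels, and scores that passed the threshold.
for det in detections:
    print(det["label"], det["score"], det["box"])
```

Keeping the stages as separate functions makes it easy to swap the stub for a real model call while leaving the pre- and postprocessing code unchanged.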