Training - mariodiasbatista/object-detection GitHub Wiki

🧠 What is YOLO Training?

YOLO is a real-time object detection model that learns to identify and locate objects in images or videos. Training YOLO means teaching it to recognize specific objects (like people, cars, or animals) based on labeled examples — images with known bounding boxes and classes.


🎯 Why Training Matters

A YOLO network with untrained weights has no idea what a dog or a traffic sign looks like. It has to learn from data:

  1. Images — a variety of visual examples of the objects.
  2. Annotations — the coordinates (bounding boxes) showing where objects are, and labels showing what they are.
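To make the annotation idea concrete, here is a minimal sketch that converts a pixel-space bounding box into the normalized `class x_center y_center width height` line used by the common YOLO text label format. The function name `to_yolo_label`, the image size, and the example box are hypothetical; check which label format your YOLO implementation expects before relying on this.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    All four output values are divided by the image size, so they fall in [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: class 0, one box in a 640x480 image.
label = to_yolo_label(0, (100, 200, 300, 400), img_w=640, img_h=480)
print(label)  # 0 0.312500 0.625000 0.312500 0.416667
```

Normalizing by the image size keeps the labels valid when images are resized for training.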

Without training:

  • A model with untrained weights produces no useful detections.
  • Pre-trained models detect only the classes they were trained on, so they may miss custom objects relevant to your use case (e.g., helmets, medical tools, defects in parts).

🛠️ What Happens During Training?

During training, YOLO:

  1. Loads batches of images and their labels.
  2. Makes predictions of where objects might be and what they are.
  3. Compares its predictions to the ground truth.
  4. Updates its weights via backpropagation to improve accuracy.

This process repeats for many epochs (full passes over the training set), gradually reducing the model’s prediction error.
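The four steps above can be sketched with a deliberately tiny stand-in model. This is not YOLO: a single weight and bias predict one box coordinate from one input feature, and the gradient is written out by hand instead of using backpropagation through a deep network. The data, learning rate, and batch size are made-up values chosen only to show the shape of the loop.

```python
# Toy data: (input feature, true box coordinate), generated from y = 0.5 * x + 0.2.
samples = [(0.1, 0.25), (0.2, 0.30), (0.3, 0.35), (0.4, 0.40)]

w, b = 0.0, 0.0      # the "weights" of the toy model
lr = 0.5             # learning rate (assumed value)
batch_size = 2
history = []         # summed batch loss per epoch

for epoch in range(200):
    epoch_loss = 0.0
    for start in range(0, len(samples), batch_size):
        # 1. Load a batch of inputs and labels.
        batch = samples[start:start + batch_size]
        # 2. Make predictions.
        preds = [w * x + b for x, _ in batch]
        # 3. Compare predictions to the ground truth (mean squared error).
        errors = [p - y for p, (_, y) in zip(preds, batch)]
        loss = sum(e * e for e in errors) / len(batch)
        # 4. Compute gradients of the loss and update the weights.
        grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
        grad_b = sum(2 * e for e in errors) / len(batch)
        w -= lr * grad_w
        b -= lr * grad_b
        epoch_loss += loss
    history.append(epoch_loss)

print(f"first epoch loss: {history[0]:.5f}, last epoch loss: {history[-1]:.8f}")
print(f"learned w={w:.3f}, b={b:.3f}")
```

A real YOLO training run follows the same load, predict, compare, update cycle, only with image batches, a deep network, and a framework that computes the gradients automatically.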


⚖️ Loss Function: The Learning Compass

YOLO uses a combined loss function. The exact terms vary between YOLO versions, but it typically considers:

  • Localization loss (how far predicted boxes are from the real ones)
  • Confidence loss (is there really an object there?)
  • Classification loss (did it guess the right class?)

This ensures the model learns not just where objects are but also what they are.
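As a rough illustration of how the three terms combine, the sketch below scores a single predicted box against its ground truth. It is a simplified assumption, not the loss of any particular YOLO release: real versions use different formulations (for example IoU-based box losses), and the weights `w_box`, `w_conf`, and `w_cls` as well as the dictionary layout are hypothetical.

```python
import math


def localization_loss(pred_box, true_box):
    """Sum of squared differences over (x_center, y_center, width, height)."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def confidence_loss(pred_conf, has_object):
    """Binary cross-entropy on the 'is there an object?' score."""
    target = 1.0 if has_object else 0.0
    p = min(max(pred_conf, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def classification_loss(class_probs, true_class):
    """Cross-entropy: negative log-probability given to the correct class."""
    return -math.log(max(class_probs[true_class], 1e-7))


def combined_loss(pred, target, w_box=5.0, w_conf=1.0, w_cls=1.0):
    """Weighted sum of the three terms (weights are illustrative assumptions)."""
    loss = w_conf * confidence_loss(pred["conf"], target["has_object"])
    if target["has_object"]:
        # Box and class terms only make sense where an object really exists.
        loss += w_box * localization_loss(pred["box"], target["box"])
        loss += w_cls * classification_loss(pred["probs"], target["cls"])
    return loss


target = {"has_object": True, "box": (0.52, 0.48, 0.22, 0.30), "cls": 0}
good = {"conf": 0.9, "box": (0.50, 0.50, 0.20, 0.30), "probs": [0.8, 0.1, 0.1]}
bad = {"conf": 0.3, "box": (0.30, 0.70, 0.40, 0.10), "probs": [0.2, 0.5, 0.3]}

good_loss = combined_loss(good, target)
bad_loss = combined_loss(bad, target)
print(f"good prediction loss: {good_loss:.4f}")
print(f"bad prediction loss:  {bad_loss:.4f}")
```

The prediction that is closer in position, more confident, and assigns higher probability to the right class gets the lower loss, which is the signal training uses to update the weights.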


🚀 Why It’s Important for Detection

Training is what turns YOLO into a smart detector. Here's why it’s so critical:

| Benefit | Explanation |
| --- | --- |
| 🎓 Learning Custom Classes | You can teach YOLO to detect unusual or domain-specific objects, provided you have labeled examples of them. |
| 🎯 Accuracy Boost | Training on your own data typically improves performance over generic pre-trained models. |
| 📦 Optimized for Your Data | You can adjust input sizes, anchors, and augmentations for your exact scenario. |
| 🔍 Generalization | A well-trained YOLO model can detect object instances it has never seen before, as long as they resemble the examples in the training set. |

🔄 Fine-tuning vs. Training from Scratch

  • Fine-tuning (recommended): Start from a pre-trained model and adapt it to your data. This is usually faster and needs less labeled data.
  • Training from scratch: Used for tasks that differ substantially from the pre-training data, but it needs a lot of data and compute power.
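The two options can look like this in code. This sketch assumes the Ultralytics `ultralytics` Python package, which this page does not confirm as the implementation in use. In that package, loading a `.pt` file starts from pre-trained weights, while loading a `.yaml` model definition builds the architecture with freshly initialized weights. The dataset file `data.yaml` and the epoch counts are hypothetical placeholders.

```python
def make_model(fine_tune=True):
    """Return a YOLO model, either pre-trained or freshly initialized."""
    # Imported lazily so the helpers below work without the package installed.
    from ultralytics import YOLO

    # "yolov8n.pt" loads pre-trained weights (fine-tuning);
    # "yolov8n.yaml" builds the same architecture without them (from scratch).
    return YOLO("yolov8n.pt" if fine_tune else "yolov8n.yaml")


def train_args(fine_tune=True):
    """Training settings; the values are illustrative assumptions."""
    return {
        "data": "data.yaml",                   # hypothetical dataset config
        "imgsz": 640,
        "epochs": 50 if fine_tune else 300,    # scratch runs usually need more
    }


def run(fine_tune=True):
    """Train with the chosen strategy (requires ultralytics and a dataset)."""
    model = make_model(fine_tune)
    return model.train(**train_args(fine_tune))
```

Calling `run(fine_tune=True)` would start a fine-tuning run; `run(fine_tune=False)` would train the same architecture from random weights.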

In short, training is what gives YOLO its "eyes": without it, detection is just a guess. With it, YOLO becomes a powerful tool for extracting structure and meaning from images in real time.