
πŸ› οΈ Module 9: Project & Applications

πŸ” Mini-Project: β€œDog vs. Person” Detector with YOLOv8


## πŸ“Œ 1. Project Definition

### 🎯 Goal

Build a lightweight, real-time object detector that can distinguish dogs from people using YOLOv8.

### πŸ’‘ Why YOLOv8?

- Pretrained backbones offer strong accuracy out of the box
- The β€œnano” variant (`yolov8n.pt`) runs efficiently on CPU or lightweight GPU setups
- Supports transfer learning with a clean API (`ultralytics`); a quick smoke test follows below
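
Before training anything, it helps to confirm that the `ultralytics` package is installed and the pretrained weights load. A minimal sketch, assuming `pip install ultralytics` has been run (the bus image is the standard Ultralytics sample):

```python
from ultralytics import YOLO

# Load the pretrained nano model; yolov8n.pt is downloaded on first use
model = YOLO("yolov8n.pt")

# One-off prediction on the standard Ultralytics sample image
results = model("https://ultralytics.com/images/bus.jpg")
results[0].show()  # display the detected boxes
```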

### 🧠 Approach

Use transfer learning:

- Freeze the majority of the YOLO backbone
- Fine-tune the detection head on your custom dataset

πŸ“ 2. Dataset Selection & Annotation

### πŸ“‚ Data Source

- 50 training + 16 validation images per class (dog, person)
- 6 test images held out for final evaluation

### πŸ”— Annotation Tool

Used MakeSense.ai to:

- Draw bounding boxes
- Export annotations in YOLO TXT format

πŸ–ΌοΈ Sample Image Annotation

Below is an example of how the annotation interface looked:

*(screenshot of the MakeSense.ai annotation interface)*

πŸ“ Example YOLO-Format Annotation

```
1 0.471743 0.335897 0.231190 0.145299
0 0.714493 0.629060 0.238896 0.588034
```

This file has two lines because two objects were annotated in the image.

| Column | Meaning | Notes |
|--------|---------|-------|
| 1st | `class_id` | Integer index of the object’s class: 0 β†’ person, 1 β†’ dog |
| 2nd | `x_center_norm` | x-coordinate of the box center, normalized to image width (0–1) |
| 3rd | `y_center_norm` | y-coordinate of the box center, normalized to image height (0–1) |
| 4th | `width_norm` | Width of the bounding box, normalized to image width (0–1) |
| 5th | `height_norm` | Height of the bounding box, normalized to image height (0–1) |

#### What That Means

- **Line 1:** `1 0.471743 0.335897 0.231190 0.145299`
  - Class = 1 β†’ dog
  - Center at (0.4717 Γ— W, 0.3359 Γ— H)
  - Box size = (0.2312 Γ— W, 0.1453 Γ— H)
- **Line 2:** `0 0.714493 0.629060 0.238896 0.588034`
  - Class = 0 β†’ person
  - Center at (0.7145 Γ— W, 0.6291 Γ— H)
  - Box size = (0.2389 Γ— W, 0.5880 Γ— H)

All coordinates are fractions of the full image dimensions, which makes the same .txt file usable regardless of the actual pixel size of each image.
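
As a concrete illustration, the sketch below converts one YOLO-format line back to pixel coordinates. The helper `yolo_to_pixels` is hypothetical, not part of the project code:

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO-format label line to a pixel-space box.

    Returns (class_id, x_min, y_min, x_max, y_max).
    Illustrative helper only; not part of the training script.
    """
    cls_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

# Example: the dog box above, on a hypothetical 640Γ—480 image
print(yolo_to_pixels("1 0.471743 0.335897 0.231190 0.145299", 640, 480))
# β‰ˆ (1, 227.9, 126.4, 375.9, 196.1)
```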

πŸ“ Directory Layout

```
β”œβ”€β”€ trainingset/    # 50 training images
β”œβ”€β”€ labels/         # 50 YOLO-format .txt files
β”œβ”€β”€ val/            # 16 validation images
β”œβ”€β”€ labels_val/     # 16 matching .txt files
β”œβ”€β”€ test/           # 6 final test images
└── data.yaml       # defines class names and folder paths
```
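
Before training, it is worth verifying that every image has a matching label file. A minimal sketch using the folder names above (the `.jpg` extension is an assumption; adjust to your actual files):

```python
import glob, os

# Check that each training image has a YOLO .txt label with the same stem
for img in glob.glob(os.path.join("trainingset", "*.jpg")):
    stem = os.path.splitext(os.path.basename(img))[0]
    label = os.path.join("labels", stem + ".txt")
    if not os.path.exists(label):
        print(f"⚠️ missing label for {img}")
```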

## πŸ§ͺ 3. Model Training, Evaluation & Optimization

### πŸ”§ Model Setup
- **Base model:** `yolov8n.pt` (nano)  
- **Training strategy:** freeze first 10 layers  
- **Classes:** `["person", "dog"]`  

### πŸ“Š Hyperparameters
- **Epochs:** `30`  
- **Batch size:** `8`  
- **Image size:** `640Γ—640`  
- **Optimizer / Loss:** YOLOv8 defaults  
- **Output weights saved to:**  
  `runs/freeze_backbone/weights/best.pt`  

### πŸ’» Full Code
```python
# train_freeze_yolo.py

import os, glob, shutil
from ultralytics import YOLO
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from PIL import Image

# ────────────────────────────────────────────────────────────────────────────
# 0) YOUR DIRECTORY STRUCTURE
# ────────────────────────────────────────────────────────────────────────────
# images-only folders:
TRAIN_IMG_DIR = "trainingset"   # i1.jpg…i50.jpg
VAL_IMG_DIR   = "val"           # v1.jpg…v16.jpg
TEST_DIR      = "test"          # d1.jpg…d6.jpg

# label–only folders (from makesense.ai):
TRAIN_LBL_DIR = "labels"        # i1.txt…i50.txt
VAL_LBL_DIR   = "labels_val"    # v1.txt…v16.txt

PDF_OUT       = "outputs.pdf"   # final multipage PDF

# class names must match the 0/1 indices in your .txt files:
#   0 β†’ person  
#   1 β†’ dog
CLASS_NAMES = ["person","dog"]
# ────────────────────────────────────────────────────────────────────────────

# 1) COPY LABELS INTO THE CORRECT IMAGE FOLDERS
os.makedirs(TRAIN_IMG_DIR, exist_ok=True)
os.makedirs(VAL_IMG_DIR,   exist_ok=True)

for src in glob.glob(os.path.join(TRAIN_LBL_DIR, "*.txt")):
    dst = os.path.join(TRAIN_IMG_DIR, os.path.basename(src))
    shutil.copy(src, dst)

for src in glob.glob(os.path.join(VAL_LBL_DIR, "*.txt")):
    dst = os.path.join(VAL_IMG_DIR, os.path.basename(src))
    shutil.copy(src, dst)

# 2) WRITE data.yaml FOR YOLOv8
data_yaml = f"""
train: {TRAIN_IMG_DIR}
val:   {VAL_IMG_DIR}
nc:    2
names: {CLASS_NAMES}
""".strip()
with open("data.yaml","w") as f:
    f.write(data_yaml)

# 3) TRAIN YOLOv8-nano WITH FREEZE
model = YOLO("yolov8n.pt")  # tiny pretrained backbone
model.train(
    data     = "data.yaml",
    epochs   = 30,
    imgsz    = 640,
    batch    = 8,
    project  = "runs",
    name     = "freeze_backbone",
    exist_ok = True,
    # freeze the first 10 backbone layers (an int freezes the first N
    # layers; a list of indices would freeze only those specific layers)
    freeze   = 10,
)

best_weights = os.path.join("runs","freeze_backbone","weights","best.pt")

# 4) BATCH-INFERENCE ON test/ β†’ outputs.pdf
detector = YOLO(best_weights)
with PdfPages(PDF_OUT) as pdf:
    for img_path in sorted(glob.glob(os.path.join(TEST_DIR, "*.*"))):
        if not img_path.lower().endswith((".jpg", ".jpeg", ".png")):
            continue

        res = detector(img_path)[0]  # detect

        # plot
        img = Image.open(img_path).convert("RGB")
        fig, ax = plt.subplots(figsize=(6,6))
        ax.imshow(img); ax.axis("off")

        # draw boxes+labels
        for box in res.boxes:
            x1,y1,x2,y2 = box.xyxy[0].cpu().numpy()
            cls_id      = int(box.cls[0].cpu().numpy())
            label       = CLASS_NAMES[cls_id]
            rect = plt.Rectangle((x1,y1), x2-x1, y2-y1,
                                 fill=False, edgecolor="red", linewidth=2)
            ax.add_patch(rect)
            ax.text(x1, y1-6, label,
                    color="white", backgroundcolor="red",
                    fontsize=10, weight="bold")

        # page title = comma-list of predicted classes (or β€œnone”)
        preds = sorted({ CLASS_NAMES[int(b.cls[0])] for b in res.boxes })
        title = ", ".join(preds) if preds else "none"
        ax.set_title(f"{os.path.basename(img_path)} β†’ {title}", fontsize=12)

        pdf.savefig(fig, bbox_inches="tight")
        plt.close(fig)

print("βœ… Done β€” results written to", PDF_OUT)

πŸ–ΌοΈ Visualization of Test Outputs

The YOLOv8 inference script produced a multi-page annotated PDF (`outputs.pdf`) showcasing predictions on the 6 held-out test images. Each page contains:

- βœ… The original test image
- πŸŸ₯ Red bounding boxes drawn around detected objects
- 🏷️ Class labels for each prediction (e.g. `dog`, `person`)
- πŸ“„ A page title summarizing the filename and detected classes

### πŸ“Œ Example Pages from `outputs.pdf`

| Filename | Predicted Classes | Visual Quality |
|----------|-------------------|----------------|
| d1.jpg | βœ… dog, βœ… person | Correct and clean |
| d2.jpg | βœ… dog, βœ… person | Correct and clean |
| d3.jpg | βœ… person | One object missing (dog not detected) |
| d4.jpg | βœ… dog, βœ… person | Accurate |
| d5.jpg | βœ… dog, βœ… person | Accurate |
| d6.jpg | βœ… dog, ⚠️ person (mislabeled) | One label appeared as β€œdog on” due to overlapping label rendering |

### πŸ“Έ Annotated Results from YOLOv8

Below are the fully annotated test images produced by our trained YOLOv8 detector, showing predicted classes, bounding boxes, and labels on the 6 held-out test examples:

*(six annotated test images, d1.jpg–d6.jpg)*

πŸ” Observations

- The detector performed well on most test cases, correctly detecting both classes.
- In one case (d3.jpg), it missed the dog.
- In another (d6.jpg), overlapping text rendered as β€œdog on” instead of β€œperson”; this is a text-overlay bug, not a detection failure (see the sketch below for one possible fix).
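
One possible mitigation is to anchor each label just inside the top-left corner of its own box with a solid background, so overlapping detections no longer merge their text. A hedged sketch of how the `ax.text` call in `train_freeze_yolo.py` could be adjusted (illustrative, not the project's actual fix):

```python
# Inside the per-box loop of train_freeze_yolo.py, replace the ax.text(...)
# call with a label drawn just inside the top-left corner of its box.
# The solid bbox keeps each label legible even when boxes overlap.
ax.text(
    x1 + 2, y1 + 12, label,
    color="white", fontsize=10, weight="bold",
    bbox=dict(facecolor="red", edgecolor="none", pad=1),
)
```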

### πŸ“‚ Output File

The full visualization is saved as:

`outputs.pdf`

## βœ… Summary

| Step | Details |
|------|---------|
| Model | YOLOv8n (nano), pretrained on COCO |
| Classes | dog, person |
| Training data | 50 training + 16 validation images per class |
| Test data | 6 images (final evaluation) |
| Annotation tool | MakeSense.ai |
| Output | Annotated predictions PDF (`outputs.pdf`) |
| Goal achieved | βœ… Real-time detector trained via transfer learning |
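
Since the stated goal is a real-time detector, the trained weights can also be pointed at a live camera feed. A minimal hedged sketch using the standard `ultralytics` streaming API (assuming a local webcam at device index 0):

```python
from ultralytics import YOLO

model = YOLO("runs/freeze_backbone/weights/best.pt")

# stream=True yields results frame by frame instead of buffering them all;
# show=True opens a live preview window with the drawn boxes.
for result in model.predict(source=0, stream=True, show=True):
    pass  # press q in the preview window to quit
```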