# Module 9: Project & Applications

## Mini-Project: "Dog vs. Person" Detector with YOLOv8

## 1. Project Definition

### Goal
Build a lightweight, real-time object detector that can distinguish dogs from people using YOLOv8.
### Why YOLOv8?

- Pretrained backbones offer strong accuracy out of the box
- The "nano" variant (`yolov8n.pt`) runs efficiently on CPU or lightweight GPU setups
- Supports transfer learning with a clean API (`ultralytics`)
### Approach
Use transfer learning:
- Freeze the majority of the YOLO backbone
- Fine-tune the detection head on your custom dataset
## 2. Dataset Selection & Annotation

### Data Source

- 50 training + 16 validation images per class (`dog`, `person`)
- 6 test images held out for final evaluation

### Annotation Tool

Used MakeSense.ai to:
- Draw bounding boxes
- Export annotations in YOLO TXT format

### Sample Image Annotation

Below is an example of how the annotation interface looked:

### Example YOLO-Format Annotation
```
1 0.471743 0.335897 0.231190 0.145299
0 0.714493 0.629060 0.238896 0.588034
```
This file has two lines because two objects were annotated in the image.
| Column | Meaning | Notes |
|---|---|---|
| 1st | `class_id` | Integer index of the object's class: 0 → person, 1 → dog |
| 2nd | `x_center_norm` | x-coordinate of the box center, normalized to image width (0–1) |
| 3rd | `y_center_norm` | y-coordinate of the box center, normalized to image height (0–1) |
| 4th | `width_norm` | Width of the bounding box, normalized to image width (0–1) |
| 5th | `height_norm` | Height of the bounding box, normalized to image height (0–1) |
**What That Means**

- Line 1: `1 0.471743 0.335897 0.231190 0.145299`
  - Class = `1` → dog
  - Center at (0.4717 × W, 0.3359 × H)
  - Box size = (0.2312 × W, 0.1453 × H)
- Line 2: `0 0.714493 0.629060 0.238896 0.588034`
  - Class = `0` → person
  - Center at (0.7145 × W, 0.6291 × H)
  - Box size = (0.2389 × W, 0.5880 × H)
All coordinates are fractions of the full image dimensions, which makes the same `.txt` file usable regardless of the actual pixel size of each image.
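To make the denormalization concrete, here is a small hypothetical helper (not part of the project code) that converts one YOLO-format line back into pixel-space corner coordinates, assuming a 640×480 image for the example:

```python
# Hypothetical helper, for illustration only: map a normalized YOLO
# annotation line to absolute pixel corner coordinates.
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    cls_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w   # denormalize x / width
    yc, h = float(yc) * img_h, float(h) * img_h   # denormalize y / height
    x1, y1 = xc - w / 2, yc - h / 2               # top-left corner
    x2, y2 = xc + w / 2, yc + h / 2               # bottom-right corner
    return int(cls_id), (x1, y1, x2, y2)

# The dog box from the example above, on a hypothetical 640x480 image:
print(yolo_to_pixels("1 0.471743 0.335897 0.231190 0.145299", 640, 480))
# -> (1, (227.9, 126.4, 375.9, 196.1)) approximately
```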
### Directory Layout

```
├── trainingset/   # 50 training images
├── labels/        # 50 YOLO-format .txt files
├── val/           # 16 validation images
├── labels_val/    # 16 matching .txt files
├── test/          # 6 final test images
└── data.yaml      # defines class names and folder paths
```
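The training script in Section 3 writes `data.yaml` automatically; given the constants used there, its contents should end up roughly as follows:

```yaml
# data.yaml (generated by the training script below)
train: trainingset          # training images + copied label files
val: val                    # validation images + copied label files
nc: 2                       # number of classes
names: ['person', 'dog']    # index 0 -> person, index 1 -> dog
```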
## 3. Model Training, Evaluation & Optimization

### Model Setup
- **Base model:** `yolov8n.pt` (nano)
- **Training strategy:** freeze first 10 layers
- **Classes:** `["person", "dog"]`
### Hyperparameters
- **Epochs:** `30`
- **Batch size:** `8`
- **Image size:** `640×640`
- **Optimizer / Loss:** YOLOv8 defaults
- **Output weights saved to:**
`runs/freeze_backbone/weights/best.pt`
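As a side note, the same run can likely be launched from the Ultralytics command line instead of the Python API; a sketch using the hyperparameters above (it assumes `data.yaml` already exists):

```bash
# Equivalent CLI invocation (sketch)
yolo detect train model=yolov8n.pt data=data.yaml \
    epochs=30 imgsz=640 batch=8 \
    project=runs name=freeze_backbone exist_ok=True freeze=10
```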
### Full Code
```python
# train_freeze_yolo.py
import os, glob, shutil
from ultralytics import YOLO
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from PIL import Image
# ----------------------------------------------------------------------------
# 0) YOUR DIRECTORY STRUCTURE
# ----------------------------------------------------------------------------
# images-only folders:
TRAIN_IMG_DIR = "trainingset"   # i1.jpg…i50.jpg
VAL_IMG_DIR   = "val"           # v1.jpg…v16.jpg
TEST_DIR      = "test"          # d1.jpg…d6.jpg

# label-only folders (from makesense.ai):
TRAIN_LBL_DIR = "labels"        # i1.txt…i50.txt
VAL_LBL_DIR   = "labels_val"    # v1.txt…v16.txt

PDF_OUT = "outputs.pdf"         # final multi-page PDF

# class names must match the 0/1 indices in your .txt files:
#   0 -> person
#   1 -> dog
CLASS_NAMES = ["person","dog"]
# ----------------------------------------------------------------------------
# 1) COPY LABELS INTO THE CORRECT IMAGE FOLDERS
os.makedirs(TRAIN_IMG_DIR, exist_ok=True)
os.makedirs(VAL_IMG_DIR, exist_ok=True)
for src in glob.glob(os.path.join(TRAIN_LBL_DIR, "*.txt")):
dst = os.path.join(TRAIN_IMG_DIR, os.path.basename(src))
shutil.copy(src, dst)
for src in glob.glob(os.path.join(VAL_LBL_DIR, "*.txt")):
dst = os.path.join(VAL_IMG_DIR, os.path.basename(src))
shutil.copy(src, dst)
# 2) WRITE data.yaml FOR YOLOv8
data_yaml = f"""
train: {TRAIN_IMG_DIR}
val: {VAL_IMG_DIR}
nc: 2
names: {CLASS_NAMES}
""".strip()
with open("data.yaml","w") as f:
f.write(data_yaml)
# 3) TRAIN YOLOv8-nano WITH FREEZE
model = YOLO("yolov8n.pt") # tiny pretrained backbone
model.train(
data = "data.yaml",
epochs = 30,
imgsz = 640,
batch = 8,
project = "runs",
name = "freeze_backbone",
exist_ok = True,
# freeze the first 10 layers of the backbone
    freeze   = 10,   # int freezes the first 10 layers; a list would freeze only those indices
)
best_weights = os.path.join("runs","freeze_backbone","weights","best.pt")
# 4) BATCH-INFERENCE ON test/ -> outputs.pdf
detector = YOLO(best_weights)
with PdfPages(PDF_OUT) as pdf:
for img_path in sorted(glob.glob(os.path.join(TEST_DIR, "*.*"))):
        if not img_path.lower().endswith((".jpg", ".jpeg", ".png")):
continue
res = detector(img_path)[0] # detect
# plot
img = Image.open(img_path).convert("RGB")
fig, ax = plt.subplots(figsize=(6,6))
ax.imshow(img); ax.axis("off")
# draw boxes+labels
for box in res.boxes:
x1,y1,x2,y2 = box.xyxy[0].cpu().numpy()
cls_id = int(box.cls[0].cpu().numpy())
label = CLASS_NAMES[cls_id]
rect = plt.Rectangle((x1,y1), x2-x1, y2-y1,
fill=False, edgecolor="red", linewidth=2)
ax.add_patch(rect)
ax.text(x1, y1-6, label,
color="white", backgroundcolor="red",
fontsize=10, weight="bold")
# page title = comma-list of predicted classes (or βnoneβ)
preds = sorted({ CLASS_NAMES[int(b.cls[0])] for b in res.boxes })
title = ", ".join(preds) if preds else "none"
ax.set_title(f"{os.path.basename(img_path)} β {title}", fontsize=12)
pdf.savefig(fig, bbox_inches="tight")
plt.close(fig)
print("β
Done β results written to", PDF_OUT)
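The script trains and visualizes but never prints validation metrics. A minimal follow-up sketch, assuming the same `data.yaml` and weights path as above, that reports mAP on the validation split:

```python
# evaluate_best.py - minimal evaluation sketch (assumes paths from the script above)
from ultralytics import YOLO

model = YOLO("runs/freeze_backbone/weights/best.pt")
metrics = model.val(data="data.yaml")   # evaluates on the `val:` split

print(f"mAP@0.50      : {metrics.box.map50:.3f}")
print(f"mAP@0.50:0.95 : {metrics.box.map:.3f}")
```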
### Visualization of Test Outputs

The YOLOv8 inference script produced a multi-page annotated PDF (`outputs.pdf`) showcasing predictions on the 6 held-out test images. Each page contains:

- The original test image
- Red bounding boxes drawn around detected objects
- Class labels for the predictions (e.g., `dog`, `person`)
- A page title summarizing the filename and detected classes

### Example Pages from `outputs.pdf`

| Filename | Predicted Classes | Visual Quality |
|---|---|---|
| `d1.jpg` | dog, person | Correct and clean |
| `d2.jpg` | dog, person | Correct and clean |
| `d3.jpg` | person | One object missed (dog not detected) |
| `d4.jpg` | dog, person | Accurate |
| `d5.jpg` | dog, person | Accurate |
| `d6.jpg` | dog, person (mislabeled) | One label appeared as "dog on" due to overlapping label rendering |
### Annotated Results from YOLOv8
Below are the fully annotated test images produced by our trained YOLOv8 detector, showing predicted classes, bounding boxes, and labels on the 6 held-out test examples:
### Observations

- The detector performed well on most test cases, correctly detecting both classes.
- In one case (`d3.jpg`), it missed the dog (a quick threshold experiment is sketched below).
- In another (`d6.jpg`), overlapping label rendering produced the text `"dog on"` instead of `"person"`; this is a text-overlay bug, not a detection failure.
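For the missed dog in `d3.jpg`, one quick experiment is to lower the confidence threshold at inference time via the standard Ultralytics `conf` argument; a sketch, assuming the weights path used above:

```python
# Re-run inference on d3.jpg with a lower confidence threshold.
# The default conf is 0.25; lowering it may recover weak detections,
# at the cost of more false positives.
from ultralytics import YOLO

detector = YOLO("runs/freeze_backbone/weights/best.pt")
res = detector("test/d3.jpg", conf=0.15)[0]
for box in res.boxes:
    print(int(box.cls[0]), round(float(box.conf[0]), 3), box.xyxy[0].tolist())
```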
### Output File

Your full visualization is saved as `outputs.pdf`.
## Summary

| Step | Details |
|---|---|
| Model | YOLOv8n (nano) pretrained on COCO |
| Classes | dog, person |
| Training data | 50 training + 16 validation images per class |
| Test data | 6 images (final evaluation) |
| Annotation tool | MakeSense.ai |
| Output | Annotated predictions PDF (`outputs.pdf`) |
| Goal achieved | Real-time detector trained via transfer learning |