# Module 9: Project & Applications

## Mini-Project: "Dog vs. Person" Detector with YOLOv8

## 1. Project Definition

### Goal
Build a lightweight, real-time object detector that can distinguish dogs from people using YOLOv8.
### Why YOLOv8?

- Pretrained backbones offer strong accuracy out of the box
- The "nano" variant (`yolov8n.pt`) runs efficiently on CPU or lightweight GPU setups
- Supports transfer learning with a clean API (`ultralytics`)
### Approach
Use transfer learning:
- Freeze the majority of the YOLO backbone
- Fine-tune the detection head on your custom dataset
## 2. Dataset Selection & Annotation

### Data Source

- 50 training + 16 validation images per class (`dog`, `person`)
- 6 test images held out for final evaluation

### Annotation Tool

Used MakeSense.ai to:
- Draw bounding boxes
- Export annotations in YOLO TXT format

### Sample Image Annotation

Below is an example of how the annotation interface looked:

### Example YOLO-Format Annotation
```
1 0.471743 0.335897 0.231190 0.145299
0 0.714493 0.629060 0.238896 0.588034
```
This file has two lines because two objects were annotated in the image.
| Column | Meaning | Notes |
|---|---|---|
| 1st | `class_id` | Integer index of the object's class: 0 → person, 1 → dog |
| 2nd | `x_center_norm` | x-coordinate of the box center, normalized to image width (0–1) |
| 3rd | `y_center_norm` | y-coordinate of the box center, normalized to image height (0–1) |
| 4th | `width_norm` | Width of the bounding box, normalized to image width (0–1) |
| 5th | `height_norm` | Height of the bounding box, normalized to image height (0–1) |
**What That Means**

- Line 1: `1 0.471743 0.335897 0.231190 0.145299`
  - Class = `1` → dog
  - Center at (0.4717 × W, 0.3359 × H)
  - Box size = (0.2312 × W, 0.1453 × H)
- Line 2: `0 0.714493 0.629060 0.238896 0.588034`
  - Class = `0` → person
  - Center at (0.7145 × W, 0.6291 × H)
  - Box size = (0.2389 × W, 0.5880 × H)
All coordinates are fractions of the full image dimensions, which makes the same `.txt` file usable regardless of the actual pixel size of each image.
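To make the denormalization concrete, here is a small hypothetical helper (not part of the project code) that converts one YOLO-format line back into pixel-space corner coordinates, assuming a 640×480 image for the example:

```python
# Hypothetical helper, for illustration only: map a normalized YOLO
# annotation line to absolute pixel corner coordinates.
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    cls_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w   # denormalize x / width
    yc, h = float(yc) * img_h, float(h) * img_h   # denormalize y / height
    x1, y1 = xc - w / 2, yc - h / 2               # top-left corner
    x2, y2 = xc + w / 2, yc + h / 2               # bottom-right corner
    return int(cls_id), (x1, y1, x2, y2)

# The dog box from the example above, on a hypothetical 640x480 image:
print(yolo_to_pixels("1 0.471743 0.335897 0.231190 0.145299", 640, 480))
# -> (1, (227.9, 126.4, 375.9, 196.1)) approximately
```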
### Directory Layout

```
├── trainingset/   # 50 training images
├── labels/        # 50 YOLO-format .txt files
├── val/           # 16 validation images
├── labels_val/    # 16 matching .txt files
├── test/          # 6 final test images
└── data.yaml      # defines class names and folder paths
```
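The training script in Section 3 writes `data.yaml` automatically; given the constants used there, its contents should end up roughly as follows:

```yaml
# data.yaml (generated by the training script below)
train: trainingset          # training images + copied label files
val: val                    # validation images + copied label files
nc: 2                       # number of classes
names: ['person', 'dog']    # index 0 -> person, index 1 -> dog
```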
## 3. Model Training, Evaluation & Optimization

### Model Setup
- **Base model:** `yolov8n.pt` (nano)
- **Training strategy:** freeze first 10 layers
- **Classes:** `["person", "dog"]`
### Hyperparameters
- **Epochs:** `30`
- **Batch size:** `8`
- **Image size:** `640×640`
- **Optimizer / Loss:** YOLOv8 defaults
- **Output weights saved to:**
`runs/freeze_backbone/weights/best.pt`
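As a side note, the same run can likely be launched from the Ultralytics command line instead of the Python API; a sketch using the hyperparameters above (it assumes `data.yaml` already exists):

```bash
# Equivalent CLI invocation (sketch)
yolo detect train model=yolov8n.pt data=data.yaml \
    epochs=30 imgsz=640 batch=8 \
    project=runs name=freeze_backbone exist_ok=True freeze=10
```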
### Full Code
```python
# train_freeze_yolo.py
import os, glob, shutil
from ultralytics import YOLO
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from PIL import Image
# ----------------------------------------------------------------------------
# 0) YOUR DIRECTORY STRUCTURE
# ----------------------------------------------------------------------------
# images-only folders:
TRAIN_IMG_DIR = "trainingset"   # i1.jpg…i50.jpg
VAL_IMG_DIR   = "val"           # v1.jpg…v16.jpg
TEST_DIR      = "test"          # d1.jpg…d6.jpg

# label-only folders (from makesense.ai):
TRAIN_LBL_DIR = "labels"        # i1.txt…i50.txt
VAL_LBL_DIR   = "labels_val"    # v1.txt…v16.txt

PDF_OUT = "outputs.pdf"         # final multi-page PDF

# class names must match the 0/1 indices in your .txt files:
#   0 -> person
#   1 -> dog
CLASS_NAMES = ["person","dog"]
# ----------------------------------------------------------------------------
# 1) COPY LABELS INTO THE CORRECT IMAGE FOLDERS
os.makedirs(TRAIN_IMG_DIR, exist_ok=True)
os.makedirs(VAL_IMG_DIR, exist_ok=True)
for src in glob.glob(os.path.join(TRAIN_LBL_DIR, "*.txt")):
dst = os.path.join(TRAIN_IMG_DIR, os.path.basename(src))
shutil.copy(src, dst)
for src in glob.glob(os.path.join(VAL_LBL_DIR, "*.txt")):
dst = os.path.join(VAL_IMG_DIR, os.path.basename(src))
shutil.copy(src, dst)
# 2) WRITE data.yaml FOR YOLOv8
data_yaml = f"""
train: {TRAIN_IMG_DIR}
val: {VAL_IMG_DIR}
nc: 2
names: {CLASS_NAMES}
""".strip()
with open("data.yaml","w") as f:
f.write(data_yaml)
# 3) TRAIN YOLOv8-nano WITH FREEZE
model = YOLO("yolov8n.pt") # tiny pretrained backbone
model.train(
data = "data.yaml",
epochs = 30,
imgsz = 640,
batch = 8,
project = "runs",
name = "freeze_backbone",
exist_ok = True,
# freeze the first 10 layers of the backbone
    freeze   = 10,   # int freezes the first 10 layers; a list would freeze only those indices
)
best_weights = os.path.join("runs","freeze_backbone","weights","best.pt")
# 4) BATCH-INFERENCE ON test/ -> outputs.pdf
detector = YOLO(best_weights)
with PdfPages(PDF_OUT) as pdf:
for img_path in sorted(glob.glob(os.path.join(TEST_DIR, "*.*"))):
        if not img_path.lower().endswith((".jpg", ".jpeg", ".png")):
continue
res = detector(img_path)[0] # detect
# plot
img = Image.open(img_path).convert("RGB")
fig, ax = plt.subplots(figsize=(6,6))
ax.imshow(img); ax.axis("off")
# draw boxes+labels
for box in res.boxes:
x1,y1,x2,y2 = box.xyxy[0].cpu().numpy()
cls_id = int(box.cls[0].cpu().numpy())
label = CLASS_NAMES[cls_id]
rect = plt.Rectangle((x1,y1), x2-x1, y2-y1,
fill=False, edgecolor="red", linewidth=2)
ax.add_patch(rect)
ax.text(x1, y1-6, label,
color="white", backgroundcolor="red",
fontsize=10, weight="bold")
# page title = comma-list of predicted classes (or βnoneβ)
preds = sorted({ CLASS_NAMES[int(b.cls[0])] for b in res.boxes })
title = ", ".join(preds) if preds else "none"
ax.set_title(f"{os.path.basename(img_path)} β {title}", fontsize=12)
pdf.savefig(fig, bbox_inches="tight")
plt.close(fig)
print("β
Done β results written to", PDF_OUT)
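The script trains and visualizes but never prints validation metrics. A minimal follow-up sketch, assuming the same `data.yaml` and weights path as above, that reports mAP on the validation split:

```python
# evaluate_best.py - minimal evaluation sketch (assumes paths from the script above)
from ultralytics import YOLO

model = YOLO("runs/freeze_backbone/weights/best.pt")
metrics = model.val(data="data.yaml")   # evaluates on the `val:` split

print(f"mAP@0.50      : {metrics.box.map50:.3f}")
print(f"mAP@0.50:0.95 : {metrics.box.map:.3f}")
```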
### Visualization of Test Outputs

The YOLOv8 inference script produced a multi-page annotated PDF (`outputs.pdf`) showcasing predictions on the 6 held-out test images. Each page contains:

- The original test image
- Red bounding boxes drawn around detected objects
- Class labels for the predictions (e.g., `dog`, `person`)
- A page title summarizing the filename and detected classes

### Example Pages from `outputs.pdf`

| Filename | Predicted Classes | Visual Quality |
|---|---|---|
| `d1.jpg` | dog, person | Correct and clean |
| `d2.jpg` | dog, person | Correct and clean |
| `d3.jpg` | person | One object missed (dog not detected) |
| `d4.jpg` | dog, person | Accurate |
| `d5.jpg` | dog, person | Accurate |
| `d6.jpg` | dog, person (mislabeled) | One label appeared as "dog on" due to overlapping label rendering |
### Annotated Results from YOLOv8
Below are the fully annotated test images produced by our trained YOLOv8 detector, showing predicted classes, bounding boxes, and labels on the 6 held-out test examples:
### Observations

- The detector performed well on most test cases, correctly detecting both classes.
- In one case (`d3.jpg`), it missed the dog (a quick threshold experiment is sketched below).
- In another (`d6.jpg`), overlapping label rendering produced the text `"dog on"` instead of `"person"`; this is a text-overlay bug, not a detection failure.
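For the missed dog in `d3.jpg`, one quick experiment is to lower the confidence threshold at inference time via the standard Ultralytics `conf` argument; a sketch, assuming the weights path used above:

```python
# Re-run inference on d3.jpg with a lower confidence threshold.
# The default conf is 0.25; lowering it may recover weak detections,
# at the cost of more false positives.
from ultralytics import YOLO

detector = YOLO("runs/freeze_backbone/weights/best.pt")
res = detector("test/d3.jpg", conf=0.15)[0]
for box in res.boxes:
    print(int(box.cls[0]), round(float(box.conf[0]), 3), box.xyxy[0].tolist())
```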
### Output File

Your full visualization is saved as `outputs.pdf`.
## Summary

| Step | Details |
|---|---|
| Model | YOLOv8n (nano) pretrained on COCO |
| Classes | dog, person |
| Training data | 50 training + 16 validation images per class |
| Test data | 6 images (final evaluation) |
| Annotation tool | MakeSense.ai |
| Output | Annotated predictions PDF (`outputs.pdf`) |
| Goal achieved | Real-time detector trained via transfer learning |