SSD - iffatAGheyas/computer-vision-handbook GitHub Wiki
🚀 SSD-MobileNet v2 — Object Detection with TensorFlow Hub
Single Shot MultiBox Detector (SSD) is a deep-learning model designed for fast, single-pass object detection. Paired with a lightweight backbone such as MobileNet v2, it can run in real time on modest hardware, often including CPU-only systems.
🧠 What Is SSD?
SSD predicts what objects are present and where they are (bounding boxes) in a single forward pass. This makes it highly efficient and suitable for real-time applications.
⚙️ How SSD Works
- Feature Extraction
  A CNN backbone (e.g., MobileNet v2) processes the input image to produce multiple feature maps at different resolutions.
- Box & Class Predictions
  Each cell in each feature map predicts:
  - One or more bounding boxes
  - Corresponding class probability scores
- Non-Maximum Suppression (NMS)
  Overlapping detections are filtered so that only the highest-confidence box remains for each object.
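To make the multi-scale prediction step more concrete, the sketch below counts how many candidate boxes a detector emits before NMS. The grid sizes and boxes-per-cell values are an assumption taken from the original SSD300 configuration (VGG-16 backbone, 300×300 input). The TF-Hub MobileNet v2 model may use different values, so treat the numbers as illustrative only.

```python
# (grid size, default boxes per cell) for each prediction layer.
# Assumption: values from the original SSD300 configuration,
# not read from the TF-Hub SSD-MobileNet v2 model.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(feature_maps):
    """Total candidate boxes: grid cells times boxes per cell, summed over layers."""
    return sum(size * size * per_cell for size, per_cell in feature_maps)


total = count_default_boxes(FEATURE_MAPS)
print(total)  # 8732
```

Most of these candidates receive low scores and are discarded, which is why a filtering step is needed afterwards.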
✅ SSD is lightweight and performs well even on CPUs.
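The NMS step can be sketched in a few lines of NumPy. This is a minimal, class-agnostic greedy version for illustration. The function names `iou` and `nms` and the 0.5 threshold are assumptions, and the TF-Hub model already applies its own suppression internally, so this code is not needed to run the example later on this page.

```python
import numpy as np


def iou(box, others):
    """IoU between one box and an array of boxes, all as [ymin, xmin, ymax, xmax]."""
    ymin = np.maximum(box[0], others[:, 0])
    xmin = np.maximum(box[1], others[:, 1])
    ymax = np.minimum(box[2], others[:, 2])
    xmax = np.minimum(box[3], others[:, 3])
    # Intersection is zero when the boxes do not overlap
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


boxes = np.array([
    [0.10, 0.10, 0.50, 0.50],
    [0.12, 0.12, 0.52, 0.52],  # near-duplicate of the first box
    [0.60, 0.60, 0.90, 0.90],
])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

Production detectors usually run NMS per class, so overlapping objects of different classes are not suppressed against each other.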
🛠️ Run SSD-MobileNet v2 Using TensorFlow Hub
Step 1: Install Required Packages
pip install --quiet tensorflow tensorflow-hub matplotlib opencv-python
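Before running the full example, it can help to confirm that the packages are importable. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper name, not part of any of the installed packages.

```python
import importlib.util

# Import names differ from pip package names:
# tensorflow-hub -> tensorflow_hub, opencv-python -> cv2.
REQUIRED_MODULES = ["tensorflow", "tensorflow_hub", "matplotlib", "cv2"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", missing or "none")
```

If any name is listed as missing, re-run the install command in the same environment that runs your script.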
Step 2: Full Example Code
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import cv2
import matplotlib.pyplot as plt
import os
# COCO label map (1-based indices)
COCO_LABELS = {
    1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane',
    6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light',
    11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter', 15: 'bench',
    16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep',
    21: 'cow', 22: 'elephant', 23: 'bear', 24: 'zebra', 25: 'giraffe',
    27: 'backpack', 28: 'umbrella', 31: 'handbag', 32: 'tie', 33: 'suitcase',
    34: 'frisbee', 35: 'skis', 36: 'snowboard', 37: 'sports ball',
    38: 'kite', 39: 'baseball bat', 40: 'baseball glove', 41: 'skateboard',
    42: 'surfboard', 43: 'tennis racket', 44: 'bottle', 46: 'wine glass',
    47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl',
    52: 'banana', 53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli',
    57: 'carrot', 58: 'hot dog', 59: 'pizza', 60: 'donut', 61: 'cake',
    62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed', 67: 'dining table',
    70: 'toilet', 72: 'tv', 73: 'laptop', 74: 'mouse', 75: 'remote',
    76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven',
    80: 'toaster', 81: 'sink', 82: 'refrigerator', 84: 'book',
    85: 'clock', 86: 'vase', 87: 'scissors', 88: 'teddy bear',
    89: 'hair drier', 90: 'toothbrush'
}
# 1) Load the model
print("Loading SSD-MobileNet v2 from TF-Hub…")
model = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")
detector = model.signatures['serving_default']
# 2) Read & prep the image
image_path = "frame1.png" # ← replace with your file
if not os.path.isfile(image_path):
    raise FileNotFoundError(f"File not found: {image_path}")
img_bytes = tf.io.read_file(image_path)
# expand_animations=False keeps the result 3-D even if the file is a GIF
orig = tf.image.decode_image(img_bytes, channels=3, expand_animations=False) # uint8 [H,W,3]
orig = tf.image.convert_image_dtype(orig, tf.uint8) # ensure uint8
h, w = orig.shape[0], orig.shape[1]
# add batch dimension: [1, H, W, 3]
input_tensor = orig[tf.newaxis, ...]
# 3) Run inference
print("Running detection…")
outputs = detector(input_tensor)
# 3a) See what keys we actually got
print("Output keys:", list(outputs.keys()))
# 4) Parse outputs
num = int(outputs['num_detections'][0].numpy())
boxes = outputs['detection_boxes'][0].numpy()[:num] # [ymin, xmin, ymax, xmax]
scores = outputs['detection_scores'][0].numpy()[:num]
class_ids = outputs['detection_classes'][0].numpy().astype(int)[:num]
# 5) Draw boxes & class names on an OpenCV image
img_bgr = cv2.cvtColor(orig.numpy(), cv2.COLOR_RGB2BGR)
box_color = (255, 0, 0) # blue in BGR
text_color= (255, 0, 0)
for box, score, cid in zip(boxes, scores, class_ids):
    if score < 0.3: # confidence threshold
        continue
    cls_name = COCO_LABELS.get(cid, f'ID {cid}')
    ymin, xmin, ymax, xmax = box
    pt1 = (int(xmin * w), int(ymin * h))
    pt2 = (int(xmax * w), int(ymax * h))
    # draw box & label
    cv2.rectangle(img_bgr, pt1, pt2, box_color, 2)
    cv2.putText(
        img_bgr,
        f"{cls_name}: {score:.2f}",
        (pt1[0], pt1[1] - 10),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.5,
        text_color,
        2
    )
# 6) Display with matplotlib
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 8))
plt.imshow(img_rgb)
plt.axis("off")
plt.title("SSD-MobileNet v2 Detection (class names, blue boxes)")
plt.show()
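Steps 4 and 5 of the example (thresholding and scaling normalised boxes to pixels) can also be done in one vectorised helper, which is convenient when the results feed something other than a drawing loop. This is a sketch under assumptions: `select_detections` is a hypothetical name, the 0.3 default mirrors the threshold used above, and boxes are assumed to arrive as an `[N, 4]` array of `[ymin, xmin, ymax, xmax]` values in the range 0 to 1.

```python
import numpy as np


def select_detections(boxes, scores, class_ids, width, height, min_score=0.3):
    """Keep detections scoring at least min_score and scale boxes to pixels.

    boxes are normalised [ymin, xmin, ymax, xmax]; the returned rows are
    integer [xmin, ymin, xmax, ymax], clipped to the image bounds.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores)
    class_ids = np.asarray(class_ids)
    keep = scores >= min_score
    ymin, xmin, ymax, xmax = boxes[keep].T
    pixel = np.stack(
        [xmin * width, ymin * height, xmax * width, ymax * height], axis=1
    )
    # Clip so that every coordinate is a valid pixel index
    pixel = np.clip(pixel, 0, [width - 1, height - 1, width - 1, height - 1])
    return pixel.astype(int), scores[keep], class_ids[keep]
```

With the variables from the example, `select_detections(boxes, scores, class_ids, w, h)` would return the pixel boxes, scores and class IDs of the detections that the drawing loop keeps.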
👀 What You’ll See
- Blue boxes around detected objects
- Class names like person, traffic light, bird, etc.
- Confidence scores (e.g., person: 0.92, dog: 0.84)
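When a text summary is more useful than an annotated image (for example in logs), the detections can be counted per class. A minimal sketch, assuming class names have already been looked up in the label map; `summarize` is a hypothetical helper and the sample values are made up for illustration.

```python
from collections import Counter


def summarize(class_names, scores, min_score=0.3):
    """Count detections per class name, ignoring scores below min_score."""
    kept = [name for name, score in zip(class_names, scores) if score >= min_score]
    return Counter(kept)


# Illustrative values, not real model output
counts = summarize(["person", "dog", "person", "bird"], [0.92, 0.84, 0.45, 0.10])
for name, n in counts.most_common():
    print(f"{name}: {n}")
```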
🖼️ Example Output
✅ Summary Table
Feature | Description |
---|---|
📦 Model | SSD-MobileNet v2 |
🧠 Framework | TensorFlow Hub (auto-downloads weights) |
🎯 Accuracy | Moderate (good for fast use cases) |
⚡ Speed | ✅ Real-time capable (roughly 20–30 FPS, depending on hardware and input size) |
🔍 Classes | COCO dataset (80 classes with IDs from 1 to 90: people, animals, objects) |
🔧 Use Case | Real-time detection, mobile apps, robotics |
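Throughput depends heavily on hardware and image size, so it may be worth measuring on your own machine rather than relying on a quoted figure. The timing sketch below uses only the standard library; `measure_fps` is a hypothetical helper, not part of TensorFlow, and the warm-up and run counts are arbitrary defaults.

```python
import time


def measure_fps(run_once, warmup=2, runs=10):
    """Return the average number of run_once() calls per second over `runs` calls."""
    for _ in range(warmup):
        run_once()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return runs / elapsed


# Possible usage with the detector from the example above (not executed here):
# fps = measure_fps(lambda: detector(input_tensor)['num_detections'].numpy())
# print(f"{fps:.1f} FPS")
```

Calling `.numpy()` on an output forces the result to be materialised, so the measurement covers the whole inference call.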