SSD - iffatAGheyas/computer-vision-handbook GitHub Wiki

🚀 SSD-MobileNet v2 — Object Detection with TensorFlow Hub

Single Shot MultiBox Detector (SSD) is a deep-learning model designed for fast, single-pass object detection. Paired with a lightweight backbone such as MobileNet v2, it can reach real-time or near-real-time speeds even on CPU-only systems, depending on the hardware and image size.


🧠 What Is SSD?

SSD predicts which objects are present and where they are (bounding boxes) in a single forward pass of the network, without the separate region-proposal stage used by two-stage detectors such as Faster R-CNN. This makes it efficient and well suited to real-time applications.


⚙️ How SSD Works

  1. Feature Extraction
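    A CNN backbone (e.g., MobileNet v2) processes the input image, and SSD reads feature maps from several layers (the backbone plus extra layers added on top) at different resolutions. Fine, high-resolution maps help with small objects; coarse, low-resolution maps cover large ones.

  2. Box & Class Predictions
    Each cell in each feature map has a fixed set of default (anchor) boxes with different scales and aspect ratios. For every default box, the network predicts:

    • Offsets that adjust the default box into a bounding box
    • Class confidence scores (one per class, plus a background class)

  3. Non-Maximum Suppression (NMS)
    Overlapping detections are filtered: among boxes of the same class that overlap heavily (high IoU), only the highest-confidence one is kept, so each object ideally ends up with a single box.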
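Steps 1 and 2 can be sketched in NumPy to show how many predictions a single pass produces. This is a minimal illustration, not the TF Hub model's code: it assumes the SSD300 layout from the original SSD paper (six square feature maps of 38, 19, 10, 5, 3 and 1 cells with 4 or 6 default boxes per cell), linear scales from 0.2 to 0.9, and the (0.1, 0.2) offset "variances" common in SSD implementations. The MobileNet v2 model used below has its own feature-map sizes and settings, and the helper names `default_boxes` and `decode` are hypothetical.

```python
import math
import numpy as np

def default_boxes(fmap_size, scale, next_scale, aspect_ratios):
    """Default boxes (cx, cy, w, h), normalized to [0, 1], for one square feature map."""
    boxes = []
    for i in range(fmap_size):              # feature-map row
        for j in range(fmap_size):          # feature-map column
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:        # one box per aspect ratio at this scale
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
            extra = math.sqrt(scale * next_scale)   # extra square box between two scales
            boxes.append((cx, cy, extra, extra))
    return np.array(boxes, dtype=np.float32)

def decode(defaults, offsets, variances=(0.1, 0.2)):
    """Turn predicted offsets (tx, ty, tw, th) into [ymin, xmin, ymax, xmax] boxes."""
    cx = defaults[:, 0] + offsets[:, 0] * variances[0] * defaults[:, 2]
    cy = defaults[:, 1] + offsets[:, 1] * variances[0] * defaults[:, 3]
    w = defaults[:, 2] * np.exp(offsets[:, 2] * variances[1])
    h = defaults[:, 3] * np.exp(offsets[:, 3] * variances[1])
    return np.stack([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2], axis=1)

# Assumed SSD300 layout: 4 boxes/cell = 3 ratios + extra box, 6 boxes/cell = 5 ratios + extra box
FMAP_SIZES = [38, 19, 10, 5, 3, 1]
SMALL, LARGE = [1.0, 2.0, 0.5], [1.0, 2.0, 3.0, 0.5, 1 / 3]
RATIOS = [SMALL, LARGE, LARGE, LARGE, SMALL, SMALL]
scales = list(np.linspace(0.2, 0.9, len(FMAP_SIZES))) + [1.0]  # trailing 1.0 is an assumption

anchors = np.concatenate([
    default_boxes(size, scales[k], scales[k + 1], RATIOS[k])
    for k, size in enumerate(FMAP_SIZES)
])
print(anchors.shape)  # (8732, 4): each default box gets offsets and class scores in one pass
```

With all-zero offsets, `decode` simply returns the default boxes in corner form; a trained network's offsets shift and resize them toward the actual objects.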

✅ Because everything happens in one pass of a compact network, SSD with a MobileNet backbone is lightweight and can run at usable speeds even on CPUs.
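Step 3 (NMS) can be sketched as a greedy loop in NumPy. This minimal version is class-agnostic (to mirror SSD, run it separately on each class's boxes), assumes boxes in the same [ymin, xmin, ymax, xmax] order the TF Hub model returns, and uses an IoU threshold of 0.5 chosen for illustration. TensorFlow also ships `tf.image.non_max_suppression`, and the TF Hub detector's `detection_boxes` output is expected to have NMS applied already, so this is for understanding rather than a step the example below requires. The helpers `iou` and `nms` are hypothetical names.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [ymin, xmin, ymax, xmax]."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-12)  # avoid division by zero

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # drop every remaining box that overlaps the kept one too much
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0.10, 0.10, 0.50, 0.50],   # object A
                  [0.12, 0.11, 0.52, 0.49],   # near-duplicate of A
                  [0.60, 0.60, 0.90, 0.90]])  # object B
scores = np.array([0.90, 0.75, 0.80])
print(nms(boxes, scores))  # [0, 2]
```

The near-duplicate (IoU of about 0.86 with box 0) is suppressed, while the non-overlapping box survives despite its lower score.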


🛠️ Run SSD-MobileNet v2 Using TensorFlow Hub

Step 1: Install Required Packages

pip install --quiet tensorflow tensorflow-hub matplotlib opencv-python

Step 2: Full Example Code

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import cv2
import matplotlib.pyplot as plt
import os

# COCO label map: 80 classes with IDs 1-90 (some IDs are unused, hence the gaps)
COCO_LABELS = {
    1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane',
    6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light',
    11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter', 15: 'bench',
    16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep',
    21: 'cow', 22: 'elephant', 23: 'bear', 24: 'zebra', 25: 'giraffe',
    27: 'backpack', 28: 'umbrella', 31: 'handbag', 32: 'tie', 33: 'suitcase',
    34: 'frisbee', 35: 'skis', 36: 'snowboard', 37: 'sports ball',
    38: 'kite', 39: 'baseball bat', 40: 'baseball glove', 41: 'skateboard',
    42: 'surfboard', 43: 'tennis racket', 44: 'bottle', 46: 'wine glass',
    47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl',
    52: 'banana', 53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli',
    57: 'carrot', 58: 'hot dog', 59: 'pizza', 60: 'donut', 61: 'cake',
    62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed', 67: 'dining table',
    70: 'toilet', 72: 'tv', 73: 'laptop', 74: 'mouse', 75: 'remote',
    76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven',
    80: 'toaster', 81: 'sink', 82: 'refrigerator', 84: 'book',
    85: 'clock', 86: 'vase', 87: 'scissors', 88: 'teddy bear',
    89: 'hair drier', 90: 'toothbrush'
}

# 1) Load the model
print("Loading SSD-MobileNet v2 from TF-Hub…")
model = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")
detector = model.signatures['serving_default']

# 2) Read & prep the image
image_path = "frame1.png"  # ← replace with your file
if not os.path.isfile(image_path):
    raise FileNotFoundError(f"File not found: {image_path}")

img_bytes = tf.io.read_file(image_path)
# expand_animations=False guarantees a 3-D tensor (first frame only for GIFs)
orig = tf.image.decode_image(img_bytes, channels=3, expand_animations=False)  # uint8 [H, W, 3]
h, w = orig.shape[0], orig.shape[1]

# add batch dimension: [1, H, W, 3]
input_tensor = orig[tf.newaxis, ...]

# 3) Run inference
print("Running detection…")
outputs = detector(input_tensor)

# 3a) See what keys we actually got
print("Output keys:", list(outputs.keys()))

# 4) Parse outputs
num        = int(outputs['num_detections'][0].numpy())
boxes      = outputs['detection_boxes'][0].numpy()[:num]       # [ymin, xmin, ymax, xmax]
scores     = outputs['detection_scores'][0].numpy()[:num]
class_ids  = outputs['detection_classes'][0].numpy().astype(int)[:num]

# 5) Draw boxes & class names on an OpenCV image
img_bgr   = cv2.cvtColor(orig.numpy(), cv2.COLOR_RGB2BGR)
box_color = (255, 0, 0)  # blue in BGR
text_color = (255, 0, 0)  # also blue

for box, score, cid in zip(boxes, scores, class_ids):
    if score < 0.3:  # confidence threshold
        continue
    cls_name = COCO_LABELS.get(cid, f'ID {cid}')
    ymin, xmin, ymax, xmax = box
    pt1 = (int(xmin * w), int(ymin * h))
    pt2 = (int(xmax * w), int(ymax * h))
    # draw box & label
    cv2.rectangle(img_bgr, pt1, pt2, box_color, 2)
    cv2.putText(
        img_bgr,
        f"{cls_name}: {score:.2f}",
        (pt1[0], max(pt1[1] - 10, 10)),  # keep the label inside the image
        cv2.FONT_HERSHEY_SIMPLEX,
        0.5,
        text_color,
        2
    )

# 6) Display with matplotlib
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 8))
plt.imshow(img_rgb)
plt.axis("off")
plt.title("SSD-MobileNet v2 Detection (class names, blue boxes)")
plt.show()
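Steps 4 and 5 above mix detection logic with OpenCV drawing. A minimal sketch that factors the score filtering and the normalized-to-pixel conversion into a pure-NumPy helper makes that logic testable without downloading the model. The name `filter_detections` is hypothetical (not part of TF Hub or OpenCV), and clipping coordinates to [0, 1] is an added safeguard, not something the original script does.

```python
import numpy as np

def filter_detections(boxes, scores, class_ids, width, height, labels, threshold=0.3):
    """Keep detections scoring at least `threshold`; convert boxes to pixel corners.

    boxes: [N, 4] array of [ymin, xmin, ymax, xmax], normalized to [0, 1].
    Returns a list of (class_name, score, (x1, y1, x2, y2)) tuples.
    """
    results = []
    for box, score, cid in zip(boxes, scores, class_ids):
        if score < threshold:
            continue
        ymin, xmin, ymax, xmax = np.clip(box, 0.0, 1.0)  # guard against slight overshoot
        pixel_box = (int(xmin * width), int(ymin * height),
                     int(xmax * width), int(ymax * height))
        name = labels.get(int(cid), f"ID {int(cid)}")    # fall back to the raw ID
        results.append((name, float(score), pixel_box))
    return results

# Dummy arrays standing in for the model outputs parsed in step 4
boxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])
scores = np.array([0.92, 0.10])
class_ids = np.array([1, 18])
print(filter_detections(boxes, scores, class_ids, 640, 480, {1: "person", 18: "dog"}))
# [('person', 0.92, (128, 48, 384, 240))]
```

In the script above, the same call would be `filter_detections(boxes, scores, class_ids, w, h, COCO_LABELS)`, and the drawing loop would then only iterate over its results.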

👀 What You’ll See

  • Blue boxes around detected objects

  • Class names like person, traffic light, bird, etc.

  • Confidence scores (e.g., person: 0.92, dog: 0.84)
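If you run the script without a display (for example on a remote server), the same three pieces of information can be printed as text instead. A minimal sketch, assuming detections have been collected as `(class_name, score, (x1, y1, x2, y2))` tuples; that tuple format and the `detection_report` helper are hypothetical choices for illustration, and the detections below are dummy values, not real model output.

```python
from collections import Counter

def detection_report(detections):
    """Format (class_name, score, (x1, y1, x2, y2)) tuples as text, highest score first."""
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    lines = [f"{name}: {score:.2f} at {box}" for name, score, box in ranked]
    # final line: how many boxes were found per class, alphabetically
    counts = Counter(name for name, _score, _box in detections)
    lines.append(", ".join(f"{name} x{n}" for name, n in sorted(counts.items())))
    return "\n".join(lines)

# Dummy detections for illustration only
dets = [
    ("dog", 0.84, (10, 20, 110, 220)),
    ("person", 0.92, (200, 40, 320, 400)),
    ("person", 0.61, (330, 50, 420, 390)),
]
print(detection_report(dets))
```

This prints one line per detection, starting with `person: 0.92 at (200, 40, 320, 400)`, followed by the summary `dog x1, person x2`.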

🖼️ Example Output

[Figure: example SSD-MobileNet v2 detection output]

✅ Summary Table

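| Feature | Description |
| --- | --- |
| 📦 Model | SSD-MobileNet v2 |
| 🧠 Framework | TensorFlow Hub (weights download automatically on first load) |
| 🎯 Accuracy | Moderate; trades some accuracy for speed |
| Speed | ✅ Real-time capable (~20–30 FPS is a rough guide; actual speed depends on hardware and input size) |
| 🔍 Classes | COCO dataset: 80 object classes (people, animals, everyday objects) with IDs from 1 to 90 |
| 🔧 Use Case | Real-time detection, mobile apps, robotics |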
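Since the speed figure depends heavily on hardware, it is worth measuring on your own machine. A minimal timing sketch (the helper `measure_fps` is hypothetical) that times only model inference, not image loading or drawing; warm-up calls are excluded because the first calls can include one-off setup costs.

```python
import time

def measure_fps(fn, n_runs=20, warmup=3):
    """Average calls per second of fn(), measured after a few warm-up calls."""
    for _ in range(warmup):
        fn()  # early calls can include one-off costs such as tracing or memory allocation
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return n_runs / max(elapsed, 1e-9)  # guard against a zero timer reading

# Hypothetical usage with `detector` and `input_tensor` from the example above.
# Calling .numpy() forces the result to be computed before the timer stops:
# fps = measure_fps(lambda: detector(input_tensor)["num_detections"].numpy())
# print(f"~{fps:.1f} FPS (inference only) on this machine")
```

Treat the result as an upper bound for a full video pipeline, which also spends time decoding frames and drawing boxes.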