Faster R CNN - iffatAGheyas/computer-vision-handbook GitHub Wiki

🧠 Faster R-CNN — Object Detection with PyTorch

Faster R-CNN (Region-based Convolutional Neural Network) is a two-stage object detector renowned for its high accuracy, especially on complex scenes.

📌 What Is Faster R-CNN?

Region Proposal Network (RPN): scans the backbone's feature maps and proposes candidate object regions (bounding boxes).
Classifier + Regressor: for each proposed region, predicts the object class and refines the box coordinates.
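
To make "refines the box coordinates" concrete, here is a minimal sketch of the box parameterization described in the Faster R-CNN paper: the regressor predicts four offsets per region, and these are applied to the proposal. The function name `refine_box` and the sample numbers are illustrative assumptions. torchvision's own decoder additionally rescales the offsets by fixed weights, which this sketch leaves out.

```python
import math

def refine_box(proposal, deltas):
    """Apply regression offsets (dx, dy, dw, dh) to a proposal box.

    proposal: (xmin, ymin, xmax, ymax) in pixels.
    deltas:   dx, dy shift the centre by a fraction of the box size;
              dw, dh scale the width and height on a log scale.
    """
    xmin, ymin, xmax, ymax = proposal
    dx, dy, dw, dh = deltas

    # Convert the corner format to centre + size.
    w, h = xmax - xmin, ymax - ymin
    cx, cy = xmin + 0.5 * w, ymin + 0.5 * h

    # Shift the centre and rescale the size.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)

    # Convert back to the corner format.
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

# Hypothetical proposal, nudged right by 10% of its width.
proposal = (100.0, 100.0, 200.0, 300.0)
print(refine_box(proposal, (0.1, 0.0, 0.0, 0.0)))
```

With all-zero offsets the proposal comes back unchanged, which is a quick sanity check for any decoder of this form.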

✅ Strengths of Faster R-CNN

Feature	Benefit
🧠 High Accuracy	Detects objects with great precision
🎯 Precise Boxes	Sharp, clean bounding boxes
🔎 Multi-class	Pretrained weights cover the 80 COCO object categories
📦 Built into PyTorch	Available via `torchvision.models`

⚙️ Model Details

Architecture: ResNet-50 backbone + Feature Pyramid Network (FPN)
Framework: PyTorch
Pretrained Dataset: COCO (Common Objects in Context)
Classes: 91 label slots (background + 80 COCO object categories + 10 unused IDs)
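
The 91 label slots break down into one background entry, 80 object categories, and 10 IDs that the COCO detection annotations leave unused. The short sketch below checks that arithmetic. The names `UNUSED_COCO_IDS` and `is_real_class` are hypothetical helpers, and the set of unused IDs is taken from the published COCO category list.

```python
# Label IDs that never occur in the COCO detection annotations
# (assumed from the published category list).
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
NUM_LABEL_SLOTS = 91  # index 0 is background, indices 1..90 are COCO IDs

def is_real_class(label_id):
    """True if label_id refers to an actual object category."""
    return 1 <= label_id < NUM_LABEL_SLOTS and label_id not in UNUSED_COCO_IDS

real_ids = [i for i in range(NUM_LABEL_SLOTS) if is_real_class(i)]
print(len(real_ids))  # 80
```

Any list of class names indexed by label ID therefore needs placeholder entries at the unused positions, otherwise names shift out of alignment with the model's output.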

💻 How to Run Faster R-CNN in PyTorch

Step 1: Install Required Packages

pip install torch torchvision matplotlib opencv-python

Step 2: Full Python Code

# Faster R-CNN object detection & drawing code

import torch
from torchvision import models, transforms
import cv2
import matplotlib.pyplot as plt
import numpy as np
import os

# COCO category names indexed by label ID (0 is background).
# 'N/A' marks IDs the dataset does not use; without these placeholders
# the names would not line up with the model's label IDs.
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
    'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair dryer',
    'toothbrush'
]

# 1) Load the pre-trained Faster R-CNN model
# torchvision >= 0.13 uses the `weights` argument; older releases use pretrained=True
model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # set to evaluation mode

# 2) Read & prepare the image
image_path = "frame2.png"  # ← replace with your file
if not os.path.isfile(image_path):
    raise FileNotFoundError(f"Cannot find {image_path}")

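# Load with OpenCV, convert BGR→RGB
img_bgr = cv2.imread(image_path)
if img_bgr is None:  # imread returns None when the file cannot be decoded
    raise ValueError(f"OpenCV could not read {image_path}")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)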

# Transform to tensor [0,1] and add batch dim
transform = transforms.Compose([
    transforms.ToTensor(),  # scales to [0,1] and converts to float32
])
img_tensor = transform(img_rgb)  # shape [3, H, W]
input_tensor = img_tensor.unsqueeze(0)  # shape [1, 3, H, W]

# 3) Run inference
with torch.no_grad():
    outputs = model(input_tensor)[0]

# 4) Parse outputs
boxes  = outputs['boxes'].cpu().numpy()   # [N,4] in (xmin,ymin,xmax,ymax)
scores = outputs['scores'].cpu().numpy()  # [N]
labels = outputs['labels'].cpu().numpy()  # [N] integer IDs

# 5) Draw boxes & class names
threshold = 0.5  # confidence threshold
box_color  = (255, 0, 0)  # blue in BGR
text_color = (255, 0, 0)

for box, score, lbl in zip(boxes, scores, labels):
    if score < threshold:
        continue
    xmin, ymin, xmax, ymax = box.astype(int).tolist()  # plain Python ints for OpenCV
    class_name = COCO_INSTANCE_CATEGORY_NAMES[lbl]
    # Draw rectangle
    cv2.rectangle(img_bgr,
                  (xmin, ymin),
                  (xmax, ymax),
                  box_color, 2)
    # Draw label
    cv2.putText(img_bgr,
                f"{class_name}: {score:.2f}",
                (xmin, max(ymin - 10, 15)),  # keep the label inside the image
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, text_color, 2)

# 6) Display with matplotlib
img_out = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 8))
plt.imshow(img_out)
plt.axis('off')
plt.title("Faster R-CNN Object Detection (blue boxes, class names)")
plt.show()

🖼️ What You’ll See

Blue bounding boxes around detected objects
Class names and confidence scores printed on the image

Example labels for a frame containing two people:

person: 0.99
person: 0.96
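
Labels like these are what remains after low-confidence detections are dropped. The following sketch reproduces that filtering and formatting step on plain Python data, so it can be tried without a model. The `summarize` helper and the sample detections are hypothetical, chosen only for illustration.

```python
from collections import Counter

def summarize(detections, threshold=0.5):
    """Return formatted labels and per-class counts for confident detections.

    detections: iterable of (class_name, score) pairs.
    """
    kept = [(name, score) for name, score in detections if score >= threshold]
    lines = [f"{name}: {score:.2f}" for name, score in kept]
    counts = Counter(name for name, _ in kept)
    return lines, counts

# Hypothetical detections; the last one falls below the threshold.
detections = [("person", 0.99), ("person", 0.96), ("dog", 0.31)]
lines, counts = summarize(detections)
print(lines)             # ['person: 0.99', 'person: 0.96']
print(counts["person"])  # 2
```

Raising the threshold trades missed objects for fewer false positives, so 0.5 is a starting point rather than a fixed rule.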

✅ Summary Table

Feature	Faster R-CNN
🎯 Accuracy	✅ High (the two-stage design favors accuracy over speed)
🚀 Speed	❌ Slower than single-stage detectors (~2–5 FPS on CPU, depending on hardware and image size)
🧩 Model Type	Two-stage (RPN + classifier/regressor)
📦 Framework	PyTorch (`torchvision.models`)
📚 Classes	COCO dataset (80 object categories, 91 label slots)
🏷️ Use Case	Image analysis, security, annotation tools
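
Speed figures vary widely with hardware and image size, so it is worth measuring on your own machine. A minimal timing sketch follows. `measure_fps` is a hypothetical helper and `fake_inference` is a stand-in for a real forward pass. In the detection script, the callable could instead wrap `model(input_tensor)` inside `torch.no_grad()`.

```python
import time

def measure_fps(run_once, n_runs=10, warmup=2):
    """Call run_once repeatedly and return the average frames per second."""
    for _ in range(warmup):        # warm-up calls are not timed
        run_once()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

def fake_inference():
    """Stand-in workload that takes about 10 ms per call."""
    time.sleep(0.01)

fps = measure_fps(fake_inference)
print(f"{fps:.1f} FPS")
```

The warm-up calls matter for real models, because the first few forward passes often include one-off setup costs.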