Faster R CNN - iffatAGheyas/computer-vision-handbook GitHub Wiki

🧠 Faster R-CNN — Object Detection with PyTorch

Faster R-CNN (Region-based Convolutional Neural Network) is a two-stage object detector renowned for its high accuracy, especially on complex scenes.


📌 What Is Faster R-CNN?

Faster R-CNN processes an image in two stages:

  1. Region Proposal Network (RPN):
    Scans the image and proposes candidate object regions (bounding boxes).

  2. Classifier + Regressor:
    For each proposed region, predicts the object class and refines the box coordinates.
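
The "refines the box coordinates" step can be illustrated with a small sketch. It assumes the parameterization commonly used in the R-CNN family, where the regressor predicts four offsets (dx, dy, dw, dh) relative to the proposal's centre and size. The function name `apply_box_deltas` is hypothetical and is not part of torchvision, which also applies its own scaling weights and clamping internally.

```python
import math

def apply_box_deltas(box, deltas):
    """Refine a proposal (xmin, ymin, xmax, ymax) with offsets (dx, dy, dw, dh).

    Assumed convention: dx, dy shift the centre by a fraction of the box size;
    dw, dh rescale width and height on a log scale.
    """
    xmin, ymin, xmax, ymax = box
    dx, dy, dw, dh = deltas

    # Convert corners to centre/size form.
    w = xmax - xmin
    h = ymax - ymin
    cx = xmin + 0.5 * w
    cy = ymin + 0.5 * h

    # Apply the predicted offsets.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    # Convert back to corner form.
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

proposal = (100.0, 100.0, 200.0, 300.0)
# Shift the box right by a quarter of its width, keep its size.
refined = apply_box_deltas(proposal, (0.25, 0.0, 0.0, 0.0))
print(refined)  # (125.0, 100.0, 225.0, 300.0)
```

Predicting relative offsets rather than absolute coordinates keeps the regression targets in a similar range for small and large objects.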


✅ Strengths of Faster R-CNN

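| Feature | Benefit |
|---|---|
| 🧠 High Accuracy | Detects and classifies objects reliably, including in cluttered scenes |
| 🎯 Precise Boxes | Tight bounding boxes, thanks to the box-refinement stage |
| 🔎 Multi-class | Pretrained weights cover the 80 COCO object categories |
| 📦 Built into PyTorch | Available via torchvision.models.detection |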

⚙️ Model Details

  • Architecture: ResNet-50 backbone + Feature Pyramid Network (FPN)
  • Framework: PyTorch
  • Pretrained Dataset: COCO (Common Objects in Context)
  • Classes: 91 label IDs (index 0 is background; of IDs 1–90, 80 are actual COCO object categories and the rest are unused)
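
The class count is easy to misread, so here is a small sketch of the arithmetic. It assumes the original COCO category numbering, in which ten IDs between 1 and 90 are not assigned to any category. The constant names are illustrative only.

```python
# COCO category IDs in 1..90 that have no category assigned.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}

NUM_LABEL_IDS = 91  # 0 = background, 1..90 = COCO category IDs

# IDs the detector can actually return for a real object.
object_ids = [i for i in range(1, NUM_LABEL_IDS) if i not in UNUSED_COCO_IDS]

print(len(object_ids))  # 80
print(object_ids[:15])  # note the jump from 11 to 13
```

Because of these gaps, any list of class names indexed by label ID needs a placeholder at each unused position.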

💻 How to Run Faster R-CNN in PyTorch

Step 1: Install Required Packages

pip install torch torchvision matplotlib opencv-python
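
To confirm the install worked before running the full script, a quick version check can help. This is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names as passed to pip, not import names (opencv-python imports as cv2).
report = installed_versions(["torch", "torchvision", "matplotlib", "opencv-python"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```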

Step 2: Full Python Code

# Faster R-CNN object detection and drawing code

import torch
from torchvision import models, transforms
import cv2
import matplotlib.pyplot as plt
import numpy as np
import os

# COCO category names, indexed by the label ID the model returns (0 is background).
# 'N/A' marks COCO IDs with no category; without these placeholders the
# names after 'fire hydrant' would be shifted and high IDs would be out of range.
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
    'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair dryer',
    'toothbrush'
]

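# 1) Load the pre-trained Faster R-CNN model
# torchvision >= 0.13 selects weights via `weights=`; `pretrained=True` is deprecated.
weights = models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = models.detection.fasterrcnn_resnet50_fpn(weights=weights)
model.eval()  # set to evaluation mode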

# 2) Read & prepare the image
image_path = "frame2.png"  # ← replace with your file
if not os.path.isfile(image_path):
    raise FileNotFoundError(f"Cannot find {image_path}")

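# Load with OpenCV (BGR channel order), then convert BGR→RGB for the model
img_bgr = cv2.imread(image_path)
if img_bgr is None:  # imread returns None when the file cannot be decoded
    raise ValueError(f"Cannot read {image_path} as an image")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)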

# Transform to tensor [0,1] and add batch dim
transform = transforms.Compose([
    transforms.ToTensor(),  # scales to [0,1] and converts to float32
])
img_tensor = transform(img_rgb)  # shape [3, H, W]
input_tensor = img_tensor.unsqueeze(0)  # shape [1, 3, H, W]

# 3) Run inference
with torch.no_grad():
    outputs = model(input_tensor)[0]

# 4) Parse outputs
boxes  = outputs['boxes'].cpu().numpy()   # [N,4] in (xmin,ymin,xmax,ymax)
scores = outputs['scores'].cpu().numpy()  # [N]
labels = outputs['labels'].cpu().numpy()  # [N] integer IDs

# 5) Draw boxes & class names
threshold = 0.5  # confidence threshold
box_color  = (255, 0, 0)  # blue in BGR
text_color = (255, 0, 0)

for box, score, lbl in zip(boxes, scores, labels):
    if score < threshold:
        continue
    xmin, ymin, xmax, ymax = box.astype(int)
    class_name = COCO_INSTANCE_CATEGORY_NAMES[lbl]
    # Draw rectangle
    cv2.rectangle(img_bgr,
                  (xmin, ymin),
                  (xmax, ymax),
                  box_color, 2)
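    # Draw label just above the box; clamp y so the text stays inside the
    # image when the box touches the top edge
    cv2.putText(img_bgr,
                f"{class_name}: {score:.2f}",
                (xmin, max(ymin - 10, 15)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, text_color, 2)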

# 6) Display with matplotlib
img_out = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 8))
plt.imshow(img_out)
plt.axis('off')
plt.title("Faster R-CNN Object Detection (blue boxes, class names)")
plt.show()
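
The parsing and thresholding in steps 4 and 5 can also be factored into a small helper that returns plain Python values, which is handy for logging or testing without drawing anything. This is a sketch, not part of torchvision. The name `filter_detections` is hypothetical, and the sample numbers below are made up so the snippet runs without a model.

```python
def filter_detections(boxes, scores, labels, names, threshold=0.5):
    """Keep detections with score >= threshold.

    Returns a list of (class_name, score, (xmin, ymin, xmax, ymax)) tuples
    with integer pixel coordinates (truncated, like `box.astype(int)`).
    """
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        if score < threshold:
            continue
        label = int(label)
        # Fall back to the raw ID if the name list does not cover this label.
        name = names[label] if 0 <= label < len(names) else f"id {label}"
        kept.append((name, float(score), tuple(int(v) for v in box)))
    return kept

# Made-up values standing in for outputs['boxes'], ['scores'], ['labels'].
sample_names = ['__background__', 'person', 'bicycle', 'car']
sample_boxes = [(10.7, 20.2, 110.9, 220.5), (5.0, 5.0, 50.0, 50.0)]
sample_scores = [0.99, 0.30]
sample_labels = [1, 3]

kept = filter_detections(sample_boxes, sample_scores, sample_labels, sample_names)
print(kept)  # [('person', 0.99, (10, 20, 110, 220))]
```

With the script above, the same call would take `boxes`, `scores`, `labels`, and `COCO_INSTANCE_CATEGORY_NAMES` as arguments.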

🖼️ What You’ll See

  • Blue bounding boxes around detected objects

  • Class names and confidence scores printed on the image

Example labels for an image containing two people (your scores will differ):

person: 0.99
person: 0.96
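
For a text summary instead of reading labels off the image, detections can be tallied per class. This is a minimal sketch. The helper name `count_by_class` is hypothetical, and the third detection in the sample data is invented to show the threshold at work.

```python
from collections import Counter

def count_by_class(detections, threshold=0.5):
    """Count (class_name, score) pairs whose score is at least `threshold`."""
    return Counter(name for name, score in detections if score >= threshold)

# The two 'person' entries mirror the example labels; the 'dog' entry is made up.
detections = [("person", 0.99), ("person", 0.96), ("dog", 0.31)]
counts = count_by_class(detections)
print(dict(counts))  # {'person': 2}
```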

Summary Table

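| Feature | Faster R-CNN |
|---|---|
| 🎯 Accuracy | ✅ High, among the most accurate of the commonly used detectors |
| 🚀 Speed | ❌ Slower than single-stage detectors (a few FPS at best on CPU, depending on hardware and image size) |
| 🧩 Model Type | Two-stage (RPN + classifier/regressor) |
| 📦 Framework | PyTorch (torchvision.models.detection) |
| 📚 Classes | COCO (91 label IDs including background; 80 object categories) |
| 🏷️ Use Case | Image analysis, security, annotation tools |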
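
Speed figures vary a lot with hardware and image size, so it is worth measuring on your own machine. Below is a minimal timing sketch. The helper name `measure_fps` is hypothetical, and the workload is a stand-in so the snippet runs without a model.

```python
import time

def measure_fps(run_once, runs=5, warmup=1):
    """Call `run_once` repeatedly and return the average calls per second."""
    # Warm-up calls are excluded from timing (first calls are often slower).
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return runs / elapsed if elapsed > 0 else float("inf")

# Stand-in workload; replace with a function that runs one inference.
fps = measure_fps(lambda: sum(range(100_000)))
print(f"{fps:.1f} FPS")
```

To time the detector itself, pass a function that calls `model(input_tensor)` inside `torch.no_grad()`, and use an image of the size you expect in practice.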