Object Tracking - iffatAGheyas/computer-vision-handbook GitHub Wiki

📊 Object Tracking Using Kalman Filters

A Kalman filter is a recursive estimator that tracks the state of a dynamic system, such as the position and velocity of a moving object, even when observations are noisy or partially missing. It estimates where the object is now and predicts where it will be next.


🧠 What Is a Kalman Filter?

  • Predicts the future state (e.g. position, velocity) of a moving object
  • Corrects its prediction using actual (noisy) measurements

🤔 Why Use It for Object Tracking?

  • Smooths noisy detections (e.g. jumping blobs or jittery bounding boxes)
  • Fills gaps when the object briefly disappears (occlusion)
  • Predicts the object's next location before the next frame arrives, which is critical for real-time systems

🎯 How It Works (Conceptually)

The Kalman filter alternates between two phases each frame:

| Phase | Description |
| --- | --- |
| Predict | Estimate the next state based on the current state |
| Correct | Update that estimate using the actual measurement |

Typically the state vector includes:

  • Position (x, y)
  • Velocity (dx, dy)
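
To make the two phases concrete, here is a minimal NumPy sketch of one predict/correct cycle for a constant-velocity model. It is an illustration of the standard Kalman equations, not the code OpenCV runs internally; the time step `dt`, the noise values and the helper names `predict` and `correct` are assumptions chosen for the example.

```python
import numpy as np

dt = 1.0  # assumed time step: one frame

# State is [x, y, vx, vy]; constant-velocity motion model.
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# Only the position is observed.
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3  # process noise covariance (illustrative value)
R = np.eye(2) * 1.0   # measurement noise covariance (illustrative value)


def predict(x, P):
    """Project the state and its covariance one step ahead."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def correct(x_pred, P_pred, z):
    """Blend the prediction with measurement z using the Kalman gain."""
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new


x = np.array([100.0, 100.0, 0.0, 0.0])  # initial guess: at (100, 100), at rest
P = np.eye(4)                           # initial uncertainty

x_pred, P_pred = predict(x, P)
x_new, P_new = correct(x_pred, P_pred, np.array([102.0, 101.0]))
print("predicted position:", x_pred[:2])
print("corrected position:", x_new[:2])
```

The corrected position lands between the prediction and the measurement; how far it moves towards the measurement depends on the ratio of predicted uncertainty to measurement noise.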

🧪 Step-by-Step Code: Object Tracking with Kalman Filter

import cv2
import numpy as np
import matplotlib.pyplot as plt

# 1. Create Kalman filter with 4 state vars (x, y, dx, dy) and 2 measurements (x, y)
kf = cv2.KalmanFilter(4, 2)

# 2. Define transition matrix (how state evolves)
kf.transitionMatrix = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
], np.float32)

# 3. Define measurement matrix (what we observe)
kf.measurementMatrix = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0]
], np.float32)

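# 4. Add process & measurement noise
kf.processNoiseCov     = np.eye(4, dtype=np.float32) * 1e-3
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1

# Start the filter at the object's initial position; OpenCV initialises the
# state and its covariance to zero, so the estimate would otherwise begin at (0, 0)
kf.statePost    = np.array([[100], [100], [0], [0]], np.float32)
kf.errorCovPost = np.eye(4, dtype=np.float32)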

# 5. Simulate true positions & noisy measurements
true_positions = []
measured_positions = []
predicted_positions = []

x, y, dx, dy = 100, 100, 2, 1
for _ in range(80):
    # true motion
    x += dx; y += dy
    true_positions.append((x, y))
    # noisy measurement
    mx = x + np.random.randn() * 5
    my = y + np.random.randn() * 5
    measured_positions.append((mx, my))
    # predict
    pred = kf.predict()
    # correct with measurement
    kf.correct(np.array([[mx], [my]], np.float32))
    # predict() returns a 4x1 column vector; keep the scalar x and y
    predicted_positions.append((pred[0, 0], pred[1, 0]))

# 6. Plot results
tp = np.array(true_positions)
mp = np.array(measured_positions)
pp = np.array(predicted_positions)

plt.figure(figsize=(8, 6))
plt.plot(tp[:,0], tp[:,1],  label="True",           color="black")
plt.scatter(mp[:,0], mp[:,1], label="Measured",       color="red",   s=10)
plt.plot(pp[:,0], pp[:,1],  label="Kalman Prediction", color="green")
plt.gca().invert_yaxis()  # match image coords
plt.xlabel("X Position")
plt.ylabel("Y Position")
plt.title("Kalman Filter Object Tracking")
plt.legend()
plt.grid(True)
plt.show()

🔍 What You’ll See

  • Black line = true motion
  • Red dots = noisy measurements
  • Green line = Kalman prediction for each frame, smoother than the raw measurements
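
The visual impression can also be checked numerically. The sketch below reruns the same kind of simulation with a plain NumPy filter and compares the root-mean-square error of the raw measurements against that of the filtered estimates. It differs from the listing above in three labelled ways: it uses the corrected estimate rather than the prediction, it sets the measurement noise covariance to 25 (the variance of the simulated noise), and it fixes the random seed. All of these are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3
R = np.eye(2) * 25.0  # assumption: 5**2, the variance of the simulated noise

x_est = np.array([100.0, 100.0, 0.0, 0.0])  # velocity is unknown at the start
P = np.eye(4) * 10.0

pos = np.array([100.0, 100.0])
vel = np.array([2.0, 1.0])
truth, measured, filtered = [], [], []

for _ in range(80):
    pos = pos + vel                     # true motion
    z = pos + rng.normal(0, 5, size=2)  # noisy measurement

    # predict
    x_est = F @ x_est
    P = F @ P @ F.T + Q
    # correct
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_est = x_est + K @ (z - H @ x_est)
    P = (np.eye(4) - K @ H) @ P

    truth.append(pos)
    measured.append(z)
    filtered.append(x_est[:2].copy())


def rmse(a, b):
    """Root-mean-square Euclidean distance between two point sequences."""
    diff = np.asarray(a) - np.asarray(b)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


rmse_measured = rmse(measured, truth)
rmse_filtered = rmse(filtered, truth)
print(f"RMSE of raw measurements: {rmse_measured:.2f}")
print(f"RMSE of filtered track:   {rmse_filtered:.2f}")
```

The filtered error should come out lower than the measurement error, even though the filter has to learn the velocity during the first few frames.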

🗓️ When to Use Kalman Filters in Computer Vision

| Use Case | Why Kalman Helps |
| --- | --- |
| Real-time object tracking | Predicts motion between frames |
| Handling occlusion | Continues predicting when object disappears |
| Noisy detections (blobs) | Smooths out jittery measurements |
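
The occlusion case can be sketched as follows: when a frame has no detection, run only the predict phase and skip the correction. This is a minimal NumPy illustration under assumed values; the `step` helper, the 10-frame occlusion window and the noise settings are hypothetical and not part of the listing above.

```python
import numpy as np

F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3
R = np.eye(2) * 4.0  # assumed measurement variance (noise std of 2 pixels)


def step(x, P, z=None):
    """One filter cycle; z=None means no detection this frame."""
    x = F @ x
    P = F @ P @ F.T + Q
    if z is not None:  # correct only when a measurement is available
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return x, P


rng = np.random.default_rng(1)
x = np.array([100.0, 100.0, 2.0, 1.0])  # assume the velocity is roughly known
P = np.eye(4)
uncertainty = []  # positional variance (trace of the x/y covariance block)

for frame in range(40):
    true_pos = np.array([100.0 + 2 * (frame + 1), 100.0 + (frame + 1)])
    occluded = 20 <= frame < 30  # hypothetical 10-frame occlusion
    z = None if occluded else true_pos + rng.normal(0, 2, size=2)
    x, P = step(x, P, z)
    uncertainty.append(float(np.trace(P[:2, :2])))

print("uncertainty before occlusion:", round(uncertainty[19], 3))
print("uncertainty at end of occlusion:", round(uncertainty[29], 3))
print("uncertainty after recovery:", round(uncertainty[39], 3))
```

While the object is hidden the filter coasts along the last estimated velocity and its positional uncertainty grows every frame; once detections return, the uncertainty shrinks again.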

Summary

| Kalman Filter Role | Description |
| --- | --- |
| Predict | Guess next state from current state |
| Correct | Update guess based on observation |
| Advantage | Smooth, noise-resistant tracking |

🎯 Object Tracking with Mean-Shift

Mean-Shift is a simple, histogram-based tracking algorithm that locates an object by iteratively shifting a search window to maximize the similarity between the target’s appearance model (colour histogram) and the current frame.
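
To show what the appearance model amounts to, here is a small NumPy sketch of a hue histogram and its back-projection on a toy image. OpenCV provides `cv2.calcHist` and `cv2.calcBackProject` for this; the helpers `hue_histogram` and `back_project` below are hypothetical names used only to illustrate the idea on a single hue channel.

```python
import numpy as np


def hue_histogram(hue_roi, bins=180):
    """Histogram of hue values (OpenCV-style range 0..179), scaled so the peak is 1."""
    hist = np.bincount(hue_roi.ravel(), minlength=bins).astype(float)
    return hist / hist.max()


def back_project(hue_frame, hist):
    """Replace each pixel's hue with the histogram value for that hue."""
    return hist[hue_frame]


# Toy hue image: background hue 20, with a 3x3 target patch of hue 100.
hue = np.full((8, 8), 20, dtype=np.uint8)
hue[2:5, 3:6] = 100

hist = hue_histogram(hue[2:5, 3:6])  # model built from the target region
prob = back_project(hue, hist)       # high where the frame matches the model

print(prob)
```

Pixels whose hue is common in the target region get a high score and everything else scores low, which is the probability map the tracker climbs.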


🧠 How It Works (Plain English)

  1. Select ROI
    Choose an initial bounding box around the object in the first frame.

  2. Build Appearance Model
    Compute a colour histogram of the ROI in HSV space.

  3. Back-Project Histogram
    For each new frame, calculate a back-projection: a probability map showing how well each pixel matches the ROI histogram.

  4. Mean-Shift Iteration
    Apply the Mean-Shift algorithm to shift the window towards the region of highest probability—i.e. where the histogram best matches.


🐍 Step-by-Step Code: Mean-Shift Tracking

import cv2
import os

# 1. Load video
cap = cv2.VideoCapture("baby.mp4")
ret, frame = cap.read()
if not ret:
    print("❌ Failed to read video")
    cap.release()
    exit()

# 2. Initialise tracking window (x, y, w, h)
x, y, w, h = 700, 300, 600, 600
track_window = (x, y, w, h)

# 3. Compute ROI histogram in HSV
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# 4. Setup termination criteria and output folder
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
output_dir = "tracking_snapshots"
os.makedirs(output_dir, exist_ok=True)

frame_count = 0
snapshot_interval = 30  # save every 30 frames
snapshot_count = 0

# 5. Process frames
while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # apply Mean-Shift to get new window
    ret, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window

    # draw bounding box
    result = cv2.rectangle(frame.copy(), (x, y), (x+w, y+h), (0, 255, 0), 2)

    # save snapshot periodically
    if frame_count % snapshot_interval == 0:
        filename = os.path.join(output_dir, f"frame_{frame_count:04d}.jpg")
        cv2.imwrite(filename, result)
        snapshot_count += 1

    # display live result
    cv2.imshow("Mean-Shift Tracking", result)
    if cv2.waitKey(30) & 0xFF == 27:
        break

    frame_count += 1

cap.release()
cv2.destroyAllWindows()
print(f"✅ Saved {snapshot_count} snapshots in '{output_dir}/'")

🖼️ Sample Output

Every 30th frame is saved to tracking_snapshots/ with the tracked window drawn as a green box.


📋 Summary: What Each Output Means

| Output | Purpose |
| --- | --- |
| Video frames with green box | The tracker shifts the window to follow the object |

✅ Strengths & Limitations

| Strengths | Limitations |
| --- | --- |
| Simple and fast | Poor handling of scale changes |
| Works using only colour histograms | Struggles with occlusion or dramatic background changes |
| Effective for distinct colours | Fails if the object’s appearance changes significantly |

🔄 CamShift (Continuously Adaptive Mean-Shift)

CamShift builds on the Mean-Shift tracker by automatically adapting the window size and estimating the object’s orientation, making it more robust to scale and rotation changes (e.g. a person moving closer or turning their head).


🔍 Key Differences: CamShift vs. Mean-Shift

| Feature | Mean-Shift | CamShift (Adaptive) |
| --- | --- | --- |
| Window size | Fixed | ✅ Resizes dynamically |
| Object rotation | Not handled | ✅ Tracks orientation (angle) |
| Output | New position only | ✅ Position, size & angle |
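
One way to see where size and angle can come from is image moments of the probability map inside the window. The sketch below derives a centroid, an orientation and a major/minor spread from the second-order central moments. It illustrates the general idea only and is not OpenCV's exact CamShift computation; the function name `window_moments` and the toy patch are assumptions.

```python
import numpy as np


def window_moments(patch):
    """Centroid, orientation (radians) and axis spreads of a probability patch."""
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    m00 = patch.sum()
    cx = (xs * patch).sum() / m00
    cy = (ys * patch).sum() / m00
    # central second-order moments, normalised by the total mass
    mu20 = ((xs - cx) ** 2 * patch).sum() / m00
    mu02 = ((ys - cy) ** 2 * patch).sum() / m00
    mu11 = ((xs - cx) * (ys - cy) * patch).sum() / m00
    # orientation of the principal axis
    angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    # eigenvalues of the covariance matrix: variance along each principal axis
    common = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    major = np.sqrt((mu20 + mu02 + common) / 2)
    minor = np.sqrt((mu20 + mu02 - common) / 2)
    return (cx, cy), angle, (major, minor)


# Toy patch: a wide, flat blob, so the principal axis is horizontal.
patch = np.zeros((20, 40))
patch[8:12, 5:35] = 1.0

(cx, cy), angle, (major, minor) = window_moments(patch)
print("centroid:", (cx, cy))
print("angle (degrees):", np.degrees(angle))
print("spread along major/minor axis:", major, minor)
```

A window resized in proportion to these spreads and rotated by this angle follows an object as it grows, shrinks or turns, which a fixed Mean-Shift window cannot do.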

✅ Step-by-Step Code: CamShift Object Tracking

import cv2
import os

# 1. Load video
cap = cv2.VideoCapture("baby.mp4")
ret, frame = cap.read()
if not ret:
    raise RuntimeError("Failed to read the first frame")

# 2. Initialise ROI and tracking window (x, y, w, h)
x, y, w, h = 700, 300, 600, 600
track_window = (x, y, w, h)

# 3. Compute ROI histogram in HSV
roi        = frame[y:y+h, x:x+w]
hsv_roi    = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist   = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# 4. Setup termination criteria & output folder
term_crit   = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
output_dir  = "camshift_snapshots"
os.makedirs(output_dir, exist_ok=True)

frame_count, snapshot_count = 0, 0
snapshot_interval = 30

# 5. Process video frames
while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv       = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # apply CamShift → returns new window & rotated box
    ret_cs, track_window = cv2.CamShift(back_proj, track_window, term_crit)

    # draw rotated box
    pts     = cv2.boxPoints(ret_cs)
    pts     = pts.astype(int)
    result  = cv2.polylines(frame.copy(), [pts], True, (0, 255, 0), 2)

    # save snapshot periodically
    if frame_count % snapshot_interval == 0:
        fname = f"frame_{frame_count:04d}.jpg"
        cv2.imwrite(os.path.join(output_dir, fname), result)
        snapshot_count += 1

    # display
    cv2.imshow("CamShift Tracking", result)
    if cv2.waitKey(30) & 0xFF == 27:
        break

    frame_count += 1

cap.release()
cv2.destroyAllWindows()
print(f"✅ Saved {snapshot_count} CamShift snapshots in '{output_dir}/'")

🔍 What’s New in CamShift Code

| Code Part | What It Does |
| --- | --- |
| cv2.CamShift() | Returns updated position, size and angle |
| cv2.boxPoints() | Converts that info into a rotated box |
| cv2.polylines() | Draws the rotated box (not just an axis-aligned rectangle) |
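
The geometry behind turning a centre, size and angle into four corner points can be sketched with a rotation matrix. This NumPy version is for illustration only: the helper `rotated_box_corners` is a hypothetical name, and its corner order and angle sign convention may differ from what `cv2.boxPoints` returns.

```python
import numpy as np


def rotated_box_corners(center, size, angle_deg):
    """Four corners of a w x h box rotated by angle_deg about its centre."""
    cx, cy = center
    w, h = size
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    # corners of an axis-aligned box centred on the origin
    half = np.array([[-w / 2, -h / 2],
                     [ w / 2, -h / 2],
                     [ w / 2,  h / 2],
                     [-w / 2,  h / 2]])
    rot = np.array([[c, -s],
                    [s,  c]])
    # rotate each corner, then translate to the box centre
    return half @ rot.T + np.array([cx, cy])


corners = rotated_box_corners((50, 40), (20, 10), 90)
print(np.round(corners, 3))
```

Rotating a 20 x 10 box by 90 degrees swaps its horizontal and vertical extents while the centre stays at (50, 40).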

Summary Table

| Feature | Benefit |
| --- | --- |
| Adaptive window | Tracks zooming objects seamlessly |
| Orientation-aware | Handles rotating objects smoothly |
| Histogram-based | Retains colour-based robustness like Mean-Shift |