Object Tracking - iffatAGheyas/computer-vision-handbook GitHub Wiki

📊 Object Tracking Using Kalman Filters

A Kalman filter is a recursive estimation algorithm that tracks the state of a dynamic system, such as the position of a moving object, even when observations are noisy or partially missing. On every frame it predicts where the object will be and then corrects that guess with the latest measurement.

🧠 What Is a Kalman Filter?

Predicts the future state (e.g. position, velocity) of a moving object
Corrects its prediction using actual (noisy) measurements

🤔 Why Use It for Object Tracking?

Smooths noisy detections (e.g. jumping blobs or jittery bounding boxes)
Fills gaps when the object briefly disappears (occlusion)
Predicts next location before it arrives—critical for real-time systems

🎯 How It Works (Conceptually)

The Kalman filter alternates between two phases each frame:

Phase	Description
Predict	Estimate the next state based on the current state
Correct	Update that estimate using the actual measurement
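In the standard linear formulation (the one `cv2.KalmanFilter` implements), write $\hat{x}$ for the state estimate, $P$ for its covariance, $F$ for the transition matrix (`transitionMatrix`), $H$ for the measurement matrix (`measurementMatrix`), $Q$ for the process noise (`processNoiseCov`), $R$ for the measurement noise (`measurementNoiseCov`) and $z_k$ for the measurement at frame $k$. The two phases are then:

```math
\begin{aligned}
\text{Predict:}\quad & \hat{x}^-_k = F\,\hat{x}_{k-1}, \qquad P^-_k = F\,P_{k-1}F^\top + Q \\
\text{Correct:}\quad & K_k = P^-_k H^\top \left(H P^-_k H^\top + R\right)^{-1} \\
& \hat{x}_k = \hat{x}^-_k + K_k\left(z_k - H\hat{x}^-_k\right), \qquad P_k = \left(I - K_k H\right)P^-_k
\end{aligned}
```

The gain $K_k$ decides how much to trust each source: when $R$ is large compared with $H P^-_k H^\top$ the filter stays close to its prediction, and when $R$ is small it follows the measurement.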

Typically the state vector includes position (x, y) and velocity (dx, dy).
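The transition and measurement matrices used in the code below encode this state. As a small sketch, assuming one time step per frame (dt = 1), multiplying [x, y, dx, dy] by the transition matrix advances the position by the velocity and leaves the velocity unchanged, while the measurement matrix picks out only (x, y):

```python
import numpy as np

# Constant-velocity model; dt = 1 frame is an assumption matching the demo below
dt = 1.0
F = np.array([
    [1, 0, dt, 0],   # x  <- x + dx * dt
    [0, 1, 0, dt],   # y  <- y + dy * dt
    [0, 0, 1,  0],   # dx <- dx (velocity assumed constant)
    [0, 0, 0,  1],   # dy <- dy
], dtype=np.float32)

H = np.array([
    [1, 0, 0, 0],    # we measure x ...
    [0, 1, 0, 0],    # ... and y, but never the velocity directly
], dtype=np.float32)

state = np.array([100, 100, 2, 1], dtype=np.float32)  # x, y, dx, dy
next_state = F @ state
print(next_state.tolist())        # [102.0, 101.0, 2.0, 1.0]
print((H @ next_state).tolist())  # [102.0, 101.0]
```

The velocity is never observed; the filter infers it from how successive position measurements change.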

🧪 Step-by-Step Code: Object Tracking with Kalman Filter

import cv2
import numpy as np
import matplotlib.pyplot as plt

# 1. Create Kalman filter with 4 state vars (x, y, dx, dy) and 2 measurements (x, y)
kf = cv2.KalmanFilter(4, 2)

# 2. Define transition matrix (how state evolves)
kf.transitionMatrix = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
], np.float32)

# 3. Define measurement matrix (what we observe)
kf.measurementMatrix = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0]
], np.float32)

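# 4. Add process & measurement noise
kf.processNoiseCov     = np.eye(4, dtype=np.float32) * 1e-3
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 25  # variance of the simulated noise (sigma = 5 px)

# Start the filter at the known initial position (OpenCV's default state is all zeros)
kf.statePost    = np.array([[100], [100], [0], [0]], np.float32)
kf.errorCovPost = np.eye(4, dtype=np.float32) * 10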

# 5. Simulate true positions & noisy measurements
true_positions = []
measured_positions = []
predicted_positions = []

x, y, dx, dy = 100, 100, 2, 1
for _ in range(80):
    # true motion
    x += dx; y += dy
    true_positions.append((x, y))
    # noisy measurement
    mx = x + np.random.randn() * 5
    my = y + np.random.randn() * 5
    measured_positions.append((mx, my))
    # predict (a priori estimate for this frame, a 4x1 array)
    pred = kf.predict()
    # correct with the measurement, passed as a 2x1 float32 column vector
    kf.correct(np.array([[mx], [my]], np.float32))
    predicted_positions.append((float(pred[0, 0]), float(pred[1, 0])))

# 6. Plot results
tp = np.array(true_positions)
mp = np.array(measured_positions)
pp = np.array(predicted_positions)

plt.figure(figsize=(8, 6))
plt.plot(tp[:,0], tp[:,1],  label="True",           color="black")
plt.scatter(mp[:,0], mp[:,1], label="Measured",       color="red",   s=10)
plt.plot(pp[:,0], pp[:,1],  label="Kalman Prediction", color="green")
plt.gca().invert_yaxis()  # match image coords
plt.xlabel("X Position")
plt.ylabel("Y Position")
plt.title("Kalman Filter Object Tracking")
plt.legend()
plt.grid(True)
plt.show()

🔍 What You’ll See

Black line = true motion
Red dots = noisy measurements
Green line = Kalman prediction, a smoothed estimate of the track
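To put a number on the smoothing, here is a from-scratch NumPy sketch of the same constant-velocity filter (not OpenCV's implementation). The class name `SimpleKalman`, the noise values and the 10-frame warm-up are illustrative assumptions; the filter is initialised from the first measurement rather than from zeros so it does not spend many frames catching up.

```python
import numpy as np

class SimpleKalman:
    """Minimal constant-velocity Kalman filter (hypothetical helper, NumPy only)."""

    def __init__(self, x0, y0, q=1e-3, r=25.0, p0=10.0):
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q                   # process noise covariance
        self.R = np.eye(2) * r                   # measurement noise covariance
        self.x = np.array([x0, y0, 0.0, 0.0])    # first measurement, zero velocity
        self.P = np.eye(4) * p0                  # initial uncertainty

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def correct(self, z):
        S = self.H @ self.P @ self.H.T + self.R     # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()


rng = np.random.default_rng(0)
t = np.arange(1, 81)
truth = np.stack([100 + 2.0 * t, 100 + 1.0 * t], axis=1)
meas = truth + rng.normal(scale=5.0, size=truth.shape)   # sigma = 5 px, as in the demo

kf = SimpleKalman(*meas[0])
filtered = [meas[0]]
for z in meas[1:]:
    kf.predict()
    filtered.append(kf.correct(z))
filtered = np.array(filtered)

# Root-mean-square position error, skipping a short warm-up
warm = 10
rmse_meas = np.sqrt(np.mean(np.sum((meas[warm:] - truth[warm:]) ** 2, axis=1)))
rmse_filt = np.sqrt(np.mean(np.sum((filtered[warm:] - truth[warm:]) ** 2, axis=1)))
print(f"measurement RMSE: {rmse_meas:.2f} px, filtered RMSE: {rmse_filt:.2f} px")
```

The filtered error should come out well below the raw measurement error; the exact figures depend on the random seed and on how `q` and `r` are tuned.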

🗓️ When to Use Kalman Filters in Computer Vision

Use Case	Why Kalman Helps
Real-time object tracking	Predicts motion between frames
Handling occlusion	Continues predicting when object disappears
Noisy detections (blobs)	Smooths out jittery measurements
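The occlusion case amounts to calling predict on every frame but correct only when a detection exists. A minimal NumPy sketch, assuming a constant-velocity target, a made-up 10-frame occlusion and illustrative noise values, compares the Kalman estimate with naively holding the last detection. With OpenCV the pattern is the same: call `kf.predict()` every frame and `kf.correct()` only when the detector returns a measurement.

```python
import numpy as np

# Constant-velocity model (dt = 1 frame); noise levels are illustrative assumptions
F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3
R = np.eye(2) * 4.0   # measurement sigma = 2 px

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def correct(x, P, z):
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P

rng = np.random.default_rng(1)
t = np.arange(40)
truth = np.stack([100 + 2.0 * t, 100 + 1.0 * t], axis=1)
meas = truth + rng.normal(scale=2.0, size=truth.shape)
occluded = t >= 30   # hypothetical occlusion: no detections in frames 30-39

x = np.array([meas[0, 0], meas[0, 1], 0.0, 0.0])   # start at the first detection
P = np.eye(4) * 10.0
last_seen = meas[0]
for k in range(1, len(t)):
    x, P = predict(x, P)                  # always predict
    if not occluded[k]:
        x, P = correct(x, P, meas[k])     # correct only when a detection exists
        last_seen = meas[k]

kalman_err = np.linalg.norm(x[:2] - truth[-1])
frozen_err = np.linalg.norm(last_seen - truth[-1])   # "hold the last box" strategy
print(f"after 10 occluded frames: Kalman error {kalman_err:.1f} px, "
      f"last-detection error {frozen_err:.1f} px")
```

During the gap the covariance `P` keeps growing, which a tracker can use to decide when a lost object should be dropped.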

✅ Summary

Kalman Filter Role	Description
Predict	Guess next state from current state
Correct	Update guess based on observation
Advantage	Smooth, noise-resistant tracking

🎯 Object Tracking with Mean-Shift

Mean-Shift is a simple, histogram-based tracking algorithm. It describes the target by a colour histogram, turns each new frame into a map of how likely each pixel is to belong to the target, and then iteratively shifts a search window towards the densest part of that map.

🧠 How It Works (Plain English)

1. Select ROI: choose an initial bounding box around the object in the first frame.
2. Build appearance model: compute a colour histogram of the ROI in HSV space (the code below uses only the hue channel).
3. Back-project histogram: for each new frame, calculate a back-projection, a probability map showing how well each pixel matches the ROI histogram.
4. Mean-Shift iteration: repeatedly move the window to the centroid of the back-projection values inside it until it stops moving, which drives it towards the region of highest probability, i.e. where the colours best match the histogram.
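Steps 3 and 4 can be sketched in plain NumPy on a synthetic frame. This is a simplified illustration of what `cv2.calcBackProject` and `cv2.meanShift` do, not their actual implementation; the helper names, hue values and window coordinates are all made up.

```python
import numpy as np

def back_project(hue, roi_hist, bins=180):
    """Look up each pixel's hue in a normalised ROI histogram (values in [0, 1])."""
    idx = np.clip(hue.astype(int) * bins // 180, 0, bins - 1)
    return roi_hist[idx]

def mean_shift(prob, window, max_iter=10, eps=1):
    """Move an (x, y, w, h) window to the centroid of `prob` inside it until it settles."""
    x, y, w, h = window
    H, W = prob.shape
    for _ in range(max_iter):
        patch = prob[y:y + h, x:x + w]
        m00 = patch.sum()
        if m00 == 0:
            break                                 # nothing to follow inside the window
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        cx = (xs * patch).sum() / m00             # centroid x = M10 / M00
        cy = (ys * patch).sum() / m00             # centroid y = M01 / M00
        dx = int(round(cx - (w - 1) / 2))         # shift that centres the window on it
        dy = int(round(cy - (h - 1) / 2))
        x = int(np.clip(x + dx, 0, W - w))
        y = int(np.clip(y + dy, 0, H - h))
        if abs(dx) < eps and abs(dy) < eps:
            break                                 # converged: window no longer moves
    return x, y, w, h

# Synthetic hue image: background hue 100, a 20x20 target of hue 10
frame_hue = np.full((120, 160), 100, dtype=np.uint8)
frame_hue[60:80, 90:110] = 10

# Appearance model: histogram of the target's hue, scaled so the peak is 1
target_pixels = np.full((20, 20), 10, dtype=np.uint8)
roi_hist = np.bincount(target_pixels.ravel(), minlength=180).astype(float)
roi_hist /= roi_hist.max()

prob = back_project(frame_hue, roi_hist)
window = mean_shift(prob, (70, 50, 30, 30))   # start where the target was last seen
print(window)  # (85, 55, 30, 30): window now centred on the target
```

The window keeps its 30x30 size throughout, which is exactly the scale limitation listed further down.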

🐍 Step-by-Step Code: Mean-Shift Tracking

import cv2
import os

# 1. Load video
cap = cv2.VideoCapture("baby.mp4")
ret, frame = cap.read()
if not ret:
    cap.release()
    raise SystemExit("❌ Failed to read video")

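# 2. Initialise tracking window (x, y, w, h)
#    these values assume a frame of at least 1300x900 px; adjust them
#    (or pick a box interactively with cv2.selectROI) for other videos
x, y, w, h = 700, 300, 600, 600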
track_window = (x, y, w, h)

# 3. Compute ROI histogram in HSV
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# 4. Setup termination criteria and output folder
#    (at most 10 iterations, or stop once the window moves less than 1 px)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
output_dir = "tracking_snapshots"
os.makedirs(output_dir, exist_ok=True)

frame_count = 0
snapshot_interval = 30  # save every 30 frames
snapshot_count = 0

# 5. Process frames
while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # apply Mean-Shift to get new window
    ret, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window

    # draw bounding box
    result = cv2.rectangle(frame.copy(), (x, y), (x+w, y+h), (0, 255, 0), 2)

    # save snapshot periodically
    if frame_count % snapshot_interval == 0:
        filename = os.path.join(output_dir, f"frame_{frame_count:04d}.jpg")
        cv2.imwrite(filename, result)
        snapshot_count += 1

    # display live result
    cv2.imshow("Mean-Shift Tracking", result)
    if cv2.waitKey(30) & 0xFF == 27:
        break

    frame_count += 1

cap.release()
cv2.destroyAllWindows()
print(f"✅ Saved {snapshot_count} snapshots in '{output_dir}/'")

🖼️ Sample Output

Every 30 frames, a snapshot is saved to tracking_snapshots/ showing the tracked object enclosed in a green box.

📋 Summary: What Each Output Means

Output	Purpose
Video frames with green box	The tracker shifts the window to follow the object

✅ Strengths & Limitations

Strengths	Limitations
Simple and fast	Poor handling of scale changes
Works using only colour histograms	Struggles with occlusion or dramatic background changes
Effective for distinct colours	Fails if the object’s appearance changes significantly

🔄 CamShift (Continuously Adaptive Mean-Shift)

CamShift builds on the Mean-Shift tracker by adapting the window size and estimating the object's orientation on every frame, which makes it more robust to scale and in-plane rotation changes (e.g. a person moving closer or tilting their head).
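CamShift derives the size and angle of its box from image moments of the back-projection inside the search window. The sketch below computes the centre, spread and orientation of a synthetic elongated blob with NumPy; the function name `orientation_from_moments` and the Gaussian test blob are hypothetical, and OpenCV's `cv2.CamShift` applies its own scaling and window-update details, so its numbers will not match these exactly.

```python
import numpy as np

def orientation_from_moments(prob):
    """Centre, major/minor spread and angle (degrees) of a blob in a probability map.

    Angle is measured from the +x axis in image coordinates (y pointing down).
    """
    ys, xs = np.mgrid[0:prob.shape[0], 0:prob.shape[1]]
    m00 = prob.sum()
    cx, cy = (xs * prob).sum() / m00, (ys * prob).sum() / m00
    # Normalised second-order central moments, i.e. the blob's covariance matrix
    a = ((xs - cx) ** 2 * prob).sum() / m00
    b = ((xs - cx) * (ys - cy) * prob).sum() / m00
    c = ((ys - cy) ** 2 * prob).sum() / m00
    theta = 0.5 * np.degrees(np.arctan2(2 * b, a - c))
    spread = np.sqrt((a - c) ** 2 + 4 * b ** 2)
    major = np.sqrt((a + c + spread) / 2)   # standard deviation along the major axis
    minor = np.sqrt((a + c - spread) / 2)   # standard deviation along the minor axis
    return (cx, cy), (major, minor), theta

# Synthetic blob: Gaussian with sigmas 30 and 8 px, rotated by 30 degrees
ang = np.radians(30)
rot = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
inv = np.linalg.inv(rot @ np.diag([30.0 ** 2, 8.0 ** 2]) @ rot.T)
ys, xs = np.mgrid[0:200, 0:200]
dx, dy = xs - 100.0, ys - 100.0
prob = np.exp(-0.5 * (inv[0, 0] * dx ** 2 + 2 * inv[0, 1] * dx * dy + inv[1, 1] * dy ** 2))

center, (major, minor), theta = orientation_from_moments(prob)
print(center, round(major, 1), round(minor, 1), round(theta, 1))
```

Mean-Shift only uses the first-order moments (the centroid); the second-order moments are what let CamShift report a size and an angle.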

🔍 Key Differences: CamShift vs. Mean-Shift

Feature	Mean-Shift	CamShift (Adaptive)
Window size	Fixed	✅ Resizes dynamically
Object rotation	Not handled	✅ Tracks orientation (angle)
Output	New position only	✅ Position, size & angle

✅ Step-by-Step Code: CamShift Object Tracking

import cv2
import os

# 1. Load video
cap = cv2.VideoCapture("baby.mp4")
ret, frame = cap.read()
if not ret:
    raise RuntimeError("Failed to read the first frame")

# 2. Initialise ROI and tracking window (x, y, w, h)
#    these values assume a frame of at least 1300x900 px; adjust them for other videos
x, y, w, h = 700, 300, 600, 600
track_window = (x, y, w, h)

# 3. Compute ROI histogram in HSV
roi        = frame[y:y+h, x:x+w]
hsv_roi    = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist   = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# 4. Setup termination criteria & output folder
#    (at most 10 Mean-Shift iterations, or stop once the window moves less than 1 px)
term_crit   = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
output_dir  = "camshift_snapshots"
os.makedirs(output_dir, exist_ok=True)

frame_count, snapshot_count = 0, 0
snapshot_interval = 30

# 5. Process video frames
while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv       = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # apply CamShift → returns new window & rotated box
    ret_cs, track_window = cv2.CamShift(back_proj, track_window, term_crit)

    # draw rotated box
    pts     = cv2.boxPoints(ret_cs)
    pts     = pts.astype(int)
    result  = cv2.polylines(frame.copy(), [pts], True, (0, 255, 0), 2)

    # save snapshot periodically
    if frame_count % snapshot_interval == 0:
        fname = f"frame_{frame_count:04d}.jpg"
        cv2.imwrite(os.path.join(output_dir, fname), result)
        snapshot_count += 1

    # display
    cv2.imshow("CamShift Tracking", result)
    if cv2.waitKey(30) & 0xFF == 27:
        break

    frame_count += 1

cap.release()
cv2.destroyAllWindows()
print(f"✅ Saved {snapshot_count} CamShift snapshots in '{output_dir}/'")

🔍 What’s New in CamShift Code

Code Part	What It Does
`cv2.CamShift()`	Returns a rotated rectangle ((cx, cy), (w, h), angle) plus the updated search window
`cv2.boxPoints()`	Converts that info into a rotated box
`cv2.polylines()`	Draws the rotated box (not just an axis-aligned rectangle)
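The conversion performed by `cv2.boxPoints()` is plain geometry: offset the centre by half the width along the box's rotated x-axis and half the height along its rotated y-axis. A NumPy sketch follows; the function name `rotated_box_corners` is hypothetical, and its corner order and angle convention are assumptions that need not match OpenCV's, which has changed between versions.

```python
import numpy as np

def rotated_box_corners(center, size, angle_deg):
    """Return the 4 corners of a rotated rectangle ((cx, cy), (w, h), angle)."""
    cx, cy = center
    w, h = size
    t = np.radians(angle_deg)
    # Half-extent vectors along the box's own width and height directions
    u = np.array([np.cos(t), np.sin(t)]) * w / 2
    v = np.array([-np.sin(t), np.cos(t)]) * h / 2
    c = np.array([cx, cy], dtype=float)
    return np.array([c - u - v, c + u - v, c + u + v, c - u + v])

corners = rotated_box_corners((50, 40), (20, 10), 0)
print(corners.tolist())  # [[40.0, 35.0], [60.0, 35.0], [60.0, 45.0], [40.0, 45.0]]
```

Rounding the corners to integers (for example `np.round(corners).astype(np.int32)`) gives the kind of point array `cv2.polylines()` expects.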

✅ Summary Table

Feature	Benefit
Adaptive window	Follows objects that move towards or away from the camera
Orientation-aware	Reports the object's in-plane rotation angle
Histogram-based	Uses the same simple colour-histogram model as Mean-Shift