Object Tracking - iffatAGheyas/computer-vision-handbook GitHub Wiki
📊 Object Tracking Using Kalman Filters
A Kalman filter is a recursive predictive algorithm that estimates the state of a dynamic system—such as the position of a moving object—even when observations are noisy or partially missing. It “guesses” where the object is now and where it will be next.
🧠 What Is a Kalman Filter?
- Predicts the future state (e.g. position, velocity) of a moving object
- Corrects its prediction using actual (noisy) measurements
🤔 Why Use It for Object Tracking?
- Smooths noisy detections (e.g. jumping blobs or jittery bounding boxes)
- Fills gaps when the object briefly disappears (occlusion)
- Predicts the object's next location before the next frame arrives, which is critical for real-time systems
🎯 How It Works (Conceptually)
The Kalman filter alternates between two phases each frame:
Phase | Description |
---|---|
Predict | Estimate the next state based on the current state |
Correct | Update that estimate using the actual measurement |
Typically the state vector includes:
- Position (x, y)
- Velocity (dx, dy)
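To make the two phases concrete, here is a minimal NumPy sketch of one predict/correct cycle for a constant-velocity model. The names `F`, `H`, `Q`, `R`, `kf_predict` and `kf_correct` are illustrative choices, not part of OpenCV, and the noise values and initial uncertainty are assumptions picked for the example.

```python
import numpy as np

# Constant-velocity model: state = [x, y, dx, dy], measurement = [x, y]
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)  # state transition, one frame per step
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # only position is observed
Q = np.eye(4) * 1e-3                       # process noise covariance (assumed)
R = np.eye(2) * 25.0                       # measurement noise covariance (assumed)


def kf_predict(state, P):
    """Predict phase: push the state and its covariance one frame forward."""
    state = F @ state
    P = F @ P @ F.T + Q
    return state, P


def kf_correct(state, P, z):
    """Correct phase: blend the prediction with the measurement z."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    state = state + K @ (z - H @ state)    # move towards the measurement
    P = (np.eye(4) - K @ H) @ P
    return state, P


state = np.array([100.0, 100.0, 0.0, 0.0])  # initial guess: at (100, 100), at rest
P = np.eye(4) * 10.0                         # initial uncertainty (assumed)

state, P = kf_predict(state, P)
state, P = kf_correct(state, P, np.array([103.0, 101.0]))
print("corrected state:", state)
```

The corrected position lands between the prediction and the measurement; how far it moves depends on the relative sizes of `P` and `R`.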
🧪 Step-by-Step Code: Object Tracking with Kalman Filter
import cv2
import numpy as np
import matplotlib.pyplot as plt
# 1. Create Kalman filter with 4 state vars (x, y, dx, dy) and 2 measurements (x, y)
kf = cv2.KalmanFilter(4, 2)
# 2. Define transition matrix (how state evolves)
kf.transitionMatrix = np.array([
[1, 0, 1, 0],
[0, 1, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 1]
], np.float32)
# 3. Define measurement matrix (what we observe)
kf.measurementMatrix = np.array([
[1, 0, 0, 0],
[0, 1, 0, 0]
], np.float32)
# 4. Add process & measurement noise
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1
# 5. Simulate true positions & noisy measurements
true_positions = []
measured_positions = []
predicted_positions = []
x, y, dx, dy = 100, 100, 2, 1
# start the filter at the object's initial position (OpenCV's default initial state is all zeros)
kf.statePost = np.array([[x], [y], [0], [0]], np.float32)
kf.errorCovPost = np.eye(4, dtype=np.float32)
for _ in range(80):
    # true motion
    x += dx; y += dy
    true_positions.append((x, y))
    # noisy measurement
    mx = x + np.random.randn() * 5
    my = y + np.random.randn() * 5
    measured_positions.append((mx, my))
    # predict (estimate for this frame before seeing the measurement)
    pred = kf.predict()
    # correct with the measurement, passed as a 2x1 float32 column vector
    kf.correct(np.array([[mx], [my]], np.float32))
    predicted_positions.append((float(pred[0, 0]), float(pred[1, 0])))
# 6. Plot results
tp = np.array(true_positions)
mp = np.array(measured_positions)
pp = np.array(predicted_positions)
plt.figure(figsize=(8, 6))
plt.plot(tp[:,0], tp[:,1], label="True", color="black")
plt.scatter(mp[:,0], mp[:,1], label="Measured", color="red", s=10)
plt.plot(pp[:,0], pp[:,1], label="Kalman Prediction", color="green")
plt.gca().invert_yaxis() # match image coords
plt.xlabel("X Position")
plt.ylabel("Y Position")
plt.title("Kalman Filter Object Tracking")
plt.legend()
plt.grid(True)
plt.show()
🔍 What You’ll See
- Black line = true motion
- Red dots = noisy measurements
- Green line = Kalman filter predictions (smoother than the raw measurements)
🗓️ When to Use Kalman Filters in Computer Vision
Use Case | Why Kalman Helps |
---|---|
Real-time object tracking | Predicts motion between frames |
Handling occlusion | Continues predicting when object disappears |
Noisy detections (blobs) | Smooths out jittery measurements |
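As a rough illustration of the occlusion case, the sketch below keeps running the predict phase on every frame but only runs the correct phase when a detection is available. It is plain NumPy rather than `cv2.KalmanFilter`; the function name `track`, the use of `None` to mark an occluded frame, and the noise-free measurements are all assumptions made to keep the example short and deterministic.

```python
import numpy as np

# Constant-velocity model: state = [x, y, dx, dy], measurement = [x, y]
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3   # process noise covariance (assumed)
R = np.eye(2)          # measurement noise covariance (assumed)


def track(measurements, state, P):
    """Return one position estimate per frame; None marks an occluded frame."""
    estimates = []
    for z in measurements:
        # predict on every frame
        state = F @ state
        P = F @ P @ F.T + Q
        # correct only when a detection exists
        if z is not None:
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            state = state + K @ (np.asarray(z, dtype=float) - H @ state)
            P = (np.eye(4) - K @ H) @ P
        estimates.append(state[:2].copy())
    return np.array(estimates)


# The object moves 2 px right and 1 px down per frame; frames 10-14 are occluded
truth = [(100 + 2 * t, 100 + t) for t in range(1, 21)]
measurements = [None if 10 <= i < 15 else p for i, p in enumerate(truth)]

estimates = track(measurements, np.array([100.0, 100.0, 2.0, 1.0]), np.eye(4))
print(estimates[10:15])  # positions carried forward by prediction alone
```

During the gap the estimate simply coasts along the last known velocity, so it stays useful only while the object keeps moving roughly the same way.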
✅ Summary
Kalman Filter Role | Description |
---|---|
Predict | Guess next state from current state |
Correct | Update guess based on observation |
Advantage | Smooth, noise-resistant tracking |
🎯 Object Tracking with Mean-Shift
Mean-Shift is a simple, histogram-based tracking algorithm that locates an object by iteratively shifting a search window to maximize the similarity between the target’s appearance model (colour histogram) and the current frame.
🧠 How It Works (Plain English)
- Select ROI: Choose an initial bounding box around the object in the first frame.
- Build Appearance Model: Compute a colour histogram of the ROI in HSV space.
- Back-Project Histogram: For each new frame, calculate a back-projection, i.e. a probability map showing how well each pixel matches the ROI histogram.
- Mean-Shift Iteration: Apply the Mean-Shift algorithm to shift the window towards the region of highest probability, i.e. where the histogram best matches.
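The steps above can be sketched without OpenCV to show what happens inside the loop. This is a simplified NumPy illustration on a synthetic hue image, not what `cv2.meanShift` does internally; the helper names `back_project` and `mean_shift`, the image size and the window values are all made up for the example.

```python
import numpy as np


def back_project(hue, roi_hist):
    """Look up each pixel's hue in the ROI histogram to get a match score."""
    return roi_hist[hue]


def mean_shift(prob, window, max_iter=10, eps=1.0):
    """Shift window (x, y, w, h) towards the local centre of mass of prob."""
    x, y, w, h = window
    rows, cols = prob.shape
    for _ in range(max_iter):
        patch = prob[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break  # nothing to follow inside the window
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        cx = (xs * patch).sum() / total  # centroid in patch coordinates
        cy = (ys * patch).sum() / total
        # move the window so that its centre sits on the centroid
        new_x = int(np.floor(x + cx - (w - 1) / 2 + 0.5))
        new_y = int(np.floor(y + cy - (h - 1) / 2 + 0.5))
        # keep the window inside the image
        new_x = min(max(new_x, 0), cols - w)
        new_y = min(max(new_y, 0), rows - h)
        shift = max(abs(new_x - x), abs(new_y - y))
        x, y = new_x, new_y
        if shift < eps:
            break  # converged
    return x, y, w, h


# Synthetic frame: background hue 0, a 20x20 object with hue 60
hue = np.zeros((100, 100), dtype=np.uint8)
hue[40:60, 50:70] = 60

# Appearance model: a 180-bin hue histogram that only contains the object's hue
roi_hist = np.zeros(180)
roi_hist[60] = 1.0

prob = back_project(hue, roi_hist)
window = mean_shift(prob, (35, 30, 20, 20))  # start partly overlapping the object
print("final window:", window)
```

Note that the window only moves if it already overlaps some of the object; with no matching pixels inside it, there is nothing to climb towards.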
🐍 Step-by-Step Code: Mean-Shift Tracking
import cv2
import os
# 1. Load video
cap = cv2.VideoCapture("baby.mp4")
ret, frame = cap.read()
if not ret:
    print("❌ Failed to read video")
    cap.release()
    raise SystemExit(1)
# 2. Initialise tracking window (x, y, w, h)
x, y, w, h = 700, 300, 600, 600
track_window = (x, y, w, h)
# 3. Compute ROI histogram in HSV
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
# 4. Setup termination criteria and output folder
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
output_dir = "tracking_snapshots"
os.makedirs(output_dir, exist_ok=True)
frame_count = 0
snapshot_interval = 30 # save every 30 frames
snapshot_count = 0
# 5. Process frames
while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # apply Mean-Shift to get new window (the first return value is not needed here)
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window
    # draw bounding box
    result = cv2.rectangle(frame.copy(), (x, y), (x+w, y+h), (0, 255, 0), 2)
    # save snapshot periodically
    if frame_count % snapshot_interval == 0:
        filename = os.path.join(output_dir, f"frame_{frame_count:04d}.jpg")
        cv2.imwrite(filename, result)
        snapshot_count += 1
    # display live result
    cv2.imshow("Mean-Shift Tracking", result)
    if cv2.waitKey(30) & 0xFF == 27:
        break
    frame_count += 1
cap.release()
cv2.destroyAllWindows()
print(f"✅ Saved {snapshot_count} snapshots in '{output_dir}/'")
🖼️ Sample Output
Every 30th frame is saved as a snapshot in tracking_snapshots/, with the tracked object enclosed in a green box.
📋 Summary: What Each Output Means
Output | Purpose |
---|---|
Video frames with green box | The tracker shifts the window to follow the object |
✅ Strengths & Limitations
Strengths | Limitations |
---|---|
Simple and fast | Poor handling of scale changes |
Works using only colour histograms | Struggles with occlusion or dramatic background changes |
Effective for distinct colours | Fails if the object’s appearance changes significantly |
🔄 CamShift (Continuously Adaptive Mean-Shift)
CamShift builds on the Mean-Shift tracker by automatically adapting the window size and estimating the object’s orientation, which makes it more robust to scale and rotation changes (e.g. a person moving closer or turning their head).
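One way to see where size and orientation can come from is to look at the image moments of the back-projection inside the window. The NumPy sketch below is only an illustration of that idea, under the assumption that the blob is treated like a 2D distribution; the helper name `window_moments` is made up, and OpenCV's actual implementation differs in its details.

```python
import numpy as np


def window_moments(prob):
    """Centroid, orientation (radians) and axis spreads of a probability patch."""
    ys, xs = np.mgrid[0:prob.shape[0], 0:prob.shape[1]]
    m00 = prob.sum()
    cx = (xs * prob).sum() / m00
    cy = (ys * prob).sum() / m00
    # central second-order moments, normalised by the total mass
    mu20 = ((xs - cx) ** 2 * prob).sum() / m00
    mu02 = ((ys - cy) ** 2 * prob).sum() / m00
    mu11 = ((xs - cx) * (ys - cy) * prob).sum() / m00
    # orientation of the major axis, measured in image coordinates (y points down)
    angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    # eigenvalues of the covariance matrix give the spread along each axis
    common = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    major = np.sqrt(max((mu20 + mu02 + common) / 2, 0.0))
    minor = np.sqrt(max((mu20 + mu02 - common) / 2, 0.0))
    return (cx, cy), angle, (major, minor)


# A horizontal bar: 40 px wide, 10 px tall
bar = np.zeros((50, 50))
bar[20:30, 5:45] = 1.0
center, angle, (major, minor) = window_moments(bar)
print(center, np.degrees(angle), major, minor)

# A diagonal line running from top-left to bottom-right
_, diag_angle, _ = window_moments(np.eye(40))
print(np.degrees(diag_angle))
```

A larger spread suggests a larger window, and the angle says which way the long axis points, which is the kind of information Mean-Shift alone does not report.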
🔍 Key Differences: CamShift vs. Mean-Shift
Feature | Mean-Shift | CamShift (Adaptive) |
---|---|---|
Window size | Fixed | ✅ Resizes dynamically |
Object rotation | Not handled | ✅ Tracks orientation (angle) |
Output | New position only | ✅ Position, size & angle |
✅ Step-by-Step Code: CamShift Object Tracking
import cv2
import os
# 1. Load video
cap = cv2.VideoCapture("baby.mp4")
ret, frame = cap.read()
if not ret:
    raise RuntimeError("Failed to read the first frame")
# 2. Initialise ROI and tracking window (x, y, w, h)
x, y, w, h = 700, 300, 600, 600
track_window = (x, y, w, h)
# 3. Compute ROI histogram in HSV
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
# 4. Setup termination criteria & output folder
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
output_dir = "camshift_snapshots"
os.makedirs(output_dir, exist_ok=True)
frame_count, snapshot_count = 0, 0
snapshot_interval = 30
# 5. Process video frames
while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # apply CamShift: returns the rotated box and the new window
    ret_cs, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    # draw rotated box
    pts = cv2.boxPoints(ret_cs)
    pts = pts.astype(int)  # integer pixel coordinates for drawing
    result = cv2.polylines(frame.copy(), [pts], True, (0, 255, 0), 2)
    # save snapshot periodically
    if frame_count % snapshot_interval == 0:
        fname = f"frame_{frame_count:04d}.jpg"
        cv2.imwrite(os.path.join(output_dir, fname), result)
        snapshot_count += 1
    # display
    cv2.imshow("CamShift Tracking", result)
    if cv2.waitKey(30) & 0xFF == 27:
        break
    frame_count += 1
cap.release()
cv2.destroyAllWindows()
print(f"✅ Saved {snapshot_count} CamShift snapshots in '{output_dir}/'")
🔍 What’s New in CamShift Code
Code Part | What It Does |
---|---|
cv2.CamShift() | Returns updated position, size and angle |
cv2.boxPoints() | Converts that info into a rotated box |
cv2.polylines() | Draws the rotated box (not just an axis-aligned rectangle) |
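To show what "converting that info into a rotated box" means geometrically, here is a small NumPy sketch that turns a centre, a size and an angle into four corner points. The function name `rotated_box_corners` is hypothetical, and the corner order and angle convention used here are assumptions that may not match what `cv2.boxPoints` returns.

```python
import numpy as np


def rotated_box_corners(center, size, angle_deg):
    """Return the 4 corners of a w x h box rotated by angle_deg about its centre."""
    (cx, cy), (w, h) = center, size
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # corners of the unrotated box, relative to its centre
    half = np.array([[-w / 2, -h / 2],
                     [ w / 2, -h / 2],
                     [ w / 2,  h / 2],
                     [-w / 2,  h / 2]])
    # rotate each corner, then move the box to its centre
    return half @ rot.T + np.array([cx, cy])


flat = rotated_box_corners((50, 40), (20, 10), 0)
tilted = rotated_box_corners((50, 40), (20, 10), 30)
print(flat)
print(tilted.round(1))
```

With an angle of 0 the result is just the axis-aligned rectangle; any other angle produces the tilted outline that needs a polygon rather than a plain rectangle to draw.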
✅ Summary Table
Feature | Benefit |
---|---|
Adaptive window | Follows objects as they grow or shrink in the frame |
Orientation-aware | Follows objects as they rotate |
Histogram-based | Retains colour-based robustness like Mean-Shift |