Each detection instance should contain (x1, y1, x2, y2, score [, cls]). The class prediction is not required, but it is still worth saving for custom filtering. In addition, each track records the moving centers and sizes
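A minimal sketch of why the optional class is worth saving, assuming detections arrive as plain tuples in the order given above. `filter_detections`, `min_score` and `keep_classes` are hypothetical names used only for illustration. With the class present, a score threshold can be combined with a class whitelist. Without it, only the score can be used.

```python
from typing import Iterable, List, Optional, Set, Tuple

# A detection: (x1, y1, x2, y2, score) or (x1, y1, x2, y2, score, cls).
Detection = Tuple[float, ...]


def filter_detections(
    dets: Iterable[Detection],
    min_score: float = 0.5,
    keep_classes: Optional[Set[int]] = None,
) -> List[Detection]:
    """Keep detections above a score threshold, optionally restricted to some classes."""
    kept = []
    for det in dets:
        score = det[4]
        # The class is optional: detections saved without it have only 5 fields.
        cls = det[5] if len(det) > 5 else None
        if score < min_score:
            continue
        # Class filtering needs the saved class; class-less detections are
        # dropped whenever a class whitelist is given.
        if keep_classes is not None and cls not in keep_classes:
            continue
        kept.append(det)
    return kept
```

For example, `filter_detections(dets, min_score=0.5, keep_classes={0})` keeps only confident class-0 detections. Class 0 is "person" in COCO-style label maps, which is an assumption about the label map in use.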

Option Semantics

Before: the meaning of each option depends on whether tracking is enabled

Option          Tracking OFF       Tracking ON
--det-all       dets as tracks     track all
person only     dets as tracks     track persons

After: explicit detections and person tracks

Option          Tracking OFF              Tracking ON
--det-all       all dets, no tracks       obj dets, person tracks
person only     person dets, no tracks    no dets, person tracks
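The "After" table can be read as a routing rule that decides which detections are reported directly and which are fed to the person tracker. The sketch below assumes detections are (x1, y1, x2, y2, score, cls) tuples and that "obj dets" means the non-person detections. Both `route_detections` and the `PERSON` class id are hypothetical.

```python
from typing import List, Sequence, Tuple

PERSON = 0  # hypothetical: person class id (0 in COCO-style label maps)

# (x1, y1, x2, y2, score, cls)
Det = Tuple[float, float, float, float, float, int]


def route_detections(
    dets: Sequence[Det], det_all: bool, tracking: bool
) -> Tuple[List[Det], List[Det]]:
    """Return (detections to report, detections to feed the person tracker)."""
    persons = [d for d in dets if d[5] == PERSON]
    if not tracking:
        # Tracking OFF: report detections directly and produce no tracks.
        return (list(dets) if det_all else persons), []
    # Tracking ON: only persons are tracked.
    # Assumption: "obj dets" means the non-person detections.
    others = [d for d in dets if d[5] != PERSON]
    return (others if det_all else []), persons
```

Each table cell corresponds to one branch, so detections and tracks never need to be inferred from whether tracking happens to be enabled.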



  • Detection: (x, y, w, h, score, features)
  • Track: (id, mean, cov, features, ...)


  • Detection: (C, x1, y1, x2, y2, p, f, di)
  • Track: (id, C, cx, cy, w, h, history)
C: object class
p: detection probability or score
f: object features
di: index of the detection within its frame
cx, cy, w, h: box center and size
history: the track's last few detections
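A minimal sketch of the second pair of records as Python dataclasses. It assumes `f` is a flat list of floats and that a track simply moves to its latest matched detection. `HISTORY_LEN`, `center_size`, `Track.start` and `Track.update` are hypothetical. It shows the conversion from the detection's corner format (x1, y1, x2, y2) to the track's (cx, cy, w, h), and a bounded `history`.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

HISTORY_LEN = 5  # hypothetical: number of past detections a track keeps


@dataclass
class Detection:
    C: int          # object class
    x1: float
    y1: float
    x2: float
    y2: float
    p: float        # detection probability or score
    f: List[float]  # object features (assumed to be a flat vector)
    di: int         # index of this detection within its frame

    def center_size(self) -> Tuple[float, float, float, float]:
        """Convert corner format (x1, y1, x2, y2) to (cx, cy, w, h)."""
        w = self.x2 - self.x1
        h = self.y2 - self.y1
        return self.x1 + w / 2, self.y1 + h / 2, w, h


@dataclass
class Track:
    id: int
    C: int
    cx: float
    cy: float
    w: float
    h: float
    # A bounded deque silently discards the oldest detection when full.
    history: Deque[Detection] = field(
        default_factory=lambda: deque(maxlen=HISTORY_LEN)
    )

    @classmethod
    def start(cls, track_id: int, det: Detection) -> "Track":
        """Open a new track from its first detection."""
        cx, cy, w, h = det.center_size()
        track = cls(track_id, det.C, cx, cy, w, h)
        track.history.append(det)
        return track

    def update(self, det: Detection) -> None:
        """Move the track to a matched detection and remember it."""
        self.cx, self.cy, self.w, self.h = det.center_size()
        self.history.append(det)
```

Storing whole detections in `history` keeps `di`, so each past box can be traced back to its frame. A real tracker would likely smooth the center and size, for example with a motion model, rather than copy the latest box.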


