models bytetrack_yolox_x_crowdhuman_mot17 private half - Azure/azureml-assets GitHub Wiki

bytetrack_yolox_x_crowdhuman_mot17-private-half

Overview

The bytetrack_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Multi-object tracking (MOT) aims to estimate the bounding boxes and identities of objects in videos. Most methods obtain identities by associating only the detection boxes whose scores exceed a threshold. Objects with low detection scores, e.g. occluded objects, are simply discarded, which leads to a non-negligible number of missed true objects and to fragmented trajectories. To solve this problem, we present a simple, effective and generic association method that tracks by associating every detection box instead of only the high-score ones. For the low-score detection boxes, we use their similarities with tracklets to recover true objects and filter out background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvements in IDF1 score, ranging from 1 to 10 points. To advance the state of the art in MOT, we design a simple and strong tracker named ByteTrack. For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the MOT17 test set at 30 FPS on a single V100 GPU.
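The two-stage association idea described above can be sketched roughly as follows. This is a simplified illustration, not the MMTracking implementation: it assumes greedy IoU matching in place of the motion model and assignment solver a full tracker would use. The thresholds (`high_thr`, `low_thr`, `min_iou`) and all function names are hypothetical values chosen for the example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (topX, topY, bottomX, bottomY)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[int],
                 boxes: List[Box], min_iou: float) -> Dict[int, int]:
    """Pair track ids with detection indices, best IoU first (assumed strategy)."""
    pairs = sorted(
        ((iou(tbox, boxes[d]), tid, d) for tid, tbox in tracks.items() for d in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, d in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or d in used:
            continue
        matches[tid] = d
        used.add(d)
    return matches


def associate(tracks: Dict[int, Box], boxes: List[Box], scores: List[float],
              high_thr: float = 0.6, low_thr: float = 0.1, min_iou: float = 0.3):
    """Match high-score detections first, then low-score ones to leftover tracks."""
    high = [i for i, s in enumerate(scores) if s >= high_thr]
    low = [i for i, s in enumerate(scores) if low_thr <= s < high_thr]

    first = greedy_match(tracks, high, boxes, min_iou)
    leftover = {tid: b for tid, b in tracks.items() if tid not in first}
    # Low-score boxes can only extend existing tracks; they never start new ones.
    second = greedy_match(leftover, low, boxes, min_iou)

    matches = {**first, **second}
    new_tracks = [i for i in high if i not in first.values()]
    return matches, new_tracks
```

In this sketch, an occluded object whose detection score drops below `high_thr` can still keep its identity through the second matching pass, while unmatched low-score boxes are treated as background and dropped.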

Training Details

Training Data

The model developers used the CrowdHuman and MOT17-half-train datasets to train the model.

Training Procedure

Training Techniques:

  • SGD with Momentum

Training Resources: 8x V100 GPUs

Evaluation Results

  • MOTA: 78.6

  • IDF1: 79.2
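For readers unfamiliar with these metrics, the sketch below shows the standard MOTA and IDF1 definitions as small Python functions. The function names and the counts in the usage lines are hypothetical and are not taken from this model's evaluation.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the number of ground-truth boxes."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Hypothetical counts, for illustration only.
example_mota = mota(false_negatives=100, false_positives=50, id_switches=10, num_gt=1000)
example_idf1 = idf1(idtp=80, idfp=20, idfn=20)
print(f"MOTA: {example_mota:.1%}, IDF1: {example_idf1:.1%}")
```

MOTA penalizes missed objects, false alarms and identity switches together, while IDF1 measures how consistently each object keeps the same identity over time.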

License

apache-2.0

Inference Samples

| Inference type | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- |
| Real time | video-multi-object-tracking-online-endpoint.ipynb | video-multi-object-tracking-online-endpoint.sh |

Finetuning Samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- | --- |
| Video multi-object tracking | Video multi-object tracking | MOT17 tiny | mot17-tiny-video-multi-object-tracking.ipynb | mot17-tiny-video-multi-object-tracking.sh |

Sample input and output

Sample input

{
  "input_data": {
    "columns": [
      "video"
    ],
    "data": ["video_link"]
  }
}

Note: "video_link" should be a publicly accessible URL.
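A request body in the shape shown above could be built along these lines. This is a sketch under stated assumptions: the helper names `build_request_body` and `make_scoring_request` are hypothetical, as are the scoring URI and key placeholders, and bearer-key authentication is assumed for the endpoint. Nothing is sent over the network until the prepared request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_request_body(video_url: str) -> bytes:
    """Serialize the sample input structure for a single video URL."""
    payload = {"input_data": {"columns": ["video"], "data": [video_url]}}
    return json.dumps(payload).encode("utf-8")


def make_scoring_request(scoring_uri: str, api_key: str,
                         video_url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for a deployed endpoint.

    scoring_uri and api_key are placeholders for values from your own deployment.
    """
    return urllib.request.Request(
        scoring_uri,
        data=build_request_body(video_url),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumes key-based auth
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = make_scoring_request(
    "https://example.invalid/score", "<your-key>", "https://example.invalid/video.mp4"
)
```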

Sample output

[
  {
    "det_bboxes": [
      {
        "box": {
          "topX": 703.9149780273,
          "topY": -5.5951070786,
          "bottomX": 756.9875488281,
          "bottomY": 158.1963806152
        },
        "label": 0,
        "score": 0.9597821236
      },
      {
        "box": {
          "topX": 1487.9072265625,
          "topY": 67.9468841553,
          "bottomX": 1541.1591796875,
          "bottomY": 217.5476837158
        },
        "label": 0,
        "score": 0.9568068385
      }
    ],
    "track_bboxes": [
      {
        "box": {
          "instance_id": 0,
          "topX": 703.9149780273,
          "topY": -5.5951070786,
          "bottomX": 756.9875488281,
          "bottomY": 158.1963806152
        },
        "label": 0,
        "score": 0.9597821236
      },
      {
        "box": {
          "instance_id": 1,
          "topX": 1487.9072265625,
          "topY": 67.9468841553,
          "bottomX": 1541.1591796875,
          "bottomY": 217.5476837158
        },
        "label": 0,
        "score": 0.9568068385
      }
    ],
    "frame_id": 0,
    "video_url": "video_link"
  }
]
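The per-frame output above can be regrouped into one trajectory per tracked object. The following is a minimal sketch that assumes the response has already been parsed into a list of frame dictionaries with the keys shown in the sample. The function name `tracks_by_instance` is hypothetical, and the values in the usage lines are shortened from the sample.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (frame_id, topX, topY, bottomX, bottomY)
TrackPoint = Tuple[int, float, float, float, float]


def tracks_by_instance(frames: List[dict]) -> Dict[int, List[TrackPoint]]:
    """Group tracked boxes by instance_id, ordered by frame_id."""
    trajectories: Dict[int, List[TrackPoint]] = defaultdict(list)
    for frame in sorted(frames, key=lambda f: f["frame_id"]):
        for item in frame["track_bboxes"]:
            box = item["box"]
            trajectories[box["instance_id"]].append(
                (frame["frame_id"], box["topX"], box["topY"],
                 box["bottomX"], box["bottomY"])
            )
    return dict(trajectories)


# Values shortened from the sample output above.
sample = [{
    "det_bboxes": [],
    "track_bboxes": [
        {"box": {"instance_id": 0, "topX": 703.9, "topY": -5.6,
                 "bottomX": 757.0, "bottomY": 158.2}, "label": 0, "score": 0.96},
        {"box": {"instance_id": 1, "topX": 1487.9, "topY": 67.9,
                 "bottomX": 1541.2, "bottomY": 217.5}, "label": 0, "score": 0.96},
    ],
    "frame_id": 0,
    "video_url": "video_link",
}]
trajectories = tracks_by_instance(sample)
```

Coordinates can fall slightly outside the frame, as with the negative `topY` in the sample, so callers that draw or crop boxes may want to clamp them to the image bounds first.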

Visualization of inference result for a sample image

[Figure: MOT visualization]

Version: 6

Tags

license : apache-2.0

model_specific_defaults : ordereddict({'apply_deepspeed': 'false', 'apply_ort': 'false'})

task : multi-object-tracking

hiddenlayerscanned

openmmlab_model_id : bytetrack_yolox_x_crowdhuman_mot17-private-half

SharedComputeCapacityEnabled

finetune_compute_allow_list : ['Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']

inference_compute_allow_list : ['Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND40rs_v2', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']

View in Studio: https://ml.azure.com/registries/azureml/models/bytetrack_yolox_x_crowdhuman_mot17-private-half/version/6

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

finetuning-tasks: video-multi-object-tracking

finetune-min-sku-spec: 4|1|28|176

finetune-recommended-sku: Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

inference-min-sku-spec: 4|1|28|176

inference-recommended-sku: Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_NC8as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND40rs_v2, Standard_ND96amsr_A100_v4, Standard_ND96asr_v4
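The `finetune-min-sku-spec` and `inference-min-sku-spec` values are pipe-delimited strings. The sketch below splits one into named fields. The field meanings (CPU cores, GPUs, memory in GB, storage in GB) are an assumption not stated on this page, inferred from the value lining up with the smallest listed SKU. `SkuSpec` and `parse_sku_spec` are hypothetical names.

```python
from typing import NamedTuple


class SkuSpec(NamedTuple):
    # Field meanings are assumed; this page does not define them.
    cpu_cores: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_sku_spec(spec: str) -> SkuSpec:
    """Split a 'a|b|c|d' spec string into four integer fields."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}: {spec!r}")
    return SkuSpec(*(int(p) for p in parts))


min_spec = parse_sku_spec("4|1|28|176")
```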
