Benchmark Workflows
May 30, 2026
Use this guide when you want to run the benchmark-driven modes: generate, eval, tune, and research.
Core idea
Benchmark workflows resolve the dataset, detector, and ReID defaults from self-contained YAMLs under boxmot/configs/benchmarks/. The first run generates detections and embeddings, and later runs reuse that cache.
boxmot generate --benchmark mot17 --split ablation
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack
boxmot tune --benchmark mot17 --split ablation --tracker bytetrack
Built-in benchmark ids
- mot17 for MOT17 and the ablation split workflow
- sportsmot for SportsMOT
- mmot for the MMOT benchmark config backed by OBB .npy frames
- mmot-mini for a local MMOT-style OBB benchmark rooted at assets/mmot-mini
mmot is the CLI benchmark id for MMOT.
Cache reuse
generate, eval, tune, and research share a cache key derived from the dataset, detector, and ReID configuration.
- Keep the same benchmark, split, detector, and ReID settings when you want later runs to reuse an existing cache.
- Changing any of those inputs creates a different cache bucket and forces regeneration.
- --detection-source also affects the cache key: public detection caches are stored separately from private detector caches.
- Native --tracker-backend cpp replay can still reuse the same detection cache, but trackers with native ReID write embeddings to a separate __cpp cache bucket.
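As a rough sketch of the reuse rules above, the script below prints a plan that generates the cache once and then evaluates several trackers with the same benchmark and split, so each eval run should resolve to the same cache bucket. It only uses flags shown in this guide. The plan_cache_reuse function, the run wrapper, and the RUN switch are hypothetical helpers, not part of BoxMOT, and commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: one generate run followed by several eval runs that share its cache.
# plan_cache_reuse, run, and RUN are illustrative helpers, not part of BoxMOT.
set -eu

# Print the command; execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

plan_cache_reuse() {
  benchmark="$1"
  split="$2"
  shift 2
  # The first run populates the detection and embedding cache.
  run boxmot generate --benchmark "$benchmark" --split "$split"
  # Later runs keep the same benchmark and split, so they can reuse that cache.
  for tracker in "$@"; do
    run boxmot eval --benchmark "$benchmark" --split "$split" --tracker "$tracker"
  done
}

plan_cache_reuse mot17 ablation boosttrack bytetrack ocsort
```

Running the script as-is only prints the four commands. Adding a flag such as --detection-source to some of the eval calls but not to the generate call would point those runs at a different cache bucket.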
Public detections
Some benchmarks ship with public detections from the original challenge (e.g., MOT17 includes FRCNN, SDP, and DPM). Use --detection-source to generate and evaluate with these instead of the configured detector:
# Generate cache with public FRCNN detections
boxmot generate --benchmark mot17 --split ablation --detection-source frcnn
# Evaluate using the same public detections
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source frcnn
# Tune against public detections
boxmot tune --benchmark mot17 --split ablation --tracker ocsort --detection-source frcnn --n-trials 10
Public detectors are defined in the benchmark YAML under public_detectors. The detection files are downloaded from the parquet repository and cached. ReID embeddings are generated automatically for the public detections.
--detection-source public uses the default public detector specified in the benchmark config's download.public_detector field.
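To compare one tracker across several public detection sources, the commands above could be wrapped in a loop along these lines. This is a sketch under assumptions: only frcnn appears as a CLI value in this guide, so sdp and dpm are assumed spellings that should be checked against the public_detectors entries in the benchmark YAML. The plan_public_sources function, the run wrapper, and the RUN switch are hypothetical helpers, and commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: generate and evaluate with each public detection source in turn.
# "sdp" and "dpm" are assumed CLI values; verify them in public_detectors.
# plan_public_sources, run, and RUN are illustrative helpers, not part of BoxMOT.
set -eu

# Print the command; execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

plan_public_sources() {
  tracker="$1"
  shift
  for det_source in "$@"; do
    # Each source gets its own cache, separate from the private detector cache.
    run boxmot generate --benchmark mot17 --split ablation --detection-source "$det_source"
    run boxmot eval --benchmark mot17 --split ablation --tracker "$tracker" --detection-source "$det_source"
  done
}

plan_public_sources boosttrack frcnn sdp dpm
```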
Replay image loading
Most cached replay runs do not need to read images at all. BoxMOT skips image loading when the selected tracker can work from cached detections and embeddings alone.
Trackers that need image data, such as those with a camera-motion-compensation path, still read frames from the dataset during replay.
Outputs
Benchmark workflows write reusable detection and embedding caches under the project run directory, plus tracker outputs and evaluation artifacts for the selected mode.
- generate writes the cache only.
- eval writes tracker outputs and TrackEval summaries.
- tune writes trial outputs and the best parameter set.
- research writes benchmark summaries for each evaluated code proposal.
README benchmark table notes
The README benchmark table uses the following conventions and inputs:
- MOT17 ablation runs evaluate on the second half of the MOT17 training set because the public validation split is not available and the ablation detector was trained on the first half.
- MOT17 cells are shown as Py (C++). The value in parentheses is the native replay path using --tracker-backend cpp. A — entry means that tracker does not currently have a native replay backend for that benchmark path.
- MOT17 results use pre-generated detections and embeddings with each tracker configured from its default repository settings.
- SportsMOT val results use the yolox_x_sportsmot detector and lmbn_n_duke ReID model.
- MMOT test metrics are class-averaged across all 8 categories, following the MMOT paper convention, and use the yolo11l_3ch detector with the lmbn_n_duke ReID model.
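A Py (C++) pair for one MOT17 cell could plausibly be reproduced with two eval runs like the sketch below. It assumes that omitting --tracker-backend selects the Python path and that the chosen tracker has a native replay backend; neither is confirmed by this guide. The plan_backend_pair function, the run wrapper, and the RUN switch are hypothetical helpers, and commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: evaluate one tracker on both replay paths for a Py (C++) comparison.
# plan_backend_pair, run, and RUN are illustrative helpers, not part of BoxMOT.
set -eu

# Print the command; execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

plan_backend_pair() {
  tracker="$1"
  # Assumed default: the Python path when --tracker-backend is omitted.
  run boxmot eval --benchmark mot17 --split ablation --tracker "$tracker"
  # Native replay path, as named in the table notes.
  run boxmot eval --benchmark mot17 --split ablation --tracker "$tracker" --tracker-backend cpp
}

plan_backend_pair bytetrack
```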