Evaluation and Postprocessing

May 30, 2026

Use this guide when you need to interpret benchmark outputs from boxmot eval, Boxmot.val(...), tune, or research.

For cache reuse, benchmark ids, and replay image-loading behavior, see Benchmark Workflows.

Core metrics

  • HOTA for overall tracking quality
  • MOTA for CLEAR-style summary quality
  • IDF1 for identity consistency
  • AssA and AssRe for association quality
  • IDSW and IDs for identity-switch context
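
As a rough reference for how these numbers relate, the sketch below computes MOTA, IDF1, and a single-threshold HOTA value from their standard definitions. The function names and the counts are made up for illustration; in practice TrackEval computes these from matched ground truth, and the reported HOTA is averaged over several localization thresholds.

```python
import math


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / number of ground-truth boxes."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def hota_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: sqrt(DetA * AssA)."""
    return math.sqrt(det_a * ass_a)


# Made-up counts, for illustration only
print(round(mota(fn=150, fp=50, idsw=10, num_gt=1000), 3))
print(round(idf1(idtp=800, idfp=100, idfn=200), 3))
print(round(hota_alpha(det_a=0.64, ass_a=0.49), 3))
```

MOTA penalizes misses, false positives, and switches equally per event, which is why IDF1 and AssA are usually read alongside it when identity preservation matters.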

Where metrics appear

  • eval reports benchmark results directly
  • tune uses validation results to score parameter trials
  • research optimizes code changes against combined benchmark summaries

For raw runtime summaries from the Python API, evaluate(...) aggregates counts and timings but does not replace TrackEval ground-truth evaluation.

Detection sources

By default, benchmark modes run the detector configured in the benchmark YAML (--detection-source private, the implicit default). Use --detection-source to switch to public MOTChallenge detections:

Value                   Behavior
(omitted) or private    Run the configured detector model
public                  Use the default public detector from the benchmark YAML
frcnn                   Use Faster R-CNN public detections
sdp                     Use SDP public detections
dpm                     Use DPM public detections

Public detections are downloaded from the benchmark's public_detectors config and cached alongside the standard detection cache. ReID embeddings are generated for the public detections automatically.

# Generate and evaluate with public FRCNN detections
boxmot generate --benchmark mot17 --split ablation --detection-source frcnn
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source frcnn

Kalman filter noise tuning

Use --tune-kf to estimate per-sequence Kalman filter process and measurement noise (Q/R matrices) from cached detections and ground truth before tracking:

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --tune-kf

This fits noise parameters to the specific dataset and is most useful for KF-based trackers. It requires ground truth to be available for the selected split.
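
To make the idea of fitting noise from ground truth concrete, here is a minimal scalar sketch. It is not BoxMOT's implementation: the function names are hypothetical, it handles a single coordinate, and it assumes detections are already matched to ground truth frame by frame. Measurement noise is taken as the variance of detection-minus-ground-truth residuals, and process noise as the variance of constant-velocity prediction errors along the ground-truth track.

```python
from statistics import pvariance


def estimate_measurement_var(detections, ground_truth):
    """Scalar R: variance of detection-minus-ground-truth residuals."""
    residuals = [d - g for d, g in zip(detections, ground_truth)]
    return pvariance(residuals)


def estimate_process_var(ground_truth):
    """Scalar Q proxy: variance of constant-velocity prediction errors on the GT track."""
    errors = [
        # predicted position is x[t] + (x[t] - x[t-1]); the error is what the model missed
        ground_truth[t + 1] - (2 * ground_truth[t] - ground_truth[t - 1])
        for t in range(1, len(ground_truth) - 1)
    ]
    return pvariance(errors)


# Made-up x-coordinates for one matched track
gt = [0.0, 1.0, 2.0, 3.5, 5.0, 6.0]
det = [0.2, 0.9, 2.3, 3.4, 5.1, 5.8]

r = estimate_measurement_var(det, gt)
q = estimate_process_var(gt)
print(r, q)
```

A real estimator would work on the full state vector and produce matrices, but the dependence on ground truth is the same, which is why the option needs an annotated split.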

For tune, --tune-kf estimates noise once before the search loop and reuses it for all trials:

boxmot tune --benchmark mot17 --split ablation --tracker botsort --tune-kf --n-trials 20

For runtime adaptation without ground truth (e.g., deployment to new domains), use --adaptive-kf instead, which estimates noise online via the Mehra (1970) method.
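
The online case can be illustrated with a simplified covariance-matching scheme on the filter innovations. This is a sketch in the spirit of innovation-based adaptive filtering, not the full Mehra procedure and not BoxMOT's code; `run_adaptive_kf`, its parameters, and the random-walk state model are assumptions chosen to keep the example scalar and short.

```python
from collections import deque


def run_adaptive_kf(measurements, q=0.01, r0=1.0, window=5, floor=1e-6):
    """Scalar random-walk Kalman filter that re-estimates R from recent innovations."""
    x, p, r = measurements[0], 1.0, r0
    innovations = deque(maxlen=window)  # recent innovations z - x_pred
    pred_vars = deque(maxlen=window)    # predicted state variances for the same steps
    estimates = []
    for z in measurements[1:]:
        p_pred = p + q                  # predict step (state itself is unchanged)
        nu = z - x                      # innovation
        innovations.append(nu)
        pred_vars.append(p_pred)
        if len(innovations) == window:
            # Covariance matching: E[nu^2] = P_pred + R, so R ~ mean(nu^2) - mean(P_pred)
            c = sum(v * v for v in innovations) / window
            r = max(c - sum(pred_vars) / window, floor)
        k = p_pred / (p_pred + r)       # Kalman gain
        x = x + k * nu                  # update state
        p = (1 - k) * p_pred            # update variance
        estimates.append((x, r))
    return estimates


noisy = [1.0, 1.4, 0.7, 1.2, 0.9, 1.3, 0.8, 1.1, 1.0, 1.2]
for state, r_hat in run_adaptive_kf(noisy):
    print(round(state, 3), round(r_hat, 4))
```

The floor keeps the estimate positive when the innovations are smaller than the predicted variance would suggest.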

Postprocessing modes

eval supports four postprocessing modes through --postprocessing: none, plus three steps that can be used alone or chained in order, separated by commas:

  • none – no postprocessing (default)
  • gsi – Gaussian-smoothed interpolation: fills gaps via linear interpolation, then smooths trajectories with a Gaussian process
  • gbrc – gradient-boosting reconnection: uses a GradientBoostingRegressor to interpolate and smooth trajectories
  • gta – global tracklet association: offline pipeline that splits and reconnects tracklets across the full sequence using cached ReID embeddings

# Single step
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi

# Multiple steps applied in order
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gbrc,gta
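
The gap-filling stage that gsi starts with can be sketched as plain linear interpolation between the frames on either side of a gap. This is an illustration under assumptions, not BoxMOT's code: `interpolate_gaps`, the frame-to-box dictionary layout, and the `max_gap` limit are all hypothetical, and the Gaussian-process smoothing stage is left out.

```python
def interpolate_gaps(track, max_gap=20):
    """Fill short gaps in one track by linear interpolation.

    track maps frame number -> (x, y, w, h). Gaps longer than max_gap
    frames are left untouched. Returns a new dict; the input is not modified.
    """
    frames = sorted(track)
    filled = dict(track)
    for prev, nxt in zip(frames, frames[1:]):
        gap = nxt - prev
        if 1 < gap <= max_gap:
            for f in range(prev + 1, nxt):
                a = (f - prev) / gap  # 0 at prev, 1 at nxt
                filled[f] = tuple(
                    (1 - a) * p + a * n for p, n in zip(track[prev], track[nxt])
                )
    return filled


# Track seen at frames 1 and 5; frames 2-4 are missing
track = {1: (0.0, 0.0, 10.0, 20.0), 5: (8.0, 4.0, 10.0, 20.0)}
filled = interpolate_gaps(track)
print(sorted(filled))  # [1, 2, 3, 4, 5]
print(filled[3])       # (4.0, 2.0, 10.0, 20.0)
```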

Warning: chained steps overwrite in place. When chaining multiple steps (e.g., gsi,gta), each step reads the MOT result files from the experiment directory, transforms them, and writes the results back. The second step operates on the first step's output, not the original tracker output.
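
The ordering effect can be pictured with a small stand-in pipeline. The names below (`apply_chain` and the placeholder steps) are hypothetical and only record which step ran; they do not reproduce BoxMOT's file handling.

```python
def apply_chain(spec, steps, tracks):
    """Apply comma-separated steps in order; each step receives the previous step's output."""
    for name in (s.strip() for s in spec.split(",")):
        if name == "none":
            continue  # "none" leaves the tracks unchanged
        tracks = steps[name](tracks)
    return tracks


# Stand-in steps that only append their own name to a history list
steps = {
    "gsi": lambda t: t + ["gsi"],
    "gta": lambda t: t + ["gta"],
}

print(apply_chain("gsi,gta", steps, ["raw"]))  # ['raw', 'gsi', 'gta']
print(apply_chain("gta,gsi", steps, ["raw"]))  # ['raw', 'gta', 'gsi']
```

Because each step consumes the previous result, gsi,gta and gta,gsi are different pipelines, and recovering the raw tracker output means rerunning the tracker stage.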

Native C++ tracker backends

eval, tune, and research can swap the cached tracking replay stage to a native C++ tracker runner via --tracker-backend cpp. See Native C++ Integration for supported trackers, build requirements, and ReID notes.

Common commands

# Standard evaluation
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack

# With postprocessing
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi,gta

# With KF noise tuning
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --tune-kf

# With public detections
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source frcnn

# Native C++ replay
boxmot eval --benchmark mot17 --split ablation --tracker bytetrack --tracker-backend cpp
boxmot eval --benchmark mot17 --split ablation --tracker botsort:cpp

Main outputs

  • combined benchmark metrics such as HOTA, MOTA, and IDF1
  • per-sequence summaries
  • MOT-style tracker outputs
  • reused cache paths and evaluation artifacts in the run directory
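
For a quick sanity check of the MOT-style tracker outputs, a short script can count frames and identities. The sketch assumes the standard MOTChallenge column order (frame, id, bb_left, bb_top, bb_width, bb_height, conf, ...) and reads only the first two columns; `summarize_mot` is a hypothetical helper, and the exact columns BoxMOT writes may differ.

```python
from collections import defaultdict


def summarize_mot(lines):
    """Return (num_frames, num_ids, boxes_per_id) from MOT-style text lines."""
    frames = set()
    per_id = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Accept comma- or space-separated files
        fields = line.replace(",", " ").split()
        frame, track_id = int(float(fields[0])), int(float(fields[1]))
        frames.add(frame)
        per_id[track_id] += 1
    return len(frames), len(per_id), dict(per_id)


# Made-up rows in MOTChallenge layout
sample = [
    "1,1,100.0,200.0,50.0,120.0,0.9,-1,-1,-1",
    "1,2,300.0,210.0,45.0,110.0,0.8,-1,-1,-1",
    "2,1,102.0,201.0,50.0,120.0,0.9,-1,-1,-1",
]
print(summarize_mot(sample))  # (2, 2, {1: 2, 2: 1})
```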