GTR model zoo

March 24, 2022

Introduction

This file documents the collection of models reported in our paper. All experiments were run on a DGX machine with eight 32GB V100 GPUs; most models are trained on 4 GPUs.

How to Read the Tables

The "Name" column contains a link to the config file. To train a model, run

python train_net.py --num-gpus 4 --config-file /path/to/config/name.yaml

To evaluate a trained or pretrained model, run

python train_net.py --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth

MOT

Validation set

| Name | MOTA | IDF1 | HOTA | DetA | AssA | Download |
|------|------|------|------|------|------|----------|
| GTR_MOT_FPN | 71.3 | 75.9 | 63.0 | 60.4 | 66.2 | model |
| GTR_MOT_FPN (local) | 71.1 | 74.2 | 62.1 | 60.2 | 64.4 | same as above |

Test set

| Name | MOTA | IDF1 | HOTA | DetA | AssA | Download |
|------|------|------|------|------|------|----------|
| GTR_MOTFull_FPN | 75.3 | 71.5 | 59.1 | 61.6 | 57.0 | model |

Note

  • The validation set follows the half-half training set split from CenterTrack.
  • All models are finetuned from a detection-only model trained on CrowdHuman (config, model). Download or train that model and place it at GTR_ROOT/models/CH_FPN_1x.pth before training. Training the detection-only model takes ~12 hours on 4 GPUs.
  • Training GTR takes ~3 hours on 4 V100 GPUs (32G memory).
  • GTR_MOT_FPN is our model with a temporal window of size 32. It needs more than 12GB of GPU memory at test time. To change the temporal-window size, append INPUT.VIDEO.TEST_LEN 16 to the evaluation command.
  • GTR_MOT_FPN (local) is our local-tracker baseline, which applies the FairMOT association to our detections and features. To run it, append VIDEO_TEST.LOCAL_TRACK True to the evaluation command.
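The evaluation flags above are detectron2-style KEY VALUE pairs appended after the named arguments. The following small helper (not part of the GTR codebase; the config and weight paths are placeholders) shows how such a command line is assembled:

```python
def eval_command(config, weights, overrides=()):
    """Assemble the evaluation command shown above.

    `overrides` holds detectron2-style KEY VALUE pairs, e.g.
    ("INPUT.VIDEO.TEST_LEN", 16) to shrink the temporal window or
    ("VIDEO_TEST.LOCAL_TRACK", True) to run the local-tracker baseline.
    """
    cmd = ["python", "train_net.py", "--config-file", config,
           "--eval-only", "MODEL.WEIGHTS", weights]
    for key, value in overrides:
        cmd += [key, str(value)]
    return cmd

# Placeholder paths; substitute your own config and checkpoint.
print(" ".join(eval_command("configs/GTR_MOT_FPN.yaml",
                            "models/GTR_MOT_FPN.pth",
                            [("INPUT.VIDEO.TEST_LEN", 16)])))
# -> python train_net.py --config-file configs/GTR_MOT_FPN.yaml --eval-only MODEL.WEIGHTS models/GTR_MOT_FPN.pth INPUT.VIDEO.TEST_LEN 16
```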

TAO

| Name | Validation mAP | Test mAP | Download |
|------|----------------|----------|----------|
| GTR_TAO_DR2101 | 22.5 | 20.1 | model |

Note

  • The model is evaluated on TAO keyframes only, which are sampled at ~1 frame per second.
  • Our model is trained on LVIS+COCO only. The TAO training set is not used anywhere.
  • Our model is finetuned from a detection-only CenterNet2 model trained on LVIS+COCO (config, model). Download or train that model and place it at GTR_ROOT/models/C2_LVISCOCO_DR2101_4x.pth before training. Training the detection-only model takes ~3 days on 8 GPUs.
  • Training GTR takes ~13 hours on 4 V100 GPUs (32G memory).
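The ~1 fps keyframe sampling can be illustrated with a short sketch (a hypothetical helper for intuition, not the actual TAO toolkit code): given a video's frame rate, roughly one frame per second is kept for evaluation.

```python
def keyframe_indices(num_frames, fps, rate_hz=1.0):
    """Indices of frames sampled at roughly `rate_hz` frames per second."""
    step = max(1, round(fps / rate_hz))
    return list(range(0, num_frames, step))

# A 3-second clip at 30 fps yields 3 keyframes:
# keyframe_indices(90, 30) -> [0, 30, 60]
```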