MODEL ZOO

November 11, 2022

Common settings and notes

  • The experiments are run with PyTorch 0.4.1, CUDA 9.0, and cuDNN 7.1.
  • Training times are measured on our servers with 8 TITAN V GPUs (12 GB memory).
  • Testing times are measured on our local machine with a TITAN Xp GPU.
  • The models can be downloaded directly from Google Drive.

Object Detection

COCO

Model                 GPUs  Train time (h)  Test time (ms)    AP                  Download
ctdet_coco_hg         5     109             71 / 129 / 674    40.3 / 42.2 / 45.1  model
ctdet_coco_dla_1x     8     57              19 / 36 / 248     36.3 / 38.2 / 40.7  model
ctdet_coco_dla_2x     8     92              19 / 36 / 248     37.4 / 39.2 / 41.7  model
ctdet_coco_resdcn101  8     65              22 / 40 / 259     34.6 / 36.2 / 39.3  model
ctdet_coco_resdcn18   4     28              7 / 14 / 81       28.1 / 30.0 / 33.2  model
exdet_coco_hg         5     215             134 / 246 / 1340  35.8 / 39.8 / 42.4  model
exdet_coco_dla        8     133             51 / 90 / 481     33.0 / 36.5 / 38.5  model

Notes

  • All models are trained on COCO train 2017 and evaluated on val 2017.
  • Test time and AP are each reported as three values: no augmentation / flip augmentation / multi-scale (0.5, 0.75, 1, 1.25, 1.5) augmentation.
  • Results on COCO test-dev can be found in the paper, or obtained by adding --trainval to test.py.
  • exdet is our re-implementation of ExtremeNet. The testing does not include edge aggregation.
  • For dla and resnets, 1x means a training schedule of 140 epochs with the learning rate divided by 10 at epochs 90 and 120 (following SimpleBaseline). 2x means 230 epochs with the learning rate divided by 10 at epochs 180 and 210. The training schedules have not been carefully investigated.
  • The hourglass training schedule follows ExtremeNet: train for 50 epochs (approximately 250000 iterations at batch size 24) and drop the learning rate at epoch 40.
  • Testing time includes network forwarding time, decoding time, and NMS time (for ExtremeNet).
  • We observed up to 0.4 AP performance jitter due to randomness in training.
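
The schedules in these notes are plain step decays, so they are easy to sanity-check. The following is a minimal Python sketch, not code from this repository. `lr_at_epoch`, `total_iterations`, and `COCO_DET_SCHEDULES` are hypothetical names, the base learning rate of 1.0 is a placeholder (the notes do not state one), epochs are assumed to be 0-indexed with each drop taking effect from the listed epoch onward, and 118287 is the published image count of COCO train 2017 rather than a number from these notes.

```python
import math
from bisect import bisect_right

# Schedules for dla and resnets as described in the notes: (total epochs, drop epochs).
COCO_DET_SCHEDULES = {
    "1x": (140, (90, 120)),
    "2x": (230, (180, 210)),
}


def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay.

    The rate is multiplied by gamma once for every milestone that has
    already been reached (assumed convention, see the text above).
    """
    drops = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** drops


def total_iterations(num_images, batch_size, epochs):
    """Iterations for a run, counting a final partial batch in each epoch."""
    return math.ceil(num_images / batch_size) * epochs


if __name__ == "__main__":
    base_lr = 1.0  # placeholder, not a value from the notes
    for name, (epochs, milestones) in COCO_DET_SCHEDULES.items():
        probes = (0, milestones[0], milestones[1], epochs - 1)
        rates = [lr_at_epoch(e, base_lr, milestones) for e in probes]
        print(name, dict(zip(probes, rates)))

    # Hourglass: 50 epochs at batch size 24 on COCO train 2017.
    print(total_iterations(118287, 24, 50))
```

Under these assumptions the hourglass run comes to 246450 iterations, which is consistent with the "approximately 250000" quoted in the notes.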

Pascal VOC

Model                       GPUs  Train time (h)  Test time (ms)  mAP   Download
ctdet_pascal_dla_384        1     15              20              79.3  model
ctdet_pascal_dla_512        2     15              30              80.7  model
ctdet_pascal_resdcn18_384   1     3               7               72.6  model
ctdet_pascal_resdcn18_512   1     5               10              75.7  model
ctdet_pascal_resdcn101_384  2     7               22              77.1  model
ctdet_pascal_resdcn101_512  4     7               33              78.7  model

Notes

  • All models are trained on trainval 07+12 and tested on test 2007.
  • Flip test is used by default.
  • Training schedule: train for 70 epochs with the learning rate divided by 10 at epochs 45 and 60.
  • We observed up to 1 mAP performance jitter due to randomness in training.
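
Flip testing, as used here by default, generally means running the network on both the image and its horizontal mirror and merging the two outputs. The sketch below shows one common way to do that: average the two dense output maps after mirroring the second one back. It is an illustration only. `hflip`, `flip_test`, and `toy_predict` are hypothetical names, maps are plain nested lists, and the notes do not confirm that test.py merges outputs in exactly this way.

```python
def hflip(grid):
    """Mirror a 2D map (a list of rows) left to right."""
    return [row[::-1] for row in grid]


def flip_test(predict, image):
    """Average the prediction on `image` with the mirrored-back prediction
    on the horizontally flipped image."""
    direct = predict(image)
    mirrored_back = hflip(predict(hflip(image)))
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(direct, mirrored_back)
    ]


def toy_predict(image):
    """Stand-in for a network: adds the column index to each pixel, so its
    output is deliberately not symmetric under flipping."""
    return [[value + col for col, value in enumerate(row)] for row in image]


if __name__ == "__main__":
    image = [[0, 2], [4, 6]]
    print(flip_test(toy_predict, image))
```

The averaging cancels part of the left/right bias of `toy_predict`, which is the same reason flip testing tends to raise accuracy at roughly twice the inference cost.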

Human pose estimation

COCO

Model              GPUs  Train time (h)  Test time (ms)  AP    Download
multi_pose_hg_1x   5     62              151             58.7  model
multi_pose_hg_3x   5     188             151             64.0  model
multi_pose_dla_1x  8     30              44              54.7  model
multi_pose_dla_3x  8     70              44              58.9  model

Notes

  • All models are trained on the keypoint train 2017 images that contain at least one human with keypoint annotations (64115 images).
  • The evaluation is done on COCO keypoint val 2017 (5000 images).
  • Flip test is used by default.
  • The models are fine-tuned from the corresponding center point detection models.
  • Dla training schedule: 1x: train for 140 epochs with the learning rate divided by 10 at epochs 90 and 120. 3x: train for 320 epochs with the learning rate divided by 10 at epochs 270 and 300.
  • Hourglass training schedule: 1x: train for 50 epochs with the learning rate divided by 10 at epoch 40. 3x: train for 150 epochs with the learning rate divided by 10 at epoch 130.

3D bounding box detection

Notes

  • The 3dop split is from 3DOP and the subcnn split is from SubCNN.
  • No augmentation is used in testing.
  • The models are trained for 70 epochs with the learning rate dropped at epochs 45 and 60.

KITTI 3DOP split

Model     GPUs  Train time  Test time  AP-E  AP-M  AP-H  AOS-E  AOS-M  AOS-H  BEV-E  BEV-M  BEV-H  Download
ddd_3dop  2     7h          31ms       96.9  87.8  79.2  93.9   84.3   75.7   34.0   30.5   26.8   model

KITTI SubCNN split

Model    GPUs  Train time  Test time  AP-E  AP-M  AP-H  AOS-E  AOS-M  AOS-H  BEV-E  BEV-M  BEV-H  Download
ddd_sub  2     7h          31ms       89.6  79.8  70.3  85.7   75.2   65.9   34.9   27.7   26.4   model