Benchmark and Model Zoo

October 18, 2024

CNN-based (w/ ImageNet-1k pretrained)

Faster R-CNN

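| Backbone | Lr Schd | box mAP (minival) | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DB-ResNet50 | 1x | 40.8 | 69M | 284G | config | github | github |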

Cascade R-CNN (1600x1400)

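| Backbone | Lr Schd | box mAP (minival/test-dev) | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- |
| DB-Res2Net101-DCN | 20e | 53.7/- | 141M | 429G | config | github |
| DB-Res2Net101-DCN | 20e + 1x (swa) | 54.8/55.3 | 141M | 429G | config (test only) | github |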

Cascade R-CNN w/ 4conv1fc (1600x1400)

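| Backbone | Lr Schd | box mAP (minival/test-dev) | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- |
| DB-Res2Net101-DCN | 20e | 54.1/- | 146M | 774G | config | github |
| DB-Res2Net101-DCN | 20e + 1x (swa) | 55.3/55.6 | 146M | 774G | config (test only) | github |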

Notes:

Transformer-based (w/ ImageNet-1k pretrained)

Mask R-CNN

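| Backbone | Lr Schd | box mAP (minival) | mask mAP (minival) | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DB-Swin-T | 3x | 50.2 | 44.5 | 76M | 357G | config | github | github |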

Cascade Mask R-CNN w/ 4conv1fc

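| Backbone | Lr Schd | box mAP (minival) | mask mAP (minival) | #params | FLOPs | config | log | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DB-Swin-T | 3x | 53.6 | 46.2 | 114M | 836G | config | github | github |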

Cascade Mask R-CNN w/ 4conv1fc (1600x1400)

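| Backbone | Lr Schd | box mAP (minival/test-dev) | mask mAP (minival/test-dev) | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DB-Swin-S | 3x | 56.3/56.9 | 48.6/49.1 | 156M | 1016G | config | github |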

Transformer-based (w/ ImageNet-22k pretrained)

HTC (1600x1400)

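| Backbone | Lr Schd | box mAP (minival/test-dev) | mask mAP (minival/test-dev) | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DB-Swin-B | 20e | 57.9/- | 50.2/- | 231M | 1004G | config | github |
| DB-Swin-B | 20e + 1x (swa) | 58.2/58.6 | 50.4/51.1 | 231M | 1004G | config (test only) | github |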

HTC (bbox head w/ 4conv1fc) (1600x1400)

Compared to regular HTC, our HTC uses a 4conv1fc bbox head.
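As a rough illustration of what a 4conv1fc bbox head could look like in an MMDetection-style config, the sketch below builds one head dict per cascade stage. `Shared4Conv1FCBBoxHead` is an existing MMDetection head (four shared conv layers followed by one shared fc layer); the helper name `make_4conv1fc_bbox_head`, the channel sizes, the class count, and the three-stage layout are assumptions for illustration and are not copied from this repository's configs.

```python
# Hypothetical config fragment: field values are illustrative assumptions,
# not the settings used for the checkpoints listed in this page.
def make_4conv1fc_bbox_head(num_classes=80):
    """Return a config dict for a bbox head with 4 shared convs and 1 shared fc."""
    return dict(
        type="Shared4Conv1FCBBoxHead",  # MMDetection head: 4 conv + 1 fc, shared by cls/reg
        in_channels=256,                # assumed FPN output channels
        conv_out_channels=256,          # assumed width of the 4 conv layers
        fc_out_channels=1024,           # assumed width of the single fc layer
        roi_feat_size=7,                # assumed RoI feature resolution
        num_classes=num_classes,        # 80 assumes COCO
    )


# HTC refines boxes over cascaded stages; three stages are assumed here.
bbox_heads = [make_4conv1fc_bbox_head() for _ in range(3)]
```

In a full config this list would sit under the RoI head's `bbox_head` key; the linked config files remain the reference for the actual values.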

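| Backbone | Lr Schd | box mAP (minival/test-dev) | mask mAP (minival/test-dev) | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DB-Swin-B | 20e | 58.4/58.7 | 50.7/51.1 | 235M | 1348G | config | github |
| DB-Swin-L | 1x | 59.1/59.4 | 51.0/51.6 | 453M | 2162G | config | github |
| DB-Swin-L (TTA) | 1x | 59.6/60.1 | 51.8/52.3 | 453M | - | config | github |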

TTA denotes test-time augmentation.
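To make the idea concrete, here is a minimal sketch of one common form of test-time augmentation: run the detector on the image and on its horizontally flipped copy, map the flipped predictions back to the original coordinates, and pool the two sets. This page does not state which augmentations were used for the TTA row, so the choice of horizontal flip and the function names `unflip_boxes` and `merge_tta_predictions` are assumptions for illustration only.

```python
def unflip_boxes(boxes, img_width):
    """Map [x1, y1, x2, y2] boxes from a horizontally flipped image back to the original frame."""
    # A flip sends x to img_width - x, so the left and right edges swap roles.
    return [[img_width - x2, y1, img_width - x1, y2] for x1, y1, x2, y2 in boxes]


def merge_tta_predictions(orig_boxes, flipped_boxes, img_width):
    """Pool boxes from the original and flipped passes in one coordinate frame."""
    # A real pipeline would normally follow this with NMS or box voting.
    return list(orig_boxes) + unflip_boxes(flipped_boxes, img_width)
```

For example, a box `[10, 20, 30, 40]` predicted on the flipped copy of a 100-pixel-wide image maps back to `[70, 20, 90, 40]`.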

EVA02 (1536x1536)

| Backbone | Lr Schd | mask mAP (test-dev) | #params | config | model |
| --- | --- | --- | --- | --- | --- |
| DB-EVA02-L | 1x | 56.1 | 674M | config | HF |

Notes: