MODEL ZOO

August 23, 2023


TAdaConvV2

Kinetics 710 pretrained

| arch. | pt. | #frames | ckp. |
| --- | --- | --- | --- |
| TAdaFormer-B/16 | CLIP | 16 | ckp |
| TAdaFormer-L/14 | CLIP | 16 | ckp |
| TAdaFormer-L/14 | CLIP | 32 | ckp |
| TAdaFormer-L/14 | CLIP | 64 | ckp |

Kinetics 400

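| arch. | pt. | #frames | GFLOPs | top1 | ckp. |
| --- | --- | --- | --- | --- | --- |
| TAdaConvNeXtV2-T | IN1K | 16 | 47x3x4 | 79.6 | ckp |
| TAdaConvNeXtV2-T | IN1K | 32 | 94x3x4 | 80.8 | ckp |
| TAdaConvNeXtV2-S | IN1K | 16 | 91x3x4 | 80.8 | ckp |
| TAdaConvNeXtV2-S | IN1K | 32 | 183x3x4 | 81.9 | ckp |
| TAdaConvNeXtV2-S | IN21K | 32 | 183x3x4 | 82.9 | ckp |
| TAdaConvNeXtV2-B | IN1K | 16 | 162x3x4 | 81.4 | ckp |
| TAdaConvNeXtV2-B | IN1K | 32 | 324x3x4 | 82.3 | ckp |
| TAdaConvNeXtV2-B | IN21K | 32 | 324x3x4 | 83.7 | ckp |
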
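| arch. | pt. | #frames | GFLOPs | top1 | ckp. |
| --- | --- | --- | --- | --- | --- |
| TAdaFormer-B/16 | CLIP | 16 | 153x3x4 | 84.5 | ckp |
| TAdaFormer-L/14 | CLIP | 16 | 703x3x4 | 87.6 | ckp |
| TAdaFormer-B/16 | CLIP+K710 | 16 | 153x3x4 | 86.6 | ckp |
| TAdaFormer-L/14 | CLIP+K710 | 16 | 703x3x4 | 88.9 | ckp |
| TAdaFormer-L/14 | CLIP+K710 | 32 | 1406x3x4 | 89.5 | ckp |
| TAdaFormer-L/14 | CLIP+K710 | 64 | 2812x3x4 | 89.9 | ckp |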
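
The GFLOPs entries are written as three factors, such as `47x3x4`. A common convention in video recognition is per-view cost x spatial crops x temporal clips, but the tables here do not define the notation, so that reading is an assumption. Under it, the total inference cost per video can be sketched as below; the helper name `total_gflops` is hypothetical and not part of the codebase.

```python
def total_gflops(entry: str) -> float:
    """Multiply out an 'AxBxC' GFLOPs entry.

    Assumes (not stated in the tables) that the entry means
    per-view GFLOPs x number of crops x number of clips.
    """
    per_view, factor_a, factor_b = entry.lower().split("x")
    return float(per_view) * int(factor_a) * int(factor_b)


# Entries copied from the Kinetics 400 tables.
smallest = total_gflops("47x3x4")    # TAdaConvNeXtV2-T, 16 frames
largest = total_gflops("2812x3x4")   # TAdaFormer-L/14, 64 frames
print(smallest, largest)
```

Comparing the multiplied-out totals rather than the first factor alone makes the cost gap between the 16-frame and 64-frame models explicit.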

Something-Something

The checkpoints in this section are provided for SSV2.

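| arch. | pt. | #frames | GFLOPs | SSV1 | SSV2 | ckp. |
| --- | --- | --- | --- | --- | --- | --- |
| TAdaConvNeXtV2-T | IN1K+K400 | 16 | 47x3x2 | 54.1 | 67.2 | ckp |
| TAdaConvNeXtV2-T | IN1K+K400 | 32 | 94x3x2 | 56.4 | 69.8 | ckp |
| TAdaConvNeXtV2-S | IN1K+K400 | 16 | 91x3x2 | 55.6 | 68.4 | ckp |
| TAdaConvNeXtV2-S | IN1K+K400 | 32 | 183x3x2 | 58.5 | 70.0 | ckp |
| TAdaConvNeXtV2-S | IN21K+K400 | 32 | 183x3x2 | 59.7 | 70.6 | ckp |
| TAdaConvNeXtV2-B | IN21K+K400 | 32 | 324x3x2 | 60.7 | 71.1 | ckp |
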
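| arch. | pt. | #frames | GFLOPs | SSV1 | SSV2 | ckp. |
| --- | --- | --- | --- | --- | --- | --- |
| TAdaFormer-B/16 | CLIP | 16 | 187x3x2 | 59.2 | 70.4 | ckp |
| TAdaFormer-B/16 | CLIP | 32 | 374x3x2 | 61.2 | 71.3 | ckp |
| TAdaFormer-L/14 | CLIP | 16 | 858x3x2 | 62.0 | 72.4 | ckp |
| TAdaFormer-L/14 | CLIP | 32 | 1716x3x2 | 63.7 | 73.6 | ckp |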

TAdaConv

Kinetics-400

| architecture | depth | init | clips x crops | #frames x sampling rate | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TAda2D | R50 | IN-1K | 10 x 3 | 8 x 8 | 76.7 | 92.6 | [google drive] [baidu(code:p06d)] | tada2d_8x8.yaml |
| TAda2D | R50 | IN-1K | 10 x 3 | 16 x 5 | 77.4 | 93.1 | [google drive] [baidu(code:6k8h)] | tada2d_16x5.yaml |
| ViViT Fact. Enc. | B16x2 | IN-21K | 4 x 3 | 32 x 2 | 79.4 | 94.0 | [google drive] [baidu(code:1t51)] | vivit_fac_enc_b16x2.yaml |

Something-Something

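| architecture | depth | init | clips x crops | #frames | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TAda2D | R50 | IN-1K | 2 x 3 | 8 | 64.2 | 88.0 | [google drive] [baidu(code:dlil)] | tada2d_8f.yaml |
| TAda2D | R50 | IN-1K | 2 x 3 | 16 | 65.6 | 89.1 | [google drive] [baidu(code:f857)] | tada2d_16f.yaml |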

Epic-Kitchens Action Recognition

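| architecture | init | resolution | clips x crops | #frames x sampling rate | action acc@1 | verb acc@1 | noun acc@1 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViViT Fact. Enc.-B16x2 | K700 | 320 | 4 x 3 | 32 x 2 | 46.3 | 67.4 | 58.9 | [google drive] [baidu(code:rinh)] | vivit_fac_enc.yaml |
| ir-CSN-R152 | K700 | 224 | 10 x 3 | 32 x 2 | 44.5 | 68.4 | 55.9 | [google drive] [baidu(code:s0uj)] | csn.yaml |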

Epic-Kitchens Temporal Action Localization

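| feature | classification | type | IoU@0.1 | IoU@0.2 | IoU@0.3 | IoU@0.4 | IoU@0.5 | Avg | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViViT | ViViT | Verb | 22.90 | 21.93 | 20.74 | 19.08 | 16.00 | 20.13 | [google drive] [baidu(code:3sud)] | vivit-os-local.yaml |
| ViViT | ViViT | Noun | 28.95 | 27.38 | 25.52 | 22.67 | 18.95 | 24.69 | [google drive] [baidu(code:3sud)] | vivit-os-local.yaml |
| ViViT | ViViT | Action | 20.82 | 19.93 | 18.67 | 17.02 | 15.06 | 18.30 | [google drive] [baidu(code:3sud)] | vivit-os-local.yaml |
| TAda2D | TAda2D | Verb | 19.70 | 18.49 | 17.41 | 15.50 | 12.78 | 16.78 | [google drive] [baidu(code:d01j)] | - |
| TAda2D | TAda2D | Noun | 20.54 | 19.32 | 17.94 | 15.77 | 13.39 | 17.39 | [google drive] [baidu(code:d01j)] | - |
| TAda2D | TAda2D | Action | 15.15 | 14.32 | 13.59 | 12.18 | 10.65 | 13.18 | [google drive] [baidu(code:d01j)] | - |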
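
The Avg column is consistent with the arithmetic mean of the five IoU thresholds (0.1 to 0.5), rounded to two decimals. The short check below recomputes it from the values in the table; the tolerance and the helper name `avg_matches` are illustrative choices, not part of the codebase.

```python
from statistics import mean

# (feature, type) -> ([IoU@0.1 ... IoU@0.5], reported Avg), copied from the table.
ROWS = {
    ("ViViT", "Verb"): ([22.90, 21.93, 20.74, 19.08, 16.00], 20.13),
    ("ViViT", "Noun"): ([28.95, 27.38, 25.52, 22.67, 18.95], 24.69),
    ("ViViT", "Action"): ([20.82, 19.93, 18.67, 17.02, 15.06], 18.30),
    ("TAda2D", "Verb"): ([19.70, 18.49, 17.41, 15.50, 12.78], 16.78),
    ("TAda2D", "Noun"): ([20.54, 19.32, 17.94, 15.77, 13.39], 17.39),
    ("TAda2D", "Action"): ([15.15, 14.32, 13.59, 12.18, 10.65], 13.18),
}


def avg_matches(ious, reported, tol=0.006):
    """True if the mean of the IoU scores agrees with the reported Avg.

    The tolerance allows for rounding to two decimals plus float error.
    """
    return abs(mean(ious) - reported) <= tol


# Collect any rows whose reported Avg disagrees with the recomputed mean.
mismatches = [key for key, (ious, avg) in ROWS.items() if not avg_matches(ious, avg)]
print(mismatches)
```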

MoSI

Note: the following models use decord 0.4.1 rather than the codebase default of 0.6.0.
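
Because the MoSI models depend on a specific decord release, it can help to confirm the installed version before running them. The sketch below uses only the Python standard library; the constant and function names are hypothetical, and the codebase itself may not perform such a check.

```python
from importlib.metadata import PackageNotFoundError, version

# Version stated in the note above for the MoSI models.
MOSI_DECORD_VERSION = "0.4.1"


def installed_decord_version():
    """Return the installed decord version string, or None if it is not installed."""
    try:
        return version("decord")
    except PackageNotFoundError:
        return None


def decord_matches(installed, expected=MOSI_DECORD_VERSION):
    """True only if a version string is present and equals the expected one."""
    return installed is not None and installed.strip() == expected


found = installed_decord_version()
if decord_matches(found):
    print("decord version matches the MoSI note")
else:
    print(f"decord {found!r} found; the MoSI models were produced with {MOSI_DECORD_VERSION}")
```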

Pre-trained

| dataset | backbone | checkpoint | config |
| --- | --- | --- | --- |
| HMDB51 | R-2D3D-18 | [google drive] [baidu(code:ahqg)] | pt-hmdb/r2d3ds.yaml |
| HMDB51 | R(2+1)D-10 | [google drive] [baidu(code:1ktb)] | pt-hmdb/r2p1d.yaml |
| UCF101 | R-2D3D-18 | [google drive] [baidu(code:61uw)] | pt-ucf/r2d3ds.yaml |
| UCF101 | R(2+1)D-10 | [google drive] [baidu(code:drq2)] | pt-ucf/r2p1d.yaml |

Finetuned

| dataset | backbone | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- |
| HMDB51 | R-2D3D-18 | 46.93 | 74.71 | [google drive] [baidu(code:2puu)] | ft-hmdb/r2d3ds.yaml |
| HMDB51 | R(2+1)D-10 | 51.83 | 78.63 | [google drive] [baidu(code:hgnc)] | ft-hmdb/r2p1d.yaml |
| UCF101 | R-2D3D-18 | 71.75 | 89.14 | [google drive] [baidu(code:ndt6)] | ft-ucf/r2d3ds.yaml |
| UCF101 | R(2+1)D-10 | 82.79 | 95.78 | [google drive] [baidu(code:ecsf)] | ft-ucf/r2p1d.yaml |