Model Zoo

August 11, 2024

Note

  • For all pretraining and finetuning, we adopt sparse/uniform sampling.
  • #Frame = #input_frame × #crop × #clip
  • #input_frame is the number of frames fed to the model per inference.
  • #crop is the number of spatial crops (e.g., 3 for left/right/center).
  • #clip is the number of temporal clips (e.g., 4 means sampling four clips with different start indices).
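The #Frame arithmetic and the idea of sampling several clips with different start indices can be sketched in Python as below. This is only an illustrative sketch, not the repository's data loader: the helper names (`parse_frame_spec`, `total_frames`, `sparse_clip_indices`) are hypothetical, and the segment-based offset scheme is an assumption about how sparse/uniform sampling could be realized.

```python
from typing import List, Tuple


def parse_frame_spec(spec: str) -> Tuple[int, int, int]:
    """Split a #Frame entry such as '8x3x4' into (input_frames, crops, clips)."""
    input_frames, crops, clips = (int(part) for part in spec.lower().split("x"))
    return input_frames, crops, clips


def total_frames(spec: str) -> int:
    """#Frame = #input_frame x #crop x #clip."""
    input_frames, crops, clips = parse_frame_spec(spec)
    return input_frames * crops * clips


def sparse_clip_indices(
    num_video_frames: int, input_frames: int, clips: int
) -> List[List[int]]:
    """Assumed scheme: divide the video into `input_frames` equal segments and,
    for clip c, pick the frame at relative offset (c + 0.5) / clips in each
    segment, so every clip spans the whole video but starts at a different index.
    """
    segment = num_video_frames / input_frames
    all_clips = []
    for c in range(clips):
        offset = (c + 0.5) / clips
        indices = [
            min(int(segment * (i + offset)), num_video_frames - 1)
            for i in range(input_frames)
        ]
        all_clips.append(indices)
    return all_clips
```

Under these assumptions, an `8x3x4` entry corresponds to 8 × 3 × 4 = 96 frames per video, processed as 3 × 4 = 12 views of 8 frames each.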

Pretraining

| Model | Setting | Checkpoint | Shell |
| --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash-1.1M 300e | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash-2M 300e | TBD | run.sh |

Distillation

| Model | Setting | Teacher | Checkpoint | Shell |
| --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash-1.1M 100e | $\text{InternVideo2}_{s2}$-1B | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash-1.1M 100e | $\text{InternVideo2}_{s2}$-1B | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash-1.1M 100e | $\text{InternVideo2}_{s2}$-1B | :hugs: HF link | run.sh |

Finetuning

K710

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 87.6 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 88.1 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT | 8x3x4 | 79.6 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT | 8x3x4 | 83.5 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT | 8x3x4 | 86.2 | :hugs: HF link | run.sh |

K400

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.3 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.9 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 92.1 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT + K710 FT | 8x3x4 | 85.4 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT + K710 FT | 8x3x4 | 88.4 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT + K710 FT | 8x3x4 | 90.4 | :hugs: HF link | run.sh |

K600

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.4 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 91.9 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT + K710 FT | 8x3x4 | 86.0 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT + K710 FT | 8x3x4 | 88.9 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT + K710 FT | 8x3x4 | 90.6 | :hugs: HF link | run.sh |

K700

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 85.0 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 85.4 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 85.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 85.9 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT + K710 FT | 8x3x4 | 75.7 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT + K710 FT | 8x3x4 | 80.5 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT + K710 FT | 8x3x4 | 83.5 | :hugs: HF link | run.sh |

MiT V1

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 50.8 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.0 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B 336↑ | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.2 | TBD | run.sh |

SthSth V1

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 68.5 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 69.7 | TBD | run.sh |

SthSth V2

| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 77.1 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 77.5 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT | 8x3x4 | 71.6 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT | 8x3x4 | 73.5 | :hugs: HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT | 8x3x4 | 76.4 | :hugs: HF link | run.sh |

ANet

| Model | Setting | #Frame | Top-1 | mAP | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 95.9 | 98.2 | TBD | run.sh |

HACS

| Model | Setting | #Frame | Top-1 | mAP | Checkpoint | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 97.0 | 98.8 | TBD | run.sh |