Model Zoo
April 29, 2025
Note
- For all pretraining and finetuning, we adopt sparse/uniform sampling.
- #Frame = #input_frame × #crop × #clip
  - #input_frame is the number of frames fed to the model per inference.
  - #crop is the number of spatial crops (e.g., 3 for left/right/center).
  - #clip is the number of temporal clips (e.g., 4 means repeatedly sampling four clips with different start indices).
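To make the #Frame notation concrete, here is a minimal sketch that parses an entry such as 8x3x4 and counts the views and frames it implies. The names `FrameSetting`, `parse_frame_setting`, and `sparse_clip_indices` are hypothetical and not part of this repository. The sampler is only one plausible reading of sparse/uniform sampling (one frame per equal-length segment, with a different offset per clip), not necessarily the exact scheme used for the reported numbers.

```python
from typing import List, NamedTuple


class FrameSetting(NamedTuple):
    input_frames: int  # frames fed to the model per inference
    crops: int         # spatial crops per clip
    clips: int         # temporal clips per video


def parse_frame_setting(setting: str) -> FrameSetting:
    """Parse a #Frame entry such as '8x3x4' (input_frame x crop x clip)."""
    input_frames, crops, clips = (int(part) for part in setting.lower().split("x"))
    return FrameSetting(input_frames, crops, clips)


def sparse_clip_indices(total_frames: int, input_frames: int, clips: int) -> List[List[int]]:
    """Hypothetical sparse/uniform sampler (an assumption for illustration).

    The video is split into `input_frames` equal segments and one frame is
    taken from each segment. Each clip uses a different offset inside the
    segment, so the clips start at different indices.
    """
    segment = total_frames / input_frames
    all_clips = []
    for clip_id in range(clips):
        # Spread the per-clip offsets evenly inside one segment.
        offset = segment * (clip_id + 0.5) / clips
        indices = [
            min(int(seg * segment + offset), total_frames - 1)
            for seg in range(input_frames)
        ]
        all_clips.append(indices)
    return all_clips


setting = parse_frame_setting("8x3x4")
views = setting.crops * setting.clips          # forward passes per video
frames_seen = setting.input_frames * views     # total frames processed per video
clip_indices = sparse_clip_indices(total_frames=64, input_frames=8, clips=4)
```

Under this reading, an 8x3x4 entry means each video is evaluated with 12 views (3 crops × 4 clips) of 8 frames each.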
Pretraining
TBD
Distillation
TBD
Finetuning
K710
TBD
K400
| Model | Setting | #Frame | Top-1 (%) | Checkpoint | Shell |
|---|---|---|---|---|---|
| -1B | K-Mash PT + K710 FT | 8x3x4 | 91.3 | :hugs: HF link | TBD |
| -1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | :hugs: HF link | TBD |
| -6B | K-Mash PT + K710 FT | 8x3x4 | 91.9 | TBD | TBD |
| -6B | K-Mash PT + K710 FT | 16x3x4 | 92.1 | TBD | TBD |
| -S/14 | K-Mash PT + K710 FT | 8x3x4 | 85.4 | :hugs: HF link | TBD |
| -B/14 | K-Mash PT + K710 FT | 8x3x4 | 88.4 | :hugs: HF link | TBD |
| -L/14 | K-Mash PT + K710 FT | 8x3x4 | 90.4 | :hugs: HF link | TBD |
| -S/14 | K-Mash PT + K710 FT | 8x3x4 | 87.3 | Link | run.sh |
| -B/14 | K-Mash PT + K710 FT | 8x3x4 | 89.3 | Link | run.sh |
SthSth V2
TBD