EfficientViT Classification
October 25, 2024 · View on GitHub
Datasets
ImageNet: https://www.image-net.org/
Our code expects the ImageNet dataset directory to follow the following structure:
imagenet
├── train
├── val
Pretrained EfficientViT Classification Models
Latency/Throughput is measured on NVIDIA Jetson Nano, NVIDIA Jetson AGX Orin, and NVIDIA A100 GPU with TensorRT, fp16. Data transfer time is included.
ImageNet
All EfficientViT classification models are trained on ImageNet-1K with random initialization (300 epochs + 20 warmup epochs) using supervised learning. Please put the downloaded checkpoints under ${efficientvit_repo}/assets/checkpoints/efficientvit_cls/
| Model | Resolution | ImageNet Top1 Acc | ImageNet Top5 Acc | Params | MACs | A100 Throughput | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientNetV2-S | 384x384 | 83.9 | - | 22M | 8.4G | 2869 image/s | - |
| EfficientNetV2-M | 480x480 | 85.2 | - | 54M | 25G | 1160 image/s | - |
| EfficientViT-L1 | 224x224 | 84.484 | 96.862 | 53M | 5.3G | 6207 image/s | link |
| EfficientViT-L2 | 224x224 | 85.050 | 97.090 | 64M | 6.9G | 4998 image/s | link |
| EfficientViT-L2 | 256x256 | 85.366 | 97.216 | 64M | 9.1G | 3969 image/s | link |
| EfficientViT-L2 | 288x288 | 85.630 | 97.364 | 64M | 11G | 3102 image/s | link |
| EfficientViT-L2 | 320x320 | 85.734 | 97.438 | 64M | 14G | 2525 image/s | link |
| EfficientViT-L2 | 384x384 | 85.978 | 97.518 | 64M | 20G | 1784 image/s | link |
| EfficientViT-L3 | 224x224 | 85.814 | 97.198 | 246M | 28G | 2081 image/s | link |
| EfficientViT-L3 | 256x256 | 85.938 | 97.318 | 246M | 36G | 1641 image/s | link |
| EfficientViT-L3 | 288x288 | 86.070 | 97.440 | 246M | 46G | 1276 image/s | link |
| EfficientViT-L3 | 320x320 | 86.230 | 97.474 | 246M | 56G | 1049 image/s | link |
| EfficientViT-L3 | 384x384 | 86.408 | 97.632 | 246M | 81G | 724 image/s | link |
EfficientViT B series
| Model | Resolution | ImageNet Top1 Acc | ImageNet Top5 Acc | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| EfficientViT-B1 | 224x224 | 79.390 | 94.346 | 9.1M | 0.52G | 24.8ms | 1.48ms | link |
| EfficientViT-B1 | 256x256 | 79.918 | 94.704 | 9.1M | 0.68G | 28.5ms | 1.57ms | link |
| EfficientViT-B1 | 288x288 | 80.410 | 94.984 | 9.1M | 0.86G | 34.5ms | 1.82ms | link |
| EfficientViT-B2 | 224x224 | 82.100 | 95.782 | 24M | 1.6G | 50.6ms | 2.63ms | link |
| EfficientViT-B2 | 256x256 | 82.698 | 96.096 | 24M | 2.1G | 58.5ms | 2.84ms | link |
| EfficientViT-B2 | 288x288 | 83.086 | 96.302 | 24M | 2.6G | 69.9ms | 3.30ms | link |
| EfficientViT-B3 | 224x224 | 83.468 | 96.356 | 49M | 4.0G | 101ms | 4.36ms | link |
| EfficientViT-B3 | 256x256 | 83.806 | 96.514 | 49M | 5.2G | 120ms | 4.74ms | link |
| EfficientViT-B3 | 288x288 | 84.150 | 96.732 | 49M | 6.5G | 141ms | 5.63ms | link |
Usage
# classification
from efficientvit.cls_model_zoo import create_efficientvit_cls_model
model = create_efficientvit_cls_model(name="efficientvit-l3-r384", pretrained=True)
Evaluation
Please run eval_efficientvit_cls_model.py to evaluate our models.
Examples: classification
Export
Onnx
To generate ONNX files, please refer to onnx_export.py.
Example:
python assets/onnx_export.py --export_path assets/export_models/efficientvit_cls_l3_r224.onnx --model efficientvit-l3 --resolution 224 224 --bs 1
TFLite
To generate TFLite files, please refer to tflite_export.py.
Example:
python assets/tflite_export.py --export_path assets/export_models/efficientvit_cls_b3_r224.tflite --model efficientvit-b3 --resolution 224 224
Training
Please refer to train_efficientvit_cls_model.py for training models on imagenet.
EfficientViT L Series
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_l1.yaml --amp bf16 \
--data_provider.data_dir ~/dataset/imagenet \
--path .exp/efficientvit_cls/imagenet/efficientvit_l1_r224/
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_l2.yaml --amp bf16 \
--data_provider.data_dir ~/dataset/imagenet \
--path .exp/efficientvit_cls/imagenet/efficientvit_l2_r224/
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_l3.yaml --amp bf16 \
--data_provider.data_dir ~/dataset/imagenet \
--path .exp/efficientvit_cls/imagenet/efficientvit_l3_r224/
EfficientViT B Series
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b1.yaml \
--data_provider.data_dir ~/dataset/imagenet \
--path .exp/efficientvit_cls/imagenet/efficientvit_b1_r224/
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b1.yaml \
--data_provider.image_size "[128,160,192,224,256,288]" \
--data_provider.data_dir ~/dataset/imagenet \
--run_config.eval_image_size "[288]" \
--path .exp/efficientvit_cls/imagenet/efficientvit_b1_r288/
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b2.yaml \
--data_provider.data_dir ~/dataset/imagenet \
--path .exp/efficientvit_cls/imagenet/efficientvit_b2_r224/
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b2.yaml \
--data_provider.image_size "[128,160,192,224,256,288]" \
--data_provider.data_dir ~/dataset/imagenet \
--run_config.eval_image_size "[288]" \
--data_provider.data_aug "{n:1,m:5}" \
--path .exp/efficientvit_cls/imagenet/efficientvit_b2_r288/
torchrun --nnodes 1 --nproc_per_node=8 \
python applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b3.yaml \
--data_provider.data_dir ~/dataset/imagenet \
--path .exp/efficientvit_cls/imagenet/efficientvit_b3_r224/
Reference
If EfficientViT is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@inproceedings{cai2023efficientvit,
title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},
author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={17302--17313},
year={2023}
}