EfficientViT Classification

October 25, 2024

Datasets

ImageNet: https://www.image-net.org/
Our code expects the ImageNet dataset directory to follow this structure:

imagenet
├── train
└── val
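The layout above matches torchvision's `ImageFolder` convention: one subdirectory per class under `train` and `val`. A minimal sanity check before launching a job might look like this (a sketch; `check_imagenet_layout` is a hypothetical helper, not part of this repo):

```python
from pathlib import Path


def check_imagenet_layout(root: str) -> bool:
    """Return True if `root` contains non-empty train/ and val/ splits."""
    root_path = Path(root).expanduser()
    for split in ("train", "val"):
        split_dir = root_path / split
        # Each split must exist and hold at least one class subdirectory.
        if not split_dir.is_dir():
            return False
        if not any(child.is_dir() for child in split_dir.iterdir()):
            return False
    return True
```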

Pretrained EfficientViT Classification Models

Latency/throughput is measured on NVIDIA Jetson Nano, NVIDIA Jetson AGX Orin, and NVIDIA A100 GPU with TensorRT in fp16. Data transfer time is included.

ImageNet

All EfficientViT classification models are trained from random initialization on ImageNet-1K (300 epochs plus 20 warmup epochs) with supervised learning. Please put the downloaded checkpoints under ${efficientvit_repo}/assets/checkpoints/efficientvit_cls/

EfficientViT L Series

| Model | Resolution | ImageNet Top1 Acc | ImageNet Top5 Acc | Params | MACs | A100 Throughput | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientNetV2-S | 384x384 | 83.9 | - | 22M | 8.4G | 2869 image/s | - |
| EfficientNetV2-M | 480x480 | 85.2 | - | 54M | 25G | 1160 image/s | - |
| EfficientViT-L1 | 224x224 | 84.484 | 96.862 | 53M | 5.3G | 6207 image/s | link |
| EfficientViT-L2 | 224x224 | 85.050 | 97.090 | 64M | 6.9G | 4998 image/s | link |
| EfficientViT-L2 | 256x256 | 85.366 | 97.216 | 64M | 9.1G | 3969 image/s | link |
| EfficientViT-L2 | 288x288 | 85.630 | 97.364 | 64M | 11G | 3102 image/s | link |
| EfficientViT-L2 | 320x320 | 85.734 | 97.438 | 64M | 14G | 2525 image/s | link |
| EfficientViT-L2 | 384x384 | 85.978 | 97.518 | 64M | 20G | 1784 image/s | link |
| EfficientViT-L3 | 224x224 | 85.814 | 97.198 | 246M | 28G | 2081 image/s | link |
| EfficientViT-L3 | 256x256 | 85.938 | 97.318 | 246M | 36G | 1641 image/s | link |
| EfficientViT-L3 | 288x288 | 86.070 | 97.440 | 246M | 46G | 1276 image/s | link |
| EfficientViT-L3 | 320x320 | 86.230 | 97.474 | 246M | 56G | 1049 image/s | link |
| EfficientViT-L3 | 384x384 | 86.408 | 97.632 | 246M | 81G | 724 image/s | link |
EfficientViT B Series
| Model | Resolution | ImageNet Top1 Acc | ImageNet Top5 Acc | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| EfficientViT-B1 | 224x224 | 79.390 | 94.346 | 9.1M | 0.52G | 24.8ms | 1.48ms | link |
| EfficientViT-B1 | 256x256 | 79.918 | 94.704 | 9.1M | 0.68G | 28.5ms | 1.57ms | link |
| EfficientViT-B1 | 288x288 | 80.410 | 94.984 | 9.1M | 0.86G | 34.5ms | 1.82ms | link |
| EfficientViT-B2 | 224x224 | 82.100 | 95.782 | 24M | 1.6G | 50.6ms | 2.63ms | link |
| EfficientViT-B2 | 256x256 | 82.698 | 96.096 | 24M | 2.1G | 58.5ms | 2.84ms | link |
| EfficientViT-B2 | 288x288 | 83.086 | 96.302 | 24M | 2.6G | 69.9ms | 3.30ms | link |
| EfficientViT-B3 | 224x224 | 83.468 | 96.356 | 49M | 4.0G | 101ms | 4.36ms | link |
| EfficientViT-B3 | 256x256 | 83.806 | 96.514 | 49M | 5.2G | 120ms | 4.74ms | link |
| EfficientViT-B3 | 288x288 | 84.150 | 96.732 | 49M | 6.5G | 141ms | 5.63ms | link |

Usage

# classification
from efficientvit.cls_model_zoo import create_efficientvit_cls_model

model = create_efficientvit_cls_model(name="efficientvit-l3-r384", pretrained=True)

Evaluation

Please run eval_efficientvit_cls_model.py to evaluate our models.
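The Top-1/Top-5 numbers in the tables above follow the standard definition: a prediction counts as correct at k if the true label is among the k highest-scoring classes. A minimal, framework-free sketch of that metric (an illustration, not the repo's evaluation code):

```python
def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is within the top-k scores.

    `logits` is a list of per-class score lists; `labels` the true class indices.
    """
    correct = 0
    for scores, label in zip(logits, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)
```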


Export

ONNX

To generate ONNX files, please refer to onnx_export.py.

Example:

python assets/onnx_export.py --export_path assets/export_models/efficientvit_cls_l3_r224.onnx --model efficientvit-l3 --resolution 224 224 --bs 1

TFLite

To generate TFLite files, please refer to tflite_export.py.

Example:

python assets/tflite_export.py --export_path assets/export_models/efficientvit_cls_b3_r224.tflite --model efficientvit-b3 --resolution 224 224

Training

Please refer to train_efficientvit_cls_model.py for training models on ImageNet.

EfficientViT L Series

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_l1.yaml --amp bf16 \
    --data_provider.data_dir ~/dataset/imagenet \
    --path .exp/efficientvit_cls/imagenet/efficientvit_l1_r224/

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_l2.yaml --amp bf16 \
    --data_provider.data_dir ~/dataset/imagenet \
    --path .exp/efficientvit_cls/imagenet/efficientvit_l2_r224/

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_l3.yaml --amp bf16 \
    --data_provider.data_dir ~/dataset/imagenet \
    --path .exp/efficientvit_cls/imagenet/efficientvit_l3_r224/

EfficientViT B Series

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b1.yaml \
    --data_provider.data_dir ~/dataset/imagenet \
    --path .exp/efficientvit_cls/imagenet/efficientvit_b1_r224/

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b1.yaml \
    --data_provider.image_size "[128,160,192,224,256,288]" \
    --data_provider.data_dir ~/dataset/imagenet \
    --run_config.eval_image_size "[288]" \
    --path .exp/efficientvit_cls/imagenet/efficientvit_b1_r288/

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b2.yaml \
    --data_provider.data_dir ~/dataset/imagenet \
    --path .exp/efficientvit_cls/imagenet/efficientvit_b2_r224/

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b2.yaml \
    --data_provider.image_size "[128,160,192,224,256,288]" \
    --data_provider.data_dir ~/dataset/imagenet \
    --run_config.eval_image_size "[288]" \
    --data_provider.data_aug "{n:1,m:5}" \
    --path .exp/efficientvit_cls/imagenet/efficientvit_b2_r288/

torchrun --nnodes 1 --nproc_per_node=8 \
    applications/efficientvit_cls/train_efficientvit_cls_model.py applications/efficientvit_cls/configs/imagenet/efficientvit_b3.yaml \
    --data_provider.data_dir ~/dataset/imagenet \
    --path .exp/efficientvit_cls/imagenet/efficientvit_b3_r224/
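The `--data_provider.data_dir`-style flags above are dotted-path overrides applied on top of the YAML config. A rough sketch of how such an override can be folded into a nested config dict (an illustration of the pattern, not the repo's actual parser):

```python
def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Set config[a][b]... = value for a dotted key like 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Descend, creating intermediate dicts as needed.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


cfg = {"data_provider": {"data_dir": "/default"}}
apply_override(cfg, "data_provider.data_dir", "~/dataset/imagenet")
apply_override(cfg, "run_config.eval_image_size", [288])
```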

Reference

If EfficientViT is useful or relevant to your research, please cite our paper:

@inproceedings{cai2023efficientvit,
  title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},
  author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17302--17313},
  year={2023}
}