Pre-trained Models, Evaluation, and Fine-tuning

May 14, 2024

Here we provide the pre-trained models and the evaluation/fine-tuning instructions.

ImageNet-1K trained models

These models are also available at Tsinghua Cloud.

Model	#Param	#FLOPs	Acc@1	Training Speedup	#Equivalent Epochs	link
ResNet-50	26M	4.1G	79.7%	~1.5x	200	Google Drive
ConvNeXt-Tiny	29M	4.5G	82.2%	~1.5x	200	Google Drive
ConvNeXt-Small	50M	8.7G	83.2%	~1.5x	200	Google Drive
ConvNeXt-Base	89M	15.4G	83.8%	~1.5x	200	Google Drive
DeiT-Tiny	5M	1.3G	72.5%	~3.0x	100	Google Drive
			73.4%	~2.0x	150	Google Drive
			73.8%	~1.5x	200	Google Drive
			74.4%	~1.0x	300	Google Drive
DeiT-Small	22M	4.6G	79.9%	~3.0x	100	Google Drive
			80.6%	~2.0x	150	Google Drive
			81.0%	~1.5x	200	Google Drive
			81.4%	~1.0x	300	Google Drive
Swin-Tiny	28M	4.5G	80.9%	~3.0x	100	Google Drive
			81.4%	~2.0x	150	Google Drive
			81.6%	~1.5x	200	Google Drive
Swin-Small	50M	8.7G	82.8%	~3.0x	100	Google Drive
			83.1%	~2.0x	150	Google Drive
			83.2%	~1.5x	200	Google Drive
Swin-Base	88M	15.4G	83.3%	~3.0x	100	Google Drive
			83.5%	~2.0x	150	Google Drive
			83.6%	~1.5x	200	Google Drive
CSWin-Tiny	23M	4.3G	82.9%	~1.5x	200	Google Drive
CSWin-Small	35M	6.9G	83.6%	~1.5x	200	Google Drive
CSWin-Base	78M	15.0G	84.3%	~1.5x	200	Google Drive
CAFormer-S18	26M	4.1G	83.4%	~1.5x	200	Google Drive
CAFormer-S36	39M	8.0G	84.3%	~1.5x	200	Google Drive
CAFormer-M36	56M	13.2G	85.0%	~1.5x	200	Google Drive
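
The Training Speedup and #Equivalent Epochs columns appear to be tied by simple arithmetic. The sketch below assumes a 300-epoch baseline schedule, which is inferred from the ~1.0x / 300-epoch rows and is not stated explicitly here; `BASELINE_EPOCHS` and `training_speedup` are illustrative names.

```python
# Assumption: the baseline schedule is 300 epochs (inferred from the
# ~1.0x rows of the table, not stated in the text).
BASELINE_EPOCHS = 300


def training_speedup(equivalent_epochs, baseline_epochs=BASELINE_EPOCHS):
    """Ratio of the baseline training cost to the reduced training cost."""
    return baseline_epochs / equivalent_epochs


for epochs in (100, 150, 200, 300):
    print(f"{epochs} equivalent epochs -> ~{training_speedup(epochs):.1f}x")
```

Under that assumption, 100, 150, 200, and 300 equivalent epochs map to roughly 3.0x, 2.0x, 1.5x, and 1.0x, matching the table.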

ImageNet-22K -> ImageNet-1K fine-tuned models

These models are also available at Tsinghua Cloud.

Model	#Param	#FLOPs	Acc@1	Pre-training Speedup	link
CSWin-Base-224	78M	15.0G	86.1%	~3.0x	Google Drive
			86.3%	~2.0x	Google Drive
CSWin-Base-384	78M	47.0G	87.1%	~3.0x	Google Drive
			87.4%	~2.0x	Google Drive
CSWin-Large-224	173M	31.5G	86.9%	~3.0x	Google Drive
			87.1%	~2.0x	Google Drive
CSWin-Large-384	173M	96.8G	87.9%	~3.0x	Google Drive
			88.1%	~2.0x	Google Drive

Evaluation

We give an example command for evaluating Swin-Tiny:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=12345 main_buffer.py \
    --model swin_tiny --drop_path 0.0 \
    --eval true --batch_size 128 --input_size 224 \
    --data_path /path/to/imagenet-1k \
    --resume /path/to/checkpoint/ET_pp_200ep_swinT.pth

This should yield:

* Acc@1 81.626 Acc@5 95.694 loss 0.785
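
To check a run against the tables automatically, the summary line can be parsed and compared with the listed Acc@1. This is a minimal sketch that assumes the line has exactly the format shown above; `parse_eval_line` is a hypothetical helper, not part of the repository.

```python
import re

# Pattern for the summary line printed at the end of evaluation
# (format assumed from the example output above).
_RESULT = re.compile(r"Acc@1\s+([\d.]+)\s+Acc@5\s+([\d.]+)\s+loss\s+([\d.]+)")


def parse_eval_line(line):
    """Extract Acc@1, Acc@5, and loss from one summary line."""
    match = _RESULT.search(line)
    if match is None:
        raise ValueError(f"not an evaluation summary line: {line!r}")
    acc1, acc5, loss = (float(group) for group in match.groups())
    return {"acc1": acc1, "acc5": acc5, "loss": loss}


result = parse_eval_line("* Acc@1 81.626 Acc@5 95.694 loss 0.785")
table_acc1 = 81.6  # Swin-Tiny, 200 equivalent epochs, from the table above
# The table rounds to one decimal place, so allow half a unit of rounding.
assert abs(result["acc1"] - table_acc1) <= 0.05
```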

For other models, change --model, --resume, and --input_size accordingly. The pre-trained checkpoints are linked in the tables above.
A model-specific --drop_path is not needed for evaluation, because the DropPath module in timm acts as an identity operation in evaluation mode whatever its drop rate. It is needed for training.
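
The note on --drop_path can be illustrated with a small stand-in for stochastic depth. This is a pure-Python sketch of the behaviour, not the timm code: batches are plain lists, and `drop_path` here is an illustrative function written for this example.

```python
import random


def drop_path(batch, drop_prob=0.0, training=False, rng=None):
    """Stochastic depth on a batch given as a list of per-sample lists.

    Illustrative stand-in for the behaviour of timm's DropPath; it is not
    the timm implementation.
    """
    if drop_prob == 0.0 or not training:
        # Evaluation mode (or a zero rate): the input passes through untouched.
        return batch
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # Each sample is kept with probability keep_prob and rescaled so the
        # expected value matches the input; otherwise it is zeroed.
        scale = 1.0 / keep_prob if rng.random() < keep_prob else 0.0
        out.append([value * scale for value in sample])
    return out


batch = [[1.0, 2.0], [3.0, 4.0]]
# In evaluation mode every drop rate gives back the same values.
for rate in (0.0, 0.1, 0.2, 0.5):
    assert drop_path(batch, drop_prob=rate, training=False) == batch
```

Only the training branch depends on the rate, which is why the evaluation command can leave --drop_path at 0.0.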

ImageNet-22K pre-trained models

These models are also available at Tsinghua Cloud.

Model	#Param	#FLOPs	Pre-training Speedup	link
CSWin-Base-224	78M	15.0G	~3.0x	Google Drive
		15.0G	~2.0x	Google Drive
CSWin-Large-224	173M	31.5G	~3.0x	Google Drive
		31.5G	~2.0x	Google Drive

Fine-tuning ImageNet-22K pre-trained models

We give an example command for fine-tuning an ImageNet-22K pre-trained CSWin-Base-224 model on ImageNet-1K:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=12345 main_buffer.py \
    --model CSWin_96_24322_base_224 --drop_path 0.2 --weight_decay 1e-8 \
    --batch_size 64 --lr 5e-5 --update_freq 1 \
    --warmup_epochs 0 --epochs 30 --end_epoch 30 \
    --cutmix 0 --mixup 0 --layer_decay 0.9 --input_size 224 \
    --use_amp true \
    --model_ema true --model_ema_eval true --model_ema_decay 0.9998 \
    --data_path /path/to/imagenet-1k \
    --output_dir /path/to/save/results \
    --finetune /path/to/checkpoint/ET_pp_in22k_pre_trained_speedup2x_cswinB.pth
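
When fewer GPUs are available, the total batch size of this command can be kept constant by trading GPUs for accumulation steps. The sketch below assumes that --batch_size is the per-GPU batch size and that --update_freq counts gradient accumulation steps; neither is stated in the text above, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_gpu_batch, num_gpus, update_freq=1):
    """Samples contributing to one optimizer step.

    Assumes per-GPU batching and gradient accumulation over `update_freq`
    forward/backward passes (an assumption about the CLI semantics).
    """
    return per_gpu_batch * num_gpus * update_freq


# The command above: 8 GPUs, --batch_size 64, --update_freq 1.
reference = effective_batch_size(per_gpu_batch=64, num_gpus=8, update_freq=1)

# A 4-GPU machine reaches the same total by accumulating over two steps.
four_gpus = effective_batch_size(per_gpu_batch=64, num_gpus=4, update_freq=2)

assert reference == four_gpus == 512
```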

For other models, change --model, --finetune, and --input_size accordingly. The ImageNet-22K pre-trained checkpoints are linked in the table above.
--drop_path, --layer_decay, and --model_ema_decay can be tuned for better performance. In our paper, we tuned these hyper-parameters on the baseline models and reused the resulting configurations unchanged when fine-tuning our ImageNet-22K pre-trained models.
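
To give a feel for what --layer_decay and --model_ema_decay control, here is a minimal sketch of the two mechanisms in their common form. It assumes a geometric, BEiT/ConvNeXt-style learning-rate decay across layer groups and a standard exponential moving average of weights; the repository's actual layer grouping and EMA code are not shown here, and `num_layers=4` is a hypothetical value chosen for readability.

```python
def layerwise_lr(base_lr, layer_decay, num_layers):
    """One learning rate per layer group, index 0 being closest to the input.

    The last group keeps base_lr; each earlier group is scaled down by a
    further factor of layer_decay (assumed scheme, see lead-in).
    """
    return [base_lr * layer_decay ** (num_layers - i) for i in range(num_layers + 1)]


def ema_update(ema_value, new_value, decay):
    """One exponential-moving-average step for a single weight."""
    return decay * ema_value + (1.0 - decay) * new_value


# Values from the command above; num_layers is hypothetical.
lrs = layerwise_lr(base_lr=5e-5, layer_decay=0.9, num_layers=4)
for index, lr in enumerate(lrs):
    print(f"layer group {index}: lr = {lr:.3e}")

# With decay 0.9998 each step moves the average only 0.02% of the way
# toward the new weights, i.e. an averaging window of about 5000 steps.
decay = 0.9998
print(f"approximate EMA window: {1.0 / (1.0 - decay):.0f} steps")
```

A smaller --layer_decay shrinks the updates to early layers more aggressively, while a --model_ema_decay closer to 1 averages over a longer history of weights.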