ViTKD

July 17, 2023

Paper: ViTKD: Practical Guidelines for ViT feature knowledge distillation

(Figure: ViTKD architecture)

Train

# multi-GPU training (the trailing number is the GPU count)
bash tools/dist_train.sh configs/distillers/imagenet/deit-s3_distill_deit-t_img.py 4
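
If only a single GPU is available, the plain MMClassification entry point should also work. A minimal sketch, assuming this repo keeps the stock mmcls tools/train.py script:

# single-GPU training (assumes the stock mmcls tools/train.py is present)
python tools/train.py configs/distillers/imagenet/deit-s3_distill_deit-t_img.py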

Transfer

# Transfer the distillation checkpoint into a plain mmcls model
python pth_transfer.py --dis_path $dis_ckpt --output_path $new_mmcls_ckpt
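
For concreteness, a hypothetical invocation; both paths below are illustrative placeholders only (mmcls-based training writes checkpoints under work_dirs/ by default):

# hypothetical example paths, not shipped files
dis_ckpt=work_dirs/deit-s3_distill_deit-t_img/latest.pth
new_mmcls_ckpt=deit-tiny_vitkd.pth
python pth_transfer.py --dis_path $dis_ckpt --output_path $new_mmcls_ckpt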

Test

# multi-GPU testing (the trailing number is the GPU count)
bash tools/dist_test.sh configs/deit/deit-tiny_pt-4xb256_in1k.py $new_mmcls_ckpt 8 --metrics accuracy
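
As with training, evaluation can run on a single GPU; a sketch assuming the stock mmcls tools/test.py is present:

# single-GPU testing (assumes the stock mmcls tools/test.py is present)
python tools/test.py configs/deit/deit-tiny_pt-4xb256_in1k.py $new_mmcls_ckpt --metrics accuracy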

Results

All numbers are top-1 accuracy (%) on ImageNet-1k. T_weight links to the teacher checkpoint; each weight column links to the corresponding distilled student checkpoint.

Model      | Teacher        | T_weight          | Baseline | ViTKD         | weight            | ViTKD+NKD     | weight            | dis_config
-----------|----------------|-------------------|----------|---------------|-------------------|---------------|-------------------|-----------
DeiT-Tiny  | DeiT III-Small | baidu / one drive | 74.42    | 76.06 (+1.64) | baidu / one drive | 77.78 (+3.36) | baidu / one drive | config
DeiT-Small | DeiT III-Base  | baidu / one drive | 80.55    | 81.95 (+1.40) | baidu / one drive | 83.59 (+3.04) | baidu / one drive | config
DeiT-Base  | DeiT III-Large | baidu / one drive | 81.76    | 83.46 (+1.70) | baidu / one drive | 85.41 (+3.65) | baidu / one drive | config

Citation

@article{yang2022vitkd,
  title={ViTKD: Practical Guidelines for ViT feature knowledge distillation},
  author={Yang, Zhendong and Li, Zhe and Zeng, Ailing and Li, Zexian and Yuan, Chun and Li, Yu},
  journal={arXiv preprint arXiv:2209.02432},
  year={2022}
}