
September 11, 2025

Vision Transformers with Hierarchical Attention

This work was originally titled "Transformer in Convolutional Neural Networks".

Installation

This repository exactly follows the code and training settings of PVT, so the PVT instructions apply for installation and data preparation.
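A minimal setup sketch following the PVT-style workflow referenced above; the clone URL placeholder and package list are illustrative assumptions, not taken from this README:

```shell
# Illustrative setup following PVT conventions (assumed, not specified here).
# Replace <repo-url> with this repository's actual clone URL.
git clone <repo-url> HAT-Net
cd HAT-Net

# Typical PVT-style dependencies; exact versions are not pinned by this README.
pip install torch torchvision timm
```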

Image classification on the ImageNet-1K dataset

| Methods | Size | #Params | #FLOPs | Acc@1 | Pretrained Models |
| --- | --- | --- | --- | --- | --- |
| HAT-Net-Tiny | 224 × 224 | 12.7M | 2.0G | 79.8 | GitHub |
| HAT-Net-Small | 224 × 224 | 25.7M | 4.3G | 82.6 | GitHub |
| HAT-Net-Medium | 224 × 224 | 42.9M | 8.3G | 84.0 | GitHub |
| HAT-Net-Large | 224 × 224 | 63.1M | 11.5G | 84.2 | GitHub |

Citation

If you use the code or models provided here in a publication, please consider citing:

@article{liu2024vision,
  title={Vision Transformers with Hierarchical Attention},
  author={Liu, Yun and Wu, Yu-Huan and Sun, Guolei and Zhang, Le and Chhatkuli, Ajad and Van Gool, Luc},
  journal={Machine Intelligence Research},
  volume={21},
  pages={670--683},
  year={2024},
  publisher={Springer}
}

@article{liu2021transformer,
  title={Transformer in Convolutional Neural Networks},
  author={Liu, Yun and Sun, Guolei and Qiu, Yu and Zhang, Le and Chhatkuli, Ajad and Van Gool, Luc},
  journal={arXiv preprint arXiv:2106.03180},
  year={2021}
}