Vision Transformers in 2022: An Update on Tiny ImageNet
August 9, 2025
This is the official PyTorch repository for "Vision Transformers in 2022: An Update on Tiny ImageNet". It provides pretrained models along with training and evaluation scripts.
Model Zoo
I provide the following models, fine-tuned on Tiny ImageNet at a 384x384 image resolution.
| name | acc@1 | #params | url |
|---|---|---|---|
| ViT-L | 86.43 | 304M | model |
| CaiT-S36 | 86.74 | 68M | model |
| DeiT-B distilled | 87.29 | 87M | model |
| Swin-L | 91.35 | 195M | model |
Usage
First, clone the repository:
git clone https://github.com/ehuynh1106/TinyImageNet-Transformers.git
Then install the dependencies:
pip install -r requirements.txt
Data Preparation
Download Tiny ImageNet from https://image-net.org/ and extract it into the root directory of this repository.
Stanford hosts a copy that is sufficient to reproduce the results of this repo. For example, from the root directory of this project, run the following in a terminal:
wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
Then extract the contents into the root directory:
unzip tiny-imagenet-200.zip && mv tiny-imagenet-200/* . && rmdir tiny-imagenet-200
Then run python fileio.py to format the data. This converts the images into tensors and pickles them into two files, train_dataset.pkl and val_dataset.pkl, which are used by the main code.
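For orientation, the validation labels in the Tiny ImageNet archive are stored in val/val_annotations.txt, one tab-separated line per image (file name, WordNet ID, then bounding-box values), and wnids.txt lists the class IDs. The sketch below shows one way such labels could be mapped to integer class indices and pickled. It is not the code in fileio.py: the helper names `load_class_index`, `parse_val_annotations`, and `build_val_labels`, the sorted class ordering, and the output file name used in the usage note are all assumptions made for illustration, and the image-to-tensor conversion done by the real script is omitted.

```python
import pickle
from pathlib import Path


def load_class_index(wnids_text):
    """Map each WordNet ID (one per line) to an integer class index.

    Sorting the IDs is an arbitrary choice for this sketch; the repository
    may order classes differently.
    """
    wnids = [line.strip() for line in wnids_text.splitlines() if line.strip()]
    return {wnid: idx for idx, wnid in enumerate(sorted(wnids))}


def parse_val_annotations(annotations_text, class_index):
    """Return (file_name, class_idx) pairs from val_annotations.txt content."""
    samples = []
    for line in annotations_text.splitlines():
        if not line.strip():
            continue
        # Columns: file name, WordNet ID, then four bounding-box values.
        file_name, wnid = line.split("\t")[:2]
        samples.append((file_name, class_index[wnid]))
    return samples


def build_val_labels(root, out_path):
    """Read the label files under `root` and pickle the parsed labels."""
    root = Path(root)
    class_index = load_class_index((root / "wnids.txt").read_text())
    samples = parse_val_annotations(
        (root / "val" / "val_annotations.txt").read_text(), class_index
    )
    with open(out_path, "wb") as f:
        pickle.dump(samples, f)
    return samples
```

With the archive extracted as described above, calling `build_val_labels(".", "val_labels_example.pkl")` from the root directory would write the label pairs to a pickle file (a hypothetical file name, separate from the two files produced by fileio.py).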
Training
To train a Swin-L model on Tiny ImageNet run the following command:
python main.py --train --model swin
Note: Training checkpoints are automatically saved in /models, and visualizations of predictions on the validation set are automatically saved to /predictions after half of the epochs have passed.
To train DeiT, ViT, or CaiT instead, replace --model swin with --model deit, --model vit, or --model cait.
To resume training a Swin-L model on Tiny ImageNet run the following command:
python main.py --train --model swin --resume /path/to/checkpoint
Evaluate
To evaluate a Swin-L model on the validation set of Tiny ImageNet run the following command:
python main.py --evaluate /path/to/model --model swin
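The commands in this README combine four flags: --train, --model, --resume, and --evaluate. As a rough illustration of how a command line like this can be parsed, here is a minimal argparse sketch. It is not the repository's main.py: the function names `build_parser` and `select_mode`, the decision to make --model required, and the precedence of --evaluate over --train are assumptions for this example only.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags used in this README's commands."""
    parser = argparse.ArgumentParser(
        description="Sketch of a train/evaluate command line (illustrative only)"
    )
    parser.add_argument("--train", action="store_true", help="run training")
    parser.add_argument(
        "--model",
        choices=["swin", "deit", "vit", "cait"],
        required=True,  # assumption: the real script may define a default
        help="which architecture to use",
    )
    parser.add_argument(
        "--resume", metavar="PATH", default=None,
        help="checkpoint to resume training from",
    )
    parser.add_argument(
        "--evaluate", metavar="PATH", default=None,
        help="model file to evaluate on the validation set",
    )
    return parser


def select_mode(args):
    """Pick one of 'evaluate', 'resume', 'train', or 'none' from parsed flags."""
    # Assumption: evaluation takes precedence if both flags are given.
    if args.evaluate is not None:
        return "evaluate"
    if args.train:
        return "resume" if args.resume is not None else "train"
    return "none"
```

For example, `select_mode(build_parser().parse_args(["--train", "--model", "swin"]))` returns `"train"`, and adding `--resume /path/to/checkpoint` changes the result to `"resume"`.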
Citing
@misc{huynh2022vision,
title={Vision Transformers in 2022: An Update on Tiny ImageNet},
author={Ethan Huynh},
year={2022},
eprint={2205.10660},
archivePrefix={arXiv},
primaryClass={cs.CV}
}