March 10, 2026
MedViT
MedViTV2: Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention
🔥 News
- [2025.08.27] We have released the pre-trained weights.
- [2025.10.06] Our paper was accepted for publication in Applied Soft Computing.
Train & Test --- Prepare data
To train or evaluate MedViT models on the 17 medical datasets, follow the "Evaluation" instructions.
Important: This code also supports training all TIMM models.
Introduction
Convolutional networks, transformers, hybrid models, and Mamba-based architectures have shown strong performance in medical image classification but are typically designed for clean, labeled data. Real-world clinical datasets, however, often contain corruptions arising from multi-center studies and variations in imaging equipment. To address this, we introduce the Medical Vision Transformer (MedViTV2), the first architecture to integrate Kolmogorov–Arnold Network (KAN) layers into a transformer for generalized medical image classification. We design an efficient KAN block to lower computational cost while improving accuracy over the original MedViT. To overcome scaling fragility, we propose Dilated Neighborhood Attention (DiNA), an adaptation of fused dot-product attention that expands receptive fields and mitigates feature collapse. Additionally, a hierarchical hybrid strategy balances local and global feature perception through efficient stacking of Local and Global Feature Perception blocks. Evaluated on 17 classification and 12 corrupted datasets, MedViTV2 achieved state-of-the-art performance in 27 of 29 benchmarks, improving efficiency by 44% and boosting accuracy by 4.6% on MedMNIST, 5.8% on NonMNIST, and 13.4% on MedMNIST-C.
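To make the receptive-field claim concrete, here is a toy 1D sketch of which positions a single query attends to under neighborhood attention, with and without dilation. It assumes the common formulation in which a query's neighbors are the nearest positions that share its index modulo the dilation, with the window shifted inward at the borders. It is not this repository's implementation, which relies on NATTEN; the function name `dilated_neighborhood` is hypothetical.

```python
def dilated_neighborhood(i, length, kernel_size, dilation=1):
    """Indices attended to by the query at position i in a 1D sequence.

    Illustrative helper only (hypothetical name, not a NATTEN API).
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    if kernel_size * dilation > length:
        raise ValueError("kernel_size * dilation must not exceed length")
    if not 0 <= i < length:
        raise IndexError("query position out of range")

    # Positions sharing i's residue modulo the dilation form i's group.
    group = list(range(i % dilation, length, dilation))
    pos = i // dilation  # index of i inside its group
    half = kernel_size // 2

    # Centre the window on i, shifting it inward at the sequence borders
    # so that every query keeps exactly kernel_size neighbors.
    start = min(max(pos - half, 0), len(group) - kernel_size)
    return group[start:start + kernel_size]


if __name__ == "__main__":
    # Same kernel size, different dilation: the span covered grows.
    print(dilated_neighborhood(5, 12, kernel_size=3, dilation=1))  # [4, 5, 6]
    print(dilated_neighborhood(5, 12, kernel_size=3, dilation=3))  # [2, 5, 8]
```

With the same number of attended positions, the dilated variant spans 7 positions instead of 3, which is the sense in which dilation expands the receptive field at no extra attention cost.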
Overview
Visual Examples
You can find a tutorial for visualizing the Grad-CAM heatmap of MedViT in this repository "visualize".

Usage
First, clone the repository locally:
git clone https://github.com/Omid-Nejati/MedViTV2.git
cd MedViTV2
Install PyTorch 2.5
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
Then, install natten 0.17.3
pip install natten==0.17.3+torch250cu124 -f https://shi-labs.com/natten/wheels/
Also, install requirements
pip install -r requirements.txt
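After installing, it can help to confirm that the environment actually contains the pinned versions. The following is a minimal sketch using only the Python standard library. The helper name `check_pins` is hypothetical and not part of this repository, and it assumes that installed wheels may report a local build suffix such as `+cu124`, which is dropped before comparing.

```python
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {
    "torch": "2.5.0",
    "torchvision": "0.20.0",
    "torchaudio": "2.5.0",
    "natten": "0.17.3",
}


def check_pins(expected=EXPECTED, get_version=metadata.version):
    """Return {package: (expected, installed_or_None, matches)}.

    Hypothetical helper, not part of the MedViTV2 code base.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Drop a local build suffix such as "+cu124" before comparing.
        public = installed.split("+")[0] if installed else None
        report[name] = (wanted, installed, public == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} expected {wanted:8s} found {installed} [{status}]")
```

A mismatch between the torch build and the NATTEN wheel is worth resolving before training, since the NATTEN wheel above is built for a specific torch and CUDA combination.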
Training
To train MedViT-small on BreastMNIST on a single GPU for 100 epochs, run:
python main.py --model_name 'MedViT_small' --dataset 'breastmnist' --pretrained False
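To repeat the command above across several datasets, the invocations can be generated programmatically. This is a sketch rather than part of the repository: the helper names are hypothetical, only the three flags shown above are used, and any dataset identifier other than `breastmnist` is an assumption that follows the same lowercase naming pattern.

```python
import subprocess
import sys


def build_command(model_name, dataset, pretrained=False):
    """Build the argument list for one training run (hypothetical helper)."""
    return [
        sys.executable, "main.py",
        "--model_name", model_name,
        "--dataset", dataset,
        "--pretrained", str(pretrained),
    ]


def build_sweep(model_names, datasets, pretrained=False):
    """One command per (model, dataset) pair."""
    return [
        build_command(model, dataset, pretrained)
        for model in model_names
        for dataset in datasets
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "pneumoniamnist" is an assumed identifier; check the repository's
    # dataset list before launching real runs.
    sweep = build_sweep(["MedViT_small"], ["breastmnist", "pneumoniamnist"])
    run_sweep(sweep, dry_run=True)
```

Keeping `dry_run=True` by default makes it easy to inspect the generated commands before committing GPU time.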
Performance Overview
Below is the performance summary of MedViT on various medical imaging datasets.
🔹 Model weights are available now.
| Dataset | Task | MedViTV2-tiny (%) | MedViTV2-small (%) | MedViTV2-base (%) | MedViTV2-large (%) |
|---|---|---|---|---|---|
| ChestMNIST | Multi-Label (14) | 96.3 (model) | 96.4 (model) | 96.4 (model) | 96.7 (model) |
| PathMNIST | Multi-Class (9) | 95.9 (model) | 96.5 (model) | 97.0 (model) | 97.7 (model) |
| DermaMNIST | Multi-Class (7) | 78.1 (model) | 79.2 (model) | 80.8 (model) | 81.7 (model) |
| OCTMNIST | Multi-Class (4) | 92.7 (model) | 94.2 (model) | 94.4 (model) | 95.2 (model) |
| PneumoniaMNIST | Multi-Class (2) | 95.1 (model) | 96.5 (model) | 96.9 (model) | 97.3 (model) |
| RetinaMNIST | Multi-Class (5) | 54.7 (model) | 56.2 (model) | 57.5 (model) | 57.8 (model) |
| BreastMNIST | Multi-Class (2) | 88.2 (model) | 89.5 (model) | 90.4 (model) | 91.0 (model) |
| BloodMNIST | Multi-Class (8) | 97.9 (model) | 98.5 (model) | 98.5 (model) | 98.7 (model) |
| TissueMNIST | Multi-Class (8) | 69.9 (model) | 70.5 (model) | 71.1 (model) | 71.6 (model) |
| OrganAMNIST | Multi-Class (11) | 95.8 (model) | 96.6 (model) | 96.9 (model) | 97.3 (model) |
| OrganCMNIST | Multi-Class (11) | 93.5 (model) | 95.0 (model) | 95.3 (model) | 96.1 (model) |
| OrganSMNIST | Multi-Class (11) | 82.4 (model) | 83.9 (model) | 84.4 (model) | 85.1 (model) |
| PAD-UFES-20 | Multi-Class (6) | 63.6 (model) | | | |
| ISIC2018 | Multi-Class (7) | 77.1 (model) | | | |
| CPN X-ray | Multi-Class (3) | 95.3 (model) | | | |
| Kvasir | Multi-Class (8) | 82.8 (model) | | | |
| Fetal-Planes-DB | Multi-Class (6) | 95.3 (model) | | | |
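For a quick comparison across model sizes, the table can be summarized with a few lines of Python. The sketch below copies the scores of the 12 MedMNIST rows, which are the only rows reported for all four sizes, and computes the mean per size and the per-dataset gain of the large model over the tiny one. The helper names are illustrative and not part of the repository.

```python
# Scores (%) copied from the 12 MedMNIST rows of the table above,
# in column order: tiny, small, base, large.
SIZES = ("tiny", "small", "base", "large")
RESULTS = {
    "ChestMNIST": (96.3, 96.4, 96.4, 96.7),
    "PathMNIST": (95.9, 96.5, 97.0, 97.7),
    "DermaMNIST": (78.1, 79.2, 80.8, 81.7),
    "OCTMNIST": (92.7, 94.2, 94.4, 95.2),
    "PneumoniaMNIST": (95.1, 96.5, 96.9, 97.3),
    "RetinaMNIST": (54.7, 56.2, 57.5, 57.8),
    "BreastMNIST": (88.2, 89.5, 90.4, 91.0),
    "BloodMNIST": (97.9, 98.5, 98.5, 98.7),
    "TissueMNIST": (69.9, 70.5, 71.1, 71.6),
    "OrganAMNIST": (95.8, 96.6, 96.9, 97.3),
    "OrganCMNIST": (93.5, 95.0, 95.3, 96.1),
    "OrganSMNIST": (82.4, 83.9, 84.4, 85.1),
}


def mean_per_size(results, sizes=SIZES):
    """Unweighted mean score over datasets for each model size."""
    n = len(results)
    return {
        size: sum(row[k] for row in results.values()) / n
        for k, size in enumerate(sizes)
    }


def gain_large_over_tiny(results):
    """Per-dataset difference between the large and tiny columns."""
    return {name: round(row[-1] - row[0], 1) for name, row in results.items()}


if __name__ == "__main__":
    for size, mean in mean_per_size(RESULTS).items():
        print(f"MedViTV2-{size:5s} mean: {mean:.2f}")
    gains = gain_large_over_tiny(RESULTS)
    best = max(gains, key=gains.get)
    print(f"largest gain from tiny to large: {best} (+{gains[best]})")
```

The mean is unweighted, so small datasets such as BreastMNIST count as much as large ones such as TissueMNIST.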
License
MedViT is released under the MIT License.
If you find my GitHub repository useful, please consider giving it a star!
References
Citation
@article{manzari2025medical,
title={Medical image classification with KAN-integrated transformers and dilated neighborhood attention},
author={Manzari, Omid Nejati and Asgariandehkordi, Hojat and Koleilat, Taha and Xiao, Yiming and Rivaz, Hassan},
journal={Applied Soft Computing},
pages={114045},
year={2025},
publisher={Elsevier}
}
@article{manzari2023medvit,
title={MedViT: a robust vision transformer for generalized medical image classification},
author={Manzari, Omid Nejati and Ahmadabadi, Hamid and Kashiani, Hossein and Shokouhi, Shahriar B and Ayatollahi, Ahmad},
journal={Computers in Biology and Medicine},
volume={157},
pages={106791},
year={2023},
publisher={Elsevier}
}