MERIT

November 26, 2025

This is the implementation of "Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation", MIDL 2023.
Md Mostafijur Rahman, Radu Marculescu

The University of Texas at Austin

🔍 Check out our CVPR 2024 paper! EMCAD

🔍 Check out our CVPRW 2024 paper! PP-SAM

🔍 Check out our WACV 2024 paper! G-CASCADE

🔍 Check out our WACV 2023 paper! CASCADE

Architectures

Qualitative Results on Synapse Multi-organ dataset

Usage:

Recommended environment:

Python 3.8
PyTorch 1.11.0
torchvision 0.12.0

Please use pip install -r requirements.txt to install the dependencies.
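To confirm that an existing environment matches the recommended versions, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the function name check_environment is hypothetical, and it assumes the packages are installed under the distribution names torch and torchvision.

```python
import sys
from importlib import metadata

# Versions taken from the recommended environment listed above.
RECOMMENDED = {"torch": "1.11.0", "torchvision": "0.12.0"}


def check_environment(recommended=RECOMMENDED):
    """Return {name: (found, wanted, matches)} for Python and each package."""
    py_found = "%d.%d" % sys.version_info[:2]
    report = {"python": (py_found, "3.8", py_found == "3.8")}
    for name, wanted in recommended.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu113" are ignored in the comparison.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (found, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (found, wanted, matches) in check_environment().items():
        flag = "ok" if matches else "differs"
        print(f"{name}: found {found}, recommended {wanted} ({flag})")
```

A mismatch does not necessarily mean the code will fail; the check only reports where the environment differs from the recommended one.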

Data preparation:

Synapse Multi-organ dataset: Sign up on the official Synapse website and download the dataset. Then split the 'RawData' folder into 'TrainSet' (18 scans) and 'TestSet' (12 scans) following TransUNet's lists and put them in the './data/synapse/Abdomen/RawData/' folder. Finally, preprocess the data using python ./utils/preprocess_synapse_data.py, or download the preprocessed data and save it in the './data/synapse/' folder. Note: If you use the preprocessed data from TransUNet, please make the necessary changes in utils/dataset_synapse.py (i.e., remove the code segment at lines 88-94 that converts the ground-truth labels from 14 to 9 classes).
ACDC dataset: Download the preprocessed ACDC dataset from Google Drive and move it into the './data/ACDC/' folder.
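The train/test split of the raw Synapse scans described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function split_raw_data is hypothetical, the case IDs must be taken from TransUNet's lists (they are not reproduced here), and the code assumes that every scan file sits directly in the raw-data folder and carries its case ID in the file name. Adjust it to the actual layout of your download.

```python
import shutil
from pathlib import Path


def split_raw_data(raw_dir, train_ids, test_ids):
    """Move files in raw_dir into TrainSet/ or TestSet/ by case ID.

    Assumption: each file name contains exactly one case ID as a substring.
    Returns the number of files moved into each subset.
    """
    raw_dir = Path(raw_dir)
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"case IDs listed in both splits: {sorted(overlap)}")

    subsets = {"TrainSet": list(train_ids), "TestSet": list(test_ids)}
    moved = {name: 0 for name in subsets}
    # Snapshot the files first so newly created folders are not visited.
    files = [p for p in raw_dir.iterdir() if p.is_file()]
    for path in files:
        for name, ids in subsets.items():
            if any(case_id in path.name for case_id in ids):
                target = raw_dir / name
                target.mkdir(exist_ok=True)
                shutil.move(str(path), str(target / path.name))
                moved[name] += 1
                break  # a file belongs to at most one subset
    return moved
```

With the split described above, the returned counts should correspond to 18 training scans and 12 test scans (times the number of files stored per scan).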

Pretrained model:

Download the pretrained MaxViT models from Google Drive and put them in the './pretrained_pth/maxvit/' folder for initialization.
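Before starting a run, it can be useful to verify that the data and pretrained-weight folders are in place. The following is a minimal sketch, not part of the repository: the helper check_layout is hypothetical, and it only tests that each folder exists and is not empty, without inspecting the files inside.

```python
from pathlib import Path

# Folder names taken from the data preparation and pretrained model steps.
EXPECTED_DIRS = ("data/synapse", "data/ACDC", "pretrained_pth/maxvit")


def check_layout(root="."):
    """Return {relative_dir: True/False}; True means present and non-empty."""
    root = Path(root)
    status = {}
    for rel in EXPECTED_DIRS:
        folder = root / rel
        status[rel] = folder.is_dir() and any(folder.iterdir())
    return status


if __name__ == "__main__":
    for rel, ready in check_layout().items():
        print(f"{rel}: {'ready' if ready else 'missing or empty'}")
```

Run it from inside the MERIT folder so that the relative paths resolve as expected.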

Training:

cd into MERIT

For Synapse Multi-organ training, run: CUDA_VISIBLE_DEVICES=0 python -W ignore train_synapse.py

For ACDC training, run: CUDA_VISIBLE_DEVICES=0 python -W ignore train_ACDC.py

Testing:

cd into MERIT

For Synapse Multi-organ testing, run: CUDA_VISIBLE_DEVICES=0 python -W ignore test_synapse.py

For ACDC testing, run: CUDA_VISIBLE_DEVICES=0 python -W ignore test_ACDC.py
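The training and testing commands above differ only in the script name and the GPU index, so they can be wrapped in a small launcher. This is a sketch under assumptions rather than a tool shipped with the repository: the names SCRIPTS, build_run, and run are hypothetical, and the launcher simply reproduces the documented commands by setting CUDA_VISIBLE_DEVICES and passing -W ignore to the interpreter.

```python
import os
import subprocess
import sys

# Script names as given in the Training and Testing sections.
SCRIPTS = {
    ("synapse", "train"): "train_synapse.py",
    ("acdc", "train"): "train_ACDC.py",
    ("synapse", "test"): "test_synapse.py",
    ("acdc", "test"): "test_ACDC.py",
}


def build_run(dataset, mode, gpu=0):
    """Return (command, environment) for one training or testing run."""
    script = SCRIPTS[(dataset.lower(), mode.lower())]
    command = [sys.executable, "-W", "ignore", script]
    # Copy the current environment and restrict the visible GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return command, env


def run(dataset, mode, gpu=0, repo_dir="."):
    """Launch the script from repo_dir (the MERIT folder)."""
    command, env = build_run(dataset, mode, gpu)
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(command, env=env, cwd=repo_dir, check=True)
```

For example, run("synapse", "train") corresponds to the Synapse training command, and run("acdc", "test", gpu=1) would run the ACDC test script on the second GPU.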

Acknowledgement

We are very grateful for these excellent works, which have provided the basis for our framework: timm, CASCADE, PraNet, Polyp-PVT, and TransUNet.

Citations

@inproceedings{rahman2023multi,
  title={Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation},
  author={Rahman, Md Mostafijur and Marculescu, Radu},
  booktitle={Medical Imaging with Deep Learning (MIDL)},
  month={July},
  year={2023}
}