MERIT

November 26, 2025

This is the implementation of "Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation" (MIDL 2023).
Md Mostafijur Rahman, Radu Marculescu

The University of Texas at Austin

Check out our CVPR 2024 paper: EMCAD

Check out our CVPRW 2024 paper: PP-SAM

Check out our WACV 2024 paper: G-CASCADE

Check out our WACV 2023 paper: CASCADE

Architectures

Qualitative Results on Synapse Multi-organ dataset

Usage:

Python 3.8
PyTorch 1.11.0
torchvision 0.12.0

Please use pip install -r requirements.txt to install the dependencies.
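After installation, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch and not part of the repository: the helper name `check_versions` and the `EXPECTED` table are illustrative, and it assumes the packages are installed under the distribution names `torch` and `torchvision`.

```python
from importlib import metadata

# Versions listed in this README; the dict itself is an illustrative helper.
EXPECTED = {"torch": "1.11.0", "torchvision": "0.12.0"}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for every mismatch or missing package."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "1.11.0+cu113" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script prints nothing when both packages match; otherwise it prints one line per mismatched or missing package.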

Data preparation:

  • Synapse Multi-organ dataset: Sign up on the official Synapse website and download the dataset. Then split the 'RawData' folder into 'TrainSet' (18 scans) and 'TestSet' (12 scans) following TransUNet's lists, and put them in the './data/synapse/Abdomen/RawData/' folder. Finally, preprocess the data using python ./utils/preprocess_synapse_data.py, or download the preprocessed data and save it in the './data/synapse/' folder. Note: If you use the preprocessed data from TransUNet, make the necessary changes in utils/dataset_synapse.py (i.e., remove the code segment at lines 88-94 that converts ground-truth labels from 14 to 9 classes).

  • ACDC dataset: Download the preprocessed ACDC dataset from Google Drive and move it into the './data/ACDC/' folder.
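The Synapse split into 'TrainSet' and 'TestSet' described above could be scripted roughly as follows. This is a hedged sketch and not a script shipped with the repository: the function `split_scans` is hypothetical, it assumes the scan files sit directly inside the 'RawData' folder, and it assumes each file name contains its case ID. The case IDs themselves must be taken from TransUNet's lists; none are hard-coded here.

```python
import shutil
from pathlib import Path


def split_scans(raw_dir, train_ids, test_ids):
    """Move scan files from raw_dir into TrainSet/ and TestSet/ subfolders.

    A file is assigned to a split when its name contains one of the case IDs.
    How the ID lists are stored and how scan files are named are assumptions.
    Returns the number of files moved into each split.
    """
    raw_dir = Path(raw_dir)
    moved = {"TrainSet": 0, "TestSet": 0}
    for split, ids in (("TrainSet", train_ids), ("TestSet", test_ids)):
        target = raw_dir / split
        target.mkdir(parents=True, exist_ok=True)
        # Snapshot the directory listing before moving anything.
        for path in sorted(raw_dir.iterdir()):
            if path.is_file() and any(case_id in path.name for case_id in ids):
                shutil.move(str(path), str(target / path.name))
                moved[split] += 1
    return moved
```

A call would look like `split_scans('./data/synapse/Abdomen/RawData/', train_ids, test_ids)`, where `train_ids` and `test_ids` are the lists of case IDs read from TransUNet's split files. Substring matching is a simplification, so the returned counts are worth checking against the expected 18 and 12 scans.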

Pretrained model:

Download the pretrained MaxViT models from Google Drive and put them in the './pretrained_pth/maxvit/' folder for initialization.

Training:

cd into MERIT

For Synapse Multi-organ training, run: CUDA_VISIBLE_DEVICES=0 python -W ignore train_synapse.py

For ACDC training, run: CUDA_VISIBLE_DEVICES=0 python -W ignore train_ACDC.py

Testing:

cd into MERIT 

For Synapse Multi-organ testing, run: CUDA_VISIBLE_DEVICES=0 python -W ignore test_synapse.py

For ACDC testing, run: CUDA_VISIBLE_DEVICES=0 python -W ignore test_ACDC.py
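The training and testing commands above all follow the same pattern: select a GPU through CUDA_VISIBLE_DEVICES, suppress Python warnings with -W ignore, and run one of four scripts. For scripted runs, that pattern could be wrapped as in the sketch below. The helper names `build_run` and `run` are hypothetical and not part of the repository; only the script names and flags come from the commands above.

```python
import os
import subprocess
import sys

# Script names taken from the commands in this README.
SCRIPTS = {
    ("synapse", "train"): "train_synapse.py",
    ("synapse", "test"): "test_synapse.py",
    ("acdc", "train"): "train_ACDC.py",
    ("acdc", "test"): "test_ACDC.py",
}


def build_run(dataset, mode, gpu="0"):
    """Return (command, environment) equivalent to the shell commands above."""
    script = SCRIPTS[(dataset.lower(), mode)]
    # Copy the current environment and restrict the visible GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    # "-W ignore" suppresses Python warnings, as in the README commands.
    return [sys.executable, "-W", "ignore", script], env


def run(dataset, mode, gpu="0", cwd="."):
    """Launch the script from the MERIT folder and raise if it fails."""
    command, env = build_run(dataset, mode, gpu)
    subprocess.run(command, env=env, cwd=cwd, check=True)
```

For example, `run("synapse", "train", gpu=0, cwd="MERIT")` would correspond to the Synapse training command, assuming the repository was cloned into a folder named MERIT.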

Acknowledgement

We are very grateful for these excellent works: timm, CASCADE, PraNet, Polyp-PVT, and TransUNet, which provided the basis for our framework.

Citations

@inproceedings{rahman2023multi,
  title={Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation},
  author={Rahman, Md Mostafijur and Marculescu, Radu},
  booktitle={Medical Imaging with Deep Learning (MIDL)},
  month={July},
  year={2023}
}