3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation

March 17, 2026

Introduction

[Figure: 3D-EffiViTCaps architecture]

The figure above illustrates the 3D-EffiViTCaps architecture, which is described in detail in our paper here. The main implementation of the network can be found here, and the implementations of the 3D Patch Merging block and the 3D EfficientViT block can be found here. A visualization example is shown below.

[Figure: visualization example]

Usage

Installation

  • Clone the repository:
git clone https://github.com/HidNeuron/3D-EffiViTCaps.git
  • Install the dependencies that match your CUDA version (CUDA 10 or CUDA 11):
conda env create -f environment_cuda10.yml

or

conda env create -f environment_cuda11.yml

Data preparation

Our method is evaluated on three datasets: iSeg, Cardiac, and Hippocampus.

The directory structure of the dataset is expected to be the following:

path/to/iseg/
  domainA/
  domainA_val/

path/to/cardiac/
  imagesTr
  labelsTr

path/to/hippocampus/
  imagesTr
  labelsTr
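
Before starting a long training run, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_subdirs` and the dataset keys `iseg`, `cardiac`, and `hippocampus` are assumptions made for illustration, and only the subfolder names come from the listing above.

```python
from pathlib import Path

# Subfolders expected under each dataset root, copied from the layout above.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPECTED_SUBDIRS = {
    "iseg": ("domainA", "domainA_val"),
    "cardiac": ("imagesTr", "labelsTr"),
    "hippocampus": ("imagesTr", "labelsTr"),
}


def missing_subdirs(root_dir, dataset):
    """Return the expected subfolders that are absent under root_dir."""
    root = Path(root_dir)
    return [
        name
        for name in EXPECTED_SUBDIRS[dataset]
        if not (root / name).is_dir()
    ]
```

Calling `missing_subdirs("path/to/cardiac", "cardiac")` returns an empty list when both `imagesTr` and `labelsTr` are present, and otherwise lists the folders that still need to be created.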

Training

The arguments for train.py and effiViTcaps.py are as follows:

  1. train.py:

    • basic arguments: gpus, root_dir, log_dir, dataset, fold, cache_rate, cache_dir, model_name, train_patch_size, num_workers, batch_size, num_samples.
    • arguments for the Trainer class from PyTorch Lightning: benchmark, logger, callbacks, num_sanity_val_steps, accelerator, max_epochs, terminate_on_nan, check_val_every_n_epoch.
  2. effiViTcaps.py:

    • network arguments: in_channels, out_channels, val_frequency, val_patch_size, sw_batch_size, overlap.

An example training script is available here.
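
The network arguments val_patch_size, sw_batch_size, and overlap share their names with the parameters of sliding-window inference as found in MONAI. Assuming they play that role here (this README does not confirm it), the sketch below estimates how many patches and forward passes one validation volume would need. The function names and the stride rule are illustrative assumptions, not the repository's code.

```python
import math


def windows_per_axis(image_size, patch_size, overlap):
    """Number of window positions along one axis for a given overlap fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if image_size <= patch_size:
        # A single (padded) window covers the whole axis.
        return 1
    # Assumed stride rule: advance by the non-overlapping part of the patch.
    stride = max(1, int(patch_size * (1.0 - overlap)))
    return math.ceil((image_size - patch_size) / stride) + 1


def count_windows(image_shape, patch_shape, overlap):
    """Total number of patches needed to cover a volume."""
    total = 1
    for image_size, patch_size in zip(image_shape, patch_shape):
        total *= windows_per_axis(image_size, patch_size, overlap)
    return total


def count_forward_passes(n_windows, sw_batch_size):
    """Forward passes needed when sw_batch_size patches are batched together."""
    return math.ceil(n_windows / sw_batch_size)


n = count_windows((64, 64, 64), (32, 32, 32), overlap=0.5)
print(n, count_forward_passes(n, sw_batch_size=4))
```

Under these assumptions, a larger overlap increases the number of patches per volume, while a larger sw_batch_size reduces the number of forward passes at the cost of more GPU memory.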

Evaluation

The arguments for effiViTcaps.py are the same as in Training. The arguments for evaluate.py are as follows:

  • basic arguments: root_dir, save_image, output_dir, model_name, dataset, fold, checkpoint_path.

An example evaluation script is available here.
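
As a rough illustration of how the evaluate.py arguments listed above fit together, the sketch below declares them with Python's standard argparse module. Only the argument names come from this README. The `--name` flag form, the types, the defaults, and which arguments are required are all assumptions, so the actual script may differ.

```python
import argparse


def build_eval_parser():
    """Hypothetical parser mirroring the evaluate.py argument names."""
    parser = argparse.ArgumentParser(
        description="Sketch of the evaluate.py arguments (types are assumed)"
    )
    parser.add_argument("--root_dir", type=str, required=True,
                        help="dataset root, e.g. path/to/cardiac")
    parser.add_argument("--dataset", type=str, required=True,
                        help="dataset name (accepted values are assumed)")
    parser.add_argument("--checkpoint_path", type=str, required=True,
                        help="checkpoint file, e.g. path/to/weights/model.ckpt")
    parser.add_argument("--model_name", type=str, default=None,
                        help="model identifier (no default is known)")
    parser.add_argument("--fold", type=int, default=0,
                        help="cross-validation fold (integer type is assumed)")
    parser.add_argument("--save_image", action="store_true",
                        help="write predictions to disk (flag form is assumed)")
    parser.add_argument("--output_dir", type=str, default="output",
                        help="where saved images go (placeholder default)")
    return parser


# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = build_eval_parser().parse_args([
    "--root_dir", "path/to/cardiac",
    "--dataset", "cardiac",
    "--checkpoint_path", "path/to/weights/model.ckpt",
])
print(args.dataset, args.fold, args.save_image)
```

Consult the example evaluation script for the authoritative flag names and values.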

GUI

Run the GUI:

python main.py

The directory structure of the weights is expected to be the following:

path/to/weights/
  model.ckpt

Trained models

Our trained 3D-EffiViTCaps models on three datasets can be downloaded as follows:

Acknowledgement

The implementation makes liberal use of code from 3DConvCaps and EfficientViT.

Citation

@inproceedings{gan20243d,
  title={{3D-EffiViTCaps}: {3D} efficient vision {Transformer} with capsule for medical image segmentation},
  author={Gan, Dongwei and Chang, Ming and Chen, Juan},
  booktitle={International Conference on Pattern Recognition},
  pages={141--156},
  year={2024},
  organization={Springer}
}

Contacts

We are happy to help if you have any questions. Please feel free to open an issue or contact us directly. We hope our code is useful and look forward to your citations.