nnMamba ISBI 2025 Oral

February 18, 2025

nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model

In biomedical image analysis, capturing long-range dependencies is crucial. Traditional Convolutional Neural Networks (CNNs) are limited by their local receptive fields, while Transformers, although proficient at global context integration, are computationally demanding for high-dimensional medical images.

We introduce nnMamba, a novel architecture that leverages the strengths of CNNs along with the long-range modeling capabilities of State Space Models (SSMs). Our method features the Mamba-In-Convolution with Channel-Spatial Siamese Learning (MICCSS) block that effectively models long-range voxel relationships. Additionally, we employ channel scaling and channel-sequential learning techniques to enhance performance across dense prediction and classification tasks.

Extensive experiments on seven datasets demonstrate that nnMamba outperforms current state-of-the-art methods in 3D image segmentation, classification, and landmark detection. By integrating the local representation power of CNNs with the global context processing of SSMs, nnMamba sets a new benchmark for modeling long-range dependencies in medical image analysis.

Overview of the framework

The nnMamba framework is designed for 3D biomedical tasks, focusing on dense prediction and classification. Our approach addresses the challenge of long-range modeling by harnessing the lightweight and robust capabilities of State Space Models.
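
For readers new to State Space Models, the recurrence they build on can be sketched in a few lines. This is a minimal illustration of a discrete, diagonal linear SSM, not the Mamba layer used in this repository: Mamba additionally makes the parameters input-dependent and uses an optimized scan. The function name `ssm_scan` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Run a diagonal linear SSM over a 1D sequence.

    h[t] = a * h[t-1] + b * x[t]
    y[t] = sum(c * h[t])
    x: (L,) input sequence; a, b, c: (N,) per-state parameters (illustrative).
    """
    h = np.zeros_like(a, dtype=float)
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t   # state update: one O(N) step per token
        y[t] = np.dot(c, h)   # readout from the hidden state
    return y

# A single impulse at t = 0, followed by zeros.
x = np.zeros(64)
x[0] = 1.0
a = np.array([0.99, 0.5])  # one slowly decaying state, one fast
b = np.ones(2)
c = np.ones(2)
y = ssm_scan(x, a, b, c)
```

The cost grows linearly with the sequence length, and the slowly decaying state (`a = 0.99`) keeps the impulse visible in `y` dozens of steps later. This is the sense in which an SSM can carry long-range information without the quadratic cost of full attention.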

Deployment

For segmentation or landmark detection tasks, refer to nnMamba.py; for classification tasks, refer to nnMamba4cls.py. The detailed training pipelines are available in the nnunet folder (segmentation) and the classification folder (ADNI classification).

Checkpoints are available at:
https://drive.google.com/drive/folders/1wYHVSvSU-wGJdU62WZ0B5rfv_bZ68bJl?usp=drive_link

Methods

Architecture overview: details of the design

  • Dense Prediction (Segmentation and Landmark Detection): Panels (a) and (b) illustrate the network structure.
  • Classification: Panel (b) shows the network structure.
  • Detailed Blocks: Panels (c), (d), and (e) provide specifics of the blocks used within the networks.

Algorithm: CSS – Channel-Spatial Siamese Learning

  1. SiamSSM ← SSM with shared parameters (the same weights are reused in every call below)
  2. x_flat ← input feature with shape [B, L, C]
  3. x_mamba ← SiamSSM(x_flat)
  4. For each dimension set d in { [1], [2], [1, 2] }:
    • x_flip ← flip(x_flat, dims = d)
    • x_mamba ← x_mamba + flip(SiamSSM(x_flip), dims = d)
  5. x_mamba ← (1/4) × x_mamba
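
The steps above can be sketched in Python. This is a minimal NumPy illustration, not the code from nnMamba.py: `css` is a hypothetical function name, and `toy_causal_ssm` is a stand-in (a running mean along the sequence axis) for the shared-weight Mamba layer, used only so the example runs without extra dependencies.

```python
import numpy as np

def css(x_flat, siam_ssm):
    """Channel-Spatial Siamese learning over x_flat of shape [B, L, C].

    siam_ssm is called four times with the same weights: on the input and
    on its sequence-flipped, channel-flipped, and doubly flipped copies.
    """
    x_mamba = siam_ssm(x_flat)
    for dims in ((1,), (2,), (1, 2)):
        x_flip = np.flip(x_flat, axis=dims)
        # Flip the output back so all four results are aligned before summing.
        x_mamba = x_mamba + np.flip(siam_ssm(x_flip), axis=dims)
    return x_mamba / 4.0

def toy_causal_ssm(x):
    """Stand-in for the shared SSM: running mean along the sequence axis L."""
    steps = np.arange(1, x.shape[1] + 1).reshape(1, -1, 1)
    return np.cumsum(x, axis=1) / steps

x = np.random.default_rng(0).normal(size=(2, 8, 4))
out = css(x, toy_causal_ssm)
```

A causal operator only sees earlier positions, but after the flipped passes the first position of `out` also depends on every later position, which is the point of scanning in both directions. The toy operator treats channels independently, so the channel flip has no effect in this example; it only matters for a layer that mixes channels.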

Visualization results on segmentation

Visualization on the AMOS22 CT validation dataset: By modeling long-range dependencies, nnMamba reduces over-segmentation and under-segmentation, especially over long distances.

Results

BraTS 2023 Glioma Segmentation

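| Methods | Dice WT | Dice TC | Dice ET | Dice Average | HD95 WT | HD95 TC | HD95 ET | HD95 Average |
|---|---|---|---|---|---|---|---|---|
| DIT [Peebles et al., 2023] | 93.49 | 90.22 | 84.38 | 89.36 | 4.21 | 5.27 | 13.64 | 7.71 |
| UNETR [Hatamizadeh et al., 2022] | 93.33 | 89.89 | 85.19 | 89.47 | 4.76 | 7.27 | 12.78 | 8.27 |
| nnUNet [Isensee et al., 2021] | 93.31 | 90.24 | 85.18 | 89.58 | 4.49 | 4.95 | 11.91 | 7.12 |
| nnMamba | 93.46 | 90.74 | 85.72 | 89.97 | 4.18 | 5.12 | 10.31 | 6.53 |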

Note: The first four columns correspond to Dice scores, while the last four columns report HD95 values.
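
For reference, the two metrics can be sketched as follows. This is a simplified illustration, not the evaluation code behind the table. The function names are hypothetical, and `hd95` is a brute-force variant that measures distances between all foreground voxels in voxel units and assumes both masks are non-empty. Evaluation toolkits typically use surface voxels and physical spacing, and differ in how they combine the two directions.

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks, as a percentage."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 100.0  # both masks empty: counted as perfect agreement (a convention)
    return 200.0 * np.logical_and(pred, target).sum() / denom

def hd95(pred, target):
    """95th-percentile Hausdorff distance between foreground voxels (voxel units)."""
    p = np.argwhere(pred)
    t = np.argwhere(target)
    # Pairwise Euclidean distances; fine for small masks, too slow for full volumes.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    p_to_t = d.min(axis=1)  # each predicted voxel to its nearest target voxel
    t_to_p = d.min(axis=0)  # each target voxel to its nearest predicted voxel
    return max(np.percentile(p_to_t, 95), np.percentile(t_to_p, 95))

# Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
target = np.zeros((8, 8, 8), dtype=bool)
target[2:6, 2:6, 2:6] = True
pred = np.zeros((8, 8, 8), dtype=bool)
pred[3:7, 2:6, 2:6] = True

dice = dice_score(pred, target)
dist = hd95(pred, target)
```

For this shifted cube, Dice is 75.0 and the HD95 is 1.0 voxel. Higher Dice and lower HD95 indicate better segmentation.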

AMOS2022 Dataset

| Methods | Parameters (M) | FLOPs (G) | CT-Test mDice | CT-Test mNSD | MRI-Test mDice | MRI-Test mNSD |
|---|---|---|---|---|---|---|
| nnUNet [Isensee et al., 2021] | 31.18 | 680.31 | 89.04 | 78.32 | 67.63 | 59.02 |
| nnFormer [Zhou et al., 2023] | 150.14 | 425.78 | 85.61 | 72.48 | 62.92 | 54.06 |
| UNETR [Hatamizadeh et al., 2022] | 93.02 | 177.51 | 79.43 | 60.84 | 57.91 | 47.25 |
| SwinUNetr [Hatamizadeh et al., 2021] | 62.83 | 668.15 | 86.32 | 73.83 | 57.50 | 47.04 |
| U-mamba [Ma et al., 2024] | 40.00 | 792.87 | 87.53 | 75.83 | 74.21 | 64.79 |
| nnMamba | 15.55 | 141.14 | 89.63 | 79.73 | 73.98 | 65.13 |

ADNI Classification

NC vs. AD Classification

| Methods | ACC | F1 | AUC |
|---|---|---|---|
| ResNet [He et al., 2016] | 88.40 ± 3.41 | 88.00 ± 2.81 | 94.93 ± 0.72 |
| DenseNet [Huang et al., 2017] | 87.95 ± 0.70 | 86.93 ± 0.87 | 94.86 ± 0.40 |
| ViT [Dosovitskiy et al., 2021] | 88.85 ± 1.17 | 87.66 ± 1.72 | 94.12 ± 1.29 |
| CRATE [Yu et al., 2023] | 84.69 ± 2.53 | 82.66 ± 3.47 | 91.42 ± 1.43 |
| nnMamba | 89.53 ± 0.68 | 88.16 ± 1.16 | 95.76 ± 0.18 |

sMCI vs. pMCI Classification

| Methods | ACC | F1 | AUC |
|---|---|---|---|
| ResNet [He et al., 2016] | 67.96 ± 1.50 | 52.14 ± 1.51 | 74.94 ± 2.18 |
| DenseNet [Huang et al., 2017] | 73.12 ± 3.10 | 53.30 ± 2.99 | 76.31 ± 3.09 |
| ViT [Dosovitskiy et al., 2021] | 67.16 ± 3.16 | 51.68 ± 5.72 | 75.08 ± 6.88 |
| CRATE [Yu et al., 2023] | 70.63 ± 2.60 | 53.41 ± 2.53 | 76.06 ± 2.98 |
| nnMamba | 68.06 ± 4.65 | 53.43 ± 1.64 | 77.55 ± 1.29 |
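
The three columns in the classification tables can be computed from binary labels and predicted scores as sketched below. This is an illustrative helper, not the repository's evaluation script. The name `binary_metrics`, the 0.5 threshold, and the choice of positive class are assumptions, and the sketch expects both classes to be present.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.5):
    """Return (ACC, F1, AUC) as percentages for a binary classifier."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    y_pred = scores >= threshold

    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)

    acc = np.mean(y_pred == y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0

    # AUC as the probability that a positive outranks a negative (ties count 0.5).
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

    return 100 * acc, 100 * f1, 100 * auc

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
acc, f1, auc = binary_metrics(y_true, scores)
```

Unlike ACC and F1, AUC does not depend on the threshold, which is one reason the three columns can rank methods differently, as in the sMCI vs. pMCI table.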

Landmark Detection

LFC Test Set

| Methods | TCD1 | TCD2 | HDV1 | HDV2 | ADV1 | ADV2 | Average |
|---|---|---|---|---|---|---|---|
| ResUNet [Xu et al., 2019] | 1.38 ± 0.07 | 1.42 ± 0.10 | 1.46 ± 0.09 | 1.41 ± 0.04 | 1.52 ± 0.00 | 1.18 ± 0.05 | 1.39 ± 0.01 |
| Hourglass [Newell et al., 2016] | 1.40 ± 0.02 | 1.39 ± 0.03 | 1.52 ± 0.03 | 1.45 ± 0.04 | 1.47 ± 0.05 | 1.24 ± 0.02 | 1.41 ± 0.02 |
| VitPose [Xu et al., 2022] | 1.65 ± 0.01 | 1.73 ± 0.05 | 1.69 ± 0.03 | 1.71 ± 0.04 | 1.74 ± 0.04 | 1.38 ± 0.03 | 1.65 ± 0.02 |
| SwinUnetr [Hatamizadeh et al., 2021] | 1.81 ± 0.01 | 1.87 ± 0.03 | 1.82 ± 0.03 | 1.87 ± 0.02 | 1.94 ± 0.02 | 1.42 ± 0.03 | 1.79 ± 0.02 |
| nnMamba | 1.27 ± 0.01 | 1.40 ± 0.02 | 1.48 ± 0.00 | 1.35 ± 0.02 | 1.43 ± 0.01 | 1.14 ± 0.04 | 1.34 ± 0.01 |

FeTA Test Set

| Methods | TCD1 | TCD2 | HDV1 | HDV2 | ADV1 | ADV2 | Average |
|---|---|---|---|---|---|---|---|
| ResUNet [Xu et al., 2019] | 1.90 ± 0.30 | 1.46 ± 0.23 | 2.19 ± 0.23 | 1.96 ± 0.36 | 2.55 ± 0.48 | 1.73 ± 0.02 | 1.97 ± 0.27 |
| Hourglass [Newell et al., 2016] | 2.43 ± 0.48 | 1.47 ± 0.01 | 2.27 ± 0.04 | 2.10 ± 0.33 | 2.85 ± 0.28 | 1.75 ± 0.05 | 2.15 ± 0.12 |
| VitPose [Xu et al., 2022] | 8.46 ± 3.36 | 9.88 ± 0.97 | 16.47 ± 4.11 | 5.62 ± 0.98 | 14.46 ± 3.86 | 7.07 ± 3.24 | 10.32 ± 1.64 |
| SwinUnetr [Hatamizadeh et al., 2021] | 8.41 ± 0.87 | 6.50 ± 1.29 | 3.83 ± 0.45 | 4.16 ± 0.49 | 4.62 ± 0.39 | 2.44 ± 0.17 | 4.99 ± 0.25 |
| nnMamba | 1.70 ± 0.10 | 1.41 ± 0.02 | 1.96 ± 0.02 | 1.65 ± 0.03 | 2.20 ± 0.04 | 1.61 ± 0.03 | 1.76 ± 0.01 |

Citation

If you find this project useful, please consider citing us:

@inproceedings{gong2025nnmamba,  
  title={nnmamba: 3D biomedical image segmentation, classification and landmark detection with state space model},  
  author={Gong, Haifan and Kang, Luoyao and Wang, Yitao and Wang, Yihan and Wan, Xiang and Wu, Xusheng and Li, Haofeng},  
  booktitle={ISBI},  
  year={2025}  
}