May 12, 2025
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei¹, Chenyu Lin², Yu Qiu³, Yaonan Wang¹, Hui Zhang¹, Ziyang Wang⁴, Dong Dai⁴
¹ Hunan University, ² Nankai University, ³ Hunan Normal University, ⁴ Tianjin Medical University Cancer Institute and Hospital
- This repository contains the official code for the paper Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images.
- This paper has been accepted to CVPR 2025.
- This code and the PCLT20K dataset are licensed for non-commercial research purposes only.
Introduction
Lung cancer is a leading cause of cancer-related deaths globally. PET-CT is crucial for imaging lung tumors because it provides essential metabolic and anatomical information, but it faces challenges such as poor image quality, motion artifacts, and complex tumor morphology. Deep learning-based models are expected to address these problems; however, the small-scale and private datasets available so far limit the performance gains these methods can achieve. Hence, we introduce a large-scale PET-CT lung tumor segmentation dataset, termed PCLT20K, which comprises 21,930 pairs of PET-CT images from 605 patients. Furthermore, we propose a cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images. Specifically, we design a channel-wise rectification module (CRM) that applies a channel state space block across multi-modal features to learn correlated representations and help filter out modality-specific noise. We also design a dynamic cross-modality interaction module (DCIM) to effectively integrate position and context information: it uses PET images to learn regional position information and serves as a bridge to assist in modeling the relationships between local features of CT images. Extensive experiments on a comprehensive benchmark demonstrate the effectiveness of CIPA compared with current state-of-the-art segmentation methods. We hope our research opens up further avenues of exploration in medical image segmentation.
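The paragraph above does not give the CRM's equations, so the following is only an intuition aid and not the paper's module. It shows what "a state space block across channels" can mean in the simplest case: a toy scalar linear state-space recurrence, h_k = a·h_{k-1} + b·x_k and y_k = c·h_k, scanned along the channel axis after interleaving PET and CT channel descriptors. The interleaving order, the scalar parameters a, b, c, and the function names are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def interleave_channels(pet: Sequence[float], ct: Sequence[float]) -> List[float]:
    """Hypothetical ordering: alternate PET and CT channel descriptors (p0, c0, p1, c1, ...)."""
    if len(pet) != len(ct):
        raise ValueError("PET and CT must have the same number of channels")
    seq: List[float] = []
    for pet_val, ct_val in zip(pet, ct):
        seq.extend((pet_val, ct_val))
    return seq


def channel_ssm_scan(seq: Sequence[float], a: float = 0.9, b: float = 0.1, c: float = 1.0) -> List[float]:
    """Toy linear state-space scan over the channel axis.

    h_k = a * h_{k-1} + b * x_k,  y_k = c * h_k,  with h_{-1} = 0.
    Each output depends on every earlier channel of both modalities.
    """
    h = 0.0
    out: List[float] = []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out


# Toy per-channel descriptors (e.g. globally pooled feature values); the numbers are made up.
pet_desc = [0.8, 0.1, 0.3]
ct_desc = [0.2, 0.5, 0.4]
mixed = channel_ssm_scan(interleave_channels(pet_desc, ct_desc))
print([round(v, 4) for v in mixed])
```

A real state space block learns its parameters, works on full feature maps, and usually makes them input-dependent (as Mamba does); the fixed scalars here only illustrate how a sequential scan lets each channel's output draw on earlier channels from both modalities.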

Environment
- Create the environment:
  conda create -n MIPA python=3.10
  conda activate MIPA
- Install all dependencies. Install PyTorch, CUDA, and cuDNN, then install the other dependencies via:
  pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
  pip install -r requirements.txt
- Install selective_scan_cuda_core:
  cd models/encoders/selective_scan
  pip install .
  cd ../../..
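After these steps, a quick sanity check can catch a mismatched install before training. The sketch below uses only the Python standard library. It compares the interpreter against Python 3.10, compares the installed torch, torchvision, and torchaudio versions with the pins above (ignoring local suffixes such as +cu118), and checks that the compiled extension can be found. It assumes the extension is importable as selective_scan_cuda_core, matching the step name; check_environment is a hypothetical helper, not part of the repository.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the pip command above.
PINNED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def installed_version(pkg: str):
    """Return the installed distribution version, or None if it is missing."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


def check_environment(pins=PINNED, extension="selective_scan_cuda_core"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        problems.append(f"expected Python 3.10, found {sys.version_info.major}.{sys.version_info.minor}")
    for pkg, wanted in pins.items():
        found = installed_version(pkg)
        if found is None:
            problems.append(f"{pkg} is not installed")
        # Wheels from the cu118 index report versions like "2.1.2+cu118"; compare the part before "+".
        elif found.split("+")[0] != wanted:
            problems.append(f"{pkg}=={found}, expected {wanted}")
    # Assumed module name of the compiled selective-scan extension.
    if importlib.util.find_spec(extension) is None:
        problems.append(f"extension module {extension!r} is not importable")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```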
PCLT20K
To request the dataset, please email Jie Mei (jiemei AT hnu.edu.cn); we will get back to you shortly. Your email should include the following information. Note: to facilitate academic communication, please use your real name, and your email domain must match your affiliation (e.g., hello@hnu.edu.cn). If it does not, please explain why.
Name: (Tell us who you are.)
Affiliation: (The name/url of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, PhD, etc.)
Email: (Dataset will be sent to this email.)
How to use: (Only for non-commercial use.)
Data Preparation
- For our PCLT20K dataset, we organize the dataset folder in the following structure:
  <PCLT20K>
  |-- <0001>
      |-- <name1_CT.png>
      |-- <name1_PET.png>
      |-- <name1_mask.png>
      ...
  |-- <0002>
      |-- <name2_CT.png>
      |-- <name2_PET.png>
      |-- <name2_mask.png>
      ...
  ...
  |-- train.txt
  |-- test.txt
  train.txt and test.txt contain the names of the items in the training and testing sets, respectively, e.g.:
  <name1>
  <name2>
  ...
- Please put our dataset in the data directory.
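A minimal loading sketch for this layout follows. It assumes that each line of train.txt or test.txt is a bare item name (no patient-folder prefix, no file extension) and that item names are unique across patient folders. Because the split files list names but not folders, the sketch first scans the patient subfolders for *_CT.png files and maps each name to its folder. The functions index_items and load_split are hypothetical and not part of the repository. If your split files include folder prefixes, the lookup would need adjusting.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def index_items(root: Path) -> Dict[str, Path]:
    """Map each item name to the patient folder that contains <name>_CT.png."""
    index: Dict[str, Path] = {}
    for ct_path in root.glob("*/*_CT.png"):
        name = ct_path.name[: -len("_CT.png")]
        if name in index:
            raise ValueError(f"duplicate item name {name!r} in {ct_path.parent} and {index[name]}")
        index[name] = ct_path.parent
    return index


def load_split(root: Path, split_file: str) -> List[Tuple[Path, Path, Path]]:
    """Return (CT, PET, mask) path triples for every name listed in the split file."""
    index = index_items(root)
    triples: List[Tuple[Path, Path, Path]] = []
    for line in (root / split_file).read_text().splitlines():
        name = line.strip()
        if not name:  # skip blank lines
            continue
        folder = index[name]  # raises KeyError if no matching CT image exists
        triples.append((
            folder / f"{name}_CT.png",
            folder / f"{name}_PET.png",
            folder / f"{name}_mask.png",
        ))
    return triples
```

For example, load_split(Path("data/PCLT20K"), "train.txt") would return the training triples, assuming the dataset sits under the data directory as described.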
Usage
Training
- Please download the pretrained VMamba weights and put them under pretrained/vmamba/. We use VMamba_Tiny by default.
- Config setting. Edit the config in train.py: set C.backbone to sigma_tiny, sigma_small, or sigma_base to use the corresponding version of VMamba.
- Run multi-GPU distributed training (replace 'GPU_Numbers' with the number of GPUs):
  torchrun --nproc_per_node 'GPU_Numbers' train.py
- Alternatively, use single-GPU training:
  python train.py
- Results will be saved in the save_model folder.
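The sketch below mirrors the backbone switch described above, using types.SimpleNamespace as a stand-in for the C config object in train.py. The mapping from each sigma_* name to a VMamba size is inferred from the naming and from the default (sigma_tiny with VMamba_Tiny), so treat it as an assumption; the labels are descriptive, not checkpoint file names. The point it illustrates is that the weights under pretrained/vmamba/ should presumably match the chosen backbone. configure_backbone is a hypothetical helper.

```python
from types import SimpleNamespace

# Backbone names documented in this README. The VMamba size each corresponds to is
# inferred from the naming (an assumption); these are labels, not checkpoint file names.
BACKBONE_TO_VMAMBA = {
    "sigma_tiny": "VMamba_Tiny",
    "sigma_small": "VMamba_Small",
    "sigma_base": "VMamba_Base",
}


def configure_backbone(C: SimpleNamespace, backbone: str) -> str:
    """Hypothetical helper: validate the choice, set C.backbone, and return the
    VMamba variant whose pretrained weights should sit under pretrained/vmamba/."""
    if backbone not in BACKBONE_TO_VMAMBA:
        choices = ", ".join(BACKBONE_TO_VMAMBA)
        raise ValueError(f"unknown backbone {backbone!r}; expected one of: {choices}")
    C.backbone = backbone
    return BACKBONE_TO_VMAMBA[backbone]


# Stand-in for the config object edited in train.py (sigma_tiny is the default).
C = SimpleNamespace(backbone="sigma_tiny")
needed = configure_backbone(C, "sigma_small")
print(C.backbone, needed)  # sigma_small VMamba_Small
```

Validating the name up front fails fast on a typo instead of partway through model construction.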
Testing
The pretrained CIPA model (CIPA.pth) can be downloaded from:
- Baidu Yunpan, Password: CIPA
- Google Drive
Then run testing with:
python pred.py
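The README does not say which metrics pred.py reports. As a generic example only, the Dice coefficient, a common overlap measure for segmentation, can be computed between a predicted mask and the corresponding *_mask.png ground truth once both are flattened to pixel lists. Any nonzero pixel is treated as foreground, so 0/255 masks work as-is. dice_score is a hypothetical helper, and the repository's own evaluation may differ.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice = (2 * |P ∩ G| + eps) / (|P| + |G| + eps) for flattened masks.

    Any nonzero value counts as foreground, so 0/1 and 0/255 masks both work.
    eps keeps the score defined (and equal to 1.0) when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")
    intersection = sum(1 for p, g in zip(pred, target) if p and g)
    size_sum = sum(1 for p in pred if p) + sum(1 for g in target if g)
    return (2.0 * intersection + eps) / (size_sum + eps)


# Toy 2x3 masks flattened row by row (values made up for illustration).
pred = [0, 255, 255, 0, 255, 0]
gt = [0, 255, 0, 0, 255, 255]
print(round(dice_score(pred, gt), 4))  # 0.6667
```

Here two foreground pixels overlap and each mask has three, so Dice = 2·2 / (3 + 3) ≈ 0.6667.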
Citation
If you are using the code/model provided here in a publication, please consider citing:
@inproceedings{mei2025cross,
title={Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images},
author={Mei, Jie and Lin, Chenyu and Qiu, Yu and Wang, Yaonan and Zhang, Hui and Wang, Ziyang and Dai, Dong},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
Contact
For any questions, please contact me via e-mail: jiemei AT hnu.edu.cn.
Acknowledgment
This project is based on VMamba and Sigma; thanks to their authors for their excellent work.