October 29, 2025

VSSD

VSSD: Vision Mamba with Non-Causal State Space Duality

Paper: (arXiv:2407.18559)

Updates

  • Oct. 29th, 2025: We updated the code for the ICCV2025 camera-ready version.
  • June 26th, 2025: This paper was accepted to ICCV2025.
  • August 5th, 2024: We released the log and checkpoint for VSSD with MESA.
  • July 29th, 2024: When MESA is introduced in training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
  • July 25th, 2024: We released the code, logs, and checkpoints for VSSD.

Introduction

Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which uses a non-causal formulation of SSD. This repository contains the code for training and evaluating VSSD variants on ImageNet-1K for image classification, COCO for object detection, and ADE20K for semantic segmentation. For more information, please refer to our paper.
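The difference between causal and non-causal processing can be illustrated with a toy, pure-Python sketch. This is not the repository's NC-SSD implementation: `causal_scan`, `non_causal`, and the per-token weights `w` are hypothetical names chosen for illustration, and real models work on batched tensors with learned, input-dependent parameters. In the causal scan, the output for token t depends only on tokens 0..t; in the non-causal variant, every token writes into one shared state and then reads it back, so each output depends on the whole sequence, which fits images, whose tokens have no natural order.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(b: Vector, x: Vector) -> Matrix:
    """Outer product b x^T with shape (N, D)."""
    return [[bi * xj for xj in x] for bi in b]


def read_out(c: Vector, h: Matrix) -> Vector:
    """Compute c^T h, a vector of length D."""
    return [sum(c[n] * h[n][d] for n in range(len(c))) for d in range(len(h[0]))]


def causal_scan(x: Matrix, a: Vector, B: Matrix, C: Matrix) -> Matrix:
    """Causal recurrence h_t = a_t * h_{t-1} + B_t x_t^T, y_t = C_t^T h_t.
    The output at step t only sees tokens 0..t."""
    N, D = len(B[0]), len(x[0])
    h = [[0.0] * D for _ in range(N)]
    ys = []
    for t in range(len(x)):
        inc = outer(B[t], x[t])
        h = [[a[t] * h[n][d] + inc[n][d] for d in range(D)] for n in range(N)]
        ys.append(read_out(C[t], h))
    return ys


def non_causal(x: Matrix, w: Vector, B: Matrix, C: Matrix) -> Matrix:
    """Toy non-causal variant: every token adds w_t * B_t x_t^T to ONE shared
    state, and every token reads that same state, so each output sees all tokens."""
    N, D = len(B[0]), len(x[0])
    H = [[0.0] * D for _ in range(N)]
    for t in range(len(x)):
        inc = outer(B[t], x[t])
        H = [[H[n][d] + w[t] * inc[n][d] for d in range(D)] for n in range(N)]
    return [read_out(C[t], H) for t in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]  # 3 tokens, D = 2
    B = [[1.0], [1.0], [1.0]]                 # N = 1 state dimension
    C = [[1.0], [2.0], [0.5]]
    ones = [1.0, 1.0, 1.0]
    print(causal_scan(x, ones, B, C))  # [[1.0, 0.0], [2.0, 4.0], [2.0, 1.5]]
    print(non_causal(x, ones, B, C))   # [[4.0, 3.0], [8.0, 6.0], [2.0, 1.5]]
```

With all decays and weights set to 1, the last causal output matches the last non-causal output, while earlier causal outputs only reflect a prefix of the sequence. The non-causal outputs are also unaffected by token order apart from being permuted along with the inputs.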

Main Results

Classification on ImageNet-1K (ICCV2025 Version)

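| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.8 | 28M | 5.0G | - | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.6 | 50M | 8.1G | - | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | - | ckpt |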

In this version, we add several tricks to the non-causal SSD (NC-SSD) block, including ASYNC_STATE, 2D RoPE embedding, and normalization, to further improve performance. See the configs with the suffix _iccv2025 and the source code for details.
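The exact 2D RoPE used in the NC-SSD block is defined in the source code. As a generic sketch of one common formulation (axial 2D RoPE, assumed here rather than taken from the repository), half of the channels are rotated by the token's row index and the other half by its column index, so the dot product between two rotated vectors depends only on their relative 2D offset. `rope_1d`, `rope_2d`, and the `base=10000` frequency schedule are illustrative assumptions.

```python
import math
from typing import List


def rope_1d(x: List[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate each channel pair (x[2i], x[2i+1]) by the angle pos * base^(-2i/d)."""
    d = len(x)
    assert d % 2 == 0, "channel count must be even"
    out: List[float] = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


def rope_2d(x: List[float], row: int, col: int) -> List[float]:
    """Axial 2D RoPE: the first half of the channels encodes the row index,
    the second half encodes the column index."""
    half = len(x) // 2
    assert len(x) % 4 == 0, "channel count must be divisible by 4"
    return rope_1d(x[:half], row) + rope_1d(x[half:], col)


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))
```

For example, `dot(rope_2d(q, 1, 2), rope_2d(k, 3, 5))` equals `dot(rope_2d(q, 4, 0), rope_2d(k, 6, 3))` up to floating-point error, since both pairs of positions differ by the same (row, column) offset of (2, 3). The rotation also leaves vector norms unchanged.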

For downstream-task weights, please contact me if needed.

Classification on ImageNet-1K

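| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |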

Enhanced models trained with MESA:

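| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |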

Object Detection on COCO

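| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |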

Semantic Segmentation on ADE20K

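| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs | ckpts |
|---|---|---|---|---|---|---|---|---|
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |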

Getting Started

Installation

Step 1: Clone the VSSD repository:

git clone https://github.com/YuHengsss/VSSD.git
cd VSSD

Step 2: Environment Setup:

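Create and activate a new conda environment. Include Python in the create command; otherwise conda creates an empty environment and the later pip install may run outside it. Python 3.10 below is an assumption; use a version supported by requirements.txt.

conda create -n VSSD python=3.10
conda activate VSSD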

Install Dependencies

pip install -r requirements.txt

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
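OpenMMLab packages check each other's versions at import time, so it can help to confirm that the pins above actually landed in the active environment. A minimal stdlib-only sketch that reads installed package metadata; the `PINS` table and `check_pins` function are hypothetical helpers, not part of the repository.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins copied from the optional detection/segmentation install commands.
PINS: Dict[str, str] = {
    "mmengine": "0.10.1",
    "mmcv": "2.1.0",
    "mmdet": "3.3.0",
    "mmsegmentation": "1.2.2",
    "mmpretrain": "1.2.0",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each distribution name to (installed version or None, matches pin)."""
    report: Dict[str, Tuple[Optional[str], bool]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in the current interpreter
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:16s} pinned={PINS[name]:8s} installed={installed} {status}")
```

Run it with the same interpreter that pip installed into (for example, inside the activated VSSD environment); a `None` version means the package is missing from that interpreter.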

Quick Start

Classification

To train VSSD models for image classification on ImageNet-1K, run the following command, replacing </path/to/config> with the config of the desired variant:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp

To only evaluate a checkpoint (this also reports the parameter count and FLOPs), run:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
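When sweeping several configurations, it can be convenient to generate the training and evaluation commands from a script. The sketch below rebuilds them as argument lists; `build_cls_cmd` is a hypothetical helper and the paths in the example are placeholders. If, as in the Swin-Transformer codebase this project builds on, `--batch-size` is the per-GPU batch size, the 8-GPU training command uses a global batch of 8 x 128 = 1024 images.

```python
import shlex
from typing import List, Optional


def build_cls_cmd(cfg: str, data_path: str, nproc: int = 8, batch_size: int = 128,
                  output: str = "/tmp", resume: Optional[str] = None,
                  master_port: int = 29501) -> List[str]:
    """Rebuild the single-node torch.distributed.launch command as an argv list.

    Passing `resume` produces the evaluation-only form (--resume <ckpt> --eval).
    """
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nnodes=1", "--node_rank=0", f"--nproc_per_node={nproc}",
        "--master_addr=127.0.0.1", f"--master_port={master_port}",
        "main.py", "--cfg", cfg, "--batch-size", str(batch_size),
        "--data-path", data_path, "--output", output,
    ]
    if resume is not None:
        cmd += ["--resume", resume, "--eval"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths: substitute real config, dataset, and checkpoint locations.
    print(shlex.join(build_cls_cmd("path/to/config.yaml", "/path/of/dataset")))
    print(shlex.join(build_cls_cmd("path/to/config.yaml", "/path/of/dataset",
                                   nproc=1, resume="path/of/checkpoint.pth")))
```

Passing the list directly to `subprocess.run(cmd, check=True)` instead of printing it avoids shell-quoting problems with paths that contain spaces.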

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation (the last argument is the number of GPUs):

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

For segmentation, append --tta to the test command to obtain the multi-scale mIoU (mIoU(MS)).

To train with mmdetection or mmsegmentation (here on 8 GPUs):

bash ./tools/dist_train.sh </path/to/config> 8
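The detection and segmentation launchers can also be generated programmatically. In the stock OpenMMLab scripts, the last positional argument is the number of GPUs and anything after it is forwarded to `tools/test.py`, which is how `--tta` reaches the test script; the sketch assumes this repository's `tools/*.sh` follow that convention. `build_mm_cmd` is a hypothetical helper.

```python
from typing import List, Optional


def build_mm_cmd(cfg: str, gpus: int, checkpoint: Optional[str] = None,
                 tta: bool = False) -> List[str]:
    """Build the OpenMMLab launcher call.

    With a checkpoint: dist_test.sh CONFIG CHECKPOINT GPUS [--tta]
    Without one:       dist_train.sh CONFIG GPUS
    """
    if checkpoint is None:
        if tta:
            raise ValueError("--tta only applies to testing")
        return ["bash", "./tools/dist_train.sh", cfg, str(gpus)]
    cmd = ["bash", "./tools/dist_test.sh", cfg, checkpoint, str(gpus)]
    if tta:
        # Assumed to be forwarded to tools/test.py for multi-scale segmentation testing.
        cmd.append("--tta")
    return cmd


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    print(build_mm_cmd("path/to/seg_config.py", 1, "path/to/checkpoint.pth", tta=True))
    print(build_mm_cmd("path/to/det_config.py", 8))
```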

Citation

If VSSD is helpful for your research, please cite the following paper:

@article{shi2024vssd,
         title={VSSD: Vision Mamba with Non-Causal State Space Duality}, 
         author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
         journal={arXiv preprint arXiv:2407.18559},
         year={2024}
}

Acknowledgment

This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab. Thanks for their excellent work.