install cuda121 for windows
October 29, 2025
Updates
Oct. 29th, 2025: We updated the code for the ICCV2025 camera-ready version.
June 26th, 2025: This paper was accepted to ICCV2025.
August 5th, 2024: We released the log and ckpt for VSSD with MESA.
July 29th, 2024: When MESA is introduced in training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
July 25th, 2024: We released the code, log and ckpt for VSSD.
Introduction
Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherently causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which uses a non-causal form of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.
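To make the causal restriction concrete, the sketch below contrasts a causal scan with one possible non-causal form in which every token reads the same global state. It is a minimal illustration in plain Python and not the repository's NC-SSD implementation: the scalar-input setup, the function names `causal_scan` and `noncausal_global`, and the per-token weights `w` are all assumptions made for this example.

```python
def causal_scan(x, a, B, C):
    """Causal recurrence: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>.

    x: scalar token inputs; a: scalar decays;
    B, C: one length-N vector per token.
    """
    h = [0.0] * len(B[0])
    ys = []
    for x_t, a_t, B_t, C_t in zip(x, a, B, C):
        # The state only ever accumulates tokens seen so far.
        h = [a_t * h_k + b_k * x_t for h_k, b_k in zip(h, B_t)]
        ys.append(sum(c_k * h_k for c_k, h_k in zip(C_t, h)))
    return ys


def noncausal_global(x, w, B, C):
    """Assumed non-causal form: one shared state H = sum_j w_j * B_j * x_j,
    read by every token as y_i = <C_i, H>."""
    H = [0.0] * len(B[0])
    for x_j, w_j, B_j in zip(x, w, B):
        H = [H_k + w_j * b_k * x_j for H_k, b_k in zip(H, B_j)]
    return [sum(c_k * H_k for c_k, H_k in zip(C_i, H)) for C_i in C]


# Toy sequence: 3 tokens, 2-dimensional state (hypothetical values).
x = [1.0, 2.0, 3.0]
a = [0.5, 0.5, 0.5]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Change only the last token, then compare the first output of each variant.
x_changed = [1.0, 2.0, 30.0]
causal_first_same = causal_scan(x, a, B, C)[0] == causal_scan(x_changed, a, B, C)[0]
noncausal_first_same = (
    noncausal_global(x, a, B, C)[0] == noncausal_global(x_changed, a, B, C)[0]
)
```

In the causal scan the first output ignores later tokens, so `causal_first_same` is true. In the global form every output depends on the whole sequence, so `noncausal_first_same` is false. That whole-sequence dependence is what image tokens, which have no natural ordering, call for.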
Main Results
Classification on ImageNet-1K (ICCV2025 Version)
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.8 | 28M | 5.0G | - | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.6 | 50M | 8.1G | - | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | - | ckpt |
We add several tricks, including ASYNC_STATE, 2D RoPE embedding, and normalization in the NC-SSD block, to further improve performance. See the configs with the _iccv2025 suffix and the source code for details.
For weights of downstream tasks, please contact me if needed.
Classification on ImageNet-1K
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |
Enhanced model with MESA:
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |
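As a quick way to read the two ImageNet-1K tables against each other, the following sketch computes the top-1 gain from MESA per model. The numbers are copied from the tables above; the variable names are illustrative only.

```python
# Top-1 accuracy (%) copied from the two ImageNet-1K tables above.
baseline = {"VSSD-Tiny": 83.6, "VSSD-Small": 84.1, "VSSD-Base": 84.7}
with_mesa = {"VSSD-Tiny": 84.1, "VSSD-Small": 84.5, "VSSD-Base": 85.4}

# Round to one decimal to match the precision reported in the tables.
gains = {name: round(with_mesa[name] - baseline[name], 1) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f} acc@1 with MESA")
```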
Object Detection on COCO
| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |
Semantic Segmentation on ADE20K
| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | logs | ckpts |
|---|---|---|---|---|---|---|---|---|
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |
Getting Started
Installation
Step 1: Clone the VSSD repository:
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
Step 2: Environment Setup:
Create and activate a new conda environment
conda create -n VSSD
conda activate VSSD
Install Dependencies
pip install -r requirements.txt
Dependencies for Detection and Segmentation (optional)
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
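Because the detection and segmentation packages are pinned to exact versions, it can help to confirm what is actually installed before launching a job. The sketch below uses only the standard library; the helper name `check_pins` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Version pins copied from the pip commands above.
PINS = {
    "mmengine": "0.10.1",
    "mmcv": "2.1.0",
    "mmdet": "3.3.0",
    "mmsegmentation": "1.2.2",
    "mmpretrain": "1.2.0",
}


def check_pins(pins):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        status = "ok" if installed == expected else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Run it inside the activated VSSD environment. A `None` under "found" means the package is not installed there.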
Quick Start
Classification
To train VSSD models for classification on ImageNet, use the following commands for different configurations:
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
If you only want to test the performance (together with params and flops):
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
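The training and evaluation commands differ only in the process count and the `--resume`/`--eval` flags, so they can be assembled from one helper when scripting runs. The sketch below is a convenience wrapper under that assumption; `build_launch_cmd` is a hypothetical name, and only flags that appear in the commands above are used.

```python
import shlex


def build_launch_cmd(cfg, data_path, nproc=8, batch_size=128,
                     output="/tmp", resume=None, evaluate=False,
                     master_port=29501):
    """Assemble the torch.distributed.launch command as an argument list."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nnodes=1", "--node_rank=0",
        f"--nproc_per_node={nproc}",
        "--master_addr=127.0.0.1",
        f"--master_port={master_port}",
        "main.py",
        "--cfg", cfg,
        "--batch-size", str(batch_size),
        "--data-path", data_path,
        "--output", output,
    ]
    if resume is not None:
        cmd += ["--resume", resume]
    if evaluate:
        cmd.append("--eval")
    return cmd


# Placeholder paths, exactly as in the commands above.
train_cmd = build_launch_cmd("</path/to/config>", "</path/of/dataset>")
eval_cmd = build_launch_cmd("</path/to/config>", "</path/of/dataset>", nproc=1,
                            resume="</path/of/checkpoint>", evaluate=True)

print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

To launch a run from Python, pass the list to `subprocess.run(cmd, check=True)` from the repository root. If `--batch-size` is counted per process (an assumption; check `main.py`), the training command above corresponds to 8 × 128 = 1024 images per step.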
Detection and Segmentation
To evaluate with mmdetection or mmsegmentation:
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
Add --tta to report the multi-scale result, mIoU(MS), in segmentation.
To train with mmdetection or mmsegmentation:
bash ./tools/dist_train.sh </path/to/config> 8
Citation
If VSSD is helpful for your research, please cite the following paper:
@article{shi2024vssd,
title={VSSD: Vision Mamba with Non-Causal State Space Duality},
author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
journal={arXiv preprint arXiv:2407.18559},
year={2024}
}
Acknowledgment
This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab. Thanks to the authors for their excellent work.