MSVMamba

Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

Paper: (arXiv:2405.14174)

Updates

  • Dec. 23rd, 2024: We released the code for the updated experiments in our NeurIPS 2024 paper in the v2 branch.
  • Sep. 26th, 2024: Our paper was accepted to NeurIPS 2024 as a poster. Updated experiments will be available soon.
  • May 23rd, 2024: We released the code, logs, and checkpoints for MSVMamba.

Introduction

MSVMamba is a visual state space model that adds a hierarchy-in-hierarchy design to the VMamba model. This repository contains the code for training and evaluating MSVMamba models on ImageNet-1K for image classification, COCO for object detection, and ADE20K for semantic segmentation. For more information, please refer to our paper.

Main Results

Classification on ImageNet-1K

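| name | pretrain | resolution | acc@1 | #params | FLOPs | logs&ckpts |
| --- | --- | --- | --- | --- | --- | --- |
| MSVMamba-Nano | ImageNet-1K | 224x224 | 77.3 | 7M | 0.9G | log&ckpt |
| MSVMamba-Micro | ImageNet-1K | 224x224 | 79.8 | 12M | 1.5G | log&ckpt |
| MSVMamba-Tiny | ImageNet-1K | 224x224 | 82.8 | 33M | 4.6G | log&ckpt |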

Object Detection on COCO

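| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs&ckpts |
| --- | --- | --- | --- | --- | --- | --- |
| MSVMamba-Micro | 32M | 201G | MaskRCNN@1x | 43.8 | 39.9 | log&ckpt |
| MSVMamba-Tiny | 53M | 252G | MaskRCNN@1x | 46.9 | 42.2 | log&ckpt |
| MSVMamba-Micro | 32M | 201G | MaskRCNN@3x | 46.3 | 41.8 | log&ckpt |
| MSVMamba-Tiny | 53M | 252G | MaskRCNN@3x | 48.3 | 43.2 | log&ckpt |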

Semantic Segmentation on ADE20K

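| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | logs&ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MSVMamba-Micro | 512x512 | 42M | 875G | UperNet@160k | 45.1 | 45.4 | log&ckpt |
| MSVMamba-Tiny | 512x512 | 65M | 942G | UperNet@160k | 47.8 | - | log&ckpt |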

Getting Started

The steps to set up the environment, train, and evaluate MSVMamba models are the same as for VMamba.

Installation

Step 1: Clone the MSVMamba repository:

git clone https://github.com/YuHengsss/MSVMamba.git
cd MSVMamba

Step 2: Environment Setup:

Create and activate a new conda environment

conda create -n msvmamba
conda activate msvmamba

Install Dependencies

pip install -r requirements.txt
cd kernels/selective_scan && pip install .
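
After installation, a quick import check can catch a failed kernel build early. The following is a minimal sketch and not part of the repository. The module names `torch` and `selective_scan_cuda_core` are assumptions about what the steps above install; the kernel's actual import name may differ, so check the package definition in kernels/selective_scan before relying on it.

```python
import importlib.util


def check_importable(module_names):
    """Return {module_name: bool} by looking up each top-level module
    without actually importing it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # Assumed module names; adjust them to what your installation provides.
    assumed_modules = ["torch", "selective_scan_cuda_core"]
    for name, found in check_importable(assumed_modules).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry only means the name could not be located in the active environment; it does not tell you why the build failed.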

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
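
Because the OpenMMLab packages above are pinned to exact versions, it can help to confirm what actually ended up in the environment. The sketch below uses only the standard library; the helper name `compare_pins` is hypothetical, and the pins are copied from the two pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above.
PINS = {
    "mmengine": "0.10.1",
    "mmcv": "2.1.0",
    "mmdet": "3.3.0",
    "mmsegmentation": "1.2.2",
    "mmpretrain": "1.2.0",
}


def compare_pins(pins):
    """Map each package to (expected, installed).

    `installed` is None when the package is not present."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in compare_pins(PINS).items():
        status = "ok" if installed == expected else "mismatch"
        print(f"{name}: expected {expected}, installed {installed} ({status})")
```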

Quick Start

Classification

To train MSVMamba models for classification on ImageNet, use the following command, passing the config file of the model you want to train:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
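
The command above starts 8 processes on a single node. Assuming `--batch-size` is the per-process (per-GPU) batch size, which is an assumption worth verifying against main.py, the effective global batch size is the product of the two values. The sketch below parses the command string to make that arithmetic explicit; the function name is hypothetical.

```python
import shlex

# The training command from above, kept verbatim including its placeholders.
TRAIN_COMMAND = (
    'python -m torch.distributed.launch --nnodes=1 --node_rank=0 '
    '--nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 '
    'main.py --cfg </path/to/config> --batch-size 128 '
    '--data-path </path/of/dataset> --output /tmp'
)


def global_batch_size(command):
    """Multiply the process count by the batch size given on the command line.

    Assumes --batch-size is per process; verify this against main.py."""
    args = shlex.split(command)
    nproc = next(
        int(arg.split("=", 1)[1])
        for arg in args
        if arg.startswith("--nproc_per_node=")
    )
    batch = int(args[args.index("--batch-size") + 1])
    return nproc * batch


if __name__ == "__main__":
    print(global_batch_size(TRAIN_COMMAND))  # 1024
```

If you train on fewer GPUs, the same product tells you how far the global batch size drifts from this setting.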

If you only want to test the performance (together with params and flops):

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
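
The acc@1 column in the classification table is top-1 accuracy: the share of images whose highest-scoring class matches the ground-truth label. The pure-Python sketch below only illustrates the metric; it is not the repository's evaluation code, and the function name and toy data are hypothetical.

```python
def top1_accuracy(logits, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score, i.e. the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy data: 4 samples, 3 classes; the last sample is misclassified.
    toy_logits = [
        [0.1, 2.0, -1.0],
        [1.5, 0.2, 0.3],
        [0.0, 0.1, 0.9],
        [0.3, 0.2, 0.1],
    ]
    toy_labels = [1, 0, 2, 1]
    print(top1_accuracy(toy_logits, toy_labels))  # 75.0
```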

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

Add --tta to the test command to obtain mIoU(MS) in segmentation.
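
When evaluating several checkpoints, it can be convenient to assemble the test command programmatically. The helper below is a hypothetical sketch that mirrors the positional arguments shown above (config, checkpoint, GPU count). Appending `--tta` after the GPU count assumes the script forwards trailing arguments to the underlying test tool, which should be checked in tools/dist_test.sh.

```python
import shlex


def build_dist_test_command(config, checkpoint, gpus=1, tta=False):
    """Assemble the tools/dist_test.sh call as an argument list."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    cmd = ["bash", "./tools/dist_test.sh", config, checkpoint, str(gpus)]
    if tta:
        # Assumption: trailing arguments are passed through by the script.
        cmd.append("--tta")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own config and checkpoint.
    cmd = build_dist_test_command("path/to/config.py", "path/to/ckpt.pth", tta=True)
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```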

To train with mmdetection or mmsegmentation:

bash ./tools/dist_train.sh </path/to/config> 8

Citation

If MSVMamba is helpful for your research, please cite the following paper:

@article{shi2024multiscale,
      title={Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model}, 
      author={Yuheng Shi and Minjing Dong and Chang Xu},
      journal={arXiv preprint arXiv:2405.14174},
      year={2024}
}

Acknowledgment

This project is based on VMamba (paper, code), Mamba (paper, code), Swin-Transformer (paper, code), ConvNeXt (paper, code), and OpenMMLab. We thank the authors for their excellent work.