Vision-RWKV

February 18, 2025

The official implementation of "Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures".

News 🚀🚀🚀

  • 2025/02/18: A new version of the CUDA code has been added in the cuda_new folder to eliminate the hardcoding of T_MAX.
  • 2025/02/11: 🎊🎊 Vision-RWKV is accepted by ICLR 2025!
  • 2024/04/14: RWKV6 is now supported for the classification task, with higher performance.
  • 2024/03/04: We release the code and models of Vision-RWKV.

Highlights

  • High-Resolution Efficiency: Processes high-resolution images smoothly with a global receptive field.
  • Scalability: Pre-trained on large-scale datasets and scales up stably.
  • Superior Performance: Achieves better performance than ViTs in classification tasks. In dense prediction tasks, it surpasses window-based ViTs and is comparable to global-attention ViTs, with lower FLOPs and higher speed.
  • Efficient Alternative: Can serve as an alternative backbone to ViT in comprehensive vision tasks.

Overview

[Figure: Vision-RWKV overview]

Schedule

  • Support RWKV6 as VRWKV6
  • Release VRWKV-L
  • Release VRWKV-T/S/B

Model Zoo

Pretrained Models

Model     Size   Pretrain       Download
VRWKV-L   192    ImageNet-22K   ckpt

Image Classification (ImageNet-1K)

Model      Size   #Param    #FLOPs    Top-1 Acc   Download
VRWKV-T    224    6.2M      1.2G      75.1        ckpt | cfg
VRWKV-S    224    23.8M     4.6G      80.1        ckpt | cfg
VRWKV-B    224    93.7M     18.2G     82.0        ckpt | cfg
VRWKV-L    384    334.9M    189.5G    86.0        ckpt | cfg
VRWKV6-T   224    7.6M      1.6G      76.6        ckpt | cfg
VRWKV6-S   224    27.7M     5.6G      81.1        ckpt | cfg
VRWKV6-B   224    104.9M    20.9G     82.6        ckpt | cfg
  • VRWKV-L is pretrained on ImageNet-22K and then fine-tuned on ImageNet-1K.
  • VRWKV-L is trained with the InternImage codebase for higher speed.
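
To make the VRWKV versus VRWKV6 comparison concrete, the minimal sketch below recomputes the differences in parameters, FLOPs, and top-1 accuracy at matching model sizes. The numbers are copied from the classification table above; the `RESULTS` dictionary and the `compare` helper are illustrative names and are not part of this repository.

```python
# (params in M, FLOPs in G, top-1 accuracy in %) at input size 224,
# copied from the ImageNet-1K classification table above.
RESULTS = {
    "VRWKV-T": (6.2, 1.2, 75.1),
    "VRWKV-S": (23.8, 4.6, 80.1),
    "VRWKV-B": (93.7, 18.2, 82.0),
    "VRWKV6-T": (7.6, 1.6, 76.6),
    "VRWKV6-S": (27.7, 5.6, 81.1),
    "VRWKV6-B": (104.9, 20.9, 82.6),
}


def compare(size):
    """Return (VRWKV6 minus VRWKV) deltas for one model size: 'T', 'S' or 'B'."""
    p4, f4, a4 = RESULTS[f"VRWKV-{size}"]
    p6, f6, a6 = RESULTS[f"VRWKV6-{size}"]
    # Round to one decimal to match the precision reported in the table.
    return {
        "params_M": round(p6 - p4, 1),
        "flops_G": round(f6 - f4, 1),
        "top1": round(a6 - a4, 1),
    }


if __name__ == "__main__":
    for size in ("T", "S", "B"):
        print(size, compare(size))
```

From these table values, the top-1 gain of VRWKV6 is 1.5, 1.0, and 0.6 points for the T, S, and B sizes respectively, at the cost of somewhat more parameters and FLOPs.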

Object Detection with Mask-RCNN head (COCO)

Model     #Param    #FLOPs     box AP   mask AP   Download
VRWKV-T   8.4M      67.9G      41.7     38.0      ckpt | cfg
VRWKV-S   29.3M     189.9G     44.8     40.2      ckpt | cfg
VRWKV-B   106.6M    599.0G     46.8     41.7      ckpt | cfg
VRWKV-L   351.9M    1730.6G    50.6     44.9      ckpt | cfg
  • We report the #Param and #FLOPs of the backbone in this table.
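
The note above says the reported counts cover only the backbone, not the detection head. As a rough sketch of how such a backbone-only parameter count could be obtained, the helper below sums the parameters whose names start with a given prefix. It works with any object exposing a PyTorch-style `named_parameters()` method. The function names and the default `"backbone."` prefix are assumptions for illustration (MMDetection detectors conventionally keep the backbone under an attribute named `backbone`); this is not the script used to produce the table.

```python
def count_backbone_params(model, prefix="backbone."):
    """Sum the element counts of all parameters whose name starts with `prefix`.

    `model` is expected to behave like a torch.nn.Module, i.e. provide
    named_parameters() yielding (name, tensor) pairs where each tensor
    has a numel() method.
    """
    return sum(
        param.numel()
        for name, param in model.named_parameters()
        if name.startswith(prefix)
    )


def format_millions(num_params):
    """Format a raw parameter count in millions, e.g. 8400000 -> '8.4M'."""
    return f"{num_params / 1e6:.1f}M"
```

For a detector object `detector`, `format_millions(count_backbone_params(detector))` would give a figure in the same unit as the #Param column, while the head parameters are left out of the sum.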

Semantic Segmentation with UperNet head (ADE20K)

Model     #Param    #FLOPs    mIoU   Download
VRWKV-T   8.4M      16.6G     43.3   ckpt | cfg
VRWKV-S   29.3M     46.3G     47.2   ckpt | cfg
VRWKV-B   106.6M    146.0G    49.2   ckpt | cfg
VRWKV-L   351.9M    421.9G    53.5   ckpt | cfg
  • We report the #Param and #FLOPs of the backbone in this table.

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{duan2024vrwkv,
  title={Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures},
  author={Duan, Yuchen and Wang, Weiyun and Chen, Zhe and Zhu, Xizhou and Lu, Lewei and Lu, Tong and Qiao, Yu and Li, Hongsheng and Dai, Jifeng and Wang, Wenhai},
  journal={arXiv preprint arXiv:2403.02308},
  year={2024}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Vision-RWKV is built with reference to the code of the following projects: RWKV, MMPretrain, MMDetection, MMSegmentation, ViT-Adapter, InternImage. Thanks for their awesome work!