
July 9, 2025

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

📖 Paper | 🤗 SpaceR | 📊 SpaceR-151k

📅 News

🚀 [07/06/2025] SpaceR-Eval now supports more models (e.g., Qwen2.5VL, InternVL, KimiVL, MiniCPM-V, VideoLLaMA3) and benchmarks (e.g., VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, LongVideoBench, TempCompass, Video-Holmes). SpatialScore also supports SpaceR evaluation.

🚀 [06/04/2025] Our SpaceR model achieves 37.28% accuracy on VGBench and 53.72% accuracy on SpatialScore, the state-of-the-art performance among all 7B/8B models to date.

🚀 [05/29/2025] SpaceR achieves 35.2% accuracy on the new video reasoning benchmark Video-Holmes, outperforming the commercial models o4-mini (29.9%) and Gemini-2.0-Flash (30.6%).

🚀 [05/19/2025] We release the SpaceR-151k dataset.

🚀 [05/10/2025] We release the SpaceR checkpoint.

🚀 [04/29/2025] We release the SR-91k dataset.

🚀 [04/10/2025] We update the training framework of SpaceR.

🚀 [04/02/2025] We share the SpaceR paper on arXiv.

🚀 [03/31/2025] We release the evaluation and training code.

SpaceR

The first MLLM empowered by SG-RLVR for video spatial reasoning

๐Ÿ† Performance Comparison

Data Statistics of SpaceR-151k

QA Examples of SR-91k

We curate the SpaceR-151k dataset and propose SpaceR, which achieves promising gains on VSI-Bench, SPAR-Bench, and STI-Bench.

NOTE: We have excluded videos used in VSI-Bench to prevent data leakage.
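As an illustration of the leakage note above, here is a minimal sketch of filtering training samples by source video. It is not the curation pipeline used for SpaceR-151k: the `video_id` field, the toy records, and the `exclude_benchmark_videos` helper are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set


def exclude_benchmark_videos(
    samples: Iterable[Dict[str, str]],
    benchmark_video_ids: Set[str],
    video_key: str = "video_id",  # hypothetical field name, not the dataset schema
) -> List[Dict[str, str]]:
    """Keep only samples whose source video does not appear in the benchmark."""
    return [s for s in samples if s[video_key] not in benchmark_video_ids]


# Toy records: the IDs and questions are made up for illustration.
train_samples = [
    {"video_id": "scene_0001", "question": "How far is the sofa from the table?"},
    {"video_id": "scene_0002", "question": "Which object is closer to the door?"},
    {"video_id": "scene_0003", "question": "How many chairs are in the room?"},
]
benchmark_ids = {"scene_0002"}  # videos that also appear in the evaluation set

clean_samples = exclude_benchmark_videos(train_samples, benchmark_ids)
print([s["video_id"] for s in clean_samples])  # ['scene_0001', 'scene_0003']
```

Filtering at the video level rather than the question level removes every QA pair derived from a benchmark video, including pairs whose question text differs from the benchmark's.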

Training

git clone https://github.com/OuyangKun10/SpaceR.git
cd SpaceR/SpaceR

# build environment
conda create -n SpaceR python=3.11 
conda activate SpaceR
bash setup.sh

# qwen video extraction setting, e.g., max frames, resolutions
# Use the [decord] feature to improve speed
cd src/qwen-vl-utils
pip install -e ".[decord]"
cd ..

Data Preparation:

  1. Download the SpaceR-151k dataset.

  2. Decompress it:

bash decompress.sh

Training script for SpaceR

bash ./src/scripts/run_SpaceR_SG_RLVR.sh

Evaluation

SpaceR-Eval

Setup

  1. Environment: Python 3.8+, CUDA-enabled GPUs.
  2. Install Libraries:
    pip install torch pandas numpy pillow accelerate transformers sentencepiece decord flash-attn --no-build-isolation
    
  3. Dataset: VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, TempCompass, LongVideoBench
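Before running the evaluation, it can help to confirm that the environment matches the setup list above. The following is a minimal sketch of such a check and is not part of SpaceR-Eval. The helper names (`python_ok`, `find_modules`, `cuda_available`) are assumptions for this example. The module names are the import names of the pip packages listed above (`pillow` is imported as `PIL`, `flash-attn` as `flash_attn`).

```python
import importlib.util
import sys
from typing import Dict, Sequence, Tuple

# Import names corresponding to the pip packages in the install command.
REQUIRED_MODULES = (
    "torch", "pandas", "numpy", "PIL", "accelerate",
    "transformers", "sentencepiece", "decord", "flash_attn",
)


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def find_modules(names: Sequence[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available() -> bool:
    """Return True only if torch imports cleanly and reports a usable CUDA device."""
    try:
        import torch
    except Exception:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    missing = [name for name, found in find_modules().items() if not found]
    print("Missing modules:", missing or "none")
    print("CUDA available:", cuda_available())
```

Using `find_spec` keeps the check fast because heavy packages are located but not imported. Only `cuda_available` imports `torch`.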

Usage

python evaluate.py

Citation:

@article{ouyang2025spacer,
  title={SpaceR: Reinforcing MLLMs in Video Spatial Reasoning},
  author={Ouyang, Kun and Liu, Yuanxin and Wu, Haoning and Liu, Yi and Zhou, Hao and Zhou, Jie and Meng, Fandong and Sun, Xu},
  journal={arXiv preprint arXiv:2504.01805},
  year={2025}
}

License

  • The code in this repo is released under the CC BY-NC 4.0 License.
  • Use of the SpaceR-151k dataset and the SpaceR model weights must strictly follow the CC BY-NC 4.0 License.