VideoChat-R1 & -R1.5: Spatio-Temporal RL for Video Perception and Reasoning
October 17, 2025
:fire: Updates
- 2025/09/26: 🔥🔥🔥 We release the VideoChat-R1.5 model on Hugging Face, together with the paper and evaluation code.
- 2025/09/22: 🎉🎉🎉 VideoChat-R1.5 has been accepted to NeurIPS 2025.
- 2025/04/22: 🔥🔥🔥 We release VideoChat-R1-caption on Hugging Face.
- 2025/04/14: 🔥🔥🔥 We release VideoChat-R1 and VideoChat-R1-thinking on Hugging Face.
- 2025/04/10: 🔥🔥🔥 We release the VideoChat-R1 paper and code.
🎯 Performance on Video Benchmarks

Across short-form & long-form videos, temporal grounding, video reasoning, and spatio-temporal perception, the model delivers consistently stronger results.
:parrot: Introduction

We adopt multi-task joint RL to strengthen the model's spatio-temporal perception and reasoning capabilities.
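
As a rough illustration of what a multi-task reward could look like, the sketch below routes each training sample to a task-specific reward function. The task names (`grounding`, `qa`), the use of temporal IoU and exact match as rewards, and every function name here are assumptions made for illustration; this README does not specify the actual tasks or reward design, so please check the paper and training_scripts for the real setup.

```python
from typing import Callable, Dict, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time spans, in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def exact_match(pred: str, gt: str) -> float:
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return 1.0 if pred.strip().lower() == gt.strip().lower() else 0.0


# Hypothetical task registry: one reward function per task type.
REWARDS: Dict[str, Callable[..., float]] = {
    "grounding": temporal_iou,
    "qa": exact_match,
}


def joint_reward(task: str, pred, gt) -> float:
    """Score one rollout with the reward that belongs to its task."""
    return REWARDS[task](pred, gt)
```

With a registry like this, samples from different tasks can share one RL batch while each is scored by its own reward.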

During inference, we simulate hierarchical human attention so that the model progressively localizes the Region of Interest (ROI) within the input video. This multi-step perception process is designed so that the model's performance improves with each additional step.
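
The progressive localization described above can be pictured as a simple loop: sample frames from the current span, ask the model for an answer plus a narrower ROI, then resample inside that ROI. The sketch below is only a schematic under stated assumptions. It treats the ROI as a temporal span, and `perceive`, `StepResult`, `sample_timestamps`, and `iterative_perception` are hypothetical names, not the project's API. For the actual inference procedure, follow the Hugging Face README.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


@dataclass
class StepResult:
    answer: str  # the model's answer at this step
    roi: Span    # the span the model wants to inspect next


def sample_timestamps(span: Span, num_frames: int) -> List[float]:
    """Uniformly sample timestamps (bin midpoints) inside a span."""
    start, end = span
    step = (end - start) / num_frames
    return [start + step * (i + 0.5) for i in range(num_frames)]


def iterative_perception(
    perceive: Callable[[List[float], str], StepResult],  # hypothetical model call
    duration: float,
    question: str,
    num_frames: int = 8,
    max_steps: int = 3,
) -> Tuple[str, List[Span]]:
    """Run up to max_steps perception rounds, narrowing the span each time."""
    span: Span = (0.0, duration)
    visited = [span]
    answer = ""
    for _ in range(max_steps):
        result = perceive(sample_timestamps(span, num_frames), question)
        answer = result.answer
        # Clamp the predicted ROI so it never leaves the current span.
        new_start = max(span[0], min(result.roi[0], span[1]))
        new_end = min(span[1], max(result.roi[1], new_start))
        if new_end <= new_start or (new_start, new_end) == span:
            break  # nothing left to narrow
        span = (new_start, new_end)
        visited.append(span)
    return answer, visited
```

Because the frame budget per step stays fixed while the span shrinks, each round looks at the ROI at a higher temporal density than the round before.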
Demo & Inference
Please refer to the Hugging Face README for the steps required to run inference.
Evaluation
See eval_scripts and lmms-eval_videochat.
Training
See training_scripts.
:page_facing_up: Citation
If you find this project useful in your research, please consider citing:
@article{li2025videochatr1,
title={VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning},
author={Li, Xinhao and Yan, Ziang and Meng, Desen and Dong, Lu and Zeng, Xiangyu and He, Yinan and Wang, Yali and Qiao, Yu and Wang, Yi and Wang, Limin},
journal={arXiv preprint arXiv:2504.06958},
year={2025}
}
@article{yan2025videochatr15,
title={VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception},
author={Yan, Ziang and Li, Xinhao and He, Yinan and Yue, Zhengrong and Zeng, Xiangyu and Wang, Yali and Qiao, Yu and Wang, Limin and Wang, Yi},
journal={arXiv preprint arXiv:2509.21100},
year={2025}
}
For any inquiries regarding this work, please contact us at yanziang@pjlab.org.cn.