Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
November 24, 2025
Yolo Yunlong Tang¹, Daiki Shimada², Hang Hua³, Chao Huang¹, Jing Bi¹, Rogerio Feris³, Chenliang Xu¹
¹University of Rochester, ²Sony Group Corporation, ³MIT-IBM Watson AI Lab
News
- [2025-11-23] Introducing Video-R4, a reinforced video agent with visual rumination for text-rich video reasoning. The arXiv paper has been released. Code, model, and dataset are coming soon.
Video-R4 Training Framework
Data Curation Pipeline
Performance
Installation
conda create -n video-r4 python=3.10
conda activate video-r4
git clone https://github.com/yunlong10/Video-R4.git
cd Video-R4
pip install -r requirements.txt
Citation
If you find this work useful, please consider citing:
@article{tang2025video-r4,
title={Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination},
author={Tang, Yunlong and Shimada, Daiki and Hua, Hang and Huang, Chao and Bi, Jing and Feris, Rogerio and Xu, Chenliang},
journal={arXiv preprint arXiv:2511.17490},
year={2025}
}
Acknowledgments
This work was supported by Sony Group Corporation. We would like to thank Sayaka Nakamura and Jerry Jun Yokono for their insightful discussions.
We also thank the authors of the following projects for their contributions: