Demo
January 19, 2026 · View on GitHub
This folder provides a simple Gradio UI to run Video-R2 on a single video using Hugging Face Transformers.
1) Install
conda create -n video-r2 python=3.12 -y
conda activate video-r2
pip install -U pip
# We use torch v2.7.0, torchvision v0.22.0 and transformers v2.51.1 in the development of Video-R2
# Please see requirements.txt and environment.yml for all requirements
pip install -r requirements.txt
2) Run
From the repository root:
cd demo
python gradio_demo.py --ckpt MBZUAI/Video-R2 --port 7860
Open the printed URL in your browser.
3) Notes
- If you hit OOM, reduce the number of sampled frames in the UI.
- If your browser cannot play the uploaded video, convert it to MP4.
Citation ✏️
If you find Video-R2 helpful, please cite:
@article{maaz2025video-r2,
title={Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models},
author={Maaz, Muhammad and Rasheed, Hanoona and Khan, Fahad Shahbaz and Khan, Salman},
journal={arXiv preprint arXiv:2511.23478},
year={2025}
}


