Demo

January 19, 2026

This folder provides a simple Gradio UI to run Video-R2 on a single video using Hugging Face Transformers.

1) Install

conda create -n video-r2 python=3.12 -y
conda activate video-r2
pip install -U pip

# Video-R2 was developed with torch v2.7.0, torchvision v0.22.0, and transformers v4.51.1.
# See requirements.txt and environment.yml for the full list of requirements.
pip install -r requirements.txt
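
To confirm that the environment picked up the intended packages, a short check like the one below can help. It is a minimal sketch and not part of the repository: the helper name `installed_versions` is hypothetical, and it only reports what is installed so you can compare the result against requirements.txt yourself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision", "transformers")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it inside the activated video-r2 environment. A package reported as "not installed" usually means the pip install step ran in a different environment.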

2) Run

From the repository root:

cd demo
python gradio_demo.py --ckpt MBZUAI/Video-R2 --port 7860

Open the printed URL in your browser.

3) Notes

  • If you hit an out-of-memory (OOM) error, reduce the number of sampled frames in the UI.
  • If your browser cannot play the uploaded video, convert it to MP4.
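
For the MP4 conversion mentioned above, one option is to re-encode with ffmpeg. The sketch below assumes the ffmpeg binary is installed and on your PATH; the helper names `build_ffmpeg_cmd` and `convert_to_mp4` are hypothetical and not part of the demo. H.264 video with AAC audio is chosen here because browsers widely support that combination, not because the demo requires it.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst=None):
    """Return an ffmpeg argument list that re-encodes `src` as an H.264/AAC MP4."""
    src = Path(src)
    if dst is None:
        dst = src.with_suffix(".mp4")
        # Avoid overwriting the input when it already ends in .mp4.
        if dst == src:
            dst = src.with_name(src.stem + "_h264.mp4")
    return [
        "ffmpeg", "-y",             # -y: overwrite the output file if it exists
        "-i", str(src),
        "-c:v", "libx264",          # H.264 video
        "-pix_fmt", "yuv420p",      # 4:2:0 pixel format for player compatibility
        "-c:a", "aac",              # AAC audio
        "-movflags", "+faststart",  # put the index first so playback can start early
        str(dst),
    ]


def convert_to_mp4(src, dst=None):
    """Run the conversion and return the output path (assumes ffmpeg is on PATH)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_ffmpeg_cmd(src, dst)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

For example, `convert_to_mp4("clip.avi")` would write `clip.avi` re-encoded to `clip.mp4` in the same folder, which you can then upload in the UI.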

Citation ✏️

If you find Video-R2 helpful, please cite:

@article{maaz2025video-r2,
  title={Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models},
  author={Maaz, Muhammad and Rasheed, Hanoona and Khan, Fahad Shahbaz and Khan, Salman},
  journal={arXiv preprint arXiv:2511.23478},
  year={2025}
}