ViaRL
January 18, 2026 ยท View on GitHub
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning
Setup
conda create -n viarl python=3.10 -y
conda activate viarl
pip install -r requirements.txt
pip install flash-attn==2.7.4.post1
pip install transformers==4.51.3
Dataset
Please download LLaVA-Video-178k firstly. Next, download our jsonl files from https://huggingface.co/datasets/ViaRL/ViaRL_data.
Training
Note: Please set model1/model2/output_dir/data_path in the bash file. Training sequence: cycle1-stage1 -> cycle1-stage2 -> cycle2-stage1 -> cycle2-stage2. Update model1 or model2 after the corresponding training.
cycle1-stage1/cycle2-stage1
bash scripts/run_reinforce_plus_plus_video_stage1_qwen25vl.sh
cycle1-stage2/cycle2-stage2
bash scripts/run_reinforce_plus_plus_video_stage2_qwen25vl.sh
Evaluation
The relevant code can be found in the evaluation directory.
cd evaluation
-
Step1: Download the models from https://huggingface.co/ViaRL/ViaRL_model.
-
Step2: Prepare the datasets following the docs.
-
Step3: Run script Note: Please set model_path1/model_path2/n_gpus in the bash file.
bash scripts/infer_eval_qwenvl_videomme_rl.sh bash scripts/infer_eval_qwenvl_mlvu_rl.sh bash scripts/infer_eval_qwenvl_lvbench_rl.sh
Citation
If you find this work useful, please cite
@misc{xu2025viarladaptivetemporalgrounding,
title={ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning},
author={Ziqiang Xu and Qi Dai and Tian Xie and Yifan Yang and Kai Qiu and DongDong Chen and Zuxuan Wu and Chong Luo},
year={2025},
eprint={2505.15447},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.15447},
}