VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning
This is the official code for VQAThinker, the first open-source no-reference VQA (NR-VQA) model enhanced via reinforcement learning, capable of both video quality scoring and video quality understanding.
Abstract
Video quality assessment (VQA) aims to objectively quantify perceptual quality degradation in alignment with human visual perception. Despite recent advances, existing VQA models still suffer from two critical limitations: poor generalization to out-of-distribution (OOD) videos and limited explainability, which restrict their applicability in real-world scenarios.
To address these challenges, we propose VQAThinker, a reasoning-based VQA framework that leverages large multimodal models (LMMs) with reinforcement learning to jointly model video quality understanding and scoring, emulating human perceptual decision-making.
Specifically, we adopt group relative policy optimization (GRPO), a rule-guided reinforcement learning algorithm that enables reasoning over video quality under score-level supervision, and introduce three VQA-specific rewards:
- a bell-shaped regression reward that increases rapidly as the prediction error decreases and becomes progressively less sensitive near the ground truth;
- a pairwise ranking reward that guides the model to correctly determine the relative quality between video pairs; and
- a temporal consistency reward that encourages the model to prefer temporally coherent videos over their perturbed counterparts.
Extensive experiments demonstrate that VQAThinker achieves state-of-the-art performance on both in-domain and OOD VQA benchmarks, showing strong generalization for video quality scoring. Furthermore, evaluations on video quality understanding tasks validate its superiority in distortion attribution and quality description compared to existing explainable VQA models and LMMs. These findings demonstrate that reinforcement learning offers an effective pathway toward building generalizable and explainable VQA models solely with score-level supervision.
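The three rewards and the group-relative step of GRPO described above can be sketched as follows. This is a minimal illustration, not the released training code: the Gaussian form and its `sigma` width, the binary 0/1 ranking and temporal rewards, and all function names are assumptions chosen only to match the wording of the abstract.

```python
import math
from statistics import mean, pstdev


def regression_reward(pred: float, target: float, sigma: float = 0.5) -> float:
    """Bell-shaped reward in (0, 1]. Gaussian form and sigma are assumptions."""
    err = pred - target
    # Flat near err = 0 (less sensitive close to the ground truth),
    # steepest around |err| = sigma.
    return math.exp(-(err * err) / (2.0 * sigma * sigma))


def ranking_reward(pred_a: float, pred_b: float, mos_a: float, mos_b: float) -> float:
    """1.0 if the predicted order of a video pair matches the ground-truth order."""
    return 1.0 if (pred_a - pred_b) * (mos_a - mos_b) > 0 else 0.0


def temporal_reward(pred_original: float, pred_perturbed: float) -> float:
    """1.0 if the original video is scored above its temporally perturbed copy."""
    return 1.0 if pred_original > pred_perturbed else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of sampled responses (GRPO-style)."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


# Example: four sampled score predictions for one video with ground truth 3.0.
predictions = [2.9, 3.4, 4.2, 1.5]
rewards = [regression_reward(p, 3.0) for p in predictions]
advantages = group_relative_advantages(rewards)
```

Responses whose reward is above the group mean receive a positive advantage, so no separate value model is needed to rank the samples within a group.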
Model Architecture
Release
- [2025/11/28] Released the training code.
- [2025/11/08] VQAThinker was accepted to AAAI 2026!
- [2025/08/11] Released the inference code and weights.
Installation
conda create -n vqathinker python=3.11
conda activate vqathinker
bash setup.sh
Inference
cd test
1. Download model weights
Download the pre-trained model weights before running inference: InternVL3-VQAThinker-8B.
Save the weights in the folder InternVL3-VQAThinker-8B/.
Note: When evaluating your own fine-tuned checkpoint, replace the modeling_internvl_chat.py file in the checkpoint directory with the version provided in the test directory.
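The file replacement described in the note can be scripted. The sketch below is one possible way to do it, assuming both directories already exist; the function name `replace_modeling_file` and the `.bak` backup are illustrative choices, not part of this repository.

```python
import shutil
from pathlib import Path


def replace_modeling_file(source_dir: str, checkpoint_dir: str,
                          filename: str = "modeling_internvl_chat.py") -> Path:
    """Copy `filename` from source_dir into checkpoint_dir, backing up any old copy."""
    src = Path(source_dir) / filename
    dst = Path(checkpoint_dir) / filename
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found")
    if not dst.parent.is_dir():
        raise NotADirectoryError(f"{dst.parent} is not a directory")
    if dst.exists():
        # Keep the checkpoint's original file as modeling_internvl_chat.py.bak.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst


# Hypothetical usage, run from the repository root:
# replace_modeling_file("test", "path/to/your_finetuned_checkpoint")
```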
2. Single video quality evaluation
python single_infer.py
Before running, please modify the parameters in single_infer.py:
- MODEL_PATH - set this to the directory containing the pre-trained weights.
- video_path - set this to the actual path of your test video.
3. Batch video quality evaluation
python batch_infer.py
This script evaluates the 10 datasets reported in the paper.
Before running, please modify the parameters in batch_infer.py:
- MODEL_PATH - set this to the directory containing the pre-trained weights.
- video_paths - set this to the folder containing the videos to be tested.
- json_prefix - set this to the folder containing the meta JSON files for the 10 datasets to be evaluated.
- csv_output_folder - set this to the folder where you want the results to be saved.
Note: The default batch_size is 16, which requires at least 48 GB of GPU memory.
Adjust batch_size according to your available GPU memory.
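As a rough starting point for choosing batch_size, the reference setting above (16 at 48 GB) can be scaled linearly. The helper below is a hypothetical heuristic, not part of batch_infer.py. Because the model weights take a fixed share of memory, linear scaling may overestimate what fits on smaller GPUs, so reduce the value further if you hit out-of-memory errors.

```python
def suggest_batch_size(gpu_memory_gb: float,
                       ref_batch_size: int = 16,
                       ref_memory_gb: float = 48.0) -> int:
    """Linearly scale the reference batch size (assumed heuristic), never below 1."""
    if gpu_memory_gb <= 0:
        raise ValueError("gpu_memory_gb must be positive")
    scaled = int(ref_batch_size * gpu_memory_gb / ref_memory_gb)
    return max(1, scaled)


# Example: a first guess for a 24 GB GPU.
batch_size = suggest_batch_size(24)
```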
Training
cd train
Note: First, replace the modeling_internvl_chat.py file under the InternVL3-8B/ checkpoint directory with the version provided in the train directory.
bash run_scripts/run_grpo_vqa_internvl.sh
To enable the temporal reward, set the --temporal flag to true.
Training Data Preparation
To use your own dataset, format the data as a JSONL file with one sample per line. The expected format is:
```json
{"id": 0, "dataset_name": "LSVQ", "image": ["yfcc-batch6/4134.mp4"], "conversations": [{"from": "human", "value": "You are doing the video quality assessment task. Here is the question: What is your overall rating on the quality of this video? The rating should be a float between 1 and 5, rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."}, {"from": "gpt", "value": 2.611842}]}
{"id": 1, "dataset_name": "LSVQ", "image": ["ia-batch14/btvnj-MIF_711_Holiday_Fire_Safety.mp4"], "conversations": [{"from": "human", "value": "You are doing the video quality assessment task. Here is the question: What is your overall rating on the quality of this video? The rating should be a float between 1 and 5, rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."}, {"from": "gpt", "value": 3.232857}]}
...
```
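A file in this format can be generated from a list of (video path, quality score) pairs. The sketch below mirrors the field names and prompt from the two sample lines above; the helper names `build_record` and `write_jsonl` and the output file name are hypothetical.

```python
import json

# Prompt text copied from the sample records above.
PROMPT = (
    "You are doing the video quality assessment task. Here is the question: "
    "What is your overall rating on the quality of this video? The rating should "
    "be a float between 1 and 5, rounded to two decimal places, with 1 "
    "representing very poor quality and 5 representing excellent quality."
)


def build_record(idx: int, dataset_name: str, video_path: str, mos: float) -> dict:
    """Build one training sample in the layout of the sample records."""
    return {
        "id": idx,
        "dataset_name": dataset_name,
        "image": [video_path],  # the video path is stored under the "image" key
        "conversations": [
            {"from": "human", "value": PROMPT},
            {"from": "gpt", "value": float(mos)},
        ],
    }


def write_jsonl(samples, dataset_name: str, out_path: str) -> None:
    """Write one JSON object per line for an iterable of (video_path, mos) pairs."""
    with open(out_path, "w", encoding="utf-8") as f:
        for idx, (video_path, mos) in enumerate(samples):
            record = build_record(idx, dataset_name, video_path, mos)
            f.write(json.dumps(record) + "\n")


# Hypothetical usage:
# write_jsonl([("yfcc-batch6/4134.mp4", 2.611842)], "LSVQ", "my_train.jsonl")
```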
Citation
If you find this code useful for your research, please cite:
@article{cao2025vqathinker,
title={{VQAThinker}: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning},
author={Cao, Linhan and Sun, Wei and Zhang, Weixia and Zhu, Xiangyang and Jia, Jun and Zhang, Kaiwei and Zhu, Dandan and Zhai, Guangtao and Min, Xiongkuo},
journal={arXiv preprint arXiv:2508.06051},
year={2025}
}