V2X-QA
April 6, 2026
Official repository for V2X-QA and V2X-MoE, a multi-view visual question answering dataset, benchmark, and baseline for autonomous driving across vehicle-side (VS), infrastructure-side (IS), and cooperative (CO) views.
Paper: V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views
News
- Initial public release of the V2X-QA repository.
Overview
V2X-QA is a real-world multi-view autonomous driving VQA dataset and benchmark built on top of V2X-Seq-SPD. It supports controlled evaluation under three evidence conditions:
- VS: vehicle-side reasoning
- IS: infrastructure-side reasoning
- CO: cooperative reasoning with both views
The repository also includes V2X-MoE, a Qwen3-VL-based baseline with explicit view routing and three viewpoint-specific LoRA experts:
- vs_expert
- is_expert
- co_expert
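As an illustration of explicit view routing, the sketch below maps a view label to the matching expert directory. The directory names come from this repository's layout; the function `route_to_expert` and the `EXPERT_DIRS` table are hypothetical and are not taken from the released scripts, which may route differently.

```python
from pathlib import Path

# Hypothetical lookup table: evidence condition -> adapter directory name.
# Directory names follow the repository layout; the routing logic is an assumption.
EXPERT_DIRS = {"VS": "vs_expert", "IS": "is_expert", "CO": "co_expert"}


def route_to_expert(view: str, checkpoint_root: str = "checkpoints") -> Path:
    """Return the adapter directory for a view label (VS, IS, or CO)."""
    key = view.strip().upper()
    if key not in EXPERT_DIRS:
        raise ValueError(f"Unknown view {view!r}; expected one of {sorted(EXPERT_DIRS)}")
    return Path(checkpoint_root) / EXPERT_DIRS[key]
```

Routing on an explicit label keeps the choice of expert deterministic, which is what makes the three evidence conditions comparable in a controlled evaluation.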
Important Note on Raw Images
This repository does not redistribute the original vehicle-side and infrastructure-side images from V2X-Seq-SPD. Due to dataset licensing and redistribution constraints, users must download the raw images from the official V2X-Seq / V2X-Seq-SPD source separately and place them under the expected local directories before training or evaluation.
Please see:
- data/README.md
- data/raw_external/README.md
- docs/REPRODUCE.md
Repository Structure
V2X-QA/
├── assets/
│   ├── dataset_statistics.png
│   ├── V2X-QA_overview.png
│   └── V2X-MoE_pipeline.png
├── checkpoints/
│   ├── vs_expert/
│   ├── is_expert/
│   ├── co_expert/
│   ├── chat_template.jinja
│   ├── processor_config.json
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── data/
│   ├── train/
│   ├── test/
│   ├── raw_external/
│   ├── README.md
│   └── schema.md
├── docs/
│   ├── REPRODUCE.md
│   └── EVALUATION.md
├── model/
│   ├── train/
│   │   ├── v2x_moe_train_mcqa_qwen3_stage1.py
│   │   ├── v2x_moe_train_mcqa_qwen3_co_boost.py
│   │   └── v2x_moe_train_mcqa_qwen3_is_boost.py
│   └── eval/
│       └── v2x_moe_eval_mcqa_qwen3.py
├── .gitignore
├── environment.yml
├── LICENSE
├── requirements.txt
└── README.md
Dataset Statistics
V2X-QA is organized into twelve viewpoint-aligned tasks covering perception, prediction, and reasoning/planning under vehicle-side, infrastructure-side, and cooperative settings.
V2X-MoE Pipeline
V2X-MoE is a reproducible Qwen3-VL-based baseline with explicit viewpoint routing and three viewpoint-specific LoRA experts. The released training pipeline follows a three-stage procedure:
- Stage 1: joint MCQA training across all tasks
- Stage 2: CO-focused refinement
- Stage 3: IS-focused refinement
Installation
Option 1: Conda
conda env create -f environment.yml
conda activate v2x-qa
Option 2: pip
python -m venv .venv
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
pip install -r requirements.txt
Data Preparation
- Put the released V2X-QA annotation JSONL files under:
data/train/
data/test/
- Download the original raw V2X-Seq-SPD images from the official source.
- Place them under:
data/raw_external/V2X-Seq-SPD-vehicle-side-image/
data/raw_external/V2X-Seq-SPD-infrastructure-side-image/
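Before training or evaluation, it can help to confirm that the layout described above is in place. The following is a minimal sketch, not part of the released scripts: the helper `check_data_layout` is hypothetical, and it assumes the annotation files use a `.jsonl` extension.

```python
from pathlib import Path

# Directory names are taken from the steps above.
ANNOTATION_DIRS = ("data/train", "data/test")
RAW_DIRS = (
    "data/raw_external/V2X-Seq-SPD-vehicle-side-image",
    "data/raw_external/V2X-Seq-SPD-infrastructure-side-image",
)


def check_data_layout(repo_root: str = ".") -> list[str]:
    """Return a list of problems; an empty list means the layout looks ready."""
    root = Path(repo_root)
    problems = []
    for rel in ANNOTATION_DIRS:
        folder = root / rel
        # Assumption: annotation files end in ".jsonl".
        if not folder.is_dir() or not any(folder.glob("*.jsonl")):
            problems.append(f"no .jsonl annotation files found in {rel}/")
    for rel in RAW_DIRS:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(f"raw image directory {rel}/ is missing or empty")
    return problems
```

Calling `check_data_layout()` from the repository root and printing each returned message gives a quick readiness report. The check only looks at directory presence and does not validate image contents or annotation fields.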
Released Checkpoints
The checkpoints/ directory contains the released final V2X-MoE adapters and processor/tokenizer files needed for evaluation and reuse.
checkpoints/
├── vs_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── is_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── co_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── chat_template.jinja
├── processor_config.json
├── tokenizer.json
└── tokenizer_config.json
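To inspect the released adapters without loading any model weights, one option is to read each expert's `adapter_config.json` directly. The sketch below is illustrative only: `summarize_adapters` is a hypothetical helper, and the keys it reads (`base_model_name_or_path`, `r`, `lora_alpha`) are the usual fields in a PEFT LoRA config, assumed here rather than confirmed for these files.

```python
import json
from pathlib import Path

EXPERTS = ("vs_expert", "is_expert", "co_expert")


def summarize_adapters(checkpoint_root: str = "checkpoints") -> dict:
    """Collect a few fields from each expert's adapter_config.json."""
    summary = {}
    for name in EXPERTS:
        expert_dir = Path(checkpoint_root) / name
        config_path = expert_dir / "adapter_config.json"
        weights_path = expert_dir / "adapter_model.safetensors"
        if not config_path.is_file():
            summary[name] = None  # expert directory or config is missing
            continue
        config = json.loads(config_path.read_text(encoding="utf-8"))
        summary[name] = {
            # .get() is used because these key names are assumptions.
            "base_model": config.get("base_model_name_or_path"),
            "rank": config.get("r"),
            "alpha": config.get("lora_alpha"),
            "has_weights": weights_path.is_file(),
        }
    return summary
```

A `None` entry or `has_weights: False` points to an incomplete download of the corresponding expert.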
Training
Stage 1: Joint MCQA Training
python model/train/v2x_moe_train_mcqa_qwen3_stage1.py
This will save checkpoints to:
outputs/stage1/
Stage 2: CO-Focused Refinement
python model/train/v2x_moe_train_mcqa_qwen3_co_boost.py
This will save checkpoints to:
outputs/stage2_co/
Stage 3: IS-Focused Refinement
python model/train/v2x_moe_train_mcqa_qwen3_is_boost.py
This will save checkpoints to:
outputs/stage3_is/
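The three stages above can be chained from a small driver, sketched here under assumptions. The script paths are the ones listed in this section; the names `STAGES`, `build_commands`, and `run_all` are hypothetical. Whether each script picks up the previous stage's output automatically is not stated here, so check docs/REPRODUCE.md before relying on this.

```python
import subprocess
import sys

# Stage order and script paths as listed in the Training section.
STAGES = [
    ("stage1", "model/train/v2x_moe_train_mcqa_qwen3_stage1.py"),
    ("stage2_co", "model/train/v2x_moe_train_mcqa_qwen3_co_boost.py"),
    ("stage3_is", "model/train/v2x_moe_train_mcqa_qwen3_is_boost.py"),
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one command per stage, in training order."""
    return [[python, script] for _, script in STAGES]


def run_all(dry_run: bool = True) -> None:
    """Print each stage command and, unless dry_run is set, execute it."""
    for (name, _), cmd in zip(STAGES, build_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the sequence at the first failing stage.
            subprocess.run(cmd, check=True)
```

Running `run_all()` with the default `dry_run=True` only prints the commands, which is a cheap way to confirm the order before starting long GPU jobs.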
Evaluation
To run evaluation with the released final checkpoint:
python model/eval/v2x_moe_eval_mcqa_qwen3.py
By default, the evaluation script reads from checkpoints/ and writes outputs to:
outputs/eval/
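For multiple-choice QA, results are commonly reported as accuracy per view. The sketch below shows that computation on in-memory records. The field names `view`, `prediction`, and `answer` are hypothetical, since the format of the files written to outputs/eval/ is not described here; see docs/EVALUATION.md for the actual metrics and output schema.

```python
from collections import defaultdict


def accuracy_by_view(records):
    """Compute the fraction of correct answers for each view label.

    Each record is assumed (hypothetically) to be a dict with the keys
    "view", "prediction", and "answer", where the last two are option letters.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        view = rec["view"]
        total[view] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[view] += 1
    return {view: correct[view] / total[view] for view in total}
```

Keeping the three views separate, rather than pooling them into one score, matches the benchmark's design of evaluating VS, IS, and CO as distinct evidence conditions.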
What Is Released Here
This repository is intended to release:
- V2X-QA annotation files (JSONL)
- V2X-MoE training and evaluation scripts
- Final released V2X-MoE LoRA adapters and tokenizer/processor files
- Reproduction and evaluation documentation
What Is Not Redistributed Here
This repository does not include:
- Original V2X-Seq-SPD raw images
- Any redistributed copy of the upstream base model weights
Users must comply with the licenses and usage terms of:
- the original V2X-Seq / V2X-Seq-SPD dataset source
- the upstream Qwen3-VL base model
Citation
If you find this work helpful, please cite the paper below.
@article{you2026v2xqa,
title = {V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views},
author = {You, Junwei and Li, Pei and Jiang, Zhuoyu and Tang, Weizhe and Huang, Zilin and Gan, Rui and Liu, Jiaxi and Zhao, Yan and Chen, Sikai and Ran, Bin},
journal = {arXiv preprint arXiv:2604.02710},
year = {2026},
url = {https://arxiv.org/abs/2604.02710}
}
Acknowledgment
This work builds on publicly available upstream resources, including V2X-Seq-SPD and Qwen3-VL. Please cite these works and follow their licenses and terms.