V2X-QA

April 6, 2026

Official repository for V2X-QA, a multi-view visual question answering (VQA) dataset and benchmark for autonomous driving, and V2X-MoE, its baseline model. Both cover vehicle-side (VS), infrastructure-side (IS), and cooperative (CO) views.

Paper: V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views
arXiv: https://arxiv.org/abs/2604.02710


📰 News

  • Initial public release of the V2X-QA repository.

🚗 Overview

V2X-QA is a real-world multi-view autonomous driving VQA dataset and benchmark built on top of V2X-Seq-SPD. It supports controlled evaluation under three evidence conditions:

  • VS: vehicle-side reasoning
  • IS: infrastructure-side reasoning
  • CO: cooperative reasoning with both views

The repository also includes V2X-MoE, a Qwen3-VL-based baseline with explicit view routing and three viewpoint-specific LoRA experts:

  • vs_expert
  • is_expert
  • co_expert
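
As a rough illustration of explicit view routing, the sketch below maps a sample's view label to the matching expert adapter directory. Only the expert names and the `checkpoints/` layout come from this README. The `route_view` helper, and the idea that each sample carries a `VS`/`IS`/`CO` label, are assumptions for illustration and not the repository's actual routing code.

```python
from pathlib import Path

# Expert directory names as listed in this README.
VIEW_TO_EXPERT = {
    "VS": "vs_expert",
    "IS": "is_expert",
    "CO": "co_expert",
}


def route_view(view: str, checkpoints_dir: str = "checkpoints") -> Path:
    """Return the adapter directory for a view label (hypothetical helper)."""
    key = view.strip().upper()
    if key not in VIEW_TO_EXPERT:
        raise ValueError(
            f"Unknown view {view!r}; expected one of {sorted(VIEW_TO_EXPERT)}"
        )
    return Path(checkpoints_dir) / VIEW_TO_EXPERT[key]


if __name__ == "__main__":
    for label in ("VS", "IS", "CO"):
        print(label, "->", route_view(label))
```

A fixed lookup table keeps the routing deterministic, which is one plausible reading of "explicit" routing as opposed to a learned gate.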

Figure: V2X-QA overview (assets/V2X-QA_overview.png)


❗ Important Note on Raw Images

This repository does not redistribute the original vehicle-side and infrastructure-side images from V2X-Seq-SPD. Due to dataset licensing and redistribution constraints, users must download the raw images from the official V2X-Seq / V2X-Seq-SPD source separately and place them under the expected local directories before training or evaluation.

Please see:

  • data/README.md
  • data/raw_external/README.md
  • docs/REPRODUCE.md

🗂️ Repository Structure

V2X-QA/
├── assets/
│   ├── dataset_statistics.png
│   ├── V2X-QA_overview.png
│   └── V2X-MoE_pipeline.png
├── checkpoints/
│   ├── vs_expert/
│   ├── is_expert/
│   ├── co_expert/
│   ├── chat_template.jinja
│   ├── processor_config.json
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── data/
│   ├── train/
│   ├── test/
│   ├── raw_external/
│   ├── README.md
│   └── schema.md
├── docs/
│   ├── REPRODUCE.md
│   └── EVALUATION.md
├── model/
│   ├── train/
│   │   ├── v2x_moe_train_mcqa_qwen3_stage1.py
│   │   ├── v2x_moe_train_mcqa_qwen3_co_boost.py
│   │   └── v2x_moe_train_mcqa_qwen3_is_boost.py
│   └── eval/
│       └── v2x_moe_eval_mcqa_qwen3.py
├── .gitignore
├── environment.yml
├── LICENSE
├── requirements.txt
└── README.md

📊 Dataset Statistics

Figure: V2X-QA dataset statistics (assets/dataset_statistics.png)

V2X-QA is organized into twelve viewpoint-aligned tasks covering perception, prediction, and reasoning/planning under vehicle-side, infrastructure-side, and cooperative settings.


🧠 V2X-MoE Pipeline

Figure: V2X-MoE pipeline (assets/V2X-MoE_pipeline.png)

V2X-MoE is a reproducible Qwen3-VL-based baseline with explicit viewpoint routing and three viewpoint-specific LoRA experts. The released training pipeline follows a three-stage procedure:

  1. Stage 1: joint MCQA training across all tasks
  2. Stage 2: CO-focused refinement
  3. Stage 3: IS-focused refinement

⚙️ Installation

Option 1: Conda

conda env create -f environment.yml
conda activate v2x-qa

Option 2: pip

python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install -r requirements.txt

🧾 Data Preparation

  1. Put the released V2X-QA annotation JSONL files under:

data/train/
data/test/

  2. Download the original raw V2X-Seq-SPD images from the official source.

  3. Place them under:

data/raw_external/V2X-Seq-SPD-vehicle-side-image/
data/raw_external/V2X-Seq-SPD-infrastructure-side-image/
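
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `check_data_layout` is a hypothetical helper, and it assumes the annotation files use a `.jsonl` extension with one JSON object per line. See data/schema.md for the actual record format.

```python
import json
from pathlib import Path

# Directory names taken from the data preparation steps in this README.
ANNOTATION_DIRS = ("train", "test")
IMAGE_DIRS = (
    "raw_external/V2X-Seq-SPD-vehicle-side-image",
    "raw_external/V2X-Seq-SPD-infrastructure-side-image",
)


def check_data_layout(data_root: str = "data") -> dict:
    """Report missing directories and count JSONL records (hypothetical helper)."""
    root = Path(data_root)
    missing = [d for d in ANNOTATION_DIRS + IMAGE_DIRS if not (root / d).is_dir()]
    records = {}
    for split in ANNOTATION_DIRS:
        split_dir = root / split
        if not split_dir.is_dir():
            continue
        # Assumption: annotation files end in ".jsonl".
        for path in sorted(split_dir.glob("*.jsonl")):
            count = 0
            with path.open(encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        json.loads(line)  # raises ValueError on a malformed record
                        count += 1
            records[path.relative_to(root).as_posix()] = count
    return {"missing": missing, "records": records}


if __name__ == "__main__":
    print(check_data_layout("data"))
```

An empty `missing` list means all four expected directories exist; it does not verify that the image folders are complete.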

📦 Released Checkpoints

The checkpoints/ directory contains the released final V2X-MoE adapters and processor/tokenizer files needed for evaluation and reuse.

checkpoints/
├── vs_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── is_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── co_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── chat_template.jinja
├── processor_config.json
├── tokenizer.json
└── tokenizer_config.json
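
A quick way to confirm that a local copy matches this layout is sketched below. The file names come from the tree above; `missing_checkpoint_files` is a hypothetical helper that is not shipped with the repository, and it only checks that files exist, not that their contents are valid.

```python
from pathlib import Path

# File names as shown in the checkpoints/ tree in this README.
EXPERTS = ("vs_expert", "is_expert", "co_expert")
ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")
SHARED_FILES = (
    "chat_template.jinja",
    "processor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_checkpoint_files(checkpoints_dir: str = "checkpoints") -> list[str]:
    """Return expected files that are absent (hypothetical helper)."""
    root = Path(checkpoints_dir)
    expected = [f"{expert}/{name}" for expert in EXPERTS for name in ADAPTER_FILES]
    expected += list(SHARED_FILES)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    print("All files present" if not missing else f"Missing: {missing}")
```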

🏋️ Training

Stage 1: Joint MCQA Training

python model/train/v2x_moe_train_mcqa_qwen3_stage1.py

This will save checkpoints to:

outputs/stage1/

Stage 2: CO-Focused Refinement

python model/train/v2x_moe_train_mcqa_qwen3_co_boost.py

This will save checkpoints to:

outputs/stage2_co/

Stage 3: IS-Focused Refinement

python model/train/v2x_moe_train_mcqa_qwen3_is_boost.py

This will save checkpoints to:

outputs/stage3_is/
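
The three stages above can also be chained from a small driver script. This is a sketch under assumptions: `build_commands` and `run_all_stages` are hypothetical helpers, and it assumes each script runs without extra arguments, as in the commands above. Whether a later stage automatically picks up the previous stage's output is not stated here, so check docs/REPRODUCE.md before relying on it.

```python
import subprocess
import sys

# Script paths and stage order as given in this README.
STAGES = [
    ("stage1", "model/train/v2x_moe_train_mcqa_qwen3_stage1.py"),
    ("stage2_co", "model/train/v2x_moe_train_mcqa_qwen3_co_boost.py"),
    ("stage3_is", "model/train/v2x_moe_train_mcqa_qwen3_is_boost.py"),
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Build one argv list per stage, in order (hypothetical helper)."""
    return [[python, script] for _, script in STAGES]


def run_all_stages(dry_run: bool = True) -> None:
    """Print each stage command and, unless dry_run is set, execute it."""
    for (name, _), cmd in zip(STAGES, build_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the sequence if a stage exits with a non-zero status.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all_stages(dry_run=True)
```

With `dry_run=True` the script only prints the commands; calling `run_all_stages(dry_run=False)` from the repository root would launch them in order.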

📈 Evaluation

To run evaluation with the released final adapters:

python model/eval/v2x_moe_eval_mcqa_qwen3.py

By default, the evaluation script reads from checkpoints/ and writes outputs to:

outputs/eval/
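
Since the benchmark separates VS, IS, and CO conditions, per-view accuracy is a natural summary for multiple-choice results. The sketch below shows one way to compute it. The record keys `view`, `prediction`, and `answer` are hypothetical and may not match what the evaluation script writes; see docs/EVALUATION.md for the actual output format and metrics.

```python
from collections import defaultdict


def accuracy_by_view(records):
    """Compute accuracy per view.

    `records` is an iterable of dicts with the hypothetical keys
    'view', 'prediction', and 'answer' (option letters such as "A").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        view = record["view"]
        total[view] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[view] += 1
    return {view: correct[view] / total[view] for view in total}


if __name__ == "__main__":
    # Made-up toy records for illustration only; these are not V2X-QA results.
    toy = [
        {"view": "VS", "prediction": "A", "answer": "A"},
        {"view": "VS", "prediction": "b", "answer": "C"},
        {"view": "CO", "prediction": " d ", "answer": "D"},
    ]
    print(accuracy_by_view(toy))
```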

✅ What Is Released Here

This repository releases:

  • V2X-QA annotation files (JSONL)
  • V2X-MoE training and evaluation scripts
  • Final released V2X-MoE LoRA adapters and tokenizer/processor files
  • Reproduction and evaluation documentation

🚫 What Is Not Redistributed Here

This repository does not include:

  • Original V2X-Seq-SPD raw images
  • The upstream Qwen3-VL base model weights

Users must comply with the licenses and usage terms of:

  • the original V2X-Seq / V2X-Seq-SPD dataset source
  • the upstream Qwen3-VL base model

📚 Citation

If you find this work helpful, please cite the paper below.

@article{you2026v2xqa,
  title   = {V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views},
  author  = {You, Junwei and Li, Pei and Jiang, Zhuoyu and Tang, Weizhe and Huang, Zilin and Gan, Rui and Liu, Jiaxi and Zhao, Yan and Chen, Sikai and Ran, Bin},
  journal = {arXiv preprint arXiv:2604.02710},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.02710}
}

🙏 Acknowledgment

This work builds on publicly available upstream resources, including V2X-Seq-SPD and Qwen3-VL. Please cite these resources and follow their licenses and terms of use.