V2X-QA

April 6, 2026

Official repository for V2X-QA, a multi-view visual question answering (VQA) dataset and benchmark for autonomous driving across vehicle-side (VS), infrastructure-side (IS), and cooperative (CO) views, and for V2X-MoE, its accompanying baseline model.

Paper: V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views

📰 News

Initial public release of the V2X-QA repository.

🚗 Overview

V2X-QA is a real-world multi-view autonomous driving VQA dataset and benchmark built on top of V2X-Seq-SPD. It supports controlled evaluation under three evidence conditions:

VS: vehicle-side reasoning
IS: infrastructure-side reasoning
CO: cooperative reasoning with both views

The repository also includes V2X-MoE, a Qwen3-VL-based baseline with explicit view routing and three viewpoint-specific LoRA experts:

vs_expert
is_expert
co_expert
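Explicit view routing can be thought of as a lookup from a sample's evidence condition to the images it may use and the expert that handles it. The sketch below assumes each sample carries a VS / IS / CO label and that CO samples use both views, as described above. The `route` helper, the `Route` type, and the source names `"vehicle"` / `"infrastructure"` are hypothetical and are not the repository's implementation. If the experts are loaded as named PEFT adapters, the returned expert name could be passed to `PeftModel.set_adapter`.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical routing tables. The expert names match the checkpoint folders;
# the image source labels are illustrative only.
VIEW_TO_EXPERT = {"VS": "vs_expert", "IS": "is_expert", "CO": "co_expert"}
VIEW_TO_SOURCES = {
    "VS": ("vehicle",),
    "IS": ("infrastructure",),
    "CO": ("vehicle", "infrastructure"),  # cooperative reasoning uses both views
}


@dataclass(frozen=True)
class Route:
    expert: str                      # LoRA expert to activate for this sample
    image_sources: Tuple[str, ...]   # camera views passed to the model


def route(view: str) -> Route:
    """Return the expert and image sources for a VS / IS / CO sample."""
    key = view.strip().upper()
    if key not in VIEW_TO_EXPERT:
        raise ValueError(f"unknown view {view!r}; expected one of {sorted(VIEW_TO_EXPERT)}")
    return Route(expert=VIEW_TO_EXPERT[key], image_sources=VIEW_TO_SOURCES[key])


if __name__ == "__main__":
    for v in ("VS", "IS", "CO"):
        print(v, route(v))
```

Keeping the routing as a plain, deterministic lookup (rather than a learned gate) mirrors the "explicit" routing described above: the evidence condition alone decides which expert runs.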

Figure: V2X-QA overview.

❗ Important Note on Raw Images

This repository does not redistribute the original vehicle-side and infrastructure-side images from V2X-Seq-SPD. Due to dataset licensing and redistribution constraints, users must download the raw images from the official V2X-Seq / V2X-Seq-SPD source separately and place them under the expected local directories before training or evaluation.

Please see:

data/README.md
data/raw_external/README.md
docs/REPRODUCE.md

🗂️ Repository Structure

V2X-QA/
├── assets/
│   ├── dataset_statistics.png
│   ├── V2X-QA_overview.png
│   └── V2X-MoE_pipeline.png
├── checkpoints/
│   ├── vs_expert/
│   ├── is_expert/
│   ├── co_expert/
│   ├── chat_template.jinja
│   ├── processor_config.json
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── data/
│   ├── train/
│   ├── test/
│   ├── raw_external/
│   ├── README.md
│   └── schema.md
├── docs/
│   ├── REPRODUCE.md
│   └── EVALUATION.md
├── model/
│   ├── train/
│   │   ├── v2x_moe_train_mcqa_qwen3_stage1.py
│   │   ├── v2x_moe_train_mcqa_qwen3_co_boost.py
│   │   └── v2x_moe_train_mcqa_qwen3_is_boost.py
│   └── eval/
│       └── v2x_moe_eval_mcqa_qwen3.py
├── .gitignore
├── environment.yml
├── LICENSE
├── requirements.txt
└── README.md

📊 Dataset Statistics

Figure: V2X-QA dataset statistics.

V2X-QA is organized into twelve viewpoint-aligned tasks covering perception, prediction, and reasoning/planning under vehicle-side, infrastructure-side, and cooperative settings.

🧠 V2X-MoE Pipeline

Figure: V2X-MoE pipeline.

V2X-MoE is a reproducible Qwen3-VL-based baseline with explicit viewpoint routing and three viewpoint-specific LoRA experts. The released training pipeline follows a three-stage procedure:

Stage 1: joint MCQA training across all tasks
Stage 2: CO-focused refinement
Stage 3: IS-focused refinement

⚙️ Installation

Option 1: Conda

conda env create -f environment.yml
conda activate v2x-qa

Option 2: pip

python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install -r requirements.txt

🧾 Data Preparation

Put the released V2X-QA annotation JSONL files under:

data/train/
data/test/

Download the original raw V2X-Seq-SPD images from the official source and place them under:

data/raw_external/V2X-Seq-SPD-vehicle-side-image/
data/raw_external/V2X-Seq-SPD-infrastructure-side-image/
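Because the raw images must be fetched separately, a quick layout check before training or evaluation can catch missing folders early. The sketch below only uses the directory names listed above; it assumes it is run from the repository root and that annotation files use the `.jsonl` extension. The `check_data_layout` helper is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
ANNOTATION_DIRS = ("data/train", "data/test")
IMAGE_DIRS = (
    "data/raw_external/V2X-Seq-SPD-vehicle-side-image",
    "data/raw_external/V2X-Seq-SPD-infrastructure-side-image",
)


def check_data_layout(root="."):
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(root)
    problems = []
    for d in ANNOTATION_DIRS:
        path = root / d
        if not path.is_dir():
            problems.append(f"missing directory: {d}")
        elif not any(path.glob("*.jsonl")):
            problems.append(f"no .jsonl annotation files in: {d}")
    for d in IMAGE_DIRS:
        path = root / d
        if not path.is_dir():
            problems.append(f"missing raw image directory: {d}")
        elif not any(path.iterdir()):
            problems.append(f"raw image directory is empty: {d}")
    return problems


if __name__ == "__main__":
    issues = check_data_layout()
    print("\n".join(issues) if issues else "Data layout looks complete.")
```

This only checks presence, not content; the expected image file names and annotation fields are described in data/README.md and data/schema.md.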

📦 Released Checkpoints

The checkpoints/ directory contains the final released V2X-MoE LoRA adapters, one per expert, together with the processor and tokenizer files needed for evaluation and reuse.

checkpoints/
├── vs_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── is_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── co_expert/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── chat_template.jinja
├── processor_config.json
├── tokenizer.json
└── tokenizer_config.json
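Since the base model weights are not included, it can help to confirm the checkpoint layout and see which base model each adapter expects. Adapter configs written by PEFT normally record fields such as `base_model_name_or_path`, `r`, `lora_alpha`, and `target_modules`; the sketch below assumes that format and reports `None` for any field that is absent. The `inspect_checkpoints` helper is hypothetical.

```python
import json
from pathlib import Path

EXPERTS = ("vs_expert", "is_expert", "co_expert")
ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")
SHARED_FILES = ("chat_template.jinja", "processor_config.json",
                "tokenizer.json", "tokenizer_config.json")
SUMMARY_KEYS = ("base_model_name_or_path", "r", "lora_alpha", "target_modules")


def inspect_checkpoints(ckpt_dir="checkpoints"):
    """Check the released layout and summarize each expert's adapter config.

    Returns (missing_files, summaries), where summaries maps an expert name
    to the SUMMARY_KEYS found in its adapter_config.json (None if absent).
    """
    ckpt = Path(ckpt_dir)
    missing = [f for f in SHARED_FILES if not (ckpt / f).is_file()]
    summaries = {}
    for expert in EXPERTS:
        for f in ADAPTER_FILES:
            if not (ckpt / expert / f).is_file():
                missing.append(f"{expert}/{f}")
        cfg_path = ckpt / expert / "adapter_config.json"
        if cfg_path.is_file():
            cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
            summaries[expert] = {key: cfg.get(key) for key in SUMMARY_KEYS}
    return missing, summaries


if __name__ == "__main__":
    missing, summaries = inspect_checkpoints()
    print("missing:", missing or "none")
    for name, summary in summaries.items():
        print(name, summary)
```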

🏋️ Training

Stage 1: Joint MCQA Training

python model/train/v2x_moe_train_mcqa_qwen3_stage1.py

This will save checkpoints to:

outputs/stage1/

Stage 2: CO-Focused Refinement

python model/train/v2x_moe_train_mcqa_qwen3_co_boost.py

This will save checkpoints to:

outputs/stage2_co/

Stage 3: IS-Focused Refinement

python model/train/v2x_moe_train_mcqa_qwen3_is_boost.py

This will save checkpoints to:

outputs/stage3_is/
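The three stages are meant to run in order. A small driver like the one below, a sketch assuming each script is run with its default arguments exactly as in the commands above, stops at the first failing stage so a later refinement never starts from a broken run. How each stage locates the previous stage's outputs is determined by the scripts themselves; the `run_pipeline` helper and its `dry_run` flag are hypothetical.

```python
import subprocess
import sys

# (stage name, script, output directory) as listed in this README.
STAGES = (
    ("stage1", "model/train/v2x_moe_train_mcqa_qwen3_stage1.py", "outputs/stage1/"),
    ("stage2_co", "model/train/v2x_moe_train_mcqa_qwen3_co_boost.py", "outputs/stage2_co/"),
    ("stage3_is", "model/train/v2x_moe_train_mcqa_qwen3_is_boost.py", "outputs/stage3_is/"),
)


def run_pipeline(dry_run=False):
    """Run the training stages in order; with dry_run=True, only return the commands."""
    commands = []
    for name, script, out_dir in STAGES:
        cmd = [sys.executable, script]  # same interpreter as the active environment
        commands.append(cmd)
        if dry_run:
            continue
        print(f"[{name}] running {script} (expected output: {out_dir})")
        # check=True raises CalledProcessError on a non-zero exit,
        # so later stages are skipped if an earlier one fails.
        subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run="--dry-run" in sys.argv)
```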

📈 Evaluation

To run evaluation with the released final checkpoints:

python model/eval/v2x_moe_eval_mcqa_qwen3.py

By default, the evaluation script reads from checkpoints/ and writes outputs to:

outputs/eval/
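For multiple-choice QA, a natural summary is accuracy broken down by evidence condition. The sketch below assumes predictions are available as records with hypothetical keys `"view"`, `"pred"`, and `"answer"` (option letters); the actual file names and fields written under outputs/eval/ are defined by the evaluation script and docs/EVALUATION.md, and the reported metrics there take precedence. Both helpers are hypothetical.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def mcqa_accuracy_by_view(records):
    """Overall and per-view accuracy, comparing option letters case-insensitively."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        hit = rec["pred"].strip().upper() == rec["answer"].strip().upper()
        for key in (rec["view"], "overall"):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}
```

Usage would look like `mcqa_accuracy_by_view(load_jsonl(path))`, where `path` points to a predictions file in the assumed format.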

✅ What Is Released Here

This repository is intended to release:

V2X-QA annotation files (JSONL)
V2X-MoE training and evaluation scripts
Final released V2X-MoE LoRA adapters and tokenizer/processor files
Reproduction and evaluation documentation

🚫 What Is Not Redistributed Here

This repository does not include:

Original V2X-Seq-SPD raw images
Any redistributed copy of the upstream base model weights

Users must comply with the licenses and usage terms of:

the original V2X-Seq / V2X-Seq-SPD dataset source
the upstream Qwen3-VL base model

📚 Citation

If you find this work helpful, please cite the paper below.

@article{you2026v2xqa,
  title   = {V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views},
  author  = {You, Junwei and Li, Pei and Jiang, Zhuoyu and Tang, Weizhe and Huang, Zilin and Gan, Rui and Liu, Jiaxi and Zhao, Yan and Chen, Sikai and Ran, Bin},
  journal = {arXiv preprint arXiv:2604.02710},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.02710}
}

🙏 Acknowledgment

This work builds on publicly available upstream resources, including V2X-Seq-SPD and Qwen3-VL. Please cite and follow the corresponding upstream licenses and terms.
