Prepare data for DriveVLA

August 16, 2025 · View on GitHub

Modified from UniAD and GPT-Driver.

nuScenes

Download nuScenes V1.0 full dataset data, CAN bus and map(v1.3) extensions HERE, then follow the steps below to prepare the data.

Download nuScenes, CAN_bus and Map extensions

cd DriveVLA
mkdir data
# Download nuScenes V1.0 full dataset data directly (or soft link) to data/
# Download CAN_bus and Map(v1.3) extensions directly (or soft link) to data/nuscenes/

Download UniAD data info

cd DriveVLA/data
mkdir infos && cd infos
wget https://github.com/OpenDriveLab/UniAD/releases/download/v1.0/nuscenes_infos_temporal_train.pkl  # train_infos
wget https://github.com/OpenDriveLab/UniAD/releases/download/v1.0/nuscenes_infos_temporal_val.pkl  # val_infos

Download cached nuScenes information

We use the pre-cached nuScenes information cached_nuscenes_info.pkl following GPT-Driver. The cached data can be downloaded at Google Drive or using the following command.

pip install gdown
cd DriveVLA/data/nuscenes
gdown 16X0_-v-iXP9hVLNaDMmIiGhZKj24YOnb

The Overall Structure

DriveVLA
├── data/
   ├── infos/
   ├── nuscenes_infos_temporal_train.pkl
   ├── nuscenes_infos_temporal_val.pkl
   ├── nuscenes/
   ├── can_bus/
   ├── maps/
   ├── samples/
   ├── sweeps/
   ├── v1.0-test/
   ├── v1.0-trainval/
   ├── cached_nuscenes_info.pkl

Evaluation Dataset

We adopt the GT cache from GPT-Driver. Download gt for evaluation at Google Drive.

The structure is as follows:

eval_share
├── gt
   ├── gt_traj_mask.pkl
   ├── gt_traj.pkl
   ├── planing_gt_segmentation_val
   └── vad_gt_seg.pkl
├── __init__.py
├── metric.py
└── README.md