April 26, 2026

WHALES

A Multi-Agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving

Introduction

WHALES (Wireless enHanced Autonomous vehicles with Large number of Engaged agentS) is a CARLA-based cooperative perception dataset averaging 8.4 agents per sequence. It captures diverse viewpoints, agent behaviors, and multitask interactions to study scheduling, perception, and planning under realistic multi-agent constraints.

News

  • 2025-11-21 – Released WHALES dataset v1.0 with cooperative scheduling benchmarks.
  • 2025-06-17 – WHALES was accepted by IROS 2025!

Highlights

  • Largest average agent count among the compared datasets: 8.4 agents per scene, with synchronized LiDAR-camera suites.
  • Rich annotations: 2.01M 3D boxes plus full agent-behavior recording.
  • Scheduling-ready: Provides perception, planning, and communication metadata for agent selection research.
  • Plug-in friendly: Ships with mmdetection3d-compatible configs and hooks for custom schedulers.

Dataset Overview

Comparison with Existing Benchmarks

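| Dataset | Year | Real/Simulated | V2X | Image | Point Cloud | 3D Annotations | Classes | Avg. Agents |
|---|---|---|---|---|---|---|---|---|
| KITTI | 2012 | Real | No | 15k | 15k | 200k | 8 | 1 |
| nuScenes | 2019 | Real | No | 1.4M | 400k | 1.4M | 23 | 1 |
| DAIR-V2X | 2021 | Real | V2V&I | 39k | 39k | 464k | 10 | 2 |
| V2X-Sim | 2021 | Simulated | V2V&I | 0 | 10k | 26.6k | 2 | 2 |
| OPV2V | 2022 | Simulated | V2V | 44k | 11k | 230k | 1 | 3 |
| DOLPHINS | 2022 | Simulated | V2V&I | 42k | 42k | 293k | 3 | 3 |
| V2V4Real | 2023 | Real | V2V | 40k | 20k | 240k | 5 | 2 |
| WHALES (Ours) | 2024 | Simulated | V2V&I | 70k | 17k | 2.01M | 3 | 8.4 |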

Agent Types

| Location | Category | Sensors | Planning & Control | Tasks | Spawning |
|---|---|---|---|---|---|
| On-road | Uncontrolled CAV | LiDAR ×1 + Camera ×4 | CARLA autopilot | Perception | Random / deterministic |
| On-road | Controlled CAV | LiDAR ×1 + Camera ×4 | RL policy | Perception & planning | Random / deterministic |
| Roadside | RSU | LiDAR ×1 + Camera ×4 | RL policy | Perception & planning | Static |
| Anywhere | Obstacle agent | - | CARLA autopilot | - | Random |

Getting Started

Installation

  1. Clone the repository:
    git clone https://github.com/chensiweiTHU/WHALES.git
    
  2. Create and activate a Conda environment:
    conda create -n whales python=3.10 -y
    conda activate whales
    
  3. Install WHALES:
    pip install -e .
    
  4. Install mmdetection3d==0.17.1 following the official guide.
  5. (Optional) Install OpenCOOD for additional cooperative baselines.
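
After step 4, it can help to confirm that the pinned mmdetection3d version is the one actually installed. The following is a minimal sketch using only the standard library; it assumes the package is registered under the distribution name `mmdet3d`, and the helper names `installed_version` and `matches_pin` are illustrative and not part of WHALES.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name, expected):
    """True if installed at the pinned version, False if a different
    version is installed, None if the distribution is not installed."""
    found = installed_version(dist_name)
    if found is None:
        return None
    return found == expected


if __name__ == "__main__":
    # Assumption: the distribution name is "mmdet3d".
    print("mmdet3d pinned to 0.17.1:", matches_pin("mmdet3d", "0.17.1"))
```

A result of `None` means the package is missing from the active environment, which usually indicates the `whales` Conda environment is not activated.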

Data Preparation

  1. Download the full dataset from Google Drive via the "Download Whales" link.
  2. Place extracted files under ./data/whales/.
  3. Preprocess:
    python tools/create_data.py whales --root-path ./data/whales/ --out-dir ./data/whales/ --extra-tag whales
    
    This emits, under ./data/whales/:
    • whales_infos_{train,val}.pkl — LiDAR info PKLs for WhalesDataset.
    • whales_infos_{train,val}_mono3d.coco.json — per-camera mono3D COCO files for WhalesMonoDataset (cam-only training).
    • whales_dbinfos_train.pkl + whales_gt_database/ — GT-sampling database used by LiDAR configs' augmentation step.
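
A quick way to confirm that preprocessing finished is to check for the outputs listed above. This is a minimal sketch, not part of the repository: the file and directory names are taken from the list, while the helper `missing_outputs` is a hypothetical name introduced here.

```python
from pathlib import Path

# Names copied from the preprocessing output list above.
EXPECTED_FILES = [
    "whales_infos_train.pkl",
    "whales_infos_val.pkl",
    "whales_infos_train_mono3d.coco.json",
    "whales_infos_val_mono3d.coco.json",
    "whales_dbinfos_train.pkl",
]
EXPECTED_DIRS = ["whales_gt_database"]


def missing_outputs(root):
    """Return the expected preprocessing outputs that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    gaps = missing_outputs("./data/whales")
    print("all outputs present" if not gaps else f"missing: {gaps}")
```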

Training & Evaluation

Configs are organised as:

  • ./configs/_base_/ — shared dataset, model, and schedule bases.
  • ./configs/standalone/ — single-agent baselines (PointPillars, SECOND, CenterPoint, FCOS3D, VoxelNeXt, etc. on LiDAR and monocular 3D).
  • ./configs/cooperative/ — V2X cooperative-perception recipes (PointPillars, VoxelNeXt, BEVFusion, FCooper, V2VNet, V2X-ViT, OPV2V, FFNet, plus the scheduling studies).
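
To see which configs are available without browsing the tree by hand, a short script can enumerate them. This sketch assumes configs are `.py` files (as the training commands suggest) and skips `_base_`, since those files are shared bases rather than runnable recipes; `leaf_configs` is a hypothetical helper, not a repository tool.

```python
from pathlib import Path


def leaf_configs(config_root="./configs", groups=("standalone", "cooperative")):
    """Collect every .py config under the given groups, sorted within each group."""
    root = Path(config_root)
    found = []
    for group in groups:
        group_dir = root / group
        if group_dir.is_dir():  # tolerate a missing group directory
            found.extend(sorted(group_dir.rglob("*.py")))
    return found


if __name__ == "__main__":
    for cfg in leaf_configs():
        print(cfg)
```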

Pick any leaf config under those trees and run:

  • Training
    bash tools/dist_train.sh <config>.py <gpu_num>
    
  • Testing
    bash tools/dist_test.sh <config>.py <model>.pth <gpu_num> --eval bbox
    

Both WhalesDataset (LiDAR) and WhalesMonoDataset (monocular 3D) are registered. The COCO JSONs emitted by the preprocessing step drive the mono3D path, while the info PKLs drive the LiDAR path. Reported metrics are mAP and NDS.
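
Before launching camera-only training, it may be worth sanity-checking the mono3D JSON. The sketch below assumes the file follows the usual COCO layout with top-level `images`, `annotations`, and `categories` keys; this README does not confirm that layout for WHALES, so treat the key names as assumptions. `summarize_coco` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_coco(path):
    """Count images, annotations, and annotations per category name."""
    with open(path, "r", encoding="utf-8") as f:
        coco = json.load(f)
    # Assumed COCO-style keys; missing keys are treated as empty lists.
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    annotations = coco.get("annotations", [])
    per_class = Counter(
        id_to_name.get(a.get("category_id"), "unknown") for a in annotations
    )
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(annotations),
        "per_class": dict(per_class),
    }


if __name__ == "__main__":
    print(summarize_coco("data/whales/whales_infos_val_mono3d.coco.json"))
```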

Visualization

[Figure: WHALES visualization]

tools/misc/visualize_whales.py can render from all three data representations (raw frame_info.json, info PKL, mono3D COCO):

# Raw CARLA frame_info.json: reconstruct ego-frame boxes + overlay on the 4 cameras + BEV.
python tools/misc/visualize_whales.py frame_info \
    --path data/whales/<scene>/<frame>/frame_info.json --agent vehicle0

# Single entry from the info PKL (one agent-frame).
python tools/misc/visualize_whales.py pkl \
    --path data/whales/whales_infos_val.pkl --token <scene>_<frame>_<agent>

# Batched renders: 2x2 camera grid alongside the BEV, one frame per scene.
python tools/misc/visualize_whales.py pkl_grid \
    --pkls data/whales/whales_infos_{train,val}.pkl \
    --num-per-pkl 20 --one-per-scene --out whales_vis/

# Mono3D COCO renders with 2D bbox + 3D wireframe per annotation.
python tools/misc/visualize_whales.py coco \
    --path data/whales/whales_infos_val_mono3d.coco.json --image-id <image_id>
python tools/misc/visualize_whales.py coco_batch \
    --path data/whales/whales_infos_val_mono3d.coco.json \
    --num-tokens 20 --one-per-scene --out whales_vis_coco/
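
To find a valid value for `--token`, it can be useful to look inside the info PKL first. The internal layout of the PKL is not documented in this README, so the sketch below deliberately assumes nothing about it and only reports the top-level type and keys. `describe_pickle` is a hypothetical helper; as with any pickle, load only files you trust.

```python
import pickle


def describe_pickle(path, max_keys=10):
    """Report the top-level structure of a pickle without assuming its schema."""
    with open(path, "rb") as f:
        obj = pickle.load(f)  # only load pickles from trusted sources
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = list(obj.keys())[:max_keys]
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
        if obj and isinstance(obj[0], dict):
            summary["first_item_keys"] = list(obj[0].keys())[:max_keys]
    return summary


if __name__ == "__main__":
    print(describe_pickle("data/whales/whales_infos_val.pkl"))
```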

Scheduling Algorithms

Agent scheduling pipelines live in ./mmdet3d_plugin/datasets/pipelines/cooperative_perception.py. CAHS prioritizes collaborators by historical coverage and predicted gains.

[Figure: CAHS overview]

Experimental Results

All numbers below are reported as 50m / 100m, the two evaluation ranges used by the WHALES protocol (per-class radial distance from ego).

Stand-alone 3D Object Detection

| Method | AP_Veh ↑ | AP_Ped ↑ | AP_Cyc ↑ | mAP ↑ |
|---|---|---|---|---|
| PointPillars | 67.1 / 41.5 | 38.0 / 6.3 | 37.3 / 11.6 | 47.5 / 19.8 |
| SECOND | 58.5 / 38.8 | 27.1 / 12.1 | 24.1 / 12.9 | 36.6 / 21.2 |
| RegNet | 66.9 / 42.3 | 38.7 / 8.4 | 32.9 / 11.7 | 46.2 / 20.8 |
| VoxelNeXt | 64.7 / 42.3 | 52.2 / 27.4 | 35.9 / 9.0 | 50.9 / 26.2 |
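
The mAP column appears to be the unweighted mean of the three per-class APs. That is an inference from the numbers, not something the README states. A small sketch that parses the `50m / 100m` cells and recomputes the PointPillars row (`parse_cell` and `mean_ap` are illustrative names):

```python
from statistics import mean


def parse_cell(cell):
    """Split a '50m / 100m' table cell such as '67.1 / 41.5' into two floats."""
    near, far = (float(part) for part in cell.split("/"))
    return near, far


def mean_ap(cells):
    """Average the per-class APs separately for the 50 m and 100 m ranges."""
    pairs = [parse_cell(c) for c in cells]
    return mean(p[0] for p in pairs), mean(p[1] for p in pairs)


# PointPillars row: AP_Veh, AP_Ped, AP_Cyc.
pointpillars = ["67.1 / 41.5", "38.0 / 6.3", "37.3 / 11.6"]
m50, m100 = mean_ap(pointpillars)
print(round(m50, 1), round(m100, 1))
```

The recomputed values agree with the reported 47.5 / 19.8 to within rounding.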

Cooperative 3D Object Detection

| Method | AP_Veh ↑ | AP_Ped ↑ | AP_Cyc ↑ | mAP ↑ |
|---|---|---|---|---|
| No Fusion | 67.1 / 41.5 | 38.0 / 6.3 | 37.3 / 11.6 | 47.5 / 19.8 |
| F-Cooper | 75.4 / 52.8 | 50.1 / 9.1 | 44.7 / 20.4 | 56.8 / 27.4 |
| Raw-level Fusion | 71.3 / 48.9 | 38.1 / 8.5 | 40.7 / 16.3 | 50.0 / 24.6 |
| VoxelNeXt | 71.5 / 50.6 | 60.1 / 35.4 | 47.6 / 21.9 | 59.7 / 35.9 |
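
To read the table as gains from cooperation, subtract the No Fusion mAP from each cooperative row. The sketch below does that arithmetic using only the mAP values reported in the table; `gain` is an illustrative helper.

```python
def gain(baseline, method):
    """Absolute mAP difference (method - baseline) at 50 m and 100 m."""
    return tuple(round(m - b, 1) for b, m in zip(baseline, method))


# mAP at (50 m, 100 m), copied from the table.
no_fusion = (47.5, 19.8)
cooperative = {
    "F-Cooper": (56.8, 27.4),
    "Raw-level Fusion": (50.0, 24.6),
    "VoxelNeXt": (59.7, 35.9),
}

gains = {name: gain(no_fusion, scores) for name, scores in cooperative.items()}
for name, (g50, g100) in gains.items():
    print(f"{name}: +{g50} mAP at 50 m, +{g100} mAP at 100 m")
```

Note that the VoxelNeXt row uses a different base detector from the No Fusion row (whose numbers match the stand-alone PointPillars results), so its difference mixes detector and fusion effects.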

Scheduling Studies — Single-Agent Policies

mAP at 50m / 100m. Base detector: VoxelNeXt (LiDAR cooperative). Rows = inference-time policy, columns = training-time policy.

| Inference \ Training | No Fusion | Closest First | Single Random | Multiple Random | Full Communication |
|---|---|---|---|---|---|
| No Fusion (Baseline) | 50.9 / 26.2 | 50.9 / 23.3 | 51.3 / 25.3 | 50.3 / 22.9 | 45.6 / 18.8 |
| Closest First | 39.9 / 20.3 | 58.4 / 30.2 | 58.3 / 32.6 | 57.3 / 30.5 | 55.4 / 10.8 |
| Single Random | 43.3 / 22.8 | 57.9 / 31.0 | 58.4 / 33.3 | 57.7 / 31.4 | 55.0 / 14.6 |
| MASS | 55.5 / 11.0 | 58.8 / 33.7 | 58.9 / 34.0 | 57.3 / 32.3 | 54.1 / 27.4 |
| CAHS (Proposed) | 56.1 / 29.6 | 62.5 / 31.7 | 62.7 / 35.9 | 58.3 / 32.6 | 59.9 / 31.0 |
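
The baseline policy names suggest simple selection rules. The sketch below is one plausible reading of those names, assuming each policy picks collaborator IDs from a dictionary of candidate positions; it is not the implementation in cooperative_perception.py, and the function names, the `rsu0`/`vehicle1`/`vehicle2` IDs, and the 2D positions are all hypothetical. MASS and CAHS are not sketched because this README does not describe them in enough detail.

```python
import math
import random


def closest_first(ego_xy, candidates):
    """Pick the single candidate nearest to the ego position."""
    if not candidates:
        return []
    nearest = min(candidates, key=lambda a: math.dist(ego_xy, candidates[a]))
    return [nearest]


def single_random(candidates, rng):
    """Pick one candidate uniformly at random."""
    ids = sorted(candidates)
    return [rng.choice(ids)] if ids else []


def multiple_random(candidates, k, rng):
    """Pick up to k distinct candidates uniformly at random."""
    ids = sorted(candidates)
    return rng.sample(ids, min(k, len(ids)))


def full_communication(candidates):
    """Use every available candidate."""
    return sorted(candidates)


# Hypothetical candidates: agent id -> (x, y) in metres, ego at the origin.
candidates = {"vehicle1": (10.0, 0.0), "rsu0": (3.0, 4.0), "vehicle2": (-20.0, 5.0)}
rng = random.Random(0)  # seeded so repeated runs select the same agents
print(closest_first((0.0, 0.0), candidates))
print(multiple_random(candidates, 2, rng))
```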

Scheduling Studies — Multi-Agent Policies

mAP at 50m / 100m. Base detector: VoxelNeXt (LiDAR cooperative). Same axes as above.

| Inference \ Training | No Fusion | Closest First | Single Random | Multiple Random | Full Communication |
|---|---|---|---|---|---|
| Multiple Random | 34.5 / 16.9 | 60.7 / 35.1 | 61.2 / 37.1 | 61.4 / 36.4 | 58.8 / 12.9 |
| Full Communication | 29.1 / 10.5 | 63.7 / 38.4 | 63.7 / 39.1 | 64.0 / 41.1 | 65.1 / 39.2 |
| MASS | 54.6 / 13.4 | 64.9 / 39.7 | 65.0 / 40.5 | 63.7 / 40.4 | 63.5 / 36.4 |
| CAHS (Proposed) | 53.7 / 14.2 | 65.3 / 40.1 | 65.1 / 42.0 | 63.9 / 40.6 | 65.2 / 39.2 |

Roadmap

  • Publish dataset and checkpoints on HuggingFace.

Citation

@INPROCEEDINGS{11247472,
  author    = {Wang, Yinsong Richard and Chen, Siwei and Song, Ziyi and Zhou, Sheng},
  title     = {{WHALES: A Multi-Agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving}},
  booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2025},
  pages     = {20487-20493},
  keywords  = {Wireless communication; Three-dimensional displays; Scalability; Whales; Benchmark testing; Metadata; Scheduling; Vehicle dynamics; Vehicle-to-everything; Autonomous vehicles},
  doi       = {10.1109/IROS60139.2025.11247472}
}