DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
March 24, 2025
Project Page | Paper
News
- [2025/1/19] Released the NTA-IoU and NTL-IoU code.
- [2024/10/17] Initialized the repository.
Abstract
Closed-loop simulation is essential for advancing end-to-end autonomous driving systems. Contemporary sensor simulation methods, such as NeRF and 3DGS, rely predominantly on conditions closely aligned with the training data distribution, which is largely confined to forward-driving scenarios. Consequently, these methods struggle to render complex maneuvers (e.g., lane changes, acceleration, and deceleration). Recent advances in autonomous-driving world models have demonstrated the potential to generate diverse driving videos. However, these approaches remain constrained to 2D video generation and inherently lack the spatiotemporal coherence required to capture the intricacies of dynamic driving environments. In this paper, we introduce DriveDreamer4D, which enhances 4D driving scene representation by leveraging world model priors. Specifically, we use the world model as a data machine to synthesize novel trajectory videos, where structured conditions are explicitly leveraged to control the spatiotemporal consistency of traffic elements. In addition, we propose a cousin data training strategy that facilitates merging real and synthetic data for optimizing 4DGS. To our knowledge, DriveDreamer4D is the first to use video generation models to improve 4D reconstruction in driving scenarios. Experimental results show that DriveDreamer4D significantly enhances generation quality under novel trajectory views, achieving relative FID improvements of 32.1%, 46.4%, and 16.3% over PVG, S3Gaussian, and Deformable-GS, respectively. Moreover, DriveDreamer4D markedly enhances the spatiotemporal coherence of driving agents, as verified by a comprehensive user study and by relative NTA-IoU increases of 22.6%, 43.5%, and 15.6%.
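The relative improvements quoted in the abstract are percentages of the baseline score. The formula is not spelled out here, so the sketch below assumes the common definition: for FID (lower is better) the improvement is (baseline - ours) / baseline, and for NTA-IoU (higher is better) it is (ours - baseline) / baseline. The function name and the demo scores are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline: float, ours: float, lower_is_better: bool) -> float:
    """Relative improvement of `ours` over `baseline`, in percent.

    Assumed definition (not confirmed by the paper):
      lower-is-better metrics (e.g. FID):      (baseline - ours) / baseline
      higher-is-better metrics (e.g. NTA-IoU): (ours - baseline) / baseline
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (baseline - ours) if lower_is_better else (ours - baseline)
    return 100.0 * delta / baseline


if __name__ == "__main__":
    # Hypothetical scores for illustration only; these are NOT numbers from the paper.
    print(f"FID:     {relative_improvement(50.0, 40.0, lower_is_better=True):.1f}%")
    print(f"NTA-IoU: {relative_improvement(0.40, 0.50, lower_is_better=False):.1f}%")
```

Under this assumed definition, a 32.1% FID improvement means the DriveDreamer4D variant's FID is roughly 0.679 times that of the corresponding baseline.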
DriveDreamer4D Framework
Install
```bash
conda create -n drivedreamer4d python=3.8
conda activate drivedreamer4d
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirments.txt
pip install ./submodules/gsplat-1.3.0
pip install git+https://github.com/facebookresearch/pytorch3d.git
pip install ./submodules/nvdiffrast
pip install ./submodules/smplx
```
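After installation, a quick check that the pinned PyTorch packages resolved to the expected versions can save debugging time later. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). It only covers the three pins from the pip command above; it does not check gsplat, pytorch3d, nvdiffrast, smplx, or whether CUDA is actually usable. The function name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Version pins copied from the pip command above.
EXPECTED: Dict[str, str] = {"torch": "2.4.1", "torchvision": "0.19.1", "torchaudio": "2.4.1"}


def check_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for each package that is missing or has a different version."""
    problems: Dict[str, str] = {}
    for pkg, want in expected.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        # CUDA wheels carry a local tag such as "2.4.1+cu121"; compare only the part before "+".
        if got.split("+")[0] != want:
            problems[pkg] = f"found {got}, expected {want}"
    return problems


if __name__ == "__main__":
    issues = check_versions(EXPECTED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    print("environment OK" if not issues else f"{len(issues)} problem(s) found")
```

Stripping the local version tag is a deliberate choice: the cu121 index serves wheels whose reported version includes `+cu121`, which would otherwise never match the plain pin.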
Prepare
NOTE: To facilitate metric calculation, we have updated our data processing code. If you downloaded our dataset before this update, simply download the label.pkl file from the provided link and place it in the ./data/waymo/processed/validation/005 directory. Likewise, if you downloaded our checkpoint before this update, move the checkpoint and config file to the ./exp/pvg_example/005 directory.
Download the data (Baidu, Google) and extract it to the ./data/waymo/ directory.
Download the checkpoint (Baidu, Google) to ./exp/pvg_example/005.
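Before rendering, it can help to confirm that the files landed where the steps above expect them. The sketch below assumes the layout described in the NOTE (label.pkl under ./data/waymo/processed/validation/<scene_id>/ and the checkpoint under ./exp/pvg_example/<scene_id>/); the checkpoint file name `checkpoint_final.pth` is taken from the Render command. The config file name is not stated in this README, so it is not checked. `missing_paths` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def missing_paths(data_root: str = "./data/waymo/processed/validation",
                  exp_root: str = "./exp",
                  exp_name: str = "pvg_example",
                  scene_id: str = "005",
                  ckpt_name: str = "checkpoint_final.pth") -> List[str]:
    """Return the expected files that do not exist; an empty list means the layout looks right."""
    expected = [
        Path(data_root) / scene_id / "label.pkl",          # location given in the NOTE
        Path(exp_root) / exp_name / scene_id / ckpt_name,  # assumed checkpoint location
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_paths()
    print("layout OK" if not missing else "missing:\n  " + "\n  ".join(missing))
```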
To generate label.pkl for other scenes, run the script at datasets/prepare_data_label.py:
```bash
python datasets/prepare_data_label.py --data_root /PATH/TO/YOUR/WAYMO/SOURCE/DATA --scene_ids specific_scene_ids
```
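This README does not document the internal structure of label.pkl. If you want to see what `prepare_data_label.py` produced, a generic inspector like the sketch below prints container sizes, leaf types, and array shapes without assuming any particular schema. The helper names are hypothetical; unpickling may also require numpy (or other libraries) if the file contains their objects.

```python
import pickle
from typing import Any, List


def summarize(obj: Any, depth: int = 0, max_depth: int = 2, max_items: int = 5) -> List[str]:
    """Describe a nested object as indented lines: container sizes, leaf types and shapes."""
    pad = "  " * depth
    if isinstance(obj, dict):
        lines = [f"{pad}dict ({len(obj)} keys)"]
        children = [(repr(k), v) for k, v in list(obj.items())[:max_items]]
    elif isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} ({len(obj)} items)"]
        children = [(f"[{i}]", v) for i, v in enumerate(obj[:max_items])]
    else:
        shape = getattr(obj, "shape", None)  # numpy arrays and torch tensors expose .shape
        suffix = f" shape={tuple(shape)}" if shape is not None else ""
        return [f"{pad}{type(obj).__name__}{suffix}"]
    # Stop descending once max_depth is reached, so large files stay readable.
    if depth < max_depth:
        for label, child in children:
            lines.append(f"{pad}  {label}:")
            lines.extend(summarize(child, depth + 1, max_depth, max_items))
    return lines


def summarize_file(path: str) -> str:
    """Load a pickle file and return its summary. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return "\n".join(summarize(pickle.load(f)))
```

For example, `print(summarize_file("./data/waymo/processed/validation/005/label.pkl"))` shows the top two levels of the file.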
Render
```bash
python tool/eval.py --resume_from ./exp/pvg_example/005/checkpoint_final.pth
```
Metrics
```bash
# calculate NTL-IoU
python utils/metrics/NTL-IoU/get_NTL-IoU.py --model_path /PATH/TO/YOUR/TWINLITENET/MODEL --exp_name pvg_example --exp_root ./exp --scene_ids 005 --data_root ./data/waymo/processed/validation --save_root ./results
# calculate NTA-IoU
python utils/metrics/NTL-IoU/get_NTA-IoU.py --model_path /PATH/TO/YOUR/YOLO11/MODEL --exp_name pvg_example --exp_root ./exp --scene_ids 005 --data_root ./data/waymo/processed/validation --save_root ./results
```
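Both metrics are IoU-based: the commands pass a YOLO11 detector for NTA-IoU (agents) and a TwinLiteNet lane model for NTL-IoU (lanes), which suggests that detections or lane masks on rendered novel-trajectory frames are compared against reference boxes or masks. The exact matching and averaging done by get_NTA-IoU.py and get_NTL-IoU.py is not described in this README, so the sketch below only shows standard box IoU and binary-mask IoU, plus a hypothetical best-match averaging step. It is not the authors' implementation.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_best_iou(ref_boxes: Sequence[Box], det_boxes: Sequence[Box]) -> float:
    """Hypothetical aggregation: best-matching detection IoU per reference box, averaged."""
    if not ref_boxes:
        return 0.0
    best = [max((box_iou(r, d) for d in det_boxes), default=0.0) for r in ref_boxes]
    return sum(best) / len(best)


def mask_iou(pred: Sequence[int], ref: Sequence[int]) -> float:
    """IoU of two binary masks flattened to equal-length 0/1 sequences (e.g. lane pixels)."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    union = sum(1 for p, r in zip(pred, ref) if p or r)
    return inter / union if union else 0.0


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))                          # 1/7
    print(mean_best_iou([(0, 0, 2, 2)], [(0, 0, 2, 2), (5, 5, 6, 6)]))  # 1.0
    print(mask_iou([1, 1, 0, 0], [1, 0, 1, 0]))                         # 1/3
```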
Scenario Selection
All selected scenes are sourced from the validation set of the Waymo Open Dataset. Their official file names are listed below, along with the start and end frame of each clip.
| Scene | Start Frame | End Frame |
|---|---|---|
| segment-10359308928573410754_720_000_740_000_with_camera_labels.tfrecord | 120 | 159 |
| segment-12820461091157089924_5202_916_5222_916_with_camera_labels.tfrecord | 0 | 39 |
| segment-15021599536622641101_556_150_576_150_with_camera_labels.tfrecord | 0 | 39 |
| segment-16767575238225610271_5185_000_5205_000_with_camera_labels.tfrecord | 0 | 39 |
| segment-17152649515605309595_3440_000_3460_000_with_camera_labels.tfrecord | 60 | 99 |
| segment-17860546506509760757_6040_000_6060_000_with_camera_labels.tfrecord | 90 | 129 |
| segment-2506799708748258165_6455_000_6475_000_with_camera_labels.tfrecord | 80 | 119 |
| segment-3015436519694987712_1300_000_1320_000_with_camera_labels.tfrecord | 40 | 79 |
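If you script over these scenes, keeping the table as data avoids copy errors. The sketch below stores the rows verbatim, strips the `segment-` prefix and `_with_camera_labels.tfrecord` suffix to get what, to our understanding, Waymo calls the segment's context name, and computes clip lengths under the assumption that the end frame is inclusive (which gives 40 frames for every scene). The names `SCENES`, `context_name`, and `clip_ranges` are hypothetical.

```python
from typing import Dict, List, Tuple

# (Waymo file name, start frame, end frame), copied from the table above.
SCENES: List[Tuple[str, int, int]] = [
    ("segment-10359308928573410754_720_000_740_000_with_camera_labels.tfrecord", 120, 159),
    ("segment-12820461091157089924_5202_916_5222_916_with_camera_labels.tfrecord", 0, 39),
    ("segment-15021599536622641101_556_150_576_150_with_camera_labels.tfrecord", 0, 39),
    ("segment-16767575238225610271_5185_000_5205_000_with_camera_labels.tfrecord", 0, 39),
    ("segment-17152649515605309595_3440_000_3460_000_with_camera_labels.tfrecord", 60, 99),
    ("segment-17860546506509760757_6040_000_6060_000_with_camera_labels.tfrecord", 90, 129),
    ("segment-2506799708748258165_6455_000_6475_000_with_camera_labels.tfrecord", 80, 119),
    ("segment-3015436519694987712_1300_000_1320_000_with_camera_labels.tfrecord", 40, 79),
]

PREFIX, SUFFIX = "segment-", "_with_camera_labels.tfrecord"


def context_name(file_name: str) -> str:
    """Strip the 'segment-' prefix and '_with_camera_labels.tfrecord' suffix."""
    if not (file_name.startswith(PREFIX) and file_name.endswith(SUFFIX)):
        raise ValueError(f"unexpected Waymo file name: {file_name}")
    return file_name[len(PREFIX):-len(SUFFIX)]


def clip_ranges() -> Dict[str, Tuple[int, int, int]]:
    """Map context name -> (start, end, frame count), assuming the end frame is inclusive."""
    return {context_name(f): (s, e, e - s + 1) for f, s, e in SCENES}


if __name__ == "__main__":
    for name, (start, end, n) in clip_ranges().items():
        print(f"{name}: frames {start}-{end} ({n} frames)")
```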
Rendering Results on Lane-Change Novel Trajectories
Comparison of novel-trajectory renderings in lane-change scenarios. The left column shows PVG, S3Gaussian, and Deformable-GS; the right column shows DriveDreamer4D-PVG, DriveDreamer4D-S3Gaussian, and DriveDreamer4D-Deformable-GS.
Rendering Results on Speed-Change Novel Trajectories
Comparison of novel-trajectory renderings in speed-change scenarios. The left column shows PVG, S3Gaussian, and Deformable-GS; the right column shows DriveDreamer4D-PVG, DriveDreamer4D-S3Gaussian, and DriveDreamer4D-Deformable-GS.
Acknowledgements
We thank the following works and projects for their open research and exploration: DriveStudio, DriveDreamer, DriveDreamer-2, and DriveDreamer4D.
BibTeX
If this work is helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@article{zhao2024drive,
  title={DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation},
  author={Guosheng Zhao and Chaojun Ni and Xiaofeng Wang and Zheng Zhu and Xueyang Zhang and Yida Wang and Guan Huang and Xinze Chen and Boyuan Wang and Youyi Zhang and Wenjun Mei and Xingang Wang},
  journal={arXiv preprint arXiv:2410.13571},
  year={2024}
}
```