
March 28, 2025

MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction


Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li,
Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang


🔆 News

🔥🔥 (2025.03) Check out our latest works on generative world models: UniScene, DiST-4D, and HERMES.

🔥🔥 (2025.03) The data processing code is released!

🔥🔥 (2025.03) The training and inference code for the Multi-modal Diffusion model is available NOW!!!

🔥🔥 (2025.03) The paper is on arXiv: MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction

📝 TODO List

  • Release data processing code.
  • Release the pretrained model.
  • Release training / inference code.

👀 Abstract

Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods degrade substantially when the viewpoint deviates significantly from the training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, a framework that integrates a Multi-modal Diffusion model with Gaussian Splatting (GS) for urban scene reconstruction. MuDG conditions a multi-modal video diffusion model on aggregated LiDAR point clouds together with RGB and geometric priors, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, and it provides comprehensive supervision signals for refining 3DGS representations, making rendering more robust under extreme viewpoint changes. Experiments on the Waymo Open Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.

🧰 Models

| Model | Resolution (H x W) | Checkpoint |
|---|---|---|
| MDM1024 | 576x1024 | Hugging Face |
| MDM512 | 320x512 | Hugging Face |

⚙️ Setup

conda create -n mudg python=3.8.5
conda activate mudg
pip install -r requirements.txt

💫 Inference for Novel Viewpoints

1. Sparse Condition Generation

We project the fused point clouds onto novel viewpoints to generate sparse color and depth maps.

Note: The detailed data processing steps can be found in the Data Processing section.

For your convenience, we have also provided pre-processed data. You can access it via this link.
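The projection step above can be illustrated with a minimal NumPy sketch. It is not the repository's data processing code: `aggregate_point_clouds` and `project_to_sparse_maps` are hypothetical helpers, and the sketch assumes per-point RGB is already attached to the LiDAR points, poses are 4x4 homogeneous matrices, and the camera follows an OpenCV-style pinhole model (z forward). When several points land on the same pixel, only the nearest one is kept (a simple z-buffer).

```python
import numpy as np


def aggregate_point_clouds(frames):
    """Fuse per-frame LiDAR scans into a single world-frame point cloud.

    Hypothetical helper. Each frame is (points, colors, ego_to_world):
    points (N, 3) in the ego frame, colors (N, 3) RGB, ego_to_world (4, 4).
    """
    all_points, all_colors = [], []
    for points, colors, ego_to_world in frames:
        homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
        all_points.append((ego_to_world @ homo.T).T[:, :3])
        all_colors.append(colors)
    return np.concatenate(all_points), np.concatenate(all_colors)


def project_to_sparse_maps(points, colors, K, world_to_cam, height, width):
    """Project a colored point cloud into one camera with a per-pixel z-buffer.

    Hypothetical helper. Assumes an OpenCV-style pinhole camera (z forward).
    Returns a sparse color map (H, W, 3) and a depth map (H, W); pixels that
    no point lands on stay 0.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (world_to_cam @ homo.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6  # drop points behind the camera
    cam, colors = cam[in_front], colors[in_front]

    uvw = (K @ cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]

    # Sort near-to-far, then keep the first (nearest) point for each pixel.
    order = np.argsort(z, kind="stable")
    u, v, z, colors = u[order], v[order], z[order], colors[order]
    _, first = np.unique(v * width + u, return_index=True)

    depth = np.zeros((height, width), dtype=np.float32)
    color = np.zeros((height, width, 3), dtype=np.float32)
    depth[v[first], u[first]] = z[first]
    color[v[first], u[first]] = colors[first]
    return color, depth
```

Calling `project_to_sparse_maps` once per novel camera pose yields one pair of sparse color and depth maps per view. Note that a static aggregation like this smears moving objects across frames, so a practical pipeline needs extra handling for dynamic content.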

2. Generate the item list

python virtual_render/generate_virtual_item.py

3. Multi-modal Diffusion

  1. Download the pretrained model for the resolution you need and save its model.ckpt as checkpoints/[1024|512]_mdm/[1024|512]-mdm-checkpoint.ckpt.
  2. Run the inference script in a terminal, adjusting the command to your devices and needs:
  sh virtual_render/scripts/render.sh 15365

Here 15365 is an item ID; replace it with any item ID from the generated item list.
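To render several items in one go, a small wrapper can check the checkpoint location described above and call `render.sh` once per item ID. This is a hypothetical convenience sketch, not part of the repository: `checkpoint_path`, `render_commands`, and `run_all` are illustrative names, and it assumes `render.sh` takes the item ID as its only argument, as in the command above. Which checkpoint `render.sh` actually loads is configured inside that script; the `model` argument here only selects which file is checked.

```python
import subprocess
from pathlib import Path

# Model sizes and (H, W) resolutions from the Models table.
MODEL_RESOLUTIONS = {512: (320, 512), 1024: (576, 1024)}


def checkpoint_path(model: int, root: str = "checkpoints") -> Path:
    """Expected location of the MDM checkpoint for a given model size."""
    if model not in MODEL_RESOLUTIONS:
        raise ValueError(f"unknown model {model}; expected one of {sorted(MODEL_RESOLUTIONS)}")
    return Path(root) / f"{model}_mdm" / f"{model}-mdm-checkpoint.ckpt"


def render_commands(item_ids, script="virtual_render/scripts/render.sh"):
    """Build one `sh render.sh <item_id>` command per item (assumed CLI)."""
    return [["sh", script, str(item_id)] for item_id in item_ids]


def run_all(item_ids, model=1024, dry_run=True):
    """Print each command; with dry_run=False, also check the checkpoint and run them."""
    ckpt = checkpoint_path(model)
    if not dry_run and not ckpt.is_file():
        raise FileNotFoundError(f"missing checkpoint: {ckpt}")
    for cmd in render_commands(item_ids):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing item
```

For example, `run_all([15365])` only prints `sh virtual_render/scripts/render.sh 15365`; pass `dry_run=False` to actually launch the renders.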

💥 Training

Novel View Generation

  1. Process the data and generate the item list.
  2. Generate the training data list:
  python data/create_data_infos.py
  3. Download the pretrained DynamiCrafter512 model and save its model.ckpt as checkpoints/512_mdm/512-mdm-checkpoint.ckpt.
  4. Train the 320x512 model (stage 1):
  sh configs/stage1-512_mdm_waymo/run-512.sh
  5. Then train the 576x1024 model (stage 2):
  sh configs/stage2-1024_mdm_waymo/run-1024.sh
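The training steps above run strictly in sequence, so a minimal launcher can verify the prerequisites and stop at the first failure. This is an illustrative sketch rather than a script shipped with the repository: `missing_prerequisites` and `train_all` are hypothetical names, and it assumes each script runs from the repository root without extra arguments. Data processing (step 1) is assumed to be done already.

```python
import subprocess
from pathlib import Path

# Steps 2, 4 and 5 from the list above, in the order they must run.
STEPS = [
    ("generate train data list", ["python", "data/create_data_infos.py"]),
    ("stage 1 (320x512)", ["sh", "configs/stage1-512_mdm_waymo/run-512.sh"]),
    ("stage 2 (576x1024)", ["sh", "configs/stage2-1024_mdm_waymo/run-1024.sh"]),
]
# Step 3: the DynamiCrafter512 weights, saved under the expected name.
INIT_CKPT = Path("checkpoints/512_mdm/512-mdm-checkpoint.ckpt")


def missing_prerequisites(root="."):
    """Return the required files (checkpoint and scripts) missing under `root`."""
    required = [INIT_CKPT] + [Path(cmd[1]) for _, cmd in STEPS]
    return [p for p in required if not (Path(root) / p).is_file()]


def train_all(root=".", run=subprocess.run):
    """Run every step in order; `run` is injectable for testing."""
    missing = missing_prerequisites(root)
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(str(p) for p in missing))
    for name, cmd in STEPS:
        print(f"running {name}: {' '.join(cmd)}")
        # With subprocess.run, check=True raises on failure, so later steps never start.
        run(cmd, cwd=root, check=True)
```

Keeping the steps in one ordered list makes it hard to start stage 2 before stage 1 has finished successfully.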

📜 License

This repository is released under the Apache 2.0 license.

😉 Citation

Please consider citing our paper if you find our code useful:

@article{zou2025mudg,
  title={MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction},
  author={Zou, Yingshuang and Ding, Yikang and Zhang, Chuanrui and Guo, Jiazhe and Li, Bohan and Lyu, Xiaoyang and Tan, Feiyang and Qi, Xiaojuan and Wang, Haoqian},
  journal={arXiv preprint arXiv:2503.10604},
  year={2025}
}

🙏 Acknowledgements

We would like to thank the contributors of the following repositories for their valuable contributions to the community: