DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
March 31, 2026
Welcome to the official implementation of DriveMoE. Follow the instructions below to set up your environment, prepare data, and start training.
Installation
1. Prerequisites
- CUDA Version: 12.1 (Required)
- Python: 3.10+ (Recommended)
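Before setting up the environment, it can help to confirm what is already installed. The snippet below is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper that relies on GNU `sort -V`, and it assumes `python3` and `nvcc` are the executables on your `PATH`.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report the Python version and compare it against the recommended minimum.
if command -v python3 >/dev/null 2>&1; then
    py_version="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_ge "$py_version" "3.10"; then
        echo "Python $py_version: OK"
    else
        echo "Python $py_version: older than the recommended 3.10"
    fi
else
    echo "python3 not found on PATH"
fi

# Print the CUDA toolkit release line, if nvcc is installed.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep -i "release"
else
    echo "nvcc not found on PATH"
fi
```

The CUDA line is only printed, not compared, so check by eye that it reports release 12.1.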
2. Setup
Clone the repository and install the dependencies in editable mode:
git clone https://github.com/Thinklab-SJTU/DriveMoE.git
cd DriveMoE
conda create -n drivemoe python=3.10
conda activate drivemoe
pip install -e .
Data & Checkpoint Preparation
1. Download Assets
Please download the following components to their respective directories:
- Datasets: Download the Bench2Drive Dataset, Camera Labels, and Scenario Labels into the data/ directory.
- Model Weights: Download the PaliGemma pre-trained weights into the ckpt/ directory.
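A quick sanity check can catch a missing download before preprocessing starts. The sketch below only verifies that `data/` and `ckpt/` exist and are non-empty. It does not know the expected file layout inside them, and `dir_has_content` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper: succeeds when directory $1 exists and contains at least one entry.
dir_has_content() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# data/ and ckpt/ are the directories named above; run this from the repository root.
for dir in data ckpt; do
    if dir_has_content "$dir"; then
        echo "$dir/: found"
    else
        echo "$dir/: missing or empty"
    fi
done
```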
2. Preprocessing
After organizing the files, run the preprocessing script to prepare the training data:
bash script/generate_data.sh
For OPEN-LOOP evaluation (testing with ground-truth history), you must set HORIZON_SIZE to 20 to ensure a fair comparison with baseline methods. For CLOSED-LOOP evaluation (testing with predicted history), HORIZON_SIZE can be set to any value that suits your requirements.
Note: Changing HORIZON_SIZE requires clearing the previous experiment cache. To do so, run: rm -r exp/b2d_action
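Plain `rm -r` exits with an error when the cache directory does not exist yet, for example on a fresh clone. A guarded variant might look like the following sketch. `clear_cache` is a hypothetical helper, and the only path it is given is the `exp/b2d_action` directory named in the note.

```shell
# Hypothetical helper: remove a cache directory if it exists, and report what happened.
clear_cache() {
    cache_dir="$1"
    if [ -d "$cache_dir" ]; then
        rm -r "$cache_dir"
        echo "removed $cache_dir"
    else
        echo "no cache at $cache_dir, nothing to remove"
    fi
}

# exp/b2d_action is the cache path given in the note; run this from the repository root.
clear_cache exp/b2d_action
```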
Training & Evaluation
Start Training
Launch training with the script for the model you want to train:
bash script/training/train_drivepi0_closed_loop.sh # train drivepi0
bash script/training/train_drivemoe_stage1_closed_loop.sh # train drivemoe stage1
bash script/training/train_drivemoe_stage2_closed_loop.sh # train drivemoe stage2
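If you want to chain the two DriveMoE stages unattended, a small wrapper can stop the run as soon as one stage fails. This is a sketch under an assumption the script names suggest but this README does not state: that stage 2 should start only after stage 1 has finished successfully. `run_in_order` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: run each script in order and stop at the first failure.
run_in_order() {
    for script in "$@"; do
        echo "running $script"
        if ! bash "$script"; then
            echo "failed: $script" >&2
            return 1
        fi
    done
}

# Usage from the repository root (paths are the ones listed above):
# run_in_order script/training/train_drivemoe_stage1_closed_loop.sh \
#              script/training/train_drivemoe_stage2_closed_loop.sh
```

The call is left commented out so that sourcing the snippet does not start a training run by accident.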
Evaluation
We support both Open-loop and Closed-loop evaluations. For detailed evaluation steps, please refer to Evaluation on Bench2Drive.
Acknowledgments
This project builds on several pioneering open-source works on GitHub. We express our profound gratitude for these foundational resources.
Citation
@article{yang2025drivemoe,
title={DriveMoE: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving},
author={Yang, Zhenjie and Chai, Yilin and Jia, Xiaosong and Li, Qifeng and Shao, Yuqian and Zhu, Xuekai and Su, Haisheng and Yan, Junchi},
journal={arXiv preprint arXiv:2505.16278},
year={2025}
}