DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

March 31, 2026 ยท View on GitHub

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

Project Page arXiv Hugging Face

Welcome to the official implementation of DriveMoE. Follow the instructions below to set up your environment, prepare data, and start training.

๐Ÿ› ๏ธ Installation

1. Prerequisites

  • CUDA Version: โ‰ฅ\ge 12.1 (Required)
  • Python: 3.10+ (Recommended)

2. Setup

Clone the repository and install the dependencies in editable mode:

git clone https://github.com/Thinklab-SJTU/DriveMoE.git
cd DriveMoE
conda create -n drivemoe python=3.10
conda activate drivemoe
pip install -e .

๐Ÿ“‚ Data & Checkpoint Preparation

1. Download Assets

Please download the following components to their respective directories:

2.Preprocessing

After organizing the files, run the preprocessing script to prepare the training data:

bash script/generate_data.sh

For OPEN-LOOP evaluation (testing with ground truth history), you must be set HORIZON_SIZE to 20 to ensure fair comparison with baseline methods. For CLOSED-LOOP evaluation (testing with predicted history), it can be set to any value based on your requirements.

๐Ÿ’ก Note: Changing HORIZON_SIZE requires clearing the previous experiment cache. Simply run: rm -r exp/b2d_action

๐Ÿš„ Training & Evaluation

Start Training

Launch the training process using the provided script:

bash script/training/train_drivepi0_closed_loop.sh          # train drivepi0
bash script/training/train_drivemoe_stage1_closed_loop.sh   # train drivemoe stage1
bash script/training/train_drivemoe_stage2_closed_loop.sh   # train drivemoe stage2

Evaluation

We support both Open-loop and Closed-loop evaluations. For detailed evaluation steps, please refer to Evaluation on Bench2Drive.

Acknowledgments

This project has been developed based on the following pioneering works on GitHub repositories. We express our profound gratitude for these foundational resources:

Citation

@article{yang2025drivemoe,
      title={DriveMoE: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving},
      author={Yang, Zhenjie and Chai, Yilin and Jia, Xiaosong and Li, Qifeng and Shao, Yuqian and Zhu, Xuekai and Su, Haisheng and Yan, Junchi},
      journal={arXiv preprint arXiv:2505.16278},
      year={2025}
}