DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

July 3, 2026

Welcome to the official implementation of DriveMoE. Follow the instructions below to set up your environment, prepare data, and start training.

🛠️ Installation

1. Prerequisites

CUDA Version: $\ge$ 12.1 (Required)
Python: 3.10+ (Recommended)
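The prerequisites above can be checked before installing. The following is a minimal sketch, not part of the repository: it assumes the CUDA toolkit's `nvcc` is on your `PATH` and that its `--version` output contains a `release X.Y` string. The helper names `parse_cuda_release` and `check_prerequisites` are hypothetical.

```python
import re
import shutil
import subprocess
import sys

MIN_CUDA = (12, 1)     # required, per the prerequisites above
MIN_PYTHON = (3, 10)   # recommended, per the prerequisites above


def parse_cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10+ is recommended")

    nvcc = shutil.which("nvcc")  # assumption: the CUDA toolkit is on PATH
    if nvcc is None:
        problems.append("nvcc not found on PATH; cannot verify the CUDA version")
        return problems

    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    version = parse_cuda_release(result.stdout)
    if version is None:
        problems.append("could not parse the CUDA version from nvcc output")
    elif version < MIN_CUDA:
        problems.append(f"CUDA {version[0]}.{version[1]} found; 12.1 or newer is required")
    return problems
```

Calling `check_prerequisites()` and printing each returned message gives a quick report. A missing `nvcc` is reported as a problem rather than raised, since the driver and toolkit may be installed in a location this sketch does not look at.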

2. Setup

Clone the repository and install the dependencies in editable mode:

git clone https://github.com/Thinklab-SJTU/DriveMoE.git
cd DriveMoE
conda create -n drivemoe python=3.10
conda activate drivemoe
pip install -e .

📂 Data & Checkpoint Preparation

1. Download Assets

Please download the following components to their respective directories:

Datasets: Download the Bench2Drive Dataset, Camera Labels, and Scenario Labels into the data/ directory.
Model Weights: Download the PaliGemma pre-trained weights into the ckpt/ directory.
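Before preprocessing, it can help to confirm that both directories are populated. This is a minimal sketch under a stated assumption: the README does not specify the internal layout of `data/` or `ckpt/`, so the check only verifies that each directory exists and is non-empty. The function name `missing_assets` is hypothetical.

```python
from pathlib import Path


def missing_assets(repo_root=".", required=("data", "ckpt")):
    """Return the names of required directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for name in required:
        directory = root / name
        # A directory counts as present only if it holds at least one entry.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing
```

Run from the repository root, `missing_assets()` returns an empty list once both downloads are in place.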

2. Preprocessing

After organizing the files, run the preprocessing script to prepare the training data:

bash script/generate_data.sh

For OPEN-LOOP evaluation (testing with ground-truth history), you must set HORIZON_SIZE to 20 to ensure a fair comparison with baseline methods. For CLOSED-LOOP evaluation (testing with predicted history), HORIZON_SIZE can be set to any value that suits your requirements.

💡 Note: Changing HORIZON_SIZE requires clearing the previous experiment cache. Simply run: rm -r exp/b2d_action
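The HORIZON_SIZE rule and the cache reset can be sketched as two small helpers. This is an illustration only: the README does not say where HORIZON_SIZE is defined, so the sketch takes it as a plain argument, and the mode names `open_loop` and `closed_loop` as well as both function names are hypothetical.

```python
import shutil
from pathlib import Path

OPEN_LOOP_HORIZON = 20  # value required for open-loop evaluation, per the text above


def validate_horizon(mode, horizon_size):
    """Return horizon_size if it is allowed for the given evaluation mode."""
    if mode not in ("open_loop", "closed_loop"):
        raise ValueError(f"unknown mode: {mode!r}")
    if horizon_size <= 0:
        raise ValueError("HORIZON_SIZE must be positive")
    if mode == "open_loop" and horizon_size != OPEN_LOOP_HORIZON:
        raise ValueError("open-loop evaluation requires HORIZON_SIZE = 20")
    return horizon_size


def clear_experiment_cache(cache_dir="exp/b2d_action"):
    """Delete the experiment cache; return True if something was removed."""
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)  # same effect as `rm -r exp/b2d_action`
        return True
    return False
```

Unlike a bare `rm -r`, `clear_experiment_cache` does not fail when the cache directory is already gone, which makes it safe to call each time HORIZON_SIZE changes.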

🚄 Training & Evaluation

Start Training

Launch training with the provided scripts:

bash script/training/train_drivepi0_closed_loop.sh          # train drivepi0
bash script/training/train_drivemoe_stage1_closed_loop.sh   # train drivemoe stage1
bash script/training/train_drivemoe_stage2_closed_loop.sh   # train drivemoe stage2
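If you prefer to chain the two DriveMoE stages from one entry point, a thin wrapper like the one below may be convenient. It is a sketch under an assumption the README does not state explicitly: that stage 2 should only start after stage 1 has finished successfully. The function names are hypothetical; the script paths are the ones listed above.

```python
import subprocess

STAGE_SCRIPTS = (
    "script/training/train_drivemoe_stage1_closed_loop.sh",
    "script/training/train_drivemoe_stage2_closed_loop.sh",
)


def stage_commands(scripts=STAGE_SCRIPTS):
    """Build one `bash <script>` command per training stage, in order."""
    return [["bash", script] for script in scripts]


def run_stages(dry_run=True):
    """Run the stages sequentially; with dry_run=True, only list the commands."""
    executed = []
    for command in stage_commands():
        if not dry_run:
            # check=True raises CalledProcessError and stops before the next stage
            subprocess.run(command, check=True)
        executed.append(" ".join(command))
    return executed
```

`run_stages()` defaults to a dry run and returns the command strings, so you can inspect them first. Pass `dry_run=False` from the repository root to launch training.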

Citation

@InProceedings{Yang_2026_CVPR,
    author    = {Yang, Zhenjie and Chai, Yilin and Jia, Xiaosong and Li, Qifeng and Shao, Yuqian and Zhu, Xuekai and Su, Haisheng and Yan, Junchi},
    title     = {DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {10678-10688}
}
