DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
December 16, 2024 · View on GitHub
This repository contains the implementation of the paper:
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang*, Chengjian Feng*, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang†, Lin Ma†
*Equal Contribution †Corresponding Authors
:fire: Updates
:sparkles: Highlights
🔥 We propose DriveMM, a novel all-in-one large multimodal model equipped with both the general capabilities to perform a wide range of autonomous driving (AD) tasks and the generalization ability to transfer effectively to new datasets.
:checkered_flag: Getting Started
Installation
1. Clone this repository and navigate to the DriveMM folder:
git clone https://github.com/zhijian11/DriveMM
cd DriveMM
2. Install the inference package:
conda create -n drivemm python=3.10 -y
conda activate drivemm
pip install --upgrade pip # Enable PEP 660 support.
pip install -e ".[train]"
3. Run the DriveMM inference demo:
- Download the checkpoints and place them in the ckpt/ folder.
cd scripts/inference_demo
python demo_image.py # for image input
python demo_video.py # for video input
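The demo scripts pair camera input with a text query. As a rough sketch of the prompt-assembly step (the `<image>` placeholder token, the camera list, and the function name here are assumptions for illustration, not DriveMM's actual API), a multi-view driving prompt might be built like this:

```python
# Hypothetical sketch: assemble a multi-view prompt for a surround-camera
# driving scene. The "<image>" placeholder token and CAMERAS list are
# illustrative assumptions, not part of the DriveMM codebase.

CAMERAS = ["CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
           "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT"]

def build_prompt(question: str, num_views: int = len(CAMERAS)) -> str:
    """Prefix the question with one image placeholder per camera view."""
    placeholders = "".join("<image>\n" for _ in range(num_views))
    return placeholders + question

# Usage: one placeholder is emitted per camera before the question text.
prompt = build_prompt("Is it safe to change into the left lane?")
```

The actual tokenization and image preprocessing are handled inside the demo scripts; this only illustrates how a multi-view question is typically interleaved with per-camera image tokens.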
:white_check_mark: TODO
- DriveMM models
- DriveMM inference code
- DriveMM evaluation code
- DriveMM training data
- DriveMM training code
:blush: Acknowledgements
This project builds on excellent open-source work, in particular LLaVA-NeXT. Thanks for their wonderful work and contributions to the community.
:pushpin: Citation
If you find DriveMM helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@article{huang2024drivemm,
title={DriveMM: All-in-One Large Multimodal Model for Autonomous Driving},
author={Huang, Zhijian and Feng, Chengjian and Yan, Feng and Xiao, Baihui and Jie, Zequn and Zhong, Yujie and Liang, Xiaodan and Ma, Lin},
journal={arXiv preprint arXiv:2412.07689},
year={2024}
}