DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
December 16, 2024 · View on GitHub
This repository contains the implementation of the paper:
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang*, Chengjian Feng*, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang†, Lin Ma†
*Equal Contribution †Corresponding Authors
:fire: Updates
:sparkles: Highlights
🔥 We propose DriveMM, a novel all-in-one large multimodal model equipped with both the general capabilities to perform a wide range of autonomous driving (AD) tasks and the generalization ability to transfer effectively to new datasets.
:checkered_flag: Getting Started
Installation
1. Clone this repository and navigate to the DriveMM folder:
git clone https://github.com/zhijian11/DriveMM
cd DriveMM
2. Install the inference package:
conda create -n drivemm python=3.10 -y
conda activate drivemm
pip install --upgrade pip # Enable PEP 660 support.
pip install -e ".[train]"
3. Run the DriveMM inference demo:
- Download the checkpoints and place them in the ckpt/ folder.
cd scripts/inference_demo
python demo_image.py # for image input
python demo_video.py # for video input
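The demo scripts pair camera input with a text query. As a rough sketch of the prompt-assembly step (the `<image>` placeholder token, the camera list, and the function name here are assumptions for illustration, not DriveMM's actual API), a multi-view driving prompt might be built like this:

```python
# Hypothetical sketch: assemble a multi-view prompt for a surround-camera
# driving scene. The "<image>" placeholder token and CAMERAS list are
# illustrative assumptions, not part of the DriveMM codebase.

CAMERAS = ["CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
           "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT"]

def build_prompt(question: str, num_views: int = len(CAMERAS)) -> str:
    """Prefix the question with one image placeholder per camera view."""
    placeholders = "".join("<image>\n" for _ in range(num_views))
    return placeholders + question

# Usage: one placeholder is emitted per camera before the question text.
prompt = build_prompt("Is it safe to change into the left lane?")
```

The actual tokenization and image preprocessing are handled inside the demo scripts; this only illustrates how a multi-view question is typically interleaved with per-camera image tokens.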
:white_check_mark: TODO
- DriveMM models
- DriveMM inference code
- DriveMM evaluation code
- DriveMM training data
- DriveMM training code
:blush: Acknowledgements
This project builds on excellent open-source work, in particular LLaVA-NeXT. Thanks for their wonderful work and contributions to the community.
:pushpin: Citation
If you find DriveMM helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@article{huang2024drivemm,
title={DriveMM: All-in-One Large Multimodal Model for Autonomous Driving},
author={Huang, Zhijian and Feng, Chengjian and Yan, Feng and Xiao, Baihui and Jie, Zequn and Zhong, Yujie and Liang, Xiaodan and Ma, Lin},
journal={arXiv preprint arXiv:2412.07689},
year={2024}
}