MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving

March 14, 2025

📑 arXiv link: https://arxiv.org/pdf/2409.07267

Citation

To cite our work, please use the following BibTeX entry:

@article{zhang2024minidrive,
  title={MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving},
  author={Zhang, Enming and Dai, Xingyuan and Lv, Yisheng and Miao, Qinghai},
  journal={arXiv preprint arXiv:2409.07267},
  year={2024}
}