BadVLMDriver

April 29, 2024 · View on GitHub

Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models

Arxiv | Project Page

04/29/2024 update: We release the preprint paper in arxiv, and the code is coming soon!

Abstract

Vision-Large-Language-models(VLMs) have great application prospects in autonomous driving. Despite the ability of VLMs to comprehend and make decisions in complex scenarios, their integration into safety-critical autonomous driving systems poses serious security risks. In this paper, we propose BadVLMDriver, the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects. Unlike existing backdoor attacks against VLMs that rely on digital modifications, BadVLMDriver uses common physical items, such as a red balloon, to induce unsafe actions like sudden acceleration, highlighting a significant real-world threat to autonomous vehicle safety. To execute BadVLMDriver, we develop an automated pipeline utilizing natural language instructions to generate backdoor training samples with embedded malicious behaviors. This approach allows for flexible trigger and behavior selection, enhancing the stealth and practicality of the attack in diverse scenarios. We conduct extensive experiments to evaluate BadVLMDriver for two representative VLMs, five different trigger objects, and two types of malicious backdoor behaviors. BadVLMDriver achieves a 92% attack success rate in inducing a sudden acceleration when coming across a pedestrian holding a red balloon. Thus, BadVLMDriver not only demonstrates a critical security risk but also emphasizes the urgent need for developing robust defense mechanisms to protect against such vulnerabilities in autonomous driving technologies.

Framework

BadVLMDriver includes two main steps. In the first step, we synthesize a small number of backdoor training samples using instruction-guided generative models. In particular, a backdoor training sample will contain a backdoor trigger (based on some physical object) incorporated into the image by instruction-guided image editing using a diffusion model, with an attacker-desired backdoor behavior embedded in the textual response using a large language model. Then, in the second step, the victim VLM is visual-instruction tuned on the generated backdoor training samples and their benign ‘replays’ using a blended loss.

Citation

@article{ni2024physical,
    title={Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models},
    author={Ni, Zhenyang and Ye, Rui and Wei, Yuxi and Xiang, Zhen and Wang, Yanfeng and Chen, Siheng},
    journal={arXiv preprint arXiv:2404.12916},
    year={2024}
}

Todo

  • Code release
  • Data and model release