Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks (ICCV 2025 official implementation)
July 22, 2025
Authors
Jiawei Wang*1, Yushen Zuo*2, Yuanjun Chai3, Zhendong Liu4, Yicheng Fu5, Yichun Feng†6, Kin-man Lam†2
1University of Science and Technology of China
2The Hong Kong Polytechnic University
3University of Washington
4Nanjing University
5Stanford University
6University of the Chinese Academy of Sciences
Contact Emails:
wangjiawei@mail.ustc.edu.cn, yushen.zuo@polyu.edu.hk, yjchai@uw.edu, dz20330019@smail.nju.edu.cn,
easonfu@stanford.edu, fengyichun22@mails.ucas.ac.cn, kin.man.lam@polyu.edu.hk
* Equal contribution.
† Corresponding authors.
Welcome! This repository hosts the official implementation of our paper, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks."
What's New?
We propose state-of-the-art solutions to enhance the robustness of Vision-Language Models (VLMs) against Gaussian noise and adversarial attacks. Key highlights include:
- Robust-VLGuard: A pioneering multimodal safety dataset covering both aligned and misaligned image-text pair scenarios.
- DiffPure-VLM: A novel defense framework that leverages diffusion models to neutralize adversarial noise by transforming it into Gaussian-like noise, significantly improving VLM resilience.

Key Contributions
- Conducted a comprehensive vulnerability analysis revealing the sensitivity of mainstream VLMs to Gaussian noise.
- Developed Robust-VLGuard, a dataset designed to improve model robustness without compromising helpfulness or safety alignment.
- Introduced DiffPure-VLM, an effective pipeline for defending against complex optimization-based adversarial attacks.
- Demonstrated strong performance across multiple benchmarks, outperforming existing baseline methods.
Quickstart
Installation
Different models require different environments. We provide conda environment files in the env_configs directory. For example:
conda env create -f env_configs/environment-omi.yml
conda activate Omi-Environment
Pretrained Models Setup
mkdir -p ckpts/
ln -s your_path/vicuna ckpts/vicuna
ln -s your_path/pretrained_minigpt4.pth ckpts/pretrained_minigpt4.pth
mkdir -p ckpts/diffpure_models/diffusion/Guide_Diffusion/
ln -s your_path/256x256_diffusion_uncond.pt ckpts/diffpure_models/diffusion/Guide_Diffusion/256x256_diffusion_uncond.pt
- MiniGPT-4: https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view
- Vicuna: https://huggingface.co/Vision-CAIR/vicuna/tree/main
- Diffusion Model: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
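Once the symlinks above are in place, a quick check can confirm that each expected path resolves. The following is a minimal sketch and is not part of this repository: the helper name `missing_checkpoints` is hypothetical, and the paths are the ones created by the commands above.

```python
from pathlib import Path

# Paths taken from the setup commands above.
EXPECTED_CHECKPOINTS = [
    "ckpts/vicuna",
    "ckpts/pretrained_minigpt4.pth",
    "ckpts/diffpure_models/diffusion/Guide_Diffusion/256x256_diffusion_uncond.pt",
]


def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that do not resolve under `root`.

    Hypothetical helper for illustration only.
    """
    # Path.exists() follows symlinks, so a dangling link is reported as missing.
    return [p for p in EXPECTED_CHECKPOINTS if not (Path(root) / p).exists()]


if __name__ == "__main__":
    for path in missing_checkpoints():
        print(f"missing: {path}")
```

Running the script from the repository root prints one line per path that is absent or points to a broken link, and prints nothing when all three are present.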
Dataset Setup
- RealToxicityPrompts is available in the harmful_corpus/ directory.
- Download Robust-VLGuard from Hugging Face.
- Noisy MMVet benchmark: Google Drive
Fine-tuning VLMs
Our Robust-VLGuard dataset is preprocessed and ready for fine-tuning. You can fine-tune with the official code of each VLM, provided that the Gaussian Noise Augmentation Strategy is also implemented. We have already incorporated this strategy into the official codebases. To see the implementation for LLaVA, refer to this commit; you can also fine-tune directly using this repo. For other codebases, follow the implementation approach used for LLaVA.
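To illustrate what a Gaussian noise augmentation step can look like in a data loader, here is a minimal NumPy sketch. It is not the code from the linked commit: the function name `add_gaussian_noise` is hypothetical, and the `std` and `prob` defaults are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np


def add_gaussian_noise(image, std=30.0, prob=0.5, rng=None):
    """Return a copy of a uint8 image, noised with probability `prob`.

    `std` is on the 0-255 pixel scale. Both defaults are assumed values.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Leave the sample untouched with probability 1 - prob.
    if rng.random() >= prob:
        return image.copy()
    noise = rng.normal(0.0, std, size=image.shape)
    noisy = image.astype(np.float32) + noise
    # Clip back to the valid pixel range before casting to uint8.
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Applying the noise per sample with some probability, rather than to every image, keeps clean images in the training mix; whether the repository does the same should be checked against the commit itself.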
Fine-tuned Models
| Model Name | Hugging Face Path |
|---|---|
| llava-v1.5-7b-RobustVLGuard | LLaVA |
| MiniGPT4-RobustVLGuard | MiniGPT-4 |
| InternVL2-8B-RobustVLGuard | InternVL2-8B |
Evaluation
On RealToxicityPrompts
bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images/clean.jpeg {MODEL_PATH}
or, using the image with added noise:
bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images_add_noise_G30/clean.jpeg {MODEL_PATH}
On MMVet Benchmark (LLaVA)
python llava_inference_mmvet.py --model_path {MODEL_PATH} --clean --output_path {OUTPUT_PATH}
or, without the --clean flag:
python llava_inference_mmvet.py --model_path {MODEL_PATH} --output_path {OUTPUT_PATH}
Use minigpt_inference_mmvet.py for MiniGPT-4. For InternVL2, refer to this commit.
IMPORTANT: Remember to replace mmvet_path and image_path in the Python script with the correct paths.
Optimization-based Adversarial Attack
bash llava-attack.sh {GPU_ID} {OUTPUT_PATH} {MODEL_PATH} {EPSILON}
Here, EPSILON controls the perturbation strength (e.g., 16, 32, or 64). For MiniGPT-4, use minigpt_visual_attack.py. For InternVL2, refer to this commit.
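For intuition about how a perturbation budget of this kind is usually enforced, the sketch below shows the projection step common to PGD-style attacks. It is not the attack code in this repository. It assumes that EPSILON is given on the 0-255 pixel scale and that images are scaled to [0, 1]; the function name `project_linf` is hypothetical.

```python
import numpy as np


def project_linf(adv, clean, epsilon):
    """Project `adv` into the L-infinity ball of radius epsilon/255 around `clean`.

    Assumes both arrays hold pixel values in [0, 1] and that `epsilon`
    is on the 0-255 scale (an assumption, not confirmed by the source).
    """
    budget = epsilon / 255.0
    # Limit each pixel's change to at most `budget` in either direction.
    delta = np.clip(adv - clean, -budget, budget)
    # Keep the result a valid image.
    return np.clip(clean + delta, 0.0, 1.0)
```

Under these assumptions, a larger EPSILON lets every pixel move further from its clean value, which is why higher settings correspond to stronger and more visible perturbations.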
Deploying DiffPure-VLM Defense
Run the following command:
bash general_scripts/omi_eval_rtp_diffpure.sh {output_path} {image_prompt_path} {model_path} {def_num_denoising_steps}
For MiniGPT-4 and Qwen-VL, use:
- minigpt_scripts/minigpt_eval_rtp_diffpure_single_gpu.sh
- qwen25_vl_scripts/qwen25_vl_rtp_diffpure.sh
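Diffusion-based purification first diffuses the input image forward by a small number of steps and then runs the reverse denoising process with a pretrained diffusion model. The sketch below covers only the forward step, to show how the step count controls the amount of Gaussian noise that is mixed in. It is not the repository's implementation: the linear beta schedule, the 1000-step default, and the function name `forward_diffuse` are assumptions, and how `def_num_denoising_steps` maps onto `t` should be checked in the scripts.

```python
import numpy as np


def forward_diffuse(x0, t, num_timesteps=1000, beta_start=1e-4, beta_end=0.02, rng=None):
    """Sample x_t from q(x_t | x_0) for an image `x0` scaled to [-1, 1].

    Uses a linear beta schedule with assumed default values.
    """
    if not 1 <= t <= num_timesteps:
        raise ValueError("t must be between 1 and num_timesteps")
    rng = rng if rng is not None else np.random.default_rng()
    betas = np.linspace(beta_start, beta_end, num_timesteps)
    # Cumulative product of (1 - beta) over the first t steps.
    alpha_bar = np.cumprod(1.0 - betas)[t - 1]
    noise = rng.standard_normal(x0.shape)
    # Closed form: scale the image down and add Gaussian noise.
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
```

A small `t` preserves most of the image content while still adding Gaussian noise on top of any adversarial perturbation; a large `t` adds more noise at the cost of image detail.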
Deploying JailGuard Defense
We provide a script for JailGuard defense for comparison. Use the following command:
bash general_scripts/omi_eval_rtp_jailguard.sh {output_path} {image_prompt_path} {model_path}
Experimental Results
Detailed results and analysis are included in the paper and supplementary materials. See the results/ directory for specific outcomes.

Citation
If you find this work helpful, please consider citing our paper.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contact
For questions or collaboration opportunities, feel free to reach out at jarvisustc@gmail.com. We welcome your feedback!
Acknowledgments
Our repo is built upon https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models. Thanks to the authors of the original models and datasets, including LLaVA, MiniGPT-4, InternVL2, MMVet, and others. We also acknowledge the support of our institutions and collaborators. We are grateful for the resources and tools provided by the community that made this research possible. We are committed to advancing the field of multimodal learning and look forward to future collaborations.