MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models
October 31, 2025
[ICME 2025] MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models
Haoyang Li, Siyu Zhou, Liang Wang and Guodong Long.
Shanghai University, University of Technology Sydney
📦 Supplementary Material: https://github.com/JREion/M.A.O/releases/tag/docs
📰 Full Text: https://arxiv.org/abs/2503.18160
🔥 News
- NOTE: We are preparing our code repository (mainly rewriting comments to improve readability) and hope to release the code in April.
- (24 Jun. 2025) We uploaded the poster of M.A.O.
- (15 Apr. 2025) The code of PromptSRC+M.A.O is released.
- (25 Mar. 2025) The full text and supplementary material are available on arXiv.
- (21 Mar. 2025) Our paper has been accepted by ICME 2025!
Abstract
Though CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research focuses on reconstructing the model architecture, e.g., by adding loss terms or meta-networks. These approaches generally increase complexity and training cost. To maintain the efficiency of the tuning process, we propose plug-and-play Model-Agnostic Optimization (M.A.O) for prompt tuning.
Without altering any components of the prompt tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to boost the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments on MAO demonstrate its outstanding performance and efficiency.
Framework

Main Contributions
(1) MAO efficiently optimizes prompt tuning backbones at the data and feature levels in a plug-and-play manner, consuming almost no additional computational resources.
(2) We introduce task-related Data-Driven Enhancement to MAO, improving the data distribution of base and new classes through hard negative sampling and rapid pseudo-label allocation, respectively.
(3) We incorporate Alterable Regularization into the procedure of feature processing, constraining the model to dynamically focus more on the features of updated data to enhance performance and generalization.
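The two Data-Driven Enhancement ideas in (2) can be illustrated with a generic sketch. This is not the M.A.O implementation: it assumes image and class-text embeddings are already available (for example, from CLIP's image and text encoders), scores them with plain cosine similarity, and the names `hard_negative_class`, `assign_pseudo_labels` and `threshold` are hypothetical. Hard negative sampling picks, for a labelled base-class image, the wrong class it is most easily confused with; pseudo-label allocation assigns each unlabelled image the class with the highest similarity, leaving low-confidence images unlabelled.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hard_negative_class(image_emb: Vector, label: str,
                        text_embs: Dict[str, Vector]) -> str:
    """Hypothetical helper: return the wrong class closest to the image.

    This is the 'hardest' negative, i.e. the class most easily confused
    with the ground-truth label under the current embeddings.
    """
    candidates = {c: e for c, e in text_embs.items() if c != label}
    return max(candidates, key=lambda c: cosine(image_emb, candidates[c]))


def assign_pseudo_labels(image_embs: List[Vector],
                         text_embs: Dict[str, Vector],
                         threshold: float = 0.0) -> List[Optional[str]]:
    """Hypothetical helper: zero-shot pseudo-labelling by argmax cosine similarity.

    Images whose best similarity falls below `threshold` get None
    (left unlabelled) instead of a low-confidence guess.
    """
    labels: List[Optional[str]] = []
    for emb in image_embs:
        scores = {c: cosine(emb, e) for c, e in text_embs.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels


if __name__ == "__main__":
    # Toy 2-D "text embeddings", one per class name.
    text_embs = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}
    # An image labelled "cat" that lies close to "dog": "dog" is its hard negative.
    print(hard_negative_class([0.9, 0.3], "cat", text_embs))  # dog
    print(assign_pseudo_labels([[0.1, 0.9], [0.7, 0.7]], text_embs, threshold=0.5))  # ['car', 'dog']
```

Both steps only reuse image-text similarity scores that a CLIP-style model already computes, which is one plausible reason data-level changes of this kind add little training cost. Alterable Regularization (3) is not sketched here because its formulation is not given in this README.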
Experimental Results
Base-to-New
Cross-Dataset
While source accuracy improves, MAO also attains higher accuracy on multiple target datasets. Remarkably, this is achieved without any target-task fine-tuning.
We attribute this to MAO's Alterable Regularization design, which mitigates overfitting to the ImageNet source and thereby supports generalization to out-of-distribution data.
Poster

Citation
If you find our paper or repo helpful for your research, please consider citing our paper and giving this repo a star ⭐. Thank you!
@INPROCEEDINGS{li2025mao,
author={Li, Haoyang and Zhou, Siyu and Wang, Liang and Long, Guodong},
booktitle={2025 IEEE International Conference on Multimedia and Expo (ICME)},
title={MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models},
year={2025},
pages={1-6},
doi={10.1109/ICME59968.2025.11209968}}
Acknowledgements
🧰 Repositories
Our code is based on the DPC, DePT, PromptSRC, MaPLe and CoOp repositories. We thank the authors for releasing their code.
Special Thanks
The author extends heartfelt gratitude to the two M.A.O. (to him and to her), whose presence has enriched the soul and bestowed the strength to journey forward.