MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models

October 31, 2025

[ICME 2025] MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models
Haoyang Li, Siyu Zhou, Liang Wang and Guodong Long.
Shanghai University, University of Technology Sydney

📦 Supplementary Material: https://github.com/JREion/M.A.O/releases/tag/docs

📰 Full Text: https://arxiv.org/abs/2503.18160


🔥 News

  • NOTE: We are preparing our code repository (mainly rewriting comments to improve readability). We hope to release code in April.

  • (24 Jun. 2025) We uploaded the poster of M.A.O.

  • (15 Apr. 2025) The code of PromptSRC+M.A.O is released.

  • (25 Mar. 2025) The full text and supplementary material are available on arXiv.

  • (21 Mar. 2025) Our paper has been accepted by ICME 2025!


Abstract

Though CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research focuses on reconstructing the model architecture, e.g., additional loss calculation and meta-networks. These approaches generally lead to increased complexity and extended training cost. To maintain the efficiency of the tuning process, we propose plug-and-play Model-Agnostic Optimization (M.A.O) for prompt tuning.
Without altering any components of the prompt tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to boost the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments on MAO demonstrate its outstanding performance and efficiency.

Framework

Figure 1. Framework of proposed MAO. MAO builds a two-step fine-tuning structure without altering components of prompt tuning backbones. In (a) base tasks, MAO introduces a hard negative sampler as Data-Driven Enhancement (DDE), and an Alterable Regularization (reg-B) that guides the model in learning the feature distribution of hard negatives and keeps generalization. Then in (b) new tasks, rapid pseudo-labeling is performed on unlabeled images as DDE using shared-parameter CLIP, followed by reg-N to constrain the fine-tuning on new classes. The inference process follows the settings of the original backbones.
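The two Data-Driven Enhancement steps named in the caption can be illustrated with a minimal sketch. This is not the released implementation: it assumes that hard negatives are the classes whose text features are most similar to an anchor class, and that pseudo-labels are assigned by the highest image-text cosine similarity. The function names (`hard_negative_classes`, `pseudo_label`) and the toy 2-D vectors are hypothetical stand-ins for CLIP features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def hard_negative_classes(class_features, anchor, k=2):
    """Indices of the k classes whose features are closest to the anchor class.

    Assumption: "hard" means "most similar in feature space".
    """
    scored = [
        (cosine(class_features[anchor], feat), idx)
        for idx, feat in enumerate(class_features)
        if idx != anchor
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

def pseudo_label(image_feature, class_features):
    """Assign the class whose text feature best matches the image feature."""
    sims = [cosine(image_feature, feat) for feat in class_features]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]

# Toy 2-D "embeddings"; real CLIP features are high-dimensional.
base_class_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
hard_negatives = hard_negative_classes(base_class_features, anchor=0, k=2)

new_class_features = [[1.0, 0.0], [0.0, 1.0]]
unlabeled_images = [[0.9, 0.2], [0.1, 0.8]]
pseudo_labels = [pseudo_label(x, new_class_features)[0] for x in unlabeled_images]

print(hard_negatives)  # [1, 3]
print(pseudo_labels)   # [0, 1]
```

In this toy setup, class 0 is paired with its two nearest classes as hard negatives, and each unlabeled image receives the label of its closest class feature. Both steps only read features from a frozen encoder, which is consistent with the low overhead described for MAO.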

Main Contributions

(1) MAO efficiently optimizes prompt tuning backbones at the data and feature levels in a plug-and-play manner, consuming almost no additional computational resources.

(2) We introduce task-related Data-Driven Enhancement to MAO, improving the data distribution of base and new classes through hard negative sampling and rapid pseudo-label allocation, respectively.

(3) We incorporate Alterable Regularization into the procedure of feature processing, constraining the model to dynamically focus more on the features of updated data to enhance performance and generalization.
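This README does not spell out the form of the Alterable Regularization terms (reg-B and reg-N). As a purely illustrative stand-in, the sketch below assumes a penalty on the mean absolute difference between tuned and frozen features, with a weight that changes between the base and new stages. The names `REG_WEIGHTS`, `feature_drift` and `regularized_loss`, and the weight values themselves, are hypothetical and not taken from the paper or code.

```python
# Hypothetical stage-dependent weights; the values are arbitrary placeholders.
REG_WEIGHTS = {"base": 0.5, "new": 2.0}

def feature_drift(tuned_feature, frozen_feature):
    """Mean absolute difference between tuned and frozen features."""
    diffs = [abs(t - f) for t, f in zip(tuned_feature, frozen_feature)]
    return sum(diffs) / len(diffs)

def regularized_loss(task_loss, tuned_feature, frozen_feature, stage):
    """Task loss plus a drift penalty whose weight depends on the stage."""
    penalty = feature_drift(tuned_feature, frozen_feature)
    return task_loss + REG_WEIGHTS[stage] * penalty

# Toy features: the tuned feature has drifted slightly from the frozen one.
tuned = [0.5, 0.5]
frozen = [0.4, 0.7]

loss_base = regularized_loss(1.0, tuned, frozen, stage="base")  # 1.0 + 0.5 * 0.15
loss_new = regularized_loss(1.0, tuned, frozen, stage="new")    # 1.0 + 2.0 * 0.15
```

The point of the sketch is only the "alterable" aspect: the same penalty is reused across stages while its strength is switched per stage, so the backbone itself is left unchanged.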


Experimental Results

Base-to-New


Cross-Dataset

While source accuracy improves, MAO also attains higher accuracy on multiple target datasets. Remarkably, this is achieved without any target-task fine-tuning.

We attribute this to MAO's Alterable Regularization design, which mitigates overfitting to the ImageNet source and thereby improves generalization to out-of-distribution data.


Poster

Citation

If you find our paper or repo helpful for your research, please consider citing our paper and giving this repo a star⭐. Thank you!

@INPROCEEDINGS{li2025mao,
  author={Li, Haoyang and Zhou, Siyu and Wang, Liang and Long, Guodong},
  booktitle={2025 IEEE International Conference on Multimedia and Expo (ICME)}, 
  title={MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models}, 
  year={2025},
  pages={1-6},
  doi={10.1109/ICME59968.2025.11209968}}

Acknowledgements

🧰 Repositories

Our code is based on the DPC, DePT, PromptSRC, MaPLe and CoOp repositories. We thank the authors for releasing their code.

💖 Special Thanks

The author extends heartfelt gratitude to the two M.A.O., to him and to her, whose presence has enriched the soul and bestowed the strength to journey forward.