BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

May 17, 2026

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Overview

[Figure: main figure]

Abstract: Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated substantial success in self-supervised representation learning for vision tasks. However, effectively adapting VLMs to downstream applications remains challenging, as their accuracy often depends on time-intensive and expertise-demanding prompt engineering, while full model fine-tuning is costly. This is particularly true for biomedical images, which, unlike natural images, typically suffer from limited annotated datasets, unintuitive image contrasts, and nuanced visual features. Recent prompt learning techniques, such as Context Optimization (CoOp), aim to tackle these issues but still fall short in generalizability. Meanwhile, explorations of prompt learning for biomedical image analysis are still highly limited. In this work, we propose BiomedCoOp, a novel prompt learning framework that enables efficient adaptation of BiomedCLIP for accurate and highly generalizable few-shot biomedical image classification. Our approach achieves effective prompt context learning by leveraging semantic consistency with averaged prompt ensembles from Large Language Models (LLMs) and knowledge distillation with a statistics-based prompt selection strategy. We conducted comprehensive validation of our proposed framework on 11 medical datasets across 9 modalities and 10 organs against existing state-of-the-art methods, demonstrating significant improvements in both accuracy and generalizability.

Method

Semantic Consistency with LLM-Enhanced Prompt Ensembles: Enhance context vector learning using prompt ensembles derived from GPT-4, combined with a knowledge distillation strategy to enforce semantic consistency.
Outlier Pruning for Robust Generalization: Employ a statistics-based pruning strategy to filter outlier prompts from LLMs, mitigating over-specialization and preserving essential biomedical patterns.
First Adoption of BiomedCLIP for Prompt Learning: Leverage BiomedCLIP for prompt learning for the first time, demonstrating superior performance over general-knowledge CLIP in clinical tasks.
Extensive Multi-Modal Evaluation: Evaluate across 11 biomedical image classification datasets, 9 modalities, and 10 organs, showcasing BiomedCoOp's superior generalizability and robustness in few-shot and base-to-novel benchmarks.
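
The first two points above can be illustrated with a minimal sketch. It is not the repository's implementation: this README does not say which statistic is used for pruning or which loss enforces consistency, so the z-score rule on cosine distance, the mean-squared-error loss, the threshold value, and all function names below are assumptions made for illustration. The embeddings are toy 2-D vectors rather than real BiomedCLIP text features.

```python
import math
from statistics import mean, pstdev


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def prune_outlier_prompts(prompt_embeddings, z_threshold=1.5):
    """Return the indices of prompts kept after z-score filtering.

    Assumption: a prompt counts as an outlier when its cosine distance to the
    mean embedding lies more than `z_threshold` standard deviations above the
    average distance.
    """
    unit = [l2_normalize(e) for e in prompt_embeddings]
    center = l2_normalize(mean_vector(unit))
    distances = [1.0 - sum(a * b for a, b in zip(e, center)) for e in unit]
    mu, sigma = mean(distances), pstdev(distances)
    if sigma == 0:
        return list(range(len(unit)))
    return [i for i, d in enumerate(distances) if (d - mu) / sigma <= z_threshold]


def ensemble_target(prompt_embeddings, keep):
    """Average the kept prompt embeddings into one unit-length class target."""
    kept = [l2_normalize(prompt_embeddings[i]) for i in keep]
    return l2_normalize(mean_vector(kept))


def consistency_loss(learned_feature, target):
    """Mean squared error between the learned prompt feature and the target."""
    learned = l2_normalize(learned_feature)
    return mean((x - y) ** 2 for x, y in zip(learned, target))


# Toy embeddings for one class: four similar prompts and one clear outlier.
embeddings = [[1.0, 0.0], [0.98, 0.05], [0.99, -0.03], [0.97, 0.02], [0.0, 1.0]]
keep = prune_outlier_prompts(embeddings)
target = ensemble_target(embeddings, keep)
loss = consistency_loss([0.9, 0.1], target)
```

In this toy case the last prompt is filtered out and the remaining four are averaged into the target that the learned prompt feature is pulled toward. With only a handful of prompts the population z-score is bounded (below 2 for five prompts), so the threshold would need tuning for realistic ensemble sizes.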

Supported Methods

Method	Paper	Configs	Training Scripts	Trainers
BiomedCoOp	CVPR 2025	link	link	link
CLIP	ICML 2021	link	link	link
CoOp	IJCV 2022	link	link	link
CoCoOp	CVPR 2022	link	link	link
KgCoOp	CVPR 2023	link	link	link
ProGrad	ICCV 2023	link	link	link
CLIP-Adapter	IJCV 2024	link	link	link
Tip-Adapter	ECCV 2022	link	link	link
LP	ICML 2021	link	link	link
LP++	CVPR 2024	link	link	link

Results

The tables below report accuracy (%) for few-shot classification and for base-to-novel generalization, averaged over 11 biomedical recognition datasets and 3 seeds.

Few-shot Evaluation

Method	$K=1$	$K=2$	$K=4$	$K=8$	$K=16$
CLIP-Adapter	44.66	43.91	44.36	45.42	46.69
Tip-Adapter	49.19	52.36	57.33	61.98	67.15
Tip-Adapter-F	51.17	52.74	61.23	65.91	70.91
Standard LP	47.25	54.21	61.00	65.85	69.40
LP++	47.24	53.18	59.02	63.69	68.35
CoOp	50.16	54.18	59.75	65.84	69.62
CoCoOp	48.49	51.28	54.69	61.08	65.09
KgCoOp	50.85	53.18	57.82	62.08	62.84
ProGrad	51.88	54.71	60.42	65.61	67.13
BiomedCoOp	57.03	59.13	63.95	68.32	72.42
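
As a rough illustration of the protocol behind this table, the sketch below draws a K-shot training subset per class and averages a metric over seeds. It is a generic sketch, not the repository's data loader: the function names, the seed values `(1, 2, 3)`, and the class labels are hypothetical, and the actual commands are documented in RUN.md.

```python
import random
from collections import defaultdict
from statistics import mean


def sample_k_shot(labels, k, seed):
    """Pick up to k training indices per class, reproducibly for a given seed."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    subset = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Classes with fewer than k samples contribute everything they have.
        subset.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(subset)


def average_over_seeds(run_fn, k, seeds=(1, 2, 3)):
    """Average the accuracy returned by run_fn(k, seed) over several seeds."""
    return mean(run_fn(k, seed) for seed in seeds)


# Toy label list with three classes (placeholder names, not a real dataset).
labels = ["glioma"] * 10 + ["meningioma"] * 10 + ["pituitary"] * 3
subset = sample_k_shot(labels, k=4, seed=1)
```

Here `run_fn` stands in for a full train-and-evaluate run at a given `K` and seed; each cell in the table would then correspond to one call to `average_over_seeds`, further averaged over datasets.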

Base-to-Novel Generalization

Method	Base Acc.	Novel Acc.	HM (harmonic mean)
BiomedCLIP	47.84	65.42	53.81
CoOp	73.85	64.75	67.23
CoCoOp	72.26	67.03	67.22
KgCoOp	68.36	64.08	64.61
ProGrad	71.67	66.93	67.43
BiomedCoOp (ours)	76.26	73.92	75.07
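
For reference, HM is the harmonic mean of base and novel accuracy, the usual summary metric in base-to-novel benchmarks. The short sketch below recomputes it for the BiomedCoOp row; the function name is illustrative. For some other rows, the harmonic mean of the averaged columns differs from the reported HM (for CoOp it gives 69.00 rather than 67.23), which would be consistent with HM being computed per dataset before averaging. That is an inference from the numbers, not something this README states.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base and novel accuracy (0.0 if both are zero)."""
    total = base_acc + novel_acc
    return 2.0 * base_acc * novel_acc / total if total > 0 else 0.0


# BiomedCoOp row from the table above.
hm = harmonic_mean(76.26, 73.92)
print(f"{hm:.2f}")  # prints 75.07
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot score well on HM by trading novel-class accuracy for base-class accuracy.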

Model Checkpoints and Logs

Name	Few-Shot	Base-to-Novel
BiomedCoOp	link	link

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions in DATASETS.md to prepare all datasets.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluation, and reproducing the results with our pre-trained models.

Citation

If you use our work, please consider citing:

@inproceedings{koleilat2025biomedcoop,
  title={{BiomedCoOp}: Learning to Prompt for Biomedical Vision-Language Models},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14766--14776},
  year={2025}
}

Our code builds upon the CoOp, MaPLe, and LP++ repositories. We are grateful to the authors for making their code publicly available. If you use our model or code, we kindly request that you also consider citing these foundational works.