BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

May 17, 2026

Health-X Lab | IMPACT Lab

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Overview

[Figure: main figure]

Abstract: Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated substantial success in self-supervised representation learning for vision tasks. However, effectively adapting VLMs to downstream applications remains challenging, as their accuracy often depends on time-intensive and expertise-demanding prompt engineering, while full model fine-tuning is costly. This is particularly true for biomedical images, which, unlike natural images, typically suffer from limited annotated datasets, unintuitive image contrasts, and nuanced visual features. Recent prompt learning techniques, such as Context Optimization (CoOp), aim to tackle these issues but still fall short in generalizability. Meanwhile, explorations in prompt learning for biomedical image analysis are still highly limited. In this work, we propose BiomedCoOp, a novel prompt learning framework that enables efficient adaptation of BiomedCLIP for accurate and highly generalizable few-shot biomedical image classification. Our approach achieves effective prompt context learning by leveraging semantic consistency with average prompt ensembles from Large Language Models (LLMs) and knowledge distillation with a statistics-based prompt selection strategy. We conducted comprehensive validation of our proposed framework on 11 medical datasets across 9 modalities and 10 organs against existing state-of-the-art methods, demonstrating significant improvements in both accuracy and generalizability.

Method

  1. Semantic Consistency with LLM-Enhanced Prompt Ensembles: Enhance context vector learning using prompt ensembles derived from GPT-4, combined with a knowledge distillation strategy to enforce semantic consistency.
  2. Outlier Pruning for Robust Generalization: Employ a statistics-based pruning strategy to filter outlier prompts from LLMs, mitigating over-specialization and preserving essential biomedical patterns.
  3. First Adoption of BiomedCLIP for Prompt Learning: Leverage BiomedCLIP for prompt learning for the first time, demonstrating superior performance over general-knowledge CLIP in clinical tasks.
  4. Extensive Multi-Modal Evaluation: Evaluate across 11 biomedical image classification datasets, 9 modalities, and 10 organs, showcasing BiomedCoOp's superior generalizability and robustness in few-shot and base-to-novel benchmarks.
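
The pruning and consistency ideas in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: the z-score rule on cosine similarity, the threshold `z_thresh`, the toy 2-D embeddings, and all function names are assumptions made for the example. The actual selection statistic and loss terms are defined in the paper and the trainer code.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (the 'average ensemble')."""
    return [mean(column) for column in zip(*vectors)]


def prune_outliers(embeddings, z_thresh=1.5):
    """Drop embeddings whose similarity to the ensemble mean is unusually low.

    ASSUMPTION: a one-sided z-score test on cosine similarity stands in for
    the statistics-based selection strategy; it is not the authors' rule.
    """
    center = mean_vector(embeddings)
    sims = [cosine(e, center) for e in embeddings]
    mu, sigma = mean(sims), pstdev(sims)
    if sigma == 0:  # all prompts equally similar: nothing to prune
        return list(embeddings)
    return [e for e, s in zip(embeddings, sims) if (s - mu) / sigma >= -z_thresh]


def consistency_loss(learned, target):
    """Mean squared error between a learned prompt feature and the target."""
    return mean((a - b) ** 2 for a, b in zip(learned, target))


# Hypothetical text embeddings of five LLM-generated prompts for one class.
# The last one points in a very different direction and acts as the outlier.
prompt_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.95, 0.05],
    [1.0, 0.1],
    [-1.0, 0.5],
]

kept = prune_outliers(prompt_embeddings)
target = mean_vector(kept)  # averaged ensemble after pruning

# A hypothetical learned prompt feature is pulled toward the pruned ensemble mean.
loss = consistency_loss([0.8, 0.2], target)
print(len(kept), target, loss)
```

In a real training loop the embeddings would come from the text encoder and the loss would be combined with the classification objective; the sketch only shows the order of operations: prune, average, then penalize distance to the average.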

:ballot_box_with_check: Supported Methods

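| Method | Paper | Configs | Training Scripts | Trainers |
|---|---|---|---|---|
| BiomedCoOp | CVPR 2025 | link | link | link |
| CLIP | ICML 2021 | link | link | link |
| CoOp | IJCV 2022 | link | link | link |
| CoCoOp | CVPR 2022 | link | link | link |
| KgCoOp | CVPR 2023 | link | link | link |
| ProGrad | ICCV 2023 | link | link | link |
| CLIP-Adapter | IJCV 2024 | link | link | link |
| Tip-Adapter | ECCV 2022 | link | link | link |
| LP | ICML 2021 | link | link | link |
| LP++ | CVPR 2024 | link | link | link |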

Results

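The tables below report accuracy (%) in few-shot settings and on base and novel classes across 11 biomedical recognition datasets, averaged over 3 seeds. K is the number of labeled training samples (shots) per class, and HM is the harmonic mean of base and novel accuracy.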

Few-shot Evaluation

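| Method | K=1 | K=2 | K=4 | K=8 | K=16 |
|---|---|---|---|---|---|
| CLIP-Adapter | 44.66 | 43.91 | 44.36 | 45.42 | 46.69 |
| Tip-Adapter | 49.19 | 52.36 | 57.33 | 61.98 | 67.15 |
| Tip-Adapter-F | 51.17 | 52.74 | 61.23 | 65.91 | 70.91 |
| Standard LP | 47.25 | 54.21 | 61.00 | 65.85 | 69.40 |
| LP++ | 47.24 | 53.18 | 59.02 | 63.69 | 68.35 |
| CoOp | 50.16 | 54.18 | 59.75 | 65.84 | 69.62 |
| CoCoOp | 48.49 | 51.28 | 54.69 | 61.08 | 65.09 |
| KgCoOp | 50.85 | 53.18 | 57.82 | 62.08 | 62.84 |
| ProGrad | 51.88 | 54.71 | 60.42 | 65.61 | 67.13 |
| BiomedCoOp | 57.03 | 59.13 | 63.95 | 68.32 | 72.42 |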
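
To make the K-shot protocol concrete, the snippet below draws K labeled samples per class with a fixed seed, which is how few-shot training subsets are commonly built. It is a hedged sketch: `sample_k_shot`, the toy label list, and the seed values are hypothetical, and the repository's own sampling procedure (see RUN.md) may differ.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed):
    """Return sorted indices of up to k randomly chosen samples per class."""
    rng = random.Random(seed)  # local generator keeps each seed reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        # Classes with fewer than k samples contribute everything they have.
        chosen.extend(rng.sample(pool, min(k, len(pool))))
    return sorted(chosen)


# Hypothetical dataset with two classes and five samples each.
labels = ["benign"] * 5 + ["malignant"] * 5

# One training subset per seed; accuracies would then be averaged over seeds.
subsets = {seed: sample_k_shot(labels, k=2, seed=seed) for seed in (1, 2, 3)}
print(subsets)
```

Using a separate `random.Random` instance per seed means each subset can be regenerated independently of any other random state in the program.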

Base-to-Novel Generalization

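| Name | Base Acc. | Novel Acc. | HM |
|---|---|---|---|
| BiomedCLIP | 47.84 | 65.42 | 53.81 |
| CoOp | 73.85 | 64.75 | 67.23 |
| CoCoOp | 72.26 | 67.03 | 67.22 |
| KgCoOp | 68.36 | 64.08 | 64.61 |
| ProGrad | 71.67 | 66.93 | 67.43 |
| BiomedCoOp (ours) | 76.26 | 73.92 | 75.07 |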
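
For reference, HM is the harmonic mean of base and novel accuracy, which penalizes a large gap between the two more than an arithmetic mean would. The small helper below (the function name is illustrative) applies the formula to the BiomedCoOp row above.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base-class and novel-class accuracy."""
    total = base_acc + novel_acc
    if total == 0:
        return 0.0
    return 2 * base_acc * novel_acc / total


hm = harmonic_mean(76.26, 73.92)
print(round(hm, 2))  # 75.07
```

Other rows need not reproduce exactly from their averaged Base and Novel columns. One possible reason, stated here as an assumption, is that HM is computed per dataset and then averaged.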

Model Checkpoints and Logs

| Name | Few-Shot | Base-to-Novel |
|---|---|---|
| BiomedCoOp | link | link |

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions in DATASETS.md to prepare all datasets.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluating, and reproducing the results using our pre-trained models.


Citation

If you use our work, please consider citing:

@inproceedings{koleilat2025biomedcoop,
  title={{BiomedCoOp}: Learning to Prompt for Biomedical Vision-Language Models},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14766--14776},
  year={2025}
}

Acknowledgements

Our code builds upon the CoOp, MaPLe, and LP++ repositories. We are grateful to the authors for making their code publicly available. If you use our model or code, we kindly request that you also consider citing these foundational works.