GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised Learning

October 9, 2025

Official repository of GOPro.

British Machine Vision Conference (BMVC) 2023

Abstract

Large-scale foundation models, such as CLIP, have demonstrated remarkable success in visual recognition tasks by embedding images in a semantically rich space. Self-supervised learning (SSL) has also shown promise in improving visual recognition by learning invariant features. However, the combination of CLIP with SSL is found to face challenges due to the multi-task framework that blends CLIP’s contrastive loss and SSL’s loss, including difficulties with loss weighting and inconsistency among different views of images in CLIP’s output space. To overcome these challenges, we propose a prompt learning-based model called GOPro, which is a unified framework that ensures similarity between various augmented views of input images in a shared image-text embedding space, using a pair of learnable image and text projectors atop CLIP, to promote invariance and generalizability. To automatically learn such prompts, we leverage the visual content and style primitives extracted from pre-trained CLIP and adapt them to the target task. In addition to CLIP’s cross-domain contrastive loss, we introduce a visual contrastive loss and a novel prompt consistency loss, considering the different views of the images. GOPro is trained end-to-end on all three loss objectives, combining the strengths of CLIP and SSL in a principled manner. Empirical evaluations demonstrate that GOPro outperforms the state-of-the-art prompting techniques on three challenging domain generalization tasks across multiple benchmarks by a significant margin.
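To make the three-objective training described in the abstract more concrete, here is a minimal pure-Python sketch of how such terms might be combined. It is not the code in the trainers folder. The function names, the temperature value, the squared-distance form of the prompt consistency term, and the unweighted sum are all assumptions made for illustration; the README does not specify any of them.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def contrastive_loss(queries, keys, targets, temperature=0.07):
    """Cross-entropy over cosine similarities.

    keys[targets[i]] is treated as the positive for queries[i].
    The temperature of 0.07 is an assumed value, not taken from the repo.
    """
    keys = [normalize(k) for k in keys]
    total = 0.0
    for query, target in zip(queries, targets):
        query = normalize(query)
        logits = [sum(a * b for a, b in zip(query, k)) / temperature for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[target]
    return total / len(queries)


def consistency_loss(prompts_a, prompts_b):
    """Mean squared distance between prompt embeddings of two views.

    This form is an assumption; the README only names a prompt consistency loss.
    """
    total = 0.0
    for a, b in zip(prompts_a, prompts_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(prompts_a)


def total_loss(view1, view2, text, labels, prompts1, prompts2):
    """Sum the three terms with equal weight (the weighting is hypothetical)."""
    # Image-to-text term: each image should match its class text embedding.
    clip_term = contrastive_loss(view1, text, labels)
    # Visual term: view i of an image should match the other view of image i.
    visual_term = contrastive_loss(view1, view2, list(range(len(view1))))
    # Prompt term: prompts generated from both views should agree.
    prompt_term = consistency_loss(prompts1, prompts2)
    return clip_term + visual_term + prompt_term
```

With toy two-dimensional embeddings, correctly paired image and text vectors give a much smaller contrastive term than swapped pairs, and identical prompt embeddings give a consistency term of zero.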

How to install

Create your environment:

$ conda create -n gopro python=3.8
$ conda activate gopro
$ conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=10.2 -c pytorch
$ pip install -r requirements.txt
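After running the commands above, it can help to confirm that the pinned versions were actually installed. The sketch below is not part of the repository. It uses only the standard library (importlib.metadata, available from Python 3.8), and the helper names are hypothetical. It also assumes the conda packages expose standard package metadata, which is usually but not always the case.

```python
from importlib import metadata

# Versions pinned by the install commands in this README.
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version, matches_expected)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Local build tags such as "+cu102" are ignored when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (found, matches)
    return report


if __name__ == "__main__":
    for package, (found, ok) in check_environment().items():
        print(f"{package}: found={found} ok={ok}")
```

Run it inside the activated gopro environment. A package that reports found=None was not installed, or was installed without metadata.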

Code

The datasets folder contains the dataloader files for each dataset.
The trainers folder contains the code of our model.
Clone the Dassl toolbox inside this repo.
The scripts folder holds the scripts for training and testing.
Define the dataset and task (base2new, cross-dataset, domain-generalization) in the script command:

$ cd scripts
$ bash train.sh caltech101 basenew
$ bash test.sh caltech101 basenew
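To run the train and test steps above from Python, for example when queuing several runs, a thin wrapper along these lines may be convenient. It is a sketch, not part of the repository: build_commands and run_experiment are hypothetical helpers. The only facts taken from this README are the script names, the argument order (dataset, then task), and the scripts directory.

```python
import subprocess


def build_commands(dataset, task):
    """Return the train and test commands for one dataset/task pair."""
    return [
        ["bash", "train.sh", dataset, task],
        ["bash", "test.sh", dataset, task],
    ]


def run_experiment(dataset, task, scripts_dir="scripts", dry_run=True):
    """Print the commands (dry run) or execute them inside scripts_dir."""
    commands = build_commands(dataset, task)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a script exits non-zero,
            # so testing is skipped when training fails.
            subprocess.run(command, cwd=scripts_dir, check=True)
    return commands


if __name__ == "__main__":
    # Dataset and task names are the ones shown in this README.
    run_experiment("caltech101", "basenew")
```

The default dry_run=True only prints the commands. Pass dry_run=False from the repository root to execute them.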

Bibtex

@article{singha2023gopro,
  title={GOPRO: Generate and Optimize Prompts in CLIP using Self-Supervised Learning},
  author={Singha, Mainak and Jha, Ankit and Banerjee, Biplab},
  journal={arXiv preprint arXiv:2308.11605},
  year={2023}
}

Acknowledgements

Thanks to the authors of CoOp; our code is mainly based on their repository.
