🚀 O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models (CVPR 2025 Highlight, Top 13%)

July 8, 2025

Our contributions are summarized as follows:

  • We provide new insights into the causes of the suboptimal performance of an existing top-performing calibration method for test-time prompt tuning.

  • We propose a novel approach, named O-TPT, for calibrating test-time prompt tuning in VLMs by enforcing orthogonality constraints. This is accomplished by introducing an orthogonal regularization on the textual features.

  • We perform an extensive evaluation of our approach on various datasets and across different baselines. Results show that O-TPT provides consistent gains over state-of-the-art methods in overall average calibration performance across several baselines. Moreover, O-TPT achieves better calibration than zero-shot CLIP, which reflects improved calibration over existing SOTA methods.
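To make the orthogonality idea concrete, here is a minimal sketch of one common form of orthogonal regularization: the squared Frobenius norm ||F Fᵀ − I||² over L2-normalized class text features stacked as the rows of F. Its off-diagonal terms are pairwise cosine similarities, so it is zero only when every pair of class features is orthogonal. The function names are hypothetical, and this formulation is an assumption for illustration, not necessarily the exact loss used in O-TPT.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(rows: Matrix) -> Matrix:
    """Scale each row (one class text feature) to unit L2 norm."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        if norm == 0.0:
            raise ValueError("text features must be non-zero vectors")
        normalized.append([x / norm for x in row])
    return normalized


def orthogonality_penalty(text_features: Matrix) -> float:
    """Hypothetical penalty ||F F^T - I||_F^2 on L2-normalized text features.

    Entry (i, j) of F F^T is the cosine similarity between class features
    i and j, so the penalty is zero only when all pairs are orthogonal.
    """
    f = l2_normalize(text_features)
    n = len(f)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gram_ij = sum(a * b for a, b in zip(f[i], f[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total


if __name__ == "__main__":
    # Orthogonal class features incur zero penalty.
    print(orthogonality_penalty([[1.0, 0.0], [0.0, 2.0]]))
    # Features 45 degrees apart: two off-diagonal cosines of 1/sqrt(2), about 1.0 in total.
    print(orthogonality_penalty([[1.0, 0.0], [1.0, 1.0]]))
```

In a TPT-style loop, a term like this would be added, with some weight, to the test-time objective that updates the prompt; the weighting and how the text features are produced are left out of this sketch.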

O-TPT Results

Paper arXiv 🔗 Project Page

✍️ Authors

Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah, Salman Khan, Muhammad Haris Khan

Figure: Relationship between cosine similarity and ECE
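The figure caption above relates cosine similarity to ECE. One simple way to summarize how similar the class text features are is their mean pairwise cosine similarity; the sketch below computes it for plain Python vectors. The helper names are hypothetical, and this is not necessarily the exact statistic plotted in the figure.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(features: List[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of class features."""
    pairs = list(combinations(features, 2))
    if not pairs:
        raise ValueError("need at least two feature vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Two vectors with the same direction and one orthogonal to both:
    # the pair cosines are 1, 0 and 0.
    print(mean_pairwise_cosine([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]))
```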


📌 Contents

  1. 📥 Installation
  2. 📂 Datasets
  3. 🔧 Run Experiments
  4. 📊 Main Results
  5. 🙏 Acknowledgement
  6. 📖 Citation
  7. 📧 Contact

📥 Installation

Steps to set up the environment:
 1. git clone https://github.com/ashshaksharifdeen/O-TPT.git
 2. cd O-TPT
 3. conda env create -f environment.yml
 4. conda activate otpt

📂 Datasets

We conducted our main experiments on fine-grained and natural distribution shift datasets:

  • Fine-grained datasets:

    1. ImageNet
    2. Flower102
    3. OxfordPets
    4. SUN397
    5. DTD
    6. Food101
    7. StanfordCars
    8. Aircraft
    9. UCF101
    10. EuroSAT
    11. Caltech101
  • Natural distribution shift datasets:

    1. ImageNet-V2
    2. ImageNet-A
    3. ImageNet-R
    4. ImageNet-Sketch

To prepare the datasets, follow the instructions in the TPT repository.

🔧 Run Experiments

In each .sh file, you can set the root dataset directory and choose the backbone (‘RN50’ or ‘ViT-B/16’). You can also switch between experiment modes by changing run_type (opt, the tpt baselines, or calibration with temperature scaling).
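Temperature scaling, one of the run modes mentioned above, divides the logits by a positive scalar T before the softmax; T > 1 lowers confidence without changing the predicted class. The sketch below shows standard post-hoc temperature scaling, with T picked by a grid search that minimizes negative log-likelihood on labeled held-out data. The function names, the grid, and the fitting procedure are assumptions; how the repository's scripts choose T is not described here.

```python
import math
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature (temperature must be positive)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fit_temperature(all_logits: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Return the grid value of T with the lowest mean negative log-likelihood."""
    if grid is None:
        grid = [0.5 + 0.05 * k for k in range(91)]  # assumed search range 0.5..5.0

    def mean_nll(t: float) -> float:
        losses = [-math.log(softmax_with_temperature(z, t)[y])
                  for z, y in zip(all_logits, labels)]
        return sum(losses) / len(losses)

    return min(grid, key=mean_nll)


if __name__ == "__main__":
    # Confident logits but only 3 of 4 predictions correct, so the fitted T is > 1.
    logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
    labels = [0, 0, 1, 1]
    t = fit_temperature(logits, labels)
    print(t, softmax_with_temperature(logits[0], t))
```

A single scalar T keeps the argmax, and therefore accuracy, unchanged, which is why it is a common calibration baseline to compare against.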

🏁 Baseline Experiment

# Fine-grained classification ({dataset}: I/DTD/Flower102/Food101/Cars/SUN397/Aircraft/Pets/Caltech101/UCF101/eurosat)
bash scripts/test_baseline.sh {dataset}


🎯 TPT Experiment

# Fine-grained classification
bash scripts/test_tpt_fg.sh {dataset}

# Natural distribution shift
bash scripts/test_tpt_ds.sh {dataset}

🔥 O-TPT Experiment

# Fine-grained classification
bash scripts/test_tpt_otpt_fg.sh {dataset}

# Natural distribution shift
bash scripts/test_tpt_otpt_ds.sh {dataset}
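To run one of these scripts over several datasets in sequence, a small Python driver like the one below can help. It is a sketch that only assumes each script takes a single {dataset} argument, as in the commands above; the dataset identifiers are copied from the baseline comment and are assumed, not confirmed, to work with the O-TPT fine-grained script. `build_commands` and `run_all` are hypothetical helpers.

```python
import subprocess
from typing import List

# Identifiers copied from the baseline script comment; assumed (not confirmed)
# to be accepted by the O-TPT fine-grained script as well.
FINE_GRAINED = ["I", "DTD", "Flower102", "Food101", "Cars", "SUN397",
                "Aircraft", "Pets", "Caltech101", "UCF101", "eurosat"]


def build_commands(script: str, datasets: List[str]) -> List[List[str]]:
    """Return one `bash <script> <dataset>` argument list per dataset."""
    return [["bash", script, name] for name in datasets]


def run_all(script: str, datasets: List[str], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(script, datasets):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("scripts/test_tpt_otpt_fg.sh", FINE_GRAINED, dry_run=True)
```

Leave dry_run=True to print the commands first; set it to False to execute them.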

📊 Main Results

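Comparison of calibration performance with the CLIP ViT-B/16 backbone on fine-grained classification datasets. Acc. and ECE are in % (lower ECE is better). Column abbreviations: INet = ImageNet, FLW = Flower102, Food = Food101, SUN = SUN397, Air = Aircraft, Pets = OxfordPets, Calt = Caltech101, UCF = UCF101, SAT = EuroSAT, Car = StanfordCars.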

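| Method | Metric | INet | DTD | FLW | Food | SUN | Air | Pets | Calt | UCF | SAT | Car | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zero Shot | Acc. | 66.7 | 44.3 | 67.3 | 83.6 | 62.5 | 23.9 | 88.0 | 92.9 | 65.0 | 41.3 | 65.3 | 63.7 |
| | ECE | 2.12 | 8.50 | 3.00 | 2.39 | 2.53 | 5.11 | 4.37 | 5.50 | 3.59 | 13.89 | 4.25 | 4.43 |
| TPT | Acc. | 69.0 | 46.7 | 69.0 | 84.7 | 64.5 | 23.4 | 87.1 | 93.8 | 67.3 | 42.4 | 66.3 | 65.0 |
| | ECE | 10.6 | 21.2 | 13.5 | 3.98 | 11.3 | 16.8 | 5.77 | 4.51 | 2.54 | 13.2 | 5.16 | 11.6 |
| C-TPT | Acc. | 68.5 | 46.0 | 69.8 | 83.7 | 64.8 | 24.85 | 88.2 | 93.63 | 65.7 | 43.2 | 65.8 | 64.57 |
| | ECE | 3.15 | 11.9 | 5.04 | 3.43 | 5.04 | 4.36 | 1.9 | 4.24 | 2.54 | 13.2 | 1.59 | 5.13 |
| Robust-adapt-SaLs-CTPT | Acc. | 68.04 | 45.51 | 69.43 | 83.18 | 64.38 | 23.94 | 88.12 | 93.63 | 65.32 | 43.05 | 65.48 | 64.55 |
| | ECE | 2.63 | 14.56 | 2.74 | 1.26 | 3.56 | 6.21 | 3.16 | 3.78 | 6.96 | 14.92 | 2.82 | 5.69 |
| Robust-adapt-Penalty-CTPT | Acc. | 68.04 | 45.69 | 69.55 | 83.28 | 64.36 | 23.91 | 87.95 | 93.47 | 65.32 | 44.06 | 65.53 | 64.65 |
| | ECE | 2.63 | 13.9 | 5.27 | 3.35 | 4.87 | 4.43 | 1.63 | 4.56 | 2.29 | 7.08 | 1.25 | 4.66 |
| Robust-adapt-ZS-CTPT | Acc. | 68.01 | 45.63 | 69.55 | 83.25 | 64.41 | 23.88 | 88.03 | 93.31 | 65.24 | 42.64 | 65.45 | 64.51 |
| | ECE | 3.01 | 12.35 | 4.94 | 3.8 | 5.16 | 4.31 | 2.06 | 4.34 | 2.17 | 12.23 | 1.7 | 5.09 |
| O-TPT (Ours) | Acc. | 67.33 | 45.68 | 70.07 | 84.13 | 64.23 | 23.64 | 87.95 | 93.95 | 64.16 | 42.84 | 64.53 | 64.41 |
| | ECE | 1.96 | 7.88 | 3.87 | 1.46 | 4.93 | 3.68 | 1.9 | 3.8 | 2.34 | 12.98 | 1.78 | 4.21 |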
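ECE, the calibration metric reported in these tables, groups predictions into confidence bins and takes the bin-size-weighted average of |accuracy − confidence| per bin. A minimal sketch with equal-width bins follows; the 15-bin default is a common choice but an assumption here, not necessarily the setting behind these results. Multiplying the returned value by 100 gives the percentage scale used in the tables.

```python
import math
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 15) -> float:
    """Equal-width-bin ECE as a fraction in [0, 1]."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin b holds confidences in (b/n_bins, (b+1)/n_bins]; 0.0 falls into bin 0.
        idx = min(max(math.ceil(conf * n_bins) - 1, 0), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's calibration gap by the fraction of samples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 90% confidence, three correct: gap |0.75 - 0.9| = 0.15.
    print(100 * expected_calibration_error([0.9] * 4, [True, True, True, False]))
```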

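🌍 Comparison of calibration performance with the CLIP ViT-B/16 backbone on natural distribution shift datasets (I-A = ImageNet-A, I-V2 = ImageNet-V2, I-R = ImageNet-R, I-S = ImageNet-Sketch; Acc. and ECE in %):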

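| Method | Metric | I-A | I-V2 | I-R | I-S | Avg |
|---|---|---|---|---|---|---|
| CLIP-ViT-B/16 | Acc. | 47.8 | 60.8 | 74.0 | 46.1 | 57.2 |
| | ECE | 8.61 | 3.01 | 3.58 | 4.95 | 5.04 |
| TPT | Acc. | 52.6 | 63.0 | 76.7 | 47.5 | 59.9 |
| | ECE | 16.4 | 11.1 | 4.36 | 16.1 | 12.0 |
| C-TPT | Acc. | 51.6 | 62.7 | 76.0 | 47.9 | 59.6 |
| | ECE | 8.16 | 6.23 | 1.54 | 7.35 | 5.82 |
| O-TPT (Ours) | Acc. | 49.87 | 61.65 | 72.55 | 47.12 | 57.80 |
| | ECE | 7.22 | 3.97 | 1.46 | 6.87 | 4.88 |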
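The Avg column is consistent with an unweighted mean over the four datasets; for example, for the O-TPT (Ours) rows:

```latex
\begin{aligned}
\text{Avg Acc.} &= \frac{49.87 + 61.65 + 72.55 + 47.12}{4} = \frac{231.19}{4} \approx 57.80 \\
\text{Avg ECE} &= \frac{7.22 + 3.97 + 1.46 + 6.87}{4} = \frac{19.52}{4} = 4.88
\end{aligned}
```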

🙏 Acknowledgement

We are thankful to the authors of TPT, C-TPT, and CoOp/CoCoOp for their open-source contributions.

📖 Citation

If you find our work useful for your research, please consider citing it:

@InProceedings{Sharifdeen_2025_CVPR,
    author    = {Sharifdeen, Ashshak and Munir, Muhammad Akhtar and Baliah, Sanoojan and Khan, Salman and Khan, Muhammad Haris},
    title     = {O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {19942-19951}
}

📧 Contact

If you need any further clarification, please feel free to contact me at ashshaks@gmail.com.