TAP

April 22, 2025

Official implementation of our ICLR 2025 paper:

Tree of Attributes Prompt Learning for Vision-Language Models
Tong Ding, Wanhua Li, Zhongqi Miao, Hanspeter Pfister
In Proceedings of the International Conference on Learning Representations (ICLR 2025)

Comparison between TAP and existing methods for CLIP text prompt formation.

Installation

# Create a conda environment
conda create -y -n tap python=3.9

# Activate the environment
conda activate tap

# Install torch
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1


# Install Dassl
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch/
pip install -r requirements.txt
python setup.py develop
cd ..

# Clone TAP code base
git clone https://github.com/HHenryD/TAP
cd TAP
# Install TAP requirements
pip install -r requirements.txt
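The `torch`, `torchvision`, and `torchaudio` versions above are pinned as a matching set. If you want to confirm that they are still the installed versions after the remaining requirements have been installed, a minimal sketch is below. `check_torch_pins` is a hypothetical helper, not part of TAP or Dassl; it only parses `name==version` lines such as those printed by `pip freeze`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TAP or Dassl): reads `pip freeze` output on
# stdin and checks the PyTorch pins from the installation step.
check_torch_pins() {
  local freeze pin name want have status=0
  freeze=$(cat)
  for pin in torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1; do
    name=${pin%%==*}   # package name, e.g. "torch"
    want=${pin#*==}    # pinned version, e.g. "2.0.0"
    # Installed version with any local label such as "+cu117" stripped.
    have=$(awk -F'==' -v n="$name" \
      'tolower($1) == n { split($2, v, "[+]"); print v[1]; exit }' <<<"$freeze")
    if [ "$have" != "$want" ]; then
      echo "${name}: expected ${want}, found ${have:-nothing}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `pip freeze | check_torch_pins && echo "PyTorch pins OK"`. Local version labels (for example a CUDA suffix) are ignored, so only the release number is compared.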

Dataset

Please follow the instructions in the CoOp repository to prepare the datasets, then set your data directory in the scripts/base_to_new.sh file.
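Before launching training, it can help to confirm that the data directory contains all 11 datasets used in the benchmark. The sketch below assumes the folder names from CoOp's dataset preparation guide (`imagenet`, `caltech-101`, `oxford_pets`, and so on); adjust them if your layout differs. `check_data_root` is a hypothetical helper, not part of TAP or CoOp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a data root contains the 11 dataset folders.
# Folder names are assumed from CoOp's dataset preparation guide.
check_data_root() {
  local root="$1" missing=0 name
  if [ ! -d "$root" ]; then
    echo "not a directory: $root" >&2
    return 1
  fi
  for name in imagenet caltech-101 oxford_pets stanford_cars oxford_flowers \
              food-101 fgvc_aircraft sun397 dtd eurosat ucf101; do
    if [ ! -d "$root/$name" ]; then
      echo "missing: $root/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_data_root /path/to/datasets && echo "data root looks complete"`, using the same path you set in scripts/base_to_new.sh. This only checks that the top-level folders exist, not their contents.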

Training and evaluation

bash scripts/base_to_new.sh

Results

The results below report accuracy on base and novel classes across 11 recognition datasets, averaged over 3 seeds.

TAP compared with existing state-of-the-art methods

Method	Base Acc. (%)	Novel Acc. (%)	HM (%)
CLIP	69.34	74.22	71.70
CoOp	82.69	63.22	71.66
CoCoOp	80.47	71.69	75.83
ProGrad	82.48	70.75	76.16
RPO	81.13	75.00	77.78
LoGoPrompt	84.47	74.24	79.03
PromptSRC	84.26	76.10	79.97
TAP (ours)	84.75	77.63	81.04
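HM is the harmonic mean of base and novel accuracy, HM = 2 × Base × Novel / (Base + Novel). Because the harmonic mean is pulled toward the smaller of the two values, a method cannot score well on it by excelling on only one split. The short sketch below recomputes HM from a table row; `hm` is a hypothetical helper name, and awk is used only because bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Harmonic mean of base and novel accuracy: HM = 2*B*N / (B + N).
hm() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 2 * b * n / (b + n) }'
}

# The CLIP row of the table: prints 71.70
hm 69.34 74.22
```

Because the table lists rounded averages, a value recomputed this way can differ from the reported HM in the last digit.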

Acknowledgements

The project was built on top of the CoOp repository. We thank its authors and developers for their contributions.

Citation

If you find our work useful in your research, or if you use parts of this code, please consider citing our paper:

@inproceedings{ding2025tree,
  title={Tree of Attributes Prompt Learning for Vision-Language Models},
  author={Tong Ding and Wanhua Li and Zhongqi Miao and Hanspeter Pfister},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}