LaZSL

July 13, 2025

This repository contains the code for the ICCV'25 paper "Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model".

The preprint is available at https://arxiv.org/abs/2506.23822.

Requirements

First, install the dependencies manually:

conda install pytorch torchvision -c pytorch
conda install matplotlib torchmetrics -c conda-forge
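A quick way to confirm the environment is ready is to check that each package installed above can be imported. The snippet below is a minimal sketch and is not part of the repository. It checks only the four packages named in the conda commands. The helper name missing_packages is hypothetical.

```python
import importlib.util

# Import names of the packages installed by the conda commands above.
REQUIRED = ["torch", "torchvision", "matplotlib", "torchmetrics"]


def missing_packages(names):
    """Return the subset of top-level module `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

If any name is reported, rerun the matching conda command in the same environment you use to run the code.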

Preparing Dataset

Please follow the instructions in DATASETS.md to prepare the datasets.

Running

To reproduce the accuracy results from the paper, edit the directory paths in load_OP.py to match your local machine and set hparams['dataset'] to the dataset you want to evaluate. Then run python main_OP.py. All dataset-specific hyperparameters are defined in load_OP.py, and you can modify any of them there.
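Path errors are easier to fix before the run starts. The sketch below checks that every directory you configured actually exists. It is not taken from the repository. The dictionary DATA_DIRS, its keys, and its paths are hypothetical placeholders, so fill them with whatever values you set in load_OP.py.

```python
from pathlib import Path

# Hypothetical placeholders: copy in the directory values you edited in load_OP.py.
# These keys and paths are NOT names taken from the repository.
DATA_DIRS = {
    "imagenet": "/path/to/imagenet",
    "cub": "/path/to/cub",
}


def missing_dirs(dirs):
    """Return the entries of `dirs` whose path is not an existing directory."""
    return {name: path for name, path in dirs.items() if not Path(path).is_dir()}


if __name__ == "__main__":
    bad = missing_dirs(DATA_DIRS)
    for name, path in bad.items():
        print(f"{name}: directory not found -> {path}")
    if not bad:
        print("All configured directories exist.")
```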

Results

Results of our released models with three different backbones (ViT-B/32, ViT-B/16, ViT-L/14) on 6 datasets.

Dataset       Acc. ViT-B/32 (%)   Acc. ViT-B/16 (%)   Acc. ViT-L/14 (%)
ImageNet      65.3                69.2                75.7
CUB           56.5                60.3                66.1
OxfordPets    84.7                87.4                92.7
Food101       85.9                89.7                93.5
Places365     41.5                42.0                41.8
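For readers new to zero-shot evaluation, the sketch below shows how an accuracy figure of this kind is typically computed. It assumes that "Acc." means top-1 accuracy in percent. Each image embedding is compared with one text embedding per class, and the most similar class is taken as the prediction. This is generic CLIP-style scoring and does not reproduce LaZSL's locally-aligned scoring. The embeddings are made-up toy numbers, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zero_shot_top1(image_embs, class_embs, labels):
    """Top-1 accuracy (%) when each image gets the class whose text embedding is most similar."""
    correct = 0
    for emb, label in zip(image_embs, labels):
        scores = [cosine(emb, c) for c in class_embs]
        pred = max(range(len(scores)), key=scores.__getitem__)  # index of the best-matching class
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Toy 2-D embeddings: two class prompts and three images (illustrative values only).
classes = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 1, 1]
# The third image is closer to class 0 but labelled 1, so two of three are correct.
print(zero_shot_top1(images, classes, labels))
```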

Citation

If you find LaZSL useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@inproceedings{chen2025interpretable,
  title={Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model},
  author={Chen, Shiming and Duan, Bowen and Khan, Salman and Khan, Fahad Shahbaz},
  booktitle={ICCV},
  year={2025}
}