Latte
May 7, 2026 · View on GitHub
[ICCV 2025] Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning. Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He
Log
Complete code is available now!
- 2025/10/18: Dataset, core code of Latte
- 2026/05/05: Data partition, embedding caching, main function
- 2026/05/07: Scripts and dataset links
Prepare Data
Download or generate data
VLCS and TerraIncognita
We use the dataset provided by DomainBed.
CIFAR-10-C and CIFAR-100-C
Instead of using the given 10,000 samples for each dataset, we run the official code to generate corrupted images for the full 60,000 samples for each dataset. The generated data can be downloaded here:
Finally, the data should be arranged as:
${data_root}
│
├── domainbed
│ ├── VLCS
│ │ ├── Caltech101
│ │ │ ├── bird
│ │ │ └── ...
│ │ ├── ...
│ │ └── VOC2007
│ │ ├── bird
│ │ └── ...
│ │
│ └── terra_incognita
│ ├── location_100
│ │ ├── bird
│ │ └── ...
│ ├── ...
│ └── location_46
│ ├── bird
│ └── ...
│
└── corruption
├── CIFAR-10-C-Full
│ ├── brightness.npy
│ ├── ...
│ ├── pixelate.npy
│ └── labels.npy
│
└── CIFAR-100-C-Full
├── brightness.npy
├── ...
├── pixelate.npy
└── labels.npy
Cache image embeddings
For training-free TTA methods (TDA, DMN-ZS, Latte), the pre-trained model is not updated during the training. Therefore we can cache the image and text embeddings for more efficient experiments. To do that, run
cd ./shell
bash cache_emb.sh
Here are the embeddings I cached.
You may also download them to ./cache/. This should match with your results.
Run Latte
cd ./shell
bash vlcs.sh
bash terra_incognita.sh
bash cifar10c_full.sh
bash cifar100c_full.sh