[ICCV2025] MCP
November 10, 2025 ยท View on GitHub
Official Implementation of "MCP: Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models" (ICCV 2025)
๐ Read our paper: arxiv
๐ Project page: MCP-ICCV25
Overview
๐ This repository contains the official implementation of MCP (Multi-Cache Enhanced Prototype Learning), proposed in our ICCV 2025 paper.
MCP introduces a multi-cache mechanism that dynamically maintains prototype representations during test-time adaptation (TTA) of vision-language models (VLMs), enabling robust generalization under distribution shifts.
Abstract: In the zero-shot setting, test-time adaptation adjusts pre-trained models using unlabeled data from the test phase to enhance performance on unknown test distributions. Existing cache-enhanced TTA methods rely on a low-entropy criterion to select samples for prototype construction, assuming intra-class compactness. However, low-entropy samples may be unreliable under distribution shifts, and the resulting prototypes may not ensure compact intra-class distributions. This study identifies a positive correlation between cache-enhanced performance and intra-class compactness. Based on this observation, we propose a Multi-Cache enhanced Prototype-based Test-Time Adaptation (MCP) featuring three caches: an entropy cache for initializing prototype representations with low-entropy samples, an align cache for integrating visual and textual information to achieve compact intra-class distributions, and a negative cache for prediction calibration using high-entropy samples. We further develop MCP++, a framework incorporating cross-modal prototype alignment and residual learning, introducing prototype residual fine-tuning. Comparative and ablation experiments across 15 downstream tasks demonstrate that the proposed method and framework achieve state-of-the-art generalization performance.
Installation
git clone https://github.com/CenturyChen/MCP.git
cd MCP
conda create -n mcp python=3.10
conda activate mcp
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
Dataset
๐ฆ For dataset preparation, please follow the instructions provided in DATASETS.md.
Our implementation follows the same dataset setup protocol as TDA, covering the two standard benchmarks used in test-time adaptation.
Run MCP
OOD Benchmark
- ResNet50: Run MCP on the OOD Benchmark using the ResNet50 model:
bash ./scripts/run_ood_benchmark_rn50.sh
- ViT-B/16: Run MCP on the OOD Benchmark using the ViT-B/16 model:
bash ./scripts/run_ood_benchmark_vit.sh
Cross-Domain Benchmark
- ResNet50: Run MCP on the Cross-Domain Benchmark using the ResNet50 model:
bash ./scripts/run_cd_benchmark_rn50.sh
- ViT-B/16: Run MCP on the Cross-Domain Benchmark using the ViT-B/16 model:
bash ./scripts/run_cd_benchmark_vit.sh
Citation
๐ค If you find our work useful, please consider citing:
@inproceedings{chen2025multi,
title={Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models},
author={Chen, Xinyu and Zhai, Haotian and Zhang, Can and Shi, Xiupeng and Li, Ruirui},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={2281--2291},
year={2025}
}
Acknowledgements
This research is inspired by previous works including Tip-Adapter, TPT, CuPL, TDA, and DPE-CLIP. We appreciate their valuable open-source efforts.