Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation

September 3, 2025

📰 News

🎉 Our paper "Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation" has been accepted at ICCV 2025!
This work explores efficient online test-time adaptation for real-world distribution shifts, focusing on robustness without retraining.

๐Ÿ“ Conference: ICCV 2025
๐Ÿ“„ Paper: Arxiv version


Overview

This repository contains the official implementation of TCA, our token condensation method for training-free test-time adaptation of Vision Transformers. Our approach shows that selective token pruning, combined with adaptive classification, can improve both efficiency and accuracy when adapting to distribution shifts at test time.

Token Condensation Method

We explore whether reducing the number of visual tokens processed by Vision Transformers can improve efficiency without sacrificing performance. We implement and compare three token pruning strategies:

  • EViT: keeps the tokens that receive the most attention from the class token and fuses the remaining ones into a single token
  • ToME: Token Merging, which merges similar tokens
  • Ours: our token condensation approach, which combines coreset averaging with hierarchical token selection
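To make the attention-based baseline concrete, here is a minimal NumPy sketch of EViT-style token reorganization as described in the EViT paper: keep the patch tokens most attended by [CLS] and fuse the rest into one attention-weighted token. It is an illustration under stated assumptions, not this repository's implementation; the function name `evit_style_prune`, its arguments, and the use of a single head-averaged attention vector are all hypothetical.

```python
import numpy as np


def evit_style_prune(tokens: np.ndarray, cls_attn: np.ndarray, keep_rate: float) -> np.ndarray:
    """Keep the patch tokens most attended by [CLS] and fuse the rest into one token.

    Hypothetical helper (not from this repo).
    tokens:    (N + 1, D) array; row 0 is the [CLS] token, rows 1..N are patch tokens.
    cls_attn:  (N,) attention weights from [CLS] to each patch token (assumed head-averaged).
    keep_rate: fraction of patch tokens to keep, in (0, 1].
    """
    cls_tok, patches = tokens[:1], tokens[1:]
    k = max(1, int(round(patches.shape[0] * keep_rate)))
    order = np.argsort(-cls_attn)            # patch indices, most attended first
    keep_idx = np.sort(order[:k])            # keep the original spatial order
    drop_idx = order[k:]
    kept = patches[keep_idx]
    if drop_idx.size == 0:
        return np.concatenate([cls_tok, kept], axis=0)
    # Fuse the inattentive tokens into a single attention-weighted token.
    weights = cls_attn[drop_idx] / cls_attn[drop_idx].sum()
    fused = weights @ patches[drop_idx]      # shape (D,)
    return np.concatenate([cls_tok, kept, fused[None, :]], axis=0)


# Usage: 1 [CLS] token + 8 patch tokens of width 4, keep half of the patches.
tokens = np.arange(36, dtype=float).reshape(9, 4)
cls_attn = np.arange(1, 9, dtype=float) / 36.0
out = evit_style_prune(tokens, cls_attn, keep_rate=0.5)
print(out.shape)  # (6, 4): [CLS] + 4 kept + 1 fused
```

Fusing instead of discarding keeps some information from the dropped tokens, which is the same motivation as the "Information Preservation" point of our method, although the mechanisms differ.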

Our method is evaluated on CLIP (Contrastive Language-Image Pre-training) models across multiple datasets with test-time adaptation.

Key Features

  • 🚀 Efficient Token Condensation: Reduces computational cost by processing fewer tokens
  • 🎯 Training-free Test-Time Adaptation: Adapts to test distributions using reservoir-based caching without retraining
  • 📊 Comprehensive Evaluation: Evaluated on 15 datasets, including ImageNet and its variants
  • 🔧 Flexible Framework: Easy to extend with new pruning methods
  • 🌟 Real-world Robustness: Focuses on practical distribution shifts

Environment Setup

Prerequisites

  • Python 3.9+
  • CUDA-compatible GPU (recommended)
  • Anaconda or Miniconda

Installation

  1. Clone the repository:
git clone https://github.com/Jo-wang/TCA.git
cd TCA
  2. Create the conda environment:
conda env create -f environment.yaml
conda activate TTA

Dataset Preparation

Supported Datasets

The framework supports the following datasets:

ImageNet Variants:

  • ImageNet (I)
  • ImageNet-A (A) - Natural adversarial examples
  • ImageNet-V (V) - ImageNetV2 matched frequency
  • ImageNet-R (R) - Rendition
  • ImageNet-S (S) - Sketch

Cross-domain Classification:

  • Caltech101
  • DTD (Describable Textures Dataset)
  • EuroSAT
  • FGVC Aircraft (Fine-Grained Visual Classification of Aircraft)
  • Food101
  • Oxford Flowers
  • Oxford Pets
  • Stanford Cars
  • SUN397
  • UCF101

Data Structure

Organize your datasets under data/ as follows (these are the paths the code expects):

data/
├── imagenet/
│   └── val/                 # ImageNet validation images
├── imagenet-a/              # ImageNet-A (natural adversarial examples)
├── imagenet-r/              # ImageNet-R (rendition)
├── imagenet-s/
│   └── sketch/              # ImageNet-Sketch
├── caltech101/
│   └── 101_ObjectCategories/ # Caltech101 images
├── dtd/
│   └── images/              # Describable Textures Dataset
├── eurosat/                 # EuroSAT satellite images
├── fgvc/
│   └── data/
│       └── images/          # FGVC Aircraft images
├── food-101/
│   └── images/              # Food-101 images
├── oxford_flowers/          # Oxford Flowers images
├── oxford_pets/
│   ├── images/              # Oxford Pets images
│   └── annotations/         # Oxford Pets annotations
├── sun397/
│   └── SUN397/              # SUN397 images
└── ucf101/                  # UCF101 action recognition
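Before launching a run, it can help to check that the layout above is in place. The script below is a small standalone sketch, not part of the repository: the name `missing_dataset_dirs` and the list `EXPECTED_SUBDIRS` are hypothetical and simply mirror the tree shown here. ImageNet-V and Stanford Cars do not appear in the tree, so they are not checked.

```python
from pathlib import Path

# Hypothetical list mirroring the directory tree above, relative to the data root.
# ImageNet-V and Stanford Cars are not listed in the tree, so they are skipped here.
EXPECTED_SUBDIRS = [
    "imagenet/val",
    "imagenet-a",
    "imagenet-r",
    "imagenet-s/sketch",
    "caltech101/101_ObjectCategories",
    "dtd/images",
    "eurosat",
    "fgvc/data/images",
    "food-101/images",
    "oxford_flowers",
    "oxford_pets/images",
    "oxford_pets/annotations",
    "sun397/SUN397",
    "ucf101",
]


def missing_dataset_dirs(data_root: str = "data") -> list[str]:
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_SUBDIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:")
        for rel in missing:
            print(f"  data/{rel}")
    else:
        print("All expected dataset directories found.")
```

Only the datasets you plan to pass via --datasets need to be present, so a partial list of missing directories is not necessarily an error.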

Download Instructions

For detailed dataset preparation instructions, please refer to CoOp's data preparation guide.

Usage

Basic Usage

Run the token condensation experiment with the default settings (our method at Ours-0.035, on oxford_flowers, with the ViT-B/16 backbone):

python runner.py 

Command Line Arguments

To compare EViT, ToME, and Ours at similar FLOPs, set the --token_pruning rate to 0.035 for Ours and to 0.1 for EViT and ToME.

| Argument | Description | Default |
|---|---|---|
| --config | Path to configuration directory | configs/ |
| --datasets | Datasets to process | oxford_flowers |
| --data-root | Path to datasets directory | data/ |
| --backbone | CLIP model backbone | ViT-B/16 |
| --token_pruning | Pruning method and rate | Ours-0.035 |
| --wandb-log | Enable Weights & Biases logging | False |
| --reservoir-sim | Use cosine similarity for caching | True |
| --div | Use diverse samples for caching | True |
| --token_sim | Use token-level similarity | True |
| --flag | Fuse similarity with current sample | True |
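The --token_pruning value combines a method name and a rate, as in the default Ours-0.035. The helper below sketches how such a string could be split and validated. The "Method-rate" format for EViT and ToME (e.g. EViT-0.1) is inferred from the default value and is an assumption, as are the names `parse_token_pruning`, `PruningSpec`, and `KNOWN_METHODS`; check runner.py for the strings it actually accepts.

```python
from typing import NamedTuple


class PruningSpec(NamedTuple):
    method: str
    rate: float


# Method names as they appear in this README; the exact strings accepted by
# runner.py are an assumption here.
KNOWN_METHODS = ("EViT", "ToME", "Ours")


def parse_token_pruning(spec: str) -> PruningSpec:
    """Split a hypothetical 'Method-rate' string such as 'Ours-0.035'."""
    method, sep, rate_str = spec.rpartition("-")
    if not sep or method not in KNOWN_METHODS:
        raise ValueError(f"expected one of {KNOWN_METHODS} followed by '-<rate>', got {spec!r}")
    rate = float(rate_str)  # raises ValueError for non-numeric rates
    if not 0.0 < rate < 1.0:
        raise ValueError(f"pruning rate must be in (0, 1), got {rate}")
    return PruningSpec(method, rate)


# Usage: the FLOPs-matched settings suggested above.
for spec in ("Ours-0.035", "EViT-0.1", "ToME-0.1"):
    print(parse_token_pruning(spec))
```

Using rpartition splits on the last hyphen, so the rate is always the final field even if a method name were ever to contain a hyphen.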

Method Details

Our Token Pruning Approach

Our method introduces several key innovations:

  1. Hierarchical Token Selection: Instead of binary keep/drop decisions, we use multiple levels of token importance
  2. Coreset Averaging: Groups similar tokens and represents them with fewer representative tokens
  3. Class Token Context: Leverages previously seen examples to guide token selection
  4. Information Preservation: Summarizes dropped tokens rather than discarding them completely
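The coreset-averaging idea (represent many similar tokens with a few averaged ones) can be illustrated with a generic technique: greedy k-center selection under cosine distance, followed by averaging each token group around its center. This is a sketch of that generic technique only, not the paper's algorithm; the function name `coreset_average`, the choice of the first token as the initial center, and the hard nearest-center assignment are all assumptions.

```python
import numpy as np


def coreset_average(tokens: np.ndarray, k: int) -> np.ndarray:
    """Condense (N, D) tokens into (k, D) tokens: greedy k-center selection + group averaging.

    Hypothetical helper illustrating a generic coreset technique.
    """
    k = min(k, tokens.shape[0])
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    x = tokens / np.maximum(norms, 1e-12)          # unit vectors for cosine similarity
    centers = [0]                                  # arbitrary seed: the first token
    dist = 1.0 - x @ x[0]                          # cosine distance to the nearest center
    for _ in range(1, k):
        nxt = int(np.argmax(dist))                 # farthest token becomes a new center
        centers.append(nxt)
        dist = np.minimum(dist, 1.0 - x @ x[nxt])
    # Assign each token to its most similar center, then average each group.
    assign = np.argmax(x @ x[centers].T, axis=1)
    groups = []
    for j, c in enumerate(centers):
        members = tokens[assign == j]
        # Fall back to the center token itself if its group is empty.
        groups.append(members.mean(axis=0) if len(members) else tokens[c])
    return np.stack(groups)


# Usage: two clusters of two tokens each collapse into two averaged tokens.
toks = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(coreset_average(toks, k=2))
```

Averaging each group, rather than keeping only the center, lets every token contribute to the condensed representation.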

Test-Time Adaptation

The TCA framework:

  • Maintains a reservoir of representative samples per class
  • Uses feature similarity to guide sample selection
  • Dynamically updates predictions based on test distribution
  • Combines CLIP predictions with reservoir-based adaptation
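A per-class reservoir whose cached features adjust the CLIP logits can be sketched as follows. The sketch assumes standard reservoir sampling (Algorithm R) keyed by the pseudo-label, and a Tip-Adapter-style affinity exp(-beta * (1 - cos)) for turning cached features into class scores. The README does not state that TCA uses either rule, so `ClassReservoir`, `fused_logits`, `beta`, and `alpha` are hypothetical illustrations, not the repository's code.

```python
import random

import numpy as np


class ClassReservoir:
    """Hypothetical per-class reservoir of L2-normalized test features (Algorithm R)."""

    def __init__(self, num_classes: int, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items = [[] for _ in range(num_classes)]
        self.seen = [0] * num_classes
        self.rng = random.Random(seed)

    def add(self, feat: np.ndarray, pseudo_label: int) -> None:
        bucket = self.items[pseudo_label]
        self.seen[pseudo_label] += 1
        if len(bucket) < self.capacity:
            bucket.append(feat)
        else:
            # Replace a stored sample with probability capacity / seen.
            j = self.rng.randrange(self.seen[pseudo_label])
            if j < self.capacity:
                bucket[j] = feat

    def cache_logits(self, feat: np.ndarray, beta: float = 5.0) -> np.ndarray:
        """Tip-Adapter-style affinity exp(-beta * (1 - cos)) summed per class."""
        logits = np.zeros(len(self.items))
        for c, bucket in enumerate(self.items):
            if bucket:
                sims = np.stack(bucket) @ feat     # cosine similarity for unit-norm features
                logits[c] = np.exp(-beta * (1.0 - sims)).sum()
        return logits


def fused_logits(clip_logits: np.ndarray, reservoir: ClassReservoir,
                 feat: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add alpha-weighted cache affinities to the zero-shot CLIP logits."""
    return clip_logits + alpha * reservoir.cache_logits(feat)


# Usage on one test sample (hypothetical feature and logits).
res = ClassReservoir(num_classes=3, capacity=4)
feat = np.array([0.6, 0.8])                # already unit-norm
clip_logits = np.array([0.2, 0.5, 0.3])
pred = int(np.argmax(fused_logits(clip_logits, res, feat)))
res.add(feat, pred)                         # cache the sample under its pseudo-label
```

Reservoir sampling keeps each cached slot an unbiased sample of everything seen so far for that class, using constant memory per class regardless of stream length.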

Results

Experimental Results

Our method achieves:

  • Efficiency: Reduces token count by up to 90% in later transformer layers
  • Performance: Maintains or improves accuracy compared to full token processing
  • Adaptability: Better adaptation to test distributions through reservoir caching

Citation

If you use this code in your research, please cite:

@article{wang2024less,
  title={Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation},
  author={Wang, Zixin and Gong, Dong and Wang, Sen and Huang, Zi and Luo, Yadan},
  journal={arXiv preprint arXiv:2410.14729},
  year={2024}
}

Acknowledgments

  • OpenAI CLIP for the base model
  • EViT for efficient vision transformer implementation
  • ToME for token merging techniques
  • TDA for test-time adaptation on CLIP

Contact

For questions or issues, please open an issue on GitHub or email zixin.wang@uq.edu.au.