README.md

March 11, 2026

🦅 TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery

Accepted to CVPR 2026

• [arXiv] •

💡 TL;DR

TALON is an approach to on-the-fly category discovery (OCD) that uses a test-time adaptation framework to keep learning from unlabeled data streams. This repository contains the official implementation of our CVPR 2026 paper.

📂 Project Structure

TALON/
├── config.py                      <- Dataset root paths & DINO pretrain path
├── train.py                       <- Training entry point
├── test.py                        <- Evaluation entry point
├── pyproject.toml                 <- Project metadata & dependencies (uv)
│
├── data/                          <- Dataset loading modules
│   ├── cifar.py                      CIFAR-10 / CIFAR-100
│   ├── cub.py                        CUB-200-2011
│   ├── food101.py                    Food-101
│   ├── pets.py                       Oxford-IIIT Pet
│   ├── scars.py                      Stanford Cars
│   └── imagenet.py                   ImageNet-100
│
├── methods/                       <- Model implementations
│   └── talon/
│       ├── model.py                  TALONModel (backbone + learnable prototypes)
│       ├── trainer.py                Training loop, TTA, evaluation logic
│       └── utils.py                  NCM prototypes, logits, metrics
│
├── tools/                         <- General utilities
│   ├── evaluate_utils.py             Clustering accuracy (Hungarian assignment)
│   ├── losses.py                     Loss functions
│   └── train_utils.py                SmoothedValue, training helpers
│
└── checkpoints/                   <- Pretrained model weights (download below)
    ├── clip/{cub,food,scars}/
    └── dino/{cub,food,scars}/

⚙️ Dependencies and Installation

This project uses uv for dependency management to ensure a clean and reproducible environment.

Requirements:

  • Python >= 3.12
  • PyTorch (CUDA 12.4)
  • OpenAI CLIP / timm (DINO ViT-B/16)

# 1. Clone the repository
git clone https://github.com/ynanwu/TALON
cd TALON

# 2. Install uv (if not already installed)
# See https://github.com/astral-sh/uv for details

# 3. Install all dependencies
uv sync

# That's it! Use `uv run` to execute any script; there is no need to manually activate the venv.

📦 Pretrained Models

We provide pretrained checkpoints for CUB, Food-101, and Stanford Cars using both CLIP and DINO backbones.

📥 Download the checkpoints from Google Drive or Hugging Face and place them as follows:

checkpoints/
├── clip/
│   ├── cub/
│   │   └── best_model.pth
│   ├── food/
│   │   └── best_model.pth
│   └── scars/
│       └── best_model.pth
└── dino/
    ├── cub/
    │   └── best_model.pth
    ├── food/
    │   └── best_model.pth
    └── scars/
        └── best_model.pth
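
Before running evaluation, it can help to confirm that the downloaded files landed where the layout above expects them. The following is a minimal sketch and not part of the repository: the helper names `expected_checkpoints` and `missing_checkpoints` are hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Backbones and datasets taken from the directory layout above.
BACKBONES = ("clip", "dino")
DATASETS = ("cub", "food", "scars")


def expected_checkpoints(root="checkpoints"):
    """Return the six checkpoint paths implied by the directory layout."""
    return [
        Path(root) / backbone / dataset / "best_model.pth"
        for backbone in BACKBONES
        for dataset in DATASETS
    ]


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint paths that are not present on disk."""
    return [path for path in expected_checkpoints(root) if not path.is_file()]


if __name__ == "__main__":
    # Hypothetical usage: list anything that still needs to be downloaded.
    for path in missing_checkpoints():
        print(f"missing: {path}")
```

An empty output means every backbone/dataset pair has a `best_model.pth` in place.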

📊 Datasets

Supported datasets and their known / novel class splits:

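| Dataset | Total Classes | Known Classes | Novel Classes |
|---|---|---|---|
| CIFAR-10 | 10 | 6 | 4 |
| CIFAR-100 | 100 | 80 | 20 |
| CUB-200-2011 | 200 | 100 | 100 |
| Oxford-IIIT Pet | 37 | 19 | 18 |
| Stanford Cars | 196 | 98 | 98 |
| Food-101 | 101 | 51 | 50 |
| ImageNet-100 | 100 | 80 | 20 |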

Configure the dataset root paths in config.py before running:

# config.py
CUB_ROOT = "datasets/CUB"
FOOD_101_ROOT = "datasets/Food101"
OXFORD_PET_ROOT = "datasets/OxfordPets"
SCARS_ROOT = "datasets/stanford_cars/"
CIFAR_10_ROOT = "datasets/CIFAR/"
CIFAR_100_ROOT = "datasets/CIFAR/"
IMAGENET_ROOT = "datasets/imagenet/"

# DINO backbone pretrained weights
pretrain_path = "dino_vitbase16_pretrain.pth"

🚀 Usage

Training

# CUB with CLIP backbone
uv run train.py --dataset_name cub --backbone clip --save_dir my_experiment --device cuda:0

# Food-101 with DINO backbone, custom tau and TTA
uv run train.py --dataset_name food --backbone dino --tau 0.8 --tta_state M+P --epochs 100 --device cuda:0

# Stanford Cars with CLIP, Model TTA only
uv run train.py --dataset_name scars --backbone clip --tta_state M --save_dir scars_exp --device cuda:0

📋 Full list of training arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| --seed | int | 1028 | Random seed |
| --dataset_name | str | cub | pets / scars / cub / food / imagenet100 / cifar10 / cifar100 |
| --backbone | str | clip | clip or dino |
| --tta_state | str | M+P | M / P / M+P / none (see below) |
| --tau | float | 0.75 | Threshold for novel class detection (see below) |
| --device | str | auto | e.g. cuda:0 (auto-selects best GPU if empty) |
| --save_dir | str | test | Directory for logs and checkpoints |
| --train_batch_size | int | 128 | Training batch size |
| --eval_batch_size | int | 64 | Evaluation batch size |
| --num_workers | int | 8 | DataLoader workers |
| --prop_train_labels | float | 0.5 | Proportion of labeled training data |
| --epochs | int | 100 | Total training epochs |
| --start_epoch | int | 0 | Resume from epoch |
| --clip_grad | float | None | Gradient clipping max norm |
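
For orientation, the documented defaults can be mirrored in a small `argparse` parser. This is an illustrative sketch covering only a subset of the arguments listed above; it is not copied from train.py. The function name `build_parser` is hypothetical, and restricting values through `choices` is an assumption about how the script validates input.

```python
import argparse


def build_parser():
    """Build a parser mirroring a subset of the documented training arguments."""
    parser = argparse.ArgumentParser(description="TALON training arguments (sketch)")
    parser.add_argument("--seed", type=int, default=1028, help="Random seed")
    # Assumption: the real script may not restrict values via `choices`.
    parser.add_argument(
        "--dataset_name",
        type=str,
        default="cub",
        choices=["pets", "scars", "cub", "food", "imagenet100", "cifar10", "cifar100"],
    )
    parser.add_argument("--backbone", type=str, default="clip", choices=["clip", "dino"])
    parser.add_argument("--tta_state", type=str, default="M+P", choices=["M", "P", "M+P", "none"])
    parser.add_argument("--tau", type=float, default=0.75, help="Novel class detection threshold")
    parser.add_argument("--epochs", type=int, default=100, help="Total training epochs")
    parser.add_argument("--clip_grad", type=float, default=None, help="Gradient clipping max norm")
    return parser


if __name__ == "__main__":
    # Parsing without flags yields the documented defaults.
    print(build_parser().parse_args())
```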

🌡️ About --tau (Novel Class Detection Threshold)

tau is the similarity threshold used for novel class detection. When a test sample's maximum cosine similarity to all known prototypes falls below tau, the sample is treated as belonging to a novel (unseen) class and a new prototype is created.
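
The thresholding rule can be sketched in a few lines of plain Python. This is a simplified illustration rather than the repository's implementation: the function names are hypothetical, features are plain lists instead of tensors, and initializing the new prototype with the sample's own feature is an assumption, since the README does not say how new prototypes are initialized.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_or_discover(feature, prototypes, tau=0.75):
    """Return the index of the matching prototype, creating one if needed.

    If the best cosine similarity is below tau (or no prototype exists yet),
    the feature is appended to `prototypes` as a new prototype and its index
    is returned. Using the feature itself as the new prototype is an
    assumption made for this sketch.
    """
    sims = [cosine_similarity(feature, p) for p in prototypes]
    if not sims or max(sims) < tau:
        prototypes.append(list(feature))
        return len(prototypes) - 1
    return sims.index(max(sims))


if __name__ == "__main__":
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    # Close to prototype 0, so it is assigned to a known class.
    print(assign_or_discover([0.9, 0.1], prototypes))
    # Best similarity is about 0.707, below tau, so a new prototype is added.
    print(assign_or_discover([0.7, 0.7], prototypes))
```

Under this rule, raising tau makes the detector stricter about matching known prototypes, so more samples are routed to new prototypes; lowering it has the opposite effect.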

🔄 About --tta_state (Test-Time Adaptation Modes)

| Mode | What Gets Updated | Description |
|---|---|---|
| M | Backbone norm layers | Model TTA: fine-tunes the affine parameters (weight & bias) of LayerNorm/BatchNorm in the last transformer block. Minimizes entropy + maximizes instance-to-prototype similarity + inter-class repulsion. |
| P | Class prototypes | Prototype TTA: updates class prototype vectors via EMA based on test features. Adapts the classifier without touching the backbone. |
| M+P | Both | Joint TTA: applies both Model TTA and Prototype TTA simultaneously. Typically yields the best performance. (Recommended) |
| none | Nothing | No adaptation. Uses fixed model and prototypes. Useful as a baseline. |
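
To make the Prototype TTA mode (P) more concrete, here is a minimal sketch of an exponential moving average (EMA) update of one prototype toward a test feature. It is an illustration under stated assumptions, not the code in methods/talon: the function names, the momentum value of 0.9, and the re-normalization to unit length are all assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def ema_update(prototype, feature, momentum=0.9):
    """Move a prototype toward a test feature with an exponential moving average.

    A momentum close to 1.0 keeps the prototype nearly unchanged; smaller
    values let it follow the test features more quickly. Re-normalizing keeps
    the prototype on the unit sphere, which suits cosine-similarity matching
    (an assumption of this sketch).
    """
    mixed = [momentum * p + (1.0 - momentum) * f for p, f in zip(prototype, feature)]
    return l2_normalize(mixed)


if __name__ == "__main__":
    prototype = [1.0, 0.0]
    feature = [0.0, 1.0]
    # With momentum 0.9 the prototype tilts only slightly toward the feature.
    print(ema_update(prototype, feature))
```

Because only the prototype vectors change, an update of this kind adapts the classifier without touching the backbone, which matches the description of mode P.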

Evaluation

# Evaluate CLIP on CUB (with default M+P TTA)
uv run test.py --dataset_name cub --backbone clip --ckpt_path checkpoints/clip/cub/best_model.pth

# Evaluate DINO on Food-101 with Prototype TTA only
uv run test.py --dataset_name food --backbone dino --tta_state P --ckpt_path checkpoints/dino/food/best_model.pth

# Evaluate without TTA (baseline)
uv run test.py --dataset_name scars --backbone clip --tta_state none --ckpt_path checkpoints/clip/scars/best_model.pth

📝 Citation

If you find this work useful for your research, please consider citing our paper:

@inproceedings{talon2026,
  title={TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery},
  author={Wu, Yanan and Yan, Yuhan and Chen, Tailai and Chi, Zhixiang and Wu, ZiZhang and Jin, Yi and Wang, Yang and Li, Zhenbo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}