README.md
March 11, 2026 ยท View on GitHub
๐ฆ TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery
โข [arXiv] โข
๐ก TL;DR
TALON is a novel approach for on-the-fly category discovery (OCD) that utilizes a test-time adaptation framework to continuously learn from unlabeled data streams. This repository contains the official implementation of our CVPR 2026 paper.
๐ Project Structure
TALON/
โโโ config.py <- Dataset root paths & DINO pretrain path
โโโ train.py <- Training entry point
โโโ test.py <- Evaluation entry point
โโโ pyproject.toml <- Project metadata & dependencies (uv)
โ
โโโ data/ <- Dataset loading modules
โ โโโ cifar.py CIFAR-10 / CIFAR-100
โ โโโ cub.py CUB-200-2011
โ โโโ food101.py Food-101
โ โโโ pets.py Oxford-IIIT Pet
โ โโโ scars.py Stanford Cars
โ โโโ imagenet.py ImageNet-100
โ
โโโ methods/ <- Model implementations
โ โโโ talon/
โ โโโ model.py TALONModel (backbone + learnable prototypes)
โ โโโ trainer.py Training loop, TTA, evaluation logic
โ โโโ utils.py NCM prototypes, logits, metrics
โ
โโโ tools/ <- General utilities
โ โโโ evaluate_utils.py Clustering accuracy (Hungarian assignment)
โ โโโ losses.py Loss functions
โ โโโ train_utils.py SmoothedValue, training helpers
โ
โโโ checkpoints/ <- Pretrained model weights (download below)
โโโ clip/{cub,food,scars}/
โโโ dino/{cub,food,scars}/
โ๏ธ Dependencies and Installation
This project uses uv for dependency management to ensure a clean and reproducible environment.
Requirements:
- Python >= 3.12
- PyTorch (CUDA 12.4)
- OpenAI CLIP / timm (DINO ViT-B/16)
# 1. Clone the repository
git clone https://github.com/ynanwu/TALON
cd TALON
# 2. Install uv (if not already installed)
# See https://github.com/astral-sh/uv for details
# 3. Install all dependencies
uv sync
# That's it! Use `uv run` to execute any script โ no need to manually activate the venv.
๐ฆ Pretrained Models
We provide pretrained checkpoints for CUB, Food-101, and Stanford Cars using both CLIP and DINO backbones.
๐ฅ Download from Google Drive or Hugging Face and place them as follows:
checkpoints/
โโโ clip/
โ โโโ cub/
โ โ โโโ best_model.pth
โ โโโ food/
โ โ โโโ best_model.pth
โ โโโ scars/
โ โโโ best_model.pth
โโโ dino/
โโโ cub/
โ โโโ best_model.pth
โโโ food/
โ โโโ best_model.pth
โโโ scars/
โโโ best_model.pth
๐ Datasets
Supported datasets and their known / novel class splits:
| Dataset | Total Classes | Known Classes | Novel Classes |
|---|---|---|---|
| CIFAR-10 | 10 | 6 | 4 |
| CIFAR-100 | 100 | 80 | 20 |
| CUB-200-2011 | 200 | 100 | 100 |
| Oxford-IIIT Pet | 37 | 19 | 18 |
| Stanford Cars | 196 | 98 | 98 |
| Food-101 | 101 | 51 | 50 |
| ImageNet-100 | 100 | 80 | 20 |
Configure the dataset root paths in config.py before running:
# config.py
CUB_ROOT = "datasets/CUB"
FOOD_101_ROOT = "datasets/Food101"
OXFORD_PET_ROOT = "datasets/OxfordPets"
SCARS_ROOT = "datasets/stanford_cars/"
CIFAR_10_ROOT = "datasets/CIFAR/"
CIFAR_100_ROOT = "datasets/CIFAR/"
IMAGENET_ROOT = "datasets/imagenet/"
# DINO backbone pretrained weights
pretrain_path = "dino_vitbase16_pretrain.pth"
๐ Usage
Training
# CUB with CLIP backbone
uv run train.py --dataset_name cub --backbone clip --save_dir my_experiment --device cuda:0
# Food-101 with DINO backbone, custom tau and TTA
uv run train.py --dataset_name food --backbone dino --tau 0.8 --tta_state M+P --epochs 100 --device cuda:0
# Stanford Cars with CLIP, Model TTA only
uv run train.py --dataset_name scars --backbone clip --tta_state M --save_dir scars_exp --device cuda:0
๐ Full list of training arguments
| Argument | Type | Default | Description |
|---|---|---|---|
--seed | int | 1028 | Random seed |
--dataset_name | str | cub | pets / scars / cub / food / imagenet100 / cifar10 / cifar100 |
--backbone | str | clip | clip or dino |
--tta_state | str | M+P | M / P / M+P / none (see below) |
--tau | float | 0.75 | Threshold for novel class detection (see below) |
--device | str | auto | e.g. cuda:0 (auto-selects best GPU if empty) |
--save_dir | str | test | Directory for logs and checkpoints |
--train_batch_size | int | 128 | Training batch size |
--eval_batch_size | int | 64 | Evaluation batch size |
--num_workers | int | 8 | DataLoader workers |
--prop_train_labels | float | 0.5 | Proportion of labeled training data |
--epochs | int | 100 | Total training epochs |
--start_epoch | int | 0 | Resume from epoch |
--clip_grad | float | None | Gradient clipping max norm |
๐ก๏ธ About --tau (Novel Class Detection Threshold)
tau controls the confidence threshold for novel class detection. When a test sample's maximum cosine similarity to all known prototypes is below tau, it is identified as a novel (unseen) class and a new prototype is created.
๐ About --tta_state (Test-Time Adaptation Modes)
| Mode | What Gets Updated | Description |
|---|---|---|
M | Backbone norm layers | Model TTA โ fine-tunes the affine parameters (weight & bias) of LayerNorm/BatchNorm in the last transformer block. Minimizes entropy + maximizes instance-to-prototype similarity + inter-class repulsion. |
P | Class prototypes | Prototype TTA โ updates class prototype vectors via EMA based on test features. Adapts the classifier without touching the backbone. |
M+P | Both | Joint TTA โ applies both Model TTA and Prototype TTA simultaneously. Typically yields the best performance. (Recommended) |
none | Nothing | No adaptation. Uses fixed model and prototypes. Useful as a baseline. |
Evaluation
# Evaluate CLIP on CUB (with default M+P TTA)
uv run test.py --dataset_name cub --backbone clip --ckpt_path checkpoints/clip/cub/best_model.pth
# Evaluate DINO on Food-101 with Prototype TTA only
uv run test.py --dataset_name food --backbone dino --tta_state P --ckpt_path checkpoints/dino/food/best_model.pth
# Evaluate without TTA (baseline)
uv run test.py --dataset_name scars --backbone clip --tta_state none --ckpt_path checkpoints/clip/scars/best_model.pth
๐ Citation
If you find this work useful for your research, please consider citing our paper:
@inproceedings{talon2026,
title={TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery},
author={Wu, Yanan and Yan, Yuhan and Chen, Tailai and Chi, Zhixiang and Wu, ZiZhang and Jin, Yi and Wang, Yang and Li Zhenbo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}