SpectralGCD (ICLR 2026)
March 18, 2026
Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
This is the official repository of the ICLR 2026 paper "SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery" by Lorenzo Caselli, Marco Mistretta, Simone Magistri, Andrew D. Bagdanov.
Abstract
Generalized Category Discovery (GCD) aims to identify novel categories in unlabeled data while leveraging a small labeled subset of known classes. Training a parametric classifier solely on image features often leads to overfitting to old classes, and recent multimodal approaches improve performance by incorporating textual information. However, they treat modalities independently and incur high computational cost. We propose SpectralGCD, an efficient and effective multimodal approach to GCD that uses CLIP cross-modal image-concept similarities as a unified cross-modal representation. Each image is expressed as a mixture over semantic concepts from a large task-agnostic dictionary, which anchors learning to explicit semantics and reduces reliance on spurious visual cues. To maintain the semantic quality of representations learned by an efficient student, we introduce Spectral Filtering which exploits a cross-modal covariance matrix over the softmaxed similarities measured by a strong teacher model to automatically retain only relevant concepts from the dictionary. Forward and reverse knowledge distillation from the same teacher ensures that the cross-modal representations of the student remain both semantically sufficient and well-aligned. Across six benchmarks, SpectralGCD delivers accuracy comparable to or significantly superior to state-of-the-art methods at a fraction of the computational cost.

Check out our demo to see how to use Spectral Filtering on any dataset.
Citation
@inproceedings{caselli2026spectralgcd,
author={Lorenzo Caselli and Marco Mistretta and Simone Magistri and Andrew D. Bagdanov},
title={Spectral{GCD}: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=PyfV9tFmdR}
}
Installation
The codebase has been tested with Python 3.9 and PyTorch 2.6.0 with CUDA 12.4.
conda env create -f environment.yml
conda activate spectralgcd
Datasets
We evaluate on the following standard GCD benchmarks:
| Dataset | Total Classes | Known | Novel | Type |
|---|---|---|---|---|
| CIFAR-10 | 10 | 5 | 5 | Generic |
| CIFAR-100 | 100 | 80 | 20 | Generic |
| ImageNet-100 | 100 | 50 | 50 | Generic |
| CUB-200 | 200 | 100 | 100 | Fine-grained |
| Stanford Cars | 196 | 98 | 98 | Fine-grained |
| FGVC Aircraft | 100 | 50 | 50 | Fine-grained |
Download links:
- CIFAR-10/100 — auto-downloaded by torchvision
- ImageNet-100
- CUB-200 / Stanford Cars / FGVC Aircraft — via the Semantic Shift Benchmark splits
After downloading, set the dataset paths in config.py:
cifar_10_root = 'path_to_dataset/cifar10'
cifar_100_root = 'path_to_dataset/cifar100'
cub_root = 'path_to_dataset/cub'
aircraft_root = 'path_to_dataset/fgvc_aircraft'
car_root = 'path_to_dataset/stanford_cars'
imagenet_root = 'path_to_dataset/imagenet'
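Before launching long runs, it can help to confirm that every configured root actually exists. The snippet below is a minimal sketch: the `DATASET_ROOTS` dictionary and the `missing_roots` helper are hypothetical stand-ins for the variables in config.py, not part of the repository.

```python
from pathlib import Path

# Hypothetical mirror of the variables set in config.py (placeholder paths).
DATASET_ROOTS = {
    "cifar_10_root": "path_to_dataset/cifar10",
    "cifar_100_root": "path_to_dataset/cifar100",
    "cub_root": "path_to_dataset/cub",
    "aircraft_root": "path_to_dataset/fgvc_aircraft",
    "car_root": "path_to_dataset/stanford_cars",
    "imagenet_root": "path_to_dataset/imagenet",
}


def missing_roots(roots):
    """Return the names of entries whose path is not an existing directory."""
    return [name for name, path in roots.items() if not Path(path).is_dir()]


# Report every root that still points at a non-existent directory.
for name in missing_roots(DATASET_ROOTS):
    print(f"{name}: directory not found -> {DATASET_ROOTS[name]}")
```

Note that CIFAR-10/100 are downloaded by torchvision, so their directories may legitimately be missing before the first run.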
Reproducing the Experiments
The easiest way to reproduce the experiments is via the provided scripts, which handle all datasets and seeds automatically.
Quick start — all datasets
Set the paths at the top of scripts/train_all_datasets.sh, then run:
bash scripts/train_all_datasets.sh
This iterates over all six datasets (cub, scars, aircraft, cifar10, cifar100, imagenet_100), runs steps 1–3 for each, and repeats training for 3 seeds.
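The order of work described above can be sketched as a plain job list. This is an illustration only, not the contents of the shell script: the `build_plan` helper and the step labels are hypothetical, and the seed values 0, 1, 2 are an assumption (the text only states that three seeds are used).

```python
DATASETS = ["cub", "scars", "aircraft", "cifar10", "cifar100", "imagenet_100"]
SEEDS = [0, 1, 2]  # assumption: the actual seed values are defined in the script


def build_plan(datasets, seeds):
    """List (dataset, step, seed) jobs: steps 1-2 once per dataset, step 3 once per seed."""
    plan = []
    for name in datasets:
        plan.append((name, "save_class_names", None))    # Step 1
        plan.append((name, "spectral_filtering", None))  # Step 2
        for seed in seeds:
            plan.append((name, "train", seed))           # Step 3, repeated per seed
    return plan


for dataset, step, seed in build_plan(DATASETS, SEEDS):
    suffix = "" if seed is None else f" (seed {seed})"
    print(f"{dataset}: {step}{suffix}")
```

Steps 1 and 2 do not depend on the seed in this sketch, so they appear once per dataset while training appears once per seed.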
Quick start — single dataset
bash scripts/train_single_dataset.sh
Set DATASET_NAME at the top of scripts/train_single_dataset.sh to select the dataset (default: cub).
The steps can also be run individually as described below.
Step 1 — Save class name splits
Generates old_class_names.csv and new_class_names.csv under dataset_class_names/{dataset_name}/, encoding which classes are known (old) and which are novel.
python -m utils.save_old_class_names \
--dataset_name "cub" \
--use_ssb_splits
This must be run once per dataset before spectral filtering.
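To make the output of this step concrete, here is a rough sketch of writing the two split files for a toy label set. It does not reproduce utils.save_old_class_names: the `save_class_name_splits` helper, the single `class_name` column, and the toy class names are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_class_name_splits(class_names, known_ids, out_dir):
    """Write old_class_names.csv / new_class_names.csv and return both name lists."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    known = set(known_ids)
    old = [n for i, n in enumerate(class_names) if i in known]
    new = [n for i, n in enumerate(class_names) if i not in known]
    for filename, names in (("old_class_names.csv", old), ("new_class_names.csv", new)):
        with open(out_dir / filename, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["class_name"])  # assumed header; the real layout may differ
            writer.writerows([n] for n in names)
    return old, new


# Toy example: classes 0 and 2 are known (old), the rest are novel (new).
with tempfile.TemporaryDirectory() as tmp:
    old, new = save_class_name_splits(
        ["sparrow", "finch", "heron", "wren"],
        known_ids=[0, 2],
        out_dir=Path(tmp) / "dataset_class_names" / "toy",
    )
    print("old:", old)
    print("new:", new)
```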
Step 2 — Spectral Filtering
Filters the concept dictionary down to a compact, discriminative subset relevant to the dataset. The output is a CSV file consumed by the training script.
python spectral_filtering.py \
--dataset_name "cub" \
--batch_size 128 \
--num_workers 8 \
--use_ssb_splits \
--use_torch_impl \
--thresholding_eig 0.95 \
--thresholding_concepts 0.99 \
--cuda_dev 0 \
--path_to_filtered_concepts /path/to/filtered_concepts \
--path_to_dictionary dictionaries/textgcd_tags_dictionary.csv \
--exp_root /path/to/exp_root \
--exp_id "cub_spectral_filtering"
The output file will be saved as {path_to_filtered_concepts}/{dataset_name}_concepts.csv.
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| --thresholding_eig | 0.99 | Variance threshold for eigenvalue selection (β_e) |
| --thresholding_concepts | 0.99 | Variance threshold for concept filtering (β_c) |
| --use_torch_impl | False | Use PyTorch GPU-accelerated eigendecomposition (recommended) |
| --path_to_dictionary | — | Path to concept dictionary CSV (see available dictionaries) |
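The role of the two thresholds can be illustrated with a small NumPy sketch. It follows the description in the abstract (a covariance matrix over softmaxed image-concept similarities, followed by an eigendecomposition), but the concept-scoring rule, the softmax temperature, and the `spectral_filter` function itself are assumptions made for illustration; spectral_filtering.py is the reference implementation and may differ.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spectral_filter(similarities, beta_e=0.99, beta_c=0.99, temperature=0.01):
    """Return sorted indices of retained concepts.

    similarities: (num_images, num_concepts) teacher image-concept similarities.
    temperature: assumed softmax temperature (not taken from the repository).
    """
    probs = softmax(similarities / temperature, axis=1)
    cov = np.cov(probs, rowvar=False)        # (num_concepts, num_concepts)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    if eigvals.sum() <= 0:                   # degenerate case: no variance at all
        return np.arange(similarities.shape[1])

    # Keep the fewest leading eigenvectors explaining at least beta_e of the variance.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, beta_e)) + 1, len(eigvals))

    # Assumed scoring rule: variance each concept contributes to the kept subspace.
    scores = (eigvecs[:, :k] ** 2) @ eigvals[:k]
    ranked = np.argsort(scores)[::-1]
    cumulative = np.cumsum(scores[ranked]) / scores.sum()
    n_keep = min(int(np.searchsorted(cumulative, beta_c)) + 1, len(scores))
    return np.sort(ranked[:n_keep])


# Toy usage with random similarities (1000 images, 50 concepts).
toy = np.random.default_rng(0).random((1000, 50))
print(len(spectral_filter(toy, beta_e=0.95, beta_c=0.99, temperature=0.1)), "concepts kept")
```

In this sketch, lowering β_e shrinks the retained eigen-subspace and lowering β_c discards more low-scoring concepts, so both thresholds trade dictionary size against coverage.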
Concept dictionaries
Three pre-built dictionaries are provided under dictionaries/:
| File | Concepts | Source |
|---|---|---|
| textgcd_tags_dictionary.csv | — | TextGCD tags (default) |
| openimages_dictionary.csv | — | Open Images labels |
Step 3 — Training
python spectralgcd.py \
--dataset_name "cub" \
--batch_size 128 \
--epochs 200 \
--num_workers 8 \
--use_ssb_splits \
--sup_weight 0.35 \
--weight_decay 5e-5 \
--lr 0.1 \
--lr_backbone 0.005 \
--warmup_teacher_temp 0.07 \
--teacher_temp 0.04 \
--warmup_teacher_temp_epochs 30 \
--memax_weight 2 \
--seed 0 \
--cuda_dev 0 \
--path_to_filtered_concepts /path/to/filtered_concepts/cub_concepts.csv \
--path_to_saved_cross_modal_representations /path/to/saved_representations \
--exp_root /path/to/exp_root \
--exp_id "cub_spectralgcd"
Key hyperparameters:
| Parameter | Default | Description |
|---|---|---|
| --lr | 0.1 | Learning rate for the projection head |
| --lr_backbone | 0.005 | Learning rate for the CLIP backbone |
| --sup_weight | 0.35 | Weight balancing supervised vs. unsupervised loss |
| --memax_weight | 2 | Mean entropy maximization weight (dataset-specific) |
| --teacher_temp | 0.04 | GCD head temperature after warmup |
| --warmup_teacher_temp | 0.07 | Initial GCD head temperature |
| --path_to_saved_cross_modal_representations | '' | Directory to cache teacher cross-modal features (set to '' to disable) |
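To give a feel for how the temperature and weighting hyperparameters interact, the sketch below follows the conventions of SimGCD, on which this codebase builds: a linear temperature warmup, a convex combination of supervised and unsupervised losses, and a mean-entropy regularizer. Whether spectralgcd.py uses exactly these forms is an assumption, and all function names here are hypothetical.

```python
import math


def teacher_temp_schedule(warmup_teacher_temp=0.07, teacher_temp=0.04,
                          warmup_epochs=30, epochs=200):
    """Per-epoch temperature: linear warmup between the two values, then constant."""
    schedule = []
    for epoch in range(epochs):
        if epoch < warmup_epochs and warmup_epochs > 1:
            frac = epoch / (warmup_epochs - 1)  # 0 at the first epoch, 1 at the last warmup epoch
            schedule.append(warmup_teacher_temp + frac * (teacher_temp - warmup_teacher_temp))
        else:
            schedule.append(teacher_temp)
    return schedule


def combine_losses(sup_loss, unsup_loss, sup_weight=0.35):
    """Assumed convex combination of supervised and unsupervised terms."""
    return sup_weight * sup_loss + (1.0 - sup_weight) * unsup_loss


def mean_entropy_regularizer(mean_probs):
    """log(K) minus the entropy of the batch-averaged prediction; 0 when uniform."""
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    return math.log(len(mean_probs)) - entropy


temps = teacher_temp_schedule()
print(f"epoch 0: {temps[0]:.3f}, epoch 29: {temps[29]:.3f}, epoch 199: {temps[199]:.3f}")
```

In SimGCD the regularizer is scaled by the mean entropy maximization weight and added to the unsupervised clustering loss, which is why --memax_weight is typically tuned per dataset.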
Weights & Biases logging is disabled by default. To enable it, add:
--use_wandb \
--w_key_path /path/to/wandb_key.txt \
--project_name "spectralgcd" \
--group_name "my_group" \
--experiment_name "cub_run"
How To Use Spectral Filtering
To apply Spectral Filtering to external or proprietary data, see spectral_filtering_demo.ipynb, a self-contained implementation that runs the full Spectral Filtering pipeline on any dataset.
It is also useful for inspecting which concepts from a large dictionary are retained for a given dataset.
To run the demo, please set the following variables in the Configuration cell before proceeding:
| Variable | Description |
|---|---|
| PROJECT_ROOT | Absolute path to the repository root |
| AIRCRAFT_ROOT | Path to the FGVC-Aircraft dataset (swap for any other dataset loader) |
| PATH_TO_DICTIONARY | Concept dictionary CSV (default: dictionaries/textgcd_tags_dictionary.csv) |
| PATH_TO_OUTPUT | Where to save the filtered concept CSV |
| CLIP_MODEL | HuggingFace Hub ID of the teacher CLIP model |
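A filled-in Configuration cell might look roughly like the following. Every value is a placeholder: the paths are hypothetical, the use of `pathlib.Path` is an assumption about the notebook, and the `CLIP_MODEL` ID is only an example of a valid Hub identifier, not necessarily the teacher used in the paper.

```python
from pathlib import Path

# Placeholder values; replace each with the locations on your machine.
PROJECT_ROOT = Path("/path/to/SpectralGCD")
AIRCRAFT_ROOT = Path("/path/to/fgvc_aircraft")
PATH_TO_DICTIONARY = PROJECT_ROOT / "dictionaries" / "textgcd_tags_dictionary.csv"
PATH_TO_OUTPUT = PROJECT_ROOT / "filtered_concepts" / "aircraft_concepts.csv"
CLIP_MODEL = "openai/clip-vit-base-patch16"  # example Hub ID only (assumption)

# Warn early if the dictionary path is wrong.
if not PATH_TO_DICTIONARY.is_file():
    print(f"Dictionary not found: {PATH_TO_DICTIONARY}")
```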
Acknowledgements
Our codebase builds upon GET and SimGCD. We thank the authors for their excellent work.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
For further questions or discussion, feel free to reach out:
Lorenzo Caselli (lorenzo.caselli@unifi.it - caselli.lorenzo1@gmail.com)