NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
December 9, 2025
Official implementation of NegRefine, accepted to ICCV 2025.
Paper on arXiv
NegRefine improves negative label-based zero-shot OOD detection by:
- Filtering subcategories and proper nouns from the negative label set using an LLM
- Multi-matching-aware scoring that accounts for images matching multiple labels
With these improvements, NegRefine achieves state-of-the-art results on the large-scale ImageNet-1K benchmark.
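For readers new to negative label-based scoring, a minimal sketch of the NegLabel-style score that NegRefine builds on may help. It assumes L2-normalized CLIP image and text embeddings and a temperature of 0.01 (both assumptions here). The name `neglabel_score` is hypothetical, and the code does not reproduce NegRefine's multi-matching-aware score or anything in `src/clip_ood.py`; it only shows the idea of measuring how much softmax mass over the joint (ID + negative) label set lands on the ID labels.

```python
import numpy as np


def neglabel_score(image_emb, id_text_embs, neg_text_embs, temperature=0.01):
    """NegLabel-style score: softmax mass that falls on the ID labels.

    Assumes all embeddings are L2-normalized, so dot products are cosine
    similarities. Higher scores suggest the image is in-distribution.
    """
    # Cosine similarity to every ID label, followed by every negative label.
    sims = np.concatenate([id_text_embs @ image_emb, neg_text_embs @ image_emb])
    logits = sims / temperature
    # Numerically stable softmax over the joint (ID + negative) label set.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # The first len(id_text_embs) entries belong to the ID labels.
    return float(probs[: len(id_text_embs)].sum())


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Toy 2-D "embeddings": ID labels point along x, negative labels along y.
id_labels = l2_normalize(np.array([[1.0, 0.0], [0.9, 0.1]]))
neg_labels = l2_normalize(np.array([[0.0, 1.0], [0.1, 0.9]]))
id_like_image = l2_normalize(np.array([1.0, 0.05]))
ood_like_image = l2_normalize(np.array([0.05, 1.0]))

id_score = neglabel_score(id_like_image, id_labels, neg_labels)
ood_score = neglabel_score(ood_like_image, id_labels, neg_labels)
print(f"ID-like: {id_score:.4f}, OOD-like: {ood_score:.4f}")
```

NegRefine's multi-matching-aware scoring modifies this kind of score to account for images that match multiple labels; see the paper for its exact form.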
Code Overview
The repository is structured as follows:
neg_refine/
├── data/                 # Dataset root (add datasets here)
├── output/               # Outputs and results per dataset/seed
│   └── imagenet/seed_0/  # Example folder for ImageNet with seed 0
├── scripts/              # Bash scripts for running experiments
│   └── ...
├── src/                  # Python source code
│   ├── class_names.py    # Dataset class names and prompt templates
│   ├── clip_ood.py       # Main method for CLIP-based zero-shot OOD detection
│   ├── create_negs.py    # Generates initial negative labels (CSP-based)
│   ├── eval.py           # Entry point for experiments and evaluation
│   ├── neg_filter.py     # LLM-based refinement of negative labels
│   └── ood_evaluate.py   # OOD evaluation metrics (AUROC, FPR@95, etc.)
└── txtfiles/             # WordNet lexicon text files (adjectives/nouns)
    └── ...
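The two steps behind `create_negs.py` and `neg_filter.py` can be pictured with a simplified sketch. Everything in it is an illustrative assumption: `mine_negatives` keeps the candidates whose highest cosine similarity to any ID label is lowest (a simplification in the spirit of NegLabel's negative mining, not the CSP-based generation in `create_negs.py`), and `filter_negatives` takes a hypothetical `ask_llm` callable that would wrap an LLM such as Qwen2.5 in practice. The prompt wording in `build_prompt` is invented and likely differs from the repository's.

```python
from typing import Callable, List, Sequence

import numpy as np


def mine_negatives(cand_names: Sequence[str], cand_embs: np.ndarray,
                   id_embs: np.ndarray, num_negs: int) -> List[str]:
    """Keep the num_negs candidates least similar to any ID label.

    Assumes rows of cand_embs and id_embs are L2-normalized text embeddings.
    """
    # Highest cosine similarity of each candidate to any ID label.
    max_sim = (cand_embs @ id_embs.T).max(axis=1)
    # Ascending sort: the most dissimilar candidates come first.
    order = np.argsort(max_sim, kind="stable")[:num_negs]
    return [cand_names[i] for i in order]


def build_prompt(neg_label: str, id_labels: Sequence[str]) -> str:
    """Hypothetical prompt wording, for illustration only."""
    return (
        f"In-distribution classes: {', '.join(id_labels)}.\n"
        f"Is '{neg_label}' a proper noun or a subcategory of one of these "
        "classes? Answer yes or no."
    )


def filter_negatives(neg_labels: Sequence[str], id_labels: Sequence[str],
                     ask_llm: Callable[[str], str]) -> List[str]:
    """Drop every negative label for which the LLM answers 'yes'."""
    kept = []
    for neg in neg_labels:
        answer = ask_llm(build_prompt(neg, id_labels))
        if not answer.strip().lower().startswith("yes"):
            kept.append(neg)
    return kept


# Toy example: one ID class ("cat") embedded along the x-axis.
id_names = ["cat"]
id_embs = np.array([[1.0, 0.0]])
cand_names = ["tabby", "volcano", "spoon", "Paris"]
cand_embs = np.array([[0.8, 0.6], [0.0, 1.0], [0.6, 0.8], [0.28, 0.96]])

mined = mine_negatives(cand_names, cand_embs, id_embs, num_negs=3)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM: flags 'tabby' (cat subtype) and 'Paris' (proper noun)."""
    return "yes" if ("'tabby'" in prompt or "'Paris'" in prompt) else "no"


refined = filter_negatives(mined, id_names, stub_llm)
print(mined, refined)
```

Similarity-based mining alone lets the proper noun "Paris" through, since it is far from "cat" in embedding space; a semantic check like the LLM step is what removes it.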
Environment Setup
This project was developed with Python 3.10.12 and PyTorch 2.6.0 on Ubuntu 22.04.
- CLIP: We used the OpenAI CLIP implementation.
- LLM: For negative label filtering, we primarily used Qwen2.5-14B-Instruct-1M via Hugging Face.
- Other dependencies: See requirements.txt for the full list of packages.
Dataset Downloads
Below are the sources for downloading the datasets used in our experiments:
- ImageNet-1K: Download from the ImageNet Challenge 2012 website. Only the validation data is required.
- NINCO & Clean: Available from the NINCO GitHub. The provided .tar.gz file includes both the NINCO dataset (NINCO_OOD_classes) and the Clean Collection (NINCO_popular_datasets_subsamples, obtained through manual analysis of random samples from 11 common OOD datasets).
- OpenImage-O: Can be downloaded from OpenOOD using the provided download script.
- ImageNet-10, ImageNet-20, ImageNet-100: Refer to the MCM GitHub for instructions on creating these subsets of ImageNet-1K classes.
  Note: In our experiments, we modified ImageNet-100 to create ImageNet-99 by removing the "race car" class (n04037443).
- iNaturalist, SUN, Places, Textures: Download links are available on the MOS GitHub.
- CUB-200, Stanford Cars, Food-101, Oxford Pets: Download links are available on the MCM GitHub.
- Waterbirds (Spurious OOD): Refer to this MCM GitHub issue.
After downloading, place all datasets in the data/ folder.
Refer to (or modify) the load_dataset() function in src/eval.py for the exact folder structure and naming conventions used for data loading.
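Because the exact layout is defined in `load_dataset()`, the following is only a sketch under an assumed ImageFolder-style layout (`data/<dataset>/<wnid>/<image>`); the function name `index_image_folder` and the constant names are hypothetical. It also shows one way to derive ImageNet-99 by skipping the "race car" folder (n04037443) mentioned in the dataset list.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}
# WordNet ID of the "race car" class removed to turn ImageNet-100 into ImageNet-99.
IMAGENET99_EXCLUDED = {"n04037443"}


def index_image_folder(root, exclude=frozenset()):
    """Return (class_names, samples) for a root/<class>/<image> layout.

    Class folders are sorted for stable label indices; folders listed in
    `exclude` are skipped entirely.
    """
    class_dirs = sorted(
        p for p in Path(root).iterdir() if p.is_dir() and p.name not in exclude
    )
    samples = []
    for label, class_dir in enumerate(class_dirs):
        for img in sorted(class_dir.iterdir()):
            if img.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((img, label))
    return [d.name for d in class_dirs], samples


# Demo on a throwaway directory with three fake class folders.
with tempfile.TemporaryDirectory() as tmp:
    for wnid in ["n01440764", "n04037443", "n02102040"]:
        (Path(tmp) / wnid).mkdir()
        (Path(tmp) / wnid / "img_0.JPEG").touch()
    classes, samples = index_image_folder(tmp, exclude=IMAGENET99_EXCLUDED)
    labels = [label for _, label in samples]

print(classes, labels)
```

Excluding the folder before enumerating keeps the remaining labels contiguous (0 to 98 for ImageNet-99), which matters if labels index into a list of class prompts.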
Running Experiments
Scripts for running each experiment in the main paper are provided in the scripts/ folder.
Each script is named after the in-distribution dataset used in the experiment.
For example, to reproduce the ImageNet-1K benchmark, run:
sh scripts/imagenet.sh
The results of each experiment, including evaluation metrics, logs, and negative label files, will be saved in the output/ folder.
Example Results
As an illustration, we provide the saved results for ImageNet-1K with seed 0, available in output/imagenet/seed_0/. These include the saved negative labels, LLM refinement logs, and final evaluation results.
Results (In-Distribution: ImageNet-1K, Seed 0):
| OOD Dataset | AUROC (%) | FPR@95 (%) |
|---|---|---|
| ✓ iNaturalist | 99.57 | 1.51 |
| ✓ OpenImage-O | 95.02 | 24.03 |
| ✓ Clean | 90.70 | 33.04 |
| ✓ NINCO | 81.90 | 62.11 |
| SUN | 94.64 | 22.93 |
| Places | 90.42 | 39.10 |
| Textures | 94.69 | 21.15 |
Note: Only the first four datasets are considered valid OOD data and are included in the main paper results, as they contain minimal or no in-distribution contamination. In contrast, SUN, Places, and Textures contain notable overlap with ImageNet-1K classes, leading to in-distribution contamination. For further discussion, refer to our paper and the NINCO paper.
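For reference, the two metrics in the table can be computed as follows. This is a sketch under common conventions, not necessarily what `src/ood_evaluate.py` does: in-distribution samples are treated as positives, higher scores mean "more in-distribution", AUROC counts ties as half, and FPR@95 uses the highest threshold that still accepts at least 95% of ID samples. The function names `auroc` and `fpr_at_95_tpr` are hypothetical.

```python
import numpy as np


def auroc(id_scores, ood_scores):
    """P(random ID score > random OOD score), counting ties as 1/2.

    Uses sorting plus binary search, so memory stays linear in the sample count.
    """
    id_s = np.asarray(id_scores, dtype=float)
    ood_sorted = np.sort(np.asarray(ood_scores, dtype=float))
    # For each ID score: number of OOD scores strictly below it, and equal to it.
    below = np.searchsorted(ood_sorted, id_s, side="left")
    ties = np.searchsorted(ood_sorted, id_s, side="right") - below
    return float((below.sum() + 0.5 * ties.sum()) / (id_s.size * ood_sorted.size))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted at a threshold keeping >= 95% of ID samples."""
    id_sorted = np.sort(np.asarray(id_scores, dtype=float))
    # Highest threshold that still keeps at least 95% of ID scores at or above it.
    k = int(np.floor(0.05 * id_sorted.size))
    threshold = id_sorted[k]
    return float(np.mean(np.asarray(ood_scores, dtype=float) >= threshold))


id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.65, 0.3, 0.2, 0.1]
print(auroc(id_scores, ood_scores), fpr_at_95_tpr(id_scores, ood_scores))  # 0.9375 0.25
```

Multiplying by 100 gives the percentages reported in the table. scikit-learn's `roc_auc_score` and `roc_curve` compute closely related quantities; small differences in FPR@95 between implementations usually come from how the threshold is chosen or interpolated.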
For the complete set of experiments, with results averaged over 10 seeds, please refer to our main paper.
Acknowledgements
Our code is built on the excellent work of CSP and NegLabel. We sincerely thank the authors.
Citation
If you find this work useful in your research, please consider citing our paper:
@inproceedings{ansari2025negrefine,
title={NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection},
author={Ansari, Amirhossein and Wang, Ke and Xiong, Pulei},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={573--582},
year={2025}
}