
September 19, 2024

[CVPR'24] A noisy elephant in the room:
Is your out-of-distribution detector robust to label noise?

noisy vs. clean training labels?	ID mistakes vs. OOD images?	difficulty of the OOD set?

🔎 About

Considering how pervasive the problem of label noise is in real-world image classification datasets, its effect on OOD detection is crucial to study. To address this gap, we systematically analyse the label noise robustness of a wide range of OOD detectors. Specifically:

We present the first study of post-hoc OOD detection in the presence of noisy classification labels, examining the performance of 20 state-of-the-art methods under different types and levels of label noise in the training data. Our study includes multiple classification architectures and datasets, ranging from the beloved CIFAR10 to the more difficult Clothing1M, and shows that even at a low noise rate, the label noise setting poses an interesting challenge for many methods.
We revisit the notion that OOD detection performance correlates with ID accuracy, examining when and why this relation holds. Robustness to inaccurate classification requires that OOD detectors effectively separate mistakes on ID data from OOD samples - yet most existing methods confound the two.
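To make the distinction between ID mistakes and OOD samples concrete, here is a minimal sketch (not the analysis code from this repo) that reports ID-vs-OOD AUROC separately for correctly and incorrectly classified ID samples. The function names, the toy scores, and the "higher score = more in-distribution" convention are assumptions made for illustration.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def split_auroc(id_scores, id_correct, ood_scores):
    """ID-vs-OOD AUROC, computed separately for correct and incorrect ID predictions.

    Assumed convention: a higher score means "more in-distribution".
    """
    correct = [s for s, c in zip(id_scores, id_correct) if c]
    wrong = [s for s, c in zip(id_scores, id_correct) if not c]
    return {
        "correct_id_vs_ood": auroc(correct, ood_scores),
        "incorrect_id_vs_ood": auroc(wrong, ood_scores),
    }


# Toy detector scores, made up for illustration only
id_scores = [0.95, 0.90, 0.85, 0.40, 0.35]
id_correct = [True, True, True, False, False]
ood_scores = [0.50, 0.30, 0.20]

result = split_auroc(id_scores, id_correct, ood_scores)
print(result)
```

In this toy example the detector separates correctly classified ID samples from OOD samples perfectly, but does much worse on misclassified ID samples: their low scores overlap with the OOD scores.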

👩🏻‍🏫 Getting started

What's in this repo?

the analysis folder contains the scripts used to process and analyse results.
- The analysis/paper_figures.ipynb notebook is a good place to start. It reproduces all the visualizations and results in the paper, supplementary material and poster.
the run folder contains bash scripts to train the base classifiers on different sets of (clean or noisy) labels (e.g. run/cifar10_train.sh), and then evaluate post-hoc OOD detectors (e.g. run/cifar10_eval.sh). Training checkpoints and OOD detection results are saved in the results folder.

The rest of the repo follows the structure of OpenOOD:

data/images_classic contains the raw ID & OOD datasets and annotations. See data/README.md for download instructions.
data/benchmark_imglist contains the list of images and corresponding labels for each train, val, test and OOD set. For example, the training labels for CIFAR-10N-Agg (9.01% noise rate) can be found in data/benchmark_imglist/train_cifar10n_agg.txt. We provide all the .txt files used in our experiments, as well as the scripts used to generate them.
- for the code used to generate the clean & real noisy label sets, see the dataset-specific notebooks in the data/images_classic folder (e.g. create_txt_files_cifar10.ipynb, create_txt_files_clothing1m.ipynb, create_txt_files_cub.ipynb ...)
- synthetic label sets are generated from the data/benchmark_imglist/generate_synth_labels.ipynb notebook.
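As a rough illustration of how a noise rate like the one quoted above could be checked, the sketch below compares a clean and a noisy image list and reports the fraction of images whose labels differ. It assumes each line holds one `<image path> <integer label>` pair; the paths and labels shown are hypothetical and not taken from the actual .txt files.

```python
def parse_imglist(text):
    """Map image path -> integer label, assuming one '<path> <label>' pair per line."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last space so that paths containing spaces still parse
        path, label = line.rsplit(" ", 1)
        labels[path] = int(label)
    return labels


def noise_rate(clean, noisy):
    """Fraction of shared images whose noisy label differs from the clean label."""
    shared = clean.keys() & noisy.keys()
    if not shared:
        raise ValueError("the two image lists have no images in common")
    mismatches = sum(clean[p] != noisy[p] for p in shared)
    return mismatches / len(shared)


# Hypothetical file contents, for illustration only
clean_txt = """img/0001.png 0
img/0002.png 1
img/0003.png 2
img/0004.png 3
"""
noisy_txt = """img/0001.png 0
img/0002.png 1
img/0003.png 5
img/0004.png 3
"""

rate = noise_rate(parse_imglist(clean_txt), parse_imglist(noisy_txt))
print(f"noise rate: {rate:.2%}")
```

To apply this to real files, read each .txt file with `open(path).read()` and pass the contents to `parse_imglist`.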

Conda environments

This code was tested on Ubuntu 18.04 + CUDA 11.3 & Ubuntu 20.04 + CUDA 12.5 with Python 3.11.3 + PyTorch 2.0.1. CUDA & PyTorch are only necessary for training classifiers and evaluating OOD detectors yourself. If you are only interested in reproducing the paper's tables & visualizations, you can install a minimal environment.

Minimal environment

conda create --name ood-labelnoise-viz python=3.11.3
conda activate ood-labelnoise-viz
pip install -r requirements_viz.txt

Full environment

conda create -n ood-labelnoise python=3.11.3
conda activate ood-labelnoise
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html
conda install gcc_linux-64 gxx_linux-64
pip install Cython==3.0.2
pip install -r requirements_full.txt

OOD detection methods

We benchmark the following 20 post-hoc OOD detection methods (listed in the order that they are presented in the paper). Their implementations are based on the OpenOOD benchmark, except for MDSEnsemble and GRAM which we modified to better align with the original papers.

Name	Implementation	Paper
MSP	BasePostprocessor	Hendrycks et al. 2017
TempScaling	TemperatureScalingPostprocessor	Guo et al. 2017
ODIN	ODINPostprocessor	Liang et al. 2018
GEN	GENPostprocessor	Liu et al. 2023
MLS	MaxLogitPostprocessor	Hendrycks et al. 2022
EBO	EBOPostprocessor	Liu et al. 2020
REACT	ReactPostprocessor	Sun et al. 2021
RankFeat	RankFeatPostprocessor	Song et al. 2022
DICE	DICEPostprocessor	Sun et al. 2022
ASH	ASHPostprocessor	Djurisic et al. 2023
MDS	MDSPostprocessor	Lee et al. 2018
MDSEnsemble	MDSEnsemblePostprocessorMod	Lee et al. 2018
RMDS	RMDSPostprocessor	Ren et al. 2021
KLM	KLMatchingPostprocessor	Hendrycks et al. 2022
OpenMax	OpenMax	Bendale et al. 2016
SHE	SHEPostprocessor	Zhang et al. 2023
GRAM	GRAMPostprocessorMod	Sastry et al. 2020
KNN	KNNPostprocessor	Sun et al. 2022
VIM	VIMPostprocessor	Wang et al. 2022
GradNorm	GradNormPostprocessor	Huang et al. 2021
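For readers new to post-hoc detection, the sketch below shows how three of the simplest logit-based scores in the table (MSP, MLS and EBO) can be computed from a classifier's logits, following their commonly cited definitions. This is a simplified pure-Python illustration, not the OpenOOD postprocessor code used in this repo; the function names and example logits are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability."""
    return max(softmax(logits))


def mls_score(logits):
    """Maximum logit."""
    return max(logits)


def ebo_score(logits, temperature=1.0):
    """Negative free energy: T * logsumexp(logits / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))


# Made-up logits: one confident prediction, one near-uniform prediction
confident = [8.0, 0.5, 0.2]
uncertain = [1.0, 0.9, 1.1]

for name, fn in [("MSP", msp_score), ("MLS", mls_score), ("EBO", ebo_score)]:
    print(name, round(fn(confident), 3), round(fn(uncertain), 3))
```

For all three scores, a higher value is read as "more likely in-distribution". A sample is flagged as OOD when its score falls below a chosen threshold.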

📝 Updates

June 13th 2024: Code repo released
June 7th 2024: Project page released

📚 Citation

If you find our work useful, please cite:

@InProceedings{Humblot-Renaux_2024_CVPR,
    author={Humblot-Renaux, Galadrielle and Escalera, Sergio and Moeslund, Thomas B.},
    title={A Noisy Elephant in the Room: Is Your Out-of-Distribution Detector Robust to Label Noise?},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2024},
    pages={22626-22636},
    doi={10.1109/CVPR52733.2024.02135}
}

✉️ Contact

If you have any issues or doubts about the code, please create a GitHub issue. Otherwise, you can contact me at gegeh@create.aau.dk

🤝🏼 Acknowledgements

Our codebase heavily builds on the OpenOOD benchmark. We list our main changes in the paper's supplementary material.
Our benchmark includes the CIFAR-N and Clothing1M datasets. These are highly valuable as they provide pairs of clean vs. real noisy labels.
We use the deep-significance implementation of the Almost Stochastic Order test in our experimental comparisons.
We follow the training procedure and splits from the Semantic Shift Benchmark to evaluate fine-grained semantic shift detection.
The Compact Transformer and MLPMixer model implementations and training hyper-parameters are based on the following repositories: Compact-Transformers and vision-transformers-cifar10.
