Bayesian Non-parametrics for Out-of-distribution Detection

February 14, 2025

Bayesian non-parametrics are a natural fit for out-of-distribution (OOD) detection because they model the probability that a sample was generated by an unknown cluster (i.e. an outlier). Here we provide implementations of hierarchical Dirichlet process mixture models (DPMMs) with Gaussian likelihoods for OOD detection. We provide expectation maximization methods for efficient inference and demonstrate the effectiveness of our approach on the OpenOOD benchmark. We analyze the covariance structure of the ViT-B/16 (DeiT) features for the ImageNet dataset, which motivates the application of hierarchical DPMMs and the coupled hierarchical DPMM with diagonal covariance, here. We also generate synthetic datasets to demonstrate the performance of our approach in different data regimes here. For full details on our approach and experiments, please see our paper "A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection".
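For orientation, the Mahalanobis distance score (MDS) that the paper title refers to can be sketched in a few lines of NumPy. This is a minimal illustration of the standard MDS baseline with a single tied covariance, run on toy 2-D data. It is not the repository's DPMM implementation, and the function names `fit_tied_gaussians` and `mahalanobis_ood_score` are hypothetical.

```python
import numpy as np

def fit_tied_gaussians(feats, labels):
    """Estimate per-class means and one shared (tied) covariance."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Center each sample by its own class mean, then pool all residuals.
    centered = feats - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(feats)
    return means, cov

def mahalanobis_ood_score(x, means, cov):
    """Return the negative minimum squared Mahalanobis distance per sample.

    More negative scores mean the sample is far from every class mean.
    """
    precision = np.linalg.inv(cov)
    diffs = x[:, None, :] - means[None, :, :]  # shape (n, K, d)
    d2 = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
    return -d2.min(axis=1)

rng = np.random.default_rng(0)
# Two toy classes in 2-D (illustrative data, not ImageNet features).
feats = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
labels = np.repeat([0, 1], 200)
means, cov = fit_tied_gaussians(feats, labels)

scores_in = mahalanobis_ood_score(np.array([[0.0, 0.0], [5.0, 5.0]]), means, cov)
scores_out = mahalanobis_ood_score(np.array([[20.0, -20.0]]), means, cov)
```

In this sketch a sample would be flagged as OOD when its score falls below a threshold chosen on held-out in-distribution data.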

ImageNet Dataset Analysis

The covariance analysis figures discussed in the paper can be generated using the notebook ImagenetDataAnalysis.ipynb. We highlight that the diagonal elements of the empirical covariance matrices $\hat{\Sigma}_k$ are scaled up or down versions of their average $\hat{\Sigma}$. This analysis motivates the coupled hierarchical DPMM with diagonal covariance.
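The "scaled up or down" observation can be checked numerically by dividing each class's covariance diagonal by the diagonal of the average covariance. The sketch below does this on toy data. The helper `diagonal_scale_ratios` is hypothetical and is not taken from the notebook. If the observation holds, each row of the returned array is roughly constant across feature dimensions.

```python
import numpy as np

def diagonal_scale_ratios(feats, labels):
    """Ratio of each class's covariance diagonal to the average diagonal.

    Returns an array of shape (K, d); row k is diag(Sigma_k) / diag(Sigma_avg).
    """
    classes = np.unique(labels)
    # Per-class variances are the diagonals of the empirical covariances.
    diag_k = np.stack([feats[labels == c].var(axis=0) for c in classes])
    diag_avg = diag_k.mean(axis=0)
    return diag_k / diag_avg

rng = np.random.default_rng(0)
base_std = np.linspace(0.5, 2.0, 8)   # shared per-dimension scale (toy choice)
class_scale = np.array([1.0, 2.0])    # class-specific multiplier (toy choice)
n_per_class = 2000

feats = np.concatenate(
    [rng.normal(0.0, s * base_std, (n_per_class, 8)) for s in class_scale]
)
labels = np.repeat([0, 1], n_per_class)

ratios = diagonal_scale_ratios(feats, labels)
# Relative spread of each row; small values mean "one scale factor per class".
row_spread = ratios.std(axis=1) / ratios.mean(axis=1)
```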

Synthetic Data Experiments

We sweep over the $\nu_0$ parameter of the NIW prior used to generate the synthetic data to demonstrate the sensitivity of different models to how tied the class covariances are, as shown in the figure below.

We also sweep over the number of samples per class $N_k$. We see that compared to the independent RMDS model, the hierarchical DPMMs are more robust to small $N_k$.

The experiments can be recreated in the notebook SyntheticExperiments.ipynb.
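To illustrate why $\nu_0$ controls how tied the class covariances are, the following sketch draws class parameters from a normal-inverse-Wishart (NIW) prior using NumPy only. It assumes an integer $\nu_0$ and the scale matrix $\Psi = (\nu_0 - d - 1)\,\Sigma_0$, so that the expected covariance stays at $\Sigma_0$ for every $\nu_0$. The notebook's parameterization may differ, and all function names here are hypothetical.

```python
import numpy as np

def sample_inverse_wishart(rng, nu, psi):
    """Draw Sigma ~ IW(nu, psi) for integer nu >= d by inverting a Wishart draw."""
    d = psi.shape[0]
    x = rng.multivariate_normal(np.zeros(d), np.linalg.inv(psi), size=nu)
    sigma = np.linalg.inv(x.T @ x)  # x.T @ x ~ Wishart(nu, psi^{-1})
    return (sigma + sigma.T) / 2    # symmetrize against round-off

def sample_class_params(rng, num_classes, nu0, sigma0, kappa0=1.0):
    """Draw (mu_k, Sigma_k) pairs from NIW(0, kappa0, nu0, (nu0 - d - 1) * sigma0)."""
    d = sigma0.shape[0]
    psi = (nu0 - d - 1) * sigma0    # keeps E[Sigma_k] = sigma0 (needs nu0 > d + 1)
    covs = np.stack(
        [sample_inverse_wishart(rng, nu0, psi) for _ in range(num_classes)]
    )
    means = np.stack(
        [rng.multivariate_normal(np.zeros(d), c / kappa0) for c in covs]
    )
    return means, covs

def mean_distance_to(covs, sigma0):
    """Average Frobenius distance between the sampled covariances and sigma0."""
    return float(np.mean([np.linalg.norm(c - sigma0) for c in covs]))

rng = np.random.default_rng(0)
sigma0 = np.eye(3)
spread = {}
for nu0 in (8, 20, 200):
    _, covs = sample_class_params(rng, num_classes=50, nu0=nu0, sigma0=sigma0)
    spread[nu0] = mean_distance_to(covs, sigma0)
```

Under these assumptions, larger $\nu_0$ concentrates the sampled covariances around $\Sigma_0$, which is the "more tied" end of the sweep.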

OpenOOD Experiments

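| Model | Accuracy | Near: SSB Hard | Near: NINCO | Near: Avg. | Far: iNaturalist | Far: OpenImageO | Far: Textures | Far: Avg. |
|---|---|---|---|---|---|---|---|---|
| MSP | 80.89 | 71.75 | 79.87 | 75.81 | 88.66 | 85.62 | 84.62 | 86.30 |
| Temp. MSP | 80.89 | 73.29 | 81.27 | 77.28 | 91.23 | 87.81 | 86.78 | 88.61 |
| MDS | 80.41 | 71.45 | 86.48 | 78.97 | 96.00 | 92.34 | 89.38 | 92.57 |
| RMDS | 80.41 | 72.79 | 87.28 | 80.03 | 96.09 | 92.29 | 89.38 | 92.59 |
| Hierarchical DPMMs | | | | | | | | |
| Tied | 80.41 | 71.80 | 86.76 | 79.28 | 96.00 | 92.40 | 89.72 | 92.70 |
| Full | 76.79 | 62.84 | 78.48 | 70.66 | 85.88 | 85.03 | 88.02 | 86.31 |
| Diag. | 76.54 | 73.89 | 87.32 | 80.60 | 95.36 | 90.78 | 86.41 | 90.85 |
| Coupled | 76.51 | 74.47 | 87.48 | 80.98 | 95.51 | 90.63 | 86.02 | 90.72 |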
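The Avg. columns are consistent with unweighted means over the datasets in each group. The short check below recomputes them for two rows, with the numbers copied from the table; the dictionary layout is only for illustration.

```python
# Per-dataset scores copied from the results table above.
near = {
    "MSP": [71.75, 79.87],               # SSB Hard, NINCO
    "Coupled": [74.47, 87.48],
}
far = {
    "MSP": [88.66, 85.62, 84.62],        # iNaturalist, OpenImageO, Textures
    "Coupled": [95.51, 90.63, 86.02],
}

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

near_avg = {model: mean(scores) for model, scores in near.items()}
far_avg = {model: mean(scores) for model, scores in far.items()}
```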

Scripts are provided in scripts/ to reproduce all of the OpenOOD experiments in the paper. The scripts save results with the following directory structure:

bnp4ood/
    openood_exps/
        {MODEL_NAME}/ # Model names: mds-rmds, full, tied, diag, coupled_diag
            logs/
            results/
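As a convenience, the layout above can be expressed with `pathlib` to locate the outputs of each model. This is a sketch that assumes the directory tree shown above is relative to the current working directory; `expected_output_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Model names as listed in the directory structure above.
MODEL_NAMES = ["mds-rmds", "full", "tied", "diag", "coupled_diag"]

def expected_output_dirs(root="bnp4ood"):
    """Map each model name to its expected logs/ and results/ directories."""
    base = Path(root) / "openood_exps"
    return {
        name: {"logs": base / name / "logs", "results": base / name / "results"}
        for name in MODEL_NAMES
    }

dirs = expected_output_dirs()
# Models whose results directory has not been created yet.
missing = [name for name, d in dirs.items() if not d["results"].is_dir()]
```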

The OpenOOD_Results notebook generates the tables presented in the paper from the saved results.

Installation

Ensure that the required Python packages listed in requirements.txt are installed, for example with pip install -r requirements.txt.

Vision Transformer Features

Downloading Features

We provide the features we generated for the OpenOOD experiments in our release here. Before running the experiments, combine the vit-b-16-img1k-feats-part*.pkl files by running the script:

python combine_partial_feats.py --feats-file-prefix vit-b-16-img1k-feats
rm vit-b-16-img1k-feats-part*.pkl

Regenerating Features

To generate the features for the OpenOOD experiments, you will need to install the OpenOOD benchmark available here. We provide a script to extract the features from the OpenOOD benchmark, available here, that saves the features in the following format:

# ID Datasets
{MODEL_NAME}-img1k-feats.pkl # Train
{MODEL_NAME}-img1k-{ID_SPLIT}-feats.pkl # ID Splits: val, test

# OOD Datasets
# OOD Granularities: near, far
# OOD Datasets: ssb_hard, ninco, inaturalist, textures, openimages-o
{MODEL_NAME}-img1k-{OOD_GRANULARITY}_{DATASETNAME}-feats.pkl

where dataset names are the lowercase names of each dataset. To run this script, copy it to the OpenOOD/scripts/ directory and run the following command:

python openood/extract_imagenet_features.py --model_name {MODEL_NAME}

Once the features have been saved, link or copy them to this repository with the same names.
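Before running the experiments, it can help to confirm that every feature file follows the naming scheme described under Regenerating Features. The sketch below builds the expected file names for one model. The near/far grouping of the OOD datasets is inferred from the results table, the model name `vit-b-16` is inferred from the file prefix used when combining the released features, and `expected_feature_files` is a hypothetical helper.

```python
from pathlib import Path

ID_SPLITS = ("val", "test")
# Assumed grouping, inferred from the Near/Far columns of the results table.
OOD_DATASETS = {
    "near": ("ssb_hard", "ninco"),
    "far": ("inaturalist", "textures", "openimages-o"),
}

def expected_feature_files(model_name):
    """List the feature file names implied by the naming scheme."""
    files = [f"{model_name}-img1k-feats.pkl"]  # train split
    files += [f"{model_name}-img1k-{split}-feats.pkl" for split in ID_SPLITS]
    for granularity, datasets in OOD_DATASETS.items():
        files += [
            f"{model_name}-img1k-{granularity}_{name}-feats.pkl"
            for name in datasets
        ]
    return files

files = expected_feature_files("vit-b-16")
# Files that are not present in the current directory.
missing = [name for name in files if not Path(name).is_file()]
```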