Noise-efficient Private Dataset Distillation

July 31, 2025 · View on GitHub

This repository provides an implementation of a framework for noise-efficient differentially private dataset distillation. The code is adapted from Differentially Private Dataset Condensation.

Overview

The framework introduces Decoupled Optimization and Sampling (DOS) and Subspace discovery for Error Reduction (SER) to improve the utility of distilled datasets under differential privacy constraints.

DOSser Architecture

Prerequisites

See requirements.txt

Usage

Step 1: Compute Noise Magnitude

To calculate the noise scale (sigma) for a given privacy budget:

python compute_sigma_with_fixed_budget.py

Specify your privacy budget and dataset parameters in the script.

Step 2: Dataset Distillation

Using the computed sigma, run the distillation process:

CUDA_VISIBLE_DEVICES=1 python dosser.py \
    --sampling_iteration 1000 \
    --training_iteration 200000 \
    --dataset CIFAR10 \
    --aux_path /data/rzheng/sd_cifar10_50000_96 \
    --aux_ipc 100 \
    --ser_dim 1000 \
    --SER --PEA

Parameters:

--sampling_iteration: Number of sampling iterations.
--training_iteration: Number of optimization iterations.
--dataset: Dataset to be used (e.g., CIFAR10).
--aux_path: Path to auxiliary dataset.
--aux_ipc: Images per class for auxiliary dataset.
--ser_dim: Dimension of the subspace for SER.
--SER: Enable Subspace Error Reduction (SER).
--PEA: Use Partitioning and Expansion Augmentation (optional)