The Diffusion Duality Series

June 6, 2026

Unlocks few-step generation in discrete diffusion-LLMs via the underlying Gaussian diffusion.

Chapter II: Ψ-Samplers (ICLR 2026)

By Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo

Uniform-state beats Masked diffusion on text and image generation!

This repository contains the code for the two papers in the Diffusion Duality series. It includes:

- Duo / $\text{Duo}^\text{++}$ sampling (ancestral, ReMDM, $\Psi$-samplers, greedy-tail): see Sampling & Eval
- Original and efficient curriculum training strategies: see Training
- Discrete Consistency Distillation (DCD): see Distillation
- Baselines (AR, MDLM, SEDD, D3PM): see Baselines

Getting Started | Checkpoints | Citation

Getting Started

To get started, create a conda environment containing the required dependencies.

conda create -n duo python=3.12
conda activate duo
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1

Checkpoints

Duo (Language Modeling): Trained on OpenWebText for 1M training steps (distilled / base):
- Hugging Face 🤗.
- Google Drive folder, provided because the Hugging Face checkpoints can't be finetuned.
Duo (Image Modeling): Trained on CIFAR-10
- Hugging Face (contains the raw checkpoints)
Baselines (SEDD, MDLM, AR): Trained on OpenWebText
- Google Drive folder: download ar.ckpt, mdlm.ckpt, sedd.ckpt.

Training

This repo implements the original Duo curriculum as well as the efficient $\text{Duo}^\text{++}$ curriculum. By default, the training scripts use the original curriculum. To enable the efficient curriculum, replace algo.curriculum.mode=simple with algo.curriculum.mode=poly9 (see the comments in each training script).

To train $\text{Duo}^\text{++}$, use the following scripts:

LM1B
- With sentence packing (same as in D3PM)
  - Training script: scripts/train_lm1b_duo_sentencepacking.sh
  - Wandb run
- Without sentence packing (same as in MDLM, SEDD)
  - Training script: scripts/train_lm1b_duo.sh
  - Wandb run
OWT: scripts/train_owt_duo.sh.
CIFAR-10:
- Duo: scripts/train_cifar10_duo_cosine.sh
- MDLM: scripts/train_cifar10_mdlm_cosine.sh
- Both scripts default to a cosine noise schedule. To use log-linear instead, set noise=log-linear.

Notes:

- Run mkdir watch_folder to create a directory for the Slurm logs, then submit any script in scripts/ as a Slurm job: sbatch scripts/ABC_XYZ.sh
- Control the per-GPU batch size with the loader.batch_size argument. If loader.batch_size * num_gpus < loader.global_batch_size, PyTorch Lightning falls back to gradient accumulation.
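
The batch-size rule above can be sketched as a short calculation. This is only an illustration, assuming a single node and a global batch size that is an exact multiple of the per-step batch. The function name `accumulation_steps` and the example numbers are hypothetical and do not come from the repository.

```python
def accumulation_steps(global_batch_size: int, batch_size: int, num_gpus: int) -> int:
    """Micro-batches accumulated before each optimizer step (illustrative only)."""
    # Samples processed across all GPUs in one forward/backward pass.
    per_step = batch_size * num_gpus
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be a multiple of batch_size * num_gpus")
    return global_batch_size // per_step


# Hypothetical setup: 8 GPUs, 32 samples per GPU, global batch of 512.
print(accumulation_steps(global_batch_size=512, batch_size=32, num_gpus=8))  # 2

# When the per-step batch already equals the global batch, nothing is accumulated.
print(accumulation_steps(global_batch_size=512, batch_size=64, num_gpus=8))  # 1
```

In the first case each optimizer step would combine gradients from two forward/backward passes, so lowering loader.batch_size to fit in memory trades wall-clock time for the same effective batch.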

Discrete Consistency Distillation

To distill a model with Discrete Consistency Distillation (Alg. 1 in the Duo paper), use scripts/distil_owt.sh.

Sampling & Eval

Likelihood

To compute perplexity on the OWT validation set, use scripts/eval_owt_duo.sh. For zero-shot perplexities, use scripts/zero_shot_duo.sh.

Sampling

For ancestral sampling, use the scripts in scripts/gen_ppl_*.sh. To sample with predictor-corrector (PC) samplers such as ReMDM and our $\Psi$-samplers, use the scripts in scripts/psi_samplers. This directory contains examples for sampling text and images.

To use the "Greedy-tail sampler" (equivalent to nucleus sampling in AR models; see Sec. 4.2 in the paper), set sampling.noise_removal=greedy. Using the default sampling.noise_removal=ancestral will produce more diverse samples (higher entropy) but with worse generative perplexity.
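
The diversity trade-off described above can be seen in a toy setting. The sketch below is not the repository's sampler: it only compares always picking the most likely token against drawing from the same distribution, and the 3-token distribution and all helper names are hypothetical.

```python
import math
import random
from collections import Counter


def empirical_entropy(tokens):
    """Shannon entropy (in nats) of the empirical token distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return sum((c / total) * math.log(total / c) for c in counts.values())


def greedy_pick(probs):
    """Always return the index of the most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def categorical_pick(probs, rng):
    """Draw one token index at random according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
probs = [0.6, 0.3, 0.1]  # hypothetical model output over a 3-token vocabulary

greedy_tokens = [greedy_pick(probs) for _ in range(1000)]
sampled_tokens = [categorical_pick(probs, rng) for _ in range(1000)]

# Greedy selection repeats one token, so its empirical entropy is zero.
print(empirical_entropy(greedy_tokens))
# Random draws spread over several tokens, so the entropy is positive.
print(empirical_entropy(sampled_tokens))
```

Lower entropy here corresponds to less diverse output, which is the cost paid for the better generative perplexity mentioned above.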

To sample from a Hugging Face checkpoint (text only), run the following command:

python main.py \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo_base \
  algo.backbone=hf_dit \
  eval.checkpoint_path=s-sahoo/duo-distilled \
  sampling.steps=8 \
  sampling.num_sample_batches=1 \
  sampling.noise_removal=greedy \
  +wandb.offline=true

To use the example scripts with raw checkpoints (see Checkpoints), download them and set the checkpoint path in the script.

Baselines

Download the baseline checkpoints (see Checkpoints) and specify the paths appropriately in the respective shell scripts:

- scripts/eval_owt_*.sh for computing validation perplexity on OWT.
- scripts/gen_ppl_*.sh for generating text samples and evaluating them.
- scripts/zero_shot_*.sh for computing zero-shot perplexities.
- scripts/train_*.sh for training the models.

Acknowledgements & Citation

This repository builds on MDLM's GitHub repository. Cite our papers using:

@inproceedings{
    sahoo2025the,
    title={The Diffusion Duality},
    author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
    booktitle={Forty-second International Conference on Machine Learning},
    year={2025},
    url={https://openreview.net/forum?id=9P9Y8FOSOk}
}

@inproceedings{
    deschenaux2026the,
    title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers},
    author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=RSIoYWIzaP}
}