The Diffusion Duality Series

June 6, 2026

Chapter I (ICML 2025)

By Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov

Unlocks few-step generation in discrete diffusion-LLMs via the underlying Gaussian diffusion.

Chapter II: Ψ-Samplers (ICLR 2026)

By Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo

Uniform-state diffusion beats masked diffusion on text and image generation!

This repository contains the code for the two papers in the Diffusion Duality series. It includes:

  • Duo / Duo++ sampling (ancestral, ReMDM, Ψ-samplers, greedy-tail): Sampling & Eval
  • Original and efficient curriculum training strategies: Training
  • Discrete Consistency Distillation (DCD): Distillation
  • Baselines (AR, MDLM, SEDD, D3PM): Baselines

Getting Started | Checkpoints | Citation

Getting Started

To get started, create a conda environment containing the required dependencies.

conda create -n duo python=3.12
conda activate duo
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
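
After installation, a quick check can confirm that the key packages are visible to the interpreter. The snippet below is a minimal sketch and not part of the repository: `missing_packages` is a hypothetical helper, and it assumes that `torch` is installed through requirements.txt.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # `flash_attn` comes from the install step above; `torch` is assumed
    # to be listed in requirements.txt.
    missing = missing_packages(["torch", "flash_attn"])
    print("missing packages:", missing or "none")
```

The check only looks up top-level package names without importing them, so it runs quickly and does not need a GPU.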

Checkpoints

Training

This repo implements the original Duo curriculum as well as the faster Duo++ curriculum. By default, the training scripts use the original curriculum. To enable the efficient curriculum, replace algo.curriculum.mode=simple with algo.curriculum.mode=poly9 (see the comments in each training script).

To train Duo++, use the following scripts:

Notes:

  • Run mkdir watch_folder to create a directory for Slurm logs, then submit any script in scripts/ as a Slurm job: sbatch scripts/ABC_XYZ.sh
  • Control the batch size per GPU using the argument loader.batch_size. If loader.batch_size * num_gpus < loader.global_batch_size, PyTorch Lightning uses gradient accumulation to reach the global batch size.
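
The batch-size arithmetic in the note above can be sketched as follows. This is an illustration only: `grad_accumulation_steps` is a hypothetical helper, and the requirement that the global batch size be an exact multiple of loader.batch_size * num_gpus is an assumption, not something confirmed by the training code.

```python
def grad_accumulation_steps(global_batch_size, batch_size_per_gpu, num_gpus):
    """Number of micro-batches accumulated before each optimizer step."""
    per_step = batch_size_per_gpu * num_gpus
    if global_batch_size % per_step != 0:
        # Assumption: the global batch must divide evenly across devices.
        raise ValueError(
            f"global batch size {global_batch_size} is not a multiple of "
            f"batch_size * num_gpus = {per_step}"
        )
    return global_batch_size // per_step


# Example: 8 GPUs with 16 sequences each and a global batch of 512
# need 512 / (16 * 8) = 4 accumulation steps.
print(grad_accumulation_steps(512, 16, 8))  # 4
```

When the product already equals the global batch size, the helper returns 1, which corresponds to no accumulation.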

Discrete Consistency Distillation

To distill a model with Discrete Consistency Distillation (Alg. 1 in the Duo paper), run scripts/distil_owt.sh.

Sampling & Eval

Likelihood

To compute perplexity on the OWT validation set, use scripts/eval_owt_duo.sh. For zero-shot perplexities, use scripts/zero_shot_duo.sh.
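
For reference, perplexity is the exponential of the average per-token negative log-likelihood. For diffusion models the likelihood is typically a variational bound, so the reported value is an upper bound on the true perplexity. The sketch below shows only this arithmetic; the function name and inputs are illustrative and are not taken from the evaluation scripts.

```python
import math


def perplexity(total_nll_nats, num_tokens):
    """Exponential of the mean per-token negative log-likelihood (in nats)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(total_nll_nats / num_tokens)


# A model that assigns every token probability 1/4 has perplexity 4.
nll = 100 * math.log(4)
print(round(perplexity(nll, 100), 6))  # 4.0
```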

Sampling

You can sample with ancestral sampling using the scripts in scripts/gen_ppl_*.sh. To sample with predictor-corrector (PC) samplers such as ReMDM and our Ψ-samplers, use the scripts in scripts/psi_samplers. This directory contains examples for sampling both text and images.

To use the "Greedy-tail sampler" (equivalent to nucleus sampling in AR models; see Sec. 4.2 in the paper), set sampling.noise_removal=greedy. Using the default sampling.noise_removal=ancestral will produce more diverse samples (higher entropy) but with worse generative perplexity.
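
The difference between the two settings can be illustrated with a toy example. The sketch below assumes that greedy noise removal takes the argmax of the predicted token distribution at each position, while ancestral noise removal draws from that distribution. It is not the repository's implementation, and `remove_noise` is a hypothetical name.

```python
import random


def remove_noise(probs_per_position, mode="ancestral", rng=None):
    """Pick one token id per position from its predicted distribution.

    mode="greedy"    -> argmax (deterministic, lowest entropy)
    mode="ancestral" -> draw from the categorical distribution
    """
    rng = rng or random.Random()
    tokens = []
    for probs in probs_per_position:
        ids = range(len(probs))
        if mode == "greedy":
            tokens.append(max(ids, key=lambda i: probs[i]))
        elif mode == "ancestral":
            tokens.append(rng.choices(ids, weights=probs, k=1)[0])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return tokens


# Toy distributions over a 3-token vocabulary at two positions.
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(remove_noise(probs, mode="greedy"))  # [0, 2]
print(remove_noise(probs, mode="ancestral", rng=random.Random(0)))
```

Greedy selection always returns the most likely token, which is why it lowers sample entropy. Ancestral selection can return any token with non-zero probability.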

To sample from a HuggingFace checkpoint (text only), run the following command:

python main.py \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo_base \
  algo.backbone=hf_dit \
  eval.checkpoint_path=s-sahoo/duo-distilled \
  sampling.steps=8 \
  sampling.num_sample_batches=1 \
  sampling.noise_removal=greedy \
  +wandb.offline=true 
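
To compare several settings, the command above can be generated programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, the override names are copied from the command above, and the step counts 16 and 32 are illustrative values rather than recommended settings.

```python
import shlex

# Overrides copied from the command above, minus the two that are swept.
BASE_OVERRIDES = {
    "mode": "sample_eval",
    "loader.batch_size": 2,
    "loader.eval_batch_size": 8,
    "data": "openwebtext-split",
    "algo": "duo_base",
    "algo.backbone": "hf_dit",
    "eval.checkpoint_path": "s-sahoo/duo-distilled",
    "sampling.num_sample_batches": 1,
    "+wandb.offline": "true",
}


def build_command(steps, noise_removal):
    """Return the argv list for one sampling run."""
    overrides = dict(BASE_OVERRIDES)
    overrides["sampling.steps"] = steps
    overrides["sampling.noise_removal"] = noise_removal
    return ["python", "main.py"] + [f"{k}={v}" for k, v in overrides.items()]


# Print one command per (steps, noise_removal) combination.
for steps in (8, 16, 32):
    for noise_removal in ("greedy", "ancestral"):
        print(shlex.join(build_command(steps, noise_removal)))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run(cmd, check=True)` to launch the run directly.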

To use the example scripts with raw checkpoints (see Checkpoints), download them and set the checkpoint path in the script.

Baselines

Download the baseline checkpoints (see Checkpoints) and specify the paths appropriately in the respective shell scripts:

Acknowledgements & Citation

This repository builds on MDLM's GitHub repository. Cite our papers using:

@inproceedings{
    sahoo2025the,
    title={The Diffusion Duality},
    author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
    booktitle={Forty-second International Conference on Machine Learning},
    year={2025},
    url={https://openreview.net/forum?id=9P9Y8FOSOk}
}

@inproceedings{
    deschenaux2026the,
    title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers},
    author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=RSIoYWIzaP}
}