Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

July 18, 2025

Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles" (Slow Fast Sampling).

The Three Golden Principles: Certainty · Convergence · Positional

Fig. 1 – Throughput and accuracy comparison on GPQA (8-shot, length 1024) with LLaDA and our proposed methods.


✨ Key Highlights

💗💗💗 What makes Slow Fast Sampling special?
Three Golden Principles 👑: Certainty, Convergence, and Positional guide exactly when and where to decode.
Two-Stage Dance 🐢→⚡: A cautious Slow phase finds a stable span, then the Fast phase parallel-decodes it in one swoop.
Plug-and-Play 🔌: Drop-in sampler for any masked-diffusion LLM, e.g. LLaDA-8B and Dream-7B.
Crazy Speed-ups ⚡: 15.6× faster than vanilla diffusion and 34.2× faster with dLLM-Cache, with minimal accuracy loss.
Outruns ARMs 🏃: Beats LLaMA-3 8B in throughput while matching accuracy (Table 4, p. 9).
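The two-stage idea in the highlights can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the sampler shipped in this repository: the names `slow_fast_decode`, `confident_span_end`, and `toy_predict`, the 0.9 confidence threshold, and the `patience` rule for deciding that a span is stable are all hypothetical, and the "model" is a stand-in that returns made-up confidences. It only shows the general shape: a slow phase that commits one token per call while watching where the confident span ends, then a fast phase that fills the rest of that span in a single call.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # toy marker for a position that is still masked
Prediction = Tuple[str, float]  # (predicted token, confidence in [0, 1])
Predictor = Callable[[List[Optional[str]]], List[Prediction]]


def confident_span_end(seq, preds, start, threshold):
    """Exclusive end of the run starting at `start` in which every position
    is already decoded or is predicted with confidence >= threshold."""
    end = start
    while end < len(seq) and (seq[end] is not MASK or preds[end][1] >= threshold):
        end += 1
    return end


def slow_fast_decode(length: int, predict: Predictor,
                     threshold: float = 0.9, patience: int = 2):
    """Toy two-phase decoder. Returns (tokens, number of predictor calls)."""
    seq: List[Optional[str]] = [MASK] * length
    calls = 0
    while MASK in seq:
        start = seq.index(MASK)  # leftmost masked position anchors the span
        end, prev_end, stable = start, None, 0

        # Slow phase: commit one token per call until the span end stops moving.
        while MASK in seq and stable < patience:
            preds = predict(seq)
            calls += 1
            end = confident_span_end(seq, preds, start, threshold)
            masked = [i for i, tok in enumerate(seq) if tok is MASK]
            best = max(masked, key=lambda i: preds[i][1])
            seq[best] = preds[best][0]
            stable = stable + 1 if end == prev_end else 0
            prev_end = end

        # Fast phase: fill every confident masked position in the span at once.
        span = [i for i in range(start, end) if seq[i] is MASK]
        if span:
            preds = predict(seq)
            calls += 1
            for i in span:
                if preds[i][1] >= threshold:
                    seq[i] = preds[i][0]
    return seq, calls


# Hypothetical stand-in for a model forward pass: it always predicts the
# target token, with a fixed, made-up confidence per position.
TARGET = list("SLOWFAST")
CONFIDENCE = [0.99, 0.97, 0.95, 0.96, 0.93, 0.94, 0.92, 0.50]


def toy_predict(seq: List[Optional[str]]) -> List[Prediction]:
    return list(zip(TARGET, CONFIDENCE))


tokens, calls = slow_fast_decode(len(TARGET), toy_predict)
print("".join(tokens), calls)
```

With these made-up confidences the toy decoder needs 5 predictor calls for 8 tokens, whereas committing one token per call would need 8. The numbers say nothing about real speed-ups; they only show why decoding a stable span in parallel reduces the number of forward passes.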

🚀 Pipeline at a Glance

Fig. 2 – Overview of the Slow Fast Sampling pipeline: from exploratory to accelerated decoding.


๐Ÿ› ๏ธ Installation

# 1. Clone
git clone https://github.com/LiangrunFlora/Slow-Fast-Sampling.git
cd Slow-Fast-Sampling

# 2. Env (Python ≥ 3.10) & Deps
bash install.sh         
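
Before running the installer, it can help to confirm that the active interpreter meets the stated Python ≥ 3.10 requirement. The snippet below is a small optional check, not part of the repository; `meets_minimum` and `MIN_PYTHON` are hypothetical names introduced here for illustration.

```python
import sys
from typing import Sequence

MIN_PYTHON = (3, 10)  # minimum version stated in the install instructions


def meets_minimum(version: Sequence[int],
                  minimum: Sequence[int] = MIN_PYTHON) -> bool:
    """Return True if (major, minor) of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if meets_minimum(current) else "too old"
    print(f"Python {current.major}.{current.minor}: {status}")
```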

📘 Quick Start

# GSM8K with LLaDA-8B
bash scripts/run_llada_gsm8k_base.sh

# GPQA with LLaDA-8B
bash scripts/run_llada_gpqa_base.sh

# BBH with Dream-7B
bash scripts/run_dream_bbh_base.sh
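
To run all three benchmarks in sequence and record how long each one takes, a small driver along these lines may be convenient. It is a sketch only: the script paths are the ones listed in the Quick Start commands, while `run_all`, `build_command`, and the `dry_run` flag are hypothetical helpers that do not exist in the repository.

```python
import subprocess
import time
from typing import Dict, List, Sequence

# Script paths taken from the Quick Start commands.
SCRIPTS = [
    "scripts/run_llada_gsm8k_base.sh",
    "scripts/run_llada_gpqa_base.sh",
    "scripts/run_dream_bbh_base.sh",
]


def build_command(script: str) -> List[str]:
    """Build the argument list used to launch one benchmark script."""
    return ["bash", script]


def run_all(scripts: Sequence[str] = SCRIPTS, dry_run: bool = True) -> Dict[str, float]:
    """Run each script in order and return wall-clock seconds per script.

    With dry_run=True the commands are only printed and every timing is 0.0.
    """
    timings: Dict[str, float] = {}
    for script in scripts:
        cmd = build_command(script)
        if dry_run:
            print("would run:", " ".join(cmd))
            timings[script] = 0.0
            continue
        started = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        timings[script] = time.perf_counter() - started
    return timings


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the repository root to actually launch the scripts.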

📮 Contact

Created and maintained by Qingyan Wei (liangrun@csu.edu.cn). Feel free to open an issue or drop me an email. PRs are welcome!

🎉 Acknowledgements

This project stands on the shoulders of LLaDA, Dream, dLLM-Cache and the lm-evaluation-harness. Huge thanks to these amazing communities for paving the way.

📌 Citation

If you find this work useful, please cite our paper:

@article{wei2025accelerating,
  title={Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles},
  author={Wei, Qingyan and Zhang, Yaojie and Liu, Zhiyuan and Liu, Dongrui and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2506.10848},
  year={2025}
}