[ICML 2026] dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching

May 1, 2026 · View on GitHub

Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache), accepted to ICML 2026.

:fire: News

  • [2026/05/01] Our dLLM-Cache paper has been accepted to ICML 2026. Thanks!
  • [2025/06/15] Our dLLM-Cache is compatible with MMaDA.
  • [2025/05/31] Our dLLM-Cache is integrated into LLaDA-V.
  • [2025/05/23] The code of our paper has been released.
  • [2025/05/17] Our paper has been released.

✨️ Key Highlights


  • Currently supported models: LLaDA, Dream, LLaDA-V and MMaDA.
  • Speedup: Achieves up to 9.1x speedup over standard dLLM pipelines, with no performance loss on most tasks.
  • Evaluation: Evaluated on LLaDA 8B and Dream 7B.
  • Latency: Approaches the inference speed of autoregressive models (ARMs) in many scenarios.

:rocket: Pipeline

Here's an overview of the process behind our dLLM-Cache method:
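To make the idea concrete, here is a minimal PyTorch sketch of the caching pattern the paper describes: prompt features are refreshed only at long intervals, while response features are refreshed partially, recomputing just the tokens whose value vectors drifted most since the last full update. All class and function names, the interval, and the refresh ratio below are illustrative assumptions, not the repository's actual API.

```python
import torch


class AdaptiveFeatureCache:
    """Toy sketch of dLLM-Cache-style adaptive caching (illustrative only).

    Prompt features change slowly across denoising steps, so they are
    recomputed only every `prompt_interval` steps. Response features are
    refreshed selectively: only the fraction of tokens whose current value
    vectors are least similar to the cached ones get recomputed.
    """

    def __init__(self, prompt_interval=50, refresh_ratio=0.25):
        self.prompt_interval = prompt_interval
        self.refresh_ratio = refresh_ratio
        self.prompt_feats = None   # cached prompt-token features
        self.resp_feats = None     # cached response-token features
        self.resp_values = None    # value vectors used for the drift check

    def step(self, t, compute_prompt, compute_response, new_values):
        # Refresh prompt features only on a long, fixed interval.
        if self.prompt_feats is None or t % self.prompt_interval == 0:
            self.prompt_feats = compute_prompt()

        if self.resp_feats is None:
            # First step: compute everything once.
            self.resp_feats = compute_response(None)
            self.resp_values = new_values.clone()
        else:
            # Cosine similarity between current and cached value vectors,
            # one score per response token.
            sim = torch.nn.functional.cosine_similarity(
                new_values, self.resp_values, dim=-1
            )
            # Recompute only the least-similar (most stale) tokens.
            k = max(1, int(self.refresh_ratio * sim.numel()))
            stale = torch.topk(-sim, k).indices
            self.resp_feats[stale] = compute_response(stale)
            self.resp_values[stale] = new_values[stale]

        return torch.cat([self.prompt_feats, self.resp_feats], dim=0)
```

The saving comes from `compute_response` running on `k` tokens instead of the full response length on most steps; in the real method this drift check and partial recomputation happen inside each transformer layer rather than at the sequence level.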

🛠️ Installation

To get started with dLLM-Cache, follow the installation instructions below.

  1. Clone the Repository:

     ```shell
     git clone https://github.com/maomaocun/dLLM-Cache.git
     cd dLLM-Cache
     ```

  2. Set Up the Environment: Create a Python environment with conda or virtualenv, then install the dependencies:

     ```shell
     bash install.sh
     ```

  3. Run a Demo:

     ```shell
     python demo_{model_name}.py
     ```

  4. Run Experiments: Run experiments using the provided scripts:

     ```shell
     bash eval_scripts/run_{model_name}_{task_name}_base.sh
     ```

:blue_book: Example Usage

  1. GSM8K with LLaDA:

     ```shell
     bash eval_scripts/run_LLaDA_gsm8k_base.sh
     ```

  2. BBH with Dream:

     ```shell
     bash eval_scripts/run_Dream_bbh_base.sh
     ```

:postbox: Contact

If you have any questions, please email yangyicun187@gmail.com.

🎉 Acknowledgements

This repository builds on LLaDA, Dream, LLaDA-V, MMaDA, and lm-evaluation-harness.

:pushpin: Citation

If you find dLLM-Cache useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{liu2025dllm,
  title={dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching},
  author={Liu, Zhiyuan and Yang, Yicun and Zhang, Yaojie and Chen, Junjie and Zou, Chang and Wei, Qingyuan and Wang, Shaobo and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2506.06295},
  year={2025}
}
```

:star2: Star History

Star History Chart