DLLM-AccelEval
May 26, 2026
Evaluation code for the survey "A Comparative Survey of Inference Acceleration for DLLMs against AR-LLMs: No Free Lunch".
This repository organizes the experiments in the survey around two questions:
- How do different inference-acceleration techniques behave on diffusion language models (DLLMs)?
- How do optimized or integrated DLLM systems compare with autoregressive LLM (AR-LLM) baselines?
https://github.com/user-attachments/assets/1b30415c-5f40-40e1-aee7-56c1e74f3a6e
Installation
git clone https://github.com/haoyun-jiang/DLLM-AccelEval.git
cd DLLM-AccelEval
conda create -n dllm-acceleval python=3.10 -y
conda activate dllm-acceleval
pip install -r requirements.txt
pip install -e .
Create a local environment file and fill in model paths:
cp .env.example .env
Before running examples that rely on local model paths, export the required variables or load them from .env.
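One way to load the variables from .env into the current shell is sketched below. It assumes .env contains plain shell-compatible `KEY=value` lines (no spaces around `=`); the helper name `load_env` is hypothetical and not part of this repository, and the repository may already provide its own loading mechanism.

```shell
# Hypothetical helper (not part of the repository): export every variable
# defined in an env file. Assumes the file holds shell-compatible KEY=value lines.
load_env() {
  env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "load_env: $env_file not found" >&2
    return 1
  fi
  case "$env_file" in
    */*) ;;                       # path already has a directory component
    *) env_file="./$env_file" ;;  # keep '.' from searching PATH for the file
  esac
  set -a            # auto-export every variable assigned while sourcing
  . "$env_file"
  set +a            # restore the default (no auto-export)
}
```

After defining the function, `load_env .env` followed by `echo "$LLADA_INST_PATH"` should print the model path you configured. Exporting matters because the example scripts run as child processes and only see exported variables.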
Supported Methods
| Area | Documentation | Example script |
|---|---|---|
| KV/cache management | docs/kv_cache.md | scripts/run_kv_cache.sh |
| Sparse computation | docs/sparse_computation.md | scripts/run_sparse_computation.sh |
| Decoding methods | docs/decoding_methods.md | scripts/run_decoding_methods.sh |
| Speculative decoding | docs/speculative_decoding.md | scripts/run_speculative_decoding.sh |
| Integrated DLLM systems | docs/integrated.md | scripts/run_integrated.sh |
| AR baselines | docs/ar.md | scripts/run_ar.sh |
Quick Start
Run the KV-cache examples:
bash scripts/run_kv_cache.sh
SPA-Cache requires pre-computed SVD proxy files. To include the SPA-Cache example in scripts/run_kv_cache.sh, generate them once before running the script:
python src/spacache_svd.py "$LLADA_INST_PATH"
Run the sparse-computation examples:
bash scripts/run_sparse_computation.sh
Run the decoding-method examples:
bash scripts/run_decoding_methods.sh
Run the speculative-decoding examples:
bash scripts/run_speculative_decoding.sh
Run the integrated DLLM examples:
bash scripts/run_integrated.sh
Run the AR baseline examples:
bash scripts/run_ar.sh
Each script contains explicit example commands and writes Hydra outputs under outputs/examples/.
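To run several of the example scripts back to back, a small wrapper like the following can help. This is only a sketch: the function name `run_examples` is hypothetical and not part of the repository. It skips scripts that are missing, keeps going after a failure, and reports which scripts failed at the end.

```shell
# Hypothetical helper (not part of the repository): run each script given as an
# argument, continue past failures, and return non-zero if any script failed.
run_examples() {
  failed=""
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      printf 'skip: %s (not found)\n' "$script"
      continue
    fi
    printf 'run: %s\n' "$script"
    bash "$script" || failed="$failed $script"
  done
  if [ -n "$failed" ]; then
    printf 'failed:%s\n' "$failed"
    return 1
  fi
  return 0
}
```

For example, `run_examples scripts/run_kv_cache.sh scripts/run_sparse_computation.sh scripts/run_ar.sh` runs three of the scripts listed under Supported Methods in order.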
Citation
If you find this repository useful, please cite:
@article{jiang2026comparative,
title={A Comparative Survey of Inference Acceleration for DLLMs against AR-LLMs: No Free Lunch},
author={Jiang, Haoyun and He, Junqi and Wang, Muyi and Zeng, Fanqin and Hong, Feng and Yu, Geng and Chen, Pengyi and Ye, Yushi and Cao, Yuting and Fu, Yicheng and others},
year={2026},
publisher={Preprints}
}
Acknowledgements
This repository builds on and adapts ideas, interfaces, and evaluation utilities from several open research projects. We gratefully acknowledge the authors and maintainers of d2Cache, Fast-dLLM v2, EAGLE-3, and the Language Model Evaluation Harness. Their released methods and infrastructure provide important foundations for reproducible studies of efficient LLM inference.