DLLM-AccelEval

May 26, 2026

Evaluation code for the survey "A Comparative Survey of Inference Acceleration for DLLMs against AR-LLMs: No Free Lunch".

This repository organizes the experiments in the survey around two questions:

  1. How do different inference-acceleration techniques behave on diffusion language models (DLLMs)?
  2. How do optimized or integrated DLLM systems compare with autoregressive LLM (AR-LLM) baselines?

https://github.com/user-attachments/assets/1b30415c-5f40-40e1-aee7-56c1e74f3a6e

Installation

git clone https://github.com/haoyun-jiang/DLLM-AccelEval.git
cd DLLM-AccelEval

conda create -n dllm-acceleval python=3.10 -y
conda activate dllm-acceleval

pip install -r requirements.txt
pip install -e .

Create a local environment file and fill in model paths:

cp .env.example .env

Before running examples that rely on local model paths, export the required variables or load them from .env.
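One way to load the variables is sketched below. It assumes .env contains plain KEY=value lines in shell syntax; the helper name load_env is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): export every
# variable defined in an env file into the current shell session.
# Assumes the file contains plain KEY=value lines in shell syntax.
load_env() {
  env_file="${1:-./.env}"
  if [ ! -f "$env_file" ]; then
    echo "missing env file: $env_file" >&2
    return 1
  fi
  set -a          # auto-export every variable assigned while sourcing
  . "$env_file"   # pass a path containing a slash so the shell does not search PATH
  set +a          # stop auto-exporting
}
```

Define the function in your shell, then call load_env from the repository root before running an example script. Child processes such as the scripts under scripts/ inherit the exported variables.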

Supported Methods

| Area | Documentation | Example script |
| --- | --- | --- |
| KV/cache management | docs/kv_cache.md | scripts/run_kv_cache.sh |
| Sparse computation | docs/sparse_computation.md | scripts/run_sparse_computation.sh |
| Decoding methods | docs/decoding_methods.md | scripts/run_decoding_methods.sh |
| Speculative decoding | docs/speculative_decoding.md | scripts/run_speculative_decoding.sh |
| Integrated DLLM systems | docs/integrated.md | scripts/run_integrated.sh |
| AR baselines | docs/ar.md | scripts/run_ar.sh |

Quick Start

Run the KV-cache examples:

bash scripts/run_kv_cache.sh

SPA-Cache requires pre-computed SVD proxy files. To include the SPA-Cache example in scripts/run_kv_cache.sh, generate them once before running the script:

python src/spacache_svd.py "$LLADA_INST_PATH"
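If LLADA_INST_PATH is empty, the command above runs with no model argument. A minimal guard is sketched below. The wrapper name generate_spacache_svd is hypothetical, and it only checks that the variable is non-empty, not that it points at a valid model.

```shell
# Hypothetical wrapper (not part of this repository): refuse to run the
# SVD step when LLADA_INST_PATH is unset or empty.
generate_spacache_svd() {
  if [ -z "${LLADA_INST_PATH:-}" ]; then
    echo "LLADA_INST_PATH is not set; export it or load it from .env" >&2
    return 1
  fi
  # Quote the variable so a path containing spaces stays one argument.
  python src/spacache_svd.py "$LLADA_INST_PATH"
}
```

Run generate_spacache_svd from the repository root, since the script path is relative.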

Run the sparse-computation examples:

bash scripts/run_sparse_computation.sh

Run the decoding-method examples:

bash scripts/run_decoding_methods.sh

Run the speculative-decoding examples:

bash scripts/run_speculative_decoding.sh

Run the integrated DLLM examples:

bash scripts/run_integrated.sh

Run the AR baseline examples:

bash scripts/run_ar.sh

Each script contains explicit example commands and writes Hydra outputs under outputs/examples/.
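To run every example group in one go, a small loop over the script names listed above is enough. The sketch below is an assumption about how one might chain them. The helper name run_examples is hypothetical, and skipping missing scripts and stopping at the first failure are illustrative choices rather than repository behavior.

```shell
# Hypothetical helper (not part of this repository): run the example
# scripts in sequence and stop at the first one that fails.
run_examples() {
  script_dir="${1:-scripts}"
  for name in kv_cache sparse_computation decoding_methods \
              speculative_decoding integrated ar; do
    script="$script_dir/run_$name.sh"
    if [ ! -f "$script" ]; then
      echo "skip: $script not found" >&2
      continue
    fi
    echo "running $script"
    if ! bash "$script"; then
      echo "failed: $script" >&2
      return 1
    fi
  done
}
```

Calling run_examples from the repository root uses the scripts/ directory by default. Run the SPA-Cache SVD step first if the KV-cache script should include that example.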

Citation

If you find this repository useful, please cite:

@article{jiang2026comparative,
  title={A Comparative Survey of Inference Acceleration for DLLMs against AR-LLMs: No Free Lunch},
  author={Jiang, Haoyun and He, Junqi and Wang, Muyi and Zeng, Fanqin and Hong, Feng and Yu, Geng and Chen, Pengyi and Ye, Yushi and Cao, Yuting and Fu, Yicheng and others},
  year={2026},
  publisher={Preprints}
}

Acknowledgements

This repository builds on and adapts ideas, interfaces, and evaluation utilities from several open research projects. We gratefully acknowledge the authors and maintainers of d2Cache, Fast-dLLM v2, EAGLE-3, and the Language Model Evaluation Harness. Their released methods and infrastructure provide important foundations for reproducible studies of efficient LLM inference.