Super-Outlier in DLMs
May 10, 2026
Code accompanying the paper "Layer Collapse in Diffusion Language Models" by Alexander Conzelmann, Albert Catalan-Tatjer, and Shiwei Liu (Tübingen AI Center / MPI for Intelligent Systems / ELLIS Institute Tübingen). Link: https://arxiv.org/abs/2605.06366
We systematically evaluate pruning and quantization for diffusion language models (LLaDA-8B, DREAM-7B) against autoregressive baselines (Llama 3.1 8B, Qwen 2.5 7B), and study the layer-collapse phenomenon that emerges under sparsification.
Installation
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .
For development:
uv pip install -e ".[dev]"
pre-commit install
Run the test suite:
pytest tests/
Quick Start
Single run (Hydra config, override from CLI):
python scripts/run.py model=llada_8b pruning=wanda pruning.sparsity=0.5 evaluation=commonsense
HTCondor batch submission:
python scripts/submit.py model=llada_8b pruning=wanda evaluation=commonsense \
--multirun pruning.sparsity=0.2,0.3,0.4,0.5,0.6,0.7
A SLURM launcher is also available; both are configured via
configs/condor.yaml / configs/slurm.yaml and overridable per-cluster via
configs/local/{condor,slurm}.yaml (see the *.example templates). The
cluster to use is autodetected based on available command-line tools.
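As a rough illustration of what tool-based autodetection can look like, the sketch below checks the PATH for the standard submission commands of HTCondor (`condor_submit`) and SLURM (`sbatch`). The function name `detect_cluster` and the order of the checks are assumptions made for this example; the logic in scripts/submit.py may differ.

```python
import shutil
from typing import Callable, Optional


def detect_cluster(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return "condor", "slurm", or None depending on which submit tool is found.

    `which` is injectable so the lookup can be exercised without a real cluster.
    """
    # Assumption: HTCondor is checked first; the real launcher may prefer SLURM.
    if which("condor_submit") is not None:
        return "condor"
    if which("sbatch") is not None:
        return "slurm"
    return None
```

On a machine with neither scheduler installed, `detect_cluster()` returns `None`, which a launcher could treat as "run locally" or as an error.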
Reproducing the paper
The repo's out/ directory (eval result JSONs, ~3 GB) is gitignored. To
regenerate paper figures:
- Set environment variables:
  export REPO_DIR=$PWD
  export WORK_DIR=/path/to/scratch
  export MODELS=/path/to/model/cache # HF model snapshots land here
  export HF_HOME=/path/to/hf/cache # datasets cache root
- Pre-download models, C4 calibration data, and eval datasets:
  python scripts/download_artifacts.py
- Submit each surviving experiment. Each experiments/AXX_*/run.sh is self-contained and writes results into the flat out/ directory:
  bash experiments/A11_owl_scores/run.sh
  bash experiments/A23_pruning_statistics/run.sh
  bash experiments/A24_pythia160m/run.sh
  bash experiments/A25_activation_histograms/run.sh
  bash experiments/A26_strategy_gap_fill/run.sh
  bash experiments/A27_channel_magnitude_per_step/run.sh
- Render every paper figure:
  bash scripts/replot_paper_figures.sh
  Figures land under plots/experiments/AXX_*/.
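Before submitting jobs it can help to confirm that the four environment variables from the first step are actually set. The following is a minimal preflight sketch; `missing_env_vars` is a hypothetical helper written for this example and is not part of the repository.

```python
import os
from typing import Mapping

# Variable names taken from the "Set environment variables" step above.
REQUIRED_VARS = ("REPO_DIR", "WORK_DIR", "MODELS", "HF_HOME")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty, in a fixed order."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```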
Mapping experiments to paper figures
| Experiment | Produces |
|---|---|
| A11_owl_scores | OWL outlier-score analyses |
| A23_pruning_statistics | Pruning statistics across models |
| A24_pythia160m | 160M-scale ablations |
| A25_activation_histograms | Per-layer activation heatmaps |
| A26_strategy_gap_fill | Sparsity-allocation strategy comparison |
| A27_channel_magnitude_per_step | Per-step channel-magnitude sweep |
Project Structure
- src/diffusion_prune/: Main package
  - model/: Model loading (AR + DLM)
  - pruning/: WANDA, DWANDA (diffusion-aware), magnitude, SparseGPT, OWL / alpha sparsity allocation
  - quantization/: GPTQ, RTN, plus virtual variants for DLMs
  - evaluation/: lm-eval-harness integration with result caching
  - diffusion_masking.py: Random-timestep masking for DLM calibration
- configs/: Hydra configs (model/, pruning/, quantization/, evaluation/, plus cluster launchers condor.yaml / slurm.yaml)
- scripts/: Entry points (run.py, submit.py), figure / table generation (plot.py, _tables.py, summary_table.py, baseline_table.py, best_hyperparams.py, replot_paper_figures.sh, pruning_statistics.py, per-stat modules under stats/), data download (download_*.py)
- experiments/: One folder per paper experiment (AXX_<short_desc>), each with run.sh and plot.sh
- out/: Flat layout of evaluation result JSONs (gitignored)
- plots/: Generated figures (gitignored)
- tests/: pytest suite
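To give a sense of what "random-timestep masking" means for DLM calibration, here is a small sketch of the usual masked-diffusion forward process: draw a masking ratio t uniformly from [0, 1), then replace each token with the mask token independently with probability t. The function name, the list-based interface, and the `mask_id` value used in the example are placeholders chosen for illustration; diffusion_masking.py may be organized differently.

```python
import random
from typing import Optional, Sequence


def mask_random_timestep(
    token_ids: Sequence[int],
    mask_id: int,
    rng: Optional[random.Random] = None,
) -> tuple[list[int], float]:
    """Mask each token independently with probability t, where t ~ U[0, 1)."""
    if rng is None:
        rng = random.Random()
    t = rng.random()  # sampled "timestep", used as the masking ratio
    masked = [mask_id if rng.random() < t else tok for tok in token_ids]
    return masked, t


# Usage: mask_id=-1 is a stand-in, not the mask token id of any real tokenizer.
masked, t = mask_random_timestep([11, 12, 13, 14, 15], mask_id=-1, rng=random.Random(0))
print(f"t={t:.2f}", masked)
```

Sampling a fresh t per calibration sequence exposes the pruning statistics to lightly and heavily masked inputs alike, rather than to a single fixed noise level.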
Models
Base and instruct variants of:
- LLaDA-8B (DLM)
- DREAM-7B (DLM)
- Llama 3.1 8B (AR)
- Qwen 2.5 7B (AR)
Tasks
- QnA (base models): arc_challenge, hellaswag, piqa, winogrande, boolq, openbookqa (evaluation=commonsense)
- Reasoning (instruct models): GSM8K (evaluation=gsm8k)
Methods
- Pruning: WANDA, DWANDA, magnitude, SparseGPT; with uniform / OWL / alpha (deeper-is-sparser, earlier-is-sparser) allocations
- Quantization: GPTQ, RTN, plus virtual variants for DLMs
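For readers unfamiliar with WANDA: it scores each weight by its magnitude times the L2 norm of the matching input activation over the calibration set, and prunes the lowest-scoring weights within each output row. The pure-Python sketch below applies that rule to a single row. `wanda_prune_row` and `activation_norms` are illustrative names rather than functions from this package, and the code under pruning/ works on full weight tensors and may differ in detail.

```python
import math
from typing import Sequence


def activation_norms(calib_inputs: Sequence[Sequence[float]]) -> list[float]:
    """Per-feature L2 norm over calibration tokens (one row per token)."""
    return [math.sqrt(sum(x * x for x in col)) for col in zip(*calib_inputs)]


def wanda_prune_row(
    weights: Sequence[float], act_norms: Sequence[float], sparsity: float
) -> list[float]:
    """Zero the `sparsity` fraction of one output row with the lowest scores.

    score_j = |w_j| * ||x_j||_2
    """
    scores = [abs(w) * n for w, n in zip(weights, act_norms)]
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest scores within this row.
    drop = set(sorted(range(len(weights)), key=scores.__getitem__)[:n_prune])
    return [0.0 if j in drop else w for j, w in enumerate(weights)]


row = [0.5, -2.0, 0.1, 1.0]
norms = activation_norms([[4.0, 0.1, 1.0, 0.0], [3.0, 0.0, 0.0, 1.0]])
print(wanda_prune_row(row, norms, sparsity=0.5))  # [0.5, 0.0, 0.0, 1.0]
```

In this toy row, plain magnitude pruning would remove 0.1 and 0.5 and keep -2.0; the activation-aware score removes -2.0 instead because its input feature is almost inactive on the calibration data.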
Configuration
Configs are composed via Hydra. The default entry is configs/config.yaml;
override fields from the CLI:
python scripts/run.py model=dream_7b pruning=wanda pruning.sparsity=0.5 \
pruning.allocation=earlier evaluation=commonsense
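The `pruning.allocation` override selects how the global sparsity budget is spread across layers. One simple way to realize an "earlier-is-sparser" or "deeper-is-sparser" schedule is a linear ramp around the target, as sketched below. The formula, the `alpha` semantics, and the name `linear_allocation` are assumptions made for illustration only; this README does not specify the parameterization used by the package.

```python
def linear_allocation(
    n_layers: int, target: float, alpha: float, mode: str = "deeper"
) -> list[float]:
    """Per-layer sparsities that vary linearly with depth and average to `target`.

    mode="deeper":  later layers get target + alpha, the first gets target - alpha.
    mode="earlier": the ramp is reversed.
    """
    if mode not in ("deeper", "earlier"):
        raise ValueError(f"unknown mode: {mode}")
    if n_layers == 1:
        return [target]
    sparsities = []
    for i in range(n_layers):
        pos = 2.0 * i / (n_layers - 1) - 1.0  # -1 at the first layer, +1 at the last
        if mode == "earlier":
            pos = -pos
        sparsities.append(target + alpha * pos)
    if min(sparsities) < 0.0 or max(sparsities) > 1.0:
        raise ValueError("alpha too large for this target sparsity")
    return sparsities


print([round(s, 2) for s in linear_allocation(5, 0.5, 0.2, mode="earlier")])
```

Because the ramp is symmetric around zero, the mean sparsity stays at the target, so different allocations remain comparable at the same overall budget.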
Paths are controlled by the REPO_DIR, WORK_DIR, MODELS, HF_HOME env
vars (see "Reproducing the paper" above).
License
Released under the Apache 2.0 license — see LICENSE.
Citation
If you use this code, please cite:
@misc{conzelmann2026layercollapsediffusionlanguage,
  title={Layer Collapse in Diffusion Language Models},
  author={Alexander Conzelmann and Albert Catalan-Tatjer and Shiwei Liu},
  year={2026},
  eprint={2605.06366},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.06366},
}