Super-Outlier in DLMs

May 10, 2026 · View on GitHub

Code accompanying the paper "Layer Collapse in Diffusion Language Models" by Alexander Conzelmann, Albert Catalan-Tatjer, and Shiwei Liu (Tübingen AI Center / MPI for Intelligent Systems / ELLIS Institute Tübingen). Link: https://arxiv.org/abs/2605.06366

We systematically evaluate pruning and quantization for diffusion language models (LLaDA-8B, DREAM-7B) against autoregressive baselines (Llama 3.1 8B, Qwen 2.5 7B), and study the layer-collapse phenomenon that emerges under sparsification.

Installation

uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .

For development:

uv pip install -e ".[dev]"
pre-commit install

Run the test suite:

pytest tests/

Quick Start

Single run (Hydra config, override from CLI):

python scripts/run.py model=llada_8b pruning=wanda pruning.sparsity=0.5 evaluation=commonsense

HTCondor batch submission:

python scripts/submit.py model=llada_8b pruning=wanda evaluation=commonsense \
    --multirun pruning.sparsity=0.2,0.3,0.4,0.5,0.6,0.7

A SLURM launcher is also available; both are configured via configs/condor.yaml / configs/slurm.yaml and overridable per-cluster via configs/local/{condor,slurm}.yaml (see the *.example templates). The cluster to use is autodetected based on available command-line tools.

Reproducing the paper

The repo's out/ directory (eval result JSONs, ~3 GB) is gitignored. To regenerate paper figures:

Set environment variables:

export REPO_DIR=$PWD
export WORK_DIR=/path/to/scratch
export MODELS=/path/to/model/cache       # HF model snapshots land here
export HF_HOME=/path/to/hf/cache         # datasets cache root

Pre-download models, C4 calibration data, and eval datasets:
```
python scripts/download_artifacts.py
```

Submit each surviving experiment. Each experiments/AXX_*/run.sh is self-contained and writes results into the flat out/ directory:

bash experiments/A11_owl_scores/run.sh
bash experiments/A23_pruning_statistics/run.sh
bash experiments/A24_pythia160m/run.sh
bash experiments/A25_activation_histograms/run.sh
bash experiments/A26_strategy_gap_fill/run.sh
bash experiments/A27_channel_magnitude_per_step/run.sh

Render every paper figure:
```
bash scripts/replot_paper_figures.sh
```
Figures land under plots/experiments/AXX_*/.

Mapping experiments to paper figures

Experiment	Produces
`A11_owl_scores`	OWL outlier-score analyses
`A23_pruning_statistics`	Pruning statistics across models
`A24_pythia160m`	160M-scale ablations
`A25_activation_histograms`	Per-layer activation heatmaps
`A26_strategy_gap_fill`	Sparsity-allocation strategy comparison
`A27_channel_magnitude_per_step`	Per-step channel-magnitude sweep

Project Structure

src/diffusion_prune/: Main package
- model/: Model loading (AR + DLM)
- pruning/: WANDA, DWANDA (diffusion-aware), magnitude, SparseGPT, OWL / alpha sparsity allocation
- quantization/: GPTQ, RTN, plus virtual variants for DLMs
- evaluation/: lm-eval-harness integration with result caching
- diffusion_masking.py: Random-timestep masking for DLM calibration
configs/: Hydra configs (model/, pruning/, quantization/, evaluation/, plus cluster launchers condor.yaml / slurm.yaml)
scripts/: Entry points (run.py, submit.py), figure / table generation (plot.py, _tables.py, summary_table.py, baseline_table.py, best_hyperparams.py, replot_paper_figures.sh, pruning_statistics.py, per-stat modules under stats/), data download (download_*.py)
experiments/: One folder per paper experiment (AXX_<short_desc>), each with run.sh and plot.sh
out/: Flat layout of evaluation result JSONs (gitignored)
plots/: Generated figures (gitignored)
tests/: pytest suite

Models

Base and instruct variants of:

LLaDA-8B (DLM)
DREAM-7B (DLM)
Llama 3.1 8B (AR)
Qwen 2.5 7B (AR)

Tasks

QnA (base models): arc_challenge, hellaswag, piqa, winogrande, boolq, openbookqa (evaluation=commonsense)
Reasoning (instruct models): GSM8K (evaluation=gsm8k)

Methods

Pruning: WANDA, DWANDA, magnitude, SparseGPT; with uniform / OWL / alpha (deeper-is-sparser, earlier-is-sparser) allocations
Quantization: GPTQ, RTN, plus virtual variants for DLMs

Configuration

Configs are composed via Hydra. The default entry is configs/config.yaml; override fields from the CLI:

python scripts/run.py model=dream_7b pruning=wanda pruning.sparsity=0.5 \
    pruning.allocation=earlier evaluation=commonsense

Paths are controlled by the REPO_DIR, WORK_DIR, MODELS, HF_HOME env vars (see "Reproducing the paper" above).

License

Released under the Apache 2.0 license — see LICENSE.

Citation

If you use this code, please cite:

@misc{conzelmann2026layercollapsediffusionlanguage,
      title={Layer Collapse in Diffusion Language Models}, 
      author={Alexander Conzelmann and Albert Catalan-Tatjer and Shiwei Liu},
      year={2026},
      eprint={2605.06366},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.06366}, 
}