Improving Sampling for Masked Diffusion Models via Information Gain
April 30, 2026
Chinese README | English README | Paper | Project Page
A unified decoding framework for Masked Diffusion Models (MDMs) that replaces greedy local-certainty heuristics with a principled information-gain objective, yielding more robust generation across math, code, and creative tasks.
🎉 News: Accepted by ICML 2026!
Highlights
- 🎯 Information-Gain objective: each decoding step picks the action that maximises information gain, balancing immediate certainty with long-term impact.
- ⚡ One forward pass: all N candidates are scored in a single batched call; no iterative rollouts needed.
- 🔌 Standalone API: InfoGainSampler works with any MDM (LLaDA, Dream, SDAR, TraDo) without the dllm dependency.
- 🗂️ Pre-baked configs: one command to reproduce every experiment in the paper.
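To make the candidate-scoring idea concrete, here is a minimal, framework-free sketch. It assumes one plausible form of the objective: the negative entropy of the tokens a candidate commits now, plus the negative total entropy left over the positions that stay masked afterwards. The exact objective is the one defined in the paper and docs/method.md; the names `entropy`, `score_candidate`, and `pick_candidate` are hypothetical and not part of this repository.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def score_candidate(committed_dists, remaining_dists):
    """Hypothetical score for one candidate action; higher is better.

    committed_dists: distributions at the positions this candidate unmasks
                     (immediate certainty: low entropy is good).
    remaining_dists: distributions predicted for the positions that stay
                     masked after the candidate is applied
                     (long-term impact: low leftover entropy is good).
    """
    immediate_cost = sum(entropy(d) for d in committed_dists)
    leftover_uncertainty = sum(entropy(d) for d in remaining_dists)
    return -(immediate_cost + leftover_uncertainty)

def pick_candidate(candidates):
    """Return (index of best candidate, list of all scores).

    `candidates` is a list of (committed_dists, remaining_dists) pairs. In a
    real sampler these would come from one batched forward pass over all N
    candidate states rather than from N separate calls.
    """
    scores = [score_candidate(c, r) for c, r in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy example with a 2-token vocabulary.
# Candidate 0 is very certain now but leaves the other positions undecided.
greedy = ([[0.99, 0.01]], [[0.5, 0.5], [0.5, 0.5]])
# Candidate 1 is slightly less certain now but sharpens the other positions.
informative = ([[0.9, 0.1]], [[0.95, 0.05], [0.9, 0.1]])
best_index, scores = pick_candidate([greedy, informative])
```

In this toy case a purely local-certainty rule would prefer candidate 0, while the combined score prefers candidate 1 because it leaves less uncertainty behind.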
Quickstart
# 1. Install
git clone --recurse-submodules git@github.com:yks23/Information-Gain-Sampler.git
cd Information-Gain-Sampler
conda create -n info-gain python=3.10 && conda activate info-gain
pip install -r requirements.txt
# 2. Download a model (LLaDA shown; see docs/models.md for others)
huggingface-cli download GSAI-ML/LLaDA-8B-Instruct --local-dir ./model/llada
# 3. Run
python run.py --config configs/gsm8k_info_gain.yaml
python run.py --config configs/gsm8k_info_gain.yaml --model dream # swap model
python run.py --config configs/gsm8k_info_gain.yaml --max_samples 2 # smoke-test
Available configs (configs/):
| Config | Task | Sampler |
|---|---|---|
| gsm8k_info_gain.yaml | GSM8K | Info-Gain |
| math500_info_gain.yaml | MATH-500 | Info-Gain |
| humaneval_info_gain.yaml | HumanEval | Info-Gain |
| mbpp_info_gain.yaml | MBPP | Info-Gain |
| writing_info_gain.yaml | Creative writing | Info-Gain |
| gsm8k_original.yaml | GSM8K | Greedy baseline |
Any config key can be overridden on the command line: python run.py --config X.yaml --key value.
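The override behaviour can be pictured as a shallow merge of `--key value` pairs over the loaded config. The sketch below is illustrative only and is not the actual run.py implementation: `parse_value` and `apply_overrides` are hypothetical helpers, and the `base` dictionary stands in for a parsed YAML file whose real keys may differ.

```python
def parse_value(text):
    """Best-effort conversion of a CLI string to bool, int, float, or str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def apply_overrides(config, argv):
    """Return a copy of `config` with `--key value` pairs from `argv` applied."""
    merged = dict(config)
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("--") and i + 1 < len(argv):
            # Strip the leading "--" and store the converted value.
            merged[token[2:]] = parse_value(argv[i + 1])
            i += 2
        else:
            i += 1
    return merged

# Stand-in for a loaded YAML config (keys are assumptions for illustration).
base = {"model": "llada", "max_samples": None, "candidate_number": 8}
merged = apply_overrides(base, ["--model", "dream", "--max_samples", "2"])
```

Keys that are not mentioned on the command line keep their config-file values, and the original dictionary is left unmodified.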
Standalone API
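from src.samplers import InfoGainSampler

# model, tokenizer, and input_ids (a batch of prompt token ids) are assumed to be loaded already.
sampler = InfoGainSampler(model, tokenizer)
output_ids = sampler.sample(
    input_ids,
    max_new_tokens=256,
    steps=256,
    block_size=32,
    candidate_number=8,
    position_temperature=0.2,
    threshold=0.8,
    variant="info_gain",  # "info_gain" | "lookum"
)
# Assumes input_ids has shape (batch, prompt_len), so the slice below drops the prompt.
prompt_len = input_ids.shape[1]
decoded = tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)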
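As a rough guide to how `max_new_tokens`, `steps`, and `block_size` interact, the sketch below follows a common convention for block-wise MDM decoding, in which the generation window is split into equal blocks and the step budget is divided evenly among them. Whether InfoGainSampler splits its budget exactly this way is an assumption, and `block_schedule` is a hypothetical helper, not part of the repository.

```python
def block_schedule(max_new_tokens, steps, block_size):
    """Return (num_blocks, steps_per_block, tokens_per_step).

    Assumes the generation window divides evenly into blocks and the step
    budget divides evenly across those blocks.
    """
    if max_new_tokens % block_size != 0:
        raise ValueError("max_new_tokens must be a multiple of block_size")
    num_blocks = max_new_tokens // block_size
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    # Average number of tokens that must be unmasked per step within a block.
    tokens_per_step = block_size / steps_per_block
    return num_blocks, steps_per_block, tokens_per_step

# Values from the example call: 256 new tokens, 256 steps, blocks of 32.
schedule = block_schedule(max_new_tokens=256, steps=256, block_size=32)
```

Under this reading, the example settings give 8 blocks of 32 steps each, unmasking one token per step on average; halving `steps` would require two tokens per step.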
Documentation
| Doc | Contents |
|---|---|
| docs/installation.md | Full install guide, submodules, MMaDA extra steps |
| docs/models.md | Model list, HuggingFace paths, download commands |
| docs/usage.md | All run.py flags, multi-GPU, dllm / accelerate, multimodal |
| docs/method.md | Motivation, objective derivation, three-step cycle |
Project Status
| Status | Item |
|---|---|
| ✅ | Published arXiv paper (arXiv:2602.18176) |
| ✅ | dllm framework integration with full cache support (LLaDA, Dream, SDAR, TraDo) |
| ✅ | Standalone InfoGainSampler — no dllm dependency |
| ✅ | Pre-baked experiment configs for one-command reproduction |
| ✅ | Unified run.py entry point |
| 🔄 | Beam search feature organisation |
| 🔄 | Protein generation quality testing |
License
This project is released into the public domain under The Unlicense — use it however you like, no conditions.
Citation
@misc{yang2026improvingsamplingmaskeddiffusion,
title={Improving Sampling for Masked Diffusion Models via Information Gain},
author={Kaisen Yang and Jayden Teoh and Kaicheng Yang and Yitong Zhang and Alex Lamb},
year={2026},
eprint={2602.18176},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.18176},
}