[NBDiff] A Principled Adaptation Path for Diffusion LLMs

January 29, 2026

Pushing Diffusion LLM Performance to Its Limits!

  • 🔭 We seek a principled adaptation path from autoregressive (AR) models to Block-Diffusion;

  • ⚡ Block-Diffusion with larger block sizes has good acceleration potential;

  • 🤔 Long context and reasoning lead to significant performance gains.
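To make the role of the block size concrete, here is a minimal sketch of a block-causal attention pattern, as commonly used in block-diffusion language models: positions attend freely within their own block and causally to earlier blocks. It is an illustration under stated assumptions, not the NBDiff implementation; the function names `block_causal_mask` and `render` are hypothetical, and the masks used in training may differ.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j], True when position i may attend to position j.

    Assumption (illustrative only): attention is bidirectional inside a
    block and causal across blocks.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [
        # j is visible to i when j's block is not later than i's block.
        [j // block_size <= i // block_size for j in range(seq_len)]
        for i in range(seq_len)
    ]


def render(mask: list[list[bool]]) -> str:
    """Draw the mask with 'x' for allowed attention and '.' for blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    for block_size in (1, 2, 4):
        print(f"block_size={block_size}")
        print(render(block_causal_mask(4, block_size)))
```

With `block_size=1` the pattern reduces to the ordinary causal mask of an AR model, and with `block_size` equal to the sequence length it becomes fully bidirectional. In this simplified view, growing the block size moves the model from next-token toward next-block prediction.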

Model Weight

We have open-sourced the weights of NBDiff-7B-Instruct/Base. Please feel free to download them.

Demo

We provide a demo for running our diffusion model. We recommend Python 3.10. Before running the demo, please install the following packages:

torch==2.6
transformers==4.53.2
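As an optional sanity check, the pins above can be verified before launching the demo. The script below is a minimal sketch that uses only the standard library; the helper names `version_matches` and `check_environment` are hypothetical and are not part of this repository.

```python
import sys
from importlib import metadata

# Versions recommended in this README.
REQUIRED = {"torch": "2.6", "transformers": "4.53.2"}


def version_matches(installed: str, required: str) -> bool:
    """True if `installed` equals `required` or is a patch release of it."""
    # Drop a local build suffix such as "+cu124" before comparing.
    base = installed.split("+", 1)[0]
    return base == required or base.startswith(required + ".")


def check_environment() -> list[str]:
    """Return human-readable mismatches; an empty list means all pins match."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"Python 3.10 is recommended, found {found}")
    for package, required in REQUIRED.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {required})")
            continue
        if not version_matches(installed, required):
            problems.append(f"{package}=={installed} found, expected {required}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment matches the recommended versions.")
```

A mismatch does not necessarily mean the demo will fail; these are only the versions recommended above.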

To start the demo, please run:

python demo.py

Citation

If you find this research useful, please cite:

@misc{tian2025nexttokennextblockprincipledadaptation,
      title={From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs}, 
      author={Yuchuan Tian and Yuchen Liang and Jiacheng Sun and Shuo Zhang and Guangwen Yang and Yingte Shu and Sibo Fang and Tianyu Guo and Kai Han and Chao Xu and Hanting Chen and Xinghao Chen and Yunhe Wang},
      year={2025},
      eprint={2512.06776},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.06776}, 
}

Acknowledgement

We sincerely thank the openPangu team for their code.