December 28, 2025

OpenMoE 2: Sparse Diffusion Language Models

Jinjie Ni and the team


The first-ever sparse diffusion large language model trained from scratch, focusing on architectural insights.

News

[2025-10-27] We release the codebase, all training checkpoints, and logs. The codebase is highly optimized and industry-level in terms of scalability and efficiency.

[2025-10-03] The blog is out! Check it out here!


Code

The codebase is released here. It is a highly optimized codebase for training DLMs at any scale, with Megatron-LM as the backend.

The full MoE implementation is not yet released. We plan to release it after the main training is done.


Resources

We open-source all model checkpoints and training logs mentioned in the paper. All of them can be downloaded at https://huggingface.co/collections/jinjieni/mdga.

The easiest way to download a folder is to use this script (set up the variables properly first):

python utils/hf_download_folder.py
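
If you would rather not edit the script, a folder can also be fetched with the `huggingface_hub` library. The following is a minimal sketch, not the repository's `utils/hf_download_folder.py`: `folder_patterns` and `download_folder` are hypothetical helper names, the repo ID and folder in the commented usage are copied from the wget example in this README, and it assumes `huggingface_hub` is installed.

```python
from pathlib import Path


def folder_patterns(folder: str) -> list[str]:
    """Return glob patterns matching every file under `folder` in a Hub repo."""
    folder = folder.strip("/")
    # Hub pattern filtering is fnmatch-style, so `*` also matches nested paths.
    return [f"{folder}/*"]


def download_folder(repo_id: str, folder: str, local_dir: str,
                    repo_type: str = "dataset") -> Path:
    """Download only `folder` from a Hub repo into `local_dir`."""
    # Imported lazily so folder_patterns() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,  # "dataset" for the logs repo, "model" for checkpoints
        allow_patterns=folder_patterns(folder),
        local_dir=local_dir,
    )
    return Path(path)


# Example usage (performs network access):
# download_folder(
#     repo_id="MDGA-1/openmoe2_logs",
#     folder="dense_vs_moe/dense_100b_1e_1b7_difflm",
#     local_dir="./openmoe2_logs",
# )
```

Restricting the download with `allow_patterns` avoids pulling the whole repository when only one run's logs are needed.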

Alternatively, you can use wget to download individual files from the folder directly, e.g.:

wget https://huggingface.co/datasets/MDGA-1/openmoe2_logs/resolve/main/dense_vs_moe/dense_100b_1e_1b7_difflm/tensorboard/events.out.tfevents.1755443508.0648415733
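
On the Hugging Face Hub, direct file downloads go through the `resolve` endpoint, while a `blob` URL returns the web page for the file rather than its contents. The sketch below builds a direct-download URL from a repo ID and a file path using only the standard library. `hf_file_url` is a hypothetical helper name, not part of this codebase; the repo ID and path in the commented usage come from the wget example above.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

# URL prefix per repo type on huggingface.co (model repos have no prefix).
_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def hf_file_url(repo_id: str, path: str, revision: str = "main",
                repo_type: str = "dataset") -> str:
    """Build a direct-download (`resolve`) URL for one file in a Hub repo."""
    prefix = _PREFIX[repo_type]
    # quote() leaves "/" intact by default, so nested paths survive.
    return (
        f"https://huggingface.co/{prefix}{repo_id}"
        f"/resolve/{quote(revision, safe='')}/{quote(path.lstrip('/'))}"
    )


# Example usage (performs network access):
# url = hf_file_url(
#     "MDGA-1/openmoe2_logs",
#     "dense_vs_moe/dense_100b_1e_1b7_difflm/tensorboard/"
#     "events.out.tfevents.1755443508.0648415733",
# )
# urlretrieve(url, "events.out.tfevents.1755443508.0648415733")
```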

We link the related resources below:

You can refer to this script to run inference with the Hugging Face checkpoints. Because there are so many of them, most of the small checkpoints above are still in Megatron format. You may refer to this script to convert them (the conversion scripts need some tweaking).


Todo List

This is an ongoing project! We will tick off the items in the todo list below one by one.

  • Architectural Design Choices
  • Scaled-up Pre-training
  • Post-training
  • Routing & Other Analysis
  • Full Paper
  • Code & Checkpoint Open-sourcing

Citation

@misc{ni2025openmoe2,
  title={OpenMoE 2: Sparse Diffusion Language Models},
  author={Ni, Jinjie and team},
  year={2025},
  howpublished={\url{https://github.com/JinjieNi/OpenMoE2}},
}