Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning

April 1, 2026

Paper Blog

This repo is modified from Dion. We sincerely thank the authors for their great work!

Quick Start

Clone or unpack the repository, then install the package from the dion/ directory:

cd dion
pip install -e ".[train]"

To download a pretokenized FineWeb subset used by the training script:

cd dion
python data/cached_fineweb100B.py 200

Run one of the provided training configurations with:

cd dion
torchrun --standalone --nproc_per_node=8 train.py --config configs/mousse_160m.yaml
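
The command above launches 8 processes on a single node. On a machine with a different GPU count, the same launch line can be parameterized, as in the minimal sketch below. NUM_GPUS, CONFIG, and CMD are hypothetical shell variables introduced here for illustration and are not part of the repository. Whether a given config trains correctly with fewer than 8 processes (for example, whether its batch size divides evenly) is an assumption to check against dion/README.md.

```shell
# Hypothetical helper variables (not defined by the repository).
# Defaults reproduce the launch command shown in this README.
NUM_GPUS="${NUM_GPUS:-8}"
CONFIG="${CONFIG:-configs/mousse_160m.yaml}"

# Build the launch line; --standalone and --nproc_per_node are standard torchrun flags.
CMD="torchrun --standalone --nproc_per_node=${NUM_GPUS} train.py --config ${CONFIG}"

# Dry run: print the command so it can be inspected before launching.
echo "${CMD}"
```

Run the sketch from the dion/ directory. Once the printed command looks right, replace the final echo line with ${CMD} (unquoted, so the shell splits it into arguments) to start training.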

See dion/README.md for the detailed project documentation.