Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning
April 1, 2026
This repo is modified from Dion. We sincerely thank the authors for their great work!
Quick Start
Clone or unpack the repository, then install the package from the dion/ directory:
cd dion
pip install -e ".[train]"
To download a pretokenized FineWeb subset used by the training script:
cd dion
python data/cached_fineweb100B.py 200
Run one of the provided training configurations with torchrun. The example below launches 8 worker processes on a single node:
cd dion
torchrun --standalone --nproc_per_node=8 train.py --config configs/mousse_160m.yaml
The detailed project documentation can be found in dion/README.md.