AuON
January 18, 2026
Alternative Unit-norm momentum-updates by Normalized nonlinear scaling
A linear-time optimizer that achieves strong performance without producing semi-orthogonal matrices, while preserving structure to guide better-aligned progress and recondition ill-posed updates.
Paper: A Survey For Linear-time Orthogonal Optimizer
Features
- Linear-time complexity: No expensive SVD or matrix inverse operations
- Cosh-based normalization: Stable gradient scaling through hyperbolic cosine
- Drop-in replacement: Compatible with PyTorch's optimizer API
- H100 optimized: Includes FP8 operations for modern GPU acceleration
Installation
# Clone the repository
git clone https://github.com/ryyzn9/AuON.git
cd AuON
# Install dependencies
pip install -r requirements.txt
# Or install as package
pip install -e .
Quick Start
from auon import AuON, Adam
# Create model
model = YourModel()
# Separate parameters: Adam for small params, AuON for matrix params
small_params = [p for p in model.parameters() if p.dim() < 2]
matrix_params = [p for p in model.parameters() if p.dim() >= 2]
# Create optimizers
adam_optimizer = Adam(small_params, lr=0.008, betas=(0.8, 0.95))
auon_optimizer = AuON(matrix_params, lr=0.08, momentum=0.95, weight_decay=0.01)
# Training loop
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    adam_optimizer.step()
    auon_optimizer.step()
    adam_optimizer.zero_grad()
    auon_optimizer.zero_grad()
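The parameter split in the Quick Start can also be done in a single pass. The following is a minimal sketch, not part of the `auon` package: `split_params` and `StubParam` are hypothetical names introduced here, and `StubParam` only stands in for a tensor so the example runs without PyTorch. The only thing assumed of a real parameter is the `dim()` method already used above.

```python
def split_params(params):
    """Partition parameters into (small, matrix) groups by number of dimensions."""
    small, matrix = [], []
    for p in params:
        # Same rule as the Quick Start: dim() >= 2 goes to AuON, the rest to Adam.
        (matrix if p.dim() >= 2 else small).append(p)
    return small, matrix


class StubParam:
    """Hypothetical stand-in for a tensor; exposes only dim()."""

    def __init__(self, ndim):
        self._ndim = ndim

    def dim(self):
        return self._ndim


# Scalars and vectors land in `small`; matrices and higher-rank tensors in `matrix`.
small, matrix = split_params([StubParam(1), StubParam(2), StubParam(4), StubParam(0)])
print(len(small), len(matrix))  # prints: 2 2
```

With a real model, `split_params(model.parameters())` would yield the two lists passed to `Adam` and `AuON` above.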
Training Scripts
Train with AuON
python scripts/train.py --optimizer auon --num-chunks 10 --iterations 5000
Compare AuON vs Muon
python scripts/compare_optimizers.py --num-chunks 10 --iterations 10000
Repository Structure
AuON/
├── auon/                      # Core optimizer package
│   ├── optimizer.py           # AuON optimizer
│   ├── muon.py                # Muon optimizer (baseline)
│   ├── adam.py                # Custom Adam
│   └── utils.py               # Newton-Schulz, helpers
│
├── models/                    # Model implementations
│   ├── gpt.py                 # GPT model
│   ├── components.py          # Attention, MLP, etc.
│   └── custom_ops.py          # FP8 operations
│
├── data/                      # Data loading
│   └── fineweb.py             # FineWebEDU loader
│
├── training/                  # Training infrastructure
│   ├── config.py              # Hyperparameters
│   └── trainer.py             # Training loop
│
├── scripts/                   # Executable scripts
│   ├── train.py               # Main training
│   └── compare_optimizers.py
│
└── examples/                  # Usage examples
    └── basic_usage.py
AuON Algorithm
The AuON optimizer applies:
- Momentum: Exponential moving average of gradients
- Decoupled weight decay: Applied before the update step
- Cosh normalization: Scales gradients using sqrt(mean(cosh(g)²))
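Written out, the cosh scaling factor from the last bullet looks as follows. The symbols $r$, $N$, and $g_k$ are notation introduced here for illustration: $g_k$ are the entries of the matrix being scaled and $N$ is their count.

```latex
r(g) = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\cosh^{2}(g_k)},
\qquad
\cosh(x) \ge 1 \;\;\forall x \in \mathbb{R}
\;\Longrightarrow\; r(g) \ge 1 .
```

Because $r(g) \ge 1$, with equality only when every $g_k = 0$, dividing by this factor can shrink an update but never enlarge it.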
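# Core AuON update (simplified)
m = momentum * m + (1 - momentum) * grad    # EMA of gradients
g_normalized = m / (norm(m) + eps)          # unit-norm copy of the momentum
g_scaled = g_normalized * gamma             # rescale before applying cosh
r_scaled = sqrt(mean(cosh(g_scaled) ** 2))  # cosh RMS factor, always >= 1
update = m / (r_scaled + eps)               # shrink the momentum by that factor
param -= lr * update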
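The simplified update above can be traced end to end with a dependency-free sketch. This is an illustration under stated assumptions, not the code in `auon/optimizer.py`: `auon_step` is a hypothetical name, parameters are flat lists of floats rather than tensors, the defaults for `gamma` and `eps` are placeholders, and the weight decay is assumed to be the multiplicative AdamW-style form applied before the update.

```python
import math


def auon_step(param, grad, m, lr=0.08, momentum=0.95, weight_decay=0.01,
              gamma=1.0, eps=1e-8):
    """One simplified AuON-style step on flat lists; returns (param, m)."""
    # Momentum: exponential moving average of gradients.
    m = [momentum * mi + (1.0 - momentum) * gi for mi, gi in zip(m, grad)]

    # Unit-norm copy of the momentum, rescaled by gamma (assumed default 1.0).
    norm = math.sqrt(sum(mi * mi for mi in m))
    g_scaled = [gamma * mi / (norm + eps) for mi in m]

    # Cosh RMS factor: sqrt(mean(cosh(g)^2)), which is always >= 1.
    r_scaled = math.sqrt(sum(math.cosh(x) ** 2 for x in g_scaled) / len(g_scaled))
    update = [mi / (r_scaled + eps) for mi in m]

    # Decoupled weight decay before the update (assumed AdamW-style form).
    param = [p * (1.0 - lr * weight_decay) for p in param]
    param = [p - lr * u for p, u in zip(param, update)]
    return param, m


# Usage: carry the momentum buffer `m` from one step to the next.
param, m = [1.0, -2.0], [0.0, 0.0]
for _ in range(3):
    param, m = auon_step(param, grad=[0.5, 0.5], m=m)
```

Every operation here is elementwise or a single reduction over the entries, which is consistent with the linear-time claim in the feature list.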
Requirements
- Python >= 3.10
- PyTorch >= 2.4.0
- CUDA-capable GPU (H100 recommended for FP8 operations)
Citation
@misc{auon2024,
  title={AuON: A Linear-time Alternative to Orthogonal Momentum Updates},
  author={Dipan Maity},
  year={2025},
  url={https://arxiv.org/abs/2509.24320/}
}
License
Apache-2.0 License. See LICENSE for details.