Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 SCOPE Workshop)
March 6, 2025
Authors: Yang Cao, Xiaoyu Li, Zhao Song
This repository contains the official PyTorch implementation of the Grams optimizer.
We introduce Gradient Descent with Adaptive Momentum Scaling (Grams), a novel optimization algorithm that decouples the direction and magnitude of parameter updates in deep learning. Unlike traditional optimizers that directly integrate momentum into updates, Grams separates the update direction, derived from current gradients, from momentum, which is used solely for adaptive magnitude scaling. This approach enables Grams to achieve improved loss descent compared to state-of-the-art cautious and momentum-based optimizers.
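To make the direction/magnitude split concrete, here is a minimal pure-Python sketch of one update, written as we read the description above. It makes three assumptions:

- Adam-style first and second moment estimates with bias correction set the step magnitude.
- The sign of the current gradient sets the step direction.
- Weight decay is decoupled, in the AdamW style.

The function `grams_step` and its arguments are hypothetical, for illustration only. They are not the `grams-pytorch` API, and this is not the authors' reference implementation.

```python
import math


def grams_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """One Grams-style update on a flat list of scalar parameters (illustrative sketch)."""
    beta1, beta2 = betas
    # Lazily initialise Adam-style moment estimates on the first call.
    if not state:
        state["step"] = 0
        state["m"] = [0.0] * len(params)
        state["v"] = [0.0] * len(params)
    state["step"] += 1
    t = state["step"]
    m, v = state["m"], state["v"]

    new_params = []
    for i, (x, g) in enumerate(zip(params, grads)):
        # Momentum (first moment) and second moment, updated as in Adam.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Momentum only determines the step *magnitude* ...
        magnitude = abs(m_hat / (math.sqrt(v_hat) + eps))
        # ... while the *direction* is the sign of the current gradient (0 if g == 0).
        direction = math.copysign(1.0, g) if g != 0 else 0.0
        # Assumption: AdamW-style decoupled weight decay.
        new_params.append(x - lr * (direction * magnitude + weight_decay * x))
    return new_params


# Build up positive momentum, then flip the gradient's sign.
state = {}
x = [1.0]
for _ in range(5):
    x = grams_step(x, [1.0], state, lr=0.1)
before = x[0]
x = grams_step(x, [-1.0], state, lr=0.1)
print(x[0] > before)  # True
```

In the last step, the bias-corrected momentum is still positive (about 0.57). An Adam step would therefore keep decreasing `x`. This sketch instead follows the current gradient's sign and increases `x`, using the momentum only to size the step.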
Install
Use the following command to install the PyTorch implementation of Grams:
pip install grams-pytorch
How to use Grams
Switching from Adam/AdamW to Grams is simple and requires only two lines of code:
Before:
import torch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
Switching to Grams:
from grams import Grams
optimizer = Grams(model.parameters(), lr=1e-3, weight_decay=0.0)
Just import Grams and swap the optimizer; the rest of your training loop stays the same.
Citation
Please cite our work!
@inproceedings{cao2025grams,
title={Grams: Gradient Descent with Adaptive Momentum Scaling},
author={Yang Cao and Xiaoyu Li and Zhao Song},
booktitle={ICLR 2025 First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models},
year={2025},
url={https://openreview.net/forum?id=GmKQnpQdsc}
}