MLCommons™ AlgoPerf: Training Algorithms Leaderboard

March 24, 2025

Leaderboards

Leaderboard Version: 0.5
Last Updated: 2023-12-18 09:54 UTC
Using Benchmark Version: 0.1.5

Important

This is not the latest leaderboard. For the latest results, please see the latest leaderboard.

External Tuning Ruleset Leaderboard

In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, and each submission gets 5 tuning trials per workload, sampled from its search space.
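The sampling step can be sketched as follows. This is a minimal illustration, not the benchmark's own tuning code: the function name `sample_trials`, the `(low, high, scale)` search-space format, and the use of independent seeded random draws are all assumptions, and the benchmark's actual sampler and search-space file format may differ. The sketch also covers the case used by several submissions below, where the "search space" is simply a list of five fixed hyperparameter settings, so the five trials are exactly those settings.

```python
import math
import random


def sample_trials(search_space, num_trials=5, seed=0):
    """Draw `num_trials` hyperparameter configurations (hypothetical helper).

    `search_space` is either:
      * a list of fixed configurations (dicts), used as-is, or
      * a dict mapping each hyperparameter name to (low, high, scale),
        where scale is "linear" or "log".
    """
    if isinstance(search_space, list):
        # A list of points: the trials are simply the first `num_trials` points.
        return [dict(point) for point in search_space[:num_trials]]

    rng = random.Random(seed)  # seeded so the draw is reproducible
    trials = []
    for _ in range(num_trials):
        config = {}
        for name, (low, high, scale) in search_space.items():
            if scale == "log":
                # Uniform in log space, e.g. for learning rates.
                config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                config[name] = rng.uniform(low, high)
        trials.append(config)
    return trials


# Usage: one workload-agnostic search space, 5 trials per workload.
space = {"learning_rate": (1e-4, 1e-2, "log"), "weight_decay": (0.0, 0.2, "linear")}
for trial in sample_trials(space):
    print(trial)
```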

Rank	Submission	Authors	Affiliation	Framework	Logs	Score
1.	Distributed Shampoo Based on the Distributed Shampoo algorithm of Anil et al. (2020) with an implementation tailored to leverage PyTorch performance optimizations. See Shi et al. (2023) for details. The submission uses a list of five hyperparameter settings.	Hao-Jun Shi, Tsung-Hsien Lee, Anna Cai, Shintaro Iwasaki, Wenyin Fu, Yuchen Hao, Mike Rabbat	Meta Platforms	PyTorch	💾	0.7784
2.	Schedule Free AdamW An externally tuned version of Schedule Free AdamW (Defazio et al., 2024) with a list of five hyperparameter configurations.	Alice Yang, Aaron Defazio, Konstantin Mishchenko	Meta AI, Samsung AI	PyTorch	💾	0.7077
3.	Generalized Adam Submission with an Adam-style update rule that tunes over whether to use Nesterov acceleration and preconditioning, essentially tuning over AdamW (Kingma & Ba, 2015; Loshchilov & Hutter, 2019), NadamW, and SGD (Robbins & Monro, 1951) with or without momentum.	George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg	Google	JAX	💾	0.6383
4.	Cyclic LR Revisits the work of Loshchilov & Hutter (2017) and Smith (2017), coupling NadamW (Dozat, 2016; Loshchilov & Hutter, 2019) with a cyclic learning rate scheduler. Each cycle involves a linear warmup phase for the LR, followed by cosine annealing.	Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping	MPI-IS, ELLIS Institute Tübingen	PyTorch	💾	0.6301
5.	NadamP Uses NadamW with an extra tunable hyperparameter p that sets the root applied to the denominator of the NadamW update rule (the p-th root instead of the default square root, p = 2).	George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg	Google	JAX	💾	0.5909
6.	*Baseline* Baseline using NadamW (Dozat, 2016; Loshchilov & Hutter, 2019) and a linear learning rate warmup followed by a cosine decay (Dahl et al., 2023).			JAX	💾	0.5707
7.	Amos Submission based on the Amos optimizer (Tian & Parikh, 2022) with a list of five hyperparameter settings.	Ran Tian	Google	JAX	💾	0.4918
8.	CASPR Adaptive A submission based on the CASPR optimizer (Duvvuri et al., 2024) with a list of five hyperparameter configurations.	Sai Surya Duvvuri, Inderjit S. Dhillon, Cho-Jui Hsieh	UT Austin, UCLA, Google	JAX	💾	0.4722
9.	LAWA Queue Employs Latest Weight Averaging (Izmailov et al., 2018; Kaddour, 2022) on top of NadamW (Dozat, 2016; Loshchilov & Hutter, 2019), maintaining a queue of previous model weights. The queue is updated periodically during training and passed to the competition API for evaluation.	Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping	MPI-IS, ELLIS Institute Tübingen	PyTorch	💾	0.3699
10.	LAWA EMA Employs Latest Weight Averaging (Izmailov et al., 2018; Kaddour, 2022) on top of NadamW (Dozat, 2016; Loshchilov & Hutter, 2019), maintaining an exponential moving average of the model weights that is updated periodically during training and returned to the competition API for evaluation.	Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping	MPI-IS, ELLIS Institute Tübingen	PyTorch	💾	0.3384
11.	Schedule Free Prodigy Combines Schedule-Free (Defazio et al., 2024) with the Prodigy optimizer (Mishchenko & Defazio, 2024).	Alice Yang, Aaron Defazio, Konstantin Mishchenko	Meta AI, Samsung AI	PyTorch	💾	0.0000
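The Baseline's schedule (linear warmup, then cosine decay) and the Cyclic LR submission's schedule (the same shape repeated once per cycle) can be sketched as plain functions of the step. This is an illustration under assumptions: the function names, decaying to exactly zero, and equal-length cycles are choices made here, not details taken from either submission.

```python
import math


def warmup_cosine(step, total_steps, peak_lr, warmup_steps):
    """Linear warmup from 0 to peak_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at 0 after the decay finishes
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def cyclic_warmup_cosine(step, cycle_steps, peak_lr, warmup_steps):
    """Restart the warmup + cosine shape every `cycle_steps` steps."""
    return warmup_cosine(step % cycle_steps, cycle_steps, peak_lr, warmup_steps)


# Usage: a 1000-step run vs. the same budget split into 4 cycles of 250 steps.
for step in (0, 50, 100, 500, 999):
    print(step, warmup_cosine(step, 1000, 1e-3, 100),
          cyclic_warmup_cosine(step, 250, 1e-3, 25))
```

In the cyclic variant, each cycle ends with the learning rate near zero and the next cycle warms up again, so a single long decay is replaced by several shorter ones.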
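The update rules behind the Generalized Adam and NadamP rows can be summarized in one sketch: a single Adam-style step whose switches select Nesterov momentum, preconditioning, and the root p applied to the second-moment estimate. It operates on plain Python lists for readability. The function and argument names are hypothetical, and several details are assumptions rather than a transcription of either submission: the Nesterov bias correction, where `eps` is added, decoupled weight decay scaled by the learning rate, and keeping the second moment as an average of squared gradients when p is not 2.

```python
def init_state(n):
    """Optimizer state for n scalar parameters (hypothetical layout)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_style_step(params, grads, state, lr, b1=0.9, b2=0.999, eps=1e-8,
                    weight_decay=0.0, nesterov=True, precondition=True, p=2.0):
    """One generalized Adam-style step on lists of floats (illustrative sketch).

    nesterov=True,  precondition=True,  p=2  -> NadamW
    nesterov=False, precondition=True,  p=2  -> AdamW
    precondition=False                       -> SGD with EMA-style momentum
    precondition=False, b1=0                 -> plain SGD
    p != 2                                   -> NadamP-style p-th root
    """
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (w, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        m = b1 * state["m"][i] + (1 - b1) * g
        v = b2 * state["v"][i] + (1 - b2) * g * g
        state["m"][i], state["v"][i] = m, v
        if nesterov:
            # Nesterov look-ahead: apply the momentum decay once more.
            m_hat = (b1 * m + (1 - b1) * g) / (1 - b1 ** t)
        else:
            m_hat = m / (1 - b1 ** t)
        if precondition:
            v_hat = v / (1 - b2 ** t)
            update = m_hat / (v_hat ** (1.0 / p) + eps)
        else:
            update = m_hat
        # Decoupled weight decay, as in AdamW.
        new_params.append(w - lr * (update + weight_decay * w))
    return new_params


# Usage: one step of each variant from the same starting point.
for kwargs in ({"nesterov": False}, {}, {"p": 3.0}, {"b1": 0.0, "precondition": False}):
    print(kwargs, adam_style_step([1.0], [0.5], init_state(1), lr=0.1, **kwargs))
```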
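The two LAWA variants differ only in how past weights are combined for evaluation: a fixed-length queue averaged uniformly versus an exponential moving average. A minimal sketch over flat lists of floats follows. The class names, queue length `k`, decay `beta`, and update interval `every` are hypothetical, since the rows above do not give the submissions' settings or how they interface with the competition API.

```python
from collections import deque


class LawaQueue:
    """Keep the last k weight snapshots; evaluate their uniform average."""

    def __init__(self, k):
        self.snapshots = deque(maxlen=k)  # oldest snapshot drops out automatically

    def update(self, params):
        self.snapshots.append(list(params))

    def averaged(self):
        n = len(self.snapshots)
        return [sum(values) / n for values in zip(*self.snapshots)]


class LawaEma:
    """Keep an exponential moving average of the weights."""

    def __init__(self, beta):
        self.beta = beta
        self.ema = None

    def update(self, params):
        if self.ema is None:
            self.ema = list(params)  # start from the first snapshot
        else:
            self.ema = [self.beta * e + (1 - self.beta) * p
                        for e, p in zip(self.ema, params)]

    def averaged(self):
        return list(self.ema)


# Usage: refresh the average every `every` steps and evaluate the averaged
# weights, while the optimizer keeps training the raw weights.
every = 100
queue, ema = LawaQueue(k=5), LawaEma(beta=0.9)
params = [0.0, 0.0]
for step in range(1, 1001):
    params = [w + 0.001 for w in params]  # stand-in for an optimizer step
    if step % every == 0:
        queue.update(params)
        ema.update(params)
print(queue.averaged(), ema.averaged())
```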

Self-Tuning Ruleset Leaderboard

In the self-tuning ruleset, submissions must be completely hyperparameter-free.

Rank	Submission	Authors	Affiliation	Framework	Logs	Score
1.	Schedule Free AdamW A self-tuning version of Schedule Free AdamW (Defazio et al., 2024) using a single hyperparameter configuration.	Alice Yang, Aaron Defazio, Konstantin Mishchenko	Meta AI, Samsung AI	PyTorch	💾	0.8542
2.	*Baseline* Baseline using NadamW, a linear learning rate warmup followed by a cosine decay, and a single hyperparameter point (Dahl et al., 2023).			JAX	💾	0.8194
3.	NadamW Sequential Uses the NadamW update rule and runs 3 fixed hyperparameter points sequentially. The intention was for these to be the top 3 hyperparameter points found at one third of the self-tuning ruleset step budgets.	George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg	Google	JAX	💾	0.3308
4.	Sinv6 75 A submission for a task-invariant learned optimizer meta-trained on small tasks. Uses 75% of the number of steps as the target in the learned optimizer's initialization.	Abhinav Moudgil	Mila, Concordia University	JAX	💾	0.1420
5.	Sinv6 A submission for a task-invariant learned optimizer meta-trained on small tasks.	Abhinav Moudgil	Mila, Concordia University	JAX	💾	0.0903
6.	AdamG A submission based on the AdamG optimizer (Pang et al., 2024).	Yijiang Pang	Michigan State University	PyTorch	💾	0.0000
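The NadamW Sequential idea of running several fixed hyperparameter points one after another within a single budget can be sketched as a budget split. This sketch assumes each point gets an equal, contiguous share of the step budget (one third for three points), which matches the stated intention but is still an assumption. The helper names and example configurations are hypothetical, and the sketch leaves out how model and optimizer state are handled at each switch.

```python
def sequential_segments(step_budget, configs):
    """Split [0, step_budget) into equal contiguous segments, one per config."""
    n = len(configs)
    share = step_budget // n
    segments = []
    for i, config in enumerate(configs):
        start = i * share
        # The last segment absorbs any remainder so the budget is fully used.
        end = step_budget if i == n - 1 else (i + 1) * share
        segments.append((start, end, config))
    return segments


def config_for_step(step, segments):
    """Return the hyperparameter point active at a given global step."""
    for start, end, config in segments:
        if start <= step < end:
            return config
    raise ValueError(f"step {step} is outside the step budget")


# Usage with three hypothetical NadamW points.
points = [{"learning_rate": 1e-3}, {"learning_rate": 3e-3}, {"learning_rate": 5e-4}]
segments = sequential_segments(9000, points)
for start, end, config in segments:
    print(start, end, config)
```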

How to Submit

To submit your algorithm for evaluation on the AlgoPerf leaderboard, please follow these steps:

Implement your algorithm in the AlgoPerf API: Have a look at our Getting Started Guide and the Technical Documentation.
Create a Pull Request: Fork this repository, create a new branch, and add your submission code to a new folder within either submissions/external_tuning/ or submissions/self_tuning/. Open a pull request (PR) to the evaluation branch of this repository. Make sure to fill out the PR template, which asks for information such as the submission name, authors, and affiliations.
PR Review and Evaluation: The AlgoPerf working group will review your PR. Depending on available resources and the perceived potential of the method, it may be selected for a free evaluation and merged into the evaluation branch. The working group will then run your submission on all workloads and push the results, as well as the updated leaderboard, to the main branch.

Citation

If you use the AlgoPerf benchmark in your research, please consider citing our paper.

Dahl, Schneider, Nado, et al.
> Benchmarking Neural Network Training Algorithms
> arXiv 2306.07179

@Misc{Dahl2023AlgoPerf,
  title         = {{Benchmarking Neural Network Training Algorithms}},
  author        = {Dahl, George E. and Schneider, Frank and Nado, Zachary and Agarwal, Naman and Sastry, Chandramouli Shama and Hennig, Philipp and Medapati, Sourabh and Eschenhagen, Runa and Kasimbeg, Priya and Suo, Daniel and Bae, Juhan and Gilmer, Justin and Peirson, Abel L. and Khan, Bilal and Anil, Rohan and Rabbat, Mike and Krishnan, Shankar and Snider, Daniel and Amid, Ehsan and Chen, Kongtao and Maddison, Chris J. and Vasudev, Rakshith and Badura, Michal and Garg, Ankush and Mattson, Peter},
  year          = {2023},
  archiveprefix = {arXiv},
  eprint        = {2306.07179},
}

If you use the results from the first AlgoPerf competition, please consider citing the results paper, as well as the relevant submissions:

Kasimbeg, Schneider, Eschenhagen, et al.
> Accelerating neural network training: An analysis of the AlgoPerf competition
> ICLR 2025

@inproceedings{Kasimbeg2025AlgoPerfResults,
  title         = {Accelerating neural network training: An analysis of the {AlgoPerf} competition},
  author        = {Kasimbeg, Priya and Schneider, Frank and Eschenhagen, Runa and Bae, Juhan and Sastry, Chandramouli Shama and Saroufim, Mark and Feng, Boyuan and Wright, Less and Yang, Edward Z. and Nado, Zachary and Medapati, Sourabh and Hennig, Philipp and Rabbat, Michael and Dahl, George E.},
  booktitle     = {The Thirteenth International Conference on Learning Representations},
  year          = {2025},
  url           = {https://openreview.net/forum?id=CtM5xjRSfm},
}