MLCommons™ AlgoPerf: Training Algorithms Leaderboard
March 24, 2025
Leaderboards
Leaderboard Version: 0.5
Last Updated: 2023-12-18 09:54 UTC
Using Benchmark Version: 0.1.5
Important
This is not the latest leaderboard. If you are looking for the latest results, please see the latest leaderboard.
External Tuning Ruleset Leaderboard
In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, and each submission gets $5$ tuning trials per workload sampled from this search space.
| Rank | Submission | Authors | Affiliation | Framework | Logs | Score |
|---|---|---|---|---|---|---|
| 1. | Distributed Shampoo: Based on the Distributed Shampoo algorithm of Anil et al. (2020) with an implementation tailored to leverage PyTorch performance optimizations. See Shi et al. (2023) for details. The submission uses a list of five hyperparameter settings. | Hao-Jun Shi, Tsung-Hsien Lee, Anna Cai, Shintaro Iwasaki, Wenyin Fu, Yuchen Hao, Mike Rabbat | Meta Platforms | PyTorch | 💾 | 0.7784 |
| 2. | Schedule Free AdamW: An externally tuned version of Schedule Free AdamW (Defazio et al., 2024) with a list of five hyperparameter configurations. | Alice Yang, Aaron Defazio, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch | 💾 | 0.7077 |
| 3. | Generalized Adam: Submission with an Adam-style update rule, tuning over the use of Nesterov acceleration and preconditioning. Essentially tuning over AdamW (Kingma & Ba, 2015), NadamW, and SGD (Robbins & Monro, 1951) with or without momentum. | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | | JAX | 💾 | 0.6383 |
| 4. | Cyclic LR: Revisits the work of Loshchilov & Hutter (2017) and Smith (2017), coupling NadamW (Dozat, 2016; Loshchilov & Hutter, 2019) with a cyclic learning rate scheduler. Each cycle involves a linear warmup phase for the LR, followed by cosine annealing. | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Institute Tübingen | PyTorch | 💾 | 0.6301 |
| 5. | NadamP: Uses NadamW with an extra tunable hyperparameter $p$ enabling the $p$-th root of the denominator inside the NadamW update rule instead of the default of $2$. | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | | JAX | 💾 | 0.5909 |
| 6. | Baseline: Baseline using NadamW (Dozat, 2016; Loshchilov & Hutter, 2019) and a linear learning rate warmup followed by a cosine decay (Dahl et al., 2023). | | | JAX | 💾 | 0.5707 |
| 7. | Amos: Submission based on the Amos optimizer (Tian & Parikh, 2022) with a list of five hyperparameter settings. | Ran Tian | | JAX | 💾 | 0.4918 |
| 8. | CASPR Adaptive: A submission based on Duvvuri et al. (2024) with a list of five hyperparameter configurations. | Sai Surya Duvvuri, Inderjit S. Dhillon, Cho-Jui Hsieh | UT Austin, UCLA, Google | JAX | 💾 | 0.4722 |
| 9. | LAWA Queue: Employs Latest Weight Averaging (Izmailov et al., 2018; Kaddour, 2022) on top of NAdamW (Dozat, 2016; Loshchilov & Hutter, 2019), maintaining a queue of previous model weights. The queue is periodically updated during training and passed to the competition API for evaluation. | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Institute Tübingen | PyTorch | 💾 | 0.3699 |
| 10. | LAWA EMA: Employs Latest Weight Averaging (Izmailov et al., 2018; Kaddour, 2022) on top of NAdamW (Dozat, 2016; Loshchilov & Hutter, 2019), maintaining an exponential moving average of the model weights, which is updated periodically during training and returned to the competition API for evaluation. | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Institute Tübingen | PyTorch | 💾 | 0.3384 |
| 11. | Schedule Free Prodigy: Combines Schedule-free (Defazio et al., 2024) with the Prodigy optimizer (Mishchenko & Defazio, 2024). | Alice Yang, Aaron Defazio, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch | 💾 | 0.0000 |
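The external tuning procedure described at the top of this section (one workload-agnostic search space, $5$ tuning trials per workload) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hyperparameter names and ranges in `SEARCH_SPACE`, the helper names `sample_trial` and `sample_trials`, the workload names, and the use of plain pseudo-random sampling in place of whatever sampler the benchmark actually uses.

```python
import math
import random

# Hypothetical workload-agnostic search space. The names and ranges are
# illustrative only and are not taken from any submission on this leaderboard.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-4, 1e-2),
    "weight_decay": ("log_uniform", 1e-3, 1e-1),
    "dropout_rate": ("uniform", 0.0, 0.1),
}

NUM_TUNING_TRIALS = 5  # the ruleset grants 5 tuning trials per workload


def sample_trial(space, rng):
    """Draw one hyperparameter configuration from the search space."""
    trial = {}
    for name, (kind, low, high) in space.items():
        if kind == "log_uniform":
            # Sample uniformly in log space, then map back.
            trial[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "uniform":
            trial[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unknown distribution: {kind}")
    return trial


def sample_trials(space, num_trials=NUM_TUNING_TRIALS, seed=0):
    """Draw the fixed number of tuning trials for a single workload."""
    rng = random.Random(seed)
    return [sample_trial(space, rng) for _ in range(num_trials)]


if __name__ == "__main__":
    # The same search space is reused for every workload; only the draws differ.
    for seed, workload in enumerate(["workload_a", "workload_b"]):
        for trial in sample_trials(SEARCH_SPACE, seed=seed):
            print(workload, trial)
```

Because the space is shared across workloads, a submission cannot encode per-workload tuning in it. Submissions that describe "a list of five hyperparameter settings" correspond to the special case where the space is a fixed discrete list rather than a set of continuous ranges.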
Self-Tuning Ruleset Leaderboard
In the self-tuning ruleset, submissions must be completely hyperparameter-free.
| Rank | Submission | Authors | Affiliation | Framework | Logs | Score |
|---|---|---|---|---|---|---|
| 1. | Schedule Free AdamW: A self-tuning version of Schedule Free AdamW (Defazio et al., 2024) using a single hyperparameter configuration. | Alice Yang, Aaron Defazio, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch | 💾 | 0.8542 |
| 2. | Baseline: Baseline using NadamW, a linear learning rate warmup followed by a cosine decay, and a single hyperparameter point (Dahl et al., 2023). | | | JAX | 💾 | 0.8194 |
| 3. | NadamW Sequential: Uses NadamW update rule and runs 3 fixed hyperparameter points sequentially. The intention was for these to be the top 3 hyperparameter points found at one third the self-tuning ruleset step budgets. | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | | JAX | 💾 | 0.3308 |
| 4. | Sinv6 75: A submission for a task-invariant learned optimizer meta-trained on small tasks. Uses $75$% of the number of steps as target in learned optimizer initialization. | Abhinav Moudgil | Mila, Concordia University | JAX | 💾 | 0.1420 |
| 5. | Sinv6: A submission for a task-invariant learned optimizer meta-trained on small tasks. | Abhinav Moudgil | Mila, Concordia University | JAX | 💾 | 0.0903 |
| 6. | AdamG: A submission based on the AdamG optimizer (Pang et al., 2024). | Yijiang Pang | Michigan State University | PyTorch | 💾 | 0.0000 |
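The baseline entries in both leaderboards pair NadamW with a linear learning rate warmup followed by a cosine decay. The sketch below shows one common way to write such a schedule. It is not the baseline's actual code: the function name `warmup_cosine_lr`, its parameters, and the example step counts are assumptions chosen for illustration.

```python
import math


def warmup_cosine_lr(step, total_steps, peak_lr, warmup_steps, final_lr=0.0):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    All argument names are hypothetical; `step` counts from 0.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine curve from peak_lr (progress = 0) down to final_lr (progress = 1).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


if __name__ == "__main__":
    # Illustrative numbers only: 100 steps in total, 10 of them warmup.
    for step in (0, 9, 10, 55, 100):
        print(step, warmup_cosine_lr(step, total_steps=100, peak_lr=1e-3, warmup_steps=10))
```

A cyclic variant, as described for the Cyclic LR entry, would restart this warmup-then-anneal shape at the beginning of each cycle instead of running it once over the whole step budget.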
How to Submit
To submit your algorithm for evaluation on the AlgoPerf leaderboard, please follow these steps:
- Implement your algorithm in the AlgoPerf API: Have a look at our Getting Started Guide and the Technical Documentation.
- Create a Pull Request: Fork this repository, create a new branch, and add your submission code to a new folder within either `submissions/external_tuning/` or `submissions/self_tuning`. Open a pull request (PR) to the `evaluation` branch of this repository. Make sure to fill out the PR template asking for information such as submission name, authors, affiliations, etc.
- PR Review and Evaluation: The AlgoPerf working group will review your PR. Based on our available resources and the perceived potential of the method, it will be selected for a free evaluation and merged into the `evaluation` branch. The working group will run your submission on all workloads and push the results, as well as the updated leaderboard, to the `main` branch.
Citation
If you use the AlgoPerf benchmark in your research, please consider citing our paper.
Dahl, Schneider, Nado, et al.
> Benchmarking Neural Network Training Algorithms
> arXiv 2306.07179
@Misc{Dahl2023AlgoPerf,
title = {{Benchmarking Neural Network Training Algorithms}},
author = {Dahl, George E. and Schneider, Frank and Nado, Zachary and Agarwal, Naman and Sastry, Chandramouli Shama and Hennig, Philipp and Medapati, Sourabh and Eschenhagen, Runa and Kasimbeg, Priya and Suo, Daniel and Bae, Juhan and Gilmer, Justin and Peirson, Abel L. and Khan, Bilal and Anil, Rohan and Rabbat, Mike and Krishnan, Shankar and Snider, Daniel and Amid, Ehsan and Chen, Kongtao and Maddison, Chris J. and Vasudev, Rakshith and Badura, Michal and Garg, Ankush and Mattson, Peter},
year = {2023},
archiveprefix = {arXiv},
eprint = {2306.07179},
}
If you use the results from the first AlgoPerf competition, please consider citing the results paper, as well as the relevant submissions:
@inproceedings{Kasimbeg2025AlgoPerfResults,
title = {Accelerating neural network training: An analysis of the {AlgoPerf} competition},
author = {Kasimbeg, Priya and Schneider, Frank and Eschenhagen, Runa and Bae, Juhan and Sastry, Chandramouli Shama and Saroufim, Mark and Feng, Boyuan and Wright, Less and Yang, Edward Z. and Nado, Zachary and Medapati, Sourabh and Hennig, Philipp and Rabbat, Michael and Dahl, George E.},
booktitle = {The Thirteenth International Conference on Learning Representations},
year = {2025},
url = {https://openreview.net/forum?id=CtM5xjRSfm}
}
