MLCommons™ AlgoPerf: Training Algorithms Leaderboard

March 24, 2025


Leaderboards

Leaderboard Version: 0.5
Last Updated: 2023-12-18 09:54 UTC
Using Benchmark Version: 0.1.5

Important

This is not the latest leaderboard. If you are looking for the latest results, please see the latest leaderboard.

External Tuning Ruleset Leaderboard

In the external tuning ruleset, each submission must provide a workload-agnostic hyperparameter search space and gets $5$ tuning trials per workload, sampled from this search space.
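
As a rough illustration of what "five trials sampled from a search space" can look like, here is a minimal sketch. The search-space layout, the hyperparameter names, the ranges, and the helpers `sample_point` and `sample_trials` are all hypothetical and chosen for illustration; uniform random sampling is likewise an assumption, not a description of the benchmark's actual sampler.

```python
import math
import random

# Hypothetical workload-agnostic search space. Names and ranges are
# illustrative assumptions, not taken from any leaderboard submission.
SEARCH_SPACE = {
    "learning_rate": {"min": 1e-4, "max": 1e-2, "scaling": "log"},
    "weight_decay": {"min": 1e-3, "max": 1e-1, "scaling": "log"},
    "warmup_fraction": {"feasible_points": [0.02, 0.05, 0.1]},
}

NUM_TUNING_TRIALS = 5  # the ruleset grants 5 tuning trials per workload


def sample_point(space, rng):
    """Draw one hyperparameter configuration from the search space."""
    point = {}
    for name, spec in space.items():
        if "feasible_points" in spec:
            # Discrete choice among explicitly listed values.
            point[name] = rng.choice(spec["feasible_points"])
        elif spec.get("scaling") == "log":
            # Sample uniformly in log space, then map back.
            low, high = math.log(spec["min"]), math.log(spec["max"])
            point[name] = math.exp(rng.uniform(low, high))
        else:
            point[name] = rng.uniform(spec["min"], spec["max"])
    return point


def sample_trials(space, num_trials=NUM_TUNING_TRIALS, seed=0):
    """Sample the tuning trials that one workload would receive."""
    rng = random.Random(seed)
    return [sample_point(space, rng) for _ in range(num_trials)]


if __name__ == "__main__":
    for trial in sample_trials(SEARCH_SPACE):
        print(trial)
```

Because the search space must be workload-agnostic, the same `SEARCH_SPACE` would be reused for every workload; only the sampled trials differ.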

| Rank | Submission | Authors | Affiliation | Framework | Logs | Score |
| --- | --- | --- | --- | --- | --- | --- |
| 1. | Distributed Shampoo: Based on the Distributed Shampoo algorithm of Anil et al. (2020) with an implementation tailored to leverage PyTorch performance optimizations. See Shi et al. (2023) for details. The submission uses a list of five hyperparameter settings. | Hao-Jun Shi, Tsung-Hsien Lee, Anna Cai, Shintaro Iwasaki, Wenyin Fu, Yuchen Hao, Mike Rabbat | Meta Platforms | PyTorch | 💾 | 0.7784 |
| 2. | Schedule Free AdamW: An externally tuned version of Schedule Free AdamW (Defazio et al., 2024) with a list of five hyperparameter configurations. | Alice Yang, Aaron Defazio, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch | 💾 | 0.7077 |
| 3. | Generalized Adam: Submission with an Adam-style update rule, tuning over the use of Nesterov acceleration and preconditioning. Essentially tuning over AdamW (Kingma & Ba, 2015), NadamW, and SGD (Robbins & Monro, 1951) with or without momentum. | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | Google | JAX | 💾 | 0.6383 |
| 4. | Cyclic LR: Revisits the work of Loshchilov & Hutter (2017) and Smith (2017), coupling NadamW (Dozat, 2016; Loshchilov & Hutter, 2019) with a cyclic learning rate scheduler. Each cycle involves a linear warmup phase for the LR, followed by cosine annealing. | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Institute Tübingen | PyTorch | 💾 | 0.6301 |
| 5. | NadamP: Uses NadamW with an extra tunable hyperparameter $p$ enabling $p$-th root of denominator inside NadamW update rule instead of the default of $2$. | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | Google | JAX | 💾 | 0.5909 |
| 6. | Baseline: Baseline using NadamW (Dozat, 2016; Loshchilov & Hutter, 2019) and a linear learning rate warmup followed by a cosine decay (Dahl et al., 2023). | | | JAX | 💾 | 0.5707 |
| 7. | Amos: Submission based on the Amos optimizer (Tian & Parikh, 2022) with a list of five hyperparameter settings. | Ran Tian | Google | JAX | 💾 | 0.4918 |
| 8. | CASPR Adaptive: A submission based on (Duvvuri et al., 2024) with a list of five hyperparameter configurations. | Sai Surya Duvvuri, Inderjit S. Dhillon, Cho-Jui Hsieh | UT Austin, UCLA, Google | JAX | 💾 | 0.4722 |
| 9. | LAWA Queue: Employs Latest Weight Averaging (Izmailov et al., 2018; Kaddour, 2022) on top of NAdamW (Dozat, 2016; Loshchilov & Hutter, 2019), maintaining a queue of previous model weights. The queue is periodically updated during training and passed to the competition API for evaluation. | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Institute Tübingen | PyTorch | 💾 | 0.3699 |
| 10. | LAWA EMA: Employs Latest Weight Averaging (Izmailov et al., 2018; Kaddour, 2022) on top of NAdamW (Dozat, 2016; Loshchilov & Hutter, 2019), maintaining an exponential moving average of the model weights, which is updated periodically during training and returned to the competition API for evaluation. | Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping | MPI-IS, ELLIS Institute Tübingen | PyTorch | 💾 | 0.3384 |
| 11. | Schedule Free Prodigy: Combining Schedule-free (Defazio et al., 2024) with the Prodigy optimizer (Mishchenko & Defazio, 2024). | Alice Yang, Aaron Defazio, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch | 💾 | 0.0000 |
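
To make the NadamP description more concrete, the following is a simplified scalar sketch of a NadamW-style step in which the square root of the second-moment estimate is replaced by a $p$-th root. The function name `nadamp_step`, the default values, and the exact bias-correction scheme are assumptions for illustration; this is not the submission's code, and the real update may differ in detail.

```python
def nadamp_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-8, weight_decay=0.0, p=2.0):
    """One NadamW-style update on a scalar with a p-th root denominator.

    `step` is 1-indexed. With p=2 this reduces to the usual square root.
    """
    # Decoupled weight decay (the "W" in NadamW).
    param = param * (1.0 - lr * weight_decay)

    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Nesterov-style look-ahead applied to the first moment.
    m_nesterov = beta1 * m + (1.0 - beta1) * grad

    # Bias correction (simplified: the same step count for both moments).
    m_hat = m_nesterov / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)

    # The only change relative to the p=2 default: a tunable p-th root.
    denom = v_hat ** (1.0 / p) + eps
    param = param - lr * m_hat / denom
    return param, m, v


if __name__ == "__main__":
    for p in (2.0, 4.0):
        new_param, _, _ = nadamp_step(0.0, grad=2.0, m=0.0, v=0.0, step=1, p=p)
        print(f"p={p}: param after one step = {new_param:.6f}")
```

For gradients with magnitude above one, a larger $p$ shrinks the denominator and therefore enlarges the step, which is one way to see why $p$ behaves as an additional tunable hyperparameter.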

Self-Tuning Ruleset Leaderboard

In the self-tuning ruleset, submissions must be completely hyperparameter-free.

| Rank | Submission | Authors | Affiliation | Framework | Logs | Score |
| --- | --- | --- | --- | --- | --- | --- |
| 1. | Schedule Free AdamW: A self-tuning version of Schedule Free AdamW (Defazio et al., 2024) using a single hyperparameter configuration. | Alice Yang, Aaron Defazio, Konstantin Mishchenko | Meta AI, Samsung AI | PyTorch | 💾 | 0.8542 |
| 2. | Baseline: Baseline using NadamW, a linear learning rate warmup followed by a cosine decay, and a single hyperparameter point (Dahl et al., 2023). | | | JAX | 💾 | 0.8194 |
| 3. | NadamW Sequential: Uses NadamW update rule and runs 3 fixed hyperparameter points sequentially. The intention was for these to be the top 3 hyperparameter points found at one third the self-tuning ruleset step budgets. | George Dahl, Sourabh Medapati, Zack Nado, Rohan Anil, Shankar Krishnan, Naman Agarwal, Priya Kasimbeg, Vlad Feinberg | Google | JAX | 💾 | 0.3308 |
| 4. | Sinv6 75: A submission for a task-invariant learned optimizer meta-trained on small tasks. Uses $75$% of the number of steps as target in learned optimizer initialization. | Abhinav Moudgil | Mila, Concordia University | JAX | 💾 | 0.1420 |
| 5. | Sinv6: A submission for a task-invariant learned optimizer meta-trained on small tasks. | Abhinav Moudgil | Mila, Concordia University | JAX | 💾 | 0.0903 |
| 6. | AdamG: A submission based on the AdamG optimizer (Pang et al., 2024). | Yijiang Pang | Michigan State University | PyTorch | 💾 | 0.0000 |
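
Several entries, including the baseline, describe a linear learning rate warmup followed by a cosine decay. A minimal sketch of such a schedule is shown below; the function name `warmup_cosine_lr`, its parameters, and the choice to decay to `final_lr=0.0` are illustrative assumptions rather than the baseline's actual settings.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, final_lr=0.0):
    """Learning rate at a 0-indexed `step`: linear warmup, then cosine decay."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps

    # Fraction of the decay phase completed, clipped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)

    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


if __name__ == "__main__":
    for s in (0, 9, 10, 55, 100):
        lr = warmup_cosine_lr(s, total_steps=100, warmup_steps=10, peak_lr=1.0)
        print(s, round(lr, 4))
```

In the self-tuning ruleset, a schedule like this has to work with one fixed configuration on every workload, since no per-workload tuning is allowed.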

How to Submit

To submit your algorithm for evaluation on the AlgoPerf leaderboard, please follow these steps:

  1. Implement your algorithm in the AlgoPerf API: Have a look at our Getting Started Guide and the Technical Documentation.
  2. Create a Pull Request: Fork this repository, create a new branch, and add your submission code to a new folder within either submissions/external_tuning/ or submissions/self_tuning/. Open a pull request (PR) to the evaluation branch of this repository. Make sure to fill out the PR template, which asks for information such as the submission name, authors, and affiliations.
  3. PR Review and Evaluation: The AlgoPerf working group will review your PR. Based on our available resources and the perceived potential of the method, it will be selected for a free evaluation and merged into the evaluation branch. The working group will run your submission on all workloads and push the results, as well as the updated leaderboard, to the main branch.

Citation

If you use the AlgoPerf benchmark in your research, please consider citing our paper.

Dahl, Schneider, Nado, et al.
> Benchmarking Neural Network Training Algorithms
> arXiv 2306.07179

@Misc{Dahl2023AlgoPerf,
  title         = {{Benchmarking Neural Network Training Algorithms}},
  author        = {Dahl, George E. and Schneider, Frank and Nado, Zachary and Agarwal, Naman and Sastry, Chandramouli Shama and Hennig, Philipp and Medapati, Sourabh and Eschenhagen, Runa and Kasimbeg, Priya and Suo, Daniel and Bae, Juhan and Gilmer, Justin and Peirson, Abel L. and Khan, Bilal and Anil, Rohan and Rabbat, Mike and Krishnan, Shankar and Snider, Daniel and Amid, Ehsan and Chen, Kongtao and Maddison, Chris J. and Vasudev, Rakshith and Badura, Michal and Garg, Ankush and Mattson, Peter},
  year          = {2023},
  archiveprefix = {arXiv},
  eprint        = {2306.07179},
}

If you use the results from the first AlgoPerf competition, please consider citing the results paper, as well as the relevant submissions:

Kasimbeg, Schneider, Eschenhagen, et al.
> Accelerating neural network training: An analysis of the AlgoPerf competition
> ICLR 2025

@inproceedings{Kasimbeg2025AlgoPerfResults,
  title         = {Accelerating neural network training: An analysis of the {AlgoPerf} competition},
  author        = {Kasimbeg, Priya and Schneider, Frank and Eschenhagen, Runa and Bae, Juhan and Sastry, Chandramouli Shama and Saroufim, Mark and Boyuan, Feng and Wright, Less and Yang, Edward Z. and Nado, Zachary and Medapati, Sourabh and Hennig, Philipp and Rabbat, Michael and Dahl, George E.},
  booktitle     = {The Thirteenth International Conference on Learning Representations},
  year          = {2025},
  url           = {https://openreview.net/forum?id=CtM5xjRSfm}
}