Available algorithms

March 26, 2026

✓: thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see GitHub issues.

| Algorithm | Category | Reference | Status |
| --- | --- | --- | --- |
| Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
| Max^n | Search | Luckhart & Irani '86 | ~ |
| Minimax (and Alpha-Beta) Search | Search | Wikipedia1, Wikipedia2, Knuth and Moore '75 | ✓ |
| Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | ✓ |
| Perfect Information Monte Carlo (PIMC) | Search | Long et al. '10 | ~ |
| Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | ✓ |
| ADIDAS | Opt. | Gemp et al. '22 | ~ |
| Least Core via Linear Programming | Opt. | Yan & Procaccia '21 | ~ |
| Least Core via Saddle-Point (Lagrangian) Programming | Opt. | Gemp et al. '24 | ~ |
| Sequence-form linear programming | Opt. | Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09 | ✓ |
| Sequence-form LP for Sequential Equilibrium | Opt. | Miltersen & Sørensen '06 | ✓ |
| Shapley Values (incl. approximations via Monte Carlo sampling) | Opt. | Mitchell et al. '22 | ~ |
| Stackelberg equilibrium solver | Opt. | Conitzer & Sandholm '06 | ~ |
| MIP-Nash | Opt. | Sandholm et al. '05 | ~ |
| Magnetic Mirror Descent (MMD) with dilated entropy | Opt. | Sokota et al. '22 | ~ |
| Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | ✓ |
| CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | ✓ |
| Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | ✓ |
| External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | ✓ |
| Fixed Strategy Iteration CFR (FSICFR) | Tabular | Neller & Hnath '11 | ~ |
| Extensive-form Regret Minimization | Tabular | Morrill et al. '22 | ~ |
| Mean-Field Fictitious Play for MFG | Tabular | Perrin et al. '20 | ~ |
| Online Mirror Descent for MFG | Tabular | Perolat et al. '21 | ~ |
| Munchausen Online Mirror Descent for MFG | Tabular | Laurière et al. '22 | ~ |
| Fixed Point for MFG | Tabular | Huang et al. '06 | ~ |
| Boltzmann Policy Iteration for MFG | Tabular | Laurière et al. '22 | ~ |
| Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | ✓ |
| Policy Iteration | Tabular | Sutton & Barto '18 | ✓ |
| Q-learning | Tabular | Sutton & Barto '18 | ✓ |
| Regret Matching | Tabular | Hart & Mas-Colell '00 | ✓ |
| Restricted Nash Response (RNR) | Tabular | Johanson et al. '08 | ~ |
| SARSA | Tabular | Sutton & Barto '18 | ✓ |
| Value Iteration | Tabular | Sutton & Barto '18 | ✓ |
| Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | ✓ |
| Deep Q-networks (DQN) | RL | Mnih et al. '15 | ✓ |
| Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
| Proximal Policy Optimization (PPO) | RL | Schulman et al. '18 | ~ |
| Mean Field Proximal Policy Optimization (MF-PPO) | RL | Algumaei et al. '23 | ~ |
| AlphaZero (C++/LibTorch) | MARL | Silver et al. '18 | ✓ |
| AlphaZero (Python/TF) | MARL | Silver et al. '18 | ✓ |
| Correlated Q-Learning | MARL | Greenwald & Hall '03 | ~ |
| Asymmetric Q-Learning | MARL | Könönen '04 | ~ |
| Deep CFR | MARL | Brown et al. '18 | ✓ |
| ESCHER | MARL | McAleer et al. '22 | ~ |
| DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE) | MARL | Foerster, Farquhar, Al-Shedivat, et al. '18 | ~ |
| Exploitability Descent (ED) | MARL | Lockhart et al. '19 | ✓ |
| (Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | ✓ |
| Learning with Opponent-Learning Awareness (LOLA) | MARL | Foerster, Chen, Al-Shedivat, et al. '18 | ~ |
| Nash Q-Learning | MARL | Hu & Wellman '03 | ~ |
| Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | ✓ |
| Neural Replicator Dynamics (NeuRD) | MARL | Omidshafiei, Hennes, Morrill, et al. '19 | X |
| Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | ✓ |
| Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | ✓ |
| Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | ✓ |
| Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | ✓ |
| Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
| Mean-Field PSRO (MFPSRO) | MARL | Muller et al. '21 | ~ |
| Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC) | MARL | Bowling & Veloso '02 | ~ |
| α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | ✓ |
| Nash Averaging | Eval. / Viz. | Balduzzi et al. '18 | ~ |
| Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | ✓ |
| Voting-as-Evaluation (VasE) | Eval. / Viz. | Lanctot et al. '23 | ✓ |