Learning to Communicate with Deep Multi-Agent Reinforcement Learning

October 28, 2018 · View on GitHub

This is a PyTorch implementation of the original Lua code release.

Overview

This codebase implements two approaches to learning discrete communication protocols for playing collaborative games. In Reinforced Inter-Agent Learning (RIAL), agents learn a factorized deep Q-learning policy over game actions and messages, treating messages as just another action. In Differentiable Inter-Agent Learning (DIAL), real-valued message vectors are learned directly by backpropagating errors through a noisy communication channel during training, and are discretized to binary vectors at test time. RIAL and DIAL share the same per-agent network architecture, but because DIAL backpropagates downstream errors across agents during training, one would expect it to learn more efficiently; the comparison below verifies this.
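The channel is implemented by a discretise/regularise unit (DRU): during training it adds Gaussian noise and applies a soft squashing function, while at test time it hard-thresholds the message. Below is a minimal PyTorch sketch of the sigmoid ("narrow") variant; the function and argument names are our own choices for illustration, not necessarily those used in this repo.

```python
import torch

def dru(message, sigma=2.0, hard=False):
    # Sketch of a Discretise/Regularise Unit (sigmoid/"narrow" variant).
    if not hard:
        # Training: corrupt the real-valued message with Gaussian channel
        # noise, then squash it; gradients flow through both operations.
        return torch.sigmoid(message + sigma * torch.randn_like(message))
    # Test: hard-threshold the raw message into a binary vector.
    return (message > 0).float()
```

The noise acts as a regularizer: to communicate reliably through it, agents are pushed toward messages far from the decision boundary, which makes the test-time discretization nearly lossless.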

Execution

$ virtualenv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ python main.py -c config/switch_3_dial.json

Results for switch game

(Figure: DIAL vs. RIAL reward curves on the switch game.)

Each curve plots an exponentially-weighted moving average of the reward, averaged across 20 trials.
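This kind of smoothing can be reproduced by averaging the per-trial csv files and then applying an EWMA, for example with pandas. The filenames, directory name, column layout, and alpha value below are hypothetical:

```python
import pandas as pd

# Hypothetical per-trial result files written by main.py (one csv per trial,
# suffixed by trial index via --start_index).
frames = [pd.read_csv(f"results/switch_3_dial/{i}.csv") for i in range(20)]

# Average the 20 trials point-wise, then smooth with an EWMA.
mean_curve = pd.concat(frames).groupby(level=0).mean()
smoothed = mean_curve.ewm(alpha=0.05).mean()
```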

More info

More generally, main.py takes the following arguments (a sketch of the corresponding argparse setup follows the table):

| Arg | Short | Description | Required? |
| --- | --- | --- | --- |
| `--config_path` | `-c` | path to JSON configuration file | ✅ |
| `--results_path` | `-r` | path to directory in which to save results per trial (as csv) | - |
| `--ntrials` | `-n` | number of trials to run | - |
| `--start_index` | `-s` | start index used as a suffix in result filenames | - |
| `--verbose` | `-v` | if set, print results per training epoch to stdout | - |
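
As a rough sketch, this command-line interface could be wired up with argparse as follows; the defaults here are assumptions, not taken from main.py:

```python
import argparse

# Illustrative parser mirroring the table above; main.py may differ in details.
parser = argparse.ArgumentParser()
parser.add_argument("-c", "--config_path", required=True,
                    help="path to JSON configuration file")
parser.add_argument("-r", "--results_path",
                    help="directory in which to save per-trial results (csv)")
parser.add_argument("-n", "--ntrials", type=int, default=1,
                    help="number of trials to run")
parser.add_argument("-s", "--start_index", type=int, default=0,
                    help="start index used as a suffix in result filenames")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print per-epoch training results to stdout")
args = parser.parse_args()
```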
Configuration

JSON configuration files passed to main.py should consist of the following key-value pairs (an illustrative example follows the table):

| Key | Description | Type |
| --- | --- | --- |
| `game` | name of the game, e.g. `"switch"` | string |
| `game_nagents` | number of agents | int |
| `game_action_space` | number of valid game actions | int |
| `game_comm_limited` | true if only some agents can communicate at each step | bool |
| `game_comm_bits` | number of bits per message | int |
| `game_comm_sigma` | standard deviation of Gaussian noise applied by the DRU | float |
| `game_comm_hard` | true to use hard discretization, soft approximation otherwise | bool |
| `nsteps` | maximum number of game steps | int |
| `gamma` | reward discount factor for Q-learning | float |
| `model_dial` | true if agents should use DIAL | bool |
| `model_comm_narrow` | true if the DRU should use sigmoid for regularization, softmax otherwise | bool |
| `model_target` | true if learning should use a target Q-network | bool |
| `model_bn` | true if learning should use batch normalization | bool |
| `model_know_share` | true if agents should share parameters | bool |
| `model_action_aware` | true if each agent should know its last action | bool |
| `model_rnn_size` | dimension of the RNN hidden state | int |
| `bs` | batch size of episodes, run in parallel per epoch | int |
| `learningrate` | learning rate for the optimizer (RMSProp) | float |
| `momentum` | momentum for the optimizer (RMSProp) | float |
| `eps` | exploration rate for epsilon-greedy exploration | float |
| `nepisodes` | number of epochs, each consisting of parallel episodes | int |
| `step_test` | run a test episode every this many steps | int |
| `step_target` | update the target network every this many steps | int |
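
For reference, a configuration in this format might look like the following. The values are illustrative, chosen to be plausible for the 3-agent switch game from the paper; they are not copied from config/switch_3_dial.json:

```json
{
    "game": "switch",
    "game_nagents": 3,
    "game_action_space": 2,
    "game_comm_limited": true,
    "game_comm_bits": 1,
    "game_comm_sigma": 2.0,
    "game_comm_hard": false,
    "nsteps": 6,
    "gamma": 1.0,
    "model_dial": true,
    "model_comm_narrow": true,
    "model_target": true,
    "model_bn": true,
    "model_know_share": true,
    "model_action_aware": true,
    "model_rnn_size": 128,
    "bs": 32,
    "learningrate": 0.0005,
    "momentum": 0.05,
    "eps": 0.05,
    "nepisodes": 5000,
    "step_test": 10,
    "step_target": 100
}
```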
Visualizing results

You can use util/analyze_results.py to graph the results output by main.py. The script plots the average across all csv files under each path specified after -r. Additionally, -a takes an alpha value for plotting the results as exponentially-weighted moving averages, and -l takes an optional list of labels corresponding to the paths.

$ python util/analyze_results.py -r <paths to results> -a <weight for EWMA>
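
For example, to compare DIAL and RIAL runs saved under two hypothetical results directories, with labeled curves:

$ python util/analyze_results.py -r results/dial results/rial -a 0.05 -l DIAL RIAL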

Bibtex

@inproceedings{foerster2016learning,
    title={Learning to communicate with deep multi-agent reinforcement learning},
    author={Foerster, Jakob and Assael, Yannis M and de Freitas, Nando and Whiteson, Shimon},
    booktitle={Advances in Neural Information Processing Systems},
    pages={2137--2145},
    year={2016} 
}

License

Code is licensed under the Apache License v2.0.