ExORL: Exploratory Data for Offline Reinforcement Learning

February 8, 2022 ยท View on GitHub

This is an original PyTorch implementation of the ExORL framework from

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by

Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.

*Equal contribution.

Prerequisites

Install MuJoCo if it is not already the case:

  • Download MuJoCo binaries here.
  • Unzip the downloaded archive into ~/.mujoco/.
  • Append the MuJoCo subdirectory bin path into the env variable LD_LIBRARY_PATH.

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate exorl

Datasets

We provide exploratory datasets for 6 DeepMind Control Stuite domains

DomainDataset nameAvailable task names
Cartpolecartpolecartpole_balance, cartpole_balance_sparse, cartpole_swingup, cartpole_swingup_sparse
Cheetahcheetahcheetah_run, cheetah_run_backward
Jaco Armjacojaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right
Point Mass Mazepoint_mass_mazepoint_mass_maze_reach_top_left, point_mass_maze_reach_top_right, point_mass_maze_reach_bottom_left, point_mass_maze_reach_bottom_right
Quadrupedquadrupedquadruped_walk, quadruped_run
Walkerwalkerwalker_stand, walker_walk, walker_run

For each domain we collected datasets by running 9 unsupervised RL algorithms from URLB for total of 10M steps. Here is the list of algorithms

Unsupervised RL methodNamePaper
APSapspaper
APT(ICM)icm_aptpaper
DIAYNdiaynpaper
Disagreementdisagreementpaper
ICMicmpaper
ProtoRLprotopaper
RandomrandomN/A
RNDrndpaper
SMMsmmpaper

You can download a dataset by running ./download.sh <DOMAIN> <ALGO>, for example to download ProtoRL dataset for Walker, run

./download.sh walker proto

The script will download the dataset from S3 and store it under datasets/walker/proto/, where you can find episodes (under buffer) and episode videos (under video).

Offline RL training

We also provide implementation of 5 offline RL algorithms for evaluating the datasets

Offline RL methodNamePaper
Behavior Cloningbcpaper
CQLcqlpaper
CRRcrrpaper
TD3+BCtd3_bcpaper
TD3td3paper

After downloading required datasets, you can evaluate it using offline RL methon for a specific task. For example, to evaluate a dataset collected by ProtoRL on Walker for the waling task using TD3+BC you can run

python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk

Logs are stored in the output folder. To launch tensorboard run:

tensorboard --logdir output

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}

License

The majority of ExORL is licensed under the MIT license, however portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.