ExORL: Exploratory Data for Offline Reinforcement Learning
February 8, 2022
This is an original PyTorch implementation of the ExORL framework from
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by
Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.
*Equal contribution.
Prerequisites
Install MuJoCo if it is not already installed:
- Download MuJoCo binaries here.
- Unzip the downloaded archive into ~/.mujoco/.
- Append the path of the MuJoCo subdirectory's bin directory to the LD_LIBRARY_PATH environment variable.
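A minimal Python sketch for checking the `LD_LIBRARY_PATH` step is shown here. It assumes the archive was unpacked into some subdirectory of `~/.mujoco/` that contains a `bin` folder. The helper names `find_bin_dirs` and `missing_from_ld_library_path` are hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def find_bin_dirs(mujoco_root):
    """Return every <mujoco_root>/<subdir>/bin directory that exists."""
    root = Path(mujoco_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*/bin") if p.is_dir())


def missing_from_ld_library_path(bin_dirs, ld_library_path):
    """Return the bin directories that are not listed in ld_library_path."""
    entries = {
        os.path.normpath(entry)
        for entry in ld_library_path.split(os.pathsep)
        if entry
    }
    return [d for d in bin_dirs if os.path.normpath(str(d)) not in entries]


if __name__ == "__main__":
    # Assumed install location, taken from the steps above.
    bin_dirs = find_bin_dirs("~/.mujoco")
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for d in missing_from_ld_library_path(bin_dirs, current):
        print(f"not on LD_LIBRARY_PATH: {d}")
```

If the script prints nothing, every `bin` directory it found is already listed in the variable, or none was found under `~/.mujoco/`.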
Install the following libraries:
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip
Install dependencies:
conda env create -f conda_env.yml
conda activate exorl
Datasets
We provide exploratory datasets for 6 DeepMind Control Suite domains:
| Domain | Dataset name | Available task names |
|---|---|---|
| Cartpole | cartpole | cartpole_balance, cartpole_balance_sparse, cartpole_swingup, cartpole_swingup_sparse |
| Cheetah | cheetah | cheetah_run, cheetah_run_backward |
| Jaco Arm | jaco | jaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right |
| Point Mass Maze | point_mass_maze | point_mass_maze_reach_top_left, point_mass_maze_reach_top_right, point_mass_maze_reach_bottom_left, point_mass_maze_reach_bottom_right |
| Quadruped | quadruped | quadruped_walk, quadruped_run |
| Walker | walker | walker_stand, walker_walk, walker_run |
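When scripting over these datasets, it can help to have the table as a lookup structure. The sketch below copies the dataset and task names from the table. The names `TASKS_BY_DOMAIN` and `domain_for_task` are hypothetical helpers, not part of this repository.

```python
# Dataset (domain) name -> available task names, copied from the table above.
TASKS_BY_DOMAIN = {
    "cartpole": [
        "cartpole_balance",
        "cartpole_balance_sparse",
        "cartpole_swingup",
        "cartpole_swingup_sparse",
    ],
    "cheetah": ["cheetah_run", "cheetah_run_backward"],
    "jaco": [
        "jaco_reach_top_left",
        "jaco_reach_top_right",
        "jaco_reach_bottom_left",
        "jaco_reach_bottom_right",
    ],
    "point_mass_maze": [
        "point_mass_maze_reach_top_left",
        "point_mass_maze_reach_top_right",
        "point_mass_maze_reach_bottom_left",
        "point_mass_maze_reach_bottom_right",
    ],
    "quadruped": ["quadruped_walk", "quadruped_run"],
    "walker": ["walker_stand", "walker_walk", "walker_run"],
}


def domain_for_task(task):
    """Return the dataset (domain) name that a task name belongs to."""
    for domain, tasks in TASKS_BY_DOMAIN.items():
        if task in tasks:
            return domain
    raise ValueError(f"unknown task: {task!r}")
```

For example, `domain_for_task("walker_walk")` returns `"walker"`, the dataset name for that task.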
For each domain, we collected datasets by running 9 unsupervised RL algorithms from URLB for a total of 10M steps. The algorithms are listed below:
| Unsupervised RL method | Name | Paper |
|---|---|---|
| APS | aps | paper |
| APT(ICM) | icm_apt | paper |
| DIAYN | diayn | paper |
| Disagreement | disagreement | paper |
| ICM | icm | paper |
| ProtoRL | proto | paper |
| Random | random | N/A |
| RND | rnd | paper |
| SMM | smm | paper |
You can download a dataset by running ./download.sh <DOMAIN> <ALGO>. For example, to download the ProtoRL dataset for Walker, run:
./download.sh walker proto
The script will download the dataset from S3 and store it under datasets/walker/proto/, where you can find episodes (under buffer) and episode videos (under video).
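As a quick sanity check after a download, the following sketch builds the two directories described above and counts the files in each. It assumes only the `datasets/<DOMAIN>/<ALGO>/{buffer,video}` layout. The episode file format is not assumed, and `dataset_dirs` and `count_files` are hypothetical helper names.

```python
from pathlib import Path


def dataset_dirs(domain, algo, root="datasets"):
    """Return the (buffer, video) directories for one downloaded dataset."""
    base = Path(root) / domain / algo
    return base / "buffer", base / "video"


def count_files(directory):
    """Count regular files directly inside directory (0 if it is missing)."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    buffer_dir, video_dir = dataset_dirs("walker", "proto")
    print(f"{buffer_dir}: {count_files(buffer_dir)} files")
    print(f"{video_dir}: {count_files(video_dir)} files")
```

A count of zero for `buffer` suggests the download did not complete or was stored elsewhere.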
Offline RL training
We also provide implementations of 5 offline RL algorithms for evaluating the datasets:
| Offline RL method | Name | Paper |
|---|---|---|
| Behavior Cloning | bc | paper |
| CQL | cql | paper |
| CRR | crr | paper |
| TD3+BC | td3_bc | paper |
| TD3 | td3 | paper |
After downloading the required datasets, you can evaluate them with an offline RL method on a specific task. For example, to evaluate the dataset collected by ProtoRL on Walker for the walking task using TD3+BC, run:
python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk
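To evaluate several combinations, the command above can be generated programmatically. The sketch below only builds and prints command lines; it does not launch training. It assumes that the `agent` and `expl_agent` overrides accept the values from the Name columns of the two tables, as they do in the example above. `build_command` and `sweep` are hypothetical helpers.

```python
import itertools
import shlex

# Values from the "Name" columns of the offline RL and unsupervised RL tables.
OFFLINE_AGENTS = ["bc", "cql", "crr", "td3_bc", "td3"]
EXPL_AGENTS = [
    "aps", "icm_apt", "diayn", "disagreement", "icm",
    "proto", "random", "rnd", "smm",
]


def build_command(agent, expl_agent, task):
    """Return the argv list for one train_offline.py run."""
    if agent not in OFFLINE_AGENTS:
        raise ValueError(f"unknown offline agent: {agent!r}")
    if expl_agent not in EXPL_AGENTS:
        raise ValueError(f"unknown exploration agent: {expl_agent!r}")
    return [
        "python", "train_offline.py",
        f"agent={agent}", f"expl_agent={expl_agent}", f"task={task}",
    ]


def sweep(agents, expl_agents, task):
    """Return one command per (agent, expl_agent) pair for a single task."""
    return [
        build_command(agent, expl_agent, task)
        for agent, expl_agent in itertools.product(agents, expl_agents)
    ]


if __name__ == "__main__":
    for cmd in sweep(["td3_bc", "cql"], ["proto", "rnd"], "walker_walk"):
        print(shlex.join(cmd))
```

Each printed line can be run from the repository root once the corresponding dataset has been downloaded.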
Logs are stored in the output folder. To launch TensorBoard, run:
tensorboard --logdir output
Citation
If you use this repo in your research, please consider citing the paper as follows:
@article{yarats2022exorl,
title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
author={Denis Yarats and David Brandfonbrener and Hao Liu and Michael Laskin and Pieter Abbeel and Alessandro Lazaric and Lerrel Pinto},
journal={arXiv preprint arXiv:2201.13425},
year={2022}
}
License
The majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.