OBD
October 31, 2024
Official repository for the NeurIPS 2024 paper "Offline Behavior Distillation" by Shiye Lei, Sen Zhang, and Dacheng Tao.
Dependencies
- Python 3.7
- PyTorch 1.11
- MuJoCo 2.10
- d4rl
- wandb
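To confirm that an environment matches the list above before launching a run, a small check like the following can help. It is only a sketch: the import names (`torch`, `mujoco_py`, `d4rl`, `wandb`) are assumptions about how each dependency is imported, and `check_modules` is a hypothetical helper that is not part of this repository.

```python
import importlib
import sys

# Import names assumed to correspond to the dependencies listed above;
# "mujoco_py" in particular is an assumption about how MuJoCo is imported.
REQUIRED_MODULES = ["torch", "mujoco_py", "d4rl", "wandb"]


def check_modules(names):
    """Return {module_name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # misconfigured packages may raise more than ImportError
            found[name] = None
            continue
        # Not every package exposes __version__, so fall back to "unknown".
        found[name] = str(getattr(module, "__version__", "unknown"))
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_modules(REQUIRED_MODULES).items():
        print(name, "->", version if version is not None else "NOT FOUND")
```

A `NOT FOUND` entry only means the import failed under the assumed name; it does not by itself prove that the package is missing.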
Quick Start
- Please refer to command_parser.py for default hyper-parameters.
- Near-expert policy checkpoints are provided in offline_policy_checkpoints; they were obtained with the Cal-QL implementation in CORL.
- Av-PBC distilled datasets are available here.
Synthesize Behavioral Datasets
- Av-PBC
python obd_bptt.py --env 'halfcheetah-medium-replay-v2' --match_objective 'offline_policy' --q_weight --save_dir './saved_synset' --seed 0
- PBC
python obd_bptt.py --env 'halfcheetah-medium-replay-v2' --match_objective 'offline_policy' --save_dir './saved_synset' --seed 0
- DBC
python obd_bptt.py --env 'halfcheetah-medium-replay-v2' --match_objective 'offline_data' --save_dir './saved_synset' --seed 0
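The three commands above differ only in `--match_objective` and `--q_weight`. The sketch below shows one plausible reading of those flags: DBC compares a policy against the actions in the offline data, PBC compares it against the near-expert policy, and Av-PBC additionally weights each state by an action value. The exact losses in obd_bptt.py are not shown in this README, so the squared-error form and every function name here are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dbc_loss(policy, states, data_actions):
    """DBC (assumed form): match the actions stored in the offline data."""
    total = sum(squared_distance(policy(s), a) for s, a in zip(states, data_actions))
    return total / len(states)


def pbc_loss(policy, states, expert_policy):
    """PBC (assumed form): match the near-expert policy's actions."""
    total = sum(squared_distance(policy(s), expert_policy(s)) for s in states)
    return total / len(states)


def av_pbc_loss(policy, states, expert_policy, q_value):
    """Av-PBC (assumed form): PBC with each state weighted by an action value."""
    total = sum(
        q_value(s, expert_policy(s)) * squared_distance(policy(s), expert_policy(s))
        for s in states
    )
    return total / len(states)


# Toy one-dimensional example; all values are made up for illustration.
states = [[0.0], [1.0]]
data_actions = [[0.0], [2.0]]
expert = lambda s: [2.0 * s[0]]   # hypothetical near-expert policy
student = lambda s: [s[0]]        # hypothetical policy trained on distilled data
q = lambda s, a: 1.0 + s[0]       # hypothetical action-value function

print(dbc_loss(student, states, data_actions))
print(pbc_loss(student, states, expert))
print(av_pbc_loss(student, states, expert, q))
```

In this toy case the weighting makes the error at the higher-value state count for more, which is the intuition behind passing `--q_weight`.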
Evaluate Behavioral Datasets
- Standard evaluation
python evaluate_synset.py --env 'halfcheetah-medium-replay-v2' --match_objective 'offline_policy' --q_weight --eval_freq 1000 --save_dir './saved_synset' --group 'Evaluate' --seed 0
- Ensemble evaluation
python evaluate_synset.py --env 'halfcheetah-medium-replay-v2' --match_objective 'offline_policy' --q_weight --eval_freq 1000 --eval_ensemble --ensemble_policy_num 10 --save_dir './saved_synset' --group 'Ensemble-Evaluate' --seed 0
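To repeat the evaluation over several seeds, the two command lines in this section can be generated programmatically. The following is a minimal sketch that uses only the flags shown in this section; `build_eval_command` is a hypothetical helper, not part of this repository, and the seed range is arbitrary.

```python
import shlex


def build_eval_command(env, seed, save_dir="./saved_synset", ensemble_policy_num=None):
    """Build the argument list for evaluate_synset.py using the flags shown above."""
    args = [
        "python", "evaluate_synset.py",
        "--env", env,
        "--match_objective", "offline_policy",
        "--q_weight",
        "--eval_freq", "1000",
    ]
    if ensemble_policy_num is not None:
        # Ensemble evaluation adds two flags and uses a different group name.
        args += ["--eval_ensemble", "--ensemble_policy_num", str(ensemble_policy_num)]
        group = "Ensemble-Evaluate"
    else:
        group = "Evaluate"
    args += ["--save_dir", save_dir, "--group", group, "--seed", str(seed)]
    return args


if __name__ == "__main__":
    for seed in range(3):  # arbitrary illustrative seed range
        cmd = build_eval_command("halfcheetah-medium-replay-v2", seed)
        # shlex.quote keeps the printed line safe to paste into a shell.
        print(" ".join(shlex.quote(a) for a in cmd))
```

The returned list can also be passed directly to `subprocess.run` instead of being printed.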
Cross Arch/Optim Evaluation
python evaluate_cross_arch.py --env 'halfcheetah-medium-replay-v2' --match_objective 'offline_policy' --q_weight --eval_freq 1000 --save_dir '/home/leaves/Data/OBD/q-value-weighted-synset' --group 'Cross-Arch-Optim-Evaluate' --seed 0
Citation
@inproceedings{
lei2024offline,
title={Offline Behavior Distillation},
author={Lei, Shiye and Zhang, Sen and Tao, Dacheng},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}
Contact
For any issues, please contact Shiye Lei: leishiye@gmail.com
Acknowledgment
- Remember The Past - Dataset Distillation: https://github.com/princetonvisualai/RememberThePast-DatasetDistillation
- Clean Offline Reinforcement Learning: https://github.com/tinkoff-ai/CORL