Direct Multi-Turn Preference Optimization for Language Agents
This repository contains the official code for our paper Direct Multi-Turn Preference Optimization for Language Agents (EMNLP 2024 Main Conference).
Setup
You can set up the environment and download the data by running bash setup.sh.
Run
You can run the full DMPO pipeline with bash run_dmpo.sh <DATASET> <BASIC_MODEL_PATH> <NEW_MODEL_SAVING_PATH>. The script runs three stages in sequence:
- Training and evaluating the SFT model
- Constructing the DMPO training dataset
- Training and evaluating the DMPO model
Similarly, you can run run_dmpo_mistral.sh <DATASET> <BASIC_MODEL_PATH> <NEW_MODEL_SAVING_PATH> to perform the same pipeline with a Mistral base model.
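For reference, an end-to-end run might look like the sketch below. The dataset identifier (webshop) and the model paths are placeholders chosen for this example, not values fixed by the scripts; substitute whatever your local setup uses.

```bash
# One-time environment setup and data download.
bash setup.sh

# Full DMPO pipeline: SFT training/evaluation, DMPO preference-data
# construction, then DMPO training/evaluation.
# "webshop" and the paths below are placeholders.
bash run_dmpo.sh webshop /path/to/base_model ./checkpoints/dmpo_model

# The same pipeline, starting from a Mistral base model instead.
bash run_dmpo_mistral.sh webshop /path/to/mistral_base ./checkpoints/dmpo_mistral
```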
Citation
If you find this code useful, please cite our paper:
@misc{shi2024directmultiturnpreferenceoptimization,
title={Direct Multi-Turn Preference Optimization for Language Agents},
author={Wentao Shi and Mengqi Yuan and Junkang Wu and Qifan Wang and Fuli Feng},
year={2024},
eprint={2406.14868},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.14868},
}