Introduction

December 10, 2024

This is the code repository for the paper "TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning".

Our work has been accepted at AAAI 2025, and follow-up work is welcome.

Here is an overview of our method: [image: method overview]

TreeEval naturally avoids the problem of test data leakage by discarding the fixed test set.

Install

Follow the FastChat installation instructions.

The models we use can be found on Hugging Face: Yi-34B-Chat, Xwin-LM-13B-V0.1, vicuna-33b-v1.3, Mistral-7B-Instruct-v0.2, and WizardLM-13B-V1.2.

Run steps

  1. Start the FastChat server:
    1. Set log_dir in fastchat.sh.
    2. Run: bash fastchat.sh
    3. Run: python3 -m fastchat.serve.openai_api_server --host localhost --port 23261 --controller-address http://localhost:23241
  2. Copy config.yaml to config_modelname.yaml and edit the settings in the copy.
  3. Run: python main.py
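
Before running main.py, it can help to confirm that the OpenAI-compatible API server from step 1 is reachable. The following is a minimal sketch, not part of this repository. It assumes the host and port from the command above (localhost:23261) and uses the /v1/models endpoint that FastChat's OpenAI-compatible server exposes. The helper names models_url and list_served_models are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: host and port match the openai_api_server command in the run steps.
API_BASE = "http://localhost:23261/v1"


def models_url(api_base: str = API_BASE) -> str:
    """Return the URL of the OpenAI-compatible model-listing endpoint."""
    return api_base.rstrip("/") + "/models"


def list_served_models(api_base: str = API_BASE, timeout: float = 5.0) -> list:
    """Return the model ids the server reports, or [] if it cannot be reached."""
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response.
        return []
    if not isinstance(payload, dict):
        return []
    # The OpenAI-style response lists models under the "data" key.
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


if __name__ == "__main__":
    served = list_served_models()
    if served:
        print("Server is up; models:", ", ".join(served))
    else:
        print("No models reported; check the processes started by fastchat.sh "
              "and the API server.")
```

If the list is empty, the API server may still be starting, or the controller address passed to it may not match the one used in fastchat.sh.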

Citation

If you find this useful for your work, please cite:

@article{li2024treeeval,
      title={TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning}, 
      author={Xiang Li and Yunshi Lan and Chao Yang},
      year={2024},
      eprint={2402.13125},
      archivePrefix={arXiv},
      journal={arXiv preprint arXiv:2402.13125},
      primaryClass={cs.CL}
}