November 13, 2024
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration
A competition-based benchmark with quantitative metrics for Large Language Model Powered Multi-agent system.
🐛 Report Bug ·
📄 Main Page ·
📖 Paper ·
📊 Leaderboard
📌 MAgIC Benchmark News 🎉🔥
📖 About The Project
Scenarios
MAgIC provides a benchmark that quantitatively measures the Cognition, Adaptability, Rationality, and Collaboration abilities of Large Language Models within multi-agent systems. Our benchmark is based on competitions across five scenarios:
- Chameleon
- Undercover
- Cost Sharing
- Prisoner's Dilemma
- Public Good
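To make the game-theoretic scenarios concrete, here is a minimal sketch of the classic Prisoner's Dilemma payoff structure that this scenario is built on. The payoff values and the `payoff` helper are illustrative assumptions, not the values used in the MAgIC codebase:

```python
# Hypothetical Prisoner's Dilemma payoffs (years of sentence, negated as utility).
# Each entry maps (action_of_player_1, action_of_player_2) -> (utility_1, utility_2).
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),  # both stay silent: light sentence
    ("cooperate", "defect"):    (-3, 0),   # player 1 exploited
    ("defect",    "cooperate"): (0, -3),   # player 2 exploited
    ("defect",    "defect"):    (-2, -2),  # mutual defection: the Nash equilibrium
}

def payoff(action_1: str, action_2: str) -> tuple[int, int]:
    """Return the joint payoff for one round of the dilemma."""
    return PAYOFFS[(action_1, action_2)]

print(payoff("defect", "cooperate"))  # (0, -3)
```

Rational play drives both agents to mutual defection even though mutual cooperation yields a better joint outcome, which is exactly the tension the benchmark uses to probe an LLM's rationality.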
PGM-Aware Agent Structure
Evaluation Metrics and Game Win Rate
Leaderboard
We have tested 10 models on our benchmark, and the PGM-aware agent method we propose achieves a marked improvement across them.

PGM Performance
PGM improvements on different LLMs.

Getting Started
Installation
- Environment preparation

```shell
# conda virtual environment
conda create -n magic_llm python=3.9
conda activate magic_llm

# or a python3 virtual environment
mkdir magic_llm
python3 -m venv magic_llm
source magic_llm/bin/activate
```

- Install the required packages

```shell
pip3 install -r requirements.txt
```
Run competition and evaluation
- Get your own OpenAI API key and set it as an environment variable:

```shell
export OPENAI_API_KEY=$openai_api_key$
```

- Run the experiments and calculate the metrics. This code version currently only supports OpenAI models; if you want to test your own LLM, please refer to our leaderboard website to evaluate it and upload your results.

```shell
python3 arena_runner.py
```
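The leaderboard ranks models by how often they win each competition. As a rough illustration, a per-model win rate over repeated games can be computed like the sketch below; the function name and the result format (a list of winner names, one per game) are assumptions for illustration, not the actual interface of `arena_runner.py`:

```python
from collections import Counter

def win_rate(winners: list[str], model: str) -> float:
    """Fraction of games in `winners` (one winner name per game) won by `model`."""
    if not winners:
        return 0.0
    return Counter(winners)[model] / len(winners)

# Hypothetical results from four Chameleon games:
games = ["gpt-4", "gpt-4", "gpt-3.5-turbo", "gpt-4"]
print(win_rate(games, "gpt-4"))  # 0.75
```

Averaging such win rates over many games per scenario is what makes the benchmark's comparisons quantitative rather than anecdotal.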
Roadmap
- Upload relevant code
- Add link to Leaderboard website
- Introduce more scenarios and LLM results
- Add Online Demo where human and various LLMs can play together
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Lin Xu - @Lin_Xu_ - cathyxl2016@gmail.com
Citation
```bibtex
@article{xu2023magic,
  title={MAgIC: Benchmarking Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration},
  author={Lin Xu and Zhiyuan Hu and Daquan Zhou and Hongyu Ren and Zhen Dong and Kurt Keutzer and See Kiong Ng and Jiashi Feng},
  year={2023},
  journal={arXiv preprint arXiv:2311.08562}
}
```

