EffiBench-X

October 22, 2025

Official codebase for our paper: EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code.

EffiBench-X is a benchmarking platform for evaluating code generation capabilities of Large Language Models (LLMs), with a focus on runtime and memory efficiency. It executes solutions in a sandboxed environment, measuring runtime, memory usage, and execution success.
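To make the three measured quantities concrete, here is a minimal sketch of timing a single child process and reading its peak memory. It is not EffiBench-X's sandbox implementation; the function name `measure` and its parameters are hypothetical, and it relies on the Unix-only `resource` module from the Python standard library.

```python
import resource
import subprocess
import sys
import time


def measure(cmd, stdin_text="", timeout=10.0):
    """Run `cmd` once and return (succeeded, wall_seconds, peak_rss).

    Hypothetical helper for illustration only, not part of EffiBench-X.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        succeeded = proc.returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        succeeded = False
    wall_seconds = time.perf_counter() - start

    # ru_maxrss is the largest peak resident set size among all children
    # waited for so far (kilobytes on Linux, bytes on macOS), so it is only
    # a per-run figure when each measurement happens in a fresh process.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return succeeded, wall_seconds, peak_rss


result = measure([sys.executable, "-c", "print(sum(range(10**6)))"])
print(result)
```

A real harness would typically repeat each run, isolate it in a container, and compare against a reference solution; this sketch only shows where the raw numbers can come from.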

✨ Features | 📦 Installation | 🚀 Quick Start | 🙏 Acknowledgments | ⚖️ License | 📚 Citation

✨ Features

  • Comprehensive Benchmarking: Evaluate LLM code generation not only for correctness but also for efficiency metrics (runtime, memory usage)
  • Multiple Language Support: Test solutions in Python, JavaScript, C++, Java, Go, and Ruby
  • Flexible Backends: Run evaluations using isolated Docker execution environments
  • Model Integration: Support for both open-source and proprietary LLMs (OpenAI, Anthropic, Google, DeepSeek, Qwen, Gemma, etc.)
  • Extensive Dataset: Problems from multiple sources (LeetCode, AtCoder, CodeChef, Codeforces, etc.)
  • Performance Analysis: Generate detailed reports and comparisons between different models

📦 Installation

# Clone the repository
git clone https://github.com/EffiBench/EffiBench-X.git
cd EffiBench-X

# Install dependencies
pip install -r requirements.txt

🚀 Quick Start

Managing Datasets

# Download dataset from Hugging Face Hub
python hf_dataset.py download

Start the Sandbox Backend

# Start with Docker backend
python start_sandbox.py --type docker --host 127.0.0.1 --port 8000
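Since the later steps presumably talk to this backend, it can help to confirm that something is listening on the chosen host and port before continuing. The sketch below only checks that a TCP connection succeeds; it assumes nothing about the sandbox's actual API, and `wait_for_port` is a hypothetical helper, not part of the repository.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Gives up and returns False after roughly `timeout` seconds.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves a listener exists; it does
            # not verify that the service behind it is healthy.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8000)` can be called in a second terminal or script after launching `start_sandbox.py` with the host and port shown above.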

Generate Solutions

# Generate solutions for all models in the config file
python generate_solution.py generate data/dataset data/solutions --config model_config.yaml

# Merge canonical solutions
python generate_solution.py merge-canonical-solutions

Evaluate Solutions

# Evaluate solutions with multiple processes and threads
python evaluate_solution.py evaluate -o data/evaluation

# Generate evaluation report
python evaluate_solution.py report
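The Quick Start commands can also be chained from one small driver script. The sketch below copies the commands verbatim from the steps above and adds no extra flags; the `run_pipeline` function and its `dry_run` parameter are hypothetical conveniences, and the sandbox backend is assumed to be running already in a separate process.

```python
import subprocess
import sys

# Commands copied from the Quick Start steps above, minus the leading "python".
# Starting the sandbox is left out because it is a long-running process.
STEPS = [
    ["hf_dataset.py", "download"],
    ["generate_solution.py", "generate", "data/dataset", "data/solutions",
     "--config", "model_config.yaml"],
    ["generate_solution.py", "merge-canonical-solutions"],
    ["evaluate_solution.py", "evaluate", "-o", "data/evaluation"],
    ["evaluate_solution.py", "report"],
]


def run_pipeline(steps=STEPS, dry_run=False):
    """Run each step in order with the current interpreter.

    Stops at the first failing step, because check=True makes
    subprocess.run raise CalledProcessError on a non-zero exit code.
    Returns the list of full command lines that were processed.
    """
    processed = []
    for args in steps:
        cmd = [sys.executable, *args]
        print("$ python", " ".join(args))
        if not dry_run:
            subprocess.run(cmd, check=True)
        processed.append(cmd)
    return processed
```

Calling `run_pipeline()` from the repository root runs the full sequence; `run_pipeline(dry_run=True)` only prints the commands, which is a cheap way to review them first.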

🙏 Acknowledgments

⚖️ License

EffiBench-X is licensed under the Apache License 2.0; portions are available under separate terms. The component at third_party/llm-sandbox is licensed under MIT (see its LICENSE).

📚 Citation

Please consider citing our paper if you find this repository helpful in your research or work.

@article{qing2025effibench,
  title={EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code},
  author={Qing, Yuhao and Zhu, Boyu and Du, Mingzhe and Guo, Zhijiang and Zhuo, Terry Yue and Zhang, Qianru and Zhang, Jie M and Cui, Heming and Yiu, Siu-Ming and Huang, Dong and Ng, See-Kiong and Tuan, Luu Anh},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}