SWE-Effi

August 18, 2025

Note

Our evaluation script will be released soon!

A comprehensive benchmark evaluation platform for Software Engineering Efficiency across different AI scaffolds and models.

📊 Overview

SWE-Effi provides a standardized platform for evaluating and comparing AI-powered software engineering tools across different scaffolds and language models. Our platform aggregates benchmark results and presents them through an interactive web interface.

🌐 Visit the Live Platform
📝 Submit Your Results

📁 Repository Structure

```
SWE-Effi
├── benchmark
│   └── results
│       └── agent-scaffold-stats
│           ├── agentless/
│           │   ├── GPT-4o-mini-2024-07-18/
│           │   │   ├── combined_stats.json
│           │   │   └── summary_stats.json
│           │   └── qwen3-32B/
│           │       ├── combined_stats.json
│           │       └── summary_stats.json
│           ├── agentless-mini/
│           ├── auto-code-rover/
│           ├── openhands/
│           └── swe-agent/
├── scripts/
│   ├── transform-benchmark.py      # data transformation
│   └── update-website.sh           # easy update script
└── website/
    ├── public/
    │   └── data/
    │       └── benchmark/
    │           └── raw/            # benchmark data
    │               └── summary/    # benchmark data
    └── src/
        └── docs/
            ├── about/
            └── index.tsx
```
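
The results layout above is one directory per scaffold, with one subdirectory per model. As a rough illustration of how that layout could be enumerated, here is a minimal Python sketch. The function name `discover_results` is hypothetical and is not part of the repository's scripts.

```python
from pathlib import Path


def discover_results(stats_root):
    """Return sorted (scaffold, model) pairs found under stats_root.

    Assumes the <scaffold>/<model>/ layout shown in the tree above.
    Scaffold directories without model subdirectories yield no pairs.
    """
    pairs = []
    for scaffold_dir in sorted(Path(stats_root).iterdir()):
        if not scaffold_dir.is_dir():
            continue
        for model_dir in sorted(scaffold_dir.iterdir()):
            if model_dir.is_dir():
                pairs.append((scaffold_dir.name, model_dir.name))
    return pairs
```

Calling `discover_results("benchmark/results/agent-scaffold-stats")` from the repository root would list every scaffold/model combination that currently has a results directory.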

🚀 Quick Start

For Contributors

Want to submit your benchmark results? Follow our submission guide →

For Developers & Maintainers

Clone the repository:

```
git clone https://github.com/your-org/swe-effi.git
cd swe-effi
```

Process benchmark data:

```
# Process all new benchmark data
./scripts/update-website.sh --auto

# Process specific scaffold/model
./scripts/update-website.sh agentless gpt-4

# Validate files before processing
./scripts/update-website.sh --validate-only
```

Run the website locally:
```
cd website
npm install
npm run dev
```

🛠 Development Workflow

Processing New Submissions

When contributors submit benchmark results via PR:

1. Review the Pull Request for correctness

2. Validate locally (optional):

```
git checkout [pr-branch]
python3 scripts/transform-benchmark.py --validate-only
```

3. Merge the PR

4. Update the website:
```
./scripts/update-website.sh --auto
```

Script Reference

update-website.sh options:

--auto: Process all available data automatically
--validate-only: Only validate files; do not transform them
--verbose: Show detailed logs
--help: Show help information

transform-benchmark.py options:

--scaffold NAME --model NAME: Process specific combination
--validate-only: Only validate file format
--auto: Automatically process all data, with validation
--verbose: Show detailed logs
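
For readers who want to see how the `transform-benchmark.py` flags listed above fit together, the following is a minimal `argparse` sketch. It mirrors only the documented option names. The rule that `--scaffold` and `--model` must be given together is an assumption drawn from the "specific combination" wording, and the actual script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented transform-benchmark.py flags."""
    parser = argparse.ArgumentParser(prog="transform-benchmark.py")
    parser.add_argument("--scaffold", metavar="NAME", help="scaffold to process")
    parser.add_argument("--model", metavar="NAME", help="model to process")
    parser.add_argument("--validate-only", action="store_true",
                        help="only validate file format")
    parser.add_argument("--auto", action="store_true",
                        help="process all data with validation")
    parser.add_argument("--verbose", action="store_true",
                        help="show detailed logs")
    return parser


def parse_args(argv=None):
    """Parse argv and apply the (assumed) pairing rule for --scaffold/--model."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a scaffold without a model (or the reverse) is an error.
    if bool(args.scaffold) != bool(args.model):
        parser.error("--scaffold and --model must be given together")
    return args
```

For example, `parse_args(["--scaffold", "agentless", "--model", "qwen3-32B", "--validate-only"])` yields a namespace with `validate_only` set and `auto` unset.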

🔧 Technical Requirements

Prerequisites

Python 3 for data processing
Node.js and npm for the website

Environment Setup

```
cd website && npm install
```

🤝 Contributing

Submit Benchmark Results

Data Flow

Contributor Results → PR Submission → Validation → Processing → Website Integration

Results Collection: Contributors submit via GitHub PRs
Validation: Automated checks ensure data quality
Processing: Scripts transform data for website consumption
Integration: Website automatically displays new results
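
The processing stage in the flow above can be pictured as gathering the per-model summaries into a single file that a website could load. The sketch below is purely illustrative: the helper names `collect_summaries` and `write_index`, the record fields, and the idea of a single index file are all hypothetical, and the repository's scripts may transform the data quite differently.

```python
import json
from pathlib import Path


def collect_summaries(stats_root):
    """Gather every <scaffold>/<model>/summary_stats.json into a list of records."""
    records = []
    for path in sorted(Path(stats_root).glob("*/*/summary_stats.json")):
        records.append({
            "scaffold": path.parent.parent.name,  # directory two levels up
            "model": path.parent.name,            # directory holding the file
            "summary": json.loads(path.read_text(encoding="utf-8")),
        })
    return records


def write_index(records, out_file):
    """Write the records as one JSON file (hypothetical output format)."""
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_file
```

Keeping collection and writing as separate functions means validation can be slotted in between them without changing either step.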

File Format

Results must include:

combined_stats.json
summary_stats.json
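
Before opening a PR, it may help to confirm that both required files exist and parse as JSON. The sketch below covers only that: the schema of the two files is not described here, so no field-level checks are attempted. The function name `validate_submission` is hypothetical and this is not the project's own validator.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("combined_stats.json", "summary_stats.json")


def validate_submission(model_dir):
    """Return a list of problems found in model_dir; an empty list means none."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {name}: {exc.msg}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a contributor fix everything in a single pass.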

📄 License

Apache License 2.0