SWE-Effi
August 18, 2025
Note
Our evaluation script will be released soon!
A comprehensive benchmark evaluation platform for Software Engineering Efficiency across different AI scaffolds and models.
Overview
SWE-Effi provides a standardized platform for evaluating and comparing AI-powered software engineering tools across different scaffolds and language models. Our platform aggregates benchmark results and presents them through an interactive web interface.
Visit the Live Platform
Submit Your Results
Repository Structure
SWE-Effi
├── benchmark
│   └── results
│       └── agent-scaffold-stats
│           ├── agentless/
│           │   ├── GPT-4o-mini-2024-07-18/
│           │   │   ├── combined_stats.json
│           │   │   └── summary_stats.json
│           │   └── qwen3-32B/
│           │       ├── combined_stats.json
│           │       └── summary_stats.json
│           ├── agentless-mini/
│           ├── auto-code-rover/
│           ├── openhands/
│           └── swe-agent/
├── scripts/
│   ├── transform-benchmark.py   # data transformation
│   └── update-website.sh        # easy update script
└── website/
    ├── public/
    │   └── data/
    │       └── benchmark/
    │           ├── raw/         # benchmark data
    │           └── summary/     # benchmark data
    └── src/
        └── docs/
            └── about/
                └── index.tsx
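The results tree follows a `<scaffold>/<model>/` layout, so the available scaffold and model combinations can be read straight off the directory names. The sketch below is a minimal illustration of that idea, not part of the repository's scripts; the function name `list_submissions` is hypothetical.

```python
from pathlib import Path


def list_submissions(stats_root):
    """Return sorted (scaffold, model) pairs, one per <scaffold>/<model>/ directory."""
    root = Path(stats_root)
    pairs = []
    for scaffold_dir in root.iterdir():
        if not scaffold_dir.is_dir():
            continue  # ignore stray files next to the scaffold directories
        for model_dir in scaffold_dir.iterdir():
            if model_dir.is_dir():
                pairs.append((scaffold_dir.name, model_dir.name))
    return sorted(pairs)
```

Called from the repository root as `list_submissions("benchmark/results/agent-scaffold-stats")`, this should include pairs such as `("agentless", "qwen3-32B")` for the layout shown above.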
Quick Start
For Contributors
Want to submit your benchmark results? Follow our submission guide.
For Developers & Maintainers
- Clone the repository:
    git clone https://github.com/your-org/swe-effi.git
    cd swe-effi
- Process benchmark data:
    # Process all new benchmark data
    ./scripts/update-website.sh --auto
    # Process a specific scaffold/model
    ./scripts/update-website.sh agentless gpt-4
    # Validate files before processing
    ./scripts/update-website.sh --validate-only
- Run the website locally:
    cd website
    npm install
    npm run dev
Development Workflow
Processing New Submissions
When contributors submit benchmark results via PR:
- Review the Pull Request for correctness
- Validate locally (optional):
    git checkout [pr-branch]
    python3 scripts/transform-benchmark.py --validate-only
- Merge the PR
- Update the website:
    ./scripts/update-website.sh --auto
Script Reference
update-website.sh options:
- --auto: Process all available data automatically
- --validate-only: Only validate files, don't transform
- --verbose: Show detailed logs
- --help: Show help information
transform-benchmark.py options:
- --scaffold NAME --model NAME: Process a specific combination
- --validate-only: Only validate file format
- --auto: Automatically process all data with validation
- --verbose: Show detailed logs
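To make the option list concrete, here is a rough sketch of how a command-line parser accepting the `transform-benchmark.py` flags could look with Python's `argparse`. It is an illustration only and does not reproduce the actual script: the function names are hypothetical, and the rule that `--scaffold` and `--model` must appear together is an assumption drawn from how the two are documented as one combination.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(prog="transform-benchmark.py")
    parser.add_argument("--scaffold", metavar="NAME", help="scaffold to process")
    parser.add_argument("--model", metavar="NAME", help="model to process")
    parser.add_argument("--validate-only", action="store_true",
                        help="only validate file format")
    parser.add_argument("--auto", action="store_true",
                        help="auto process all data with validation")
    parser.add_argument("--verbose", action="store_true",
                        help="show detailed logs")
    return parser


def parse_args(argv=None):
    """Parse argv and reject a scaffold without a model (or the reverse)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: the two options are only meaningful as a pair.
    if bool(args.scaffold) != bool(args.model):
        parser.error("--scaffold and --model must be given together")
    return args
```

For example, `parse_args(["--scaffold", "agentless", "--model", "qwen3-32B"])` yields a namespace with `scaffold="agentless"`, `model="qwen3-32B"`, and the three boolean flags set to `False`.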
Technical Requirements
Prerequisites
- Python 3 for data processing
- Node.js and npm for website
Environment Setup
cd website && npm install
Contributing
Submit Benchmark Results
Data Flow
Contributor Results → PR Submission → Validation → Processing → Website Integration
- Results Collection: Contributors submit via GitHub PRs
- Validation: Automated checks ensure data quality
- Processing: Scripts transform data for website consumption
- Integration: Website automatically displays new results
File Format
Results must include:
- combined_stats.json
- summary_stats.json
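Before opening a PR, a contributor could run a quick local check along these lines. This is a minimal sketch under stated assumptions: it only confirms that both required files exist and parse as JSON, it does not know the fields the project's own validation expects, and the name `validate_submission` is hypothetical.

```python
import json
from pathlib import Path

# The two files every submission must include, as listed above.
REQUIRED_FILES = ("combined_stats.json", "summary_stats.json")


def validate_submission(model_dir):
    """Return a list of problems in model_dir; an empty list means the check passed."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a contributor see every problem in one pass.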