SWE-Effi

August 18, 2025

Note

Our evaluation script will be released soon!


A comprehensive benchmark evaluation platform for Software Engineering Efficiency across different AI scaffolds and models.

📊 Overview

SWE-Effi provides a standardized platform for evaluating and comparing AI-powered software engineering tools across different scaffolds and language models. Our platform aggregates benchmark results and presents them through an interactive web interface.

๐ŸŒ Visit the Live Platform
๐Ÿ“ Submit Your Results

๐Ÿ“ Repository Structure

SWE-Effi
├── benchmark
│   └── results
│       └── agent-scaffold-stats
│           ├── agentless/
│           │   ├── GPT-4o-mini-2024-07-18/
│           │   │   ├── combined_stats.json
│           │   │   └── summary_stats.json
│           │   └── qwen3-32B/
│           │       ├── combined_stats.json
│           │       └── summary_stats.json
│           ├── agentless-mini/
│           ├── auto-code-rover/
│           ├── openhands/
│           └── swe-agent/
├── scripts/
│   ├── transform-benchmark.py      # data transformation
│   └── update-website.sh           # easy update script
└── website/
    ├── public/
    │   └── data/
    │       └── benchmark/
    │           └── raw/            # benchmark data
    │               └── summary/    # benchmark data
    └── src/
        └── docs/
            ├── about/
            └── index.tsx
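Because results are nested one directory per scaffold and then one directory per model, the scaffold/model combinations available for processing can be read straight off the tree. The following Python sketch assumes only the layout shown above; the helper name list_submissions is hypothetical and is not taken from the repository's scripts.

```python
from pathlib import Path

# Root of the per-scaffold results, following the repository tree.
RESULTS_ROOT = Path("benchmark/results/agent-scaffold-stats")


def list_submissions(root: Path = RESULTS_ROOT) -> list[tuple[str, str]]:
    """Return sorted (scaffold, model) pairs found under the results root.

    Hypothetical helper: every subdirectory of a scaffold directory is
    treated as one model's results. Stray files are ignored, and a missing
    root yields an empty list instead of an error.
    """
    pairs: list[tuple[str, str]] = []
    if not root.is_dir():
        return pairs
    for scaffold_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for model_dir in sorted(p for p in scaffold_dir.iterdir() if p.is_dir()):
            pairs.append((scaffold_dir.name, model_dir.name))
    return pairs


if __name__ == "__main__":
    for scaffold, model in list_submissions():
        print(f"{scaffold}/{model}")
```

Scaffold directories that have no model subdirectories yet simply contribute no pairs, so a helper like this can run before every scaffold has results.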

🚀 Quick Start

For Contributors

Want to submit your benchmark results? Follow our submission guide →

For Developers & Maintainers

  1. Clone the repository:

    git clone https://github.com/your-org/swe-effi.git
    cd swe-effi
    
  2. Process benchmark data:

    # Process all new benchmark data
    ./scripts/update-website.sh --auto
    
    # Process specific scaffold/model
    ./scripts/update-website.sh agentless gpt-4
    
    # Validate files before processing
    ./scripts/update-website.sh --validate-only
    
  3. Run the website locally:

    cd website
    npm install
    npm run dev
    

🛠 Development Workflow

Processing New Submissions

When contributors submit benchmark results via PR:

  1. Review the Pull Request for correctness
  2. Validate locally (optional):
    git checkout [pr-branch]
    python3 scripts/transform-benchmark.py --validate-only
    
  3. Merge the PR
  4. Update the website:
    ./scripts/update-website.sh --auto
    

Script Reference

update-website.sh options:

  • SCAFFOLD MODEL (positional): Process a specific scaffold/model combination
  • --auto: Process all available data automatically
  • --validate-only: Only validate files, don't transform
  • --verbose: Show detailed logs
  • --help: Show help information

transform-benchmark.py options:

  • --scaffold NAME --model NAME: Process a specific scaffold/model combination
  • --validate-only: Only validate the file format
  • --auto: Validate and process all available data automatically
  • --verbose: Show detailed logs

🔧 Technical Requirements

Prerequisites

  • Python 3 for data processing
  • Node.js and npm for the website

Environment Setup

cd website && npm install

๐Ÿค Contributing

Benchmark Submission Data Flow

Contributor Results → PR Submission → Validation → Processing → Website Integration
  1. Results Collection: Contributors submit via GitHub PRs
  2. Validation: Automated checks ensure data quality
  3. Processing: Scripts transform data for website consumption
  4. Integration: Website automatically displays new results
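The processing step can be pictured as gathering every summary_stats.json into a single index that the website loads. The Python sketch below assumes only that each summary file holds a JSON object; the helper name build_index and the output shape (one record per scaffold/model, with the parsed summary nested under "summary") are hypothetical and are not the format transform-benchmark.py actually writes.

```python
import json
from pathlib import Path


def build_index(results_root: Path) -> list[dict]:
    """Hypothetical aggregation: one record per scaffold/model summary.

    Reads <results_root>/<scaffold>/<model>/summary_stats.json, in sorted
    path order, and tags each parsed object with its directory names.
    """
    records = []
    for summary in sorted(results_root.glob("*/*/summary_stats.json")):
        model_dir = summary.parent
        stats = json.loads(summary.read_text(encoding="utf-8"))
        if not isinstance(stats, dict):
            raise ValueError(f"{summary}: expected a JSON object")
        records.append({
            "scaffold": model_dir.parent.name,
            "model": model_dir.name,
            "summary": stats,  # nested so its keys cannot clash with the tags
        })
    return records
```

The resulting list could then be serialized with json.dumps and written under website/public/data/; the exact destination file is not specified in this README and is left open here.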

File Format

Each scaffold/model results directory must include:

  • combined_stats.json
  • summary_stats.json
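A minimal check in the spirit of --validate-only might confirm that a model directory contains both required files and that each one parses as JSON. This Python sketch rests on assumptions: the repository's actual validation rules are not documented here, so the helper name validate_model_dir and the requirement that each file hold a JSON object are hypothetical.

```python
import json
from pathlib import Path

# The two files every scaffold/model results directory must contain.
REQUIRED_FILES = ("combined_stats.json", "summary_stats.json")


def validate_model_dir(model_dir: Path) -> list[str]:
    """Return the problems found in one <scaffold>/<model> directory.

    An empty list means the directory passed these hypothetical checks:
    both required files exist and each contains a JSON object.
    """
    problems = []
    for name in REQUIRED_FILES:
        path = model_dir / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(data, dict):
            problems.append(f"{name}: expected a JSON object")
    return problems
```

A driver could call validate_model_dir for each scaffold/model directory in a PR and report any non-empty result before the merge.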

📄 License

Apache License 2.0