SWE-PolyBench Submission
April 25, 2025
This branch contains the logs, trajectories, and predictions of all leaderboard submissions. We follow a procedure similar to that of SWE-bench, with a few exceptions. To submit, please follow these steps:
- Fork the SWE-PolyBench repository.
- Clone the repository. Consider using git clone --depth 1 if cloning takes too long.
- Check out the submission branch using:
      git checkout submission
- Under the split you evaluated on (either evaluation/PB or evaluation/PB500), create a folder named with the submission date and the agent/model name, e.g. 20250402_sweagent_claude-sonnet37. PB is our full dataset and PB500 is our sampled dataset.
- Within the folder, please include the following files:
  - all_preds.jsonl: Model predictions
  - logs/: SWE-PolyBench evaluation artifacts
    - This means 2110 files for PB or 500 files for PB500. Each file is an instance_id_result.json file (e.g. microsoft__vscode-1234_result.json). This is the instance-level result file that is generated automatically once you run our evaluation code.
  - metadata.yaml: Metadata that controls how the result is shown on the website. Please include the following fields:
    - name: The name you want in the leaderboard entry
    - oss: true if your system is open-source
    - site: URL/link to more information about your system
    - pass_rate: The pass rate (resolved rate) you observed in your evaluation run (e.g. XX.XX% (123/500))
  - trajs/: Reasoning traces reflecting how your system solved each problem
    - Submit one reasoning trace per task instance. The reasoning trace should show all of the steps your system took while solving the task. If your system outputs thoughts or comments during operation, they should be included as well.
    - The reasoning trace can be in any text-based file format (e.g. md, json, yaml).
    - Ensure the task instance ID is in the name of the corresponding reasoning trace file.
  - README.md: Include anything you'd like to share about your model here!
- Create a pull request to the submission branch of SWE-PolyBench with the new folder.
      git add .
      git commit -m "your message"
      git push origin submission
  Please note that you need to select submission as the base branch; the compare branch will be your fork's submission branch.
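Before opening the pull request, it can help to check the folder layout locally. The following is a minimal sketch of such a check, not part of the SWE-PolyBench tooling: the function names, the folder-name pattern, and the problem messages are all hypothetical, and the expected file counts assume 2110 instances for PB and 500 for PB500.

```python
import re
from pathlib import Path

# Entries the submission instructions ask for inside the folder.
REQUIRED_FILES = ("all_preds.jsonl", "metadata.yaml", "README.md")
REQUIRED_DIRS = ("logs", "trajs")
# Assumed number of per-instance result files for each split.
EXPECTED_RESULTS = {"PB": 2110, "PB500": 500}
# Assumed naming pattern: 8-digit date, underscore, agent/model name.
FOLDER_NAME = re.compile(r"^\d{8}_.+$")
RESULT_SUFFIX = "_result.json"


def format_pass_rate(resolved: int, total: int) -> str:
    """Render a pass rate in the 'XX.XX% (123/500)' shape used in metadata.yaml."""
    return f"{100 * resolved / total:.2f}% ({resolved}/{total})"


def check_submission(folder: Path, split: str) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    problems = []
    if not FOLDER_NAME.match(folder.name):
        problems.append(f"folder name {folder.name!r} does not start with a date")
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing file: {name}")
    for name in REQUIRED_DIRS:
        if not (folder / name).is_dir():
            problems.append(f"missing directory: {name}/")

    logs, trajs = folder / "logs", folder / "trajs"
    results = sorted(logs.glob("*" + RESULT_SUFFIX)) if logs.is_dir() else []
    expected = EXPECTED_RESULTS[split]
    if len(results) != expected:
        problems.append(
            f"expected {expected} result files in logs/, found {len(results)}"
        )

    # Each instance ID should appear in the name of some trajectory file.
    # This is a plain substring match, so it is a loose check only.
    traj_names = [p.name for p in trajs.iterdir()] if trajs.is_dir() else []
    for result in results:
        instance_id = result.name[: -len(RESULT_SUFFIX)]
        if not any(instance_id in name for name in traj_names):
            problems.append(f"no trajectory file for {instance_id}")
    return problems
```

For example, calling check_submission(Path("evaluation/PB500/20250402_sweagent_claude-sonnet37"), "PB500") from the repository root returns an empty list when every required entry is present, and format_pass_rate(123, 500) produces the string to put in the pass_rate field.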
Contact
Questions? Please create an issue.
Citation
If you found this repository helpful or are citing the numbers on the leaderboard for academic purposes, please cite:
@misc{rashid2025swepolybenchmultilanguagebenchmarkrepository,
title={SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents},
author={Muhammad Shihab Rashid and Christian Bock and Yuan Zhuang and Alexander Buchholz and Tim Esler and Simon Valentin and Luca Franceschi and Martin Wistuba and Prabhu Teja Sivaprasad and Woo Jung Kim and Anoop Deoras and Giovanni Zappella and Laurent Callot},
year={2025},
eprint={2504.08703},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2504.08703},
}