Immersion in the GitHub Universe: Scaling Coding Agents to Mastery

March 5, 2026


🔥 Highlights

  • Sourced from 6M+ pull requests across 23,000+ repositories.
  • Covers 5,200 repositories.
  • 100k high-quality instances.
  • 71k trajectories distilled from DeepSeek v3.2, totaling 3.5B tokens.
  • Strong performance: 64% on SWE-bench Verified with a model fine-tuned from Qwen3-30B-A3B-Instruct.
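For scale, the highlight figures imply roughly the following averages. This is back-of-the-envelope arithmetic on the rounded numbers above (assuming all 100k instances come from the 5,200 covered repositories), not figures reported by the authors:

```latex
\frac{3.5 \times 10^{9}\ \text{tokens}}{71 \times 10^{3}\ \text{trajectories}} \approx 4.9 \times 10^{4}\ \text{tokens per trajectory},
\qquad
\frac{100 \times 10^{3}\ \text{instances}}{5200\ \text{repositories}} \approx 19\ \text{instances per repository}.
```

On average, then, a distilled trajectory fits comfortably within the 256k maximum sequence length listed for Scale-SWE-Agent.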

📣 News

  • 2026-03-03 We released AweAgent, which now provides native support for Scale-SWE data.
  • 2026-02-26 🚀 We released a portion of our data on Hugging Face. This release includes 20,000 SWE task instances (currently the largest real, executable open-source SWE dataset available) alongside 71k distillation trajectories (3.5B tokens) from DeepSeek v3.2. More data will be released in the future.
  • 2026-02-10 ๐Ÿ“ Our paper "Immersion in the GitHub Universe: Scaling Coding Agents to Mastery" is now available on arXiv.

FAQ

  • For evaluation of Scale-SWE-Data, you can use AweAgent and refer to this evaluation script.

📊 Data Format

| Field | Description |
| --- | --- |
| instance_id | A unique identifier formatted as {user}_{repo}_pr{id}. |
| user | The owner of the GitHub repository. |
| repo | The name of the GitHub repository. |
| language | The programming language of the codebase (currently Python). |
| workdir | The working directory path within the environment. |
| image_url | The URL of the pre-built Docker image for the task. |
| patch | The ground-truth patch (golden patch) from the corresponding pull request. |
| pr_commit | The commit hash of the pull request. |
| parent_commit | The commit hash of the parent commit (the base state before the fix). |
| problem_statement | The issue description conveying the bug, provided to the model as input. |
| f2p_patch | The developer-written test patch containing tests that fail before the fix (if available). For evaluation, apply this patch. See this script. |
| f2p_script | The synthetic reproduction script generated by our unit-test creator agent. Because many high-quality pull requests lack author-written F2P (fail-to-pass) tests, we synthesize them. Before evaluation, save this script as test_fail_to_pass.py directly under the repository directory. See this script. |
| FAIL_TO_PASS | Unit tests that fail on the buggy version but pass after the fix. |
| PASS_TO_PASS | Unit tests that pass in both versions (regression tests). |
| github_url | The URL of the original GitHub repository. |
| pre_commands | Commands that must be executed immediately upon entering the container to check out the correct commit. |
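The fields above fit together into a per-instance evaluation flow. The sketch below is a minimal illustration under stated assumptions, not the official harness (for that, use AweAgent and the evaluation script referenced in the FAQ). The helper names `make_instance_id`, `parse_test_list`, `write_f2p_script`, and `is_resolved` are hypothetical. It assumes `FAIL_TO_PASS` and `PASS_TO_PASS` hold test identifiers, stored either as lists or as JSON-encoded strings (as in SWE-bench), and that test outcomes have already been collected into a dict mapping test ID to pass/fail. The resolution rule follows the usual SWE-bench convention: every FAIL_TO_PASS test must pass after the fix, and no PASS_TO_PASS test may regress.

```python
import json
from pathlib import Path
from typing import Mapping, Sequence, Union

# Hypothetical helpers illustrating how Scale-SWE fields fit together.
# This is NOT the official AweAgent evaluation harness.

TestList = Union[str, Sequence[str]]


def make_instance_id(user: str, repo: str, pr_number: int) -> str:
    """Build an ID in the documented {user}_{repo}_pr{id} format."""
    return f"{user}_{repo}_pr{pr_number}"


def parse_test_list(field: TestList) -> list[str]:
    """Accept a list of test IDs or a JSON-encoded list (storage format is an assumption)."""
    if isinstance(field, str):
        return list(json.loads(field))
    return list(field)


def write_f2p_script(repo_dir: Path, f2p_script: str) -> Path:
    """Save the synthetic reproduction script as <repo_dir>/test_fail_to_pass.py."""
    target = repo_dir / "test_fail_to_pass.py"
    target.write_text(f2p_script, encoding="utf-8")
    return target


def is_resolved(
    fail_to_pass: TestList,
    pass_to_pass: TestList,
    results: Mapping[str, bool],
) -> bool:
    """SWE-bench-style rule: all F2P tests now pass and no P2P test regresses.

    `results` maps test ID -> True if the test passed after applying the
    candidate patch; a test missing from `results` counts as failed.
    """
    required = parse_test_list(fail_to_pass) + parse_test_list(pass_to_pass)
    return all(results.get(test_id, False) for test_id in required)
```

For example, `is_resolved('["t_fix"]', ["t_old"], {"t_fix": True, "t_old": True})` returns `True`, while the same call with `"t_old": False` returns `False` because a regression test broke.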

🤖 Results

We fine-tuned Qwen3-30B-A3B-Instruct on our synthesized trajectories.

Scale-SWE-Agent

Please use AweAgent to run inference with Scale-SWE-Agent. The Scale-SWE-Agent model weights are available on Hugging Face. The key inference parameters are listed below:

| Parameter | Value |
| --- | --- |
| Max turns | 200 |
| Max sequence length | 256k |
| Temperature | 1 |
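To illustrate how these settings bound a single rollout, the sketch below stores them in a config object and runs a generic turn-limited agent loop. `InferenceConfig`, `StepFn`, and `run_episode` are hypothetical names and do not reflect AweAgent's actual configuration keys or API. It also assumes "256k" means 256 × 1024 = 262,144 tokens, which may differ from the exact limit used.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the key parameters listed in the table."""
    max_turns: int = 200
    max_seq_len: int = 256 * 1024  # assumption: "256k" = 262,144 tokens
    temperature: float = 1.0       # 1.0 = sample from the unscaled softmax


# Hypothetical step function: given the 1-based turn index, run one agent turn
# and return (finished, total tokens in the context so far).
StepFn = Callable[[int], Tuple[bool, int]]


def run_episode(step: StepFn, cfg: InferenceConfig) -> Tuple[int, str]:
    """Run turns until the agent finishes or a budget is exhausted.

    Returns the number of turns executed and the stop reason.
    """
    for turn in range(1, cfg.max_turns + 1):
        finished, tokens_used = step(turn)
        if finished:
            return turn, "submitted"
        if tokens_used >= cfg.max_seq_len:
            return turn, "context_exhausted"
    return cfg.max_turns, "max_turns"
```

Checking both the turn budget and the context budget on every turn means a rollout stops at whichever limit it hits first.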

📖 Citation

If you find this project useful for your research, please consider citing our paper:

@misc{zhao2026immersiongithubuniversescaling,
      title={Immersion in the GitHub Universe: Scaling Coding Agents to Mastery}, 
      author={Jiale Zhao and Guoxin Chen and Fanzhe Meng and Minghao Li and Jie Chen and Hui Xu and Yongshuai Sun and Xin Zhao and Ruihua Song and Yuan Zhang and Peng Wang and Cheng Chen and Jirong Wen and Kai Jia},
      year={2026},
      eprint={2602.09892},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2602.09892}, 
}

📄 License

This project is licensed under the CC BY 4.0 License - see the LICENSE file for details.