AIDev: Studying AI Coding Agents on GitHub (The Rise of AI Teammates in Software Engineering 3.0)
November 5, 2025
📢 We're hosting the MSR 2026 Mining Challenge (co-located with ICSE 2026 in Rio de Janeiro, Brazil). Details and submissions:
⚠️⚠️⚠️ https://2026.msrconf.org/track/msr-2026-mining-challenge ⚠️⚠️⚠️
- Paper: https://arxiv.org/abs/2507.15003
- HuggingFace Dataset: https://huggingface.co/datasets/hao-li/AIDev
- Example Notebooks:
| Description | Notebook |
|---|---|
| Basic usage | load_AIDev.ipynb |
| Dataset overview | dataset_overview.ipynb |
| Analysis of programming language usage | language_usage.ipynb |
| PR merge rate and turnaround time | productivity.ipynb |
⚠️ Update (Aug 10, 2025): The dataset has been refreshed with data up to August 1, 2025, so it reflects the most recent trends in coding agents.
This repository contains the replication package for the paper "The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering". Because of GitHub's repository size limits, the full dataset is not included here; it is available on HuggingFace: https://huggingface.co/datasets/hao-li/AIDev
The raw data for AIDev-pop is available here: https://drive.google.com/file/d/1l0_RjS7ZT0Y27V3mv0oJK-jfeRkhq5l5/view?usp=drive_link
Overview
An overview of the AIDev dataset:
| Agent | #PR | #Developer | #Repo |
|---|---|---|---|
| OpenAI Codex | 814,522 | 61,653 | 84,704 |
| Devin | 29,744 | NA | 4,747 |
| GitHub Copilot | 50,447 | NA | 14,492 |
| Cursor | 32,941 | 9,658 | 12,699 |
| Claude Code | 5,137 | 1,643 | 1,915 |
| Total | 932,791 | 72,189 | 116,211 |
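As a quick sanity check on the table, the plain-Python sketch below derives each agent's share of all PRs and its average number of PRs per repository. It uses only the numbers shown in the table; the `AIDEV` dictionary and the helper names are illustrative and not part of this repository.

```python
# Counts copied from the AIDev overview table: (PRs, developers, repositories).
# None stands for the "NA" developer counts reported for Devin and GitHub Copilot.
AIDEV = {
    "OpenAI Codex": (814_522, 61_653, 84_704),
    "Devin": (29_744, None, 4_747),
    "GitHub Copilot": (50_447, None, 14_492),
    "Cursor": (32_941, 9_658, 12_699),
    "Claude Code": (5_137, 1_643, 1_915),
}
TOTAL_PRS, TOTAL_REPOS = 932_791, 116_211


def pr_share(table, total_prs):
    """Fraction of all agent-authored PRs contributed by each agent."""
    return {agent: prs / total_prs for agent, (prs, _, _) in table.items()}


def prs_per_repo(table):
    """Average number of PRs per repository, computed separately for each agent."""
    return {agent: prs / repos for agent, (prs, _, repos) in table.items()}


if __name__ == "__main__":
    # The per-agent PR counts add up exactly to the reported total.
    assert sum(prs for prs, _, _ in AIDEV.values()) == TOTAL_PRS
    shares, density = pr_share(AIDEV, TOTAL_PRS), prs_per_repo(AIDEV)
    for agent in AIDEV:
        print(f"{agent:15s} {shares[agent]:6.1%} of PRs, {density[agent]:5.1f} PRs/repo")
    # Repository counts sum past the Total row.
    # Assumption: the Total row counts unique repositories across agents.
    print(sum(repos for _, _, repos in AIDEV.values()), "vs", TOTAL_REPOS)
```

The per-agent PR counts add up exactly to the 932,791 total, whereas the per-agent repository counts sum to 118,557, more than the 116,211 in the Total row. A plausible reading (an assumption, not stated in this README) is that the Total row counts unique repositories, some of which received PRs from more than one agent; the same would explain the developer column.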

Repository Structure
├── AIDev-pop/ # AIDev-pop subset of AIDev
├── analysis/ # Analysis scripts and Jupyter notebooks
├── figs/ # Generated figures and results
├── requirements.txt # Python dependencies
└── README.md # This file
Installation
Install the required dependencies:
pip install -r requirements.txt
Key Findings
The key findings below are based on AIDev-pop, a subset of the AIDev dataset.
AIDev-pop: filtered to repositories with more than 100 stars
| Agent | #PR | #Developer | #Repo |
|---|---|---|---|
| OpenAI Codex | 21,799 | 1,284 | 1,248 |
| Devin | 4,827 | NA | 288 |
| GitHub Copilot | 4,970 | NA | 1,012 |
| Cursor | 1,541 | 363 | 327 |
| Claude Code | 459 | 236 | 213 |
| Total | 33,596 | 1,796 | 2,807 |
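AIDev-pop is filtered to repositories with more than 100 stars. The sketch below applies that threshold to a few made-up records and then uses the AIDev and AIDev-pop tables to compute how much of each agent's activity survives the filter. The `PullRequest` class, its `repo_stars` field, and the `example/...` repository names are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass

STAR_THRESHOLD = 100  # AIDev-pop criterion: more than 100 stars


@dataclass
class PullRequest:
    # Hypothetical minimal record; the real AIDev schema may name fields differently.
    agent: str
    repo: str
    repo_stars: int


def popular_subset(prs):
    """Keep PRs whose repository has strictly more than STAR_THRESHOLD stars."""
    return [pr for pr in prs if pr.repo_stars > STAR_THRESHOLD]


# PR counts per agent as (AIDev, AIDev-pop), copied from the two tables.
PR_COUNTS = {
    "OpenAI Codex": (814_522, 21_799),
    "Devin": (29_744, 4_827),
    "GitHub Copilot": (50_447, 4_970),
    "Cursor": (32_941, 1_541),
    "Claude Code": (5_137, 459),
}


def pop_coverage(counts):
    """Fraction of each agent's PRs that fall into AIDev-pop."""
    return {agent: pop / full for agent, (full, pop) in counts.items()}


if __name__ == "__main__":
    toy = [
        PullRequest("Cursor", "example/a", 100),  # exactly 100 stars: dropped
        PullRequest("Devin", "example/b", 101),  # kept
        PullRequest("Claude Code", "example/c", 5000),  # kept
    ]
    print([pr.repo for pr in popular_subset(toy)])  # ['example/b', 'example/c']
    for agent, frac in pop_coverage(PR_COUNTS).items():
        print(f"{agent:15s} {frac:6.1%} of its PRs are in AIDev-pop")
```

Because the comparison is strict, a repository with exactly 100 stars is excluded. With the published counts, Devin keeps the largest fraction of its PRs (about 16%) and OpenAI Codex the smallest (under 3%), so the agent mix in AIDev-pop differs noticeably from the full dataset.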

Productivity in the Coding Agents Era

Turnaround Time
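The productivity analysis looks at PR merge rate and turnaround time. The sketch below shows one common way to compute both from PR timestamps, assuming merge rate = merged PRs / closed PRs and turnaround = time from PR creation to merge. The `PR` record and its field names are hypothetical, and `productivity.ipynb` may use different definitions or columns.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import NamedTuple, Optional


class PR(NamedTuple):
    # Hypothetical fields; the actual AIDev column names may differ.
    agent: str
    created_at: datetime
    closed_at: Optional[datetime]  # None while the PR is still open
    merged_at: Optional[datetime]  # None unless the PR was merged


def merge_rate(prs):
    """Assumed definition: merged PRs / closed PRs (still-open PRs are ignored)."""
    closed = [p for p in prs if p.closed_at is not None]
    if not closed:
        return float("nan")
    return sum(p.merged_at is not None for p in closed) / len(closed)


def median_turnaround_hours(prs):
    """Assumed definition: median hours from creation to merge, merged PRs only."""
    hours = [
        (p.merged_at - p.created_at).total_seconds() / 3600
        for p in prs
        if p.merged_at is not None
    ]
    return median(hours) if hours else float("nan")


if __name__ == "__main__":
    t0 = datetime(2025, 6, 1, 9, 0)
    toy = [
        PR("Cursor", t0, t0 + timedelta(hours=2), t0 + timedelta(hours=2)),
        PR("Cursor", t0, t0 + timedelta(hours=5), None),  # closed, not merged
        PR("Cursor", t0, None, None),  # still open
        PR("Cursor", t0, t0 + timedelta(hours=10), t0 + timedelta(hours=10)),
    ]
    print(merge_rate(toy))  # 2 merged out of 3 closed
    print(median_turnaround_hours(toy))  # 6.0 (median of 2 and 10 hours)
```

Comparing agents only requires grouping the records by `agent` before calling the two functions. Using the median rather than the mean keeps a few long-lived PRs from dominating the turnaround figure.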

Language Usage

Autonomous coding agents exhibit distinct language preferences that reflect specialization in their capabilities. TypeScript is the most common language across all agents, underscoring its popularity in AI-assisted development. Notable divergences remain, however: OpenAI Codex is strongly skewed toward Python, while GitHub Copilot heavily favours C#, likely reflecting their respective integrations and user bases.
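One way to produce a per-agent language breakdown like this is to map each changed file's extension to a language and count per agent. The sketch below does that for `(agent, file_path)` pairs; the extension map is deliberately small and, like the helper names, is an assumption rather than the classification used in `language_usage.ipynb`.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Illustrative, deliberately incomplete extension-to-language map (an assumption;
# the original analysis may classify languages differently).
EXT_TO_LANG = {
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".py": "Python",
    ".cs": "C#",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".go": "Go", ".rs": "Rust", ".java": "Java",
}


def language_of(path):
    """Return the language for a file path, or None if the extension is unmapped."""
    return EXT_TO_LANG.get(PurePosixPath(path).suffix.lower())


def language_share(changed_files):
    """changed_files: iterable of (agent, file_path) pairs.

    Returns {agent: {language: fraction of that agent's recognised files}}.
    Files with unmapped extensions (e.g. README.md) are skipped.
    """
    counts = defaultdict(Counter)
    for agent, path in changed_files:
        lang = language_of(path)
        if lang is not None:
            counts[agent][lang] += 1
    return {
        agent: {lang: n / sum(c.values()) for lang, n in c.items()}
        for agent, c in counts.items()
    }


if __name__ == "__main__":
    toy = [
        ("OpenAI Codex", "app/main.py"),
        ("OpenAI Codex", "web/index.ts"),
        ("OpenAI Codex", "lib/util.py"),
        ("GitHub Copilot", "src/Program.cs"),
        ("GitHub Copilot", "README.md"),  # unmapped, ignored
    ]
    print(language_share(toy))
```

Counting files weights large multi-file PRs more heavily than small ones; counting each PR once per language it touches is an equally reasonable alternative, and the two can rank languages differently.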
Dataset Schema

Citation
If you use this dataset or code in your research, please cite our paper:
@misc{li2025aiteammates,
title={The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering},
author={Hao Li and Haoxiang Zhang and Ahmed E. Hassan},
year={2025},
eprint={2507.15003},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2507.15003},
}