AIDev: Studying AI Coding Agents on GitHub (The Rise of AI Teammates in Software Engineering 3.0)

November 5, 2025

📢 We're hosting the MSR 2026 Mining Challenge (co-located with ICSE 2026 in Rio de Janeiro, Brazil). Details and submissions:
⚠️⚠️⚠️ https://2026.msrconf.org/track/msr-2026-mining-challenge ⚠️⚠️⚠️

| Description | Notebook Link | Open in Colab |
| --- | --- | --- |
| Basic usage | load_AIDev.ipynb | Open In Colab |
| Dataset overview | dataset_overview.ipynb | Open In Colab |
| Analysis of programming language usage | language_usage.ipynb | Open In Colab |
| PR merge rate and turnaround time | productivity.ipynb | Open In Colab |

⚠️ Update (Aug 10, 2025): The dataset has been refreshed to include data up to August 1, 2025, so that it reflects the most recent trends in coding agents.

This repository contains the replication package for the paper "The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping SE". Due to the size limit of GitHub repositories, the full dataset is not included here. You can find our full dataset on HuggingFace: https://huggingface.co/datasets/hao-li/AIDev

If you're interested in the raw data of AIDev-pop, you can find them here: https://drive.google.com/file/d/1l0_RjS7ZT0Y27V3mv0oJK-jfeRkhq5l5/view?usp=drive_link
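As a minimal sketch of reading one table from the Hugging Face dataset with pandas: the repository ID comes from the URL above, but the file layout (one `<table>.parquet` file per table at the repository root) and the table name `pull_request` are assumptions made for illustration, so check the dataset card for the actual file names. Reading `hf://` paths with pandas also requires the `huggingface_hub` package.

```python
DATASET_REPO = "hao-li/AIDev"  # repository ID taken from the Hugging Face URL


def parquet_uri(table: str, repo: str = DATASET_REPO) -> str:
    """Build an fsspec-style URI for one Parquet table of the dataset.

    ASSUMPTION: each table is stored as `<table>.parquet` at the repo root.
    """
    return f"hf://datasets/{repo}/{table}.parquet"


def load_table(table: str):
    """Download one table into a pandas DataFrame (needs network access)."""
    # Imported lazily so that parquet_uri() works without pandas installed.
    import pandas as pd

    return pd.read_parquet(parquet_uri(table))
```

Under these assumptions, `load_table("pull_request")` would return the pull request table as a DataFrame; the notebook `load_AIDev.ipynb` listed above is the authoritative usage example.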

Overview

The overview of the AIDev dataset is as follows:

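| Agent | #PR | #Developer | #Repo |
| --- | --- | --- | --- |
| OpenAI Codex | 814,522 | 61,653 | 84,704 |
| Devin | 29,744 | NA | 4,747 |
| GitHub Copilot | 50,447 | NA | 14,492 |
| Cursor | 32,941 | 9,658 | 12,699 |
| Claude Code | 5,137 | 1,643 | 1,915 |
| Total | 932,791 | 72,189 | 116,211 |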
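The arithmetic behind the overview table can be checked with a short sketch. The counts are copied from the table; the variable names are illustrative. The per-agent PR counts add up to the Total row, whereas the per-agent repository counts add up to more than the reported total. One plausible reading, which is an assumption and not stated here, is that a repository used by several agents is counted once in the total.

```python
# Counts copied from the overview table.
pr_counts = {
    "OpenAI Codex": 814_522,
    "Devin": 29_744,
    "GitHub Copilot": 50_447,
    "Cursor": 32_941,
    "Claude Code": 5_137,
}
repo_counts = {
    "OpenAI Codex": 84_704,
    "Devin": 4_747,
    "GitHub Copilot": 14_492,
    "Cursor": 12_699,
    "Claude Code": 1_915,
}

total_prs = sum(pr_counts.values())  # 932,791, same as the Total row

# Share of all PRs attributed to each agent, in percent.
pr_share = {agent: 100 * n / total_prs for agent, n in pr_counts.items()}

for agent, share in sorted(pr_share.items(), key=lambda kv: -kv[1]):
    print(f"{agent:<15} {share:5.1f}%")

# The column sum exceeds the reported total of 116,211 repositories.
repo_column_sum = sum(repo_counts.values())
print("Sum of per-agent repo counts:", repo_column_sum)
```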

Repository Structure

├── AIDev-pop/             # AIDev-pop subset of AIDev
├── analysis/              # Analysis scripts and Jupyter notebooks
├── figs/                  # Generated figures and results
├── requirements.txt       # Python dependencies
└── README.md              # This file

Installation

Install required dependencies:

pip install -r requirements.txt

Key Findings

The key findings are based on an analysis of AIDev-pop, a subset of the AIDev dataset.

AIDev-pop: Filtered (>100 stars)

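| Agent | #PR | #Developer | #Repo |
| --- | --- | --- | --- |
| OpenAI Codex | 21,799 | 1,284 | 1,248 |
| Devin | 4,827 | NA | 288 |
| GitHub Copilot | 4,970 | NA | 1,012 |
| Cursor | 1,541 | 363 | 327 |
| Claude Code | 459 | 236 | 213 |
| Total | 33,596 | 1,796 | 2,807 |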
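The star-based filter behind AIDev-pop can be sketched as follows. This is an illustration of the ">100 stars" criterion only, not the authors' actual pipeline; the field names `full_name` and `stars` and the sample records are hypothetical.

```python
from typing import TypedDict


class Repo(TypedDict):
    full_name: str  # hypothetical field name
    stars: int      # hypothetical field name


def filter_popular(repos: list[Repo], min_stars: int = 100) -> list[Repo]:
    """Keep repositories with strictly more than `min_stars` stars."""
    return [r for r in repos if r["stars"] > min_stars]


# Made-up records for illustration only; not taken from the dataset.
repos: list[Repo] = [
    {"full_name": "example/alpha", "stars": 2500},
    {"full_name": "example/beta", "stars": 100},   # exactly 100: excluded
    {"full_name": "example/gamma", "stars": 101},
]

popular = filter_popular(repos)
popular_names = [r["full_name"] for r in popular]
print(popular_names)
```

The comparison is strict, so a repository with exactly 100 stars is excluded, matching the ">100" wording above.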

Productivity in the Coding Agents Era

[Figure: pr_merge_compare_radar2.png]

Turnaround Time

Language Usage

Autonomous Coding Agents exhibit distinct language preferences reflecting domain specialization in their capabilities. TypeScript is the most common language across all agents, underscoring its popularity in AI-assisted development. However, notable divergences emerge: OpenAI Codex shows a pronounced skew toward Python, while GitHub Copilot heavily favours C#, likely reflecting their respective integrations and user bases.
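A per-agent language distribution of this kind can be computed with a simple group-and-count. The sketch below uses made-up `(agent, language)` pairs, one per PR, purely for illustration; the real analysis lives in `language_usage.ipynb`, and the function name and input shape here are assumptions.

```python
from collections import Counter, defaultdict

# Made-up (agent, primary language) pairs, one per PR; illustrative only.
prs = [
    ("OpenAI Codex", "Python"),
    ("OpenAI Codex", "Python"),
    ("OpenAI Codex", "TypeScript"),
    ("GitHub Copilot", "C#"),
    ("GitHub Copilot", "C#"),
    ("GitHub Copilot", "TypeScript"),
    ("Cursor", "TypeScript"),
]


def language_shares(rows):
    """Return {agent: {language: fraction of that agent's PRs}}."""
    counts = defaultdict(Counter)
    for agent, language in rows:
        counts[agent][language] += 1
    return {
        agent: {lang: n / sum(c.values()) for lang, n in c.items()}
        for agent, c in counts.items()
    }


shares = language_shares(prs)
for agent, dist in shares.items():
    # Report the most frequent language for each agent.
    top_lang, top_share = max(dist.items(), key=lambda kv: kv[1])
    print(f"{agent}: {top_lang} ({top_share:.0%})")
```

Normalizing within each agent, not across the whole dataset, keeps the comparison meaningful even though the agents differ in PR volume by orders of magnitude.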

Dataset Schema

Citation

If you use this dataset or code in your research, please cite our paper:

@misc{li2025aiteammates,
      title={The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering}, 
      author={Hao Li and Haoxiang Zhang and Ahmed E. Hassan},
      year={2025},
      eprint={2507.15003},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2507.15003}, 
}