SWE-QA
April 7, 2026
SWE-QA is a benchmark for repository-level code question answering. This repository hosts the benchmark data (question-answer pairs tied to pinned commits) and code to construct the benchmark, clone evaluation repositories, and run baselines and agents.
It covers the original SWE-QA v1 release (12 popular Python projects such as Django and Flask) and the complementary SWE-QA v2, which adds conan, streamlink, and reflex.
Our paper "SWE-QA: Can Language Models Answer Repository-level Code Questions?" has been accepted to ACL 2026 Findings.
Paper
For more details about the methodology and results, please refer to the paper:
- Paper: "SWE-QA: Can Language Models Answer Repository-level Code Questions?" [arXiv]
Dataset
The benchmark dataset is available on Hugging Face:
- Dataset: [Hugging Face]
Benchmark Construction Workflow
The following diagram illustrates the workflow for constructing the SWE-QA benchmark:

Benchmark Example
The following example shows the structure and format of questions in the benchmark:

Repository Structure
SWE-QA-Bench/                      # Repository root
├── Benchmark/                     # Released benchmark (JSONL per project)
│   └── *.jsonl                    # e.g. astropy.jsonl, django.jsonl, ...
├── Benchmark construction/        # Build and score the benchmark
│   ├── issue_analyzer/            # GitHub issue to question drafts
│   ├── qa_generator/
│   ├── repo_parser/
│   ├── score/                     # e.g. llm-as-a-judge.py
│   └── models/
├── Experiment/
│   ├── ErrorAnalysis/             # e.g. error_analysis.jsonl
│   └── Script/                    # Eval methods and agent runners
│       ├── llm_direct/
│       ├── rag_function_chunk/
│       ├── rag_sliding_window/
│       ├── SWE-agent_QA/
│       ├── OpenHands_QA/
│       └── Cursor-Agent_QA/
├── assets/                        # README figures
├── clone_repos.sh
├── repo_commit.txt                # URLs + commits for clone_repos.sh
├── pyproject.toml                 # Dependencies (uv)
├── uv.lock
├── Dockerfile
├── LICENSE
└── README.md
After running ./clone_repos.sh, evaluated repositories are checked out under datas/repos/ (not committed to git).
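As a minimal sketch of reading the released data, the snippet below counts the records in each per-project JSONL file under Benchmark/. It assumes only that every non-empty line is one JSON object. The helper name `count_records` is hypothetical, and no field names are assumed because the record schema is not described in this README.

```python
import json
from pathlib import Path


def count_records(benchmark_dir):
    """Return {project_name: number_of_records} for every *.jsonl file in benchmark_dir."""
    counts = {}
    for path in sorted(Path(benchmark_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            # One JSON object per non-empty line; json.loads raises on a malformed line.
            records = [json.loads(line) for line in handle if line.strip()]
        counts[path.stem] = len(records)
    return counts


if __name__ == "__main__":
    # "Benchmark" is the directory name from the repository tree; run from the repo root.
    for project, n in count_records("Benchmark").items():
        print(f"{project}: {n} records")
```

Inspecting the keys of the first record in a file (for example with `json.loads` on its first line) is a reasonable next step before relying on any particular field.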
Environment Setup
Prerequisites
- Python 3.12
- uv package manager
- OpenAI API access (required for all evaluation methods)
- Voyage AI API access (required for RAG-based methods)
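A small pre-flight check can catch missing credentials before a long evaluation run. The sketch below assumes the keys are supplied through the `OPENAI_API_KEY` and `VOYAGE_API_KEY` environment variables, which are the defaults read by the official OpenAI and Voyage AI Python clients. The scripts in this repository may be configured differently, and `missing_keys` is a hypothetical helper.

```python
import os

# Assumed variable names: defaults of the official OpenAI and Voyage AI Python
# clients. This repository's scripts may read their keys from elsewhere.
REQUIRED_KEYS = {
    "OPENAI_API_KEY": "all evaluation methods",
    "VOYAGE_API_KEY": "RAG-based methods",
}


def missing_keys(env=None):
    """Return the names of required API-key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    for name in missing_keys():
        print(f"{name} is not set (needed for {REQUIRED_KEYS[name]})")
```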
Installation
Install dependencies:
uv sync
To run the evaluation methods, also install the baseline extra:
uv sync --extra baseline
SWE Repository Prerequisites:
# Use the provided script to clone all repositories at specific commits
./clone_repos.sh
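For readers who want to see what pinning a repository to a commit involves, here is a rough Python sketch of the idea. It is not the contents of clone_repos.sh. It assumes each line of repo_commit.txt holds a git URL followed by a commit hash separated by whitespace, which may not match the real file format. The function names `parse_repo_commits` and `clone_commands` are hypothetical.

```python
from pathlib import Path


def parse_repo_commits(text):
    """Parse lines of the assumed form '<git-url> <commit>' into (url, commit) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        url, commit = line.split()[:2]
        pairs.append((url, commit))
    return pairs


def clone_commands(url, commit, root="datas/repos"):
    """Build the git commands that would check out `commit` of `url` under `root`."""
    # Derive the directory name from the last URL segment, dropping a ".git" suffix.
    name = url.rstrip("/").removesuffix(".git").rsplit("/", 1)[-1]
    target = (Path(root) / name).as_posix()
    return [
        ["git", "clone", url, target],
        ["git", "-C", target, "checkout", commit],
    ]


if __name__ == "__main__":
    # Placeholder commit for illustration only; it is not a real pinned commit.
    sample = "https://github.com/pallets/flask.git 0123abc\n"
    for url, commit in parse_repo_commits(sample):
        for cmd in clone_commands(url, commit):
            print(" ".join(cmd))
```

The sketch only prints the commands. Passing each list to `subprocess.run(cmd, check=True)` would execute them.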
References
If you use SWE-QA in your work, please cite:
@article{peng2025swe,
title={Swe-qa: Can language models answer repository-level code questions?},
author={Peng, Weihan and Shi, Yuling and Wang, Yuhang and Zhang, Xinyun and Shen, Beijun and Gu, Xiaodong},
journal={arXiv preprint arXiv:2509.14635},
year={2025}
}
Related resources
For a curated list of papers and resources on repository-level code generation, issue resolution, and related topics (including repo-level code QA), see Awesome Repository-Level Code Generation.