RepoRepair

December 21, 2025

RepoRepair is a documentation-driven approach to repository-level automated program repair. It uses hierarchically generated code documentation to localize faults precisely and to generate patches cost-effectively across programming languages. Evaluated on SWE-bench Lite and SWE-bench Multimodal, it achieves state-of-the-art repair rates (45.67% on Lite, 37.13% on Multimodal) at an average cost of $0.44 and $0.56 per repair, respectively.


Key Features

  • 📚 Documentation-Aware:
    Uses LLM-generated code documentation for cross-file context understanding.
  • 🌐 Language-Agnostic Design:
    Supports JavaScript/TypeScript and Python repositories through AST-based parsing and generalized documentation generation.
  • 💰 Cost Efficiency:
    Achieves average repair costs of $0.44 on SWE-bench Lite and $0.56 on SWE-bench Multimodal.

Performance Highlights

| Metric | RepoRepair | Agentless Lite* | Difference |
|---|---|---|---|
| Lite Results | | | |
| % Resolved | 45.67% | 32.33% | +13.34 pp |
| Avg. Cost/Repair | $0.44 | $0.21 | +$0.23 |
| Multimodal Results | | | |
| % Resolved | 37.13% | 25.34% | +11.79 pp |
| Avg. Cost/Repair | $0.56 | $0.38 | +$0.18 |

*Agentless Lite uses different model configurations across benchmarks.
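
The last column of the table can be reproduced from the reported figures. The check below uses only the numbers listed above; the dictionary layout and the `differences` helper are illustrative names, not part of the repository.

```python
# Reported figures from the table, as (RepoRepair, Agentless Lite) pairs.
results = {
    "Lite": {"resolved_pct": (45.67, 32.33), "avg_cost_usd": (0.44, 0.21)},
    "Multimodal": {"resolved_pct": (37.13, 25.34), "avg_cost_usd": (0.56, 0.38)},
}


def differences(table):
    """Return RepoRepair minus Agentless Lite for every metric, rounded to 2 places."""
    return {
        bench: {
            metric: round(ours - baseline, 2)
            for metric, (ours, baseline) in metrics.items()
        }
        for bench, metrics in table.items()
    }


for bench, diff in differences(results).items():
    print(bench, diff)
```

The resolution-rate differences are percentage points. The cost differences are positive, so the higher repair rate comes with a higher average cost per repair than the baseline.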

Installation

git clone https://github.com/ZhongQiangDev/RepoRepair.git
cd RepoRepair
pip install -r requirements.txt  # Requires Python 3.9+

Usage

1. Resource Download

# Download issues and repositories
python issue_diff_download.py

python issue_repo_download.py
python unzip.py
  • Repositories are fetched from GitHub as compressed archives using Selenium.

2. Repository Parsing

# Parse code and analyze dependencies
python CodeParser.py  # Uses Tree-sitter for PY/JS/TS parsing
python DependencyGraph.py
python generate_doc_meta.py  # Output: repo_doc_meta/
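
To give a rough idea of what this step produces, here is a minimal sketch of parsing files and deriving a file-level dependency graph. It is an illustration under stated assumptions, not the repository's code: it swaps in Python's standard `ast` module for Tree-sitter, handles Python files only, and the names `parse_source` and `build_dependency_graph` as well as the output layout are hypothetical.

```python
import ast


def parse_source(path, source):
    """Collect every function/class definition and imported module in one Python file."""
    tree = ast.parse(source, filename=path)
    elements, imports = [], set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            elements.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
    return {"path": path, "elements": elements, "imports": sorted(imports)}


def build_dependency_graph(parsed_files):
    """Map each file to the repository files it imports (absolute imports only)."""
    # Assumption: module names mirror file paths, e.g. pkg/util.py -> pkg.util
    module_to_path = {
        p["path"].removesuffix(".py").replace("/", "."): p["path"]
        for p in parsed_files
    }
    return {
        p["path"]: sorted(module_to_path[m] for m in p["imports"] if m in module_to_path)
        for p in parsed_files
    }


files = {
    "pkg/util.py": "def helper(x):\n    return x + 1\n",
    "pkg/main.py": "from pkg.util import helper\n\nclass App:\n    def run(self):\n        return helper(1)\n",
}
parsed = [parse_source(path, source) for path, source in files.items()]
graph = build_dependency_graph(parsed)
```

In this toy example, `graph` records that `pkg/main.py` depends on `pkg/util.py`. Relative imports and JavaScript/TypeScript files are out of scope for the sketch.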

3. Code Documentation Generation

# Generate documentation at different levels
python generate_document_func.py  # Output: repo_document_func/

python generate_document_file.py  # Output: repo_document_file/
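
The two levels can be pictured as a bottom-up pass: function summaries first, then a file summary. The sketch below assumes that the file-level step consumes the function-level summaries, which the script order suggests but this README does not state. `summarize` stands in for whatever LLM call the scripts make, and all function names and prompts are hypothetical.

```python
from typing import Callable, Dict


def document_functions(functions: Dict[str, str],
                       summarize: Callable[[str], str]) -> Dict[str, str]:
    """Function level: one summary per function, generated from its source code."""
    return {
        name: summarize(f"Describe this function:\n{code}")
        for name, code in functions.items()
    }


def document_file(path: str, func_docs: Dict[str, str],
                  summarize: Callable[[str], str]) -> str:
    """File level: summarize the file from its function summaries (assumed design)."""
    listing = "\n".join(f"- {name}: {doc}" for name, doc in func_docs.items())
    return summarize(f"Describe the file {path} given its function summaries:\n{listing}")


def fake_llm(prompt: str) -> str:
    """Offline placeholder for an LLM call, so the sketch runs without an API key."""
    return f"summary of {len(prompt.splitlines())} lines"


func_docs = document_functions({"add": "def add(a, b):\n    return a + b"}, fake_llm)
file_doc = document_file("m.py", func_docs, fake_llm)
```

Passing the model call in as a parameter keeps the sketch testable offline and makes it easy to swap in a real client.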

4. File Retrieval

# Analyze and retrieve relevant files
python ps_cause_analyze.py  # Output: problem_statement_analysis/

python file_retrival.py  # Uses LangChain, output: repo_file_rag/
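
The idea behind retrieval is to rank files by how well their documentation matches the issue text. The sketch below uses plain bag-of-words cosine similarity as a dependency-free stand-in; it is not the LangChain-based retrieval the script uses, and `retrieve_files`, `top_k`, and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_files(issue_text, file_docs, top_k=3):
    """Return up to top_k file paths whose documentation best matches the issue."""
    query = Counter(tokenize(issue_text))
    scored = [
        (cosine(query, Counter(tokenize(doc))), path)
        for path, doc in file_docs.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by path
    return [path for score, path in scored[:top_k] if score > 0]


file_docs = {
    "src/auth.py": "Handles user login and password validation",
    "src/render.py": "Renders charts and legends to canvas",
}
hits = retrieve_files("Login fails when password contains unicode", file_docs)
```

Here `hits` contains only `src/auth.py`, because the other file shares no tokens with the issue. An embedding-based retriever would also catch matches that share meaning but not vocabulary.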

5. Localization

# Hierarchical localization
python file_localization.py  # Output: buggy_files/

python func_localization.py  # Output: buggy_elements/
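
Hierarchical localization narrows the search in two stages: pick candidate files first, then rank functions only inside those files. The sketch below shows that control flow under simple assumptions. The scoring function, the cut-offs `top_files` and `top_funcs`, and the data layout are all hypothetical; the actual scripts presumably rely on an LLM rather than word overlap.

```python
def word_overlap(a, b):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def localize(issue, file_docs, func_docs, score=word_overlap, top_files=2, top_funcs=3):
    """Two-stage localization: rank files, then rank functions within the kept files."""
    # Stage 1: keep the files whose documentation best matches the issue.
    files = sorted(file_docs, key=lambda p: score(issue, file_docs[p]), reverse=True)
    files = files[:top_files]
    # Stage 2: only functions from the kept files are considered.
    candidates = [
        (path, name, doc)
        for path in files
        for name, doc in func_docs.get(path, {}).items()
    ]
    candidates.sort(key=lambda c: score(issue, c[2]), reverse=True)
    return files, [(path, name) for path, name, _ in candidates[:top_funcs]]


file_docs = {
    "a.py": "parse config file options",
    "b.py": "draw chart legend",
    "c.py": "network retry logic",
}
func_docs = {
    "a.py": {"load": "load config file from disk", "merge": "merge option dictionaries"},
    "b.py": {"legend": "draw legend box"},
}
buggy_files, buggy_elements = localize(
    "config file is not loaded", file_docs, func_docs, top_files=1, top_funcs=1
)
```

Restricting stage 2 to the files kept in stage 1 is what keeps the number of function-level comparisons, and therefore the cost, small.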

6. Repair

# Generate patches
python bug_repair.py  # Output: bug_repair/
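
The six stages above can be chained with a small driver. The sketch below is a convenience wrapper, not something shipped with the repository. It assumes the scripts take no command-line arguments (as in the commands shown), are run from the repository root, and are executed in the order listed in this README; `run_pipeline` and `dry_run` are hypothetical names.

```python
import subprocess
import sys

# Script names copied from the Usage section, in the order they appear there.
PIPELINE = [
    "issue_diff_download.py", "issue_repo_download.py", "unzip.py",   # 1. download
    "CodeParser.py", "DependencyGraph.py", "generate_doc_meta.py",    # 2. parsing
    "generate_document_func.py", "generate_document_file.py",         # 3. documentation
    "ps_cause_analyze.py", "file_retrival.py",                        # 4. retrieval
    "file_localization.py", "func_localization.py",                   # 5. localization
    "bug_repair.py",                                                  # 6. repair
]


def run_pipeline(scripts=PIPELINE, dry_run=True):
    """Run each script in order; with dry_run=True only print the commands."""
    commands = [[sys.executable, script] for script in scripts]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline()` prints the thirteen commands without executing anything; `run_pipeline(dry_run=False)` runs them for real.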

Directory Structure

├── repo_doc_meta/              # Parsed repository metadata
├── repo_document_func/         # Function-level documentation
├── repo_document_file/         # File-level documentation
├── problem_statement_analysis/ # Issue analysis results
├── repo_file_rag/              # Retrieved files
├── buggy_files/                # Localized problematic files
├── buggy_elements/             # Localized functions/classes
└── bug_repair/                 # Generated patches
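
Since each stage writes to its own directory, a quick way to see how far a run has progressed is to check which of these directories exist. A minimal sketch follows, assuming the directories are created directly under the working directory; `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Output directories from the listing above, in pipeline order.
OUTPUT_DIRS = [
    "repo_doc_meta",
    "repo_document_func",
    "repo_document_file",
    "problem_statement_analysis",
    "repo_file_rag",
    "buggy_files",
    "buggy_elements",
    "bug_repair",
]


def missing_outputs(root="."):
    """Return the expected output directories that do not exist under root."""
    return [name for name in OUTPUT_DIRS if not (Path(root) / name).is_dir()]
```

The first entry returned by `missing_outputs()` points to the earliest stage that has not yet produced output.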