repo-scan

March 27, 2026

Python 3.6+ License: MIT Platform Agent Skill

Every ecosystem has its own dependency manager, but no tool looks across C++, Android, iOS, C#/.NET, and Web to tell you: how much code is actually yours, what's third-party, and what's dead weight.

repo-scan gives you the answer — a cross-stack source code asset audit that classifies every file, identifies every dependency, and delivers an actionable verdict for each module. One command, zero dependencies, interactive HTML report.

The Problem

You're staring at a monorepo with 200+ directories, 50,000 files, multiple tech stacks, and third-party code mixed into source folders. Before you can refactor, merge, or make any architectural decision, you need answers:

  • Which modules are core assets worth investing in?
  • Which are duplicate wheels that should be merged?
  • Which haven't been touched in 3 years and should be retired?
  • Where are the hidden third-party libs with no version tracking?

Running cloc gives you line counts. Running dependency scanners gives you one stack at a time. repo-scan gives you the full picture — across all stacks, in one pass.

What Sets It Apart

| | Traditional tools | repo-scan |
|---|---|---|
| Scope | Single language/ecosystem | C/C++, Java/Android, iOS, C#/.NET, Web — unified |
| Third-party detection | Declared deps only | Source-embedded libs too (50+ known libraries) |
| Output | Raw metrics | Actionable 4-level verdicts per module |
| Monorepo | Flat file list | Hierarchical scan with drill-down HTML |
| AI-native | N/A | Designed as Agent Skill with token-efficient analysis |

Core Capabilities

  • Cross-stack unified view — C/C++, Java/Android, iOS (OC/Swift), C#/.NET, Web (TS/JS/Vue) in a single report
  • Three-way file classification — Project code / third-party / build artifacts with accurate size metrics
  • Third-party detection — Auto-identifies 50+ libraries (FFmpeg, Boost, OpenSSL...) with version extraction from headers, configs, and package files
  • Four-level verdicts — Every module gets a decision: Core Asset / Extract & Merge / Rebuild / Deprecate
  • Cross-module review — Second-pass analysis finds capability overlaps, dependency topology, verdict corrections, and refactoring priorities
  • Interactive HTML reports — Dark-theme local pages; monorepo mode generates index.html with clickable project cards and verdict distribution bars
  • Incremental deep analysis — deep mode adds thread safety, memory management, error handling, and API consistency checks on top of standard data
  • Hierarchical scanning — Large monorepos auto-split into summary + sub-project reports, keeping AI context manageable
  • Code duplication detection — Finds same-name directories across the project, auto-excludes third-party false positives
  • Git activity analysis — Discovers all sub-repos with commit history (which modules haven't been touched in 2 years?)
  • AI token efficiency — Three-layer strategy: filename inference → key file reading → quality sampling (no exhaustive reading)
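
Version extraction from headers, mentioned in the third-party detection bullet, can be pictured with a small sketch. This is an illustration under stated assumptions, not repo-scan's actual implementation: the `VERSION_PATTERNS` table and the `extract_version` helper are hypothetical names. The macros it matches are real ones: Boost defines `BOOST_LIB_VERSION` in `boost/version.hpp`, and OpenSSL defines `OPENSSL_VERSION_TEXT` in `openssl/opensslv.h`.

```python
import re
from typing import Optional

# Hypothetical pattern table: library name -> regex capturing a version string
# from that library's own version header.
VERSION_PATTERNS = {
    "boost": re.compile(r'#\s*define\s+BOOST_LIB_VERSION\s+"([0-9_]+)"'),
    "openssl": re.compile(r'#\s*define\s+OPENSSL_VERSION_TEXT\s+"OpenSSL\s+([^\s"]+)'),
}


def extract_version(lib: str, header_text: str) -> Optional[str]:
    """Return the version found in header_text for lib, or None if not found."""
    pattern = VERSION_PATTERNS.get(lib)
    if pattern is None:
        return None
    match = pattern.search(header_text)
    if match is None:
        return None
    # Boost writes "1_81"; normalize underscores to dots for display.
    return match.group(1).replace("_", ".")
```

A table keyed by library name keeps the lookup cheap: only one small header per detected library needs to be read, which fits the "key file reading" layer described above.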

Analysis Depth Levels

| Level | Files Read (per module) | Quality Checks | Use Case |
|---|---|---|---|
| fast | 1-2: build config + one key header | Dependency versions only | Quick inventory of huge directories (hundreds of modules) |
| standard | 2-5: headers + entry files + build config | Full: deps, architecture, tech debt | Default audit |
| deep | 5-10: adds core implementation, tests, CI | Thread safety, memory, error handling, API consistency | Incremental on top of standard data |
| full | All files in module | Full analysis + cross-file comparison | Pre-merge comprehensive review |

Deep mode is incremental — it detects existing scan data, auto-selects high-value modules (Core Asset + Extract & Merge), and appends detailed analysis:

/repo-scan /path/to/project --level deep                          # auto-select modules
/repo-scan /path/to/project --level deep --modules base,rtmp_sdk  # specific modules
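
The auto-selection rule can be read as a simple filter over the verdicts from an earlier scan. The sketch below is hypothetical: the `select_deep_modules` name, the dict-of-verdicts input shape, and the choice to silently drop requested modules that are missing from the scan data are all assumptions for illustration, not repo-scan's internal behavior.

```python
from typing import Dict, Iterable, List, Optional

# Verdicts that deep mode auto-selects, per the description above.
HIGH_VALUE = {"Core Asset", "Extract & Merge"}


def select_deep_modules(
    verdicts: Dict[str, str],
    requested: Optional[Iterable[str]] = None,
) -> List[str]:
    """Pick modules for deep analysis.

    verdicts maps module name -> verdict from a previous scan (assumed shape).
    If requested is given (like --modules base,rtmp_sdk), it replaces auto-selection.
    """
    if requested is not None:
        # Assumption: only modules present in the earlier scan data can be deepened.
        return [name for name in requested if name in verdicts]
    return sorted(name for name, verdict in verdicts.items() if verdict in HIGH_VALUE)
```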

--gap-check runs incremental capability gap detection. After a scan is complete, it compares your consolidated module library against candidate source directories to find missed symbols, API differences, and implementation improvements:

/repo-scan --gap-check
/repo-scan --gap-check -m base

Copy config/gap-config-example.json to gap-config.json and fill in your local paths before running. Outputs a Markdown report with [MANDATORY-IMPORT], [MANDATORY-EVAL], and [EVAL-IMPL] tagged items.
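
As a rough picture of what "missed symbols" means, the sketch below compares function names declared in two sets of C/C++ header text. It is an illustration only: `declared_functions` and `missing_symbols` are hypothetical helpers, the regex is a naive stand-in for a real declaration parser, and nothing here reproduces how `capability_gap.py` works or how it assigns its tags.

```python
import re
from typing import Iterable, Set

# Naive match for "name(args);" -- an approximation, not a C/C++ parser.
_DECL = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*;")


def declared_functions(header_texts: Iterable[str]) -> Set[str]:
    """Collect identifiers that look like function declarations."""
    names: Set[str] = set()
    for text in header_texts:
        names.update(_DECL.findall(text))
    return names


def missing_symbols(consolidated: Iterable[str], candidate: Iterable[str]) -> Set[str]:
    """Symbols declared in candidate headers but absent from the consolidated library."""
    return declared_functions(candidate) - declared_functions(consolidated)
```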

Output Sections

| Section | Content |
|---|---|
| Architecture Tree | Physical directory structure, semantically compressed, third-party and dead code color-coded |
| Module Descriptions | Function, core classes, dependencies, third-party refs (with version assessment), quality, verdict |
| Asset Triage Table | Global summary: Core Asset / Extract & Merge / Rebuild / Deprecate |
| Cross-Module Review | Capability overlap map, dependency topology, verdict corrections, refactoring priorities |
| Deep Analysis | Per-file review, thread safety, memory, error handling, API consistency (purple DEEP badge) |

[Screenshot: sub-project overview with verdict distribution and DEEP badges]

[Screenshot: asset triage table with four-level verdicts]

[Screenshot: deep analysis, per-file review with thread safety and memory checks]

Quick Start

Installation

# Global skills directory
git clone https://github.com/haibindev/repo-scan.git ~/.claude/skills/repo-scan

# Or project-level
git clone https://github.com/haibindev/repo-scan.git .claude/skills/repo-scan

As an Agent Skill

/repo-scan /path/to/my-project
/repo-scan /path/to/my-project --level fast
/repo-scan /path/to/my-project --level deep
/repo-scan /path/to/my-project --level deep --modules base,encoder

Standalone Pre-scan

The pre-scan script (Python 3, zero deps) generates structured Markdown data for AI analysis:

python scripts/pre-scan.py /path/to/project                    # stdout
python scripts/pre-scan.py /path/to/project -o report.md       # single file
python scripts/pre-scan.py /path/to/project -d ./scan-output   # hierarchical (recommended)
python scripts/pre-scan.py /path/to/project -c config.json     # custom config

Pre-scan output sections

| # | Section | Description |
|---|---|---|
| 1 | Overall Statistics | Three-way split: project / third-party / build artifacts |
| 2 | Top-Level Breakdown | File count, size, build system, classification per directory |
| 3 | Tech Stack Stats | Per-stack source file counts |
| 4 | Third-Party Deps | Detected libraries with name, version, location, size |
| 5 | Code Duplication | Directories appearing 3+ times (potential copy-paste) |
| 6 | Directory Tree | Clean tree with noise filtered and third-party marked |
| 7 | Git Activity | Commit history and activity for all discovered repos |
| 8 | Noise Summary | Build artifact sizes aggregated by type |
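
The Code Duplication section (directories appearing 3+ times) could be approximated by counting directory names during a tree walk. This is a minimal sketch, not the logic in `pre-scan.py`: the `find_repeated_dirs` function, its `skip` parameter, and the default skip list are illustrative assumptions.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def find_repeated_dirs(
    root: str,
    min_count: int = 3,
    skip: Iterable[str] = (".git", "node_modules"),
) -> Dict[str, List[str]]:
    """Map directory name -> relative paths, for names seen at least min_count times."""
    skip_set = set(skip)
    seen: Dict[str, List[str]] = defaultdict(list)
    for current, dirnames, _files in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_set]
        for name in dirnames:
            seen[name].append(os.path.relpath(os.path.join(current, name), root))
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) >= min_count}
```

Pruning skipped directories before counting is one simple way to keep third-party or build folders from producing false positives.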

Project Structure

repo-scan/
├── SKILL.md                       # Skill definition (Agent entry point)
├── deep-mode.md                   # Deep mode & --modules rules
├── full-mode.md                   # Full mode rules
├── reference.md                   # Tech stack audit reference tables
├── config/
│   ├── ignore-patterns.json       # Configurable ignore/recognition patterns
│   └── gap-config-example.json    # Example config for --gap-check (copy & fill in paths)
├── scripts/
│   ├── pre-scan.py                # Pre-scan script (Python 3, zero deps)
│   ├── capability_gap.py          # Incremental capability gap detection (--gap-check)
│   ├── gen_html.py                # HTML generator (Markdown → interactive pages)
│   └── i18n.py                    # Internationalization (auto-detects zh/en)
└── templates/
    ├── report.html                # Single project template (dark theme)
    ├── index.html                 # Multi-project summary template (cards + cross-analysis)
    └── dual-scan.html             # Dual-scan cross-validation template

Configuration

Edit config/ignore-patterns.json to customize patterns:

{
  "noise_dirs": {
    "common": [".git", ".svn", "obj", "tmp"],
    "cpp": ["Debug", "Release", "x64", "ipch"],
    "java_android": [".gradle", "build", "target"],
    "ios": ["DerivedData", "Pods", "xcuserdata"],
    "web": ["node_modules", "dist", ".next"]
  },
  "thirdparty_dirs": {
    "container_names": ["vendor", "external", "libs"],
    "known_libs": ["ffmpeg", "boost", "openssl", ...]
  }
}
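
To make the roles of `noise_dirs` and `thirdparty_dirs` concrete, here is a hedged sketch of a three-way classifier driven by a config of this shape. The `classify_path` function, its return labels, and its precedence (build artifacts first, then third-party, then project code) are assumptions for illustration; the real script may apply different rules.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Trimmed copy of the config shape shown above (values are examples only).
CONFIG: Dict[str, Dict[str, List[str]]] = {
    "noise_dirs": {
        "common": [".git", ".svn", "obj", "tmp"],
        "web": ["node_modules", "dist", ".next"],
    },
    "thirdparty_dirs": {
        "container_names": ["vendor", "external", "libs"],
        "known_libs": ["ffmpeg", "boost", "openssl"],
    },
}


def classify_path(rel_path: str, config: Dict[str, Dict[str, List[str]]] = CONFIG) -> str:
    """Return 'build_artifact', 'third_party', or 'project' for a relative file path."""
    parts = PurePosixPath(rel_path).parts[:-1]  # directory components only
    noise = {d for group in config["noise_dirs"].values() for d in group}
    third = config["thirdparty_dirs"]
    containers = set(third["container_names"])
    known = set(third["known_libs"])
    if any(part in noise for part in parts):
        return "build_artifact"
    # Assumed rule: known library names are matched case-insensitively.
    if any(part in containers or part.lower() in known for part in parts):
        return "third_party"
    return "project"
```

For example, under these assumptions `classify_path("vendor/zlib/inflate.c")` falls into the third-party bucket because `vendor` is listed in `container_names`.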

Requirements

  • Python 3.6+
  • An AI Agent with custom skill support (e.g. Claude Code)
  • Git (optional, for activity analysis)
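
Because Git is optional, activity analysis presumably has to degrade gracefully when the `git` binary is missing. The sketch below shows one way that could look; `find_git_repos` and `last_commit_time` are hypothetical helpers, not functions from `pre-scan.py`. The only external command used is `git log -1 --format=%ct`, which prints the committer timestamp of the latest commit.

```python
import os
import shutil
import subprocess
from datetime import datetime, timezone
from typing import List, Optional


def find_git_repos(root: str) -> List[str]:
    """Return directories under root (root included) that contain a .git entry."""
    repos = []
    for current, dirnames, filenames in os.walk(root):
        # .git is a directory in normal clones and a file in submodules/worktrees.
        if ".git" in dirnames or ".git" in filenames:
            repos.append(current)
        # Never walk into .git itself.
        dirnames[:] = [d for d in dirnames if d != ".git"]
    return sorted(repos)


def last_commit_time(repo: str) -> Optional[datetime]:
    """Return the latest commit time in UTC, or None if git is unavailable or fails."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return datetime.fromtimestamp(int(result.stdout.strip()), tz=timezone.utc)
```

Returning `None` instead of raising lets a caller skip the activity section entirely on machines without Git.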

About

haibindev.github.io — personal site & blog

License

MIT