🔍 SynthScan

April 17, 2026


Detect AI-generated (synthetic) code patterns in your repository and automatically open a GitHub Issue with the findings.

120 detection patterns across 14 categories · Severity-weighted scoring · Normalised per 1 000 LOC · Editable Markdown pattern file


How It Works

  1. Patterns are defined in a human-readable Markdown file (patterns/synthetic_patterns.md).
    Each pattern is either a plain-text substring (case-insensitive) or a Python regex (prefixed with regex:).
    Patterns carry a severity (CRITICAL = 10, HIGH = 5, MEDIUM = 2, LOW = 1 points).
    Categories can optionally be scoped to file extensions (e.g. Applies to: .py).

  2. The scanner walks every source file in the target directory, tests each line against every applicable pattern, and computes:

    • a raw score — the sum of severity points for all matches,
    • the Synthetic Code Score — the raw score normalised per 1 000 lines of code.
      This makes the score comparable across projects of different sizes.
  3. A GitHub Issue is created (or updated) with:

    • the Synthetic Code Score and severity breakdown,
    • every matched snippet grouped by category,
    • the file path and line number for each hit.
  4. A JSON report is uploaded as a build artifact for programmatic consumption.
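The matching and raw-score steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the code in scanner/synthscan.py: the function names `line_matches` and `raw_score` are hypothetical, and the use of `re.search` (rather than a full-line match) for `regex:` patterns is an assumption.

```python
import re

# Severity weights as listed in step 1.
SEVERITY_POINTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def line_matches(line: str, pattern: str) -> bool:
    """Apply the two matching rules: regex if prefixed, else substring."""
    if pattern.startswith("regex:"):
        # Assumption: the regex may match anywhere in the line.
        return re.search(pattern[len("regex:"):], line) is not None
    # Plain-text patterns are case-insensitive substrings.
    return pattern.lower() in line.lower()


def raw_score(lines, patterns):
    """Sum severity points over every (line, pattern) hit.

    `patterns` is a list of (pattern, severity) pairs.
    """
    total = 0
    for line in lines:
        for pattern, severity in patterns:
            if line_matches(line, pattern):
                total += SEVERITY_POINTS[severity]
    return total


# Hypothetical sample input, for illustration only.
sample_lines = [
    "# Generated by GPT",
    "x = 5  # Set x to 5",
    "print('hello')",
]
sample_patterns = [
    ("generated by gpt", "HIGH"),
    (r"regex:#\s*Set \w+ to \d+", "LOW"),
]
print(raw_score(sample_lines, sample_patterns))  # 6 (one HIGH hit + one LOW hit)
```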

Synthetic Code Score

The headline metric is the score per 1 000 lines of code:

$$\text{Synthetic Code Score} = \frac{\text{Raw Score}}{\text{Lines Scanned}} \times 1000$$

This normalisation prevents large codebases from naturally accumulating higher scores than small ones.
A project with 100 000 LOC and a handful of incidental matches will score near zero,
while a small but fully AI-generated project will score significantly higher.
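As a worked example, the normalisation can be written as a small Python function. The first call uses the raw score and line count from the example output in this README; the second uses made-up numbers for a large codebase with a few incidental matches. The function name and the handling of an empty scan are assumptions, not taken from the scanner.

```python
def synthetic_code_score(raw_score: float, lines_scanned: int) -> float:
    """Raw score normalised per 1 000 lines of code."""
    if lines_scanned <= 0:
        return 0.0  # assumption: an empty scan scores zero
    return raw_score / lines_scanned * 1000


# Figures from the example output: 277 points over 11 187 lines.
print(round(synthetic_code_score(277, 11187), 1))  # 24.8

# Hypothetical large project: 12 points over 100 000 lines stays near zero.
print(round(synthetic_code_score(12, 100_000), 2))
```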

Reference ranges (from benchmark testing):

| Score range | Interpretation |
| --- | --- |
| 0 – 5 | Likely human-written |
| 5 – 15 | Low AI signal; review flagged lines |
| 15 – 30 | Moderate AI signal |
| 30+ | Strong AI signal |
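The reference ranges can be turned into a simple lookup. The ranges share their boundary values (5, 15, 30), so this sketch assumes each range includes its lower bound and excludes its upper bound; the function name `interpret` is hypothetical.

```python
def interpret(score: float) -> str:
    """Map a Synthetic Code Score to the reference interpretation.

    Assumption: lower bounds are inclusive, upper bounds exclusive.
    """
    if score < 5:
        return "Likely human-written"
    if score < 15:
        return "Low AI signal"
    if score < 30:
        return "Moderate AI signal"
    return "Strong AI signal"


print(interpret(24.8))  # Moderate AI signal
```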

Quick Start

From the GitHub Actions Marketplace

Add this workflow to any repo at .github/workflows/synthscan.yml:

name: SynthScan

on:
  workflow_dispatch:
    inputs:
      scan_path:
        description: "Path to scan"
        default: "."
      score_threshold:
        description: "Fail when Synthetic Code Score >= value (0 = never)"
        default: "0"

permissions:
  contents: read
  issues: write

jobs:
  synthscan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: marcoramilli/SynthScan@v1
        with:
          scan_path: ${{ github.event.inputs.scan_path || '.' }}
          score_threshold: ${{ github.event.inputs.score_threshold || '0' }}
          create_issue: "true"

Running locally

# scan the current directory
INPUT_SCAN_PATH=. python3 scanner/synthscan.py

# scan a specific folder, write report to a custom path
INPUT_SCAN_PATH=./src INPUT_REPORT_PATH=report.json python3 scanner/synthscan.py

Example output:

============================================================
Raw score            : 277  (162 matches)
Lines scanned        : 11187  (32 files)
Synthetic Code Score : 24.8  (per 1k LOC)
============================================================

Matches by category:
  - Decorative Section Separators: 97 matches (194 pts)
  - Excessive Try-Catch Wrapping: 57 matches (57 pts)
  - Cross-Language Confusion: 3 matches (15 pts)
  - Synthetic Comment Markers: 1 matches (5 pts)
  - Self-Referential Comments: 2 matches (4 pts)
  - Verbosity Indicators: 2 matches (2 pts)

Inputs

| Name | Default | Description |
| --- | --- | --- |
| scan_path | . | Directory to scan (relative to repo root). |
| patterns_file | patterns/synthetic_patterns.md | Markdown file with detection patterns. |
| score_threshold | 0 | Fail the step when the Synthetic Code Score (per 1k LOC) ≥ this value. 0 = never fail. |
| create_issue | true | Open / update a GitHub Issue with the report. |
| issue_label | synthscan | Label applied to the created issue. |
| report_path | synthscan-report.json | Path for the JSON artefact. |

Outputs

| Name | Description |
| --- | --- |
| score | Synthetic Code Score (normalised per 1k LOC). |
| raw_score | Un-normalised sum of severity points. |
| match_count | Number of pattern hits. |
| lines_scanned | Total lines of source code scanned. |
| issue_body | Full Markdown report. |

Pattern Categories

| Category | Default Severity | What it detects |
| --- | --- | --- |
| Slop Phrases | MEDIUM | Filler clichés AI injects (Feel free to modify, Here's a simple example) |
| AI Slop Vocabulary | MEDIUM | Overused LLM words (delve, leverage, robust, seamless) |
| Synthetic Comment Markers | HIGH | Direct AI attribution (Generated by GPT, AI-generated) |
| Self-Referential Comments | MEDIUM | Comments narrating structure instead of intent |
| Redundant / Tautological Comments | LOW | Comments restating code verbatim (# Set x to 5) |
| Verbosity Indicators | LOW | Overly explanatory phrases (This line initializes) |
| Example Usage Blocks | LOW | # Example usage: blocks AI always appends |
| Fake / Example Data | MEDIUM | Placeholder data (John Doe, user@example.com) |
| Cross-Language Confusion | HIGH | Wrong-language idioms in Python (.push(), null, &&) |
| Hallucination Indicators | CRITICAL | Phantom imports, hallucinated API chains |
| Overly Generic Function Names | LOW | process_data(), do_something(), helper() |
| Excessive Try-Catch Wrapping | MEDIUM | Bare except Exception, generic error messages |
| Decorative Section Separators | MEDIUM | Unicode box-drawing headers, long ---- lines |

Updating Patterns

All detection patterns live in patterns/synthetic_patterns.md.

To add a new pattern:

  1. Open the file and find (or create) a ## Category section.
  2. Optionally add Applies to: .py, .js to restrict the category to specific file extensions.
  3. Inside the ```patterns block, add one pattern per line.
    • Plain text → matched as a case-insensitive substring.
    • regex: prefix → compiled as a Python regular expression.
    • Prepend [CRITICAL], [HIGH], [MEDIUM], or [LOW] to override the category default.
    • Lines starting with # are comments and ignored.
  4. Commit and push. The next scan will pick up the changes automatically.
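To make the line syntax from step 3 concrete, here is a rough Python sketch of how a single line inside a patterns block could be interpreted. It is not the parser from scanner/synthscan.py: the names `Pattern` and `parse_pattern_line` are hypothetical, the sample lines are invented, and it assumes the severity tag comes before any `regex:` prefix.

```python
from typing import NamedTuple, Optional

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


class Pattern(NamedTuple):
    text: str       # substring or regex source, without prefixes
    is_regex: bool  # True when the line used the regex: prefix
    severity: str   # per-line override or the category default


def parse_pattern_line(line: str, default_severity: str) -> Optional[Pattern]:
    """Parse one line of a patterns block; return None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    severity = default_severity
    # Assumption: an optional [SEVERITY] tag is the first token on the line.
    for name in SEVERITIES:
        tag = f"[{name}]"
        if stripped.startswith(tag):
            severity = name
            stripped = stripped[len(tag):].strip()
            break
    if stripped.startswith("regex:"):
        return Pattern(stripped[len("regex:"):], True, severity)
    return Pattern(stripped, False, severity)


# Hypothetical sample lines, for illustration only.
print(parse_pattern_line("Feel free to modify", "MEDIUM"))
print(parse_pattern_line(r"[CRITICAL] regex:Generated by \w+", "HIGH"))
print(parse_pattern_line("# this line is a comment", "LOW"))
```

Keeping the parse result as a small tuple makes it easy to check a new pattern by hand before committing it.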

JSON Report

The scanner writes a JSON report to synthscan-report.json (configurable via report_path):

{
  "synthetic_code_score": 24.8,
  "raw_score": 277,
  "match_count": 162,
  "lines_scanned": 11187,
  "files_scanned": 32,
  "matches": [
    {
      "file": "app.py",
      "line": 31,
      "text": "# ── Background task tracker ──────────",
      "pattern": "#.*[─━═╌╍┄┅]{5,}",
      "category": "Decorative Section Separators",
      "severity": "MEDIUM",
      "score": 2.0
    }
  ]
}
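For programmatic consumption, the report can be loaded with Python's standard `json` module. The sketch below sums points per category; it relies only on the fields shown in the sample above, and the helper name `points_by_category` is hypothetical. An inline copy of the sample stands in for the real file so the snippet runs on its own.

```python
import json
from collections import defaultdict


def points_by_category(report: dict) -> dict:
    """Total the per-match score for each category, highest first."""
    totals = defaultdict(float)
    for match in report["matches"]:
        totals[match["category"]] += match["score"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Inline stand-in for synthscan-report.json, trimmed to the sample match.
report = json.loads("""
{
  "synthetic_code_score": 24.8,
  "raw_score": 277,
  "matches": [
    {
      "file": "app.py",
      "line": 31,
      "category": "Decorative Section Separators",
      "severity": "MEDIUM",
      "score": 2.0
    }
  ]
}
""")

print(points_by_category(report))  # {'Decorative Section Separators': 2.0}
```

Against a real run, replace the inline string with `json.load` on the file at report_path.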

Project Structure

SynthScan/
├── action.yml                          # GitHub Action definition (Marketplace entry)
├── LICENSE                             # MIT License
├── scanner/
│   └── synthscan.py                    # Core scanning engine
├── patterns/
│   └── synthetic_patterns.md           # Detection patterns (editable)
└── README.md

License

MIT — see LICENSE.