SynthScan
May 17, 2026
This file defines the detection patterns used by SynthScan.
It focuses exclusively on AI slop: phrases, vocabulary, structural tells, and hallucination markers that indicate AI-generated code. General code-quality issues (linting, security, style) are intentionally excluded to avoid false positives.
Each pattern is defined in a fenced block under its category. To add new patterns, append them to the appropriate section or create a new `## Category`.

Severity tags: prepend a pattern line with `[CRITICAL]`, `[HIGH]`, `[MEDIUM]`, or `[LOW]` to override the default severity for that category. If omitted, the category default applies.

Severity → score mapping:

| Tag | Points |
|---|---|
| CRITICAL | 10 |
| HIGH | 5 |
| MEDIUM | 2 |
| LOW | 1 |
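As a rough illustration of how the tags and the point values might combine, the sketch below splits an optional severity tag off a pattern line and sums points over a list of matches. The names `SEVERITY_POINTS`, `split_severity`, and `score` are hypothetical and not taken from SynthScan's source; only the tag names and point values come from the severity → score mapping above.

```python
from __future__ import annotations

import re

# Point values copied from the severity → score mapping in this file.
SEVERITY_POINTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}

# Optional leading tag such as "[LOW] " in front of a pattern.
_TAG_RE = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM|LOW)\]\s*(.*)$")


def split_severity(line: str, category_default: str) -> tuple[str, str]:
    """Return (severity, pattern), falling back to the category default."""
    match = _TAG_RE.match(line.strip())
    if match:
        return match.group(1), match.group(2)
    return category_default, line.strip()


def score(severities: list[str]) -> int:
    """Sum the points for a list of matched severities."""
    return sum(SEVERITY_POINTS[s] for s in severities)


print(split_severity("[LOW] Make sure to replace", "MEDIUM"))  # ('LOW', 'Make sure to replace')
print(split_severity("Feel free to modify", "MEDIUM"))         # ('MEDIUM', 'Feel free to modify')
print(score(["CRITICAL", "LOW", "MEDIUM"]))                    # 13
```

How the real tool aggregates scores per file (sum, cap, threshold) is not stated here, so the plain sum is only an assumption.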
Slop Phrases
Default severity: MEDIUM
Classic filler phrases and clichés that AI code assistants inject into comments, docstrings, and string literals. Humans rarely write these.
# Direct AI self-references
As an AI language model
As a language model
I cannot provide
I'm unable to
# Filler / hedging phrases AI over-produces
It's worth noting that
Note that this is a simplified
This is a basic implementation
For demonstration purposes
Let me know if you need
Feel free to modify
Feel free to adjust
Feel free to customize
Here's a simple example
Here is a simple example
As mentioned earlier
As discussed above
Here's how you can
Here is how you can
This should work for most cases
You can modify this to
You may want to adjust
[LOW] Make sure to replace
[LOW] Don't forget to
# Instructional tone (AI talks to the user, not the reader)
regex:#.*\byou can\s+(also\s+)?(use|try|add|change|modify|adjust|replace)\b
[LOW] regex:#.*\bmake sure (to|you)\b
[LOW] regex:#.*\bdon'?t forget to\b
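The list above mixes plain phrases with `regex:` entries. Here is a minimal sketch of how a scanner could treat the two kinds, assuming plain lines are case-insensitive substrings and `regex:` lines are compiled with Python's `re` module, as the instructions at the end of this file describe. `make_matcher` is a hypothetical helper, and whether SynthScan passes any regex flags is not stated, so none are used.

```python
import re
from typing import Callable


def make_matcher(pattern: str) -> Callable[[str], bool]:
    """Build a predicate that tests one line of source text."""
    if pattern.startswith("regex:"):
        compiled = re.compile(pattern[len("regex:"):])
        return lambda line: compiled.search(line) is not None
    needle = pattern.lower()
    return lambda line: needle in line.lower()


plain = make_matcher("Feel free to modify")
instructional = make_matcher(
    r"regex:#.*\byou can\s+(also\s+)?(use|try|add|change|modify|adjust|replace)\b"
)

print(plain("# FEEL FREE TO MODIFY the timeout"))    # True
print(instructional("x = 1  # you can also use 2"))  # True
print(instructional("# You can use 2"))              # False: no flags, so case-sensitive
```

The last call shows a practical difference between the two kinds: under these assumptions, plain phrases ignore case while regex entries do not.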
AI Slop Vocabulary
Default severity: MEDIUM
Distinctive words and phrases LLMs disproportionately overuse in comments, docstrings, and string literals. Individually low signal, but clusters are a strong AI tell.
# High-frequency AI slop words in comments
regex:#.*\b(delve|tapestry|multifaceted|nuanced|streamlined)\b
regex:#.*\b(leverage|utilize|facilitate|comprehensive)\b
regex:#.*\b(robust|seamless|cutting-edge|state-of-the-art|paradigm)\b
regex:#.*\b(aforementioned|henceforth|pertaining to|in conjunction with)\b
regex:#.*\b(endeavor|pivotal|intricate|meticulous|holistic)\b
regex:#.*\b(unleash|empower|elevate|harness|supercharge)\b
regex:#.*\b(game-?changer|best practices|synergy|scalable solution)\b
# Same words in docstrings / multi-line strings
regex:""".*\b(delve into|it's important to note|in order to)\b
regex:""".*\b(at the end of the day|a testament to|serves as a)\b
regex:""".*\b(leverage|utilize|robust|seamless|comprehensive|facilitate)\b
# Overly enthusiastic adverbs in comments
regex:#.*\b(Certainly|Absolutely|Definitely|Essentially|Fundamentally)\b
# "Simply" / "just" — oversimplification markers
[LOW] regex:#.*\b(simply|just)\s+(call|use|add|set|pass|create|return)\b
# Phrases in // comments (JS/Go/Java/C++)
regex://.*\b(delve|tapestry|multifaceted|nuanced|leverage|utilize|robust|seamless)\b
regex://.*\b(Certainly|Absolutely|Definitely|Essentially|Fundamentally)\b
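Because single vocabulary hits are weak evidence, a scanner would probably look at how many distinct patterns fire in one file. The sketch below counts distinct hits across the lines of a source string. The `cluster_size` helper and the shortened pattern list are illustrative assumptions, not SynthScan's actual scoring.

```python
import re

# Three of the comment regexes from this section (without the "regex:" prefix).
VOCAB_PATTERNS = [
    re.compile(r"#.*\b(delve|tapestry|multifaceted|nuanced|streamlined)\b"),
    re.compile(r"#.*\b(leverage|utilize|facilitate|comprehensive)\b"),
    re.compile(r"#.*\b(robust|seamless|cutting-edge|state-of-the-art|paradigm)\b"),
]


def cluster_size(source: str) -> int:
    """Count how many distinct vocabulary patterns match at least one line."""
    lines = source.splitlines()
    return sum(
        1
        for pattern in VOCAB_PATTERNS
        if any(pattern.search(line) for line in lines)
    )


sample = """\
# leverage the cache to facilitate lookups
cache = {}
# a robust and seamless fallback
fallback = None
"""
print(cluster_size(sample))  # 2
```

Two words from the same pattern on one line count once here, so the number reflects breadth of vocabulary rather than raw repetition.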
Synthetic Comment Markers
Default severity: HIGH
Comments that explicitly reveal AI authorship or templated generation.
# Direct AI attribution
Generated by AI
Generated by GPT
Generated by ChatGPT
Generated by Copilot
Generated by Claude
Generated by Gemini
Generated by Llama
Generated by Bard
Generated by OpenAI
Auto-generated code
This code was generated
regex:#.*\bAI[- ]generated\b
regex://.*\bAI[- ]generated\b
regex:#.*\bwritten by (an )?AI\b
regex:#.*\bcreated by (an )?AI\b
regex:#.*\bproduced by AI\b
regex://.*\bwritten by (an )?AI\b
# Prompt leakage (AI echoing the user's prompt)
regex:#.*\b(as requested|as you asked|as per your request|per your instructions)\b
regex://.*\b(as requested|as you asked|as per your request)\b
Self-Referential Comments
Default severity: MEDIUM
Comments that narrate what the code is rather than why it exists, which is a strong AI tell. Humans tend to comment on intent; AI describes structure.
# "This X does Y" tautologies
regex:#\s*This\s+(class|function|method|module|file)\s+(is|provides|represents|implements|handles|contains|defines)
regex:#\s*The\s+(following|above|below)\s+(class|function|method|code|block|section)
regex:"""This\s+(class|function|method|module)\s+(is|provides|represents|implements)
# Narrating the obvious
regex:#\s*(Import|Importing)\s+(the\s+)?(necessary|required|needed)\s+(modules|libraries|packages|dependencies)
regex:#\s*(Define|Defining|Create|Creating)\s+(the\s+)?(main|a|an|the)\s+\w+
regex:#\s*(Initialize|Initializing)\s+(the\s+)?\w+\s+(variable|object|instance|class)
Redundant / Tautological Comments
Default severity: LOW
Comments that restate the code verbatim — a hallmark of LLM generation.
# Increment / assignment restatements
regex:#\s*(Set|Assign)\s+\w+\s+to\s+
regex:#\s*(Increment|Decrement)\s+\w+(\s+by\s+\d+)?\s*$
regex:#\s*Return\s+(the\s+)?(result|value|output|data)\s*$
regex:#\s*(Loop|Iterate)\s+(through|over)\s+(the\s+)?(list|array|items|elements|data)
regex:#\s*(Check|Verify)\s+if\s+
regex:#\s*(Print|Display|Output)\s+(the\s+)?(result|value|output|message)
regex:#\s*(Open|Close|Read|Write)\s+(the\s+)?file
regex:#\s*(Add|Append|Push|Insert)\s+(the\s+)?\w+\s+(to|into)\s+(the\s+)?(list|array|queue|stack)
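Several patterns in this category end with `$`, so they only behave as intended when applied one line at a time (or with `re.MULTILINE`). The sketch below assumes line-by-line scanning; `find_hits` is a hypothetical name.

```python
import re

# The increment/decrement pattern from this section, without the "regex:" prefix.
INCREMENT = re.compile(r"#\s*(Increment|Decrement)\s+\w+(\s+by\s+\d+)?\s*$")


def find_hits(source: str) -> list[int]:
    """Return 1-based line numbers where the pattern matches."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if INCREMENT.search(line)
    ]


sample = "count += 1  # Increment count by 1\ntotal = 0  # Increment total later\n"
print(find_hits(sample))         # [1]
print(INCREMENT.search(sample))  # None: without re.MULTILINE, $ only matches at the end of the text
```

The second comment is not flagged because the trailing word "later" keeps the `$` anchor from matching, which is how the pattern avoids comments that say more than the code does.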
Verbosity Indicators
Default severity: LOW
Overly explanatory phrases that signal machine-generated text.
# Over-explanation in comments
This line initializes
This variable stores
We need to check if
The purpose of this function is
The following code block
This section handles
# Numbered step narration
regex:#\s*Step\s+\d+\s*:
regex://\s*Step\s+\d+\s*:
Example Usage Blocks
Default severity: LOW
AI assistants frequently append "Example usage:" blocks at the bottom of generated code.
# Example-usage header comments
regex:#\s*(Example\s+usage|Usage\s+example|Sample\s+usage|How\s+to\s+use)\s*:?\s*$
regex://\s*(Example\s+usage|Usage\s+example)\s*:?\s*$
regex:#\s*Usage:\s*$
Fake / Example Data
Default severity: LOW
Hardcoded placeholder data that AI models insert as "examples" and developers forget to replace. Severity reduced to LOW because placeholder emails and names like "John Doe" appear frequently in legitimate pre-ChatGPT sample code, tutorials, and documentation.
# Canonical placeholder names / emails — LOW to reduce false positives on old tutorial code
[LOW] regex:['"]John\s+Doe['"]
[LOW] regex:['"]Jane\s+Doe['"]
[LOW] regex:['"]user@example\.com['"]
[LOW] regex:['"]admin@example\.com['"]
[LOW] regex:['"]test@test\.com['"]
[LOW] regex:['"]foo@bar\.com['"]
[LOW] regex:['"]123\s+Main\s+St(reet)?['"]
[LOW] regex:['"]Acme\s+(Corp|Inc|Ltd)['"]
# Lorem ipsum / placeholder text — strong AI signal when in code (not docs)
Lorem ipsum
dolor sit amet
# Phone number placeholders
[LOW] regex:['"]555-\d{4}['"]
Cross-Language Confusion
Default severity: HIGH
Applies to: .py
AI models trained on many languages frequently emit idioms from the wrong language. These are strong AI tells because experienced human developers rarely make these mistakes.
# Wrong-language method calls in Python files
regex:\w+\.push\(
regex:\w+\.length\(\)
regex:\w+\.equals\(
regex:\w+\.toString\(\)
# null / undefined in Python (should be None)
regex:\b(null|undefined)\s*[;)}\],]
regex:\bif\s+\w+\s*(==|!=|is)\s*null\b
# true/false lowercase in Python (should be True/False)
regex:\breturn\s+(true|false)\s*$
# Logical operators from C/JS used in Python files (should be and/or/not)
# Only match when preceded by a Python-like variable/expression context
regex:^[^#]*\b\w+\s+&&\s+\w+
regex:^[^#]*\b\w+\s+\|\|\s+\w+
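The `Applies to:` line matters for this category because `.push(` and `null` are perfectly valid in other languages. Here is a minimal sketch of extension gating, assuming the scanner compares the file suffix against the listed extensions; `applies_to` and `scan` are hypothetical helpers.

```python
import re
from pathlib import Path

# Extensions from this category's "Applies to:" line.
EXTENSIONS = {".py"}

# Two of the patterns listed above, without the "regex:" prefix.
PATTERNS = [
    re.compile(r"\w+\.push\("),
    re.compile(r"\bif\s+\w+\s*(==|!=|is)\s*null\b"),
]


def applies_to(filename: str, extensions: set[str]) -> bool:
    """True when the file's suffix is in the category's extension list."""
    return Path(filename).suffix.lower() in extensions


def scan(filename: str, source: str) -> list[str]:
    """Return offending lines, or nothing if the category does not apply."""
    if not applies_to(filename, EXTENSIONS):
        return []
    return [
        line
        for line in source.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]


code = "items.push(1)\nif value == null:\n    pass\n"
print(scan("app.py", code))  # ['items.push(1)', 'if value == null:']
print(scan("app.js", code))  # []
```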
Hallucination Indicators
Default severity: CRITICAL
Patterns that suggest hallucinated APIs, phantom imports, or invented function signatures — among the strongest signals of AI-generated code.
# Suspicious deeply-nested import paths (common AI hallucinations)
regex:from\s+\w+\.utils\.helpers\s+import\s+\w+
regex:from\s+\w+\.core\.exceptions\s+import\s+\w+Error
# Hallucinated long chained attribute access
regex:\w+\.\w+\.\w+\.\w+\.\w+\.\w+\(
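These patterns are purely textual: they flag the shape of an import or call, not whether the module or attribute actually exists. The short check below, using made-up sample lines, shows what the two shapes match and what they skip.

```python
import re

# Patterns from this section, without the "regex:" prefix.
CHAIN = re.compile(r"\w+\.\w+\.\w+\.\w+\.\w+\.\w+\(")
HELPERS_IMPORT = re.compile(r"from\s+\w+\.utils\.helpers\s+import\s+\w+")

# Six dotted segments followed by "(" are required for the chain pattern.
print(bool(CHAIN.search("client.api.v2.users.profile.fetch(user_id)")))        # True
print(bool(CHAIN.search("os.path.join(base, name)")))                          # False
print(bool(HELPERS_IMPORT.search("from myapp.utils.helpers import slugify")))  # True
```

A project that really has a `utils.helpers` module would match as well, so a hit is a prompt to verify the import rather than proof of hallucination.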
Overly Generic Function Names
Default severity: LOW
Applies to: .py, .js, .ts, .jsx, .tsx, .go
Function names so generic they indicate AI-generated scaffolding rather than domain-specific design. Severity is LOW because experienced humans also write these as stubs — cluster scoring carries the signal.
[LOW] regex:def\s+(process_data|handle_request|do_something|do_stuff)\s*\(
[LOW] regex:def\s+(run_task|execute_task|perform_action|main_function)\s*\(
[LOW] regex:def\s+(helper|my_function|my_method|test_function)\s*\(
[LOW] regex:function\s+(processData|handleRequest|doSomething|getData)\s*\(
[LOW] regex:func\s+(processData|handleRequest|doSomething)\s*\(
Excessive Try-Catch Wrapping
Default severity: MEDIUM
AI models tend to wrap every operation in try/except with generic AI-typical error messages.
# Bare "Error:" prefix (AI-typical phrasing)
regex:print\s*\(\s*f?['"]Error:?\s
regex:print\s*\(\s*f?['"]An error occurred
regex:print\s*\(\s*f?['"]Something went wrong
# Broad except Exception catch-alls (AI uses these excessively)
[LOW] regex:^\s*except\s+Exception(\s+as\s+\w+)?:
Decorative Section Separators
Default severity: MEDIUM
AI assistants love inserting visually decorated section headers with Unicode box-drawing characters or long dash/equals lines. Humans occasionally do this, but AI does it systematically throughout a file.
# Unicode box-drawing section headers (── Title ──────)
regex:#.*[─━═╌╍┄┅]{5,}
regex://.*[─━═╌╍┄┅]{5,}
# Long dash/equals separator lines (10+ chars)
regex:#\s*-{10,}\s*$
regex:#\s*={10,}\s*$
Magic Placeholder Names
Default severity: HIGH
Hardcoded API key and token placeholders that AI models insert as stand-ins. Near-certain AI artifacts when found in source code.
regex:\byour[_-]?api[_-]?key\b
regex:\bYOUR[_-]?API[_-]?KEY\b
regex:\bYOUR[_-]?TOKEN[_-]?HERE\b
regex:\bYOUR[_-]?SECRET[_-]?HERE\b
regex:\bINSERT[_-]?YOUR[_-]?(KEY|TOKEN|SECRET|PASSWORD)\b
regex:['"]?<YOUR[_-]?(API[_-]?KEY|TOKEN|SECRET)>['"]?
regex:\byour[_-]?database[_-]?url\b
regex:\bYOUR[_-]?DATABASE[_-]?URL\b
Hyper-Verbose Identifiers
Default severity: LOW
Function names so long they describe implementation rather than domain intent. AI models consistently produce identifiers like `calculateTotalAmountOfAllItems` where a human writes `total_price`.
regex:def\s+[a-z_]{25,}\s*\(
regex:function\s+[a-zA-Z]{25,}\s*\(
regex:def\s+[a-z_]*(process|calculate|compute|validate|handle|get|set)(And|Or|Then)[A-Z]
regex:function\s+[a-zA-Z]*(process|calculate|compute|validate|handle|get|set)(And|Or|Then)[A-Z]
regex:class\s+\w*(DataManager|DataProcessor|DataHandler|RequestHandler|ResponseHandler)\b
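The 25-character threshold in the first two patterns can be checked against the identifiers mentioned in this section. The snake_case sample below is an invented counterpart to `calculateTotalAmountOfAllItems`.

```python
import re

# Length-based patterns from this section, without the "regex:" prefix.
LONG_PY = re.compile(r"def\s+[a-z_]{25,}\s*\(")
LONG_JS = re.compile(r"function\s+[a-zA-Z]{25,}\s*\(")

print(bool(LONG_JS.search("function calculateTotalAmountOfAllItems(items) {")))  # True
print(bool(LONG_PY.search("def calculate_total_amount_of_all_items(items):")))   # True
print(bool(LONG_PY.search("def total_price(items):")))                           # False
```

Note that the character classes exclude digits, so a long name containing a digit before its 25th character would not match these two patterns.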
Cross-Language Confusion (JS/TS)
Default severity: HIGH
Applies to: .js, .ts, .jsx, .tsx
Python idioms incorrectly used in JavaScript/TypeScript files. Experienced JS/TS developers rarely write these; AI models do so frequently.
regex:\breturn\s+None\b
regex:\bif\s+\w+\s*(==|===|!=|!==)\s*None\b
regex:\breturn\s+(True|False)\b
regex:^\s*elif\s+
regex:^\s*print\s*\(
How to Add New Patterns
- Create a new `## Category` heading and optionally state a default severity.
- Optionally add an `Applies to: .py, .js` line to restrict the category to specific file extensions.
- Add a fenced code block tagged as `` ```patterns ``.
- Put one pattern per line.
- Plain text lines are matched as case-insensitive substrings.
- Lines starting with `regex:` are compiled as Python regular expressions.
- Prepend `[CRITICAL]`, `[HIGH]`, `[MEDIUM]`, or `[LOW]` to override the category default.
- Comment lines starting with `#` inside the block are ignored.
- Commit and push; the action will pick up new patterns automatically.
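The rules above can be sketched as a small parser. This is only an illustration under stated assumptions: headings start with `## `, the `Default severity:` and `Applies to:` lines are plain text as shown in this file, and the fallback severity of `MEDIUM` is invented for the example. `Category` and `parse_categories` are hypothetical names, not SynthScan's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example


@dataclass
class Category:
    name: str
    default_severity: str = "MEDIUM"  # assumed fallback; not specified by this file
    extensions: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)


def parse_categories(text: str) -> list[Category]:
    """Parse '## Category' sections and their fenced 'patterns' blocks."""
    categories: list[Category] = []
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if in_block:
            if line.startswith(FENCE):
                in_block = False  # closing fence ends the block
            elif line and not line.startswith("#"):
                categories[-1].patterns.append(line)  # skip blanks and comments
        elif line.startswith("## "):
            categories.append(Category(name=line[3:].strip()))
        elif not categories:
            continue  # ignore anything before the first category heading
        elif line.lower().startswith("default severity:"):
            categories[-1].default_severity = line.split(":", 1)[1].strip().upper()
        elif line.lower().startswith("applies to:"):
            parts = line.split(":", 1)[1].split(",")
            categories[-1].extensions = [p.strip() for p in parts if p.strip()]
        elif line == FENCE + "patterns":
            in_block = True
    return categories


sample = "\n".join([
    "## Slop Phrases",
    "Default severity: MEDIUM",
    "Applies to: .py, .js",
    FENCE + "patterns",
    "# Direct AI self-references",
    "As an AI language model",
    "[LOW] regex:#.*\\bmake sure (to|you)\\b",
    FENCE,
])

cats = parse_categories(sample)
print(cats[0].name, cats[0].default_severity, cats[0].extensions)
# Slop Phrases MEDIUM ['.py', '.js']
print(len(cats[0].patterns))  # 2
```

Severity tags are left attached to the pattern text here; splitting them off and compiling `regex:` lines would be separate steps.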