Contributing to user-scanner
June 20, 2026
This project separates two kinds of checks:
- Username availability checks (under user_scanner/user_scan/*): synchronous validators that the main username scanner uses.
- Email OSINT checks (under user_scanner/email_scan/): asynchronous, multi-step flows that probe signup pages or email-focused APIs. Put email-focused modules in user_scanner/email_scan/ (subfolders like social/, dev/, community, creator, etc. are fine; follow the existing tree).
Module naming for both email_scan and user_scan modules
- File name must be the platform name in lowercase (no spaces or special characters).
- Examples: github.py, reddit.py, x.py, pinterest.py
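The naming rule can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `is_valid_module_filename` is hypothetical, and it assumes that "no spaces or special characters" means lowercase ASCII letters and digits only, followed by the `.py` extension.

```python
import re

# Assumption: "no spaces or special characters" is read as lowercase ASCII
# letters and digits only, followed by ".py".
_MODULE_NAME = re.compile(r"[a-z0-9]+\.py")


def is_valid_module_filename(filename: str) -> bool:
    """Return True if the file name looks like a valid module name, e.g. github.py."""
    return _MODULE_NAME.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["github.py", "x.py", "Git Hub.py", "hacker-news.py"]:
        print(name, is_valid_module_filename(name))
```

If the project accepts other characters (underscores, for example), the pattern would need to be widened accordingly.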
Email-scan (email_scan) — guide for contributors
Minimal best-practices checklist for email modules
- Put the file in user_scanner/email_scan/<category>/service.py.
- Export async def validate_<service>(email: str) -> Result.
- Use httpx.AsyncClient for requests, with sensible timeouts and follow_redirects when needed.
- Add a short docstring describing environment variables (API keys), rate limits, and a responsible-use note (if required).
Example: Mastodon async module:
import httpx
import re
from user_scanner.core.result import Result


async def _check(email: str) -> Result:
    """
    Internal helper that performs the multi-step signup probe.
    This function demonstrates how to handle CSRF tokens, custom error
    messages (like IP bans), and passing the target URL back to Results.
    """
    # The display URL used for output and error reporting
    show_url = "https://mastodon.social"
    signup_url = f"{show_url}/auth/sign_up"
    post_url = f"{show_url}/auth"

    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "referer": f"{show_url}/explore",
        "origin": show_url,
    }

    async with httpx.AsyncClient(http2=True, headers=headers, follow_redirects=True) as client:
        try:
            # 1. Access the signup page to retrieve required CSRF tokens
            initial_resp = await client.get(signup_url, timeout=15.0)
            if initial_resp.status_code not in [200, 302]:
                return Result.error(f"Failed to access signup page: {initial_resp.status_code}", url=show_url)

            # Extract the CSRF/authenticity token from the HTML
            token_match = re.search(r'name="csrf-token" content="([^"]+)"', initial_resp.text)
            if not token_match:
                return Result.error("Could not find authenticity token", url=show_url)
            csrf_token = token_match.group(1)

            # 2. Prepare the probe payload with the email we want to check
            payload = {
                "authenticity_token": csrf_token,
                "user[account_attributes][username]": "no3motions_robot_020102",
                "user[email]": email,
                "user[password]": "Theleftalone@me",
                "user[password_confirmation]": "Theleftalone@me",
                "user[agreement]": "1",
                "button": ""
            }
            response = await client.post(post_url, data=payload, timeout=15.0)
            res_text = response.text
            res_status = response.status_code

            # 3. Analyze the response to determine account status
            if "has already been taken" in res_text:
                return Result.taken(url=show_url)
            elif "registration attempt has been blocked" in res_text:
                return Result.error("Your IP has been flagged by Mastodon", url=show_url)
            elif res_status == 429:
                return Result.error("Rate limited; try using the '-d' flag", url=show_url)
            elif res_status in [200, 302]:
                # If no 'taken' message is found and status is OK/Redirect, it's available
                return Result.available(url=show_url)
            else:
                return Result.error("Unexpected response body", url=show_url)
        except Exception as exc:
            # Always pass the url=show_url even in exceptions for clear reporting
            return Result.error(str(exc), url=show_url)


async def validate_mastodon(email: str) -> Result:
    """
    Public validator used by the email mode.
    All email modules must export a 'validate_<name>' function that
    returns a Result object.
    """
    return await _check(email)
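The token-extraction step in the module above can be tried in isolation, without any network access. This is a minimal sketch: the helper name `extract_csrf_token` and the `SAMPLE_HTML` string are illustrative assumptions, and the pattern only matches when the meta tag uses the same attribute order and double quotes as the regex in the example.

```python
import re
from typing import Optional

# Same pattern as in the Mastodon example; attribute order and quoting are
# assumed to match what the signup page actually serves.
CSRF_META = re.compile(r'name="csrf-token" content="([^"]+)"')


def extract_csrf_token(html: str) -> Optional[str]:
    """Return the csrf-token meta value, or None if the tag is absent."""
    match = CSRF_META.search(html)
    return match.group(1) if match else None


# Hypothetical page fragment used only for demonstration.
SAMPLE_HTML = '<head><meta name="csrf-token" content="abc123" /></head>'

if __name__ == "__main__":
    print(extract_csrf_token(SAMPLE_HTML))      # abc123
    print(extract_csrf_token("<head></head>"))  # None
```

Returning None instead of raising lets the caller turn a missing token into a Result.error, as the module above does.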
Username availability check guide:
Validator function (user_scan/)
Each module must expose exactly one validator function named:
def validate_<sitename>(user: str) -> Result:
    ...
CRITICAL Rules for user_scan Modules:
- Explicit Verification (No False Positives): Never rely solely on a generic HTTP 200 to assume availability. Many WAFs and CDNs intercept requests and return 200 OK. You MUST explicitly verify a unique string or JSON key for BOTH the taken and available states. Never use a bare else: return Result.available() block.
- Deep Data Extraction: If the user is found, attempt to extract rich metadata (fullname, location, bio, stats) and return it via Result.taken(extra={"fullname": "John Doe", ...}).
- Strict Error Handling: NEVER use raise Exception(). All unhandled states or unexpected status codes must return Result.error(f"Unexpected status code {resp.status_code}").
- Use Orchestrator Helpers: Use generic_validate to standardize httpx logic, but write robust process callbacks.
Orchestrator helpers (user_scan)
To keep validators DRY, the repository provides helper functions in core/orchestrator.py.
1. generic_validate (Preferred)
- Purpose: Run a request for a given URL and let a callback (process) inspect the httpx.Response and return a Result.
- Use case: Highly recommended for all modern modules to inspect response content, prevent false positives, and parse out deep data.
Example robust module with deep data extraction:
from user_scanner.core.orchestrator import generic_validate
from user_scanner.core.result import Result
import re
import json


def validate_example(user: str) -> Result:
    url = f"https://www.example.com/{user}/profile"
    show_url = "https://www.example.com"
    headers = {"User-Agent": "Mozilla/5.0"}

    def process(response):
        # 1. Explicitly check for the "not found" state
        if response.status_code == 404 or "User does not exist" in response.text:
            return Result.available()

        # 2. Explicitly verify the "taken" state and extract deep data
        if response.status_code == 200 and "profile-data" in response.text:
            extra = {}
            match = re.search(r'<script id="profile-data">({.+?})</script>', response.text)
            if match:
                try:
                    data = json.loads(match.group(1))
                except json.JSONDecodeError:
                    # Malformed metadata must not abort the scan; report "taken" without extras
                    data = {}
                if "name" in data:
                    extra["fullname"] = data["name"]
                if "location" in data:
                    extra["location"] = data["location"]
            return Result.taken(extra=extra)

        # 3. Graceful error handling for unexpected states (No bare else!)
        return Result.error(f"Unexpected response status: {response.status_code}")

    return generic_validate(url, process, headers=headers, show_url=show_url, follow_redirects=True)
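Because a process callback only reads attributes of the response, it can be exercised offline with fake responses. The sketch below is illustrative rather than the project's test setup: the `Result` dataclass is a stand-in whose shape is assumed (the real class lives in user_scanner.core.result), and `fake_response` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Optional


@dataclass
class Result:
    """Stand-in for user_scanner.core.result.Result (assumed shape)."""
    status: str
    extra: dict = field(default_factory=dict)
    reason: Optional[str] = None

    @classmethod
    def available(cls) -> "Result":
        return cls("available")

    @classmethod
    def taken(cls, extra: Optional[dict] = None) -> "Result":
        return cls("taken", extra or {})

    @classmethod
    def error(cls, reason: str) -> "Result":
        return cls("error", reason=reason)


def process(response) -> Result:
    # Explicit "not found" marker
    if response.status_code == 404 or "User does not exist" in response.text:
        return Result.available()
    # Explicit "taken" marker
    if response.status_code == 200 and "profile-data" in response.text:
        return Result.taken()
    # Anything else is an error, never a silent "available"
    return Result.error(f"Unexpected response status: {response.status_code}")


def fake_response(status_code: int, text: str = "") -> SimpleNamespace:
    """Hypothetical helper: an object exposing only what process() reads."""
    return SimpleNamespace(status_code=status_code, text=text)


if __name__ == "__main__":
    print(process(fake_response(404)).status)
    print(process(fake_response(200, '<script id="profile-data">{}</script>')).status)
    print(process(fake_response(200, "<html>Checking your browser</html>")).status)
```

The third case stands for a WAF challenge page served with 200 OK: since neither marker is present, the callback reports an error instead of a false positive.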
2. status_validate (Discouraged)
- Purpose: Simple helper for sites where availability can be determined purely from HTTP status codes (e.g., 404 = available, 200 = taken).
- Warning: Use this only as a last resort if the site has absolutely no WAF and reliably returns strict HTTP codes without custom redirect/error pages. On sites that answer missing profiles or blocked requests with a 200 page or a redirect, status-only checks produce false positives.
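To see why status-only checks are fragile, consider the following sketch. It does not call or reproduce status_validate; `classify_by_status` is a hypothetical pure function that mimics the idea of mapping status codes directly to outcomes.

```python
def classify_by_status(status_code: int, available: int = 404, taken: int = 200) -> str:
    """Hypothetical status-only classifier: the response body is never inspected."""
    if status_code == available:
        return "available"
    if status_code == taken:
        return "taken"
    return "error"


if __name__ == "__main__":
    # A genuine profile page and a WAF challenge page can both arrive as 200,
    # so a status-only check cannot tell them apart.
    print(classify_by_status(200))  # taken (real profile, or a challenge page)
    print(classify_by_status(404))  # available
    print(classify_by_status(403))  # error
```

A content-aware process callback avoids this ambiguity by requiring a marker that only a real profile page contains.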
Return values and error handling
- Always return a Result object:
  - Result.available()
  - Result.taken(extra={"fullname": "..."})
  - Result.error("short diagnostic message")
- The orchestrator captures network errors (httpx.ConnectError, httpx.TimeoutException, etc.) and returns Result.error(...) automatically.
- NEVER use raise Exception("..."). If you encounter an anomaly in your process function, always return Result.error("...") so the scanner can gracefully continue to the next module.
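One way to follow the "never raise" rule is to wrap a callback so that any unexpected exception becomes a Result.error. This is a sketch under stated assumptions, not something the repository provides: the decorator `never_raise`, the stand-in `Result` dataclass, and the `BrokenResponse` class are all hypothetical.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Stand-in for user_scanner.core.result.Result (assumed shape)."""
    status: str
    reason: Optional[str] = None

    @classmethod
    def taken(cls) -> "Result":
        return cls("taken")

    @classmethod
    def error(cls, reason: str) -> "Result":
        return cls("error", reason)


def never_raise(process: Callable[..., Result]) -> Callable[..., Result]:
    """Hypothetical decorator: turn unexpected exceptions into Result.error."""
    @functools.wraps(process)
    def wrapper(*args, **kwargs) -> Result:
        try:
            return process(*args, **kwargs)
        except Exception as exc:
            return Result.error(f"{type(exc).__name__}: {exc}")
    return wrapper


@never_raise
def process(response) -> Result:
    data = response.json()  # may raise if the body is not JSON
    if "username" in data:
        return Result.taken()
    return Result.error("Unexpected response body")


class BrokenResponse:
    """Hypothetical response whose body cannot be parsed as JSON."""
    def json(self):
        raise ValueError("not JSON")


if __name__ == "__main__":
    print(process(BrokenResponse()))
```

The explicit checks inside the callback remain the primary defense; the wrapper only catches what they miss.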
Style & linting
- Follow PEP8.
- Use type hints for validator signatures.
- Keep code readable and small.
- Add docstrings to explain non-obvious heuristics.
- Run linters and formatters before opening a PR (pre-commit is recommended).
Thank you for contributing!