Contributing to user-scanner

June 20, 2026 · View on GitHub

This project separates two kinds of checks:

Username availability checks (under user_scanner/user_scan/*) — synchronous validators that the main username scanner uses.
Email OSINT checks (under user_scanner/email_scan/) — asynchronous, multi-step flows that probe signup pages or email-focused APIs. Put email-focused modules in user_scanner/email_scan/ (subfolders like social/, dev/, community, creator etc. are fine — follow the existing tree).

Module naming for both `email_scan` and `user_scan` modules

File name must be the platform name in lowercase (no spaces or special characters).
- Examples: github.py, reddit.py, x.py, pinterest.py

Email-scan (email_scan) — guide for contributors

Minimal best-practices checklist for email modules

Put file in user_scanner/email_scan/<category>/service.py.
Export async def validate_<service>(email: str) -> Result.
Use httpx.AsyncClient for requests, with sensible timeouts and follow_redirects when needed.
Add a short docstring describing environment variables (api keys), rate limits, and responsible-use note (if required)

Example: Mastodon async example:

import httpx
import re
from user_scanner.core.result import Result


async def _check(email: str) -> Result:
    """
    Internal helper that performs the multi-step signup probe.

    This function demonstrates how to handle CSRF tokens, custom error
    messages (like IP bans), and passing the target URL back to Results.
    """
    # The display URL used for output and error reporting
    show_url = "https://mastodon.social"

    signup_url = f"{show_url}/auth/sign_up"
    post_url = f"{show_url}/auth"

    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "referer": f"{show_url}/explore",
        "origin": show_url,
    }

    async with httpx.AsyncClient(http2=True, headers=headers, follow_redirects=True) as client:
        try:
            # 1. Access the signup page to retrieve required CSRF tokens
            initial_resp = await client.get(signup_url, timeout=15.0)
            if initial_resp.status_code not in [200, 302]:
                return Result.error(f"Failed to access signup page: {initial_resp.status_code}", url=show_url)

            # Extract the CSRF/authenticity token from the HTML
            token_match = re.search(r'name="csrf-token" content="([^"]+)"', initial_resp.text)
            if not token_match:
                return Result.error("Could not find authenticity token", url=show_url)

            csrf_token = token_match.group(1)

            # 2. Prepare the probe payload with the email we want to check
            payload = {
                "authenticity_token": csrf_token,
                "user[account_attributes][username]": "no3motions_robot_020102",
                "user[email]": email,
                "user[password]": "Theleftalone@me",
                "user[password_confirmation]": "Theleftalone@me",
                "user[agreement]": "1",
                "button": ""
            }

            response = await client.post(post_url, data=payload, timeout=15.0)
            res_text = response.text
            res_status = response.status_code

            # 3. Analyze the response to determine account status
            if "has already been taken" in res_text:
                return Result.taken(url=show_url)

            elif "registration attempt has been blocked" in res_text:
                return Result.error("Your IP has been flagged by Mastodon", url=show_url)

            elif res_status == 429:
                return Result.error("Rate limited; try using the '-d' flag", url=show_url)

            elif res_status in [200, 302]:
                # If no 'taken' message is found and status is OK/Redirect, it's available
                return Result.available(url=show_url)

            else:
                return Result.error("Unexpected response body", url=show_url)

        except Exception as exc:
            # Always pass the url=show_url even in exceptions for clear reporting
            return Result.error(str(exc), url=show_url)


async def validate_mastodon(email: str) -> Result:
    """
    Public validator used by the email mode.

    All email modules must export a 'validate_<name>' function that
    returns a Result object.
    """
    return await _check(email)

Username availability check guide:

Validator function (user_scan/)

Each module must expose exactly one validator function named:

def validate_<sitename>(user: str) -> Result:
    ...

CRITICAL Rules for user_scan Modules:

Explicit Verification (No False Positives): Never rely solely on a generic HTTP 200 to assume availability. Many WAFs and CDNs intercept requests and return 200 OK. You MUST explicitly verify a unique string or JSON key for BOTH the taken and available states. Never use a bare else: return Result.available() block.
Deep Data Extraction: If the user is found, attempt to extract rich metadata (fullname, location, bio, stats) and return it via Result.taken(extra={"fullname": "John Doe", ...}).
Strict Error Handling: NEVER use raise Exception(). All unhandled states or unexpected status codes must return Result.error(f"Unexpected status code {resp.status_code}").
Use Orchestrator Helpers: Use generic_validate to standardize httpx logic, but write robust process callbacks.

Orchestrator helpers (user_scan)

To keep validators DRY, the repository provides helper functions in core/orchestrator.py.

1. generic_validate (Preferred)

Purpose: Run a request for a given URL and let a callback (process) inspect the httpx.Response and return a Result.
Use case: Highly recommended for all modern modules to inspect response content, prevent false positives, and parse out deep data.

Example robust module with deep data extraction:

from user_scanner.core.orchestrator import generic_validate
from user_scanner.core.result import Result
import re
import json

def validate_example(user: str) -> Result:
    url = f"https://www.example.com/{user}/profile"
    show_url = "https://www.example.com"
    headers = {"User-Agent": "Mozilla/5.0"}

    def process(response):
        # 1. Explicitly check for the "not found" state
        if response.status_code == 404 or "User does not exist" in response.text:
            return Result.available()
            
        # 2. Explicitly verify the "taken" state and extract deep data
        if response.status_code == 200 and "profile-data" in response.text:
            extra = {}
            match = re.search(r'<script id="profile-data">({.+?})</script>', response.text)
            if match:
                data = json.loads(match.group(1))
                if "name" in data:
                    extra["fullname"] = data["name"]
                if "location" in data:
                    extra["location"] = data["location"]
            return Result.taken(extra=extra)

        # 3. Graceful error handling for unexpected states (No bare else!)
        return Result.error(f"Unexpected response status: {response.status_code}")

    return generic_validate(url, process, headers=headers, show_url=show_url, follow_redirects=True)

2. status_validate (Discouraged)

Purpose: Simple helper for sites where availability can be determined purely from HTTP status codes (e.g., 404 = available, 200 = taken).
Warning: Use this only as a last resort if the site has absolutely no WAF and reliably returns strict HTTP codes without custom redirect/error pages. Modern sites heavily punish this approach.

Return values and error handling

Always return a Result object:
- Result.available()
- Result.taken(extra={"fullname": "..."})
- Result.error("short diagnostic message")
The orchestrator captures network errors (httpx.ConnectError, httpx.TimeoutException, etc.) and returns Result.error(...) automatically.
NEVER use raise Exception("..."). If you encounter an anomaly in your process function, always return Result.error("...") so the scanner can gracefully continue to the next module.

Style & linting

Follow PEP8.
Use type hints for validator signatures.
Keep code readable and small.
Add docstrings to explain non-obvious heuristics.
Run linters and formatters before opening a PR (pre-commit is recommended).

Thank you for contributing!