RedAmon Reconnaissance Module

June 5, 2026

Unmask the hidden before the world does.

An automated OSINT reconnaissance and vulnerability scanning framework combining multiple security tools for comprehensive target assessment.




The recon module is fully containerized. All tools run inside Docker containers.

Option 1: Webapp UI

The easiest way to run recon is through the webapp UI, which provides:

  • Real-time log streaming
  • Phase progress tracking
  • Project-specific settings from PostgreSQL
  • Automatic Neo4j graph updates

# 1. Start all services
cd postgres_db && docker-compose up -d
cd ../graph_db && docker-compose up -d
cd ../recon_orchestrator && docker-compose up -d
cd ../webapp && npm run dev

# 2. Open http://localhost:3000/graph
# 3. Click "Start Recon" button

Option 2: CLI with Environment Variables

For standalone CLI usage without the webapp:

# 1. Build the container (first time only)
cd recon/
docker-compose build

# 2. Run a scan with target specified via environment variable
TARGET_DOMAIN=testphp.vulnweb.com docker-compose run --rm recon python /app/recon/main.py

Docker Environment Variables

Override default settings via environment variables:

# Run with custom target
TARGET_DOMAIN=example.com docker-compose run --rm recon python /app/recon/main.py

# Run with Tor anonymity
USE_TOR_FOR_RECON=true docker-compose run --rm recon python /app/recon/main.py

# Run specific modules only
SCAN_MODULES="domain_discovery,port_scan,http_probe" docker-compose run --rm recon python /app/recon/main.py

When to Rebuild

| Change Type | Action Required |
|---|---|
| Python code (*.py) changes | docker-compose build |
| requirements.txt changes | docker-compose build --no-cache |
| Dockerfile changes | docker-compose build --no-cache |
| .env file changes | No rebuild needed (mounted as volume) |

🔗 Recon Orchestrator Integration

When started from the webapp, the recon module is managed by the Recon Orchestrator service, which provides:

  • Container Lifecycle Management - Start/stop/monitor recon containers
  • Real-time Log Streaming - SSE-based log streaming to the frontend
  • Phase Detection - Automatic detection of scan phases from log output
  • Status Tracking - Track running/completed/error states per project

Configuration Hierarchy

Settings are resolved in the following order of precedence:

  1. Webapp API (Primary) - When PROJECT_ID and WEBAPP_API_URL environment variables are set:

    # Set by recon orchestrator when starting container
    PROJECT_ID=cml6xov4q0002h58pln96n20d
    WEBAPP_API_URL=http://localhost:3000
    

    The recon module fetches all 169+ configurable parameters from:

    GET /api/projects/{projectId}
    
  2. Environment Variables - Override individual settings:

    TARGET_DOMAIN=example.com docker-compose run --rm recon python /app/recon/main.py
    
  3. DEFAULT_SETTINGS (Fallback) - Built-in defaults in project_settings.py for CLI usage without webapp
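The precedence rules above can be sketched as a small pure function. This is an illustration rather than the code in project_settings.py: resolve_settings, the string-to-type coercion rule, and the assumption that environment variables only override keys already present in DEFAULT_SETTINGS are all hypothetical. The list is read top-down as highest-to-lowest precedence, and the API payload is passed in as a dict instead of being fetched, so the layering can be shown without a running webapp.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical excerpt of the built-in defaults (the real DEFAULT_SETTINGS has 169+ keys).
DEFAULT_SETTINGS: Dict[str, Any] = {
    "TARGET_DOMAIN": "",
    "USE_TOR_FOR_RECON": False,
    "NAABU_RATE_LIMIT": 1000,
    "SCAN_MODULES": ["domain_discovery"],
}


def _coerce(raw: str, default: Any) -> Any:
    """Convert an environment string to the type of its default (assumed rule)."""
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, list):
        return [item.strip() for item in raw.split(",") if item.strip()]
    return raw


def resolve_settings(
    env: Mapping[str, str],
    api_settings: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Apply layers lowest-precedence first: defaults, then env vars, then the webapp API."""
    settings = dict(DEFAULT_SETTINGS)
    # Environment variables override individual keys that have a built-in default.
    for key, default in DEFAULT_SETTINGS.items():
        if key in env:
            settings[key] = _coerce(env[key], default)
    # The API payload only applies when the orchestrator has set both variables.
    if api_settings is not None and env.get("PROJECT_ID") and env.get("WEBAPP_API_URL"):
        settings.update(api_settings)
    return settings
```

Inside a container this would be called as resolve_settings(os.environ, payload), where payload is the JSON returned by GET /api/projects/{projectId}.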

project_settings.py

The project_settings.py module handles settings resolution:

from recon.project_settings import get_settings

# Returns dict with all settings from API or DEFAULT_SETTINGS fallback
settings = get_settings()

TARGET_DOMAIN = settings['TARGET_DOMAIN']
SUBDOMAIN_LIST = settings['SUBDOMAIN_LIST']
SCAN_MODULES = settings['SCAN_MODULES']
# ... all 169+ parameters

Orchestrator Communication Flow

sequenceDiagram
    participant Webapp as Webapp UI
    participant Orchestrator as Recon Orchestrator
    participant Recon as Recon Container
    participant API as Webapp API
    participant Neo4j as Neo4j

    Webapp->>Orchestrator: POST /recon/{projectId}/start
    Orchestrator->>Recon: docker run with PROJECT_ID, WEBAPP_API_URL
    Recon->>API: GET /api/projects/{projectId}
    API-->>Recon: Project settings (169+ params)
    Recon->>Recon: Execute scan pipeline
    Recon->>Neo4j: Update graph with results
    Orchestrator->>Webapp: SSE log stream
    Recon-->>Orchestrator: Container exits
    Orchestrator->>Webapp: Complete event

🏗️ Docker-in-Docker Architecture

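The recon module uses a Docker-in-Docker (DinD) style pattern in which the main recon container orchestrates sibling containers for each scanning tool. Strictly speaking, no Docker daemon runs inside the recon container (a setup sometimes called Docker-outside-of-Docker); it drives the host's daemon instead.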

How It Works

The recon container shares the host's Docker daemon via a socket mount, meaning all containers are siblings managed by the same host Docker daemon.

flowchart TB
    subgraph Host["🖥️ HOST MACHINE"]
        subgraph DockerDaemon["Docker Daemon (dockerd)"]
            Socket["/var/run/docker.sock"]
        end

        subgraph Containers["Sibling Containers"]
            Recon["redamon-recon<br/>Python Orchestrator<br/>📋 Coordinates all scans"]
            NaabuC["naabu<br/>projectdiscovery/naabu<br/>🔌 Port Scanner"]
            HttpxC["httpx<br/>projectdiscovery/httpx<br/>🌐 HTTP Prober"]
            NucleiC["nuclei<br/>projectdiscovery/nuclei<br/>🎯 Vuln Scanner"]
            KatanaC["katana<br/>projectdiscovery/katana<br/>🕸️ Web Crawler"]
            GAUC["gau<br/>sxcurity/gau<br/>📚 URL Archives"]
            PurednsC["puredns<br/>frost19k/puredns<br/>🧹 Wildcard Filter"]
        end

        Volume["📁 Shared Volume<br/>recon/output/"]
    end

    Socket -.->|socket mount| Recon
    Recon -->|docker run| NaabuC
    Recon -->|docker run| HttpxC
    Recon -->|docker run| NucleiC
    Recon -->|docker run| KatanaC
    Recon -->|docker run| GAUC
    Recon -->|docker run| PurednsC

    NaabuC --> Volume
    HttpxC --> Volume
    NucleiC --> Volume
    KatanaC --> Volume
    GAUC --> Volume
    Recon --> Volume
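Because the recon container talks to the host daemon through /var/run/docker.sock, any bind-mount path it passes to docker run is resolved on the host, not inside the recon container. The sketch below shows that constraint when building a sibling-container command; build_tool_command and the host path are hypothetical, and the real module may construct its commands differently.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def build_tool_command(
    image: str,
    tool_args: List[str],
    host_output_dir: str,
    container_output_dir: str = "/output",
) -> List[str]:
    """Build a `docker run` argv for a sibling tool container.

    The command is executed by the host daemon (reached via /var/run/docker.sock),
    so the bind-mount source must be an absolute path on the host, not a path
    that only exists inside the recon container.
    """
    if not PurePosixPath(host_output_dir).is_absolute():
        raise ValueError("bind-mount source must be an absolute host path")
    return [
        "docker", "run", "--rm",
        "-v", f"{host_output_dir}:{container_output_dir}",
        image,
        *tool_args,
    ]


# Hypothetical host path where recon/output/ lives on the host machine.
cmd = build_tool_command(
    "projectdiscovery/naabu",
    ["-host", "example.com", "-json", "-o", "/output/naabu.json"],
    host_output_dir="/home/user/redamon/recon/output",
)
print(shlex.join(cmd))
# From inside the recon container this could be run with subprocess.run(cmd, check=True).
```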

Container Execution Flow (Parallelized)

The pipeline uses a fan-out / fan-in pattern with ThreadPoolExecutor to run independent modules concurrently, significantly reducing total scan time while respecting data dependencies between groups.

sequenceDiagram
    participant User
    participant Recon as redamon-recon
    participant Docker as Docker Daemon
    participant Naabu as naabu container
    participant Httpx as httpx container
    participant Katana as katana container
    participant GAU as gau container
    participant KR as kiterunner container
    participant Nuclei as nuclei container
    participant GraphBG as Graph DB (background)

    User->>Recon: docker-compose run recon python main.py
    activate Recon

    Note over Recon: GROUP 1 - Fan-Out (parallel)
    par WHOIS + Discovery + URLScan
        Recon->>Recon: WHOIS lookup
    and
        Recon->>Recon: 5 discovery tools in parallel<br/>(crt.sh ∥ HackerTarget ∥ Subfinder ∥ Amass ∥ Knockpy)
    and
        Recon->>Recon: URLScan.io enrichment
    end
    Note over Recon: Fan-In - merge results + Puredns wildcard filtering + DNS (20 parallel workers)
    Recon->>GraphBG: Background: domain discovery graph update

    Note over Recon,Naabu: GROUP 3 - Fan-Out (parallel)
    par Shodan + Port Scan
        Recon->>Recon: Shodan enrichment
    and
        Recon->>Docker: docker run naabu
        Docker->>Naabu: Start container
        activate Naabu
        Naabu-->>Recon: JSON output (open ports)
        deactivate Naabu
    end
    Note over Recon: Fan-In - merge Shodan + port scan
    Recon->>GraphBG: Background: shodan + port scan graph update

    Note over Recon,Httpx: GROUP 4 - HTTP Probe (sequential)
    Recon->>Docker: docker run httpx
    Docker->>Httpx: Start container
    activate Httpx
    Httpx-->>Recon: JSON output (live URLs + tech)
    deactivate Httpx
    Recon->>GraphBG: Background: http probe graph update

    Note over Recon,KR: GROUP 5 - Resource Enum (parallel + sequential)
    par Katana ∥ Hakrawler ∥ GAU ∥ Kiterunner
        Recon->>Docker: docker run katana
        Docker->>Katana: Crawl live URLs
        Katana-->>Recon: endpoints
    and
        Recon->>Docker: docker run hakrawler
        Docker->>Hakrawler: DOM-aware crawl
        Hakrawler-->>Recon: links & forms
    and
        Recon->>Docker: docker run gau
        Docker->>GAU: Fetch archived URLs
        GAU-->>Recon: historical URLs
    and
        Recon->>Docker: docker run kiterunner
        Docker->>KR: API bruteforce
        KR-->>Recon: hidden APIs
    end
    Recon->>Recon: jsluice - extract URLs & secrets from JS files
    Recon->>Recon: FFuf - directory/endpoint fuzzing with wordlists
    Recon->>Recon: Merge & classify endpoints
    Recon->>GraphBG: Background: resource enum graph update

    Note over Recon,KR: GROUP 5b - JS Recon (if enabled)
    Recon->>Recon: Download JS files (parallel)
    Recon->>Recon: 100 regex patterns + key validation + source maps
    Recon->>Recon: Dependency confusion + endpoint extraction + DOM sinks
    Recon->>GraphBG: Background: js_recon graph update

    Note over Recon,Nuclei: GROUP 6 - Vuln Scan + MITRE
    Recon->>Docker: docker run nuclei
    Docker->>Nuclei: Start container
    activate Nuclei
    Nuclei-->>Recon: JSON output (vulns)
    deactivate Nuclei
    Recon->>Recon: MITRE CWE/CAPEC enrichment
    Recon->>GraphBG: Background: vuln scan graph update

    Note over Recon,GraphBG: Wait for all background graph updates
    Recon->>Recon: Save recon_domain.json
    Recon-->>User: Scan complete
    deactivate Recon
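One fan-out / fan-in group can be sketched with concurrent.futures alone, assuming each module is exposed as a zero-argument callable. fan_out_fan_in and the three stand-in GROUP 1 functions are hypothetical, not RedAmon's actual helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


def fan_out_fan_in(
    tasks: Dict[str, Callable[[], object]],
    max_workers: Optional[int] = None,
) -> Dict[str, object]:
    """Run independent tasks concurrently, then merge their results by name.

    The function only returns once every task has finished, so the next group
    can safely depend on the merged output.
    """
    with ThreadPoolExecutor(max_workers=max_workers or max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        # .result() blocks until each task is done and re-raises worker exceptions here.
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical stand-ins for the real GROUP 1 modules.
def whois_lookup() -> dict:
    return {"registrar": "Example Registrar"}


def discover_subdomains() -> list:
    return ["www.example.com", "api.example.com"]


def urlscan_enrichment() -> list:
    return []


group1 = fan_out_fan_in({
    "whois": whois_lookup,
    "subdomains": discover_subdomains,
    "urlscan": urlscan_enrichment,
})
# Fan-in complete: GROUP 3 can now read group1["subdomains"].
```

Because result() is called on every future before the function returns, the next group never starts on partial data, and an exception in any task surfaces in the main thread instead of being silently dropped.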

Why Docker-in-Docker?

| Benefit | Description |
|---|---|
| Isolation | Each tool runs in its own container with minimal dependencies |
| Consistency | Same tool versions regardless of host OS |
| No host pollution | Go binaries (naabu, httpx, nuclei) don't need to be installed on the host |
| Easy updates | Pull new Docker images to update tools |
| Portability | Works on any system with Docker installed |

🔄 Scanning Pipeline Overview

RedAmon executes scans in a parallelized pipeline using a fan-out / fan-in pattern. Independent modules within each group run concurrently via ThreadPoolExecutor, while groups that depend on prior results run sequentially. Graph DB updates happen in a dedicated background thread so the main pipeline is never blocked.

High-Level Pipeline

flowchart LR
    subgraph Input["📥 Input"]
        Domain[🌐 Target Domain]
    end

    subgraph G1["GROUP 1 - parallel fan-out"]
        DD[WHOIS]
        SUB[Subdomain Discovery<br/>5 tools in parallel]
        URLSCAN[URLScan.io]
    end

    subgraph G3["GROUP 3 - parallel fan-out"]
        SHODAN[Shodan Enrichment]
        PS[Port Scan - Naabu]
    end

    subgraph G4["GROUP 4 - sequential"]
        HP[HTTP Probe<br/>Httpx + Wappalyzer]
    end

    subgraph G5["GROUP 5 - parallel + sequential"]
        RE[Resource Enum<br/>Katana ∥ Hakrawler ∥ GAU ∥ ParamSpider ∥ Kiterunner<br/>then jsluice → FFuf → Arjun]
    end

    subgraph G6["GROUP 6 Phase A - parallel (4-way fan-out)"]
        VS[Vuln Scan - Nuclei]
        GQL[GraphQL Security<br/>Introspection + graphql-cop]
        TKO[Subdomain Takeover<br/>Subjack + Nuclei + BadDNS]
        VHOST[VHost & SNI Enum<br/>L7 Host header + L4 SNI probing]
    end

    subgraph G6B["GROUP 6 Phase B - sequential"]
        MIT[MITRE Enrichment<br/>CWE + CAPEC]
    end

    subgraph Output["📤 Output"]
        JSON[(recon_domain.json)]
        Graph[(Neo4j Graph<br/>background updates)]
    end

    Domain --> G1
    G1 -->|fan-in: merge + puredns filter| G3
    G3 -->|fan-in: merge| G4
    G4 --> G5
    G5 --> G6
    G6 --> G6B
    G6B --> JSON
    JSON --> Graph

Detailed Module Flow (Parallelized)

The pipeline uses fan-out / fan-in concurrency: modules within each group run in parallel threads, and results are merged before the next group starts. Graph DB writes happen in a single-writer background thread that never blocks the main pipeline.

flowchart TB
    subgraph Phase1["GROUP 1 - Fan-Out: WHOIS + Discovery + URLScan (parallel)"]
        direction TB
        Start([🌐 TARGET_DOMAIN]) --> FanOut1

        subgraph FanOut1["ThreadPoolExecutor - 3 parallel tasks"]
            direction LR
            WHOIS[WHOIS Lookup<br/>Registrar, dates, contacts]
            SubD[Subdomain Discovery]
            URLScanE[URLScan.io Enrichment<br/>Historical scans]
        end

        subgraph SubSources["5 Discovery Tools (parallel, ThreadPoolExecutor)"]
            CRT[crt.sh<br/>Certificate Transparency]
            HT[HackerTarget API<br/>DNS records]
            SF[Subfinder<br/>50+ passive sources]
            Amass[Amass<br/>50+ data sources]
            Knock[Knockpy<br/>Bruteforce]
        end

        SubD --> CRT
        SubD --> HT
        SubD --> SF
        SubD --> Amass
        SubD --> Knock

        CRT --> Merge[Fan-In: Merge & Dedupe]
        HT --> Merge
        SF --> Merge
        Amass --> Merge
        Knock --> Merge

        Merge --> Puredns[Puredns Wildcard Filter<br/>Validates against public resolvers<br/>Removes wildcards & poisoned entries]
        Puredns --> DNS[DNS Resolution<br/>20 parallel workers<br/>A, AAAA, MX, NS, TXT, CNAME]
        DNS --> Out1[(Subdomains + IPs)]
    end

    subgraph Phase2["GROUP 3 - Fan-Out: Shodan + Port Scan (parallel)"]
        direction TB
        Out1 --> FanOut3

        subgraph FanOut3["ThreadPoolExecutor - 2 parallel tasks"]
            direction LR
            ShodanE[Shodan Enrichment<br/>Host, DNS, CVEs]
            Naabu[Naabu Port Scanner<br/>SYN/CONNECT/Passive]
        end

        FanOut3 --> Out2[Fan-In: Merge Shodan + Ports]
    end

    subgraph Phase3["GROUP 4 - HTTP Probing (sequential, internally parallel)"]
        direction TB
        Out2 --> Httpx[Httpx HTTP Prober]

        subgraph HttpxFeatures["Detection Features"]
            Live[Live URL Check<br/>Status codes]
            Tech[Technology Detection<br/>Wappalyzer enhanced]
            TLS[TLS/SSL Analysis<br/>Certs, ciphers]
            Headers[Header Analysis<br/>Security headers]
        end

        Httpx --> Live
        Httpx --> Tech
        Httpx --> TLS
        Httpx --> Headers

        Live --> Out3[(Live URLs + Tech Stack)]
        Tech --> Out3
        TLS --> Out3
        Headers --> Out3
    end

    subgraph Phase4["GROUP 5 - Resource Enumeration (internally parallel)"]
        direction TB
        Out3 --> ResEnum[Resource Enumeration]

        subgraph EnumTools["3 Tools in Parallel"]
            Katana[Katana<br/>Active Crawling<br/>Current endpoints]
            GAU[GAU<br/>Passive Archives<br/>Historical URLs]
            KR[Kiterunner<br/>API Bruteforce<br/>Hidden endpoints]
        end

        ResEnum --> Katana
        ResEnum --> GAU
        ResEnum --> KR

        Katana --> MergeURL[Merge & Classify]
        GAU --> MergeURL
        KR --> MergeURL

        MergeURL --> Out4[(Endpoints + Parameters)]
    end

    subgraph Phase4b["GROUP 5b - JS Recon (if enabled)"]
        direction TB
        Out4 --> JsRecon[JS Recon Scanner]

        subgraph JsModules["5 Analyzers in Parallel"]
            Patterns[Secret Detection<br/>100 regex patterns]
            SrcMap[Source Map<br/>Discovery & Analysis]
            DepConf[Dependency<br/>Confusion Check]
            EpExtract[Endpoint<br/>Extraction]
            FwSink[Framework +<br/>DOM Sink Detection]
        end

        JsRecon --> Patterns
        JsRecon --> SrcMap
        JsRecon --> DepConf
        JsRecon --> EpExtract
        JsRecon --> FwSink

        Patterns --> KeyVal[Key Validation<br/>21 service validators]
        KeyVal --> Out4b[(JS Findings + Secrets + Endpoints)]
        SrcMap --> Out4b
        DepConf --> Out4b
        EpExtract --> Out4b
        FwSink --> Out4b
    end

    subgraph Phase5["GROUP 6 Phase A - Fan-Out: Nuclei ∥ GraphQL ∥ Takeover ∥ VHost/SNI (parallel)"]
        direction TB
        Out4b --> FanOut6

        subgraph FanOut6["ThreadPoolExecutor - 4 parallel tasks (_isolated wrappers, deep-copy snapshots)"]
            direction LR
            Nuclei[Nuclei Scanner]
            GraphQL[GraphQL Security Scanner]
            Takeover[Subdomain Takeover Scanner]
            VhostSni[VHost & SNI Enum]
        end

        subgraph NucleiFeatures["Nuclei Scan Types"]
            CVE[CVE Detection<br/>Known vulnerabilities]
            DAST[DAST Fuzzing<br/>XSS, SQLi, SSTI]
            Misconfig[Misconfiguration<br/>Exposed panels, defaults]
            Info[Info Disclosure<br/>Backup files, .git]
        end

        subgraph GraphQLFeatures["GraphQL Tests"]
            Discover[Endpoint Discovery<br/>pattern + evidence probing]
            Intros[Introspection Test<br/>schema + sensitive fields]
            Cop[graphql-cop<br/>12 misconfig checks]
        end

        subgraph TakeoverFeatures["Takeover Tools"]
            Subjack[Subjack<br/>DNS-first fingerprints]
            NucTkO[Nuclei takeover templates<br/>http/takeovers + dns]
            BadDNS[BadDNS sidecar<br/>CNAME/NS/MX/TXT/SPF/wildcard]
        end

        subgraph VhostSniFeatures["VHost & SNI Probing"]
            L7[L7 probe<br/>Host header override]
            L4[L4 probe<br/>TLS SNI via --resolve]
            Anomaly[Anomaly oracle<br/>vs baseline status/size]
        end

        Nuclei --> CVE
        Nuclei --> DAST
        Nuclei --> Misconfig
        Nuclei --> Info

        GraphQL --> Discover
        GraphQL --> Intros
        GraphQL --> Cop

        Takeover --> Subjack
        Takeover --> NucTkO
        Takeover --> BadDNS

        VhostSni --> L7
        VhostSni --> L4
        VhostSni --> Anomaly

        CVE --> FanIn6
        DAST --> FanIn6
        Misconfig --> FanIn6
        Info --> FanIn6
        Discover --> FanIn6
        Intros --> FanIn6
        Cop --> FanIn6
        Subjack --> FanIn6
        NucTkO --> FanIn6
        BadDNS --> FanIn6
        L7 --> FanIn6
        L4 --> FanIn6
        Anomaly --> FanIn6
    end

    subgraph Phase5b["GROUP 6 Phase B - MITRE Enrichment (sequential, depends on Nuclei CVEs)"]
        FanIn6[Fan-In: Merge Nuclei + GraphQL + Takeover + VHost/SNI findings] --> MITRE[MITRE Enrichment<br/>CWE + CAPEC]
        MITRE --> Out5[(Vulnerabilities + Attack Patterns)]
    end

    subgraph Phase6["Phase 6: GitHub Hunting"]
        direction TB
        Out5 --> GitHub[GitHub Secret Hunter]

        subgraph Secrets["Secret Types"]
            API[API Keys<br/>AWS, GCP, Stripe]
            Creds[Credentials<br/>Passwords, tokens]
            Keys[Private Keys<br/>SSH, SSL]
            DB[Database Strings<br/>Connection strings]
        end

        GitHub --> API
        GitHub --> Creds
        GitHub --> Keys
        GitHub --> DB

        API --> Out6[(Exposed Secrets)]
        Creds --> Out6
        Keys --> Out6
        DB --> Out6
    end

    subgraph FinalOutput["📤 Final Output"]
        Out6 --> FinalJSON[(recon_domain.json)]
        FinalJSON --> Neo4j[(Neo4j Graph DB<br/>background thread writes)]
    end

Data Enrichment Flow

flowchart LR
    subgraph Discovery["Discovery Phase"]
        Sub[Subdomains] --> IP[IP Addresses]
        IP --> Port[Open Ports]
        Port --> Service[Services]
    end

    subgraph Analysis["Analysis Phase"]
        Service --> URL[Live URLs]
        URL --> Tech[Technologies]
        Tech --> Endpoint[Endpoints]
    end

    subgraph Assessment["Assessment Phase"]
        Endpoint --> Vuln[Vulnerabilities]
        Vuln --> CVE[CVE IDs]
        CVE --> CWE[CWE Weaknesses]
        CWE --> CAPEC[CAPEC Attacks]
    end

    subgraph Graph["Graph Storage"]
        CAPEC --> Neo4j[(Neo4j)]
    end

⚑ Parallelization Architecture

The recon pipeline uses a fan-out / fan-in pattern with Python's concurrent.futures.ThreadPoolExecutor to run independent modules concurrently, significantly reducing total scan time while respecting data dependencies.

Execution Groups

| Group | Modules | Parallelism | Dependencies |
|---|---|---|---|
| GROUP 1 | WHOIS + Subdomain Discovery + URLScan | 3 parallel tasks | Only needs root_domain |
| Discovery | crt.sh + HackerTarget + Subfinder + Amass + Knockpy | 5 parallel tools | Part of GROUP 1 |
| Puredns | Wildcard filtering (validates against public resolvers) | Sequential | After discovery fan-in, before DNS |
| DNS | DNS resolution for all subdomains | 20 parallel workers | After puredns filtering |
| GROUP 2b | Uncover Target Expansion (13 search engines) | Sequential | Needs domain from GROUP 1; runs before GROUP 3 |
| GROUP 3 | Shodan Enrichment + Port Scan (Naabu) | 2 parallel tasks | Needs IPs from GROUP 1 + GROUP 2b |
| GROUP 3b | OSINT Enrichment (Censys + FOFA + OTX + Netlas + VirusTotal + ZoomEye + CriminalIP) | 7 parallel tasks | Needs IPs/domains from GROUP 1; runs concurrently with GROUP 3 |
| GROUP 4 | HTTP Probe (httpx) | Sequential (internally parallel) | Needs ports from GROUP 3 |
| GROUP 5 | Resource Enum (Katana + GAU + Kiterunner) | 3 tools internally parallel | Needs live URLs from GROUP 4 |
| GROUP 5b | JS Recon Scanner (100 regex patterns, key validation, source maps, dependency confusion, endpoint extraction, DOM sinks) | 5 analyzers parallel per file | Needs JS files from GROUP 5; runs if JS_RECON_ENABLED |
| GROUP 6 Phase A | Vuln Scan (Nuclei) ∥ GraphQL Security Testing ∥ Subdomain Takeover ∥ VHost & SNI Enumeration | 4 parallel tasks (ThreadPoolExecutor, _isolated wrappers) | Needs endpoints from GROUP 5/5b |
| GROUP 6 Phase B | MITRE Enrichment (CWE + CAPEC) | Sequential | Needs CVEs from Phase A (Nuclei) |

Background Graph DB Updates

All Neo4j graph updates run in a dedicated single-writer background thread (ThreadPoolExecutor(max_workers=1)). The main pipeline submits deep-copy snapshots of recon data and continues immediately. A final _graph_wait_all() ensures all updates complete before the pipeline exits.
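The single-writer pattern described above can be sketched as a small class. BackgroundGraphWriter is a hypothetical name, write_fn stands in for the real Neo4j update functions, and wait_all() only plays the role the text assigns to _graph_wait_all().

```python
import copy
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class BackgroundGraphWriter:
    """Serialize graph writes on one background thread (hypothetical helper)."""

    def __init__(self, write_fn: Callable[[str, dict], None]) -> None:
        # max_workers=1: writes run one at a time, in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures: List[Future] = []
        self._write_fn = write_fn

    def submit(self, phase: str, recon_data: dict) -> None:
        """Queue an update and return immediately."""
        # Snapshot now, because the pipeline keeps mutating recon_data afterwards.
        snapshot = copy.deepcopy(recon_data)
        self._futures.append(self._pool.submit(self._write_fn, phase, snapshot))

    def wait_all(self) -> None:
        """Block until every queued update has finished, re-raising any write error."""
        try:
            for fut in self._futures:
                fut.result()
        finally:
            self._pool.shutdown(wait=True)


writes = []
writer = BackgroundGraphWriter(lambda phase, data: writes.append((phase, data)))
result = {"subdomains": ["a.example.com"]}
writer.submit("domain_discovery", result)
result["subdomains"].append("b.example.com")  # does not affect the queued snapshot
writer.wait_all()
```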

Structured Logging

All log messages use a consistent [level][Module] prefix format (e.g., [+][crt.sh] Found 42 subdomains) for clarity when multiple tools produce interleaved output from concurrent threads.
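A minimal way to get the [level][Module] prefix from the standard library is a logging.LoggerAdapter. ModuleLogAdapter and the marker keyword are hypothetical; only the [+] marker appears in the text above, and the default [*] marker is an assumption. Including %(threadName)s in the format makes interleaved output from concurrent workers easier to attribute.

```python
import logging


class ModuleLogAdapter(logging.LoggerAdapter):
    """Prefix every message with [marker][Module] (hypothetical helper)."""

    def process(self, msg, kwargs):
        # "marker" is a custom keyword consumed here so it never reaches the logger.
        marker = kwargs.pop("marker", "*")
        return f"[{marker}][{self.extra['module']}] {msg}", kwargs


logging.basicConfig(level=logging.INFO, format="%(threadName)s %(message)s")
crtsh_log = ModuleLogAdapter(logging.getLogger("recon"), {"module": "crt.sh"})
crtsh_log.info("Found %d subdomains", 42, marker="+")
# message text: [+][crt.sh] Found 42 subdomains
```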

Thread Safety

Each parallelized tool function is thread-safe:

  • Discovery tools (query_crtsh, query_hackertarget, etc.) create their own requests.Session instances
  • Module _isolated variants (e.g., run_port_scan_isolated, run_shodan_enrichment_isolated) accept a read-only snapshot of combined_result and return only their data section
  • The main thread handles all merging, so workers share no mutable state
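The _isolated contract (read a snapshot, return only your own section) can be expressed as a generic wrapper. make_isolated, run_port_scan, and port_scan_isolated are hypothetical stand-ins that mirror the naming in the list above; they are not the project's implementations.

```python
import copy
from typing import Any, Callable, Dict

Result = Dict[str, Any]


def make_isolated(section: str, run_fn: Callable[[Result], Result]) -> Callable[[Result], Result]:
    """Wrap a module so it works on a private deep copy and returns only its own section."""

    def isolated(combined_result: Result) -> Result:
        snapshot = copy.deepcopy(combined_result)  # the worker never touches shared state
        updated = run_fn(snapshot)
        return {section: updated.get(section)}

    return isolated


def run_port_scan(result: Result) -> Result:
    """Hypothetical module: reads resolved IPs and writes its own section in place."""
    result["port_scan"] = {ip: [80, 443] for ip in result["dns"]["ips"]}
    return result


port_scan_isolated = make_isolated("port_scan", run_port_scan)

combined_result: Result = {"dns": {"ips": ["192.0.2.10"]}}
section = port_scan_isolated(combined_result)  # could run in a worker thread
combined_result.update(section)  # merging happens on the main thread only
```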

🎯 Partial Recon

Partial Recon lets you run any single tool from the pipeline independently, without triggering a full scan. From the Workflow View or section headers, click the play button on any tool to open a modal that shows existing graph data (subdomains, IPs, ports, BaseURLs, endpoints), accepts custom targets, and launches the tool in isolation. Results are merged into the existing Neo4j graph via MERGE, so no duplicate nodes are created. All 23 pipeline tools are supported, including graphql_scan (custom URLs are validated against project scope and injected via GRAPHQL_ENDPOINTS) and vhost_sni (custom subdomains and IPs are validated and injected before probing). The tool runs with the project's saved settings (timeouts, wordlists, API keys, proxy, Tor). Custom inputs are validated in real time (scope checks, IP/CIDR format, port ranges). See recon/partial_recon.py for the implementation.

Wiki: Recon Pipeline Workflow -- Partial Recon
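The three kinds of input validation mentioned for Partial Recon (scope checks, IP/CIDR format, port ranges) can be sketched with the standard library. The function names and exact rules here are assumptions; recon/partial_recon.py is the source of truth.

```python
import ipaddress


def in_scope(hostname: str, root_domain: str) -> bool:
    """True if hostname is the root domain or one of its subdomains."""
    host = hostname.lower().rstrip(".")
    root = root_domain.lower().rstrip(".")
    # The leading dot stops look-alikes such as evilexample.com from matching.
    return host == root or host.endswith("." + root)


def is_ip_or_cidr(value: str) -> bool:
    """Accept a single IPv4/IPv6 address or a network in CIDR notation."""
    try:
        ipaddress.ip_network(value, strict=False)
        return True
    except ValueError:
        return False


def parse_port_range(spec: str) -> range:
    """Parse '443' or '8000-8100' into a range of valid TCP ports."""
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else start
    if not (1 <= start <= end <= 65535):
        raise ValueError(f"invalid port range: {spec!r}")
    return range(start, end + 1)
```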


📋 Scan Modules Explained

Configure Which Modules to Run

Configure via the webapp project settings or environment variables:

# Run all modules (recommended for full assessment)
SCAN_MODULES="domain_discovery,port_scan,http_probe,resource_enum,vuln_scan"

# Quick recon only (no vulnerability scanning)
SCAN_MODULES="domain_discovery"

# Port scan + HTTP probing (skip vulnerability scanning)
SCAN_MODULES="domain_discovery,port_scan,http_probe"
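A comma-separated SCAN_MODULES value could be normalized as follows. parse_scan_modules is hypothetical, KNOWN_MODULES lists only the five modules named in the examples above, and reordering into pipeline order is an assumed behavior.

```python
from typing import List

# Only the modules named in the examples above; other modules are left out on purpose.
KNOWN_MODULES = ("domain_discovery", "port_scan", "http_probe", "resource_enum", "vuln_scan")


def parse_scan_modules(raw: str) -> List[str]:
    """Split SCAN_MODULES, drop blanks and duplicates, reject unknown names."""
    modules: List[str] = []
    for name in (part.strip() for part in raw.split(",")):
        if not name or name in modules:
            continue
        if name not in KNOWN_MODULES:
            raise ValueError(f"unknown scan module: {name!r}")
        modules.append(name)
    # Assumed behavior: always run in pipeline order, whatever order was listed.
    return sorted(modules, key=KNOWN_MODULES.index)
```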

Module 1: domain_discovery

All 5 subdomain discovery tools run concurrently via ThreadPoolExecutor(max_workers=5). Each tool is a thread-safe function with its own HTTP session. After merging, Puredns validates the combined list against public DNS resolvers to remove wildcard and DNS-poisoned entries. DNS resolution is then parallelized with 20 concurrent workers. WHOIS and URLScan run concurrently with discovery as sibling tasks in the same GROUP 1 fan-out.

flowchart LR
    subgraph Input
        Domain[example.com]
    end

    subgraph Discovery["Domain Discovery (5 tools in parallel)"]
        CRT[crt.sh<br/>CT logs]
        HT[HackerTarget<br/>DNS search]
        SF[Subfinder<br/>50+ sources]
        Amass[Amass<br/>50+ data sources]
        Knock[Knockpy<br/>Bruteforce]
    end

    subgraph Merge["Fan-In"]
        MergeDedupe[Merge & Dedupe]
    end

    subgraph DNSPhase["DNS Resolution (20 parallel workers)"]
        DNS[DNS Resolver<br/>A, AAAA, MX, NS, TXT, CNAME]
    end

    subgraph Output
        Subs[Subdomains]
        IPs[IP Addresses]
        Records[DNS Records]
    end

    Domain --> CRT
    Domain --> HT
    Domain --> SF
    Domain --> Amass
    Domain --> Knock

    CRT --> MergeDedupe
    HT --> MergeDedupe
    SF --> MergeDedupe
    Amass --> MergeDedupe
    Knock --> MergeDedupe

    MergeDedupe --> PD[Puredns<br/>Wildcard Filter]
    PD --> DNS

    DNS --> Subs
    DNS --> IPs
    DNS --> Records

| What It Does | Output |
|---|---|
| WHOIS lookup | Registrar, creation date, owner info |
| Subdomain discovery | Finds subdomains via 5 parallel sources (crt.sh, HackerTarget, Subfinder, Amass, Knockpy) |
| Wildcard filtering | Puredns validates subdomains against public DNS resolvers, removes wildcards and DNS-poisoned entries |
| DNS enumeration | A, AAAA, MX, NS, TXT, CNAME records (20 parallel workers) |
| IP resolution | Maps all discovered hostnames to IPs |

📖 Key Parameters:

TARGET_DOMAIN = "example.com"           # Root domain
SUBDOMAIN_LIST = []                     # Empty = discover ALL
USE_BRUTEFORCE_FOR_SUBDOMAINS = False   # Brute force mode
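The merge, wildcard-filter, and parallel-resolution steps of domain_discovery can be sketched with a pluggable resolver, so the logic runs without network access. This is a simplified version of the general wildcard-detection idea (resolve a random label and discard answers that match it), not puredns's actual algorithm; it only checks the root domain, looks at A records only, and every function name is hypothetical.

```python
import secrets
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set

Resolver = Callable[[str], Set[str]]  # hostname -> A records (empty set if NXDOMAIN)


def merge_and_dedupe(sources: Iterable[Iterable[str]], root: str) -> List[str]:
    """Fan-in: normalize names from every tool and keep unique in-scope ones."""
    seen: Set[str] = set()
    for names in sources:
        for name in names:
            host = name.strip().lower().rstrip(".").removeprefix("*.")
            if host == root or host.endswith("." + root):
                seen.add(host)
    return sorted(seen)


def wildcard_ips(parent: str, resolve: Resolver) -> Set[str]:
    """IPs returned for a random label that almost certainly does not exist."""
    probe = f"{secrets.token_hex(8)}.{parent}"
    return resolve(probe)


def resolve_all(hosts: List[str], root: str, resolve: Resolver, workers: int = 20) -> Dict[str, Set[str]]:
    """Resolve hosts in parallel and drop answers that only match the wildcard."""
    wildcard = wildcard_ips(root, resolve)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        answers = dict(zip(hosts, pool.map(resolve, hosts)))
    # Keep hosts that resolve to at least one IP outside the wildcard answer set.
    return {h: ips for h, ips in answers.items() if ips and not ips <= wildcard}
```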

Module 2: port_scan

flowchart LR
    subgraph Input
        IPs[IP Addresses]
    end

    subgraph Scanner["Naabu Scanner"]
        SYN[SYN Scan]
        Service[Service Detection]
        CDN[CDN Detection]
    end

    subgraph Output
        Ports[Open Ports]
        Services[Service Names]
        CDNInfo[CDN/WAF Info]
    end

    IPs --> SYN
    SYN --> Service
    Service --> CDN
    CDN --> Ports
    CDN --> Services
    CDN --> CDNInfo

| What It Finds | Examples |
|---|---|
| Open ports | 22/SSH, 80/HTTP, 443/HTTPS, 3306/MySQL |
| CDN detection | Cloudflare, Akamai, Fastly |
| Service hints | Common service identification |

📖 Key Parameters:

NAABU_TOP_PORTS = "1000"        # Number of top ports
NAABU_RATE_LIMIT = 1000         # Packets per second
NAABU_SCAN_TYPE = "s"           # SYN scan

📖 Detailed documentation: readmes/README.PORT_SCAN.md
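The NAABU_* settings above map naturally onto naabu's command-line flags. The flags used here (-list, -top-ports, -rate, -scan-type, -json, -o) exist in current naabu releases, but build_naabu_args and this exact mapping are assumptions; check naabu -h for the version in your image. SYN scans generally require raw-socket privileges inside the container.

```python
from typing import List


def build_naabu_args(
    targets_file: str,
    output_file: str,
    top_ports: str = "1000",
    rate_limit: int = 1000,
    scan_type: str = "s",
) -> List[str]:
    """Translate the NAABU_* settings into naabu CLI arguments (hypothetical mapping)."""
    if scan_type not in ("s", "c"):
        raise ValueError("scan_type must be 's' (SYN) or 'c' (CONNECT)")
    return [
        "-list", targets_file,       # one host or IP per line
        "-top-ports", top_ports,     # NAABU_TOP_PORTS
        "-rate", str(rate_limit),    # NAABU_RATE_LIMIT, packets per second
        "-scan-type", scan_type,     # NAABU_SCAN_TYPE
        "-json",                     # JSON-lines output
        "-o", output_file,
    ]
```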


Module 3: http_probe

flowchart LR
    subgraph Input
        URLs[Target URLs<br/>from port scan]
    end

    subgraph Httpx["Httpx Prober"]
        Probe[HTTP/S Requests]
        Tech[Technology Detection]
        TLS[TLS Analysis]
        Headers[Header Extraction]
    end

    subgraph Wappalyzer["Wappalyzer Enhancement"]
        CMS[CMS Detection]
        Plugins[Plugin Detection]
        Analytics[Analytics Tools]
    end

    subgraph Output
        Live[Live URLs]
        Stack[Tech Stack]
        Certs[Certificates]
        SecHeaders[Security Headers]
    end

    URLs --> Probe
    Probe --> Tech
    Probe --> TLS
    Probe --> Headers

    Tech --> Wappalyzer
    Wappalyzer --> CMS
    Wappalyzer --> Plugins
    Wappalyzer --> Analytics

    CMS --> Live
    Plugins --> Stack
    Analytics --> Stack
    TLS --> Certs
    Headers --> SecHeaders

| What It Finds | Examples |
|---|---|
| Live URLs | Which endpoints are responding |
| Technologies | WordPress, nginx, PHP, React |
| CMS Plugins | Yoast SEO, WooCommerce (via Wappalyzer) |
| TLS certificates | Issuer, expiry, SANs |

📖 Detailed documentation: readmes/README.HTTP_PROBE.md
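The security-header part of http_probe can be illustrated with a small check over a response's headers. The header list and missing_security_headers are assumptions for the example, not the module's actual rule set. HSTS is skipped on plain HTTP because browsers ignore that header when it arrives over an insecure connection.

```python
from typing import Dict, List

# Commonly recommended response headers; this list is an illustrative assumption.
SECURITY_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
)


def missing_security_headers(headers: Dict[str, str], is_https: bool) -> List[str]:
    """Return recommended security headers absent from an HTTP response."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    missing = [h for h in SECURITY_HEADERS if h not in present]
    if not is_https:
        # HSTS is only honored over HTTPS, so it is not flagged on plain HTTP.
        missing = [h for h in missing if h != "strict-transport-security"]
    return missing
```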


Module 4: resource_enum

flowchart TB
    subgraph Input
        URLs[Live URLs]
    end

    subgraph Parallel["Parallel Execution"]
        subgraph Active["Active Discovery"]
            Katana[🕸️ Katana<br/>Web Crawler<br/>Current endpoints]
            Hakrawler[🔗 Hakrawler<br/>DOM-aware Crawler<br/>Links & Forms]
        end

        subgraph Passive["Passive Discovery"]
            GAU[📚 GAU<br/>Archive Search<br/>Historical URLs]
        end

        subgraph Bruteforce["API Discovery"]
            KR[🔑 Kiterunner<br/>Swagger Specs<br/>Hidden APIs]
        end
    end

    subgraph JSAnalysis["Sequential JS Analysis"]
        jsluice[🔍 jsluice<br/>JS URL & Secret Extraction]
    end

    subgraph Merge["Merge & Classify"]
        Dedup[Deduplicate]
        Classify[Classify Endpoints<br/>API, Admin, Form, Static]
        Parse[Parse Parameters]
    end

    subgraph Output
        Endpoints[All Endpoints]
        Forms[Forms + Inputs]
        APIs[API Routes]
        Secrets[Secrets & API Keys]
    end

    URLs --> Katana
    URLs --> Hakrawler
    URLs --> GAU
    URLs --> KR

    Katana --> jsluice
    Hakrawler --> jsluice
    Katana --> Dedup
    Hakrawler --> Dedup
    GAU --> Dedup
    KR --> Dedup
    jsluice --> Dedup

    Dedup --> Classify
    Classify --> Parse

    Parse --> Endpoints
    Parse --> Forms
    Parse --> APIs
    Parse --> Secrets

| Tool | Method | What It Finds |
|---|---|---|
| Katana | Active crawling | Current live endpoints |
| Hakrawler | Active crawling | Links, forms, DOM-discovered URLs |
| GAU | Passive archives | Historical/deleted pages |
| Kiterunner | API bruteforce | Hidden API routes |
| jsluice | JS analysis (active download) | URLs, endpoints, and secrets from JS files |

📖 Detailed documentation: readmes/README.RESOURCE_ENUM.md
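The Merge & Classify step can be sketched as URL normalization plus path heuristics. merge_endpoints, classify, and the bucketing rules are hypothetical; in particular, this sketch uses a catch-all other bucket instead of form detection, which needs the crawlers' form data.

```python
from typing import Dict, Iterable
from urllib.parse import parse_qs, urlsplit

STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".svg", ".woff2", ".ico")


def classify(path: str) -> str:
    """Bucket an endpoint path as static, api, admin, or other (assumed heuristics)."""
    path = path.lower()
    if path.endswith(STATIC_EXTENSIONS):
        return "static"
    if path.startswith("/api/") or "/graphql" in path:
        return "api"
    if "/admin" in path:
        return "admin"
    return "other"


def merge_endpoints(*tool_outputs: Iterable[str]) -> Dict[str, dict]:
    """Deduplicate URLs from every tool (ignoring query values) and collect parameter names."""
    merged: Dict[str, dict] = {}
    for urls in tool_outputs:
        for url in urls:
            parts = urlsplit(url)
            path = parts.path or "/"
            key = f"{parts.scheme}://{parts.netloc}{path}"
            entry = merged.setdefault(key, {"category": classify(path), "params": set()})
            # parse_qs maps each parameter name to its values; only the names are kept.
            entry["params"].update(parse_qs(parts.query, keep_blank_values=True))
    return merged
```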


Module 4.5: ai_surface_recon

The detection / fingerprinting half of the adversarial-AI pipeline. Runs as Phase 4.5 (after resource_enum, before vuln_scan), gated on AI_SURFACE_RECON_ENABLED (not on SCAN_MODULES). It sends benign, protocol-aware shape-probes to confirm and characterize AI/LLM infrastructure that generic web recon can't see, then writes the results onto the graph as ai_* property annotations plus MCP tool-poisoning Vulnerability nodes. It never jailbreaks, injects, or fuzzes, and never presents credentials.

Seven workloads run per host (hosts in parallel, workloads sequential within a host):

| # | Workload | Confirms / extracts |
|---|---|---|
| 1 | Chat-shape probe | LLM chat endpoint, dialect, streaming, p50 latency |
| 2 | MCP handshake + tools/list + YARA | MCP server, tools, capabilities, tool-poisoning findings |
| 3 | OpenAPI / manifest / model listing | tool/vision/streaming support, model ids + family, cached spec |
| 4 | Julius probe pack (YAML) | confirmed AI Technology (vLLM, Ollama, …) |
| 5 | Vector-DB confirmation reads | qdrant / chroma / weaviate / milvus, unauthenticated read exposure |
| 6 / 7 | latency baseline + summary glue | counters, merged model family |

Input: resource_enum.by_base_url (classified endpoints), http_probe.by_url (AI flags), port_scan.by_host (AI + vector-DB ports). Output: combined_result["ai_surface_recon"] → graph via update_graph_from_ai_surface_recon (enriches Endpoint / Parameter / Technology, adds Vulnerability nodes). Heavy deps (mcp, yara, prance, jq, PyYAML) are lazy-imported, so a missing dep degrades one workload, not the job.

📖 Detailed documentation: readmes/AI_SURFACE_RECON_MODULE.md
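The execution model described above (hosts in parallel, workloads sequential per host, one failing dependency degrading only its own workload) can be sketched as follows. run_ai_surface_recon, the workload functions, and the error-record shape are hypothetical stand-ins, not the module's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Workload = Tuple[str, Callable[[str], dict]]


def run_host(host: str, workloads: List[Workload]) -> Dict[str, dict]:
    """Run workloads one after another for a single host.

    A failing workload (for example an ImportError from a lazily imported
    optional dependency) is recorded and skipped; the remaining ones still run.
    """
    results: Dict[str, dict] = {}
    for name, fn in workloads:
        try:
            results[name] = fn(host)
        except Exception as exc:  # degrade this workload only
            results[name] = {"error": f"{type(exc).__name__}: {exc}"}
    return results


def run_ai_surface_recon(hosts: List[str], workloads: List[Workload], max_workers: int = 8) -> Dict[str, Dict[str, dict]]:
    """Hosts in parallel, workloads sequential within each host."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(lambda h: run_host(h, workloads), hosts)))


def chat_shape_probe(host: str) -> dict:
    return {"llm_chat": host.startswith("llm.")}


def mcp_yara_scan(host: str) -> dict:
    raise ImportError("No module named 'yara'")  # simulate a missing optional dependency


report = run_ai_surface_recon(
    ["llm.example.com", "www.example.com"],
    [("chat_shape", chat_shape_probe), ("mcp_yara", mcp_yara_scan)],
)
```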


Module 5: vuln_scan

flowchart TB
    subgraph Input
        Endpoints[All Endpoints]
        Tech[Technology Stack]
    end

    subgraph Nuclei["Nuclei Scanner"]
        Templates[9000+ Templates]

        subgraph ScanTypes["Scan Types"]
            CVEScan[CVE Detection<br/>Known vulns]
            DAST[DAST Fuzzing<br/>XSS, SQLi, SSTI]
            Misconfig[Misconfiguration<br/>Exposed panels]
            InfoLeak[Info Disclosure<br/>.git, backups]
        end
    end

    subgraph CVELookup["CVE Lookup"]
        NVD[Query NVD<br/>by technology version]
        Match[Match CVEs<br/>nginx:1.19 → CVE-2021-23017]
    end

    subgraph MITRE["MITRE Enrichment"]
        CWE[CWE Weaknesses<br/>Weakness hierarchy]
        CAPEC[CAPEC Patterns<br/>Attack techniques]
    end

    subgraph Output
        Vulns[Vulnerabilities]
        CVEs[CVE Details]
        Attacks[Attack Patterns]
    end

    Endpoints --> Templates
    Tech --> CVELookup

    Templates --> CVEScan
    Templates --> DAST
    Templates --> Misconfig
    Templates --> InfoLeak

    CVEScan --> MITRE
    DAST --> MITRE
    CVELookup --> NVD
    NVD --> Match
    Match --> MITRE

    MITRE --> CWE
    CWE --> CAPEC

    Misconfig --> Vulns
    InfoLeak --> Vulns
    CAPEC --> CVEs
    CAPEC --> Attacks

| What It Finds | Examples |
|---|---|
| Web CVEs | Log4Shell, Spring4Shell |
| Injection flaws | SQL injection, XSS |
| Misconfigurations | Exposed admin panels |
| CWE Weaknesses | Weakness hierarchy |
| CAPEC Attacks | Attack techniques |
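The CVE-lookup step in the flowchart maps a detected version such as nginx:1.19 onto a CVE whose affected range contains it. The sketch below shows only that range comparison and assumes dotted, purely numeric versions. The bounds come from the published range for CVE-2021-23017 (nginx 0.6.18 through 1.20.0). This sketch does not reproduce how vuln_scan queries NVD.

```python
from typing import Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted numeric version such as '1.19' into (1, 19)."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(version: Version, width: int) -> Version:
    """Right-pad with zeros so '1.19' compares equal to '1.19.0'."""
    return version + (0,) * (width - len(version))


def in_affected_range(detected: str, first_affected: str, last_affected: str) -> bool:
    """Inclusive check: first_affected <= detected <= last_affected."""
    v, lo, hi = (parse_version(x) for x in (detected, first_affected, last_affected))
    width = max(len(v), len(lo), len(hi))
    return _pad(lo, width) <= _pad(v, width) <= _pad(hi, width)


# nginx:1.19 vs CVE-2021-23017 (affected 0.6.18 through 1.20.0)
nginx_119_affected = in_affected_range("1.19", "0.6.18", "1.20.0")
```

Version strings with suffixes (for example 1.19.0-alpine) would need normalizing before this comparison.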

📖 Detailed documentation: readmes/README.VULN_SCAN.md | readmes/README.MITRE.md


Module 5b: graphql_scan

Dedicated GraphQL security scanner -- runs as GROUP 6 Phase A in parallel with Nuclei, Subdomain Takeover, and VHost & SNI Enumeration because all four scanners consume BaseURL / Endpoint / Technology / Subdomain / IP and emit Vulnerability nodes but have zero data dependency on each other. Each phase-A tool is wrapped in an _isolated variant that deep-copies combined_result so the four threads never race on the shared dict.

Toggle: GRAPHQL_SECURITY_ENABLED (default: false, opt-in).

flowchart TB
    subgraph Input
        BaseURLs[BaseURLs]
        Endpoints[Endpoints]
        JSFindings[JS Recon findings<br/>graphql/graphql_introspection]
        UserURLs[User-specified URLs<br/>GRAPHQL_ENDPOINTS]
    end

    subgraph Discovery["Endpoint Discovery"]
        HTTPProbe[HTTP probe matches<br/>Content-Type: application/graphql]
        ResourceEnum[Resource enum paths<br/>/graphql, /gql, /query POST]
        Pattern[Pattern probing<br/>/graphql, /api/graphql,<br/>/v1/graphql, /v2/graphql]
        Secondary[Secondary patterns<br/>/query, /graphiql, /playground<br/>only on bases with evidence]
    end

    subgraph Filter["RoE Filter"]
        ROE[ROE_EXCLUDED_HOSTS<br/>*.example.com wildcards]
    end

    subgraph NativeTest["Native Introspection Test"]
        Probe[POST __typename<br/>reachability check]
        Simple[Simple introspection<br/>queryType/mutationType]
        Deep[Full introspection<br/>TypeRef depth 1-20]
        Sens[Sensitive field detection<br/>password, token, ssn, cvv...]
    end

    subgraph CopTest["graphql-cop Docker<br/>dolevf/graphql-cop:1.14"]
        Cop1[field_suggestions]
        Cop2[detect_graphiql]
        Cop3[get_method_support]
        Cop4[trace_mode]
        Cop5[alias_overloading DoS]
        Cop6[batch_query DoS]
        Cop7[directive_overloading DoS]
        Cop8[circular_introspection DoS]
        Cop9[get_based_mutation]
        Cop10[post_based_csrf]
        Cop11[unhandled_error_detection]
    end

    subgraph Output
        Vulns[Vulnerability nodes]
        Flags[Endpoint capability flags<br/>graphql_graphiql_exposed,<br/>graphql_tracing_enabled,<br/>graphql_get_allowed, etc.]
        Schema[Schema hash + operation counts<br/>queries/mutations/subscriptions]
    end

    BaseURLs --> Discovery
    Endpoints --> Discovery
    JSFindings --> Discovery
    UserURLs --> Discovery

    Discovery --> HTTPProbe
    Discovery --> ResourceEnum
    Discovery --> Pattern
    Discovery --> Secondary

    HTTPProbe --> Filter
    ResourceEnum --> Filter
    Pattern --> Filter
    Secondary --> Filter

    Filter --> ROE
    ROE --> NativeTest
    ROE --> CopTest

    NativeTest --> Probe
    Probe --> Simple
    Simple --> Deep
    Deep --> Sens

    NativeTest --> Vulns
    NativeTest --> Schema
    CopTest --> Vulns
    CopTest --> Flags

Capabilities:

| Layer | What It Does |
|---|---|
| Endpoint discovery | Merges user-specified URLs, HTTP probe matches (Content-Type: application/graphql), resource-enum endpoints (paths containing graphql/gql/query via POST, or with query/mutation/variables/operationName parameters), JS Recon findings (graphql / graphql_introspection types), and pattern probes (primary /graphql, /api/graphql, /v1/graphql, /v2/graphql; secondary /query, /gql, /graphiql, /playground only on bases with prior evidence). |
| RoE filtering | Drops endpoints matching ROE_EXCLUDED_HOSTS (wildcards supported). The skipped count is exposed in summary.endpoints_skipped. |
| Native introspection test | 3-step probe per endpoint: POST { __typename } reachability check, simple introspection, then full introspection at a configurable TypeRef depth (1-20, default 10). Extracts the schema hash (16-char SHA256), query/mutation/subscription counts, and a sensitive-field list (password, secret, token, key, api, private, credential, auth, ssn, credit, card, payment, bank, account, pin, cvv, salary, medical). Responses larger than 10 MB fall back to the simple-introspection result. |
| graphql-cop (opt-in) | Docker-in-Docker wrapper around dolevf/graphql-cop:1.14. Runs 12 checks per endpoint: field suggestions, GraphiQL/Playground detection, trace mode, GET-method queries/mutations, POST url-encoded CSRF, alias overloading (DoS), batch query (DoS), directive overloading (DoS), circular introspection (DoS), unhandled errors. Uses --network host and the -T flag when Tor is enabled; forwards HTTP_PROXY via -x. Per-test toggles are applied post-execution because the 1.14 image does not honor the -e flag. |
| Authentication | 5 modes: bearer, cookie, header (custom name), basic (base64), apikey. Values are masked in logs. The same headers propagate to graphql-cop via -H '{"K":"V"}' JSON arguments. |
| Rate limiting + retries | Global RPS cap (GRAPHQL_RATE_LIMIT, 0-100), retry on 429/5xx with exponential backoff (GRAPHQL_RETRY_COUNT, GRAPHQL_RETRY_BACKOFF), and a concurrency clamp (1-20). Concurrency=1 runs sequentially. |
| Endpoint enrichment | Updates existing Endpoint nodes with capability flags (graphql_graphiql_exposed, graphql_tracing_enabled, graphql_get_allowed, graphql_field_suggestions_enabled, graphql_batching_enabled, graphql_cop_ran). Flags are recorded even on negative results, so the graph captures server state explicitly. |
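The native introspection row lists three outputs: a 16-character SHA256 schema hash, operation counts per root type, and sensitive field names. The sketch below computes all three from a standard GraphQL introspection response. It makes two assumptions. First, the hash covers the canonical JSON of the whole __schema object. Second, sensitive fields are found by case-insensitive substring match against the keyword list above. Substring matching is noisy; for example, "pin" also matches "shipping". The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

SENSITIVE_KEYWORDS = (
    "password", "secret", "token", "key", "api", "private", "credential", "auth",
    "ssn", "credit", "card", "payment", "bank", "account", "pin", "cvv", "salary", "medical",
)


def schema_hash(schema: Dict[str, Any]) -> str:
    """Canonical JSON (sorted keys) -> SHA256 -> first 16 hex chars."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def summarize_schema(introspection: Dict[str, Any]) -> Dict[str, Any]:
    schema = introspection["data"]["__schema"]
    types = {t["name"]: t for t in schema.get("types") or []}

    def root_field_count(root_key: str) -> int:
        root = schema.get(root_key)  # e.g. {"name": "Query"} or None
        if not root:
            return 0
        return len(types.get(root["name"], {}).get("fields") or [])

    sensitive: List[str] = []
    for type_name, type_def in types.items():
        if type_name.startswith("__"):  # skip built-in introspection types
            continue
        for field in type_def.get("fields") or []:
            if any(kw in field["name"].lower() for kw in SENSITIVE_KEYWORDS):
                sensitive.append(f"{type_name}.{field['name']}")

    return {
        "schema_hash": schema_hash(schema),
        "queries": root_field_count("queryType"),
        "mutations": root_field_count("mutationType"),
        "subscriptions": root_field_count("subscriptionType"),
        "sensitive_fields": sorted(sensitive),
    }
```

Sorting keys before hashing makes the hash stable when a server returns the same schema with keys in a different order. This stability lets a rescan detect real schema changes.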

Stealth overrides: GRAPHQL_RATE_LIMIT=2, GRAPHQL_CONCURRENCY=1, GRAPHQL_TIMEOUT=60, and the four DoS-class graphql-cop tests (alias / batch / directive / circular) forced off.
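The rate-limiting row combines a global requests-per-second cap with retries on 429/5xx and exponential backoff. The sketch below is a single-threaded version of that behavior. Several details are assumptions: that GRAPHQL_RATE_LIMIT=0 means "no cap", that GRAPHQL_RETRY_COUNT counts retries after the first attempt, and the class and function names. The clock and sleep functions are injectable so the timing can be checked without waiting.

```python
import time
from typing import Callable


def is_retryable(status: int) -> bool:
    """Retry on 429 Too Many Requests and any 5xx server error."""
    return status == 429 or 500 <= status <= 599


class RateLimiter:
    """Space requests at least 1/rps seconds apart; rps <= 0 disables the cap."""

    def __init__(self, rps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rps if rps > 0 else 0.0
        self.clock, self.sleep = clock, sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def send_with_retries(send: Callable[[], int], limiter: RateLimiter, retries: int,
                      backoff: float,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send(); on a retryable status wait backoff * 2**attempt and try again."""
    status = 0
    for attempt in range(retries + 1):
        limiter.wait()
        status = send()
        if not is_retryable(status) or attempt == retries:
            return status
        sleep(backoff * (2 ** attempt))
    return status
```

With the stealth override GRAPHQL_RATE_LIMIT=2, the limiter spaces requests 0.5 s apart. A shared limiter used from several worker threads would additionally need a lock around wait().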

Schema contract: KNOWN_VULN_KEYS + KNOWN_ENDPOINT_INFO_KEYS in graph_db/mixins/graphql_mixin.py pin every field the scanner may emit. Adding a new key without updating the mixin triggers a warning at graph-ingest time, so nothing is dropped silently.
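The schema contract amounts to a set difference between the keys a finding carries and the keys the mixin knows about. The sketch below shows that check. The key set is a hypothetical subset chosen for illustration; the real KNOWN_VULN_KEYS lives in graph_db/mixins/graphql_mixin.py and is not reproduced here.

```python
import logging
from typing import Any, Dict, FrozenSet, Set

logger = logging.getLogger("graphql_mixin_sketch")

# Hypothetical subset for illustration only.
KNOWN_VULN_KEYS: FrozenSet[str] = frozenset(
    {"id", "name", "severity", "endpoint", "vulnerability_type", "evidence"}
)


def check_contract(finding: Dict[str, Any], known: FrozenSet[str], kind: str) -> Set[str]:
    """Return keys the contract does not know about, logging a warning if any exist."""
    unknown = set(finding) - known
    if unknown:
        logger.warning("%s finding has keys outside the schema contract: %s",
                       kind, sorted(unknown))
    return unknown
```

Returning the unknown keys, rather than silently filtering them out, is what makes contract drift visible at ingest time.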

Source layout:

recon/graphql_scan/
├── __init__.py           # Package entry point
├── scanner.py            # Orchestration (run_graphql_scan, run_graphql_scan_isolated, test_single_endpoint)
├── discovery.py          # Endpoint discovery from 5 sources + RoE filtering
├── introspection.py      # Configurable-depth introspection query + schema operations
├── misconfig.py          # graphql-cop Docker-in-Docker wrapper (12 tests)
├── normalizers.py        # Finding normalization + severity aggregation
└── auth.py               # 5 auth modes (bearer/cookie/header/basic/apikey) with masked logs

📖 Detailed documentation: readmes/GRAPH.SCHEMA.md (GraphQL-specific Endpoint & Vulnerability properties) | Wiki: GraphQL Security Testing


Module 5d: vhost_sni_enum

Hidden-virtual-host discovery scanner -- runs as the fourth GROUP 6 Phase A sibling alongside Nuclei, GraphQL Scan, and Subdomain Takeover. Probes every target IP up to twice per candidate hostname: an L7 test that overrides the HTTP Host header and, on HTTPS ports, an L4 test that pins TLS SNI via curl --resolve. Each probe is compared against a per-port baseline raw-IP request; a status-code change OR a body-size delta beyond VHOST_SNI_BASELINE_SIZE_TOLERANCE (default 50 bytes) flags an anomaly. L4 catches modern reverse proxies (NGINX ingress, Traefik, Cloudflare, k8s) that route at the TLS handshake before reading any HTTP header; L7 catches classic Apache / Nginx vhost routing.

Toggle: VHOST_SNI_ENABLED (default: false, opt-in). Zero new binaries -- relies on curl already in the recon image.
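The three request shapes described above (baseline, L7 Host override, L4 SNI pin) can be expressed as curl argument lists. The curl options used here are standard: -H, --resolve, --connect-timeout, --max-time, and -w with the %{http_code} and %{size_download} variables. The remaining choices are assumptions and not the module's code: the function names, the -s -k -o /dev/null flags, using timeout as the connect timeout, and letting the L7 URL follow the port's scheme. Building an argv list, rather than a shell string, keeps candidate hostnames from being interpreted by a shell.

```python
from typing import List, Tuple


def build_probe(ip: str, port: int, scheme: str, candidate: str, layer: str,
                timeout: int = 5) -> List[str]:
    """Return a curl argv for a baseline, L7 (Host header) or L4 (SNI) probe."""
    common = [
        "curl", "-s", "-k",                     # quiet; accept certs that don't match the IP
        "-o", "/dev/null",                      # discard the body...
        "-w", "%{http_code} %{size_download}",  # ...but print status and body size
        "--connect-timeout", str(timeout),
        "--max-time", str(timeout * 3),
    ]
    if layer == "baseline":
        # Raw-IP request: no Host override, no SNI swap.
        return common + [f"{scheme}://{ip}:{port}/"]
    if layer == "L7":
        # Connect to the IP but present `candidate` in the HTTP Host header.
        return common + ["-H", f"Host: {candidate}", f"{scheme}://{ip}:{port}/"]
    if layer == "L4":
        if scheme != "https":
            raise ValueError("L4 probe needs TLS: plain http carries no SNI")
        # Pin candidate -> ip so the TLS ClientHello sends `candidate` as SNI.
        return common + ["--resolve", f"{candidate}:{port}:{ip}",
                         f"https://{candidate}:{port}/"]
    raise ValueError(f"unknown layer: {layer!r}")


def parse_probe_output(stdout: str) -> Tuple[int, int]:
    """curl prints '000' for http_code when no response arrived, which parses to 0."""
    status, size = stdout.split()
    return int(status), int(size)
```

The argv would typically be run with subprocess.run(argv, capture_output=True, text=True, timeout=...). IPv6 addresses in --resolve need square brackets, which this sketch does not handle.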

flowchart TB
    subgraph Input
        IPs[IPs from port_scan + DNS]
        DefaultWL[Default wordlist<br/>2,471 entries]
        CustomWL[Custom wordlist setting]
        GraphCands[Graph-derived candidates<br/>DNS subdomains, httpx hosts,<br/>TLS SANs, CNAME, PTR, co-resident]
    end

    subgraph CandidateSet["Candidate Set Build"]
        Merge[Merge + dedupe + hostname validation<br/>regex with \Z to block newline injection]
        Cap[Deterministic sort + slice<br/>VHOST_SNI_MAX_CANDIDATES_PER_IP]
    end

    subgraph Baseline["Per-Port Baseline"]
        BaseReq[curl raw-IP request<br/>no Host override, no SNI swap]
    end

    subgraph Probing["Per-Candidate Probing (ThreadPoolExecutor)"]
        L7Probe[L7 probe<br/>-H 'Host: candidate']
        L4Probe[L4 probe<br/>--resolve candidate:port:ip<br/>https only]
    end

    subgraph Anomaly["Anomaly Oracle"]
        StatusDiff[Status code differs?]
        SizeDiff[Body size differs<br/>beyond tolerance?]
    end

    subgraph Severity["Severity Ladder"]
        High[high: L7 vs L4 disagree<br/>routing inconsistency / proxy bypass]
        Med[medium: matches INTERNAL_KEYWORDS<br/>~80 entries: admin, jenkins, vault, ...]
        Low[low: confirmed hidden vhost<br/>different status]
        Info[info: size-delta only]
    end

    subgraph Output
        Vulns[Vulnerability nodes<br/>hidden_vhost / hidden_sni_route / host_header_bypass]
        SubEnrich[Subdomain enrichment<br/>vhost_tested, vhost_hidden, sni_routed, ...]
        IPEnrich[IP enrichment<br/>is_reverse_proxy, hidden_vhost_count, ...]
        Feedback[Discovery feedback loop<br/>inject discovered vhost into http_probe.by_url]
    end

    DefaultWL --> Merge
    CustomWL --> Merge
    GraphCands --> Merge
    Merge --> Cap

    IPs --> BaseReq
    Cap --> L7Probe
    Cap --> L4Probe
    BaseReq --> StatusDiff
    BaseReq --> SizeDiff
    L7Probe --> StatusDiff
    L7Probe --> SizeDiff
    L4Probe --> StatusDiff
    L4Probe --> SizeDiff

    StatusDiff --> High
    StatusDiff --> Med
    StatusDiff --> Low
    SizeDiff --> Info

    High --> Vulns
    Med --> Vulns
    Low --> Vulns
    Info --> Vulns

    Vulns --> SubEnrich
    Vulns --> IPEnrich
    Vulns --> Feedback
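The Anomaly Oracle in the flowchart above compares each probe with its port's baseline on status code and body size. The sketch below shows that check. It follows the documented rule of dropping a status-0 baseline as no-data. Treating a status-0 probe the same way is an assumption, as are the names ProbeResult and is_anomaly (the module's own helper is _is_anomaly, whose exact signature is not shown here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeResult:
    status: int  # HTTP status code; 0 means curl never got a response
    size: int    # response body size in bytes


def is_anomaly(baseline: Optional[ProbeResult], probe: ProbeResult,
               tolerance: int = 50) -> bool:
    """Flag a status change, or a body-size delta strictly greater than tolerance."""
    # No usable baseline, or the probe itself failed: nothing to compare.
    if baseline is None or baseline.status == 0 or probe.status == 0:
        return False
    if probe.status != baseline.status:
        return True
    return abs(probe.size - baseline.size) > tolerance
```

The tolerance absorbs small per-request noise, such as timestamps or CSRF tokens in the page body, so that only materially different responses count.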

Capabilities:

| Layer | What It Does |
|---|---|
| IP target collection | Merges every available IP source instead of falling back from one to the next: port_scan.by_host supplies authoritative ports plus per-port scheme overrides, and IPs from dns.subdomains[*].ips.ipv4 and dns.domain.ips.ipv4 are added with default ports 80/443. Each IP's port list is deduped on (port, scheme). |
| Candidate set build | Six sources merged: the default wordlist (2,471 entries shipped in-image), the custom wordlist setting, DNS subdomains resolving to the IP, httpx-known hosts on the IP, the TLS SAN list per URL, and CNAME/PTR/co-resident external domains. A UTF-8 BOM is handled automatically for Windows-edited overrides. Hostname validation uses \Z to block newline-injected hostnames from corrupting --resolve syntax. |
| Per-port baseline | One raw-IP curl request per port. Status 0 (curl couldn't connect) is dropped as no-data instead of being recorded as a real probe. |
| L7 probing | -H "Host: <candidate>" against https://<ip>:<port>/. Catches classic Apache / Nginx vhost routing. |
| L4 probing | --resolve <candidate>:<port>:<ip> so the TLS handshake carries the candidate as SNI. Catches modern reverse proxies (NGINX ingress, Traefik, Cloudflare, k8s) that route at the TLS layer. Skipped when the scheme is http (no SNI to set). |
| Anomaly detection | Each (candidate, layer) probe is compared to the baseline. It is an anomaly if the status differs OR the body size deviates beyond VHOST_SNI_BASELINE_SIZE_TOLERANCE (default 50 bytes). |
| Severity ladder | high when L7 and L4 disagree on the same hostname (proxy bypass primitive); medium when the discovered vhost matches INTERNAL_KEYWORDS (~80 entries: admin, jenkins, vault, keycloak, argocd, kibana, grafana, ...); low for any anomaly with a different status code; info for size-delta-only anomalies. Compound names like admin-portal are matched via longest-keyword-wins with a lexicographic tie-break. |
| Discovery feedback loop | When VHOST_SNI_INJECT_DISCOVERED=true (default), discovered hidden vhosts are folded into combined_result["http_probe"]["by_url"] as fresh BaseURLs with discovery_source="vhost_sni_enum", so downstream graph methods and follow-up partial-recon runs see them as real targets. |
| Concurrency + safety | All candidates per IP are fanned out via an internal ThreadPoolExecutor(max_workers=VHOST_SNI_CONCURRENCY) (default 20). Each curl invocation sets --connect-timeout plus --max-time = 3 * timeout, and the subprocess call carries a belt-and-braces timeout of timeout * 3 + 2. The per-IP candidate cap (default 2,000) is deterministic across reruns. |

Stealth overrides: VHOST_SNI_ENABLED=false outright. Bare-IP curl probes plus per-candidate retries are too noisy for stealth profiles.

Three vulnerability shapes:

  • host_header_bypass (layer = both, attached to BOTH the Subdomain AND the IP node since the IP itself is the bypass surface)
  • hidden_sni_route (layer = L4, attached to the Subdomain)
  • hidden_vhost (layer = L7, attached to the Subdomain)
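The severity ladder from the capabilities table and the layer-to-type mapping listed above can be sketched as follows. The keyword tuple is a small subset of the ~80 INTERNAL_KEYWORDS. Three details are assumptions rather than the module's _classify_severity: the boolean inputs, the reading of "L7 and L4 disagree" as a precomputed flag, and the rule that the ladder is only applied to probes already flagged as anomalies.

```python
from typing import Dict, Iterable, Optional

# Illustrative subset only; the module ships ~80 INTERNAL_KEYWORDS entries.
INTERNAL_KEYWORDS = ("admin", "jenkins", "vault", "keycloak", "argocd", "kibana", "grafana")

# Layer -> vulnerability type, per the three shapes listed above.
VULN_TYPE_BY_LAYER: Dict[str, str] = {
    "both": "host_header_bypass",
    "L4": "hidden_sni_route",
    "L7": "hidden_vhost",
}


def matched_keyword(hostname: str,
                    keywords: Iterable[str] = INTERNAL_KEYWORDS) -> Optional[str]:
    """Longest matching keyword wins; equal lengths are broken lexicographically."""
    hits = [kw for kw in keywords if kw in hostname.lower()]
    if not hits:
        return None
    return min(hits, key=lambda kw: (-len(kw), kw))


def classify_severity(layers_disagree: bool, hostname: str, status_changed: bool) -> str:
    """Walk the ladder top-down for a probe already flagged as an anomaly."""
    if layers_disagree:
        return "high"
    if matched_keyword(hostname) is not None:
        return "medium"
    if status_changed:
        return "low"
    return "info"
```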

Each finding carries a deterministic id vhost_sni_<host>_<ip>_<port>_<layer> so rescans MERGE on the same Vulnerability node in Neo4j (no duplicates).
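The deterministic id makes rescans idempotent, because an identical finding always maps to the same key. The sketch below builds the id in the documented vhost_sni_<host>_<ip>_<port>_<layer> shape and uses an in-memory dict as a stand-in for MERGE. The Cypher string shows the assumed write shape (MERGE on id, then SET v += $props). It is not copied from vhost_sni_mixin.py and is not executed here.

```python
from typing import Any, Dict, List


def vhost_finding_id(host: str, ip: str, port: int, layer: str) -> str:
    """Build the id pattern vhost_sni_<host>_<ip>_<port>_<layer>."""
    return f"vhost_sni_{host}_{ip}_{port}_{layer}"


# Assumed Cypher shape for an idempotent write (not executed in this sketch).
MERGE_VULNERABILITY = "MERGE (v:Vulnerability {id: $id}) SET v += $props"


def upsert(store: Dict[str, Dict[str, Any]], findings: List[Dict[str, Any]]) -> None:
    """In-memory stand-in for MERGE: same id -> same entry, properties updated."""
    for finding in findings:
        fid = vhost_finding_id(finding["host"], finding["ip"], finding["port"], finding["layer"])
        store.setdefault(fid, {}).update(finding)
```

Running upsert twice with the same finding leaves one entry whose properties reflect the latest scan, which mirrors what MERGE on the id does in Neo4j.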

Source layout:

recon/main_recon_modules/vhost_sni_enum.py   # The full module (run_vhost_sni_enrichment, run_vhost_sni_enrichment_isolated, _build_candidate_set, _collect_ip_targets, _is_anomaly, _classify_severity, _inject_into_http_probe, _is_valid_hostname)
recon/wordlists/vhost-common.txt             # 2,471-entry default wordlist (shipped in-image)
graph_db/mixins/recon/vhost_sni_mixin.py     # Neo4jClient.update_graph_from_vhost_sni()

📖 Detailed documentation: readmes/GRAPH.SCHEMA.md -- VHost/SNI Vulnerability properties + Subdomain/IP enrichment | Wiki: VHost & SNI Enumeration


Module 6: github

flowchart LR
    subgraph Input
        Org[GitHub Org/User]
    end

    subgraph Hunter["GitHub Secret Hunter"]
        Repos[List Repositories]
        Commits[Search Commits]
        Code[Search Code]
    end

    subgraph Patterns["Detection Patterns"]
        AWS[AWS Keys]
        GCP[GCP Credentials]
        Stripe[Stripe Keys]
        DB[Database Strings]
        SSH[SSH Keys]
    end

    subgraph Output
        Secrets[Exposed Secrets]
    end

    Org --> Repos
    Repos --> Commits
    Repos --> Code

    Commits --> Patterns
    Code --> Patterns

    AWS --> Secrets
    GCP --> Secrets
    Stripe --> Secrets
    DB --> Secrets
    SSH --> Secrets

Module 7: trufflehog

flowchart LR
    subgraph Input
        Org2[GitHub Org/User]
    end

    subgraph Scanner["TruffleHog Secret Scanner"]
        Repos2[List Repositories]
        GitHistory[Deep Git History Scan]
        Verify[Credential Verification]
    end

    subgraph Detectors["Detector Engine"]
        AWS2[AWS Keys]
        GCP2[GCP Credentials]
        Stripe2[Stripe Keys]
        DB2[Database Strings]
        Custom[700+ Detectors]
    end

    subgraph Output2
        Findings[Verified + Unverified Findings]
    end

    Org2 --> Repos2
    Repos2 --> GitHistory
    GitHistory --> Verify

    Verify --> Detectors

    AWS2 --> Findings
    GCP2 --> Findings
    Stripe2 --> Findings
    DB2 --> Findings
    Custom --> Findings

TruffleHog provides deep credential detection by scanning the full git history of all repositories belonging to the target GitHub organization or user. Unlike the built-in GitHub Secret Hunter (regex + entropy), TruffleHog uses a detector-based engine with 700+ credential detectors and can verify whether discovered credentials are still active.
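For contrast with TruffleHog's detectors, the regex-plus-entropy approach attributed to the GitHub Secret Hunter can be sketched in a few lines. This is not github_secret_hunt.py. It uses one well-known pattern: AWS access key IDs start with AKIA followed by 16 uppercase alphanumerics. The 3.5 bits-per-character threshold is an assumption. AKIAIOSFODNN7EXAMPLE is AWS's documentation placeholder key.

```python
import math
import re
from collections import Counter
from typing import List

AWS_ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def shannon_entropy(s: str) -> float:
    """Bits per character: -sum(p * log2 p) over the string's character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_aws_keys(text: str, min_entropy: float = 3.5) -> List[str]:
    """Regex match first, then drop low-entropy placeholders such as AKIAAAAA..."""
    return [m.group(0) for m in AWS_ACCESS_KEY_RE.finditer(text)
            if shannon_entropy(m.group(0)) >= min_entropy]
```

The entropy filter removes obvious filler matches, but it cannot tell a live key from a revoked one. Telling them apart is the gap that TruffleHog's verification step closes.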

| Feature | GitHub Secret Hunter | TruffleHog |
|---|---|---|
| Detection method | 40+ regex patterns + Shannon entropy | Detector-based engine (700+ detectors) |
| Verification | No | Yes (active credential checking) |
| Git history depth | Commit + code search API | Full local clone history |
| Output | Secrets + sensitive files | Verified + unverified findings |

The trufflehog_scan/ directory contains the scanner wrapper and its docker-compose service definition.


🆚 Complete Tool Comparison

Overview Matrix

flowchart TB
    subgraph Layer1["Layer 1: DNS/Registry"]
        WHOIS[WHOIS<br/>Domain info]
        DNS[DNS<br/>Resolution]
    end

    subgraph Layer2["Layer 4: Transport"]
        Naabu[Naabu<br/>Port scan]
    end

    subgraph Layer3["Layer 7: Application"]
        Httpx[Httpx<br/>HTTP probe]
        Katana[Katana<br/>Crawl]
        Hakrawler[Hakrawler<br/>DOM crawl]
        GAU[GAU<br/>Archives]
        KR[Kiterunner<br/>API brute]
        jsluice[jsluice<br/>JS analysis]
        Nuclei[Nuclei<br/>Vuln scan]
    end

    subgraph Layer1b["OSINT Enrichment"]
        Shodan2[Shodan<br/>Host/DNS/CVEs]
        URLScan[URLScan<br/>Historical scans]
        Censys2[Censys<br/>Host intelligence]
        FOFA2[FOFA<br/>Asset search]
        OTX2[OTX<br/>Threat intel]
        Netlas2[Netlas<br/>Internet intel]
        VT2[VirusTotal<br/>Reputation]
        ZoomEye2[ZoomEye<br/>Host search]
        CrimIP2[CriminalIP<br/>Risk score]
    end

    subgraph Layer4["Data Enrichment"]
        MITRE[MITRE<br/>CWE/CAPEC]
        GVM[GVM<br/>Deep scan]
    end

    WHOIS --> DNS
    DNS --> Shodan2
    DNS --> URLScan
    Shodan2 --> Naabu
    URLScan --> Naabu
    Naabu --> Httpx
    Httpx --> Katana
    Httpx --> Hakrawler
    Httpx --> GAU
    Httpx --> KR
    Katana --> jsluice
    Hakrawler --> jsluice
    jsluice --> Nuclei
    Katana --> Nuclei
    Hakrawler --> Nuclei
    GAU --> Nuclei
    KR --> Nuclei
    Nuclei --> MITRE
    Nuclei --> GVM

Feature Comparison

| Feature | WHOIS | DNS | Shodan | URLScan | Censys | FOFA | OTX | Netlas | VirusTotal | ZoomEye | CriminalIP | Masscan | Naabu | httpx | Katana | Hakrawler | GAU | Kiterunner | jsluice | Nuclei | GVM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Domain Info | ✅ | ⚠️ | ❌ | ❌ | ❌ | ⚠️ | ⚠️ | ⚠️ | ❌ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| IP Resolution | ❌ | ✅ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ❌ | ⚠️ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Subdomain Discovery | ❌ | ❌ | ⚠️ | ✅ | ❌ | ⚠️ | ❌ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Port / Service Data | ❌ | ❌ | ⚠️ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Live URL Check | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Tech Detection | ❌ | ❌ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ❌ | ⚠️ | ⚠️ | ⚠️ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ | ⚠️ |
| Endpoint Discovery | ❌ | ❌ | ❌ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| Historical URLs | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Threat Reputation | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Passive DNS | ❌ | ❌ | ⚠️ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Malware / CVE Intel | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| TLS / Certificate | ❌ | ❌ | ⚠️ | ⚠️ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ⚠️ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Geolocation / ASN | ❌ | ❌ | ✅ | ⚠️ | ✅ | ✅ | ⚠️ | ✅ | ⚠️ | ✅ | ✅ | ❌ | ❌ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| API Discovery | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| XSS/SQLi Testing | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ⚠️ |
| Secret Detection | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |

Legend: ✅ Primary | ⚠️ Limited | ❌ Not supported

Timing Comparison

| Tool | Typical Duration | Notes |
|---|---|---|
| WHOIS | <1 second | Instant |
| DNS | <1 second | Instant |
| Shodan | 5-15 seconds | Passive, per-IP queries |
| URLScan | 5-20 seconds | Passive, API rate-limited |
| Censys | 5-30 seconds | Passive, per-IP queries, 429 detection |
| FOFA | 5-30 seconds | Passive, domain/IP query, max 10,000 results |
| OTX | 5-60 seconds | Passive, per-IP + per-domain queries |
| Netlas | 5-30 seconds | Passive, per-IP/domain queries, max 1,000 results |
| VirusTotal | 1-5 minutes | Free tier: 4 req/min; 65 s wait on rate limit |
| ZoomEye | 5-30 seconds | Passive, per-IP/domain queries, max 1,000 results |
| CriminalIP | 5-30 seconds | Passive, per-IP + per-domain queries |
| Amass | 1-10 minutes | Passive; longer with active/brute |
| Puredns | 30-90 seconds | Depends on subdomain count |
| Masscan | 1-30 seconds | Fastest for large CIDR ranges |
| Naabu | 5-10 seconds | 1000 ports |
| httpx | 10-30 seconds | All options |
| Katana | 1-5 minutes | Crawl depth 3 |
| Hakrawler | 30-120 seconds | Active crawling, depth 2 |
| GAU | 10-30 seconds | Passive |
| jsluice | 10-60 seconds | Active JS download + analysis |
| Nuclei | 1-30 minutes | Depends on templates |
| GVM | 30 min - 2+ hours | Full scan |
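The 1-5 minute VirusTotal figure follows from the free-tier quota. The back-of-the-envelope sketch below rests on three assumptions: each target costs one request, requests within one quota window are effectively instantaneous, and each exhausted window costs one 65-second wait (the wait quoted in the table). Under those assumptions, the VIRUSTOTAL_MAX_TARGETS default of 20 at 4 requests per minute gives five windows and four waits, about 260 s or roughly 4.3 minutes.

```python
import math


def estimate_rate_limited_seconds(requests: int, per_minute: int,
                                  wait_seconds: float = 65.0) -> float:
    """Lower-bound wall time when each exhausted quota window costs one full wait."""
    if requests <= 0:
        return 0.0
    windows = math.ceil(requests / per_minute)
    return (windows - 1) * wait_seconds


# VIRUSTOTAL_MAX_TARGETS=20, VIRUSTOTAL_RATE_LIMIT=4
vt_estimate = estimate_rate_limited_seconds(20, 4)
```

Raising VIRUSTOTAL_MAX_TARGETS on the free tier therefore adds roughly one minute per four extra targets.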

⚙️ Key Configuration Parameters

Essential Settings

All settings are managed through the webapp project form or via environment variables. Key defaults are defined in project_settings.py:

| Setting | Default | Description |
|---|---|---|
| TARGET_DOMAIN | (none) | Root domain to scan |
| SUBDOMAIN_LIST | [] | Empty = discover all |
| SCAN_MODULES | all 5 modules | Modules to run |
| NAABU_TOP_PORTS | "1000" | Top-N ports to scan |
| NAABU_SCAN_TYPE | "s" | SYN scan |
| MASSCAN_RATE | 1000 | Masscan packets/sec |
| OTX_ENABLED | true | OTX threat intel enrichment (anonymous by default) |
| VIRUSTOTAL_RATE_LIMIT | 4 | VirusTotal requests per minute (free tier) |
| VIRUSTOTAL_MAX_TARGETS | 20 | Max IPs + domains to query with VirusTotal |
| FOFA_MAX_RESULTS | 1000 | FOFA results per query (max 10,000) |
| NETLAS_MAX_RESULTS | 1000 | Netlas results per query (max 1,000) |
| ZOOMEYE_MAX_RESULTS | 1000 | ZoomEye results per query |
| NUCLEI_DAST_MODE | true | Active fuzzing |
| NUCLEI_SEVERITY | critical, high, medium, low | Severity filter |
| WAPPALYZER_ENABLED | true | Technology detection |
| MITRE_INCLUDE_CWE | true | CWE enrichment |
| MITRE_INCLUDE_CAPEC | true | CAPEC enrichment |
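Environment variables arrive as strings, so overriding the typed defaults above needs some coercion. The sketch below is a minimal version of that coercion, assuming that environment values take precedence over built-in defaults. It is not project_settings.py. The DEFAULTS dict is a hypothetical subset of the table, and the empty-string default for TARGET_DOMAIN is an assumption standing in for "none".

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical subset of the defaults table above.
DEFAULTS: Dict[str, Any] = {
    "TARGET_DOMAIN": "",
    "NAABU_TOP_PORTS": "1000",
    "MASSCAN_RATE": 1000,
    "OTX_ENABLED": True,
    "VIRUSTOTAL_RATE_LIMIT": 4,
    "NUCLEI_SEVERITY": ("critical", "high", "medium", "low"),
}


def coerce(raw: str, default: Any) -> Any:
    """Convert an env string to the type of its default (bool before int: bool is an int)."""
    if isinstance(default, bool):
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, tuple):
        return tuple(item.strip() for item in raw.split(",") if item.strip())
    return raw


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Environment values override defaults; unset keys keep their default."""
    env = os.environ if env is None else env
    return {key: coerce(env[key], default) if key in env else default
            for key, default in DEFAULTS.items()}
```

For example, MASSCAN_RATE=500 OTX_ENABLED=false yields an integer 500 and a real False, not the strings "500" and "false".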

🔧 Prerequisites

  • Docker with Docker Compose
  • Docker socket access for nested container execution
# Verify Docker is running
docker info

# Build and run
cd recon/
docker-compose build --network=host
docker-compose run --rm recon python /app/recon/main.py

Tool Containers (auto-pulled)

| Tool | Docker Image | Purpose |
|---|---|---|
| Masscan | Built from source (native binary) | High-speed SYN port scanning |
| Naabu | projectdiscovery/naabu:latest | Port scanning |
| httpx | projectdiscovery/httpx:latest | HTTP probing |
| Nuclei | projectdiscovery/nuclei:latest | Vuln scanning |
| Katana | projectdiscovery/katana:latest | Web crawling |
| GAU | sxcurity/gau:latest | URL discovery |
| Amass | caffix/amass:latest | Subdomain enumeration |
| Puredns | frost19k/puredns:latest | Wildcard filtering |

📁 Project Structure

recon/
├── Dockerfile              # Container build
├── docker-compose.yml      # Orchestration
├── project_settings.py     # 🔗 Settings fetcher (API or built-in defaults)
├── main.py                 # 🚀 Entry point
├── domain_recon.py         # Subdomain discovery
├── whois_recon.py          # WHOIS lookup
├── urlscan_enrich.py       # URLScan.io OSINT enrichment
├── censys_enrich.py        # Censys threat intelligence enrichment
├── fofa_enrich.py          # FOFA internet asset search enrichment
├── otx_enrich.py           # OTX (AlienVault) threat intelligence enrichment
├── netlas_enrich.py        # Netlas internet intelligence enrichment
├── virustotal_enrich.py    # VirusTotal reputation enrichment
├── zoomeye_enrich.py       # ZoomEye host search enrichment
├── criminalip_enrich.py    # Criminal IP threat intelligence enrichment
├── port_scan.py            # Port scanning
├── http_probe.py           # HTTP probing
├── resource_enum.py        # Endpoint discovery
├── main_recon_modules/
│   └── ai_surface_recon.py # AI/LLM/MCP/vector-DB surface fingerprinting (Phase 4.5)
├── vuln_scan.py            # Vulnerability scanning
├── add_mitre.py            # MITRE enrichment
├── github_secret_hunt.py   # GitHub secrets
├── trufflehog_scan/        # TruffleHog secret scanner (separate service)
├── output/                 # 📄 Scan results (JSON)
├── data/                   # 📦 Cached databases
│   ├── mitre_db/           # CVE2CAPEC database
│   └── wappalyzer/         # Technology rules
├── helpers/                # Tool helpers
└── readmes/                # 📖 Module docs

📊 Output Format

All modules write to: recon/output/recon_<domain>.json

flowchart TB
    subgraph JSON["recon_domain.json"]
        Meta[metadata<br/>scan info, timestamps]
        WHOIS[whois<br/>registrar, dates]
        Subs[subdomains<br/>discovered hosts]
        DNSData[dns<br/>A, MX, TXT records]
        Ports[port_scan<br/>open ports, services]
        HTTP[http_probe<br/>live URLs, tech stack]
        Resources[resource_enum<br/>endpoints, forms]
        AISurface[ai_surface_recon<br/>AI/LLM/MCP/vector-DB]
        Vulns[vuln_scan<br/>CVEs, misconfigs]
        TechCVE[technology_cves<br/>version-based CVEs]
    end

    Meta --> WHOIS
    WHOIS --> Subs
    Subs --> DNSData
    DNSData --> Ports
    Ports --> HTTP
    HTTP --> Resources
    Resources --> AISurface
    AISurface --> Vulns
    Vulns --> TechCVE
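Downstream scripts can read the per-domain output file and check which of the sections in the diagram above are present. A run limited by SCAN_MODULES will legitimately omit some sections. The sketch below assumes the top-level JSON keys match the section names shown in the diagram. The function names and the default output_dir are illustrative.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Assumed top-level keys, taken from the section names in the diagram above.
EXPECTED_SECTIONS = (
    "metadata", "whois", "subdomains", "dns", "port_scan", "http_probe",
    "resource_enum", "ai_surface_recon", "vuln_scan", "technology_cves",
)


def load_recon_output(domain: str, output_dir: Path = Path("recon/output")) -> Dict[str, Any]:
    """Read recon/output/recon_<domain>.json."""
    path = output_dir / f"recon_{domain}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def missing_sections(result: Dict[str, Any]) -> List[str]:
    """List expected sections absent from a result, in pipeline order."""
    return [section for section in EXPECTED_SECTIONS if section not in result]
```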

🧪 Test Targets

Safe, legal targets for security testing:

| Target | Technology | Vulnerabilities |
|---|---|---|
| testphp.vulnweb.com | PHP + MySQL | SQLi, XSS, LFI |
| testhtml5.vulnweb.com | HTML5 | DOM XSS |
| testasp.vulnweb.com | ASP.NET | SQLi, XSS |
| scanme.nmap.org | N/A | Port scanning only |

# Example configuration
TARGET_DOMAIN = "vulnweb.com"
SUBDOMAIN_LIST = ["testphp."]
NUCLEI_DAST_MODE = True

Only scan systems you own or have explicit written permission to test.

Unauthorized scanning is illegal. RedAmon is intended for:

  • Penetration testers with proper authorization
  • Security researchers on approved targets
  • Bug bounty hunters within program scope
  • System administrators testing their infrastructure

📖 Detailed Documentation

| Module | Documentation |
|---|---|
| Port Scan | readmes/README.PORT_SCAN.md |
| HTTP Probe | readmes/README.HTTP_PROBE.md |
| Vuln Scan | readmes/README.VULN_SCAN.md |
| MITRE CWE/CAPEC | readmes/README.MITRE.md |
| GVM/OpenVAS | README.GVM.md |