Knowledge Base (RAG-Enhanced Pentesting Knowledge)

April 10, 2026 ยท View on GitHub

The Knowledge Base (KB) adds offline, retrieval-augmented search to the RedAmon agent. Instead of relying solely on Tavily web search, the agent queries a local vector index (FAISS) and graph database (Neo4j) populated with curated security datasets. When the local KB produces high-confidence results, Tavily is skipped entirely. When confidence is low, KB and Tavily results are merged.


Table of Contents


Architecture Overview

The KB system has two main phases: ingestion (offline, batch) and query (runtime, per-request). Ingestion fetches security datasets, chunks them, embeds them with a sentence-transformer model, and stores vectors in FAISS and metadata in Neo4j. At query time, the agent runs a hybrid retrieval pipeline combining vector similarity and keyword search, fuses results with Reciprocal Rank Fusion (RRF), reranks with a cross-encoder, and applies diversity filtering before returning results.

graph TB
    subgraph "Ingestion Phase"
        DS[Data Sources<br/>NVD, GTFOBins, LOLBAS,<br/>Nuclei, OWASP, ExploitDB,<br/>Tool Docs]
        SH[safe_get<br/>hostname-allowlisted HTTP]
        FC[File Cache<br/>.file_hashes.json]
        CH[Chunking<br/>structured or markdown]
        FU[Content-Hash Filter<br/>.manifest.json]
        EM1[Embedder<br/>e5-large-v2]
        FI1[FAISS Index<br/>index.faiss]
        NL1[Neo4j Loader<br/>KBChunk nodes]

        DS --> SH
        SH --> FC
        FC --> CH
        CH --> FU
        FU --> EM1
        EM1 --> FI1
        EM1 --> NL1
    end

    subgraph "Query Phase"
        AQ[Agent Query<br/>web_search tool]
        EM2[Embedder<br/>embed query]
        VS[FAISS Vector Search]
        FS[Neo4j Fulltext Search<br/>Lucene]
        RRF[RRF Fusion<br/>k=60]
        MF[Metadata Fetch<br/>+ Source Filtering]
        SB[Source Boosts]
        RR[Cross-Encoder Rerank<br/>bge-reranker-base]
        MMR[MMR Diversity Filter]
        SC[Sufficiency Check<br/>score >= 0.35?]
        TV[Tavily Fallback]
        RES[Results to Agent]

        AQ --> EM2
        EM2 --> VS
        AQ --> FS
        VS --> RRF
        FS --> RRF
        RRF --> MF
        MF --> SB
        SB --> RR
        RR --> MMR
        MMR --> SC
        SC -->|sufficient| RES
        SC -->|insufficient| TV
        TV --> RES
    end

Data Sources

The KB ingests from seven curated security data sources, organized into three profiles.

Profiles

ProfileSourcesBuild Time (CPU)Build Time (GPU/API)
cpu-litetool_docs, gtfobins, lolbas~15 min~2 min
litecpu-lite + owasp, exploitdb~4 hours~3 min
standardlite + nvd~4-5 hours~8 min
fullstandard + nuclei~5-6 hours~15 min

Source Details

graph LR
    subgraph "lite"
        TD[Tool Docs<br/>Local skill files<br/>agentic/skills/*.md]
        GT[GTFOBins<br/>GitHub tarball<br/>Unix binary abuse]
        LO[LOLBAS<br/>GitHub tarball<br/>Windows LOLBins]
        OW[OWASP WSTG<br/>GitHub tarball<br/>Web testing guide]
        EX[ExploitDB<br/>GitLab CSV<br/>Exploit database]
    end

    subgraph "standard"
        NV[NVD<br/>REST API v2.0<br/>CVE database]
    end

    subgraph "full"
        NU[Nuclei<br/>GitHub tarball<br/>~15k+ templates]
    end
SourceOriginTransportChunk StrategyChunk Count
tool_docsagentic/skills/*.mdLocal filesystemSummary chunk per tooling file; section-chunked for others~50-100
gtfobinsgithub.com/GTFOBinsHTTPS tarballOne chunk per binary per function type~400-500
lolbasgithub.com/LOLBAS-ProjectHTTPS tarballOne chunk per binary per command~800-1000
owaspgithub.com/OWASP/wstgHTTPS tarballMarkdown section-chunked by ## headers~500-700
exploitdbgitlab.com/exploit-databaseHTTPS CSVOne chunk per exploit entry~45,000+
nvdservices.nvd.nist.govREST API v2.0One chunk per CVE~7,500 (90d, CVSS>=7)
nucleigithub.com/projectdiscoveryHTTPS tarballOne chunk per template~15,000+

Source Boosts

Each source has a relevance boost factor applied during scoring. Higher boosts favor results from sources that tend to be more directly actionable:

SourceBoostRationale
tool_docs1.20Directly actionable, agent-specific playbooks
gtfobins1.15Precise, high-signal for Unix priv-esc
lolbas1.15Precise, high-signal for Windows LOLBins
owasp1.05Methodology-focused, solid reference
nuclei1.00Baseline (high volume, variable relevance)
nvd0.90Context-rich but often not directly actionable
exploitdb0.85High volume, variable quality

Ingestion Pipeline

The ingestion pipeline downloads, parses, chunks, embeds, and indexes security data. It supports incremental updates (only re-embeds changed content) and full rebuilds.

flowchart TD
    CLI["CLI / Makefile / redamon.sh<br/>python -m knowledge_base.curation.data_ingestion"]
    LOCK["Acquire exclusive ingest lock<br/>fcntl.flock on .ingest.lock"]
    INIT["Initialize components<br/>Embedder + FAISSIndexer + Neo4jLoader"]

    subgraph "Per-Source Loop"
        FETCH["Client.fetch()<br/>Download or read source data"]
        CHUNK["Client.to_chunks()<br/>Parse into chunk dicts"]
        DEDUP["Within-batch dedup<br/>by chunk_id, last-writer-wins"]
        HASH["Content-hash filter<br/>SHA256 vs .manifest.json"]
        EMBED["Embedder.embed_documents_batch()<br/>sentence-transformers, batch_size=64"]
        FAISS_ADD["FAISSIndexer.add()<br/>vectors + chunk_ids"]
        NEO4J_UP["Neo4jLoader.upsert_chunks()<br/>MERGE by chunk_id, dual-label"]
    end

    SAVE["FAISSIndexer.save()<br/>atomic write + integrity manifest"]
    MANIFEST["Write .manifest.json<br/>content hashes for next run"]
    MARKER["Write .last_ingest<br/>timestamp + profile"]

    CLI --> LOCK
    LOCK --> INIT
    INIT --> FETCH
    FETCH --> CHUNK
    CHUNK --> DEDUP
    DEDUP --> HASH
    HASH -->|changed only| EMBED
    EMBED --> FAISS_ADD
    FAISS_ADD --> NEO4J_UP
    HASH -->|unchanged| SKIP[Skip re-embedding]
    NEO4J_UP --> FETCH
    FAISS_ADD --> SAVE
    SAVE --> MANIFEST
    MANIFEST --> MARKER

Incremental Update Logic

The pipeline uses a two-layer caching system to avoid redundant work:

Layer 1 -- File-level cache (tarball sources only): Each downloaded file is hashed. On the next run, only files with changed hashes are re-parsed. Stored in data/cache/<source>/.file_hashes.json.

Layer 2 -- Chunk content-hash filter: After chunking, each chunk's content is SHA-256 hashed. If a chunk's hash matches the previous run's manifest (data/cache/.manifest.json), it is skipped entirely -- no re-embedding, no re-upserting. This catches cases where files changed but the extracted chunks did not.

NVD-specific behavior

  • Paginated API with 2000 results per page and 120-day windows (API limit)
  • Rate limiting: 6.5s between requests without API key, 0.65s with key
  • Incremental mode: when .last_ingest exists, uses lastModStartDate to only fetch recently modified CVEs
  • Unified cache: all CVEs merged into nvd_cache.json keyed by CVE ID

Query Pipeline

When the agent calls web_search(), the KB runs a 6-stage hybrid retrieval pipeline.

flowchart TD
    Q["Agent calls web_search(query,<br/>include_sources, exclude_sources,<br/>top_k, min_cvss)"]

    subgraph "Stage 1: Hybrid Candidate Retrieval"
        POOL["candidate_pool = max(top_k * 6, 30)"]
        VEC["FAISS vector search<br/>embed query, cosine similarity<br/>~30ms"]
        FT["Neo4j fulltext search<br/>Lucene keyword match<br/>~5ms"]
        RRFN["Reciprocal Rank Fusion<br/>score = sum(1 / (60 + rank_i))"]
        POOL --> VEC
        POOL --> FT
        VEC --> RRFN
        FT --> RRFN
    end

    subgraph "Stage 2: Filter and Fetch"
        META["Neo4j filter_chunks()<br/>source, CVSS, severity filters<br/>fetch full metadata"]
    end

    subgraph "Stage 3: Source Boosts"
        BOOST["score = rrf_score * source_boost<br/>tool_docs: 1.20, gtfobins: 1.15, ..."]
    end

    subgraph "Stage 4: Cross-Encoder Rerank"
        CE["BAAI/bge-reranker-base<br/>top 30 candidates<br/>logit -> sigmoid -> boost"]
    end

    subgraph "Stage 5: MMR Diversity"
        MMRN["Greedy selection, lambda=0.65<br/>similarity = 0.4*(same source) +<br/>0.6*(title Jaccard overlap)"]
    end

    subgraph "Stage 6: Sufficiency Check"
        CHECK["top_result.score >= 0.35?"]
        YES["Return KB results"]
        NO["Fallback to Tavily<br/>merge if partial KB results"]
        CHECK -->|yes| YES
        CHECK -->|no| NO
    end

    Q --> POOL
    RRFN --> META
    META --> BOOST
    BOOST --> CE
    CE --> MMRN
    MMRN --> CHECK

Scoring Details

  • RRF uses k=60 (standard). Rank-based, scale-invariant.
  • Source boosts are multiplicative on the RRF score (pre-rerank) and on the sigmoid-transformed reranker score (post-rerank).
  • Cross-encoder returns raw logits (can be negative). Sigmoid is applied before boosting to ensure correct directionality.
  • MMR uses a lightweight similarity proxy (no stored vectors): 0.4 * same_source + 0.6 * title_jaccard. Lambda=0.65 slightly favors relevance over diversity.
  • Sufficiency threshold defaults to 0.35. Below this, Tavily is queried as a fallback.

Agent Integration

The KB integrates into the agent's tool system through the web_search tool. The agent does not interact with the KB directly -- it calls web_search() with optional filtering parameters, and the tool transparently queries KB first, falling back to Tavily when needed.

flowchart TD
    subgraph "Agent Container"
        ORCH["AgentOrchestrator<br/>initialize()"]
        SETUP["_setup_knowledge_base()"]
        EMB["Embedder<br/>intfloat/e5-large-v2"]
        FIDX["FAISSIndexer<br/>/app/knowledge_base/data"]
        N4JL["Neo4jLoader<br/>dedicated driver"]
        KB["PentestKnowledgeBase"]
        WST["WebSearchToolManager<br/>knowledge_base=kb"]
        TOOL["web_search tool<br/>LLM-callable"]

        ORCH --> SETUP
        SETUP --> EMB
        SETUP --> FIDX
        SETUP --> N4JL
        EMB --> KB
        FIDX --> KB
        N4JL --> KB
        KB --> WST
        WST --> TOOL
    end

    subgraph "External"
        NEO["Neo4j<br/>bolt://neo4j:7687"]
        DISK["FAISS Index<br/>index.faiss on disk"]
        TAV["Tavily API<br/>fallback web search"]
    end

    N4JL -.-> NEO
    FIDX -.-> DISK
    WST -.-> TAV

Initialization Sequence

  1. AgentOrchestrator.initialize() calls _setup_knowledge_base()
  2. Feature gate: checks KB_ENABLED env var (default true). If false, returns None
  3. Dynamic import of KB modules (graceful degradation if dependencies missing)
  4. Creates Embedder (model from KB_EMBEDDING_MODEL env var or config)
  5. Creates FAISSIndexer pointed at KB_PATH (default /app/knowledge_base/data)
  6. Creates a dedicated Neo4j driver (separate from the graph query driver)
  7. Assembles PentestKnowledgeBase and calls kb.load() to read the FAISS index from disk
  8. Injects KB into WebSearchToolManager
  9. On any failure, logs a warning and starts without KB (Tavily-only mode)

Fallback Cascade

The web_search tool implements a three-tier fallback:

ConditionBehavior
KB score >= threshold (0.35)Return KB results only, skip Tavily
KB score < thresholdQuery Tavily, merge KB partial results with Tavily results
KB fails, Tavily succeedsReturn Tavily results only
Tavily fails, KB has partial resultsReturn KB results with "(Tavily unavailable)" header
Both failReturn "No results found"

Prompt Injection Defense

KB content is untrusted (sourced from public repositories). Three layers prevent prompt injection:

  1. Content sanitization: Strips XML role tags (<system>, <user>), instruction markers ([INST]), ChatML tokens (<|im_start|>), and self-referential frame markers
  2. Length capping: 2000 chars per chunk, 10 items per list
  3. Untrusted framing: All KB output is wrapped in explicit delimiters with a warning instructing the LLM to treat content as reference only

Tool Description (what the LLM sees)

The web_search tool exposed to the agent supports these KB-specific parameters:

ParameterTypeDescription
querystringSearch query
include_sourceslistOnly return results from these sources
exclude_sourceslistExclude results from these sources
top_kint (1-20)Number of results to return
min_cvssfloatMinimum CVSS score (NVD/Nuclei only)

Docker Services

graph TD
    subgraph "Docker Compose Stack"

        subgraph "Always Running"
            AGENT["agent<br/>redamon-agent<br/>Port 8090:8080"]
            NEO4J["neo4j<br/>redamon-neo4j<br/>Port 7474, 7687"]
        end

        subgraph "Opt-in (profile: kb-refresh)"
            REFRESH["kb-refresh<br/>redamon-kb-refresh<br/>Shell sleep-loop scheduler"]
        end

        VOL_DATA["Volume: kb_data<br/>FAISS index, caches"]
        VOL_CONFIG["Bind: kb_config.yaml<br/>read-only"]

        AGENT -->|"bolt://neo4j:7687"| NEO4J
        REFRESH -->|"bolt://neo4j:7687"| NEO4J
        AGENT -.->|":ro bind mount"| VOL_CONFIG
        AGENT -.->|":ro bind mount"| VOL_DATA
        REFRESH -.->|"named volume"| VOL_DATA
    end

Agent Container

  • Image: redamon-agent (built from agentic/Dockerfile)
  • Build context: project root (so knowledge_base/ is included)
  • Pre-cached models: intfloat/e5-large-v2 (~1.3 GB) and BAAI/bge-reranker-base (~568 MB) downloaded at build time
  • KB volumes:
    • ./knowledge_base/kb_config.yaml:/app/knowledge_base/kb_config.yaml:ro -- config
    • ./knowledge_base/data:/app/knowledge_base/data:ro -- FAISS index and caches
  • KB env vars: KB_ENABLED, KB_PATH, KB_EMBEDDING_MODEL, NEO4J_URI/USER/PASSWORD
  • File permissions: KB source code is chmod -R a-w (read-only), only data/ is writable

Neo4j

Shared between the recon graph, agent graph queries, and the KB. KB creates KBChunk nodes with a uniqueness constraint on chunk_id and a fulltext index on content and title for Lucene keyword search.

kb-refresh (Opt-in Sidecar)

  • Profile: kb-refresh (must be explicitly enabled)
  • Image: Same redamon-agent image, different entrypoint
  • Schedule:
    • Daily: NVD incremental update
    • Mondays: ExploitDB + Nuclei
    • 1st of month: GTFOBins + LOLBAS
  • Enable: KB_REFRESH_ENABLED=true docker compose --profile kb-refresh up -d kb-refresh

Configuration

All KB behavior is controlled through a layered configuration system. Each layer overrides the one below it.

flowchart BT
    D["Code Defaults<br/>DEFAULTS dict in kb_config.py"]
    Y["kb_config.yaml<br/>Repository config file"]
    E["Environment Variables<br/>KB_*, NVD_*"]
    A["Runtime Arguments<br/>CLI flags, project settings"]

    D --> Y
    Y --> E
    E --> A

Configuration File: kb_config.yaml

Located at knowledge_base/kb_config.yaml. Mounted read-only into the agent container.

KB_ENABLED: true

embedder:
  model: "intfloat/e5-large-v2"   # 1024 dimensions, 512 token cap
  batch_size: 64

chunking:
  max_tokens: 480                  # hard cap per chunk
  preferred_tokens: 256            # soft target for markdown splitting

reranker:
  enabled: true
  model: "BAAI/bge-reranker-base"  # 512 token cap
  pool_size: 30                    # candidates fed to cross-encoder
  max_tokens_per_side: 480         # pre-truncation limit

fulltext:
  enabled: true                    # Neo4j Lucene fulltext search

retrieval:
  top_k: 5
  overfetch_factor: 6              # candidate_pool = top_k * factor
  score_threshold: 0.35            # below this, fall back to Tavily
  rrf_k: 60                        # RRF smoothing constant

mmr:
  enabled: true
  lambda: 0.65                     # 1.0 = pure relevance, 0.0 = pure diversity

source_boosts:
  tool_docs: 1.20
  gtfobins: 1.15
  lolbas: 1.15
  owasp: 1.05
  nuclei: 1.00
  nvd: 0.90
  exploitdb: 0.85

ingestion:
  default_profile: "lite"
  nvd_lookback_days: 90
  nvd_min_cvss: 7.0
  profiles:
    lite: [tool_docs, gtfobins, lolbas, owasp, exploitdb]
    standard: [tool_docs, gtfobins, lolbas, owasp, exploitdb, nvd]
    full: [tool_docs, gtfobins, lolbas, owasp, exploitdb, nvd, nuclei]

Environment Variable Overrides

VariableConfig PathDefault
KB_ENABLEDtop-leveltrue
KB_EMBEDDING_MODELembedder.modelintfloat/e5-large-v2
KB_RERANK_ENABLEDreranker.enabledtrue
KB_RERANKER_MODELreranker.modelBAAI/bge-reranker-base
KB_RERANKER_MAX_TOKENS_PER_SIDEreranker.max_tokens_per_side480
KB_FULLTEXT_ENABLEDfulltext.enabledtrue
NVD_LOOKBACK_DAYSingestion.nvd_lookback_days90
NVD_MIN_CVSSingestion.nvd_min_cvss7.0
NVD_API_KEY(passed to NVD client)(none)
KB_INDEX_HMAC_KEY(FAISS integrity)(none)
KB_CONFIG_FILEconfig file path override(auto-detected)
KB_PATHdata directory path/app/knowledge_base/data
KB_EMBEDDING_USE_APIUse external API for embeddingsfalse
KB_EMBEDDING_API_BASE_URLAPI base URL (any OpenAI-compatible endpoint)(OpenAI default)
KB_EMBEDDING_API_KEYAPI key(none)
KB_EMBEDDING_API_MODELAPI model nametext-embedding-3-small

CLI Commands

Makefile Targets

All targets support MODE=docker (default, runs inside container) or MODE=local (runs on host with auto-bootstrapped venv).

Build (first-time ingestion):

make kb-build-lite          # ~30-60s, no network API keys needed
make kb-build-standard      # ~6-8m, needs NVD (no key required, slower)
make kb-build-full          # ~10-15m, downloads nuclei templates

Update (incremental, changed content only):

make kb-update-nvd          # recommended: daily
make kb-update-exploitdb    # recommended: weekly
make kb-update-nuclei       # recommended: weekly
make kb-update-gtfobins     # recommended: monthly
make kb-update-lolbas       # recommended: monthly
make kb-update-owasp        # on-demand
make kb-update-tools        # on-demand (after editing skills)

Rebuild (wipe and re-create):

make kb-rebuild-lite
make kb-rebuild-standard
make kb-rebuild-full

Utilities:

make kb-stats               # show index statistics
make kb-clean               # remove __pycache__
make kb-clean-full          # remove venv + caches
make kb-test                # run pytest on KB tests

redamon.sh Commands

./redamon.sh kb build [lite|standard|full]     # build with profile
./redamon.sh kb update [source|all]            # incremental update
./redamon.sh kb rebuild [lite|standard|full]   # wipe + rebuild
./redamon.sh kb stats                          # show statistics

The kb build command is automatically triggered during ./redamon.sh install, up, and restart with the lite profile. Build failure is non-fatal -- the agent starts without KB.


Scheduled Refresh

The kb-refresh sidecar container provides automated data updates without user intervention.

flowchart LR
    subgraph "kb-refresh container"
        LOOP["Shell sleep-loop<br/>checks time on wake"]
        DAILY["Daily<br/>NVD incremental"]
        WEEKLY["Monday<br/>ExploitDB + Nuclei"]
        MONTHLY["1st of month<br/>GTFOBins + LOLBAS"]

        LOOP --> DAILY
        LOOP --> WEEKLY
        LOOP --> MONTHLY
    end

    NEO["Neo4j"]
    VOL["kb_data volume"]

    DAILY --> NEO
    DAILY --> VOL
    WEEKLY --> NEO
    WEEKLY --> VOL
    MONTHLY --> NEO
    MONTHLY --> VOL

Enable:

KB_REFRESH_ENABLED=true docker compose --profile kb-refresh up -d kb-refresh

Recommended: Set NVD_API_KEY for faster NVD updates (50 req/30s vs 5 req/30s without key). Get a free key at https://nvd.nist.gov/developers/request-an-api-key.


Security Model

The ingestion pipeline handles untrusted external data. Multiple defense layers are implemented:

Network Safety

  • Hostname allowlist: All HTTP requests go through safe_get(), which validates every URL (including redirect targets) against a hardcoded list of 6 trusted hosts
  • GET-only: No POST/PUT/DELETE methods exist in the pipeline. Data is never sent outbound
  • Size caps: Response bodies are capped (200 MB general, 50 MB NVD pages) with streaming enforcement
  • Redirect validation: Manual redirect handling with per-hop allowlist checks, max 5 hops

Data Safety

  • Tar decompression: bounded_tar_iter() caps per-member (10 MB) and total (500 MB) decompressed size. Only regular files are extracted -- symlinks, hardlinks, and devices are skipped
  • YAML parsing: bounded_yaml_load() pre-scans for billion-laughs attacks (max 100 anchors, 1000 aliases), deep nesting, and oversized documents before calling yaml.safe_load()
  • Path traversal: safe_relative_path() rejects absolute paths, .. segments, NUL bytes, and paths that escape the base directory after symlink resolution
  • Symlink-safe writes: safe_write_text() uses O_NOFOLLOW to prevent TOCTOU symlink attacks

Index Integrity

  • FAISS manifest: SHA-256 (or HMAC-SHA256 if KB_INDEX_HMAC_KEY is set) digest of the index file, verified before every load
  • Constant-time comparison: hmac.compare_digest() used for all digest checks
  • Atomic writes: All file writes use tempfile + fsync + rename pattern

Query Safety

  • Cypher injection: All Neo4j query values are parameterized. Interpolated identifiers (labels, property keys) pass through a strict ^[A-Za-z_][A-Za-z0-9_]*$ regex
  • Prompt injection: Three-layer defense (content sanitization, length capping, untrusted content framing)
  • Secret protection: API keys are redacted in logs, HMAC keys are never logged

Per-Project Runtime Tuning

Project-level settings (stored in the database) can override KB behavior at runtime without restarting the agent. These are applied on every agent invocation via _apply_project_settings().

SettingTypeEffect
KB_ENABLEDboolDisable KB for this project (Tavily-only)
KB_SCORE_THRESHOLDfloatOverride sufficiency threshold
KB_TOP_KintOverride default result count
KB_MMR_ENABLEDboolToggle diversity filtering
KB_MMR_LAMBDAfloatTune relevance vs diversity balance
KB_OVERFETCH_FACTORintControl candidate pool size
KB_SOURCE_BOOSTSdictMerge custom per-source boosts
KB_ENABLED_SOURCESlistProject-wide source allowlist

All settings default to None (inherit from kb_config.yaml). Only non-None values override.


Neo4j Graph Schema

KB data is stored alongside the existing recon graph in Neo4j. All KB nodes use the base label KBChunk plus a source-specific label for efficient querying.

graph TD
    subgraph "Node Labels (dual-labeled)"
        BASE["KBChunk<br/>chunk_id (unique)<br/>content, title, source<br/>ingested_at"]

        NVD["NVDChunk<br/>cve_id, cvss_score, severity"]
        EDB["ExploitDBChunk<br/>edb_id, cve_id, platform"]
        GTF["GTFOBinsChunk<br/>binary_name, function_type"]
        LOL["LOLBASChunk<br/>binary_name, category, mitre_id"]
        OWA["OWASPChunk<br/>test_id, category"]
        NUC["NucleiChunk<br/>template_id, severity,<br/>cve_id, cvss_score, protocol"]
        TDC["ToolDocChunk<br/>tool_name"]

        BASE --- NVD
        BASE --- EDB
        BASE --- GTF
        BASE --- LOL
        BASE --- OWA
        BASE --- NUC
        BASE --- TDC
    end

    subgraph "Indexes"
        UC["Unique: KBChunk.chunk_id"]
        FT["Fulltext: KBChunk.(content, title)<br/>Lucene-backed"]
        PI["Property indexes per source<br/>e.g. NVDChunk.cve_id,<br/>NucleiChunk.severity"]
    end

On-Disk Layout

knowledge_base/
  __init__.py                      # re-exports PentestKnowledgeBase
  kb_config.py                     # configuration system
  kb_config.yaml                   # default config (mounted :ro)
  kb_orchestrator.py               # query pipeline (PentestKnowledgeBase)
  embedder.py                      # sentence-transformer wrapper
  faiss_indexer.py                 # FAISS index management + integrity
  neo4j_loader.py                  # Neo4j graph operations
  reranker.py                      # cross-encoder reranking
  chunking.py                      # text chunking strategies
  document_store.py                # source file access
  atomic_io.py                     # safe file I/O primitives
  curation/
    __init__.py
    data_ingestion.py              # ingestion orchestrator + CLI
    base_client.py                 # abstract client interface
    safe_http.py                   # hostname-allowlisted HTTP
    file_cache.py                  # caching, hashing, tar/YAML safety
    exploitdb_client.py            # ExploitDB CSV parser
    gtfobins_client.py             # GTFOBins tarball parser
    lolbas_client.py               # LOLBAS tarball parser
    nuclei_client.py               # Nuclei templates parser
    nvd_client.py                  # NVD REST API client
    owasp_client.py                # OWASP WSTG parser
    tool_docs_client.py            # local skill docs reader
  data/                            # runtime data (gitignored)
    index.faiss                    # FAISS vector index
    chunk_ids.json                 # chunk_id -> FAISS int ID mapping
    index.faiss.manifest.json      # integrity digest
    .last_ingest                   # timestamp + profile marker
    cache/
      .manifest.json               # content-hash manifest
      exploitdb/                   # cached CSV
      gtfobins/                    # cached parsed files
      lolbas/                      # cached YAML files
      nuclei/                      # cached templates
      nvd/                         # nvd_cache.json
      owasp/                       # cached markdown files
  tests/
    test_chunking.py
    test_clients.py
    test_data_ingestion.py
    test_embedder.py
    test_faiss_indexer.py
    test_file_cache.py
    test_kb_config.py
    test_kb_orchestrator.py
    test_neo4j_loader.py
    test_reranker.py
    test_safe_http.py

Embedding: CPU vs GPU vs API

The KB uses vector embeddings to convert text into searchable vectors. How embeddings are generated depends on your setup.

Automatic Detection

During ./redamon.sh install (and up/restart), RedAmon automatically detects your capabilities:

HardwareWhat HappensIngestion SpeedProfile
GPU (CUDA)sentence-transformers runs on GPUFast (~2-5 min)Full lite
API key configuredExternal API generates embeddingsFast (~2-3 min)Full lite
CPU onlyInteractive prompt asks user to chooseSlow for large datasetscpu-lite or lite

On CPU without an API key, you see an interactive prompt with estimated times for each source and the option to do a quick start (cpu-lite, ~15 min) or full ingestion (~4 hours).

Query time is always fast (~30ms) regardless of CPU/GPU/API. The speed difference only matters during ingestion.

API Mode Configuration

Configure in .env (copy from .env.example):

KB_EMBEDDING_USE_API=true
KB_EMBEDDING_API_KEY=sk-your-key
KB_EMBEDDING_API_MODEL=text-embedding-3-small
# KB_EMBEDDING_API_BASE_URL=  (leave empty for OpenAI)

OpenAI-Compatible APIs

The embedder uses the OpenAI SDK, which works with any API that implements the OpenAI embeddings endpoint. Set KB_EMBEDDING_API_BASE_URL to point to the compatible server:

ProviderBase URLNotes
OpenAI(leave empty)Direct OpenAI API
Ollamahttp://host.docker.internal:11434/v1Local models, free
LiteLLMhttp://host.docker.internal:4000/v1Proxy for 100+ providers
Together AIhttps://api.together.xyz/v1Hosted open models
Azure OpenAIhttps://<resource>.openai.azure.com/...Enterprise
vLLMhttp://host.docker.internal:8000/v1Self-hosted GPU server
Fireworks AIhttps://api.fireworks.ai/inference/v1Fast inference

Example with Ollama (free, runs locally):

KB_EMBEDDING_USE_API=true
KB_EMBEDDING_API_MODEL=nomic-embed-text
KB_EMBEDDING_API_KEY=ollama
KB_EMBEDDING_API_BASE_URL=http://host.docker.internal:11434/v1

Switching Embedding Models

Ingestion and query must use the same embedding model. Different models produce vectors with different dimensions (e.g., e5-large-v2 = 1024d, OpenAI text-embedding-3-small = 1536d). The FAISS index is dimension-locked.

If you switch models, rebuild the index:

make -C knowledge_base kb-rebuild-lite MODE=docker

The ingestion pipeline detects dimension mismatches and will error with a clear message if you forget to rebuild.

Environment Variables Reference

VariableDefaultDescription
KB_EMBEDDING_USE_APIfalseUse external API for embeddings
KB_EMBEDDING_API_BASE_URL(OpenAI default)API base URL (any OpenAI-compatible endpoint)
KB_EMBEDDING_API_KEY(none)API key
KB_EMBEDDING_API_MODELtext-embedding-3-smallModel name for the API

Ingestion Entry Points

There are four ways ingestion can start. Each ultimately calls the same Python module (knowledge_base.curation.data_ingestion), but they are triggered from different contexts.

flowchart TD
    subgraph "Entry Point 1: redamon.sh install / up / restart"
        CMD1["./redamon.sh install<br/>./redamon.sh up<br/>./redamon.sh restart"]
        CHK1["is_kb_enabled()?<br/>reads KB_ENABLED env or<br/>kb_config.yaml"]
        BOOT["_kb_bootstrap lite<br/>non-fatal on failure"]
        MAKE1["make kb-build-lite<br/>MODE=docker"]
        EXEC1["docker exec redamon-agent<br/>python -m knowledge_base<br/>.curation.data_ingestion<br/>--profile lite"]

        CMD1 --> CHK1
        CHK1 -->|yes| BOOT
        BOOT --> MAKE1
        MAKE1 --> EXEC1
        CHK1 -->|no| SKIP1["Skip KB, agent starts<br/>with Tavily-only"]
    end

    subgraph "Entry Point 2: Manual CLI"
        CMD2A["make kb-build-standard<br/>make kb-update-nvd<br/>make kb-rebuild-full"]
        CMD2B["./redamon.sh kb build standard<br/>./redamon.sh kb update nvd"]

        CMD2A --> EXEC2["docker exec redamon-agent<br/>python -m knowledge_base<br/>.curation.data_ingestion<br/>--profile/--source args"]
        CMD2B --> CMD2A
    end

    subgraph "Entry Point 3: kb-refresh sidecar"
        LOOP["Shell sleep-loop<br/>inside kb-refresh container"]
        DAILY["Every day: --source nvd"]
        WEEKLY["Mondays: --source exploitdb<br/>--source nuclei"]
        MONTHLY["1st: --source gtfobins<br/>--source lolbas"]

        LOOP --> DAILY
        LOOP --> WEEKLY
        LOOP --> MONTHLY
    end

    subgraph "Entry Point 4: Direct Python (local dev)"
        PY["python -m knowledge_base<br/>.curation.data_ingestion<br/>--profile lite<br/>--neo4j-uri bolt://localhost:7687"]
        VENV["Requires: .redamon-venv or<br/>active virtualenv with deps"]
        PY --- VENV
    end

    subgraph "Shared Execution Path"
        LOCK["Acquire .ingest.lock<br/>exclusive fcntl.flock"]
        RESOLVE["Resolve sources from<br/>profile or --source flag"]
        COMPONENTS["Create Embedder +<br/>FAISSIndexer + Neo4jLoader"]
        LOAD_EXISTING["Load existing FAISS index<br/>+ content-hash manifest"]

        PER_SOURCE["For each source:"]
        FETCH["client.fetch()<br/>Download data via safe_get<br/>or read local files"]
        CHUNKS["client.to_chunks()<br/>Parse into chunk dicts"]
        FILTER["Content-hash filter<br/>skip unchanged chunks"]
        EMBED["Embedder.embed_documents_batch()<br/>batch_size=64"]
        ADD["FAISSIndexer.add(vectors, ids)"]
        UPSERT["Neo4jLoader.upsert_chunks()<br/>MERGE by chunk_id"]

        SAVE["FAISSIndexer.save()<br/>index.faiss + manifest"]
        WRITE_MANIFEST["Write .manifest.json"]
        WRITE_MARKER["Write .last_ingest"]
        UNLOCK["Release .ingest.lock"]

        LOCK --> RESOLVE
        RESOLVE --> COMPONENTS
        COMPONENTS --> LOAD_EXISTING
        LOAD_EXISTING --> PER_SOURCE
        PER_SOURCE --> FETCH
        FETCH --> CHUNKS
        CHUNKS --> FILTER
        FILTER -->|changed| EMBED
        EMBED --> ADD
        ADD --> UPSERT
        FILTER -->|unchanged| PER_SOURCE
        UPSERT --> PER_SOURCE
        ADD --> SAVE
        SAVE --> WRITE_MANIFEST
        WRITE_MANIFEST --> WRITE_MARKER
        WRITE_MARKER --> UNLOCK
    end

    EXEC1 --> LOCK
    EXEC2 --> LOCK
    DAILY --> LOCK
    WEEKLY --> LOCK
    MONTHLY --> LOCK
    PY --> LOCK

What happens at each stage

StepWhat runsWhereCode location
User runs ./redamon.sh installStarts containers, then calls _kb_bootstrap liteHost shellredamon.sh:364-374
_kb_bootstrap calls make kb-build-liteRuns docker exec into agent containerHost -> containerMakefile:132-140
data_ingestion.main() startsParses CLI args, acquires lockAgent containerdata_ingestion.py:580-640
Embedder() createdLoads intfloat/e5-large-v2 model (pre-cached in image)Agent containerembedder.py:60-70
FAISSIndexer() createdPoints at /app/knowledge_base/data, loads existing index if presentAgent containerfaiss_indexer.py:80-95
Neo4jLoader() createdConnects to bolt://neo4j:7687, creates schemaAgent container -> Neo4jneo4j_loader.py:82-100
client.fetch() per sourceDownloads tarball/CSV/API data via safe_get()Agent container -> internet*_client.py
client.to_chunks()Parses raw data into chunk dicts with chunk_id, content, titleAgent container (CPU)*_client.py
_filter_unchanged()SHA-256 content hashes vs .manifest.json, drops unchangedAgent container (CPU)data_ingestion.py:430-490
embedder.embed_documents_batch()Runs sentence-transformer model on chunk texts, batch_size=64Agent container (CPU)embedder.py:130-160
faiss_indexer.add()Adds vectors + chunk_ids to in-memory FAISS indexAgent container (RAM)faiss_indexer.py:105-125
neo4j_loader.upsert_chunks()MERGE by chunk_id, dual-label (:KBChunk:SourceChunk)Agent -> Neo4jneo4j_loader.py:130-175
faiss_indexer.save()Atomic write index.faiss + chunk_ids.json + manifestAgent container (disk)faiss_indexer.py:130-175

Query Entry Points

Vector queries happen only when the LLM agent calls the web_search tool during a conversation. There is no other trigger. The flow starts from the user's chat message and passes through several layers before reaching the KB.

flowchart TD
    subgraph "User Interaction"
        USER["User sends message<br/>in webapp chat"]
        WEBAPP["Next.js webapp<br/>POST /api/chat"]
        AGENT_API["Agent HTTP API<br/>POST /invoke"]
    end

    subgraph "Agent LLM Loop"
        LLM["LLM (Claude/GPT)<br/>decides to call web_search"]
        TOOL_CALL["Tool call:<br/>web_search(query='sudo priv esc',<br/>include_sources=['gtfobins'],<br/>top_k=5)"]
    end

    subgraph "WebSearchToolManager"
        WSM["web_search tool closure<br/>tools.py"]
        CHK["KB attached?"]
        SRCS["Resolve sources<br/>include/exclude from args<br/>or project KB_ENABLED_SOURCES"]
        CLAMP["Clamp top_k to 1-20"]
    end

    subgraph "PentestKnowledgeBase.query()"

        EMB_Q["Embedder.embed_query()<br/>prepend 'query: ' prefix<br/>encode with e5-large-v2<br/>~30ms"]
        FAISS_S["FAISSIndexer.search()<br/>cosine similarity on<br/>normalized vectors<br/>returns (chunk_id, score)"]
        FT_S["Neo4jLoader.fulltext_search()<br/>Lucene keyword match<br/>on content + title fields<br/>~5ms"]
        RRF["RRF Fusion<br/>score = sum(1/(60 + rank_i))<br/>merge vector + keyword lists"]
        NEO_F["Neo4jLoader.filter_chunks()<br/>fetch metadata by chunk_ids<br/>apply source/CVSS/severity<br/>parameterized Cypher"]
        BOOST["Source Boost<br/>multiply score by boost factor<br/>tool_docs: 1.20, nvd: 0.90..."]
        RERANK["CrossEncoderReranker.rerank()<br/>top 30 candidates<br/>pre-truncate to 1920 chars<br/>logit -> sigmoid -> boost"]
        MMR_Q["MMR Diversity<br/>lambda=0.65<br/>greedy selection avoiding<br/>same-source pile-ups"]
    end

    subgraph "Sufficiency Decision"
        SUFF["top_result.score >= 0.35?"]
        KB_OK["Format KB results<br/>sanitize content<br/>wrap in UNTRUSTED frame"]
        TAVILY["Call Tavily API<br/>search_depth=advanced"]
        MERGE["Merge KB partial +<br/>Tavily results"]
        KB_ONLY["Return KB results<br/>with Tavily-unavailable note"]
        NO_RES["Return 'No results found'"]
    end

    subgraph "Back to Agent"
        RESULT["Tool result returned<br/>to LLM context"]
        LLM_NEXT["LLM processes results<br/>continues conversation"]
    end

    USER --> WEBAPP
    WEBAPP --> AGENT_API
    AGENT_API --> LLM
    LLM --> TOOL_CALL
    TOOL_CALL --> WSM
    WSM --> CHK
    CHK -->|yes| SRCS
    SRCS --> CLAMP
    CHK -->|no, Tavily only| TAVILY

    CLAMP --> EMB_Q
    EMB_Q --> FAISS_S
    CLAMP --> FT_S
    FAISS_S --> RRF
    FT_S --> RRF
    RRF --> NEO_F
    NEO_F --> BOOST
    BOOST --> RERANK
    RERANK --> MMR_Q
    MMR_Q --> SUFF

    SUFF -->|">= 0.35"| KB_OK
    KB_OK --> RESULT
    SUFF -->|"< 0.35"| TAVILY
    TAVILY -->|success + KB partial| MERGE
    MERGE --> RESULT
    TAVILY -->|fail + KB partial| KB_ONLY
    KB_ONLY --> RESULT
    TAVILY -->|success, no KB| RESULT
    TAVILY -->|fail, no KB| NO_RES
    NO_RES --> RESULT
    RESULT --> LLM_NEXT

Step-by-step: what happens when the agent searches

#StepComponentCode locationLatency
1User sends chat messageWebappwebapp/src/app/api/chat/--
2Webapp forwards to agent APIAgent HTTPorchestrator.py~ms
3LLM decides to call web_searchLLM inference(external API)~1-3s
4Tool invocation enters WebSearchToolManagerAgenttools.py:570-580--
5Check if KB is attachedAgenttools.py:588--
6Resolve include_sources from args or project settingsAgenttools.py:578-582--
7Clamp top_k to [1, 20]Agenttools.py:587--
8kb.query() startsKB Orchestratorkb_orchestrator.py:90--
9Embed query with e5-large-v2 (prepend "query: ")Embedderembedder.py:100-110~30ms
10FAISS inner-product search on normalized vectorsFAISSIndexerfaiss_indexer.py:110-130~5ms
11Neo4j Lucene fulltext search on content + titleNeo4jLoaderneo4j_loader.py:285-330~5ms
12RRF fusion merges vector + keyword ranked listsKB Orchestratorkb_orchestrator.py:240-270<1ms
13Neo4j metadata fetch + source/CVSS/severity filterNeo4jLoaderneo4j_loader.py:230-280~5ms
14Multiply scores by source boost factorsKB Orchestratorkb_orchestrator.py:145-155<1ms
15Cross-encoder rerank (top 30, sigmoid, re-boost)Rerankerreranker.py:130-190~200ms
16MMR diversity selection (top_k from pool)KB Orchestratorkb_orchestrator.py:280-340<1ms
17Sufficiency check: results[0].score >= 0.35KB Orchestratorkb_orchestrator.py:230-235--
18aIf sufficient: sanitize, frame, return KB resultsAgenttools.py:600-640--
18bIf insufficient: call Tavily, merge with partialsAgenttools.py:606-626~1-2s
19Tool result injected into LLM contextAgentorchestrator.py--

The LLM autonomously decides when to search based on the tool description in the system prompt. The tool registry (agentic/prompts/tool_registry.py) describes web_search as available for:

  • Looking up CVE details, exploit techniques, tool usage
  • Checking vulnerability databases
  • Finding attack methodologies or defense strategies
  • Researching specific security tools or protocols

The LLM can specify include_sources to target specific KB sources (e.g., ["gtfobins"] for Linux priv-esc, ["nuclei"] for template-based scanning) or leave it empty to search all sources.


Troubleshooting

KB not loading on agent startup

# Check if KB is enabled
docker exec redamon-agent env | grep KB_ENABLED

# Check if index files exist
ls -la knowledge_base/data/index.faiss knowledge_base/data/chunk_ids.json

# Check agent logs for KB init
docker logs redamon-agent 2>&1 | grep -i "knowledge\|kb\|faiss"

Rebuild the index from scratch

make kb-rebuild-lite   # or standard/full
docker compose build agent && docker compose up -d agent

NVD rate limiting

If NVD ingestion is slow, register for a free API key:

export NVD_API_KEY=your-key-here
make kb-update-nvd

Check index statistics

make kb-stats
# or
docker exec redamon-agent python -c "
from knowledge_base.faiss_indexer import FAISSIndexer
idx = FAISSIndexer('/app/knowledge_base/data')
idx.load()
print(f'Vectors: {idx.count()}')
"

Test KB query manually

docker exec -it redamon-agent python -c "
import asyncio
from tools import WebSearchToolManager
from knowledge_base import PentestKnowledgeBase
from knowledge_base.faiss_indexer import FAISSIndexer
from knowledge_base.neo4j_loader import Neo4jLoader
from knowledge_base.embedder import Embedder
from neo4j import GraphDatabase

embedder = Embedder('intfloat/e5-large-v2')
faiss = FAISSIndexer('/app/knowledge_base/data', dimensions=1024)
driver = GraphDatabase.driver('bolt://neo4j:7687', auth=('neo4j', 'changeme123'))
kb = PentestKnowledgeBase(faiss, Neo4jLoader(driver), embedder)
kb.load()

results = kb.query('sudo privilege escalation linux', top_k=5)
for r in results:
    print(f'{r[\"score\"]:.3f} [{r[\"source\"]}] {r[\"title\"]}')
"

Run unit tests

make kb-test
# or
pytest knowledge_base/tests -v