Skillspector Development Guide

June 30, 2026

This guide helps developers understand, run, test, and extend the LangGraph-based skillspector workflow.


1. Overview

skillspector is a LangGraph workflow that scans a skill directory (or zip) and produces a SARIF 2.1.0 report, risk score, and formatted output (terminal, JSON, Markdown, or SARIF). It is the graph/engine for security analysis of AI agent skills.

Entry points are:

  • CLI: run skillspector scan <path-or-url> (supports Git URL, file URL, .zip, .md file, or directory). Use --format terminal|json|markdown|sarif, --output FILE, --no-llm. See skillspector --help.
  • LangGraph dev server: run make langgraph-dev to start the dev server and open LangGraph Studio in your browser. In Studio you can view the graph and run it with custom inputs (e.g. skill_path, output_format, use_llm).
  • Programmatic: from skillspector import graph, then call graph.invoke(...) or graph.stream(...).

Data flow: resolve_input (input_path or skill_path → skill_path, optional temp_dir_for_cleanup) → build_context → parallel analyzers → meta_analyzer (LLM filter/enrich when use_llm is True) → report (SARIF + risk score + report_body from output_format). When temp_dir_for_cleanup is set, the caller removes that directory after invoke.


2. Prerequisites and setup

To get started: create and activate a virtual environment, then install. All Makefile targets assume the venv is already created and activated.

# Create venv (use either uv or Python)
uv venv .venv
# or: python3 -m venv .venv

source .venv/bin/activate   # On Windows: .venv\Scripts\activate

make install-dev
  • Python: 3.12+ (see pyproject.toml). make install and make install-dev use uv if available (uv sync / uv sync --all-extras), otherwise pip (pip install -e . / pip install -e ".[dev]"). You must create and activate the virtual environment yourself before running any make target.

  • Environment: Optional .env in the project root. The LangGraph dev server loads it (see langgraph.json "env": ".env"). Key variables:

    • SKILLSPECTOR_PROVIDER: Selects the active LLM provider: openai, anthropic, nv_build, claude_cli, or codex_cli. Defaults to nv_build when unset.
    • Provider credential: depends on the active provider. Use NVIDIA_INFERENCE_KEY (NVIDIA), OPENAI_API_KEY (OpenAI), or ANTHROPIC_API_KEY (Anthropic); the CLI providers (claude_cli, codex_cli) need no credential variable. See llm_utils.py.
    • OPENAI_BASE_URL: Override the OpenAI endpoint (e.g. point at Ollama).
    • SKILLSPECTOR_MODEL: Override default model; see constants.py.
  • Logging: Internal/operational logging uses the stdlib logging module. User-facing output (report body, errors, progress) uses Rich console.print().

    • Env: SKILLSPECTOR_LOG_LEVEL (DEBUG, INFO, WARNING, ERROR). Default is "WARNING" (defined in constants.py).
    • CLI: --verbose / -V sets internal logging to DEBUG for that run.
    • In code: from skillspector.logging_config import get_logger; logger = get_logger(__name__).
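
The logging settings above can be sketched with the standard library alone. The helper below is a hypothetical stand-in for skillspector.logging_config.get_logger, not its actual implementation; it assumes the level comes from SKILLSPECTOR_LOG_LEVEL and falls back to WARNING when the variable is unset or unrecognized.

```python
import logging
import os

_VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def resolve_log_level(env=None):
    """Return a logging level constant from SKILLSPECTOR_LOG_LEVEL (default WARNING)."""
    env = os.environ if env is None else env
    name = str(env.get("SKILLSPECTOR_LOG_LEVEL", "WARNING")).upper()
    if name not in _VALID_LEVELS:
        name = "WARNING"  # assumed behavior for unrecognized values
    return getattr(logging, name)


def get_logger_sketch(name, env=None):
    """Hypothetical stand-in for get_logger: a stdlib logger at the resolved level."""
    logger = logging.getLogger(name)
    logger.setLevel(resolve_log_level(env))
    return logger
```

With this shape, a --verbose flag would only need to override the resolved level with logging.DEBUG for the current run.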

3. Make targets

All targets assume the virtual environment is already created and activated. See Makefile for the full list.

| Target | Description |
| --- | --- |
| make help | Show available targets |
| make install | Install the package in production mode |
| make install-dev | Install the package with development dependencies |
| make langgraph-dev | Run LangGraph dev server (opens Studio at LANGGRAPH_STUDIO_URL) |
| make test | Run tests |
| make test-cov | Run tests with coverage report (HTML + terminal) |
| make lint | Run linters (ruff only) |
| make format | Format code with ruff (check + fix, then format) |
| make clean | Remove build artifacts and cache files |
| make build | Build the package |

4. Architecture and graph structure

State

state.py defines SkillspectorState (TypedDict, total=False). Key fields:

| Field | Description |
| --- | --- |
| input_path | Raw input (URL, zip path, file path, or directory); consumed by resolve_input |
| skill_path | Resolved local directory path (set by resolve_input) |
| temp_dir_for_cleanup | Set by resolve_input when URL/zip/file was resolved; caller must clean up after invoke |
| zip_bytes, mode | Optional zip input and scan mode |
| components | List of relative file paths in the skill |
| file_cache | Map of path → file contents |
| ast_cache | Map of path → AST representation (for future use) |
| manifest, previous_manifest | Parsed skill metadata (e.g. from SKILL.md) |
| component_metadata | List of dicts: path, type, lines, executable, size_bytes (from build_context) |
| has_executable_scripts | True if any component has executable extension (e.g. .py, .sh); used for risk multiplier |
| output_format | Requested report format: terminal, json, markdown, or sarif |
| report_body | Formatted report string (set by report node from output_format) |
| use_llm | When False, meta_analyzer skips LLM and uses fallback (e.g. for --no-llm) |
| baseline | Loaded suppression.Baseline (set by CLI/API from --baseline); report node drops matching findings before scoring |
| show_suppressed | When True, baseline-suppressed findings are listed in the report (still excluded from the risk score) |
| suppressed_findings | List of SuppressedFinding (finding + reason) produced by the report node |
| findings | All raw findings from analyzers (reducer: operator.add) |
| filtered_findings | Findings after meta_analyzer |
| model_config | Optional model IDs per node (e.g. default, meta_analyzer) |
| risk_severity | Severity band from risk score: LOW, MEDIUM, HIGH, CRITICAL |
| risk_recommendation | SAFE, CAUTION, or DO_NOT_INSTALL (from report node) |
| sarif_report | Final SARIF 2.1.0 dict |
| risk_score | Numeric risk score (0–100) |
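
The operator.add reducer on findings is what lets parallel analyzers each return their own list without overwriting one another. The sketch below illustrates that merge rule in plain Python; MiniState is a cut-down, hypothetical stand-in for SkillspectorState, and apply_update only imitates how a graph runtime might apply reducers. It is not LangGraph's own code.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class MiniState(TypedDict, total=False):
    """Cut-down, illustrative stand-in for SkillspectorState."""

    skill_path: str
    use_llm: bool
    # Reducer metadata: combine updates with operator.add (list concatenation).
    findings: Annotated[list[dict], operator.add]


def apply_update(state, update):
    """Merge one node's partial update into the state, honoring reducers."""
    hints = get_type_hints(MiniState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)  # reducer: old + new
        else:
            merged[key] = value  # no reducer: last write wins
    return merged


# Two analyzers report independently; both findings survive the merge.
state = {"skill_path": "/tmp/skill", "findings": []}
state = apply_update(state, {"findings": [{"rule_id": "A"}]})
state = apply_update(state, {"findings": [{"rule_id": "B"}]})
```

Fields without a reducer, such as skill_path, are simply replaced by the latest value.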

Graph

The graph is built in graph.py via create_graph() and exposed as graph from the package (__init__.py).

Flow diagram

flowchart LR
  START --> resolve_input
  resolve_input --> build_context
  build_context --> analyzers
  subgraph analyzers [Analyzers — run in parallel]
    static_all[static_*]
    behavioral[behavioral_*]
    mcp[mcp_*]
    semantic[semantic_*]
  end
  analyzers --> meta_analyzer
  meta_analyzer --> report
  report --> END

There are no conditional edges: after resolve_input → build_context, all analyzer nodes run in parallel (fan-out); they all feed into meta_analyzer (fan-in), then report → END.

Nodes

| Node | Role | Source |
| --- | --- | --- |
| resolve_input | Consumes input_path or skill_path; resolves URLs/zips/files via InputHandler; sets skill_path and (when needed) temp_dir_for_cleanup | resolve_input.py |
| build_context | Reads skill_path, populates components, file_cache, ast_cache, manifest, component_metadata, has_executable_scripts | build_context.py |
| Analyzers | 22 nodes; each returns AnalyzerNodeResponse (list of Finding). State reducer appends to findings. | nodes/analyzers/__init__.py (ANALYZER_NODE_IDS, ANALYZER_NODES) |
| meta_analyzer | Per-file LLM filter/enrich of findings → filtered_findings via LLMMetaAnalyzer; one LLM call per file (or per chunk for oversized files); token budgets from constants.py; falls back when use_llm is False | meta_analyzer.py, llm_analyzer_base.py |
| report | Applies baseline suppression (state["baseline"]), then builds SARIF 2.1.0, computes risk_score, risk_severity, risk_recommendation from the non-suppressed findings; writes report_body from output_format (terminal/json/markdown/sarif) | report.py |

5. Package layout

| Path | Purpose |
| --- | --- |
| Root | |
| graph.py | Builds and compiles the LangGraph workflow |
| state.py | SkillspectorState, AnalyzerNodeResponse, MetaAnalyzerResponse |
| models.py | Finding, AnalyzerFinding, Location, Severity, AnalyzerPlugin |
| constants.py | Env-driven config: inference URL, default model, MODELS dict, token budgets (get_max_input_tokens, get_max_output_tokens) |
| llm_utils.py | chat_completion() for OpenAI-compatible / NVIDIA Inference API |
| cli.py | Typer app: scan (with input resolution, --format, --no-llm), --version |
| input_handler.py | Resolves Git URL, file URL, .zip, single file, or directory to a local directory path |
| suppression.py | Baseline / false-positive suppression: Baseline, SuppressionRule, load_baseline, partition_findings, finding_fingerprint, build_baseline_dict (see SUPPRESSION.md) |
| __init__.py | Package version (from pyproject.toml via importlib.metadata) |
| sarif_models.py | SARIF 2.1.0 Pydantic models and validate_sarif_report() |
| nodes/ | |
| build_context.py | Build-context node |
| llm_analyzer_base.py | Base LLM analyzer with per-file/per-chunk batching (LLMAnalyzerBase, LLMMetaAnalyzer, Batch) |
| meta_analyzer.py | Meta-analyzer node (uses LLMMetaAnalyzer for per-file LLM calls) |
| report.py | Report node |
| nodes/analyzers/ | |
| __init__.py | Registry: ANALYZER_NODE_IDS, ANALYZER_NODES |
| common.py | Shared analyzer helpers (line/context extraction, AST name resolution) |
| static_runner.py | Runs static patterns; converts AnalyzerFinding → Finding |
| pattern_defaults.py | Shared pattern metadata (category, explanation, remediation) |
| static_yara.py | YARA-based static analyzer |
| osv_client.py | OSV.dev API client for live vulnerability lookups (SC4); batch queries with caching and fallback |
| static_patterns_*.py | 14 pattern-based analyzers (prompt_injection, data_exfiltration, anti_refusal, etc.) |
| behavioral_ast.py | AST-based behavioral analyzer (AST1–AST8): detects exec, eval, subprocess, os.system, compile, dynamic import/getattr, and dangerous execution chains |
| behavioral_taint_tracking.py | Taint-tracking behavioral analyzer (TT1–TT5): source→sink data-flow analysis over Python AST |
| mcp_least_privilege.py, mcp_tool_poisoning.py | MCP analyzers (LP1–LP4 least-privilege; TP1–TP4 tool poisoning) |
| mcp_rug_pull.py | MCP rug-pull analyzer (RP1–RP3): detects manifest/tool-definition changes between scans |
| semantic_security_discovery.py, semantic_developer_intent.py, semantic_quality_policy.py | Semantic (LLM) analyzers; emit findings only when use_llm is enabled |

6. Running the workflow

LangGraph dev server (primary for development)

Running make langgraph-dev starts the LangGraph dev server and opens LangGraph Studio in your browser (the Studio URL is configurable via the LANGGRAPH_STUDIO_URL variable in the Makefile; defaults to public LangSmith). In Studio you can:

  • View the graph — See the workflow as a diagram: nodes (resolve_input, build_context, analyzers, meta_analyzer, report) and edges. Useful for understanding flow and debugging.
  • Run the graph interactively — Select the skillspector_scan graph, provide an input (e.g. {"input_path": "/path/to/your/skill"} or {"skill_path": "/path/to/your/skill"}), and execute a run. You can inspect state after each step and see the final sarif_report and risk_score.

Setup: langgraph.json defines the graph skillspector_scan at ./src/skillspector/graph.py:graph and loads .env. Provide input_path (URL, zip, file, or directory) or skill_path (local directory). If the graph resolves a URL/zip/file, it sets temp_dir_for_cleanup; the caller should clean up that directory after invoke.

CLI

After creating/activating the venv and running make install-dev (or pip install -e ".[dev]"), the skillspector CLI is available:

skillspector scan ./my-skill/                    # terminal output
skillspector scan ./my-skill/ --format json -o report.json
skillspector scan https://github.com/user/repo   # Git URL (clones to temp dir)
skillspector scan ./skill.zip --no-llm          # static analysis only
skillspector --version

The CLI passes input_path to the graph. The resolve_input node (using input_handler.py) resolves Git URL, file URL, .zip, single .md file, or directory to a local directory and sets skill_path (and temp_dir_for_cleanup when a temp dir was created). The CLI cleans up temp_dir_for_cleanup after invoke. Exit code 1 if risk_score > 50; exit code 2 on error. See Integrating SkillSpector for the full exit-code and JSON contract.
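
For CI wrappers, the exit-code rule above can be written as a small function. This is a sketch of the stated rule only, not the CLI's code; exit_code_for is a hypothetical name, and returning 0 in the remaining case is an assumption, since the guide states only codes 1 and 2.

```python
def exit_code_for(risk_score=None, error=False):
    """Map a scan outcome to a process exit code per the rule described above."""
    if error:
        return 2  # the scan itself failed
    if risk_score is not None and risk_score > 50:
        return 1  # risk score above the documented threshold
    return 0  # assumed: success for everything else
```

Note that the comparison is strict, so a score of exactly 50 does not trip the threshold.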

Programmatic

from skillspector import graph

result = graph.invoke({
    "input_path": "/path/to/skill",  # or use "skill_path" for a local dir
    "output_format": "json",   # optional: terminal, json, markdown, sarif (default sarif)
    "use_llm": True,           # optional: False to skip LLM in meta_analyzer
})
# Or: graph.stream(...)

Optional state keys: mode, model_config, output_format, use_llm. The result includes findings, filtered_findings, sarif_report, risk_score, risk_severity, risk_recommendation, and report_body (formatted string for the requested output_format).
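
Because the caller owns temp_dir_for_cleanup, programmatic users need a cleanup step after invoke. The helper below is a minimal sketch; it assumes the key appears in the returned state only when a temporary directory was created, and cleanup_temp_dir is a hypothetical name that is not part of the package.

```python
import shutil
from pathlib import Path


def cleanup_temp_dir(result):
    """Remove the temp dir recorded in a graph result, if any.

    Returns True when a directory was found and removed, False otherwise.
    """
    temp_dir = result.get("temp_dir_for_cleanup")
    if not temp_dir:
        return False
    path = Path(temp_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path, ignore_errors=True)
    return not path.exists()
```

Calling it from a finally block around graph.invoke(...) keeps the temp directory from leaking when report handling raises.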


7. Testing

  • Location:
    • tests/unit/: test_cli.py, test_input_handler.py, test_patterns.py, test_sarif.py
    • tests/integration/: test_graph.py, test_graph_scanner.py, test_meta_analyzer_use_llm.py
    • tests/nodes/: test_build_context.py, test_resolve_input.py, test_report.py, test_llm_analyzer_base.py
    • tests/nodes/analyzers/: analyzer tests (test_registry.py, test_static_patterns.py)
  • Commands: make test, make test-cov.
  • Key tests: test_graph.py invokes the graph and asserts findings, sarif_report, risk_score, report_body; test_input_handler.py covers directory, zip, and single-file resolution; test_resolve_input.py covers the resolve_input node; test_build_context.py asserts component_metadata and has_executable_scripts.

8. Data models

  • Finding (models.py): rule_id, message, severity, confidence, file, start_line, end_line, category, pattern, finding, explanation, remediation, code_snippet, intent, tags, context, matched_text. This is the type stored in state and used in SARIF and JSON report output.
  • AnalyzerFinding: Analyzer-facing type with Location and Severity enum. Convert to Finding via static_runner.analyzer_finding_to_finding (or equivalent).
  • SARIF: sarif_models.py provides Pydantic models for SARIF 2.1.0. The report node builds a SarifLog from filtered_findings.
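
To make the Finding-to-SARIF step concrete, the sketch below builds a minimal SARIF 2.1.0 log from Finding-like dicts. It uses plain dicts rather than the Pydantic models in sarif_models.py, and the severity-to-level mapping is an assumption made for illustration; the report node's real mapping may differ.

```python
SEVERITY_TO_LEVEL = {  # assumed mapping, for illustration only
    "CRITICAL": "error",
    "HIGH": "error",
    "MEDIUM": "warning",
    "LOW": "note",
}


def finding_to_sarif_result(finding):
    """Build a SARIF 2.1.0 result object from a Finding-like dict."""
    severity = str(finding.get("severity", "")).upper()
    start_line = finding["start_line"]
    return {
        "ruleId": finding["rule_id"],
        "level": SEVERITY_TO_LEVEL.get(severity, "warning"),
        "message": {"text": finding["message"]},
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {
                        "startLine": start_line,
                        "endLine": finding.get("end_line", start_line),
                    },
                }
            }
        ],
    }


def build_sarif_log(findings, tool_name="skillspector"):
    """Wrap results in the minimal required SARIF envelope (version, runs, tool.driver.name)."""
    return {
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name}},
                "results": [finding_to_sarif_result(f) for f in findings],
            }
        ],
    }
```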

9. Adding or modifying analyzer nodes

Registering an analyzer

  1. Add the node id to ANALYZER_NODE_IDS and the implementation to ANALYZER_NODES in nodes/analyzers/__init__.py.
  2. No change to graph.py is required: edges from build_context to each analyzer and from each analyzer to meta_analyzer are added in a loop using ANALYZER_NODE_IDS.
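
The reason registration alone is enough can be seen by enumerating the edges such a loop produces. The sketch below lists them as plain (source, target) tuples instead of calling LangGraph; the analyzer ids are hypothetical placeholders, and build_edges illustrates the topology rather than reproducing graph.py.

```python
ANALYZER_NODE_IDS = ["static_example", "behavioral_example"]  # hypothetical ids


def build_edges(analyzer_ids):
    """Enumerate (source, target) edges for the fan-out/fan-in topology."""
    edges = [("START", "resolve_input"), ("resolve_input", "build_context")]
    for node_id in analyzer_ids:
        edges.append(("build_context", node_id))  # fan-out to each analyzer
        edges.append((node_id, "meta_analyzer"))  # fan-in from each analyzer
    edges += [("meta_analyzer", "report"), ("report", "END")]
    return edges


edges = build_edges(ANALYZER_NODE_IDS)
```

Each registered id contributes exactly two edges, so adding an analyzer never touches the fixed part of the graph.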

Node signature

  • Input: state: SkillspectorState (or dict[str, object]).
  • Output: AnalyzerNodeResponse — a dict with key "findings" and value list[Finding].
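
A node with this signature can be very small. The sketch below scans file_cache for one pattern and returns findings under the "findings" key. It uses plain dicts as a stand-in for the Finding model in models.py, and the rule id, message, and regex are invented for illustration.

```python
import re

# Hypothetical rule for illustration; not one of the project's rule ids.
_RULE_ID = "EXAMPLE1"
_PATTERN = re.compile(r"curl\s+[^|\n]*\|\s*(?:sh|bash)\b")


def example_analyzer_node(state):
    """Scan file_cache for 'curl ... | sh' and return findings as plain dicts."""
    findings = []
    for path, content in state.get("file_cache", {}).items():
        for line_no, line in enumerate(content.splitlines(), start=1):
            if _PATTERN.search(line):
                findings.append(
                    {
                        "rule_id": _RULE_ID,
                        "message": "Remote script piped to a shell",
                        "severity": "HIGH",
                        "file": path,
                        "start_line": line_no,
                        "end_line": line_no,
                        "matched_text": line.strip(),
                    }
                )
    return {"findings": findings}
```

The node returns only its own findings; it does not read or copy the existing findings list, since the state reducer handles accumulation.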

Static pattern analyzers

Use static_runner.run_static_patterns with one or more pattern modules. Each module must provide:

  • analyze(content: str, file_path: str, file_type: str) -> list[AnalyzerFinding]

Use pattern_defaults for category and remediation. Examples: static_patterns_prompt_injection.py, static_patterns_data_exfiltration.py.
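
A pattern module that satisfies this contract might look like the sketch below. Severity, Location, and AnalyzerFinding are simplified stand-ins for the classes in models.py, whose real fields may differ; the rule id and the "markdown" file_type value are assumptions.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):  # stand-in for models.Severity
    LOW = "low"
    HIGH = "high"


@dataclass
class Location:  # stand-in for models.Location
    file_path: str
    line: int


@dataclass
class AnalyzerFinding:  # stand-in for models.AnalyzerFinding
    rule_id: str
    message: str
    severity: Severity
    location: Location


_IGNORE_INSTRUCTIONS = re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE)


def analyze(content: str, file_path: str, file_type: str) -> list[AnalyzerFinding]:
    """Flag a common prompt-injection phrase, in Markdown files only."""
    if file_type != "markdown":  # assumed file_type value
        return []
    return [
        AnalyzerFinding(
            "EXAMPLE_PI",  # hypothetical rule id
            "Possible prompt injection phrase",
            Severity.HIGH,
            Location(file_path, line_no),
        )
        for line_no, line in enumerate(content.splitlines(), start=1)
        if _IGNORE_INSTRUCTIONS.search(line)
    ]
```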

Placeholder analyzers

Return {"findings": []}. All analyzer nodes are currently implemented; use this pattern for any new placeholder analyzer added before its detection logic lands. The LLM-backed semantic analyzers also return {"findings": []} when use_llm is False.


10. Environment and configuration

.env

Copy .env.example to .env in the project root and set values as needed. The LangGraph dev server loads .env (see langgraph.json).

| Variable | Description | Example |
| --- | --- | --- |
| SKILLSPECTOR_PROVIDER | Active LLM provider: openai, anthropic, nv_build, claude_cli, or codex_cli. Defaults to nv_build. | claude_cli |
| NVIDIA_INFERENCE_KEY | Credential for nv_build. | nvapi-... |
| OPENAI_API_KEY | Credential for SKILLSPECTOR_PROVIDER=openai. Also tier-2 fallback for non-OpenAI providers. | sk-... |
| OPENAI_BASE_URL | Override the OpenAI endpoint (e.g. point at Ollama). | http://localhost:11434/v1 |
| ANTHROPIC_API_KEY | Credential for SKILLSPECTOR_PROVIDER=anthropic. | sk-ant-... |
| SKILLSPECTOR_MODEL | Override the active provider's bundled default model (see README.md for per-provider defaults). For claude_cli, this is passed as --model to the claude binary. | gpt-5.2 |

CLI providers (claude_cli, codex_cli): no credential env var is needed. Authentication is managed by the agent CLI's own session (claude auth login / codex login). The subprocess is heavily sandboxed — see providers/_agent_cli.py.
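
The guide describes the agent-CLI helper as running without a shell, passing untrusted content on stdin only, scrubbing credentials from the environment, applying a per-call timeout, and failing closed. The sketch below shows those properties with the standard library; it is not the code in providers/_agent_cli.py, the function names are hypothetical, and CLI-specific capability stripping (disabling tools) is not shown.

```python
import os
import subprocess
import sys

_SECRET_NAMES = {"NVIDIA_INFERENCE_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"}


def scrubbed_env(env=None):
    """Copy the environment without provider credentials."""
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k not in _SECRET_NAMES}


def run_agent_cli(argv, untrusted_text, timeout_s=30.0):
    """Run argv with untrusted_text on stdin; return stdout, or None on any failure."""
    try:
        proc = subprocess.run(
            argv,  # argument list, never a shell string
            input=untrusted_text,  # untrusted content goes through stdin only
            capture_output=True,
            text=True,
            shell=False,
            env=scrubbed_env(),
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # fail closed: missing binary, timeout, and similar errors
    if proc.returncode != 0:
        return None  # fail closed on a non-zero exit
    return proc.stdout


# Stand-in command: a Python one-liner plays the role of the agent binary.
echo_upper = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
output = run_agent_cli(echo_upper, "hello")
```

Because the untrusted text never appears in argv, it cannot be interpreted as a flag or shell metacharacter by the child process.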

Live provider tests

The manual test-provider CI job and local make test-provider target perform live requests against provider default endpoints. Missing provider keys print a WARNING: line before pytest runs and skip that provider. In CI, missing keys also make the manual job exit with the configured warning code so GitLab displays the job as passed with warnings; if a key is present but invalid, or the provider request fails, the corresponding test fails.

| Command | Required env var | Default URL | Optional model override |
| --- | --- | --- | --- |
| make test-provider openai | OPENAI_API_KEY | https://api.openai.com/v1 | SKILLSPECTOR_OPENAI_TEST_MODEL |
| make test-provider anthropic | ANTHROPIC_API_KEY | https://api.anthropic.com | SKILLSPECTOR_ANTHROPIC_TEST_MODEL |
| make test-provider nv_build | NVIDIA_INFERENCE_KEY | https://integrate.api.nvidia.com/v1 | SKILLSPECTOR_NV_BUILD_TEST_MODEL |
| make test-provider | Any/all of the provider keys above | All provider default URLs above | Any/all provider model overrides above |

Base URL env vars are not needed for live provider tests; the tests intentionally use provider defaults.
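
The skip-on-missing-key behavior can be pictured as a partition of providers by which credentials are present. The sketch below is illustrative only: the function names are hypothetical and the exact wording after "WARNING:" is assumed, since the guide specifies only the prefix.

```python
import os

PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "nv_build": "NVIDIA_INFERENCE_KEY",
}


def partition_providers(env=None):
    """Return (runnable, skipped) provider names based on which keys are set."""
    env = os.environ if env is None else env
    runnable, skipped = [], []
    for provider, var in PROVIDER_KEY_VARS.items():
        (runnable if env.get(var) else skipped).append(provider)
    return runnable, skipped


def warning_lines(skipped):
    """One WARNING: line per skipped provider (message wording is assumed)."""
    return [f"WARNING: {PROVIDER_KEY_VARS[p]} is not set; skipping {p}" for p in skipped]
```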

Constants, token budgets, and LLM

  • Constants (constants.py): _SKILLSPECTOR_DEFAULT_MODEL, MODEL_CONFIG (per-node model selection), MAX_INPUT_TOKENS_PCT (0.75), DEFAULT_CONTEXT_LENGTH (128k fallback).

    • get_max_input_tokens(model): input budget per LLM request (75% of the resolved context window).
    • get_max_output_tokens(model): output budget per LLM request (the smaller of 25% of the context window and the registry's max_output_tokens cap, if one is set).
    • Batch budget overhead is computed per-prompt via estimate_tokens(base_prompt) rather than a fixed constant.
  • Providers (providers/): pluggable credential + token-budget resolvers. Each provider is a subpackage with its own provider.py and bundled model_registry.yaml; registry.py exposes lookup_context_length / lookup_max_output_tokens utilities the providers call directly. The active provider is chosen by SKILLSPECTOR_PROVIDER (default: nv_build):

    • nv_build/: build.nvidia.com (HTTP, NVIDIA_INFERENCE_KEY)
    • openai/: api.openai.com or any OpenAI-compatible URL (OPENAI_API_KEY)
    • anthropic/: api.anthropic.com (ANTHROPIC_API_KEY)
    • claude_cli/: local claude binary; no API key. Uses the CLI's own auth session (claude auth login). Set SKILLSPECTOR_PROVIDER=claude_cli.
    • codex_cli/: local codex binary; no API key. Uses the CLI's own auth session (codex login). Set SKILLSPECTOR_PROVIDER=codex_cli.

    CLI providers (claude_cli, codex_cli) implement the optional AgentCLICapable interface (is_available() + complete()) defined in providers/base.py. has_cli_capability(provider) detects this at runtime. All subprocess calls go through the hardened helper providers/_agent_cli.py which enforces: no shell (shell=False), untrusted content via stdin only, capability stripping (tools disabled / sandboxed), environment scrubbing (no API keys forwarded), per-call timeout, and fail-closed error handling.

  • LLM calls (llm_utils.py): get_chat_model() and chat_completion() dispatch based on the active provider:

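    • HTTP providers: resolve credentials in two tiers. The first tier is the active provider's own key and endpoint (NVIDIA_INFERENCE_KEY, ANTHROPIC_API_KEY, or OPENAI_API_KEY); the second is OPENAI_API_KEY as a fallback against any OpenAI-compatible endpoint (the variable table above lists it as the tier-2 fallback for non-OpenAI providers). max_tokens is auto-bound to get_max_output_tokens(model) from model_info.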
    • CLI providers (claude_cli, codex_cli): get_chat_model() returns an AgentCLIChatModel adapter backed by provider.complete(), so the analyzers' .invoke() / .with_structured_output(schema).invoke() calls work with no API key (structured output is produced by prompting for JSON, then Pydantic-validating). chat_completion() routes through get_chat_model() as well. is_llm_available() calls provider.is_available() instead of credential resolution.
  • LLM analyzer base (llm_analyzer_base.py): LLMAnalyzerBase provides per-file/per-chunk batching, token-budget-aware chunking, and a run loop for all LLM-based analyzers. LLMMetaAnalyzer extends it for filter/enrich (meta_analyzer node). Future semantic analyzers extend LLMAnalyzerBase for discovery mode.
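
The token-budget rules in the Constants bullet above reduce to a little arithmetic. The sketch below works through it with a made-up registry; the model names and their limits are hypothetical, the functions are not the ones in constants.py, and 128_000 is an assumed reading of the "128k" fallback.

```python
MAX_INPUT_TOKENS_PCT = 0.75
DEFAULT_CONTEXT_LENGTH = 128_000  # assumed value for the "128k" fallback

# Hypothetical registry: model -> (context_length, max_output_tokens cap or None)
_REGISTRY = {
    "example-small": (32_000, 4_096),
    "example-large": (200_000, None),
}


def _lookup(model):
    """Return (context_length, cap), falling back to the default context length."""
    return _REGISTRY.get(model, (DEFAULT_CONTEXT_LENGTH, None))


def max_input_tokens(model):
    """Input budget: 75% of the resolved context window."""
    context, _ = _lookup(model)
    return int(context * MAX_INPUT_TOKENS_PCT)


def max_output_tokens(model):
    """Output budget: 25% of the context window, limited by the cap if one is set."""
    context, cap = _lookup(model)
    quarter = int(context * (1 - MAX_INPUT_TOKENS_PCT))
    return quarter if cap is None else min(quarter, cap)
```

For the hypothetical example-small entry, the input budget is 24,000 tokens and the output budget is 4,096, because the registry cap is lower than 25% of the window (8,000).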


11. Linting and formatting

  • Format: make format — Ruff check with auto-fix and Ruff format.
  • Lint: make lint — Ruff check.
  • Config: pyproject.toml (Ruff line-length 100, target Python 3.12).

12. Quick reference

| Task | Command or action |
| --- | --- |
| Get started | Create venv (uv venv .venv or python3 -m venv .venv), then source .venv/bin/activate, then make install-dev. Re-activate venv in each new terminal. |
| Run workflow | skillspector scan <path> for CLI; make langgraph-dev for LangGraph Studio; or graph.invoke({"input_path": "...", "output_format": "json"}) (or skill_path) programmatically |
| Add analyzer | Implement node returning {"findings": list[Finding]}, register in nodes/analyzers/__init__.py |
| Run tests | make test; key integration test: tests/integration/test_graph.py |