Data layout
April 30, 2026 · View on GitHub
data/ is deepsec's on-disk state. Each project owns a subdirectory; the
files inside are append-only across runs.
data/<projectId>/
├── project.json # rootPath, githubUrl (auto-managed)
├── INFO.md # repo context injected into AI prompts
├── config.json # priorityPaths, promptAppend, ignorePaths (optional)
├── files/ # one JSON per scanned source file (FileRecord)
│ └── path/to/source.ts.json
├── runs/ # one JSON per run (RunMeta)
│ └── 20260429215021-19ac.json
└── reports/ # generated markdown + JSON reports
data/ is gitignored by default. To version it (CI, sharing across
machines), commit it explicitly.
The schemas below are the source of truth for any tool that reads
data/ directly. They live in
packages/core/src/types.ts.
project.json — ProjectConfig
Auto-written on first scan; safe to edit by hand.
| Field | Type | Purpose |
|---|---|---|
projectId | string | Matches the directory name. |
rootPath | string | Absolute path to the codebase. Updated each scan with the most recent --root. |
createdAt | string (ISO) | Project init time. |
githubUrl | string? | https://github.com/owner/repo/blob/branch — used for clickable links in exports. Auto-detected from git remote if not set. |
config.json — per-project overrides
Optional. Read by scan and the AI agents.
| Field | Type | Purpose |
|---|---|---|
priorityPaths | string[] | Path prefixes processed first. |
promptAppend | string | Free-form text appended to the system prompt for this project. |
ignorePaths | string[] | Glob patterns to skip during scan. |
INFO.md
Free-form markdown injected into the AI prompt for process,
triage, and revalidate. See
getting-started.md for the agent prompt that
writes a good one.
files/.json — FileRecord
FileRecordThe core per-file accumulator. Every stage adds to this record;
nothing is overwritten. Re-scanning merges new candidates.
Re-processing appends to analysisHistory. Revalidation annotates
findings rather than replacing them.
The on-disk path mirrors the source path under <rootPath> plus a
.json suffix (src/api/auth.ts → files/src/api/auth.ts.json).
Top-level fields
| Field | Type | Purpose |
|---|---|---|
filePath | string | Path relative to rootPath. |
projectId | string | The owning project. |
candidates | CandidateMatch[] | Regex matcher hits — see below. |
lastScannedAt | string (ISO) | Most recent scan timestamp. |
lastScannedRunId | string | runId of the scan that last touched this file. |
fileHash | string (sha-256) | Source content hash at last scan. |
findings | Finding[] | Latest set of AI-produced findings. |
analysisHistory | AnalysisEntry[] | Append-only log of every AI investigation. |
gitInfo | object? | Git committer info + ownership data, written by enrich. |
status | "pending" | "processing" | "analyzed" | "error" | Lifecycle state. |
lockedByRunId | string? | When non-empty, a run holds this file. Cleared on completion. |
CandidateMatch
| Field | Type | Purpose |
|---|---|---|
vulnSlug | string | Matcher slug that fired. |
lineNumbers | number[] | 1-indexed source lines. |
snippet | string | Short excerpt around the first match. |
matchedPattern | string | Human-readable label of the regex (the matcher's label). |
Finding
| Field | Type | Purpose |
|---|---|---|
severity | "CRITICAL" | "HIGH" | "MEDIUM" | "HIGH_BUG" | "BUG" | "LOW" | See README severity table. |
vulnSlug | string | Matcher slug or other-<topic> if no matcher fits. |
title | string | One-sentence summary. |
description | string | Full explanation. |
lineNumbers | number[] | 1-indexed lines. |
recommendation | string | Suggested fix. |
confidence | "high" | "medium" | "low" | The agent's self-rated confidence. |
triage | Triage? | Set by triage. |
revalidation | Revalidation? | Set by revalidate. |
Triage (set by deepsec triage)
| Field | Type | Purpose |
|---|---|---|
priority | "P0" | "P1" | "P2" | "skip" | Recommended action bucket. |
exploitability | "trivial" | "moderate" | "difficult" | Effort to weaponize. |
impact | "critical" | "high" | "medium" | "low" | Blast radius if exploited. |
reasoning | string | Why this priority. |
triagedAt | string (ISO) | Timestamp. |
model | string | Model used for triage. |
Revalidation (set by deepsec revalidate)
| Field | Type | Purpose |
|---|---|---|
verdict | "true-positive" | "false-positive" | "fixed" | "uncertain" | Re-checked verdict. |
reasoning | string | Why this verdict. Includes git-history evidence if fixed. |
adjustedSeverity | Severity? | Set if revalidation re-rates the finding. |
revalidatedAt | string (ISO) | Timestamp. |
runId | string | runId of the revalidate run. |
model | string | Model used. |
AnalysisEntry
One per AI investigation of this file. Append-only — nothing is ever deleted.
| Field | Type | Purpose |
|---|---|---|
runId | string | The producing run. |
investigatedAt | string (ISO) | Timestamp. |
durationMs | number | Wall-clock total. |
durationApiMs | number? | API time only (excludes process orchestration). |
agentType | string | claude-agent-sdk or codex. |
model | string | Model identifier. |
modelConfig | Record<string, unknown> | Provider-specific settings echoed back. |
agentSessionId | string? | The agent's session/thread id, for reproducing or replaying. |
findingCount | number | Findings produced in this entry. |
numTurns | number? | Conversation turn count. |
costUsd | number? | Estimated USD cost. |
usage | { inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens }? | Token accounting. |
refusal | RefusalReport? | See models.md. |
codexStderr | string? | Captured codex stderr when an investigation produced 0 output tokens (forensic only). |
reinvestigateMarker | number? | Wave marker from --reinvestigate <N>. |
RefusalReport
| Field | Type | Purpose |
|---|---|---|
refused | boolean | True if the agent skipped or declined any part of the investigation. |
reason | string? | Free-form reason if refused. |
skipped | Array<{ filePath?: string; reason: string }>? | Per-file skip reasons. |
raw | string? | Trimmed raw model response to the follow-up question, for debugging. |
gitInfo (set by deepsec enrich)
| Field | Type | Purpose |
|---|---|---|
recentCommitters | Array<{ name, email, date }> | Top contributors over the file's recent history. |
enrichedAt | string (ISO) | Last enrich timestamp. |
ownership | OwnershipData? | If an ownership plugin is active, structured ownership/escalation data. |
status lifecycle
pending -- scan finished, awaits AI
↓
processing -- a run is currently investigating (lockedByRunId set)
↓
analyzed -- AnalysisEntry appended; findings updated
error is set if the agent crashed mid-investigation. Re-running
process will retry error and pending files.
runs/.json — RunMeta
RunMetaOne per scan / process / revalidate invocation. Used for status
reporting (deepsec status) and for filtering exports by run.
| Field | Type | Purpose |
|---|---|---|
runId | string | <YYYYMMDDHHMMSS>-<rand4>. Sortable. |
projectId | string | Owning project. |
rootPath | string | Resolved root for the run. |
createdAt | string (ISO) | Run start. |
completedAt | string? (ISO) | Run end (absent while running). |
type | "scan" | "process" | "revalidate" | Stage. |
phase | "running" | "done" | "error" | Terminal status. |
scannerConfig | { matcherSlugs }? | Set on scan runs. |
processorConfig | { agentType, model, modelConfig }? | Set on process / revalidate runs. |
stats | object | Counters: filesScanned, candidatesFound, findingsCount, totalCostUsd, truePositives, falsePositives, … |
reports/
Generated by deepsec report. One markdown per project plus a JSON
summary. Re-running report overwrites; nothing here is incremental.
Reading data/ directly
The append-only model means a few patterns work well:
- Find every TP HIGH+ finding across a project:
jq -r '. as $r | $r.findings[] | select(.revalidation.verdict=="true-positive") | select(.severity=="HIGH" or .severity=="CRITICAL") | [$r.filePath, .severity, .title] | @tsv' data/<id>/files/**/*.json - Total spend on a project:
jq -s 'map(.analysisHistory[].costUsd // 0) | add' data/<id>/files/**/*.json - Files still pending after a run:
jq -r 'select(.status=="pending") | .filePath' data/<id>/files/**/*.json
For richer queries, prefer deepsec export --format json — it applies
filters consistently with the rest of the CLI.