Data layout

April 30, 2026

data/ is deepsec's on-disk state. Each project owns a subdirectory; the files inside are append-only across runs.

data/<projectId>/
├── project.json              # rootPath, githubUrl (auto-managed)
├── INFO.md                   # repo context injected into AI prompts
├── config.json               # priorityPaths, promptAppend, ignorePaths (optional)
├── files/                    # one JSON per scanned source file (FileRecord)
│   └── path/to/source.ts.json
├── runs/                     # one JSON per run (RunMeta)
│   └── 20260429215021-19ac.json
└── reports/                  # generated markdown + JSON reports

data/ is gitignored by default. To version it (CI, sharing across machines), commit it explicitly.

The schemas below are the source of truth for any tool that reads data/ directly. They live in packages/core/src/types.ts.

project.json — ProjectConfig

Auto-written on first scan; safe to edit by hand.

| Field | Type | Purpose |
| --- | --- | --- |
| projectId | string | Matches the directory name. |
| rootPath | string | Absolute path to the codebase. Updated each scan with the most recent --root. |
| createdAt | string (ISO) | Project init time. |
| githubUrl | string? | https://github.com/owner/repo/blob/branch, used for clickable links in exports. Auto-detected from git remote if not set. |
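
As a rough illustration of how githubUrl could turn a finding into a clickable link, the sketch below joins the base URL, the file path, and a line anchor. The helper name sourceLink and the #L<line> anchor (GitHub's usual blob URL convention) are assumptions for illustration; deepsec's export code may build links differently.

```typescript
// Hypothetical helper, not part of deepsec's API.
// Assumes githubUrl has the documented shape https://github.com/owner/repo/blob/branch
// and that filePath is relative to rootPath, as in FileRecord.
function sourceLink(githubUrl: string, filePath: string, line?: number): string {
  const base = githubUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = filePath.split("/").map(encodeURIComponent).join("/");
  const anchor = line === undefined ? "" : `#L${line}`;
  return `${base}/${path}${anchor}`;
}

console.log(sourceLink("https://github.com/owner/repo/blob/main", "src/api/auth.ts", 42));
```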

config.json — per-project overrides

Optional. Read by scan and the AI agents.

| Field | Type | Purpose |
| --- | --- | --- |
| priorityPaths | string[] | Path prefixes processed first. |
| promptAppend | string | Free-form text appended to the system prompt for this project. |
| ignorePaths | string[] | Glob patterns to skip during scan. |
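
To make "path prefixes processed first" concrete, here is a minimal sketch of prefix-based ordering. The function name orderByPriority is hypothetical, and the assumption that non-priority files keep their original relative order is mine; deepsec's actual scheduling may differ.

```typescript
// Sketch only: moves paths that start with any priority prefix to the front.
// orderByPriority is a hypothetical name, not a deepsec function.
function orderByPriority(paths: string[], priorityPaths: string[]): string[] {
  const isPriority = (p: string): boolean =>
    priorityPaths.some((prefix) => p.startsWith(prefix));
  const first = paths.filter(isPriority);
  const rest = paths.filter((p) => !isPriority(p));
  return [...first, ...rest]; // relative order inside each group is preserved
}

const ordered = orderByPriority(
  ["docs/readme.md", "src/api/auth.ts", "src/util/log.ts", "src/api/users.ts"],
  ["src/api/"],
);
console.log(ordered);
```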

INFO.md

Free-form markdown injected into the AI prompt for process, triage, and revalidate. See getting-started.md for the agent prompt that writes a good one.

files/<path>.json — FileRecord

The core per-file accumulator. Every stage adds to this record; nothing is overwritten. Re-scanning merges new candidates. Re-processing appends to analysisHistory. Revalidation annotates findings rather than replacing them.

The on-disk path mirrors the source path under <rootPath> plus a .json suffix (src/api/auth.ts → files/src/api/auth.ts.json).
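
The mapping between a source path and its record can be sketched as two small string helpers. Both names (recordPath, sourcePathOf) are hypothetical, and the normalization of backslashes and leading slashes is an assumption rather than documented behavior.

```typescript
// Hypothetical helpers mirroring the documented layout:
//   data/<projectId>/files/<filePath>.json
function recordPath(dataDir: string, projectId: string, filePath: string): string {
  // Assumption: paths are stored with forward slashes and no leading slash.
  const normalized = filePath.replace(/\\/g, "/").replace(/^\/+/, "");
  return `${dataDir}/${projectId}/files/${normalized}.json`;
}

// Inverse direction: strip only the final .json suffix, so a source file that
// is itself JSON (package.json -> package.json.json) round-trips correctly.
function sourcePathOf(recordRelPath: string): string {
  return recordRelPath.replace(/\.json$/, "");
}

console.log(recordPath("data", "my-app", "src/api/auth.ts"));
console.log(sourcePathOf("src/api/auth.ts.json"));
```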

Top-level fields

| Field | Type | Purpose |
| --- | --- | --- |
| filePath | string | Path relative to rootPath. |
| projectId | string | The owning project. |
| candidates | CandidateMatch[] | Regex matcher hits (see CandidateMatch below). |
| lastScannedAt | string (ISO) | Most recent scan timestamp. |
| lastScannedRunId | string | runId of the scan that last touched this file. |
| fileHash | string (sha-256) | Source content hash at last scan. |
| findings | Finding[] | Latest set of AI-produced findings. |
| analysisHistory | AnalysisEntry[] | Append-only log of every AI investigation. |
| gitInfo | object? | Git committer info + ownership data, written by enrich. |
| status | "pending" \| "processing" \| "analyzed" \| "error" | Lifecycle state. |
| lockedByRunId | string? | When non-empty, a run holds this file. Cleared on completion. |

CandidateMatch

| Field | Type | Purpose |
| --- | --- | --- |
| vulnSlug | string | Matcher slug that fired. |
| lineNumbers | number[] | 1-indexed source lines. |
| snippet | string | Short excerpt around the first match. |
| matchedPattern | string | Human-readable label of the regex (the matcher's label). |

Finding

| Field | Type | Purpose |
| --- | --- | --- |
| severity | "CRITICAL" \| "HIGH" \| "MEDIUM" \| "HIGH_BUG" \| "BUG" \| "LOW" | See README severity table. |
| vulnSlug | string | Matcher slug or other-<topic> if no matcher fits. |
| title | string | One-sentence summary. |
| description | string | Full explanation. |
| lineNumbers | number[] | 1-indexed lines. |
| recommendation | string | Suggested fix. |
| confidence | "high" \| "medium" \| "low" | The agent's self-rated confidence. |
| triage | Triage? | Set by triage. |
| revalidation | Revalidation? | Set by revalidate. |

Triage (set by deepsec triage)

| Field | Type | Purpose |
| --- | --- | --- |
| priority | "P0" \| "P1" \| "P2" \| "skip" | Recommended action bucket. |
| exploitability | "trivial" \| "moderate" \| "difficult" | Effort to weaponize. |
| impact | "critical" \| "high" \| "medium" \| "low" | Blast radius if exploited. |
| reasoning | string | Why this priority. |
| triagedAt | string (ISO) | Timestamp. |
| model | string | Model used for triage. |
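
Because triage is optional on each finding, a consumer has to handle findings that have not been triaged yet. The sketch below counts findings per priority bucket; the trimmed-down TriagedFinding type and the extra "untriaged" bucket are illustrative assumptions, not part of the schema.

```typescript
type Priority = "P0" | "P1" | "P2" | "skip";

// Reduced shape for illustration; the real Finding has more fields.
interface TriagedFinding {
  title: string;
  triage?: { priority: Priority };
}

// Counts findings per bucket; findings without triage land in "untriaged".
function countByPriority(
  findings: TriagedFinding[],
): Record<Priority | "untriaged", number> {
  const counts = { P0: 0, P1: 0, P2: 0, skip: 0, untriaged: 0 };
  for (const f of findings) {
    counts[f.triage?.priority ?? "untriaged"] += 1;
  }
  return counts;
}

const priorityCounts = countByPriority([
  { title: "SQL injection in search", triage: { priority: "P0" } },
  { title: "Verbose error message", triage: { priority: "skip" } },
  { title: "Unreviewed finding" },
]);
console.log(priorityCounts);
```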

Revalidation (set by deepsec revalidate)

| Field | Type | Purpose |
| --- | --- | --- |
| verdict | "true-positive" \| "false-positive" \| "fixed" \| "uncertain" | Re-checked verdict. |
| reasoning | string | Why this verdict. Includes git-history evidence if fixed. |
| adjustedSeverity | Severity? | Set if revalidation re-rates the finding. |
| revalidatedAt | string (ISO) | Timestamp. |
| runId | string | runId of the revalidate run. |
| model | string | Model used. |
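
Since revalidation annotates a finding instead of replacing it, a reader of data/ sees both the original severity and any adjustedSeverity. One plausible way to pick the value to display is shown below. Preferring adjustedSeverity when present is an assumption about how a consumer might want to treat it, not a statement of what deepsec's own reports do; effectiveSeverity is a hypothetical name.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "HIGH_BUG" | "BUG" | "LOW";
type Verdict = "true-positive" | "false-positive" | "fixed" | "uncertain";

// Reduced shape for illustration; the real Finding has more fields.
interface RevalidatedFinding {
  severity: Severity;
  revalidation?: { verdict: Verdict; adjustedSeverity?: Severity };
}

// Assumption: a re-rated severity, when present, supersedes the original one.
function effectiveSeverity(f: RevalidatedFinding): Severity {
  return f.revalidation?.adjustedSeverity ?? f.severity;
}

console.log(effectiveSeverity({ severity: "HIGH" }));
console.log(
  effectiveSeverity({
    severity: "HIGH",
    revalidation: { verdict: "true-positive", adjustedSeverity: "MEDIUM" },
  }),
);
```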

AnalysisEntry

One per AI investigation of this file. Append-only — nothing is ever deleted.

| Field | Type | Purpose |
| --- | --- | --- |
| runId | string | The producing run. |
| investigatedAt | string (ISO) | Timestamp. |
| durationMs | number | Wall-clock total. |
| durationApiMs | number? | API time only (excludes process orchestration). |
| agentType | string | claude-agent-sdk or codex. |
| model | string | Model identifier. |
| modelConfig | Record<string, unknown> | Provider-specific settings echoed back. |
| agentSessionId | string? | The agent's session/thread id, for reproducing or replaying. |
| findingCount | number | Findings produced in this entry. |
| numTurns | number? | Conversation turn count. |
| costUsd | number? | Estimated USD cost. |
| usage | { inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens }? | Token accounting. |
| refusal | RefusalReport? | See models.md. |
| codexStderr | string? | Captured codex stderr when an investigation produced 0 output tokens (forensic only). |
| reinvestigateMarker | number? | Wave marker from --reinvestigate <N>. |
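
Because analysisHistory only grows, per-file totals can be derived by folding over it. The sketch below sums cost and token counts while treating the optional fields as zero. The reduced entry type and the assumption that the usage counters are plain numbers are mine; the name summarizeHistory is hypothetical.

```typescript
// Reduced shapes for illustration. Assumption: usage counters are numbers.
interface UsageLite {
  inputTokens: number;
  outputTokens: number;
}
interface AnalysisEntryLite {
  runId: string;
  costUsd?: number;
  usage?: UsageLite;
}

// Folds the append-only history into totals; missing optional fields count as 0.
function summarizeHistory(history: AnalysisEntryLite[]) {
  let costUsd = 0;
  let inputTokens = 0;
  let outputTokens = 0;
  for (const entry of history) {
    costUsd += entry.costUsd ?? 0;
    inputTokens += entry.usage?.inputTokens ?? 0;
    outputTokens += entry.usage?.outputTokens ?? 0;
  }
  return { investigations: history.length, costUsd, inputTokens, outputTokens };
}

const historySummary = summarizeHistory([
  { runId: "run-a", costUsd: 0.25, usage: { inputTokens: 1000, outputTokens: 200 } },
  { runId: "run-b" }, // e.g. an entry with no cost or usage recorded
  { runId: "run-c", costUsd: 0.5, usage: { inputTokens: 500, outputTokens: 100 } },
]);
console.log(historySummary);
```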

RefusalReport

| Field | Type | Purpose |
| --- | --- | --- |
| refused | boolean | True if the agent skipped or declined any part of the investigation. |
| reason | string? | Free-form reason if refused. |
| skipped | Array<{ filePath?: string; reason: string }>? | Per-file skip reasons. |
| raw | string? | Trimmed raw model response to the follow-up question, for debugging. |

gitInfo (set by deepsec enrich)

| Field | Type | Purpose |
| --- | --- | --- |
| recentCommitters | Array<{ name, email, date }> | Top contributors over the file's recent history. |
| enrichedAt | string (ISO) | Last enrich timestamp. |
| ownership | OwnershipData? | If an ownership plugin is active, structured ownership/escalation data. |

status lifecycle

pending     -- scan finished, awaits AI

processing  -- a run is currently investigating (lockedByRunId set)

analyzed    -- AnalysisEntry appended; findings updated

error is set if the agent crashed mid-investigation. Re-running process will retry error and pending files.
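
The retry rule above amounts to a simple predicate over status. A minimal sketch, using a reduced record shape and a hypothetical function name (needsProcessing):

```typescript
type Status = "pending" | "processing" | "analyzed" | "error";

// Reduced shape for illustration; the real FileRecord has more fields.
interface FileState {
  filePath: string;
  status: Status;
  lockedByRunId?: string;
}

// Per the lifecycle description, a process run picks up pending and error files.
function needsProcessing(f: FileState): boolean {
  return f.status === "pending" || f.status === "error";
}

const fileStates: FileState[] = [
  { filePath: "src/a.ts", status: "pending" },
  { filePath: "src/b.ts", status: "processing", lockedByRunId: "20260429215021-19ac" },
  { filePath: "src/c.ts", status: "analyzed" },
  { filePath: "src/d.ts", status: "error" },
];
const retryQueue = fileStates.filter(needsProcessing).map((f) => f.filePath);
console.log(retryQueue);
```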

runs/<runId>.json — RunMeta

One per scan / process / revalidate invocation. Used for status reporting (deepsec status) and for filtering exports by run.

| Field | Type | Purpose |
| --- | --- | --- |
| runId | string | <YYYYMMDDHHMMSS>-<rand4>. Sortable. |
| projectId | string | Owning project. |
| rootPath | string | Resolved root for the run. |
| createdAt | string (ISO) | Run start. |
| completedAt | string? (ISO) | Run end (absent while running). |
| type | "scan" \| "process" \| "revalidate" | Stage. |
| phase | "running" \| "done" \| "error" | Terminal status. |
| scannerConfig | { matcherSlugs }? | Set on scan runs. |
| processorConfig | { agentType, model, modelConfig }? | Set on process / revalidate runs. |
| stats | object | Counters: filesScanned, candidatesFound, findingsCount, totalCostUsd, truePositives, falsePositives, … |
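
The runId format is fixed-width, which is why a plain string sort orders runs chronologically. The sketch below parses a runId back into a timestamp. Two assumptions are baked in and not confirmed by this page: the timestamp is interpreted as UTC, and the 4-character suffix is matched loosely as word characters. parseRunId is a hypothetical name.

```typescript
// Parses <YYYYMMDDHHMMSS>-<rand4>. Returns null if the id does not match.
// Assumptions: the timestamp is UTC; the suffix is any 4 word characters.
function parseRunId(runId: string): { startedAt: Date; suffix: string } | null {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})-(\w{4})$/.exec(runId);
  if (!m) return null;
  const [year, month, day, hour, minute, second] = m.slice(1, 7).map(Number);
  return {
    startedAt: new Date(Date.UTC(year, month - 1, day, hour, minute, second)),
    suffix: m[7],
  };
}

// Fixed-width timestamps sort chronologically as plain strings.
const sortedRunIds = ["20260429215021-19ac", "20260101000000-ffff"].sort();

console.log(parseRunId("20260429215021-19ac")?.startedAt.toISOString());
console.log(sortedRunIds);
```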

reports/

Generated by deepsec report. One markdown per project plus a JSON summary. Re-running report overwrites; nothing here is incremental.

Reading data/ directly

The append-only model means a few patterns work well. The ** globs below rely on recursive globbing: zsh supports it out of the box, while bash needs shopt -s globstar first (without it, ** only matches one directory level).

  • Find every true-positive finding rated HIGH or CRITICAL across a project:
    jq -r '. as $r | $r.findings[] | select(.revalidation.verdict=="true-positive") | select(.severity=="HIGH" or .severity=="CRITICAL") | [$r.filePath, .severity, .title] | @tsv' data/<id>/files/**/*.json

  • Total spend on a project:
    jq -s 'map(.analysisHistory[].costUsd // 0) | add' data/<id>/files/**/*.json

  • Files still pending after a run:
    jq -r 'select(.status=="pending") | .filePath' data/<id>/files/**/*.json


For richer queries, prefer deepsec export --format json — it applies filters consistently with the rest of the CLI.
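
For tools that load the records themselves, the first jq query above translates directly into a filter over parsed FileRecord objects. This is a sketch over in-memory data with reduced, illustrative types; loading and parsing the JSON files is left out, and confirmedHighFindings is a hypothetical name.

```typescript
// Reduced shapes for illustration; the real records carry more fields.
interface FindingLite {
  severity: string;
  title: string;
  revalidation?: { verdict: string };
}
interface FileRecordLite {
  filePath: string;
  findings: FindingLite[];
}

// Returns [filePath, severity, title] rows for true-positive findings
// rated HIGH or CRITICAL, like the jq one-liner.
function confirmedHighFindings(
  records: FileRecordLite[],
): Array<[string, string, string]> {
  const rows: Array<[string, string, string]> = [];
  for (const record of records) {
    for (const f of record.findings) {
      const confirmed = f.revalidation?.verdict === "true-positive";
      const severe = f.severity === "HIGH" || f.severity === "CRITICAL";
      if (confirmed && severe) rows.push([record.filePath, f.severity, f.title]);
    }
  }
  return rows;
}

const highRows = confirmedHighFindings([
  {
    filePath: "src/api/auth.ts",
    findings: [
      { severity: "HIGH", title: "Missing auth check", revalidation: { verdict: "true-positive" } },
      { severity: "CRITICAL", title: "Not yet revalidated" },
      { severity: "LOW", title: "Minor issue", revalidation: { verdict: "true-positive" } },
    ],
  },
]);
console.log(highRows);
```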