Data layout

April 30, 2026 · View on GitHub

data/ is deepsec's on-disk state. Each project owns a subdirectory; the files inside are append-only across runs.

data/<projectId>/
├── project.json              # rootPath, githubUrl (auto-managed)
├── INFO.md                   # repo context injected into AI prompts
├── config.json               # priorityPaths, promptAppend, ignorePaths (optional)
├── files/                    # one JSON per scanned source file (FileRecord)
│   └── path/to/source.ts.json
├── runs/                     # one JSON per run (RunMeta)
│   └── 20260429215021-19ac.json
└── reports/                  # generated markdown + JSON reports

data/ is gitignored by default. To version it (CI, sharing across machines), commit it explicitly.

The schemas below are the source of truth for any tool that reads data/ directly. They live in packages/core/src/types.ts.

project.json — `ProjectConfig`

Auto-written on first scan; safe to edit by hand.

Field	Type	Purpose
`projectId`	`string`	Matches the directory name.
`rootPath`	`string`	Absolute path to the codebase. Updated each scan with the most recent `--root`.
`createdAt`	`string` (ISO)	Project init time.
`githubUrl`	`string?`	`https://github.com/owner/repo/blob/branch` — used for clickable links in exports. Auto-detected from `git remote` if not set.

config.json — per-project overrides

Optional. Read by scan and the AI agents.

Field	Type	Purpose
`priorityPaths`	`string[]`	Path prefixes processed first.
`promptAppend`	`string`	Free-form text appended to the system prompt for this project.
`ignorePaths`	`string[]`	Glob patterns to skip during scan.

INFO.md

Free-form markdown injected into the AI prompt for process, triage, and revalidate. See getting-started.md for the agent prompt that writes a good one.

The core per-file accumulator. Every stage adds to this record; nothing is overwritten. Re-scanning merges new candidates. Re-processing appends to analysisHistory. Revalidation annotates findings rather than replacing them.

The on-disk path mirrors the source path under <rootPath> plus a .json suffix (src/api/auth.ts → files/src/api/auth.ts.json).

Top-level fields

Field	Type	Purpose
`filePath`	`string`	Path relative to `rootPath`.
`projectId`	`string`	The owning project.
`candidates`	`CandidateMatch[]`	Regex matcher hits — see below.
`lastScannedAt`	`string` (ISO)	Most recent scan timestamp.
`lastScannedRunId`	`string`	runId of the scan that last touched this file.
`fileHash`	`string` (sha-256)	Source content hash at last scan.
`findings`	`Finding[]`	Latest set of AI-produced findings.
`analysisHistory`	`AnalysisEntry[]`	Append-only log of every AI investigation.
`gitInfo`	`object?`	Git committer info + ownership data, written by `enrich`.
`status`	`"pending" \| "processing" \| "analyzed" \| "error"`	Lifecycle state.
`lockedByRunId`	`string?`	When non-empty, a run holds this file. Cleared on completion.

`CandidateMatch`

Field	Type	Purpose
`vulnSlug`	`string`	Matcher slug that fired.
`lineNumbers`	`number[]`	1-indexed source lines.
`snippet`	`string`	Short excerpt around the first match.
`matchedPattern`	`string`	Human-readable label of the regex (the matcher's `label`).

`Finding`

Field	Type	Purpose
`severity`	`"CRITICAL" \| "HIGH" \| "MEDIUM" \| "HIGH_BUG" \| "BUG" \| "LOW"`	See README severity table.
`vulnSlug`	`string`	Matcher slug or `other-<topic>` if no matcher fits.
`title`	`string`	One-sentence summary.
`description`	`string`	Full explanation.
`lineNumbers`	`number[]`	1-indexed lines.
`recommendation`	`string`	Suggested fix.
`confidence`	`"high" \| "medium" \| "low"`	The agent's self-rated confidence.
`triage`	`Triage?`	Set by `triage`.
`revalidation`	`Revalidation?`	Set by `revalidate`.

`Triage` (set by `deepsec triage`)

Field	Type	Purpose
`priority`	`"P0" \| "P1" \| "P2" \| "skip"`	Recommended action bucket.
`exploitability`	`"trivial" \| "moderate" \| "difficult"`	Effort to weaponize.
`impact`	`"critical" \| "high" \| "medium" \| "low"`	Blast radius if exploited.
`reasoning`	`string`	Why this priority.
`triagedAt`	`string` (ISO)	Timestamp.
`model`	`string`	Model used for triage.

`Revalidation` (set by `deepsec revalidate`)

Field	Type	Purpose
`verdict`	`"true-positive" \| "false-positive" \| "fixed" \| "uncertain"`	Re-checked verdict.
`reasoning`	`string`	Why this verdict. Includes git-history evidence if `fixed`.
`adjustedSeverity`	`Severity?`	Set if revalidation re-rates the finding.
`revalidatedAt`	`string` (ISO)	Timestamp.
`runId`	`string`	runId of the revalidate run.
`model`	`string`	Model used.

`AnalysisEntry`

One per AI investigation of this file. Append-only — nothing is ever deleted.

Field	Type	Purpose
`runId`	`string`	The producing run.
`investigatedAt`	`string` (ISO)	Timestamp.
`durationMs`	`number`	Wall-clock total.
`durationApiMs`	`number?`	API time only (excludes process orchestration).
`agentType`	`string`	`claude-agent-sdk` or `codex`.
`model`	`string`	Model identifier.
`modelConfig`	`Record<string, unknown>`	Provider-specific settings echoed back.
`agentSessionId`	`string?`	The agent's session/thread id, for reproducing or replaying.
`findingCount`	`number`	Findings produced in this entry.
`numTurns`	`number?`	Conversation turn count.
`costUsd`	`number?`	Estimated USD cost.
`usage`	`{ inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens }?`	Token accounting.
`refusal`	`RefusalReport?`	See models.md.
`codexStderr`	`string?`	Captured codex stderr when an investigation produced 0 output tokens (forensic only).
`reinvestigateMarker`	`number?`	Wave marker from `--reinvestigate <N>`.

`RefusalReport`

Field	Type	Purpose
`refused`	`boolean`	True if the agent skipped or declined any part of the investigation.
`reason`	`string?`	Free-form reason if `refused`.
`skipped`	`Array<{ filePath?: string; reason: string }>?`	Per-file skip reasons.
`raw`	`string?`	Trimmed raw model response to the follow-up question, for debugging.

`gitInfo` (set by `deepsec enrich`)

Field	Type	Purpose
`recentCommitters`	`Array<{ name, email, date }>`	Top contributors over the file's recent history.
`enrichedAt`	`string` (ISO)	Last enrich timestamp.
`ownership`	`OwnershipData?`	If an ownership plugin is active, structured ownership/escalation data.

`status` lifecycle

pending     -- scan finished, awaits AI
   ↓
processing  -- a run is currently investigating (lockedByRunId set)
   ↓
analyzed    -- AnalysisEntry appended; findings updated

error is set if the agent crashed mid-investigation. Re-running process will retry error and pending files.

runs/.json — `RunMeta`

One per scan / process / revalidate invocation. Used for status reporting (deepsec status) and for filtering exports by run.

Field	Type	Purpose
`runId`	`string`	`<YYYYMMDDHHMMSS>-<rand4>`. Sortable.
`projectId`	`string`	Owning project.
`rootPath`	`string`	Resolved root for the run.
`createdAt`	`string` (ISO)	Run start.
`completedAt`	`string?` (ISO)	Run end (absent while running).
`type`	`"scan" \| "process" \| "revalidate"`	Stage.
`phase`	`"running" \| "done" \| "error"`	Terminal status.
`scannerConfig`	`{ matcherSlugs }?`	Set on scan runs.
`processorConfig`	`{ agentType, model, modelConfig }?`	Set on process / revalidate runs.
`stats`	`object`	Counters: filesScanned, candidatesFound, findingsCount, totalCostUsd, truePositives, falsePositives, …

reports/

Generated by deepsec report. One markdown per project plus a JSON summary. Re-running report overwrites; nothing here is incremental.

Reading data/ directly

The append-only model means a few patterns work well:

Find every TP HIGH+ finding across a project:

jq -r '. as $r | $r.findings[] | select(.revalidation.verdict=="true-positive") | select(.severity=="HIGH" or .severity=="CRITICAL") | [$r.filePath, .severity, .title] | @tsv' data/<id>/files/**/*.json

Total spend on a project:

jq -s 'map(.analysisHistory[].costUsd // 0) | add' data/<id>/files/**/*.json

Files still pending after a run:

jq -r 'select(.status=="pending") | .filePath' data/<id>/files/**/*.json

For richer queries, prefer deepsec export --format json — it applies filters consistently with the rest of the CLI.