Data directories
June 7, 2026
Every directory Memtrace creates, where it lives, what's inside, and when (if ever) to delete it.
TL;DR map
YOUR WORKSPACE (anchored by .memtrace-workspace marker, or git root, or CWD)
├── .memtrace-workspace ← optional anchor — multiple repos here share one .memdb
├── .memdb/ ← graph + vectors for every repo under this anchor
├── .memtrace/ ← indexer job state (transient, small)
├── repo-a/
├── repo-b/
└── repo-c/
YOUR HOME
~/.memtrace/
├── embed-cache/ ← per-symbol embedding cache (cross-workspace)
├── fastembed_cache/ ← downloaded embedding models
├── rerank-models/ ← downloaded reranker model
├── auth/ ← session tokens (one file)
├── config.toml ← persistent embedder choice — see embedding-providers.md
├── last_health.json ← last `memtrace embed test` probe result
├── logs/ ← legacy daemon.log from pre-0.6.10 OS service installs (if any)
├── session-ledger.jsonl ← user-global MCP tool-call ledger (v0.3.89)
├── watches.json ← persistent watch_directory registrations (v0.3.89)
└── telemetry/ ← buffered events (only if telemetry is ON)
Two important rules:
- Workspace state lives in the workspace, not per repo. Sibling repos under one .memtrace-workspace marker (or one git root, if no marker) share a single .memdb/. That makes cross-repo edges work: your TS frontend's fetch("/api/users") can link to your Rust backend's Router::route(...) even though they're in separate repo directories. memtrace reset <repo_id> removes only one repo's records; rm -rf .memdb/ wipes them all.
- Cross-workspace state lives in ~/.memtrace/. Models, embedding cache, and your session token aren't tied to a workspace.
Where .memdb/ actually lives — workspace anchor
The MemDB graph engine's on-disk store. Created the first time you run
memtrace start or memtrace index <path> somewhere under a workspace.
Discovery order when Memtrace boots:
1. .memtrace-workspace marker (explicit anchor; an empty file is fine). Memtrace walks UP from the current working directory looking for it. First match wins. All sibling directories under the marker share one .memdb/. Recommended for monorepos and multi-repo workspaces.
2. .git/ root (per-repo fallback). If no .memtrace-workspace marker is found, Memtrace uses the nearest git repo root.
3. Current working directory if neither marker nor git root is found.
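The discovery order can be sketched in a few lines of Python. This is an illustrative model of the three steps described here, not Memtrace's actual code; the function name find_anchor and the returned reason strings are made up for the example.

```python
from pathlib import Path
from typing import Tuple

MARKER = ".memtrace-workspace"


def find_anchor(cwd: Path) -> Tuple[Path, str]:
    """Return (anchor_dir, reason) using the marker -> git root -> cwd order.

    Hypothetical helper: it only models the documented lookup order.
    """
    cwd = cwd.resolve()
    candidates = [cwd, *cwd.parents]  # cwd first, then each ancestor going up

    # 1. Explicit marker: walk up from cwd, first match wins.
    for directory in candidates:
        if (directory / MARKER).is_file():
            return directory, "marker"

    # 2. No marker anywhere above us: fall back to the nearest git root.
    for directory in candidates:
        if (directory / ".git").exists():
            return directory, "git"

    # 3. Neither found: anchor at the current working directory.
    return cwd, "cwd"


if __name__ == "__main__":
    anchor, reason = find_anchor(Path.cwd())
    print(f"{anchor / '.memdb'} (anchored by {reason})")
```

Note that the marker search runs to the filesystem root before the git fallback is considered, so a marker several levels up still beats a closer .git/ directory.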
For the full story on multi-repo setups — why to use a workspace,
cross-repo edges, and the --workspace flag — see
workspaces.md. This page covers the storage side.
Quick way to confirm where YOUR .memdb is on a running system:
memtrace status
The startup banner also prints:
◆ MemDB embedded — in-process (data dir: /your/path/.memdb)
Multi-repo workspace example
$ pwd
/home/me/work/
$ tree -L 1 -a
.
├── .memtrace-workspace ← marker, you `touch`-ed this
├── .memdb/ ← ONE store for all 3 repos
├── frontend/ ← repo A (git)
├── backend/ ← repo B (git)
└── infra/ ← repo C (terraform-only, no git)
$ cd frontend && memtrace index .
# → registers frontend's symbols in /home/me/work/.memdb under repo_id=frontend
$ cd ../backend && memtrace index .
# → registers backend's symbols in the SAME .memdb under repo_id=backend
# Cross-language edges between frontend & backend symbols work out of the box.
Without the marker, each repo gets its own <repo>/.memdb/ and they
can't see each other.
Layout (high-level — the inner files are MemDB's business; don't touch them):
.memdb/
├── daemon.pid # workspace-owner lock
├── daemon-state.json # owner endpoint + heartbeat
└── memtrace/ # database name
├── wal/ # write-ahead log
├── episodes/ # commit + working-tree snapshots
├── paged-records/ # Nodes / Edges / Episodes / VectorBlobs
├── indexes/ # property indexes + HNSW vectors
├── tantivy/ # BM25 full-text segments
└── manifest.toml # MemDB metadata
daemon.pid and daemon-state.json are runtime coordination files,
not graph data. They let memtrace start, memtrace mcp, and the
headless daemon agree on a single owner for this .memdb. If another
process already owns the lock, later memtrace mcp processes attach
to that owner's loopback endpoint instead of opening a duplicate
embedded MemDB.
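As a rough mental model of that handshake, the sketch below uses exclusive file creation to decide who owns the store. It is not Memtrace's implementation: the function name claim_or_attach and the "endpoint" field inside daemon-state.json are assumptions for illustration, and stale-lock and heartbeat handling are left out.

```python
import json
import os
from pathlib import Path
from typing import Optional


def claim_or_attach(memdb: Path) -> Optional[str]:
    """Try to become the owner of a .memdb directory.

    Returns None if this process took the lock, otherwise the endpoint
    recorded by the existing owner. Hypothetical sketch only.
    """
    pid_file = memdb / "daemon.pid"
    try:
        # O_EXCL makes creation atomic: exactly one process can succeed.
        fd = os.open(pid_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Someone else owns the store: read where to reach them instead of
        # opening a second embedded database.
        state = json.loads((memdb / "daemon-state.json").read_text())
        return state["endpoint"]  # assumed field name
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return None
```

A real owner lock also has to cope with a crashed owner that left its pid file behind, which is presumably what the heartbeat in daemon-state.json is for.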
Size grows roughly linearly with your codebase. Some rough numbers:
| Project | Files | Symbols | .memdb/ size |
|---|---|---|---|
| Small (mempalace) | ~250 | ~1.8k | ~30 MB |
| Medium (Express, Bun-style) | ~800 | ~12k | ~120 MB |
| Large (Django) | ~3,300 | ~50k | ~700 MB–1 GB |
| Huge (Linux kernel-class) | 30k+ | 500k+ | 5–10 GB |
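To see where your own store falls in this table, a small standard-library script is enough. The helper names dir_size_bytes and human are made up for this example; the script only reads file sizes and never touches MemDB's files.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def human(num_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


if __name__ == "__main__":
    store = Path(".memdb")  # run from the workspace anchor
    if store.is_dir():
        print(f"{store}: {human(dir_size_bytes(store))}")
    else:
        print("no .memdb/ in this directory")
```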
Override location with MEMTRACE_MEMDB_DATA_DIR=<absolute path>
if you want it outside the repo (e.g. on a faster disk).
When to delete: memtrace reset does this safely. Manual
rm -rf .memdb/ works too if the daemon isn't running. The next
memtrace start will re-index from scratch.
Per-project: <project>/.memtrace/
Job state for the indexing pipeline — progress, watchers, recovery metadata. Tiny (usually < 1 MB). Created the first time the daemon runs in your project.
If the daemon crashes mid-index, this is what lets it resume rather
than starting over. Safe to delete when the daemon isn't running; you
just lose the resume point and the next memtrace start re-indexes
from scratch.
Override location with MEMTRACE_DATA_DIR=<path>.
<project>/.memtraceignore
Optional file. Glob patterns of paths the indexer should skip,
on top of the built-in exclude list (.git, node_modules,
target, dist, build, .venv, vendor, .claude/, etc.).
# .memtraceignore — same syntax as .gitignore
docs/generated/
**/*_pb2.py
fixtures/
You usually don't need this — the built-in excludes cover most cases. Reach for it when a generated/vendored directory is bloating your graph.
~/.memtrace/embed-cache/
A redb-backed key-value store mapping (model_id, symbol_ast_hash)
→ embedding vector. Cross-project — re-indexing a different repo
that has the same symbol body doesn't recompute the embedding.
Layout:
~/.memtrace/embed-cache/
└── memtrace_embed_v2.redb # single file, ACID, mmap
Typical size: 200 MB–2 GB depending on how many distinct symbols you've indexed across all your projects.
When to delete: Only when you change embedding models. The cache
is keyed by model ID, so switching from jina-embeddings-v2-base-code
to bge-small makes the existing entries cache-misses anyway — but
they still take disk. Manual cleanup:
rm -rf ~/.memtrace/embed-cache/
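The keying scheme explains both behaviors described in this section: identical symbol bodies are shared across projects, and a model switch turns every old entry into dead weight. Here is a minimal in-memory sketch, assuming a plain dict in place of redb and a SHA-256 of the raw body text in place of the real symbol_ast_hash; the class EmbedCache is hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class EmbedCache:
    """Toy cache keyed by (model_id, hash of symbol body)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Vector] = {}
        self.misses = 0

    def get_or_compute(
        self, model_id: str, symbol_body: str, embed: Callable[[str], Vector]
    ) -> Vector:
        # Stand-in for symbol_ast_hash: hash the raw text of the symbol.
        body_hash = hashlib.sha256(symbol_body.encode("utf-8")).hexdigest()
        key = (model_id, body_hash)
        if key not in self._store:
            self.misses += 1  # only a miss pays for an embedding call
            self._store[key] = embed(symbol_body)
        return self._store[key]


if __name__ == "__main__":
    def fake_embed(text: str) -> Vector:
        return [float(len(text))]  # placeholder for a real model call

    cache = EmbedCache()
    body = "fn add(a: i32, b: i32) -> i32 { a + b }"
    cache.get_or_compute("jina-embeddings-v2-base-code", body, fake_embed)  # repo A: miss
    cache.get_or_compute("jina-embeddings-v2-base-code", body, fake_embed)  # repo B: hit
    cache.get_or_compute("bge-small", body, fake_embed)  # new model: miss
    print(cache.misses)
```

After the model switch the old entry is still stored but never looked up again, which is why deleting the cache at that point only frees disk space.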
~/.memtrace/fastembed_cache/
HuggingFace-style cache for the downloaded embedding model. Default
is jina-embeddings-v2-base-code (~340 MB f32 ONNX, ~85 MB int8).
Layout:
~/.memtrace/fastembed_cache/
├── models--jinaai--jina-embeddings-v2-base-code/
│ ├── snapshots/<sha>/
│ │ ├── model.onnx OR model_int8.onnx
│ │ ├── tokenizer.json
│ │ └── tokenizer_config.json
│ ├── blobs/ # Hugging Face content-addressed blobs
│ └── refs/main
└── models--Xenova--bge-small-en-v1.5/ # only if you've used bge-small
Size: 340–500 MB for the default model. Adding alternative models (bge-small, bge-base) costs 100–500 MB each.
Override location with FASTEMBED_CACHE_DIR=<path> — useful if
your home directory is on a small SSD and you want models on a
different disk.
When to delete: When you want to redownload a model (rare). The cache is content-addressed, so a partial download self-heals on next use.
~/.memtrace/rerank-models/
Cross-encoder reranker models. Default is BAAI/bge-reranker-base
(int8 quantized, ~75 MB).
Layout:
~/.memtrace/rerank-models/
└── bge-reranker-base/
├── model_int8.onnx
├── tokenizer.json
└── config.json
The reranker is loaded into memory only when MEMTRACE_RERANK=on
(the default). Disable it with MEMTRACE_RERANK=off if you want a
pure BM25 + vector pipeline (faster, but roughly 3–4 percentage
points lower acc@1 on typical agent queries).
~/.memtrace/auth/
Your Memtrace session token, refreshed automatically. One file:
~/.memtrace/auth/
└── session.json # { device_id, token, expires_at }
Don't share this file. If you suspect it's leaked,
memtrace auth logout deletes it; the next memtrace start walks
you through device-flow login again.
~/.memtrace/telemetry/
Created on first run because product telemetry is on by default.
Stores a small batch of pending events (sanitised crashes, errors,
and lightweight usage signals) until the flusher ships them every
60 seconds. See privacy-and-telemetry.md
for the full field-level breakdown of what's in there.
Set MEMTRACE_TELEMETRY=off to disable telemetry — the panic hook
still leaves a local breadcrumb in this directory if the binary
crashes (useful for your own debugging), but the flusher never
ships it. If you've never run memtrace start, the directory
doesn't exist yet.
~/.memtrace/logs/ (legacy)
Versions before 0.6.10 that used memtrace daemon install may
have left rolling logs here (daemon.log + dated rotations). Current
binaries do not write this file — runtime diagnostics go to stderr;
use RUST_LOG=info memtrace start or MEMTRACE_DEBUG=1 when
debugging startup failures.
Safe to delete the directory at any time. It is not recreated unless
you still have a legacy OS service registration from an older install
(in which case memtrace stop unloads it).
~/.memtrace/session-ledger.jsonl (v0.3.89)
User-global JSONL append-log of MCP tool calls. One line per call:
{ "event_id": "…", "tool_name": "find_symbol", "task_label": "…",
"timestamp": "2026-05-11T11:42:13Z", "bytes_avoided": 1843,
"elapsed_ms": 38, "files_referenced": 2, "repo_id": "myrepo", … }
Moved here from <workspace>/.memtrace/session-ledger.jsonl in
v0.3.89 to fix the bug @Badmrpotatohead reported where the
dashboard's /api/value/aggregate view said "0 queries / $0.00"
while the session log showed 140 historical calls — root cause
was memtrace mcp (agent-launched) and memtrace start (run from
your repo) writing to / reading from different ledger files because
each was anchored to its own cwd.
Override with MEMTRACE_SESSION_LEDGER=<absolute path> to put it
somewhere else (e.g. per-workspace isolation if you really want it).
Safe to delete — you lose historical receipts but the live counters keep working from in-memory state, and the file rebuilds itself on the next MCP call.
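Because the ledger is plain JSONL, you can inspect it yourself. The sketch below counts calls per tool and totals bytes_avoided, using only the tool_name and bytes_avoided fields from the sample line above; the function summarize_ledger is made up for this example and is not part of Memtrace.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def summarize_ledger(path: Path) -> Dict[str, Any]:
    """Aggregate a session ledger: calls per tool and total bytes avoided."""
    calls: Counter = Counter()
    bytes_avoided = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a torn line from an interrupted append
            if not isinstance(event, dict):
                continue
            calls[event.get("tool_name", "?")] += 1
            bytes_avoided += event.get("bytes_avoided", 0)
    return {
        "calls": dict(calls),
        "total_calls": sum(calls.values()),
        "bytes_avoided": bytes_avoided,
    }


if __name__ == "__main__":
    ledger = Path.home() / ".memtrace" / "session-ledger.jsonl"
    if ledger.exists():
        print(summarize_ledger(ledger))
```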
~/.memtrace/watches.json (v0.3.89)
Persistent registry of watch_directory registrations, atomic
write, BOM-tolerant read:
[
{ "path": "D:\\Repos\\my-project", "repo_id": "my-project",
"registered_at": "2026-05-11T01:42:00Z", "origin": "manual" },
{ "path": "/home/me/other-repo", "repo_id": "other-repo",
"registered_at": "2026-05-10T22:14:55Z", "origin": "manual" }
]
origin is "manual" for entries you registered via the
watch_directory MCP tool, and "restored" for entries the MCP
server re-armed from this file on boot.
Restore-on-boot is on by default. Escape hatch:
MEMTRACE_NO_WATCH_RESTORE=1 makes the MCP server ignore this file.
To forget all watches: delete the file.
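"Atomic write, BOM-tolerant read" is a common pattern worth seeing in code if you script against this file. The following is a generic Python sketch of that pattern, not Memtrace's own code; read_watches and write_watches are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def read_watches(path: Path) -> List[Dict[str, Any]]:
    """Read the registry; a leading UTF-8 BOM is accepted and ignored."""
    if not path.exists():
        return []
    # "utf-8-sig" strips a BOM if present and behaves like utf-8 otherwise.
    return json.loads(path.read_text(encoding="utf-8-sig"))


def write_watches(path: Path, watches: List[Dict[str, Any]]) -> None:
    """Write via a temp file in the same directory, then rename over the target.

    Readers see either the old file or the new one, never a half-written mix.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(watches, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise
```

The BOM tolerance matters mostly on Windows, where some editors and PowerShell redirections prepend one when saving JSON.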
Things Memtrace creates outside its own directories
A few files end up in your repo or home folder beyond the directories above:
- Skills are installed at the global path your AI tool expects (e.g. ~/.claude/skills/memtrace-skills/... for Claude Code, .cursor/... for Cursor). Generated by memtrace install / npm install -g memtrace postinstall.
- MCP config entries are appended to your tool's config: ~/.config/claude-code/mcp.json, ~/.cursor/mcp.json, etc. The installer is idempotent: running it twice doesn't duplicate entries.
- The Memtrace npm shim itself lives wherever your global npm puts things (~/.npm-global/lib/node_modules/memtrace/ on most setups).
Cleaning up everything
Full reset to factory:
memtrace stop # stop the daemon
rm -rf <project>/.memdb # this project's graph
rm -rf <project>/.memtrace # this project's job state
rm -rf ~/.memtrace # ALL machine-level state
npm uninstall -g memtrace # the binary + skills
After this Memtrace leaves no trace on your system. Your source code is never touched.
What's safe to delete during normal use
| Path | Safe? | Effect |
|---|---|---|
| <project>/.memdb/ | Yes (daemon stopped) | Re-index from scratch on next start |
| <project>/.memtrace/ | Yes (daemon stopped) | Lose resume point; full re-index on next start |
| ~/.memtrace/embed-cache/ | Yes any time | Re-embed symbols on next index |
| ~/.memtrace/fastembed_cache/ | Yes any time | Re-download model on next start (~340 MB) |
| ~/.memtrace/rerank-models/ | Yes any time | Re-download reranker (~75 MB) |
| ~/.memtrace/auth/ | Yes | Forces re-login on next start |
| ~/.memtrace/telemetry/ | Yes | Drops any unsent events |
| ~/.memtrace/logs/ | Yes any time | Drops historical daemon logs; not recreated unless a legacy OS service is still registered |
| ~/.memtrace/session-ledger.jsonl | Yes any time | Loses historical MCP-call receipts; live counters keep working from in-memory state |
| ~/.memtrace/watches.json | Yes (daemon / MCP stopped) | All watch_directory registrations forgotten; re-register manually after deleting |
Nothing here is precious. The graph rebuilds itself; the caches
warm themselves; the auth re-authenticates. Memtrace is designed so
you can rm -rf any of these without losing anything that can't be
rebuilt; just stop the daemon first where the table says so.