July 2, 2026

Vestige

Your bug was born days before it crashed. You just can't remember where.

Vestige is a local-first memory for AI agents that reaches backward through time to find the quiet change that caused today's failure: the cause that looks nothing like the bug. One 23MB Rust binary. No cloud. Your data never leaves your machine.

⚡ Quick Start · 🧠 The Idea · 🔬 The Science · 🛠 13 Tools · 📊 Dashboard


👋 Why I built this

Hi, I'm Sam. I built Vestige from a tiny apartment in Chicago because I kept losing days to the same thing, and I bet you have too.

Production breaks. You start hunting. And the cause is almost never near the error. It's some quiet change you made days ago that looks nothing like the crash it eventually caused. A flipped env var. A swapped service. A config tweak you'd already forgotten.

Here's the part that took me a while to see: every AI memory tool is built on vector search, and vector search hunts for what looks like your problem. But a root cause never looks like the bug it creates. So they all search the goal line, while the real failure was a quiet midfield turnover fifteen minutes earlier.

I wanted a memory that traces the match backward.

So that's what Vestige is. Everyone else built a memory that remembers. I tried to build the first one that realizes: it gates what's worth keeping, lets the noise fade like your own memory does, and when a failure hits, it reaches back through time to the change that actually caused it.

It's one Rust binary. It runs entirely on your machine. It never phones home. And there's a 60-second start right below.

🎙️ The 60-second version of this whole story, the one I give in person, lives in demo/PITCH-v2-causebench.md. If you've got a minute, read that first. It's the clearest way to get why this matters.


⚡ Get it running in 60 seconds

Step 1: install (one binary, no Docker, no API key, no signup):

npm install -g vestige-mcp-server@latest

Step 2: connect it to your agent. Vestige speaks MCP, so it works with any AI agent. The universal config (works everywhere):

{ "mcpServers": { "vestige": { "command": "vestige-mcp" } } }

Drop that into your agent's MCP config file. Or use the one-line shortcut for your agent:

# Cursor / Windsurf / VS Code      → add the JSON above to ~/.cursor/mcp.json (or the editor's MCP settings)
# Claude Code                      → claude mcp add vestige vestige-mcp -s user
# Codex                            → codex mcp add vestige -- vestige-mcp
# Cline / Continue / Zed / Goose   → add the JSON above to that client's MCP config

Step 3: confirm it's working:

vestige-mcp --version     # prints the installed version
vestige stats             # prints your memory count (0 on a fresh install)

That's the whole install. New here? The 30-minute first-run guide walks you from install to your first backward-reach: what gets saved (and what doesn't), how to inspect your own memory, and how to scope it per project. Per-agent guides (Cursor, VS Code, Windsurf, JetBrains, Xcode, OpenCode, Codex, Claude Desktop) are here ↓.

Now talk to your agent like it has a memory, because now it does:

You:  "Remember: we always disable SimSIMD on release builds, it breaks old x86 CPUs."
        ...days later, fresh session, zero context...
You:  "Should I enable SimSIMD for the release?"
AI:   ⚠️ Hold on, this contradicts a decision you stored: you chose to DISABLE it
        because it breaks old x86 CPUs.

That last line isn't me being cute. It's a real status the engine returns, called claim_contradicts_memory. Most memory tools would have happily handed you the wrong answer. Vestige tells you when you're about to walk back into a mistake you already learned from.
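
To make that exchange concrete, here is a minimal sketch of a claim-vs-memory check. Only the status name claim_contradicts_memory comes from Vestige; the `Decision` and `Stance` types, the other two status labels, and the case-insensitive subject match are hypothetical stand-ins, not the engine's actual logic.

```rust
/// Hypothetical stance a stored decision takes on a subject.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stance {
    Enable,
    Disable,
}

/// Hypothetical stored decision: what it is about, which way it went, and why.
#[derive(Debug, Clone)]
struct Decision {
    subject: String,
    stance: Stance,
    reason: String,
}

/// Result of checking a new claim against memory. Only the
/// `claim_contradicts_memory` label is taken from the README.
#[derive(Debug, PartialEq, Eq)]
enum ClaimStatus {
    NoRelevantMemory,
    ConsistentWithMemory,
    ClaimContradictsMemory { reason: String },
}

impl ClaimStatus {
    fn label(&self) -> &'static str {
        match self {
            ClaimStatus::NoRelevantMemory => "no_relevant_memory",
            ClaimStatus::ConsistentWithMemory => "consistent_with_memory",
            ClaimStatus::ClaimContradictsMemory { .. } => "claim_contradicts_memory",
        }
    }
}

/// Compare a proposed stance with the first stored decision on the same subject.
fn check_claim(memory: &[Decision], subject: &str, proposed: Stance) -> ClaimStatus {
    match memory.iter().find(|d| d.subject.eq_ignore_ascii_case(subject)) {
        None => ClaimStatus::NoRelevantMemory,
        Some(d) if d.stance == proposed => ClaimStatus::ConsistentWithMemory,
        Some(d) => ClaimStatus::ClaimContradictsMemory { reason: d.reason.clone() },
    }
}
```

With the SimSIMD decision stored as `Stance::Disable`, asking about `Stance::Enable` yields the contradiction status along with the stored reason, which is what lets the agent explain why it is pushing back.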

And the headline feature, the one nothing else does, is one command:

vestige backfill --contrast

When a failure is in your memory, this reaches backward through time and finds the quiet earlier change that caused it (the one a vector search ranks poorly because it shares no words with the error). It shows you, side by side, what similarity search returns versus the real cause. More on the backward reach ↓

(Works with Codex, Cursor, VS Code, Claude Desktop, Windsurf, JetBrains, Zed: anything that speaks MCP. Full setup is here ↓.)


🧠 It's not RAG with a nicer haircut

RAG is a bucket: throw everything in, hope nearest-neighbor finds it later. Vestige behaves more like an actual memory: it decides what's worth keeping, forgets what isn't, and reasons across what's left.

| | 🪣 RAG / Vector Store | 🧠 Vestige |
|---|---|---|
| What it stores | Everything you hand it | Only what's surprising or new (the rest gets merged or skipped) |
| What it forgets | Nothing; it just bloats | Unused memories fade on a real forgetting curve, so your context stays lean |
| Finding a root cause | Can't, because the cause isn't similar to the bug | Reaches backward in time to the change that caused it (the whole point ↓) |
| Catching contradictions | Silent; serves the stale answer with a straight face | Tells you: "this contradicts what you decided" |
| Duplicates | You clean them up by hand | Self-heals: "likes dark mode" + "prefers dark themes" quietly become one |
| Forgetting on demand | DELETE and it's gone | suppress gently inhibits a memory (and its neighbors), reversible for 24h |
| Where it lives | Usually someone else's cloud | Your machine. One binary. No telemetry. |
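
For a feel of what a "real forgetting curve" means, here is a small sketch of the power-law retrievability curve published for FSRS-6, the scheduler named later in this README. The function name, the units, and the example decay value are assumptions for illustration; how Vestige parameterizes and applies the curve internally is not shown here.

```rust
/// Probability of recall after `elapsed_days` for a memory whose stability is
/// `stability_days`, following the FSRS-6 curve shape:
///   R(t, S) = (1 + factor * t / S)^(-decay)
/// `decay` stands in for FSRS-6's trainable decay parameter (assumed value in
/// the usage note below).
fn retrievability(elapsed_days: f64, stability_days: f64, decay: f64) -> f64 {
    // `factor` is chosen so that R is exactly 0.9 when elapsed == stability.
    let factor = 0.9_f64.powf(-1.0 / decay) - 1.0;
    (1.0 + factor * elapsed_days / stability_days).powf(-decay)
}
```

With an illustrative `decay` of 0.2 and a stability of 10 days, recall probability starts at 1.0, sits at 0.9 on day 10, and keeps sliding after that. A memory that gets used again earns a larger stability, which stretches the whole curve out.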

🔥 The thing nothing else does: memory with hindsight

This is the part I'm proudest of, and it's worth one honest paragraph.

A bug shows up today. The cause was a quiet decision from three weeks ago, like a changed env var or a swapped service. That cause shares no words with the error it created. A vector search will never connect them, because it only knows how to find things that look alike, and this is a case where the cause and the symptom look nothing alike. This isn't a tuning problem; in 2026 Google DeepMind published a proof (arXiv:2508.21038, ICLR 2026) that single-vector retrieval is mathematically incapable of bridging gaps like this.

So Vestige doesn't do it with similarity. Its Retroactive Salience Backfill (ported from Zaki/Cai et al., 2024, Nature 637:145–155 (DOI), on how the brain links a shock to the quiet memory that caused it) reaches backward through time and promotes the dormant memory that's causally upstream: it shares an entity (the same file, env var, or service), not the same words.
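
The difference between "shares words" and "shares an entity" is easy to show in a toy. The sketch below is a deliberately simplified illustration, not Vestige's backfill code: the `Memory` struct, the day-granularity timestamps, and the ranking rule (most shared entities first, then most recent) are all assumptions, and salience weighting is left out entirely.

```rust
use std::collections::HashSet;

/// Hypothetical memory record. `day` is a coarse timestamp; `entities` are the
/// files, env vars, or services the memory mentions.
#[derive(Debug, Clone)]
struct Memory {
    id: u32,
    day: u32,
    text: String,
    entities: HashSet<String>,
}

fn memory(id: u32, day: u32, text: &str, entities: &[&str]) -> Memory {
    Memory {
        id,
        day,
        text: text.to_string(),
        entities: entities.iter().map(|e| e.to_string()).collect(),
    }
}

fn words(s: &str) -> HashSet<String> {
    s.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Number of lowercase words two texts share: a crude stand-in for similarity search.
fn word_overlap(a: &str, b: &str) -> usize {
    let (wa, wb) = (words(a), words(b));
    wa.intersection(&wb).count()
}

/// Ids of memories recorded before the failure that share at least one entity
/// with it, most shared entities first, then most recent first.
fn backward_reach(failure: &Memory, store: &[Memory]) -> Vec<u32> {
    let mut hits: Vec<(usize, u32, u32)> = store
        .iter()
        .filter(|m| m.day < failure.day)
        .map(|m| (m.entities.intersection(&failure.entities).count(), m.day, m.id))
        .filter(|(shared, _, _)| *shared > 0)
        .collect();
    hits.sort_by(|a, b| b.0.cmp(&a.0).then(b.1.cmp(&a.1)));
    hits.into_iter().map(|(_, _, id)| id).collect()
}
```

Feed it a "checkout returns 502" failure tagged with a payments service, plus an older "set PAYMENTS_TIMEOUT_MS to 200" note tagged with the same service: word overlap between the two is zero, yet `backward_reach` returns the older note because the entity links them.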

I also built a benchmark to keep myself honest about it. Every pure vector retriever scored 0% recall@1 on the causal-gap task; Vestige scored 60%. (To be precise: the impossibility is DeepMind's theorem; the 0%-vs-60% is my measurement. Two different claims, and I keep them separate.)
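
For readers who haven't met the metric: recall@1 is the fraction of test cases where the top-ranked result is the labelled root cause. A minimal sketch of that computation follows; the function name and the `(ranked ids, true cause id)` case shape are assumptions for illustration and say nothing about how the benchmark harness itself is written.

```rust
/// Fraction of cases whose top-ranked memory id is the labelled root cause.
/// Each case pairs a ranked list of memory ids with the id of the true cause.
fn recall_at_1(cases: &[(Vec<u32>, u32)]) -> f64 {
    if cases.is_empty() {
        return 0.0;
    }
    let hits = cases
        .iter()
        .filter(|(ranked, cause)| ranked.first() == Some(cause))
        .count();
    hits as f64 / cases.len() as f64
}
```

A retriever that always ranks the look-alike memory first scores 0.0 on this metric no matter how good its second or third guesses are, which is why recall@1 is a harsh but useful measure for root-cause lookup.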

vestige backfill --contrast      # show the root cause a vector search would have missed

The nice part: it compounds. Every failure your agent records makes the next session diagnose faster (run two is smarter than run one), and it happens automatically during consolidation, so you don't have to babysit it.

All of this shipped in v2.2.0, along with a 34→13 tool consolidation and a rebuilt retrieval engine. Full release notes →


🔬 This is real neuroscience, not a metaphor

I get skeptical when projects wave the word "neuroscience" around, so here's my receipt: every mechanism below is a real, cited paper, implemented in Rust, running locally on your machine. None of it phones a model in the cloud to sound smart.

| Mechanism | What it does for you | Grounded in |
|---|---|---|
| Prediction-Error Gating | Redundant info gets merged, contradictory gets superseded, only the novel gets stored | The hippocampal novelty signal |
| FSRS-6 Spaced Repetition | 21 parameters of the mathematics of forgetting, so used memories stay and unused ones fade | Modern spaced-repetition research |
| Retroactive Salience Backfill | Backward causal reach to the root cause of a failure | Zaki/Cai et al. 2024, Nature 637:145–155 |
| Synaptic Tagging | A memory that looked trivial this morning can be tagged critical tonight | Frey & Morris 1997 |
| Spreading Activation | Search "auth bug," surface last week's JWT update, because memory is a graph, not a list | Collins & Loftus 1975 |
| Dual-Strength Model | Storage strength vs. retrieval strength, so deeply stored ≠ instantly recalled, just like you | Bjork & Bjork 1992 |
| Memory Dreaming | Sleep-like consolidation: replays, connects, synthesizes insights to a graph | Active-dreaming consolidation |
| Active Forgetting (suppress) | Top-down inhibition that compounds and cascades to neighbors, reversible for 24h | Anderson 2025 · Davis 2020 |
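
As one worked example from the table, prediction-error gating can be pictured as a three-way decision on each incoming memory. This is a rough sketch under stated assumptions: the similarity input, the `0.15` cut-off, and the `gate` function are hypothetical, and only the CREATE / UPDATE / SUPERSEDE outcomes are named in this README.

```rust
/// The three outcomes the README names for an incoming memory.
#[derive(Debug, PartialEq, Eq)]
enum GateDecision {
    Create,
    Update,
    Supersede,
}

/// Decide what to do with new input, given its similarity (0.0 to 1.0) to the
/// closest stored memory and whether it contradicts that memory.
fn gate(nearest_similarity: Option<f32>, contradicts_nearest: bool) -> GateDecision {
    // Assumed cut-off: at or below this much "surprise" the topic is already known.
    const MAX_ERROR_FOR_KNOWN: f32 = 0.15;
    let prediction_error = match nearest_similarity {
        Some(sim) => 1.0 - sim.clamp(0.0, 1.0),
        None => 1.0, // nothing comparable stored: maximally surprising
    };
    if prediction_error > MAX_ERROR_FOR_KNOWN {
        GateDecision::Create // novel: store it as a new memory
    } else if contradicts_nearest {
        GateDecision::Supersede // same topic, opposite content: replace the old one
    } else {
        GateDecision::Update // redundant: merge into the existing memory
    }
}
```

The useful property is that storage cost tracks surprise rather than volume: repeating something already known produces an `Update`, not another row.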

Read the full science doc →. Every feature, every paper.
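
Spreading activation, from the table above, is the easiest of these mechanisms to picture in code. The sketch below is a generic textbook-style version under assumptions of my own: the edge list, the per-hop `decay`, and the cut-off `threshold` are hypothetical, edge weights are expected to lie in (0, 1], `decay` below 1, and `threshold` above 0. It does not reflect Vestige's graph schema.

```rust
use std::collections::HashMap;

/// Spread activation outward from `seed` over a weighted, undirected graph.
/// Each hop multiplies the incoming activation by the edge weight and by
/// `decay`; a node is only activated if the result reaches `threshold`.
fn spread_activation(
    edges: &[(&str, &str, f64)],
    seed: &str,
    decay: f64,
    threshold: f64,
) -> HashMap<String, f64> {
    // Build an adjacency list so each node can find its neighbors.
    let mut neighbors: HashMap<&str, Vec<(&str, f64)>> = HashMap::new();
    for &(a, b, w) in edges {
        neighbors.entry(a).or_default().push((b, w));
        neighbors.entry(b).or_default().push((a, w));
    }

    let mut activation: HashMap<String, f64> = HashMap::new();
    activation.insert(seed.to_string(), 1.0);
    let mut frontier = vec![(seed, 1.0)];

    while let Some((node, energy)) = frontier.pop() {
        if let Some(list) = neighbors.get(node) {
            for &(next, weight) in list {
                let passed = energy * weight * decay;
                let current = activation.get(next).copied().unwrap_or(0.0);
                // Revisit a node only when this path delivers more activation.
                if passed >= threshold && passed > current {
                    activation.insert(next.to_string(), passed);
                    frontier.push((next, passed));
                }
            }
        }
    }
    activation
}
```

Seeded at an "auth bug" node that links to a login-flow memory, which in turn links to a JWT update, the JWT memory picks up activation two hops out even though the query never mentioned it, while weakly connected nodes stay below the threshold.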


🛠 13 tools, one brain

v2.2.0 consolidated a sprawling 34-tool surface into 13 sharp ones your agent actually reaches for. Old names still work as hidden aliases, so nothing breaks.

| Tool | What it does |
|---|---|
| 🔍 recall | The retrieval engine. Folds search + deep reasoning + contradiction detection into one call. F32 embeddings, Reciprocal Rank Fusion, claim-vs-memory checks. |
| 🧠 backfill | Memory with hindsight. Backward causal reach to a failure's root cause (Cai 2024). |
| 💾 smart_ingest | Stores with CREATE / UPDATE / SUPERSEDE via Prediction-Error Gating. Batch session-end saves. |
| 🗂 memory | Get, edit, promote 👍, demote 👎, check state, purge content + embeddings. |
| 🧩 graph | Reasoning chains, associations, bridges, predictions, force-directed export. |
| 🌙 maintain | Consolidate, dream, GC, importance-score, backup, export, restore. One maintenance verb. |
| 🧹 dedup | Self-healing duplicate detection + merge (8 old tools → 1). |
| 🚫 suppress | Top-down active forgetting that compounds, cascades, and is reversible for 24h. The memory is inhibited, not erased. |
| 📟 memory_status | Health + stats + trends + recommendations in one packet. |
| 🧬 codebase · intention · source_sync · session_start | Per-project code memory · "remind me when X" · external-source connectors · one-call session init. |
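
The recall row mentions Reciprocal Rank Fusion, which is simple enough to show whole. The sketch below is the standard formula, score(d) = Σ 1 / (k + rank), with rank starting at 1 and k = 60 as the conventional constant. Which ranked lists Vestige actually fuses, and with what k, is an assumption here: the example imagines one keyword list and one vector list.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of memory ids with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) to an id's score, rank starting at 1.
fn reciprocal_rank_fusion(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (index + 1) as f64);
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Fusing a keyword ranking `[1, 2, 3]` with a vector ranking `[3, 1, 4]` puts id 1 first and id 3 second: both lists agree they matter, and agreement across lists counts for more than a single high placement.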

📊 Watch your AI think in 3D

vestige dashboard      # → http://localhost:3927/dashboard

Every memory is a glowing node in a real-time, force-directed 3D graph. Connections form as you work. Nodes pulse when accessed, burst on creation, fade on decay. Kick off a consolidation and the whole graph slides into purple dream mode, replaying memories that light up in sequence.

Built with SvelteKit 2 · Svelte 5 · Three.js · WebGL bloom · live WebSocket events. 1000+ nodes at 60fps. Installable as a PWA.


🧩 Works with every AI agent

Vestige speaks MCP, so any agent that can register an MCP server can use it. Not a plugin for one tool, the memory layer underneath all of them. The universal config works everywhere:

{ "mcpServers": { "vestige": { "command": "vestige-mcp" } } }

| Agent | Setup |
|---|---|
| Cursor | add the JSON above to ~/.cursor/mcp.json · guide → |
| Windsurf | guide → |
| VS Code (Copilot) | guide → |
| Cline / Continue / Zed / Goose | add the universal JSON to that client's MCP config |
| Claude Code | claude mcp add vestige vestige-mcp -s user |
| Codex | codex mcp add vestige -- vestige-mcp |
| JetBrains · Xcode · OpenCode | integration guides → |
| Claude Desktop | 2-minute setup → |

Other install methods (Intel Mac, Windows, build-from-source)

Update an existing install:

vestige update                          # binaries only
vestige update --sandwich-companion     # also refresh optional Claude Code companion files

macOS (Intel): Microsoft is dropping x86_64 macOS ONNX Runtime prebuilts after v1.23.0, so the Intel Mac build links dynamically against a Homebrew ONNX Runtime:

brew install onnxruntime
npm install -g vestige-mcp-server@latest
echo 'export ORT_DYLIB_PATH="'"$(brew --prefix onnxruntime)"'/lib/libonnxruntime.dylib"' >> ~/.zshrc && source ~/.zshrc
claude mcp add vestige vestige-mcp -s user

Full guide: docs/INSTALL-INTEL-MAC.md.

Windows + Claude Desktop: quit Claude Desktop from the tray, then in PowerShell:

npm install -g vestige-mcp-server@latest
vestige-mcp --version

Point %APPDATA%\Claude\claude_desktop_config.json at it:

{ "mcpServers": { "vestige": { "command": "vestige-mcp" } } }

If it can't find the command, run where vestige-mcp and use the exact .cmd path.

Build from source (Rust 1.91+):

git clone https://github.com/samvallad33/vestige && cd vestige
cargo build --release -p vestige-mcp
# Apple Silicon GPU: --features metal   ·   NVIDIA: --features qwen3-embeddings,cuda

🚀 Make your AI use memory automatically

Registering the server exposes the tools; a short instruction tells the agent when to call them. Drop in the protocol and your agent saves and recalls on its own:

| You say | Vestige does |
|---|---|
| "Remember this" | Saves immediately |
| "I always..." / "I prefer..." | Saves as a durable preference |
| "Remind me when..." | Creates a future trigger (intention) |
| "This is important" | Saves and promotes it |
Agent memory protocol → · Claude Code template →


🏗 Under the hood

┌───────────────────────────────────────────────────────────┐
│  SvelteKit Dashboard / Three.js 3D graph / WebGL bloom    │
├───────────────────────────────────────────────────────────┤
│  Axum HTTP + WebSocket (:3927) / REST + live event stream │
├───────────────────────────────────────────────────────────┤
│  MCP Server (stdio JSON-RPC) / 13 tools · 30 modules      │
├───────────────────────────────────────────────────────────┤
│  Cognitive Engine                                         │
│   FSRS-6 · Spreading Activation · Prediction-Error Gating │
│   Retroactive Salience Backfill · Synaptic Tagging        │
│   Memory Dreamer · Hippocampal Index · Active Forgetting  │
├───────────────────────────────────────────────────────────┤
│  Storage: SQLite + FTS5 · USearch HNSW · Nomic Embed v1.5 │
│   Optional: Qwen3 reranker · SQLCipher · Metal/CUDA       │
└───────────────────────────────────────────────────────────┘

| | |
|---|---|
| Language | Rust 2024 (MSRV 1.91), 86,000+ lines |
| Binary | ~23MB, single file |
| Embeddings | Nomic Embed Text v1.5 (768d→256d Matryoshka, 8192 ctx); Qwen3 optional |
| Vector search | USearch HNSW (≈20× faster than FAISS) |
| Storage | SQLite + FTS5, optional SQLCipher encryption |
| Tests | 1,550 passing · clippy -D warnings clean |
| First run | Downloads ~130MB embedding model once, then fully offline forever |
| Platforms | macOS (ARM + Intel) · Linux x86_64 · Windows x86_64. All prebuilt |
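
The "768d→256d Matryoshka" entry deserves a quick illustration. Matryoshka-trained embeddings are built so that a prefix of the vector is still a usable embedding, which means shrinking one is just "keep the first N components, then rescale to unit length". The sketch below shows only that generic step; the function name is hypothetical, and any model-specific preprocessing that Nomic's pipeline or Vestige applies before truncation is left out.

```rust
/// Keep the first `dims` components of an embedding and rescale the result to
/// unit length, so cosine similarity still behaves on the shorter vector.
/// Returns `None` if `dims` is zero, exceeds the input, or the prefix is all zeros.
fn truncate_embedding(full: &[f32], dims: usize) -> Option<Vec<f32>> {
    if dims == 0 || dims > full.len() {
        return None;
    }
    let head = &full[..dims];
    let norm = head.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(head.iter().map(|x| x / norm).collect())
}
```

Going from 768 to 256 dimensions cuts the stored vector to a third of its size, which is the practical payoff for a local index that has to stay small.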

📚 Go deeper

| | |
|---|---|
| Getting Started | Your first 30 minutes, start to finish |
| FAQ | 30+ real questions answered |
| The Science | Every feature, every paper |
| Storage Modes | Global · per-project · multi-instance |
| Configuration | CLI, env vars, every knob |
| Changelog | The full story, version by version |

If your agent should remember what you taught it yesterday, star it. ⭐

86,000+ lines of Rust · 13 tools · 30 cognitive modules · 130 years of memory research · one 23MB binary that never phones home.

Built by @samvallad33 · AGPL-3.0 · 100% local, 100% yours