VaultPilot Guard
April 27, 2026 · View on GitHub
DeFi for AI agents, designed for when the AI can be compromised. Agent proposes, you approve.
An independent Claude Code skill that enforces bytes-verification invariants
for transactions prepared by vaultpilot-mcp.
Why this repo is separate
vaultpilot-mcp normally emits CHECKS PERFORMED / VERIFY-BEFORE-SIGNING
blocks that tell the agent how to verify bytes before signing. A compromised
MCP can silently omit those instruction blocks — with no skill installed,
the agent has no static rule to fall back on and will drop the checks. The
skill in this repo closes that gap: it loads from the user's local disk
(~/.claude/skills/vaultpilot-preflight/SKILL.md) and instructs the agent
to run the integrity checks regardless of what the MCP says.
The repo is intentionally independent of vaultpilot-mcp. An attacker
who compromises the MCP's release pipeline cannot push a change that
weakens or removes this skill — the skill's trust root is the user's own
clone of this repository.
Install
git clone https://github.com/szhygulin/vaultpilot-security-skill.git \
~/.claude/skills/vaultpilot-preflight
Restart Claude Code so the skill is discovered. When vaultpilot-mcp
starts and sees the file at
~/.claude/skills/vaultpilot-preflight/SKILL.md, it stops prefixing its
prepare_* / preview_* responses with the "skill not installed" warning.
Update
cd ~/.claude/skills/vaultpilot-preflight
git pull --ff-only
Diff the new SKILL.md against the current one before pulling if you
want to audit the change. See CHANGELOG.md for per-version notes.
Integrity pin (coordinated with vaultpilot-mcp)
Starting with 0.1.1, SKILL.md carries an invisible integrity sentinel
near the top and vaultpilot-mcp pins the expected SHA-256 of this file
in its server source code. On every signing flow, the agent is instructed
to:
- Run
sha256sum SKILL.mdand compare to the pinned value. - Confirm the sentinel is present in the content returned by the
Skilltool (not in the MCP's own instructions text, which reference the sentinel only by fragments so a naïve context-scan can't spoof it).
This catches two attacks a static marker alone cannot:
- Targeted tamper of
SKILL.mdon disk. An attacker with local write access cannot forge content that hashes to the pinned value. - Plugin collision (different skill registered under the same name). The colliding skill's content won't contain the sentinel, so the post-load check fails.
Residual: a compromise of the vaultpilot-mcp npm package itself would
also ship the attacker's pin — that's the same trust floor as the rest
of the MCP, not a new gap.
Maintainer update workflow
When SKILL.md changes in this repo, the pin in vaultpilot-mcp must
change with it. The coordinated release:
- Edit
SKILL.mdon a feature branch in this repo. sha256sum SKILL.md— record the new hash.- In
vaultpilot-mcp/src/index.ts, find thePREFLIGHT SKILL INTEGRITY PINblock inside the serverinstructionsarray and update theExpected SHA-256value. For substantive protocol changes, also bump the sentinel version (v1→v2) in bothSKILL.mdand the three sentinel fragments insrc/index.ts. - Merge the skill PR first, tag a new
vaultpilot-security-skillrelease, then merge + publish the matchingvaultpilot-mcprelease. - Bump
CHANGELOG.mdhere with the new sentinel and the minimum requiredvaultpilot-mcpversion.
What users see during a version skew
If a user upgrades only one side, every signing flow halts with the
exact vaultpilot-preflight skill integrity check FAILED message. That
symptom is identical to a real tamper — the fix (git pull here and
npm update on vaultpilot-mcp until both match) is safe either way,
but users should NOT bypass the check to "unblock themselves."
Use with other agents (non-Claude clients)
The content of this skill (the invariants in SKILL.md) is agent-agnostic
— it's plain prose about how to verify bytes before signing. The mechanism
(YAML frontmatter + ~/.claude/skills/<name>/SKILL.md layout) is
Anthropic's Agent Skills format,
honored by Claude Code, Claude Desktop, the Claude Agent SDK, and Claude's API.
Cursor / Cline / Windsurf / Continue / Aider don't scan that path.
To use the invariants with a non-Claude MCP client, copy the body of
SKILL.md (everything after the --- frontmatter) into
whatever that tool uses for persistent system-prompt rules:
| Agent / IDE | Where to paste the invariants |
|---|---|
| Cursor | .cursorrules at the project root, or User Rules in settings |
| Cline | .clinerules at the project root |
| Windsurf | .windsurfrules at the project root |
| Continue | systemMessage in the Continue config |
| Aider | chat-language / system prompt in .aider.conf.yml |
| Generic MCP | Your client's system prompt / persistent instructions |
vaultpilot-mcp's per-call CHECKS PERFORMED / VERIFY-BEFORE-SIGNING blocks
arrive inside tool responses regardless of client, so you still get the
server-emitted layer. This skill's role is the static, server-independent
layer — it has to be loaded into the agent's context by some mechanism your
client actually supports.
Silencing the MCP's missing-skill warning
vaultpilot-mcp checks for a file at ~/.claude/skills/vaultpilot-preflight/ SKILL.md and prefixes every prepare_* / preview_* response with a warning
when it's absent. On non-Claude clients that warning fires even if you loaded
the invariants via .cursorrules etc. To suppress it, set:
export VAULTPILOT_SKILL_MARKER_PATH=/path/to/any/existing/file
Only do this after you've actually loaded the invariants into your
agent's context — otherwise you're silencing a useful signal. A reasonable
pattern is to point the env var at your non-Claude rules file itself (e.g.
VAULTPILOT_SKILL_MARKER_PATH=$PWD/.cursorrules) so the marker's existence
tracks the rules' existence.
Honest limits
- The skill instructs the agent to run local checks. It does not mechanically block a compromised agent from skipping them.
- A fully-coordinated agent + MCP compromise (both lying together) is
only caught by the on-device Ledger hash match and — optionally — the
get_verification_artifactsecond-LLM paste flow. vaultpilot-mcpalso has a roadmap entry for aPreToolUsehook that would enforce the hash recompute as host-side code, not agent prose.
See SKILL.md for the full invariants the agent is told to enforce.
License
MIT — see LICENSE.