MCP Contract Testing

June 3, 2026

Problem: Your AI agent depends on external MCP servers you don't control. When those servers change their tool definitions (rename parameters, remove tools, add required fields), your agent breaks silently.

Solution: EvalView's MCP contract testing captures a snapshot of a server's tool definitions and diffs against it on every CI run. If the interface changed, you know immediately — before running your full test suite.

Detect when external MCP servers change their interface before your agent breaks.

The Problem

When you use MCP servers you don't own (Scenario 2), the server can change its tool definitions at any time: rename parameters, remove tools, add required fields. Your agent tests pass today and fail tomorrow — not because your code changed, but because the server did.

The Solution

MCP contract testing captures a snapshot of a server's tool definitions and diffs against it on every CI run. If the interface changed, you know immediately — before running your full test suite.

This mirrors EvalView's golden baseline system:

Golden traces detect when your agent's behavior drifts
MCP contracts detect when an external server's interface drifts

Quick Start

1. Snapshot a server

evalview mcp snapshot "npx:@modelcontextprotocol/server-github" --name server-github

Output:

Snapshot saved: .evalview/contracts/server-github.contract.json
Tools discovered: 8
  - create_issue
  - list_issues
  - create_pull_request
  - ...

2. Check for drift

evalview mcp check server-github

If the server changed:

CONTRACT_DRIFT - 2 breaking change(s)
  REMOVED: create_pull_request - tool 'create_pull_request' no longer available
  CHANGED: list_issues - new required parameter 'owner'

3. Use in CI

evalview run tests/ --contracts --fail-on "REGRESSION,CONTRACT_DRIFT"

The --contracts flag checks all saved contracts before running tests. If any contract drifted, the run aborts immediately — no point testing against a broken interface.

CLI Reference

`evalview mcp snapshot`

Capture tool definitions from an MCP server.

evalview mcp snapshot <endpoint> --name <server-name> [--notes "..."] [--timeout 30]

Argument	Description
`endpoint`	MCP server endpoint (e.g., `npx:@modelcontextprotocol/server-github`)
`--name`	Human-readable identifier for this contract (required)
`--notes`	Optional notes about this snapshot
`--timeout`	Connection timeout in seconds (default: 30)

Supports all MCP transport types:

stdio: "npx:@modelcontextprotocol/server-filesystem /tmp"
HTTP: "http://localhost:8080"
Command: "stdio:python my_server.py"

`evalview mcp check`

Compare current server interface against a saved contract.

evalview mcp check <name> [--endpoint <override>] [--timeout 30]

Argument	Description
`name`	Contract name (from `--name` in snapshot)
`--endpoint`	Override endpoint (default: use endpoint from snapshot)
`--timeout`	Connection timeout in seconds (default: 30)

Exit codes:

0 — No breaking changes
1 — Breaking changes detected (CONTRACT_DRIFT)
2 — Could not connect to server
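A wrapper script can branch on these exit codes. The following is a minimal Python sketch, not part of EvalView itself: the function names and the `OK` and `CONNECTION_ERROR` labels are hypothetical, and the only behavior it relies on is the three exit codes listed above.

```python
import subprocess

# Exit codes documented for `evalview mcp check`.
# Only CONTRACT_DRIFT is a label used by EvalView; the others are illustrative.
EXIT_MEANINGS = {
    0: "OK",                # no breaking changes
    1: "CONTRACT_DRIFT",    # breaking changes detected
    2: "CONNECTION_ERROR",  # could not connect to server
}


def interpret_check_exit(code: int) -> str:
    """Map an exit code to a short status label; unknown codes are flagged."""
    return EXIT_MEANINGS.get(code, f"UNKNOWN({code})")


def check_contract(name: str) -> str:
    """Run `evalview mcp check <name>` and return the status label."""
    result = subprocess.run(["evalview", "mcp", "check", name])
    return interpret_check_exit(result.returncode)
```

Keeping the mapping separate from the subprocess call makes it possible to treat a connection failure (2) differently from real drift (1), for example by retrying instead of failing the build.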

`evalview mcp list`

List all saved contracts.

evalview mcp list

`evalview mcp show`

Show full details of a contract including all tool schemas.

evalview mcp show <name>

`evalview mcp delete`

Remove a contract.

evalview mcp delete <name> [--force]

Integration with `evalview run`

The --contracts flag adds a pre-flight check to any test run:

evalview run tests/ --contracts

This checks all contracts in .evalview/contracts/ before running tests. Combine with --fail-on CONTRACT_DRIFT to fail CI on drift:

evalview run tests/ --contracts --fail-on "REGRESSION,CONTRACT_DRIFT"

Or use --strict (now includes CONTRACT_DRIFT):

evalview run tests/ --contracts --strict
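The pre-flight behavior described in this section can be approximated in a custom script when `evalview run` is not the entry point. The sketch below is an illustration under assumptions: the helper names are hypothetical, the `<name>.contract.json` file naming is inferred from the snapshot output shown in the Quick Start, and the `check` callable stands in for whatever runs `evalview mcp check <name>` and returns its exit code.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable

# Assumed naming scheme, taken from the snapshot output:
# .evalview/contracts/server-github.contract.json
SUFFIX = ".contract.json"


def contract_name(path: Path) -> str:
    """Derive the contract name from a contract file path."""
    name = path.name
    return name[: -len(SUFFIX)] if name.endswith(SUFFIX) else path.stem


def saved_contracts(directory: Path = Path(".evalview/contracts")) -> list[str]:
    """List the names of all contract files in the directory."""
    return sorted(contract_name(p) for p in directory.glob(f"*{SUFFIX}"))


def preflight(names: Iterable[str], check: Callable[[str], int]) -> list[str]:
    """Return the contracts whose check exited with a nonzero code."""
    return [name for name in names if check(name) != 0]
```

A caller would skip the test suite whenever `preflight(saved_contracts(), check)` returns a non-empty list, which mirrors the "abort before testing" behavior of `--contracts`.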

GitHub Actions

- name: Run EvalView
  uses: hidai25/eval-view@v0.8.0
  with:
    diff: true
    contracts: true
    fail-on: 'REGRESSION,CONTRACT_DRIFT'

What Gets Detected

Breaking changes (trigger CONTRACT_DRIFT)

Change	Example
Tool removed	`create_pull_request` no longer exists
Required parameter added	New required param `owner` on `list_issues`
Parameter removed	`repo` param no longer accepted
Parameter type changed	`limit` changed from `string` to `integer`
Parameter became required	`owner` was optional, now required

Informational changes (logged, don't fail)

Change	Example
New tool added	`merge_pull_request` now available
Optional parameter added	New optional param `labels` on `create_issue`
Description changed	Tool description updated
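The two tables above amount to a classification over a pair of tool lists. The following Python sketch shows one way such a diff could be written. It is not EvalView's implementation: the function names and message wording are hypothetical, and it assumes each tool is a dict with `name`, `description`, and `inputSchema` keys as in the contract file format.

```python
from __future__ import annotations

from typing import Any


def param_info(tool: dict[str, Any]) -> tuple[dict[str, Any], set[str]]:
    """Return (properties, required parameter names) for one tool."""
    schema = tool.get("inputSchema", {})
    return schema.get("properties", {}), set(schema.get("required", []))


def diff_tools(old_tools: list[dict], new_tools: list[dict]) -> tuple[list[str], list[str]]:
    """Compare two tool lists and return (breaking, informational) changes."""
    breaking: list[str] = []
    info: list[str] = []
    old = {t["name"]: t for t in old_tools}
    new = {t["name"]: t for t in new_tools}

    for name in sorted(old.keys() - new.keys()):
        breaking.append(f"REMOVED: {name}")
    for name in sorted(new.keys() - old.keys()):
        info.append(f"ADDED: {name}")

    for name in sorted(old.keys() & new.keys()):
        old_props, old_req = param_info(old[name])
        new_props, new_req = param_info(new[name])

        for p in sorted(old_props.keys() - new_props.keys()):
            breaking.append(f"CHANGED: {name} - parameter '{p}' removed")
        for p in sorted(new_props.keys() - old_props.keys()):
            if p in new_req:
                breaking.append(f"CHANGED: {name} - new required parameter '{p}'")
            else:
                info.append(f"CHANGED: {name} - new optional parameter '{p}'")
        for p in sorted(old_props.keys() & new_props.keys()):
            old_t, new_t = old_props[p].get("type"), new_props[p].get("type")
            if old_t != new_t:
                breaking.append(
                    f"CHANGED: {name} - parameter '{p}' type changed from {old_t!r} to {new_t!r}"
                )
            if p in new_req and p not in old_req:
                breaking.append(f"CHANGED: {name} - parameter '{p}' became required")

        if old[name].get("description") != new[name].get("description"):
            info.append(f"CHANGED: {name} - description changed")

    return breaking, info
```

A non-empty `breaking` list corresponds to CONTRACT_DRIFT; entries in `info` would only be logged.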

Contract File Format

Contracts are stored as JSON in .evalview/contracts/:

{
  "metadata": {
    "server_name": "server-github",
    "endpoint": "npx:@modelcontextprotocol/server-github",
    "snapshot_at": "2026-02-07T10:30:00",
    "protocol_version": "2024-11-05",
    "tool_count": 8,
    "schema_hash": "a1b2c3d4e5f67890"
  },
  "tools": [
    {
      "name": "create_issue",
      "description": "Create a new issue in a GitHub repository",
      "inputSchema": {
        "type": "object",
        "properties": {
          "repo": { "type": "string" },
          "title": { "type": "string" },
          "body": { "type": "string" }
        },
        "required": ["repo", "title"]
      }
    }
  ]
}

Commit these files to your repo so CI can use them.
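Because contracts are plain JSON, they are easy to inspect from your own tooling. The sketch below parses a contract and lists the required parameters per tool. The function names are hypothetical, and the consistency check assumes that `tool_count` always equals the length of the `tools` array in a real contract file (the example above is abbreviated to a single tool).

```python
from __future__ import annotations

import json


def parse_contract(text: str) -> dict:
    """Parse contract JSON and run a basic consistency check."""
    contract = json.loads(text)
    meta, tools = contract["metadata"], contract["tools"]
    # Assumption: tool_count mirrors the number of entries in "tools".
    if meta["tool_count"] != len(tools):
        raise ValueError(
            f"tool_count is {meta['tool_count']} but {len(tools)} tools are listed"
        )
    return contract


def required_params(contract: dict) -> dict[str, list[str]]:
    """Map each tool name to its list of required parameters."""
    return {
        tool["name"]: list(tool.get("inputSchema", {}).get("required", []))
        for tool in contract["tools"]
    }
```

For a file on disk, pass `Path(".evalview/contracts/server-github.contract.json").read_text()` to `parse_contract`.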

Best Practices

Snapshot after verifying: Run your tests first, confirm everything works, then snapshot. The contract represents a known-good interface.
Refresh periodically: If a contract is more than 30 days old, evalview mcp check will warn you. Re-snapshot to accept intentional changes.
One contract per server: Name contracts after the server, not the tools. Use server-github, not create-issue-tool.
Commit contracts: Store .evalview/contracts/ in git. They're small JSON files and CI needs them.
Check before testing: Use --contracts on evalview run so drift is caught before you spend time on tests that will fail anyway.
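The 30-day refresh guideline can also be checked from your own scripts using the `snapshot_at` field in the contract metadata. This is a minimal sketch with hypothetical function names. It assumes `snapshot_at` is a timezone-naive ISO 8601 timestamp, as in the example contract, and that `now` is expressed in the same timezone.

```python
from datetime import datetime, timedelta

# Threshold taken from the refresh guideline above.
MAX_AGE = timedelta(days=30)


def contract_age_days(snapshot_at: str, now: datetime) -> int:
    """Whole days elapsed between the snapshot and `now`."""
    return (now - datetime.fromisoformat(snapshot_at)).days


def is_stale(snapshot_at: str, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True when the snapshot is strictly older than `max_age`."""
    return now - datetime.fromisoformat(snapshot_at) > max_age
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.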

Related Documentation

Golden Traces (Regression Detection): Detect behavioral drift in your agent
CI/CD Integration: Run contract checks in CI
CLI Reference: Full command reference for evalview mcp
Framework Support: Supported agent frameworks