Recipe: Debug Check vs Snapshot Mismatch
March 25, 2026
Goal
Debug cases where evalview snapshot and evalview check appear to disagree about the same test.
Read These Files First
- evalview/commands/shared.py
- evalview/commands/check_cmd.py
- evalview/commands/snapshot_cmd.py
- evalview/core/golden.py
- evalview/core/diff.py
- evalview/core/types.py
Typical Symptoms
- a test snapshots successfully but immediately shows as changed in check
- a tool path looks correct in one flow but different in the other
- baseline metadata or model info appears missing or inconsistent
- multi-turn behavior passes in one command path and drifts in the other
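The first symptom usually means the baseline changed between being written and being read back. The sketch below shows the invariant worth testing. It assumes a trace can be represented as a plain dict and serialized to JSON. `save_golden`, `load_golden`, `round_trip_loss`, and the trace field names are hypothetical stand-ins, not GoldenStore's actual API.

```python
import json
from typing import Any

# Illustrative trace; these field names are assumptions, not evalview's schema.
TRACE: dict[str, Any] = {
    "model": "example-model",
    "tool_calls": [{"name": "search", "params": {"q": "weather"}}],
    "turns": [{"input": "hi", "output": "hello"}],
}


def save_golden(trace: dict[str, Any], keep: set[str]) -> str:
    """Hypothetical persistence step that writes only the fields listed in `keep`."""
    return json.dumps({k: v for k, v in trace.items() if k in keep}, sort_keys=True)


def load_golden(blob: str) -> dict[str, Any]:
    """Hypothetical load step: parse the persisted JSON back into a dict."""
    return json.loads(blob)


def round_trip_loss(trace: dict[str, Any], keep: set[str]) -> set[str]:
    """Return the top-level fields that do not survive a save/load round trip."""
    restored = load_golden(save_golden(trace, keep))
    return {k for k in trace if restored.get(k) != trace[k]}


print(round_trip_loss(TRACE, keep={"model", "tool_calls", "turns"}))  # set()
print(round_trip_loss(TRACE, keep={"tool_calls", "turns"}))           # {'model'}
```

If a store silently drops a field (here `model`), the persisted baseline no longer matches the trace it was created from, which produces exactly the first symptom. JSON round trips also turn tuples into lists. A trace that holds tuples can therefore fail this check even when no field is dropped.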
Debug Flow
- Confirm the same test file and test name are being used in both commands.
- Confirm the same adapter and endpoint are being used.
- Inspect whether snapshot and check both route through the same helpers in evalview/commands/shared.py.
- Inspect what was persisted in .evalview/golden/ via GoldenStore.
- Inspect TraceDiff generation in evalview/core/diff.py.
- Check whether model metadata, per-turn data, tool sequences, or parameter diffs are preserved differently between the two flows.
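The last step is hard to do by eye, but a small recursive diff can point to the exact field that differs. This is a minimal sketch. It assumes both the persisted golden and the fresh trace can be loaded as plain JSON-like dicts. `diff_paths` is a hypothetical helper that is independent of evalview's own TraceDiff and DiffEngine, and the field names in the example are illustrative.

```python
from typing import Any


def diff_paths(a: Any, b: Any, path: str = "$") -> list[str]:
    """Recursively list the paths at which two JSON-like traces differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out: list[str] = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                out.append(f"{path}.{key} (missing on one side)")
            else:
                out.extend(diff_paths(a[key], b[key], f"{path}.{key}"))
        return out
    if isinstance(a, list) and isinstance(b, list):
        out = []
        if len(a) != len(b):
            out.append(f"{path} (length {len(a)} vs {len(b)})")
        # Compare the overlapping prefix element by element.
        for i, (x, y) in enumerate(zip(a, b)):
            out.extend(diff_paths(x, y, f"{path}[{i}]"))
        return out
    # Leaves, or containers of different types: fall back to equality.
    return [] if a == b else [f"{path}: {a!r} != {b!r}"]


# Illustrative traces; field names are assumptions.
golden = {"model": "m-1", "tool_calls": [{"name": "search", "params": {"q": "weather"}}]}
fresh = {"model": "m-1", "tool_calls": [{"name": "search", "params": {"q": "Weather"}}]}
print(diff_paths(golden, fresh))
# ["$.tool_calls[0].params.q: 'weather' != 'Weather'"]
```

Compare the paths this reports with what the real diff engine reports for the same pair. If the two disagree, the problem is likely in diffing rather than in execution or persistence.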
Useful Commands
python -m evalview snapshot --preview
python -m evalview check --dry-run
python -m evalview check --strict
pytest -q tests/test_check_cmd.py tests/test_snapshot_generated_workflow.py tests/test_diff_engine.py
Done Criteria
- the root cause of the mismatch is identified as execution, persistence, or diffing
- the fix is covered by tests in the relevant command/core modules
- snapshot/check semantics remain consistent after the fix
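Meeting the first criterion amounts to finding the earliest layer where the two flows diverge. The sketch below makes that triage explicit. It assumes you can capture four things: the snapshot-time trace, the golden as reloaded from disk, the check-time trace, and whether check reported a change. `classify_mismatch` and `Cause` are hypothetical names, not part of evalview. The function tests the layers in pipeline order, so it reports the first divergence it finds.

```python
from enum import Enum
from typing import Any


class Cause(Enum):
    PERSISTENCE = "persistence"  # baseline changed between write and read-back
    EXECUTION = "execution"      # the two runs produced different traces
    DIFFING = "diffing"          # traces match, but a change was still reported
    NONE = "none"                # nothing to explain


def classify_mismatch(
    snapshot_trace: dict[str, Any],
    persisted_golden: dict[str, Any],
    check_trace: dict[str, Any],
    reported_changed: bool,
) -> Cause:
    """Attribute a snapshot/check disagreement to the earliest diverging layer."""
    if snapshot_trace != persisted_golden:
        return Cause.PERSISTENCE
    if snapshot_trace != check_trace:
        return Cause.EXECUTION
    if reported_changed:
        return Cause.DIFFING
    return Cause.NONE


t = {"model": "m-1", "tool_calls": ["search"]}
print(classify_mismatch(t, t, t, reported_changed=True).value)  # diffing
```

Turning a captured failing case into a test that asserts `Cause.NONE` after the fix is one way to cover the second criterion.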
Common Pitfalls
- debugging terminal output instead of inspecting golden persistence and diff generation
- assuming the adapter returned identical trace structure in both paths
- fixing display code when the real bug is in GoldenStore or DiffEngine
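You can check the assumption that the adapter returns the same trace structure in both paths before reading any diff code. Compare the shape of each trace and ignore its values. This is a minimal sketch that assumes traces are JSON-like Python values. `shape` and the example traces are hypothetical. It treats a tuple and a list as different, and a `None` leaf as different from a string. Both differences are easy to miss when skimming printed output.

```python
from typing import Any


def shape(value: Any) -> Any:
    """Reduce a JSON-like value to its structure: keys, container types, leaf types."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Keep the container type so a tuple and a list do not look identical.
        return [type(value).__name__, [shape(v) for v in value]]
    return type(value).__name__


# Illustrative traces from two hypothetical command paths.
snapshot_path = {"model": "m-1", "turns": [{"tool_calls": ["search"]}]}
check_path = {"model": None, "turns": [{"tool_calls": ("search",)}]}

print(shape(snapshot_path) == shape(check_path))  # False
```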