Demo Video Shot List

May 28, 2026 · View on GitHub

Total budget 180 seconds. Hard cap 210 seconds.

Scene 0 — Cold open (0:00–0:12, 12s)

Visual: Black screen, white text "Falsification Engine — pre-registration + CI for AI-agent claims."
Voiceover: "AI agents make claims. This makes those claims falsifiable — before the data arrives."
Transition: Cut to terminal.

Alt-Scene 0 — Docker cold open (optional swap for Scene 0)

Visual: single terminal line — docker run --rm -it falsify-demo
Cut directly to the auto-demo output scrolling.
Voiceover: "One docker run. No Python setup. Watch the whole story."
Use this version if we want to emphasize zero-setup install over the philosophical cold open.

Scene 1 — The problem (0:12–0:30, 18s)

Visual: Terminal showing fake agent output: My model achieves 92 percent accuracy.
Voiceover: "An agent says 92 percent. What was the threshold? What was the metric? When did we decide? Without pre-registration, we're just story-telling."
Cursor runs: ls .falsify/ — empty.

Scene 2 — Lock the hypothesis (0:30–0:55, 25s)

Commands (typed live):

python3 falsify.py init calibration
python3 falsify.py lock calibration

Voiceover: "We declare the claim, metric, threshold before running anything. Lock it. Hash freezes the spec."
Visual highlight: zoom on LOCKED calibration — hash a3f9c12b.

Scene 3 — Run and verdict PASS (0:55–1:20, 25s)

Commands:

python3 falsify.py run calibration
python3 falsify.py verdict calibration

Voiceover: "Run. Verdict is deterministic — brier 0.21, below threshold 0.25, n=20. PASS. Exit code zero."
Visual: green PASS line, echo $? shows 0.

Scene 4 — Tampering test (1:20–1:50, 30s)

Action: open spec.yaml, change threshold 0.25 to 0.15

Commands:

python3 falsify.py diff calibration

Voiceover: "Someone edits the threshold after the fact. Diff shows exactly what changed. Hash no longer matches the lock. Exit code three."
Visual: red diff lines, minus threshold 0.25, plus threshold 0.15.

Scene 5 — FAIL verdict + guard (1:50–2:20, 30s)

Commands:

python3 falsify.py lock calibration
python3 falsify.py run calibration
python3 falsify.py verdict calibration
echo "brier below 0.15 confirmed" | xargs python3 falsify.py guard

Voiceover: "Now FAIL — brier exceeds the new threshold. And if someone tries to commit a contradicting claim, the guard blocks it. Exit code eleven."

Bonus — chain-integrity tamper shot (optional, ~5s)

Action: after the export section (if included), edit a canonical_hash field in the JSONL with sed (or in an editor).
Command: python3 falsify.py verify audit.jsonl → refuses with FAIL line + exit 10.
Voiceover: "Tamper the file, the chain breaks. Verify refuses."
Drives home chain integrity in 5 seconds. Merge with Scene 4 if the overall cut is running long.

Scene 6 — Opus 4.7 layers (2:20–2:45, 25s)

Visual: split screen, file tree on left with highlights:

.claude/skills/hypothesis-author/SKILL.md
.claude/skills/falsify/SKILL.md
.claude/skills/claim-audit/SKILL.md
.claude/agents/claim-auditor.md
.claude/agents/verdict-refresher.md
.github/workflows/falsify.yml

Voiceover: "Three Claude Code skills. Two forked-context subagents. CI gate. Opus 4.7 orchestrates all five — drafting specs, auditing text, refreshing stale verdicts."
Micro-moment (3s max): cut to a browser showing dashboard.html generated from falsify stats --html — verdict cards with colored state badges, dark-mode-aware. Quick pan, then back to file tree.

Scene 7 — Close (2:45–3:00, 15s)

Visual: GitHub repo page, green Actions badge, MIT license visible.
Voiceover: "Falsification Engine. Open source. Built in three days with Claude Opus 4.7. Link in the description."

Pre-record checklist

Terminal font size at least 18pt
Warp clean session, no prior history visible
rm -rf .falsify/calibration before Scene 2
spec.yaml reverted to threshold 0.25
Editor ready on second window for Scene 4 edit
Microphone levels tested (voiceover can be post-recorded)
Screen resolution 1920x1080 minimum

Recording tool: QuickTime Screen Recording, no cursor highlighting, show keystrokes via terminal only, audio post-recorded over silent screencast.

Editing targets

Crossfade between scenes 0.2s
Zoom-in moments: Scene 2 lock hash, Scene 4 diff lines, Scene 5 exit code 11
Background: no music, terminal keystrokes plus voice only, serious tone.