Dev Browser Eval

December 6, 2025

Benchmarking suite for testing different browser automation methods with Claude Code against the game-tracker application.

Setup

./setup.sh

This will:

  1. Clone the game-tracker repository
  2. Copy .env.local from ~/game-tracker/.env.local
  3. Install dependencies

Running Benchmarks

Run all benchmarks (3 runs each):

bun run scripts/benchmark.ts

Run a specific method:

bun run scripts/benchmark.ts dev-browser
bun run scripts/benchmark.ts playwright-skill
bun run scripts/benchmark.ts playwright-mcp
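
The optional argument above selects one method, and omitting it runs all of them. The following is a minimal sketch of how that selection could work, not the actual contents of scripts/benchmark.ts. The function name selectMethods and the error message are assumptions for illustration; only the three method names come from this README.

```typescript
// Hypothetical sketch: the real scripts/benchmark.ts may handle arguments differently.
// The three method names are the ones listed in this README.
const METHODS = ["dev-browser", "playwright-skill", "playwright-mcp"] as const;
type Method = (typeof METHODS)[number];

// Returns every method when no argument is given, or the single requested method.
// Throws on an unrecognized name so a typo fails fast instead of running nothing.
function selectMethods(arg?: string): Method[] {
  if (arg === undefined) {
    return [...METHODS];
  }
  if ((METHODS as readonly string[]).includes(arg)) {
    return [arg as Method];
  }
  throw new Error(
    `Unknown method "${arg}". Expected one of: ${METHODS.join(", ")}`
  );
}
```

In a bun script, this would typically be called with the first command-line argument, for example selectMethods(process.argv[2]).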

Generate Report

After running benchmarks:

bun run scripts/generate-benchmark.ts

This generates benchmark-comparison.md with averaged results.
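
As a rough illustration of what "averaged results" could mean, the sketch below groups runs by method and takes the mean of one metric. The RunResult shape, the durationMs field, and both function names are assumptions made for this example; the real report script may record different metrics and use a different layout.

```typescript
// Hypothetical result shape: the actual benchmark output format is not documented here.
interface RunResult {
  method: string;
  durationMs: number; // assumed metric, for illustration only
}

// Groups runs by method and returns the mean duration for each method.
function averageByMethod(runs: RunResult[]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const run of runs) {
    const entry = totals.get(run.method) ?? { sum: 0, count: 0 };
    entry.sum += run.durationMs;
    entry.count += 1;
    totals.set(run.method, entry);
  }
  const averages = new Map<string, number>();
  totals.forEach((entry, method) => {
    averages.set(method, entry.sum / entry.count);
  });
  return averages;
}

// Renders the averages as a small markdown table, one row per method.
function toMarkdownTable(averages: Map<string, number>): string {
  const rows: string[] = [];
  averages.forEach((avg, method) => {
    rows.push(`| ${method} | ${avg.toFixed(1)} |`);
  });
  return ["| Method | Avg duration (ms) |", "| --- | --- |", ...rows].join("\n");
}
```

With three runs per method, each average is simply the sum of the three values divided by three, so a single slow run shifts the reported figure noticeably.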

Methods

Method            Description
dev-browser       Use the dev browser plugin
playwright-skill  Use the playwright skill plugin
playwright-mcp    Use playwright MCP server

Utility Scripts

  • bun run scripts/reset.ts - Reset dev environment (kills ports, clears /tmp, resets DB)