Overcode vs Bernstein: Feature Bakeoff
April 28, 2026 · View on GitHub
Overview
| Bernstein | Overcode | |
|---|---|---|
| Repo | chernistry/bernstein | This project |
| Language | Python 3.12+ (FastAPI + Textual) | Python 3.12+ (Textual TUI) |
| Stars | See README badge (not fetched) | N/A (private) |
| License | Apache 2.0 (LICENSE) | Proprietary |
| First Commit | ~2026-03-28 (sampled window) | 2025 |
| Last Commit | 2026-04-15 (active) | Active |
| Purpose | Deterministic orchestrator that decomposes a goal, spawns parallel agents in git worktrees, verifies output, and merges results | Claude Code supervisor/monitor with instruction delivery |
Scale indicators from ~/Code/bernstein:
- ~1,984 commits, ~341K LOC of Python in
src/, 989 source files, 1,043 test files, 135 docs - Version 1.7.5 in
pyproject.toml:7
Core Philosophy
Bernstein's thesis is that scheduling should be deterministic code, not an LLM. The orchestrator is a pure-Python state machine that takes a goal, asks a "manager" LLM to decompose it once into tasks with owned files and completion signals, then autonomously spawns short-lived CLI agents in isolated git worktrees, verifies their output with concrete gates (tests pass, lint clean, types correct, PII absent, fingerprint not copy-pasted from training data), and merges on success. The README's catchphrase — "Zero LLM tokens on scheduling" (README.md:39) — captures the design principle. The supporting manifesto lives in docs/the-bernstein-way.md:43-49 and docs/WHY_DETERMINISTIC.md.
Agents are treated as disposable, interchangeable workers (docs/the-bernstein-way.md:1-120). Any of ~20 CLI adapters (Claude Code, Codex, Gemini, Cursor, Aider, Ollama, Goose, etc., registered in src/bernstein/adapters/registry.py:33-53) can be swapped in, and roles can be routed to different models via role_model_policy in bernstein.yaml. The workflow loop is: bernstein init → bernstein -g "goal" (or bernstein run plan.yaml) → decompose → spawn-in-worktrees → verify → merge → bernstein recap. A --evolve mode lets Bernstein iterate on its own codebase autonomously under a budget and cycle cap (src/bernstein/cli/main.py:440-464, src/bernstein/evolution/loop.py).
Where Overcode is a control tower over long-lived, human-driven Claude Code sessions in tmux, Bernstein is a batch job runner for short-lived agents that prizes reproducibility and CI-friendliness (bernstein --headless emits structured JSON and non-zero exit codes on failure — README.md:73, src/bernstein/cli/main.py:485). Human interaction in Bernstein happens at goal entry, approval gates, and post-run review — not live keystrokes into agent shells.
Feature Inventory
1. Agent Support
Bernstein:
- Hardcoded registry of ~18 built-in adapters (
src/bernstein/adapters/registry.py:33-53): Claude Code, Codex, Gemini, Qwen, Aider, Ollama, Cursor, Continue.dev, Cody, Goose, IaC (Terraform/Pulumi), Kilo, Kiro, OpenCode, Roo Code, Tabby, Amp, CloudflareAgents, CloudflareCodex. GenericAdapterfallback for any CLI accepting--prompt(src/bernstein/adapters/registry.py:99-100).- Third-party adapters discoverable via pluggy entry-point group
bernstein.adapters(src/bernstein/adapters/registry.py:58-80,pyproject.toml:106-109). Adapters subclassCLIAdapter(src/bernstein/adapters/base.py) and implementspawn/kill/is_alive/name. - Auto-discovery cached on startup via
discover_agents_cached()insrc/bernstein/cli/main.py:411-416. - Agent-agnostic: the scheduler doesn't care which adapter runs a task; mix cheap local + heavy cloud models in one run (
README.md:40,bernstein.yamlrole_model_policy). - Any adapter can also serve as the internal scheduler LLM (
internal_llm_provider/internal_llm_modelinbernstein.yaml).
Overcode: Claude Code only. No plugin system for other agents.
2. Agent Launching
Bernstein CLI (Click-based, entry point bernstein.cli.main:cli, pyproject.toml:99). Commands observed:
- Core:
bernstein -g "GOAL",bernstein run <plan.yaml>,bernstein run --dry-run <plan.yaml>,bernstein init,bernstein init-wizard,bernstein quickstart,bernstein demo,bernstein stop,bernstein stop --force. - Monitoring:
bernstein live,bernstein dashboard,bernstein status,bernstein ps,bernstein cost,bernstein doctor,bernstein recap,bernstein trace <ID>,bernstein run-changelog --hours 48,bernstein dry-run,bernstein dep-impact,bernstein explain <cmd>,bernstein aliases,bernstein config-path,bernstein debug-bundle. - Agents:
bernstein agents list,bernstein agents sync,bernstein agents discover. - Evolution:
bernstein evolve(with--budget,--max-cycles,-y). - Git/PR:
bernstein fingerprint build --corpus-dir ~/oss-corpus,bernstein fingerprint check src/foo.py. - Cloud:
bernstein cloud init,bernstein cloud deploy,bernstein cloud run plan.yaml. - Persistence:
bernstein checkpoint,bernstein wrap-up. - Task control:
approve,reject(src/bernstein/cli/task_cmd.py). - Server:
bernstein mcp-server --port 3000(MCP gateway). - Flags on main run:
--headless,--auto-approve,-y,--fresh,--evolve/-e,--budget,--max-cycles,--classic(Rich Live instead of Textual). Source:src/bernstein/cli/main.py:478-497. - Command aliases resolved by
_RichGroup(src/bernstein/cli/main.py:372-405,src/bernstein/cli/aliases.py) — e.g.s→status.
How agents are spawned:
- Default: short-lived local subprocess per task (
src/bernstein/core/agents/spawner_core.py:1-76,AgentSpawner). - Optional: Docker container via
ContainerManager(imported inspawner_core.py:21). - Optional: Cloudflare Workers via
CloudflareBridge(src/bernstein/bridges/cloudflare.py:46-86). - Each spawn creates a git worktree at
.sdd/worktrees/{session_id}on branchagent/{session_id}(src/bernstein/core/git/worktree.py:38,53).
Prompt delivery:
- Role-specific system prompt via
CLIAdapter.spawn(request)(src/bernstein/adapters/base.py). - Task context file (owned files, completion signals, project context) assembled in
src/bernstein/core/context.py. - MCP tools auto-injected via
src/bernstein/adapters/skills_injector.py.
Templates / plans: YAML plan files with goal, cli, max_agents, role_model_policy, budget, internal_llm_provider, evolution_enabled, quality_gates, constraints, context_files (bernstein.yaml:1-45). Manager LLM decomposes into tasks with role, owned_files, completion_signals, dependencies (src/bernstein/core/tasks/models.py, src/bernstein/core/orchestration/manager_parsing.py).
Overcode: TUI action or CLI launch, prompt delivered via tmux send-keys. Preset launch templates exist but no YAML plan format.
3. Session / Agent Lifecycle
Bernstein agent states (src/bernstein/tui/agent_states.py:24-38):
SPAWNING (⊚̴ yellow), RUNNING (● green), STALLED (◐ orange), MERGING (⇄ blue), DEAD (○ red), IDLE (▢ gray), UNKNOWN (⊙ dim). Each has a color and animated spinner frame (agent_states.py:52-79).
Task states: open, claimed, done, failed, retry, blocked (waiting on dependency) — enumerated in .sdd/backlog/{open,claimed,closed}/ directory structure.
Persistence — entirely file-based under .sdd/ (docs/the-bernstein-way.md:82):
.sdd/worktrees/{session_id}/— per-agent checkout.sdd/backlog/{open,claimed,closed}/— task YAML files.sdd/runtime/— metrics, agent metadata, costs.sdd/metrics/kill_audit.jsonl— kill events.sdd/quarantine/{session_id}.json— quarantined branches.sdd/index/knowledge_graph.db— SQLite codebase graph (src/bernstein/core/knowledge/knowledge_graph.py:18).sdd/evolution/experiments.jsonl— evolve-mode log.sdd/routing/{policy.json,bandit_state.json}— ML router state.sdd/audit/YYYY-MM-DD.jsonl— HMAC-chained audit log
Crash recovery: Write-ahead log in src/bernstein/core/persistence/wal.py, replay on start in wal_replay.py. Session continuity in session_continuity.py, explicit bernstein checkpoint command in checkpoint.py.
Resume: Restart loads .sdd/runtime/agents.json + task store; --fresh forces clean start (src/bernstein/cli/main.py:488).
Cleanup: Graceful stop drains in-flight agents and merges successful branches; stop --force kills immediately and marks tasks failed (src/bernstein/cli/stop_cmd.py). Worktree cleanup + orphan pruning in src/bernstein/core/agents/spawner_core.py:45-49.
Overcode: Session states include running / waiting_user / waiting_approval / error / idle / stopped. Sessions persist in JSON via session_manager.py, survive TUI restarts, can be resumed with --resume. No WAL; cleanup ends the tmux window.
4. Isolation Model
Bernstein:
- Git worktrees by default, one per agent, branch
agent/{session_id}(src/bernstein/core/git/worktree.py:53). WorktreeSetupConfigsupports shallow checkouts, symlinks for large directories likenode_modules,venv,dist(worktree.py:42-73) with platform-specific warnings for Windows symlink permissions (worktree.py:57-70).- Sparse checkout via
sparse_paths(worktree.py:77) for monorepos. worktree: falseinbernstein.yamldisables isolation; agents share main checkout using file-ownership declarations to avoid races (docs/the-bernstein-way.md:57).- Merge workflow: janitor verifies completion signals, then merges
agent/{session_id}→ main; failed merges → quarantine at.sdd/quarantine/{session_id}.json. Merge orchestration insrc/bernstein/core/agents/spawner_merge.py. - Sub-task trees represented as task dependencies in the backlog (not worktree-of-worktree); manager LLM emits a DAG (
src/bernstein/core/orchestration/manager.py:1-100). - Conflicts are not auto-resolved — task is marked failed for human review.
Overcode: Shared repo, no worktrees, no automated merge. Agents coordinate via parent/child hierarchy and standing instructions, not filesystem isolation.
5. Status Detection
Bernstein:
- Heartbeat + PID monitoring (
src/bernstein/core/agents/heartbeat.py,HeartbeatMonitor) — stalled agents flagged for circuit breaker. - Log parsing against
completion_signalsdeclared per task (src/bernstein/core/tasks/task_completion.py): tests passed, files exist, lint clean, types clean. - Web dashboard polls
/status,/agents,/tasksendpoints (HTTP in-process). bernstein liverefresh interval configurable, default 2.0s (src/bernstein/cli/commands/advanced_cmd.py:49-52).- No LLM is used for status detection. Cost of detection: essentially free.
Overcode: 442 regex patterns + Claude Code hook events (instant, authoritative) with per-session toggle. Richer taxonomy for interactive Claude Code sessions.
6. Autonomy & Auto-Approval
Bernstein:
--headlessfor CI (JSON output, non-zero exit on failure,src/bernstein/cli/main.py:485).--auto-approveskips task approval prompt (main.py:497).-y/--yesskips cost confirmation in evolve mode (main.py:487).- Evolve mode safety:
--evolverequires a hard--budgetand--max-cycles;_validate_evolve_mode()refuses to start without them (main.py:440-464). - Quality gates (
src/bernstein/core/quality/, 60+ modules):lint_clean,test_passes,path_exists,llm_judge,type_check, coverage, PII scan, fingerprint, arch conformance — configured per-role inbernstein.yaml:33-38. - Circuit breaker (
src/bernstein/core/observability/circuit_breaker.py:1-120): trips on scope violations (edits outsideowned_files), budget violations, or behavioral anomalies (infinite loops). Writes.sdd/kill/{session_id}.json, appends tokill_audit.jsonl, quarantines branch. Orchestrator picks up signals inagent_lifecycle.check_kill_signals().
Overcode: Claude-powered supervisor daemon with standing instructions (25 presets), risk-assessed auto-approval, fork-with-context, oversight with stuck detection.
7. Supervision & Instruction Delivery
Bernstein:
- Manager LLM (
src/bernstein/core/orchestration/manager.py:1-100, prompts inmanager_prompts.py) runs once at decomposition — default Claude Sonnet. - Janitor module runs inside the orchestrator event loop, responsible for verification + merge decisions.
- Supervisor daemon at
src/bernstein/core/server/server_supervisor.pywatches the task queue when the server is running. - Mid-flight control is coarse: change task priority via
POST /tasks/{id},bernstein approve/bernstein rejectto walk approval gates (src/bernstein/cli/task_cmd.py,src/bernstein/core/orchestration/trigger_manager.py), or write a kill signal to.sdd/kill/{id}.json. No keystroke-level injection into running agents. - Cross-model review (
src/bernstein/core/quality/cross_model_verifier.py) — an independent model grades another agent's diff. - Token growth monitoring (
src/bernstein/core/tokens/token_monitor.py) + Z-score anomaly flagging (src/bernstein/core/cost/cost_anomaly.py); auto-intervention pauses spawning when thresholds trip. - Audit log: HMAC-chained JSONL in
.sdd/audit/YYYY-MM-DD.jsonl(src/bernstein/core/security/audit.py); integrity check inaudit_integrity.py.
Overcode: Heartbeat system delivers periodic instructions to idle agents; 25 standing-instruction presets; intervention history logged per agent; fork-with-context inherits conversation.
8. Cost & Budget Management
Bernstein:
TokenUsagedataclass (src/bernstein/core/cost/cost_tracker.py:52-80) tracksinput_tokens,output_tokens,cache_hit,cache_read_tokens,cache_write_tokens.- Per-model pricing table
_MODEL_COST_USD_PER_1K(cost_tracker.py:27) covers Claude opus/sonnet/haiku, Codex gpt-5.4/mini, Gemini, etc. - Cache economics: read @ ~10%, write @ ~25% of base price; optimizer in
src/bernstein/core/tokens/claude_prompt_cache_optimizer.py. - Cost records persisted to
.sdd/runtime/costs/{run_id}.json. - Budget enforcement is hard: thresholds at 80% (warn), 95% (critical), 100% (stop spawning) —
DEFAULT_HARD_STOP_THRESHOLD(cost_tracker.py:42-44). bernstein cost [--since 7d] [--by-model] [--by-task]reporting (src/bernstein/cli/commands/cost.py:1-100).- Z-score anomaly detection flags tasks >2σ from baseline (
src/bernstein/core/cost/cost_anomaly.py) and emits predictive alerts (notifications.py:53-66:predictive.budget_exhaustion,predictive.completion_decline,predictive.run_overrun).
Overcode: Per-agent $ budgets with soft enforcement (warn + skip, not kill). Cost display in TUI columns.
9. Agent Hierarchy & Coordination
Bernstein:
- No peer agent-to-agent chat — explicitly unsupported (
README.md:148). - Manager/worker pattern only: one-shot LLM decomposes the goal, deterministic scheduler assigns tasks to workers.
- Roles (backend, frontend, QA, docs, security) defined in role policy; workers never see each other's raw output — only merged results in main.
- Task dependencies create an implicit DAG (
src/bernstein/core/tasks/models.py—dependenciesfield). - Cascade routing: on failure,
src/bernstein/core/routing/cascade_router.pyescalates to a stronger model (haiku → sonnet → opus). - Bandit router (
src/bernstein/core/cost/bandit_router.py:1-120) learns optimal (model, effort) per task type via LinUCB. - Cascade operations: kill signals propagate, budget limits cap all children under a run.
Overcode: Parent/child trees 5 levels deep, cascade kill, fork with full context, agent-to-agent messaging via instruction delivery.
10. TUI / UI
Bernstein:
- Framework: Textual ≥1.0 (
pyproject.toml:52); main appBernsteinApp(App[None])insrc/bernstein/tui/app.py:15-56. - Layout: three-column
bernstein livedashboard — agent list (left), task list (center), activity feed + sparkline + chat input (right). Widgets insrc/bernstein/tui/widgets.py(AgentLogWidget,TaskListWidget,CoordinatorDashboard,TimelineEntry). - Keybindings declared in
BINDINGS(app.py:66-82):/search,qquit,?help,Spaceapprove, arrow keys navigation. Additional bindings configurable in~/.bernstein/keybindings.yaml(resolver inkeybinding_config.py). - Classic mode:
--classicswaps Textual for Rich Live (simpler, no mouse —src/bernstein/cli/commands/advanced_cmd.py:56-58). - Customization: keybindings, refresh interval, theme via config.
Overcode: ~50+ keybindings, timeline view, configurable columns, Textual framework, side-question subagents.
11. Terminal Multiplexer Integration
Bernstein: Not used. Agents are bare subprocesses with piped stdio; no tmux/zellij/screen dependency. The orchestrator owns lifecycle directly.
Overcode: tmux-backed, one window per agent, layout management, live pane visibility.
12. Configuration
Bernstein.yaml options (observed in bernstein.yaml:1-45 and docs/CONFIG.md):
goal,cli(adapter),max_agents,role_model_policy,budget(USD; 0 = unlimited),internal_llm_provider,internal_llm_model,evolution_enabled,auto_decompose,constraints(e.g. "Pyright strict", "Ruff"),quality_gates(per-role gate list),context_files,agency.path,worktree: true/false.
Environment variables (grep of BERNSTEIN_* + related):
BERNSTEIN_SERVER_URL(defaulthttp://localhost:8052,src/bernstein/tui/app.py:91)BERNSTEIN_WORKDIR,BERNSTEIN_ADAPTER,BERNSTEIN_HEADLESS,BERNSTEIN_DEBUGGITHUB_APP_ID,GITHUB_APP_PRIVATE_KEY,GITHUB_WEBHOOK_SECRET(src/bernstein/github_app/app.py:62-80)
Hook / event system (docs/hook-system.md, src/bernstein/plugins/hookspecs.py:92-150):
- pluggy-based. Entry-point group
bernstein.plugins. - Hooks:
before_spawn,after_spawn,before_merge,after_merge,on_failure,on_completion,on_budget_warning. - Orchestration by
PluginManager(src/bernstein/plugins/manager.py).
Overcode: TOML config, per-project + global, supervisor/heartbeat-focused options, shell hooks into Claude Code.
13. Web Dashboard / Remote Access
Bernstein:
- Stack: FastAPI ≥0.115 + uvicorn ≥0.30 (
pyproject.toml:46-47), built viasrc/bernstein/core/server/server_app.py:20-100. - Port: 8052 default (
src/bernstein/cli/commands/advanced_cmd.py:181). - Routes: 51 modules under
src/bernstein/core/routes/—/status,/tasks,/agents,/cost,/logs,/health, MCP, A2A. - Real-time: Server-Sent Events bus in the server app factory.
- Cluster mode: multi-node via shared task store (Postgres backend in
src/bernstein/core/persistence/store_postgres.py; node registry insrc/bernstein/core/cluster.py; Redis-optional leader election). - Multi-repo workspaces: task.
repo_pathtargets individual repositories.
Overcode: HTTP API + web dashboard with analytics; Sister integration for cross-machine monitoring.
14. Git / VCS Integration
Bernstein:
- Worktree lifecycle:
git worktree add .sdd/worktrees/{id} -b agent/{id}on spawn;git worktree remove+ branch delete on cleanup (src/bernstein/core/git/git_ops.py). - Commits: made inside the worktree by agents; janitor verifies before merge (
src/bernstein/core/git/git_ops.py). - PRs: optional — successful tasks can open PRs instead of direct merges (
src/bernstein/core/git/git_pr.py). - GitHub App: webhooks, JWT + installation token auth (
src/bernstein/github_app/app.py:1-100,webhooks.py), slash commands/bernstein fix-lint,/bernstein run-tests(slash_commands.py), check runs (check_runs.py). - Conflict handling: conflicts fail the task — no auto-resolution.
Overcode: No automated merges or PRs (relies on user); Claude Code sessions do commits.
15. Notifications & Attention
Bernstein: Multi-channel fan-out in src/bernstein/core/communication/notifications.py:1-100 and desktop_notify.py:
- Desktop (macOS/Linux via
notify-sendor native APIs). - Slack (Block Kit), Discord (embeds), Telegram (bot), PagerDuty (severity-mapped,
notifications.py:89-96), SMTP email (notifications.py:32-37), generic webhooks. - Events:
run.started,task.completed,task.failed,run.completed,budget.warning(80%),budget.exhausted(100%),approval.needed,incident.critical, predictive alerts (notifications.py:53-66).
Overcode: No native notifications (documented gap).
16. Data & Analytics
Bernstein:
- Session history: everything under
.sdd/is inspectable; WAL is JSONL (src/bernstein/core/persistence/wal.py). - Export formats: JSON (task records, costs, logs), Parquet (columnar, optional), CSV (cost summaries).
- Metrics: Prometheus
/metricsendpoint (prometheus-clientinpyproject.toml:58, metric defs insrc/bernstein/core/prometheus.py) —agents_spawned_total,tasks_completed_total,task_duration_seconds,cost_usd,tokens_consumed. - OpenTelemetry exporter presets; Grafana dashboards shipped as pre-built bundles.
bernstein recappost-run summary andbernstein run-changelog --hours 48for diff-based change logs.
Overcode: Parquet export, web analytics, timeline view.
17. Extensibility
Bernstein:
- pluggy plugin system (
src/bernstein/plugins/manager.py): write@hookimplfunctions, register viabernstein.pluginsentry point. - MCP bidirectional: serve via
bernstein mcp-server --port 3000(src/bernstein/core/protocols/mcp_gateway.py), consume via skills injection into agents (src/bernstein/adapters/skills_injector.py). MCP 1.0 and 1.1 compatible (README badges). - A2A protocol (experimental) with
/a2a/tasks/send,/a2a/messages,/a2a/agent.jsondiscovery (src/bernstein/core/protocols/a2a.py:10-42) — use case is federation with external orchestrators. - Custom adapters: subclass
CLIAdapter(src/bernstein/adapters/base.py:1-150),register_adapter()or entry-point (src/bernstein/adapters/registry.py:114-122).
Overcode: Limited extensibility; no public plugin system.
18. Developer Experience
Bernstein:
- Install: pip / pipx / uv /
brew install chernistry/tap/bernstein/ Fedora COPR /npx bernstein-orchestrator(README.md:183-192). VS Code + Open VSX extensions. - First run:
bernstein initscaffolds.sdd/andbernstein.yaml;bernstein init-wizardis interactive (src/bernstein/cli/commands/init_wizard_cmd.py);bernstein quickstart/bernstein demo(60-second Flask TODO). - Docs: 135 markdown files under
docs/, includingGETTING_STARTED.md,ARCHITECTURE.md,FEATURE_MATRIX.md,GLOSSARY.md,KNOWN_LIMITATIONS.md,WHY_DETERMINISTIC.md,the-bernstein-way.md, migration guides from CrewAI / LangGraph, Cloudflare setup. ReadTheDocs (mkdocs.yml). - Tests: 1,043 test files spanning
tests/unit,tests/integration,tests/golden(deterministic),tests/chaos(failure injection),tests/pentest(security),tests/benchmarks. - CI: GitHub Actions (badge in README,
.github/workflows/ci.yml); codecov (codecov.yml); SonarCloud (sonar-project.properties); mutation testing (mutmut_config.py); dead-code detection (vulture_whitelist.py);typos.toml. - Key deps (
pyproject.toml): click≥8.1, rich≥13, textual≥1.0, fastapi≥0.115, uvicorn≥0.30, httpx≥0.27, websockets≥14, pydantic-settings≥2.13.1, pluggy≥1.5, mcp≥1.0, cryptography≥45, prometheus-client≥0.21, opentelemetry-*≥1.30, watchdog≥4.
Overcode: Python project, ~1700 tests, fixtures-based testing, no public install path.
Unique / Notable Features
- "Zero LLM tokens on scheduling" — the orchestrator is pure deterministic Python; the LLM is invoked once per run for decomposition and optionally for cross-model review. Auditable and reproducible (
docs/WHY_DETERMINISTIC.md,docs/the-bernstein-way.md:43-49). - Self-evolution (
--evolve) — Bernstein reads its own codebase, proposes improvements, executes them, and can merge to main, all under a hard budget and cycle cap with experiment log at.sdd/evolution/experiments.jsonl(src/bernstein/evolution/loop.py, safety checks atsrc/bernstein/cli/main.py:440-464). - Contextual bandit router (LinUCB) —
src/bernstein/core/cost/bandit_router.py:1-120. Feature vector includes complexity, scope, priority, repo size, token estimate, task type, language, role; reward isquality_score * (1 - normalized_cost). 50-task warmup, state persisted to.sdd/routing/bandit_state.json. - Output fingerprinting — MinHash/LSH similarity check against an OSS corpus to detect copy-paste from training data (license risk).
bernstein fingerprint build --corpus-dir ~/oss-corpusandbernstein fingerprint check(src/bernstein/core/quality/output_fingerprint.py:1-100). - HMAC-chained tamper-evident audit log — each entry's HMAC includes the previous entry's HMAC;
bernstein audit checkverifies the chain (src/bernstein/core/security/audit.py,audit_integrity.py). - Knowledge graph of codebase (SQLite) — files, functions, classes, edges (imports, calls, inherits), used for owned-file suggestions, task routing, and LLM context (
src/bernstein/core/knowledge/knowledge_graph.py:1-100). - Cloudflare edge execution — agents run in V8 isolates on Workers with R2 workspace sync, Workers AI as LLM provider, D1 for metrics, Vectorize for semantic cache, browser rendering (
src/bernstein/bridges/cloudflare.py:46-100). - Semantic cache — Vectorize-backed chunk cache reduces re-reading of the same files (
src/bernstein/core/knowledge/semantic_cache.py). - Quality gate matrix — lint + type-check + tests + PII scan + arch conformance + fingerprint +
llm_judge(src/bernstein/core/quality/, 60+ gate modules) composable per role. - Predictive alerts —
predictive.budget_exhaustion,predictive.completion_decline,predictive.run_overrunfired before they actually happen (src/bernstein/core/communication/notifications.py:53-66). - MCP + A2A federation — bidirectional MCP and experimental A2A protocol for agent discovery across orchestrators (
src/bernstein/core/protocols/mcp_gateway.py,src/bernstein/core/protocols/a2a.py). - Cross-model verifier — an independent model judges another agent's diff as a gate (
src/bernstein/core/quality/cross_model_verifier.py).
Strengths Relative to Overcode
- End-to-end merge workflow. Bernstein owns the loop from goal → decomposition → spawn → verify → merge, with PR creation and conflict quarantining. Overcode stops at "agent produced changes; user reviews." Specific: worktree creation (
src/bernstein/core/git/worktree.py), janitor verification (src/bernstein/core/quality/gate_runner.py), merge with quarantine fallback (spawner_merge.py). - Concrete verification gates. Overcode trusts Claude's self-reporting; Bernstein checks tests-pass, lint-clean, type-check, path-exists as hard signals before merge. The
completion_signalsfield forces authors of goals to state what success looks like up front. - Provider agnosticism. 18+ adapters with pluggy third-party extension. Overcode is Claude-Code-only.
- CI/headless mode.
bernstein --headlessemits JSON, non-zero exit on failure, integrates with GitHub App check runs. Overcode is interactive-first. - Cost governance. Hard budget stops at 80/95/100%, Z-score anomaly detection, per-model tracking with cache-aware pricing. Overcode's budget is soft-skip.
- Audit and compliance. HMAC-chained audit log satisfies regulated-industry requirements Overcode doesn't address.
- Installable. PyPI, Homebrew, COPR, npm. Overcode is private.
- Git isolation by default. Worktree-per-agent prevents the "two agents editing the same file" class of problems entirely. Overcode assumes human supervision catches this.
- GitHub App integration. Slash commands on PRs (
/bernstein fix-lint), check runs, webhook-driven triggers. - ML routing. LinUCB bandit learns model selection per task. Overcode has no model selection policy.
- Documentation depth. 135 docs, migration guides, ReadTheDocs, ADRs in
docs/decisions/. Overcode has design docs only. - Observability. Prometheus/OTel/Grafana out of the box vs Overcode's in-TUI metrics only.
Overcode's Relative Strengths
- Live supervision of long-lived sessions. Overcode is built for the "agent that's been running for an hour and is stuck" case. Bernstein's agents are short-lived; once spawned they run to completion or die. Overcode's heartbeat + standing instructions (25 presets) + Claude-powered supervisor daemon is a whole layer Bernstein doesn't have.
- tmux-native. Users who want to drop into a pane and type at the agent can. Bernstein agents run as piped subprocesses with no shell visibility.
- Agent hierarchy and fork-with-context. Overcode supports parent/child trees 5 levels deep with cascade kill and conversation forking. Bernstein's hierarchy is a flat task DAG emitted by the manager.
- Status detection for Claude specifically. 442 regex patterns + Claude Code hooks give zero-latency authoritative detection with a rich taxonomy (waiting_user, waiting_approval, error, compact_running, side_question, etc.). Bernstein's status is coarse (RUNNING/STALLED/DEAD) because it doesn't need more.
- Keyboard-dense TUI. ~50+ bindings and a configurable column model; Bernstein's TUI is lighter-weight.
- Mid-flight intervention. Overcode can inject an instruction into a running agent's tmux pane. Bernstein's equivalents are coarse (kill signal, approve/reject at gates).
- Side-question subagents with token accounting. Overcode handles compact/side-question subagents and keeps their tokens from double-counting (bcfb3a6). Bernstein doesn't expose a comparable interactive aside.
- Sister cross-machine integration for monitoring agents across hosts.
Adoption Candidates
| Idea | Value | Complexity | Notes |
|---|---|---|---|
Completion-signal fields on tasks (tests_pass, path_exists, lint_clean, type_check) with an automatic verifier before marking done | High | Medium | Would force users to articulate "done" up front; Overcode could run these as an optional lint before the user clears an agent. Source of pattern: src/bernstein/core/quality/ gate modules and src/bernstein/core/tasks/task_completion.py. |
| Hard budget thresholds at 80/95/100% with predictive alerts before exhaustion | High | Low | Overcode's soft-skip is easy to miss; a hard 100% stop plus an 80% warning is a small change with large blast-radius reduction. See src/bernstein/core/cost/cost_tracker.py:42-44 and notifications.py:53-66. |
| HMAC-chained audit log for agent spawn/kill/instruction/budget events | Medium | Medium | Small compliance upside for users in regulated industries; implementation is ~200 LOC. Reference: src/bernstein/core/security/audit.py + audit_integrity.py. |
| Z-score cost anomaly detection per task type | Medium | Low | Flags a runaway agent 10× faster than a flat threshold. Reference: src/bernstein/core/cost/cost_anomaly.py. |
--headless mode with JSON output and non-zero exit on failure | High | Low | Unlocks CI/cron use cases and makes Overcode scriptable outside a terminal. Reference: src/bernstein/cli/main.py:485. |
init-wizard / doctor / debug-bundle commands for onboarding and support | Medium | Low | Huge DX wins: bernstein doctor pre-flights environment; bernstein debug-bundle zips logs + config for bug reports. |
| Predictive alerts (budget exhaustion, completion-rate decline, run-overrun) | Medium | Medium | Acts on trend, not state. Reference: src/bernstein/core/communication/notifications.py:53-66. |
| Pluggy plugin system for user-defined hooks (before_spawn, after_spawn, on_failure) | Medium | Medium | Lets users add org-specific approval gates, telemetry, secret scanners. Reference: src/bernstein/plugins/hookspecs.py:92-150. |
| Structured agent state enum with color + spinner frame per state | Low | Low | Nicer-looking TUI. Reference: src/bernstein/tui/agent_states.py:24-79. |
Prometheus /metrics endpoint from the Overcode web dashboard | Medium | Low | Overcode already has a dashboard; adding /metrics is trivial and opens Grafana integration. Reference: src/bernstein/core/prometheus.py. |
| Multi-channel notification fan-out (Slack/Discord/PagerDuty/email/webhook) with event taxonomy | Medium | Medium | Overcode's documented gap is notifications; copy the event catalogue. Reference: src/bernstein/core/communication/notifications.py. |
| Optional worktree isolation mode for Overcode users who do want it | High | High | Would close the biggest feature gap. Bernstein's WorktreeSetupConfig with shallow clone + symlinks (src/bernstein/core/git/worktree.py:42-73) is a good reference implementation. |
Analyzed: 2026-04-15. Source repo cloned to ~/Code/bernstein at commit HEAD of main.