棟梁 Toryo

April 27, 2026


The intelligent agent orchestrator. Not just parallel agents — the full self-improving development loop.

棟梁 (toryo) — Japanese for "master builder" or "foreman." The toryo is the person who oversees the entire construction crew, assigns specialists to the right tasks, and ensures every piece meets quality standards before it stays in the structure.

Toryo chains multiple AI coding agents (Claude Code, Aider, Gemini CLI, Codex, Ollama) with spec-driven workflows, trust-based delegation, quality ratcheting, and a real-time dashboard.

npx @jweigel/toryo init    # scaffold config + task specs
npx @jweigel/toryo run     # start orchestration

Try instantly: npx @jweigel/toryo demo (no AI tools needed)


Screenshot: Toryo run, a full cycle with a Ralph Loop retry


Why Toryo?

Running AI agents in parallel is easy. Making them work together intelligently is hard.

Problem | How Toryo solves it
--- | ---
Agents produce low-quality output | Quality ratcheting: only git-commit results that pass QA (score ≥ threshold); auto-revert everything else.
No way to retry failures | Ralph Loop: failed attempts get QA feedback routed back to the agent for a retry before being discarded.
Which agent should do what? | Trust-based delegation: agents earn autonomy through consistent scores; new agents start supervised.
Context gets lost between steps | Smart truncation: strips boilerplate, preserves substance, and feeds optimal context to each phase.
Agent output disappears | Auto-extraction: code blocks and skills are saved to disk automatically as the agent produces them.
No visibility into what's happening | Real-time dashboard: live event feed, agent status cards, results table, metrics.
One agent isn't enough | Pluggable adapters: mix and match Claude Code, Aider, Gemini CLI, Ollama, or any CLI tool.

How It Works

Toryo runs cycles. Each cycle has 4 phases:

📋 Plan → 🔍 Research → ⚡ Execute → ✅ Review
  1. Plan — An agent reads the task spec and creates a brief
  2. Research — An agent gathers context and information
  3. Execute — An agent writes code, tests, or documentation
  4. Review — An agent scores the output (1-10) and provides feedback

After review:

  • Score ≥ threshold → git commit (keep)
  • Score < threshold → git revert → Ralph Loop retry → keep or discard
  • Infrastructure failure → log as crash, skip scoring

This is the ratcheting pattern from Karpathy's autoresearch: only forward progress gets committed. Bad results are automatically reverted.
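The cycle and the keep/revert/crash decision can be sketched as follows. This is a minimal illustration, not Toryo's orchestrator: it assumes each phase is an async function that receives the previous phase's output, and that infrastructure failures surface as thrown errors. `CycleDeps`, `runCycle` and the outcome shape are hypothetical names.

```typescript
// Hypothetical shapes for illustration; Toryo's real interfaces may differ.
interface Review {
  score: number; // 1-10, assigned by the reviewer agent
  feedback: string;
}

interface CycleDeps {
  plan: (spec: string) => Promise<string>;       // spec -> brief
  research: (brief: string) => Promise<string>;  // brief -> gathered context
  execute: (context: string) => Promise<string>; // context -> code, tests or docs
  review: (output: string) => Promise<Review>;   // output -> score + feedback
}

type CycleOutcome =
  | { status: "keep"; review: Review }   // commit
  | { status: "revert"; review: Review } // revert, then possibly a Ralph Loop retry
  | { status: "crash"; error: string };  // infrastructure failure, not scored

async function runCycle(spec: string, deps: CycleDeps, threshold: number): Promise<CycleOutcome> {
  try {
    const brief = await deps.plan(spec);
    const context = await deps.research(brief);
    const output = await deps.execute(context);
    const review = await deps.review(output);
    // Ratchet decision: only scores at or above the threshold are kept.
    if (review.score >= threshold) return { status: "keep", review };
    return { status: "revert", review };
  } catch (err) {
    // Assumption: any thrown error (timeout, missing CLI, network) counts as a crash.
    return { status: "crash", error: err instanceof Error ? err.message : String(err) };
  }
}
```

Keeping the phases as injected functions makes the decision logic testable without any agent installed, which mirrors how `npx @jweigel/toryo demo` runs without AI tools.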

Quick Start

1. Initialize

npx @jweigel/toryo init

Creates:

  • toryo.config.json — agent definitions, quality gates, delegation rules
  • specs/ — task specifications (YAML frontmatter + markdown)

2. Configure Agents

Edit toryo.config.json:

{
  "agents": {
    "researcher": {
      "adapter": "claude-code",
      "strengths": ["research", "analysis"],
      "timeout": 900
    },
    "coder": {
      "adapter": "ollama",
      "model": "qwen3.5:27b",
      "strengths": ["code", "architecture"],
      "timeout": 900
    },
    "reviewer": {
      "adapter": "claude-code",
      "strengths": ["review", "scoring"],
      "timeout": 600
    }
  }
}
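The `agents` block can be checked before a run starts. A minimal sketch, assuming `timeout` is in seconds (the 900/600 values suggest this, but it is an assumption) and taking the adapter names from the Adapters table below; `AgentConfig` and `validateAgents` are hypothetical, not Toryo's published types.

```typescript
// Adapter names as listed in this README's Adapters table.
const KNOWN_ADAPTERS = ["claude-code", "aider", "gemini-cli", "ollama", "codex", "cursor", "cline", "custom"] as const;
type AdapterName = (typeof KNOWN_ADAPTERS)[number];

// Hypothetical shape mirroring the JSON example above.
interface AgentConfig {
  adapter: AdapterName;
  model?: string;       // e.g. an Ollama model tag
  strengths?: string[]; // used when matching tasks to agents
  timeout?: number;     // assumed to be seconds
}

// Returns a list of human-readable problems; an empty list means the block looks valid.
function validateAgents(agents: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [name, raw] of Object.entries(agents)) {
    const a = (raw ?? {}) as Partial<Record<keyof AgentConfig, unknown>>;
    if (typeof a.adapter !== "string" || !(KNOWN_ADAPTERS as readonly string[]).includes(a.adapter)) {
      errors.push(`${name}: unknown adapter ${JSON.stringify(a.adapter)}`);
    }
    // Assumption: an Ollama agent needs a model name, since the HTTP API is model-addressed.
    if (a.adapter === "ollama" && typeof a.model !== "string") {
      errors.push(`${name}: the ollama adapter needs a "model"`);
    }
    if (a.timeout !== undefined && (typeof a.timeout !== "number" || a.timeout <= 0)) {
      errors.push(`${name}: timeout must be a positive number`);
    }
  }
  return errors;
}
```

`toryo check` is the documented way to validate a config; this sketch only shows the kind of checks such a command might perform.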

3. Write Task Specs

Create markdown files in specs/:

---
name: Write Unit Tests
difficulty: 0.5
tags: [testing]
phases:
  plan: auto
  research: auto
  execute: coder
  review: reviewer
---

Write tests for uncovered modules. Focus on edge cases.

## Acceptance Criteria
- [ ] Tests cover at least one untested module
- [ ] All tests pass
- [ ] Edge cases are covered
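A spec is YAML frontmatter between `---` lines followed by a markdown body. A real loader would use a YAML library; the sketch below handles only the subset used in the example above (scalars, inline `[a, b]` lists, and one level of nested keys such as `phases`) and counts the acceptance-criteria checkboxes. `parseSpec` and `ParsedSpec` are hypothetical names.

```typescript
type Value = string | number | string[] | Record<string, string>;

interface ParsedSpec {
  meta: Record<string, Value>;
  body: string;
  criteria: { total: number; done: number };
}

function parseScalar(v: string): string | number | string[] {
  if (v.startsWith("[") && v.endsWith("]")) {
    return v.slice(1, -1).split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  }
  const n = Number(v);
  return v !== "" && !Number.isNaN(n) ? n : v;
}

function parseSpec(text: string): ParsedSpec {
  const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!m) throw new Error("spec is missing a --- frontmatter block");
  const [, front, body] = m;
  const meta: Record<string, Value> = {};
  let parent: Record<string, string> | null = null; // currently open nested map, e.g. "phases:"
  for (const line of front.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const nested = /^\s+([\w-]+):\s*(.*)$/.exec(line);
    if (nested && parent) {
      parent[nested[1]] = nested[2];
      continue;
    }
    const top = /^([\w-]+):\s*(.*)$/.exec(line);
    if (!top) throw new Error(`unsupported frontmatter line: ${line}`);
    if (top[2] === "") {
      parent = {}; // a key with no value opens a nested map
      meta[top[1]] = parent;
    } else {
      parent = null;
      meta[top[1]] = parseScalar(top[2]);
    }
  }
  // Acceptance criteria are task-list items: "- [ ]" (open) or "- [x]" (done).
  const boxes = body.match(/^- \[[ xX]\]/gm) ?? [];
  const done = boxes.filter((b) => /\[[xX]\]/.test(b)).length;
  return { meta, body, criteria: { total: boxes.length, done } };
}
```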

4. Run

npx @jweigel/toryo run              # run indefinitely
npx @jweigel/toryo run -n 10        # run 10 cycles
npx @jweigel/toryo run --dry-run    # preview without executing
npx @jweigel/toryo check            # validate config + tools
npx @jweigel/toryo status           # check metrics + agent trust
npx @jweigel/toryo dashboard        # open web dashboard

Adapters

Toryo ships with first-class adapters for 7 tools + a generic adapter for anything else:

Adapter | Tool | How it works
--- | --- | ---
claude-code | Claude Code | claude --print (non-interactive)
aider | Aider | aider --message
gemini-cli | Gemini CLI | gemini --prompt
ollama | Ollama | Direct HTTP API (no CLI needed)
codex | Codex CLI | codex exec
cursor | Cursor CLI | agent -p --force (requires CURSOR_API_KEY)
cline | Cline CLI | cline --yolo
custom | Any CLI tool | Configurable command + args
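The `custom` adapter takes a configurable command plus args. A minimal sketch of how such an adapter might turn a config entry into a process invocation, assuming a `{prompt}` placeholder inside the args; the placeholder and the `command`/`args`/`timeoutSec` field names are hypothetical, so check examples/toryo.config.json for the real keys. Substituting per argument and calling `execFile` (no shell) keeps the prompt a single argv entry, so quotes or semicolons in it are never interpreted.

```typescript
import { execFile } from "node:child_process";

// Hypothetical config shape for a "custom" adapter.
interface CustomAdapterConfig {
  command: string;
  args: string[];      // "{prompt}" is replaced with the task prompt
  timeoutSec?: number; // assumed default of 900, matching the examples above
}

function buildInvocation(cfg: CustomAdapterConfig, prompt: string): { command: string; args: string[] } {
  // Replace the placeholder inside each argument; the prompt is never shell-parsed.
  const args = cfg.args.map((a) => a.split("{prompt}").join(prompt));
  return { command: cfg.command, args };
}

// Runs the tool and resolves with its stdout, or rejects on non-zero exit or timeout.
function runCustomAdapter(cfg: CustomAdapterConfig, prompt: string): Promise<string> {
  const { command, args } = buildInvocation(cfg, prompt);
  return new Promise((resolve, reject) => {
    execFile(
      command,
      args,
      { encoding: "utf8", timeout: (cfg.timeoutSec ?? 900) * 1000, maxBuffer: 10 * 1024 * 1024 },
      (err, stdout) => (err ? reject(err) : resolve(stdout)),
    );
  });
}
```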

Mix and match — use Claude Code for research, Ollama for local code generation, and Gemini for review:

{
  "agents": {
    "researcher": { "adapter": "claude-code" },
    "coder": { "adapter": "ollama", "model": "qwen3.5:27b" },
    "reviewer": { "adapter": "gemini-cli" }
  }
}

Trust-Based Delegation

Agents start at supervised autonomy and earn trust through consistent high scores:

Level | Trust | Behavior
--- | --- | ---
🔴 Supervised | < 0.6 | Strict instruction following, precise format
🟡 Guided | ≥ 0.6 and < 0.8 | Follow spec but suggest improvements
🟢 Autonomous | ≥ 0.8 | Take initiative, be creative, report after

Trust is calculated from a rolling average of review scores. An agent that consistently scores 8+/10 earns autonomous mode. An agent whose trust drops below the 0.6 threshold is demoted back to supervised.

Trust = min(avg_score / 10, 1.0)
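The formula and the level table can be sketched directly. This assumes the rolling average covers the most recent `scoreWindow` scores (50 by default, per the Configuration Reference) and that an agent with no scores yet uses `initialTrust` (0.5 by default); `computeTrust` and `autonomyLevel` are hypothetical names. Boundaries follow the table: exactly 0.6 is guided and exactly 0.8 is autonomous.

```typescript
type AutonomyLevel = "supervised" | "guided" | "autonomous";

// Trust = min(avg_score / 10, 1.0), averaged over the most recent `window` scores.
function computeTrust(scores: number[], window = 50, initialTrust = 0.5): number {
  const recent = scores.slice(-window);
  if (recent.length === 0) return initialTrust; // new agent: no history yet
  const avg = recent.reduce((sum, s) => sum + s, 0) / recent.length;
  return Math.min(avg / 10, 1.0);
}

function autonomyLevel(trust: number): AutonomyLevel {
  if (trust >= 0.8) return "autonomous"; // take initiative, report after
  if (trust >= 0.6) return "guided";     // follow spec, suggest improvements
  return "supervised";                   // strict instruction following
}
```

With the default `initialTrust` of 0.5, a brand-new agent maps to supervised, which matches "New agents start supervised."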

When a task comes in, Toryo matches it to the best agent based on the task's profile (research-heavy? code-heavy? review?) and each agent's strengths + current trust level.
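The exact matching rule is not specified here. One plausible way to combine strengths and trust, sketched purely as an assumption: prefer the agent whose `strengths` overlap most with the task's tags, and use trust only to break ties between equally suited agents. `CandidateAgent` and `pickAgent` are hypothetical.

```typescript
interface CandidateAgent {
  name: string;
  strengths: string[]; // from toryo.config.json
  trust: number;       // 0..1, from the rolling score average
}

// Hypothetical rule: more matching strengths wins; among equals, higher trust wins.
function pickAgent(taskTags: string[], agents: CandidateAgent[]): CandidateAgent | undefined {
  let best: CandidateAgent | undefined;
  let bestOverlap = -1;
  for (const agent of agents) {
    const overlap = taskTags.filter((t) => agent.strengths.includes(t)).length;
    const better = overlap > bestOverlap || (overlap === bestOverlap && best !== undefined && agent.trust > best.trust);
    if (better) {
      best = agent;
      bestOverlap = overlap;
    }
  }
  return best; // undefined only when no agents are configured
}
```

A lexicographic comparison (overlap first, then trust) avoids the case where a very trusted but unsuited agent outranks a suited one, which an additive score could allow.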

Quality Ratcheting

Inspired by Karpathy's autoresearch pattern:

Score ≥ 6.0 → git commit ✓
Score < 6.0 → git revert → Ralph Loop retry

                      Retry passes → git commit ✓
                      Retry fails  → discard, move on
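The ratchet and the Ralph Loop combine into a small retry loop: QA feedback from a failed attempt is fed into the next attempt, up to `maxRetries` times. A minimal sketch, assuming `attempt` runs execute plus review, `commit` records the change, and `revert` discards it; the callbacks and `ratchetWithRetries` are hypothetical, and the real ratchet may commit or revert differently depending on `gitStrategy`.

```typescript
interface QaReview {
  score: number;
  feedback: string;
}

interface RatchetResult {
  status: "keep" | "discard";
  retries: number; // Ralph Loop retries used
  review: QaReview;
}

async function ratchetWithRetries(
  attempt: (feedback: string | null) => Promise<QaReview>, // execute + review; null on the first try
  commit: () => Promise<void>,
  revert: () => Promise<void>,
  threshold = 6.0,
  maxRetries = 1,
): Promise<RatchetResult> {
  let feedback: string | null = null;
  for (let retries = 0; ; retries++) {
    const review = await attempt(feedback);
    if (review.score >= threshold) {
      await commit(); // only forward progress is committed
      return { status: "keep", retries, review };
    }
    await revert(); // never leave a failing result in the tree
    if (retries >= maxRetries) return { status: "discard", retries, review };
    feedback = review.feedback; // Ralph Loop: route QA feedback back to the agent
  }
}
```

With the defaults (threshold 6.0, one retry), a 4.1 followed by a 7.5 ends as a keep after one retry, which is the situation the third results.tsv row below describes.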

Every result is logged to results.tsv (Karpathy format):

timestamp               cycle  task         agent   score  status   description
2026-03-19T10:15:00Z    42     write-tests  coder   8.2    keep     QA approved: PASS
2026-03-19T10:45:00Z    43     refactor     coder   4.1    discard  QA rejected: FAIL
2026-03-19T11:15:00Z    44     security     coder   7.5    keep     QA approved after retry 1: PASS
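As the .tsv extension implies, the file is assumed to be tab-separated with the seven columns shown (the example above is padded with spaces for display). A sketch of writing a row and deriving dashboard-style numbers (cycles, success rate, average score) from the file; `formatRow`, `parseResults` and `summarize` are hypothetical helpers, not the `@toryo/core` metrics API, and the `"crash"` status string is an assumption.

```typescript
type Status = "keep" | "discard" | "crash"; // "crash" assumed for infrastructure failures

interface ResultRow {
  timestamp: string;
  cycle: number;
  task: string;
  agent: string;
  score: number;
  status: Status;
  description: string;
}

const COLUMNS = ["timestamp", "cycle", "task", "agent", "score", "status", "description"] as const;
const HEADER = COLUMNS.join("\t");

function formatRow(r: ResultRow): string {
  // Tabs or newlines inside a field would break the TSV, so collapse them to spaces.
  const clean = (v: string | number) => String(v).replace(/[\t\r\n]+/g, " ");
  return COLUMNS.map((c) => clean(r[c])).join("\t");
}

function parseResults(tsv: string): ResultRow[] {
  const lines = tsv.split(/\r?\n/).filter((l) => l.trim() !== "");
  return lines.slice(1).map((line) => { // skip the header row
    const [timestamp, cycle, task, agent, score, status, ...rest] = line.split("\t");
    return { timestamp, task, agent, cycle: Number(cycle), score: Number(score), status: status as Status, description: rest.join(" ") };
  });
}

function summarize(rows: ResultRow[]): { cycles: number; successRate: number; avgScore: number } {
  const scored = rows.filter((r) => r.status !== "crash"); // crashes are not scored
  const kept = rows.filter((r) => r.status === "keep").length;
  const avgScore = scored.length ? scored.reduce((s, r) => s + r.score, 0) / scored.length : 0;
  return { cycles: rows.length, successRate: rows.length ? kept / rows.length : 0, avgScore };
}
```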

Configure thresholds in toryo.config.json:

{
  "ratchet": {
    "threshold": 6.0,
    "maxRetries": 1,
    "gitStrategy": "commit-revert"
  }
}

Dashboard

Real-time web dashboard showing agent status, results, and live events:

npx @jweigel/toryo dashboard
# Opens http://localhost:3456

Screenshot: Toryo dashboard

Features:

  • Agent status cards with trust scores and autonomy levels
  • Results table with sortable columns and color-coded status
  • Live event feed via WebSocket
  • Metrics summary (cycles, success rate, avg scores)

Notifications

Get notified on breakthroughs, failures, and periodic status:

{
  "notifications": {
    "provider": "ntfy",
    "target": "my-project-toryo",
    "events": ["breakthrough", "failure", "status"]
  }
}

Supported providers: ntfy, slack, discord, webhook
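For the ntfy provider, publishing is a plain HTTP POST of the message body to `https://ntfy.sh/<topic>`, and ntfy also reads optional headers such as `Title`, `Priority` and `Tags`. A sketch assuming `target` is the ntfy topic and that only event kinds listed in `events` are sent; `buildNtfyRequest`, `notify`, the `server` field and the failure-to-`high` priority mapping are hypothetical choices.

```typescript
type EventKind = "breakthrough" | "failure" | "status";

interface NotificationConfig {
  provider: "ntfy";
  target: string;     // assumed to be the ntfy topic name
  events: EventKind[];
  server?: string;    // hypothetical: defaults to the public ntfy.sh instance
}

interface NotifyEvent {
  kind: EventKind;
  title: string;
  message: string;
}

interface NtfyRequest {
  url: string;
  init: { method: "POST"; body: string; headers: Record<string, string> };
}

// Returns null when the event kind is not subscribed in `events`.
function buildNtfyRequest(cfg: NotificationConfig, ev: NotifyEvent): NtfyRequest | null {
  if (!cfg.events.includes(ev.kind)) return null;
  const server = (cfg.server ?? "https://ntfy.sh").replace(/\/+$/, "");
  return {
    url: `${server}/${encodeURIComponent(cfg.target)}`,
    init: {
      method: "POST",
      body: ev.message,
      headers: { Title: ev.title, Priority: ev.kind === "failure" ? "high" : "default", Tags: ev.kind },
    },
  };
}

async function notify(cfg: NotificationConfig, ev: NotifyEvent): Promise<void> {
  const req = buildNtfyRequest(cfg, ev);
  if (!req) return;
  const res = await fetch(req.url, req.init); // global fetch, Node 18+
  if (!res.ok) throw new Error(`ntfy responded with HTTP ${res.status}`);
}
```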

Architecture

Toryo is a composable TypeScript monorepo. Use the full orchestrator or individual pieces:

@toryo/core         — Engine: orchestrator, delegation, ratchet, metrics, extraction
@toryo/adapters     — Agent adapters: claude-code, aider, gemini-cli, ollama, custom
toryo               — CLI: init, run, status, dashboard

Each subsystem is a standalone factory function:

import { createDelegation, createRatchet, createMetrics } from '@toryo/core';

// `agentState` (an agent's tracked state) and `review` (a QA result) are
// placeholders for values your own code provides; they are not defined here.
// Top-level `await` requires running this as an ES module.

// Use just the delegation system
const delegation = createDelegation({ initialTrust: 0.5 });
const level = delegation.getAutonomyLevel(agentState);

// Use just the ratchet for git-based quality gates
const ratchet = createRatchet({ threshold: 7.0 }, process.cwd());
if (!ratchet.shouldKeep(review)) await ratchet.revert();

// Use just the metrics for experiment tracking
const metrics = createMetrics('.toryo');
await metrics.appendResult({ cycle: 1, score: 8.5, status: 'keep' /* , ... */ });

Compared to Other Tools

Most multi-agent tools do one thing — run agents in parallel (Composio, AMUX) or define specs (Spec Kit). Toryo is the full loop: spec → delegate → execute → review → ratchet → improve.

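Feature | Toryo | Composio | AMUX | CrewAI | Spec Kit
--- | --- | --- | --- | --- | ---
Multi-agent orchestration | | | | |
Heterogeneous CLIs | ✅ 5+ adapters | ✅ 8 slots | ❌ Claude only | ❌ API only |
Spec-driven workflows | | | | |
Trust-based delegation | | | | |
Quality ratcheting | | | | |
Ralph Loop retries | | | | |
Auto-extraction | | | | |
results.tsv tracking | | | | |
Local model first | ✅ Ollama native | | | |
Real-time dashboard | | | | |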

Configuration Reference

See examples/toryo.config.json for a complete example.

Field | Type | Default | Description
--- | --- | --- | ---
name | string | | Project name
agents | Record | | Agent definitions (adapter, model, strengths, timeout)
tasks | string or TaskSpec[] | | Path to specs dir or inline tasks
ratchet.threshold | number | 6.0 | Minimum QA score to keep
ratchet.maxRetries | number | 1 | Ralph Loop max retries
ratchet.gitStrategy | string | "commit-revert" | "commit-revert", "branch-per-task", or "none"
delegation.initialTrust | number | 0.5 | Starting trust for new agents
delegation.scoreWindow | number | 50 | Rolling window for score averaging
outputDir | string | ".toryo" | Where to store results, metrics, artifacts
notifications.provider | string | "none" | "ntfy", "slack", "discord", "webhook", "none"

License

MIT

Credits

Built on patterns from: