
June 25, 2026

Every agent run, recorded and replayable.

Kitaru (来る, "to arrive") is a self-hosted, framework-agnostic runtime for autonomous agents — underneath the harness your team already picked. You keep your agent SDK, your prompts, your tools, your model. Kitaru records every step of every run — each model call, tool call, and decision — as a replayable checkpoint, so you can diagnose failures, replay runs with a different model or input, and ship agent updates with confidence. All on your own infrastructure.



🧩 Where Kitaru fits

Agent stacks break cleanly into four layers. Kitaru is exactly one of them.

Layer	What it does	Examples
Model	The LLM itself — a compute unit over a context window	OpenAI, Anthropic, Google, open-weights, fine-tuned in-house
Harness	The loop around the model — prompts, tools, model loop, framework choice	Pydantic AI / Pydantic AI Harness, LangGraph, Claude Agent SDK, OpenAI Agents SDK, raw Python
Runtime (Kitaru)	How the agent's runs are recorded, replayed, and improved over time — checkpoints, replay, resume, `wait()`, versioned deployments, isolated runtimes	`@flow`, `@checkpoint`, `flow.deploy()`, `kitaru.wait()`
Platform	How your org governs — auth, entitlements, interceptors, observability, product UI, policy	Your existing stack

Kitaru occupies the Runtime row. Harnesses define behavior, your stack defines policy, and Kitaru sits in between, providing the execution record and the replay loop.

If you're buying an agent platform, Kitaru may feel low-level. If you're building one, that's the point.

Platform teams get the execution layer they'd otherwise build themselves — run lifecycle, checkpoint recording, replay, invocation routing, and self-hosted execution — without mandating which harness application teams use on top.

🎯 Why Kitaru?

Record, replay, improve

Every step recorded. Each checkpoint output — model call, tool call, decision — is written to your object store as a typed, versioned artifact. Step through any run, diff artifacts across runs, and trace a bad output back to the exact step that produced it.
Replay with overrides. Re-run any execution from any checkpoint, and override what you want to test: swap the model, change a parameter, inject a different tool output — and see what would have happened before you ship the change.
Compare and decide. kitaru.llm() tracks prompt, response, tokens, and latency per call, so comparing runs answers questions like "would a smaller model have done this cheaper?" with evidence instead of vibes.

Production mechanics

Crash recovery. A crash, pod eviction, or timeout doesn't send the run back to zero. Fix the bug, replay, and the completed checkpoints return cached output instead of re-burning tokens.
Pause and resume. kitaru.wait() suspends a flow, releases compute, and resumes minutes, hours, or days later when input lands from a human, another agent, a webhook, or a CLI call.
Versioned deployments. flow.deploy() freezes a flow as an immutable snapshot that consumers invoke by name. Tag to roll out, re-tag to roll back. Nothing that calls the agent redeploys when a new version ships.
Isolated execution. @checkpoint(runtime="isolated") runs a specific step in its own pod or job on Kubernetes, AWS, GCP, or Azure. Heavy or risky steps stay isolated; orchestration stays inline.
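One way to picture versioned deployments is as immutable snapshots plus movable stage pointers. The toy registry below is a hypothetical in-memory sketch under that assumption (`ToyRegistry`, `Snapshot`, and `code_ref` are illustrative names, not Kitaru's API); it shows why re-tagging rolls back instantly and why callers that resolve a flow by name never need to redeploy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """A frozen, versioned copy of a flow (toy model; frozen=True blocks mutation)."""

    flow: str
    version: str
    code_ref: str  # hypothetical pointer to the frozen code + dependencies


@dataclass
class ToyRegistry:
    """Append-only snapshots plus movable (flow, stage) -> version pointers."""

    snapshots: dict[tuple[str, str], Snapshot] = field(default_factory=dict)
    stages: dict[tuple[str, str], str] = field(default_factory=dict)

    def deploy(self, flow: str, version: str, code_ref: str) -> Snapshot:
        key = (flow, version)
        if key in self.snapshots:
            raise ValueError(f"{flow}:{version} exists; snapshots are immutable")
        self.snapshots[key] = Snapshot(flow, version, code_ref)
        return self.snapshots[key]

    def tag(self, flow: str, version: str, stage: str) -> None:
        if (flow, version) not in self.snapshots:
            raise KeyError(f"unknown version {flow}:{version}")
        # Rolling out and rolling back are the same operation: move the pointer.
        self.stages[(flow, stage)] = version

    def resolve(self, flow: str, stage: str) -> Snapshot:
        """What a caller invoking the flow by name would receive."""
        return self.snapshots[(flow, self.stages[(flow, stage)])]


registry = ToyRegistry()
registry.deploy("my_agent", "v2", code_ref="snapshot-v2")
registry.deploy("my_agent", "v3", code_ref="snapshot-v3")

registry.tag("my_agent", "v3", stage="prod")  # roll out v3
rolled_out = registry.resolve("my_agent", "prod").version

registry.tag("my_agent", "v2", stage="prod")  # roll back to v2
rolled_back = registry.resolve("my_agent", "prod").version
```

Because snapshots are never mutated, a rollback is just a pointer move back to a version that still exists unchanged.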

Python-first, no graph DSL

Write normal Python. Use if, for, try/except — whatever your agent needs. Kitaru gives you two decorators (@flow and @checkpoint) and a handful of utility functions. That's all you need.

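```python
from kitaru import checkpoint, flow

# Placeholder helpers so the example is self-contained; replace with your own logic.
def do_research(topic: str) -> str:
    return f"Notes on {topic}"

def generate_draft(notes: str) -> str:
    return f"Draft based on: {notes}"

@checkpoint
def research(topic: str) -> str:
    return do_research(topic)

@checkpoint
def write_draft(notes: str) -> str:
    return generate_draft(notes)

@flow
def writing_agent(topic: str) -> str:
    data = research(topic)
    return write_draft(data)

result = writing_agent.run("quantum computing").wait()
```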

Deploy on your cloud

A single self-hosted server, your own infra. Flows run on whichever stack you pick — local, Kubernetes, GCP, AWS, or Azure — with artifacts in your own S3/GCS/Azure Blob bucket. No mandatory SaaS control plane.

Built-in UI

Every execution is observable from day one. See your agent runs, inspect checkpoint outputs, and approve human-in-the-loop wait steps, all from a UI that ships with the Kitaru server.

To start the server locally, run kitaru login after installing kitaru[local]. To connect to an existing remote server, run kitaru login <server>.

Works with your agent SDK

Wrap an existing PydanticAI agent with KitaruAgent; no rewrite needed. For agents built on the OpenAI Agents SDK, the Claude Agent SDK, or raw Python, put @flow and @checkpoint around your calls. Your model, your tools, your framework: Kitaru wraps them, not the other way around.

```python
from kitaru import flow
from kitaru.adapters.pydantic_ai import KitaruAgent
from pydantic_ai import Agent

researcher = KitaruAgent(
    Agent("openai:gpt-5.4", system_prompt="You summarize research topics.")
)

@flow
def research_flow(topic: str) -> str:
    return researcher.run_sync(topic).output
```

🚀 Quick Start

Install

```bash
pip install kitaru
```

Or with uv (recommended):

```bash
uv pip install kitaru
```

To wrap a PydanticAI agent, install the adapter extra:

```bash
uv pip install "kitaru[pydantic-ai]"
```

Optional: start a local Kitaru server

Flows run locally by default with the base install. If you also want the local dashboard and REST API, install the local extra and then run kitaru login with no arguments:

```bash
uv pip install "kitaru[local]"
kitaru login
kitaru status
```

Optional: connect to an existing remote Kitaru server

If you already have a deployed Kitaru server, connect to it explicitly:

```bash
kitaru login https://my-server.example.com
# add --project <PROJECT> or other remote-login flags if your setup requires them
kitaru status
```

Initialize your project

```bash
kitaru init
```

Write your first flow

```python
# agent.py
from kitaru import checkpoint, flow

@checkpoint
def fetch_data(url: str) -> str:
    return "some data"

@checkpoint
def process_data(data: str) -> str:
    return data.upper()

@flow
def my_agent(url: str) -> str:
    data = fetch_data(url)
    return process_data(data)

result = my_agent.run("https://example.com").wait()
print(result)  # SOME DATA
```

Run it

```bash
python agent.py
```

Every step is recorded automatically. Inspect any run, then replay it from a checkpoint — a faithful rerun, or a fork with one input changed (a different model or parameter) so you can see what would have happened before you ship the change:

```bash
kitaru executions list
kitaru executions get <EXECUTION_ID>
kitaru executions logs <EXECUTION_ID>

# Reproduce a run faithfully from a checkpoint
kitaru executions replay <EXECUTION_ID> --from process_data

# Fork the same run with one input changed
kitaru executions replay <EXECUTION_ID> --from fetch_data \
  --args '{"url": "https://other.example.com"}'
```

See Replay and overrides for the full reproduce → fork → diff loop.

Deploy it

When the flow is ready, deploy it as a versioned snapshot and invoke it by name — no redeploy of whatever calls the agent.

```python
from kitaru import KitaruClient

# Continuing from agent.py, where my_agent is defined.
# Freeze the current code + dependencies as a versioned snapshot.
# Parameterized flows take representative deployment-time inputs;
# consumers can override them at invocation time.
my_agent.deploy(url="https://example.com")

# Consumers invoke by name: from Python, CLI, MCP, or HTTP.
KitaruClient().deployments.invoke(
    flow="my_agent",
    inputs={"url": "https://example.com"},
)
```

```bash
# Tag a version into a stage; re-tag to roll back.
kitaru flow tag my_agent latest --stage=prod
kitaru flow tag my_agent v2     --stage=prod   # rollback
```
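Because consumers pass invocation-time inputs as a plain dict (`inputs={...}` in the deploy example), a misspelled key is easy to miss. One way to guard against that, sketched here as a hypothetical helper that is not part of Kitaru, is to bind the dict to the flow's Python signature with the standard library's `inspect.signature` before invoking. It checks a plain function with the same signature as `my_agent`, since whether signature inspection works on the decorated flow object is an assumption this README does not cover.

```python
import inspect
from typing import Any, Callable


def check_inputs(fn: Callable[..., Any], inputs: dict[str, Any]) -> dict[str, Any]:
    """Bind `inputs` to fn's parameters; raises TypeError on unknown or missing names."""
    bound = inspect.signature(fn).bind(**inputs)
    bound.apply_defaults()  # fill in any parameters that have defaults
    return dict(bound.arguments)


# Plain function mirroring my_agent's signature (a stand-in, not the decorated flow).
def my_agent_signature(url: str) -> str:
    raise NotImplementedError


ok = check_inputs(my_agent_signature, {"url": "https://example.com"})

try:
    check_inputs(my_agent_signature, {"uri": "https://example.com"})  # typo in the key
    typo_caught = False
except TypeError as exc:
    typo_caught = True
    print(f"rejected before invoking: {exc}")
```

The check runs entirely client-side, so a bad key fails fast instead of surfacing later as a failed execution.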

📚 Learn more

Resource	Description
Getting Started Guide	Full setup walkthrough with all examples
Documentation	Complete reference and guides
Agents guide	Run, replay, and improve production agents end to end
Examples	Runnable workflows for every feature
Stacks	Deploy to Kubernetes, AWS, GCP, or Azure

Discussions — ask questions, share ideas
Issues — report bugs, request features
Roadmap — see what's coming next
Docs — guides and reference

📄 License

Apache 2.0