Roadmap

June 25, 2026

Statewave is purpose-built for support-agent workflows — the first use case where structured memory clearly outperforms naive history stuffing and simple RAG. The roadmap reflects this: trust and reliability first, then support-agent superiority, then operator experience.

v0.1 — Local MVP ✅

Core domain model (Episode, Memory, ContextBundle)
FastAPI server with the core v1 endpoint surface
Heuristic memory compiler
Context assembly with token estimation
PostgreSQL + pgvector schema
Docker Compose local deployment
Python SDK v0.1.0, TypeScript SDK v0.1.0

v0.2 — Production Hardening ✅

Idempotent compilation, pluggable compilers, token-bounded context
Ranked retrieval (kind × recency × relevance)
Structured errors, request-ID, CORS, health endpoints, structured logging
LLM compilation via LiteLLM (100+ providers)
Semantic search via pgvector
Authentication (API keys), rate limiting (in-memory)
Python SDK v0.2.0, TypeScript SDK v0.2.0
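
The "kind × recency × relevance" ranking listed above can be read as a multiplicative score. The following is a minimal sketch of that idea, not Statewave's actual ranker: the kind names, the per-kind weights, the fallback weight, and the 30-day half-life are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-kind weights; the roadmap does not publish the real values.
KIND_WEIGHT = {"profile_fact": 1.0, "episode_summary": 0.7}
DEFAULT_KIND_WEIGHT = 0.5  # assumed fallback for kinds not listed above


@dataclass(frozen=True)
class Candidate:
    memory_id: str
    kind: str
    age_days: float
    relevance: float  # e.g. a similarity score normalized to [0, 1]


def recency(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the factor halves every half_life_days (assumed shape)."""
    return 0.5 ** (age_days / half_life_days)


def score(c: Candidate) -> float:
    """Multiply the three factors so a weak factor pulls the whole score down."""
    kind_weight = KIND_WEIGHT.get(c.kind, DEFAULT_KIND_WEIGHT)
    return kind_weight * recency(c.age_days) * c.relevance


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)
```

Because the factors multiply, a highly relevant but stale memory can rank below a fresher, moderately relevant one, which is the trade-off a product of this shape is meant to express.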

v0.3 — Advanced Features ✅

Temporal reasoning, memory conflict resolution
Webhooks, multi-tenant (experimental)
Middleware ordering, validation, LLM thread-pool fix

v0.4 — Adoption Readiness ✅

Batch episode ingestion (up to 100)
OpenTelemetry tracing (optional)
Deployment guide (Docker, Fly.io, Railway)
SDK publish readiness, getting started guide
Support-agent benchmark & "Why Statewave" comparison doc

v0.5 — Reliability & Trust ✅

Reliable webhook delivery — persistent queue, exponential backoff, dead-letter
SDK retry with backoff — automatic retry on 429/5xx with jitter
Durable async compilation — Postgres-backed job queue
True multi-tenant isolation — app-layer query scoping
Distributed rate limiting — Postgres-backed
Backup/restore tooling — subject-level export/import
Admin introspection — jobs + webhooks
Compilation status API
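
The SDK retry item above (automatic retry on 429/5xx with jitter) follows a well-known pattern. Here is a minimal, self-contained sketch of it, assuming "full jitter" exponential backoff; the function names, the base delay, the cap, and the retry count are illustrative assumptions rather than the SDK's real defaults or API.

```python
import random
import time


def is_retryable(status: int) -> bool:
    """Retry only on 429 (rate limited) and 5xx (server error) responses."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(max_retries, base=0.5, cap=8.0, rng=random.random):
    """Yield one delay per retry: a random fraction of min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(send, max_retries=3, sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-retryable status or retries run out.

    send is a hypothetical zero-argument callable returning (status, body).
    sleep and rng are injectable so the behaviour can be tested deterministically.
    """
    status, body = send()
    for delay in backoff_delays(max_retries, rng=rng):
        if not is_retryable(status):
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Randomizing the delay spreads out retries from many clients that were throttled at the same moment, so they do not all return in lockstep.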

v0.6 — Support-Agent Superiority ✅

v0.7 — Operator & Cloud Experience ✅

Goal: Make Statewave trustworthy to operate at scale. An operator should be able to deploy, monitor, upgrade, and scale Statewave without surprises.

v0.8 — Governance & Adoption ✅

Goal: Make Statewave deployable in compliance-grade settings (regulated industries, multi-tenant SaaS) and make adoption trivial for teams integrating it into existing stacks.

Governance & audit — shipped

State-assembly receipts (#49) — every /v1/context and /v1/handoff call can emit an immutable, ULID-addressable audit artifact recording exactly which memories + episodes influenced the bundle, with a SHA-256 hash of the bytes delivered to the agent and per-entry supersession status. GET /v1/receipts/{id} + cursor-paginated list per subject. Strict-superset schema with a mode discriminator so future modes (as_of_replay, eval_run) can extend without breaking. Emission gate: per-request flag → per-tenant config (always | on_request | never) → env kill-switch. Tenant-controlled retention surface (receipt_retention_days in tenant_configs; purge worker is v0.9). Full design + six negative-test acceptance criteria in receipts.md.
Sensitivity labels + per-memory policy bindings (#50) — per-memory capability tags (pii, financial, secret, …) carried as a TEXT[] column with a GIN index; set via PATCH /v1/memories/{id}/labels. Policy bundles are YAML/JSON, content-hashed, immutable, stored in policy_bundles; six predicates (memory_has_any_label, memory_has_all_labels, caller_type, caller_type_in, caller_type_not_in, caller_id) and two actions (deny, redact); first-match-wins evaluation, default-allow on no match. Per-tenant policy_mode: log_only | enforce — log_only records decisions into receipts without filtering (safe rollout), enforce drops denied memories before ranking. Receipts surface every fired decision via policy.filters_applied and the unfired-rule summary via policy.filters_skipped. Full reference in sensitivity-labels.md.
Caller identity — caller_id and caller_type on /v1/context and /v1/handoff feed the policy evaluator. Setting the tenant config require_caller_identity: true rejects anonymous calls with HTTP 401; this is the lever compliance-grade tenants flip to make policy enforcement non-bypassable.
Per-tenant configuration endpoint — GET / PATCH /admin/tenants/{tenant_id}/config for receipts emission policy, retention, policy_mode, caller-identity gating. PATCH-shape merge (only touches supplied keys, preserves the rest), enum/bound validation at the API boundary, optimistic concurrency via expected_version. This makes policy_mode: enforce and require_caller_identity: true reachable via the API without a SQL shell, which was the gap caught in the enforce-mode production smoke test.
Cross-tenant policy bundle uniqueness (#79) — policy_bundles is keyed on (tenant_id, bundle_hash) composite uniqueness (PG15+ NULLS NOT DISTINCT). Two tenants installing identical YAML produce two independently resolvable rows. Before the fix, the second tenant's upload silently re-bound the first tenant's row.
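
The policy semantics described above (six predicates, first-match-wins, default-allow, and the log_only versus enforce modes) can be sketched in a few lines. This is an illustration under assumptions, not the server's evaluator: the Rule shape, the choice that all predicates in a rule must hold, and the caller_type values used in examples are hypothetical, and redaction itself is not modeled because the roadmap does not describe how it is applied.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    when: dict   # predicate name -> argument; all must hold (assumed semantics)
    action: str  # "deny" or "redact"


def _holds(name, arg, labels, caller_type, caller_id):
    """Evaluate one of the six predicates named in the roadmap."""
    if name == "memory_has_any_label":
        return bool(set(arg) & labels)
    if name == "memory_has_all_labels":
        return set(arg) <= labels
    if name == "caller_type":
        return caller_type == arg
    if name == "caller_type_in":
        return caller_type in arg
    if name == "caller_type_not_in":
        return caller_type not in arg
    if name == "caller_id":
        return caller_id == arg
    raise ValueError(f"unknown predicate: {name}")


def decide(rules, labels, caller_type, caller_id=None):
    """First-match-wins: return the first matching rule's action, else "allow"."""
    label_set = set(labels)
    for rule in rules:
        if all(_holds(n, a, label_set, caller_type, caller_id)
               for n, a in rule.when.items()):
            return rule.action
    return "allow"  # default-allow when no rule matches


def filter_memories(memories, rules, caller_type, policy_mode, caller_id=None):
    """memories is a list of (memory_id, labels) pairs.

    Returns (kept_ids, decisions). In log_only mode every decision is recorded
    but nothing is dropped; in enforce mode denied memories are dropped.
    """
    kept, decisions = [], []
    for memory_id, labels in memories:
        action = decide(rules, labels, caller_type, caller_id)
        decisions.append((memory_id, action))
        if policy_mode == "enforce" and action == "deny":
            continue
        kept.append(memory_id)
    return kept, decisions
```

Recording the same decisions in both modes is what makes log_only useful as a rollout step: an operator can compare what would have been dropped before switching a tenant to enforce.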

Adoption — shipped

SDK convenience methods for support endpoints — ergonomic wrappers on both statewave-py and @statewavedev/sdk for /v1/subjects/{id}/health, /v1/subjects/{id}/sla, /v1/handoff, and resolution create/list. Same auth, tenant-scoping, and retry as the rest of the client; HTTP wire contract unchanged. Sync + async on the Python side. Shipped in statewave-py 0.10.0 and @statewavedev/sdk 0.10.0 (statewave-py#15, statewave-ts#16).
Framework integrations (LangChain, CrewAI, AutoGen) — three runnable quickstart examples in statewave-examples (langchain-quickstart/, crewai-quickstart/, autogen-quickstart/). Each ships a small adapter (StatewaveMemory(BaseMemory) for LangChain; pure-function helpers for CrewAI and AutoGen), a runnable demo, and mock-based smoke tests. Dependency strategy: zero framework deps in the core SDKs — adapters live inside each example, framework versions pinned only in the example READMEs, so SDK releases don't chase framework churn (statewave-examples#12).
Webhook event filters — STATEWAVE_WEBHOOK_EVENTS (comma-separated) is an event-type allowlist on the global webhook URL. Filtered-out events are dropped before they reach the delivery queue. Unknown event types fail the server at startup, so a typo can't silently drop every webhook. Fully backward-compatible: empty filter delivers every event (statewave#150).
Memory templates for common patterns — declarative, versioned scaffolds for recurring information patterns. Five bundled templates ship today (customer support handoff, user preference, project decision log, incident summary, account onboarding); GET /v1/memory-templates is fully inspectable, POST /v1/memory-templates/{id}/apply validates field values and ingests an ordinary episode with template_id / template_version recorded in metadata.template. Pure data — no code runs inside a template; rendering is deterministic string substitution. See docs/memory-templates.md in the server repo (statewave#152).
Design partner onboarding package — a single-page guide in design-partners.md covering overview, who Statewave is for, a 30-minute setup path, recommended first use cases, data/privacy expectations, the support and feedback loop, an evaluation checklist (functional, performance, governance, operational), 30 / 60 / 90-day success criteria with benchmark reference numbers, and a 9-entry FAQ. Linked from README.md and SUPPORT.md (statewave-docs#42).
Public memory benchmark — complete equal-budget sweep on the public LoCoMo dataset across four token tiers (512 / 1024 / 2048 / 4096), 10 conversations, 1,986 questions per system. A publication-safety harness refuses headline rankings without 100% coverage, the same question set across systems, no judge_failed rows, and measured input tokens shown beside every score. The benchmark suite is maintained in statewave-memory-benchmarks; current methodology and per-system results live there.
Connector ecosystem — fully shipped ✅ Modular packages for GitHub, Markdown/ADRs, MCP, Slack, Discord, Zendesk, Intercom, Freshdesk, Notion, Gmail, n8n, Zapier. v0.6.0 added cursor-based delta sync (Zendesk Incremental Tickets Export, Gmail History API) and Notion database scoping. Tier 2 push receivers shipped (v0.7.0–v0.11.0) — every connector with a meaningful push surface in its source system now has a real-time receiver alongside its pull connector: Slack DM/MPIM dispatch (slack.dm.*, slack.mpim.*), Freshdesk webhook, Zendesk webhook, Intercom webhook, and Gmail Cloud Pub/Sub push. statewave-connectors listen <connector> is the unified daemon; the same (Request) => Promise<Response> factory mounts on Vercel / Cloudflare / Express identically across the lineup. Tier 3 operator/cloud productization shipped (v0.12.0–v0.17.0) — TOML config file (multi-instance), hosted runner (statewave-connectors run), persistent state adapters (file / Postgres / Redis), built-in OIDC verification for Gmail Pub/Sub, auth-gated Prometheus /metrics, and deployment recipes (Docker / Compose / Helm / Fly / Railway). v0.18.0 adds preview Jira + database source connectors (database dialects PostgreSQL / MySQL / MariaDB / MSSQL — selected external rows into Statewave memory, not a Statewave storage backend). See Connectors → Roadmap for the full release timeline and what's queued next (long-running daemon shapes — Slack Socket Mode, Discord Gateway, Gmail service-account auth).
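
The webhook event filter described earlier in this list (a comma-separated allowlist, startup failure on unknown event types, and an empty filter that delivers everything) can be sketched as follows. The event names in KNOWN_EVENTS and both function names are hypothetical; only the three behaviours come from the roadmap text.

```python
# Hypothetical event names for illustration; the real set lives in the server.
KNOWN_EVENTS = frozenset({"episode.created", "memories.compiled", "subject.deleted"})


def parse_event_filter(raw):
    """Parse a comma-separated allowlist such as STATEWAVE_WEBHOOK_EVENTS.

    Raises ValueError on any unknown event type, so a typo fails loudly at
    startup instead of silently dropping every webhook.
    """
    names = {part.strip() for part in (raw or "").split(",") if part.strip()}
    unknown = names - KNOWN_EVENTS
    if unknown:
        raise ValueError(f"unknown webhook event types: {sorted(unknown)}")
    return frozenset(names)


def should_deliver(event_type, allowlist):
    """An empty allowlist means no filtering: every event is delivered."""
    return not allowlist or event_type in allowlist
```

A server would typically call parse_event_filter once while loading configuration and then consult should_deliver before enqueueing each event, which matches the "dropped before they reach the delivery queue" behaviour.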

v0.9 — Replay, Signing, & Auto-Labeling ✅

Builds on the v0.8 governance foundation. Shipped 2026-05-26:

Scheduled retention-purge worker (#156 · #162) — hourly worker reads tenant_configs.config.receipt_retention_days and tombstones expired receipts. Soft-delete only; rows persist for forensic lookup. Partial index keeps it cheap. Migration 0020.
HMAC signing for receipts (#157 · #163) — hmac-sha256-canonical-v1 over the canonical body. Operator-provided keys via STATEWAVE_RECEIPT_SIGNING_KEYS, never persisted to DB. Per-tenant active key via tenant_configs.config.receipt_signing_key_id. GET /v1/receipts/{id}/verify with {valid, key_id, algorithm, reason} semantics and constant-time compare. Pre-v0.9 receipts verify cleanly as no_signature. Migration 0021.
Compiler heuristic auto-labeling (#158 · #164) — opt-in STATEWAVE_AUTO_LABELING_ENABLED. Detectors stamp advisory suggested_labels, strictly separate from authoritative sensitivity_labels. v0.9 first wave: pii.email, pii.phone, financial.card (Luhn), secret.token. Migration 0022 (GIN-indexed).
Receipt-driven replay (#159 · #165) — every v0.9+ receipt embeds the active bundle's YAML (policy_snapshot). POST /v1/receipts/{id}/replay re-runs against current memories with the original policy and returns a structural diff envelope. Replays use mode as_of_replay, and child receipts link to the parent. Semantics: current code plus the original policy. Migration 0023.
Operator promote endpoint + admin UI (#160 · server #166, admin statewave-admin#89) — POST /admin/memories/{id}/promote-labels is review-only, with audit-trail entries on memory.metadata.label_promotions. Admin app /suggested-labels page + receipt-detail replay button rendering the diff envelope inline.
Per-tenant data residency (#161 · #167) — per-region deployment + metadata-pinned tenants. STATEWAVE_REGION + tenant_configs.config.region. Hard application-layer enforcement on /v1/ AND /admin/ (total isolation). HTTP 403 residency.mismatch on conflict. Receipts stamp region for end-to-end audit. Code + config + tests + ops runbook shipped; no second region deployed yet.
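
The receipt-signing item above names an HMAC-SHA256 signature over a canonical body, a constant-time compare, and a {valid, key_id, algorithm, reason} result. The sketch below shows that general technique with Python's standard library. Several details are assumptions: the canonical form (sorted-key compact JSON), the in-memory key mapping, the reason strings other than no_signature, and the choice to report an unsigned receipt as valid: False are not specified in this roadmap.

```python
import hashlib
import hmac
import json

ALGORITHM = "hmac-sha256-canonical-v1"


def canonical_bytes(body: dict) -> bytes:
    """Assumed canonical form: UTF-8 JSON with sorted keys and no extra spaces."""
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sign(body: dict, key_id: str, keys: dict) -> dict:
    """keys maps key_id -> secret bytes (supplied by the operator, never stored)."""
    digest = hmac.new(keys[key_id], canonical_bytes(body), hashlib.sha256).hexdigest()
    return {"key_id": key_id, "algorithm": ALGORITHM, "signature": digest}


def verify(body: dict, signature, keys: dict) -> dict:
    """Return a {valid, key_id, algorithm, reason} result for a receipt body."""
    if signature is None:
        # Unsigned (for example pre-v0.9) receipts are reported, not rejected.
        return {"valid": False, "key_id": None, "algorithm": None,
                "reason": "no_signature"}
    key_id = signature["key_id"]
    key = keys.get(key_id)
    if key is None:
        return {"valid": False, "key_id": key_id, "algorithm": ALGORITHM,
                "reason": "unknown_key"}  # hypothetical reason string
    expected = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    ok = hmac.compare_digest(expected, signature["signature"])
    return {"valid": ok, "key_id": key_id, "algorithm": ALGORITHM,
            "reason": None if ok else "mismatch"}  # hypothetical reason string
```

Signing the canonical form rather than the raw bytes means two serializations of the same receipt that differ only in key order verify identically, while any change to a value invalidates the signature.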

v1.0 — First stable public developer release ✅

Shipped 2026-06-09 — the first stable public developer release (see release notes).

Stable /v1 API contract — the /v1/* surface and the v0.9 governance layer (HMAC-signed receipts, receipt-driven replay, sensitivity labels + declarative policy, opt-in detector-suggested labels, per-region residency) are now stable for developer use under a self-hosted model. Backward-compatible additions only from here; carried-forward limitations stay documented in why-statewave.md.
Both SDKs to v1.0.0 — statewave (PyPI) and @statewavedev/sdk (npm) cut their first stable releases alongside the server; typed surfaces matching the REST contract, semver-stable from 1.0.0 forward.
Python SDK governance helpers (#176) — list_suggested_labels() / promote_suggested_labels() wrap the v0.9 suggested-label review surface (sync + async, typed result models).
Public version-discovery endpoint (#178) — unauthenticated GET /v1/version reports the running server version.
session_id on create_episode (#174) — both SDKs forward the optional session pin on the wire.
Webhook delivery stats + tenant scoping — optional tenant filter on event-status queries and per-tenant delivery statistics; permanent 4xx deliveries dead-letter instead of retrying.

Deferred beyond v1.0

Visual policy editor — admin-app YAML-free form for building rule sets. Listed in the original v0.9 plan but deferred to keep the v0.9 release focused on audit + replay + residency.
Admin identity — needed so that promoted_by and future operator-action audit fields are populated with the operator's id rather than null. Lays groundwork for richer admin-side audit trails.
Bulk label promotion across many memories. v0.9 is one-row-per-call.
Federated cross-region audit search — explicit follow-up to #161; never as implicit cross-region access.
Memory snapshots for byte-for-byte replay — v0.9 ships current code + original policy; true historical reproduction needs memory snapshots. The data model is designed to absorb this without a schema break.

Post-v1.0 roadmap (scope TBD)

With v1.0.0 shipped, the shape of the post-v1.0 roadmap will be informed by:

The deferred items above (admin identity is the natural lead since it unblocks several others).
Design-partner feedback on the v0.9 audit + replay + residency surfaces.
Operator-quality-of-life items from the v0.9 ops runbooks once they get real-world use.

No list is committed yet; this section exists so that deferred items have a visible home.

Design principles

Raw truth first — episodes are immutable, memories are derived
Self-hosted, operator-friendly — you own your data and infra
Support-agent wedge — optimize here, prove it, then expand
Multi-provider — LiteLLM means no vendor lock-in
Trust over features — reliability beats feature count
Honest about limitations — document what doesn't work yet