Roadmap
June 25, 2026
Statewave is purpose-built for support-agent workflows — the first use case where structured memory clearly outperforms naive history stuffing and simple RAG. The roadmap reflects this: trust and reliability first, then support-agent superiority, then operator experience.
v0.1 — Local MVP ✅
- Core domain model (Episode, Memory, ContextBundle)
- FastAPI server with the core v1 endpoint surface
- Heuristic memory compiler
- Context assembly with token estimation
- PostgreSQL + pgvector schema
- Docker Compose local deployment
- Python SDK v0.1.0, TypeScript SDK v0.1.0
v0.2 — Production Hardening ✅
- Idempotent compilation, pluggable compilers, token-bounded context
- Ranked retrieval (kind × recency × relevance)
- Structured errors, request-ID, CORS, health endpoints, structured logging
- LLM compilation via LiteLLM (100+ providers)
- Semantic search via pgvector
- Authentication (API keys), rate limiting (in-memory)
- Python SDK v0.2.0, TypeScript SDK v0.2.0
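To make "ranked retrieval (kind × recency × relevance)" and "token-bounded context" concrete, here is a minimal Python sketch of one way the two could fit together. The kind weights, the 30-day recency half-life, the ~4-characters-per-token estimate, and the names Candidate, score, and assemble are hypothetical illustrations, not Statewave's actual ranking code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Hypothetical per-kind weights; unknown kinds fall back to 0.5.
KIND_WEIGHTS = {"profile_fact": 1.0, "procedure": 0.9, "episode_summary": 0.8}


@dataclass
class Candidate:
    id: str
    kind: str
    text: str
    created_at: datetime
    relevance: float  # e.g. cosine similarity to the task, assumed in [0, 1]


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def score(c: Candidate, now: datetime, half_life_days: float = 30.0) -> float:
    # kind × recency × relevance, with recency as exponential decay by age.
    age_days = (now - c.created_at).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)
    return KIND_WEIGHTS.get(c.kind, 0.5) * recency * c.relevance


def assemble(cands: List[Candidate], budget_tokens: int,
             now: Optional[datetime] = None) -> Tuple[List[Candidate], int]:
    now = now or datetime.now(timezone.utc)
    ranked = sorted(cands, key=lambda c: score(c, now), reverse=True)
    picked: List[Candidate] = []
    used = 0
    for c in ranked:
        cost = estimate_tokens(c.text)
        if used + cost > budget_tokens:
            continue  # too big for what's left; a smaller item may still fit
        picked.append(c)
        used += cost
    return picked, used
```

Skipping an oversized item (rather than stopping at it) lets smaller, lower-ranked memories still use the remaining budget; a real implementation could reasonably choose the opposite trade-off to preserve strict rank order.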
v0.3 — Advanced Features ✅
- Temporal reasoning, memory conflict resolution
- Webhooks, multi-tenant (experimental)
- Middleware ordering, validation, LLM thread-pool fix
v0.4 — Adoption Readiness ✅
- Batch episode ingestion (up to 100)
- OpenTelemetry tracing (optional)
- Deployment guide (Docker, Fly.io, Railway)
- SDK publish readiness, getting started guide
- Support-agent benchmark & "Why Statewave" comparison doc
v0.5 — Reliability & Trust ✅
- Reliable webhook delivery — persistent queue, exponential backoff, dead-letter
- SDK retry with backoff — automatic retry on 429/5xx with jitter
- Durable async compilation — Postgres-backed job queue
- True multi-tenant isolation — app-layer query scoping
- Distributed rate limiting — Postgres-backed
- Backup/restore tooling — subject-level export/import
- Admin introspection — jobs + webhooks
- Compilation status API
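The SDK retry item follows a common pattern: retry only on 429 and 5xx, with exponentially growing, randomly jittered delays. A minimal sketch of that pattern (the "full jitter" variant) follows; call_with_retry, the attempt count, and the base/cap delays are hypothetical and do not describe the SDKs' real defaults or signatures.

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    # Retry on rate limiting (429) and any server error (5xx).
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  rand: Callable[[], float] = random.random) -> float:
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rand() * min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, str]], max_attempts: int = 4,
                    base: float = 0.5, cap: float = 8.0,
                    sleep: Callable[[float], None] = time.sleep,
                    rand: Callable[[], float] = random.random) -> Tuple[int, str]:
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body  # success, non-retryable error, or out of attempts
        sleep(backoff_delay(attempt, base, cap, rand))
    # Only reached when max_attempts < 1.
    raise ValueError("max_attempts must be >= 1")
```

Injecting sleep and rand keeps the sketch deterministic under test. The jitter spreads retries from many clients so they do not hit a recovering server in lockstep.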
v0.6 — Support-Agent Superiority ✅
- Session-aware context assembly
- Resolution tracking (open/resolved/unresolved)
- Handoff context packs (structured escalation briefs)
- Repeat-issue detection (prior resolution surfacing)
- Support-specific ranked retrieval
- Customer health scoring (0–100, explainable factors)
- Health-aware handoff (risk level + factors in briefs)
- Proactive health alerts (webhooks on state transitions)
- SLA tracking (response time, resolution time, breach flags)
- SLA integration into health + handoff
- Product website (statewave.ai)
- Proof layer: 3 eval suites (56 assertions), 2 benchmarks (8/8 vs 2/8)
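The SLA item tracks response time, resolution time, and breach flags. One plausible way to derive those flags is sketched below; the SlaPolicy thresholds, the field names, and the choice to measure still-open tickets against the current time are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SlaPolicy:
    # Hypothetical targets; real SLA targets would be tenant-specific.
    first_response: timedelta
    resolution: timedelta


def sla_status(opened_at: datetime, first_response_at: Optional[datetime],
               resolved_at: Optional[datetime], now: datetime, policy: SlaPolicy) -> dict:
    # Tickets still waiting are measured against `now`, so an open breach is flagged early.
    response_time = (first_response_at or now) - opened_at
    resolution_time = (resolved_at or now) - opened_at
    return {
        "response_time_s": response_time.total_seconds(),
        "resolution_time_s": resolution_time.total_seconds(),
        "response_breached": response_time > policy.first_response,
        "resolution_breached": resolution_time > policy.resolution,
    }
```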
v0.7 — Operator & Cloud Experience ✅
Goal: Make Statewave trustworthy to operate at scale. An operator should be able to deploy, monitor, upgrade, and scale Statewave without surprises.
- Deep health checks — /readyz verifies DB connectivity, queue health, and LLM reachability (per-check status + latency_ms; returns 503 on not_ready)
- Migration safety — preflight script, startup schema guard, /ops/migrations endpoint, runbook
- Admin dashboard (read-only) — system health, jobs, webhooks, counts, health distribution
- Usage metering — episodes/month, compiles/month, per-tenant usage
- Memory TTL / expiry policies — per-kind global expiry windows configured via STATEWAVE_KIND_TTL_DAYS (JSON). Compilers stamp valid_to = valid_from + ttl_days on insert; /v1/context retrieval filters out expired rows immediately (valid_to IS NULL OR valid_to > now()); the hourly cleanup loop tombstones expired rows so the status surface stays current. Soft-delete only — rows persist for #49 receipt lookup. Status vocabulary is aligned with #49 (active | superseded | tombstoned); the previously unused deleted enum value was renamed (alembic 0016). Per-subject / per-tenant / per-policy expiry is deferred to the policy layer in #50 — TTL ships the simple primitive so it composes cleanly. Operator config + design notes in deployment/memory-ttl.md.
- Horizontal scaling guide — multi-instance reference topologies, connection-budget runbook, PgBouncer guidance, multi-instance diagnostics, and the design properties that hold across replicas (Postgres-backed compile queue, webhook DLQ, rate limit, L2 query embedding cache). See deployment/horizontal-scaling.md. Honest framing: the guide is derived from architectural design points and operational arithmetic, not a load-test campaign — see "What we have not validated".
- Helm chart + Kubernetes deployment guide — in-tree Helm chart at helm/statewave/ in the statewave repo (API-only; operators bring a pgvector-capable Postgres). Schema migrations run as a Helm pre-install + pre-upgrade Job; the Deployment bypasses start.sh and runs uvicorn directly so each pod is a clean stateless API process. The companion deployment guide at deployment/kubernetes.md covers Postgres options, secret-management patterns (inline vs External Secrets Operator / Sealed Secrets / SOPS), a per-controller Ingress timeout cheatsheet, HPA + connection-budget guidance, and k8s-specific troubleshooting.
- In-process query embedding cache (LRU + TTL) — eliminates repeat provider calls on identical task text in /v1/context
- Native pgvector similarity path — memories.embedding migrated from TEXT to vector(1536) (alembic 0013_pgvector_native); search_memories_by_embedding rewritten to use the <=> cosine-distance operator with an HNSW index. Removes the in-Python cosine computation that set a ~1.5s latency floor on every /v1/context call. Requires a pgvector-bundled Postgres image — see infra/postgres-pgvector/ for the Dockerfile + deployment runbook.
- /v1/context candidate-pool union — /v1/context now feeds the rows from search_memories_by_embedding into the per-kind candidate pool alongside the recency-fetched rows (deduped by id). Previously, candidates were preselected by created_at DESC LIMIT 50 per kind, and the semantic-search call only contributed scores for those already-fetched rows, so semantically relevant memories outside the recency window could never enter ranking. Live evidence: on the docs-grounded eval, doc_match rose from 25% to 100% and groundable from 50% to 100% with no other change. Stub-provider deployments are unchanged (semantic results are empty, so the union is a no-op).
- Cross-machine query embedding cache — Postgres-backed query_embedding_cache table (alembic 0014_query_embedding_cache) shared across all backend instances. It sits behind the in-process LRU as L2, so lookups go L1 → L2 → provider API. This eliminates the 30s provider-latency spike that previously hit the first call on each instance; warm calls are sub-second regardless of which instance handles them. 24h TTL, composite (text, model) key so model rotations don't alias, opportunistic cleanup on write.
- Single LiteLLM adapter for all provider calls — server/services/llm.py is the only module that imports LiteLLM; compilers, embeddings, and the readiness check all route through it. Provider, model, API base, timeout, retries, and temperature are configured via STATEWAVE_LITELLM_* env vars (a clean break from the prior STATEWAVE_OPENAI_* naming). Typed error hierarchy (LLMTimeoutError / LLMResponseError / LLMProviderError); api_key is passed explicitly to every LiteLLM call instead of mutating os.environ. An AST-based isolation test (tests/test_llm_adapter_isolation.py) fails CI if any module under server/ other than the adapter imports litellm, openai, anthropic, cohere, voyageai, mistralai, or google.generativeai.
- Docs-only support memory pack — read-only memory pack derived from the official Statewave docs corpus (see default-support-docs-pack.md). Powers the docs-grounded "Statewave Support" persona on statewave.ai and the support-agent-docs example. Pack content is built once at release time by statewave/scripts/build_support_pack.py (chunks docs, runs ingest + compile, serialises to bundled JSONL) and shipped inside the API image. On container restart, a version-aware reseed applies it automatically, purging only pack-owned rows so operator-added content survives. The legacy live-refresh workflow in statewave/scripts/bootstrap_docs_pack.py is retained for hot-refreshing production between image rebuilds.
- Visible citations on retrieval responses — docs-grounded /v1/context and the support-agent-docs SDK path return resolved citations (doc_path + breadcrumb + URL) alongside the assembled context. Citations are resolved server-side from the same context bundle the model receives and are never parsed from model output, so there is no fabrication path. The website widget surfaces them as inline source pills under each docs-grounded reply.
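Two v0.7 retrieval changes, per-kind TTL stamping and the /v1/context candidate-pool union, can be illustrated together. The sketch below stamps valid_to from a JSON map in the shape of STATEWAVE_KIND_TTL_DAYS, applies the documented liveness predicate (valid_to is null or later than now), and merges recency and semantic rows deduplicated by id. The kind names, the dict-shaped rows, and the helper names are hypothetical; the documented filter is a SQL predicate, and this sketch only mirrors its logic in Python.

```python
import json
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Example value in the shape of STATEWAVE_KIND_TTL_DAYS; the kind names are hypothetical.
KIND_TTL_DAYS: Dict[str, int] = json.loads('{"session_note": 7, "episode_summary": 90}')


def stamp_valid_to(valid_from: datetime, kind: str,
                   ttl_days: Dict[str, int] = KIND_TTL_DAYS) -> Optional[datetime]:
    # Kinds without a configured TTL never expire (valid_to stays NULL).
    days = ttl_days.get(kind)
    return None if days is None else valid_from + timedelta(days=days)


def is_live(row: dict, now: datetime) -> bool:
    # Mirrors the SQL predicate: valid_to IS NULL OR valid_to > now()
    return row["valid_to"] is None or row["valid_to"] > now


def candidate_pool(recency_rows: List[dict], semantic_rows: List[dict],
                   now: datetime) -> List[dict]:
    # Union of recency-fetched and semantic-search rows, deduplicated by id,
    # so semantically relevant memories outside the recency window can be ranked.
    pool: Dict[str, dict] = {}
    for row in [*recency_rows, *semantic_rows]:
        if is_live(row, now):
            pool.setdefault(row["id"], row)
    return list(pool.values())
```

When the semantic list is empty (as with a stub embedding provider), the union reduces to the recency rows, matching the "no-op" behavior described above.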
v0.8 — Governance & Adoption ✅
Goal: Make Statewave deployable in compliance-grade settings (regulated industries, multi-tenant SaaS) and make adoption trivial for teams integrating it into existing stacks.
Governance & audit — shipped
- State-assembly receipts (#49) — every /v1/context and /v1/handoff call can emit an immutable, ULID-addressable audit artifact recording exactly which memories + episodes influenced the bundle, with a SHA-256 hash of the bytes delivered to the agent and per-entry supersession status. Receipts are retrievable via GET /v1/receipts/{id}, plus a cursor-paginated list per subject. Strict-superset schema with a mode discriminator so future modes (as_of_replay, eval_run) can extend it without breaking. Emission gate: per-request flag → per-tenant config (always | on_request | never) → env kill-switch. Tenant-controlled retention surface (receipt_retention_days in tenant_configs; the purge worker followed in v0.9). Full design + six negative-test acceptance criteria in receipts.md.
- Sensitivity labels + per-memory policy bindings (#50) — per-memory capability tags (pii, financial, secret, …) carried as a TEXT[] column with a GIN index; set via PATCH /v1/memories/{id}/labels. Policy bundles are YAML/JSON, content-hashed, immutable, and stored in policy_bundles, with six predicates (memory_has_any_label, memory_has_all_labels, caller_type, caller_type_in, caller_type_not_in, caller_id) and two actions (deny, redact). Evaluation is first-match-wins, with default-allow when no rule matches. Per-tenant policy_mode: log_only | enforce — log_only records decisions into receipts without filtering (safe rollout); enforce drops denied memories before ranking. Receipts surface every fired decision via policy.filters_applied and the unfired-rule summary via policy.filters_skipped. Full reference in sensitivity-labels.md.
- Caller identity — caller_id and caller_type on /v1/context and /v1/handoff feed the policy evaluator. The tenant config require_caller_identity: true rejects anonymous calls with 401 — the lever compliance-grade tenants flip to make policy enforcement non-bypassable.
- Per-tenant configuration endpoint — GET / PATCH /admin/tenants/{tenant_id}/config for receipt emission policy, retention, policy_mode, and caller-identity gating. PATCH-shape merge (touches only supplied keys and preserves the rest), enum/bound validation at the API boundary, optimistic concurrency via expected_version. Makes policy_mode: enforce and require_caller_identity: true reachable via the API without a SQL shell, closing the gap caught in the enforce-mode production smoke test.
- Cross-tenant policy bundle uniqueness (#79) — policy_bundles is keyed on composite (tenant_id, bundle_hash) uniqueness (PG15+ NULLS NOT DISTINCT). Two tenants installing identical YAML now get two independently resolvable rows; before the fix, the second tenant's upload silently re-bound the first tenant's row.
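The policy semantics above (first-match-wins, default-allow, log_only vs enforce) can be sketched in a few lines of Python. The rule shape ({"when": {...}, "action": ...}, where every predicate in "when" must hold), the "[REDACTED]" placeholder, and the decision records are assumptions; only the six predicate names and the two actions come from the roadmap.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical rule shape: {"when": {predicate_name: value, ...}, "action": "deny" | "redact"}.
# A rule matches when every predicate in "when" holds.


def _matches(when: dict, labels: Set[str], caller_type: Optional[str],
             caller_id: Optional[str]) -> bool:
    checks = {
        "memory_has_any_label": lambda v: bool(labels & set(v)),
        "memory_has_all_labels": lambda v: set(v) <= labels,
        "caller_type": lambda v: caller_type == v,
        "caller_type_in": lambda v: caller_type in v,
        "caller_type_not_in": lambda v: caller_type not in v,
        "caller_id": lambda v: caller_id == v,
    }
    return all(checks[name](value) for name, value in when.items())


def evaluate(rules: List[dict], labels: List[str], caller_type: Optional[str] = None,
             caller_id: Optional[str] = None) -> Tuple[str, Optional[int]]:
    # First match wins; no match means allow.
    for i, rule in enumerate(rules):
        if _matches(rule["when"], set(labels), caller_type, caller_id):
            return rule["action"], i
    return "allow", None


def apply_policy(memories: List[dict], rules: List[dict], mode: str,
                 caller_type: Optional[str] = None,
                 caller_id: Optional[str] = None) -> Tuple[List[dict], List[dict]]:
    kept, decisions = [], []
    for m in memories:
        action, rule_idx = evaluate(rules, m["labels"], caller_type, caller_id)
        if action != "allow":
            decisions.append({"memory_id": m["id"], "action": action, "rule": rule_idx})
        if mode == "enforce" and action == "deny":
            continue  # dropped before ranking
        if mode == "enforce" and action == "redact":
            m = {**m, "text": "[REDACTED]"}  # placeholder redaction (assumption)
        kept.append(m)  # in log_only, decisions are recorded but nothing is filtered
    return kept, decisions
```

Running the same rules in log_only first shows which memories enforce would drop or redact, which is the safe-rollout path the roadmap describes.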
Adoption — shipped
- SDK convenience methods for support endpoints — ergonomic wrappers on both statewave-py and @statewavedev/sdk for /v1/subjects/{id}/health, /v1/subjects/{id}/sla, /v1/handoff, and resolution create/list. Same auth, tenant scoping, and retry as the rest of the client; HTTP wire contract unchanged. Sync + async on the Python side. Shipped in statewave-py 0.10.0 and @statewavedev/sdk 0.10.0 (statewave-py#15, statewave-ts#16).
- Framework integrations (LangChain, CrewAI, AutoGen) — three runnable quickstart examples in statewave-examples (langchain-quickstart/, crewai-quickstart/, autogen-quickstart/). Each ships a small adapter (StatewaveMemory(BaseMemory) for LangChain; pure-function helpers for CrewAI and AutoGen), a runnable demo, and mock-based smoke tests. Dependency strategy: zero framework deps in the core SDKs — adapters live inside each example and framework versions are pinned only in the example READMEs, so SDK releases don't chase framework churn (statewave-examples#12).
- Webhook event filters — STATEWAVE_WEBHOOK_EVENTS (comma-separated) is an event-type allowlist for the global webhook URL. Filtered-out events are dropped before they reach the delivery queue. Unknown event types fail server startup, so a typo can't silently drop every webhook. Fully backward-compatible: an empty filter delivers every event (statewave#150).
- Memory templates for common patterns — declarative, versioned scaffolds for recurring information patterns. Five bundled templates ship today (customer support handoff, user preference, project decision log, incident summary, account onboarding). GET /v1/memory-templates is fully inspectable; POST /v1/memory-templates/{id}/apply validates field values and ingests an ordinary episode with template_id / template_version recorded in metadata.template. Pure data — no code runs inside a template; rendering is deterministic string substitution. See docs/memory-templates.md in the server repo (statewave#152).
- Design partner onboarding package — a single-page guide in design-partners.md covering overview, who Statewave is for, a 30-minute setup path, recommended first use cases, data/privacy expectations, the support and feedback loop, an evaluation checklist (functional, performance, governance, operational), 30 / 60 / 90-day success criteria with benchmark reference numbers, and a 9-entry FAQ. Linked from README.md and SUPPORT.md (statewave-docs#42).
- Public memory benchmark — complete equal-budget sweep on the public LoCoMo dataset across four token tiers (512 / 1024 / 2048 / 4096), 10 conversations, 1,986 questions per system. A publication-safety harness refuses headline rankings unless coverage is 100%, every system answers the same question set, and no rows are judge_failed; measured input tokens are shown beside every score. The benchmark suite is maintained in statewave-memory-benchmarks — current methodology and per-system results live there.
- Connector ecosystem — fully shipped ✅. Modular packages for GitHub, Markdown/ADRs, MCP, Slack, Discord, Zendesk, Intercom, Freshdesk, Notion, Gmail, n8n, Zapier. v0.6.0 added cursor-based delta sync (Zendesk Incremental Tickets Export, Gmail History API) and Notion database scoping. Tier 2 push receivers shipped (v0.7.0–v0.11.0). Every connector with a meaningful push surface in its source system now has a real-time receiver alongside its pull connector: Slack DM/MPIM dispatch (slack.dm.*, slack.mpim.*), Freshdesk webhook, Zendesk webhook, Intercom webhook, and Gmail Cloud Pub/Sub push. statewave-connectors listen <connector> is the unified daemon; the same (Request) => Promise<Response> factory mounts identically on Vercel / Cloudflare / Express across the lineup. Tier 3 operator/cloud productization shipped (v0.12.0–v0.17.0): TOML config file (multi-instance), hosted runner (statewave-connectors run), persistent state adapters (file / Postgres / Redis), built-in OIDC verification for Gmail Pub/Sub, auth-gated Prometheus /metrics, and deployment recipes (Docker / Compose / Helm / Fly / Railway). v0.18.0 adds preview Jira + database source connectors (database dialects PostgreSQL / MySQL / MariaDB / MSSQL; these pull selected external rows into Statewave memory and are not a Statewave storage backend). See Connectors → Roadmap for the full release timeline and what's queued next (long-running daemon shapes: Slack Socket Mode, Discord Gateway, Gmail service-account auth).
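The webhook event filter's contract (comma-separated allowlist, empty means deliver everything, unknown names fail at startup) is sketched below. The event names in KNOWN_EVENTS and the helper names are hypothetical placeholders, not Statewave's actual event catalogue or code.

```python
from typing import FrozenSet

# Hypothetical event catalogue; the real event type names may differ.
KNOWN_EVENTS: FrozenSet[str] = frozenset({
    "episode.created", "memory.compiled", "health.state_changed", "sla.breached",
})


def parse_event_filter(raw: str, known: FrozenSet[str] = KNOWN_EVENTS) -> FrozenSet[str]:
    # Comma-separated allowlist; blank entries and surrounding whitespace are ignored.
    names = frozenset(part.strip() for part in raw.split(",") if part.strip())
    unknown = names - known
    if unknown:
        # Fail fast at startup so a typo can't silently drop every webhook.
        raise ValueError(f"unknown webhook event types: {sorted(unknown)}")
    return names


def should_enqueue(event_type: str, allow: FrozenSet[str]) -> bool:
    # An empty allowlist keeps the backward-compatible "deliver everything" behavior.
    return not allow or event_type in allow
```

In this sketch a server would call parse_event_filter(os.environ.get("STATEWAVE_WEBHOOK_EVENTS", "")) once at startup and consult should_enqueue before placing each event on the delivery queue.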
v0.9 — Replay, Signing, & Auto-Labeling ✅
Building on the v0.8 governance foundation. Shipped 2026-05-26:
- Scheduled retention-purge worker (#156 · #162) — an hourly worker reads tenant_configs.config.receipt_retention_days and tombstones expired receipts. Soft-delete only; rows persist for forensic lookup. A partial index keeps it cheap. Migration 0020.
- HMAC signing for receipts (#157 · #163) — hmac-sha256-canonical-v1 over the canonical body. Operator-provided keys via STATEWAVE_RECEIPT_SIGNING_KEYS, never persisted to the DB. Per-tenant active key via tenant_configs.config.receipt_signing_key_id. GET /v1/receipts/{id}/verify returns {valid, key_id, algorithm, reason} and uses a constant-time comparison. Pre-v0.9 receipts verify cleanly as no_signature. Migration 0021.
- Compiler heuristic auto-labeling (#158 · #164) — opt-in via STATEWAVE_AUTO_LABELING_ENABLED. Detectors stamp advisory suggested_labels, kept strictly separate from the authoritative sensitivity_labels. v0.9 first wave: pii.email, pii.phone, financial.card (Luhn), secret.token. Migration 0022 (GIN-indexed).
- Receipt-driven replay (#159 · #165) — every v0.9+ receipt embeds the active bundle's YAML (policy_snapshot). POST /v1/receipts/{id}/replay re-runs against current memories with the original policy and returns a structural diff envelope. Replay receipts use mode as_of_replay and link to their parent receipt. Semantics: current code + original policy. Migration 0023.
- Operator promote endpoint + admin UI (#160 · server #166, admin statewave-admin#89) — POST /admin/memories/{id}/promote-labels is review-only, with audit-trail entries on memory.metadata.label_promotions. The admin app adds a /suggested-labels page and a receipt-detail replay button that renders the diff envelope inline.
- Per-tenant data residency (#161 · #167) — per-region deployment + metadata-pinned tenants via STATEWAVE_REGION + tenant_configs.config.region. Hard application-layer enforcement on both /v1/ and /admin/ (total isolation); conflicts return HTTP 403 residency.mismatch. Receipts stamp region for end-to-end audit. Code + config + tests + ops runbook shipped; no second region deployed yet.
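For the receipt-signing item, the sketch below shows an HMAC-SHA256 sign/verify pair with a constant-time comparison via hmac.compare_digest. The canonicalization (sorted-key, compact JSON in UTF-8), the in-memory key map, and every reason string other than no_signature are assumptions; the real hmac-sha256-canonical-v1 canonical form may differ.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional

ALGORITHM = "hmac-sha256-canonical-v1"


def canonical_bytes(body: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign(body: dict, key_id: str, keys: Dict[str, bytes]) -> dict:
    mac = hmac.new(keys[key_id], canonical_bytes(body), hashlib.sha256).hexdigest()
    return {"algorithm": ALGORITHM, "key_id": key_id, "signature": mac}


def verify(body: dict, sig: Optional[dict], keys: Dict[str, bytes]) -> dict:
    if sig is None:
        # Unsigned (e.g. pre-v0.9) receipt; the exact `valid` value here is an assumption.
        return {"valid": False, "key_id": None, "algorithm": None, "reason": "no_signature"}
    key_id, algorithm = sig.get("key_id"), sig.get("algorithm")
    result = {"valid": False, "key_id": key_id, "algorithm": algorithm}
    if algorithm != ALGORITHM:
        return {**result, "reason": "unsupported_algorithm"}
    key = keys.get(key_id)
    if key is None:
        return {**result, "reason": "unknown_key"}
    expected = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    ok = hmac.compare_digest(expected.encode(), str(sig.get("signature", "")).encode())
    return {**result, "valid": ok, "reason": "ok" if ok else "signature_mismatch"}
```

Canonicalizing before hashing is what lets a verifier accept the same receipt regardless of JSON key order, while any change to the content itself breaks the signature.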
v1.0 — First stable public developer release ✅
Shipped 2026-06-09 — the first stable public developer release (see release notes).
- Stable /v1 API contract — the /v1/* surface and the v0.9 governance layer (HMAC-signed receipts, receipt-driven replay, sensitivity labels + declarative policy, opt-in detector-suggested labels, per-region residency) are now stable for developer use under a self-hosted model. Backward-compatible additions only from here; carried-forward limitations stay documented in why-statewave.md.
- Both SDKs to v1.0.0 — statewave (PyPI) and @statewavedev/sdk (npm) cut their first stable releases alongside the server, with typed surfaces matching the REST contract and semver stability from 1.0.0 forward.
- Python SDK governance helpers (#176) — list_suggested_labels() / promote_suggested_labels() wrap the v0.9 suggested-label review surface (sync + async, typed result models).
- Public version-discovery endpoint (#178) — unauthenticated GET /v1/version reports the running server version.
- session_id on create_episode (#174) — both SDKs forward the optional session pin on the wire.
- Webhook delivery stats + tenant scoping — optional tenant filter on event-status queries and per-tenant delivery statistics; permanent 4xx deliveries dead-letter instead of retrying.
Deferred beyond v1.0
- Visual policy editor — admin-app YAML-free form for building rule sets. Listed in the original v0.9 plan but deferred to keep the v0.9 release focused on audit + replay + residency.
- Admin identity — so promoted_by and future operator-action audit fields populate with the operator's id instead of null. Lays groundwork for richer admin-side audit trails.
- Bulk label promotion across many memories. v0.9 promotes one row per call.
- Federated cross-region audit search — explicit follow-up to #161; never as implicit cross-region access.
- Memory snapshots for byte-for-byte replay — v0.9 ships current code + original policy; true historical reproduction needs memory snapshots. The data model is designed to absorb this without a schema break.
Post-v1.0 roadmap (scope TBD)
v1.0.0 shipped on 2026-06-09 — the first stable public developer release (see release notes). The shape of the post-v1.0 roadmap will be informed by:
- The deferred items above (admin identity is the natural lead since it unblocks several others).
- Design-partner feedback on the v0.9 audit + replay + residency surfaces.
- Operator-quality-of-life items from the v0.9 ops runbooks once they get real-world use.
Not committing to a list yet; calling this section out explicitly so deferred items have a visible home.
Design principles
- Raw truth first — episodes are immutable, memories are derived
- Self-hosted, operator-friendly — you own your data and infra
- Support-agent wedge — optimize here, prove it, then expand
- Multi-provider — LiteLLM means no vendor lock-in
- Trust over features — reliability beats feature count
- Honest about limitations — document what doesn't work yet