Engram

May 9, 2026

Durable memory for AI agents. Multi-tenant by default. Local-first or cloud. Benchmarked, not vapor.


from engram import Engram

async with await Engram.open(":memory:") as memory:
    await memory.record("Alice prefers espresso over drip.", user_id="alice", role="user")
    await memory.record("Alice's brother lives in Tokyo.",   user_id="alice", role="user")

    results = await memory.recall(query="what does alice drink?", user_id="alice", top_k=1)
    print(results[0].fact.text)
    #  Alice prefers espresso over drip.

That's the whole shape: record what users (or agents) say, ask questions later, and get back the relevant facts, with per-user isolation, temporal grounding, and a recommendation mode that goes beyond simple recall.


Why Engram?

There are several memory libraries for agents already (Mem0, Letta, Zep, and the published AgentMemory). Engram makes a few specific bets:

1. Mode-aware reading. Most memory layers do one thing: retrieve facts that match your query. Engram ships two distinct read paths: Reader(mode="recall") for "what did the user say about X?" and Reader(mode="synthesis") for "recommend something based on the user's preferences." On LongMemEval-S, this lifts single-session-preference accuracy from 29% to 71%.

2. Multi-tenant by default. Every API takes user_id + org_id (a Scope). Facts, vectors, and chat messages are partitioned at the SQL and HNSW levels — no cross-tenant leakage by construction. There is no "admin" scope that can read across tenants.

3. Local-first OR cloud. Pluggable providers for both LLM (Ollama / OpenAI / Anthropic) and embeddings (Ollama / OpenAI / synthetic). Run completely offline with nomic-embed-text + llama3.2:3b, or swap in cloud providers when you need throughput.

4. Active fact versioning. Engram.supersede(old_id, new_id) marks a fact as replaced. Recall filters superseded facts by default. So when a user changes a preference, the old one doesn't poison future answers.

5. Benchmarked, not vapor. Reproducible LongMemEval-S runs in benchmarks/. We publish overall and per-category numbers (currently 71% overall, see below). The 25-point gap to AgentMemory's frontier 96.2% is a public roadmap item — not hidden.

6. MCP-native. Ships an MCP server so Claude Code, Cursor, and other agentic tools can use Engram memory directly. Three tools: memory_record, memory_recall, memory_context.
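The supersede semantics in bet 4 reduce to a filtered view at recall time. A minimal sketch of the idea (illustrative only, not Engram's storage layer — the real library persists the replacement edge so history stays queryable):

```python
def active_facts(facts: dict[str, str], superseded: set[str]) -> dict[str, str]:
    """Recall-side filter: drop any fact whose id has been superseded."""
    return {fid: text for fid, text in facts.items() if fid not in superseded}

def supersede(superseded: set[str], old_id: str, new_id: str) -> None:
    """Mark old_id as replaced by new_id; the new fact stays active."""
    superseded.add(old_id)
```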


Install

uv add jamjet-engram                  # or: pip install jamjet-engram
uv add 'jamjet-engram[rerank]'        # cross-encoder reranker (recommended)

Requires Python 3.12+. SQLite (stdlib) and hnswlib (auto-installed) are the only persistence dependencies.


60-second tour

Record + recall

import asyncio
from engram import Engram

async def main():
    async with await Engram.open(":memory:") as memory:
        await memory.record("user prefers oat milk", user_id="alice")
        await memory.record("user drinks coffee twice a day", user_id="alice")
        await memory.record("user's dog is named Whiskey", user_id="alice")

        results = await memory.recall(query="coffee", user_id="alice", top_k=3)
        for r in results:
            print(f"  [{r.score:.2f}]  {r.fact.text}")

asyncio.run(main())

Build a context window for an LLM

ctx = await memory.context(
    query="what should I order at the cafe?",
    user_id="alice",
    token_budget=2000,
)
# ctx is a newline-joined string ready to drop into a system prompt
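A context builder like this can be pictured as greedy packing of top-scored facts under the budget. A rough sketch using a naive four-characters-per-token estimate (not Engram's actual tokenizer or selection policy):

```python
def pack_context(scored_facts: list[tuple[float, str]], token_budget: int) -> str:
    """Greedily keep the highest-scored facts that fit within token_budget."""
    picked: list[str] = []
    used = 0
    for score, text in sorted(scored_facts, key=lambda p: p[0], reverse=True):
        cost = max(1, len(text) // 4)  # crude token estimate
        if used + cost > token_budget:
            continue
        picked.append(text)
        used += cost
    return "\n".join(picked)
```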

Per-category routing (the differentiator)

For preference/recommendation questions, route to synthesis mode — the reader generates a recommendation grounded in stored user preferences instead of just listing facts.

from engram import Engram, Reader, RuleBasedClassifier, is_preference_question
from engram.llm.tier import ModelTier

tier = ModelTier.default()  # gpt-4o-mini
classifier = RuleBasedClassifier()

async with await Engram.open(":memory:", tier=tier) as memory:
    for turn in conversation_history:
        await memory.record(text=turn["content"], role=turn["role"], user_id="alice")

    question = "Recommend a movie for tonight"
    qt = await classifier.classify(question)

    if is_preference_question(question, qt):
        ctx = await memory.context(query=question, user_id="alice", role_filter=("user",))
        reader = Reader(tier.reader, mode="synthesis")
    else:
        ctx = await memory.context(query=question, user_id="alice", classifier=classifier)
        reader = Reader(tier.reader)

    res = await reader.read(question=question, context=ctx)
    print(res.answer)

Full walk-through: docs/quickstart.md. Working example: examples/05_preference_synthesis.py.
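For a feel of what a rule-based preference gate can look like, here is a toy version. The patterns below are invented for illustration; Engram's actual rules live in src/engram/read/preference_gate.py:

```python
import re

# Invented example patterns -- not Engram's actual rule set.
_PREFERENCE_PATTERNS = [
    r"\brecommend\b",
    r"\bsuggest\b",
    r"\bwhat should i\b",
    r"\bfor me\b",
]

def looks_like_preference_question(question: str) -> bool:
    """True if the question reads like a recommendation request."""
    q = question.lower()
    return any(re.search(p, q) for p in _PREFERENCE_PATTERNS)
```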

Run as an HTTP server

import asyncio

import uvicorn
from engram import Engram
from engram.server.http import build_http_app

async def main() -> None:
    memory = await Engram.open("./engram.db")
    config = uvicorn.Config(build_http_app(memory), host="127.0.0.1", port=19090)
    await uvicorn.Server(config).serve()

asyncio.run(main())

Then exercise it from another shell:

curl -sX POST localhost:19090/v1/memory/record \
  -H 'Content-Type: application/json' \
  -d '{"text": "alice prefers espresso", "user_id": "alice"}'

Endpoints: record, extract, recall, search, context, raw_facts, sessions, messages, healthz.

Run as an MCP server (for Claude Code, Cursor, etc.)

from engram.server.mcp import build_mcp_server
import mcp.server.stdio

server = build_mcp_server(memory)
async with mcp.server.stdio.stdio_server() as streams:
    await server.run(streams[0], streams[1], server.create_initialization_options())

Use with JamJet

Engram is the memory layer for agents built on JamJet — JamJet Labs' durable execution runtime. The two projects share design DNA: Apache 2.0, MCP-native, multi-tenant by default. Engram handles "what does the agent remember?"; JamJet handles "what happens when the process restarts mid-execution?"

Engram works fine standalone if you're not using JamJet. But if you're building durable AI agents on the JVM or Python, the two-piece stack saves you from reinventing memory.

Quick disambiguations (the ecosystem has accumulated some name overlap):

  • pip install jamjet-engram (this library) — the standalone Python memory library.
  • pip install jamjet — JamJet's Python SDK for the runtime; not this library.
  • "Durable" in JamJet means durable execution (your agent process can crash and resume from a checkpoint).
  • "Durable" in Engram means durable memory (your stored facts persist across runs and version cleanly).
  • The two are orthogonal capabilities — both apply to the same agent.

Core concepts

| Concept | What it is | Why you care |
| --- | --- | --- |
| Scope | (org_id, user_id) pair attached to every fact | Multi-tenancy. No cross-tenant leakage at the SQL or vector index layer. |
| Fact | Pydantic model: text, scope, validity window, confidence, optional event_date / session_id / role / metadata | The unit of memory. Versioned via supersede, scored at retrieval, governed by per-tenant scope. |
| Engram | Async facade over store + embedder + retriever + (optional) extractor + (optional) tier | The thing you actually call. record(), recall(), context(), extract(). |
| Reader | LLM-driven answer generator with two modes: "recall" (verifier-gated fact recall) and "synthesis" (recommendation grounded in user preferences) | The mode you pick changes the whole pipeline — verifier on or off, tool loop or not, recall prompt or synthesis prompt. |
| RuleBasedClassifier | Maps a question to one of the 6 LongMemEval question types | Used for per-category token budgets and to decide between recall vs synthesis. Override with your own QuestionClassifier if you need different categories. |
| MemoryTier | WORKING / EPISODIC / SEMANTIC enum on each fact | Optional structuring of memory by time horizon. The store doesn't enforce it; your code does. |
| Tool + ToolRegistry | The reader's tool-use protocol (text-marker [TOOL_USE]{...}[/TOOL_USE]) | When mode="recall", the reader can call tools mid-generation. Six built-in tools (search_facts, search_events, solve_temporal, count_between, add_days, days_between). Bring your own. |

All importable from the top-level package: from engram import Engram, Scope, Fact, Reader, ....
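As a mental model for scope isolation (illustrative dataclasses, not Engram's internals): every read and write is keyed by the full Scope, so one tenant's query can never touch another's rows.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Scope:
    org_id: str
    user_id: str

@dataclass
class TinyStore:
    facts: dict[Scope, list[str]] = field(default_factory=dict)

    def record(self, scope: Scope, text: str) -> None:
        self.facts.setdefault(scope, []).append(text)

    def recall(self, scope: Scope, needle: str) -> list[str]:
        # Lookups start from the scope key, so isolation holds by construction.
        return [t for t in self.facts.get(scope, []) if needle in t]
```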


Architecture (one paragraph + a diagram)

Three layers under the Engram facade. Storage = SqliteStore (FTS5 for keyword) + HnswVectorStore (cosine similarity). Retrieval = HybridRetriever blends vector / keyword / temporal scoring with optional cross-encoder rerank. Reading = Reader (verifier-gated recall) or synthesis-mode (verifier off, recommendation prompt). Optional extras: ExtractionPipeline (LLM extracts facts from chat turns), EventExtractor (SVO event calendar for temporal solvers), QueryDecomposer (splits compound questions), and a tool registry the reader can call mid-generation. Everything is async; everything is scope-isolated.

                   ┌──────────────────────────────────────────────┐
                   │  Engram (async facade)                       │
                   │  record  •  recall  •  context  •  extract  │
                   │  supersede  •  record_message               │
                   └───────────┬───────────┬──────────┬──────────┘
              ┌────────────────┘           │          └──────────────┐
              ▼                            ▼                         ▼
   ┌──────────────────┐          ┌──────────────────┐      ┌──────────────────┐
   │  SqliteStore     │          │  HnswVectorStore │      │  Reader          │
   │  facts + msgs    │          │  per-scope HNSW  │      │  recall mode     │
   │  FTS5 keyword    │          │  cosine          │      │   ↳ verifier     │
   │  scope-isolated  │          │  deterministic   │      │   ↳ tool loop    │
   └──────────────────┘          └──────────────────┘      │   ↳ escalation   │
              │                            │                │  synthesis mode  │
              └─────────────┬──────────────┘                │   ↳ direct LLM   │
                            ▼                                └──────────────────┘
                  ┌──────────────────┐
                  │  HybridRetriever │  vector + keyword + temporal scoring
                  │  + reranker      │  cross-encoder rerank (optional)
                  └──────────────────┘

Files to start exploring: src/engram/engram.py is the facade. src/engram/read/reader.py is where the two modes live. benchmarks/smoke_runner.py is the LongMemEval harness.
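The hybrid blend can be thought of as a weighted sum of the three retrieval signals. The weights below are placeholders for illustration, not the tuned values inside HybridRetriever:

```python
def hybrid_score(
    vector_sim: float,     # cosine similarity from the HNSW index, in [0, 1]
    keyword_score: float,  # normalized FTS5 keyword score, in [0, 1]
    recency: float,        # temporal signal, in [0, 1]
    w_vec: float = 0.6,
    w_kw: float = 0.25,
    w_time: float = 0.15,
) -> float:
    """Illustrative weighted blend of vector, keyword, and temporal signals."""
    return w_vec * vector_sim + w_kw * keyword_score + w_time * recency
```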


Benchmarks — LongMemEval-S

Engram is benchmarked against LongMemEval-S (Wu et al., 2024) — 500 questions about a long synthetic chat history, judged by gpt-4o-mini.

Latest result (v0.1.0)

71.0% on a 100-question stratified subset (gpt-4o-mini reader, --decompose --tools, preference-aware routing on).

| category | score | n |
| --- | --- | --- |
| single-session-assistant | 88% | 17 |
| temporal-reasoning | 75% | 16 |
| single-session-preference | 71% | 17 |
| knowledge-update | 71% | 17 |
| single-session-user | 69% | 16 |
| multi-session | 59% | 17 |
| overall | 71% | 100 |

Frontier comparison: AgentMemory reports 96.2% on the full 500. Engram is currently 25pp behind. Documented gap-closing work in Roadmap.

Reproduce

set -a && source /path/to/.env && set +a    # OPENAI_API_KEY required
export LONGMEMEVAL_ORACLE=/path/to/longmemeval_oracle.json
uv run python -m benchmarks.smoke_runner --n 100 --decompose --tools

The runner writes a per-question JSONL trace + a markdown report to benchmarks/reports/. Set PYTHONHASHSEED=42 for fully deterministic insertion order.

What we tried, what worked, what didn't

The benchmark history (visible in CHANGELOG.md) documents two ablation programmes — a Tier 2-3 batch (April 2026, 64% → 68%, 8 techniques ablated independently) and a preference uplift (May 2026, 29% → 71% on single-session-preference after a failed first attempt forced a redesign). We publish the negative results too: --ft-cross-encoder looked promising on IR metrics but lost 7pp downstream because the training labels misaligned with multi-session task structure; --reextract and --self-consistency didn't help because the verifier short-circuited them. Honest history makes the benchmark numbers credible.


Running in production

Scope isolation. Every API call takes user_id + org_id (combined into a Scope). Facts, vectors, and chat-message storage are partitioned by scope at the SQL and HNSW levels — no cross-tenant leakage by construction. There is no "admin" scope that can read all data.

Embedding provider tradeoffs. OllamaEmbedding runs entirely local (private, free, slower) — recommended for sensitive data or offline usage. OpenAIEmbedding is faster and more accurate but sends every recorded text to OpenAI. SyntheticEmbedding is for tests only.

LLM API keys. Engram never stores API keys. They're read from the environment by the OpenAILLM / AnthropicLLM / OllamaLLM clients. Use a secrets manager in production.

Rate limits. The default extraction pipeline calls one LLM per session ingested. Bulk imports of large chat histories should chunk + rate-limit to fit your provider's tier. The OpenAILLM and AnthropicLLM clients use the official SDKs which handle retries with exponential backoff.
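For bulk imports, even simple paced sequential ingestion keeps you under provider limits. A naive sketch (the helper name and pacing default are invented; tune per_second for your provider tier):

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def ingest_paced(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    per_second: float = 2.0,
) -> list[R]:
    """Run worker over items sequentially, pacing to ~per_second calls/sec."""
    results: list[R] = []
    for item in items:
        results.append(await worker(item))
        await asyncio.sleep(1.0 / per_second)
    return results
```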

Determinism. HnswVectorStore defaults to random_seed=42 for reproducible benchmark runs. Set PYTHONHASHSEED=42 and use the same insertion order for byte-for-byte reproducibility.

Per-category routing. For preference/recommendation questions, route to Reader(mode="synthesis") with Engram.context(role_filter=("user",)). Other categories take the default fact-recall reader. See docs/quickstart.md for the canonical pattern.


Hosted MCP server

jamjet-engram 0.2.0+ ships an engram-server binary that exposes Engram's MCP tools over Streamable HTTP. Use it to host a remote Engram for JamJet Cloud agents, ChatGPT custom connectors, or any HTTP-aware MCP client.

Quickstart

pip install jamjet-engram
export ENGRAM_AUTH_TOKEN=$(python -c "import secrets; print(secrets.token_urlsafe(32))")
engram-server --transport http --port 8765

Or via Docker:

docker run --rm \
  -e ENGRAM_AUTH_TOKEN="$ENGRAM_AUTH_TOKEN" \
  -v engram-data:/data \
  -p 8765:8765 \
  ghcr.io/jamjet-labs/engram-py-server:0.2.0

Connecting a client

curl -i -X POST http://localhost:8765/mcp \
  -H "Authorization: Bearer $ENGRAM_AUTH_TOKEN" \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize",
       "params":{"protocolVersion":"2025-03-26","capabilities":{},
                 "clientInfo":{"name":"curl","version":"1"}}}'

The server returns MCP-Session-Id in the response headers; subsequent requests must send the same MCP-Session-Id header back.
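The same initialize handshake built from Python (payload construction only; URL, headers, and token handling are omitted). The protocolVersion string mirrors the curl example above:

```python
import json

def initialize_payload(client_name: str, client_version: str) -> str:
    """JSON-RPC 2.0 initialize request for the MCP Streamable HTTP transport."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    })
```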

Auth model (v0.2)

  • /mcp requires Authorization: Bearer <token> matching $ENGRAM_AUTH_TOKEN.
  • The server refuses to start if $ENGRAM_AUTH_TOKEN is unset (fail-closed). The only opt-out is --no-auth, which is rejected unless --host is 127.0.0.1 or ::1.
  • /v1/memory/* REST routes remain unauthed (matches the Rust engram-server REST behavior; deploy them behind a reverse proxy if you need auth).
  • OAuth2 dynamic client registration (the Claude Desktop remote-server flavor) is on the roadmap.

Stdio transport

For Claude Desktop and other stdio MCP clients, keep using the Rust ghcr.io/jamjet-labs/engram-server:0.5.0 image — that path is unaffected by this release.


Contributing

We genuinely welcome contributions. The repo is set up so a new contributor can ship a meaningful change in a couple of hours.

Quick start

git clone https://github.com/jamjet-labs/engram.git
cd engram
uv sync --all-extras
uv run pytest                # 309 tests, ~50s
uv run ruff check && uv run ruff format --check && uv run mypy src/engram

CI gates on all three. Coverage floor is 79% (--cov-fail-under=79) and a downstream py.typed job builds the wheel and verifies mypy --strict passes for library users.

Full developer guide: CONTRIBUTING.md. Security policy: SECURITY.md.

Where contributions are most welcome

| Area | What you'd do | Difficulty |
| --- | --- | --- |
| New embedding provider | Implement EmbeddingProvider for Cohere / Voyage / local-sbert / etc. Pattern in src/engram/embedding/. | Easy — single file + tests |
| New LLM provider | Implement LLMClient for Vertex / Together / etc. Pattern in src/engram/llm/. | Easy |
| New tool for the reader | Implement Tool for domain-specific work (e.g., search_calendar, query_database). Pattern in src/engram/tools/. | Easy |
| Tighten preference predicate recall | Currently 76% on LongMemEval-S — misses "what do you think?" decision-help shapes. ~5 questions worth of headroom. See src/engram/read/preference_gate.py. | Medium — requires benchmark validation |
| Postgres backend | Implement EngramStore against Postgres. Schema in src/engram/store/. | Medium — schema + tests + CI matrix entry |
| Performance benchmarks | Recall latency / throughput at scale. There's no baseline yet — open territory. | Medium |
| Migration guide from Rust v0.5.x | Document the API mapping between the Rust runtime and the Python rewrite. | Easy if you've used both |
| Move LongMemEval ≥80% | The 25pp gap to AgentMemory's 96.2% is the headline gap. Pick a category (multi-session at 59% is the lowest) and find a +5pp improvement. | Hard — but high-impact |

We tag issues with good first issue for the first three categories. Open an issue before starting a large change so we can sanity-check the approach.

What makes a good PR here

  • A failing test first (TDD is enforced by the CI checks)
  • For benchmark-affecting changes, an n=100 smoke result vs the published baseline (within ±2pp on non-target categories, real lift on the target)
  • A CHANGELOG.md entry under [Unreleased]
  • Conventional commit message: feat(scope): description or fix(scope): description

How decisions get made

Architectural changes go through a brainstorm → spec → plan → implementation cycle. The specs and plans are local-only (gitignored under docs/superpowers/); the outcomes land in CHANGELOG entries with PR references. We document negative results too — see the "Preference uplift v1 (rolled back)" entry for a worked example of a strategy that didn't move the metric and what we learned.


Roadmap

In rough priority order:

  • LongMemEval-S ≥85% — the top headline. Currently 71%; AgentMemory frontier is 96.2%. Multi-session at 59% is the lowest-hanging fruit.
  • Postgres backend — currently SQLite-only. Schema is straightforward; needs an EngramStore Postgres implementation + CI matrix entry.
  • Sonnet / Haiku reader compatibility — the v0.1.0 synthesis prompt is calibrated against gpt-4o-mini. Sonnet/Haiku regress on tool-protocol parsing. Native Anthropic tool-use API path would unlock +5–10pp on tool-augmented runs.
  • Performance benchmarks — recall latency, throughput, memory footprint at scale. No published baseline yet.
  • Per-category synthesis modes for non-preference categories — only single-session-preference currently uses synthesis mode. Multi-session might benefit from a different tailored prompt.
  • Migration guide from Rust v0.5.x — the Rust runtime stays in maintenance mode at jamjet-labs/jamjet; a side-by-side API map would help users migrate.
  • Cross-language client SDKs — Engram is Python today. The HTTP API is stable enough that a TypeScript / Go client is mostly mechanical.

If any of these are interesting, open an issue and let's talk.


Project meta


Engram is built by JamJet Labs. The benchmark targets and design decisions are public; the failures (and there have been a few) are documented in CHANGELOG. If you'd like to help us push past 71% — please do.