Chapter 2: Memory Architecture in Letta
April 13, 2026
Welcome to Chapter 2: Memory Architecture in Letta. In this part of Letta Tutorial: Stateful LLM Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Understand core memory, archival memory, and recall memory - the three pillars of persistent agent memory.
Memory Architecture
```mermaid
flowchart TD
    CTX[LLM Context Window]
    subgraph CoreMem["Core Memory (In-Context)"]
        PM[Persona Block]
        HM[Human Block]
    end
    subgraph ArchivalMem["Archival Memory (External Store)"]
        VS[Vector Database]
        KG[Knowledge Graph]
    end
    subgraph RecallMem["Recall Memory (Conversation History)"]
        CH[Past Messages Index]
    end
    CTX --> CoreMem
    CoreMem -->|search_archival| ArchivalMem
    CoreMem -->|search_recall| RecallMem
    ArchivalMem -->|retrieved chunks| CTX
    RecallMem -->|retrieved messages| CTX
    classDef ctx fill:#e1f5fe,stroke:#01579b
    classDef core fill:#f3e5f5,stroke:#4a148c
    classDef external fill:#fff3e0,stroke:#ef6c00
    class CTX ctx
    class PM,HM core
    class VS,KG,CH external
```
Overview
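Letta's memory system is hierarchical. It is designed to let an agent work with far more information than fits in the LLM's context window: a small, high-priority portion stays in context, and the rest lives in external stores that the agent can search when needed. This chapter explores the three types of memory and how they work together.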
The Three Memory Types
1. Core Memory
Core memory is the agent's "working memory" - the most important information that should always be accessible. It includes:
- Agent identity and persona
- Key facts about the user
- Current goals and context
- Critical instructions
```python
from letta import create_client

client = create_client()

# Get an agent's core memory
agent = client.get_agent("sam")
core_memory = agent.memory

print("Core Memory:")
for memory_block in core_memory:
    # Each block pairs a name (e.g. persona, human) with its text value
    print(f"- {memory_block.name}: {memory_block.value}")
```
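Conceptually, each core memory block is a labeled piece of text with a size limit, and all blocks are rendered into the prompt on every turn. The sketch below models that idea in plain Python. `Block`, `compile_core_memory`, the 2000-character default, and the tag-style layout are hypothetical illustrations, not Letta's actual classes or prompt format; only the `persona` and `human` labels come from the diagram above.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for a core memory block: a label, its text, and a character limit."""
    label: str
    value: str
    limit: int = 2000

    def __post_init__(self) -> None:
        # Core memory is always in context, so each block must stay within its budget.
        if len(self.value) > self.limit:
            raise ValueError(f"block {self.label!r} exceeds {self.limit} characters")


def compile_core_memory(blocks: list[Block]) -> str:
    """Render every block into one string that would be placed in the system prompt."""
    sections = []
    for block in blocks:
        # Wrapping each block in tags lets the LLM tell the sections apart.
        sections.append(f"<{block.label}>\n{block.value}\n</{block.label}>")
    return "\n".join(sections)


blocks = [
    Block("persona", "I am Sam, a helpful assistant."),
    Block("human", "Name: John. Occupation: Developer."),
]
print(compile_core_memory(blocks))
```

The limit check in `__post_init__` reflects the main design pressure on core memory: whatever is stored here is paid for in context space on every single request.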
2. Archival Memory
Archival memory is long-term storage for facts, events, and information that might be relevant later but isn't needed immediately. It's like the agent's "external hard drive".
```python
# Add to archival memory
client.add_to_archival_memory("sam", "John's favorite programming language is Python")

# Search archival memory
results = client.search_archival_memory("sam", "programming")
```
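Archival search ranks stored entries by similarity to the query rather than requiring an exact match. The toy store below imitates that flow with a bag-of-words vector and cosine similarity standing in for a real embedding model, so it matches on shared words only, not on meaning. `ArchivalStore`, `embed`, and `cosine` are hypothetical names used for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ArchivalStore:
    """Hypothetical archival memory: append-only passages ranked by similarity at query time."""

    def __init__(self) -> None:
        self.passages: list[tuple[str, Counter]] = []

    def insert(self, text: str) -> None:
        # The vector is computed once at insert time, as with stored embeddings.
        self.passages.append((text, embed(text)))

    def search(self, query: str, top_k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.passages]
        scored = [item for item in scored if item[0] > 0]  # drop passages with no overlap
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


store = ArchivalStore()
store.insert("John's favorite programming language is Python")
store.insert("John prefers dark mode in all applications")
for score, text in store.search("python programming"):
    print(f"{score:.2f}  {text}")
```

Only the first passage shares words with the query, so it is the only result; with real embeddings, a query such as "coding" could also match it.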
3. Recall Memory
Recall memory holds the agent's conversation history. Letta manages it automatically: recent messages provide the immediate conversational context, and older messages remain stored and searchable.
```python
# Get recent messages
messages = client.get_messages("sam", limit=10)
for msg in messages:
    print(f"{msg.role}: {msg.content}")
```
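The split between "recent" and "searchable" history can be modeled as one complete message log plus a window over its tail. In this sketch, `RecallMemory`, `window_size`, and the substring search are illustrative assumptions, not Letta's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


@dataclass
class RecallMemory:
    """Hypothetical recall memory: every message is kept, only the tail is in context."""
    window_size: int = 4
    history: list[Message] = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        self.history.append(Message(role, content))

    def in_context(self) -> list[Message]:
        # Only the most recent messages are sent to the LLM each turn.
        if self.window_size <= 0:
            return []
        return self.history[-self.window_size:]

    def search(self, query: str) -> list[Message]:
        # Older turns stay retrievable even after they leave the context window.
        q = query.lower()
        return [m for m in self.history if q in m.content.lower()]


recall = RecallMemory(window_size=2)
recall.append("user", "I'm allergic to nuts.")
recall.append("assistant", "Noted, no nuts.")
recall.append("user", "What should I drink?")
recall.append("assistant", "How about coffee?")
print([m.content for m in recall.in_context()])
print([m.content for m in recall.search("nuts")])
```

The nut allergy has scrolled out of the two-message window, yet a search still finds it.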
Memory Hierarchy
```
┌──────────────────┐
│ Core Memory      │ ← Always in context, high priority
│ (Identity,       │
│  Key Facts)      │
├──────────────────┤
│ Recall Memory    │ ← Recent conversation, auto-managed
│ (Last N turns)   │
├──────────────────┤
│ Archival Memory  │ ← Long-term storage, searchable
│ (Facts, Events)  │
└──────────────────┘
```
How Memory Works in Practice
When you send a message:
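- Core memory is always included in the context.
- Recent messages from recall memory supply the conversation history.
- The agent can search archival memory (and older recall history) for relevant information, using built-in tools such as archival_memory_search.
- The LLM generates a response.
- The new messages are saved to recall memory, and the agent can use its memory-editing tools to store new information in core or archival memory.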
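The turn described above can be sketched as a single function. Everything here is hypothetical: `step` and `fake_llm` are illustrative names, the stub stands in for a real model call, and the word-overlap filter is a crude substitute for the agent choosing to call a search tool.

```python
from typing import Callable


def step(
    user_message: str,
    core_memory: str,
    history: list[str],
    archival: list[str],
    call_llm: Callable[[list[str]], str],
    window: int = 6,
) -> str:
    """Hypothetical single agent turn showing where each memory tier enters the prompt."""
    history.append(f"user: {user_message}")
    # Core memory is always part of the context.
    context = [f"[core] {core_memory}"]
    # The most recent messages from recall memory come next.
    context += history[-window:] if window > 0 else []
    # Archival entries are added only when they share a word with the request.
    # (In Letta the agent typically decides this itself by calling a search tool.)
    words = set(user_message.lower().split())
    context += [f"[archival] {p}" for p in archival if words & set(p.lower().split())]
    # The LLM produces a reply from the assembled context.
    reply = call_llm(context)
    # The reply joins the history so later turns can see or search it.
    history.append(f"assistant: {reply}")
    return reply


def fake_llm(context: list[str]) -> str:
    # Stub model: report how much context it received instead of generating text.
    return f"(saw {len(context)} context entries)"


history: list[str] = []
archival = ["John's favorite programming language is Python"]
print(step("What language does John like?", "Name: John", history, archival, fake_llm))
```

The stub sees three entries: the core memory, the new user message, and the one archival fact that shares the word "language".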
Inspecting Memory
View what your agent knows:
```bash
# View core memory
letta get-agent --name sam

# Search archival memory
letta search-memory --name sam --query "python"

# View recent conversations
letta get-messages --name sam --limit 5
```
Memory Management
Adding to Core Memory
```python
# Update core memory blocks
client.update_memory_block("sam", "human", "Name: John, Occupation: Developer, Location: SF")
```
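Because core memory is always in context, each block has a size budget, and an edit either fits or is rejected. The sketch below models three kinds of edits (overwrite, append, and replace) against a character limit. The function names, the 200-character default, and `BlockFullError` are hypothetical; Letta agents edit core memory through their own built-in tools, whose names and limits depend on the version you run.

```python
class BlockFullError(ValueError):
    """Raised when an edit would push a block past its character limit."""


def set_block(blocks: dict[str, str], label: str, value: str, limit: int = 200) -> None:
    """Overwrite a block's whole value, rejecting it if it exceeds the limit."""
    if len(value) > limit:
        raise BlockFullError(f"{label!r} would be {len(value)} chars (limit {limit})")
    blocks[label] = value


def append_block(blocks: dict[str, str], label: str, text: str, limit: int = 200) -> None:
    """Add a new line to the end of a block."""
    current = blocks.get(label, "")
    set_block(blocks, label, f"{current}\n{text}" if current else text, limit)


def replace_in_block(blocks: dict[str, str], label: str, old: str, new: str, limit: int = 200) -> None:
    """Swap one piece of text inside a block for another."""
    current = blocks.get(label, "")
    if old not in current:
        raise KeyError(f"{old!r} not found in block {label!r}")
    set_block(blocks, label, current.replace(old, new), limit)


core: dict[str, str] = {}
set_block(core, "human", "Name: John, Occupation: Developer, Location: SF")
replace_in_block(core, "human", "Location: SF", "Location: NYC")
append_block(core, "human", "Prefers coffee over tea")
print(core["human"])
```

Targeted replace and append edits keep the rest of the block intact, which is safer than rewriting the whole value when only one fact has changed.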
Manual Archival Storage
```python
# Store important facts
client.add_to_archival_memory("sam", "John completed the Python certification on 2024-01-15")
client.add_to_archival_memory("sam", "John prefers dark mode in all applications")
```
Memory Retrieval
```python
# Semantic search: return the 3 closest matches
results = client.search_archival_memory("sam", "certification", top_k=3)
for result in results:
    print(f"Score: {result.score}, Content: {result.content}")
```
Memory Limits and Optimization
Context Window Management
Letta automatically manages context to fit within LLM limits:
- Core memory: Always included (highest priority)
- Recall memory: Recent messages, truncated if needed
- Archival memory: Relevant chunks retrieved via search
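This priority order can be pictured as a budgeted assembly step: core memory goes in unconditionally, retrieved archival chunks are added while they fit, and recent messages fill the remaining space newest-first until the budget runs out. The word-count token estimate, the budget value, and the ordering of archival chunks before messages are all assumptions for illustration, not how Letta counts or orders tokens.

```python
def count_tokens(text: str) -> int:
    # Rough stand-in: one token per whitespace-separated word.
    return len(text.split())


def build_context(core: str, archival_chunks: list[str], messages: list[str], budget: int) -> list[str]:
    """Hypothetical context assembly under a fixed token budget."""
    context = [core]  # core memory is always included
    used = count_tokens(core)
    for chunk in archival_chunks:  # chunks retrieved for this turn
        cost = count_tokens(chunk)
        if used + cost <= budget:
            context.append(chunk)
            used += cost
    recent: list[str] = []
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # everything older than this is truncated from the context
        recent.append(msg)
        used += cost
    context.extend(reversed(recent))  # restore chronological order
    return context


messages = ["user: hi there", "assistant: hello John", "user: what do I drink?"]
ctx = build_context("Name: John", ["John prefers coffee"], messages, budget=12)
print(ctx)
```

With a 12-token budget, core memory (2) and the archival chunk (3) leave room only for the newest message (5); the two older messages are dropped from this turn but still exist in recall memory.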
Memory Compression
For very long conversations, Letta can compress or summarize older recall memory to save space.
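A common pattern for this is to fold the oldest messages into a single summary entry once the history passes a threshold. The sketch below shows only that bookkeeping. `compress_history` and `count_summary` are hypothetical, the summarizer is a placeholder where a real system would ask the LLM for a summary, and the thresholds are arbitrary; one could equally trigger on token count.

```python
from typing import Callable


def compress_history(
    messages: list[str],
    summarize: Callable[[list[str]], str],
    max_messages: int = 6,
    keep_recent: int = 3,
) -> list[str]:
    """Fold older messages into one summary entry once the history exceeds max_messages."""
    if len(messages) <= max_messages:
        return list(messages)
    split = max(0, len(messages) - keep_recent)
    older, recent = messages[:split], messages[split:]
    return [f"[summary] {summarize(older)}"] + recent


def count_summary(msgs: list[str]) -> str:
    # Placeholder: a real system would ask the LLM to summarize these messages.
    return f"{len(msgs)} earlier messages"


history = [f"msg {i}" for i in range(8)]
print(compress_history(history, count_summary))
```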
Practical Examples
Remembering User Preferences
```
$ letta chat --name sam
Human: I prefer coffee over tea, and I'm allergic to nuts.
Assistant: I'll remember you prefer coffee and have a nut allergy. I'll keep this in mind for any food/drink recommendations.
Human: What should I drink in the morning?
Assistant: Based on what you've told me, you prefer coffee over tea. Would you like coffee recommendations?
```
Learning Over Time
```python
# Record facts about John's work in archival memory
client.add_to_archival_memory("sam", "John works on machine learning projects at TechCorp")
client.add_to_archival_memory("sam", "John uses PyTorch for deep learning")
client.add_to_archival_memory("sam", "John is preparing for an ML conference talk")
# Later conversations can retrieve and reference this knowledge
```
Memory Persistence
All memory is stored in a local database (SQLite by default) and persists across:
- Agent restarts
- System reboots
- Letta version updates
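To see what "persists" means at the storage level, the sketch below writes a fact through one SQLite connection, closes it, and reads the fact back through a brand-new connection, as a restarted process would. It uses only Python's standard `sqlite3` module; the `archival` table layout and the helper names are made-up examples, not Letta's schema.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical table layout for illustration only.
SCHEMA = "CREATE TABLE IF NOT EXISTS archival (agent TEXT NOT NULL, text TEXT NOT NULL)"


def save_fact(db_path: str, agent: str, text: str) -> None:
    """Write one archival entry and close the connection, as a short-lived process would."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.execute("INSERT INTO archival (agent, text) VALUES (?, ?)", (agent, text))


def load_facts(db_path: str, agent: str) -> list[str]:
    """Read entries back through a fresh connection."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(SCHEMA)
        rows = conn.execute(
            "SELECT text FROM archival WHERE agent = ? ORDER BY rowid", (agent,)
        ).fetchall()
    return [row[0] for row in rows]


db_path = os.path.join(tempfile.mkdtemp(), "memory.db")
save_fact(db_path, "sam", "John prefers dark mode in all applications")
# No Python objects are shared between these calls; the fact survives because it is on disk.
print(load_facts(db_path, "sam"))
```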
Advanced Memory Features
Memory Blocks
Core memory is organized into blocks:
```python
# View memory blocks
blocks = client.get_memory_blocks("sam")
for block in blocks:
    print(f"{block.name}: {block.value}")
```
Custom Memory Types
For advanced users, you can create custom memory blocks and retrieval strategies.
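As an illustration of what a pluggable retrieval strategy could look like, the sketch defines a small `Retriever` protocol with two interchangeable implementations: one returns the most recent entries, the other ranks entries by shared keywords. The protocol, both classes, and `recall_facts` are hypothetical and do not correspond to a Letta extension API.

```python
from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, entries: list[str], k: int) -> list[str]: ...


class RecencyRetriever:
    """Return the k most recently stored entries, ignoring the query."""

    def retrieve(self, query: str, entries: list[str], k: int) -> list[str]:
        return entries[-k:] if k > 0 else []


class KeywordRetriever:
    """Return up to k entries sharing the most words with the query."""

    def retrieve(self, query: str, entries: list[str], k: int) -> list[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(e.lower().split())), e) for e in entries]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [e for _, e in ranked[:k]]


def recall_facts(retriever: Retriever, query: str, entries: list[str], k: int = 2) -> list[str]:
    # Callers depend only on the protocol, so strategies can be swapped freely.
    return retriever.retrieve(query, entries, k)


facts = ["John uses PyTorch for deep learning", "John prefers dark mode", "John works at TechCorp"]
print(recall_facts(RecencyRetriever(), "pytorch", facts))
print(recall_facts(KeywordRetriever(), "pytorch", facts))
```

The same query returns different results depending on the strategy, which is the point of making retrieval a replaceable component.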
Best Practices
- Core Memory: Keep only essential, frequently-used information
- Archival Memory: Store detailed facts, events, and preferences
- Regular Cleanup: Periodically review and clean up outdated information
- Structured Storage: Use consistent formats for better retrieval
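Consistent formats pay off because every entry then has the same shape for search and for cleanup. A minimal sketch, assuming a hypothetical `[YYYY-MM-DD] subject: fact` convention; `format_fact`, `parse_fact`, and `prune_before` are illustrative helpers, and the second date is an example input.

```python
from datetime import date


def format_fact(subject: str, fact: str, recorded: date) -> str:
    """Store every fact as '[YYYY-MM-DD] subject: fact' so entries share one shape."""
    return f"[{recorded.isoformat()}] {subject}: {fact}"


def parse_fact(entry: str) -> tuple[date, str, str]:
    """Split a formatted entry back into its date, subject, and fact."""
    stamp, rest = entry[1:].split("] ", 1)
    subject, fact = rest.split(": ", 1)
    return date.fromisoformat(stamp), subject, fact


def prune_before(entries: list[str], cutoff: date) -> list[str]:
    """Cleanup pass: keep only facts recorded on or after the cutoff date."""
    return [e for e in entries if parse_fact(e)[0] >= cutoff]


entries = [
    format_fact("John", "completed the Python certification", date(2024, 1, 15)),
    format_fact("John", "is preparing for an ML conference talk", date(2025, 3, 2)),
]
print(prune_before(entries, date(2025, 1, 1)))
```

Age alone is a blunt cleanup criterion: a completed certification may stay relevant for years, so in practice a review step would decide which old entries are actually outdated.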
Memory vs. Traditional Chatbots
| Aspect | Traditional Chatbot | Letta Agent |
|---|---|---|
| Context | Limited to current conversation | Extends beyond the context window via hierarchical memory |
| Learning | None | Learns and remembers over time |
| Personalization | Basic | Deeply personalized experiences |
| Consistency | May contradict itself | Maintains consistent knowledge |
Next: Configure agent personalities and behavior with system prompts and models.
What Problem Does This Solve?
Most teams struggle here not because they need to write more code, but because they must draw clear boundaries between core, archival, and recall memory so that behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Letta's memory architecture as a subsystem of a stateful agent, with explicit contracts for inputs, state transitions, and outputs.
Use the examples above, especially the agent-name arguments, add_to_archival_memory, and the letta CLI commands, as your checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, Chapter 2: Memory Architecture in Letta usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for the client.
- Input normalization: shape incoming data so the memory layer receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through the agent's memory.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Source Walkthrough
Key source files in letta-ai/letta:
- `letta/memory.py` -- `CoreMemory`, `ArchivalMemory`, `RecallMemory` base classes; `__repr__` shows what's visible in context
- `letta/agent.py` -- `_build_system_message()` assembles the context window by combining core memory blocks with tool definitions
- `letta/functions/function_sets/base.py` -- built-in memory tools: `archival_memory_search`, `archival_memory_insert`, `recall_memory_search`
Suggested trace: watch how Agent.step() calls _build_system_message() to construct the prompt, then observe how archival search results get injected into the tool response context.