Memory Module
June 13, 2026 · View on GitHub
Import: from selectools import ConversationMemory
Stability: stable
from selectools import Agent, AgentConfig, ConversationMemory, Message, Role, tool
from selectools.providers.stubs import LocalProvider
@tool(description="Look up a fact")
def lookup(query: str) -> str:
return f"The answer to '{query}' is 42."
memory = ConversationMemory(max_messages=20)
provider = LocalProvider()
agent = Agent(
tools=[lookup],
provider=provider,
memory=memory,
config=AgentConfig(max_iterations=1),
)
# Turn 1 — agent remembers this
agent.run([Message(role=Role.USER, content="My name is Alice")])
# Turn 2 — context preserved via memory
result = agent.run([Message(role=Role.USER, content="What is my name?")])
print(result.content)
print(f"Messages in memory: {len(memory.get_history())}")
!!! tip "See Also" - Sessions - Persistent session storage with JSON, SQLite, and Redis backends - Entity Memory - LLM-based named entity extraction and context injection
File: src/selectools/memory.py
Classes: ConversationMemory
Table of Contents
- Overview
- Memory Management
- Integration with Agent
- Implementation
- Summarize-on-Trim
- Best Practices
- Related Memory Modules
Overview
The ConversationMemory class maintains dialogue history across multiple agent interactions, implementing a sliding window that keeps the most recent messages when limits are exceeded.
Purpose
- Multi-Turn Conversations: Enable context retention across calls
- Memory Management: Prevent token limit explosions
- History Access: Retrieve conversation state for debugging/logging
Memory Management
Configuration
memory = ConversationMemory(
max_messages=20, # Keep last 20 messages
max_tokens=4000 # Optional token-based limit
)
Sliding Window
Initial: []
Add: USER("Hello")
└─→ [USER("Hello")]
Add: ASSISTANT("Hi!")
└─→ [USER("Hello"), ASSISTANT("Hi!")]
Add: USER("What's 2+2?")
└─→ [USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]
... continues until limit ...
At limit (max_messages=3):
[USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]
Add: ASSISTANT("4")
└─→ Remove oldest: USER("Hello")
└─→ [ASSISTANT("Hi!"), USER("What's 2+2?"), ASSISTANT("4")]
Tool-Pair-Aware Trimming
After the sliding window trim, the memory scans forward to find the first safe boundary. This prevents orphaning a TOOL result without its preceding ASSISTANT tool-use message, which would violate provider API contracts.
def _fix_tool_pair_boundary(self) -> None:
while len(self._messages) > 1:
first = self._messages[0]
if first.role == Role.TOOL:
self._messages.pop(0)
continue
if first.role == Role.ASSISTANT and first.tool_calls:
self._messages.pop(0)
continue
break
Before fix: Trim might leave [TOOL("result..."), USER("next question")] — invalid.
After fix: Advances past orphaned messages to [USER("next question")] — valid.
Observer Notifications
When an AgentObserver is registered, the agent fires on_memory_trim whenever trimming occurs — both for messages added during the run (via _memory_add) and for the initial user messages added at the start of run()/arun()/astream() (via _memory_add_many):
from selectools import AgentObserver
class MemoryWatcher(AgentObserver):
def on_memory_trim(self, run_id, messages_removed, messages_remaining, reason):
print(f"[{run_id}] Trimmed {messages_removed} messages, {messages_remaining} remaining")
The reason parameter is "enforce_limits" for sliding window / max-tokens trimming.
Implementation
def _enforce_limits(self) -> None:
# 1. Enforce message count limit
if len(self._messages) > self.max_messages:
excess = len(self._messages) - self.max_messages
self._messages = self._messages[excess:]
# 2. Enforce token count limit (if configured)
if self.max_tokens is not None:
while len(self._messages) > 1: # Keep at least 1
total_tokens = sum(
estimate_tokens(msg.content)
for msg in self._messages
)
if total_tokens <= self.max_tokens:
break
# Remove oldest message
self._messages.pop(0)
# 3. Fix tool-pair boundary
self._fix_tool_pair_boundary()
Integration with Agent
With Memory
from selectools import Agent, ConversationMemory, Message, Role
memory = ConversationMemory(max_messages=20)
agent = Agent(tools=[...], provider=provider, memory=memory)
# Turn 1
response1 = agent.run([
Message(role=Role.USER, content="My name is Alice")
])
# Turn 2 - Context preserved
response2 = agent.run([
Message(role=Role.USER, content="What's my name?")
])
# Agent knows: "Alice"
Flow
graph TD
A["run() called"] --> B["memory.get_history()"]
B --> C["Append new user messages"]
C --> D["memory.add_many(new_messages)"]
D --> E["Execute agent loop"]
E --> F["memory.add(final_response)"]
F --> G["Return response"]
E -.-> E1["LLM sees full history"]
E -.-> E2["Tool calls append to history"]
E -.-> E3["memory.add() for each message"]
Without Memory
agent = Agent(tools=[...], provider=provider) # No memory
# Each call is independent
response1 = agent.run([Message(role=Role.USER, content="My name is Alice")])
response2 = agent.run([Message(role=Role.USER, content="What's my name?")])
# Agent doesn't know - no memory
Implementation
Class Structure
class ConversationMemory:
def __init__(self, max_messages: int = 20, max_tokens: Optional[int] = None):
if max_messages < 1:
raise ValueError("max_messages must be at least 1")
if max_tokens is not None and max_tokens < 1:
raise ValueError("max_tokens must be at least 1")
self.max_messages = max_messages
self.max_tokens = max_tokens
self._messages: List[Message] = []
Core Methods
def add(self, message: Message) -> None:
"""Add a single message to history."""
self._messages.append(message)
self._enforce_limits()
def add_many(self, messages: List[Message]) -> None:
"""Add multiple messages at once."""
self._messages.extend(messages)
self._enforce_limits()
def get_history(self) -> List[Message]:
"""Get full conversation history."""
return list(self._messages)
def get_recent(self, n: int) -> List[Message]:
"""Get last N messages."""
if n < 1:
raise ValueError("n must be at least 1")
return self._messages[-n:] if len(self._messages) >= n else list(self._messages)
def clear(self) -> None:
"""Clear all messages."""
self._messages.clear()
Serialization
def to_dict(self) -> Dict[str, Any]:
"""Serialize memory for logging/persistence."""
return {
"max_messages": self.max_messages,
"max_tokens": self.max_tokens,
"message_count": len(self._messages),
"messages": [msg.to_dict() for msg in self._messages],
"summary": self._summary,
}
Deserialization with from_dict()
Reconstruct a ConversationMemory from a dictionary produced by to_dict(). The restored instance preserves the exact persisted state — _enforce_limits() is not re-run, so no messages are silently dropped during reconstruction. The tool-pair boundary is fixed to ensure a valid starting message.
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ConversationMemory":
"""Reconstruct a ConversationMemory from a to_dict() output."""
...
Usage:
import json
# Save
with open("conversation.json", "w") as f:
json.dump(memory.to_dict(), f)
# Restore
with open("conversation.json", "r") as f:
data = json.load(f)
memory = ConversationMemory.from_dict(data)
# Summary is preserved
print(memory.summary) # Restored if present
Key behaviors:
- Config fields (
max_messages,max_tokens) are restored from the dict - Messages are reconstructed via
Message.from_dict() - The
summaryfield (from summarize-on-trim) is preserved _fix_tool_pair_boundary()runs to ensure valid conversation start_last_trimmedis reset to empty (trim history is not persisted)
Summarize-on-Trim
When messages are trimmed by the sliding window, important early context is normally lost. Summarize-on-trim generates a summary of the dropped messages and preserves it as system context.
Configuration
Summarize-on-trim is configured via AgentConfig, not on ConversationMemory directly:
from selectools import Agent, AgentConfig, ConversationMemory
memory = ConversationMemory(max_messages=30)
agent = Agent(
tools=[...], provider=provider, memory=memory,
config=AgentConfig(
summarize_on_trim=True,
summarize_provider=provider, # Provider for summarization
summarize_model="gpt-4o-mini", # Use a cheap/fast model
summarize_max_tokens=150, # Max tokens for the summary
),
)
How It Works
graph TD
A["Messages exceed max_messages"] --> B["_enforce_limits() trims oldest"]
B --> C["Trimmed messages stored in _last_trimmed"]
C --> D["Agent detects _last_trimmed is non-empty"]
D --> E["Send trimmed messages to summarize_provider"]
E --> F["Provider returns 2-3 sentence summary"]
F --> G["Summary stored in memory.summary"]
G --> H["on_memory_summarize observer event fired"]
H --> I["Next turn: summary injected as\n[Conversation Summary] in system prompt"]
Key Properties
memory.summary: Read the current summary (orNoneif no trimming has occurred)memory._last_trimmed: Messages removed during the most recent_enforce_limits()call
Summary Persistence
When using to_dict() / from_dict(), the summary is included:
data = memory.to_dict()
# data["summary"] contains the current summary string (or None)
restored = ConversationMemory.from_dict(data)
print(restored.summary) # Summary is preserved
Best Practices
1. Choose Appropriate Limits
# Short interactions (Q&A bot)
memory = ConversationMemory(max_messages=10)
# Standard conversations
memory = ConversationMemory(max_messages=20)
# Long-form dialogues
memory = ConversationMemory(max_messages=50)
2. Use Token Limits for Cost Control
# Limit by tokens to prevent large prompts
memory = ConversationMemory(
max_messages=100, # High message count
max_tokens=4000 # But limit tokens
)
3. Clear Memory Between Sessions
# Start fresh conversation
memory.clear()
4. Access Recent Context
# Get last 5 messages for display
recent = memory.get_recent(5)
for msg in recent:
print(f"{msg.role}: {msg.content}")
5. Serialize and Restore
import json
# Save conversation
with open("conversation.json", "w") as f:
json.dump(memory.to_dict(), f)
# Restore conversation (preserves summary and all messages)
with open("conversation.json", "r") as f:
data = json.load(f)
memory = ConversationMemory.from_dict(data)
Testing
def test_memory_sliding_window():
memory = ConversationMemory(max_messages=3)
# Add 5 messages
for i in range(5):
memory.add(Message(role=Role.USER, content=f"Message {i}"))
# Should only keep last 3
history = memory.get_history()
assert len(history) == 3
assert history[0].content == "Message 2"
assert history[2].content == "Message 4"
def test_memory_with_agent():
memory = ConversationMemory(max_messages=10)
agent = Agent(tools=[...], provider=LocalProvider(), memory=memory)
# First turn
agent.run([Message(role=Role.USER, content="Hello")])
assert len(memory.get_history()) > 0
# Second turn
agent.run([Message(role=Role.USER, content="Goodbye")])
assert len(memory.get_history()) > 1
Common Pitfalls
1. Forgetting to Share Memory
# ❌ Bad - Each agent has separate memory
agent1 = Agent(..., memory=ConversationMemory())
agent2 = Agent(..., memory=ConversationMemory())
# ✅ Good - Shared memory
memory = ConversationMemory()
agent1 = Agent(..., memory=memory)
agent2 = Agent(..., memory=memory)
2. Not Clearing Between Users
# ❌ Bad - User A sees User B's history
def handle_user_a():
agent.run([...])
def handle_user_b():
agent.run([...]) # Sees User A's messages!
# ✅ Good - Clear between users
def handle_user(user_id):
if user_id != previous_user:
memory.clear()
agent.run([...])
3. Setting Limits Too Low
# ❌ Bad - Forgets context quickly
memory = ConversationMemory(max_messages=2)
# ✅ Good - Reasonable context
memory = ConversationMemory(max_messages=20)
Related Memory Modules (v0.16.0)
The following memory features were shipped in v0.16.0 and integrate with ConversationMemory via AgentConfig:
- Sessions — Persistent session storage with JSON file, SQLite, and Redis backends
- Entity Memory — LLM-based named entity extraction and context injection
- Knowledge Graph — Relationship triple extraction with in-memory and SQLite storage
- Knowledge Memory — Cross-session durable memory with daily logs and auto-registered
remember+recalltools (recall searches stored entries by keyword; see Auto-Registered Tools)
Unified Memory via Config
Stability: beta (v1.1)
UnifiedMemory (shipped standalone in v0.24.0) composes the memory tiers —
short-term (ConversationMemory), long-term (KnowledgeMemory with
importance-based auto-promotion), entity, and episodic — into one lifecycle.
Since v1.1 it is reachable directly from AgentConfig:
from selectools import Agent, AgentConfig, MemoryConfig
agent = Agent(
tools=[lookup],
provider=provider,
config=AgentConfig(
memory=MemoryConfig(
unified=True,
importance_threshold=0.7,
short_term_limit=100,
long_term_limit=1000,
episodic_retention_days=30,
auto_promote=True,
),
),
)
agent.run([Message(role=Role.USER, content="My name is Alice")])
# ... once that turn ages out of the short-term window it is scored
# (identity rule -> 0.9) and auto-promoted to long-term memory, then
# injected back as context on later runs.
agent.unified_memory.recall("user's name") # federated recall across tiers
How the agent drives it
- Before each call the agent injects one system message built by
UnifiedMemory.assemble_context(max_tokens=context_max_tokens, include_conversation=False)— the long-term, entity, and episodic tiers. The short-term tier is sent as structured messages (it is the conversation history), so it is excluded from the context block. Compaction triggers at 70% ofcontext_max_tokens. - After each run (including max-iterations, budget-exceeded, and
cancelled exits) the completed turn is written back via
UnifiedMemory.add_turn()— the single write path that records the episode, feeds the entity tier when present, and promotes aged-out short-term items whose importance clearsimportance_threshold. agent.reset()clears the short-term, episodic, and dedup state but preserves long-term memory.clone_for_isolation()drops unified memory, matching the existingmemorysemantics.
MemoryConfig fields (beta)
| Field | Default | Meaning |
|---|---|---|
unified | False | Master switch. Off = zero behavior change. |
unified_memory | None | Pre-built UnifiedMemory instance (implies unified=True); use for custom tiers, scorers, or summarizers. Scalar fields below are then ignored. |
importance_threshold | 0.7 | Minimum score for STM -> LTM promotion. |
short_term_limit | 100 | Rolling window size, in messages. |
long_term_limit | 1000 | Max long-term entries before importance-based eviction. |
episodic_retention_days | 30 | Episodes older than this are pruned. |
auto_promote | True | Promote aging-out items automatically. |
context_max_tokens | 4000 | Budget for the injected context block. |
Constraints
- Mutually exclusive with
entity_memory,knowledge_graph, andknowledge_memoryonMemoryConfig(inject custom tiers through aUnifiedMemoryinstance instead), with the Agentmemory=parameter (unified memory manages its own short-term tier), and withsession_store(unified memory is in-process in v1.1; session persistence is a planned follow-up). Each conflict raisesValueErrorat construction time. - The tier parameters are validated only while unified memory is enabled; they are inert otherwise.
Future Enhancements
Potential improvements (see Roadmap):
- Semantic Pruning: Remove similar/redundant messages to maximize useful context
Further Reading
- Agent Module - How agents use memory (including session, entity, KG, and knowledge integration)
- Sessions Module - Persistent session storage backends
- Entity Memory Module - Named entity extraction and tracking
- Knowledge Graph Module - Relationship triple extraction
- Knowledge Memory Module - Cross-session durable memory
- Types Module - Message data structure
Next Steps: Learn about usage tracking in the Usage Module.
Related Examples
| # | Script | Description |
|---|---|---|
| 04 | 04_conversation_memory.py | Multi-turn conversation with sliding window memory |
| 20 | 20_customer_support_bot.py | Full customer support bot with memory, guardrails, and tools |
| 34 | 34_summarize_on_trim.py | Summarize-on-trim to preserve context when memory overflows |
| 106 | 106_unified_memory.py | UnifiedMemory standalone: tiers, scoring, promotion, recall |
| 112 | 112_unified_memory_config.py | Unified memory wired into an Agent via MemoryConfig(unified=True) |