Memory Module

June 13, 2026 · View on GitHub

Import: from selectools import ConversationMemory

Stability: stable

from selectools import Agent, AgentConfig, ConversationMemory, Message, Role, tool
from selectools.providers.stubs import LocalProvider

@tool(description="Look up a fact")
def lookup(query: str) -> str:
    return f"The answer to '{query}' is 42."

memory = ConversationMemory(max_messages=20)
provider = LocalProvider()

agent = Agent(
    tools=[lookup],
    provider=provider,
    memory=memory,
    config=AgentConfig(max_iterations=1),
)

# Turn 1 — agent remembers this
agent.run([Message(role=Role.USER, content="My name is Alice")])

# Turn 2 — context preserved via memory
result = agent.run([Message(role=Role.USER, content="What is my name?")])
print(result.content)
print(f"Messages in memory: {len(memory.get_history())}")

!!! tip "See Also" - Sessions - Persistent session storage with JSON, SQLite, and Redis backends - Entity Memory - LLM-based named entity extraction and context injection

File: src/selectools/memory.py Classes: ConversationMemory

Overview
Memory Management
Integration with Agent
Implementation
Summarize-on-Trim
Best Practices
Related Memory Modules

Overview

The ConversationMemory class maintains dialogue history across multiple agent interactions, implementing a sliding window that keeps the most recent messages when limits are exceeded.

Purpose

Multi-Turn Conversations: Enable context retention across calls
Memory Management: Prevent token limit explosions
History Access: Retrieve conversation state for debugging/logging

Memory Management

Configuration

memory = ConversationMemory(
    max_messages=20,    # Keep last 20 messages
    max_tokens=4000     # Optional token-based limit
)

Sliding Window

Initial: []

Add: USER("Hello")
└─→ [USER("Hello")]

Add: ASSISTANT("Hi!")
└─→ [USER("Hello"), ASSISTANT("Hi!")]

Add: USER("What's 2+2?")
└─→ [USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]

... continues until limit ...

At limit (max_messages=3):
[USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]

Add: ASSISTANT("4")
└─→ Remove oldest: USER("Hello")
└─→ [ASSISTANT("Hi!"), USER("What's 2+2?"), ASSISTANT("4")]

After the sliding window trim, the memory scans forward to find the first safe boundary. This prevents orphaning a TOOL result without its preceding ASSISTANT tool-use message, which would violate provider API contracts.

def _fix_tool_pair_boundary(self) -> None:
    while len(self._messages) > 1:
        first = self._messages[0]
        if first.role == Role.TOOL:
            self._messages.pop(0)
            continue
        if first.role == Role.ASSISTANT and first.tool_calls:
            self._messages.pop(0)
            continue
        break

Before fix: Trim might leave [TOOL("result..."), USER("next question")] — invalid.

After fix: Advances past orphaned messages to [USER("next question")] — valid.

Observer Notifications

When an AgentObserver is registered, the agent fires on_memory_trim whenever trimming occurs — both for messages added during the run (via _memory_add) and for the initial user messages added at the start of run()/arun()/astream() (via _memory_add_many):

from selectools import AgentObserver

class MemoryWatcher(AgentObserver):
    def on_memory_trim(self, run_id, messages_removed, messages_remaining, reason):
        print(f"[{run_id}] Trimmed {messages_removed} messages, {messages_remaining} remaining")

The reason parameter is "enforce_limits" for sliding window / max-tokens trimming.

Implementation

def _enforce_limits(self) -> None:
    # 1. Enforce message count limit
    if len(self._messages) > self.max_messages:
        excess = len(self._messages) - self.max_messages
        self._messages = self._messages[excess:]

    # 2. Enforce token count limit (if configured)
    if self.max_tokens is not None:
        while len(self._messages) > 1:  # Keep at least 1
            total_tokens = sum(
                estimate_tokens(msg.content)
                for msg in self._messages
            )

            if total_tokens <= self.max_tokens:
                break

            # Remove oldest message
            self._messages.pop(0)

    # 3. Fix tool-pair boundary
    self._fix_tool_pair_boundary()

Integration with Agent

With Memory

from selectools import Agent, ConversationMemory, Message, Role

memory = ConversationMemory(max_messages=20)
agent = Agent(tools=[...], provider=provider, memory=memory)

# Turn 1
response1 = agent.run([
    Message(role=Role.USER, content="My name is Alice")
])

# Turn 2 - Context preserved
response2 = agent.run([
    Message(role=Role.USER, content="What's my name?")
])
# Agent knows: "Alice"

Flow

graph TD
    A["run() called"] --> B["memory.get_history()"]
    B --> C["Append new user messages"]
    C --> D["memory.add_many(new_messages)"]
    D --> E["Execute agent loop"]
    E --> F["memory.add(final_response)"]
    F --> G["Return response"]
    E -.-> E1["LLM sees full history"]
    E -.-> E2["Tool calls append to history"]
    E -.-> E3["memory.add() for each message"]

Without Memory

agent = Agent(tools=[...], provider=provider)  # No memory

# Each call is independent
response1 = agent.run([Message(role=Role.USER, content="My name is Alice")])
response2 = agent.run([Message(role=Role.USER, content="What's my name?")])
# Agent doesn't know - no memory

Implementation

Class Structure

class ConversationMemory:
    def __init__(self, max_messages: int = 20, max_tokens: Optional[int] = None):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        if max_tokens is not None and max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")

        self.max_messages = max_messages
        self.max_tokens = max_tokens
        self._messages: List[Message] = []

Core Methods

def add(self, message: Message) -> None:
    """Add a single message to history."""
    self._messages.append(message)
    self._enforce_limits()

def add_many(self, messages: List[Message]) -> None:
    """Add multiple messages at once."""
    self._messages.extend(messages)
    self._enforce_limits()

def get_history(self) -> List[Message]:
    """Get full conversation history."""
    return list(self._messages)

def get_recent(self, n: int) -> List[Message]:
    """Get last N messages."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return self._messages[-n:] if len(self._messages) >= n else list(self._messages)

def clear(self) -> None:
    """Clear all messages."""
    self._messages.clear()

Serialization

def to_dict(self) -> Dict[str, Any]:
    """Serialize memory for logging/persistence."""
    return {
        "max_messages": self.max_messages,
        "max_tokens": self.max_tokens,
        "message_count": len(self._messages),
        "messages": [msg.to_dict() for msg in self._messages],
        "summary": self._summary,
    }

Deserialization with `from_dict()`

Reconstruct a ConversationMemory from a dictionary produced by to_dict(). The restored instance preserves the exact persisted state — _enforce_limits() is not re-run, so no messages are silently dropped during reconstruction. The tool-pair boundary is fixed to ensure a valid starting message.

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ConversationMemory":
    """Reconstruct a ConversationMemory from a to_dict() output."""
    ...

Usage:

import json

# Save
with open("conversation.json", "w") as f:
    json.dump(memory.to_dict(), f)

# Restore
with open("conversation.json", "r") as f:
    data = json.load(f)
    memory = ConversationMemory.from_dict(data)

# Summary is preserved
print(memory.summary)  # Restored if present

Key behaviors:

Config fields (max_messages, max_tokens) are restored from the dict
Messages are reconstructed via Message.from_dict()
The summary field (from summarize-on-trim) is preserved
_fix_tool_pair_boundary() runs to ensure valid conversation start
_last_trimmed is reset to empty (trim history is not persisted)

Summarize-on-Trim

When messages are trimmed by the sliding window, important early context is normally lost. Summarize-on-trim generates a summary of the dropped messages and preserves it as system context.

Configuration

Summarize-on-trim is configured via AgentConfig, not on ConversationMemory directly:

from selectools import Agent, AgentConfig, ConversationMemory

memory = ConversationMemory(max_messages=30)
agent = Agent(
    tools=[...], provider=provider, memory=memory,
    config=AgentConfig(
        summarize_on_trim=True,
        summarize_provider=provider,       # Provider for summarization
        summarize_model="gpt-4o-mini",     # Use a cheap/fast model
        summarize_max_tokens=150,          # Max tokens for the summary
    ),
)

How It Works

graph TD
    A["Messages exceed max_messages"] --> B["_enforce_limits() trims oldest"]
    B --> C["Trimmed messages stored in _last_trimmed"]
    C --> D["Agent detects _last_trimmed is non-empty"]
    D --> E["Send trimmed messages to summarize_provider"]
    E --> F["Provider returns 2-3 sentence summary"]
    F --> G["Summary stored in memory.summary"]
    G --> H["on_memory_summarize observer event fired"]
    H --> I["Next turn: summary injected as\n[Conversation Summary] in system prompt"]

Key Properties

memory.summary: Read the current summary (or None if no trimming has occurred)
memory._last_trimmed: Messages removed during the most recent _enforce_limits() call

Summary Persistence

When using to_dict() / from_dict(), the summary is included:

data = memory.to_dict()
# data["summary"] contains the current summary string (or None)

restored = ConversationMemory.from_dict(data)
print(restored.summary)  # Summary is preserved

Best Practices

1. Choose Appropriate Limits

# Short interactions (Q&A bot)
memory = ConversationMemory(max_messages=10)

# Standard conversations
memory = ConversationMemory(max_messages=20)

# Long-form dialogues
memory = ConversationMemory(max_messages=50)

2. Use Token Limits for Cost Control

# Limit by tokens to prevent large prompts
memory = ConversationMemory(
    max_messages=100,     # High message count
    max_tokens=4000       # But limit tokens
)

3. Clear Memory Between Sessions

# Start fresh conversation
memory.clear()

4. Access Recent Context

# Get last 5 messages for display
recent = memory.get_recent(5)
for msg in recent:
    print(f"{msg.role}: {msg.content}")

5. Serialize and Restore

import json

# Save conversation
with open("conversation.json", "w") as f:
    json.dump(memory.to_dict(), f)

# Restore conversation (preserves summary and all messages)
with open("conversation.json", "r") as f:
    data = json.load(f)
    memory = ConversationMemory.from_dict(data)

Testing

def test_memory_sliding_window():
    memory = ConversationMemory(max_messages=3)

    # Add 5 messages
    for i in range(5):
        memory.add(Message(role=Role.USER, content=f"Message {i}"))

    # Should only keep last 3
    history = memory.get_history()
    assert len(history) == 3
    assert history[0].content == "Message 2"
    assert history[2].content == "Message 4"

def test_memory_with_agent():
    memory = ConversationMemory(max_messages=10)
    agent = Agent(tools=[...], provider=LocalProvider(), memory=memory)

    # First turn
    agent.run([Message(role=Role.USER, content="Hello")])
    assert len(memory.get_history()) > 0

    # Second turn
    agent.run([Message(role=Role.USER, content="Goodbye")])
    assert len(memory.get_history()) > 1

Common Pitfalls

# ❌ Bad - Each agent has separate memory
agent1 = Agent(..., memory=ConversationMemory())
agent2 = Agent(..., memory=ConversationMemory())

# ✅ Good - Shared memory
memory = ConversationMemory()
agent1 = Agent(..., memory=memory)
agent2 = Agent(..., memory=memory)

2. Not Clearing Between Users

# ❌ Bad - User A sees User B's history
def handle_user_a():
    agent.run([...])

def handle_user_b():
    agent.run([...])  # Sees User A's messages!

# ✅ Good - Clear between users
def handle_user(user_id):
    if user_id != previous_user:
        memory.clear()
    agent.run([...])

3. Setting Limits Too Low

# ❌ Bad - Forgets context quickly
memory = ConversationMemory(max_messages=2)

# ✅ Good - Reasonable context
memory = ConversationMemory(max_messages=20)

The following memory features were shipped in v0.16.0 and integrate with ConversationMemory via AgentConfig:

Sessions — Persistent session storage with JSON file, SQLite, and Redis backends
Entity Memory — LLM-based named entity extraction and context injection
Knowledge Graph — Relationship triple extraction with in-memory and SQLite storage
Knowledge Memory — Cross-session durable memory with daily logs and auto-registered remember + recall tools (recall searches stored entries by keyword; see Auto-Registered Tools)

Unified Memory via Config

Stability: beta (v1.1)

UnifiedMemory (shipped standalone in v0.24.0) composes the memory tiers — short-term (ConversationMemory), long-term (KnowledgeMemory with importance-based auto-promotion), entity, and episodic — into one lifecycle. Since v1.1 it is reachable directly from AgentConfig:

from selectools import Agent, AgentConfig, MemoryConfig

agent = Agent(
    tools=[lookup],
    provider=provider,
    config=AgentConfig(
        memory=MemoryConfig(
            unified=True,
            importance_threshold=0.7,
            short_term_limit=100,
            long_term_limit=1000,
            episodic_retention_days=30,
            auto_promote=True,
        ),
    ),
)

agent.run([Message(role=Role.USER, content="My name is Alice")])
# ... once that turn ages out of the short-term window it is scored
# (identity rule -> 0.9) and auto-promoted to long-term memory, then
# injected back as context on later runs.

agent.unified_memory.recall("user's name")  # federated recall across tiers

How the agent drives it

Before each call the agent injects one system message built by UnifiedMemory.assemble_context(max_tokens=context_max_tokens, include_conversation=False) — the long-term, entity, and episodic tiers. The short-term tier is sent as structured messages (it is the conversation history), so it is excluded from the context block. Compaction triggers at 70% of context_max_tokens.
After each run (including max-iterations, budget-exceeded, and cancelled exits) the completed turn is written back via UnifiedMemory.add_turn() — the single write path that records the episode, feeds the entity tier when present, and promotes aged-out short-term items whose importance clears importance_threshold.
agent.reset() clears the short-term, episodic, and dedup state but preserves long-term memory. clone_for_isolation() drops unified memory, matching the existing memory semantics.

MemoryConfig fields (beta)

Field	Default	Meaning
`unified`	`False`	Master switch. Off = zero behavior change.
`unified_memory`	`None`	Pre-built `UnifiedMemory` instance (implies `unified=True`); use for custom tiers, scorers, or summarizers. Scalar fields below are then ignored.
`importance_threshold`	`0.7`	Minimum score for STM -> LTM promotion.
`short_term_limit`	`100`	Rolling window size, in messages.
`long_term_limit`	`1000`	Max long-term entries before importance-based eviction.
`episodic_retention_days`	`30`	Episodes older than this are pruned.
`auto_promote`	`True`	Promote aging-out items automatically.
`context_max_tokens`	`4000`	Budget for the injected context block.

Constraints

Mutually exclusive with entity_memory, knowledge_graph, and knowledge_memory on MemoryConfig (inject custom tiers through a UnifiedMemory instance instead), with the Agent memory= parameter (unified memory manages its own short-term tier), and with session_store (unified memory is in-process in v1.1; session persistence is a planned follow-up). Each conflict raises ValueError at construction time.
The tier parameters are validated only while unified memory is enabled; they are inert otherwise.

Future Enhancements

Potential improvements (see Roadmap):

Semantic Pruning: Remove similar/redundant messages to maximize useful context

#	Script	Description
04	`04_conversation_memory.py`	Multi-turn conversation with sliding window memory
20	`20_customer_support_bot.py`	Full customer support bot with memory, guardrails, and tools
34	`34_summarize_on_trim.py`	Summarize-on-trim to preserve context when memory overflows
106	`106_unified_memory.py`	UnifiedMemory standalone: tiers, scoring, promotion, recall
112	`112_unified_memory_config.py`	Unified memory wired into an Agent via `MemoryConfig(unified=True)`

Memory Module

Table of Contents

Overview

Purpose

Memory Management

Configuration

Sliding Window

Tool-Pair-Aware Trimming

Observer Notifications

Implementation

Integration with Agent

With Memory

Flow

Without Memory

Implementation

Class Structure

Core Methods

Serialization

Deserialization with `from_dict()`

Summarize-on-Trim

Configuration

How It Works

Key Properties

Summary Persistence

Best Practices

1. Choose Appropriate Limits

2. Use Token Limits for Cost Control

3. Clear Memory Between Sessions

4. Access Recent Context

5. Serialize and Restore

Testing

Common Pitfalls

2. Not Clearing Between Users

3. Setting Limits Too Low

Unified Memory via Config

How the agent drives it

MemoryConfig fields (beta)

Constraints

Future Enhancements

Further Reading