# ReMe

May 9, 2026


A memory management toolkit for AI agents: *Remember Me, Refine Me*.

For the older version, please refer to the 0.2.x documentation.


## 📰 Latest Articles

| Date | Title |
|---|---|
| 2026-03-30 | Context Management Design |

🧠 ReMe is a memory management framework designed for AI agents, providing both file-based and vector-based memory systems.

It tackles two core problems of agent memory: limited context windows (early information is truncated or lost in long conversations) and stateless sessions (a new session cannot inherit history and always starts from scratch).

ReMe gives agents real memory: old conversations are automatically compacted, important information is persistently stored, and relevant context is automatically recalled in future interactions.

ReMe achieves state-of-the-art results on the LoCoMo and HaluMem benchmarks; see the Experimental results section.

### What you can do with ReMe
- Personal assistant: Provide long-term memory for agents like QwenPaw, remembering user preferences and conversation history.
- Coding assistant: Record code style preferences and project context, maintaining a consistent development experience across sessions.
- Customer service bot: Track user issue history and preference settings for personalized service.
- Task automation: Learn success/failure patterns from historical tasks to continuously optimize execution strategies.
- Knowledge Q&A: Build a searchable knowledge base with semantic search and exact matching support.
- Multi-turn dialogue: Automatically compress long conversations while retaining key information within limited context windows.

๐Ÿ“ File-based memory system (ReMeLight)

*Memory as files, files as memory.*

Treat memory as files: readable, editable, and copyable. QwenPaw integrates long-term memory and context management by inheriting from `ReMeLight`.

| Traditional memory system | File-based ReMe |
|---|---|
| 🗄️ Database storage | 📝 Markdown files |
| 🔒 Opaque | 👀 Always readable |
| ❌ Hard to modify | ✏️ Directly editable |
| 🚫 Hard to migrate | 📦 Copy to migrate |
```
working_dir/
├── MEMORY.md              # Long-term memory: persistent info such as user preferences
├── memory/
│   └── YYYY-MM-DD.md      # Daily journal: automatically written after each conversation
├── dialog/                # Raw conversation records: full dialog before compression
│   └── YYYY-MM-DD.jsonl   # Daily conversation messages in JSONL format
└── tool_result/           # Cache for long tool outputs (auto-managed, expired entries auto-cleaned)
    └── <uuid>.txt
```

### Core capabilities

`ReMeLight` is the core class of the file-based memory system. It provides full memory management capabilities for AI agents:

| Category | Method | Function | Key components |
|---|---|---|---|
| Context Management | `check_context` | 📊 Check context size | `ContextChecker`: checks whether context exceeds thresholds and splits messages |
| | `compact_memory` | 📦 Compact history into summary | `Compactor`: a ReActAgent that generates structured context summaries |
| | `compact_tool_result` | ✂️ Compact long tool outputs | `ToolResultCompactor`: truncates long tool outputs and stores them in `tool_result/` while keeping file references in messages |
| | `pre_reasoning_hook` | 🔄 Pre-reasoning hook | `compact_tool_result` + `check_context` + `compact_memory` + `summary_memory` (async) |
| Long-term Memory | `summary_memory` | 📝 Persist important memory to files | `Summarizer`: ReActAgent + file tools (read / write / edit) |
| | `memory_search` | 🔍 Semantic memory search | `MemorySearch`: hybrid retrieval with vectors + BM25 |
| Session Memory | `get_in_memory_memory` | 💾 Create in-session memory instance | Returns `ReMeInMemoryMemory` with `dialog_path` configured for persistence |
| | `await_summary_tasks` | ⏳ Wait for async summary tasks | Blocks until all background summary tasks complete |
| – | `start` | 🚀 Start memory system | Initializes file storage, file watcher, and embedding cache; cleans up expired tool result files |
| – | `close` | 📕 Shutdown and cleanup | Cleans up tool result files, stops file watcher, and persists embedding cache |

## 🚀 Quick start

### Installation

Install from source:

```shell
git clone https://github.com/agentscope-ai/ReMe.git
cd ReMe
pip install -e ".[light]"
```

Update to the latest version:

```shell
git pull
pip install -e ".[light]"
```

### Environment variables

`ReMeLight` reads environment variables to configure the LLM and embedding backends:

| Variable | Description | Example |
|---|---|---|
| `LLM_API_KEY` | LLM API key | `sk-xxx` |
| `LLM_BASE_URL` | LLM base URL | `https://dashscope.aliyuncs.com/compatible-mode/v1` |
| `EMBEDDING_API_KEY` | Embedding API key (optional) | `sk-xxx` |
| `EMBEDDING_BASE_URL` | Embedding base URL (optional) | `https://dashscope.aliyuncs.com/compatible-mode/v1` |

### Python usage

```python
import asyncio

from reme.reme_light import ReMeLight


async def main():
    # Initialize ReMeLight
    reme = ReMeLight(
        default_as_llm_config={"model_name": "qwen3.5-35b-a3b"},
        # default_embedding_model_config={"model_name": "text-embedding-v4"},
        default_file_store_config={"fts_enabled": True, "vector_enabled": False},
        enable_load_env=True,
    )
    await reme.start()

    messages = [...]  # List of conversation messages

    # 1. Check context size (token counting, determine if compaction is needed)
    messages_to_compact, messages_to_keep, is_valid = await reme.check_context(
        messages=messages,
        memory_compact_threshold=90000,  # Threshold to trigger compaction (tokens)
        memory_compact_reserve=10000,  # Token count to reserve for recent messages
    )

    # 2. Compact conversation history into a structured summary
    summary = await reme.compact_memory(
        messages=messages,
        previous_summary="",
        max_input_length=128000,  # Model context window (tokens)
        compact_ratio=0.7,  # Trigger compaction when exceeding max_input_length * 0.7
        language="zh",  # Summary language (e.g., "zh" / "en")
    )

    # 3. Compact long tool outputs (prevent tool results from blowing up context)
    messages = await reme.compact_tool_result(messages)

    # 4. Pre-reasoning hook (auto compact tool results + check context + generate summaries)
    processed_messages, compressed_summary = await reme.pre_reasoning_hook(
        messages=messages,
        system_prompt="You are a helpful AI assistant.",
        compressed_summary="",
        max_input_length=128000,
        compact_ratio=0.7,
        memory_compact_reserve=10000,
        enable_tool_result_compact=True,
        tool_result_compact_keep_n=3,
    )

    # 5. Persist important memory to files (writes to memory/YYYY-MM-DD.md)
    summary_result = await reme.summary_memory(
        messages=messages,
        language="zh",
    )

    # 6. Semantic memory search (vector + BM25 hybrid retrieval)
    result = await reme.memory_search(query="Python version preference", max_results=5)

    # 7. Create in-session memory instance (manages context for one conversation)
    memory = reme.get_in_memory_memory()  # Auto-configures dialog_path
    for msg in messages:
        await memory.add(msg)
    token_stats = await memory.estimate_tokens(max_input_length=128000)
    print(f"Current context usage: {token_stats['context_usage_ratio']:.1f}%")
    print(f"Message token count: {token_stats['messages_tokens']}")
    print(f"Estimated total tokens: {token_stats['estimated_tokens']}")

    # 8. Mark messages as compressed (auto-persists to dialog/YYYY-MM-DD.jsonl)
    # await memory.mark_messages_compressed(messages_to_compact)

    # Shutdown ReMeLight
    await reme.close()


if __name__ == "__main__":
    asyncio.run(main())
```

📂 Full example: test_reme_light.py · 📋 Sample run log: test_reme_light_log.txt (223,838 tokens → 1,105 tokens, 99.5% compression)

## Architecture of the file-based ReMeLight memory system

### Context data structure

```mermaid
flowchart TD
    A[Context] --> B[compact_summary]
    B --> C[dialog path guide + Goal/Constraints/Progress/KeyDecisions/NextSteps]
    A --> E[messages: full dialogue history]
    A --> F[File System Cache]
    F --> G[dialog/YYYY-MM-DD.jsonl]
    F --> H[tool_result/uuid.txt N-day TTL]
```

`MemoryManager` inherits `ReMeLight` and integrates its memory capabilities into the agent reasoning loop:

```mermaid
graph LR
    Agent[Agent] -->|Before each reasoning step| Hook[pre_reasoning_hook]
    Hook --> TC[compact_tool_result<br>Compact tool outputs]
    TC --> CC[check_context<br>Token counting]
    CC -->|Exceeds limit| CM[compact_memory<br>Generate summary]
    CC -->|Exceeds limit| SM[summary_memory<br>Async persistence]
    SM -->|ReAct + FileIO| Files[memory/*.md]
    CC -->|Exceeds limit| MMC[mark_messages_compressed<br>Persist raw dialog]
    MMC --> Dialog[dialog/*.jsonl]
    Agent -->|Explicit call| Search[memory_search<br>Vector+BM25]
    Agent -->|In-session| InMem[ReMeInMemoryMemory<br>Token-aware memory]
    InMem -->|Compress/Clear| Dialog
    Files -.->|FileWatcher| Store[(FileStore<br>Vector+FTS index)]
    Search --> Store
```

### 1. `check_context`: context checking

`ContextChecker` uses token counting to determine whether the context exceeds thresholds, and automatically splits messages into a "to compact" group and a "to keep" group.

```mermaid
graph LR
    M[messages] --> H[AsMsgHandler<br>Token counting]
    H --> C{total > threshold?}
    C -->|No| K[Return all messages]
    C -->|Yes| S[Keep from tail<br>reserve tokens]
    S --> CP[messages_to_compact<br>Earlier messages]
    S --> KP[messages_to_keep<br>Recent messages]
    S --> V{is_valid<br>Tool calls aligned?}
```

- Core logic: keep `reserve` tokens from the tail; mark the rest as messages to compact.
- Integrity guarantee: preserves complete user-assistant turns and tool_use/tool_result pairs without splitting them.
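The tail-keeping split can be pictured with a minimal sketch. Everything here is illustrative, not ReMe's actual implementation: `split_context` is a hypothetical helper, the character-based token counter stands in for a real tokenizer, and the tool-call alignment check (`is_valid`) is omitted.

```python
def split_context(messages, threshold, reserve, count_tokens=len):
    """Split messages into (to_compact, to_keep) once the total exceeds threshold.

    count_tokens defaults to character length purely for illustration.
    """
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= threshold:
        return [], messages  # under budget: nothing to compact
    # Walk backward from the tail, keeping at most `reserve` tokens of recent messages.
    kept, split_idx = 0, len(messages)
    for i in range(len(messages) - 1, -1, -1):
        t = count_tokens(messages[i]["content"])
        if kept + t > reserve:
            break
        kept += t
        split_idx = i
    return messages[:split_idx], messages[split_idx:]
```

The real checker additionally verifies that the split point does not separate a tool call from its tool result.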

### 2. `compact_memory`: conversation compaction

`Compactor` uses a ReActAgent to compact conversation history into a *structured context summary*.

```mermaid
graph LR
    M[messages] --> H[AsMsgHandler<br>format_msgs_to_str]
    H --> A[ReActAgent<br>reme_compactor]
    P[previous_summary] -->|Incremental update| A
    A --> S[Structured summary<br>Goal/Progress/Decisions...]
```

Summary structure (context checkpoints):

| Field | Description |
|---|---|
| `## Goal` | User goals |
| `## Constraints` | Constraints and preferences |
| `## Progress` | Task progress |
| `## Key Decisions` | Key decisions |
| `## Next Steps` | Next-step plans |
| `## Critical Context` | Critical data such as file paths, function names, error messages |
- Incremental updates: when `previous_summary` is provided, new conversations are merged into the existing summary.
- Thinking enhancement: with `add_thinking_block=True` (the default), a reasoning step is added before generating the summary to improve quality.

### 3. `summary_memory`: persistent memory

`Summarizer` uses a ReAct + file tools pattern so that the AI can decide what to write and where to write it.

```mermaid
graph LR
    M[messages] --> A[ReActAgent<br>reme_summarizer]
    A -->|read| R[Read memory/YYYY-MM-DD.md]
    R --> T{Reason: how to merge?}
    T -->|write| W[Overwrite]
    T -->|edit| E[Edit in place]
    W --> F[memory/YYYY-MM-DD.md]
    E --> F
```

File tools (FileIO):

| Tool | Function |
|---|---|
| `read` | Read file content |
| `write` | Overwrite file |
| `edit` | Find-and-replace edit |
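The three operations can be sketched in a few lines. These are hypothetical helpers written for illustration; the actual FileIO tools are exposed to the ReActAgent and carry additional validation.

```python
from pathlib import Path


def apply_edit(text: str, old: str, new: str) -> str:
    """Find-and-replace edit; fails loudly if the target text is absent."""
    if old not in text:
        raise ValueError("edit target not found")
    return text.replace(old, new, 1)


def read_file(path) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path, content: str) -> None:
    Path(path).write_text(content, encoding="utf-8")  # overwrite semantics


def edit_file(path, old: str, new: str) -> None:
    write_file(path, apply_edit(read_file(path), old, new))
```

Making `edit` fail when the target text is missing is what lets the agent notice a stale edit and fall back to `read` + `write`.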

### 4. `compact_tool_result`: tool result compaction

`ToolResultCompactor` addresses the problem of long tool outputs bloating the context. It applies two different truncation strategies depending on whether a message falls within the `recent_n` window:

```mermaid
graph LR
    M[messages] --> B{Within recent_n?}
    B -->|Yes - recent| C[Low truncation recent_max_bytes=100KB<br>Save full content to tool_result/uuid.txt<br>Hint: 'Read from line N']
    B -->|No - old| D[High truncation old_max_bytes=3KB<br>Reference existing file<br>More aggressive truncation]
    C --> E[cleanup_expired_files<br>Delete expired files]
    D --> E
```
| Parameter | Default | Description |
|---|---|---|
| `recent_n` | 1 | Minimum number of trailing consecutive tool-result messages treated as "recent" (low truncation) |
| `recent_max_bytes` | 100 * 1024 (100 KB) | Truncation threshold for recent messages; content beyond this is saved to `tool_result/` with a file path and start-line hint |
| `old_max_bytes` | 3000 (3 KB) | Truncation threshold for older messages; truncation is more aggressive |
| `retention_days` | 3 | Number of days to retain tool result files; expired files are auto-cleaned |
- Auto cleanup: expired files (older than `retention_days`) are deleted automatically during `start` / `close` / `compact_tool_result`.
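The two-tier strategy can be sketched as follows. This is a simplified stand-in, not ReMe's implementation: the placeholder path mimics saving the full output to `tool_result/`, and character-level slicing stands in for byte-accurate truncation.

```python
import uuid


def compact_tool_results(messages, recent_n=1, recent_max=100 * 1024, old_max=3000):
    """Truncate tool outputs with two tiers: gentle for the last `recent_n`
    tool messages, aggressive for everything older."""
    tool_idxs = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    recent = set(tool_idxs[-recent_n:]) if recent_n else set()
    out = []
    for i, m in enumerate(messages):
        if m["role"] != "tool":
            out.append(m)
            continue
        limit = recent_max if i in recent else old_max
        content = m["content"]
        if len(content.encode("utf-8")) <= limit:
            out.append(m)
            continue
        # The real compactor saves the full content under tool_result/ first.
        ref = f"tool_result/{uuid.uuid4()}.txt"
        out.append({**m, "content": content[:limit] + f"\n[truncated; full output: {ref}]"})
    return out
```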

### 5. `memory_search`: memory retrieval

`MemorySearch` provides vector + BM25 hybrid retrieval.

```mermaid
graph LR
    Q[query] --> E[Embedding<br>Vectorization]
    E --> V[vector_search<br>Semantic similarity]
    Q --> B[BM25<br>Keyword matching]
    V -->|" weight: 0.7 "| M[Deduplicate + weighted merge]
    B -->|" weight: 0.3 "| M
    M --> F[min_score filter]
    F --> R[Top-N results]
```

- Fusion mechanism: vector weight 0.7 + BM25 weight 0.3, balancing semantic similarity and exact matches.
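Assuming both retrievers return scores already normalized to [0, 1], the fusion step reduces to a weighted merge. `fuse_scores` is a hypothetical helper for illustration, not ReMe's API.

```python
def fuse_scores(vector_hits, bm25_hits, w_vec=0.7, w_bm25=0.3, min_score=0.0, top_n=5):
    """Weighted fusion of two {doc_id: score} maps, deduplicating by id."""
    fused = {}
    for doc_id, s in vector_hits.items():
        fused[doc_id] = fused.get(doc_id, 0.0) + w_vec * s
    for doc_id, s in bm25_hits.items():
        fused[doc_id] = fused.get(doc_id, 0.0) + w_bm25 * s
    # Rank by fused score, drop anything below min_score, keep the top N.
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked if s >= min_score][:top_n]
```

A document found by both retrievers accumulates both weighted scores, which is what lets exact keyword matches reinforce semantically similar hits.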

### 6. `ReMeInMemoryMemory`: in-session memory

`ReMeInMemoryMemory` extends AgentScope's `InMemoryMemory` to provide token-aware memory management and raw conversation persistence.

```mermaid
graph LR
    C[content] --> G[get_memory<br>exclude_mark=COMPRESSED]
    G --> F[Filter out compressed messages]
    F --> P{prepend_summary?}
    P -->|Yes| S[Prepend previous summary]
    S --> O[Output messages]
    P -->|No| O
    M[mark_messages_compressed] --> D[Persist to dialog/YYYY-MM-DD.jsonl]
    D --> R[Remove from memory]
```
| Function | Description |
|---|---|
| `get_memory` | Filter messages by mark and auto-prepend the summary |
| `estimate_tokens` | Estimate token usage of the context |
| `state_dict` / `load_state_dict` | Serialize/deserialize state (session persistence) |
| `mark_messages_compressed` | Mark messages compressed and persist to the dialog directory |
| `clear_content` | Persist all messages before clearing memory |

Raw conversation persistence: when messages are compressed or cleared, they are automatically saved to `{dialog_path}/{date}.jsonl`, one JSON-formatted message per line.
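The persistence format is plain JSONL, which a short sketch can illustrate. `persist_messages` is a hypothetical helper; the real class also marks the persisted messages as compressed.

```python
import json
from datetime import date
from pathlib import Path


def persist_messages(messages, dialog_path):
    """Append messages to {dialog_path}/{YYYY-MM-DD}.jsonl, one JSON object per line."""
    path = Path(dialog_path)
    path.mkdir(parents=True, exist_ok=True)
    file = path / f"{date.today().isoformat()}.jsonl"
    with file.open("a", encoding="utf-8") as f:
        for m in messages:
            f.write(json.dumps(m, ensure_ascii=False) + "\n")
    return file
```

Appending (rather than rewriting) keeps each day's file an ever-growing, replayable log of the raw dialog.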


### 7. `pre_reasoning_hook`: pre-reasoning processing

This is a unified entry point that wires all the above components together and automatically manages context before each reasoning step.

```mermaid
graph LR
    M[messages] --> TC[compact_tool_result<br>Compact long tool outputs]
    TC --> CC[check_context<br>Compute remaining space]
    CC --> D{messages_to_compact<br>Non-empty?}
    D -->|No| K[Return original messages + summary]
    D -->|Yes| V{is_valid?}
    V -->|No| K
    V -->|Yes| CM[compact_memory<br>Sync summary generation]
    V -->|Yes| SM[add_async_summary_task<br>Async persistence]
    CM --> R[Return messages_to_keep + new summary]
```

Execution flow:

  1. `compact_tool_result`: compact long tool outputs for all messages except the most recent `tool_result_compact_keep_n`.
  2. `check_context`: check whether the context exceeds limits (remaining space = threshold minus the tokens used by the system prompt and the compressed summary).
  3. `compact_memory`: generate a compact summary (sync), appended into `compact_summary`.
  4. `summary_memory`: persist memory to `memory/*.md` (async in the background, non-blocking).
| Key parameter | Default | Description |
|---|---|---|
| `tool_result_compact_keep_n` | 3 | Skip tool result compaction for the most recent N messages (preserve full content) |
| `memory_compact_reserve` | 10000 | Token count to reserve for recent messages; messages beyond this trigger compaction |
| `compact_ratio` | 0.7 | Compaction threshold ratio: `max_input_length × compact_ratio × 0.95` |
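Putting the parameters together, the trigger condition can be sketched as below. The helper names are illustrative, not ReMe's API; only the formula (`max_input_length × compact_ratio × 0.95`, minus tokens already spent on the system prompt and compressed summary) comes from the text above.

```python
def compact_budget(max_input_length, system_tokens=0, summary_tokens=0,
                   compact_ratio=0.7, safety=0.95):
    """Token budget left for messages: the global trigger point
    (max_input_length * compact_ratio * safety) minus tokens already
    consumed by the system prompt and the compressed summary."""
    return max_input_length * compact_ratio * safety - system_tokens - summary_tokens


def should_compact(message_tokens, max_input_length, **kwargs):
    """Compaction triggers once message tokens exceed the remaining budget."""
    return message_tokens > compact_budget(max_input_length, **kwargs)
```

With a 128k-token model and the defaults, the trigger point sits around 85k tokens, and a long system prompt or summary lowers it further.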

๐Ÿ—ƒ๏ธ Vector-based memory system

`ReMe` is the core class of the vector-based memory system. It manages three types of memories:

| Memory type | Use case |
|---|---|
| Personal memory | Records user preferences and habits |
| Procedural memory | Records task execution experience and success/failure patterns |
| Tool memory | Records tool usage experience and parameter tuning |

### Core capabilities

| Method | Function | Description |
|---|---|---|
| `summarize_memory` | 🧠 Summarize | Automatically extract and store memories from conversations |
| `retrieve_memory` | 🔍 Retrieve | Retrieve related memories based on a query |
| `add_memory` | ➕ Add | Manually add memories to the vector store |
| `get_memory` | 📖 Get | Get a single memory by ID |
| `update_memory` | ✏️ Update | Update existing memory content or metadata |
| `delete_memory` | 🗑️ Delete | Delete a specific memory |
| `list_memory` | 📋 List | List memories with filtering and sorting |

### Installation and environment variables

Installation and environment configuration are the same as for ReMeLight. API keys are configured via environment variables and can be stored in a `.env` file at the project root.

### Python usage

```python
import asyncio

from reme import ReMe


async def main():
    # Initialize ReMe
    reme = ReMe(
        working_dir=".reme",
        default_llm_config={
            "backend": "openai",
            "model_name": "qwen3.5-plus",
        },
        default_embedding_model_config={
            "backend": "openai",
            "model_name": "text-embedding-v4",
            "dimensions": 1024,
        },
        default_vector_store_config={
            "backend": "local",  # Supports local/chroma/qdrant/elasticsearch/obvec/zvec/hologres
        },
    )
    await reme.start()

    messages = [
        {"role": "user", "content": "Help me write a Python script", "time_created": "2026-02-28 10:00:00"},
        {"role": "assistant", "content": "Sure, I'll help you with that.", "time_created": "2026-02-28 10:00:05"},
    ]

    # 1. Summarize memories from conversation (automatically extract user preferences, task experience, etc.)
    result = await reme.summarize_memory(
        messages=messages,
        user_name="alice",  # Personal memory
        # task_name="code_writing",  # Procedural memory
    )
    print(f"Summary result: {result}")

    # 2. Retrieve related memories
    memories = await reme.retrieve_memory(
        query="Python programming",
        user_name="alice",
        # task_name="code_writing",
    )
    print(f"Retrieved memories: {memories}")

    # 3. Manually add a memory
    memory_node = await reme.add_memory(
        memory_content="The user prefers concise code style.",
        user_name="alice",
    )
    print(f"Added memory: {memory_node}")
    memory_id = memory_node.memory_id

    # 4. Get a single memory by ID
    fetched_memory = await reme.get_memory(memory_id=memory_id)
    print(f"Fetched memory: {fetched_memory}")

    # 5. Update memory content
    updated_memory = await reme.update_memory(
        memory_id=memory_id,
        user_name="alice",
        memory_content="The user prefers concise code with comments.",
    )
    print(f"Updated memory: {updated_memory}")

    # 6. List all memories for the user (supports filtering and sorting)
    all_memories = await reme.list_memory(
        user_name="alice",
        limit=10,
        sort_key="time_created",
        reverse=True,
    )
    print(f"User memory list: {all_memories}")

    # 7. Delete a specific memory
    await reme.delete_memory(memory_id=memory_id)
    print(f"Deleted memory: {memory_id}")

    # 8. Delete all memories (use with care)
    # await reme.delete_all()

    await reme.close()


if __name__ == "__main__":
    asyncio.run(main())
```

### Technical architecture

```mermaid
graph LR
    User[User / Agent] --> ReMe[Vector Based ReMe]
    ReMe --> Summarize[Summarize memories]
    ReMe --> Retrieve[Retrieve memories]
    ReMe --> CRUD[CRUD operations]
    Summarize --> PersonalSum[PersonalSummarizer]
    Summarize --> ProceduralSum[ProceduralSummarizer]
    Summarize --> ToolSum[ToolSummarizer]
    Retrieve --> PersonalRet[PersonalRetriever]
    Retrieve --> ProceduralRet[ProceduralRetriever]
    Retrieve --> ToolRet[ToolRetriever]
    PersonalSum --> VectorStore[Vector database]
    ProceduralSum --> VectorStore
    ToolSum --> VectorStore
    PersonalRet --> VectorStore
    ProceduralRet --> VectorStore
    ToolRet --> VectorStore
```

## Experimental results

Evaluations are conducted on two benchmarks: LoCoMo and HaluMem. Experimental settings:

  1. ReMe backbone: as specified in each table.
  2. Evaluation protocol: LLM-as-a-Judge following MemOS, with each answer scored by GPT-4o-mini.

Baseline results are reproduced from their respective papers under aligned settings where possible.

### LoCoMo

| Method | Single Hop | Multi Hop | Temporal | Open Domain | Overall |
|---|---|---|---|---|---|
| MemoryOS | 62.43 | 56.50 | 37.18 | 40.28 | 54.70 |
| Mem0 | 66.71 | 58.16 | 55.45 | 40.62 | 61.00 |
| MemU | 72.77 | 62.41 | 33.96 | 46.88 | 61.15 |
| MemOS | 81.45 | 69.15 | 72.27 | 60.42 | 75.87 |
| HiMem | 89.22 | 70.92 | 74.77 | 54.86 | 80.71 |
| Zep | 88.11 | 71.99 | 74.45 | 66.67 | 81.06 |
| TiMem | 81.43 | 62.20 | 77.63 | 52.08 | 75.30 |
| TSM | 84.30 | 66.67 | 71.03 | 58.33 | 76.69 |
| MemR3 | 89.44 | 71.39 | 76.22 | 61.11 | 81.55 |
| ReMe | 89.89 | 82.98 | 83.80 | 71.88 | 86.23 |

### HaluMem

| Method | Memory Integrity | Memory Accuracy | QA Accuracy |
|---|---|---|---|
| MemoBase | 14.55 | 92.24 | 35.53 |
| Supermemory | 41.53 | 90.32 | 54.07 |
| Mem0 | 42.91 | 86.26 | 53.02 |
| ProMem | 73.80 | 89.47 | 62.26 |
| ReMe | 67.72 | 94.06 | 88.78 |

## 🧪 Procedural memory paper

Our procedural (task) memory paper is available on arXiv.

๐ŸŒ Appworld benchmark

We evaluate ReMe in the AppWorld environment using Qwen3-8B (non-thinking mode):

| Method | Avg@4 | Pass@4 |
|---|---|---|
| w/o ReMe | 0.1497 | 0.3285 |
| w/ ReMe | 0.1706 (+2.09%) | 0.3631 (+3.46%) |

Pass@K measures the probability that at least one of K generated candidates successfully completes the task (score=1). The current experiments use an internal AppWorld environment, which may differ slightly from the public version.
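The two metrics can be computed with a short sketch. The helper names and the 4-run score lists are illustrative; a score of 1 means the task was fully completed.

```python
def avg_at_k(scores):
    """Mean score over K independent runs of one task."""
    return sum(scores) / len(scores)


def pass_at_k(scores):
    """1.0 if at least one of the K runs fully solves the task (score == 1)."""
    return 1.0 if any(s == 1 for s in scores) else 0.0


def benchmark(per_task_scores):
    """Aggregate (Avg@K, Pass@K) over a list of per-task K-run score lists."""
    n = len(per_task_scores)
    avg = sum(avg_at_k(s) for s in per_task_scores) / n
    passed = sum(pass_at_k(s) for s in per_task_scores) / n
    return avg, passed
```

Note that a task with partial scores across all K runs contributes to Avg@K but not to Pass@K, which is why the two columns can move independently.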

For more details on how to reproduce the experiments, see quickstart.md.

## 🔧 BFCL-V3 benchmark

We evaluate ReMe on the BFCL-V3 multi-turn-base task (random split: 50 train / 150 val) using Qwen3-8B (thinking mode):

| Method | Avg@4 | Pass@4 |
|---|---|---|
| w/o ReMe | 0.4033 | 0.5955 |
| w/ ReMe | 0.4450 (+4.17%) | 0.6577 (+6.22%) |

For more details on how to reproduce the experiments, see quickstart.md.

โญ Community & support

- Star & Watch: Starring helps more agent developers discover ReMe; Watching keeps you up to date with new releases and features.
- Share your results: Share how ReMe empowers your agents in Issues or Discussions; we are happy to showcase great community use cases.
- Need a new feature? Open a feature request; we'll evolve ReMe together with the community.
- Code contributions: All forms of contribution are welcome. Please see the contribution guide.
- Acknowledgements: We thank excellent open-source projects such as OpenClaw, Mem0, MemU, and QwenPaw for their inspiration and support.

### Contributors

Thanks to all who have contributed to ReMe:


## 📄 Citation

```bibtex
@software{AgentscopeReMe2025,
  title = {AgentscopeReMe: Memory Management Kit for Agents},
  author = {ReMe Team},
  url = {https://reme.agentscope.io},
  year = {2025}
}
```

โš–๏ธ License

This project is open-sourced under the Apache License 2.0. See LICENSE for details.


## 🤔 Why ReMe?

ReMe stands for Remember Me and Refine Me, symbolizing our goal to help AI agents "remember" users and "refine" themselves through interactions. We hope ReMe is not just a cold memory module, but a partner that truly helps agents understand users, accumulate experience, and continuously evolve.


## 📈 Star history

Star History Chart