memU-server: Local Backend Service for AI Memory System
March 19, 2026
memU-server is the backend management service for MemU, responsible for providing API endpoints, data storage, and management capabilities, as well as deep integration with the core memU framework. It powers the frontend memU-ui with reliable data support, ensuring efficient reading, writing, and maintenance of Agent memories. memU-server can be deployed locally or in private environments and supports quick startup and configuration via Docker, enabling developers to manage the AI memory system in a secure environment.
- Core Algorithm (memU): https://github.com/NevaMind-AI/memU
- One call = response + memory (memU Response API): https://memu.pro/docs#responseapi
- Try it instantly: https://app.memu.so/quick-start
⭐ Star Us on GitHub
Star memU-server to get notified about new releases and join our growing community of AI developers building intelligent agents with persistent memory capabilities. 💬 Join our Discord community: https://discord.gg/memu
🏗️ Architecture
memU-server runs as two cooperating processes backed by shared infrastructure:
                 ┌─────────────────────────────────────┐
Client ──HTTP──► │ FastAPI API Server (port 8000)      │
                 │  POST /memorize → start workflow    │
                 │  GET /memorize/status/{task_id}     │
                 │  POST /retrieve, /clear, /categories│
                 └───────────────┬─────────────────────┘
                                 │ gRPC
                 ┌───────────────▼─────────────────────┐
                 │ Temporal Server (port 7233)         │
                 └───────────────┬─────────────────────┘
                                 │ poll
                 ┌───────────────▼─────────────────────┐
                 │ Temporal Worker Process             │
                 │  MemorizeWorkflow → task_memorize   │
                 │  (calls memu-py MemoryService)      │
                 └───────────────┬─────────────────────┘
                                 │ SQL
                 ┌───────────────▼─────────────────────┐
                 │ PostgreSQL + pgvector (port 5432)   │
                 │ app db: memu | temporal db: temporal│
                 └─────────────────────────────────────┘
| Component | Technology | Purpose |
|---|---|---|
| API Server | FastAPI 0.122+ / Python 3.13 | HTTP endpoints, request validation, workflow dispatch |
| Workflow Engine | Temporal 1.25 / temporalio SDK 1.16 | Durable async task orchestration for /memorize |
| Worker | Temporal Worker (same codebase) | Executes MemorizeWorkflow → task_memorize activity |
| Database | PostgreSQL 16 + pgvector | Vector storage for memories, Temporal persistence |
| Memory Core | memu-py 1.2+ | Three-layer memory algorithm (Resource → Item → Category) |
How /memorize Works (Async)
- Client POSTs the conversation payload to /memorize.
- The API server saves the conversation to local storage, starts a Temporal workflow, and returns immediately with a task_id.
- Temporal dispatches the MemorizeWorkflow to the worker process.
- The worker executes the task_memorize activity (which calls memu-py MemoryService.memorize()), writing results to PostgreSQL.
- The client polls GET /memorize/status/{task_id} to track progress (RUNNING → COMPLETED/FAILED).
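The client side of the last step can be sketched as a small polling helper. Here `fetch_status` is an injectable placeholder standing in for an HTTP call to `GET /memorize/status/{task_id}`; the terminal-state set follows the status values documented below.

```python
import time
from typing import Callable

# Terminal states as documented for GET /memorize/status/{task_id}.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED", "TERMINATED"}

def wait_for_task(fetch_status: Callable[[], str],
                  interval: float = 2.0,
                  timeout: float = 600.0) -> str:
    """Poll fetch_status() until a terminal state or the timeout is reached."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("memorize task did not finish in time")

# Stubbed status source standing in for the real HTTP call:
responses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_task(lambda: next(responses), interval=0.01))  # COMPLETED
```

In real use, `fetch_status` would wrap an HTTP GET and extract `result.status` from the JSON response, as shown in the Integration Guide.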
🚀 Get Started
Prerequisites
- Python 3.13+ and uv package manager
- Docker & Docker Compose (for infrastructure services)
- OpenAI API key (required for LLM and embedding operations)
1. Start Infrastructure
Launch PostgreSQL (with pgvector), Temporal Server, and Temporal UI:
docker compose up -d
| Service | Port | Description |
|---|---|---|
| PostgreSQL | 5432 | Database with pgvector extension |
| Temporal | 7233 | Workflow engine gRPC API |
| Temporal UI | 8088 | Web management interface |
2. Install & Run from Source
# Clone the repository
git clone https://github.com/NevaMind-AI/memU-server.git
cd memU-server
# Install dependencies
make install
# or: uv sync
# Configure environment (create .env or export)
export OPENAI_API_KEY=your_api_key_here
# Start the API server (terminal 1)
make run
# or: uv run fastapi dev
# Start the Temporal worker (terminal 2)
uv run python -m app.workers.worker
The API server runs on http://127.0.0.1:8000.
3. Run with Docker
export OPENAI_API_KEY=your_api_key_here
# Pull and run the API server
docker pull nevamindai/memu-server:latest
docker run --rm -p 8000:8000 \
--network memu-network \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
-e POSTGRES_HOST=postgres \
-e TEMPORAL_HOST=temporal \
nevamindai/memu-server:latest
Note: Both the API server and Temporal worker share the same Docker image. Override the entrypoint to run the worker:
docker run --rm \
  --network memu-network \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e POSTGRES_HOST=postgres \
  -e TEMPORAL_HOST=temporal \
  nevamindai/memu-server:latest \
  uv run python -m app.workers.worker
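If you prefer to run both processes under Docker Compose alongside the infrastructure, a service definition might look like the sketch below. The service names (`memu-api`, `memu-worker`) are illustrative, and the sketch assumes `memu-network` is already defined in your docker-compose.yml; adjust to your setup.

```yaml
# Hypothetical additions to docker-compose.yml (service names are illustrative).
services:
  memu-api:
    image: nevamindai/memu-server:latest
    ports: ["8000:8000"]
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      POSTGRES_HOST: postgres
      TEMPORAL_HOST: temporal
    networks: [memu-network]

  memu-worker:
    image: nevamindai/memu-server:latest
    # Same image as the API server; override the command to run the worker.
    command: uv run python -m app.workers.worker
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      POSTGRES_HOST: postgres
      TEMPORAL_HOST: temporal
    networks: [memu-network]
```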
Environment Variables
The memU-server API and worker processes load their configuration from environment variables or a .env file. Key application-level variables are listed below.
Docker Compose may define additional infrastructure-specific environment variables (for example, TEMPORAL_DB); refer to docker-compose.yml for the complete list used by the containers.
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | (required) | OpenAI API key |
| OPENAI_BASE_URL | https://api.openai.com/v1 | OpenAI-compatible API base URL |
| DEFAULT_LLM_MODEL | gpt-4o-mini | Chat model for memorization |
| EMBEDDING_API_KEY | Falls back to OPENAI_API_KEY | Embedding provider API key |
| EMBEDDING_BASE_URL | https://api.voyageai.com/v1 | Embedding API base URL |
| EMBEDDING_MODEL | voyage-3.5-lite | Embedding model name |
| POSTGRES_USER | postgres | PostgreSQL user |
| POSTGRES_PASSWORD | postgres | PostgreSQL password |
| POSTGRES_HOST | localhost | PostgreSQL host |
| POSTGRES_PORT | 5432 | PostgreSQL port |
| POSTGRES_DB | memu | Application database name |
| DATABASE_URL | (auto-assembled) | Full DSN (overrides individual PG vars) |
| TEMPORAL_HOST | localhost | Temporal server host |
| TEMPORAL_PORT | 7233 | Temporal server gRPC port |
| TEMPORAL_NAMESPACE | default | Temporal namespace |
| STORAGE_PATH | ./data/storage | Local directory for conversation files |
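The "(auto-assembled)" default for DATABASE_URL presumably combines the individual POSTGRES_* variables into a standard PostgreSQL DSN. A sketch of that fallback logic; the exact DSN scheme the server uses is an assumption:

```python
import os

def database_url() -> str:
    """Return DATABASE_URL if set, else assemble a DSN from POSTGRES_* vars."""
    explicit = os.getenv("DATABASE_URL")
    if explicit:
        return explicit  # a full DSN overrides the individual variables
    user = os.getenv("POSTGRES_USER", "postgres")
    password = os.getenv("POSTGRES_PASSWORD", "postgres")
    host = os.getenv("POSTGRES_HOST", "localhost")
    port = os.getenv("POSTGRES_PORT", "5432")
    db = os.getenv("POSTGRES_DB", "memu")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

# With all defaults: postgresql://postgres:postgres@localhost:5432/memu
print(database_url())
```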
Makefile Commands
make install # Install dependencies & pre-commit hooks
make run # Start FastAPI dev server
make check # Lint + type check + dependency check (CI)
make test # Run tests with coverage
make clean # Clean __pycache__, .pyc, build artifacts
make docker-up # Start Docker infrastructure services
make docker-down # Stop Docker infrastructure services
📡 API Endpoints
GET / β Health Check
curl http://localhost:8000/
Response: {"message": "Hello MemU user!"}
POST /memorize β Submit Async Memorization Task
Saves conversation data and starts an async Temporal workflow. Returns immediately with a task_id for status polling.
Request:
{
"conversation": [
{"role": "user", "content": {"text": "I prefer dark mode"}, "created_at": "2025-03-20 10:00:00"},
{"role": "assistant", "content": {"text": "Noted!"}, "created_at": "2025-03-20 10:00:01"}
],
"user_id": "user-001",
"agent_id": "agent-001",
"override_config": null
}
Response:
{
"status": "success",
"result": {
"task_id": "memorize-a1b2c3d4e5f60718293a4b5c6d7e8f90",
"status": "PENDING",
"message": "Memorization task submitted for user user-001"
}
}
GET /memorize/status/{task_id} β Poll Task Status
Track a memorization task. The task_id must match the format memorize-<32 hex chars> (as returned by POST /memorize).
curl http://localhost:8000/memorize/status/memorize-a1b2c3d4e5f60718293a4b5c6d7e8f90
Response:
{
"status": "success",
"result": {
"task_id": "memorize-a1b2c3d4e5f60718293a4b5c6d7e8f90",
"status": "COMPLETED",
"detail": "SUCCESS"
}
}
Status values: RUNNING, COMPLETED, FAILED, CANCELED, TERMINATED, UNKNOWN.
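The task_id format can be checked client-side before polling. A small validator, assuming the 32 hex characters are lowercase (as in the examples above):

```python
import re

# memorize- followed by exactly 32 lowercase hex characters.
TASK_ID_RE = re.compile(r"^memorize-[0-9a-f]{32}$")

def is_valid_task_id(task_id: str) -> bool:
    """True if task_id matches the memorize-<32 hex chars> format."""
    return TASK_ID_RE.fullmatch(task_id) is not None

print(is_valid_task_id("memorize-a1b2c3d4e5f60718293a4b5c6d7e8f90"))  # True
print(is_valid_task_id("memorize-short"))                             # False
```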
POST /retrieve β Query Stored Memories
{"query": "What are the user's UI preferences?"}
Response:
{
"status": "success",
"result": { ... }
}
POST /clear β Clear Memories
Delete memories for a specific user and/or agent. At least one of user_id or agent_id must be provided.
{"user_id": "user-001", "agent_id": "agent-001"}
Response:
{
"status": "success",
"result": {
"purged_categories": 3,
"purged_items": 15,
"purged_resources": 2
}
}
POST /categories β List Memory Categories
List all memory categories for a user.
{"user_id": "user-001"}
Response:
{
"status": "success",
"result": {
"categories": [
{
"name": "UI Preferences",
"description": "User interface preferences",
"user_id": "user-001",
"agent_id": "agent-001",
"summary": "User prefers dark mode..."
}
]
}
}
🔗 Integration Guide
Python
import httpx
BASE = "http://localhost:8000"
# Memorize a conversation
resp = httpx.post(f"{BASE}/memorize", json={
"conversation": [
{"role": "user", "content": {"text": "I like Python"}, "created_at": "2025-03-20 10:00:00"},
{"role": "assistant", "content": {"text": "Great choice!"}, "created_at": "2025-03-20 10:00:01"},
],
"user_id": "user-001",
})
task_id = resp.json()["result"]["task_id"]
# Poll until complete
import time
while True:
    status = httpx.get(f"{BASE}/memorize/status/{task_id}").json()
    if status["result"]["status"] in ("COMPLETED", "FAILED", "CANCELED", "TERMINATED"):
        break
    time.sleep(2)
# Retrieve memories
result = httpx.post(f"{BASE}/retrieve", json={"query": "What languages does the user like?"})
print(result.json())
cURL
# Submit memorization
curl -X POST http://localhost:8000/memorize \
-H "Content-Type: application/json" \
-d '{"conversation": [{"role":"user","content":{"text":"hello"},"created_at":"2025-01-01 00:00:00"}], "user_id":"u1"}'
# Check status (use the task_id returned by POST /memorize)
curl http://localhost:8000/memorize/status/<task_id>
# Retrieve
curl -X POST http://localhost:8000/retrieve \
-H "Content-Type: application/json" \
-d '{"query": "user preferences"}'
# List categories
curl -X POST http://localhost:8000/categories \
-H "Content-Type: application/json" \
-d '{"user_id": "u1"}'
# Clear memories
curl -X POST http://localhost:8000/clear \
-H "Content-Type: application/json" \
-d '{"user_id": "u1"}'
✨ Key Features
Async Memorization with Temporal
- Non-blocking /memorize endpoint returns immediately with a task ID
- Durable workflow execution: tasks survive server restarts
- Status tracking via /memorize/status/{task_id}
- 10-minute activity timeout with automatic retry support
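Temporal itself manages the retry and timeout behavior; the following is only a conceptual illustration of "bounded retries within a hard deadline", not Temporal's actual API or the server's implementation:

```python
import time

def run_with_retries(activity, max_attempts: int = 3, timeout: float = 600.0):
    """Illustrative only: retry an activity up to max_attempts within a deadline."""
    deadline = time.monotonic() + timeout
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        if time.monotonic() >= deadline:
            break
        try:
            return activity()
        except Exception as exc:  # Temporal would apply backoff between attempts
            last_exc = exc
    raise RuntimeError("activity failed after retries") from last_exc

# A flaky activity that fails once, then succeeds:
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 2:
        raise RuntimeError("transient")
    return "ok"

print(run_with_retries(flaky))  # ok
```

In the real system, the retry policy and the 10-minute start-to-close timeout are configured on the Temporal activity, not in application code like this.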
Quick Deployment
- Docker image for both API server and worker
- Docker Compose for infrastructure (PostgreSQL + Temporal)
- Single make install && make run to start development
Comprehensive Memory Management
- Memorize: Async conversation ingestion via Temporal workflows
- Retrieve: Semantic search over stored memories (RAG-based or LLM-based)
- Clear: Targeted memory deletion by user/agent
- Categories: Browse and manage memory categories
🧩 Why MemU?
Most memory systems in current LLM pipelines rely heavily on explicit modeling, requiring manual definition and annotation of memory categories. This limits AI's ability to truly understand memory and makes it difficult to support diverse usage scenarios.
MemU offers a flexible and robust alternative, inspired by hierarchical storage architecture in computer systems. It progressively transforms heterogeneous input data into queryable and interpretable textual memory.
Its core architecture consists of three layers: Resource Layer → Memory Item Layer → MemoryCategory Layer.
- Resource Layer: Multimodal raw data warehouse
- Memory Item Layer: Discrete extracted memory units
- MemoryCategory Layer: Aggregated textual memory units
Key Features:
- Full Traceability: Track from raw data → items → documents and back
- Memory Lifecycle: Memorization → Retrieval → Self-evolution
- Two Retrieval Methods:
- RAG-based: Fast embedding vector search
- LLM-based: Direct file reading with deep semantic understanding
- Self-Evolving: Adapts memory structure based on usage patterns
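At its core, the RAG-based retrieval path ranks stored memory items by embedding similarity to the query. A toy sketch with hand-made 3-dimensional vectors; in the real system, embeddings come from the configured EMBEDDING_MODEL and live in pgvector:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy memory items with made-up embeddings.
memories = {
    "prefers dark mode": [0.9, 0.1, 0.0],
    "likes Python": [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "UI preferences?"

# Rank memories by similarity and pick the best match.
best = max(memories, key=lambda m: cosine(query, memories[m]))
print(best)  # prefers dark mode
```

The LLM-based method skips this vector step entirely and instead lets the model read the memory documents directly, trading speed for deeper semantic understanding.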
📄 License
By contributing to memU-server, you agree that your contributions will be licensed under the AGPL-3.0 License.
🌍 Community
For more information, please contact info@nevamind.ai.
- GitHub Issues: Report bugs, request features, and track development. Submit an issue
- Discord: Get real-time support, chat with the community, and stay updated. Join us
- X (Twitter): Follow for updates, AI insights, and key announcements. Follow us