Chapter 8: Production Deployment and Advanced Patterns

April 13, 2026

What Problem Does This Solve?

The make dev local setup is designed for a single developer with direct host access. Production deployments face different requirements:

Multi-user access: multiple users submitting research tasks concurrently
State persistence: thread history and memory must survive service restarts
Security: untrusted user inputs must not escape the sandbox; the API must require authentication
Observability: you need to know when agent runs fail, which LLM calls are expensive, and where latency is concentrated
Resource control: long-running research jobs must not starve other users
Reliability: the system must recover from sandbox container crashes, LLM API rate limits, and network failures

This chapter covers each of these production concerns with concrete configuration examples.

Deployment Sizing

Profile	CPU	RAM	Disk	Use Case
Local eval	4 vCPU	8 GB	20 GB SSD	Single developer testing
Docker dev	4 vCPU	8 GB	25 GB SSD	Team dev environment
Production server	8–16 vCPU	16–32 GB	40+ GB SSD	Multi-user production

Recovery Patterns

LLM API Rate Limits:

DeerFlow's LLMErrorHandlingMiddleware retries rate-limit responses from LLM providers with exponential backoff. No additional configuration is needed, but monitor LangSmith or Langfuse for runs that are slow because of rate-limit retries.
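To make the retry behaviour concrete, here is a minimal sketch of exponential backoff with jitter. It is not DeerFlow's implementation: RateLimitError, call_with_backoff, and the default delays are hypothetical names and values chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit (HTTP 429) error. Hypothetical name."""


def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Return the un-jittered delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, wait and retry up to max_retries times."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the error to the caller
            # Full jitter: sleep a random amount up to the exponential delay
            sleep(random.uniform(0, delays[attempt]))
```

With these assumed defaults, a call that is rate-limited five times in a row can wait up to 1 + 2 + 4 + 8 + 16 = 31 seconds before succeeding, which is the kind of hidden latency worth watching for in traces.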

Sandbox Container Crashes:

If a Docker sandbox container crashes mid-execution:

SandboxMiddleware detects the missing container ID in ThreadState.sandbox
A new container is provisioned for the thread on the next invocation
The agent resumes from the last checkpoint (previous messages are intact)
Files in the workspace directory persist (they are on the shared volume, not inside the container)
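The recovery steps above can be sketched as a small check-and-reprovision routine. This is an illustration under stated assumptions, not DeerFlow's SandboxMiddleware: the thread state is modelled as a plain dict with a "sandbox" key, and FakeProvider and ensure_sandbox are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeProvider:
    """Hypothetical stand-in for a sandbox provider; tracks live container IDs."""
    live: set = field(default_factory=set)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def provision(self, thread_id):
        container_id = f"sandbox-{thread_id}-{next(self._ids)}"
        self.live.add(container_id)
        return container_id

    def is_alive(self, container_id):
        return container_id in self.live


def ensure_sandbox(state, provider):
    """Return a live container ID for the thread, provisioning one if needed."""
    container_id = state.get("sandbox")
    if container_id is None or not provider.is_alive(container_id):
        container_id = provider.provision(state["thread_id"])
        state["sandbox"] = container_id  # Messages in state are left untouched
    return container_id
```

Only the container reference is replaced. The message history and the files on the shared volume live outside the container, which is why the agent can continue from its last checkpoint.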

LangGraph Server Restart:

With the Postgres checkpointer:

All thread states are persisted and immediately available after restart
In-flight runs at the time of restart are marked as failed
Users can resubmit their last message to resume from the last saved checkpoint
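The "marked as failed" step can be pictured as a reconciliation pass at startup: any run still recorded as running cannot be live after a restart. The sketch below assumes run records are plain dicts with "run_id" and "status" keys. That record shape and the function name are hypothetical, not the LangGraph Server schema.

```python
def reconcile_runs_on_startup(runs):
    """Mark every run still recorded as 'running' as 'failed'; return their IDs."""
    interrupted = []
    for run in runs:
        if run["status"] == "running":
            run["status"] = "failed"
            run["error"] = "interrupted by server restart"
            interrupted.append(run["run_id"])
    return interrupted
```

Completed runs and their checkpoints are left alone, so resubmitting the last message starts a new run from the most recent saved state.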

Disk Full:

Generated artifacts (MP3s, PDFs, slides) accumulate on the shared volume. Implement periodic cleanup:

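# scripts/cleanup_old_threads.py
import shutil
from datetime import datetime, timedelta
from pathlib import Path

OUTPUTS_DIR = Path("/mnt/user-data/outputs")
RETENTION_DAYS = 30

def cleanup_old_outputs():
    """Delete per-thread output directories not modified within RETENTION_DAYS."""
    if not OUTPUTS_DIR.is_dir():
        return  # Nothing to clean if the volume is not mounted

    cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)

    for thread_dir in OUTPUTS_DIR.iterdir():
        if thread_dir.is_dir():
            # A directory's mtime changes only when entries are added or removed
            # directly inside it, not when nested files are edited.
            mtime = datetime.fromtimestamp(thread_dir.stat().st_mtime)
            if mtime < cutoff:
                shutil.rmtree(thread_dir)
                print(f"Deleted: {thread_dir}")

if __name__ == "__main__":
    cleanup_old_outputs()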
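Age-based retention can be complemented by a disk-usage check, so cleanup also runs when the volume fills faster than expected. The following sketch uses only the standard library. The 0.8 threshold and the helper names are assumptions for illustration, not part of DeerFlow.

```python
import shutil
from pathlib import Path


def usage_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_cleanup(path, threshold=0.8):
    """True when the volume holding `path` is fuller than the assumed threshold."""
    return usage_fraction(path) > threshold


def oldest_first(outputs_dir):
    """List thread directories under outputs_dir, least recently modified first."""
    dirs = [p for p in Path(outputs_dir).iterdir() if p.is_dir()]
    return sorted(dirs, key=lambda p: p.stat().st_mtime)
```

A scheduled job could call needs_cleanup on the outputs directory and, while it returns True, remove directories from the front of oldest_first.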

Advanced Pattern: Agent Guardrails

For deployments where you need to restrict what the agent can do (e.g., corporate environments):

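# backend/docs/GUARDRAILS.md describes this pattern

# BaseMiddleware and ThreadState are DeerFlow types; import them from the
# backend package in your checkout (the import path is not shown here).
# Guardrails are implemented as middleware that intercepts tool calls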
class ContentGuardrailMiddleware(BaseMiddleware):
    """Block tool calls to specific domains or with specific patterns."""
    
    BLOCKED_DOMAINS = {"competitor.com", "internal.company.com"}
    
    async def before_tool_call(
        self,
        tool_name: str,
        tool_input: dict,
        state: ThreadState,
    ) -> dict | None:
        """Return None to allow, return error dict to block."""
        
        if tool_name in ("web_search", "web_fetch"):
            url = tool_input.get("url", "")
            query = tool_input.get("query", "")
            
            for domain in self.BLOCKED_DOMAINS:
                if domain in url or domain in query:
                    return {"error": f"Access to {domain} is restricted by policy."}
        
        return None  # Allow the tool call
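The substring test above also matches unrelated hosts such as notcompetitor.com, and it matches URLs that merely mention a blocked domain in a query string. A stricter check compares the parsed hostname instead. This is a sketch using the standard library; is_blocked_url is a hypothetical helper, not part of DeerFlow.

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"competitor.com", "internal.company.com"}


def is_blocked_url(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host equals a blocked domain or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocked)
```

Note that urlparse finds no hostname in a URL without a scheme (for example "competitor.com/pricing"), so such inputs need normalising first. Free-text search queries still need a separate policy, since they are not URLs.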

Cost Management

LLM API costs scale with:

Number of concurrent users
Research depth (sub-agent count and search iterations)
Model selection (per-token prices for an o3-class model versus gpt-4o-mini can differ by up to roughly 100x; check current provider pricing)
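How these factors multiply can be shown with a rough estimate. The prices below are placeholders chosen to make the arithmetic easy, not real provider pricing. The function names and the one-call-per-iteration assumption are also illustrative.

```python
def estimate_call_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost in dollars for one LLM call, given prices per million tokens."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


def estimate_task_cost(subagents, iterations, tokens_in, tokens_out, prices):
    """Cost of one research task, assuming one LLM call per sub-agent iteration."""
    calls = subagents * iterations
    return calls * estimate_call_cost(tokens_in, tokens_out, **prices)


# Placeholder prices in dollars per million tokens (NOT real provider pricing)
CHEAP = {"price_in_per_m": 0.10, "price_out_per_m": 0.50}
PREMIUM = {"price_in_per_m": 10.0, "price_out_per_m": 50.0}

cheap_cost = estimate_task_cost(3, 4, 20_000, 2_000, CHEAP)
premium_cost = estimate_task_cost(3, 4, 20_000, 2_000, PREMIUM)
```

With 3 sub-agents, 4 iterations each, and 20,000 input plus 2,000 output tokens per call, the placeholder prices give about $0.036 per task on the cheap model and about $3.60 on the premium one. Multiply by tasks per user and by concurrent users to get a daily figure.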

Production cost controls:

# config.yaml — cost-efficient defaults with premium model available
models:
  - name: gpt-4o-mini
    display_name: GPT-4o Mini (Default)
    use: langchain_openai:ChatOpenAI
    model: gpt-4o-mini
    api_key: $OPENAI_API_KEY

  - name: o3-mini
    display_name: o3 Mini (Deep Research)
    use: langchain_openai:ChatOpenAI
    model: o3-mini
    api_key: $OPENAI_API_KEY
    supports_thinking: true

# Per-agent config: limit sub-agent parallelism to control costs
# workspace/agents/lead_agent/config.yaml
subagent:
  max_concurrent: 2   # Reduce from default 3 to limit parallel LLM calls
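What a concurrency cap like max_concurrent means in practice can be illustrated with an asyncio semaphore. This is a generic sketch, not DeerFlow's scheduler: run_subagents and fake_subagent are hypothetical, and the sleep stands in for an LLM call.

```python
import asyncio


async def run_subagents(tasks, max_concurrent=2):
    """Run coroutine factories with at most max_concurrent in flight.

    Returns (results in input order, peak number of tasks running at once).
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(task):
        nonlocal in_flight, peak
        async with semaphore:  # Blocks while max_concurrent tasks are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await task()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(t) for t in tasks))
    return results, peak


async def fake_subagent(n):
    await asyncio.sleep(0.01)  # Stands in for an LLM call
    return n * n
```

Lowering the cap trades wall-clock time for a lower peak request rate, which reduces both the burst spend and the chance of hitting provider rate limits.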

Summary

Production DeerFlow deployment requires:

Postgres checkpointer for persistent thread state
AioSandboxProvider (Docker) for isolated code execution
Authentication at Nginx or via better-auth
LangSmith or Langfuse for observability
Shared filesystem volume for artifacts persistence across service restarts
Resource sizing based on concurrent user count and research depth
Periodic cleanup for accumulated artifacts
Network isolation preventing direct exposure of internal service ports

The gateway mode (experimental) simplifies the architecture by embedding the agent runtime in the FastAPI server, reducing the process count and removing the LangGraph Platform dependency.

Tutorial Complete

You have now covered the full DeerFlow system:

Chapter 1: Installation, configuration, and first research query
Chapter 2: LangGraph state machine, 14-stage middleware pipeline, async checkpointing
Chapter 3: Research pipeline — CLARIFY → PLAN → ACT, deep research skill, citations
Chapter 4: RAG and search tools — DuckDuckGo, Tavily, Exa, Firecrawl, sandbox REPL, MCP
Chapter 5: Three-service architecture, SSE streaming, Gateway API, IM channels
Chapter 6: Skills system, custom tools, MCP servers, per-agent config overrides
Chapter 7: Podcast generation, PowerPoint, charts, image and video generation
Chapter 8: Production deployment, Postgres checkpointer, security, observability, cost management
