Chapter 1: Getting Started with DeerFlow
April 13, 2026
What Problem Does This Solve?
Most teams assume DeerFlow is something it is not. The repository name sounds like a workflow scheduler. The old documentation described it as a distributed task execution platform. Neither description is accurate.
DeerFlow is an open-source super agent harness — a runtime that orchestrates a lead LLM agent to conduct deep research, write and execute code, spawn parallel sub-agents, load modular skills, and deliver structured outputs (reports, podcasts, slides) via a chat interface. It was created by ByteDance and is conceptually similar to Google's Gemini Deep Research product, but open-source and extensible.
The practical problem it solves: when you need an AI assistant that can autonomously browse the web from multiple angles, write and run Python to analyze data, synthesize a structured report with citations, and optionally produce audio or slide outputs — all from a single chat message — DeerFlow provides the production-ready runtime to do it.
This chapter gets you from zero to your first research output in under 30 minutes.
How It Works Under the Hood
Before touching any configuration, it helps to understand the three-service architecture you are about to start.
graph LR
subgraph "Your Browser / IM Client"
A[Chat UI :2026]
end
subgraph "Nginx Proxy :2026"
B[Entry Point]
end
subgraph "Three Core Services"
C[LangGraph Server :2024<br/>Agent runtime, SSE streaming,<br/>thread state management]
D[Gateway API :8001<br/>FastAPI REST endpoints<br/>models, skills, MCP, uploads]
E[Next.js Frontend :3000<br/>Chat interface, streaming<br/>message rendering]
end
subgraph "Execution Backend"
F[LocalSandboxProvider<br/>dev mode]
G[AioSandboxProvider<br/>Docker containers, prod mode]
end
A --> B
B --> C
B --> D
B --> E
C --> F
C --> G
Every user message flows through Nginx to the LangGraph server. The LangGraph server runs the compiled lead_agent graph, which invokes the LLM, dispatches tools, streams tokens back to the frontend via SSE, and persists state via an async checkpointer.
The Gateway API handles everything that is not agent execution: listing available models, managing MCP server configuration, serving generated artifacts, handling file uploads, and managing memory records.
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Disk | 20 GB | 40 GB |
| Docker | 24+ | latest |
| Python | 3.12+ | 3.12+ |
| Node.js | 22+ | 22+ |
You also need at least one LLM provider API key. DeerFlow works with any OpenAI-compatible endpoint including:
- OpenAI (gpt-4o, o1, o3)
- Anthropic Claude (via OpenAI-compatible proxy or direct LangChain integration)
- DeepSeek
- Novita AI
- vLLM self-hosted endpoints
- OpenRouter (routing to Gemini, Llama, etc.)
Installation
Step 1: Clone the Repository
git clone https://github.com/bytedance/deer-flow.git
cd deer-flow
Step 2: Run the Interactive Setup Wizard
The wizard creates config.yaml and .env from your answers:
make setup
The wizard prompts for:
- LLM provider and model selection
- API key (saved to .env, referenced as $OPENAI_API_KEY in config.yaml)
- Web search provider (DuckDuckGo — free, no key; Tavily — better quality, requires key)
- Sandbox execution mode (local for dev, Docker for production)
After make setup, verify the generated files:
# config.yaml should exist at the project root (NOT in backend/)
ls -la config.yaml
# .env should contain your API key
cat .env
Step 3: Validate with the Doctor Script
make doctor
The doctor script checks:
- config.yaml loads without parse errors
- The selected LLM model is reachable (test API call)
- Web search tool (if configured) is responsive
- Docker is available if sandbox mode is set to Docker
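If the model check fails and you want to reproduce it by hand, the sketch below sends a one-token test completion to an OpenAI-compatible endpoint. It is not the actual doctor implementation, just the same idea; the OPENAI_BASE_URL fallback and the requests dependency are assumptions.
# llm_check.py — hand-rolled reachability test (not the actual doctor script)
# Assumes an OpenAI-compatible /chat/completions endpoint and OPENAI_API_KEY in .env / the shell.
import os
import requests

base_url = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
resp = requests.post(
    f"{base_url}/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o",  # use the model name from your config.yaml
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])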
Step 4: Choose Your Startup Mode
Docker (recommended for production and first-time setup):
make docker-init # Pull images, create network, initialize volumes
make docker-start # Start all three services + Nginx
Local development (faster iteration, no Docker for services):
make install # pip install + npm install
make dev # Concurrently starts LangGraph server, Gateway, and Next.js
Both modes expose the UI at http://localhost:2026.
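Once the services are up, you can probe the ports from the architecture diagram to confirm each one is listening. A minimal sketch, assuming requests is installed; the /ok route on the LangGraph server and the /docs route on the FastAPI Gateway are assumptions based on their stock defaults, not DeerFlow-specific guarantees.
# service_check.py — probe the ports from the architecture diagram
# Health routes are assumptions; any HTTP response means the service is listening.
import requests

services = {
    "Nginx entry point": "http://localhost:2026",
    "LangGraph server":  "http://localhost:2024/ok",    # stock LangGraph health route (assumption)
    "Gateway API":       "http://localhost:8001/docs",  # FastAPI default docs page (assumption)
    "Next.js frontend":  "http://localhost:3000",
}

for name, url in services.items():
    try:
        code = requests.get(url, timeout=5).status_code
        print(f"{name:20s} {url:35s} HTTP {code}")
    except requests.RequestException as exc:
        print(f"{name:20s} {url:35s} DOWN ({type(exc).__name__})")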
Configuration Deep Dive
The config.yaml file controls every significant behavior of DeerFlow. Understanding its structure is essential for customization.
# config.yaml — canonical location: project root (deer-flow/config.yaml)
config_version: 3
models:
# Each entry defines an LLM available to the agent
- name: gpt-4o
display_name: GPT-4o
use: langchain_openai:ChatOpenAI
model: gpt-4o
api_key: $OPENAI_API_KEY
max_tokens: 16384
supports_thinking: false
supports_vision: true
# Example: OpenRouter routing to Gemini
- name: gemini-flash
display_name: Gemini 2.5 Flash
use: langchain_openai:ChatOpenAI
model: google/gemini-2.5-flash-preview
base_url: https://openrouter.ai/api/v1
api_key: $OPENROUTER_API_KEY
tools:
# Web search configuration
- name: web_search
group: web
use: deerflow.community.ddg_search:web_search_tool
max_results: 5
# File operations
- name: read_file
group: file:read
use: deerflow.tools.file:read_file_tool
- name: bash
group: bash
use: deerflow.tools.bash:bash_tool
sandbox:
# For development: direct host execution
use: deerflow.sandbox.local:LocalSandboxProvider
allow_host_bash: true
# For production: Docker-isolated execution
# use: deerflow.community.aio_sandbox:AioSandboxProvider
# auto_start: true
# port: 8080
skills:
path: ../skills # Host path to skills directory
container_path: /mnt/skills
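Note that values such as api_key: $OPENAI_API_KEY are references into .env rather than literal strings. A minimal sketch of that kind of substitution, purely illustrative and not DeerFlow's actual loader:
# env_expand.py — illustrative $VAR expansion for config values (not DeerFlow's actual loader)
import os

def expand(value):
    """Recursively replace "$NAME" strings with the matching environment variable."""
    if isinstance(value, str) and value.startswith("$"):
        return os.environ.get(value[1:], "")
    if isinstance(value, dict):
        return {k: expand(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v) for v in value]
    return value

print(expand({"api_key": "$OPENAI_API_KEY"}))  # {'api_key': 'sk-...'}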
The configuration path is resolved in this priority order:
1. DEER_FLOW_CONFIG_PATH environment variable
2. backend/config.yaml
3. deer-flow/config.yaml (project root — the standard location)
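The lookup order above implies a simple first-match resolver; the following is a sketch of that logic, not the actual DeerFlow code:
# resolve_config.py — sketch of the lookup order above (illustrative, not DeerFlow's code)
import os
from pathlib import Path

def resolve_config_path(project_root: Path) -> Path | None:
    candidates = [
        os.environ.get("DEER_FLOW_CONFIG_PATH"),   # 1. explicit override
        project_root / "backend" / "config.yaml",  # 2. backend directory
        project_root / "config.yaml",               # 3. project root (standard location)
    ]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None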
Environment Variables
# .env (git-ignored, do not commit)
OPENAI_API_KEY=sk-...
TAVILY_API_KEY=tvly-... # optional, better search quality
ANTHROPIC_API_KEY=sk-ant-... # if using Claude directly
LANGCHAIN_API_KEY=ls__... # optional, for LangSmith tracing
LANGCHAIN_TRACING_V2=true # enable LangSmith
Your First Research Query
With DeerFlow running at http://localhost:2026, open the chat interface and type:
What are the key architectural differences between LangGraph and AutoGen for building multi-agent systems? Include recent benchmarks and community adoption data.
What you will observe:
- Thinking phase — The lead agent parses the question, identifies sub-topics, and may ask a clarifying question via ask_clarification if the request is ambiguous
- Research phase — Multiple web searches fire in sequence (or in parallel via sub-agents), each fetching and reading full page content
- Synthesis phase — The agent accumulates search results in its context window
- Report phase — A structured markdown report streams to the UI with inline citations in the format [citation:Title](URL)
The entire interaction is a single LangGraph thread, persisted in the checkpointer. You can resume it later or branch into a new query.
sequenceDiagram
participant U as User
participant UI as Next.js :3000
participant LG as LangGraph Server :2024
participant LLM as LLM Provider
participant WS as Web Search Tool
participant SB as Sandbox
U->>UI: Submit research query
UI->>LG: POST /threads/{id}/runs (SSE)
LG->>LLM: Lead agent invoke (system prompt + query)
LLM-->>LG: Tool call: web_search("LangGraph architecture")
LG->>WS: Execute search
WS-->>LG: Search results JSON
LG->>LLM: Continue with results
LLM-->>LG: Tool call: web_search("AutoGen benchmarks 2025")
LG->>WS: Execute search
WS-->>LG: Search results JSON
LG->>LLM: Continue with all results
LLM-->>LG: Final report (streaming tokens)
LG-->>UI: SSE token stream
UI-->>U: Rendered markdown report
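The same flow can be driven without the UI. The sketch below uses the langgraph_sdk client against the LangGraph server on :2024 and assumes the graph is registered under the name lead_agent; treat both the package choice and the graph name as assumptions inferred from this chapter rather than verified DeerFlow details.
# run_query.py — drive a research run without the UI (sketch; langgraph_sdk and the
# "lead_agent" graph name are assumptions inferred from this chapter)
import asyncio
from langgraph_sdk import get_client

async def main():
    client = get_client(url="http://localhost:2024")
    thread = await client.threads.create()
    async for part in client.runs.stream(
        thread["thread_id"],
        "lead_agent",  # graph / assistant name (assumption)
        input={"messages": [{"role": "user", "content": "Compare LangGraph and AutoGen."}]},
        stream_mode="messages",
    ):
        print(part.event, part.data)

asyncio.run(main())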
Understanding Thread State
Every conversation is a thread. DeerFlow extends LangGraph's AgentState with ThreadState:
# backend/packages/harness/deerflow/agents/thread_state.py
class ThreadState(AgentState):
sandbox: SandboxState | None # Docker container ID for this thread
thread_data: ThreadDataState | None # Workspace/uploads/outputs paths
title: str | None # Auto-generated conversation title
artifacts: list[str] # Paths of generated outputs (reports, MP3s, slides)
todos: list | None # Task list (plan mode)
uploaded_files: list[dict] | None # Files attached by the user
viewed_images: dict[str, ViewedImageData] # Images the agent has processed
This state is checkpointed asynchronously after every step, meaning you can pause a long research run and resume it exactly where it left off.
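Because the checkpointer persists this state, you can also inspect a thread from outside the UI. A small sketch using the same langgraph_sdk client as above: the field names come from ThreadState, while the get_state call is standard LangGraph SDK rather than anything DeerFlow-specific.
# inspect_thread.py — read back persisted ThreadState fields for an existing thread (sketch)
import asyncio
from langgraph_sdk import get_client

async def main(thread_id: str):
    client = get_client(url="http://localhost:2024")
    state = await client.threads.get_state(thread_id)
    values = state["values"]  # the checkpointed ThreadState
    print("title:    ", values.get("title"))
    print("artifacts:", values.get("artifacts", []))
    print("todos:    ", values.get("todos"))

asyncio.run(main("your-thread-id"))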
Troubleshooting Common Setup Issues
"config.yaml not found"
The backend looks for config.yaml in the project root (deer-flow/), not in backend/. The most common mistake is placing it in backend/config.yaml. Either move it or set:
export DEER_FLOW_CONFIG_PATH=/absolute/path/to/deer-flow/config.yaml
LLM Model Not Responding
Run make doctor — it tests the model connection. If it fails, verify:
- The API key in .env is correct and not expired
- The base_url in config.yaml matches the provider's endpoint
- The model name matches exactly (case-sensitive)
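A quick way to test the first two items together is to list the provider's models with the same key and base URL; the /models route is standard for OpenAI-compatible APIs, but adjust the base URL to match your config.yaml. A hedged sketch:
# key_check.py — verify the API key and base_url by listing available models
import os
import requests

base_url = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")  # match base_url in config.yaml
resp = requests.get(
    f"{base_url}/models",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    timeout=15,
)
resp.raise_for_status()
# Confirm your config.yaml model name appears here, with exact casing.
print([m["id"] for m in resp.json().get("data", [])])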
Docker Sandbox Startup Timeout
The AioSandbox container image is ~500 MB. Pull it explicitly before the first run:
make setup-sandbox
Port 2026 Already in Use
# Find the process using the port
lsof -i :2026
# Stop DeerFlow services
make docker-stop
# Or kill the specific process and restart
Frontend Shows "Cannot connect to agent"
The frontend at :3000 proxies agent requests through Nginx at :2026 to the LangGraph server at :2024. If the LangGraph server is not running, no agent calls will succeed. Check:
make docker-logs # Docker mode
# or
ps aux | grep langgraph # Local mode
Key Concepts Recap
| Concept | What It Actually Is |
|---|---|
| "Workflow" | A LangGraph thread — a sequence of LLM invocations + tool calls |
| "Task" | A sub-agent invocation spawned via task_tool |
| "Worker" | A Docker sandbox container executing Python/bash code |
| "State" | The LangGraph ThreadState persisted by the checkpointer |
| "Skill" | A Markdown file loaded into the agent's context to guide behavior |
| "Config" | config.yaml at project root — LLM providers, tools, sandbox mode |
What's Next?
With DeerFlow running and your first research query completed, Chapter 2 dives into the LangGraph state machine that powers everything: how the lead_agent graph is compiled, how the 14-stage middleware pipeline wraps every invocation, and how async checkpointing enables long-running multi-turn research sessions.