README.md
May 20, 2026
Desktop AI Agent Framework · Multi-tentacle Collaboration
Like an octopus, handle multiple things at once
Demo Videos
https://github.com/user-attachments/assets/29a64b38-3f98-4cbc-99d4-662f55cbec74
Text-to-Speech Demo
https://github.com/user-attachments/assets/ef0af274-e988-436f-a7da-a007e1a814ee
WeChat Channel Demo
https://github.com/user-attachments/assets/1de4e3d3-3397-46f8-a6b5-8f9dfef2b580
Core Features
| Feature | Summary | Highlights |
|---|---|---|
| One-Click Deploy | No server, no YAML | Double-click to install; Embedded Python env; Portable USB mode; Data stays local |
| Cost Transparency | Know what you spend | Real-time token counter; Visual cost charts; Budget alerts; Model cost compare |
| Markdown Skills | Extend without coding | Write |
| Visual Workflow | Build AI pipelines | Drag-and-drop editor; 24 node types; Version management; Run trace & debug |
| Visual SubAgent | Create AI workers | GUI agent creator; Isolated workspaces; Auto task dispatch; Own config & memory |
| Knowledge Base | Your second brain | Multi-format documents; Markdown notes; Knowledge graph; AI-powered distillation |
| Multi-Channel | Chat everywhere | Desktop / WeChat; Slack / Discord; Telegram / DingTalk; Email / Webhook |
| Smart Tasks | Actually run tasks | SubAgent execution; Cron/interval/once; Survive restarts; Access context |
| Project Isolation | Separate workspaces | Per-project config; Switch instantly; Export for team; Never lose history |
| Text-to-Speech | Voice your AI | Multiple TTS engines; Natural voice output; Customizable settings; Real-time playback |
| Observation & Memory | Learn from experience | 9 observation types; Auto-extract insights; Promote to memory; User profile tracking |
| Multi-format Files | Read any document | PDF / DOCX / XLSX; PPTX preview; Image understanding; Context compression |
Visual Workflow
Build complex AI pipelines with a drag-and-drop editor powered by ReactFlow:
Node Types (24 kinds)
| Category | Nodes |
|---|---|
| Flow | Workflow Start, Answer, Workflow End |
| AI | LLM, Question Classifier, Content Extractor |
| Tool | HTTP Request, Code Execution, Read Files, JSON Serialize/Deserialize, Text Editor |
| Logic | Condition Branch, Variable Update, Loop, Parallel Execution |
| Interaction | User Select, Form Input, Input, Plugin Output |
| Agent | Agent Node, Sub-Workflow |
Key Capabilities
- Visual Editor: Drag-and-drop canvas with auto-layout
- Node Testing: Test individual nodes in isolation before running the full workflow
- Version Management: Save, compare, and restore workflow versions
- Run Tracing: Step-by-step execution trace with variable inspection
- Loop Support: Nested loop nodes with dedicated inner canvas
- Templates: Pre-built templates for common patterns (simple chat, conditional branch)
- Auto-save: 5-second debounce with dirty state indicator
Knowledge Base
A complete knowledge management system with AI-powered capabilities:
Documents
- Multi-format upload: PDF, DOCX, XLSX, PPTX, images, and more
- Chunked upload: Files >2MB automatically split into 2MB chunks (max 500MB)
- AI Distillation: Extract key insights from documents using AI
- Batch operations: Batch distill, move, and manage documents
- Preview: In-app preview for all supported formats
- Import/Export: ZIP-based import/export, Obsidian vault import
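The chunked-upload rule in the list above (2MB chunks, 500MB cap) can be sketched as follows. This is a minimal illustration under stated assumptions, not Octopus's actual upload code: `plan_chunks`, `CHUNK_SIZE`, and `MAX_FILE_SIZE` are hypothetical names, and 1MB is assumed to mean 2**20 bytes.

```python
from typing import List, Tuple

CHUNK_SIZE = 2 * 1024 * 1024       # 2MB per chunk (assumes 1MB = 2**20 bytes)
MAX_FILE_SIZE = 500 * 1024 * 1024  # 500MB upload cap


def plan_chunks(file_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs describing how a file would be split.

    Files at or below CHUNK_SIZE are sent as a single piece; larger files
    are cut into CHUNK_SIZE pieces, with a shorter final piece if needed.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if file_size > MAX_FILE_SIZE:
        raise ValueError("file exceeds the 500MB limit")
    if file_size <= CHUNK_SIZE:
        return [(0, file_size)]
    return [
        (offset, min(CHUNK_SIZE, file_size - offset))
        for offset in range(0, file_size, CHUNK_SIZE)
    ]
```

A 5MB file would yield three pieces under these assumptions: two full 2MB chunks and a final 1MB chunk.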
Notes
- Markdown editor: Full-featured editor with wiki-link navigation
- Vault system: Create and manage multiple knowledge vaults
- Obsidian compatible: Import existing Obsidian vaults
Knowledge Graph
- Visual exploration: WebGL-powered graph visualization (PixiJS)
- Force-directed layout: Interactive node positioning
- Relationship mapping: Discover connections between knowledge nodes
Multi-Channel Support
Connect Octopus to your favorite platforms:
| Channel | Features |
|---|---|
| Desktop | Full-featured native app with WebSocket real-time |
| WeChat | QR code login, send/receive, auto-reply |
| Slack | Bolt SDK integration, channel & DM support |
| Discord | Bot integration, server & channel messaging |
| Telegram | Bot API, chat & group support |
| DingTalk | Stream protocol, conversation messaging |
| Email | SMTP/IMAP integration |
| Webhook | Generic HTTP webhook for custom integrations |
| Feishu | Lark SDK, event subscription |
Extension Ecosystem
Skill Extensions (Just Markdown)
Write a SKILL.md file to teach AI new capabilities:
---
name: "Code Review"
emoji: "๐"
---
When reviewing code, check for:
1. Security issues (SQL injection, XSS)
2. Performance bottlenecks
3. Naming conventions
Save it as workspace/extensions/my-skill/SKILL.md and restart the app to activate it.
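To make the SKILL.md layout above concrete, here is a rough sketch of how a loader might separate the front matter from the instruction body. It is an assumption-laden illustration, not the real extension loader: `parse_skill` is a hypothetical helper, and it only understands flat `key: "value"` lines, whereas the actual loader may use a full YAML parser.

```python
from typing import Dict, Tuple


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md document into (front matter, instruction body).

    Only flat `key: "value"` pairs are handled; this is not a YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()  # no front matter: the whole file is the body

    meta: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter ends the front matter
            body = "\n".join(lines[index + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    raise ValueError("front matter is missing its closing '---' line")


SKILL = '''---
name: "Code Review"
---
When reviewing code, check for:
1. Security issues (SQL injection, XSS)
'''

meta, body = parse_skill(SKILL)
print(meta["name"])          # Code Review
print(body.splitlines()[0])  # When reviewing code, check for:
```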
Extension Marketplace
- Browse and install community extensions
- Three extension types: Skill, Plugin, Worker
- Search, filter by type, sort by popularity
- One-click install with environment variable configuration
MCP Protocol Support
- Connect to any MCP server (stdio / HTTP SSE)
- Auto-discover tools, no manual configuration needed
- Visual permission management with enable/disable per tool
- Real-time connection status monitoring
Built-in Tools
| Category | Tools | Description |
|---|---|---|
| Filesystem | read, write, edit, list | File read/write operations |
| System | shell, spawn | Command execution |
| Network | web_fetch | Web content fetching |
| Browser | browser_navigate, browser_click, browser_screenshot, ... | Playwright browser automation |
| Image | image_understand, image_generate | AI image processing |
| Schedule | cron_add, cron_list, cron_remove | Task scheduling |
| Message | send_message | Multi-channel messaging |
| Memory | memory_read, memory_write | Agent memory operations |
| Knowledge | knowledge_search, knowledge_query | Knowledge base retrieval |
| Action | action | Execute extension actions |
Observation & Memory
Octopus automatically extracts insights from conversations:
Observation Types
| Type | Description |
|---|---|
| Gotcha | Key findings and aha moments |
| Problem-Solution | Problem-solution pairs |
| How-it-works | How something works explanations |
| What-changed | Change records |
| Discovery | New discoveries |
| Why-it-exists | Rationale and reasons |
| Decision | Design decisions |
| Trade-off | Trade-off analysis |
| General | General observations |
Memory Features
- Auto-extraction: AI identifies and extracts observations from conversations
- Promote to memory: Elevate important observations to long-term memory
- User profiles: Track user preferences and patterns
- Contextual depth: View observations with surrounding conversation context
Visual Configuration
Every setting has a graphical interface; no YAML is required:
| Config Item | Description |
|---|---|
| Model Providers | Add OpenAI/Anthropic/DeepSeek, support multi-provider switching |
| Agent Settings | Model, max tokens, temperature, max iterations, compression |
| Channel Config | WeChat QR login, Telegram bot, Slack app, DingTalk, and more |
| Tool Toggles | Enable/disable tools with one click, set timeout |
| Workspace | Isolated workspaces with separate config and memory |
| Budget Limit | Set monthly token limit with over-budget alerts |
| Multimodal | Image understanding, TTS, and other multimodal settings |
Token Usage Visualization
Monitor the cost of every conversation in real-time:
- Real-time Stats: Input/output tokens, cache hits, completion tokens, sub-agent usage
- Historical Trends: View consumption by day (7/14/30 days)
- Breakdown Tables: Per-provider and per-model cost analysis
- Budget Alerts: Set limits with automatic warnings
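A budget alert of the kind listed above could work roughly like this. The sketch assumes a monthly token limit (as in the Budget Limit setting) and an early-warning threshold; `TokenBudget`, `warn_ratio`, and the three status strings are hypothetical and not taken from the Octopus codebase.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    monthly_limit: int       # token limit configured by the user
    warn_ratio: float = 0.8  # hypothetical early-warning threshold
    used: int = 0            # tokens consumed so far this month

    def record(self, input_tokens: int, output_tokens: int) -> str:
        """Add one request's usage and return 'ok', 'warning', or 'over_budget'."""
        self.used += input_tokens + output_tokens
        if self.used > self.monthly_limit:
            return "over_budget"
        if self.used >= self.monthly_limit * self.warn_ratio:
            return "warning"
        return "ok"
```

Calling `record` after every model response keeps the running total and the alert state in one place, so the UI only needs to react to the returned status.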
Smart Scheduled Tasks
Not just notifications, but actual work:
- SubAgent Execution: Tasks run in isolated agents, performing real operations
- Flexible Scheduling: Supports ISO 8601 timestamps, intervals in seconds, and cron expressions
- Context Inheritance: Tasks can access session memory from creation time
- Persistent Storage: Tasks saved in SQLite, survive restarts
- Channel Delivery: Send task results to specific channels
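The three scheduling styles in the list above (a one-off ISO time, an interval in seconds, a cron expression) could be told apart with a small dispatcher like the one below. This is only a sketch: `classify_schedule` is a hypothetical helper, it assumes five-field cron expressions, and it does not validate the cron fields themselves.

```python
from datetime import datetime
from typing import Tuple, Union

ScheduleValue = Union[datetime, int, str]


def classify_schedule(spec: Union[int, str]) -> Tuple[str, ScheduleValue]:
    """Map a schedule spec to ('once' | 'interval' | 'cron', parsed value)."""
    if isinstance(spec, int):
        if spec <= 0:
            raise ValueError("interval must be a positive number of seconds")
        return "interval", spec
    text = spec.strip()
    if len(text.split()) == 5:  # assumed: standard five-field cron expression
        return "cron", text
    try:
        return "once", datetime.fromisoformat(text)
    except ValueError:
        raise ValueError(f"unrecognized schedule spec: {spec!r}") from None
```

For example, `classify_schedule(300)` is treated as a 300-second interval, `"0 9 * * 1-5"` as a cron expression, and `"2026-06-01T09:00:00"` as a one-off run.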
Workspace Management
Each project has its own isolated workspace:
workspace/
├── project-a/          # Project A
│   ├── extensions/     # Exclusive extensions
│   ├── memory/         # Long-term memory
│   └── history/        # Chat history
├── project-b/          # Project B
│   └── ...
- Switch workspace = switch complete config and memory
- Export/import workspaces supported
- Team sharing: export workspace, colleagues import to use
- Built-in file browser with Monaco Editor
- Multi-format preview: PDF, DOCX, XLSX, PPTX, images, Markdown
Chat History
- All conversations are saved in a local SQLite database
- 3-level organization: Channel → Session → Instance
- Filter by message type, search across history
- Return to any historical session anytime
- Supports multiple parallel sessions
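One way to picture the three-level organization and history search described above is the SQLite sketch below. The schema is entirely hypothetical (table names, columns, and the `msg_type` field are assumptions made for illustration); the real database layout lives in the backend's migrations and may differ.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: one table per level, plus the messages themselves.
SCHEMA = """
CREATE TABLE IF NOT EXISTS channel (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE IF NOT EXISTS session (
    id INTEGER PRIMARY KEY, title TEXT,
    channel_id INTEGER NOT NULL REFERENCES channel(id));
CREATE TABLE IF NOT EXISTS instance (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES session(id));
CREATE TABLE IF NOT EXISTS message (
    id INTEGER PRIMARY KEY, msg_type TEXT NOT NULL, content TEXT NOT NULL,
    instance_id INTEGER NOT NULL REFERENCES instance(id));
"""


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search_history(
    conn: sqlite3.Connection, term: str, msg_type: Optional[str] = None
) -> List[Tuple[str, str, str, str]]:
    """Search message text across all channels, optionally by message type."""
    sql = (
        "SELECT c.name, s.title, m.msg_type, m.content "
        "FROM message m "
        "JOIN instance i ON i.id = m.instance_id "
        "JOIN session s ON s.id = i.session_id "
        "JOIN channel c ON c.id = s.channel_id "
        "WHERE m.content LIKE ?"
    )
    params = [f"%{term}%"]
    if msg_type is not None:
        sql += " AND m.msg_type = ?"
        params.append(msg_type)
    return conn.execute(sql + " ORDER BY m.id", params).fetchall()
```

Each search result carries its channel and session along with the message, which is what a history browser would need to jump back to the right conversation.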
Visual SubAgent
Create and manage specialized agents through the UI:
- Visual Editing: Modify SOUL.md to configure role, tools, model
- One-click Creation: Fill in name to auto-generate template config
- Isolated Workspace: Each SubAgent has its own config and memory
- Master-Slave Dispatch: Main agent automatically calls appropriate SubAgent
- Tool & Extension Binding: Assign specific tools and extensions per agent
Quick Start
Requirements
- Node.js >= 18
- Python >= 3.10
Install & Run
# 1. Clone repository
git clone <repository-url>
cd octopus
# 2. Install dependencies
npm install
# 3. Start development mode
npm run dev
npm run dev starts all of the following:
- Frontend dev server (http://localhost:3000)
- Electron desktop window
- Python backend (auto-started by Electron)
Build & Release
Development Commands
| Command | Description |
|---|---|
| npm run dev | Dev mode (frontend + Electron) |
| npm run dev:frontend | Frontend dev server only |
| npm run dev:electron | Electron only |
Build Commands
| Command | Description |
|---|---|
| npm run build:frontend | Build React frontend |
| npm run build:python | Package Python backend |
| npm run build | Full build (frontend + Electron) |
Package & Release
| Command | Description | Output |
|---|---|---|
| npm run dist | Package current platform | Auto-select by platform |
| npm run dist:mac | macOS package | DMG + ZIP (universal: x64/arm64) |
| npm run dist:win | Windows package | NSIS installer + portable |
Output: dist-electron/
Detailed guide: README_BUILD.md
Project Architecture
octopus/
├── agents/                       AI Agent workspace
│   ├── code-reviewer/            Code review agent
│   ├── common/                   Common agent templates
│   ├── system/                   System agent config
│   └── avatars/                  Agent avatar assets
├── backend/                      Python backend
│   ├── agent/                    Agent core logic
│   │   ├── processors/           Streaming / non-streaming / longtask processors
│   │   ├── compressor.py         Context compression
│   │   ├── subagent.py           SubAgent dispatch
│   │   └── observation_*.py      Observation extraction & management
│   ├── api/                      FastAPI service interface
│   ├── channels/                 Multi-channel support
│   │   ├── desktop/              Desktop channel (WebSocket)
│   │   ├── wechat/               WeChat channel
│   │   ├── feishu/               Feishu/Lark channel
│   │   ├── dingtalk/             DingTalk channel
│   │   ├── slack/                Slack channel
│   │   ├── discord/              Discord channel
│   │   ├── telegram/             Telegram channel
│   │   ├── email/                Email channel
│   │   └── webhook/              Webhook channel
│   ├── core/                     Core modules
│   │   ├── config/               Configuration & schema
│   │   ├── events/               Event bus system
│   │   ├── longtask/             Long-running task management
│   │   ├── models/               Data models
│   │   └── providers/            LLM provider adapters (OpenAI/Anthropic)
│   ├── data/                     Data storage (SQLite)
│   │   ├── migrations/           Database migrations (11 migrations)
│   │   └── schema/               Data schemas (agent/session/token/workflow/...)
│   ├── extensions/               Plugin system
│   │   ├── builtin/              Built-in extensions (cron, etc.)
│   │   └── loader.py             Dynamic extension loader
│   ├── mcp/                      MCP protocol integration
│   │   ├── server/               MCP server connection & tool registry
│   │   └── llm_bridge.py         LLM-MCP bridge
│   ├── services/                 Service layer
│   │   ├── cron/                 Scheduled task service
│   │   ├── tts/                  Text-to-speech (OpenAI/MiMo engines)
│   │   ├── workflow/             Workflow engine & executor
│   │   ├── knowledge_*.py        Knowledge base services
│   │   ├── image_service.py      Image generation service
│   │   └── llm_service.py        LLM invocation service
│   ├── tools/                    Built-in tools
│   │   ├── filesystem.py         Filesystem tools
│   │   ├── shell.py              Shell tools
│   │   ├── web_fetch.py          Web fetch tools
│   │   ├── browser/              Playwright browser automation
│   │   ├── image.py              Image processing tools
│   │   ├── cron.py               Cron task tools
│   │   ├── message.py            Message tools
│   │   ├── memory.py             Memory read tools
│   │   ├── memory_write.py       Memory write tools
│   │   ├── knowledge.py          Knowledge base tools
│   │   ├── action.py             Extension action tools
│   │   └── spawn.py              Process spawn tools
│   └── utils/                    Utility functions
├── electron/                     Electron main process
│   ├── main.js                   Main entry (Python lifecycle, window management)
│   └── preload.js                Preload script (IPC bridge)
├── frontend/                     React frontend
│   ├── src/
│   │   ├── pages/                Page components
│   │   │   ├── Chat/             Chat interface with streaming & tool display
│   │   │   ├── Config/           Settings (providers/agent/channels/multimodal)
│   │   │   ├── Workflow/         Visual workflow editor (ReactFlow)
│   │   │   ├── Knowledge/        Knowledge base (documents/notes/graph)
│   │   │   ├── Agents/           SubAgent management
│   │   │   ├── MCP/              MCP server & tool management
│   │   │   ├── Extensions/       Extension marketplace
│   │   │   ├── Cron/             Scheduled tasks
│   │   │   ├── Tokens/           Token usage dashboard
│   │   │   ├── History/          Chat history browser
│   │   │   ├── Memory/           Observation & memory viewer
│   │   │   └── Workspace/        File browser & editor
│   │   ├── components/           Shared components
│   │   │   ├── MessageList/      Message rendering with iteration folds
│   │   │   ├── TTSPlayer/        Audio playback
│   │   │   ├── TaskIndicator/    Task status indicator
│   │   │   └── MermaidDiagram/   Mermaid chart rendering
│   │   ├── workflow/             Workflow engine
│   │   │   ├── components/       Node components (13 registered types)
│   │   │   ├── hooks/            Zustand workflow store
│   │   │   ├── types/            Type definitions
│   │   │   └── templates/        Workflow templates
│   │   ├── contexts/             React contexts (WebSocket, DistillTask)
│   │   ├── hooks/                Custom hooks (useChatState, useMermaid)
│   │   └── utils/                Utilities
│   └── package.json
├── build/                        Build resources (icons, etc.)
├── workspace/                    Workspace data (runtime-generated, git-ignored)
├── build_python.py               Python packaging script (PyInstaller)
├── package.json                  Project config & scripts
└── README.md                     Project documentation
Tech Stack
| Layer | Technology | Description |
|---|---|---|
| Frontend | React 18 + Vite 5 | Modern UI framework |
| | Ant Design 6 | Component library |
| | ReactFlow | Visual workflow editor |
| | Monaco Editor | Code editor |
| | ECharts 6 | Data visualization |
| | PixiJS | Knowledge graph WebGL rendering |
| | Zustand | Workflow state management |
| Backend | Python 3.10+ + FastAPI | High-performance async web service |
| | SQLite + SQLAlchemy | Local lightweight database |
| | Playwright | Browser automation |
| | APScheduler | Task scheduling |
| Desktop | Electron 28 | Cross-platform desktop framework |
| | electron-builder | App packaging tool |
Runtime Architecture
┌─────────────────────────────────────────────┐
│            Electron Main Process            │
│  ┌──────────────────┐  ┌─────────────────┐  │
│  │  BrowserWindow   │  │ Python Process  │  │
│  │   (React SPA)    │  │ (octopus-server)│  │
│  │                  │  │                 │  │
│  │  electronAPI ────┼──┼──▶ FastAPI      │  │
│  │ (preload bridge) │  │   (WebSocket)   │  │
│  └──────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────┘
- Communication: Full WebSocket between frontend and backend
- Request-Response: request_id based correlation with timeout
- Event Subscription: Pub/Sub pattern for real-time events
- Python Lifecycle: Managed by Electron (auto-start/stop)
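The request_id correlation with timeout mentioned above is a common pattern for request-response traffic over a single WebSocket. The sketch below shows one possible shape of it using asyncio futures; `RequestCorrelator`, the `req-N` id format, and the `data` field are assumptions for illustration, not the actual Octopus wire protocol.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Dict


class RequestCorrelator:
    """Match responses to requests by request_id over a shared connection."""

    def __init__(self, send: Callable[[Dict[str, Any]], Awaitable[None]]) -> None:
        self._send = send  # coroutine function that writes one message
        self._ids = itertools.count(1)
        self._pending: Dict[str, asyncio.Future] = {}

    async def request(self, payload: Dict[str, Any], timeout: float = 5.0) -> Any:
        """Send a payload and wait for the reply carrying the same request_id."""
        request_id = f"req-{next(self._ids)}"
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await self._send({"request_id": request_id, **payload})
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)  # also clears timed-out entries

    def on_message(self, message: Dict[str, Any]) -> None:
        """Call this for every incoming message; unknown ids are ignored."""
        future = self._pending.get(message.get("request_id"))
        if future is not None and not future.done():
            future.set_result(message.get("data"))


async def demo() -> str:
    async def fake_send(message: Dict[str, Any]) -> None:
        # Stand-in for a WebSocket: echo the action back on the next loop tick.
        asyncio.get_running_loop().call_soon(
            correlator.on_message,
            {"request_id": message["request_id"], "data": message["action"]},
        )

    correlator = RequestCorrelator(fake_send)
    return await correlator.request({"action": "ping"})


print(asyncio.run(demo()))  # ping
```

If no reply arrives within the timeout, `asyncio.wait_for` raises a timeout error and the pending entry is removed, so late responses are silently dropped rather than leaking memory.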
Model Configuration
Add API keys in the app settings panel:
Supported Providers
| Provider | Representative Models |
|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, o1 |
| Anthropic | Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku |
| Google | Gemini Pro, Gemini Ultra |
| DeepSeek | DeepSeek Chat, DeepSeek Coder |
| Alibaba | Tongyi Qianwen series |
| Baidu | Wenxin Yiyan series |
| Custom | Any OpenAI-compatible API endpoint |
Configuration Steps
- Open app → Settings → Model Providers
- Add provider (select or custom)
- Enter API Key & Base URL
- Select model to use
- Save and start
MCP Protocol
Octopus fully supports the Model Context Protocol (MCP):
- Connect to any MCP server
- Use tools provided by MCP servers
- Secure permission management
- Real-time connection monitoring
- Visual server management (add/edit/delete/reconnect)
- Auto-discovered tools with per-tool enable/disable
Supported Transports
- stdio: Local process communication
- HTTP SSE: Server-Sent Events over HTTP
Agent Workspace
The agent system supports persistent memory and personalization:
Configuration Files
| File | Purpose |
|---|---|
| SOUL.md | Agent soul - core principles and personality |
| IDENTITY.md | Agent identity - self-introduction |
| AGENTS.md | Workspace guide - usage instructions |
| MEMORY.md | Long-term memory - important info persistence |
| memory/YYYY-MM-DD.md | Daily notes - daily event records |
Creating Custom Agents
To create a custom agent, create a new folder in the agents/ directory and add the configuration files to it.
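As a rough illustration of the folder layout described above, the following sketch scaffolds a new agent directory with the documented file names. `scaffold_agent` is a hypothetical helper and the placeholder file contents are invented; in practice the app's one-click creation generates its own template config.

```python
from datetime import date
from pathlib import Path

# File names come from the table above; the placeholder contents are made up.
AGENT_FILES = {
    "SOUL.md": "# Soul\n\nCore principles and personality.\n",
    "IDENTITY.md": "# Identity\n\nSelf-introduction.\n",
    "AGENTS.md": "# Workspace guide\n\nUsage instructions.\n",
    "MEMORY.md": "# Long-term memory\n",
}


def scaffold_agent(agents_dir: Path, name: str) -> Path:
    """Create agents_dir/name with starter config files; never overwrite."""
    agent_dir = agents_dir / name
    (agent_dir / "memory").mkdir(parents=True, exist_ok=True)
    for filename, content in AGENT_FILES.items():
        path = agent_dir / filename
        if not path.exists():  # keep any file the user has already edited
            path.write_text(content, encoding="utf-8")
    # Daily notes follow the memory/YYYY-MM-DD.md naming from the table.
    daily_note = agent_dir / "memory" / f"{date.today().isoformat()}.md"
    daily_note.touch(exist_ok=True)
    return agent_dir
```

Calling `scaffold_agent(Path("agents"), "my-agent")` would create `agents/my-agent/` with the four config files and today's daily note, leaving existing files untouched on repeat runs.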
Project Structure
octopus/
├── backend/            # Python backend (FastAPI)
├── frontend/           # React frontend (Vite)
├── electron/           # Electron main process
├── build/              # Build resources (icons, etc.)
├── build_python.py     # Python packaging script
├── workspace/          # Runtime-generated directory (git-ignored)
│   ├── agents/         # - User-created agent configurations
│   ├── extensions/     # - Installed extensions
│   ├── files/          # - Workspace files
│   ├── images/         # - Generated images
│   └── ...             # - Other runtime data
└── scripts/            # Helper scripts
Note: The workspace/ directory is created at runtime and contains user data, agent configs, and generated files. It's excluded from version control by .gitignore.
Documentation
- Build Guide - Packaging & release details
- Agent Guide - Agent workspace usage
- Identity - Learn who Octopus is
- Soul Core - Agent core principles
- MCP Docs - MCP protocol integration
- Browser Tools - Browser automation guide
Contributing
Issues and Pull Requests welcome:
- Bug reports
- New features
- Documentation improvements
- UI/UX optimizations
Changelog
2026-05
| Date | Version | Changes |
|---|---|---|
| 2026-05-17 | v1.0.0 | New: Visual Workflow editor with 24 node types |
| 2026-05-17 | v1.0.0 | New: Knowledge Base with documents, notes, graph |
| 2026-05-17 | v1.0.0 | New: Multi-channel support (Slack/Discord/Telegram/...) |
| 2026-05-17 | v1.0.0 | New: Playwright browser automation tools |
| 2026-05-17 | v1.0.0 | New: Observation & memory system |
2026-03
| Date | Version | Changes |
|---|---|---|
| 2026-03-29 | v1.0.0 | New: Text-to-Speech (TTS) feature support |
| 2026-03-29 | v1.0.0 | New: SubAgent management and UI improvements |
| 2026-03-28 | v1.0.0 | New: Context compression and LLM retry optimization |
| 2026-03-25 | v1.0.0 | New: PDF, DOCX, and Excel file support |
| 2026-03-24 | v1.0.0 | New: WeChat channel with QR login and messaging |
| 2026-03-22 | v1.0.0 | New: Frameless window support |
| 2026-03-20 | v1.0.0 | Release: Project renamed to Octopus |