py-xiaozhi

May 25, 2026




About

py-xiaozhi is a lightweight, cross-platform, multi-modal AI interaction framework built on Python's async architecture. It supports real-time voice streaming, vision-language tasks, and IoT device control, and it runs on Windows, macOS, and Linux desktops as well as ARM embedded platforms (Raspberry Pi, Horizon Robotics RDK, Jetson Nano). Out of the box, it bridges the gap between large language models and physical hardware.

It evolved from the xiaozhi-esp32 firmware project and has been officially adopted by D-Robotics (xiaozhi-in-rdk) as an upstream dependency.

  • xiaozhi-desktop — Electron desktop client with AEC echo cancellation, Live2D, floating window modes, and Windows / macOS installers

Demo


Key Features

  • Real-time Voice AI — Opus codec with auto frame detection (RFC 6716 TOC parsing), async streaming, sub-20ms latency
  • Multi-modal Vision — Camera capture + vision-language model integration for image understanding and scene perception
  • MCP Tool Ecosystem — Modular JSON-RPC 2.0 tool server: music player, camera, screenshot, app management, weather, volume control
  • Cross-platform Deployment — Windows 10+ / macOS 10.15+ / Linux (x86_64 & ARM), optimized for Raspberry Pi and edge boards
  • Multiple UI Modes — PySide6 + QML GUI / CLI / GPIO, adapting to desktop, headless server, and embedded environments
  • Offline Wake Word — Sherpa-ONNX based on-device keyword spotting with custom wake word support
  • IoT & Embodied AI Ready — GPIO interface for robotics control, hardware actuation, and sensor integration
  • WebSocket / MQTT — Dual protocol communication with WSS/TLS encrypted transmission and auto-reconnection
  • Plugin Architecture — Event-driven async design, clean dependency injection, extensible plugin system
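The "auto frame detection" in the voice feature relies on the table-of-contents (TOC) byte that RFC 6716 places at the start of every Opus packet. The following is a minimal, standalone sketch of TOC parsing as the RFC defines it (Section 3.1). It is not py-xiaozhi's own decoder; the names `OpusToc`, `parse_toc`, and `frame_duration_ms` are hypothetical.

```python
from dataclasses import dataclass

# Frame durations (ms) per TOC config value, RFC 6716 Section 3.1, Table 2.
_SILK_MS = (10.0, 20.0, 40.0, 60.0)  # configs 0-11: SILK-only (NB, MB, WB)
_HYBRID_MS = (10.0, 20.0)            # configs 12-15: Hybrid (SWB, FB)
_CELT_MS = (2.5, 5.0, 10.0, 20.0)    # configs 16-31: CELT-only (NB, WB, SWB, FB)


@dataclass(frozen=True)
class OpusToc:
    config: int        # 5-bit configuration number (mode, bandwidth, frame size)
    stereo: bool       # the 's' bit
    frame_ms: float    # duration of each frame in the packet
    frame_count: int   # number of frames, derived from the 2-bit code 'c'

    @property
    def duration_ms(self) -> float:
        return self.frame_ms * self.frame_count


def frame_duration_ms(config: int) -> float:
    if config < 12:
        return _SILK_MS[config % 4]
    if config < 16:
        return _HYBRID_MS[config % 2]
    return _CELT_MS[config % 4]


def parse_toc(packet: bytes) -> OpusToc:
    """Decode the table-of-contents byte at the start of an Opus packet."""
    if not packet:
        raise ValueError("empty Opus packet")
    toc = packet[0]
    config = toc >> 3                # top 5 bits
    stereo = bool((toc >> 2) & 0x1)  # next bit
    code = toc & 0x3                 # bottom 2 bits: frame-count code
    if code == 0:
        count = 1                    # one frame
    elif code in (1, 2):
        count = 2                    # two frames (equal or different sizes)
    else:
        # Code 3: arbitrary frame count, stored in the low 6 bits of byte 1.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F
    return OpusToc(config, stereo, frame_duration_ms(config), count)
```

For example, `parse_toc(b"\xf8")` reports config 31 (CELT fullband), mono, one 20 ms frame. Reading the frame size from the packet itself means the receiver does not have to assume a fixed frame duration agreed out of band.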

System Requirements

Basic Requirements

  • Python Version: 3.10 - 3.12
  • Operating System: Windows 10+, macOS 10.15+, Linux
  • Audio Devices: Microphone and speaker that support a 16 kHz sampling rate
  • Network Connection: Stable internet connection (for AI services and online features)
  • Memory: At least 4GB RAM (8GB+ recommended)
  • Processor: Modern CPU with AVX instruction set support
  • Storage: At least 2GB available disk space (for model files and cache)
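Two of these requirements can be checked or computed before launching. The sketch below, with hypothetical helper names, tests the interpreter against the supported 3.10 to 3.12 range and works out the size of one 16 kHz PCM frame. The 20 ms frame length and 16-bit mono samples are assumptions chosen for illustration, not values taken from py-xiaozhi's configuration.

```python
import sys

# Supported interpreter range from the requirements list (inclusive).
MIN_PY = (3, 10)
MAX_PY = (3, 12)


def python_supported(version=None) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


def pcm_frame_size(sample_rate_hz=16_000, frame_ms=20, channels=1, sample_width_bytes=2):
    """Return (samples, bytes) for one PCM frame of the given duration.

    frame_ms=20 and 16-bit mono are illustrative assumptions.
    """
    samples = sample_rate_hz * frame_ms // 1000
    return samples, samples * channels * sample_width_bytes


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("16 kHz / 20 ms frame (samples, bytes):", pcm_frame_size())
```

At 16 kHz, a 20 ms frame is 320 samples, or 640 bytes of 16-bit mono audio.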

Optional Feature Requirements

  • Voice Wake-up: Requires downloading Sherpa-ONNX speech recognition models
  • Camera Features: Requires camera device and OpenCV support
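Whether the optional packages are installed can be detected without importing them. The sketch below assumes the standard import names (`sherpa_onnx` for Sherpa-ONNX, `cv2` for OpenCV). The feature keys are hypothetical, and the check does not verify that the wake word model files themselves have been downloaded.

```python
import importlib.util

# Feature name -> import name of the package it needs.
OPTIONAL_FEATURES = {
    "voice_wakeup": "sherpa_onnx",  # Sherpa-ONNX Python bindings
    "camera": "cv2",                # OpenCV (opencv-python)
}


def available_features() -> dict[str, bool]:
    """Report which optional features have their package installed."""
    return {
        feature: importlib.util.find_spec(module) is not None
        for feature, module in OPTIONAL_FEATURES.items()
    }


if __name__ == "__main__":
    for feature, ok in available_features().items():
        print(f"{feature}: {'available' if ok else 'missing'}")
```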

Read This First

  • Read the project documentation carefully for startup tutorials and descriptions of each file
  • The main branch has the latest code. After each update, reinstall the dependencies (uv sync, or pip install -e . for pip users) so that any newly added dependencies are installed

Zero to Xiaozhi Client (Video Tutorial)

Technical Architecture

Core Architecture Design

  • Event-Driven Architecture: Built on the asyncio event loop for high-concurrency processing
  • Layered Design: Clear separation of application layer, protocol layer, and UI layer
  • Dependency Injection: Component lifecycle managed via bootstrap container
  • Plugin System: Audio, UI, MCP tools and other components loaded via plugin system
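The event-driven design can be pictured as a publish/subscribe bus that plugins register handlers on. The `EventBus` class below is a hypothetical minimal sketch, not py-xiaozhi's event bus in src/core. It runs handlers concurrently and logs a failing handler instead of letting it cancel the others.

```python
import asyncio
import logging
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]
log = logging.getLogger("event_bus")


class EventBus:
    """Minimal async publish/subscribe bus (hypothetical sketch)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    async def publish(self, event: str, payload: Any = None) -> None:
        # Snapshot the handler list so subscriptions made during dispatch
        # do not affect this publish call.
        handlers = list(self._handlers.get(event, []))
        # return_exceptions=True keeps one failing handler from
        # cancelling the others.
        results = await asyncio.gather(
            *(h(payload) for h in handlers), return_exceptions=True
        )
        for handler, result in zip(handlers, results):
            if isinstance(result, Exception):
                log.error("handler %r failed on %r: %s", handler, event, result)
```

A UI plugin might, for instance, subscribe to a state-change event, while the audio plugin publishes that event without knowing who listens. This keeps the layers decoupled.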

Key Technical Components

  • Audio Processing: Opus codec, real-time resampling
  • Speech Recognition: Sherpa-ONNX offline models, wake word recognition
  • Protocol Communication: WebSocket/MQTT dual protocol support, encrypted transmission, auto-reconnection
  • Configuration System: Hierarchical configuration, dot notation access, dynamic updates
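Hierarchical configuration with dot-notation access means a nested setting is addressed by a path such as `network.protocol`. The `DotConfig` class below is a hypothetical stand-in that illustrates the idea. It is not the actual ConfigManager API, and the keys shown are invented for the example.

```python
from __future__ import annotations

import copy
from typing import Any


class DotConfig:
    """Nested-dict config with 'a.b.c' access (hypothetical sketch)."""

    def __init__(self, data: dict | None = None) -> None:
        self._data: dict = copy.deepcopy(data) if data else {}

    def get(self, path: str, default: Any = None) -> Any:
        node: Any = self._data
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    def set(self, path: str, value: Any) -> None:
        *parents, leaf = path.split(".")
        node = self._data
        for key in parents:
            # Create missing sections on demand (dynamic updates).
            child = node.setdefault(key, {})
            if not isinstance(child, dict):
                raise TypeError(f"{key!r} is a value, not a section")
            node = child
        node[leaf] = value


cfg = DotConfig({"network": {"protocol": "websocket"}})
cfg.set("audio.sample_rate", 16000)
```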

Performance Optimization

  • Async First: Full system asynchronous architecture, avoiding blocking operations
  • Memory Management: Smart caching, garbage collection
  • Audio Optimization: 5ms low-latency processing, queue management, streaming transmission
  • Concurrency Control: Task pool management, semaphore control, thread safety
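Semaphore-based concurrency control caps how many tasks run at once while keeping the code fully async. The helper below is a generic asyncio sketch with a hypothetical name (`run_bounded`), not py-xiaozhi's task manager.

```python
import asyncio
from typing import Awaitable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(coros: Iterable[Awaitable[T]], limit: int = 4) -> list[T]:
    """Await all coros with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def guarded(aw: Awaitable[T]) -> T:
        # A coroutine does not start running until it is awaited,
        # so acquiring the semaphore here bounds real concurrency.
        async with sem:
            return await aw

    return await asyncio.gather(*(guarded(c) for c in coros))
```

Bounding concurrency this way keeps bursts of tool calls or network requests from starving the audio pipeline running on the same event loop.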

Security Mechanisms

  • Encrypted Communication: WSS/TLS encryption, certificate verification
  • Device Authentication: Dual protocol activation, device fingerprint recognition
  • Access Control: Tool permission management, API access control
  • Error Isolation: Exception isolation, fault recovery, graceful degradation
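Fault recovery for the connection layer is commonly implemented as retry with exponential backoff. The README does not describe py-xiaozhi's own reconnection policy, so the sketch below swaps in a generic technique (exponential backoff with full jitter). The function name and default parameters are assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    connect: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Call `connect` until it succeeds, sleeping with exponential backoff + full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except (OSError, asyncio.TimeoutError):
            # ConnectionError is a subclass of OSError, so refused or
            # reset connections are retried too.
            if attempt == max_attempts:
                raise  # give up and let the caller degrade gracefully
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The random jitter spreads out reconnect attempts, so many clients that drop at the same moment do not all hit the server again in lockstep.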

Development Guide

Project Structure

py-xiaozhi/
├── main.py                     # Application entry point
├── src/
│   ├── activation/             # Device activation
│   ├── audio_codecs/           # Audio codecs
│   ├── audio_processing/       # Wake word detection
│   ├── bootstrap/              # Application bootstrap & dependency injection
│   ├── constants/              # Constants
│   ├── core/                   # Core infrastructure (event bus, state management, task management, etc.)
│   ├── logging/                # Logging subsystem
│   ├── mcp/                    # MCP tool system
│   │   ├── mcp_server.py       # MCP server
│   │   └── tools/              # Tool modules (music/camera/screenshot/app/weather/volume)
│   ├── plugins/                # Plugin system (audio, UI, MCP, wake word, shortcuts)
│   ├── protocols/              # Communication protocols (WebSocket/MQTT)
│   ├── ui/                     # User interface
│   │   ├── gui/                # PySide6 + QML graphical interface
│   │   ├── cli/                # Command line interface
│   │   └── gpio/               # GPIO embedded interface
│   └── utils/                  # Utility functions
├── libs/                       # Third-party native libraries
│   ├── libopus/                # Opus audio codec library
│   └── webrtc_apm/             # WebRTC audio processing module
├── models/                     # Wake word models
├── assets/                     # Static resources
├── scripts/                    # Auxiliary scripts
├── documents/                  # VitePress documentation site
├── pyproject.toml              # Project configuration
└── build.json                  # Build configuration

Development Environment Setup

# Clone project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi

# Base install (CLI / GPIO mode)
uv sync                                    # Recommended (uv users)
# or: pip install -e .                    # pip users

# GUI mode (extra: PySide6 + qasync)
uv sync --extra gui                        # Recommended (uv users)
# or: pip install -e '.[gui]'             # pip users

# Full development environment (GUI + test / packaging tools)
uv sync --extra gui --group dev

# Code formatting
./format_code.sh

# Run program - GUI mode (default; requires gui extra)
python main.py

# Run program - CLI mode (base install is enough)
python main.py --mode cli

# Specify communication protocol
python main.py --protocol websocket  # WebSocket (default)
python main.py --protocol mqtt       # MQTT protocol
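The two command-line flags shown above can be summarized with an argparse sketch. This is a hypothetical reconstruction limited to the documented values (`--mode gui|cli`, `--protocol websocket|mqtt`). The real main.py may accept more options; for example, the flag value for GPIO mode is not documented here, so it is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented main.py flags."""
    parser = argparse.ArgumentParser(description="py-xiaozhi client (sketch)")
    parser.add_argument(
        "--mode",
        choices=["gui", "cli"],
        default="gui",
        help="UI mode: gui (default, needs the gui extra) or cli",
    )
    parser.add_argument(
        "--protocol",
        choices=["websocket", "mqtt"],
        default="websocket",
        help="communication protocol (default: websocket)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode} protocol={args.protocol}")
```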

Core Development Patterns

  • Async First: Use async/await syntax, avoid blocking operations
  • Error Handling: Complete exception handling and logging
  • Configuration Management: Use ConfigManager for unified configuration access
  • Test-Driven: Write unit tests to ensure code quality
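"Async first" in practice means that a blocking call never runs directly on the event loop. The sketch below uses the standard-library `asyncio.to_thread` (Python 3.9+) to offload a blocking function while another coroutine keeps running. `blocking_load` and `heartbeat` are hypothetical placeholders.

```python
import asyncio
import time


def blocking_load(name: str) -> str:
    # Stand-in for any blocking call: file I/O, a native library, a sync SDK.
    time.sleep(0.05)
    return f"loaded {name}"


async def heartbeat(ticks: list[float]) -> None:
    # Only ticks on schedule if nothing is blocking the event loop.
    for _ in range(3):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so heartbeat() keeps running concurrently on the event loop.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_load, "wake word model"),
        heartbeat(ticks),
    )
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_load` directly inside a coroutine would freeze every other task, including audio streaming, for the duration of the call.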

Extension Development

  • Add MCP Tools: Create new tool modules in the src/mcp/tools/ directory
  • Add Protocols: Implement the Protocol abstract base class
  • Add Plugins: Extend the plugin system via src/plugins/
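MCP tools are exposed over JSON-RPC 2.0. The sketch below shows the shape of that exchange: a `tools/list` method that advertises tools with a JSON Schema, and a `tools/call` method that returns a `content` array, following the Model Context Protocol specification. The registry, the `tool` decorator, and the `set_volume` tool are hypothetical and do not reflect the actual registration API in src/mcp/.

```python
import json
from typing import Any, Callable

# name -> (description, JSON Schema for arguments, implementation)
TOOLS: dict[str, tuple[str, dict, Callable[..., Any]]] = {}


def tool(name: str, description: str, schema: dict):
    """Register a function as an MCP tool (hypothetical helper)."""
    def register(fn):
        TOOLS[name] = (description, schema, fn)
        return fn
    return register


@tool("set_volume", "Set speaker volume (0-100)",
      {"type": "object",
       "properties": {"volume": {"type": "integer"}},
       "required": ["volume"]})
def set_volume(volume: int) -> str:
    return f"volume set to {max(0, min(100, volume))}"


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the JSON response."""
    req = json.loads(raw)
    method, params, req_id = req.get("method"), req.get("params") or {}, req.get("id")

    def reply(**body: Any) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": req_id, **body})

    if method == "tools/list":
        tools = [{"name": n, "description": d, "inputSchema": s}
                 for n, (d, s, _) in TOOLS.items()]
        return reply(result={"tools": tools})
    if method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return reply(error={"code": -32602, "message": "unknown tool"})
        try:
            text = str(entry[2](**(params.get("arguments") or {})))
            return reply(result={"content": [{"type": "text", "text": text}],
                                 "isError": False})
        except Exception as exc:
            # Tool failures are reported in the result, not as protocol errors.
            return reply(result={"content": [{"type": "text", "text": str(exc)}],
                                 "isError": True})
    return reply(error={"code": -32601, "message": "method not found"})
```

Keeping each tool a plain function behind a registry is what lets new modules be added without touching the dispatcher.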

State Transition Diagram

                        +----------------+
                        |                |
                        v                |
+------+  Wake/Button  +------------+   |   +------------+
| IDLE | -----------> | CONNECTING | --+-> | LISTENING  |
+------+              +------------+       +------------+
   ^                                            |
   |                                            | Voice Recognition Complete
   |          +------------+                    v
   +--------- |  SPEAKING  | <-----------------+
     Playback +------------+
     Complete
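The main cycle in the diagram can be encoded as a transition table, which rejects any event that is not valid in the current state. This is a hypothetical sketch: the state names come from the diagram, but the event strings (including `connected`, which the diagram leaves unlabeled) are invented for illustration.

```python
from enum import Enum, auto


class DeviceState(Enum):
    IDLE = auto()
    CONNECTING = auto()
    LISTENING = auto()
    SPEAKING = auto()


# (current state, event) -> next state, following the diagram's main cycle.
TRANSITIONS = {
    (DeviceState.IDLE, "wake"): DeviceState.CONNECTING,
    (DeviceState.IDLE, "button"): DeviceState.CONNECTING,
    (DeviceState.CONNECTING, "connected"): DeviceState.LISTENING,
    (DeviceState.LISTENING, "recognition_complete"): DeviceState.SPEAKING,
    (DeviceState.SPEAKING, "playback_complete"): DeviceState.IDLE,
}


class StateMachine:
    def __init__(self) -> None:
        self.state = DeviceState.IDLE

    def fire(self, event: str) -> DeviceState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```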

Contributing

Maintainer Workflow

  • Triage incoming work as bug, feature, docs, refactor, or maintenance
  • Prefer focused pull requests with clear validation steps and linked context
  • Require docs updates when behavior, configuration, or public APIs change
  • Merge after CI passes and review feedback is resolved
  • Release through the normal release flow; merge does not imply immediate shipping

Community and Support

Thanks to the Following Open Source Contributors

In no particular order

Xiaoxia zhh827 SmartArduino-Li Honggang HonestQiao vonweller Sun Weigong isamu2025 Rain120 kejily Radio bilibili Jun Cyber Intelligence

Sponsorship Support

Thanks to All Sponsors ❤️

Whether it is API resources, device compatibility testing, or financial support, every contribution helps make the project more complete.


Project Statistics


License

MIT License