README.md

June 29, 2026 · View on GitHub

OpenTalking

Open-source real-time digital-human pipeline: LLM, TTS, WebRTC, character voices, and pluggable model backends

中文 · Documentation · GitHub

License Python React FastAPI WebRTC

Visit OpenTalking Website

Demos · Deployment · Quickstart · Models · Roadmap · Docs & Community


Overview

OpenTalking is an open-source orchestration framework for real-time digital-human conversations. It covers the core path of a digital-human conversational product: frontend interaction, session state, LLM replies, STT, TTS and voice selection, interruption control, subtitle events, WebRTC audio/video playback, and calls into local or remote model services.

OpenTalking is designed as a practical digital-human production stack. The WebUI, avatar and voice asset libraries, knowledge bases, memory, multi-session state, LLM / STT / TTS providers, WebRTC playback, and model backends are organized in one project. You can start with the lightweight Mock mode, connect local QuickTalk / Wav2Lip, or use OmniRT for FlashTalk, FasterLivePortrait, and other higher-quality or more complex model workflows.

  • Fast trial: mock / driverless mode, useful for validating the API, TTS, and WebRTC path before downloading video model weights.
  • Real-time conversation: connect QuickTalk, Wav2Lip, FlashTalk, and other models for interactive digital-human dialogue.
  • Video creation and cloning: reuse FasterLivePortrait runtime for audio/text-driven video creation and camera/uploaded-video-driven video clone workflows.
  • Private deployment: supports local STT/TTS, OpenAI-compatible LLMs, knowledge bases, memory, OmniRT remote inference, Docker, and distributed deployment.

More documentation:

WebUI And Demos

OpenTalking provides a Web service interface for managing the digital-human conversation pipeline. You can select or create avatars, configure voices, LLM, TTS, STT, and digital-human driver models, inspect model connection status, and validate real-time conversation, subtitles, and audio/video playback on the same page.

OpenTalking WebUI

Demo Videos

These demos cover three common frontend workflows: real-time conversation, video creation, and video clone.

A. Real-time Conversation
E-commerce livestream

Companion character

News anchor

B. Video Creation
Audio driven

Text driven

Cloned voice driven

C. Video Clone
Realtime camera imitation

Uploaded video imitation

Choose A Deployment Path

OpenTalking's orchestration layer (API / Worker / frontend) and digital-human synthesis backend (mock, local, direct_ws, or OmniRT) can be deployed independently. If you are new to the project, start with Mock mode to validate the full path, then switch to a real rendering model based on your GPU, model, and private-deployment requirements.

PathRecommended model / backendDevice referenceBest forDetails
Fast trialmockCPU / no GPUValidate API, LLM, TTS, WebRTC, and browser playback without downloading model weightsQuickstart
Entry validationquicktalk / wav2lipRTX 3050 Laptop, RTX 3060, RTX 4060Run real video rendering for demos and deployment validation; lower the resolution on low-memory devicesQuickTalk / Wav2Lip
Consumer-GPU single machinequicktalk / wav2lip / musetalkRTX 3090, RTX 4090Closer to real-time local demos, private validation, and lightweight pre-production evaluationModel and backend selection
Fully local private pathsensevoice + local_cosyvoice + quicktalkRTX 3090 / 4090 or similar GPURun STT, TTS, and video driving locally; OpenTalking uses the main .venv, while CosyVoice runs in a dedicated sidecar venvLocal STT/TTS + QuickTalk
High-quality remote inferenceflashtalk / flashhead / fasterliveportrait + OmniRTMulti-GPU, Ascend 910B2, remote GPU serviceMulti-card, GPU/NPU, production isolation, higher visual quality, or video clone workflowsFlashTalk / FasterLivePortrait
Docker / production deploymentAPI, Web, Worker, external model servicesSingle GPU, remote GPU, distributed clusterService deployment, remote GPU, distributed runtime, and production validationDeployment

Quickstart

Choose one of the two quickstart paths first:

PathUse whenWhat you needWhat it validates
Compshare imageYou want to try OpenTalking before setting up dependencies or downloading model weights.A Compshare instance created from the published image, with port 5173 open.WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow.
Self deploymentYou want to run the repo on your own machine or server, customize config, or continue into local/remote model deployment.Python, Node.js, FFmpeg, .env provider config; real models also need GPU/runtime/model weights.Mock first-run path, then local QuickTalk or remote OmniRT model paths.

1. Compshare Image

If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human path before setting up everything manually, use the community image we published on Compshare:

The image includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port 5173 and visit the instance URL provided by the platform. If you need to restart services manually, follow the commands in the guide.

2. Self Deployment

Use this path when you want to run OpenTalking from source. Start with Mock mode if you do not want to download video model weights yet: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.

git clone https://github.com/datascale-ai/opentalking.git
cd opentalking

uv sync --extra dev --python 3.11
source .venv/bin/activate
cp .env.example .env

Edit .env and configure at least an LLM. The default TTS can use the keyless edge voice. LLM, STT, and TTS are independent providers; see Configuration and LLM / STT.

bash scripts/start_unified.sh --mock

The default frontend URL is http://localhost:5173. To specify ports:

bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280

Stop services:

bash scripts/quickstart/stop_all.sh

Real Model Entrypoints

After Mock mode works, choose a real model path based on your machine. Weight downloads, directory layout, mirrors, checks, and troubleshooting are maintained in the docs; the README keeps only the startup entrypoints:

# Local QuickTalk: consumer-GPU single-machine path
export OPENTALKING_TORCH_DEVICE=cuda:0
export OPENTALKING_QUICKTALK_ASSET_ROOT="$PWD/models/quicktalk"
export OPENTALKING_QUICKTALK_WORKER_CACHE=1
bash scripts/start_unified.sh --backend local --model quicktalk --api-port 8210 --web-port 5280

# Remote OmniRT / FlashTalk: high-quality or multi-card path
bash scripts/start_unified.sh \
  --backend omnirt \
  --model flashtalk \
  --api-port 8210 \
  --web-port 5280 \
  --omnirt http://<gpu-server>:9000

More entrypoints:

Supported Models

ModelInputRecommended backendResource guidance
mockReference image / static framemockNo GPU required
quicktalkTemplate video + audiolocalCUDA GPU, RTX 3090 / 4090 recommended
wav2lipReference image / frames + audiolocal / omnirt>= 8 GB GPU / NPU memory
musetalkFull frames + audioomnirt / local>= 12 GB GPU memory
soulx-flashtalk-14bPortrait + audioomnirtMulti-GPU / NPU
soulx-flashhead-1.3bPortrait + audioomnirtMulti-GPU / NPU
fasterliveportraitPortrait / driving video / audioomnirtSingle-GPU real-time portrait paste-back, video creation, video clone

Consumer-GPU Reference

ModelHardwareInputOutputVRAMThroughput
quicktalkRTX 3090Template video + audio720x900 / 25fpsAbout 3.8 GiBAbout 35 fps

For weight downloads, Docker, troubleshooting, and model configuration, see Model deployment.

Cloud Model API: Atlas Cloud

Atlas Cloud

Atlas Cloud is an all-modal AI inference platform. One API gives you access to video generation, image generation, and LLMs, so you do not need to integrate multiple vendors separately. A single integration can route to 300+ curated all-modal models.

OpenTalking uses an OpenAI-compatible interface for LLMs. Point OPENTALKING_LLM_BASE_URL to https://api.atlascloud.ai/v1 to use Atlas-hosted DeepSeek / Qwen models. See LLM and STT. For budget-friendly API options, see Atlas Cloud's coding plan.

Progress And Roadmap

  • More natural real-time conversations Improve interruption handling, low-latency response, audio/video sync, long-session recovery, and runtime visibility.

  • Consumer-GPU multi-model path Improve asset checks, prewarm, cache reuse, low-memory parameters, and more RTX 3090 / 4090 / WSL2 benchmarks for QuickTalk / Wav2Lip / MuseTalk local paths, while filling in more FasterLivePortrait video creation and video clone measurements.

  • One-command Windows / WSL2 deployment Continue lowering the barrier for model downloads, runtime installation, environment checks, and diagnostics based on the current Windows docs and test records.

  • High-quality private deployment Improve external OmniRT inference services, multi-model endpoints, capacity scheduling, health checks, production monitoring, and GPU / NPU deployment guidance.

  • More cloud voice and multimodal providers Extend pluggable STT / TTS / LLM providers, unified frontend selection, and provider-level health checks on top of the current OpenAI-compatible, DashScope, and Xiaomi MiMo profiles.

  • Agent, memory, and platform capabilities Productize the asset library, knowledge bases, memory, multi-session scheduling, tool calling, and OpenClaw / external Agent integrations, then fill in observability, safety, licensed voices, and synthetic-content labeling.

Recent Progress

  • 2026-06-25: WeChat memory import and persona workflow Added WeChat memory persona import, documentation, and the related persona workflow. The frontend no longer treats persona selection and driving-model selection as mutually exclusive, so users can combine imported memory/persona context with the selected avatar driver.

  • 2026-06-23: Local CosyVoice TRT sidecar deployment Added the local CosyVoice sidecar deployment path with TensorRT / FP16 acceleration notes, runtime tuning, dedicated environment isolation, startup checks, and measured deployment guidance for pairing local TTS with QuickTalk.

  • 2026-06-22: Runtime configuration, memory refresh, and immersive scenes Added the runtime API configuration page, improved mem0 provider release during runtime refresh, and expanded the scene asset pipeline: scene asset APIs, asset-library integration, immersive conversation mode, scene/avatar anchoring, transparent background handling, and realtime media preservation across view switches.

  • 2026-06-18/19: Quickstart split, LightRAG runtime config, and scenario guides Split the quickstart into Compshare image and self-deployment paths, added LightRAG runtime configuration and quickstart updates, fixed dependency notes for mem0 / Hugging Face download tooling, and added the Huangshan digital-human guide.

  • 2026-06-12: QuickTalk local asset fixes and Apple Silicon support Organized QuickTalk local weights, HuBERT, InsightFace paths, missing-asset checks, cache preparation, and health checks. Added Apple Silicon deployment docs for validating quicktalk-cpu with MPS / CPU on macOS arm64.

  • 2026-06-12: IndexTTS, QuickTalk, and FlashTalk video creation improvements Added local IndexTTS and OmniRT IndexTTS providers, system voices, voice preview, and voice labels. Improved the QuickTalk / IndexTTS video creation path, and added FlashTalk reference-video generation with a default reference driver.

  • 2026-06-02/10: Persona Package, knowledge retrieval, and character memory Added Persona Package API / CLI / WebUI entrypoints for reusable role settings, knowledge materials, and prompts. Added LightRAG knowledge retrieval, session-level knowledge selection, a character memory panel, and BM25 / mem0 / SQLite memory providers.

  • 2026-06-05: Asset library and knowledge-base workflow Extended the WebUI asset library to connect avatar assets, knowledge materials, session selection, and Agent context building. Added audio/video exports so demos, reviews, and reusable materials can stay in the same workspace.

  • 2026-06-05/06: OpenAI-compatible audio providers and MuseTalk deployment updates Added OpenAI-compatible STT / TTS adapters, Xiaomi MiMo STT / TTS / voice clone profiles, frontend provider selection, and voice lists. Reworked .env.example into separate LLM / STT / TTS profile templates. Also improved MuseTalk local / OmniRT deployment docs, asset preparation scripts, and quickstart scripts.

  • 2026-06-04: FasterLivePortrait video creation and video clone Added the FasterLivePortrait video creation parameter panel, video clone page, custom source-asset upload, camera / uploaded-video driving input, and docs screenshots, reusing the OmniRT + FasterLivePortrait runtime path.

  • 2026-06-03: Web recording exports, asset library, and video workflows Added Web recording exports, export storage, video creation entrypoints, and the asset library workspace, connecting real-time conversation, material management, and video generation.

  • 2026-06-12/13: Homepage analytics, GitHub traffic, and deployment docs Added the English homepage, deployment-route presentation, site analytics, GitHub traffic statistics, chart style updates, and statistics-interval fixes. Added the WSL2 network-mode selection guide for Windows deployment and continued updating README demo videos and docs-site links.

  • Earlier foundation: real-time conversation path and backend decoupling Built the Web console, LLM conversation, TTS, subtitle events, WebRTC audio/video playback, Avatar prewarm and cache, unified audio2video runner, and pluggable mock / local / direct_ws / omnirt model backends.

Documentation And Community

Join the QQ community to discuss real-time digital humans, FlashTalk, OmniRT, model deployment, and product scenarios.

AI digital human QQ group QR code

AI Digital Human Community · Group ID: 1103327938

Acknowledgements

OpenTalking references and benefits from excellent projects in the real-time digital-human ecosystem:

License

Apache License 2.0