README.md

June 29, 2026 · View on GitHub

OpenTalking

Open-source real-time digital-human pipeline: LLM, TTS, WebRTC, character voices, and pluggable model backends

Demos · Deployment · Quickstart · Models · Roadmap · Docs & Community

Overview

OpenTalking is an open-source orchestration framework for real-time digital-human conversations. It covers the core path of a digital-human conversational product: frontend interaction, session state, LLM replies, STT, TTS and voice selection, interruption control, subtitle events, WebRTC audio/video playback, and calls into local or remote model services.

OpenTalking is designed as a practical digital-human production stack. The WebUI, avatar and voice asset libraries, knowledge bases, memory, multi-session state, LLM / STT / TTS providers, WebRTC playback, and model backends are organized in one project. You can start with the lightweight Mock mode, connect local QuickTalk / Wav2Lip, or use OmniRT for FlashTalk, FasterLivePortrait, and other higher-quality or more complex model workflows.

Fast trial: mock / driverless mode, useful for validating the API, TTS, and WebRTC path before downloading video model weights.
Real-time conversation: connect QuickTalk, Wav2Lip, FlashTalk, and other models for interactive digital-human dialogue.
Video creation and cloning: reuse FasterLivePortrait runtime for audio/text-driven video creation and camera/uploaded-video-driven video clone workflows.
Private deployment: supports local STT/TTS, OpenAI-compatible LLMs, knowledge bases, memory, OmniRT remote inference, Docker, and distributed deployment.

WebUI And Demos

OpenTalking provides a Web service interface for managing the digital-human conversation pipeline. You can select or create avatars, configure voices, LLM, TTS, STT, and digital-human driver models, inspect model connection status, and validate real-time conversation, subtitles, and audio/video playback on the same page.

OpenTalking WebUI

Demo Videos

These demos cover three common frontend workflows: real-time conversation, video creation, and video clone.

A. Real-time Conversation
E-commerce livestream	Companion character	News anchor

B. Video Creation
Audio driven	Text driven	Cloned voice driven

C. Video Clone
Realtime camera imitation	Uploaded video imitation

Choose A Deployment Path

OpenTalking's orchestration layer (API / Worker / frontend) and digital-human synthesis backend (mock, local, direct_ws, or OmniRT) can be deployed independently. If you are new to the project, start with Mock mode to validate the full path, then switch to a real rendering model based on your GPU, model, and private-deployment requirements.

Path	Recommended model / backend	Device reference	Best for	Details
Fast trial	`mock`	CPU / no GPU	Validate API, LLM, TTS, WebRTC, and browser playback without downloading model weights	Quickstart
Entry validation	`quicktalk` / `wav2lip`	RTX 3050 Laptop, RTX 3060, RTX 4060	Run real video rendering for demos and deployment validation; lower the resolution on low-memory devices	QuickTalk / Wav2Lip
Consumer-GPU single machine	`quicktalk` / `wav2lip` / `musetalk`	RTX 3090, RTX 4090	Closer to real-time local demos, private validation, and lightweight pre-production evaluation	Model and backend selection
Fully local private path	`sensevoice` + `local_cosyvoice` + `quicktalk`	RTX 3090 / 4090 or similar GPU	Run STT, TTS, and video driving locally; OpenTalking uses the main `.venv`, while CosyVoice runs in a dedicated sidecar venv	Local STT/TTS + QuickTalk
High-quality remote inference	`flashtalk` / `flashhead` / `fasterliveportrait` + OmniRT	Multi-GPU, Ascend 910B2, remote GPU service	Multi-card, GPU/NPU, production isolation, higher visual quality, or video clone workflows	FlashTalk / FasterLivePortrait
Docker / production deployment	API, Web, Worker, external model services	Single GPU, remote GPU, distributed cluster	Service deployment, remote GPU, distributed runtime, and production validation	Deployment

Quickstart

Choose one of the two quickstart paths first:

Path	Use when	What you need	What it validates
Compshare image	You want to try OpenTalking before setting up dependencies or downloading model weights.	A Compshare instance created from the published image, with port `5173` open.	WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow.
Self deployment	You want to run the repo on your own machine or server, customize config, or continue into local/remote model deployment.	Python, Node.js, FFmpeg, `.env` provider config; real models also need GPU/runtime/model weights.	Mock first-run path, then local QuickTalk or remote OmniRT model paths.

1. Compshare Image

If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human path before setting up everything manually, use the community image we published on Compshare:

Image URL: image link
Exposed port: 5173 (WebUI; API traffic is proxied internally)
Guide: Compshare image quick experience

The image includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port 5173 and visit the instance URL provided by the platform. If you need to restart services manually, follow the commands in the guide.

2. Self Deployment

Use this path when you want to run OpenTalking from source. Start with Mock mode if you do not want to download video model weights yet: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.

git clone https://github.com/datascale-ai/opentalking.git
cd opentalking

uv sync --extra dev --python 3.11
source .venv/bin/activate
cp .env.example .env

Edit .env and configure at least an LLM. The default TTS can use the keyless edge voice. LLM, STT, and TTS are independent providers; see Configuration and LLM / STT.

bash scripts/start_unified.sh --mock

The default frontend URL is http://localhost:5173. To specify ports:

bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280

Stop services:

bash scripts/quickstart/stop_all.sh

Real Model Entrypoints

After Mock mode works, choose a real model path based on your machine. Weight downloads, directory layout, mirrors, checks, and troubleshooting are maintained in the docs; the README keeps only the startup entrypoints:

# Local QuickTalk: consumer-GPU single-machine path
export OPENTALKING_TORCH_DEVICE=cuda:0
export OPENTALKING_QUICKTALK_ASSET_ROOT="$PWD/models/quicktalk"
export OPENTALKING_QUICKTALK_WORKER_CACHE=1
bash scripts/start_unified.sh --backend local --model quicktalk --api-port 8210 --web-port 5280

# Remote OmniRT / FlashTalk: high-quality or multi-card path
bash scripts/start_unified.sh \
  --backend omnirt \
  --model flashtalk \
  --api-port 8210 \
  --web-port 5280 \
  --omnirt http://<gpu-server>:9000

More entrypoints:

Supported Models

Model	Input	Recommended backend	Resource guidance
`mock`	Reference image / static frame	`mock`	No GPU required
`quicktalk`	Template video + audio	`local`	CUDA GPU, RTX 3090 / 4090 recommended
`wav2lip`	Reference image / frames + audio	`local` / `omnirt`	`>= 8 GB` GPU / NPU memory
`musetalk`	Full frames + audio	`omnirt` / `local`	`>= 12 GB` GPU memory
`soulx-flashtalk-14b`	Portrait + audio	`omnirt`	Multi-GPU / NPU
`soulx-flashhead-1.3b`	Portrait + audio	`omnirt`	Multi-GPU / NPU
`fasterliveportrait`	Portrait / driving video / audio	`omnirt`	Single-GPU real-time portrait paste-back, video creation, video clone

Consumer-GPU Reference

Model	Hardware	Input	Output	VRAM	Throughput
`quicktalk`	RTX 3090	Template video + audio	720x900 / 25fps	About 3.8 GiB	About 35 fps

For weight downloads, Docker, troubleshooting, and model configuration, see Model deployment.

Cloud Model API: Atlas Cloud

Atlas Cloud is an all-modal AI inference platform. One API gives you access to video generation, image generation, and LLMs, so you do not need to integrate multiple vendors separately. A single integration can route to 300+ curated all-modal models.

OpenTalking uses an OpenAI-compatible interface for LLMs. Point OPENTALKING_LLM_BASE_URL to https://api.atlascloud.ai/v1 to use Atlas-hosted DeepSeek / Qwen models. See LLM and STT. For budget-friendly API options, see Atlas Cloud's coding plan.

Progress And Roadmap

More natural real-time conversations Improve interruption handling, low-latency response, audio/video sync, long-session recovery, and runtime visibility.
Consumer-GPU multi-model path Improve asset checks, prewarm, cache reuse, low-memory parameters, and more RTX 3090 / 4090 / WSL2 benchmarks for QuickTalk / Wav2Lip / MuseTalk local paths, while filling in more FasterLivePortrait video creation and video clone measurements.
One-command Windows / WSL2 deployment Continue lowering the barrier for model downloads, runtime installation, environment checks, and diagnostics based on the current Windows docs and test records.
High-quality private deployment Improve external OmniRT inference services, multi-model endpoints, capacity scheduling, health checks, production monitoring, and GPU / NPU deployment guidance.
More cloud voice and multimodal providers Extend pluggable STT / TTS / LLM providers, unified frontend selection, and provider-level health checks on top of the current OpenAI-compatible, DashScope, and Xiaomi MiMo profiles.
Agent, memory, and platform capabilities Productize the asset library, knowledge bases, memory, multi-session scheduling, tool calling, and OpenClaw / external Agent integrations, then fill in observability, safety, licensed voices, and synthetic-content labeling.

Recent Progress

2026-06-25: WeChat memory import and persona workflow Added WeChat memory persona import, documentation, and the related persona workflow. The frontend no longer treats persona selection and driving-model selection as mutually exclusive, so users can combine imported memory/persona context with the selected avatar driver.
2026-06-23: Local CosyVoice TRT sidecar deployment Added the local CosyVoice sidecar deployment path with TensorRT / FP16 acceleration notes, runtime tuning, dedicated environment isolation, startup checks, and measured deployment guidance for pairing local TTS with QuickTalk.
2026-06-22: Runtime configuration, memory refresh, and immersive scenes Added the runtime API configuration page, improved mem0 provider release during runtime refresh, and expanded the scene asset pipeline: scene asset APIs, asset-library integration, immersive conversation mode, scene/avatar anchoring, transparent background handling, and realtime media preservation across view switches.
2026-06-18/19: Quickstart split, LightRAG runtime config, and scenario guides Split the quickstart into Compshare image and self-deployment paths, added LightRAG runtime configuration and quickstart updates, fixed dependency notes for mem0 / Hugging Face download tooling, and added the Huangshan digital-human guide.
2026-06-12: QuickTalk local asset fixes and Apple Silicon support Organized QuickTalk local weights, HuBERT, InsightFace paths, missing-asset checks, cache preparation, and health checks. Added Apple Silicon deployment docs for validating quicktalk-cpu with MPS / CPU on macOS arm64.
2026-06-12: IndexTTS, QuickTalk, and FlashTalk video creation improvements Added local IndexTTS and OmniRT IndexTTS providers, system voices, voice preview, and voice labels. Improved the QuickTalk / IndexTTS video creation path, and added FlashTalk reference-video generation with a default reference driver.
2026-06-02/10: Persona Package, knowledge retrieval, and character memory Added Persona Package API / CLI / WebUI entrypoints for reusable role settings, knowledge materials, and prompts. Added LightRAG knowledge retrieval, session-level knowledge selection, a character memory panel, and BM25 / mem0 / SQLite memory providers.
2026-06-05: Asset library and knowledge-base workflow Extended the WebUI asset library to connect avatar assets, knowledge materials, session selection, and Agent context building. Added audio/video exports so demos, reviews, and reusable materials can stay in the same workspace.
2026-06-05/06: OpenAI-compatible audio providers and MuseTalk deployment updates Added OpenAI-compatible STT / TTS adapters, Xiaomi MiMo STT / TTS / voice clone profiles, frontend provider selection, and voice lists. Reworked .env.example into separate LLM / STT / TTS profile templates. Also improved MuseTalk local / OmniRT deployment docs, asset preparation scripts, and quickstart scripts.
2026-06-04: FasterLivePortrait video creation and video clone Added the FasterLivePortrait video creation parameter panel, video clone page, custom source-asset upload, camera / uploaded-video driving input, and docs screenshots, reusing the OmniRT + FasterLivePortrait runtime path.
2026-06-03: Web recording exports, asset library, and video workflows Added Web recording exports, export storage, video creation entrypoints, and the asset library workspace, connecting real-time conversation, material management, and video generation.
2026-06-12/13: Homepage analytics, GitHub traffic, and deployment docs Added the English homepage, deployment-route presentation, site analytics, GitHub traffic statistics, chart style updates, and statistics-interval fixes. Added the WSL2 network-mode selection guide for Windows deployment and continued updating README demo videos and docs-site links.
Earlier foundation: real-time conversation path and backend decoupling Built the Web console, LLM conversation, TTS, subtitle events, WebRTC audio/video playback, Avatar prewarm and cache, unified audio2video runner, and pluggable mock / local / direct_ws / omnirt model backends.

Documentation And Community

Quickstart
Models (weight downloads, mirrors, startup, validation)
Architecture
Configuration
Deployment (Docker Compose, distributed deployment)
Model adapter
Contributing (dev environment, CLI tools, ruff / mypy / pytest)

Join the QQ community to discuss real-time digital humans, FlashTalk, OmniRT, model deployment, and product scenarios.

AI digital human QQ group QR code

AI Digital Human Community · Group ID: 1103327938

Acknowledgements

OpenTalking references and benefits from excellent projects in the real-time digital-human ecosystem:

Thanks to the LINUX DO community for their support and discussions.
SoulX-FlashTalk and SoulX-FlashTalk-14B
LiveTalking
OmniRT
Edge TTS
aiortc
Wan Video

License

Apache License 2.0