Linly-Talker-Stream: Real-Time Streaming Conversational Digital Human System

February 10, 2026

Full-duplex, low-latency, real-time interactive digital human framework


English | 简体中文

News

2026.02 Update 📆

  • Released Linly-Talker-Stream: the real-time streaming architecture version of Linly-Talker. Built on top of the original multimodal stack, it introduces a WebRTC real-time transport + streaming pipeline for low-latency audio/video interaction and a full-duplex conversation experience.

Introduction

Why Linly-Talker-Stream?

Linly-Talker-Stream is the real-time streaming architecture version of Linly-Talker. It upgrades traditional turn-based QA into a more human-like full-duplex conversational system:

  • 🎤 Listen while speaking: user speech and avatar playback can run in parallel.
  • ⚡ Low-latency transport: real-time audio/video transmission via WebRTC.
  • ✋ Barge-in and interruption support: more natural conversational rhythm.
  • 🧩 Modular multimodal pipeline: ASR / LLM / TTS / Avatar modules are replaceable and extensible.

If you want to build AI assistants, digital human front desks, interactive guides, or live Q&A scenarios, this project can serve as a practical real-time interaction engineering baseline.

Building on Linly-Talker's multimodal pipeline (ASR / LLM / TTS / Avatar), this project draws on LiveTalking for its real-time communication design and refactors the pipeline for streaming. Further optimization is planned.

Demos & Showcase

Note

Linly-Talker-Stream is positioned as the “real-time streaming version,” reusing and extending Linly-Talker's multimodal digital human capabilities:

System Architecture

[Figure: Linly-Talker architecture]

Web UI Preview

[Figure: Linly-Talker Stream web UI]

Roadmap (TODO)

  • Introduce Omni multimodality, evolving from fixed ASR + LLM + TTS into a more complete end-to-end pipeline.
  • Add server-side VAD to improve endpoint detection, interruption handling, and turn control stability.

Important

This project is under active iteration. PRs and Issues are welcome.

Highlights

  • WebRTC real-time streaming playback with low latency in browsers.
  • Full-duplex interaction (currently available): supports speaking and listening simultaneously. The current full-duplex implementation mainly relies on browser speech recognition (with built-in VAD/endpoint detection) for user-side speech detection and transcription, while avatar audio/video is continuously streamed via WebRTC.
  • Switchable avatar engines via configuration:
    • wav2lip (2D)
    • musetalk (2D)
    • ernerf (3D)
    • talkinggaussian (3D)
  • Modular architecture with isolated dependencies for on-demand installation and extension.
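
The barge-in behaviour mentioned in the full-duplex highlight can be pictured with a small asyncio sketch. This is only an illustration of the idea, not the project's implementation: `play_reply`, `speak_with_barge_in`, and the `user_speaking` event are hypothetical names, and the 10 ms sleep stands in for streaming one frame.

```python
import asyncio


async def play_reply(frames, played):
    """Stand-in for avatar playback: 'stream' one frame at a time."""
    for frame in frames:
        played.append(frame)
        await asyncio.sleep(0.01)  # placeholder for sending one frame


async def speak_with_barge_in(frames, user_speaking):
    """Play a reply, but stop as soon as the user starts talking.

    `user_speaking` is an asyncio.Event that some speech detector
    (hypothetical here) sets when it hears the user.
    """
    played = []
    playback = asyncio.create_task(play_reply(frames, played))
    barge_in = asyncio.create_task(user_speaking.wait())

    # Whichever finishes first decides the outcome.
    done, _ = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and playback not in done

    # Cancel the loser and wait for both tasks to settle.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played, interrupted
```

Because listening and playback run as separate tasks, the reply can be cut off mid-stream instead of waiting for the turn to end.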

Project Structure Overview

Linly-Talker-Stream/
├── pyproject.toml                    # Root project config (core dependencies)
├── config/                           # Runtime config files (YAML)
├── scripts/                          # Environment setup / startup scripts
├── models/                           # Model weights
├── data/                             # Avatar assets / recorded files
├── web/                              # Vue frontend
└── src/
    ├── server/                       # Backend (WebRTC + APIs)
    ├── asr/                          # Speech recognition engines
    ├── llm/                          # LLM adapters
    ├── tts/                          # Speech synthesis engines
    └── avatars/                      # Avatar engines (2D/3D)

Real-Time Interaction Pipeline

  1. Browser captures microphone/camera input.
  2. Speech enters the ASR and conversation pipeline.
  3. LLM generates response text.
  4. TTS outputs synthesized speech stream.
  5. Avatar engine drives lip-sync and renders video.
  6. WebRTC sends generated streams back to the browser in real time.
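
The hand-off between steps 2 to 5 can be sketched as a chain of asyncio queues, where each stage starts working as soon as the previous one produces output. This is a minimal sketch with stand-in stages, not the repository's code: every function name here is hypothetical, and the "ASR", "LLM", "TTS", and "avatar" bodies are placeholders that only move data along.

```python
import asyncio


async def asr_stage(audio_chunks, text_q):
    """Stand-in ASR: pretend each audio chunk decodes to one utterance."""
    for chunk in audio_chunks:
        await text_q.put(chunk.decode("utf-8"))
    await text_q.put(None)  # sentinel: no more input


async def llm_stage(text_q, reply_q):
    """Stand-in LLM: produce one reply per recognized utterance."""
    while (utterance := await text_q.get()) is not None:
        await reply_q.put(f"reply to: {utterance}")
    await reply_q.put(None)


async def tts_stage(reply_q, audio_q):
    """Stand-in TTS: emit one fake audio frame per word of the reply."""
    while (reply := await reply_q.get()) is not None:
        for word in reply.split():
            await audio_q.put(word.encode("utf-8"))
    await audio_q.put(None)


async def avatar_stage(audio_q, frames):
    """Stand-in avatar engine: pair each audio frame with a frame index."""
    index = 0
    while (audio := await audio_q.get()) is not None:
        frames.append((index, audio))
        index += 1


async def run_pipeline(audio_chunks):
    """Run all stages concurrently; bounded queues apply backpressure."""
    text_q, reply_q, audio_q = (asyncio.Queue(maxsize=8) for _ in range(3))
    frames = []
    await asyncio.gather(
        asr_stage(audio_chunks, text_q),
        llm_stage(text_q, reply_q),
        tts_stage(reply_q, audio_q),
        avatar_stage(audio_q, frames),
    )
    return frames
```

Calling `asyncio.run(run_pipeline([b"hello"]))` pushes one utterance through all four stages. In a real deployment the collected frames would instead be handed to the WebRTC tracks of step 6.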

Requirements

  • Python: 3.10+
  • Node.js: 16+
  • uv: recommended Python package manager (installation docs)
  • Browser: Chrome / Edge recommended (remote microphone access usually requires HTTPS)

Quick Start

# 1) Clone repository
git clone https://github.com/Kedreamix/Linly-Talker-Stream.git
cd Linly-Talker-Stream

# 2) One-click environment setup (auto install uv + create .venv + install dependencies)
bash scripts/setup-env.sh wav2lip

# 3) Configure the API key (the default LLM is Qwen-plus via Alibaba Cloud Bailian)
export DASHSCOPE_API_KEY="your_api_key_here"

# 4) One-click start backend + frontend
bash scripts/start-all.sh config/config_wav2lip.yaml

Open in browser: http://localhost:3000

Notes

  • Supported avatars: wav2lip, musetalk, ernerf, talkinggaussian
  • DashScope API key: apply in the Alibaba Cloud Bailian Console (free quota available)
  • For detailed installation of uv / Node.js, see FAQ.md

Manual Installation Example (wav2lip)

# Backend dependencies
uv venv --python 3.10.19
uv sync
uv pip install -e src/avatars/wav2lip/

# Frontend dependencies
cd web && npm install && cd ..

# Environment variable
export DASHSCOPE_API_KEY="your_api_key_here"

# Start services
bash scripts/start-all.sh config/config_wav2lip.yaml

Browsers only grant microphone access on secure origins (HTTPS or localhost), so remote usage requires HTTPS. Generate certificates with:

bash scripts/create_ssl_certs.sh

Then set app.ssl: true in the config file and open https://localhost:3000.

Install Other Avatar Modules

# TalkingGaussian
uv pip install -e src/avatars/talkinggaussian/
uv pip install -e src/avatars/talkinggaussian/submodules/diff-gaussian-rasterization/ --no-build-isolation
uv pip install -e src/avatars/talkinggaussian/submodules/simple-knn/ --no-build-isolation
uv pip install -e src/avatars/talkinggaussian/gridencoder/ --no-build-isolation

# MuseTalk (requires additional dependencies and post-processing)
uv pip install chumpy==0.70 --no-build-isolation
uv pip install -e src/avatars/musetalk/
uv run mim install mmengine
uv run mim install mmcv==2.2.0 --no-build-isolation
uv run mim install mmdet==3.1.0
uv run mim install mmpose==1.3.2
bash scripts/post_musetalk_install.sh

Startup Methods

A. Start Backend and Frontend Separately

# Backend
bash scripts/start-backend.sh config/config_wav2lip.yaml
# or
uv run python src/server/app.py --config config/config_wav2lip.yaml

# Frontend
bash scripts/start-frontend.sh config/config_wav2lip.yaml

B. Start with One Command

bash scripts/start-all.sh config/config_wav2lip.yaml

Default ports:

  • Backend: http://localhost:8010
  • Frontend: http://localhost:3000

Configuration

All configs are in config/*.yaml. Common fields:

  • app.listenport: backend port (default 8010)
  • app.ssl: whether to enable HTTPS (recommended for remote recording)
  • model.type: avatar type (wav2lip / musetalk / ernerf / talkinggaussian)
  • tts.type: TTS engine (e.g. edgetts, azuretts, gpt-sovits, cosyvoice)
  • asr.mode: browser (recommended) / server / auto
  • llm.*: LLM config (defaults to Qwen-plus on DashScope)
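
The dotted field names above suggest a nested YAML layout. The sketch below assumes that layout and a config already parsed into a Python dict, and checks it against the values this README lists. `check_config` is a hypothetical helper, not part of the repository, and tts.type is left unchecked because its list of engines is given only as examples.

```python
# Values listed in this README; the nested dict layout is an assumption.
ALLOWED_AVATARS = {"wav2lip", "musetalk", "ernerf", "talkinggaussian"}
ALLOWED_ASR_MODES = {"browser", "server", "auto"}


def check_config(cfg):
    """Hypothetical helper: list problems found in a parsed config dict."""
    problems = []

    app = cfg.get("app", {})
    port = app.get("listenport", 8010)  # README default
    if type(port) is not int or not 1 <= port <= 65535:
        problems.append(f"app.listenport is not a valid port: {port!r}")
    if type(app.get("ssl", False)) is not bool:
        problems.append("app.ssl must be true or false")

    avatar = cfg.get("model", {}).get("type")
    if avatar not in ALLOWED_AVATARS:
        problems.append(f"model.type is not a known avatar: {avatar!r}")

    mode = cfg.get("asr", {}).get("mode", "browser")
    if mode not in ALLOWED_ASR_MODES:
        problems.append(f"asr.mode is not a known mode: {mode!r}")

    return problems
```

An empty list means the fields checked here look consistent with the documented options. It says nothing about model paths or LLM settings.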

The default config reads the API key from the DASHSCOPE_API_KEY environment variable:

export DASHSCOPE_API_KEY="YOUR_KEY_HERE"

⚠️ Important: LLM features require an API key from Alibaba Cloud Bailian, which provides a free usage quota.

Config Presets

The repository provides runnable config presets with modular installation:

| Status | Config File | Avatar Type | 2D/3D | One-Click Setup Command |
|---|---|---|---|---|
| ✅ | config/config_wav2lip.yaml | wav2lip | 2D | bash scripts/setup-env.sh wav2lip |
| ✅ | config/config_musetalk.yaml | musetalk | 2D | bash scripts/setup-env.sh musetalk |
| ✅ | config/config_talkinggaussian.yaml | talkinggaussian | 3D | bash scripts/setup-env.sh talkinggaussian |
| ⬜ | config/config_ernerf.yaml | ernerf | 3D | bash scripts/setup-env.sh ernerf |

Recommended engine switch procedure:

  1. Install the target avatar module.
  2. Start with matching config/config_*.yaml.
  3. Verify model and asset paths in the config.
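
Steps 1 and 2 follow a fixed naming pattern in the preset table, so the commands for a given engine can be derived mechanically. A small sketch, assuming the one-click scripts are used for installation. `switch_commands` is a hypothetical helper and only prints nothing and runs nothing by itself.

```python
# Config presets as listed in the preset table of this README.
PRESETS = {
    "wav2lip": "config/config_wav2lip.yaml",
    "musetalk": "config/config_musetalk.yaml",
    "talkinggaussian": "config/config_talkinggaussian.yaml",
    "ernerf": "config/config_ernerf.yaml",
}


def switch_commands(avatar):
    """Hypothetical helper: shell commands to install and start one engine."""
    if avatar not in PRESETS:
        raise ValueError(f"unknown avatar engine: {avatar!r}")
    return [
        f"bash scripts/setup-env.sh {avatar}",          # step 1: install module
        f"bash scripts/start-all.sh {PRESETS[avatar]}",  # step 2: matching config
    ]
```

Step 3 still has to be done by hand, since model and asset paths depend on where the weights were placed.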

Models & Data

Quick Download

| Avatar | Type | Download Method |
|---|---|---|
| Wav2Lip | 2D | Download wav2lip256.pth + wav2lip256_avatar1.tar.gz from Quark Drive (from LiveTalking) |
| MuseTalk | 2D | bash scripts/download_musetalk_weights.sh |
| TalkingGaussian | 3D | 🔗 TBD |
| ER-NeRF | 3D | 🔗 TBD |

Placement Instructions

# Wav2Lip
# 1. Rename wav2lip256.pth to wav2lip.pth and place it in models/
# 2. Extract wav2lip256_avatar1.tar.gz to data/avatars/

# MuseTalk (auto download to correct path)
bash scripts/download_musetalk_weights.sh

# TalkingGaussian
# Extract talkinggaussian_obama.tar.gz to data/avatars/
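
Before starting the backend, it can help to confirm that the Wav2Lip files ended up where the placement notes say. A minimal sketch using only the two locations named above; `missing_wav2lip_assets` is a hypothetical helper, and it only checks that data/avatars/ is non-empty because the name of the extracted avatar directory is not stated here.

```python
from pathlib import Path


def missing_wav2lip_assets(root):
    """Hypothetical helper: return expected Wav2Lip paths that are absent."""
    root = Path(root)
    missing = []

    # Renamed weights file, per the placement notes.
    weights = root / "models" / "wav2lip.pth"
    if not weights.is_file():
        missing.append(weights.relative_to(root).as_posix())

    # Extracted avatar assets; any entry inside counts as present.
    avatars = root / "data" / "avatars"
    if not avatars.is_dir() or not any(avatars.iterdir()):
        missing.append(avatars.relative_to(root).as_posix())

    return missing
```

Running `missing_wav2lip_assets(".")` from the repository root returns an empty list when both locations are populated.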

💡 Advanced usage: for custom avatar assets, directory structure details, and config path setup, see FAQ.md.


Backend APIs

Main endpoints (see src/server/server.py):

  • POST /offer: WebRTC SDP handshake
  • POST /human: text dialogue (type=chat calls LLM, type=echo for text playback)
  • POST /asr: upload audio → ASR → LLM → drive avatar speech
  • POST /humanaudio: upload audio file to drive avatar speech
  • POST /record: start/stop recording
  • GET /download/{filename}: download recorded files
  • GET /health: health check
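
As a rough illustration of the text dialogue endpoint, the sketch below builds (but does not send) a POST /human request with the standard library. Only the path and the type values chat and echo come from this README. The JSON encoding and the "text" field name are assumptions, and a real request may need further fields, so check src/server/ for the actual schema. `build_human_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8010"  # default backend port from this README


def build_human_request(text, mode="chat", base_url=BASE_URL):
    """Hypothetical helper: build a POST /human request without sending it."""
    if mode not in ("chat", "echo"):
        raise ValueError("mode must be 'chat' (LLM reply) or 'echo' (read text)")
    # ASSUMPTION: JSON body with a "text" field; only "type" is documented here.
    payload = {"type": mode, "text": text}
    return Request(
        f"{base_url}/human",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the request could be sent with `urllib.request.urlopen(build_human_request("Hello", mode="echo"))`.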

FAQ

See FAQ.md.


References

You can also refer to Linly-Talker and LiveTalking for additional context.

Acknowledgements

  • LiveTalking: provided great references for real-time avatar/WebRTC streaming pipelines; this repo refactors and extends that design.
  • Linly-Talker: the upstream multimodal digital human system integrated into this real-time streaming version.

License

This repository uses Apache License 2.0 (consistent with LiveTalking).

Caution

Please comply with local laws and regulations when using or deploying this project (copyright, privacy, data protection, etc.).

See LICENSE and NOTICE for details.
