FlashTTS
May 17, 2025
Powered by state-of-the-art models such as SparkTTS, OrpheusTTS, and MegaTTS 3, FlashTTS delivers high-quality Mandarin speech synthesis and zero-shot voice cloning. With a clean, intuitive Web interface, you can quickly generate natural, lifelike voices for dubbing, narration, accessibility, virtual characters, and more.
If you find FlashTTS helpful, please leave us a ⭐ Star!
✨ Highlights
| | Feature | Description |
|---|---|---|
| 🚀 | Multi-backend Acceleration | Supports high-performance inference engines such as vllm, sglang, llama-cpp, mlx-lm, and tensorrt-llm |
| 🎯 | High Concurrency | Dynamic batching and asynchronous queues to handle heavy traffic with ease |
| 🎛️ | Full Parameter Control | Adjust pitch, speaking rate, temperature, emotion tags, and more |
| 📱 | Lightweight Deployment | Built on FastAPI; starts with a single command and has minimal dependencies |
| 🔊 | Long-form Synthesis | Supports very long texts while maintaining consistent voice quality |
| 🔄 | Streaming TTS | Generates and plays audio in real time, reducing wait time and improving interactivity |
| 🎭 | Multi-character Dialog | Synthesizes multiple roles within the same text, ideal for script dubbing |
| 🎨 | Modern Frontend | Web-ready, responsive interface |
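The "High Concurrency" row mentions dynamic batching over asynchronous queues. The sketch below illustrates that general technique with `asyncio`: requests wait in a queue, and a worker groups up to `max_batch_size` of them (or whatever arrives within `max_wait_s`) into one backend call. It is not FlashTTS's actual scheduler; `fake_synthesize_batch`, `max_batch_size`, and `max_wait_s` are hypothetical stand-ins for a real batched TTS call and its tuning knobs.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class Request:
    text: str
    # Future the worker resolves with this request's result.
    future: asyncio.Future = field(repr=False)


async def fake_synthesize_batch(texts: list[str]) -> list[str]:
    """Hypothetical stand-in for one batched TTS backend call."""
    await asyncio.sleep(0.01)  # pretend the model takes some time
    return [f"audio<{t}>" for t in texts]


async def batch_worker(queue: asyncio.Queue, max_batch_size: int = 8,
                       max_wait_s: float = 0.02) -> None:
    loop = asyncio.get_running_loop()
    while True:
        # Block until at least one request is available.
        batch = [await queue.get()]
        deadline = loop.time() + max_wait_s
        # Keep collecting until the batch is full or the wait window closes.
        while len(batch) < max_batch_size:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        # One backend call serves the whole batch; results go back per request.
        results = await fake_synthesize_batch([r.text for r in batch])
        for req, audio in zip(batch, results):
            req.future.set_result(audio)


async def submit(queue: asyncio.Queue, text: str) -> str:
    future = asyncio.get_running_loop().create_future()
    await queue.put(Request(text, future))
    return await future


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    try:
        # Ten concurrent requests are served by a small number of batched calls.
        return await asyncio.gather(*(submit(queue, f"line {i}") for i in range(10)))
    finally:
        worker.cancel()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The wait window trades a few milliseconds of latency per request for fewer, larger model calls, which is usually what lets a GPU backend keep up under heavy traffic.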
🖼️ Frontend Demo
https://github.com/user-attachments/assets/1bd9d586-fac7-4016-b955-5a58d8fb9d7e
🔈 Voice Samples
Below are demos showcasing FlashTTS’s cloning capabilities across different models and characters.
SparkTTS Model
- Donald Trump (EN)
- Donald Trump (ZH)
- Nezha
- Li Jing
- Yu Chengdong
- Xu Zhisheng
MegaTTS 3 Model
- Cai Xukun
- Taiyi Zhenren
OrpheusTTS (ZH) Model
- Changle
- Baizhi
Quick Start
We recommend installing flashtts with pip in a Python 3.8–3.12 environment:
```bash
pip install flashtts
```
For detailed installation steps, see the installation guide.
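Because the recommended range is Python 3.8–3.12, a quick pre-flight check before running `pip install flashtts` can save a failed install. This is a minimal sketch; the helper name `is_supported_python` is hypothetical, and the bounds simply mirror the recommendation above.

```python
import sys


def is_supported_python(version=None) -> bool:
    """Return True if `version` (default: the running interpreter) is within 3.8-3.12."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 8 <= minor <= 12


if __name__ == "__main__":
    if is_supported_python():
        print("Python version is within the recommended range for flashtts")
    else:
        print(f"Python {sys.version.split()[0]} is outside the recommended 3.8-3.12 range")
```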
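Local inference command:
```bash
# Replace ./models/your_model with your local model directory.
# Other optional parameters can be appended to the command.
flashtts infer \
  -i "hello world." \
  -o output.wav \
  -m ./models/your_model \
  -b vllm
```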
For detailed usage, see quick_start.md.
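To drive the CLI from a Python script (for example, to synthesize several lines in a loop), one option is to build the same command with `subprocess`. This sketch uses only the flags shown above (`-i`, `-o`, `-m`, `-b`); `build_infer_cmd` and `run_infer` are hypothetical helper names, and the model path is a placeholder you must replace.

```python
from __future__ import annotations

import shlex
import subprocess


def build_infer_cmd(text: str, output: str, model_dir: str,
                    backend: str = "vllm") -> list[str]:
    """Assemble a `flashtts infer` command using only the flags shown in this README."""
    return ["flashtts", "infer", "-i", text, "-o", output, "-m", model_dir, "-b", backend]


def run_infer(text: str, output: str, model_dir: str, backend: str = "vllm") -> None:
    cmd = build_infer_cmd(text, output, model_dir, backend)
    # Print a shell-quoted version so the exact command is easy to reproduce by hand.
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if flashtts exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

For example, `run_infer("hello world.", "output.wav", "./models/your_model")` runs the same command as the shell example. Passing a list instead of a single string avoids shell quoting problems when the input text contains spaces or quotes.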
Server deployment:
```bash
# --fix_voice: whether to fix the Spark-TTS timbre (female and male).
flashtts serve \
  --model_path Spark-TTS-0.5B \
  --backend vllm \
  --role_dir data/roles \
  --llm_device cuda \
  --tokenizer_device cuda \
  --detokenizer_device cuda \
  --wav2vec_attn_implementation sdpa \
  --llm_attn_implementation sdpa \
  --torch_dtype "bfloat16" \
  --max_length 32768 \
  --llm_gpu_memory_utilization 0.6 \
  --fix_voice \
  --host 0.0.0.0 \
  --port 8000
```
Web UI: http://localhost:8000
API documentation: http://localhost:8000/docs
For detailed deployment instructions, see server.md.
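Since the server is built on FastAPI, the interactive page at `/docs` is backed by a machine-readable OpenAPI schema, which FastAPI serves at `/openapi.json` by default. The sketch below lists every endpoint so you can discover the API without guessing route names. It assumes the server runs at `http://localhost:8000` and keeps FastAPI's default schema URL; `fetch_openapi` and `list_endpoints` are hypothetical helper names.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches --host/--port in the serve command
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_openapi(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema (FastAPI's default location is /openapi.json)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(spec: dict) -> list[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    endpoints = []
    for path, path_item in sorted(spec.get("paths", {}).items()):
        for key in path_item:
            # Path items may also hold non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return endpoints
```

Once the server is up, `print("\n".join(list_endpoints(fetch_openapi())))` prints one route per line.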
⚡ Inference Speed
Test environment: A800 GPU · Model: Spark-TTS-0.5B · Test script: speed_test.py
| Scenario | Engine | Device | Audio Length (s) | Inference Time (s) | RTF |
|---|---|---|---|---|---|
| Short | llama-cpp | CPU | 7.48 | 6.81 | 0.91 |
| Short | torch | GPU | 7.18 | 7.68 | 1.07 |
| Short | vllm | GPU | 7.24 | 1.66 | 0.23 |
| Short | sglang | GPU | 7.58 | 1.07 | 0.14 |
| Long | llama-cpp | CPU | 121.98 | 117.83 | 0.97 |
| Long | torch | GPU | 113.70 | 107.17 | 0.94 |
| Long | vllm | GPU | 111.82 | 7.28 | 0.07 |
| Long | sglang | GPU | 117.02 | 4.20 | 0.04 |
RTF (real-time factor) = inference time ÷ audio length. RTF < 1 means audio is synthesized faster than real time.
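The RTF column is inference time divided by audio length. The snippet below recomputes it for a few rows of the table, and the same check can be applied to your own `speed_test.py` measurements; the `rtf` helper is a hypothetical name, not part of FlashTTS.

```python
def rtf(inference_time_s: float, audio_length_s: float) -> float:
    """Real-time factor: seconds of compute per second of generated audio."""
    if audio_length_s <= 0:
        raise ValueError("audio length must be positive")
    return inference_time_s / audio_length_s


# (scenario, engine, audio length in s, inference time in s), copied from the table above.
ROWS = [
    ("Short", "torch", 7.18, 7.68),
    ("Short", "vllm", 7.24, 1.66),
    ("Long", "sglang", 117.02, 4.20),
]

if __name__ == "__main__":
    for scenario, engine, audio_s, infer_s in ROWS:
        value = rtf(infer_s, audio_s)
        status = "faster than real time" if value < 1 else "slower than real time"
        print(f"{scenario:5} {engine:6} RTF={value:.2f} ({status})")
```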
⚙️ Usage Tips
- SparkTTS weights must be `bfloat16` or `float32`; using `float16` will cause errors.
- If you experience long silent gaps, try increasing `repetition_penalty` (> 1.0).
- OrpheusTTS supports inserting `<tag>` in text to control emotion. See `LANG_MAP` in `orpheus_engine.py`.
- For safety reasons, MegaTTS 3 does not publish its WaveVAE encoder, so reference audio must be downloaded by following the official instructions: reference audio.
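The `repetition_penalty` tip refers to a standard sampling control. In the widely used CTRL-style formulation (the one behind Hugging Face Transformers' `RepetitionPenaltyLogitsProcessor`), every token that already appeared has its logit divided by the penalty if positive and multiplied by it if negative, so values above 1.0 make repeats less likely. We assume, without confirmation from the source, that FlashTTS's backends apply this rule and that long silences come from the model repeating the same audio tokens; the sketch below only illustrates the mechanism on plain Python lists.

```python
from __future__ import annotations

import math


def apply_repetition_penalty(logits: list[float], previous_tokens: list[int],
                             penalty: float) -> list[float]:
    """CTRL-style repetition penalty applied to one decoding step's logits (sketch)."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    # Each previously seen token is penalized once, however often it appeared.
    for token in set(previous_tokens):
        score = adjusted[token]
        # With penalty > 1, dividing a positive logit or multiplying a negative one
        # lowers it, so the token is less likely to be sampled again.
        adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with logits `[2.0, 1.0, -1.0]`, previously generated tokens `[0, 2]`, and a penalty of 2.0, the adjusted logits are `[1.0, 1.0, -2.0]`: token 0 loses its lead over token 1, and token 2 becomes even less likely. A penalty of exactly 1.0 leaves the logits unchanged.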
🤝 Acknowledgments
⚠️ Disclaimer
FlashTTS is provided for academic research, education, and lawful purposes only, such as accessibility assistance and personalized speech synthesis. Do not use it for fraud, impersonation, deepfakes, or other illegal activities. Users are responsible for any misuse.
License
This project follows the same license as Spark-TTS. See LICENSE for details.