Kitten TTS

May 9, 2026

New: Kitten TTS v0.8 is out -- 15M, 40M, and 80M parameter models now available.

Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX. With models ranging from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU.

Status: Developer preview -- APIs may change between releases.

Commercial support is available. For integration assistance, custom voices, or enterprise licensing, contact us.

Features

  • Ultra-lightweight -- Model sizes from 25 MB (int8) to 80 MB, suitable for edge deployment
  • CPU-optimized -- ONNX-based inference runs efficiently without a GPU
  • 8 built-in voices -- Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo
  • Adjustable speech speed -- Control playback rate via the speed parameter
  • Text preprocessing -- Built-in pipeline handles numbers, currencies, units, and more
  • 24 kHz output -- High-quality audio at a standard sample rate

Available Models

| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| kitten-tts-mini | 80M | 80 MB | KittenML/kitten-tts-mini-0.8 |
| kitten-tts-micro | 40M | 41 MB | KittenML/kitten-tts-micro-0.8 |
| kitten-tts-nano | 15M | 56 MB | KittenML/kitten-tts-nano-0.8 |
| kitten-tts-nano (int8) | 15M | 25 MB | KittenML/kitten-tts-nano-0.8-int8 |

Note: Some users have reported issues with the kitten-tts-nano-0.8-int8 model. If you encounter problems, please open an issue.
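
When disk space is the constraint, the sizes listed above can drive the choice of model. The following is a minimal sketch, not part of the library: `pick_model` is a hypothetical helper, and it assumes that more parameters means better quality and that, at equal parameter count, the larger (non-quantized) file is preferable.

```python
# Repository IDs, parameter counts (millions), and sizes (MB) copied from the
# Available Models list. The selection logic below is a hypothetical helper.
MODELS = [
    ("KittenML/kitten-tts-mini-0.8", 80, 80),
    ("KittenML/kitten-tts-micro-0.8", 40, 41),
    ("KittenML/kitten-tts-nano-0.8", 15, 56),
    ("KittenML/kitten-tts-nano-0.8-int8", 15, 25),
]


def pick_model(max_size_mb):
    """Return the repo ID of the largest model that fits in max_size_mb."""
    candidates = [m for m in MODELS if m[2] <= max_size_mb]
    if not candidates:
        raise ValueError(f"No model fits in {max_size_mb} MB")
    # Assumption: prefer more parameters, then the larger (non-int8) file.
    repo_id, _params, _size = max(candidates, key=lambda m: (m[1], m[2]))
    return repo_id


print(pick_model(60))
```

The returned string can be passed straight to `KittenTTS(...)` as the model name.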

Demo

https://github.com/user-attachments/assets/d80120f2-c751-407e-a166-068dd1dd9e8d

Try it online

Try Kitten TTS directly in your browser on Hugging Face Spaces.

Quick Start

Prerequisites

  • Python 3.8 or later
  • pip

Installation

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl

Basic Usage

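import soundfile as sf
from kittentts import KittenTTS

# Load the 80M-parameter model from Hugging Face Hub
model = KittenTTS("KittenML/kitten-tts-mini-0.8")

# Synthesize speech; the result is an array of audio samples at 24 kHz
audio = model.generate("This high-quality TTS model runs without a GPU.", voice="Jasper")

# Write the samples to a WAV file at the 24 kHz output rate
sf.write("output.wav", audio, 24000)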
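
If soundfile is not available, the same file can be written with Python's standard-library `wave` module. This is a sketch under two assumptions that the text above does not confirm: that the generated audio is a one-dimensional sequence of floats, and that its values lie in the range -1.0 to 1.0. `write_wav_pcm16` is a hypothetical helper, and a synthetic tone stands in for model output so the block runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_pcm16(path, samples, sample_rate=24000):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# One second of a 440 Hz tone at 24 kHz, standing in for model output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
out_path = os.path.join(tempfile.gettempdir(), "kitten_tone.wav")
write_wav_pcm16(out_path, tone)
```

With a real model, the call would be `write_wav_pcm16("output.wav", audio)`.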

Advanced Usage

# Adjust speech speed (default: 1.0)
audio = model.generate("Hello, world.", voice="Luna", speed=1.2)

# Save directly to a file
model.generate_to_file("Hello, world.", "output.wav", voice="Bruno", speed=0.9)

# List available voices
print(model.available_voices)
# ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']
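
To compare voices side by side, the calls above can be combined into a loop that renders one file per voice. This is a minimal sketch: `render_all_voices` is a hypothetical helper, and it relies only on `available_voices` and `generate_to_file` as shown above.

```python
import os


def render_all_voices(model, text, out_dir, speed=1.0):
    """Render `text` once per available voice; return {voice: file path}."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for voice in model.available_voices:
        # File name derived from the voice name, e.g. "bella.wav"
        path = os.path.join(out_dir, f"{voice.lower()}.wav")
        model.generate_to_file(text, path, voice=voice, speed=speed)
        paths[voice] = path
    return paths


# Usage with a loaded model (not executed here):
# paths = render_all_voices(model, "Hello, world.", "voice_samples")
```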

Using with GPU

Install the GPU dependencies:

pip install -r requirements_gpu.txt

Then select the CUDA backend when loading the model:

m = KittenTTS("KittenML/kitten-tts-mini-0.8", backend="cuda")

See example_cuda.py for an example.
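
On machines that may or may not have a working CUDA setup, one option is to try the GPU backend first and fall back to the default CPU load. The sketch below is hypothetical: `load_with_fallback` is not part of the library, and it assumes that a failed CUDA load raises an exception. The exception type is not documented here, so the sketch catches `Exception` broadly.

```python
def load_with_fallback(factory, model_name, backend="cuda"):
    """Try `backend` first; on failure, load with the default settings.

    `factory` is the model class, e.g. KittenTTS. Returns (model, backend_used).
    """
    try:
        return factory(model_name, backend=backend), backend
    except Exception as exc:  # assumption: exact exception type is unknown
        print(f"{backend} backend unavailable ({exc}); falling back to CPU")
        return factory(model_name), "cpu"


# Usage (not executed here):
# from kittentts import KittenTTS
# model, used = load_with_fallback(KittenTTS, "KittenML/kitten-tts-mini-0.8")
```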

API Reference

KittenTTS(model_name, cache_dir=None)

Load a model from Hugging Face Hub.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_name | str | "KittenML/kitten-tts-nano-0.8" | Hugging Face repository ID |
| cache_dir | str | None | Local directory for caching downloaded model files |

model.generate(text, voice, speed, clean_text)

Synthesize speech from text, returning a NumPy array of audio samples at 24 kHz.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | -- | Input text to synthesize |
| voice | str | "expr-voice-5-m" | Voice name (see available voices) |
| speed | float | 1.0 | Speech speed multiplier |
| clean_text | bool | False | Preprocess text (expand numbers, currencies, etc.) |
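
Because the output rate is fixed at 24 kHz, the length of a clip follows directly from its sample count. The helpers below are a hypothetical illustration of that arithmetic, assuming the returned array is one-dimensional so that `len(audio)` is the number of samples.

```python
SAMPLE_RATE = 24000  # Hz, the output rate stated above


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Clip length in seconds for a given number of samples."""
    return num_samples / sample_rate


def samples_for(seconds, sample_rate=SAMPLE_RATE):
    """Number of samples needed for a clip of the given length."""
    return int(round(seconds * sample_rate))


print(duration_seconds(36000))  # 1.5
print(samples_for(2.5))         # 60000
```

With a real model, `duration_seconds(len(audio))` gives the length of the generated speech.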

model.generate_to_file(text, output_path, voice, speed, sample_rate, clean_text)

Synthesize speech and write directly to an audio file.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | -- | Input text to synthesize |
| output_path | str | -- | Path to save the audio file |
| voice | str | "expr-voice-5-m" | Voice name |
| speed | float | 1.0 | Speech speed multiplier |
| sample_rate | int | 24000 | Audio sample rate in Hz |
| clean_text | bool | True | Preprocess text (expand numbers, currencies, etc.) |

normalize_text(text, locale="en-US", return_spans=False)

Normalize text for TTS without generating audio.

from kittentts import normalize_text

normalized = normalize_text("Dr. Rivera paid $12.50 at 3:05 p.m.")
# "Doctor Rivera paid twelve dollars and fifty cents at three oh five p m."

result = normalize_text("Fig. 2", return_spans=True)
print(result.text)
print(result.spans)

When return_spans=True, the result includes original-to-normalized character spans for changed segments such as abbreviations, dates, times, numbers, currency, URLs, and punctuation.

model.available_voices

Returns a list of available voice names: ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']

System Requirements

  • Operating system: Linux, macOS, or Windows
  • Python: 3.8 or later
  • Hardware: Runs on CPU; no GPU required
  • Disk space: 25-80 MB depending on model variant

A virtual environment (conda, venv, or similar) is recommended to avoid dependency conflicts.
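
The Python and disk-space requirements above can be checked before installing. This is a rough sketch using only the standard library: `check_environment` is a hypothetical helper, and it checks free space on the current directory's drive, which may differ from where model files are actually cached.

```python
import shutil
import sys


def check_environment(path=".", min_python=(3, 8), min_free_mb=80):
    """Report whether the interpreter and free disk space meet the minimums."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return {
        "python_ok": sys.version_info >= min_python,
        "disk_ok": free_mb >= min_free_mb,  # 80 MB covers the largest model
        "free_mb": round(free_mb, 1),
    }


report = check_environment()
print(report)
```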

Roadmap

  • Release optimized inference engine
  • Release mobile SDK
  • Release higher quality TTS models
  • Release multilingual TTS
  • Release KittenASR
  • Need anything else? Let us know

Commercial Support

We offer commercial support for teams integrating Kitten TTS into their products. This includes integration assistance, custom voice development, and enterprise licensing.

Contact us or email info@stellonlabs.com to discuss your requirements.

Community and Support

License

This project is licensed under the Apache License 2.0.