Maise - Mobile Artificial Intelligence Speech Engine

February 27, 2026

Maise is an open-source Android speech engine that provides high-quality, on-device text-to-speech (TTS) synthesis and automatic speech recognition (ASR). The TTS component is implemented as an Android system TTS service, so it works out of the box with any app that uses the standard Android TextToSpeech API, with no special integration required. The ASR component is implemented as an Android RecognitionService, compatible with any app that uses the standard SpeechRecognizer API.

How It Works

All processing runs fully on-device using ONNX Runtime.

Text-to-Speech

1. Text normalization: raw input text is cleaned and normalized (numbers, abbreviations, punctuation, etc.)
2. Phonemization: Open Phonemizer converts the normalized text into phoneme sequences
3. Synthesis: the phonemes are fed into Kokoro, a high-quality multilingual neural TTS model, to produce a raw PCM audio waveform
4. Streaming playback: sentences are synthesized and played concurrently using a producer-consumer pipeline, so audio starts playing before the full text has been synthesized
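
The streaming playback step can be sketched with a bounded queue between a synthesis thread and a playback loop. This is a minimal illustration in plain Java, not Maise's actual implementation: the class name `StreamingPipelineSketch`, the `synthesize` and `play` callbacks, and the queue capacity of 2 are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of sentence-level streaming; not Maise's actual classes.
final class StreamingPipelineSketch {
    // Sentinel marking end of stream; compared by identity and never played.
    private static final short[] END = new short[0];

    /**
     * Synthesizes sentences on a producer thread while the calling thread
     * plays finished audio in order. Handling of a failing synthesize call
     * is omitted to keep the sketch short.
     */
    static void stream(List<String> sentences,
                       Function<String, short[]> synthesize,
                       Consumer<short[]> play) {
        // Bounded queue (assumed capacity): the producer stays at most two sentences ahead.
        BlockingQueue<short[]> queue = new ArrayBlockingQueue<>(2);

        Thread producer = new Thread(() -> {
            try {
                for (String sentence : sentences) {
                    queue.put(synthesize.apply(sentence)); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        try {
            // Playback of one sentence overlaps synthesis of the following ones.
            for (short[] pcm = queue.take(); pcm != END; pcm = queue.take()) {
                play.accept(pcm);
            }
            producer.join();
        } catch (InterruptedException e) {
            producer.interrupt();
            Thread.currentThread().interrupt();
        }
    }
}
```

The bounded queue is the design choice that matters here: playback can begin as soon as the first sentence is ready, while back-pressure keeps the synthesizer from buffering an entire long text in memory.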

Audio output is 24 kHz mono 16-bit PCM.
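
As a quick check of what this format implies for buffer sizes, the data rate follows directly from the sample rate, channel count, and sample width. The helper below is illustrative only; `PcmMath` and `bytesFor` are names invented for this example.

```java
// Hypothetical helper for sizing raw PCM buffers; not part of Maise.
final class PcmMath {
    /** Returns the number of bytes of raw PCM needed for the given duration. */
    static long bytesFor(int sampleRateHz, int channels, int bitsPerSample, long millis) {
        // One frame holds one sample per channel.
        long bytesPerFrame = (long) channels * (bitsPerSample / 8);
        return sampleRateHz * bytesPerFrame * millis / 1000;
    }
}
```

For example, `PcmMath.bytesFor(24000, 1, 16, 1000)` evaluates to 48000, so one second of synthesized audio occupies 48,000 bytes.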

Automatic Speech Recognition

1. Recording: 16 kHz mono 16-bit PCM audio is captured from the microphone
2. Log-mel spectrogram: a Whisper-compatible 80-band log-mel spectrogram is computed on-device
3. Transcription: the spectrogram is fed through distil-whisper/distil-small.en, an encoder-decoder Transformer model, using greedy decoding to produce the transcribed text
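
Greedy decoding, as used in the transcription step, picks the single highest-scoring token at every step until an end-of-text token appears. The sketch below shows only that loop. The `DecoderStep` interface is a hypothetical stand-in for a decoder forward pass, and the token IDs and length limit are parameters of the example, not values taken from Maise or Whisper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a greedy decoding loop; not Maise's actual code.
final class GreedyDecoderSketch {
    /** Stand-in for one decoder pass: scores over the vocabulary for the next token. */
    interface DecoderStep {
        float[] nextLogits(List<Integer> tokensSoFar);
    }

    /** Index of the largest value; the first index wins on ties. */
    static int argmax(float[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Repeatedly appends the most likely token until endToken or maxTokens is reached. */
    static List<Integer> decode(DecoderStep step, int startToken, int endToken, int maxTokens) {
        List<Integer> tokens = new ArrayList<>();
        tokens.add(startToken);
        while (tokens.size() < maxTokens) {
            int next = argmax(step.nextLogits(tokens));
            if (next == endToken) {
                break;
            }
            tokens.add(next);
        }
        // Drop the start token so only generated tokens are returned.
        return new ArrayList<>(tokens.subList(1, tokens.size()));
    }
}
```

Greedy decoding needs one decoder pass per output token and keeps no alternative hypotheses, which makes it cheaper than beam search on a mobile device.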

Voices

Maise ships with a large collection of Kokoro voices across multiple languages.

Language	Voices
English (US)	alloy, aoede, bella, heart, jessica, kore, nicole, nova, river, sarah, sky, adam, echo, eric, fenrir, liam, michael, onyx, puck, santa
English (UK)	alice, emma, isabella, lily, daniel, fable, george, lewis
German	dora, alex, santa
French	siwis
Greek	alpha-f, beta-f, omega-m, psi-m
Italian	sara, nicola
Japanese	alpha-f, gongitsune, nezumi, tebukuro, kumo
Portuguese (BR)	dora, alex, santa
Chinese (Simplified)	xiaobei, xiaoni, xiaoxiao, xiaoyi, yunjian, yunxi, yunxia, yunyang

The default voice is en-US-heart-kokoro.
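
The default identifier suggests a naming pattern of language, region, voice name, and model name joined by hyphens. That pattern is inferred here from this single identifier and is not confirmed by the project, so treat the parser below as a sketch under that assumption. `VoiceIdSketch` is a hypothetical name, and because some voice names in the table contain a hyphen (for example alpha-f), the voice is taken as everything between the region and the final segment.

```java
import java.util.Arrays;

// Hypothetical parser for identifiers shaped like "en-US-heart-kokoro".
// The <language>-<REGION>-<voice>-<model> layout is an assumption.
final class VoiceIdSketch {
    final String locale;
    final String voice;
    final String model;

    private VoiceIdSketch(String locale, String voice, String model) {
        this.locale = locale;
        this.voice = voice;
        this.model = model;
    }

    static VoiceIdSketch parse(String id) {
        String[] parts = id.split("-");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Unexpected voice id: " + id);
        }
        String locale = parts[0] + "-" + parts[1];
        String model = parts[parts.length - 1];
        // Voice names may themselves contain hyphens, so join the middle segments.
        String voice = String.join("-", Arrays.copyOfRange(parts, 2, parts.length - 1));
        return new VoiceIdSketch(locale, voice, model);
    }
}
```

With this layout, `VoiceIdSketch.parse("en-US-heart-kokoro")` yields the locale `en-US`, the voice `heart`, and the model `kokoro`.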

App

The Maise app provides a simple interface for:

- Selecting a voice from the full list
- Entering text and previewing speech synthesis directly in-app
- Opening Android TTS settings to configure Maise as the system default

The selected voice is persisted and shared with the background TTS service so your preference is respected system-wide.

Setup

Text-to-Speech

To use Maise as your system TTS engine, set it as the default in your device settings:

Settings > Accessibility > Text-to-Speech Output

Select Maise as the preferred engine. After that, any app using the Android TextToSpeech API will use Maise automatically.

Automatic Speech Recognition

To use Maise as your system speech recognizer, set it as the default in your device settings:

Settings > Apps > Default Apps > Assist & voice input

Select Maise as the preferred recognizer. After that, any app using the Android SpeechRecognizer API will use Maise automatically. The RECORD_AUDIO permission must be granted to the app.

Cloning

git clone https://github.com/Mobile-Artificial-Intelligence/maise.git

Building

./gradlew :app:assembleRelease

The output APK will be at:

Release: app/build/outputs/apk/release/app-release.apk
Debug (built with ./gradlew :app:assembleDebug instead): app/build/outputs/apk/debug/app-debug.apk