React Native Speech Usage Guide
May 11, 2026
Installation
Bare React Native
Install the package using either npm or Yarn:
npm install @pocketpalai/react-native-speech
Or with Yarn:
yarn add @pocketpalai/react-native-speech
Expo
For Expo projects, follow these steps:
- Install the package:
npx expo install @pocketpalai/react-native-speech
- The library is not supported in Expo Go, so generate the native projects:
npx expo prebuild
API Overview
For text-to-speech, the library exports the Speech class, which provides methods for speech synthesis and event handling:
import Speech from '@pocketpalai/react-native-speech';
Constants
Static constants exposed by the Speech class.
Values
maxInputLength
The maximum number of characters allowed in a single call to the speak methods.
Android enforces this limit, which is determined by TextToSpeech.getMaxSpeechInputLength. If your text exceeds the limit, you must split it into smaller utterances on the JavaScript side. iOS imposes no system limit on synthesis input, so by default the Speech class returns Number.MAX_VALUE there.
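One way to do that splitting is sketched below. The helper splitIntoUtterances is hypothetical and not part of the library; it assumes that breaking at whitespace is acceptable for your text and falls back to a hard cut when a single word exceeds the limit.

```typescript
// Hypothetical helper (not part of the library): split text into pieces
// that are each at most maxLength characters, preferring whitespace breaks.
function splitIntoUtterances(text: string, maxLength: number): string[] {
  if (!(maxLength >= 1)) {
    throw new RangeError('maxLength must be at least 1');
  }
  const pieces: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLength) {
    // Inspect one character past the limit so a break exactly at the
    // limit is still found.
    const windowText = rest.slice(0, maxLength + 1);
    // Index of the last whitespace character in the window, or -1.
    let cut = windowText.search(/\s\S*$/);
    if (cut <= 0) {
      // No usable whitespace: cut in the middle of the word.
      cut = maxLength;
    }
    pieces.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) {
    pieces.push(rest);
  }
  return pieces;
}
```

In an app, the second argument would be Speech.maxInputLength, and each returned piece would be passed to Speech.speak in turn.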
Getting Available Voices
Retrieve a list of all available voices on the device. Optionally, you can filter voices by providing a language code or tag (IETF BCP 47 language tag).
API Definition:
Speech.getAvailableVoices(language?: string): Promise<VoiceProps[]>
VoiceProps:
- name: The name of the voice.
- identifier: The unique identifier for the voice.
- language: The language tag (e.g., 'en-US', 'fr-FR').
- quality: The quality level of the voice ('Default' or 'Enhanced').
Example Usage:
// Retrieve all voices
Speech.getAvailableVoices().then(voices => {
  console.log('Available voices:', voices);
});
// Retrieve only English voices
Speech.getAvailableVoices('en').then(voices => {
  console.log('English voices:', voices);
});
// Retrieve only English (US) voices
Speech.getAvailableVoices('en-US').then(voices => {
  console.log('English (US) voices:', voices);
});
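Once you have the list, you usually need to choose one voice from it. The sketch below shows one possible selection policy; pickVoice and the VoiceInfo interface are hypothetical names that mirror the VoiceProps fields listed above, and the preference for 'Enhanced' quality is an assumption about what an app might want.

```typescript
// Local mirror of the documented VoiceProps fields (hypothetical name).
interface VoiceInfo {
  name: string;
  identifier: string;
  language: string;
  quality: 'Default' | 'Enhanced';
}

// Hypothetical helper: prefer an exact language-tag match over a prefix
// match (e.g. 'en' matching 'en-US'), and 'Enhanced' over 'Default'.
function pickVoice(voices: VoiceInfo[], language: string): VoiceInfo | undefined {
  const wanted = language.toLowerCase();
  let best: VoiceInfo | undefined;
  let bestScore = -1;
  for (const voice of voices) {
    const tag = voice.language.toLowerCase();
    const exact = tag === wanted;
    const prefix = tag.startsWith(wanted + '-');
    if (!exact && !prefix) {
      continue; // different language
    }
    const score = (exact ? 2 : 0) + (voice.quality === 'Enhanced' ? 1 : 0);
    if (score > bestScore) {
      best = voice;
      bestScore = score;
    }
  }
  return best;
}
```

The identifier of the returned voice can then be supplied as the voice option described under Initializing Global Speech Options.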
Engine Management (Android)
These methods are available only on the Android platform and allow you to manage the underlying text-to-speech engine.
Get Available Engines
Gets a list of all available text-to-speech engines installed on the device.
API Definition
Speech.getEngines(): Promise<EngineProps[]>
Engine Properties:
- name: The unique system identifier for the engine (e.g., "com.google.android.tts").
- label: The human-readable display name (e.g., "Google Text-to-Speech Engine").
- isDefault: A boolean flag indicating if this is the default engine.
Example Usage:
Speech.getEngines().then(engines => {
  engines.forEach(engine => {
    console.log(`Engine: ${engine.label} (${engine.name})`);
    if (engine.isDefault) {
      console.log('This is the default engine.');
    }
  });
});
Set Speech Engine
Sets the text-to-speech engine to use for all subsequent synthesis.
API Definition
Speech.setEngine(engineName: string): Promise<void>
Example Usage:
// First, get available engines
const engines = await Speech.getEngines();
if (engines.length > 0) {
  // Then set a specific engine by its name
  await Speech.setEngine(engines[0].name);
}
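Picking engines[0] is arbitrary, so an app may want a more deliberate choice. The following sketch shows one option; chooseEngine and EngineInfo are hypothetical names mirroring the engine properties listed above, and the fallback order (preferred name, then system default, then first entry) is an assumption.

```typescript
// Local mirror of the documented engine properties (hypothetical name).
interface EngineInfo {
  name: string;
  label: string;
  isDefault: boolean;
}

// Hypothetical helper: use the preferred engine when installed, otherwise
// the system default, otherwise the first engine in the list (if any).
function chooseEngine(engines: EngineInfo[], preferredName: string): EngineInfo | undefined {
  return (
    engines.find(engine => engine.name === preferredName) ??
    engines.find(engine => engine.isDefault) ??
    engines[0]
  );
}
```

The name of the returned engine is what Speech.setEngine expects.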
Open Voice Data Installer
Opens the system activity that allows the user to install or manage TTS voice data.
API Definition
Speech.openVoiceDataInstaller(): Promise<void>
Example Usage:
Speech.openVoiceDataInstaller().catch(error => {
  console.error('Failed to open voice data installer.', error);
});
Initializing Global Speech Options
Set global speech options that apply to all speech synthesis calls.
API Definition:
Speech.initialize(options: VoiceOptions): void
VoiceOptions Properties:
| Property | Type | Description | Platform Support |
|---|---|---|---|
| language | string | Language code or IETF BCP 47 language tag (e.g., 'en-US', 'fr-FR') | Both |
| volume | number | Volume level from 0.0 (silent) to 1.0 (maximum) | Both |
| voice | string | Specific voice identifier to use (obtained from getAvailableVoices()) | Both |
| pitch | number | Pitch multiplier: Android 0.1–2.0, iOS 0.5–2.0 | Both |
| rate | number | Speech rate: Android 0.1–2.0, iOS varies based on AVSpeechUtterance limits | Both |
| ducking | boolean | If true, temporarily lowers audio from other apps while speech is active. Defaults to false | Both |
| silentMode | 'obey' \| 'respect' \| 'ignore' | Controls how speech interacts with the device's silent switch. Ignored if ducking is true | iOS only |
silentMode Options (iOS only):
- obey (default): Does not change the app's audio session. Speech follows the system default behavior.
- respect: Speech will be silenced by the ringer switch. Use for non-critical audio content.
- ignore: Speech will play even if the ringer is off. Use for critical audio when ducking is not desired.
Example Usage:
Speech.initialize({
  language: 'en-US',
  volume: 1.0,
  pitch: 1.2,
  rate: 0.8,
  ducking: false,
  silentMode: 'obey', // iOS only; ignored if ducking is true
});
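If option values come from user input (for example, a settings slider), it can help to clamp them to the ranges in the table above before calling initialize. The sketch below is a hypothetical helper, not library code. It assumes the documented Android ranges and the documented iOS pitch range, and it leaves the iOS rate untouched because the table gives no fixed range for it.

```typescript
type SpeechPlatform = 'android' | 'ios';

// Subset of VoiceOptions that holds numeric tuning values (hypothetical name).
interface TunableOptions {
  volume?: number;
  pitch?: number;
  rate?: number;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: return a copy with values limited to the documented ranges.
function clampVoiceOptions(options: TunableOptions, platform: SpeechPlatform): TunableOptions {
  const result: TunableOptions = {...options};
  if (result.volume !== undefined) {
    result.volume = clamp(result.volume, 0, 1);
  }
  if (result.pitch !== undefined) {
    result.pitch = clamp(result.pitch, platform === 'android' ? 0.1 : 0.5, 2);
  }
  // Only Android has a documented rate range; iOS values pass through unchanged.
  if (result.rate !== undefined && platform === 'android') {
    result.rate = clamp(result.rate, 0.1, 2);
  }
  return result;
}
```

In a React Native app the platform argument would typically come from Platform.OS.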
Resetting Speech Options
Reset all global speech options to their default values.
API Definition:
Speech.reset(): void
Example Usage:
Speech.reset();
Speaking Text
Speak a given text using the current global settings.
API Definition:
Speech.speak(text: string): Promise<void>
Example Usage:
Speech.speak('Hello, world!');
Speaking Text with Custom Options
Override global options for a specific utterance.
API Definition:
Speech.speakWithOptions(text: string, options: VoiceOptions): Promise<void>
Example Usage:
Speech.speakWithOptions('Hello!', {
  language: 'en-US',
  pitch: 1.5,
  rate: 0.8,
});
Controlling Speech
Stop Speech
Immediately stops any ongoing or queued speech synthesis.
Speech.stop().then(() => console.log('Speech stopped'));
Pause Speech
Note: On Android, this requires API 26+ (Android 8+).
Speech.pause().then(isPaused => {
  console.log(isPaused ? 'Speech paused' : 'Nothing to pause');
});
Resume Speech
Note: On Android, this requires API 26+ (Android 8+).
Speech.resume().then(isResumed => {
  console.log(isResumed ? 'Speech resumed' : 'Nothing to resume');
});
Check if Speaking
Determine if speech synthesis is currently active.
Speech.isSpeaking().then(isSpeaking => {
  console.log(isSpeaking ? 'Currently speaking or paused' : 'Not speaking');
});
Event Callbacks
Subscribe to event callbacks for speech synthesis lifecycle monitoring.
onError
Triggers when an error occurs.
const errorSubscription = Speech.onError(({id}) => {
  console.error(`Speech error (ID: ${id})`);
});
// Cleanup
errorSubscription.remove();
onStart
Triggers when speech starts.
const startSubscription = Speech.onStart(({id}) => {
  console.log(`Speech started (ID: ${id})`);
});
// Cleanup
startSubscription.remove();
onFinish
Triggers when speech completes.
const finishSubscription = Speech.onFinish(({id}) => {
  console.log(`Speech finished (ID: ${id})`);
});
// Cleanup
finishSubscription.remove();
onPause
Triggers when speech is paused.
Note: On Android, this requires API 26+ (Android 8+).
const pauseSubscription = Speech.onPause(({id}) => {
  console.log(`Speech paused (ID: ${id})`);
});
// Cleanup
pauseSubscription.remove();
onResume
Triggers when speech is resumed.
Note: On Android, this requires API 26+ (Android 8+).
const resumeSubscription = Speech.onResume(({id}) => {
  console.log(`Speech resumed (ID: ${id})`);
});
// Cleanup
resumeSubscription.remove();
onStopped
Triggers when speech is stopped.
const stoppedSubscription = Speech.onStopped(({id}) => {
  console.log(`Speech stopped (ID: ${id})`);
});
// Cleanup
stoppedSubscription.remove();
onProgress
Note: On Android, this requires API 26+ (Android 8+).
Callback Parameters:
- id: The utterance identifier.
- length: The length of the text being spoken.
- location: The current position in the spoken text.
const progressSubscription = Speech.onProgress(({id, location, length}) => {
  console.log(
    `Speech ${id} progress, current word length: ${length}, current char position: ${location}`,
  );
});
// Cleanup
progressSubscription.remove();
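A common use of progress events is to highlight the text currently being spoken. The sketch below converts a progress payload into a start/end range. It assumes that location is the zero-based start index of the current segment and that length is that segment's character count; the guide does not spell this out, so verify it against real events. progressToHighlight and both interfaces are hypothetical names.

```typescript
// Fields of the progress payload used here (hypothetical interface name).
interface ProgressInfo {
  location: number;
  length: number;
}

interface HighlightRange {
  start: number;
  end: number; // exclusive
}

// Hypothetical helper: turn a progress payload into a range that stays
// inside the bounds of the spoken text.
function progressToHighlight(progress: ProgressInfo, textLength: number): HighlightRange {
  const start = Math.max(0, Math.min(progress.location, textLength));
  const end = Math.max(start, Math.min(progress.location + progress.length, textLength));
  return {start, end};
}
```

The resulting range can be stored in component state and passed as a single-element highlights array to the HighlightedText component described in the next section.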
HighlightedText
The HighlightedText component allows you to display text with customizable highlighted segments. This is especially useful for emphasizing parts of text (e.g., the currently synthesized text). In addition to the specialized properties listed below, the component accepts all standard React Native <Text> props.
Importing the Component
import {HighlightedText} from '@pocketpalai/react-native-speech';
Properties
- text
  Type: string
  The full text content to be displayed.
- highlightedStyle
  Type: StyleProp<TextStyle>
  The base style applied to all highlighted segments. This style can be overridden by segment-specific styles defined in the highlights prop.
- highlights
  Type: Array<{ start: number; end: number; style?: StyleProp<TextStyle> }>
  An array of objects that define which parts of the text should be highlighted. Each object includes:
  - start: The starting character index of the segment.
  - end: The ending character index of the segment.
  - style (optional): Custom style for this particular segment.
- onHighlightedPress
  Type: (segment: { text: string; start: number; end: number }) => void
  A callback function that is invoked when a highlighted segment is pressed. The function receives an object containing:
  - text: The text content of the pressed segment.
  - start: The starting index of the segment.
  - end: The ending index of the segment.
Example
import React from 'react';
import {
  HighlightedText,
  type HighlightedSegmentProps,
  type HighlightedSegmentArgs,
} from '@pocketpalai/react-native-speech';
import {Alert, SafeAreaView, StyleSheet} from 'react-native';
const TEXT = 'This is a sample text where some parts are highlighted.';
const ExampleHighlightedText: React.FC = () => {
  const highlights: Array<HighlightedSegmentProps> = [
    {start: 10, end: 21},
    {start: 43, end: 54, style: styles.customHighlightedStyle},
  ];
  const onHighlightedPress = React.useCallback(
    ({text, start, end}: HighlightedSegmentArgs) =>
      Alert.alert(
        'Highlighted Segment',
        `Segment "${text}" starts at ${start} and ends at ${end}`,
      ),
    [],
  );
  return (
    <SafeAreaView style={styles.container}>
      <HighlightedText
        text={TEXT}
        style={styles.text}
        highlights={highlights}
        highlightedStyle={styles.highlighted}
        onHighlightedPress={onHighlightedPress}
      />
    </SafeAreaView>
  );
};
const styles = StyleSheet.create({
  container: {
    padding: 16,
  },
  text: {
    fontSize: 16,
  },
  highlighted: {
    backgroundColor: 'yellow',
    fontWeight: 'bold',
  },
  customHighlightedStyle: {
    color: 'white',
    backgroundColor: 'blue',
  },
});
export default ExampleHighlightedText;
To learn more about how to use the component, check out here.
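The hard-coded indices in the example follow String.prototype.slice conventions: start: 10, end: 21 covers "sample text", so end appears to be exclusive. Rather than counting characters by hand, the ranges can be computed. The sketch below uses a hypothetical helper, findHighlights, which is not part of the library and does a plain case-sensitive search.

```typescript
// Shape compatible with the start/end fields of a highlights entry
// (hypothetical interface name).
interface HighlightSpan {
  start: number;
  end: number; // exclusive, as with String.prototype.slice
}

// Hypothetical helper: find every non-overlapping occurrence of term in text.
function findHighlights(text: string, term: string): HighlightSpan[] {
  const spans: HighlightSpan[] = [];
  if (term.length === 0) {
    return spans;
  }
  let start = text.indexOf(term);
  while (start !== -1) {
    spans.push({start, end: start + term.length});
    start = text.indexOf(term, start + term.length);
  }
  return spans;
}

const SAMPLE = 'This is a sample text where some parts are highlighted.';
console.log(findHighlights(SAMPLE, 'sample text')); // [ { start: 10, end: 21 } ]
```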
Example Application
Check out the example project.
Neural Engines
v2.0 adds three on-device neural engines that share the same public API (Speech.speak, Speech.stop, progress events, etc.). Pick an engine at initialize() time; switch engines by calling initialize() again with a different engine discriminant.
The library ships no model or dictionary data. Consumer apps must download assets and pass local paths. See example/src/utils/ for reference model managers and LICENSES.md for upstream sources.
Install the optional peer:
npm install onnxruntime-react-native
All neural engines accept a maxChunkSize (default 400 chars) and executionProviders, an array of EPs in fallback order ('coreml' | 'xnnpack' | 'cpu', or option objects like {name: 'coreml', coreMlFlags}). Omit it to use sensible platform defaults: CoreML+xnnpack+cpu on iOS, xnnpack+cpu on Android. Long text is chunked and synthesized incrementally; chunkProgress events fire as each chunk starts.
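To make the fallback order concrete, the sketch below reproduces the documented platform defaults as a plain function. resolveExecutionProviders is a hypothetical helper for illustration only: the library applies its own defaults when executionProviders is omitted, and this sketch handles only the string form of the option, not option objects.

```typescript
type ProviderName = 'coreml' | 'xnnpack' | 'cpu';
type TargetPlatform = 'ios' | 'android';

// Hypothetical helper: return the caller's list (de-duplicated, order kept)
// or the defaults described above when nothing was requested.
function resolveExecutionProviders(
  platform: TargetPlatform,
  requested?: ProviderName[],
): ProviderName[] {
  if (requested !== undefined && requested.length > 0) {
    return requested.filter((provider, index) => requested.indexOf(provider) === index);
  }
  return platform === 'ios' ? ['coreml', 'xnnpack', 'cpu'] : ['xnnpack', 'cpu'];
}
```

A function like this is mainly useful for logging which providers an app intends to use, or for forcing a CPU-only run while debugging.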
Kokoro
Kokoro-82M, multi-language (EN, ZH, KO, JA), Apache-2.0.
Required config fields:
- modelPath: Kokoro ONNX model.
- voicesPath: packed voices binary.
- Either tokenizerPath (HuggingFace tokenizer JSON), or vocabPath + mergesPath (legacy BPE pair).
Optional:
- phonemizerType: 'js' (default, MIT hans00), 'js-ipa' (raw IPA), 'none' (pass-through).
- dictPath: EPD1-format IPA dictionary when using 'js' / 'js-ipa'.
- maxChunkSize, executionProviders.
import Speech, {TTSEngine} from '@pocketpalai/react-native-speech';
await Speech.initialize({
  engine: TTSEngine.KOKORO,
  modelPath: 'file:///.../kokoro.onnx',
  voicesPath: 'file:///.../voices.bin',
  tokenizerPath: 'file:///.../tokenizer.json',
  dictPath: 'file:///.../en-us.bin',
  maxChunkSize: 200,
});
await Speech.speak('Hello from Kokoro.', 'af_bella', {speed: 1.0, volume: 1.0});
Voice blending: pass multiple voice IDs (see src/engines/kokoro/VoiceLoader.ts).
Android note: the q8-quantized Kokoro ONNX build (model_q8f16.onnx) produces NaN samples under onnxruntime-react-native's CPU execution provider and plays back as silence. Ship the fp16 (model_fp16.onnx) or full (model.onnx) variant for Android. iOS (CoreML) handles q8 fine.
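An app that downloads its own model can encode this rule in one place. The sketch below is a hypothetical helper, not library code. It uses the file names mentioned in the note and assumes the app prefers the smaller q8 build wherever it works.

```typescript
type KokoroPlatform = 'ios' | 'android';

// Hypothetical helper: choose which Kokoro ONNX variant to download.
// Android avoids the q8 build because of the NaN issue described above.
function pickKokoroModelFile(platform: KokoroPlatform): string {
  return platform === 'android' ? 'model_fp16.onnx' : 'model_q8f16.onnx';
}
```

In a React Native app the argument would typically be derived from Platform.OS.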
Supertonic
Fast English / multilingual pipeline composed of four ONNX models.
Required config fields (all local ONNX paths plus unicode indexer):
- durationPredictorPath
- textEncoderPath
- vectorEstimatorPath
- vocoderPath
- unicodeIndexerPath
- voicesPath: directory or manifest JSON.
Optional:
- defaultInferenceSteps: diffusion steps (default 5).
- maxChunkSize, executionProviders.
await Speech.initialize({
  engine: TTSEngine.SUPERTONIC,
  durationPredictorPath: 'file:///.../duration_predictor.onnx',
  textEncoderPath: 'file:///.../text_encoder.onnx',
  vectorEstimatorPath: 'file:///.../vector_estimator.onnx',
  vocoderPath: 'file:///.../vocoder.onnx',
  unicodeIndexerPath: 'file:///.../unicode_indexer.json',
  voicesPath: 'file:///.../voices/',
});
await Speech.speak('Hello from Supertonic.', 'F1');
Chunking: Supertonic is fast enough that larger chunks (400–800 chars) typically perform best. Lower maxChunkSize only when you want streaming-like latency to the first audio.
Kitten
Compact IPA-driven model. Variants: micro, nano-int8, nano-fp32, mini.
Required config fields:
- modelPath
- voicesPath: voices JSON (pre-converted from NPZ) or manifest.
Optional:
- tokenizerPath: overrides built-in IPA symbol table.
- dictPath: EPD1 dict for JS phonemizer.
- maxChunkSize, executionProviders.
await Speech.initialize({
  engine: TTSEngine.KITTEN,
  modelPath: 'file:///.../kitten_tts_nano_v0_8.onnx',
  voicesPath: 'file:///.../voices.json',
  dictPath: 'file:///.../en-us.bin',
});
await Speech.speak('Hello from Kitten.', 'expr-voice-2-f');
Chunk progress
Neural engines emit chunkProgress events as each chunk starts synthesizing:
Speech.addChunkProgressListener(event => {
  // event.chunkIndex / event.totalChunks / event.chunkText / event.textRange
});
textRange.start / textRange.end map the chunk back into the original input string; this is what the HighlightedText component uses to move its highlight.
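If you drive the highlight yourself, the mapping from a chunk event to a highlights array is small. The sketch below assumes the event fields named in the comment above with the types shown; those types, the ChunkProgressInfo interface, and the highlightsForChunk helper are assumptions for illustration, not the library's declared types.

```typescript
// Assumed shape of a chunkProgress event, based on the field names above.
interface ChunkProgressInfo {
  chunkIndex: number;
  totalChunks: number;
  chunkText: string;
  textRange: {start: number; end: number};
}

// Hypothetical helper: produce a highlights array covering the current
// chunk, or an empty array when the range is empty.
function highlightsForChunk(info: ChunkProgressInfo): Array<{start: number; end: number}> {
  const {start, end} = info.textRange;
  return end > start ? [{start, end}] : [];
}
```

Inside the listener, the result would be stored in state and passed to the highlights prop of HighlightedText.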
Audio interruptions
All engines forward OS-level audio interruptions (iOS AVAudioSession, Android AudioFocus) as a JS event:
Speech.addAudioInterruptionListener(event => {
  // event.type, event.hint, event.reason
});
Release / cleanup
Neural engines hold ONNX sessions and voice buffers that can run to 200+ MB. Call Speech.release() when you're done to free the sessions. Re-initialize before the next call. See ARCHITECTURE.md for memory characteristics per engine.
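One way to make sure release always runs is to wrap the initialize / use / release sequence in a try/finally. The sketch below is written against a minimal interface rather than the real Speech class so that it stands alone; withEngine and ReleasableSpeech are hypothetical names, and the sketch assumes initialize and release may each return either a promise or nothing.

```typescript
// Minimal surface this sketch needs (hypothetical interface name).
interface ReleasableSpeech<Config> {
  initialize(config: Config): Promise<void> | void;
  release(): Promise<void> | void;
}

// Hypothetical helper: initialize, run the task, and always release,
// even when the task throws.
async function withEngine<Config, Result>(
  speech: ReleasableSpeech<Config>,
  config: Config,
  task: () => Promise<Result>,
): Promise<Result> {
  await speech.initialize(config);
  try {
    return await task();
  } finally {
    await speech.release();
  }
}
```

This pattern suits one-off synthesis. An app that speaks frequently may prefer to keep the engine loaded and release it only when memory pressure or navigation makes that worthwhile.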