React Native Speech Usage Guide

May 11, 2026


Installation

Bare React Native

Install the package using either npm or Yarn:

npm install @pocketpalai/react-native-speech

Or with Yarn:

yarn add @pocketpalai/react-native-speech

Expo

For Expo projects, follow these steps:

  1. Install the package:

    npx expo install @pocketpalai/react-native-speech
    
  2. The library is not supported in Expo Go, so generate the native projects with:

    npx expo prebuild
    

API Overview

For text-to-speech, the library exports the Speech class, which provides methods for speech synthesis and event handling:

import Speech from '@pocketpalai/react-native-speech';

Constants

Static constants exposed on the Speech class.

Values

maxInputLength

The maximum number of characters allowed in a single call to the speak methods.

Android enforces this limit, which is determined by TextToSpeech.getMaxSpeechInputLength. If your text exceeds it, you must split the text into smaller utterances on the JavaScript side. iOS imposes no system limit on synthesis input, so on iOS the Speech class returns Number.MAX_VALUE by default.
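
The manual splitting described above could be sketched as follows. This is a minimal illustration, not part of the library: splitForSpeech is a hypothetical helper that breaks at the last space inside the allowed window and falls back to a hard cut when a single word is longer than the limit.

```typescript
// Hypothetical helper (not part of the library): split text into utterances
// of at most maxLength characters, preferring to break at a space.
function splitForSpeech(text: string, maxLength: number): string[] {
  const limit = Math.floor(maxLength);
  if (!(limit >= 1)) {
    throw new RangeError('maxLength must be a number >= 1');
  }
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    // Last space at or before the limit; -1 or 0 means no usable break point.
    let cut = rest.lastIndexOf(' ', limit);
    if (cut <= 0) {
      cut = limit; // Hard cut inside a long word.
    }
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}
```

Assuming the constant is read as Speech.maxInputLength, a call such as splitForSpeech(longText, Speech.maxInputLength) would yield utterances that can be passed to the speak methods one at a time.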

Getting Available Voices

Retrieve a list of all available voices on the device. Optionally, filter the voices by providing a language code or an IETF BCP 47 language tag.

API Definition:

Speech.getAvailableVoices(language?: string): Promise<VoiceProps[]>

VoiceProps:

  • name: The name of the voice.
  • identifier: The unique identifier for the voice.
  • language: The language tag (e.g., 'en-US', 'fr-FR').
  • quality: The quality level of the voice ('Default' or 'Enhanced').

Example Usage:

// Retrieve all voices
Speech.getAvailableVoices().then(voices => {
  console.log('Available voices:', voices);
});

// Retrieve only English voices
Speech.getAvailableVoices('en').then(voices => {
  console.log('English voices:', voices);
});

// Retrieve only English (US) voices
Speech.getAvailableVoices('en-US').then(voices => {
  console.log('English (US) voices:', voices);
});
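
Once the list is available, choosing a voice is plain array filtering. The sketch below assumes the VoiceProps shape listed above; pickVoice is a hypothetical helper that prefers an 'Enhanced' voice for a language and falls back to the first match.

```typescript
// Mirrors the VoiceProps fields documented above.
type VoiceProps = {
  name: string;
  identifier: string;
  language: string;
  quality: string; // 'Default' or 'Enhanced'
};

// Hypothetical helper: match a full tag ('en-US') or a bare language ('en'),
// preferring an Enhanced voice when one exists.
function pickVoice(voices: VoiceProps[], language: string): VoiceProps | undefined {
  const wanted = language.toLowerCase();
  const matches = voices.filter(voice => {
    const tag = voice.language.toLowerCase();
    return tag === wanted || tag.startsWith(wanted + '-');
  });
  return matches.find(voice => voice.quality === 'Enhanced') ?? matches[0];
}
```

The identifier of the returned voice can then be used as the voice option when initializing or speaking.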

Engine Management (Android)

These methods are available only on the Android platform and allow you to manage the underlying text-to-speech engine.

Get Available Engines

Gets a list of all available text-to-speech engines installed on the device.

API Definition:

Speech.getEngines(): Promise<EngineProps[]>

Engine Properties:

  • name: The unique system identifier for the engine (e.g., "com.google.android.tts").
  • label: The human-readable display name (e.g., "Google Text-to-Speech Engine").
  • isDefault: A boolean flag indicating if this is the default engine.

Example Usage:

Speech.getEngines().then(engines => {
  engines.forEach(engine => {
    console.log(`Engine: ${engine.label} (${engine.name})`);
    if (engine.isDefault) {
      console.log('This is the default engine.');
    }
  });
});

Set Speech Engine

Sets the text-to-speech engine to use for all subsequent synthesis.

API Definition:

Speech.setEngine(engineName: string): Promise<void>

Example Usage:

// First, get available engines
const engines = await Speech.getEngines();

if (engines.length > 0) {
  // Then set a specific engine by its name
  await Speech.setEngine(engines[0].name);
}
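
Rather than always taking the first entry, an app may want a preference order. The following is a sketch under that assumption; chooseEngine is a hypothetical helper that works on the EngineProps fields listed above.

```typescript
// Mirrors the engine properties documented above.
type EngineProps = {
  name: string;
  label: string;
  isDefault: boolean;
};

// Hypothetical helper: use the preferred engine if installed,
// otherwise the system default, otherwise the first engine found.
function chooseEngine(engines: EngineProps[], preferredName?: string): EngineProps | undefined {
  return (
    engines.find(engine => engine.name === preferredName) ??
    engines.find(engine => engine.isDefault) ??
    engines[0]
  );
}
```

The name of the chosen engine is the value that would be passed to Speech.setEngine.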

Open Voice Data Installer

Opens the system activity that allows the user to install or manage TTS voice data.

API Definition:

Speech.openVoiceDataInstaller(): Promise<void>

Example Usage:

Speech.openVoiceDataInstaller().catch(error => {
  console.error('Failed to open voice data installer.', error);
});

Initializing Global Speech Options

Set global speech options that apply to all speech synthesis calls.

API Definition:

Speech.initialize(options: VoiceOptions): void

VoiceOptions Properties:

| Property | Type | Description | Platform Support |
| --- | --- | --- | --- |
| language | string | Language code or IETF BCP 47 language tag (e.g., 'en-US', 'fr-FR') | Both |
| volume | number | Volume level from 0.0 (silent) to 1.0 (maximum) | Both |
| voice | string | Specific voice identifier to use (obtained from getAvailableVoices()) | Both |
| pitch | number | Pitch multiplier: Android 0.1 to 2.0, iOS 0.5 to 2.0 | Both |
| rate | number | Speech rate: Android 0.1 to 2.0, iOS varies based on AVSpeechUtterance limits | Both |
| ducking | boolean | If true, temporarily lowers audio from other apps while speech is active. Defaults to false | Both |
| silentMode | 'obey' \| 'respect' \| 'ignore' | Controls how speech interacts with the device's silent switch. Ignored if ducking is true | iOS only |

silentMode Options (iOS only):

  • obey (default): Does not change the app's audio session. Speech follows the system default behavior.
  • respect: Speech will be silenced by the ringer switch. Use for non-critical audio content.
  • ignore: Speech will play even if the ringer is off. Use for critical audio when ducking is not desired.

Example Usage:

Speech.initialize({
  language: 'en-US',
  volume: 1.0,
  pitch: 1.2,
  rate: 0.8,
  ducking: false,
  silentMode: 'obey', // iOS only; ignored if ducking is true
});
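
If option values come from user input, it can help to clamp them before calling initialize. The sketch below assumes the pitch ranges in the table above read as 0.1 to 2.0 on Android and 0.5 to 2.0 on iOS; clampOptions and the GlobalOptions type are hypothetical and not part of the library. Rate is left untouched because its iOS range depends on AVSpeechUtterance limits.

```typescript
type Platform = 'android' | 'ios';

// Subset of the documented options, declared locally for this sketch.
type GlobalOptions = {
  language?: string;
  voice?: string;
  volume?: number;
  pitch?: number;
  rate?: number;
  ducking?: boolean;
};

// Assumed pitch ranges per platform (see the lead-in).
const PITCH_RANGE: Record<Platform, [number, number]> = {
  android: [0.1, 2.0],
  ios: [0.5, 2.0],
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: returns a copy with volume and pitch forced into range.
function clampOptions(options: GlobalOptions, platform: Platform): GlobalOptions {
  const result: GlobalOptions = {...options};
  if (result.volume !== undefined) {
    result.volume = clamp(result.volume, 0, 1);
  }
  if (result.pitch !== undefined) {
    const [minPitch, maxPitch] = PITCH_RANGE[platform];
    result.pitch = clamp(result.pitch, minPitch, maxPitch);
  }
  return result;
}
```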

Resetting Speech Options

Reset all global speech options to their default values.

API Definition:

Speech.reset(): void

Example Usage:

Speech.reset();

Speaking Text

Speak a given text using the current global settings.

API Definition:

Speech.speak(text: string): Promise<void>

Example Usage:

Speech.speak('Hello, world!');

Speaking Text with Custom Options

Override global options for a specific utterance.

API Definition:

Speech.speakWithOptions(text: string, options: VoiceOptions): Promise<void>

Example Usage:

Speech.speakWithOptions('Hello!', {
  language: 'en-US',
  pitch: 1.5,
  rate: 0.8,
});

Controlling Speech

Stop Speech

Immediately stops any ongoing or queued speech synthesis.

Speech.stop().then(() => console.log('Speech stopped'));

Pause Speech

Note: On Android, this requires API level 26 (Android 8.0) or later.

Speech.pause().then(isPaused => {
  console.log(isPaused ? 'Speech paused' : 'Nothing to pause');
});

Resume Speech

Note: On Android, this requires API level 26 (Android 8.0) or later.

Speech.resume().then(isResumed => {
  console.log(isResumed ? 'Speech resumed' : 'Nothing to resume');
});

Check if Speaking

Determine whether speech synthesis is currently active, that is, speaking or paused.

Speech.isSpeaking().then(isSpeaking => {
  console.log(isSpeaking ? 'Currently speaking or paused' : 'Not speaking');
});
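
A single play/pause button can be built from the boolean results of pause() and resume(). The sketch below assumes that pause() resolves to false when nothing is actively speaking (including when speech is already paused) and that resume() resolves to true only when paused speech was resumed. PlaybackControls and togglePlayback are hypothetical names; the Speech class would be passed in as the controls object.

```typescript
// The two calls this sketch relies on, as documented above.
interface PlaybackControls {
  pause(): Promise<boolean>;
  resume(): Promise<boolean>;
}

type ToggleResult = 'paused' | 'resumed' | 'idle';

// Hypothetical helper: try to pause first; if nothing was paused, try to resume.
async function togglePlayback(speech: PlaybackControls): Promise<ToggleResult> {
  if (await speech.pause()) {
    return 'paused';
  }
  if (await speech.resume()) {
    return 'resumed';
  }
  return 'idle';
}
```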

Event Callbacks

Subscribe to event callbacks for speech synthesis lifecycle monitoring.

onError

Triggers when an error occurs.

const errorSubscription = Speech.onError(({id}) => {
  console.error(`Speech error (ID: ${id})`);
});

// Cleanup
errorSubscription.remove();

onStart

Triggers when speech starts.

const startSubscription = Speech.onStart(({id}) => {
  console.log(`Speech started (ID: ${id})`);
});

// Cleanup
startSubscription.remove();

onFinish

Triggers when speech completes.

const finishSubscription = Speech.onFinish(({id}) => {
  console.log(`Speech finished (ID: ${id})`);
});

// Cleanup
finishSubscription.remove();

onPause

Triggers when speech is paused.

Note: On Android, this requires API level 26 (Android 8.0) or later.

const pauseSubscription = Speech.onPause(({id}) => {
  console.log(`Speech paused (ID: ${id})`);
});

// Cleanup
pauseSubscription.remove();

onResume

Triggers when speech is resumed.

Note: On Android, this requires API level 26 (Android 8.0) or later.

const resumeSubscription = Speech.onResume(({id}) => {
  console.log(`Speech resumed (ID: ${id})`);
});

// Cleanup
resumeSubscription.remove();

onStopped

Triggers when speech is stopped.

const stoppedSubscription = Speech.onStopped(({id}) => {
  console.log(`Speech stopped (ID: ${id})`);
});

// Cleanup
stoppedSubscription.remove();

onProgress

Note: On Android, this requires API level 26 (Android 8.0) or later.

Callback Parameters:

  • id: The utterance identifier
  • length: The length of the portion of text currently being spoken
  • location: The current character position in the spoken text

const progressSubscription = Speech.onProgress(({id, location, length}) => {
  console.log(
    `Speech ${id} progress, current word length: ${length}, current char position: ${location}`,
  );
});

// Cleanup
progressSubscription.remove();
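
The location and length values can be turned into a character range for highlighting the text being spoken. This is a minimal sketch: progressToHighlight is a hypothetical helper, and it assumes the range end is exclusive and that location counts characters from the start of the spoken string.

```typescript
// The two progress fields this sketch uses.
type SpeechProgress = {
  location: number;
  length: number;
};

type HighlightRange = {
  start: number;
  end: number;
};

// Hypothetical helper: convert a progress event into a start/end range,
// clamped so it never runs past the end of the text.
function progressToHighlight(event: SpeechProgress, textLength: number): HighlightRange {
  const start = Math.max(0, Math.min(event.location, textLength));
  const end = Math.max(start, Math.min(event.location + event.length, textLength));
  return {start, end};
}
```

Inside an onProgress callback, the resulting range could be stored in component state and rendered as a highlighted segment.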

HighlightedText

The HighlightedText component allows you to display text with customizable highlighted segments. This is especially useful for emphasizing parts of text (e.g., the currently synthesized text). In addition to the specialized properties listed below, the component accepts all standard React Native <Text> props.

Importing the Component

import {HighlightedText} from '@pocketpalai/react-native-speech';

Properties

  • text
    Type: string
    The full text content to be displayed.

  • highlightedStyle
    Type: StyleProp<TextStyle>
    The base style applied to all highlighted segments. This style can be overridden by segment-specific styles defined in the highlights prop.

  • highlights
    Type: Array<{ start: number; end: number; style?: StyleProp<TextStyle> }>
    An array of objects that define which parts of the text should be highlighted. Each object must include:

    • start: The starting character index of the segment.
    • end: The ending character index of the segment.
    • style (optional): Custom style for this particular segment.

  • onHighlightedPress
    Type: (segment: { text: string; start: number; end: number }) => void
    A callback function that is invoked when a highlighted segment is pressed. The function receives an object containing:

    • text: The text content of the pressed segment.
    • start: The starting index of the segment.
    • end: The ending index of the segment.

Example

import React from 'react';
import {
  HighlightedText,
  type HighlightedSegmentProps,
  type HighlightedSegmentArgs,
} from '@pocketpalai/react-native-speech';
import {Alert, SafeAreaView, StyleSheet} from 'react-native';

const TEXT = 'This is a sample text where some parts are highlighted.';

const ExampleHighlightedText: React.FC = () => {
  const highlights: Array<HighlightedSegmentProps> = [
    {start: 10, end: 21},
    {start: 43, end: 54, style: styles.customHighlightedStyle},
  ];

  const onHighlightedPress = React.useCallback(
    ({text, start, end}: HighlightedSegmentArgs) =>
      Alert.alert(
        'Highlighted Segment',
        `Segment "${text}" starts at ${start} and ends at ${end}`,
      ),
    [],
  );

  return (
    <SafeAreaView style={styles.container}>
      <HighlightedText
        text={TEXT}
        style={styles.text}
        highlights={highlights}
        highlightedStyle={styles.highlighted}
        onHighlightedPress={onHighlightedPress}
      />
    </SafeAreaView>
  );
};

const styles = StyleSheet.create({
  container: {
    padding: 16,
  },
  text: {
    fontSize: 16,
  },
  highlighted: {
    backgroundColor: 'yellow',
    fontWeight: 'bold',
  },
  customHighlightedStyle: {
    color: 'white',
    backgroundColor: 'blue',
  },
});

export default ExampleHighlightedText;
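
The start and end indices in the example above are hard-coded. They could instead be computed from the text, as in this sketch. findHighlights is a hypothetical helper, the search is case-sensitive, and end is treated as exclusive, which matches the example where 10 to 21 covers "sample text".

```typescript
type HighlightRange = {
  start: number;
  end: number;
};

// Hypothetical helper: return one range per non-overlapping occurrence of term.
function findHighlights(text: string, term: string): HighlightRange[] {
  const ranges: HighlightRange[] = [];
  if (term.length === 0) {
    return ranges;
  }
  let from = 0;
  while (true) {
    const start = text.indexOf(term, from);
    if (start === -1) {
      break;
    }
    ranges.push({start, end: start + term.length});
    from = start + term.length;
  }
  return ranges;
}
```

With the TEXT constant from the example, findHighlights(TEXT, 'sample text') produces the same first range as the hard-coded highlights array.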

To learn more about how to use the component, see the example application.

Example Application

Check out the example project.


Neural Engines

v2.0 adds three on-device neural engines that share the same public API (Speech.speak, Speech.stop, progress events, etc.). Pick an engine at initialize() time; switch engines by calling initialize() again with a different engine discriminant.

The library ships no model or dictionary data. Consumer apps must download assets and pass local paths. See example/src/utils/ for reference model managers and LICENSES.md for upstream sources.

Install the optional peer dependency:

npm install onnxruntime-react-native

All neural engines accept two shared options. maxChunkSize (default 400 characters) caps the length of each synthesized chunk. executionProviders is an array of execution providers in fallback order ('coreml' | 'xnnpack' | 'cpu', or option objects such as {name: 'coreml', coreMlFlags}); omit it to use the platform defaults, which are coreml, xnnpack, cpu on iOS and xnnpack, cpu on Android. Long text is chunked and synthesized incrementally, and a chunkProgress event fires as each chunk starts.
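
The fallback order can be made explicit in app code when a custom list is needed. The sketch below only restates the defaults described above; defaultExecutionProviders and withCpuFallback are hypothetical helpers and do not reflect the library's internal selection logic.

```typescript
type ExecutionProviderName = 'coreml' | 'xnnpack' | 'cpu';

// Restates the documented platform defaults.
function defaultExecutionProviders(platform: 'ios' | 'android'): ExecutionProviderName[] {
  return platform === 'ios' ? ['coreml', 'xnnpack', 'cpu'] : ['xnnpack', 'cpu'];
}

// Removes duplicates and appends 'cpu' when the list does not contain it.
function withCpuFallback(providers: ExecutionProviderName[]): ExecutionProviderName[] {
  const unique = providers.filter((name, index) => providers.indexOf(name) === index);
  return unique.indexOf('cpu') === -1 ? [...unique, 'cpu'] : unique;
}
```

The resulting array is the kind of value the executionProviders option expects.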

Kokoro

Kokoro-82M, multi-language (EN, ZH, KO, JA), Apache-2.0.

Required config fields:

  • modelPath — Kokoro ONNX model.
  • voicesPath — packed voices binary.
  • Either tokenizerPath (HuggingFace tokenizer JSON), or vocabPath + mergesPath (legacy BPE pair).

Optional:

  • phonemizerType: 'js' (default, MIT hans00), 'js-ipa' (raw IPA), 'none' (pass-through).
  • dictPath: EPD1-format IPA dictionary when using 'js' / 'js-ipa'.
  • maxChunkSize, executionProviders.

import Speech, {TTSEngine} from '@pocketpalai/react-native-speech';

await Speech.initialize({
  engine: TTSEngine.KOKORO,
  modelPath: 'file:///.../kokoro.onnx',
  voicesPath: 'file:///.../voices.bin',
  tokenizerPath: 'file:///.../tokenizer.json',
  dictPath: 'file:///.../en-us.bin',
  maxChunkSize: 200,
});

await Speech.speak('Hello from Kokoro.', 'af_bella', {speed: 1.0, volume: 1.0});
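
Because the tokenizer can be supplied in two ways, it may be worth checking a config before calling initialize. The following sketch encodes only the required-field rules listed above; KokoroPaths and kokoroConfigErrors are hypothetical names, and the check does not verify that the files exist.

```typescript
// Path fields from the Kokoro config, declared locally for this sketch.
type KokoroPaths = {
  modelPath?: string;
  voicesPath?: string;
  tokenizerPath?: string;
  vocabPath?: string;
  mergesPath?: string;
};

// Hypothetical helper: returns a list of problems, empty when the config is complete.
function kokoroConfigErrors(config: KokoroPaths): string[] {
  const errors: string[] = [];
  if (!config.modelPath) {
    errors.push('modelPath is required');
  }
  if (!config.voicesPath) {
    errors.push('voicesPath is required');
  }
  const hasTokenizer = Boolean(config.tokenizerPath);
  const hasLegacyPair = Boolean(config.vocabPath && config.mergesPath);
  if (!hasTokenizer && !hasLegacyPair) {
    errors.push('provide tokenizerPath, or both vocabPath and mergesPath');
  }
  return errors;
}
```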

Voice blending: pass multiple voice IDs (see src/engines/kokoro/VoiceLoader.ts).

Android note: the q8-quantized Kokoro ONNX build (model_q8f16.onnx) produces NaN samples under onnxruntime-react-native's CPU execution provider and plays back as silence. Ship the fp16 (model_fp16.onnx) or full (model.onnx) variant for Android. iOS (CoreML) handles q8 fine.
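
One way to apply the Android note is to choose the model file per platform before downloading. This is a sketch only: kokoroModelFile is a hypothetical helper, and the file names are the ones mentioned in the note above.

```typescript
// Hypothetical helper: avoid the q8 build on Android, where it plays back as silence.
function kokoroModelFile(platform: 'ios' | 'android', preferQuantized: boolean = true): string {
  if (platform === 'android') {
    return 'model_fp16.onnx';
  }
  return preferQuantized ? 'model_q8f16.onnx' : 'model_fp16.onnx';
}
```

In a React Native app the platform argument would typically come from Platform.OS.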

Supertonic

Fast English / multilingual pipeline composed of four ONNX models.

Required config fields (all local ONNX paths plus unicode indexer):

  • durationPredictorPath
  • textEncoderPath
  • vectorEstimatorPath
  • vocoderPath
  • unicodeIndexerPath
  • voicesPath — directory or manifest JSON.

Optional:

  • defaultInferenceSteps — diffusion steps (default 5).
  • maxChunkSize, executionProviders.
await Speech.initialize({
  engine: TTSEngine.SUPERTONIC,
  durationPredictorPath: 'file:///.../duration_predictor.onnx',
  textEncoderPath: 'file:///.../text_encoder.onnx',
  vectorEstimatorPath: 'file:///.../vector_estimator.onnx',
  vocoderPath: 'file:///.../vocoder.onnx',
  unicodeIndexerPath: 'file:///.../unicode_indexer.json',
  voicesPath: 'file:///.../voices/',
});

await Speech.speak('Hello from Supertonic.', 'F1');

Chunking: Supertonic is fast enough that larger chunks (400–800 chars) typically win. Lower maxChunkSize only when you need streaming-like first-audio latency.

Kitten

Compact IPA-driven model. Variants: micro, nano-int8, nano-fp32, mini.

Required config fields:

  • modelPath
  • voicesPath — voices JSON (pre-converted from NPZ) or manifest.

Optional:

  • tokenizerPath — overrides built-in IPA symbol table.
  • dictPath — EPD1 dict for JS phonemizer.
  • maxChunkSize, executionProviders.

await Speech.initialize({
  engine: TTSEngine.KITTEN,
  modelPath: 'file:///.../kitten_tts_nano_v0_8.onnx',
  voicesPath: 'file:///.../voices.json',
  dictPath: 'file:///.../en-us.bin',
});

await Speech.speak('Hello from Kitten.', 'expr-voice-2-f');

Chunk progress

Neural engines emit chunkProgress events as each chunk starts synthesizing:

Speech.addChunkProgressListener(event => {
  // event.chunkIndex / event.totalChunks / event.chunkText / event.textRange
});

textRange.start / textRange.end map the chunk back into the original input string; this is what the HighlightedText component uses to move its highlight.
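
A listener can convert each event into a highlights array for HighlightedText. The sketch below assumes the event fields named in the comment above and treats textRange.end as exclusive; ChunkProgress and chunkToHighlights are hypothetical names declared locally.

```typescript
// Event fields as named above, declared locally for this sketch.
type ChunkProgress = {
  chunkIndex: number;
  totalChunks: number;
  chunkText: string;
  textRange: {start: number; end: number};
};

type HighlightRange = {
  start: number;
  end: number;
};

// Hypothetical helper: one highlight covering the current chunk,
// clamped to the bounds of the original text; empty if the range is degenerate.
function chunkToHighlights(event: ChunkProgress, text: string): HighlightRange[] {
  const start = Math.max(0, Math.min(event.textRange.start, text.length));
  const end = Math.max(start, Math.min(event.textRange.end, text.length));
  return end > start ? [{start, end}] : [];
}
```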

Audio interruptions

All engines forward OS-level audio interruptions (iOS AVAudioSession, Android AudioFocus) as a JS event:

Speech.addAudioInterruptionListener(event => {
  // event.type, event.hint, event.reason
});

Release / cleanup

Neural engines hold ONNX sessions and voice buffers that can run to 200+ MB. Call Speech.release() when you're done to free the sessions. Re-initialize before the next call. See ARCHITECTURE.md for memory characteristics per engine.
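
For short-lived use, initialization and release can be paired in one helper so the sessions are freed even if synthesis throws. This is a minimal sketch: NeuralSpeech and withEngine are hypothetical names, the Speech class would be passed in as the speech argument, and the return types of initialize() and release() are assumed to be either a promise or void.

```typescript
// The two lifecycle calls this sketch relies on.
interface NeuralSpeech<Config> {
  initialize(config: Config): Promise<void> | void;
  release(): Promise<void> | void;
}

// Hypothetical helper: initialize, run the task, and always release afterwards.
async function withEngine<Config, T>(
  speech: NeuralSpeech<Config>,
  config: Config,
  task: () => Promise<T>,
): Promise<T> {
  await speech.initialize(config);
  try {
    return await task();
  } finally {
    await speech.release();
  }
}
```

An app that speaks frequently would more likely keep the engine initialized and call release() only when leaving the screen, since re-initializing reloads the models.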