🧠 OtosakuKWS

June 14, 2025

OtosakuKWS is a lightweight, privacy-focused keyword spotting engine for iOS, designed to detect speech commands in real time, entirely on device.

It uses a CRNN CoreML model combined with log-Mel spectrograms for fast, accurate, and low-latency voice command recognition.


🎥 Demo

Watch the model running live on iPhone 13:

Demo running on iPhone


🚀 Getting Started

1. Install Feature Extractor

This project depends on the OtosakuFeatureExtractor-iOS Swift package, which extracts log-Mel spectrograms in real time using Accelerate.

It also includes a ready-to-use filterbank archive (filterbank.npy, hann_window.npy).
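
To make the feature step concrete, here is a minimal sketch of how one log-Mel frame can be computed from a window and a filterbank. It is not the package's implementation: it uses a naive DFT instead of Accelerate, generates its own Hann window instead of loading hann_window.npy, and the function names (`hannWindow`, `powerSpectrum`, `logMelFrame`) and the toy two-band filterbank are assumptions made for illustration.

```swift
import Foundation

/// Symmetric Hann window of length `n`.
func hannWindow(_ n: Int) -> [Float] {
    guard n > 1 else { return [Float](repeating: 1, count: max(n, 0)) }
    return (0..<n).map { i -> Float in
        0.5 - 0.5 * cos(2 * Float.pi * Float(i) / Float(n - 1))
    }
}

/// Power spectrum |X[k]|^2 for k = 0...n/2, using a naive O(n^2) DFT.
/// A production extractor would use an FFT (for example via Accelerate).
func powerSpectrum(_ frame: [Float]) -> [Float] {
    let n = frame.count
    let bins = n / 2 + 1
    var power = [Float](repeating: 0, count: bins)
    for k in 0..<bins {
        var re: Float = 0
        var im: Float = 0
        for (t, x) in frame.enumerated() {
            let angle = -2 * Float.pi * Float(k) * Float(t) / Float(n)
            re += x * cos(angle)
            im += x * sin(angle)
        }
        power[k] = re * re + im * im
    }
    return power
}

/// Windows the frame, projects its power spectrum onto each filter, and takes the log.
/// `filterbank` has one row per Mel band, each with n/2 + 1 weights.
func logMelFrame(_ frame: [Float], window: [Float], filterbank: [[Float]], eps: Float = 1e-6) -> [Float] {
    precondition(frame.count == window.count, "frame and window must have the same length")
    let windowed = zip(frame, window).map { $0 * $1 }
    let power = powerSpectrum(windowed)
    return filterbank.map { filter -> Float in
        precondition(filter.count == power.count, "filter length must match the number of bins")
        let energy = zip(filter, power).reduce(Float(0)) { $0 + $1.0 * $1.1 }
        return log(energy + eps)
    }
}

// Toy input: a 16-sample tone centered on bin 2, and two rectangular "bands"
// (real Mel filters are triangular and far more numerous).
let n = 16
let tone = (0..<n).map { t -> Float in cos(2 * Float.pi * 2 * Float(t) / Float(n)) }
let toyFilterbank: [[Float]] = [
    [0, 1, 1, 1, 0, 0, 0, 0, 0],  // low band, covers bins 1...3
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  // high band, covers bins 5...7
]
let logMel = logMelFrame(tone, window: hannWindow(n), filterbank: toyFilterbank)
print(logMel)  // the low band carries far more energy than the high band
```

Stacking such frames over time yields the spectrogram that the CRNN consumes; frame length, hop size, and number of Mel bands are determined by the shipped filterbank files and are not assumed here.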


2. Download Pretrained Model

The CRNN model was trained on the keywords “go”, “no”, “stop”, and “yes”.

⬇️ Download model archive

Includes:

  • CRNNKeywordSpotter.mlmodelc
  • classes.txt
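
As an illustration of how the two files relate, the sketch below maps a vector of model scores to a label. It assumes that classes.txt holds one class name per line and that the line order matches the model's output indices; neither is confirmed here, and `parseClassLabels` and `topPrediction` are hypothetical helpers, not part of OtosakuKWS.

```swift
import Foundation

/// Parses label text, assuming one class name per line (an assumption about classes.txt).
func parseClassLabels(_ text: String) -> [String] {
    text.split(whereSeparator: \.isNewline)
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}

/// Returns the label and score of the highest-scoring class,
/// or nil when the scores and labels do not line up.
func topPrediction(scores: [Float], labels: [String]) -> (label: String, score: Float)? {
    guard scores.count == labels.count,
          let best = scores.indices.max(by: { scores[$0] < scores[$1] }) else { return nil }
    return (labels[best], scores[best])
}

// Illustrative label order only; the real order comes from the shipped classes.txt.
let labels = parseClassLabels("go\nno\nother\nstop\nyes\n")
let prediction = topPrediction(scores: [0.02, 0.01, 0.05, 0.90, 0.02], labels: labels)
print(prediction?.label ?? "none")
```

In an app, the text would typically be read with `String(contentsOf:encoding:)` from wherever the archive was unpacked.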

🧪 Validation Metrics

| Metric | Value |
|---|---|
| val_accuracy | 0.971313 |
| val_f1_go | 0.964216 |
| val_f1_no | 0.974067 |
| val_f1_other | 0.949783 |
| val_f1_stop | 0.983282 |
| val_f1_yes | 0.98564 |
| val_loss | 0.0846668 |
| val_precision_go | 0.977573 |
| val_precision_no | 0.966123 |
| val_precision_other | 0.949195 |
| val_precision_stop | 0.985112 |
| val_precision_yes | 0.979248 |
| val_recall_go | 0.95122 |
| val_recall_no | 0.982143 |
| val_recall_other | 0.950372 |
| val_recall_stop | 0.981459 |
| val_recall_yes | 0.992116 |
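
The per-class F1 values can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this for two classes; `f1Score` is a helper written for this check, not something taken from the training code.

```swift
/// F1 is the harmonic mean of precision and recall.
func f1Score(precision: Double, recall: Double) -> Double {
    let sum = precision + recall
    return sum > 0 ? 2 * precision * recall / sum : 0
}

let f1Go = f1Score(precision: 0.977573, recall: 0.95122)    // reported val_f1_go: 0.964216
let f1Yes = f1Score(precision: 0.979248, recall: 0.992116)  // reported val_f1_yes: 0.98564
print(f1Go, f1Yes)
```

Both recomputed values agree with the reported ones to about four decimal places.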

The model was trained on a balanced subset of Google Speech Commands v2, using strong augmentations and class balancing.


🧩 Integration Example

let kws = try OtosakuKWS(
    modelRootURL: modelURL,
    featureExtractorRootURL: featurizerURL,
    configuration: .init()
)

kws.onKeywordDetected = { keyword, confidence in
    print("Detected: \(keyword) [\(confidence)]")
}

let audioInput = AudioStreamer()

// The `onBuffer` callback receives a chunk of mono (1-channel) audio sampled at 16 kHz.
// `AudioStreamer` here is a placeholder real-time microphone streamer that simulates live input.
audioInput.onBuffer = { buffer in
    Task {
        await kws.handleAudioBuffer(buffer)
    }
}
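
A streaming detector can report the same keyword several times while it is being spoken, so an app may want to filter the callbacks. The sketch below shows one possible approach, a confidence threshold plus a per-keyword cooldown. `DetectionGate` is a hypothetical helper, not part of OtosakuKWS; the threshold and cooldown values are arbitrary, and it assumes `confidence` is a probability-like number between 0 and 1.

```swift
import Foundation

/// Hypothetical helper: accepts a detection only if it is confident enough
/// and the same keyword has not been accepted within the cooldown window.
struct DetectionGate {
    let threshold: Double        // assumed tuning value
    let cooldown: TimeInterval   // assumed tuning value, in seconds
    private var lastAccepted: [String: TimeInterval] = [:]

    init(threshold: Double = 0.8, cooldown: TimeInterval = 1.0) {
        self.threshold = threshold
        self.cooldown = cooldown
    }

    /// `time` is any monotonically increasing timestamp in seconds.
    mutating func accept(_ keyword: String, confidence: Double, at time: TimeInterval) -> Bool {
        guard confidence >= threshold else { return false }
        if let last = lastAccepted[keyword], time - last < cooldown { return false }
        lastAccepted[keyword] = time
        return true
    }
}

var gate = DetectionGate(threshold: 0.8, cooldown: 1.0)
let first = gate.accept("stop", confidence: 0.93, at: 0.0)     // accepted
let repeated = gate.accept("stop", confidence: 0.95, at: 0.4)  // rejected: inside cooldown
let weak = gate.accept("go", confidence: 0.55, at: 0.5)        // rejected: below threshold
let later = gate.accept("stop", confidence: 0.91, at: 1.6)     // accepted: cooldown elapsed
print(first, repeated, weak, later)
```

Inside `onKeywordDetected`, the gate could be called with the current time before acting on a keyword. Because the gate holds mutable state, calls to it should happen on a single queue or actor.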

📬 Need custom commands?

If you need a custom KWS model for your use case (different keywords, languages, or domain-specific speech), feel free to reach out:

📧 otosaku.dsp@gmail.com


πŸ—οΈ Keywords

CoreML, keyword spotting, speech commands, offline voice recognition, privacy-first AI, log-Mel spectrogram, iOS speech processing, CRNN, on-device inference, streaming audio, Swift AI