🎧 OtosakuFeatureExtractor

June 14, 2025

A lightweight Swift-based feature extraction library for transforming raw audio chunks into log-Mel spectrograms, suitable for use in CoreML and on-device inference.

Built with ❤️ for on-device audio intelligence.


📦 Installation

You can add OtosakuFeatureExtractor as a Swift Package dependency:

.package(url: "https://github.com/Otosaku/OtosakuFeatureExtractor-iOS.git", from: "1.0.2")

Then add it to the target dependencies:

.target(
    name: "YourApp",
    dependencies: [
        .product(name: "OtosakuFeatureExtractor", package: "OtosakuFeatureExtractor")
    ]
)

🔁 Audio Processing Pipeline

[Raw Audio Chunk (Float64)] 
       ↓ pre-emphasis
[Pre-emphasized audio] 
       ↓ STFT (with Hann window)
[STFT result (complex)]
       ↓ Power Spectrum
[|FFT|^2]
       ↓ Mel Filterbank Projection (matrix multiply)
[Mel energies]
       ↓ log(ε + x)
[Log-Mel Spectrogram]
       ↓ MLMultiArray
[CoreML-compatible tensor]
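
The stages above can be sketched for a single 400-sample frame as follows. This is a minimal illustration, not the library's implementation: the function names, the pre-emphasis coefficient 0.97, the value of ε, the generated periodic Hann window, and the toy two-band filterbank are all assumptions, and a naive DFT stands in for the FFT. The library itself reads its window and filterbank from the .npy files and returns an MLMultiArray.

```swift
import Foundation

// Pre-emphasis: y[n] = x[n] - alpha * x[n-1]. alpha = 0.97 is an assumed, commonly used value.
func preEmphasis(_ x: [Double], alpha: Double = 0.97) -> [Double] {
    guard let first = x.first else { return [] }
    var y = [first]
    for n in 1..<x.count {
        y.append(x[n] - alpha * x[n - 1])
    }
    return y
}

// Power spectrum |X[k]|^2 of one windowed frame for k = 0...N/2, using a naive O(N^2) DFT.
func powerSpectrum(frame: [Double], window: [Double]) -> [Double] {
    precondition(frame.count == window.count, "frame and window must have equal length")
    let n = frame.count
    let windowed = zip(frame, window).map { $0 * $1 }
    return (0...(n / 2)).map { (k: Int) -> Double in
        var re = 0.0
        var im = 0.0
        for t in 0..<n {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            re += windowed[t] * cos(angle)
            im += windowed[t] * sin(angle)
        }
        return re * re + im * im
    }
}

// Mel projection (one dot product per filterbank row) followed by log(epsilon + x).
// epsilon = 1e-10 is an assumed value.
func logMelEnergies(power: [Double], filterbank: [[Double]], epsilon: Double = 1e-10) -> [Double] {
    return filterbank.map { (row: [Double]) -> Double in
        let energy = zip(row, power).reduce(0.0) { $0 + $1.0 * $1.1 }
        return log(epsilon + energy)
    }
}

let frameLength = 400
// Periodic Hann window, generated here only for illustration.
let hann = (0..<frameLength).map { (i: Int) -> Double in
    0.5 - 0.5 * cos(2.0 * Double.pi * Double(i) / Double(frameLength))
}
// 1 kHz test tone sampled at 16 kHz.
let tone = (0..<frameLength).map { (i: Int) -> Double in
    sin(2.0 * Double.pi * 1000.0 * Double(i) / 16_000.0)
}
let power = powerSpectrum(frame: preEmphasis(tone), window: hann)

// Toy filterbank with two bands: the lower and upper halves of the spectrum.
let toyFilterbank: [[Double]] = [
    (0..<power.count).map { $0 < 100 ? 1.0 : 0.0 },
    (0..<power.count).map { $0 < 100 ? 0.0 : 1.0 },
]
let features = logMelEnergies(power: power, filterbank: toyFilterbank)
print(features)
```

A 400-sample frame yields 400 / 2 + 1 = 201 frequency bins, which matches the second dimension of the [80, 201] filterbank described under Usage.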

🧪 Usage

1. Initialize the Extractor

You must provide a directory containing:

  • filterbank.npy: shape [80, 201] (80 Mel bands × 201 frequency bins), float32 or float64
  • hann_window.npy: shape [400], float32 or float64

import OtosakuFeatureExtractor

let extractor = try OtosakuFeatureExtractor(directoryURL: featureFolderURL)
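
Because the initializer throws, it can help to check that both files are present before constructing the extractor. The helper below is a hypothetical convenience, not part of the library, and the folder location is a placeholder:

```swift
import Foundation

// Hypothetical helper (not part of the library): returns the names of the
// required .npy files that are absent from the given directory.
func missingFeatureFiles(in directoryURL: URL,
                         required: [String] = ["filterbank.npy", "hann_window.npy"]) -> [String] {
    return required.filter { name in
        !FileManager.default.fileExists(atPath: directoryURL.appendingPathComponent(name).path)
    }
}

// Placeholder location; point this at the folder that holds your .npy files.
let featureFolderURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("OtosakuFeatures")
let missing = missingFeatureFiles(in: featureFolderURL)
if !missing.isEmpty {
    print("Missing feature files: \(missing.joined(separator: ", "))")
}
```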

📥 Downloads

💬 Want a model trained on custom keywords?
Drop me a message at otosaku.dsp@gmail.com and let's talk!


2. Process a Chunk of Audio

The input must be a raw audio chunk of type Array<Double>, typically sampled at 16 kHz.

let logMel: MLMultiArray = try extractor.processChunk(chunk: audioChunk)

audioChunk should contain at least 400 samples (25 ms at 16 kHz) so that it fills at least one FFT window.
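
If a chunk may come up short, one option is to zero-pad it to the window length before processing. The sketch below uses hypothetical helper names, and whether trailing silence is acceptable for your model is an assumption you should verify:

```swift
import Foundation

// Hypothetical helper: appends zeros so the chunk holds at least `minimumLength` samples.
// Chunks that are already long enough are returned unchanged.
func paddedChunk(_ chunk: [Double], minimumLength: Int = 400) -> [Double] {
    guard chunk.count < minimumLength else { return chunk }
    return chunk + [Double](repeating: 0.0, count: minimumLength - chunk.count)
}

// Duration of a chunk in milliseconds at the given sample rate.
func durationMs(sampleCount: Int, sampleRate: Double = 16_000) -> Double {
    return Double(sampleCount) / sampleRate * 1000.0
}

let shortChunk = [Double](repeating: 0.1, count: 250)
let audioChunk = paddedChunk(shortChunk)
print(audioChunk.count, durationMs(sampleCount: audioChunk.count))
```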


3. (Optional) Save Log-Mel Features to JSON

saveLogMelToJSON(logMel: logMel)
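
This README does not describe the layout that saveLogMelToJSON writes. As a rough idea of what serialized features could look like, the sketch below encodes a plain nested array (frames × Mel bands) with JSONEncoder. The function name, the nested-array layout, and the sample values are all assumptions:

```swift
import Foundation

// Hypothetical stand-in for log-Mel features: features[frame][melBand].
// Encodes the nested array as JSON text; returns nil if encoding fails.
func logMelJSONString(_ features: [[Double]]) -> String? {
    guard let data = try? JSONEncoder().encode(features) else { return nil }
    return String(data: data, encoding: .utf8)
}

// Two frames with three Mel bands each, using made-up values.
let exampleFeatures: [[Double]] = [[-2.5, 0.25, 1.0], [-3.0, 0.5, 2.0]]
if let json = logMelJSONString(exampleFeatures) {
    print(json)
}
```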

📚 Dependencies


📁 File Structure

OtosakuFeatureExtractor/
├── Sources/
│   └── OtosakuFeatureExtractor/
│       ├── OtosakuFeatureExtractor.swift
├── filterbank.npy
├── hann_window.npy

🗣️ Attribution

Project by @otosaku-ai under the Otosaku brand.


🧪 License

MIT License