OtosakuFeatureExtractor
June 14, 2025
A lightweight Swift-based feature extraction library for transforming raw audio chunks into log-Mel spectrograms, suitable for use in CoreML and on-device inference.
Built with ❤️ for on-device audio intelligence.
📦 Installation
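You can add OtosakuFeatureExtractor as a Swift Package dependency in the `dependencies` array of your `Package.swift`:
```swift
.package(url: "https://github.com/Otosaku/OtosakuFeatureExtractor-iOS.git", from: "1.0.2")
```
Then add it to the target dependencies:
```swift
.target(
    name: "YourApp",
    dependencies: [
        .product(name: "OtosakuFeatureExtractor", package: "OtosakuFeatureExtractor")
    ]
)
```
Depending on your SwiftPM tools version, the `package:` argument may need to match the identity derived from the repository URL (`OtosakuFeatureExtractor-iOS`) rather than the package name.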
Audio Processing Pipeline
```text
[Raw Audio Chunk (Float64)]
    ↓ pre-emphasis
[Pre-emphasized audio]
    ↓ STFT (with Hann window)
[STFT result (complex)]
    ↓ Power Spectrum
[|FFT|^2]
    ↓ Mel Filterbank Projection (matrix multiply)
[Mel energies]
    ↓ log(ε + x)
[Log-Mel Spectrogram]
    ↓ MLMultiArray
[CoreML-compatible tensor]
```
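To make the stages concrete, here is a minimal single-frame sketch in plain Swift (Foundation only). It is not the library's implementation: the pre-emphasis coefficient (0.97), ε (1e-6), the periodic Hann formula, the `[nMels][nBins]` filterbank layout and the naive O(N²) DFT used in place of pocketfft are all assumptions, and the final MLMultiArray conversion is omitted.

```swift
import Foundation

// Hypothetical single-frame sketch of the pipeline above, NOT the library's code.
// Assumed: pre-emphasis coefficient 0.97, epsilon 1e-6, a periodic Hann window,
// a naive O(N^2) DFT instead of pocketfft, and a filterbank stored as [nMels][nBins].

/// Pre-emphasis: y[n] = x[n] - alpha * x[n-1]; the first sample is kept unchanged.
func preEmphasis(_ x: [Double], alpha: Double = 0.97) -> [Double] {
    guard let first = x.first else { return [] }
    var y = [Double](repeating: 0, count: x.count)
    y[0] = first
    for n in 1..<x.count {
        y[n] = x[n] - alpha * x[n - 1]
    }
    return y
}

/// Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2 * pi * n / length).
func hannWindow(_ length: Int) -> [Double] {
    return (0..<length).map { n in
        0.5 - 0.5 * cos(2.0 * Double.pi * Double(n) / Double(length))
    }
}

/// One-sided power spectrum |X_k|^2, k = 0...N/2, of a single windowed frame.
func powerSpectrum(frame: [Double], window: [Double]) -> [Double] {
    precondition(frame.count == window.count, "frame and window lengths differ")
    let n = frame.count
    let windowed = zip(frame, window).map { $0 * $1 }
    return (0...(n / 2)).map { (k: Int) -> Double in
        var re = 0.0
        var im = 0.0
        for (t, sample) in windowed.enumerated() {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            re += sample * cos(angle)
            im += sample * sin(angle)
        }
        return re * re + im * im
    }
}

/// Mel projection (filterbank row · power) followed by log(eps + x) for each band.
func logMel(power: [Double], filterbank: [[Double]], eps: Double = 1e-6) -> [Double] {
    return filterbank.map { (row: [Double]) -> Double in
        precondition(row.count == power.count, "filterbank width must match the number of bins")
        let energy = zip(row, power).reduce(0.0) { $0 + $1.0 * $1.1 }
        return log(eps + energy)
    }
}

// Toy usage: one 400-sample frame of a 1 kHz sine sampled at 16 kHz.
let sampleRate = 16_000.0
let frame = (0..<400).map { sin(2.0 * Double.pi * 1000.0 * Double($0) / sampleRate) }
let power = powerSpectrum(frame: preEmphasis(frame), window: hannWindow(400))
// power has 201 bins; the peak is bin 25 because 1000 Hz / (16000 Hz / 400) = 25.
```

The naive DFT keeps the sketch dependency-free; a production path would use an FFT (the library lists pocketfft and Accelerate) to get O(N log N) cost per frame.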
🧪 Usage
1. Initialize the Extractor
You must provide a directory containing:
- `filterbank.npy`: shape `[80, 201]`, float32 or float64
- `hann_window.npy`: shape `[400]`, float32 or float64
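For a single STFT frame, these shapes fit together as follows, assuming a 400-point FFT (consistent with 201 = 400 / 2 + 1 one-sided frequency bins). Here $y$ is the pre-emphasized frame, $w$ the window from `hann_window.npy`, $F$ the matrix from `filterbank.npy`, and $\varepsilon$ the small constant of the log step. This is a sketch of the math only; framing, hop size and padding are not shown.

$$
X_k = \sum_{n=0}^{399} w[n]\,y[n]\,e^{-2\pi i k n/400}, \qquad p_k = |X_k|^2, \qquad k = 0,\dots,200
$$

$$
m_j = \log\Big(\varepsilon + \sum_{k=0}^{200} F_{jk}\,p_k\Big), \qquad j = 0,\dots,79
$$

In matrix form each frame yields $\mathbf{m} = \log(\varepsilon + F\mathbf{p})$ with $F \in \mathbb{R}^{80\times 201}$, $\mathbf{p} \in \mathbb{R}^{201}$ and $\mathbf{m} \in \mathbb{R}^{80}$.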
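```swift
import OtosakuFeatureExtractor
// featureFolderURL: file URL of the directory holding filterbank.npy and hann_window.npy
let extractor = try OtosakuFeatureExtractor(directoryURL: featureFolderURL)
```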
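Before bundling the assets, you may want to confirm their dtype and shape. The sketch below is hypothetical and not part of the library: `parseNpyHeader`, `NpyHeader` and `NpyError` are illustrative names, and the parser reads only the `descr` and `shape` fields of a NumPy `.npy` header (format versions 1.x to 3.x), ignoring the array data.

```swift
import Foundation

/// The two .npy header fields needed to validate the assets.
struct NpyHeader: Equatable {
    let descr: String  // dtype code, e.g. "<f4" (float32) or "<f8" (float64)
    let shape: [Int]   // e.g. [80, 201] or [400]
}

enum NpyError: Error {
    case badMagic
    case truncated
    case missingField(String)
}

/// Hypothetical helper, not part of OtosakuFeatureExtractor.
/// Parses 'descr' and 'shape' from a NumPy .npy header; the data after the header is ignored.
func parseNpyHeader(_ data: Data) throws -> NpyHeader {
    let bytes = [UInt8](data)
    let magic: [UInt8] = [0x93, 0x4E, 0x55, 0x4D, 0x50, 0x59]  // "\x93NUMPY"
    guard bytes.count >= 10, Array(bytes[0..<6]) == magic else { throw NpyError.badMagic }
    // Version 1.x stores the header length as a 2-byte little-endian integer;
    // versions 2.x and 3.x use 4 bytes.
    let lengthSize = bytes[6] == 1 ? 2 : 4
    let headerStart = 8 + lengthSize
    guard bytes.count >= headerStart else { throw NpyError.truncated }
    var headerLength = 0
    for i in 0..<lengthSize {
        headerLength |= Int(bytes[8 + i]) << (8 * i)
    }
    guard bytes.count >= headerStart + headerLength else { throw NpyError.truncated }
    let header = String(decoding: bytes[headerStart..<(headerStart + headerLength)], as: UTF8.self)

    // Text between the quotes after 'descr':, e.g. '<f4'.
    guard let descrKey = header.range(of: "'descr':"),
          let open = header[descrKey.upperBound...].firstIndex(of: "'"),
          let close = header[header.index(after: open)...].firstIndex(of: "'")
    else { throw NpyError.missingField("descr") }
    let descr = String(header[header.index(after: open)..<close])

    // Comma-separated integers inside the tuple after 'shape':, e.g. (80, 201) or (400,).
    guard let shapeKey = header.range(of: "'shape':"),
          let lparen = header[shapeKey.upperBound...].firstIndex(of: "("),
          let rparen = header[lparen...].firstIndex(of: ")")
    else { throw NpyError.missingField("shape") }
    let shape = header[header.index(after: lparen)..<rparen]
        .split(separator: ",")
        .compactMap { Int($0.trimmingCharacters(in: .whitespaces)) }
    return NpyHeader(descr: descr, shape: shape)
}
```

For example, `try parseNpyHeader(Data(contentsOf: featureFolderURL.appendingPathComponent("filterbank.npy")))` should report a shape of `[80, 201]` and, on typical little-endian exports, a `descr` of `<f4` or `<f8`.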
📥 Downloads
- Feature Extractor Assets
Download the precomputed `filterbank.npy` and `hann_window.npy` files required by `OtosakuFeatureExtractor`.
OtosakuFeatureExtractor Assets (.zip)
💬 Want a model trained on custom keywords?
Drop me a message at otosaku.dsp@gmail.com. Let's talk!
2. Process a Chunk of Audio
The input must be a raw audio chunk as `Array<Double>`, typically sampled at 16 kHz.
```swift
import CoreML  // for the MLMultiArray type annotation
let logMel: MLMultiArray = try extractor.processChunk(chunk: audioChunk)
```
`audioChunk` should contain at least 400 samples to match the FFT window size (400 samples is 25 ms at 16 kHz).
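`processChunk(chunk:)` expects `[Double]`, while microphone capture on Apple platforms usually delivers `Float` (Float32) samples. A hypothetical helper such as `ChunkAccumulator` (not part of the library) can buffer incoming samples, convert them to `Double`, and emit fixed-size chunks of at least 400 samples with optional overlap. Resampling to 16 kHz is assumed to happen beforehand and is not shown.

```swift
// Hypothetical helper, NOT part of OtosakuFeatureExtractor.
// Collects Float32 samples (assumed already resampled to 16 kHz) and emits
// fixed-size [Double] chunks suitable for processChunk(chunk:).
struct ChunkAccumulator {
    let chunkSize: Int  // samples per emitted chunk; must be >= 400 (one FFT window)
    let hopSize: Int    // samples to advance between chunks; hopSize == chunkSize means no overlap
    private var buffer: [Double] = []

    init(chunkSize: Int, hopSize: Int) {
        precondition(chunkSize >= 400, "a chunk must hold at least one 400-sample FFT window")
        precondition(hopSize > 0 && hopSize <= chunkSize, "hopSize must be in 1...chunkSize")
        self.chunkSize = chunkSize
        self.hopSize = hopSize
    }

    /// Appends new samples and returns every complete chunk that is now available.
    mutating func append(_ samples: [Float]) -> [[Double]] {
        buffer.append(contentsOf: samples.map { Double($0) })
        var chunks: [[Double]] = []
        while buffer.count >= chunkSize {
            chunks.append(Array(buffer[0..<chunkSize]))
            buffer.removeFirst(hopSize)
        }
        return chunks
    }
}
```

In a capture callback you would pass each new buffer to `append(_:)` and feed every returned chunk to `extractor.processChunk(chunk:)`; for instance, `chunkSize: 16_000, hopSize: 8_000` would give 1 s chunks with 50% overlap at 16 kHz.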
3. (Optional) Save Log-Mel Features to JSON
```swift
// logMel: the MLMultiArray returned by processChunk(chunk:)
saveLogMelToJSON(logMel: logMel)
```
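The JSON layout written by `saveLogMelToJSON` is not described here. If you need your own export, a sketch with Foundation's `JSONEncoder` could look like this; `LogMelDump` and `encodeLogMel` are hypothetical names, and the values are assumed to have already been copied from the `MLMultiArray` into a 2-D `[[Double]]` (the row layout is also an assumption).

```swift
import Foundation

// Hypothetical export format, NOT the one produced by saveLogMelToJSON.
struct LogMelDump: Codable, Equatable {
    let shape: [Int]        // [rows, columns] of the copied log-Mel matrix
    let values: [[Double]]
}

func encodeLogMel(_ rows: [[Double]]) throws -> Data {
    let dump = LogMelDump(shape: [rows.count, rows.first?.count ?? 0], values: rows)
    let encoder = JSONEncoder()
    // Sorted keys keep the output stable across runs. JSONEncoder throws on NaN or
    // infinity by default; log(ε + x) with ε > 0 stays finite for non-negative energies.
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(dump)
}
```

Decoding the file back with `JSONDecoder().decode(LogMelDump.self, from:)` returns the same shape and values, which makes it easy to compare features across devices or against a Python reference.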
Dependencies
- Accelerate: for optimized DSP
- CoreML
- pocketfft
- plain-pocketfft
File Structure
```text
OtosakuFeatureExtractor/
├── Sources/
│   └── OtosakuFeatureExtractor/
│       └── OtosakuFeatureExtractor.swift
├── filterbank.npy
└── hann_window.npy
```
🗣️ Attribution
Project by @otosaku-ai under the Otosaku brand.
🧪 License
MIT License