ohr

April 15, 2026

Version 0.1.6 · Swift 6.3+ · macOS 26+ · No Xcode Required · License: MIT · 100% On-Device

On-device speech-to-text on your Mac. Transcribe audio files, generate subtitles, stream from the microphone — all locally, no cloud.

No API keys. No network. No subscriptions. The speech recognition is already on your computer — ohr lets you use it.

What is this

Every Mac with Apple Silicon has a built-in speech recognizer — Apple's on-device SpeechAnalyzer, shipped as part of the Speech framework (macOS 26+). ohr wraps it in a CLI and an OpenAI-compatible HTTP server — so you can actually use it. All inference runs on-device, no network calls.

  • UNIX tool: ohr meeting.m4a (file in, text out). Pipe-friendly, multiple output formats, proper exit codes
  • Subtitle generator: ohr -o srt lecture.wav > lecture.srt (SRT and VTT with precise timestamps)
  • Live transcription: ohr --listen (real-time microphone input)
  • OpenAI-compatible server: ohr --serve (drop-in replacement for POST /v1/audio/transcriptions)
  • Zero cost: no API keys, no cloud, no subscriptions, 30 languages supported

Requirements & Install

  • Apple Silicon Mac, macOS 26 Tahoe or newer
  • Building from source requires the Command Line Tools with the macOS 26 SDK (which ships Swift 6.3). Xcode is not required.

Homebrew (recommended):

brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/ohr

Build from source:

git clone https://github.com/Arthur-Ficial/ohr.git
cd ohr
make install

Quick Start

Transcribe a file

ohr meeting.m4a

Generate subtitles

# SRT format
ohr -o srt lecture.wav > lecture.srt

# WebVTT format
ohr --vtt interview.m4a > interview.vtt

JSON output with segments

ohr -o json recording.m4a | jq .
{
  "model": "apple-speechanalyzer",
  "text": "Hello, this is a test of the speech to text system.",
  "segments": [
    { "id": 0, "start": 0.0, "end": 1.86, "text": "Hello, this is a test" },
    { "id": 1, "start": 1.86, "end": 4.44, "text": "of the speech to text system." }
  ],
  "duration": 4.44,
  "language": "en",
  "metadata": { "on_device": true, "version": "0.1.3" }
}
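
The segments array carries everything a subtitle cue needs: a start time, an end time, and the text. As a minimal sketch (not ohr's own SRT writer), the following Python turns JSON shaped like the sample above into SRT cues. The function names are hypothetical; the field names are taken from the sample output.

```python
import json


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(transcript_json: str) -> str:
    """Turn JSON with a `segments` array into SRT cues numbered from 1."""
    data = json.loads(transcript_json)
    cues = []
    for number, seg in enumerate(data["segments"], start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        cues.append(f"{number}\n{timing}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"
```

In practice ohr -o srt already does this; the sketch is only useful when the JSON has been post-processed (for example, segments merged or filtered) before subtitles are written.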

Timestamps in plain text

ohr --timestamps meeting.m4a
[00:00:00,000] Hello, this is a test
[00:00:01,860] of the speech to text system.
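
Timestamped plain text is easy to consume from a script. The sketch below parses lines in the [HH:MM:SS,mmm] layout shown above into (seconds, text) pairs. It assumes every transcript line follows that layout and silently skips anything else; parse_timestamped is a hypothetical helper name.

```python
import re

# Matches "[HH:MM:SS,mmm] text" as printed by `ohr --timestamps`.
LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2}),(\d{3})\]\s*(.*)$")


def parse_timestamped(text: str) -> list[tuple[float, str]]:
    """Return (start_seconds, text) for each timestamped line."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # blank or unexpected line
        h, m, s, ms, body = match.groups()
        start = int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
        entries.append((start, body))
    return entries
```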

Pipe from stdin

cat recording.wav | ohr

Pipe to apfel for summarization

ohr meeting.m4a | apfel "summarize this meeting"

Live microphone transcription

ohr --listen                    # plain text with timestamps
ohr --listen --json             # JSONL stream
ohr --listen --srt              # SRT as you speak
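
A JSONL stream can be consumed one line at a time as it arrives. The following is a sketch only: this README does not document the fields of each JSONL object, so the "text" key used in main() is an assumption to verify against real output, and both function names are hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def main() -> None:
    """Intended use: ohr --listen --json | python3 live.py"""
    for event in iter_jsonl(sys.stdin):
        # "text" is an assumed field name; fall back to the whole object.
        print(event.get("text", event), flush=True)
```

Call main() at the bottom of the script to run it against a live stream.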

Select language

ohr -l de-DE meeting.m4a       # German
ohr -l fr-FR interview.m4a     # French
ohr -l ja-JP recording.m4a     # Japanese

OpenAI-compatible server

# Start server
ohr --serve

# In another terminal:
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@meeting.m4a \
  -F model=apple-speechanalyzer
{"text": "Hello, this is a test of the speech to text system."}

All five response formats:

# JSON (default)
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@audio.m4a -F response_format=json

# Verbose JSON (with segments, timestamps, duration)
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@audio.m4a -F response_format=verbose_json

# Plain text
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@audio.m4a -F response_format=text

# SRT subtitles
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@audio.m4a -F response_format=srt

# WebVTT subtitles
curl -X POST http://localhost:11434/v1/audio/transcriptions \
  -F file=@audio.m4a -F response_format=vtt
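
The same requests can be made from Python without third-party packages. This is a minimal sketch using only the standard library: it builds the multipart form by hand and posts it to the endpoint shown above. The helper names are hypothetical, and sending the file part as application/octet-stream is an assumption about what the server accepts.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], filename: str,
                    audio: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one `file` part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, response_format: str = "json",
               base_url: str = "http://localhost:11434/v1") -> str:
    """POST an audio file to a running `ohr --serve`; return the raw body."""
    body, content_type = build_multipart(
        {"model": "apple-speechanalyzer", "response_format": response_format},
        Path(path).name,
        Path(path).read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, transcribe("audio.m4a", "srt") should return the same body as the corresponding curl call.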

Demos

See demo/ for real-world shell scripts powered by ohr.

subtitle — generate subtitles:

demo/subtitle lecture.m4a --save           # saves lecture.srt next to file

audio-grep — search inside audio files:

demo/audio-grep "budget" meetings/*.m4a    # find mentions with timestamps
demo/audio-grep -c "deadline" *.m4a        # count matches per file
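
To illustrate the idea behind searching audio, here is a rough Python sketch: transcribe to JSON, then match the search term against each segment. It is not the demo script's actual implementation; the function names are hypothetical, and only the ohr -o json invocation and the segment fields come from this README.

```python
import json
import subprocess


def transcribe_segments(path: str) -> list[dict]:
    """Run `ohr -o json <path>` and return its segments array."""
    result = subprocess.run(
        ["ohr", "-o", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["segments"]


def grep_segments(segments: list[dict], term: str) -> list[tuple[float, str]]:
    """Return (start, text) for segments containing `term`, ignoring case."""
    needle = term.lower()
    return [
        (seg["start"], seg["text"])
        for seg in segments
        if needle in seg["text"].lower()
    ]
```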

minutes — meeting to minutes (ohr + apfel):

demo/minutes standup.m4a -o markdown > standup.md

batch-transcribe — transcribe a whole folder:

demo/batch-transcribe ~/recordings/ -o srt

whisper-compat — drop-in Whisper CLI replacement:

demo/whisper-compat audio.m4a --output_format srt --language en


OpenAI API Compatibility

Base URL: http://localhost:11434/v1

| Feature | Status | Notes |
| --- | --- | --- |
| POST /v1/audio/transcriptions | Supported | All 5 response formats |
| GET /v1/models | Supported | Returns apple-speechanalyzer |
| GET /health | Supported | Model availability, formats, languages |
| GET /v1/logs | Debug only | Available with --debug |
| GET /v1/logs/stats | Debug only | Available with --debug |
| response_format | Supported | json, verbose_json, text, srt, vtt |
| language | Supported | BCP-47 language code |
| model | Accepted | Ignored (only one model) |
| prompt | Accepted | Ignored (SpeechAnalyzer doesn't support prompting) |
| temperature | Accepted | Validated (0.0–1.0) |
| CORS | Supported | Enable with --cors |
| Token auth | Supported | --token <secret> or --token-auto |
| POST /v1/chat/completions | 501 | Use apfel |
| POST /v1/embeddings | 501 | Use kern |
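
When the server is started with --token, clients send the secret as a Bearer token. The sketch below builds a GET /v1/models request with that header using only the standard library. models_request is a hypothetical helper, and the exact shape of the JSON the endpoint returns is not documented here.

```python
import urllib.request
from typing import Optional


def models_request(base_url: str = "http://localhost:11434/v1",
                   token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /v1/models request, adding a Bearer token if given."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/models", headers=headers, method="GET"
    )
```

Pass the result to urllib.request.urlopen() against a running server to send it.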

Supported Formats

| Format | Extensions |
| --- | --- |
| Apple M4A | .m4a |
| WAV | .wav, .wave |
| MP3 | .mp3 |
| MPEG-4 | .mp4 |
| Core Audio | .caf |
| AIFF | .aiff, .aif |
| FLAC | .flac |
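
A batch script can filter files by extension before handing them to ohr, avoiding one failed invocation per unsupported file. A small sketch, with the extension set copied from the table above; matching case-insensitively is an assumption about how ohr treats names such as CLIP.M4A, and is_supported is a hypothetical helper.

```python
from pathlib import Path

# Extensions listed in the Supported Formats table.
SUPPORTED_EXTENSIONS = {
    ".m4a", ".wav", ".wave", ".mp3", ".mp4", ".caf", ".aiff", ".aif", ".flac",
}


def is_supported(path: str) -> bool:
    """True if the file extension appears in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```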

Supported Languages

30 languages including English, German, Spanish, French, Italian, Japanese, Korean, Portuguese, Chinese (Simplified, Traditional, Cantonese).

ohr --model-info    # full list

Performance

Tested on Apple M2 with synthetic speech. Real-world performance may vary.

| Audio Length | Transcribe Time | Speed |
| --- | --- | --- |
| 5 seconds | 300 ms | 8x real-time |
| 30 seconds | 600 ms | 39x real-time |
| 1 minute | 1.5 s | 46x real-time |
| 3 minutes | 2.5 s | 58x real-time |
| 10 minutes | 4.7 s | 57x real-time |

10 minutes of audio transcribes in under 5 seconds. No upper limit found.

Limitations

| Constraint | Detail |
| --- | --- |
| Platform | macOS 26+, Apple Silicon only |
| Model | One model (apple-speechanalyzer), not configurable |
| Accuracy | ~90-95% on clear synthetic speech. Lower on real speech with noise, accents, or multiple speakers |
| Numbers | Spoken numbers sometimes confused with ordinals ("five second" → "52nd") |
| Languages | 30 languages supported, but accuracy tested only for English. Other languages need the -l flag |
| No diarization | Cannot distinguish between different speakers |
| Audio formats | m4a, wav, mp3, mp4, caf, aiff, flac. No OGG, OPUS, or WebM |
| apfel integration | When piping to apfel, transcripts longer than ~3000 words may exceed apfel's 4096-token context window |
| Testing gap | All testing used synthetic say command speech, not real human recordings |

See docs/testing.md for the full QA report with methodology and detailed results.
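
One way to work around the apfel context limit is to split a long transcript into chunks and summarize each chunk separately. The sketch below splits on whitespace using the ~3000-word figure from the table as its default. Word count is only a rough proxy for tokens, and the split ignores sentence boundaries, so treat both the default and the chunk_words name as illustrative.

```python
def chunk_words(text: str, max_words: int = 3000) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk can then be piped to apfel in turn, and the partial summaries combined in a final pass.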

CLI Reference

ohr <file>                     Transcribe audio file
ohr -o srt <file>              Generate SRT subtitles
ohr -o vtt <file>              Generate VTT subtitles
ohr -o json <file>             JSON output with segments
ohr --listen                   Live microphone transcription
ohr --serve                    Start OpenAI-compatible server
cat audio.wav | ohr            Transcribe from stdin

Output options:

| Flag | Description |
| --- | --- |
| -o, --output <fmt> | Output format: plain (default), json, srt, vtt |
| --json | Shorthand for -o json |
| --srt | Shorthand for -o srt |
| --vtt | Shorthand for -o vtt |
| --timestamps | Show timestamps in plain text output |
| -l, --language <code> | Language code (e.g. en-US, de-DE) |
| -q, --quiet | Suppress headers and chrome |
| --no-color | Disable ANSI colors |

Server options (--serve):

| Flag | Description |
| --- | --- |
| --port <n> | Server port (default: 11434) |
| --host <addr> | Bind address (default: 127.0.0.1) |
| --cors | Enable CORS headers for browser clients |
| --allowed-origins <origins> | Add comma-separated allowed origins |
| --no-origin-check | Disable origin checking |
| --token <secret> | Require Bearer token authentication |
| --token-auto | Generate and print a random Bearer token |
| --public-health | Keep /health unauthenticated on non-loopback |
| --footgun | Disable all protections |
| --max-concurrent <n> | Max concurrent requests (default: 5) |
| --debug | Verbose logging and enable /v1/logs endpoints |

Info options:

| Flag | Description |
| --- | --- |
| -v, --version | Print version |
| -h, --help | Show help |
| --release | Show detailed build info |
| --model-info | Show model capabilities and languages |

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Runtime error |
| 2 | Usage error (bad flags) |
| 3 | Unsupported audio format |
| 4 | File not found |
| 5 | Transcription failed |
| 6 | Rate limited |
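
Distinct exit codes let a calling script react differently to each failure. As a sketch, the table above can be mirrored in a Python enum and used around a subprocess call. The enum and function names are hypothetical; the numeric values and meanings come from the table.

```python
import subprocess
from enum import IntEnum


class OhrExit(IntEnum):
    """Exit codes as listed in the Exit Codes table."""
    SUCCESS = 0
    RUNTIME_ERROR = 1
    USAGE_ERROR = 2
    UNSUPPORTED_FORMAT = 3
    FILE_NOT_FOUND = 4
    TRANSCRIPTION_FAILED = 5
    RATE_LIMITED = 6


def describe_exit(code: int) -> str:
    """Human-readable label for an exit code, tolerant of unknown values."""
    try:
        return OhrExit(code).name.replace("_", " ").lower()
    except ValueError:
        return f"unknown exit code {code}"


def run_ohr(path: str) -> tuple[int, str]:
    """Run `ohr <path>` and return (exit_code, stdout) without raising."""
    result = subprocess.run(["ohr", path], capture_output=True, text=True)
    return result.returncode, result.stdout
```

A batch job might, for example, skip files that return UNSUPPORTED_FORMAT and retry after a pause on RATE_LIMITED.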

Environment Variables

| Variable | Description |
| --- | --- |
| OHR_PORT | Server port (default: 11434) |
| OHR_HOST | Server bind address (default: 127.0.0.1) |
| OHR_TOKEN | Bearer token for server authentication |
| OHR_LANGUAGE | Default language code |
| NO_COLOR | Disable colors (no-color.org) |
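
A client script can read the same variables so that it follows the server's configuration. The following sketch resolves a base URL, token, and language from the environment, falling back to the defaults in the table. server_settings is a hypothetical helper, and how ohr itself prioritizes flags over environment variables is not covered here.

```python
import os
from typing import Mapping


def server_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve client settings from OHR_* variables with documented defaults."""
    host = env.get("OHR_HOST", "127.0.0.1")
    port = int(env.get("OHR_PORT", "11434"))
    return {
        "base_url": f"http://{host}:{port}/v1",
        "token": env.get("OHR_TOKEN"),        # None when auth is not set
        "language": env.get("OHR_LANGUAGE"),  # None means no default language
    }
```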

Architecture

CLI (file/stdin/mic) ──┐
                       ├──→ Speech.SpeechAnalyzer (file transcription)
                       ├──→ Speech.SpeechTranscriber (live microphone)
HTTP Server (/v1/*) ───┘    (100% on-device, zero network)

Built with Swift 6.3 strict concurrency. Single Package.swift, three targets:

  • OhrCore — pure logic library (no Speech framework dependency, unit-testable)
  • ohr — executable (CLI + server)
  • ohr-tests — 109 unit tests

No Xcode required. Builds and tests with Command Line Tools only.

Build & Test

# Build + install (auto-bumps patch version each time)
make install                             # build release + install to /usr/local/bin
make build                               # build release only (no install)

# Version management (zero manual editing)
make version                             # print current version
make release-minor                       # bump minor: 0.1.x -> 0.2.0
make release-major                       # bump major: 0.x.y -> 1.0.0

# Debug build (no version bump, uses swift directly)
swift build                              # quick debug build

# Unit tests
swift run ohr-tests                      # 109 pure Swift unit tests (no XCTest needed)

# Integration tests (requires server running)
ohr --serve --token test --debug &       # start server
OHR_TEST_TOKEN=test python3 -m pytest Tests/integration/ -v  # 42 integration tests

Every make build/make install automatically:

  • Bumps the patch version (.version file is the single source of truth)
  • Generates build metadata (commit, date, Swift version) viewable via ohr --release

Part of the apfel ecosystem

| Tool | What | Apple Framework | Repo |
| --- | --- | --- | --- |
| apfel | LLM (text generation) | FoundationModels | golden example |
| ohr | Speech-to-text | SpeechAnalyzer | you are here |
| kern | Text embeddings | NLContextualEmbedding | sister project |
| auge | Vision / OCR | Vision | sister project |

Meta-repo: apfel-ecosystem

License

MIT