
May 13, 2026


Muesli

Local-first dictation & meeting transcription for macOS
100% on-device speech-to-text · Zero cloud costs · Privacy by default

MIT License · Buy Me A Coffee · macOS 14.2+ · Apple Silicon


What is Muesli?

Muesli is a lightweight native macOS app that combines WisprFlow-style dictation and Granola-style meeting transcription in one tool. All transcription runs locally on Apple Silicon — your audio never leaves your device unless you choose a cloud-backed meeting summary provider.

Muesli interface showing dictations and meeting history

Dictation

Hold your hotkey (or double-tap for hands-free mode) → speak → release → transcribed text is pasted at your cursor. ~0.13 second latency via Parakeet TDT on the Apple Neural Engine.

Meeting Transcription

Start a meeting recording → Muesli captures your mic (You) and system audio (Others) simultaneously → VAD-driven chunked transcription happens during the meeting at natural speech boundaries → speaker diarization identifies individual remote speakers (Speaker 1, Speaker 2, etc.) → when you stop, the transcript is ready in seconds, not minutes. Generate structured meeting notes via OpenAI, free OpenRouter models, your ChatGPT Plus/Pro subscription, or local Ollama models.


Features

  • Native Swift, zero Python — Pure Swift app with CoreML and Metal backends. No bundled runtimes, no subprocess IPC.
  • Multiple ASR models — Parakeet TDT (Neural Engine), Cohere Transcribe 2B (mixed precision CoreML), Whisper Small/Medium/Large Turbo (CoreML/ANE via WhisperKit), and Qwen3 ASR (52 languages, CoreML).
  • Hold-to-talk & hands-free — Hold hotkey for quick dictation, or double-tap for sustained recording.
  • Meeting recording — Captures mic + system audio (including Bluetooth/AirPods) with a CoreAudio process tap by default and ScreenCaptureKit fallback. System audio from Zoom, Teams, and other call clients stays on the Others side of the transcript.
  • VAD-driven chunk rotation — Silero VAD detects natural speech boundaries in real time, splitting mic audio at pauses instead of fixed intervals. No mid-sentence cuts.
  • Speaker diarization — Identifies individual speakers in system audio (Speaker 1, Speaker 2, etc.) using FluidAudio's pyannote-based CoreML diarization model.
  • Camera-based meeting detection — Detects when your webcam + mic activate in a recognized meeting app (Zoom, Chrome, Teams, FaceTime, Slack, WhatsApp). Camera alone (e.g. Photo Booth) won't trigger false positives.
  • Join & Record — Extracts meeting URLs from calendar events (Zoom, Google Meet, Teams, Webex, Chime, FaceTime). Split-button notification: "Join & Record" opens the meeting + starts recording, "Join Only" opens without recording, "Record Only" starts recording without joining. Platform icons (Zoom, Meet) in the notification panel.
  • Google Calendar integration — Connect your Google Calendar to see upcoming meetings in the Coming Up section and status bar. Event-driven notifications via EKEventStoreChangedNotification for instant calendar change detection. Pre-meeting countdowns via Marauder's Map easter egg.
  • Meeting export — Export meeting notes or transcripts as PDF (paginated US Letter) or Markdown. Format picker in the save panel, auto-opens the exported file.
  • Meeting templates — Built-in and custom templates for meeting notes. Choose a template before or after recording — re-summarize any meeting with a different template.
  • Dismiss calendar events — Hide irrelevant events from Coming Up, status bar, and menu bar. Dismissed events are pruned automatically.
  • Filler word removal — Automatically strips "uh", "um", "er", "hmm", and other verbal disfluencies.
  • AI meeting notes — BYOK with OpenAI or OpenRouter, sign in with your ChatGPT Plus/Pro subscription (no API key needed), or use local Ollama models. Auto-generated meeting titles. Re-summarize any meeting.
  • ChatGPT OAuth — Sign in with your existing ChatGPT subscription via browser-based OAuth (PKCE). Tokens stored in the app support directory with owner-only file permissions.
  • Personal dictionary — Add custom words, phrase matches, and replacement pairs. Jaro-Winkler fuzzy matching auto-corrects transcription output.
  • Model management — Download, delete, and switch between models from the Models tab. Background downloads that don't block the app.
  • Configurable hotkeys — Choose any modifier key (Cmd, Option, Ctrl, Fn, Shift) for dictation.
  • Onboarding — First-launch wizard with model selection, real OS permission verification, hotkey configuration, smoother Accessibility handoff, live dictation test to verify the full pipeline works, and optional summary setup for ChatGPT, OpenAI, OpenRouter, or Ollama. Progress saved on every step — survives crashes and manual quits.
  • Launch at Login — Start Muesli automatically with macOS login items, with approval-state refresh in Settings.
  • Dark & light mode — Adaptive theme with toggle in sidebar.
  • SwiftUI dashboard — Dictation history, meeting notes (Notes-style split view), meeting folders, dictionary, models, shortcuts, settings, about page.
  • Floating indicator — Frosted glass pill with dynamic waveform, accent color customization, and click-to-stop for meetings.

Install

Download the latest .dmg from Releases, open it, and drag Muesli to Applications — or double-click to install automatically.

Homebrew

brew tap pHequals7/muesli
brew install --cask muesli

Build from source

Requirements: macOS 14.2+, Xcode 16+

# Clone
git clone https://github.com/pHequals7/muesli.git
cd muesli

# Build and install to /Applications
./scripts/build_native_app.sh

# Contributor dev build without the maintainer Developer ID certificate
MUESLI_SKIP_SIGN=1 ./scripts/dev-test.sh

Release builds are signed by the maintainer Developer ID certificate. External contributors can use the unsigned dev build for local testing; it installs MuesliDev.app with a separate bundle ID and app data directory. See CONTRIBUTING.md for the full local development workflow.

The transcription model (~450MB for Parakeet v3) downloads automatically on first use.


Agent CLI

Muesli bundles an agent-friendly local CLI inside the app bundle:

  • Installed path: /Applications/Muesli.app/Contents/MacOS/muesli-cli
  • Dev path: native/MuesliNative/.build/arm64-apple-macosx/debug/muesli-cli

The CLI is designed for coding agents such as Codex and Claude Code. It exposes meetings, dictations, raw transcripts, and stored notes as stable JSON so an agent can analyze them with its own model and write notes back without requiring a user-supplied OpenAI or OpenRouter key.

What agents should do

  1. Discover the CLI:
    command -v muesli-cli || echo "/Applications/Muesli.app/Contents/MacOS/muesli-cli"
    
  2. Inspect the command contract:
    /Applications/Muesli.app/Contents/MacOS/muesli-cli spec
    
  3. List recent meetings or dictations:
    /Applications/Muesli.app/Contents/MacOS/muesli-cli meetings list --limit 10
    /Applications/Muesli.app/Contents/MacOS/muesli-cli dictations list --limit 10
    
  4. Fetch a full record:
    /Applications/Muesli.app/Contents/MacOS/muesli-cli meetings get 125
    /Applications/Muesli.app/Contents/MacOS/muesli-cli dictations get 42
    
  5. Summarize or analyze locally in the agent.
  6. Write improved meeting notes back:
    cat notes.md | /Applications/Muesli.app/Contents/MacOS/muesli-cli meetings update-notes 125 --stdin
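
Taken together, the steps above can be sketched as a small agent-side script. This is a sketch, not part of Muesli: the helper name `resolve_muesli_cli` is illustrative, it assumes the default install path, and the CLI calls are guarded so nothing runs on a machine without Muesli installed.

```shell
# Illustrative agent-side workflow. Only path resolution runs
# unconditionally; the CLI invocations are skipped when the binary
# is absent.
resolve_muesli_cli() {
  command -v muesli-cli 2>/dev/null \
    || echo "/Applications/Muesli.app/Contents/MacOS/muesli-cli"
}

CLI=$(resolve_muesli_cli)
if [ -x "$CLI" ]; then
  "$CLI" spec > /dev/null                 # step 2: inspect the contract
  "$CLI" meetings list --limit 10         # step 3: recent meetings
  "$CLI" meetings get 125                 # step 4: full record
  # step 6: write notes back after local analysis, e.g.
  # cat notes.md | "$CLI" meetings update-notes 125 --stdin
fi
```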
    

Commands

  • muesli-cli spec
  • muesli-cli info
  • muesli-cli meetings list [--limit N] [--folder-id ID]
  • muesli-cli meetings get <id>
  • muesli-cli meetings update-notes <id> (--stdin | --file <path>)
  • muesli-cli dictations list [--limit N]
  • muesli-cli dictations get <id>

JSON contract

All CLI commands return JSON on stdout.

Success shape:

{
  "ok": true,
  "command": "muesli-cli meetings get",
  "data": {},
  "meta": {
    "schemaVersion": 1,
    "generatedAt": "2026-03-17T00:00:00Z",
    "dbPath": "/Users/example/Library/Application Support/Muesli/muesli.db",
    "warnings": []
  }
}

Failure shape:

{
  "ok": false,
  "command": "muesli-cli meetings get 999",
  "error": {
    "code": "not_found",
    "message": "No meeting exists with id 999.",
    "fix": "Run `muesli-cli meetings list` to find a valid ID."
  },
  "meta": {
    "schemaVersion": 1,
    "generatedAt": "2026-03-17T00:00:00Z",
    "dbPath": "",
    "warnings": []
  }
}
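
An agent can branch on the top-level "ok" flag before parsing further. A minimal sketch (the `envelope_status` helper is illustrative and uses plain string matching; a real consumer should use jq or a proper JSON parser):

```shell
# Classify a muesli-cli response by its top-level "ok" flag.
# String matching is for illustration only.
envelope_status() {
  case "$1" in
    *'"ok": true'*|*'"ok":true'*) echo success ;;
    *) echo failure ;;
  esac
}

ok_resp='{"ok": true, "command": "muesli-cli meetings get", "data": {}}'
err_resp='{"ok": false, "error": {"code": "not_found"}}'
status_ok=$(envelope_status "$ok_resp")
status_err=$(envelope_status "$err_resp")
```

On failure, the `error.code` and `error.fix` fields tell the agent what to do next (e.g. re-run `meetings list` after a `not_found`).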

Important meeting fields:

  • rawTranscript
  • formattedNotes
  • notesState
  • calendarEventID
  • micAudioPath
  • systemAudioPath

notesState values:

  • missing
  • raw_transcript_fallback
  • structured_notes

Notes for agent authors

  • The CLI is JSON-first and intended to be machine-consumed.
  • formattedNotes is the only write-back surface in v1.
  • rawTranscript is read-only and should be treated as source material.
  • If notesState is missing or raw_transcript_fallback, agents should prefer summarizing from rawTranscript.
  • Use --db-path or --support-dir only when the default Muesli data location is wrong.
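
The source-selection rule above can be sketched as follows (the `pick_source` helper is hypothetical, shown only to make the rule concrete):

```shell
# Prefer formattedNotes only when structured notes already exist;
# otherwise summarize from rawTranscript.
pick_source() {
  case "$1" in
    structured_notes) echo formattedNotes ;;
    *)                echo rawTranscript ;;  # missing, raw_transcript_fallback
  esac
}

src_fallback=$(pick_source raw_transcript_fallback)
src_structured=$(pick_source structured_notes)
```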

Models

| Model | Backend | Runtime | Size | Languages | Latency |
|---|---|---|---|---|---|
| Parakeet v3 (recommended) | FluidAudio | CoreML / Neural Engine | ~450 MB | 25 languages | ~0.13s |
| Parakeet v2 | FluidAudio | CoreML / Neural Engine | ~450 MB | English only | ~0.13s |
| Cohere Transcribe 2B | CoreML | FP16 encoder + INT8 decoder | ~3.8 GB | English | ~1s |
| Qwen3 ASR | FluidAudio | CoreML / Neural Engine | ~1.3 GB | 52 languages | ~2-3s |
| Whisper Small | WhisperKit | CoreML / Neural Engine | ~190 MB | English only | ~1-2s |
| Whisper Medium | WhisperKit | CoreML / Neural Engine | ~1.5 GB | English only | ~2-3s |
| Whisper Large Turbo | WhisperKit | CoreML / Neural Engine | ~600 MB | Multilingual | ~2-4s |

Cohere Transcribe is a 2B parameter model (#1 on Open ASR Leaderboard) running in mixed precision — FP16 FastConformer encoder on the Neural Engine with INT8 quantized decoders. Includes VAD-gated silence detection to prevent hallucination. Best for high-accuracy English dictation.

Models download on demand from HuggingFace. Manage them from the Models tab in the dashboard.


Permissions

Muesli needs these macOS permissions (guided during onboarding):

| Permission | Why |
|---|---|
| Microphone | Record audio for dictation and meetings |
| System Audio Recording | Capture call audio from Zoom/Meet/Teams |
| Accessibility | Simulate Cmd+V to paste transcribed text |
| Input Monitoring | Detect hotkey presses globally |
| Camera (implicit) | Detect webcam activation for meeting detection |
| Calendar (optional) | Show upcoming meetings from Google Calendar |

Tech Stack

| Component | Technology |
|---|---|
| App | Swift, AppKit, SwiftUI |
| Primary ASR | FluidAudio (Parakeet TDT + Qwen3 ASR on CoreML/ANE) |
| Cohere ASR | Cohere Transcribe (FP16 encoder + INT8 decoder on CoreML) |
| Whisper ASR | WhisperKit (CoreML/ANE) |
| Voice activity | Silero VAD via FluidAudio (streaming, event-driven) |
| Speaker diarization | pyannote via FluidAudio (CoreML on ANE) |
| Camera detection | CoreMediaIO property listeners (event-driven) |
| System audio | CoreAudio process tap by default; ScreenCaptureKit (SCStream) fallback |
| Meeting notes | OpenAI / OpenRouter (BYOK), ChatGPT subscription (OAuth), or Ollama |
| Calendar | Google Calendar API (OAuth 2.0) |
| Export | PDF (NSPrintOperation, paginated US Letter) + Markdown |
| Word correction | Jaro-Winkler similarity (native Swift) |
| Storage | SQLite (WAL mode) |
| Signing | Developer ID + hardened runtime (notarization ready) |

Contributing

Contributions welcome! To get started:

git clone https://github.com/pHequals7/muesli.git
cd muesli
swift build --package-path native/MuesliNative -c release
swift test --package-path native/MuesliNative
./scripts/test_packaged_cli.sh

684 tests covering model configuration, custom word and phrase matching, filler removal, transcription routing, data persistence, CLI contract/path-resolution logic, speaker diarization alignment, token consolidation, camera-based meeting detection, CoreAudio system capture, ChatGPT OAuth logic, Ollama summaries, update-flow policy, launch at login, paste/clipboard safety, meeting export, meeting navigation, and Google Calendar URL extraction.

Current test scope:

  • Covered by tests: CLI command contract generation, CLI path-resolution logic, SQLite read/write behavior, note-state classification, meeting/dictation retrieval/update flows, update-flow policy, CoreAudio cleanup, paste/clipboard safety, launch at login, Ollama summary routing, and Computer Use planner foundations.
  • Not covered by Swift unit tests: app-bundle packaging and copying muesli-cli into /Applications/Muesli.app/Contents/MacOS.
  • Packaging is verified by scripts/test_packaged_cli.sh, which builds an isolated app bundle, checks that Contents/MacOS/muesli-cli exists and is executable, and runs muesli-cli spec from the packaged path.
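
The core of that packaging check (existence plus executability) can be sketched generically. This is not the real script: scripts/test_packaged_cli.sh additionally builds the isolated bundle and runs `muesli-cli spec`; here a temporary stand-in file substitutes for the packaged binary, and `check_cli_binary` is an illustrative helper name.

```shell
# Generic sketch of the packaged-CLI check: the binary must exist
# and be executable. Demonstrated against a temporary stand-in file
# rather than a real app bundle.
check_cli_binary() {
  if [ -x "$1" ]; then
    echo ok
  else
    echo "missing or not executable: $1" >&2
    echo fail
  fi
}

stub=$(mktemp)
chmod +x "$stub"
result=$(check_cli_binary "$stub")
rm -f "$stub"
```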

Please open an issue before submitting large PRs.


Support

If Muesli saves you time, consider supporting development:

Buy Me A Coffee


Acknowledgements

  • FluidAudio — CoreML speech models for Apple devices (Parakeet TDT, Qwen3 ASR, Silero VAD, speaker diarization)
  • WhisperKit — Swift Whisper inference on CoreML/ANE
  • Core Audio by Apple — system audio process taps
  • ScreenCaptureKit by Apple — system audio fallback capture
  • NVIDIA Parakeet — FastConformer TDT speech recognition model
  • Cohere Transcribe — 2B parameter autoregressive ASR (#1 Open ASR Leaderboard)
  • Qwen3-ASR — Multilingual speech recognition (52 languages)
  • pyannote — Speaker diarization (via FluidAudio CoreML conversion)

License

MIT — free and open source.

Star History

Star History Chart