AITranscribe
April 20, 2026 · View on GitHub

AITranscribe
A powerful macOS menu bar application for local speech-to-text transcription and AI-powered summarization. Runs completely offline using state-of-the-art AI models. Your voice stays on your Mac — nothing is sent to the cloud.
Quick Install
- Download the latest DMG from Releases
- Open the DMG and drag AiTranscribe to Applications
- Run this once in Terminal (app is not signed with Apple Developer certificate):
xattr -cr /Applications/AiTranscribe.app
Showcase
https://github.com/user-attachments/assets/a5f44cdb-234e-43ad-b4c2-4f566bf4b369
Completely redesigned UI with transparent glass cards, capsule buttons, and smooth stagger animations. Navigate through Dashboard, Sessions, History, Models, and more — all from a collapsible sidebar. Summarize recordings with on-device AI, track your stats, and manage everything in one place.
Recording Indicator
https://github.com/user-attachments/assets/543446b2-f50c-416b-acde-471ee3c46c13
The floating indicator features GPU-synced side-wave arcs that expand with your voice, a glassy capsule design, and magnetic snap to 8 screen positions.
What's New in v0.2
AI-Powered Summaries
Summarize any transcription or session recording using on-device LLMs powered by MLX. Choose from Gemma 4 models (E2B 4-bit recommended, E4B for higher quality) with four built-in presets — General, Meeting Notes, Action Items, and Technical. Models download on demand (~3.6GB), run entirely on Apple Silicon, and auto-unload after use to free RAM.
Dashboard & Analytics
A full statistics hub showing total time saved, words transcribed, average WPM, and transcription count. Dive into weekly patterns, hourly activity heatmaps, WPM trends, transcription length distribution, top words, and model usage — all rendered with animated chart grow-ins.
Auto-Updates
AiTranscribe now checks GitHub Releases automatically every 24 hours. When a new version is available, an in-app window shows the release notes (rendered Markdown), download progress, and a one-click install that replaces the app and relaunches.
Redesigned Recording Indicator
A glassy capsule with layered glass highlights and shadow. During recording, side-wave arcs expand outward in response to audio volume at 10Hz via GPU-synced TimelineView. During transcription, three dots rotate and pulse. Drag it anywhere — it snaps magnetically to corners, edges, and center with a glassy placeholder preview.
Similarity Search & History Indexing
Index your entire transcription history and search by meaning using Apple's native NLEmbedding from the Natural Language framework. Keyword search is still there, but now you can also find transcriptions that are semantically similar to your query — all processed on-device with zero setup.
Redesigned Menu Bar Panel
The menu bar dropdown is now a fully custom compact popover (280px) — no more default macOS menu. It shows the app icon, version badge, and a live status dot (connected/offline). The main area is context-aware: idle shows Record and Session buttons, recording shows a red indicator bar with duration and stop/cancel, session shows an orange bar, and transcribing shows a progress spinner. A microphone picker and settings/quit footer round it out.
Full UI Overhaul
Every screen has been rebuilt from scratch with a transparent bubble design system — glass cards, capsule buttons, stagger animations, and smooth transitions. The settings window now has a collapsible sidebar (compact icon mode or expanded with labels) and all seven tabs (Dashboard, General, Models, Sessions, History, Shortcuts, About) have been redesigned.
Features
Core
- 100% Local and Private — All processing happens on your Mac. No internet, no cloud, no data leaves your machine.
- Global Hotkey — Press
Control + Pfrom any app to start recording. Press again to stop. The transcription is in your clipboard before you can blink. - Menu Bar App — Lives in your menu bar, always accessible, never in your way.
- Multiple AI Models — NVIDIA Parakeet, OpenAI Whisper (base, small, large-v3, large-v3-turbo), NVIDIA Nemotron streaming, and Gemma 4 for summarization.
- Auto-Paste — Transcribed text can be pasted automatically at your cursor position right after transcription.
- Metal GPU Acceleration — Whisper models run on Apple Silicon GPU via whisper.cpp, delivering 8-10x faster transcription than CPU-only inference. 10 minutes of audio transcribed in ~2 minutes.
AI Summaries
- On-Device LLM Summarization — Summarize transcriptions and session recordings using Gemma 4 models running locally via MLX on Apple Silicon.
- Multiple Presets — General summary, Meeting Notes, Action Items, and Technical — each with tailored system prompts and configurable output length (short, medium, long, custom).
- Isolated Runtime — Summary models run in a dedicated Python venv with one-click setup. Models auto-unload after generation to reclaim RAM.
- Streaming Generation — Watch the summary generate in real time with token-by-token streaming.
Session Recording
- Long-form Recording — Record sessions of any length (meetings, lectures, interviews). Captures both microphone and system audio simultaneously.
- RAM-Aware Batch Transcription — Long recordings are automatically split into chunks sized to your available RAM and transcribed sequentially with live progress streaming.
- Session Management — View, rename, delete, and re-transcribe sessions. Bulk actions for clearing transcriptions. Audio and transcription files stored locally.
- Per-Session Summaries — Generate summaries for any session with your choice of preset and model. View summaries in a tabbed interface alongside the full transcription.
- Floating Session Indicator — A capsule-shaped indicator with pulsing red dot and HH:MM:SS timer. Draggable, snaps to 8 screen positions.
Dashboard
- Hero Stats — Time saved, total words transcribed, average WPM, and transcription count at a glance.
- Activity Analytics — Monthly activity bars, weekly pattern breakdown, and hourly activity heatmap.
- Trends & Distribution — WPM trends over 8 weeks, transcription length distribution, top words, and most-used models.
- Animated Charts — All visualizations animate in with staggered spring grow-ins on first load.
Smart Model Management
- Lazy Loading — Models are not loaded at app startup. When you press the record shortcut, the model loads in the background while you speak. By the time you stop, the transcription is ready.
- Idle Unloading — After 2 minutes of inactivity, the model automatically unloads to free RAM. Next time you record, it loads again seamlessly.
- Streaming models excluded — NeMo streaming models (Nemotron) stay loaded since they need to be instantly available for real-time transcription.
Recording
- Floating Indicator — A glassy, draggable recording indicator with GPU-synced animations. Side-wave arcs respond to your voice volume. Snaps to screen corners and edges. Adapts to light and dark themes.
- Real-time Progress — For longer transcriptions, the UI shows live progress (e.g., "Transcribing... 45%") with real segment tracking for Whisper models.
- Audio Ducking — Automatically lowers or mutes system audio while recording, then restores it when you stop.
Audio Device Management
- Microphone Selection — Pick your preferred microphone from the dropdown. Uses native CoreAudio to set the input device.
- AirPods Support — When AirPods connect or disconnect, the device list refreshes automatically. If your selected microphone disappears, the app falls back to the default.
- Persistent Selection — Your preferred microphone is remembered across app restarts.
- Device Change Notifications — macOS notification when the default input device changes.
Transcription History
- All transcriptions are saved locally with timestamp, duration, word count, and which model was used.
- Keyword Search — Filter history instantly by keyword across transcription text and model names.
- Similarity Search — Index your entire history with one click and search by meaning, not just keywords. Uses Apple's native
NLEmbedding(Natural Language framework) for on-device sentence embeddings — no downloads, no cloud. Text matches appear first, then semantic matches, deduplicated. - View full transcription text, copy to clipboard, and view inline summaries per preset.
Auto-Updates
- GitHub Release Checking — Automatically checks for new versions every 24 hours via the GitHub Releases API.
- In-App Update Flow — See rendered release notes, download progress, and one-click install that replaces the app bundle and relaunches.
Customization
- Configurable keyboard shortcuts for recording, cancelling, and session recording
- Sound feedback when recording starts and stops
- Two audio modes during recording: mute completely, or lower volume by a percentage
- NeMo library installation support for accessing Parakeet models (guided setup in the app)
- Factory reset option in About/System settings
Installation
Option 1: Download Pre-built App
-
Download the latest DMG from Releases
-
Open the DMG and drag AiTranscribe to Applications
-
First launch — since the app is not signed with an Apple Developer certificate:
Right-click AiTranscribe in Applications, select "Open", click "Open" in the dialog. You only need to do this once.
Or run in Terminal:
xattr -cr /Applications/AiTranscribe.app -
Grant permissions when prompted:
- Microphone access (required)
- Accessibility access (optional — enables auto-paste at cursor)
Option 2: Build from Source
Building locally avoids all Gatekeeper warnings since the app is built on your machine.
Prerequisites
| Requirement | How to Install |
|---|---|
| Xcode 15+ | App Store |
| Python 3.10+ (arm64) | brew install python@3.11 or python.org |
| Command Line Tools | xcode-select --install |
Quick Build
git clone https://github.com/Ljove02/AIT-AiTranscribe-MacOS.git
cd AIT-AiTranscribe-MacOS
python3 -m venv venv && source venv/bin/activate && pip install -r backend/requirements.txt
./build_production.sh
The built app and DMG will be in the dist/ folder.
Step-by-Step Build
# 1. Clone
git clone https://github.com/Ljove02/AIT-AiTranscribe-MacOS.git
cd AIT-AiTranscribe-MacOS
# 2. Set up Python environment
python3 -m venv venv
source venv/bin/activate
pip install -r backend/requirements.txt
# 3. Build backend executable
cd backend && ./build_standalone.sh && cd ..
# 4. Build everything and create DMG
./build_production.sh
# 5. Install
cp -R dist/AiTranscribe.app /Applications/
Usage
- Click the menu bar icon (top-right of your screen)
- Go to Settings and download a model:
- Parakeet TDT 0.6B — Recommended. Best balance of speed and accuracy.
- Whisper Base (English) — Lightweight, good for quick transcriptions.
- Whisper Large v3 — Highest accuracy, requires more RAM.
- Press
Control + Pto start recording - Speak into your microphone
- Press the hotkey again to stop — text is in your clipboard
The model loads automatically the first time you record. After 2 minutes of idle, it unloads to free RAM. Next time you record, it loads again in the background while you speak.
For session recording, use Control + K to start a long-form session. The floating indicator shows elapsed time. Press again to stop and transcribe.
For AI summaries, open any session or transcription, choose a preset (General, Meeting Notes, Action Items, Technical), and generate a summary — all processed locally on your Mac.
Models
Transcription Models
| Model | Download Size | Speed | Accuracy | RAM Required | GPU Accelerated |
|---|---|---|---|---|---|
| Whisper Base (EN) | ~148MB | Fastest | Good | ~400MB | Yes (Metal) |
| Whisper Small (EN) | ~488MB | Fast | Very Good | ~850MB | Yes (Metal) |
| Whisper Large v3 Turbo | ~1.6GB | Fast | Excellent | ~1.7GB | Yes (Metal) |
| Whisper Large v3 | ~3.1GB | Medium | Best | ~4GB | Yes (Metal) |
| Parakeet TDT 0.6B | ~1.2GB | Fast | Excellent | ~3GB | No (CPU) |
| Nemotron Streaming | — | Realtime | Good | ~2GB | No (CPU) |
Whisper models use whisper.cpp with Metal GPU acceleration for 8-10x faster inference on Apple Silicon. Models are downloaded on-demand and stored locally.
Summary Models
| Model | Download Size | Quality | RAM Required |
|---|---|---|---|
| Gemma 4 E2B (4-bit) | ~3.6GB | Great | ~4GB |
| Gemma 4 E4B (4-bit) | ~5.2GB | Excellent | ~6GB |
Summary models run locally via MLX on Apple Silicon. Installed through a one-click setup in the app with an isolated Python environment.
Development
For contributors who want to run the app with hot-reload:
Terminal 1: Start the backend
source venv/bin/activate
./run_backend.sh --dev
Terminal 2: Run the frontend
open AiTranscribe/AiTranscribe.xcodeproj
# In Xcode: Press Cmd+R to run
Project Structure
AIT-AiTranscribe-MacOS/
├── AiTranscribe/ # Swift/SwiftUI frontend
│ ├── AiTranscribe.xcodeproj
│ └── AiTranscribe/
│ ├── AiTranscribeApp.swift # App entry point
│ ├── AppState.swift # Central state management
│ ├── APIClient.swift # HTTP client for backend
│ ├── BackendManager.swift # Backend process management
│ ├── AudioRecorder.swift # Audio recording + CoreAudio
│ ├── MenuBarView.swift # Menu bar panel UI
│ ├── RecordingIndicator.swift # GPU-synced floating indicator
│ ├── UpdateChecker.swift # GitHub release auto-updates
│ ├── SummarySetupManager.swift # MLX summary runtime installer
│ ├── Sessions/ # Session recording system
│ │ ├── SessionManager.swift
│ │ ├── SessionRecorder.swift
│ │ └── SystemAudioCapture.swift
│ ├── Settings/ # Settings UI (modular tabs)
│ │ ├── SettingsView.swift # Main settings window
│ │ ├── SettingsSidebar.swift # Collapsible sidebar
│ │ ├── DashboardView.swift # Analytics dashboard
│ │ ├── DashboardStatsManager.swift # Stats computation
│ │ ├── GeneralSettingsView.swift
│ │ ├── ModelsSettingsView.swift
│ │ ├── SessionsSettingsView.swift
│ │ ├── SessionDetailView.swift
│ │ ├── HistorySettingsView.swift
│ │ ├── ShortcutsSettingsView.swift
│ │ ├── AboutSettingsView.swift
│ │ └── UpdateWindowView.swift # Auto-update UI
│ ├── HotkeyManager.swift # Global keyboard shortcuts
│ └── HistoryManager.swift # Transcription history + search
│
├── backend/ # Python/FastAPI backend
│ ├── server.py # API endpoints + SSE streaming
│ ├── model_manager.py # Model download and loading
│ ├── summary_manager.py # MLX summary model management
│ ├── summary_worker.py # Summary generation worker
│ ├── setup_summary_venv.py # Summary runtime installer
│ ├── recorder.py # Server-side recording utilities
│ ├── requirements.txt # Python dependencies
│ ├── requirements-summary.txt # Summary runtime dependencies
│ └── build_standalone.sh # PyInstaller build script
│
├── build_production.sh # Full production build (backend + app + DMG)
├── run_backend.sh # Development server script
└── README.md
Tech Stack
- Frontend: Swift / SwiftUI (macOS native)
- Backend: Python / FastAPI (local HTTP server on port 8765)
- Transcription: OpenAI Whisper via whisper.cpp (Metal GPU), NVIDIA NeMo (Parakeet)
- Summarization: Google Gemma 4 via MLX (Apple Silicon)
- Audio: AVFoundation + CoreAudio (Swift), sounddevice (Python)
- Communication: HTTP + Server-Sent Events (SSE) for streaming progress
- Updates: GitHub Releases API for version checking and in-app updates
Troubleshooting
"App is damaged and can't be opened"
xattr -cr /Applications/AiTranscribe.app
"Microphone access denied" Go to System Settings > Privacy & Security > Microphone and enable AiTranscribe.
Backend won't start
lsof -i :8765 # Check if port is in use
lsof -ti:8765 | xargs kill # Kill existing process, then restart the app
Summary setup fails
Make sure you have native arm64 Python 3.10+ installed (not Rosetta). Run python3 -c "import platform; print(platform.machine())" — it should print arm64.
Xcode build fails
xcode-select --install # Make sure Command Line Tools are installed
cd AiTranscribe && xcodebuild clean
Contributing
Contributions are welcome.
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature - Set up the development environment (see Development)
- Make your changes and test thoroughly
- Commit and push:
git push origin feature/your-feature - Open a Pull Request
License
This project is licensed under the MIT License — see the LICENSE file for details.
Support
If you find AITranscribe useful, consider supporting the project — it helps keep development going.
Acknowledgments
- whisper.cpp for Metal-accelerated Whisper inference
- OpenAI Whisper for Whisper models
- NVIDIA NeMo for Parakeet models
- MLX for on-device LLM inference on Apple Silicon
- Google Gemma for open-weight summary models
If you encounter issues or have questions, check the Troubleshooting section or open an issue.