TypeWhisper for Windows

May 28, 2026 · View on GitHub

Speech-to-text and AI text processing for Windows. Dictate anywhere, transcribe files, and transform text with reusable workflows. Use local models when privacy matters, or add cloud providers through plugins when speed and scale matter more.

TypeWhisper for Windows includes system-wide dictation, file transcription, workflows, history, dictionary, snippets, local and cloud transcription engines, and bundled integrations. Advanced surfaces like the HTTP API, CLI, plugin SDK, marketplace, and action plugins remain available for power users and automation.

What's New

Workflows: Prompt actions and matching rules live in one workflow surface with templates for cleanup, translation, email replies, meeting notes, checklists, JSON extraction, summaries, and custom prompts.
Expanded engines: Local SherpaOnnx, whisper.cpp, Granite Speech, and compatible server-based engines sit alongside cloud transcription providers.
Streaming preview: The recording overlay can show partial results while speech is still being captured.
Automation refresh: The local HTTP API and CLI support tokenized discovery, local-file transcription, workflow/rule listing, per-request engine/model overrides, language hints, translation targets, and dictionary term/correction management.
Plugin marketplace: Browse, install, upgrade, and remove bundled or external plugins from Settings.
Release channels: Velopack powers stable, release-candidate, and daily build delivery.

Features

Transcription

On-device models: Parakeet TDT 0.6B, Canary 180M Flash, whisper.cpp models, and Granite Speech run locally through plugins. Recommended local models run on CPU, with no GPU required.
Cloud transcription: Groq Whisper, xAI/Grok STT, OpenAI Whisper, AssemblyAI, Deepgram, ElevenLabs, Reson8, Gladia, Google Cloud STT, Soniox, Speechmatics, Cloudflare ASR, Voxtral, and any OpenAI-compatible server can be added through plugins.
Streaming preview: Silero VAD detects speech segments during recording and shows partial transcription results in the overlay before recording stops.
Short-clip handling: Brief utterances are padded and retained more reliably across local and cloud engines.
File transcription: Drag and drop audio/video files. Supports WAV, MP3, M4A, AAC, OGG, FLAC, WMA, MP4, MKV, AVI, MOV, and WebM.
Subtitle export: Export transcriptions as TXT, SRT, or WebVTT.

Dictation

System-wide dictation: Hybrid, Toggle-only, Hold-only, and workflow-specific hotkeys can paste text into any app.
Non-blocking pipeline: Queue multiple recordings while transcription runs in the background.
Sound feedback: Audio cues for recording start and stop.
Silence detection: Automatically stops recording after a configurable silence period.
Whisper mode: Boosted microphone gain for quiet speech.
Audio normalization: Automatic gain control for consistent input levels.
Media pause: Automatically pauses media playback during recording.
Audio ducking: Reduces system volume while recording.

AI Processing

Workflows: Build reusable transformations for cleanup, translation, rewriting, extraction, formatting, and app-specific automation. Workflows can run by app, website, or dedicated hotkey.
LLM providers: Groq, xAI/Grok, OpenAI, Gemini, Claude, Cerebras, Cohere, Fireworks, OpenRouter, OpenAI Compatible, and local Gemma can be used through plugins.
Custom prompts: Add fine-tuning instructions per workflow, or use custom workflow prompts when the built-in templates are not specific enough.
Translation: Cloud LLM translation can fall back to local Marian ONNX translation. Supported target languages include EN, DE, FR, ES, IT, NL, PL, SV, DA, FI, CS, RU, UK, HU, JA, ZH, AR, HI, VI, and ID.

Personalization

Workflow triggers: Match by process name, website pattern, or hotkey for language, task, model, whisper mode, prompt processing, output format, and action routing.
Dictionary: Custom term corrections fix names, jargon, and recurring misrecognitions. Includes regex support and built-in term packs for developer, medical, legal, finance, and creative domains.
Snippets: Text shortcuts with trigger -> replacement. Placeholders include {date}, {time}, {datetime}, {clipboard}, {day}, and {year}. Date/time placeholders support custom formats, such as {date:dd.MM.yyyy}.
History: Searchable transcription history with raw/final text tracking, app context, inline editing, export, retention controls, and recent-transcription access.

Integration & Extensibility

Plugin system: Extend TypeWhisper with custom transcription engines, LLM providers, post-processors, memory providers, TTS providers, event observers, and action plugins.
Plugin marketplace: Browse, install, upgrade, and remove plugins directly from Settings. Recommended extensions can be installed automatically on first run.
Action plugins: Linear, Obsidian, Script, Webhook, and LiveTranscript can turn transcriptions into issues, notes, scripts, notifications, or live windows.
HTTP API: Local REST server for status, models, transcription, workflows/rules, history, dictionary terms and corrections, and dictation control.
CLI tool: Shell-friendly transcription via the bundled typewhisper command.

General

Fluent Design: WPF-UI with Mica backdrop, native title bar, and Fluent controls.
Dynamic Island overlay: Configurable widgets for LED, timer, waveform, active workflow, and microphone level.
Home dashboard: Usage statistics, words per minute, app activity, time saved, and onboarding.
Welcome wizard: Guided setup for extension installation, model download, microphone test, and hotkeys.
Release channels: Velopack-powered delivery for stable, release-candidate, and daily builds.
Windows autostart: Optional start with Windows.
System tray: Minimizes to tray with quick access.
Localization: English, German, Japanese, and Russian.

Install

Direct Download

Download the latest installer from GitHub Releases.

Windows releases are code-signed using SignPath.io. Free code signing is provided by SignPath.io, certificate by SignPath Foundation. See the privacy policy for data handling details.

Stable releases use the default Velopack channel. Release candidates and daily builds are published as prereleases on their own update channels. Installed builds can switch channels in Settings.

Code Signing Policy

TypeWhisper for Windows release artifacts are built from this repository by GitHub Actions and submitted for signing through SignPath.io. Signing approvals are limited to project maintainers. The signed publisher may appear as SignPath Foundation because the certificate is provided through the SignPath Foundation open-source program.

Quick Start

Install TypeWhisper from the latest Windows release.
Open Settings and grant microphone access if Windows asks for it.
Pick a transcription engine and, if needed, download a local model.
Set a global hotkey or create a workflow-specific hotkey.
Trigger dictation and complete your first transcription.

System Requirements

Windows 10 or later (x64 or ARM64)
8 GB RAM minimum, 16 GB+ recommended for larger local models
Around 700 MB disk space for Parakeet, around 200 MB for Canary, more for whisper.cpp or Granite Speech models
.NET 10 SDK for building from source

Model Recommendations

Use Case	Recommended Models
Fast general dictation	Parakeet TDT 0.6B, whisper.cpp Base Q5_0
Multilingual dictation with translation	Canary 180M Flash, whisper.cpp Large V3 Turbo
Lowest disk usage	whisper.cpp Tiny Q5_0
Higher local accuracy	whisper.cpp Small or Large V3 Turbo
Experimental local speech	IBM Granite 4.0 1B Speech

Local models are provided by bundled plugins and can be installed from the built-in marketplace.

Build

Clone the repository:

git clone https://github.com/TypeWhisper/typewhisper-win.git
cd typewhisper-win

Build with .NET 10:
```
dotnet build TypeWhisper.slnx
```

Run the app:

dotnet run --project src/TypeWhisper.Windows

Run the automated checks before shipping changes:
```
dotnet test TypeWhisper.slnx
```

The app appears in the system tray, and the welcome wizard guides you through extension installation, model download, and setup.

HTTP API

The HTTP API is an advanced local automation surface. It binds to localhost and 127.0.0.1, is configurable in Settings, and uses port 8978 by default. When the server starts it writes discovery files to %LOCALAPPDATA%\TypeWhisper: legacy api-port and api-discovery.json with { "version": 1, "port": 8978, "token": "..." }.

Authentication is off by default for local compatibility. If Settings > Advanced > API Server > Require API Token is enabled, /v1/status remains public and all other routes require either Authorization: Bearer <token> or X-TypeWhisper-API-Token: <token>. OPTIONS requests return 204 No Content.

Endpoint	Method	Description
`/v1/status`	GET	App status, active engine, active model, API version, and capability flags
`/v1/models`	GET	List all available local and cloud models
`/v1/transcribe`	POST	Transcribe multipart or raw audio
`/v1/transcribe/local-file`	POST	Transcribe a file path already on this Windows machine
`/v1/history`	GET	Search history with pagination
`/v1/history`	DELETE	Delete a history entry by ID
`/v1/rules`	GET	List workflow-backed rules
`/v1/profiles`	GET	List workflow-backed profiles
`/v1/rules/toggle`	PUT	Toggle a rule by `id`
`/v1/profiles/toggle`	PUT	Toggle a profile by `id`
`/v1/dictation/start`	POST	Start recording
`/v1/dictation/stop`	POST	Stop recording
`/v1/dictation/status`	GET	Check current dictation state
`/v1/dictation/transcription`	GET	Poll dictation result by session ID
`/v1/dictionary/terms`	GET	List enabled dictionary terms
`/v1/dictionary/terms`	PUT	Replace or append dictionary terms
`/v1/dictionary/terms`	DELETE	Delete one dictionary term
`/v1/dictionary/corrections`	GET	List enabled dictionary corrections
`/v1/dictionary/corrections`	PUT	Upsert a correction
`/v1/dictionary/corrections`	DELETE	Delete a correction

/v1/transcribe accepts multipart/form-data with a file part or a raw audio request body. Multipart fields are:

language: exact source language, such as en or de
language_hint: repeatable language hints; do not combine with language
task: transcribe or translate
target_language: translate the final text to this language
response_format: json or verbose_json
prompt: request-specific transcription prompt/context
engine and model: per-request overrides using IDs from /v1/models

Raw audio requests can pass the same options with headers: X-Language, X-Language-Hints, X-Task, X-Target-Language, X-Response-Format, X-Prompt, X-Engine, and X-Model. Add ?await_download=1 to wait for a local model download or restore when supported.

/v1/transcribe/local-file accepts JSON. This is what the CLI uses for normal file paths, so large local files are not uploaded through the API process:

{
  "path": "C:\\Audio\\recording.wav",
  "language_hints": ["de", "en"],
  "task": "transcribe",
  "engine": "mock",
  "model": "tiny"
}

Dictionary terms use PUT /v1/dictionary/terms with { "terms": ["TypeWhisper"], "replace": false } and DELETE /v1/dictionary/terms with { "term": "TypeWhisper" }. Corrections use { "original": "teh", "replacement": "the", "caseSensitive": false }.

curl -X POST http://localhost:8978/v1/transcribe \
  -F "file=@recording.wav" \
  -F "language_hint=de" \
  -F "language_hint=en" \
  -F "response_format=verbose_json"

curl -X POST http://localhost:8978/v1/dictation/start
curl -X POST http://localhost:8978/v1/dictation/stop
curl "http://localhost:8978/v1/dictation/transcription?id=<session-id>"

CLI Tool

The optional typewhisper CLI talks to the local HTTP API and is intended for scripts, terminals, Raycast commands, and batch workflows. It reads %LOCALAPPDATA%\TypeWhisper\api-discovery.json before the legacy api-port file. Tokens can also be supplied with TYPEWHISPER_API_TOKEN or --api-token.

Commands

typewhisper status
typewhisper models
typewhisper transcribe recording.wav --language de --json
typewhisper transcribe recording.wav --language-hint de --language-hint en
typewhisper transcribe recording.wav --engine groq --model whisper-large-v3-turbo
typewhisper transcribe - < audio.wav

Options

Option	Description
`--port <N>`	Server port (default: auto-discover, fallback `8978`)
`--api-token <token>`	API token override
`--json`	Output as JSON
`--language <code>`	Source language, such as `en` or `de`
`--language-hint <code>`	Repeatable language hint for restricted auto-detection
`--task <task>`	`transcribe` (default) or `translate`
`--translate-to <code>`	Target language for translation
`--engine <id>`	Override the transcription engine
`--model <id>`	Override the transcription model
`--await-download`	Wait for local model restore/download

The CLI requires the API server to be running in Settings.

Workflows

Workflows let you configure transcription, transformation, and automation behavior per application, website, or hotkey. For example:

Outlook: German dictation, email reply template, auto-submit enabled
Slack: English cleanup workflow with Groq or OpenAI prompt processing
Terminal: Whisper mode enabled with a command-focused dictionary
github.com: English cleanup workflow that matches in any browser
docs.google.com: German dictation workflow that translates to English

Create workflows in Settings > Workflows. Choose a template, assign an app, website, or hotkey trigger, then configure language/task/model overrides, prompt processing, output behavior, action routing, and priority. Website patterns support wildcard matching, so *.github.com matches gist.github.com.

When you start dictating, TypeWhisper matches the active window and browser URL against enabled workflows with the following priority:

Website match: browser URL patterns, such as github.com in any supported browser
App match: process names, such as OUTLOOK.EXE or Code.exe
Hotkey match: workflow-specific hotkeys that force a selected workflow

The active workflow is shown in the recording overlay.

Plugins

TypeWhisper supports plugins for adding custom transcription engines, LLM providers, post-processors, memory providers, TTS providers, event observers, and action plugins. Plugins are .NET class libraries with a manifest.json, installed to %LocalAppData%\TypeWhisper\Plugins\.

Bundled plugin families include:

Type	Plugins
Local transcription	SherpaOnnx, whisper.cpp, Granite Speech
Cloud transcription	OpenAI, Groq, xAI/Grok, AssemblyAI, Deepgram, ElevenLabs, Reson8, Gladia, Google Cloud STT, Soniox, Speechmatics, Cloudflare ASR, Voxtral, OpenAI Compatible
Server-backed transcription	Qwen3 STT
LLM providers	OpenAI, Groq, xAI/Grok, Gemini, Claude, Cerebras, Cohere, Fireworks, OpenRouter, OpenAI Compatible, Gemma Local
TTS providers	OpenAI, xAI/Grok, Supertonic TTS
Actions	Linear, Obsidian, Script, LiveTranscript, Webhook
Memory	File Memory, OpenAI Vector Memory

Plugin Types

Interface	Purpose
`ITranscriptionEnginePlugin`	Local, cloud, or custom speech-to-text engines
`ILlmProviderPlugin`	LLM chat completions for workflow processing
`IPostProcessorPlugin`	Post-processing pipeline for cleanup and formatting
`IActionPlugin`	Custom actions triggered by workflow output
`ITypeWhisperPlugin`	Event observer, lifecycle hook, or utility plugin

SDK Helpers

The SDK includes helpers for OpenAI-compatible APIs:

OpenAiTranscriptionHelper: multipart/form-data upload for Whisper-compatible endpoints
OpenAiChatHelper: chat completion requests
OpenAiApiHelper: shared HTTP error handling

Architecture

typewhisper-win/
|-- src/
|   |-- TypeWhisper.Core/           # Core logic, models, interfaces, persistence
|   |-- TypeWhisper.PluginSDK/      # Plugin contracts, helpers, manifest models
|   |-- TypeWhisper.Cli/            # typewhisper command-line client
|   `-- TypeWhisper.Windows/        # WPF UI, services, view models, app composition
|-- plugins/                        # Bundled plugin source
|-- tests/
|   |-- TypeWhisper.Core.Tests/     # Core xUnit tests
|   `-- TypeWhisper.PluginSystem.Tests/
`-- docs/                           # Design notes and implementation plans

Patterns: MVVM with CommunityToolkit.Mvvm. App.xaml.cs is the composition root. SQLite is used for persistence with a custom migration pattern. Plugins load through AssemblyLoadContext isolation and manifest-based discovery. Workflows drive app/website/hotkey matching and prompt orchestration.

Key dependencies: NAudio (audio), NHotkey.Wpf (hotkeys), WPF-UI (Fluent Design), Microsoft.ML.OnnxRuntime (local translation), Velopack (updates), H.NotifyIcon.Wpf (system tray), and org.k2fsa.sherpa.onnx (local ASR).

License

GPLv3. See LICENSE for details. Commercial licensing is available under LICENSE-COMMERCIAL.md. See TRADEMARK.md for the trademark policy.