Voice Apps Index

April 25, 2026

Master Index

Index of voice typing, dictation, and speech-to-text applications and utilities.


Active Projects

Three parallel tracks, each with its own use case:

  • Real-time streaming at cursor — speak and see text appear as you go, for chat, IDEs, and quick input. Covered by VoiceType (hybrid local + cloud) and Parakeet Type Ubuntu (local-only proof of concept).
  • Long-form note dictation — speak a full note, get back polished, formatted text in one pass. Covered by AI Typer V2.
  • Android voice-to-text reformatter — hold-to-talk, single-pass transcription + reformatting into a chosen preset (email, prompt, to-do, Hebrew). Covered by Voxcast.

VoiceType

The flexible hybrid track — aiming to blend local and cloud STT so the user picks the tradeoff per session (latency, cost, privacy). Currently cloud-only via Deepgram Nova-3 streaming with keyterm prompting; local inference is planned. Python + PyQt6, single-process, no root (evdev uinput via the input group). System tray, hotkeys, push-to-talk, VAD, and an in-app cost tracking dialog. Ships as a .deb package.

View Repo


Parakeet Type Ubuntu

The local-only track — a focused proof of concept for running NVIDIA Parakeet / NeMo ASR models on AMD CPU inference via sherpa-onnx, with no cloud and no GPU required. Built-in punctuation, multiple model profiles, system tray, configurable hotkeys.

View Repo


AI Typer V2

The long-form dictation track — single-pass multimodal audio understanding (Gemini via OpenRouter) where the model transcribes and formats in one call. Smart format detection (email / list / notes), VAD + AGC preprocessing, optional second-pass coherence review, custom dictionary with CSV import/export, streaming live-text preview, global F13–F24 hotkeys, append mode, and type-at-cursor that works in terminals as well as GUI apps.

View Repo


Voxcast

The Android mobile track — a hold-to-talk voice-to-text app (Expo / React Native) that transcribes and reformats in a single OpenRouter call (Gemini 3.1 Flash Lite) into one of eight serious presets: business email, AI prompt, dev prompt, basic cleanup, to-do list, note to self, casual Hebrew, and Hebrew email. Email modes return separate subject + body for two-tap copy. One preset active at a time, no layering. Sibling project to Crazy-Keyboard but reframed as a productivity tool.

View Repo


Earlier Iterations

Kept for reference — superseded by the active projects above.

Thought Pad

Two-stage process for creating notes from dictated speech — transcription via Whisper API followed by light text formatting. Exports to markdown. Predecessor to AI Typer V2.

View Repo


Whisper Typer 0911

Early Whisper-based voice typing iteration.

View Repo


Voice Keyboard

Early voice keyboard prototype.

View Repo


Transcription Tools

Gemini Audio Transcriber

File-upload-based multimodal transcription tool using Gemini via OpenRouter.

View Repo


Gemini Transcription Notepad

Gemini-powered transcription notepad with cleanup.

View Repo


Gemini ASR Transcriber

Transcription notepad for Gemini ASR.

View Repo


DVR Transcriber

Workflow workspace for importing recordings from a DVR and using AI for transcription.

View Repo


Transcript Creator

Audio cleanup and transcription tool.

View Repo


Local Multimodal Transcriber

Local transcription app designed around multimodal audio input.

View Repo


ASR Transcription Pipeline

ASR transcription pipeline.

View Repo


Transcription MCPs

Gemini Transcription MCP

MCP server for Gemini multimodal audio transcription with built-in post-processing.

View Repo


Cloud ASR MCP

MCP for using various cloud ASR models for speech-to-text and transcription.

View Repo


Local AI Transcription MCP

MCP for local AI transcription.

View Repo


Local Transcription MCP

WIP MCP for local STT with cleanup on AMD GPU machines.

View Repo


OR Audio Transcription MCP

OpenRouter-based audio transcription MCP server.

View Repo


Evaluations & Benchmarks

Whisper Fine Tune Accuracy Eval

Comparing Whisper fine-tunes versus stock Whisper on local inference.

View Repo


Whisper WPM Background Noise Eval

Quick eval to answer: how much does speaking pace affect WER/accuracy in ASR?

View Repo
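
For readers unfamiliar with the two quantities this eval relates, the sketch below shows one common way to compute them. It is an illustration only, not code from the repo: the function names are hypothetical, and the text normalisation (lowercasing and whitespace splitting) is an assumption that a real eval would likely extend.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: normalisation is just lowercasing + whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speaking pace of a clip, measured against its reference transcript."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60 / duration_seconds
```

Computing both values per clip and plotting WER against words per minute is one straightforward way to look at the relationship the eval asks about.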


Transcription Cleanup Eval

Evaluating various cloud audio understanding models on the transcribe-and-cleanup workflow.

View Repo


One Shot Transcription Microphone Eval

Test samples for various microphones with an STT accuracy evaluation.

View Repo


Local ASR STT Benchmark

Quick evaluation to find the best STT model in Speech Note (Ubuntu) for local hardware.

View Repo


Whisper WPM Test

Whisper words-per-minute testing.

View Repo


Gemini 3.1 Lite Audio Understanding Eval

Evaluation of Gemini 3.1 Lite on audio understanding tasks.

View Repo


Voice Cleanup Prompt Experiment

Tests system-prompt variations for cleaning up raw audio transcripts, and compares multimodal ASR against the STT + LLM approach.

View Repo


Whisper Fine-Tuning & Setup

Whisper Finetune V2

Whisper fine-tuning iteration.

View Repo


Validated fine-tuning script for fine-tuning Whisper on Modal GPU with a preformatted audio dataset.

View Repo


Whisper Fine Tuning Data

Whisper fine-tuning dataset.

View Repo


Whisper Fine Tune 171125

Whisper fine-tuning iteration.

View Repo


Whisper Base FUTO

Whisper base model via FUTO.

View Repo


Local STT Fine Tune Tests

Local STT fine-tuning tests.

View Repo


Fine Tuned STT Formats

Fine-tuned STT data formats.

View Repo


whisper-wayland-rocm

Whisper-Wayland with ROCm GPU acceleration — Docker setup for AMD GPUs.

View Repo


whisper-cpp-rocm-setup

whisper.cpp ROCm setup scripts.

View Repo


Whisper Local Notes

Notes on local Whisper usage.

View Repo


ASR Training Data

ASR Training Data Collector

GUI for gathering training data for ASR/STT apps into organised datasets, with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.

View Repo
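
As a rough illustration of what JSONL metadata for such a dataset could look like, the sketch below writes one JSON object per recorded sample. The field names (`audio_path`, `text`, `text_source`, `duration_seconds`) and the example file paths are assumptions made for this example, not the schema the tool actually uses.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class Sample:
    # All field names here are hypothetical, chosen for illustration.
    audio_path: str          # path to the recorded clip
    text: str                # the prompt text that was read aloud
    text_source: str         # "llm" or "user"
    duration_seconds: float  # clip length


def to_jsonl(samples: Iterable[Sample]) -> str:
    """Serialise samples as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps non-Latin text (e.g. Hebrew) readable.
    return "".join(
        json.dumps(asdict(sample), ensure_ascii=False) + "\n"
        for sample in samples
    )


if __name__ == "__main__":
    demo = [
        Sample("clips/0001.wav", "Open the settings panel.", "llm", 2.4),
        Sample("clips/0002.wav", "Remind me to call the bank.", "user", 3.1),
    ]
    print(to_jsonl(demo), end="")
```

One object per line keeps the file appendable as new clips are recorded and lets it be read back line by line.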


ASR Training Data Collector GUI Template

GUI template for ASR training data collection.

View Repo


ASR Training Data Chunker

Breaks up texts by approximate reading duration for ASR training.

View Repo
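
The idea of chunking by approximate reading duration can be sketched as follows. This is a minimal illustration under stated assumptions rather than the repo's implementation: the function name, the default pace of 150 words per minute, the 20-second target, and the rule of splitting only at sentence boundaries are all hypothetical choices.

```python
import re
from typing import List


def chunk_by_reading_time(
    text: str,
    target_seconds: float = 20.0,     # assumed target clip length
    words_per_minute: float = 150.0,  # assumed reading pace
) -> List[str]:
    """Group whole sentences into chunks of roughly target_seconds each."""
    if target_seconds <= 0 or words_per_minute <= 0:
        raise ValueError("target_seconds and words_per_minute must be positive")

    # Convert the time budget into a word budget per chunk.
    max_words = max(1, int(target_seconds * words_per_minute / 60))

    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words == 0:
            continue
        # Close the current chunk if this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # A single sentence longer than the budget becomes its own chunk.
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting only at sentence boundaries keeps each chunk readable aloud in one take, at the cost of chunk durations being approximate.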


Other Utilities

Voice Note Recorder Ubuntu

GUI for recording voice notes on Ubuntu/Linux.

View Repo


Readiness Voice Agent

Voice agent implementation for readiness checklists.

View Repo


Voice Note Classification Model

Model for classifying voice notes.

View Repo


Voice Note Classifier Model

Voice note classifier model.

View Repo


Voice Note Dataset

Frontend for an open-source voice note dataset, built for an annotation/classification project.

View Repo


Voice Note Ragie Pipeline

Test pipeline: voice context data to Ragie.

View Repo


Voice Prompt Cleanup Script

Audio processing cleanup script.

View Repo


Transcription Macropad

Macropad configuration for transcription workflows.

View Repo


Dictation Macropad

Plan/key allocation for a macropad optimised for heavy daily dictation workflows.

View Repo


Voicepad

Planning notes for a macropad for STT users.

View Repo


Voice Typer HW

Voice typer hardware notes.

View Repo


Voice Headset Design

Voice headset design notes.

View Repo


Dictation Microphones

Dictation microphone notes and comparisons.

View Repo


speech-notes-with-text-fixes

Speech Note Linux app with text fixes — note taking, reading and translating with offline STT, TTS, and machine translation.

View Repo


Hebrish Whisper Tester

Testing Whisper with Hebrew-English mixed speech.

View Repo


Notes & Ideas

VoiceBox

Concept for a speech tech solution — specced out by Claude.

View Repo


Linux Realtime Voice Typing

Planning and research for real-time voice typing on Linux (Deepgram, Gemini, Parakeet).

View Repo


Live Typing UX Research

Claude-assisted technical research into live voice typing implementation approaches — streaming inference patterns, partial-result handling, turn detection, and UX tradeoffs for at-cursor dictation.

View Repo


Linux Voice Typing App Notes

Planning notes for a Linux voice typing tool.

View Repo


Speech To Text Chain Notes

Notes on the STT processing chain for future voice projects.

View Repo


Cloud STT Price Points

Point-in-time pricing snapshots for ASR services.

View Repo


ASR And STT AI Notebook

Prompts and outputs on STT, ASR, and fine-tuning with Claude.

View Repo


Linux Friendly Voice Tech

List of resources for voice technology with Linux support, encompassing STT, ASR, and dev frameworks.

View Repo


Voice Control Linux

Claude-enhanced research for voice control platforms with Linux support.

View Repo


voice-typing-collection

Collection of voice typing / STT GitHub repos for testing on Linux.

View Repo


Awesome Whisper Apps

Useful speech-to-text tools that use Whisper under the hood (API/local).

View Repo


Voiceflow Planner

Voiceflow planning notes.

View Repo


STT TTS Train 1125

STT and TTS training notes.

View Repo