README.md

May 13, 2026 ยท View on GitHub

English | ็ฎ€ไฝ“ไธญๆ–‡
Input 0 Logo

Input 0

๐Ÿ—๏ธ This project's AI agent harness is built with Harness Engineering.

A macOS voice input tool โ€” hold a hotkey to record, release to get polished text auto-pasted into any input field.

Local AI transcription โ†’ LLM text optimization โ†’ auto-paste. Private, fast, effortless.

Input 0 Main Interface

Features

  • Press & Speak โ€” Hold Option+Space (customizable) to record, release to transcribe + optimize + paste. No window switching needed.
  • Privacy-First Local STT โ€” Six AI engines (Whisper, SenseVoice, Paraformer, Moonshine, FireRedASR, Zipformer CTC) run entirely on your Mac via Metal GPU. Audio never leaves your device.
  • AI-Powered Polish โ€” LLM auto-corrects grammar, removes filler words, and structures your text. Built-in technical term correction (e.g. phonetic Chinese โ†’ "React"), with custom vocabulary support.
  • Auto-Paste Anywhere โ€” Optimized text is automatically pasted into your active input field โ€” Slack, WeChat, VS Code, browsers, any app.
  • 99+ Languages โ€” 6 engines, 12 models, covering 99+ languages. The system recommends the best model based on your language.
  • On-Demand Models โ€” Lightweight app, download only the STT models you need. One-click switch, progress display, smart recommendations.
  • ESC to Cancel โ€” Cancel at any stage (recording, transcribing, optimizing) by pressing ESC.
  • History โ€” Review past transcriptions with original and AI-optimized text side by side.
  • Custom Vocabulary โ€” Add professional terms, names, and product names. Auto-learning detects corrections and validates via LLM.
  • Dark & Light Themes โ€” Dual theme support to match your preference.

System Requirements

  • macOS 11.0+
  • Apple Silicon processor recommended (for Metal GPU acceleration)

Getting Started

1. Install

Download the latest .dmg from GitHub Releases, open it, and drag Input 0 into your Applications folder.

Using Homebrew:

brew install --cask input0

On first launch, macOS may show a security warning. Go to System Settings โ†’ Privacy & Security and click Open Anyway.

2. Grant Permissions

Input 0 needs these macOS permissions to work properly. The app will prompt you on first launch:

PermissionWhy
MicrophoneRecords your voice
AccessibilitySimulates Cmd+V to paste text into other apps

Go to System Settings โ†’ Privacy & Security to grant each permission, then restart the app.

3. Configure LLM API Key โšก

This step is required. Without an API key, voice transcription still works, but the AI text optimization (grammar fix, filler removal, structuring) will be skipped.

  1. Open Input 0 โ†’ Click the โš™๏ธ Settings icon in the sidebar
  2. Find the LLM API section
  3. Enter your settings:
FieldDescriptionDefault
API KeyYour OpenAI (or compatible) API key(empty โ€” must fill in)
API Base URLService endpointhttps://api.openai.com/v1
ModelLLM model namegpt-4o-mini

Using a third-party provider? (e.g. Azure OpenAI, Groq, local Ollama, or any OpenAI-compatible API)

  • Change the API Base URL to your provider's endpoint
  • Set the Model to your provider's model name
  • Enter the corresponding API Key

Click Test Connection to verify your API key works.

4. Download an STT Model โšก

This step is required. You need at least one model to transcribe voice to text.

  1. Open Input 0 โ†’ Go to Settings โ†’ Model section
  2. The app recommends the best model based on your language setting โ€” follow the recommendation or pick your own
  3. Click Download next to the model you want
  4. Wait for the download to complete (progress bar shown)

Which model should I choose?

If you speak...Recommended modelSize
ChineseSenseVoice Small~228 MB
EnglishWhisper Large v3 Turbo~1.5 GB
Japanese / KoreanSenseVoice Small~228 MB
CantoneseParaformer Trilingual (zh/en/yue)~234 MB
Multiple languagesWhisper Large v3~2.9 GB
Chinese SOTA (highest accuracy)FireRedASR Large v1~1.74 GB
Want the fastestMoonshine Base (EN-only)~274 MB
Want smallest downloadWhisper Base~142 MB

5. Start Using

  1. Press and hold Option+Space (or your custom hotkey)
  2. Speak naturally
  3. Release the key โ€” your text is transcribed, polished, and auto-pasted

That's it! ๐ŸŽ‰

Troubleshooting

Model Download Issues

If a model download fails or gets stuck:

  1. Check your network โ€” Models are downloaded from Hugging Face. If you're in a region where Hugging Face is slow or blocked, consider using a VPN or proxy.
  2. Retry โ€” Close and reopen the app, then try the download again. Partial downloads will resume from where they left off.
  3. Disk space โ€” Make sure you have enough free space. The largest model (Whisper Large v3) needs ~2.9 GB.
  4. Manual download โ€” As a last resort, you can manually download model files and place them in:
    ~/Library/Application Support/com.input0/models/<model-id>/
    
    See the model registry for exact file names and download URLs.

No Text Output After Speaking

  • Make sure at least one STT model is downloaded and selected
  • Check that your Microphone permission is granted
  • Check that Accessibility permission is granted (required for auto-paste)

AI Optimization Not Working

  • Verify your API Key is set correctly in Settings
  • Click Test Connection to check the API is reachable
  • If using a third-party provider, double-check the API Base URL and Model name

Hotkey Not Responding

  • Make sure no other app is using the same hotkey
  • Try changing the hotkey in Settings
  • Restart the app after changing the hotkey

All STT Models

ModelSizeBest For
Whisper Base~142 MBFast & lightweight, good for daily use
Whisper Small~466 MBBalanced accuracy and speed
Whisper Medium~1.4 GBExcellent multilingual accuracy
Whisper Large v3~2.9 GBHighest accuracy, 99 languages
Whisper Large v3 Turbo~1.5 GBTop accuracy for English & multilingual
Whisper Large v3 Turbo Q5~547 MBQuantized high-accuracy, balanced size
SenseVoice Small~228 MBBest for Chinese / Japanese / Korean
Paraformer Chinese~217 MBChinese-optimized, ultra-fast inference
Paraformer Trilingual~234 MBChinese + English + Cantonese (only Cantonese-capable model)
Moonshine Base (EN)~274 MBEnglish-only, ~5x faster than Whisper
FireRedASR Large v1~1.74 GBChinese ASR SOTA (CER โ‰ˆ 2%), for maximum accuracy
Zipformer CTC (Chinese)~350 MBOffline Chinese CTC, next-gen Kaldi, lightweight

License

This project is licensed under CC BY-NC 4.0. You are free to share and adapt, but not for commercial use.