README.md

May 26, 2026

My Translator: Real-time Speech Translation

My Translator is a real-time speech translation desktop app built with Tauri. It captures audio directly from your system or microphone, transcribes it, and displays translations in a minimal overlay, with no intermediary server involved.

📖 Installation guides: macOS (EN) · macOS (VI) · Windows (EN) · Windows (VI)


How It Works

                                    ┌── ☁️  Soniox  (text)              ──┐
System Audio / Mic → 16kHz PCM ─────┼── ⚑ OpenAI Realtime (text+🔊)      ─┼─→ Overlay UI
                                    ├── 🌏 Qwen LiveTranslate Flash (text only) │
                                    └── 🖥️  Local MLX  (text, offline)   ─┘
                                                                            ↓ (optional, text engines)
                                                  TTS (Edge / Google / ElevenLabs) → 🔊
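The first stage of the diagram, turning captured audio into 16 kHz PCM, can be sketched as follows. This is a minimal illustration, not the app's actual code: it assumes mono float samples in [-1.0, 1.0], a 16-bit sample width (the diagram only states the rate), and plain linear interpolation with no anti-alias filter. The function names are hypothetical.

```python
import struct

TARGET_RATE = 16_000  # sample rate shown in the diagram


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample mono float samples by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def to_pcm16(samples):
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


# 10 ms of 48 kHz audio becomes 160 samples (320 bytes) at 16 kHz.
chunk = to_pcm16(resample_linear([0.5] * 480, 48_000))
```

A production pipeline would normally low-pass filter before downsampling; the sketch skips that to stay short.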

Four translation engines, pick what fits your call:

| Feature | Detail |
| --- | --- |
| Engines | ☁️ Soniox · ⚑ OpenAI Realtime · 🌏 Qwen LiveTranslate Flash · 🖥️ Local MLX |
| Latency | ~2 s (Soniox / OpenAI) · ~4 s (Qwen) · ~10 s (Local) |
| Languages | 70+ source → any target (Soniox), 13 targets (OpenAI), 60+ source+target (Qwen), JA/EN/ZH/KO → VI/EN (Local) |
| Cost | ~$0.12/hr (Soniox) · ~$4/hr (OpenAI, includes voice) · Free preview (Qwen, text-only) · Free (Local) |
| TTS | 3 providers for Soniox / Local (Edge free, Google, ElevenLabs); OpenAI streams its own voice (off by default), Qwen text-only |
| Platform | macOS (ARM + Intel) · Windows · Local mode = Apple Silicon only |
| Signed | ✅ macOS signed & notarized |
| Auto-Update | ✅ Built-in, check & install from Settings |
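To make the cost figures concrete, here is a small sketch that estimates what a call of a given length would cost per engine. The hourly rates are the approximate values quoted in this README; the engine keys and the function name are made up for illustration, and real billing depends on each provider's pricing.

```python
# Approximate hourly rates in USD, taken from the cost figures in this README.
HOURLY_USD = {
    "soniox": 0.12,
    "openai": 4.00,  # includes voice output
    "qwen": 0.00,    # free preview tier
    "local": 0.00,   # runs on-device
}


def estimate_cost(engine, minutes):
    """Return the estimated cost in USD for a call lasting `minutes`."""
    return round(HOURLY_USD[engine] * minutes / 60, 2)


for engine in HOURLY_USD:
    print(f"{engine}: ~${estimate_cost(engine, 90):.2f} for a 90-minute call")
```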

📊 Detailed head-to-head: OpenAI Realtime vs Soniox benchmark, comparing speed, quality, cost, and translation mechanism in a 5-min real-world test.


Features

📖 Dual Panel View

Two display modes:

  • Single (default): Translation text only, clean and focused
  • Dual: Source | Translation side-by-side, each panel scrolls independently

Toggle with the panel button (bottom-right on hover).

🔄 Smart Scroll

The view auto-scrolls only while you are at the bottom. Scroll up to read earlier content without being pulled back down when new text arrives.
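The rule can be sketched as a pure function over the usual scroll metrics. This is an illustrative model only: the parameter names mirror the DOM's scrollTop / clientHeight / scrollHeight, and the 4-pixel tolerance is an assumption, not a value from the app.

```python
def is_at_bottom(scroll_top, viewport_height, content_height, tolerance=4):
    """True when the viewport's lower edge is within `tolerance` px of the end."""
    return content_height - (scroll_top + viewport_height) <= tolerance


def next_scroll_top(scroll_top, viewport_height, old_content_height, new_content_height):
    """Scroll offset to use after new text grows the content height."""
    if is_at_bottom(scroll_top, viewport_height, old_content_height):
        # The reader was following the live text: stick to the new bottom.
        return max(0, new_content_height - viewport_height)
    # The reader scrolled up: leave the position untouched.
    return scroll_top
```

The key detail is that the "at bottom" check uses the content height from before the new text was appended; checking afterwards would always report that the reader is no longer at the bottom.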

🔤 Quick Font Size

A- / A+ floating controls (bottom-right on hover). Font size is adjustable up to 140px, which is great for presentations.

🔄 Two-Way Translation

Translate conversations between two languages simultaneously, ideal for bilingual meetings.

  • One-way: Source language → Target language (e.g., Japanese → Vietnamese)
  • Two-way: Language A ↔ Language B (e.g., Vietnamese ↔ Japanese). The app detects who is speaking and translates to the other language automatically

Setup for video calls (Zoom, Google Meet, MS Teams):

  1. Audio Source: Both (System + Mic)
  2. Translation Type: Two-way
  3. Set Language A and Language B

Note: TTS narration is automatically disabled in two-way mode to prevent audio feedback loops (TTS output → mic recapture → re-translation).
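The two-way routing and the TTS rule from the note can be sketched like this. It assumes the engine reports a detected language code per utterance; the function names and the "one-way" / "two-way" strings are hypothetical and chosen only for this example.

```python
def pick_target(detected_lang, lang_a, lang_b):
    """In two-way mode, translate each utterance into the other language."""
    if detected_lang == lang_a:
        return lang_b
    if detected_lang == lang_b:
        return lang_a
    return None  # neither configured language was detected: skip translation


def tts_active(translation_type, tts_enabled):
    """TTS narration only runs in one-way mode, avoiding the feedback loop."""
    return tts_enabled and translation_type == "one-way"
```

For a Vietnamese ↔ Japanese call, `pick_target("ja", "vi", "ja")` yields `"vi"`, and `tts_active("two-way", True)` stays `False` even when the user has TTS switched on.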

🎙️ TTS Narration

Reads translations aloud in one-way mode, with 3 providers:

| | Edge TTS ⭐ | Google Chirp 3 HD | ElevenLabs |
| --- | --- | --- | --- |
| Cost | Free | Free 1M chars/mo | ~$5/mo+ |
| Quality | ★★★★☆ Neural | ★★★★★ Near-human | ★★★★★ Premium |
| Vietnamese | ✅ 2 voices | ✅ 6 voices | ✅ Yes |
| Setup | None | Google Cloud API key | API key |
| Speed control | ✅ | ✅ 0.5x–2.0x | ❌ |

TTS is OFF by default. Toggle it with the TTS button or ⌘ T.

📖 TTS guide: English · Tiếng Việt

📖 Custom Translation Terms

Define how domain-specific words should be translated:

Original sin = Tội nguyên tổ
Christ = Kitô
Pneumonia = Viêm phổi

Add terms in Settings → Translation → Translation terms. Great for religious, medical, or technical content.
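For illustration, the `source = target` lines above could be parsed into a lookup table as sketched below. How the app actually stores the terms and passes them to each engine is not documented here, so treat `parse_terms` as a hypothetical helper.

```python
def parse_terms(text):
    """Parse 'source = target' lines into a dict, skipping malformed lines."""
    terms = {}
    for line in text.splitlines():
        source, sep, target = line.partition("=")
        if not sep:
            continue  # no '=' on this line
        source, target = source.strip(), target.strip()
        if source and target:
            terms[source] = target
    return terms


terms = parse_terms(
    """
    Original sin = Tội nguyên tổ
    Christ = Kitô
    Pneumonia = Viêm phổi
    """
)
```

Splitting on the first `=` only means a target phrase may itself contain an equals sign.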

⚑ OpenAI Realtime Mode

Single-call streaming translation via OpenAI's gpt-realtime-translate (May 2026 GA). It returns translated text and translated speech audio over one WebSocket, so there is no separate TTS step, end-to-end latency is lower, and output is more idiomatic. Trade-off: ~$4/hr, charged to your own OpenAI account. 13 target languages: en, es, pt, fr, de, it, ru, hi, id, vi, ja, ko, zh.

Two-way mode and the custom TTS toggle are unavailable while OpenAI Realtime is selected, because the engine produces its own audio.

🌏 Qwen LiveTranslate Flash Mode

Alibaba DashScope qwen3-livetranslate-flash-realtime β€” streams translated text (no native voice) on Qwen's free preview tier, with a 60-language picker matching the mobile app. Server-side VAD handles turn detection, so it works with mic / system audio / both. Translation-only display (no source-transcript panel; the model doesn't expose ASR). Get a key from Alibaba Cloud Bailian (Singapore region only β€” other regions hit a different endpoint and fail).

Source language must be picked explicitly (auto-detect is disabled on this engine β€” Live Flash stalls on real mic input when source is "auto"). Two-way mode and the custom TTS toggle are also disabled while Qwen is selected.
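The per-engine restrictions described for OpenAI Realtime and Qwen can be summarized as a settings check. This is a sketch under assumptions: the `Settings` fields, the engine ids, and the messages are invented for the example, and only the rules themselves come from this README.

```python
from dataclasses import dataclass

# Target languages listed for OpenAI Realtime.
OPENAI_TARGETS = {"en", "es", "pt", "fr", "de", "it", "ru", "hi", "id", "vi", "ja", "ko", "zh"}
# Engines where two-way mode and the custom TTS toggle are unavailable.
RESTRICTED_ENGINES = {"openai", "qwen"}


@dataclass
class Settings:
    engine: str        # hypothetical ids: "soniox", "openai", "qwen", "local"
    source_lang: str   # language code, or "auto"
    target_lang: str
    two_way: bool = False
    tts_enabled: bool = False


def validate(settings):
    """Return a list of human-readable problems; empty means the combination is allowed."""
    problems = []
    if settings.engine in RESTRICTED_ENGINES:
        if settings.two_way:
            problems.append(f"{settings.engine}: two-way mode is unavailable")
        if settings.tts_enabled:
            problems.append(f"{settings.engine}: custom TTS is unavailable")
    if settings.engine == "openai" and settings.target_lang not in OPENAI_TARGETS:
        problems.append(f"openai: unsupported target language '{settings.target_lang}'")
    if settings.engine == "qwen" and settings.source_lang == "auto":
        problems.append("qwen: pick an explicit source language")
    return problems
```

Returning a list rather than raising on the first issue lets a settings screen show every conflict at once.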

🖥️ Local Mode (Apple Silicon only)

Experimental offline mode using MLX + Whisper + Gemma. It runs 100% on-device and supports JA/EN/ZH/KO → VI/EN.


Privacy

Your audio never touches our servers, because there are none.

  • App connects directly to APIs you configure: no relay, no middleman
  • You own your API keys: stored locally, never transmitted elsewhere
  • No account, no telemetry, no analytics: zero tracking
  • Transcripts saved as .md files locally, per session
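As an illustration of the last point, a per-session Markdown transcript could be written with nothing but local file I/O. The file name pattern, heading, and bullet layout below are assumptions for this sketch; the app's real transcript format is not specified in this README.

```python
from datetime import datetime
from pathlib import Path


def save_transcript(lines, out_dir, started_at):
    """Write one session's translated lines to a local .md file and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per session, keyed by start time.
    path = out_dir / f"session-{started_at:%Y%m%d-%H%M%S}.md"
    body = [f"# Session {started_at:%Y-%m-%d %H:%M}", ""]
    body += [f"- {line}" for line in lines]
    path.write_text("\n".join(body) + "\n", encoding="utf-8")
    return path
```

Nothing in this path touches the network, which matches the "no relay, no middleman" claim above.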

Tech Stack


Build from Source

git clone https://github.com/phuc-nt/my-translator.git
cd my-translator
npm install
npm run tauri build

Requires: Rust (stable), Node.js 18+, macOS 13+ or Windows 10+.


License

MIT