🚀 OmniRoute

May 14, 2026 · View on GitHub

🚀 OmniRoute — The Free AI Gateway

Never stop coding. Save 15-95% of eligible tokens with RTK+Caveman compression, plus auto-fallback to FREE and low-cost AI models.

The most complete open-source AI proxy — one endpoint, 160+ providers, 13 routing strategies, zero downtime. Multi-platform: Web, Desktop (Electron), Mobile (PWA + Termux). Fully extensible via MCP Server (37 tools), A2A Protocol, and Memory/Skills systems. Available in 40+ languages.

npm License: MIT Node Stars Trendshift


Get $100 Free AI Credits

🔥 Limited offer: Sign up at AgentRouter and get $100 in free AI credits.
Access GPT-5, Claude, Gemini, DeepSeek & 100+ models. No credit card required. Claim your credits →



🚀 Quick Start • 💡 Features • 🗜️ Compression • 💰 Pricing • 🎯 Use Cases • 🌐 Proxy • ❓ FAQ • 📖 Docs • 💬 WhatsApp


๐ŸŒ Available in: ๐Ÿ‡บ๐Ÿ‡ธ English | ๐Ÿ‡ง๐Ÿ‡ท Portuguรชs (Brasil) | ๐Ÿ‡ช๐Ÿ‡ธ Espaรฑol | ๐Ÿ‡ซ๐Ÿ‡ท Franรงais | ๐Ÿ‡ฎ๐Ÿ‡น Italiano | ๐Ÿ‡ท๐Ÿ‡บ ะ ัƒััะบะธะน | ๐Ÿ‡จ๐Ÿ‡ณ ไธญๆ–‡ (็ฎ€ไฝ“) | ๐Ÿ‡ฉ๐Ÿ‡ช Deutsch | ๐Ÿ‡ฎ๐Ÿ‡ณ เคนเคฟเคจเฅเคฆเฅ€ | ๐Ÿ‡น๐Ÿ‡ญ เน„เธ—เธข | ๐Ÿ‡บ๐Ÿ‡ฆ ะฃะบั€ะฐั—ะฝััŒะบะฐ | ๐Ÿ‡ธ๐Ÿ‡ฆ ุงู„ุนุฑุจูŠุฉ | ๐Ÿ‡ฏ๐Ÿ‡ต ๆ—ฅๆœฌ่ชž | ๐Ÿ‡ป๐Ÿ‡ณ Tiแบฟng Viแป‡t | ๐Ÿ‡ง๐Ÿ‡ฌ ะ‘ัŠะปะณะฐั€ัะบะธ | ๐Ÿ‡ฉ๐Ÿ‡ฐ Dansk | ๐Ÿ‡ซ๐Ÿ‡ฎ Suomi | ๐Ÿ‡ฎ๐Ÿ‡ฑ ืขื‘ืจื™ืช | ๐Ÿ‡ญ๐Ÿ‡บ Magyar | ๐Ÿ‡ฎ๐Ÿ‡ฉ Bahasa Indonesia | ๐Ÿ‡ฐ๐Ÿ‡ท ํ•œ๊ตญ์–ด | ๐Ÿ‡ฒ๐Ÿ‡พ Bahasa Melayu | ๐Ÿ‡ณ๐Ÿ‡ฑ Nederlands | ๐Ÿ‡ณ๐Ÿ‡ด Norsk | ๐Ÿ‡ต๐Ÿ‡น Portuguรชs (Portugal) | ๐Ÿ‡ท๐Ÿ‡ด Romรขnฤƒ | ๐Ÿ‡ต๐Ÿ‡ฑ Polski | ๐Ÿ‡ธ๐Ÿ‡ฐ Slovenฤina | ๐Ÿ‡ธ๐Ÿ‡ช Svenska | ๐Ÿ‡ต๐Ÿ‡ญ Filipino | ๐Ÿ‡จ๐Ÿ‡ฟ ฤŒeลกtina



๐Ÿ–ผ๏ธ Main Dashboard

OmniRoute Dashboard

๐Ÿ“ธ Dashboard Preview

Click to see dashboard screenshots
PageScreenshot
ProvidersProviders
CombosCombos
AnalyticsAnalytics
HealthHealth
TranslatorTranslator
SettingsSettings
CLI ToolsCLI Tools
Usage LogsUsage
EndpointsEndpoints

🤖 Free AI Provider for your favorite coding agents

Connect any AI-powered IDE or CLI tool through OmniRoute — a free API gateway for unlimited coding.

| Agent | Stars |
|---|---|
| OpenClaw | ⭐ 205K |
| NanoBot | ⭐ 20.9K |
| PicoClaw | ⭐ 14.6K |
| ZeroClaw | ⭐ 9.9K |
| IronClaw | ⭐ 2.1K |
| OpenCode | ⭐ 106K |
| Codex CLI | ⭐ 60.8K |
| Claude Code | ⭐ 67.3K |
| Gemini CLI | ⭐ 94.7K |
| Kilo Code | ⭐ 15.5K |

📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota


📺 OmniRoute in Action — Video Guides

🇧🇷 Português — "OmniRoute — Guia em Português" (complete OmniRoute guide)
🇺🇸 English — "OmniRoute — English Guide" (complete OmniRoute walkthrough)
🇷🇺 Русский — "OmniRoute — Руководство на русском" (complete OmniRoute guide)

🎬 Made a video about OmniRoute? We'd love to feature it here! Open an issue or discussion with the link and we'll add it to this showcase.


🤔 Why OmniRoute?

Stop wasting money and tokens, and stop hitting limits:

❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers
❌ Each provider has a different API format
❌ AI providers blocked in your country

OmniRoute solves all of this:

✅ Prompt Compression — auto-compress prompts & tool outputs, save 15-95% of eligible tokens per request with RTK+Caveman stacked mode
✅ Maximize subscriptions — track quota, use every bit before reset
✅ Auto fallback — Subscription → API Key → Cheap → Free, zero downtime
✅ Multi-account — round-robin between accounts per provider
✅ Format translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API, any tool works
✅ 3-level proxy — bypass geo-blocks with global, per-provider, and per-key proxies
✅ 10 multi-modal APIs — chat, images, video, music, audio, search in one endpoint
✅ MCP + A2A — 29 MCP tools + agent-to-agent protocol, production-ready
✅ Universal — works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool


📧 Support

💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.

🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

npm run system-info

This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.


๐Ÿ› ๏ธ Supported CLI Tools

OmniRoute works seamlessly with 16+ AI coding tools โ€” one config, all tools:

Claude Code
Anthropic
Codex CLI
OpenAI
Gemini CLI
Google
Cursor
IDE
OpenClaw
CLI
Antigravity
VS Code
Cline
Extension
Continue
Extension
Kilo Code
Extension
Kiro
AWS IDE
OpenCode
CLI
Droid
CLI
AMP
CLI
Copilot
GitHub
Windsurf
IDE
Hermes
CLI
Qwen CLI
Alibaba
Custom
Any tool

📖 Full setup for each tool: docs/CLI-TOOLS.md


๐ŸŒ Supported Providers โ€” 160+

๐Ÿ” OAuth Providers

Claude Code
Anthropic OAuth
Antigravity
Google OAuth
Codex
OpenAI OAuth
GitHub Copilot
GitHub OAuth
Cursor
Cursor OAuth
Kimi Coding
Moonshot OAuth
Kilo Code
Kilo OAuth
Cline
Cline OAuth

🆓 Free Providers (No Cost)

| Provider | Models | Free Quota |
|---|---|---|
| 🟢 Kiro AI | Claude Sonnet/Haiku | Unlimited FREE |
| 🟢 Qoder AI | Kimi-K2, DeepSeek-R1 | Unlimited FREE |
| 🟢 Pollinations | GPT-5, Claude, Llama 4 | No API key needed |
| 🟢 Qwen Code | Qwen3 Coder Plus | Unlimited FREE |
| 🟢 LongCat AI | Flash-Lite | 50M tokens/day |
| 🟢 Cloudflare AI | 50+ models | 10K neurons/day |
| 🟢 Puter AI | GPT-4.1, Claude | Rate-limited free |
| 🟢 NVIDIA NIM | Llama, Mistral | 1K req/day free |

🔑 API Key Providers (120+)

OpenAI Anthropic Gemini DeepSeek Groq xAI (Grok)
Mistral OpenRouter GLM Kimi MiniMax Fireworks
Together AI Cerebras Cohere NVIDIA Perplexity SiliconFlow
Nebius HuggingFace DeepInfra SambaNova Vertex AI Azure OpenAI
AWS Bedrock Snowflake Databricks Venice.ai AI21 Labs Meta Llama
...and 90+ more providers

Alibaba · Amazon Q · AssemblyAI · Baidu Qianfan · Baseten · Black Forest Labs · Blackbox · Brave Search · Bytez · CablyAI · Cartesia · ChatGPT Web · Chutes.ai · Clarifai · Codestral · CrofAI · DataRobot · Deepgram · ElevenLabs · Empower · Exa Search · Fal.ai · Featherless AI · FenayAI · FriendliAI · Galadriel · GigaChat · GitLab Duo · GLHF Chat · GoAPI · Heroku AI · Hyperbolic · IBM watsonx · Inference.net · Inworld · Jina AI · Kilo Gateway · Lambda AI · LaoZhang · Linkup Search · LlamaGate · Maritalk · Modal · Moonshot AI · Morph · Muse Spark · NanoBanana · NanoGPT · NLP Cloud · Nous Research · Novita AI · nScale · OCI · Ollama Cloud · OVHcloud · PiAPI · PlayHT · Poe · Predibase · PublicAI · Qwen Code · Recraft · Reka · Runway · SAP · Scaleway · SearchAPI · SearXNG · Serper · Stability AI · Synthetic · Tavily · TheB.AI · Topaz · Upstage · v0 (Vercel) · Vercel AI Gateway · Volcengine · Voyage AI · W&B Inference · Xiaomi MiMo · You.com · Z.AI · + OpenAI/Anthropic-compatible custom endpoints

๐Ÿ  Self-Hosted

LM Studio Ollama vLLM Llamafile Docker Model Runner
NVIDIA Triton XInference oobabooga ComfyUI SD WebUI

🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌──────────────────────────────────────────────────┐
│             OmniRoute (Smart Router)             │
│  • 🗜️ Prompt Compression (save 15-95% eligible)  │
│  • Format translation (OpenAI ↔ Claude ↔ Gemini) │
│  • Quota tracking + Embeddings + Images          │
│  • Auto token refresh + Rate limit management    │
└──────┬───────────────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost + 15-95% eligible token savings

๐Ÿ—œ๏ธ Prompt Compression โ€” Save 15-95% Eligible Tokens Automatically

Why use many token when few token do trick? OmniRoute's built-in compression pipeline reduces token usage before requests reach the provider. It combines ideas from RTK - Rust Token Killer and Caveman (โญ 51K+).

How It Works

Every request passes through the compression pipeline transparently โ€” no client changes needed:

┌──────────────────┐     ┌─────────────────────────────┐     ┌──────────────┐
│   Client sends   │────▶│  OmniRoute Compression      │────▶│  Provider    │
│   full prompt    │     │  Pipeline (7 options)       │     │  receives    │
│   (10,000 tok)   │     │                             │     │  compressed  │
│                  │     │  🪶 Lite ........... ~15%   │     │  (~1,080 tok)│
│                  │     │  🪨 Standard ....... ~30%   │     │              │
│                  │     │  ⚡ Aggressive ..... ~50%   │     │  💰 up to 95%│
│                  │     │  🔥 Ultra .......... ~75%   │     │              │
│                  │     │  🧰 RTK ............ 60-90% │     │              │
│                  │     │  🔗 Stacked ........ 78-95% │     │              │
└──────────────────┘     └─────────────────────────────┘     └──────────────┘

7 Compression Options

| Mode | Savings | Technique | Best For |
|---|---|---|---|
| Off | 0% | No compression | When you need exact prompts |
| 🪶 Lite | ~15% | Whitespace collapse, dedup system prompts, image URL shortening | Always-on safe default |
| 🪨 Standard (Caveman) | ~30% | 30+ regex rules: filler removal, context condensation, structural compression, multi-turn dedup | Daily coding with Claude/Codex |
| ⚡ Aggressive | ~50% | All standard + progressive message aging + tool result summarization + LLM-based compression | Long sessions with many tool calls |
| 🔥 Ultra | ~75% | All aggressive + heuristic token pruning + stopword removal + score-based filtering | Maximum savings when tokens are scarce |
| 🧰 RTK | 60-90% | 49 command-aware filters, RTK-style JSON DSL, verify gate, trust-gated custom filters | Shell/test/build/git output in agents |
| 🔗 Stacked | 78-95% | RTK first, then Caveman input condensation; ~89% with upstream average math | Mixed prompts with tool logs + prose |

RTK + Caveman Savings Math

These numbers are based on the upstream project READMEs under _references/_outros:

| Source | Upstream claim used by OmniRoute docs |
|---|---|
| Caveman | ~75% fewer output tokens; benchmark average 65% output savings, range 22-87%; ~46% input compression tool |
| RTK | 60-90% command-output token savings; sample session ~118,000 → ~23,900 tokens, which is 79.7% saved (~80%) |

For the default stacked compression combo, OmniRoute runs:

RTK -> Caveman

When both engines can act on the same tool/context payload, the savings compound:

combined = 1 - (1 - RTK savings) * (1 - Caveman input savings)
average  = 1 - (1 - 0.80) * (1 - 0.46) = 89.2%
range    = 1 - (1 - 0.60..0.90) * (1 - 0.46) = 78.4-94.6%

Caveman output mode is separate from prompt compression. When enabled for responses, use Caveman's own upstream output numbers: 65% average, ~75% headline, 22-87% observed range. Total bill savings depend on the prompt/output mix, but coding-agent sessions are often tool-context heavy, so the RTK -> Caveman combo is the best default for maximum context savings.
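As a sanity check, the compounding formula above can be evaluated directly. A throwaway sketch (the 0.80 and 0.46 figures are the upstream averages quoted in the table, not measurements of OmniRoute itself):

```python
def stacked_savings(rtk: float, caveman_input: float) -> float:
    """Fraction of tokens removed after RTK runs first, then Caveman.

    Each stage acts on the tokens that survived the previous stage,
    so the survival rates multiply."""
    return 1 - (1 - rtk) * (1 - caveman_input)

average = stacked_savings(0.80, 0.46)  # upstream averages
low = stacked_savings(0.60, 0.46)      # RTK lower bound
high = stacked_savings(0.90, 0.46)     # RTK upper bound

print(f"average: {average:.1%}, range: {low:.1%}-{high:.1%}")
# average: 89.2%, range: 78.4%-94.6%
```

This matches the 89.2% average and 78.4-94.6% range quoted above.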

Before & After (Standard/Caveman Mode)

🗣️ Before compression (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."

🪨 After compression (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same answer. 72% fewer tokens. Zero accuracy loss.
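The 72% figure is just the token ratio between the two snippets:

```python
# Token counts quoted in the before/after example above.
before_tokens, after_tokens = 69, 19
savings = 1 - after_tokens / before_tokens
print(f"{savings:.0%}")  # 72%
```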

Architecture

Request Body
  │
  ├─ strategySelector.ts ─── Picks mode (config / combo override / auto-trigger)
  │
  ├─ lite.ts ─────────────── Whitespace, dedup, image URLs, redundant content
  ├─ caveman.ts ──────────── 30+ regex rules via cavemanRules.ts
  │   └─ preservation.ts ─── Protects code blocks, URLs, JSON from compression
  ├─ engines/rtk/ ────────── Command detection + JSON DSL filters + raw-output recovery
  ├─ engines/registry.ts ─── Shared engine registry for caveman, RTK, and stacked
  ├─ aggressive.ts ───────── Summarizer + tool result compressor + progressive aging
  │   ├─ summarizer.ts ───── Rule-based message summarization
  │   ├─ toolResultCompressor.ts ── file/grep/shell/JSON/error compression
  │   └─ progressiveAging.ts ──── Older messages → shorter summaries
  └─ ultra.ts ────────────── Heuristic token scoring + pruning
      └─ ultraHeuristic.ts ─ Stopword detection, score thresholds, force-preserve

Configuration

Dashboard → Context & Cache → Caveman / RTK / Compression Combos

Or per-combo override:

{
  "comboOverrides": {
    "my-coding-combo": "standard",
    "my-cheap-combo": "ultra"
  }
}

Auto-trigger: set autoTriggerTokens to automatically enable compression when a request exceeds a token threshold.

Compression combos can also assign a named compression pipeline to routing combos, so a coding combo can use RTK + Caveman while a paid subscription combo stays on lite mode.
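Putting both knobs together, a config sketch (autoTriggerTokens and comboOverrides are the names used above; the exact schema and valid mode strings are in docs/COMPRESSION_GUIDE.md, so treat this as illustrative):

```json
{
  "autoTriggerTokens": 8000,
  "comboOverrides": {
    "my-coding-combo": "stacked",
    "my-subscription-combo": "lite"
  }
}
```

With a setup like this, requests above ~8,000 tokens get compressed automatically, while the overrides keep a paid subscription combo on the gentler lite mode.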

🪨 Fun fact: The standard/caveman mode is inspired by Caveman — the viral project that reports 65% average output-token savings while keeping technical accuracy. OmniRoute takes this further with a 7-option pipeline and a default RTK → Caveman combo that can reach ~89% average savings on eligible tool/context payloads.

📖 Full compression documentation: docs/COMPRESSION_GUIDE.md • docs/RTK_COMPRESSION.md • docs/COMPRESSION_ENGINES.md • docs/COMPRESSION_RULES_FORMAT.md • docs/COMPRESSION_LANGUAGE_PACKS.md


🎯 What OmniRoute Solves

Every developer using AI tools faces these problems daily. OmniRoute solves them all.

| # | Problem | OmniRoute Solution |
|---|---|---|
| 💸 | Subscription quota expires mid-coding | Smart 4-Tier Fallback — auto-routes Subscription → API Key → Cheap → Free |
| 🔌 | Each provider has a different API format | Format Translation — unified endpoint translates OpenAI ↔ Claude ↔ Gemini ↔ Responses |
| 🌐 | AI providers block my country/region | 3-Level Proxy — global, per-provider, and per-key proxy with TLS fingerprint spoofing |
| 🆓 | Can't afford AI subscriptions | 11 Free Providers — Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM... |
| 🔒 | Gateway is exposed without protection | API Key Management — scoping, rotation, IP filtering, rate limiting, prompt injection guard |
| 🛑 | Provider went down, lost coding flow | Circuit Breakers — auto-failover with cooldown, retry, anti-thundering herd |
| 🔧 | Configuring each CLI tool is tedious | CLI Tools Dashboard — one-click setup for Claude Code, Codex, Cursor, OpenClaw, Kilo |
| 🔑 | Managing OAuth tokens is hell | Auto Token Refresh — OAuth PKCE for 8 providers, multi-account, LAN/remote fix |
| 📊 | Don't know how much I'm spending | Cost Analytics — per-token tracking, budget limits, usage stats per API key |
| 🐛 | Can't diagnose errors in AI calls | Unified Logs — 4-tab dashboard (request, proxy, audit, console) + p50/p95/p99 telemetry |

📖 See all 31 problems OmniRoute solves

| # | Problem | Solution |
|---|---|---|
| 11 | Deploying/maintaining is complex | npm global, Docker multi-arch, Electron, Termux — deploy anywhere |
| 12 | Interface is English-only | 40+ languages with RTL support |
| 13 | Need more than chat (images, audio, video) | 10 multi-modal APIs: embeddings, images, video, music, TTS, STT, moderation, rerank, search, batch |
| 14 | No way to test/compare models | LLM Evals, Translator Playground, Chat Tester, Live Monitor |
| 15 | Need to scale without losing performance | Semantic cache, request dedup, rate limit detection, queue & pacing |
| 16 | Want to control model behavior globally | System prompt injection, thinking budget, wildcard routing |
| 17 | Need MCP tools as first-class features | 29 MCP tools, 3 transports (stdio/SSE/HTTP), 10 scopes, audit trail |
| 18 | Need A2A orchestration | JSON-RPC 2.0 + SSE streaming, task lifecycle, sync + stream paths |
| 19 | Need real MCP process health | Runtime heartbeat, PID tracking, UI status cards |
| 20 | Need auditable MCP execution | SQLite-backed audit with filters, pagination, stats |
| 21 | Need scoped MCP permissions | 10 granular scopes per integration |
| 22 | Need operational controls without redeploying | Combo switches, resilience tuning, breaker resets from dashboard |
| 23 | Need A2A task lifecycle visibility | Task listing/filtering, drill-down, cancellation |
| 24 | Need active stream metrics | Active stream counters, per-state counts, A2A dashboard cards |
| 25 | Need standard agent discovery | Agent Card at /.well-known/agent.json |
| 26 | Need protocol discoverability | Consolidated Endpoints page with Proxy, MCP, A2A, API tabs |
| 27 | Need E2E protocol validation | Real MCP SDK + A2A client flows in test:protocols:e2e |
| 28 | Need unified observability | Health + audit + telemetry across OpenAI, MCP, and A2A layers |
| 29 | Need one runtime for proxy + tools + agents | OpenAI proxy + MCP + A2A in one stack with shared auth/resilience |
| 30 | Need agentic workflows without glue-code | Unified endpoint, protocol UIs, production-ready foundations |
| 31 | Long sessions crash with context limits | Proactive context compression, structural integrity guards, multi-layer dropping |

📖 Deep dives: Resilience Guide • Proxy Guide • Setup Guide • Compression Guide


🆓 Start Free — Zero Configuration Cost

Set up AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.

| Step | Action | Providers Unlocked |
|---|---|---|
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — unlimited |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — unlimited |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — unlimited |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro — 180K/mo free |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |

Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.

Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

⚡ Quick Start

1) Install and run

npm install -g omniroute
omniroute

Dashboard opens at http://localhost:20128 · API at http://localhost:20128/v1.

2) Connect providers

  1. Dashboard → Providers → connect at least one provider (OAuth or API key)
  2. Dashboard → Endpoints → create an API key
  3. Dashboard → Combos → set your fallback chain (optional)

3) Point your coding tool

Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model)

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and any OpenAI-compatible tool.
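If your tool isn't on that list, any HTTP client works, since the /v1 endpoint is OpenAI-compatible. A minimal Python sketch (the model ID and port come from the steps above; YOUR_OMNIROUTE_KEY is a placeholder for the key created on the Endpoints page):

```python
import json
from urllib import request

# Standard OpenAI chat-completions payload shape, pointed at OmniRoute.
payload = {
    "model": "if/kimi-k2-thinking",  # provider-prefixed model ID
    "messages": [{"role": "user", "content": "Say hello"}],
}

req = request.Request(
    "http://localhost:20128/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_OMNIROUTE_KEY",  # from the Endpoints page
        "Content-Type": "application/json",
    },
)
# response = request.urlopen(req)  # uncomment with OmniRoute running locally
```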

📦 More install methods (Docker, source, Arch, Void, pnpm)

Docker:

docker run -d --name omniroute --restart unless-stopped -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest

From source:

cp .env.example .env && npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev

pnpm: pnpm install -g omniroute && pnpm approve-builds -g && omniroute

Arch Linux (AUR): yay -S omniroute-bin && systemctl --user enable --now omniroute.service

MCP: omniroute --mcp (stdio transport)

CLI options: omniroute setup, omniroute doctor, omniroute providers available, omniroute providers list, omniroute --port 3000, omniroute --no-open, omniroute --help

Split-port mode: PORT=20128 DASHBOARD_PORT=20129 omniroute

Uninstall: npm run uninstall (keeps data) or npm run uninstall:full (removes everything)

📖 Full details: Setup Guide · Docker · Void Linux template


๐Ÿณ Docker

OmniRoute is available as a public Docker image on Docker Hub.

Quick run:

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

With environment file:

# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

Using Docker Compose:

# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d

Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL. Endpoint tunnel panels, including Cloudflare, Tailscale, and ngrok, can be shown or hidden from Settings → Appearance without changing active tunnel state.

Notes:

  • Quick Tunnel URLs are temporary and change after every restart.
  • Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
  • Managed install currently supports Linux, macOS, and Windows on x64 / arm64.
  • Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set CLOUDFLARED_PROTOCOL=quic or auto if you want a different transport.
  • Docker images bundle system CA roots and pass them to managed cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container.
  • SQLite runs in WAL mode. docker stop should be allowed to finish so OmniRoute can checkpoint the latest changes back into storage.sqlite.
  • The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep --stop-timeout 40 (or similar) so manual stops do not cut off shutdown cleanup.
  • Set CLOUDFLARED_BIN=/absolute/path/to/cloudflared if you want OmniRoute to use an existing binary instead of downloading one.

Using Docker Compose with Caddy (HTTPS Auto-TLS):

OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.

services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:

| Image | Tag | Size | Description |
|---|---|---|---|
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 3.7.8 | ~250MB | Current version |

📖 Full Docker documentation: docs/DOCKER_GUIDE.md — Compose profiles, Caddy HTTPS, Cloudflare tunnels, and more.


📱 Multi-Platform — Run Anywhere

OmniRoute runs on Web, Desktop (Electron), Android (Termux), and as a Progressive Web App (PWA).

| Platform | Install | Highlights |
|---|---|---|
| 🖥️ Desktop | npm run electron:build | Native window, system tray, auto-start, offline mode — Windows/macOS/Linux |
| 📱 Android | pkg install nodejs-lts && npx -y omniroute | ARM native, no root, 24/7 via Termux:Boot — your phone is an AI server |
| 📲 PWA | "Add to Home Screen" in browser | Fullscreen, offline page, service worker caching — Android/iOS/Desktop |

🖥️ Desktop App details
  • Native Electron app with system tray, auto-start, native notifications
  • One-click install: NSIS (Windows), DMG (macOS), AppImage (Linux)
  • Dev: npm run electron:dev · Build: npm run electron:build
  • 📖 Full docs: electron/README.md

📱 Android (Termux) details

pkg update && pkg install nodejs-lts python build-essential git
npx -y omniroute@latest

Access from any device on the same network: http://PHONE_IP:20128/v1

📲 PWA details
  • Android (Chrome): ⋮ → "Add to Home screen"
  • iOS (Safari): Share → "Add to Home Screen"
  • Desktop (Chrome/Edge): Install icon in address bar
  • 📖 Full docs: docs/PWA_GUIDE.md

๐ŸŒ Bypass Geographic Blocks โ€” Use AI From Any Country

๐Ÿ‡ท๐Ÿ‡บ ๐Ÿ‡จ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ท ๐Ÿ‡จ๐Ÿ‡บ ๐Ÿ‡น๐Ÿ‡ท In Russia, China, Iran, or any blocked region? OmniRoute's 3-level proxy system solves this completely.

LevelBadgeConfigure InUse Case
Global๐ŸŸขSettings โ†’ ProxyAll traffic through one proxy
Per-Provider๐ŸŸกProvider โ†’ ProxyOnly specific providers proxied
Per-Connection๐Ÿ”ตConnection โ†’ ProxyEach API key uses its own proxy

What gets proxied: API requests โœ… โ€ข OAuth flows โœ… โ€ข Connection tests โœ… โ€ข Token refresh โœ… โ€ข Model sync โœ…

Protocols: HTTP/HTTPS, SOCKS5 (ENABLE_SOCKS5_PROXY=true), Authenticated proxies

๐Ÿ†“ 1proxy โ€” Free Proxy Marketplace

Contributed by @oyi77 โ€” #1847

No proxy? Use the built-in 1proxy integration for hundreds of free, validated proxies worldwide:

  • One-click sync (up to 500 proxies) โ€ข Quality scores (0-100) โ€ข Country filter โ€ข Auto-rotation (quality/random/sequential) โ€ข Auto-degradation โ€ข Circuit breaker

Anti-Detection

  • ๐Ÿ”’ TLS Fingerprint Spoofing โ€” browser-like TLS via wreq-js
  • ๐Ÿ” CLI Fingerprint Matching โ€” matches native CLI binary signatures
  • ๐Ÿ  Proxy IP Preservation โ€” stealth + IP masking simultaneously

๐Ÿ“– Full proxy documentation: docs/PROXY_GUIDE.md



💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
|---|---|---|---|---|
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast 🆕 | $0.20/$0.50 per 1M | None | Fastest + tool calling, ultralow |
| | xAI Grok-4 (standard) 🆕 | $0.20/$1.50 per 1M | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| | AgentRouter 🆕 | Pay-per-use | None | $200 free credits at signup |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |

🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.

💡 See the full $0 Free Stack (11 providers) below.

💡 Understanding Dashboard Costs:

The "cost" displayed in the Usage Analytics page is for tracking and comparison purposes only. OmniRoute itself never charges you anything — it's free, open-source software running on your machine. If your dashboard shows "$290 total cost" while using free models, that's how much you saved compared to paid API pricing. Think of it as a savings tracker, not a bill.


🆓 Free Models — 11 Providers, $0 Forever

Combine all free providers into one unbreakable combo — OmniRoute auto-routes between them when quota runs out.

| Provider | Prefix | Free Models | Quota |
|---|---|---|---|
| Kiro | kr/ | Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 credits per month |
| Qoder | if/ | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1 | ♾️ Unlimited |
| Qwen | qw/ | qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next | ♾️ Unlimited |
| Pollinations | pol/ | GPT-5, Claude, Gemini, DeepSeek, Llama 4, Mistral | No key needed |
| LongCat | lc/ | LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Gemini CLI | gc/ | gemini-3-flash, gemini-2.5-pro | 180K tok/mo |
| Cloudflare AI | cf/ | 50+ models (Llama, Gemma, Mistral, Whisper) | 10K Neurons/day |
| Groq | groq/ | Llama 3.3 70B, Qwen3 32B, Kimi K2 | 14.4K RPD |
| NVIDIA NIM | nvidia/ | 129 models (DeepSeek, Llama, GLM, Kimi) | ~40 RPM |
| Cerebras | cerebras/ | Qwen3 235B, GPT-OSS 120B, Llama 3.1 | 1M tok/day |
| Scaleway | scw/ | Qwen3 235B, Llama 70B, DeepSeek V3 | 1M tokens (EU) |

📖 25+ more free providers — Groq, Cerebras, Mistral, GitHub Models, OpenRouter, and more

Also free (API Key required): Mistral (1B tok/month) · OpenRouter (35+ :free models) · GitHub Models (GPT-5, 45+ models) · Cohere (1K calls/month) · Z.AI/GLM (permanent free Flash models) · SiliconFlow (1K RPM, 50K TPM) · Kilo Code (~200 req/hr auto-router) · HuggingFace ($0.10/mo credits) · Ollama Cloud (400+ models) · LLM7.io (30+ models) · Kluster AI · IBM watsonx (300K tok/month) · OpenCode Zen · Vercel AI Gateway ($5/mo)

Trial credits (one-time): Baseten ($30) · NLP Cloud ($15) · AI21 ($10) · Upstage ($10) · SambaNova ($5) · Modal ($5/mo) · Fireworks ($1) · Nebius ($1) · Inference.net ($1 + $25 survey) · Hyperbolic ($1) · Novita ($0.50)

China-based (free tiers): ModelScope · Tencent Hunyuan · Volcengine · ChatAnywhere · InternAI · Bigmodel

Combined capacity: ~31,000+ RPD · ~32B+ tokens/month · 500+ models · $0

📖 Complete free provider directory: docs/FREE_TIERS.md — 25+ providers, quotas, base URLs, model tables, and OmniRoute combo setup.


๐ŸŽ™๏ธ Free Transcription Combo

Transcribe any audio/video for $0 โ€” Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.

ProviderFree CreditsBest ModelRate Limit
๐ŸŸข Deepgram$200 free (signup)nova-3 โ€” best accuracy, 30+ languagesNo RPM limit on free credits
๐Ÿ”ต AssemblyAI$50 free (signup)universal-3-pro โ€” chapters, sentiment, PIINo RPM limit on free credits
๐Ÿ”ด GroqFree foreverwhisper-large-v3 โ€” OpenAI Whisper30 RPM (rate limited)

Suggested combo in /dashboard/combos:

Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3          โ†’ uses \$200 free first
  [2] assemblyai/universal-3-pro โ†’ fallback when Deepgram credits run out
  [3] groq/whisper-large-v3    โ†’ free forever, emergency fallback

Then in /dashboard/media โ†’ Transcription tab: upload any audio or video file โ†’ select your combo endpoint โ†’ get transcription in supported formats.
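The Priority strategy behaves like a simple waterfall: try node 1, and fall through to the next node on any error. A minimal TypeScript sketch of that behavior (illustrative only; not OmniRoute's internals):

```typescript
type TranscribeNode = { id: string; transcribe: (audio: string) => string };

// Try each node in configured order; the first success wins.
function runPriorityCombo(nodes: TranscribeNode[], audio: string): string {
  let lastError: unknown = new Error("no nodes configured");
  for (const node of nodes) {
    try {
      return node.transcribe(audio); // first healthy node wins
    } catch (err) {
      lastError = err; // credits exhausted / rate limit: fall through to next node
    }
  }
  throw lastError;
}

// Simulated combo: Deepgram credits are gone, so AssemblyAI handles the job.
const combo: TranscribeNode[] = [
  { id: "deepgram/nova-3", transcribe: () => { throw new Error("402 credits exhausted"); } },
  { id: "assemblyai/universal-3-pro", transcribe: (a) => `transcript of ${a}` },
  { id: "groq/whisper-large-v3", transcribe: (a) => `transcript of ${a}` },
];
const result = runPriorityCombo(combo, "demo.wav");
```

The same waterfall applies to any Priority combo, not just transcription.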

๐Ÿ’ก Key Features

4,690+ automated tests across 517 test files. Not just a relay โ€” a full operational platform.

| Feature | Why It Matters |
|---|---|
| 🧠 Smart 4-Tier Fallback — Subscription → API → Cheap → Free | Never stop coding, zero downtime |
| 🔄 Format Translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API | Works with ANY CLI tool |
| 🗜️ Prompt Compression — 7 options including Caveman, RTK, and stacked pipelines | Save 15-95% eligible tokens |
| 🤖 MCP Server — 37 tools, 3 transports (stdio/SSE/HTTP), 10 scopes | IDE/agent tool integration |
| 🛡️ Resilience Engine — circuit breakers, cooldowns, TLS spoofing, anti-thundering herd | Auto-recovery from any failure |
| 🎵 10 Multi-Modal APIs — chat, embed, images, video, music, TTS, STT, moderation, rerank, search | One endpoint for everything |
| 🌐 3-Level Proxy — global, per-provider, per-key + 1proxy free marketplace | Access AI from any country |
| 📊 Full Observability — unified logs, p50/p95/p99 telemetry, cost tracking, budget controls | Know exactly what's happening |
๐Ÿ“‹ Complete feature list โ€” 30+ capabilities

Routing & Intelligence

  • 13 balancing strategies (priority, weighted, round-robin, P2C, cost-optimized, context-relay...)
  • Task-aware smart routing (coding/vision/analysis) ยท Context relay session handoffs
  • Thinking budget controls (passthrough/auto/custom) ยท Wildcard routing ยท System prompt injection

Translation & Compatibility

  • Auto token refresh (OAuth PKCE for 8 providers) ยท Multi-account round-robin
  • Responses API โ€” full /v1/responses for Codex ยท Batch API with Files API
  • OpenAPI 3.0 live spec + Try-It UI

Protocols

  • A2A Server โ€” JSON-RPC 2.0, SSE streaming, task lifecycle, skills
  • ACP โ€” CLI agent discovery (14 agents + custom)

Platform

  • Desktop (Electron) ยท Android (Termux) ยท PWA ยท Docker (AMD64 + ARM64)
  • Cloudflare / Tailscale / ngrok tunnels ยท 40+ languages with RTL
  • Semantic + signature cache (two-tier) ยท Request idempotency + deduplication

Observability

  • Health dashboard โ€” uptime, breakers, cache, lockouts
  • Evaluation framework โ€” golden set testing ยท Webhooks ยท Compliance audit

v3.6+ Highlights: V1 WebSocket Bridge ยท Sync Tokens & Config Bundle ยท GLM Thinking (glmt) ยท Hybrid Token Counting ยท Safe Outbound Fetch ยท Wait For Cooldown ยท Runtime Env Validation ยท Vision Bridge ยท Grok-4 Fast ยท GLM-5 via Z.AI ยท MiniMax M2.5 ยท toolCalling flag ยท Multilingual Intent Detection ยท Benchmark-Driven Fallbacks ยท Request Deduplication

Architecture Examples:

Combo: "my-coding-stack"              Format Translation:
  1. cc/claude-opus-4-7                 CLI โ†’ OpenAI format
  2. nvidia/llama-3.3-70b               OmniRoute โ†’ translates
  3. glm/glm-4.7                        Provider โ†’ native format
  4. if/kimi-k2-thinking
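As a sketch of what the translation step does, here is an illustrative OpenAI → Claude message mapping. Anthropic's Messages API keeps the system prompt outside the messages array, so a translator has to lift it out. Types are simplified; this is not OmniRoute's actual translator.

```typescript
type OpenAIMsg = { role: "system" | "user" | "assistant"; content: string };
type ClaudeRequest = {
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
};

// Lift system messages into Anthropic's top-level `system` field and
// pass the rest of the conversation through unchanged.
function openAIToClaude(msgs: OpenAIMsg[]): ClaudeRequest {
  const system = msgs.filter(m => m.role === "system").map(m => m.content).join("\n");
  const messages = msgs
    .filter(m => m.role !== "system")
    .map(m => ({ role: m.role as "user" | "assistant", content: m.content }));
  return system ? { system, messages } : { messages };
}

const translated = openAIToClaude([
  { role: "system", content: "You are terse." },
  { role: "user", content: "hi" },
]);
```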

๐Ÿ“– MCP Server README ยท A2A Server README ยท Resilience Guide ยท Features Gallery


๐ŸŽฏ Use Cases โ€” Ready-Made Combo Playbooks

Case 0: "I want zero-config, auto-routing NOW"

Problem: Don't want to create combos manually. Just want AI routing to work immediately.

# No combo creation needed! Use auto/ prefix directly:
model: "auto"           # Default LKGP routing across all connected providers
model: "auto/coding"    # Quality-first weights for code generation
model: "auto/fast"      # Low-latency routing (fastest provider first)
model: "auto/cheap"     # Cost-optimized (cheapest per token)
model: "auto/offline"   # High availability (most quota available)
model: "auto/smart"     # Best discovery (10% exploration rate)

How it works:

  1. Add providers in Dashboard โ†’ Providers (OAuth or API key)
  2. Use auto/ prefix in any AI tool โ€” no combo creation needed
  3. OmniRoute dynamically builds a virtual combo from your active connections
  4. Routes using LKGP (Last Known Good Provider) + 6-factor scoring
  5. Session stickiness ensures consistent provider selection

Dashboard indicator: A blue banner at the top shows "Auto-Routing Active" with a link to /dashboard/combos for configuration.

Monthly cost: $0 (uses your existing free providers) or whatever your connected providers cost
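Because the auto/ prefixes are ordinary model IDs, any OpenAI-compatible client can use them. A hedged sketch of the request (the base URL is from the Setup Guide; /chat/completions is the standard OpenAI path; the fetch call is commented out so nothing is sent here):

```typescript
const url = "http://localhost:20128/v1/chat/completions";
const request = {
  model: "auto/coding", // or "auto", "auto/fast", "auto/cheap", "auto/offline", "auto/smart"
  messages: [{ role: "user", content: "Write a binary search in TypeScript" }],
};
// fetch(url, {
//   method: "POST",
//   headers: { "Content-Type": "application/json", Authorization: "Bearer <key from Dashboard → Endpoints>" },
//   body: JSON.stringify(request),
// });
```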


Case 1: "I have a Claude Pro subscription"

Problem: Quota expires unused, rate limits during heavy coding sessions.

Combo: "maximize-claude"
  1. cc/claude-opus-4-7        (use subscription fully)
  2. glm/glm-5.1               (cheap backup when quota out โ€” \$0.5/1M)
  3. kr/claude-sonnet-4.5      (free emergency fallback via Kiro)

Compression: standard (caveman) โ€” saves 30% tokens = stretch quota further
Monthly cost: \$20 (subscription) + ~\$3 (backup) = \$23 total
vs. \$20 + hitting limits + lost productivity = frustration

Case 2: "I want $0 forever"

Problem: Can't afford subscriptions, need reliable AI for coding.

Combo: "free-forever"
  1. kr/claude-sonnet-4.5      (Claude 4.5 free unlimited via Kiro)
  2. if/kimi-k2-thinking       (reasoning model free via Qoder)
  3. pol/gpt-5                 (GPT-5 free via Pollinations โ€” no key)
  4. lc/longcat-flash-lite     (50M tokens/day free backup)

Compression: aggressive โ€” saves 50% tokens = double your free quota
Monthly cost: \$0
Quality: Production-ready models + 50% token savings

Case 3: "I need 24/7 coding, no interruptions"

Problem: Deadlines, can't afford any downtime.

Combo: "always-on"
  1. cc/claude-opus-4-7        (best quality โ€” subscription)
  2. cx/gpt-5.5                (second subscription โ€” OpenAI)
  3. glm/glm-5.1               (cheap, resets daily โ€” \$0.5/1M)
  4. minimax/MiniMax-M2.5      (cheapest paid โ€” \$0.3/1M)
  5. kr/claude-sonnet-4.5      (free unlimited โ€” never fails)

Compression: lite โ€” saves 15% tokens passively, zero risk
Result: 5 layers of fallback = zero downtime
Monthly cost: \$20-200 (subscriptions) + \$5-10 (backup)

Case 4: "I'm in a blocked region (Russia, China, Iran...)"

Problem: AI providers block my country, VPNs are slow.

Combo: "unblocked-ai"
  1. kr/claude-sonnet-4.5      (free via Kiro + proxy)
  2. pol/deepseek-r1           (Pollinations โ€” no geo-block)
  3. groq/llama-3.3-70b       (Groq + proxy)

Proxy: Global proxy set in Settings โ†’ or per-provider proxy override
Result: Access ALL providers from ANY country
Monthly cost: \$0 (free providers) + \$0 (1proxy free marketplace)

Case 5: "I want maximum token savings"

Problem: Token costs are eating my budget, need to squeeze every token.

Combo: "ultra-saver"
  1. cc/claude-opus-4-7        (subscription โ€” best quality)
  2. glm/glm-5.1               (cheap backup)

Compression: ultra โ€” saves 75% tokens
Result: 10K token prompt โ†’ 2.5K tokens sent
Monthly savings: ~\$150-300/month in token costs for heavy users
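The savings arithmetic above can be checked directly. The rates are the approximate per-mode figures quoted in these playbooks, not guarantees:

```typescript
// Back-of-envelope token savings for a given compression rate.
function tokensSent(promptTokens: number, savingsRate: number): number {
  return Math.round(promptTokens * (1 - savingsRate));
}

const ultra = tokensSent(10_000, 0.75);    // ultra: 10K-token prompt sends 2,500 tokens
const standard = tokensSent(10_000, 0.30); // standard: same prompt sends 7,000 tokens
```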

๐Ÿงช Evaluations (Evals)

OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via Analytics โ†’ Evals in the dashboard.

Built-in Golden Set

The pre-loaded "OmniRoute Golden Set" contains test cases for:

  • Greetings, math, geography, code generation
  • JSON format compliance, translation, markdown generation
  • Safety refusal (harmful content), counting, boolean logic

Evaluation Strategies

| Strategy | Description | Example |
|---|---|---|
| `exact` | Output must match exactly | `"4"` |
| `contains` | Output must contain substring (case-insensitive) | `"Paris"` |
| `regex` | Output must match regex pattern | `"1.*2.*3"` |
| `custom` | Custom JS function returns true/false | `(output) => output.length > 10` |
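The four strategies can be sketched in a few lines of TypeScript (illustrative; the real framework runs these against golden-set cases in the dashboard):

```typescript
type Strategy =
  | { kind: "exact"; expected: string }
  | { kind: "contains"; expected: string }
  | { kind: "regex"; pattern: string }
  | { kind: "custom"; fn: (output: string) => boolean };

// Return true when the model output passes the given strategy.
function evaluate(output: string, s: Strategy): boolean {
  switch (s.kind) {
    case "exact": return output === s.expected;
    case "contains": return output.toLowerCase().includes(s.expected.toLowerCase());
    case "regex": return new RegExp(s.pattern).test(output);
    case "custom": return s.fn(output);
  }
}
```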

๐Ÿ“– Setup Guide

Connect Your Coding Tool

Point any OpenAI-compatible tool to OmniRoute:

Base URL: http://localhost:20128/v1
API Key:  [from Dashboard โ†’ Endpoints]
| Tool | Config Location |
|---|---|
| Claude Code | `claude mcp add-server omniroute --type http --url http://localhost:20128/api/mcp/stream` |
| Codex CLI | `OPENAI_BASE_URL=http://localhost:20128/v1 OPENAI_API_KEY=your-key codex` |
| Cursor | Settings → Models → Add Model → Override Base URL |
| Cline | Extension settings → Custom API Base URL |
| OpenClaw | `OPENAI_BASE_URL=http://localhost:20128/v1 openclaw` |
| Gemini CLI | Uses native OAuth via OmniRoute — connect in Providers |

Protocols (MCP + A2A)

# MCP (stdio transport)
omniroute --mcp

# A2A (JSON-RPC 2.0)
curl http://localhost:20128/.well-known/agent.json

Key Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `PORT` | 20128 | API and dashboard port |
| `DASHBOARD_PORT` | — | Separate dashboard port (split-port mode) |
| `REQUIRE_API_KEY` | false | Require API key for all requests |
| `DATA_DIR` | `~/.omniroute` | Database and config storage |
| `REQUEST_TIMEOUT_MS` | 600000 | Upstream response timeout |
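As a sketch of how the numeric defaults behave (illustrative parsing logic; names and defaults are from the table above, not OmniRoute's actual config loader):

```typescript
// Read an integer env var, falling back to the documented default
// when unset or unparseable.
function envInt(name: string, fallback: number, env: Record<string, string | undefined>): number {
  const raw = env[name];
  if (raw === undefined) return fallback;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) ? n : fallback;
}

const port = envInt("PORT", 20128, {});                     // unset: documented default
const timeout = envInt("REQUEST_TIMEOUT_MS", 600_000, { REQUEST_TIMEOUT_MS: "30000" }); // override
```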
๐Ÿ“– Full Setup Guide โ€” All CLI tools, protocols, and environment variables

๐Ÿ“– Complete documentation:


โ“ Frequently Asked Questions

๐Ÿ“Š Why does my dashboard show high costs if I'm using free models?

The dashboard tracks your token usage and displays estimated costs as if you were using paid APIs directly. This is not actual billing โ€” it's a reference to show how much you're saving.

Example:

  • Dashboard shows: "$290 total cost"
  • Reality: You're using Kiro + Qoder (FREE unlimited)
  • Your actual cost: $0.00
  • What $290 means: Amount you saved by using free models instead of paid APIs!

The cost display is a "savings tracker" to help you understand your usage patterns and optimization opportunities.

๐Ÿ’ณ Will I be charged by OmniRoute?

No. OmniRoute is free, open-source software that runs on your own computer. It never charges you anything.

You only pay:

  • โœ… Subscription providers (Claude Code $20/mo, Codex $20-200/mo) โ†’ Pay them directly on their websites
  • โœ… API key providers (DeepSeek, xAI, etc.) โ†’ Pay them directly, OmniRoute just routes your requests
  • โŒ OmniRoute itself โ†’ Never charges anything, ever

OmniRoute is a local proxy/router. It doesn't have your credit card, can't send invoices, and has no billing system. It's completely free software.

๐Ÿ†“ Are FREE providers really unlimited?

Yes! The current FREE providers are genuinely free with no hidden charges:

  • Kiro AI: Free unlimited Claude Sonnet/Haiku via AWS Builder ID / Google / GitHub OAuth
  • Qoder: Free unlimited kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 via PAT token
  • Pollinations AI: No API key needed โ€” GPT-5, Claude, DeepSeek, Llama 4
  • LongCat Flash-Lite: 50M tokens/day โ€” largest free quota available
  • Cloudflare Workers AI: 10K Neurons/day โ€” 50+ models at the edge

OmniRoute just routes your requests to them โ€” there's no "catch" or future billing.

๐Ÿ’ฐ How do I minimize my actual AI costs?

Free-First Strategy:

  1. Start with a 100% free combo:

     1. kr/claude-sonnet-4.5    (Kiro — unlimited free)
     2. if/kimi-k2-thinking     (Qoder — unlimited free)
     3. pol/gpt-5               (Pollinations — no key needed)

     Cost: $0/month

  2. Enable Prompt Compression — even lite mode saves ~15% passively.

  3. Add a cheap backup only if you need it:

     4. glm/glm-5.1  (\$0.5/1M tokens)

     Additional cost: only pay for what you actually use.

  4. Use subscription providers last — only if you already have them. OmniRoute helps maximize their value through quota tracking.

Result: Most users can operate at $0/month using only free tiers!

๐Ÿ—œ๏ธ Will compression affect response quality?

No. Compression only affects the input (your prompt), not the model's response. Each mode has been designed to preserve technical accuracy:

  • Lite (~15%): Only whitespace/formatting โ€” zero semantic change
  • Standard (~30%): Removes filler words ("please", "I think", "basically") โ€” same meaning
  • Aggressive (~50%): Summarizes old messages + compresses tool outputs โ€” core context preserved
  • Ultra (~75%): Heuristic pruning โ€” use only when token budget is critical

Code blocks, URLs, JSON, and structured data are always protected from compression via the preservation engine.
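A toy version of the lite pass shows the preservation idea: collapse whitespace outside fenced code and leave the fences untouched. This is illustrative only; the real preservation engine also covers URLs, JSON, and other structured data.

```typescript
// Collapse runs of spaces/tabs and excess blank lines, but keep fenced
// code blocks byte-for-byte intact.
function liteCompress(prompt: string): string {
  return prompt
    .split(/(```[\s\S]*?```)/g) // odd indices are fenced blocks (kept verbatim)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/[ \t]+/g, " ").replace(/\n{3,}/g, "\n\n"),
    )
    .join("");
}

const fence = "`".repeat(3);
const sample = `hello    world\n\n\n\n${fence}js\nconst  x = 1;\n${fence}`;
const out = liteCompress(sample);
```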

๐ŸŒ Does OmniRoute work in countries where AI is blocked?

Yes! OmniRoute has a 3-level proxy system:

  1. Global proxy โ€” all requests go through your proxy
  2. Per-provider proxy โ€” different proxy per provider
  3. Per-API-key proxy โ€” different proxy per key

Plus the 1proxy free marketplace for community-shared proxies. Users in Russia, China, Iran, and other restricted regions can access all 160+ providers through OmniRoute's proxy infrastructure.
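The precedence between the three levels can be sketched as most-specific-wins. The ordering here (per-key over per-provider over global) is an assumption based on the level descriptions; the hostnames are placeholders, and the Proxy Guide is authoritative:

```typescript
// Resolve the proxy for one request: the most specific configured level wins.
function resolveProxy(
  perKey: string | undefined,
  perProvider: string | undefined,
  global: string | undefined,
): string | undefined {
  return perKey ?? perProvider ?? global;
}

const proxy = resolveProxy(undefined, "socks5://eu.example:1080", "http://global.example:8080");
```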

See the Proxy Guide for setup instructions.


๐Ÿ› Troubleshooting

| Problem | Quick Fix |
|---|---|
| "Language model did not provide messages" | Provider quota exhausted → check quota tracker, use combo fallback |
| Rate limiting (429) | Add fallback combo: cc/claude → glm/glm-4.7 → if/kimi-k2-thinking |
| OAuth token expired | Auto-refreshed by OmniRoute. If stuck: delete + re-auth in Providers |
| `unsupported_country_region_territory` | Configure proxy in Settings → Proxy (see Proxy Guide) |
| Docker SQLite locks | Use `--stop-timeout 40` for clean WAL checkpoint on shutdown |
| Node.js runtime errors | Use Node.js >=20.20.2 <21, >=22.22.2 <23, or >=24.0.0 <25 (24 LTS recommended) |
| system-info for bug reports | Run `npm run system-info` and attach `system-info.txt` to your issue |

๐Ÿ“– Full troubleshooting guide: docs/TROUBLESHOOTING.md

๐Ÿ› ๏ธ Tech Stack

Click to expand tech stack details
  • Runtime: Node.js 20.20.2+, 22.22.2+, or 24.x LTS (24 LTS recommended)
  • Language: TypeScript 5.9 โ€” 100% TypeScript across src/ and open-sse/ (zero any in core modules since v2.0)
  • Framework: Next.js 16 + React 19 + Tailwind CSS 4
  • Database: better-sqlite3 (SQLite) + LowDB (JSON legacy) โ€” domain state, proxy logs, MCP audit, routing decisions, memory, skills
  • Schemas: Zod (MCP tool I/O validation, API contracts)
  • Protocols: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
  • Streaming: Server-Sent Events (SSE) + WebSocket bridge (/v1/ws)
  • Auth: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
  • Testing: Node.js test runner + Vitest (4,690+ test cases across 517 files โ€” unit, integration, E2E, security, ecosystem)
  • Platforms: Desktop (Electron), Android (Termux), PWA (any browser)
  • CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
  • Website: omniroute.online
  • Package: npmjs.com/package/omniroute
  • Docker: hub.docker.com/r/diegosouzapw/omniroute
  • Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing

๐Ÿ“– Documentation

๐Ÿ“˜ Getting Started

| Document | Description |
|---|---|
| User Guide | Providers, combos, CLI integration, deployment |
| Setup Guide | Full install methods, CLI tool configs, protocol setup, timeout tuning |
| CLI Tools Guide | Per-tool setup for Claude Code, Codex, Cursor, Cline, OpenClaw, Kilo, Copilot |
| Quick Start | 3-step install → connect → configure |

๐Ÿ”ง Operations & Deployment

| Document | Description |
|---|---|
| Docker Guide | Docker run, Compose profiles, Caddy HTTPS, tunnels, image tags |
| VM Deployment | Complete guide: VM + nginx + Cloudflare setup |
| Fly.io Deployment | Deploy to Fly.io with persistent storage |
| Termux Guide | Run OmniRoute on Android via Termux |
| PWA Guide | Progressive Web App install, caching, architecture |
| Uninstall Guide | Clean removal for all install methods |
| Environment Config | Complete .env variables and references |

๐Ÿง  Features & Architecture

| Document | Description |
|---|---|
| Architecture | System architecture, data flow, and internals |
| Compression Guide | 7-option pipeline: off / lite / standard / aggressive / ultra / RTK / stacked |
| RTK Compression | Command-output compression, filters, trust, verify, raw-output recovery |
| Compression Engines | Caveman, RTK, stacked pipelines, dashboard/API/MCP surfaces |
| Compression Rules Format | JSON rule-pack schemas for Caveman and RTK filters |
| Compression Language Packs | Language detection and Caveman rule-pack authoring |
| Resilience Guide | Circuit breakers, cooldowns, queue, anti-thundering herd, TLS spoofing |
| Auto-Combo Engine | 6-factor scoring, mode packs, self-healing |
| Proxy Guide | 3-level proxy system, 1proxy marketplace, registry CRUD |
| Free Tiers | 25+ free API providers consolidated directory |
| Features Gallery | Visual dashboard tour with screenshots |
| Codebase Documentation | Beginner-friendly codebase walkthrough |

๐Ÿค– Protocols & APIs

| Document | Description |
|---|---|
| API Reference | All endpoints with examples |
| OpenAPI Spec | OpenAPI 3.0 specification |
| MCP Server | 37 MCP tools, IDE configs, Python/TS/Go clients |
| MCP Server Guide | MCP installation, transports, and tool reference |
| A2A Server | JSON-RPC 2.0 protocol, skills, streaming, task mgmt |
| A2A Server Guide | A2A agent card, tasks, skills, and streaming |

๐Ÿ“‹ Project & Quality

| Document | Description |
|---|---|
| Contributing | Development setup and guidelines |
| Security Policy | Vulnerability reporting and security practices |
| i18n Guide | 40+ language support, translation workflow, RTL |
| Release Checklist | Pre-release validation steps |
| Coverage Plan | Test coverage strategy and 4,690+ test suite |

โญ Top Contributors

OmniRoute is shaped by a passionate open-source community. These individuals have made exceptional contributions that directly impact the quality, stability, and reach of the project. Thank you.

oyi77
🥇 190 commits • +72K lines
Analytics engine, SQL aggregations, proxy marketplace, test coverage

Chris Staley
🥈 72 commits • +5.7K lines
SSE stream hardening, Responses API, Gemini pagination, test regression fixes

zenobit
🥉 62 commits • +24K lines
CI/CD pipeline, i18n for 33 languages, Void Linux package, platform fixes

R.D. & Randi
🏅 107 commits • +28K lines
Endpoints page, tunnel integrations, Docker workflows, A2A status, compression UI

benzntech
🏅 20 commits • +7.5K lines
Electron desktop app, auto-updater, release build workflows, cross-platform CI

๐Ÿ™ These contributors' features, bug fixes, and infrastructure improvements are a core part of what makes OmniRoute reliable and feature-rich. Every pull request, every test case, and every i18n translation file matters. Open source is built by people like them.


๐Ÿ‘ฅ Contributors

Contributors

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Releasing a New Version

# Create a release โ€” npm publish happens automatically
gh release create v2.0.0 --title "v2.0.0" --generate-notes

๐Ÿ“Š Star History

Star History Chart

๐ŸŒ StarMapper

StarMapper

๐Ÿ™ Acknowledgments

Special thanks to 9router by decolua โ€” the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.

Special thanks to CLIProxyAPI by router-for-me โ€” the original Go implementation that inspired this JavaScript port.

Special thanks to Caveman by JuliusBrussee (โญ 51K+) โ€” the viral "why use many token when few token do trick" project whose caveman-speak compression philosophy inspired OmniRoute's standard compression mode and 30+ filler/condensation regex rules.

Special thanks to RTK - Rust Token Killer by RTK AI โ€” the high-performance command-output compression project whose terminal, build, test, git, and tool-output filtering model inspired OmniRoute's RTK engine, JSON filter DSL, raw-output recovery, and stacked RTK โ†’ Caveman compression pipeline.


๐Ÿ“„ License

MIT License - see LICENSE for details.


โฌ† Back to top ยท Built with โค๏ธ for the open-source AI community.

OmniRoute v3.8.0 ยท Node โ‰ฅ22.22.2 ยท MIT License ยท omniroute.online