🚀 OmniRoute

May 14, 2026 · View on GitHub

🚀 OmniRoute — The Free AI Gateway

Never stop coding. Save 15-95% of eligible tokens with RTK+Caveman compression, plus auto-fallback to FREE and low-cost AI models.

The most complete open-source AI proxy — one endpoint, 160+ providers, 13 routing strategies, zero downtime. Multi-platform: Web, Desktop (Electron), Mobile (PWA + Termux). Fully extensible via MCP Server (37 tools), A2A Protocol, and Memory/Skills systems. Available in 40+ languages.

npm License: MIT Node Stars Trendshift


Get $100 Free AI Credits

🔥 Limited offer: Sign up at AgentRouter and get $100 in free AI credits.
Access GPT-5, Claude, Gemini, DeepSeek & 100+ models. No credit card required. Claim your credits →



🚀 Quick Start • 💡 Features • 🗜️ Compression • 💰 Pricing • 🎯 Use Cases • 🌐 Proxy • ❓ FAQ • 📖 Docs • 💬 WhatsApp


๐ŸŒ Available in: ๐Ÿ‡บ๐Ÿ‡ธ English | ๐Ÿ‡ง๐Ÿ‡ท Portuguรชs (Brasil) | ๐Ÿ‡ช๐Ÿ‡ธ Espaรฑol | ๐Ÿ‡ซ๐Ÿ‡ท Franรงais | ๐Ÿ‡ฎ๐Ÿ‡น Italiano | ๐Ÿ‡ท๐Ÿ‡บ ะ ัƒััะบะธะน | ๐Ÿ‡จ๐Ÿ‡ณ ไธญๆ–‡ (็ฎ€ไฝ“) | ๐Ÿ‡ฉ๐Ÿ‡ช Deutsch | ๐Ÿ‡ฎ๐Ÿ‡ณ เคนเคฟเคจเฅเคฆเฅ€ | ๐Ÿ‡น๐Ÿ‡ญ เน„เธ—เธข | ๐Ÿ‡บ๐Ÿ‡ฆ ะฃะบั€ะฐั—ะฝััŒะบะฐ | ๐Ÿ‡ธ๐Ÿ‡ฆ ุงู„ุนุฑุจูŠุฉ | ๐Ÿ‡ฏ๐Ÿ‡ต ๆ—ฅๆœฌ่ชž | ๐Ÿ‡ป๐Ÿ‡ณ Tiแบฟng Viแป‡t | ๐Ÿ‡ง๐Ÿ‡ฌ ะ‘ัŠะปะณะฐั€ัะบะธ | ๐Ÿ‡ฉ๐Ÿ‡ฐ Dansk | ๐Ÿ‡ซ๐Ÿ‡ฎ Suomi | ๐Ÿ‡ฎ๐Ÿ‡ฑ ืขื‘ืจื™ืช | ๐Ÿ‡ญ๐Ÿ‡บ Magyar | ๐Ÿ‡ฎ๐Ÿ‡ฉ Bahasa Indonesia | ๐Ÿ‡ฐ๐Ÿ‡ท ํ•œ๊ตญ์–ด | ๐Ÿ‡ฒ๐Ÿ‡พ Bahasa Melayu | ๐Ÿ‡ณ๐Ÿ‡ฑ Nederlands | ๐Ÿ‡ณ๐Ÿ‡ด Norsk | ๐Ÿ‡ต๐Ÿ‡น Portuguรชs (Portugal) | ๐Ÿ‡ท๐Ÿ‡ด Romรขnฤƒ | ๐Ÿ‡ต๐Ÿ‡ฑ Polski | ๐Ÿ‡ธ๐Ÿ‡ฐ Slovenฤina | ๐Ÿ‡ธ๐Ÿ‡ช Svenska | ๐Ÿ‡ต๐Ÿ‡ญ Filipino | ๐Ÿ‡จ๐Ÿ‡ฟ ฤŒeลกtina



๐Ÿ–ผ๏ธ Main Dashboard

OmniRoute Dashboard

๐Ÿ“ธ Dashboard Preview

Click to see dashboard screenshots
PageScreenshot
ProvidersProviders
CombosCombos
AnalyticsAnalytics
HealthHealth
TranslatorTranslator
SettingsSettings
CLI ToolsCLI Tools
Usage LogsUsage
EndpointsEndpoints

🤖 Free AI Provider for your favorite coding agents

Connect any AI-powered IDE or CLI tool through OmniRoute — a free API gateway for unlimited coding.

| Agent | Stars |
|---|---|
| OpenClaw | ⭐ 205K |
| NanoBot | ⭐ 20.9K |
| PicoClaw | ⭐ 14.6K |
| ZeroClaw | ⭐ 9.9K |
| IronClaw | ⭐ 2.1K |
| OpenCode | ⭐ 106K |
| Codex CLI | ⭐ 60.8K |
| Claude Code | ⭐ 67.3K |
| Gemini CLI | ⭐ 94.7K |
| Kilo Code | ⭐ 15.5K |

📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota


📺 OmniRoute in Action — Video Guides

🇧🇷 Português — "OmniRoute — Guia em Português" (complete OmniRoute guide)
🇺🇸 English — "OmniRoute — English Guide" (complete OmniRoute walkthrough)
🇷🇺 Русский — "OmniRoute — Руководство на русском" (complete OmniRoute guide)

🎬 Made a video about OmniRoute? We'd love to feature it here! Open an issue or discussion with the link and we'll add it to this showcase.


🤔 Why OmniRoute?

Stop wasting money and tokens, and stop hitting limits:

❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers
❌ Each provider has a different API format
❌ AI providers blocked in your country

OmniRoute solves all of this:

✅ Prompt Compression — auto-compress prompts & tool outputs, save 15-95% of eligible tokens per request with RTK+Caveman stacked mode
✅ Maximize subscriptions — track quota, use every bit before reset
✅ Auto fallback — Subscription → API Key → Cheap → Free, zero downtime
✅ Multi-account — round-robin between accounts per provider
✅ Format translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API, any tool works
✅ 3-level proxy — bypass geo-blocks with global, per-provider, and per-key proxies
✅ 10 multi-modal APIs — chat, images, video, music, audio, search in one endpoint
✅ MCP + A2A — 29 MCP tools + agent-to-agent protocol, production-ready
✅ Universal — works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool


📧 Support

💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.

🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

npm run system-info

This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.


๐Ÿ› ๏ธ Supported CLI Tools

OmniRoute works seamlessly with 16+ AI coding tools โ€” one config, all tools:

Claude Code
Anthropic
Codex CLI
OpenAI
Gemini CLI
Google
Cursor
IDE
OpenClaw
CLI
Antigravity
VS Code
Cline
Extension
Continue
Extension
Kilo Code
Extension
Kiro
AWS IDE
OpenCode
CLI
Droid
CLI
AMP
CLI
Copilot
GitHub
Windsurf
IDE
Hermes
CLI
Qwen CLI
Alibaba
Custom
Any tool

📖 Full setup for each tool: docs/CLI-TOOLS.md


๐ŸŒ Supported Providers โ€” 160+

๐Ÿ” OAuth Providers

Claude Code
Anthropic OAuth
Antigravity
Google OAuth
Codex
OpenAI OAuth
GitHub Copilot
GitHub OAuth
Cursor
Cursor OAuth
Kimi Coding
Moonshot OAuth
Kilo Code
Kilo OAuth
Cline
Cline OAuth

🆓 Free Providers (No Cost)

| Provider | Models | Free Quota |
|---|---|---|
| 🟢 Kiro AI | Claude Sonnet/Haiku | Unlimited FREE |
| 🟢 Qoder AI | Kimi-K2, DeepSeek-R1 | Unlimited FREE |
| 🟢 Pollinations | GPT-5, Claude, Llama 4 | No API key needed |
| 🟢 Qwen Code | Qwen3 Coder Plus | Unlimited FREE |
| 🟢 LongCat AI | Flash-Lite | 50M tokens/day |
| 🟢 Cloudflare AI | 50+ models | 10K neurons/day |
| 🟢 Puter AI | GPT-4.1, Claude | Rate-limited free |
| 🟢 NVIDIA NIM | Llama, Mistral | 1K req/day free |

🔑 API Key Providers (120+)

OpenAI Anthropic Gemini DeepSeek Groq xAI (Grok)
Mistral OpenRouter GLM Kimi MiniMax Fireworks
Together AI Cerebras Cohere NVIDIA Perplexity SiliconFlow
Nebius HuggingFace DeepInfra SambaNova Vertex AI Azure OpenAI
AWS Bedrock Snowflake Databricks Venice.ai AI21 Labs Meta Llama
...and 90+ more providers

Alibaba · Amazon Q · AssemblyAI · Baidu Qianfan · Baseten · Black Forest Labs · Blackbox · Brave Search · Bytez · CablyAI · Cartesia · ChatGPT Web · Chutes.ai · Clarifai · Codestral · CrofAI · DataRobot · Deepgram · ElevenLabs · Empower · Exa Search · Fal.ai · Featherless AI · FenayAI · FriendliAI · Galadriel · GigaChat · GitLab Duo · GLHF Chat · GoAPI · Heroku AI · Hyperbolic · IBM watsonx · Inference.net · Inworld · Jina AI · Kilo Gateway · Lambda AI · LaoZhang · Linkup Search · LlamaGate · Maritalk · Modal · Moonshot AI · Morph · Muse Spark · NanoBanana · NanoGPT · NLP Cloud · Nous Research · Novita AI · nScale · OCI · Ollama Cloud · OVHcloud · PiAPI · PlayHT · Poe · Predibase · PublicAI · Qwen Code · Recraft · Reka · Runway · SAP · Scaleway · SearchAPI · SearXNG · Serper · Stability AI · Synthetic · Tavily · TheB.AI · Topaz · Upstage · v0 (Vercel) · Vercel AI Gateway · Volcengine · Voyage AI · W&B Inference · Xiaomi MiMo · You.com · Z.AI · + OpenAI/Anthropic-compatible custom endpoints

๐Ÿ  Self-Hosted

LM Studio Ollama vLLM Llamafile Docker Model Runner
NVIDIA Triton XInference oobabooga ComfyUI SD WebUI

🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌──────────────────────────────────────────────────┐
│             OmniRoute (Smart Router)             │
│  • 🗜️ Prompt Compression (save 15-95% eligible)  │
│  • Format translation (OpenAI ↔ Claude ↔ Gemini) │
│  • Quota tracking + Embeddings + Images          │
│  • Auto token refresh + Rate limit management    │
└──────┬───────────────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost + 15-95% eligible token savings

๐Ÿ—œ๏ธ Prompt Compression โ€” Save 15-95% Eligible Tokens Automatically

Why use many token when few token do trick? OmniRoute's built-in compression pipeline reduces token usage before requests reach the provider. It combines ideas from RTK - Rust Token Killer and Caveman (โญ 51K+).

How It Works

Every request passes through the compression pipeline transparently โ€” no client changes needed:

┌──────────────────┐     ┌─────────────────────────────┐     ┌──────────────┐
│   Client sends   │────▶│  OmniRoute Compression      │────▶│  Provider    │
│   full prompt    │     │  Pipeline (7 options)       │     │  receives    │
│   (10,000 tok)   │     │                             │     │  compressed  │
│                  │     │  🪶 Lite ........... ~15%   │     │  (~1,080 tok)│
│                  │     │  🪨 Standard ....... ~30%   │     │              │
│                  │     │  ⚡ Aggressive ..... ~50%   │     │  💰 up to 95%│
│                  │     │  🔥 Ultra .......... ~75%   │     │              │
│                  │     │  🧰 RTK ............ 60-90% │     │              │
│                  │     │  🔗 Stacked ........ 78-95% │     │              │
└──────────────────┘     └─────────────────────────────┘     └──────────────┘

7 Compression Options

| Mode | Savings | Technique | Best For |
|---|---|---|---|
| Off | 0% | No compression | When you need exact prompts |
| 🪶 Lite | ~15% | Whitespace collapse, dedup system prompts, image URL shortening | Always-on safe default |
| 🪨 Standard (Caveman) | ~30% | 30+ regex rules: filler removal, context condensation, structural compression, multi-turn dedup | Daily coding with Claude/Codex |
| ⚡ Aggressive | ~50% | All standard + progressive message aging + tool result summarization + LLM-based compression | Long sessions with many tool calls |
| 🔥 Ultra | ~75% | All aggressive + heuristic token pruning + stopword removal + score-based filtering | Maximum savings when tokens are scarce |
| 🧰 RTK | 60-90% | 49 command-aware filters, RTK-style JSON DSL, verify gate, trust-gated custom filters | Shell/test/build/git output in agents |
| 🔗 Stacked | 78-95% | RTK first, then Caveman input condensation; ~89% with upstream average math | Mixed prompts with tool logs + prose |

RTK + Caveman Savings Math

These numbers are based on the upstream project READMEs under _references/_outros:

| Source | Upstream claim used by OmniRoute docs |
|---|---|
| Caveman | ~75% fewer output tokens; benchmark average 65% output savings, range 22-87%; ~46% input compression tool |
| RTK | 60-90% command-output token savings; sample session ~118,000 → ~23,900 tokens, which is 79.7% saved (~80%) |

For the default stacked compression combo, OmniRoute runs:

RTK -> Caveman

When both engines can act on the same tool/context payload, the savings compound:

combined = 1 - (1 - RTK savings) * (1 - Caveman input savings)
average  = 1 - (1 - 0.80) * (1 - 0.46) = 89.2%
range    = 1 - (1 - 0.60..0.90) * (1 - 0.46) = 78.4-94.6%

Caveman output mode is separate from prompt compression. When enabled for responses, use Caveman's own upstream output numbers: 65% average, ~75% headline, 22-87% observed range. Total bill savings depend on the prompt/output mix, but coding-agent sessions are often tool-context heavy, so the RTK -> Caveman combo is the best default for maximum context savings.
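As a sanity check, the compounding formula above can be evaluated directly. A throwaway sketch (the 0.80 and 0.46 figures are the upstream averages quoted in the table, not measurements of OmniRoute itself):

```python
def stacked_savings(rtk: float, caveman_input: float) -> float:
    """Fraction of tokens removed after RTK runs first, then Caveman.

    Each stage acts on the tokens that survived the previous stage,
    so the survival rates multiply."""
    return 1 - (1 - rtk) * (1 - caveman_input)

average = stacked_savings(0.80, 0.46)  # upstream averages
low = stacked_savings(0.60, 0.46)      # RTK lower bound
high = stacked_savings(0.90, 0.46)     # RTK upper bound

print(f"average: {average:.1%}, range: {low:.1%}-{high:.1%}")
# average: 89.2%, range: 78.4%-94.6%
```

This matches the 89.2% average and 78.4-94.6% range quoted above.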

Before & After (Standard/Caveman Mode)

🗣️ Before compression (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."

🪨 After compression (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same answer. 72% fewer tokens. Zero accuracy loss.
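The 72% figure is just the token ratio between the two snippets:

```python
# Token counts quoted in the before/after example above.
before_tokens, after_tokens = 69, 19
savings = 1 - after_tokens / before_tokens
print(f"{savings:.0%}")  # 72%
```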

Architecture

Request Body
  │
  ├─ strategySelector.ts ─── Picks mode (config / combo override / auto-trigger)
  │
  ├─ lite.ts ─────────────── Whitespace, dedup, image URLs, redundant content
  ├─ caveman.ts ──────────── 30+ regex rules via cavemanRules.ts
  │   └─ preservation.ts ─── Protects code blocks, URLs, JSON from compression
  ├─ engines/rtk/ ────────── Command detection + JSON DSL filters + raw-output recovery
  ├─ engines/registry.ts ─── Shared engine registry for caveman, RTK, and stacked
  ├─ aggressive.ts ───────── Summarizer + tool result compressor + progressive aging
  │   ├─ summarizer.ts ───── Rule-based message summarization
  │   ├─ toolResultCompressor.ts ── file/grep/shell/JSON/error compression
  │   └─ progressiveAging.ts ──── Older messages → shorter summaries
  └─ ultra.ts ────────────── Heuristic token scoring + pruning
      └─ ultraHeuristic.ts ─ Stopword detection, score thresholds, force-preserve

Configuration

Dashboard → Context & Cache → Caveman / RTK / Compression Combos

Or per-combo override:

{
  "comboOverrides": {
    "my-coding-combo": "standard",
    "my-cheap-combo": "ultra"
  }
}

Auto-trigger: set autoTriggerTokens to automatically enable compression when a request exceeds a token threshold.

Compression combos can also assign a named compression pipeline to routing combos, so a coding combo can use RTK + Caveman while a paid subscription combo stays on lite mode.
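Putting both knobs together, a config sketch (autoTriggerTokens and comboOverrides are the names used above; the exact schema and valid mode strings are in docs/COMPRESSION_GUIDE.md, so treat this as illustrative):

```json
{
  "autoTriggerTokens": 8000,
  "comboOverrides": {
    "my-coding-combo": "stacked",
    "my-subscription-combo": "lite"
  }
}
```

With a setup like this, requests above ~8,000 tokens get compressed automatically, while the overrides keep a paid subscription combo on the gentler lite mode.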

🪨 Fun fact: The standard/caveman mode is inspired by Caveman — the viral project that reports 65% average output-token savings while keeping technical accuracy. OmniRoute takes this further with a 7-option pipeline and a default RTK → Caveman combo that can reach ~89% average savings on eligible tool/context payloads.

📖 Full compression documentation: docs/COMPRESSION_GUIDE.md • docs/RTK_COMPRESSION.md • docs/COMPRESSION_ENGINES.md • docs/COMPRESSION_RULES_FORMAT.md • docs/COMPRESSION_LANGUAGE_PACKS.md


🎯 What OmniRoute Solves

Every developer using AI tools faces these problems daily. OmniRoute solves them all.

| # | Problem | OmniRoute Solution |
|---|---|---|
| 💸 | Subscription quota expires mid-coding | Smart 4-Tier Fallback — auto-routes Subscription → API Key → Cheap → Free |
| 🔌 | Each provider has a different API format | Format Translation — unified endpoint translates OpenAI ↔ Claude ↔ Gemini ↔ Responses |
| 🌐 | AI providers block my country/region | 3-Level Proxy — global, per-provider, and per-key proxy with TLS fingerprint spoofing |
| 🆓 | Can't afford AI subscriptions | 11 Free Providers — Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM... |
| 🔒 | Gateway is exposed without protection | API Key Management — scoping, rotation, IP filtering, rate limiting, prompt injection guard |
| 🛑 | Provider went down, lost coding flow | Circuit Breakers — auto-failover with cooldown, retry, anti-thundering herd |
| 🔧 | Configuring each CLI tool is tedious | CLI Tools Dashboard — one-click setup for Claude Code, Codex, Cursor, OpenClaw, Kilo |
| 🔑 | Managing OAuth tokens is hell | Auto Token Refresh — OAuth PKCE for 8 providers, multi-account, LAN/remote fix |
| 📊 | Don't know how much I'm spending | Cost Analytics — per-token tracking, budget limits, usage stats per API key |
| 🐛 | Can't diagnose errors in AI calls | Unified Logs — 4-tab dashboard (request, proxy, audit, console) + p50/p95/p99 telemetry |

📖 See all 31 problems OmniRoute solves

| # | Problem | Solution |
|---|---|---|
| 11 | Deploying/maintaining is complex | npm global, Docker multi-arch, Electron, Termux — deploy anywhere |
| 12 | Interface is English-only | 40+ languages with RTL support |
| 13 | Need more than chat (images, audio, video) | 10 multi-modal APIs: embeddings, images, video, music, TTS, STT, moderation, rerank, search, batch |
| 14 | No way to test/compare models | LLM Evals, Translator Playground, Chat Tester, Live Monitor |
| 15 | Need to scale without losing performance | Semantic cache, request dedup, rate limit detection, queue & pacing |
| 16 | Want to control model behavior globally | System prompt injection, thinking budget, wildcard routing |
| 17 | Need MCP tools as first-class features | 29 MCP tools, 3 transports (stdio/SSE/HTTP), 10 scopes, audit trail |
| 18 | Need A2A orchestration | JSON-RPC 2.0 + SSE streaming, task lifecycle, sync + stream paths |
| 19 | Need real MCP process health | Runtime heartbeat, PID tracking, UI status cards |
| 20 | Need auditable MCP execution | SQLite-backed audit with filters, pagination, stats |
| 21 | Need scoped MCP permissions | 10 granular scopes per integration |
| 22 | Need operational controls without redeploying | Combo switches, resilience tuning, breaker resets from dashboard |
| 23 | Need A2A task lifecycle visibility | Task listing/filtering, drill-down, cancellation |
| 24 | Need active stream metrics | Active stream counters, per-state counts, A2A dashboard cards |
| 25 | Need standard agent discovery | Agent Card at /.well-known/agent.json |
| 26 | Need protocol discoverability | Consolidated Endpoints page with Proxy, MCP, A2A, API tabs |
| 27 | Need E2E protocol validation | Real MCP SDK + A2A client flows in test:protocols:e2e |
| 28 | Need unified observability | Health + audit + telemetry across OpenAI, MCP, and A2A layers |
| 29 | Need one runtime for proxy + tools + agents | OpenAI proxy + MCP + A2A in one stack with shared auth/resilience |
| 30 | Need agentic workflows without glue-code | Unified endpoint, protocol UIs, production-ready foundations |
| 31 | Long sessions crash with context limits | Proactive context compression, structural integrity guards, multi-layer dropping |

📖 Deep dives: Resilience Guide • Proxy Guide • Setup Guide • Compression Guide


🆓 Start Free — Zero Configuration Cost

Set up AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.

| Step | Action | Providers Unlocked |
|---|---|---|
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — unlimited |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — unlimited |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — unlimited |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro — 180K/mo free |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |

Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.

Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

⚡ Quick Start

1) Install and run

npm install -g omniroute
omniroute

Dashboard opens at http://localhost:20128 · API at http://localhost:20128/v1.

2) Connect providers

  1. Dashboard → Providers → connect at least one provider (OAuth or API key)
  2. Dashboard → Endpoints → create an API key
  3. Dashboard → Combos → set your fallback chain (optional)

3) Point your coding tool

Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model)

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and any OpenAI-compatible tool.
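If your tool isn't on that list, any HTTP client works, since the /v1 endpoint is OpenAI-compatible. A minimal Python sketch (the model ID and port come from the steps above; YOUR_OMNIROUTE_KEY is a placeholder for the key created on the Endpoints page):

```python
import json
from urllib import request

# Standard OpenAI chat-completions payload shape, pointed at OmniRoute.
payload = {
    "model": "if/kimi-k2-thinking",  # provider-prefixed model ID
    "messages": [{"role": "user", "content": "Say hello"}],
}

req = request.Request(
    "http://localhost:20128/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_OMNIROUTE_KEY",  # from the Endpoints page
        "Content-Type": "application/json",
    },
)
# response = request.urlopen(req)  # uncomment with OmniRoute running locally
```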

📦 More install methods (Docker, source, Arch, Void, pnpm)

Docker:

docker run -d --name omniroute --restart unless-stopped -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest

From source:

cp .env.example .env && npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev

pnpm: pnpm install -g omniroute && pnpm approve-builds -g && omniroute

Arch Linux (AUR): yay -S omniroute-bin && systemctl --user enable --now omniroute.service

MCP: omniroute --mcp (stdio transport)

CLI options: omniroute setup, omniroute doctor, omniroute providers available, omniroute providers list, omniroute --port 3000, omniroute --no-open, omniroute --help

Split-port mode: PORT=20128 DASHBOARD_PORT=20129 omniroute

Uninstall: npm run uninstall (keeps data) or npm run uninstall:full (removes everything)

📖 Full details: Setup Guide · Docker · Void Linux template


๐Ÿณ Docker

OmniRoute is available as a public Docker image on Docker Hub.

Quick run:

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

With environment file:

# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

Using Docker Compose:

# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d

Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL. Endpoint tunnel panels, including Cloudflare, Tailscale, and ngrok, can be shown or hidden from Settings → Appearance without changing active tunnel state.

Notes:

  • Quick Tunnel URLs are temporary and change after every restart.
  • Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
  • Managed install currently supports Linux, macOS, and Windows on x64 / arm64.
  • Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set CLOUDFLARED_PROTOCOL=quic or auto if you want a different transport.
  • Docker images bundle system CA roots and pass them to managed cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container.
  • SQLite runs in WAL mode. docker stop should be allowed to finish so OmniRoute can checkpoint the latest changes back into storage.sqlite.
  • The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep --stop-timeout 40 (or similar) so manual stops do not cut off shutdown cleanup.
  • Set CLOUDFLARED_BIN=/absolute/path/to/cloudflared if you want OmniRoute to use an existing binary instead of downloading one.

Using Docker Compose with Caddy (HTTPS Auto-TLS):

OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.

services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:

| Image | Tag | Size | Description |
|---|---|---|---|
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 3.7.8 | ~250MB | Current version |

📖 Full Docker documentation: docs/DOCKER_GUIDE.md — Compose profiles, Caddy HTTPS, Cloudflare tunnels, and more.


📱 Multi-Platform — Run Anywhere

OmniRoute runs on Web, Desktop (Electron), Android (Termux), and as a Progressive Web App (PWA).

| Platform | Install | Highlights |
|---|---|---|
| 🖥️ Desktop | npm run electron:build | Native window, system tray, auto-start, offline mode — Windows/macOS/Linux |
| 📱 Android | pkg install nodejs-lts && npx -y omniroute | ARM native, no root, 24/7 via Termux:Boot — your phone is an AI server |
| 📲 PWA | "Add to Home Screen" in browser | Fullscreen, offline page, service worker caching — Android/iOS/Desktop |

🖥️ Desktop App details
  • Native Electron app with system tray, auto-start, native notifications
  • One-click install: NSIS (Windows), DMG (macOS), AppImage (Linux)
  • Dev: npm run electron:dev · Build: npm run electron:build
  • 📖 Full docs: electron/README.md

📱 Android (Termux) details

pkg update && pkg install nodejs-lts python build-essential git
npx -y omniroute@latest

Access from any device on the same network: http://PHONE_IP:20128/v1

📲 PWA details
  • Android (Chrome): ⋮ → "Add to Home screen"
  • iOS (Safari): Share → "Add to Home Screen"
  • Desktop (Chrome/Edge): Install icon in address bar
  • 📖 Full docs: docs/PWA_GUIDE.md

๐ŸŒ Bypass Geographic Blocks โ€” Use AI From Any Country

๐Ÿ‡ท๐Ÿ‡บ ๐Ÿ‡จ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ท ๐Ÿ‡จ๐Ÿ‡บ ๐Ÿ‡น๐Ÿ‡ท In Russia, China, Iran, or any blocked region? OmniRoute's 3-level proxy system solves this completely.

LevelBadgeConfigure InUse Case
Global๐ŸŸขSettings โ†’ ProxyAll traffic through one proxy
Per-Provider๐ŸŸกProvider โ†’ ProxyOnly specific providers proxied
Per-Connection๐Ÿ”ตConnection โ†’ ProxyEach API key uses its own proxy

What gets proxied: API requests โœ… โ€ข OAuth flows โœ… โ€ข Connection tests โœ… โ€ข Token refresh โœ… โ€ข Model sync โœ…

Protocols: HTTP/HTTPS, SOCKS5 (ENABLE_SOCKS5_PROXY=true), Authenticated proxies

๐Ÿ†“ 1proxy โ€” Free Proxy Marketplace

Contributed by @oyi77 โ€” #1847

No proxy? Use the built-in 1proxy integration for hundreds of free, validated proxies worldwide:

  • One-click sync (up to 500 proxies) โ€ข Quality scores (0-100) โ€ข Country filter โ€ข Auto-rotation (quality/random/sequential) โ€ข Auto-degradation โ€ข Circuit breaker

Anti-Detection

  • ๐Ÿ”’ TLS Fingerprint Spoofing โ€” browser-like TLS via wreq-js
  • ๐Ÿ” CLI Fingerprint Matching โ€” matches native CLI binary signatures
  • ๐Ÿ  Proxy IP Preservation โ€” stealth + IP masking simultaneously

๐Ÿ“– Full proxy documentation: docs/PROXY_GUIDE.md



💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
|---|---|---|---|---|
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast 🆕 | $0.20/$0.50 per 1M | None | Fastest + tool calling, ultralow |
| | xAI Grok-4 (standard) 🆕 | $0.20/$1.50 per 1M | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| | AgentRouter 🆕 | Pay-per-use | None | $200 free credits at signup |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |

🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.

💡 See the full $0 Free Stack (11 providers) below.

💡 Understanding Dashboard Costs:

The "cost" displayed in the Usage Analytics page is for tracking and comparison purposes only. OmniRoute itself never charges you anything — it's free, open-source software running on your machine. If your dashboard shows "$290 total cost" while using free models, that's how much you saved compared to paid API pricing. Think of it as a savings tracker, not a bill.


🆓 Free Models — 11 Providers, $0 Forever

Combine all free providers into one unbreakable combo — OmniRoute auto-routes between them when quota runs out.

| Provider | Prefix | Free Models | Quota |
|---|---|---|---|
| Kiro | kr/ | Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 credits per month |
| Qoder | if/ | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1 | ♾️ Unlimited |
| Qwen | qw/ | qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next | ♾️ Unlimited |
| Pollinations | pol/ | GPT-5, Claude, Gemini, DeepSeek, Llama 4, Mistral | No key needed |
| LongCat | lc/ | LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Gemini CLI | gc/ | gemini-3-flash, gemini-2.5-pro | 180K tok/mo |
| Cloudflare AI | cf/ | 50+ models (Llama, Gemma, Mistral, Whisper) | 10K Neurons/day |
| Groq | groq/ | Llama 3.3 70B, Qwen3 32B, Kimi K2 | 14.4K RPD |
| NVIDIA NIM | nvidia/ | 129 models (DeepSeek, Llama, GLM, Kimi) | ~40 RPM |
| Cerebras | cerebras/ | Qwen3 235B, GPT-OSS 120B, Llama 3.1 | 1M tok/day |
| Scaleway | scw/ | Qwen3 235B, Llama 70B, DeepSeek V3 | 1M tokens (EU) |

📖 25+ more free providers — Groq, Cerebras, Mistral, GitHub Models, OpenRouter, and more

Also free (API Key required): Mistral (1B tok/month) · OpenRouter (35+ :free models) · GitHub Models (GPT-5, 45+ models) · Cohere (1K calls/month) · Z.AI/GLM (permanent free Flash models) · SiliconFlow (1K RPM, 50K TPM) · Kilo Code (~200 req/hr auto-router) · HuggingFace ($0.10/mo credits) · Ollama Cloud (400+ models) · LLM7.io (30+ models) · Kluster AI · IBM watsonx (300K tok/month) · OpenCode Zen · Vercel AI Gateway ($5/mo)

Trial credits (one-time): Baseten ($30) · NLP Cloud ($15) · AI21 ($10) · Upstage ($10) · SambaNova ($5) · Modal ($5/mo) · Fireworks ($1) · Nebius ($1) · Inference.net ($1 + $25 survey) · Hyperbolic ($1) · Novita ($0.50)

China-based (free tiers): ModelScope · Tencent Hunyuan · Volcengine · ChatAnywhere · InternAI · Bigmodel

Combined capacity: ~31,000+ RPD · ~32B+ tokens/month · 500+ models · $0

📖 Complete free provider directory: docs/FREE_TIERS.md — 25+ providers, quotas, base URLs, model tables, and OmniRoute combo setup.


๐ŸŽ™๏ธ Free Transcription Combo

Transcribe any audio/video for $0 โ€” Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.

ProviderFree CreditsBest ModelRate Limit
๐ŸŸข Deepgram$200 free (signup)nova-3 โ€” best accuracy, 30+ languagesNo RPM limit on free credits
๐Ÿ”ต AssemblyAI$50 free (signup)universal-3-pro โ€” chapters, sentiment, PIINo RPM limit on free credits
๐Ÿ”ด GroqFree foreverwhisper-large-v3 โ€” OpenAI Whisper30 RPM (rate limited)

Suggested combo in /dashboard/combos:

Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3          โ†’ uses \$200 free first
  [2] assemblyai/universal-3-pro โ†’ fallback when Deepgram credits run out
  [3] groq/whisper-large-v3    โ†’ free forever, emergency fallback

Then in /dashboard/media โ†’ Transcription tab: upload any audio or video file โ†’ select your combo endpoint โ†’ get transcription in supported formats.
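The Priority strategy behaves like a simple waterfall: try node 1, and fall through to the next node on any error. A minimal TypeScript sketch of that behavior (illustrative only; not OmniRoute's internals):

```typescript
type TranscribeNode = { id: string; transcribe: (audio: string) => string };

// Try each node in configured order; the first success wins.
function runPriorityCombo(nodes: TranscribeNode[], audio: string): string {
  let lastError: unknown = new Error("no nodes configured");
  for (const node of nodes) {
    try {
      return node.transcribe(audio); // first healthy node wins
    } catch (err) {
      lastError = err; // credits exhausted / rate limit: fall through to next node
    }
  }
  throw lastError;
}

// Simulated combo: Deepgram credits are gone, so AssemblyAI handles the job.
const combo: TranscribeNode[] = [
  { id: "deepgram/nova-3", transcribe: () => { throw new Error("402 credits exhausted"); } },
  { id: "assemblyai/universal-3-pro", transcribe: (a) => `transcript of ${a}` },
  { id: "groq/whisper-large-v3", transcribe: (a) => `transcript of ${a}` },
];
const result = runPriorityCombo(combo, "demo.wav");
```

The same waterfall applies to any Priority combo, not just transcription.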

๐Ÿ’ก Key Features

4,690+ automated tests across 517 test files. Not just a relay โ€” a full operational platform.

| Feature | Why It Matters |
|---|---|
| 🧠 Smart 4-Tier Fallback — Subscription → API → Cheap → Free | Never stop coding, zero downtime |
| 🔄 Format Translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API | Works with ANY CLI tool |
| 🗜️ Prompt Compression — 7 options including Caveman, RTK, and stacked pipelines | Save 15-95% eligible tokens |
| 🤖 MCP Server — 37 tools, 3 transports (stdio/SSE/HTTP), 10 scopes | IDE/agent tool integration |
| 🛡️ Resilience Engine — circuit breakers, cooldowns, TLS spoofing, anti-thundering herd | Auto-recovery from any failure |
| 🎵 10 Multi-Modal APIs — chat, embed, images, video, music, TTS, STT, moderation, rerank, search | One endpoint for everything |
| 🌐 3-Level Proxy — global, per-provider, per-key + 1proxy free marketplace | Access AI from any country |
| 📊 Full Observability — unified logs, p50/p95/p99 telemetry, cost tracking, budget controls | Know exactly what's happening |
๐Ÿ“‹ Complete feature list โ€” 30+ capabilities

Routing & Intelligence

  • 13 balancing strategies (priority, weighted, round-robin, P2C, cost-optimized, context-relay...)
  • Task-aware smart routing (coding/vision/analysis) ยท Context relay session handoffs
  • Thinking budget controls (passthrough/auto/custom) ยท Wildcard routing ยท System prompt injection

Translation & Compatibility

  • Auto token refresh (OAuth PKCE for 8 providers) ยท Multi-account round-robin
  • Responses API โ€” full /v1/responses for Codex ยท Batch API with Files API
  • OpenAPI 3.0 live spec + Try-It UI

Protocols

  • A2A Server โ€” JSON-RPC 2.0, SSE streaming, task lifecycle, skills
  • ACP โ€” CLI agent discovery (14 agents + custom)

Platform

  • Desktop (Electron) ยท Android (Termux) ยท PWA ยท Docker (AMD64 + ARM64)
  • Cloudflare / Tailscale / ngrok tunnels ยท 40+ languages with RTL
  • Semantic + signature cache (two-tier) ยท Request idempotency + deduplication

Observability

  • Health dashboard โ€” uptime, breakers, cache, lockouts
  • Evaluation framework โ€” golden set testing ยท Webhooks ยท Compliance audit

v3.6+ Highlights: V1 WebSocket Bridge ยท Sync Tokens & Config Bundle ยท GLM Thinking (glmt) ยท Hybrid Token Counting ยท Safe Outbound Fetch ยท Wait For Cooldown ยท Runtime Env Validation ยท Vision Bridge ยท Grok-4 Fast ยท GLM-5 via Z.AI ยท MiniMax M2.5 ยท toolCalling flag ยท Multilingual Intent Detection ยท Benchmark-Driven Fallbacks ยท Request Deduplication

Architecture Examples:

Combo: "my-coding-stack"              Format Translation:
  1. cc/claude-opus-4-7                 CLI โ†’ OpenAI format
  2. nvidia/llama-3.3-70b               OmniRoute โ†’ translates
  3. glm/glm-4.7                        Provider โ†’ native format
  4. if/kimi-k2-thinking
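As a sketch of what the translation step does, here is an illustrative OpenAI → Claude message mapping. Anthropic's Messages API keeps the system prompt outside the messages array, so a translator has to lift it out. Types are simplified; this is not OmniRoute's actual translator.

```typescript
type OpenAIMsg = { role: "system" | "user" | "assistant"; content: string };
type ClaudeRequest = {
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
};

// Lift system messages into Anthropic's top-level `system` field and
// pass the rest of the conversation through unchanged.
function openAIToClaude(msgs: OpenAIMsg[]): ClaudeRequest {
  const system = msgs.filter(m => m.role === "system").map(m => m.content).join("\n");
  const messages = msgs
    .filter(m => m.role !== "system")
    .map(m => ({ role: m.role as "user" | "assistant", content: m.content }));
  return system ? { system, messages } : { messages };
}

const translated = openAIToClaude([
  { role: "system", content: "You are terse." },
  { role: "user", content: "hi" },
]);
```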

๐Ÿ“– MCP Server README ยท A2A Server README ยท Resilience Guide ยท Features Gallery


๐ŸŽฏ Use Cases โ€” Ready-Made Combo Playbooks

Case 0: "I want zero-config, auto-routing NOW"

Problem: Don't want to create combos manually. Just want AI routing to work immediately.

# No combo creation needed! Use auto/ prefix directly:
model: "auto"           # Default LKGP routing across all connected providers
model: "auto/coding"    # Quality-first weights for code generation
model: "auto/fast"      # Low-latency routing (fastest provider first)
model: "auto/cheap"     # Cost-optimized (cheapest per token)
model: "auto/offline"   # High availability (most quota available)
model: "auto/smart"     # Best discovery (10% exploration rate)

How it works:

  1. Add providers in Dashboard โ†’ Providers (OAuth or API key)
  2. Use auto/ prefix in any AI tool โ€” no combo creation needed
  3. OmniRoute dynamically builds a virtual combo from your active connections
  4. Routes using LKGP (Last Known Good Provider) + 6-factor scoring
  5. Session stickiness ensures consistent provider selection

Dashboard indicator: A blue banner at the top shows "Auto-Routing Active" with a link to /dashboard/combos for configuration.

Monthly cost: $0 (uses your existing free providers) or whatever your connected providers cost
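Because the auto/ prefixes are ordinary model IDs, any OpenAI-compatible client can use them. A hedged sketch of the request (the base URL is from the Setup Guide; /chat/completions is the standard OpenAI path; the fetch call is commented out so nothing is sent here):

```typescript
const url = "http://localhost:20128/v1/chat/completions";
const request = {
  model: "auto/coding", // or "auto", "auto/fast", "auto/cheap", "auto/offline", "auto/smart"
  messages: [{ role: "user", content: "Write a binary search in TypeScript" }],
};
// fetch(url, {
//   method: "POST",
//   headers: { "Content-Type": "application/json", Authorization: "Bearer <key from Dashboard → Endpoints>" },
//   body: JSON.stringify(request),
// });
```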


Case 1: "I have a Claude Pro subscription"

Problem: Quota expires unused, rate limits during heavy coding sessions.

Combo: "maximize-claude"
  1. cc/claude-opus-4-7        (use subscription fully)
  2. glm/glm-5.1               (cheap backup when quota out โ€” \$0.5/1M)
  3. kr/claude-sonnet-4.5      (free emergency fallback via Kiro)

Compression: standard (caveman) โ€” saves 30% tokens = stretch quota further
Monthly cost: \$20 (subscription) + ~\$3 (backup) = \$23 total
vs. \$20 + hitting limits + lost productivity = frustration

Case 2: "I want $0 forever"

Problem: Can't afford subscriptions, need reliable AI for coding.

Combo: "free-forever"
  1. kr/claude-sonnet-4.5      (Claude 4.5 free unlimited via Kiro)
  2. if/kimi-k2-thinking       (reasoning model free via Qoder)
  3. pol/gpt-5                 (GPT-5 free via Pollinations โ€” no key)
  4. lc/longcat-flash-lite     (50M tokens/day free backup)

Compression: aggressive โ€” saves 50% tokens = double your free quota
Monthly cost: \$0
Quality: Production-ready models + 50% token savings

Case 3: "I need 24/7 coding, no interruptions"

Problem: Deadlines, can't afford any downtime.

Combo: "always-on"
  1. cc/claude-opus-4-7        (best quality โ€” subscription)
  2. cx/gpt-5.5                (second subscription โ€” OpenAI)
  3. glm/glm-5.1               (cheap, resets daily โ€” \$0.5/1M)
  4. minimax/MiniMax-M2.5      (cheapest paid โ€” \$0.3/1M)
  5. kr/claude-sonnet-4.5      (free unlimited โ€” never fails)

Compression: lite โ€” saves 15% tokens passively, zero risk
Result: 5 layers of fallback = zero downtime
Monthly cost: \$20-200 (subscriptions) + \$5-10 (backup)

Case 4: "I'm in a blocked region (Russia, China, Iran...)"

Problem: AI providers block my country, VPNs are slow.

Combo: "unblocked-ai"
  1. kr/claude-sonnet-4.5      (free via Kiro + proxy)
  2. pol/deepseek-r1           (Pollinations โ€” no geo-block)
  3. groq/llama-3.3-70b       (Groq + proxy)

Proxy: Global proxy set in Settings โ†’ or per-provider proxy override
Result: Access ALL providers from ANY country
Monthly cost: \$0 (free providers) + \$0 (1proxy free marketplace)

Case 5: "I want maximum token savings"

Problem: Token costs are eating my budget, need to squeeze every token.

Combo: "ultra-saver"
  1. cc/claude-opus-4-7        (subscription โ€” best quality)
  2. glm/glm-5.1               (cheap backup)

Compression: ultra โ€” saves 75% tokens
Result: 10K token prompt โ†’ 2.5K tokens sent
Monthly savings: ~\$150-300/month in token costs for heavy users
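The savings arithmetic above can be checked directly. The rates are the approximate per-mode figures quoted in these playbooks, not guarantees:

```typescript
// Back-of-envelope token savings for a given compression rate.
function tokensSent(promptTokens: number, savingsRate: number): number {
  return Math.round(promptTokens * (1 - savingsRate));
}

const ultra = tokensSent(10_000, 0.75);    // ultra: 10K-token prompt sends 2,500 tokens
const standard = tokensSent(10_000, 0.30); // standard: same prompt sends 7,000 tokens
```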

๐Ÿงช Evaluations (Evals)

OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via Analytics โ†’ Evals in the dashboard.

Built-in Golden Set

The pre-loaded "OmniRoute Golden Set" contains test cases for:

  • Greetings, math, geography, code generation
  • JSON format compliance, translation, markdown generation
  • Safety refusal (harmful content), counting, boolean logic

Evaluation Strategies

| Strategy | Description | Example |
|---|---|---|
| `exact` | Output must match exactly | `"4"` |
| `contains` | Output must contain substring (case-insensitive) | `"Paris"` |
| `regex` | Output must match regex pattern | `"1.*2.*3"` |
| `custom` | Custom JS function returns true/false | `(output) => output.length > 10` |
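The four strategies can be sketched in a few lines of TypeScript (illustrative; the real framework runs these against golden-set cases in the dashboard):

```typescript
type Strategy =
  | { kind: "exact"; expected: string }
  | { kind: "contains"; expected: string }
  | { kind: "regex"; pattern: string }
  | { kind: "custom"; fn: (output: string) => boolean };

// Return true when the model output passes the given strategy.
function evaluate(output: string, s: Strategy): boolean {
  switch (s.kind) {
    case "exact": return output === s.expected;
    case "contains": return output.toLowerCase().includes(s.expected.toLowerCase());
    case "regex": return new RegExp(s.pattern).test(output);
    case "custom": return s.fn(output);
  }
}
```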

๐Ÿ“– Setup Guide

Connect Your Coding Tool

Point any OpenAI-compatible tool to OmniRoute:

Base URL: http://localhost:20128/v1
API Key:  [from Dashboard โ†’ Endpoints]
| Tool | Config Location |
|---|---|
| Claude Code | `claude mcp add-server omniroute --type http --url http://localhost:20128/api/mcp/stream` |
| Codex CLI | `OPENAI_BASE_URL=http://localhost:20128/v1 OPENAI_API_KEY=your-key codex` |
| Cursor | Settings → Models → Add Model → Override Base URL |
| Cline | Extension settings → Custom API Base URL |
| OpenClaw | `OPENAI_BASE_URL=http://localhost:20128/v1 openclaw` |
| Gemini CLI | Uses native OAuth via OmniRoute — connect in Providers |

Protocols (MCP + A2A)

# MCP (stdio transport)
omniroute --mcp

# A2A (JSON-RPC 2.0)
curl http://localhost:20128/.well-known/agent.json

Key Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `PORT` | 20128 | API and dashboard port |
| `DASHBOARD_PORT` | — | Separate dashboard port (split-port mode) |
| `REQUIRE_API_KEY` | false | Require API key for all requests |
| `DATA_DIR` | `~/.omniroute` | Database and config storage |
| `REQUEST_TIMEOUT_MS` | 600000 | Upstream response timeout |
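As a sketch of how the numeric defaults behave (illustrative parsing logic; names and defaults are from the table above, not OmniRoute's actual config loader):

```typescript
// Read an integer env var, falling back to the documented default
// when unset or unparseable.
function envInt(name: string, fallback: number, env: Record<string, string | undefined>): number {
  const raw = env[name];
  if (raw === undefined) return fallback;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) ? n : fallback;
}

const port = envInt("PORT", 20128, {});                     // unset: documented default
const timeout = envInt("REQUEST_TIMEOUT_MS", 600_000, { REQUEST_TIMEOUT_MS: "30000" }); // override
```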
๐Ÿ“– Full Setup Guide โ€” All CLI tools, protocols, and environment variables

๐Ÿ“– Complete documentation:


โ“ Frequently Asked Questions

๐Ÿ“Š Why does my dashboard show high costs if I'm using free models?

The dashboard tracks your token usage and displays estimated costs as if you were using paid APIs directly. This is not actual billing โ€” it's a reference to show how much you're saving.

Example:

  • Dashboard shows: "$290 total cost"
  • Reality: You're using Kiro + Qoder (FREE unlimited)
  • Your actual cost: $0.00
  • What $290 means: Amount you saved by using free models instead of paid APIs!

The cost display is a "savings tracker" to help you understand your usage patterns and optimization opportunities.

๐Ÿ’ณ Will I be charged by OmniRoute?

No. OmniRoute is free, open-source software that runs on your own computer. It never charges you anything.

You only pay:

  • โœ… Subscription providers (Claude Code $20/mo, Codex $20-200/mo) โ†’ Pay them directly on their websites
  • โœ… API key providers (DeepSeek, xAI, etc.) โ†’ Pay them directly, OmniRoute just routes your requests
  • โŒ OmniRoute itself โ†’ Never charges anything, ever

OmniRoute is a local proxy/router. It doesn't have your credit card, can't send invoices, and has no billing system. It's completely free software.

๐Ÿ†“ Are FREE providers really unlimited?

Yes! The current FREE providers are genuinely free with no hidden charges:

  • Kiro AI: Free unlimited Claude Sonnet/Haiku via AWS Builder ID / Google / GitHub OAuth
  • Qoder: Free unlimited kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 via PAT token
  • Pollinations AI: No API key needed โ€” GPT-5, Claude, DeepSeek, Llama 4
  • LongCat Flash-Lite: 50M tokens/day โ€” largest free quota available
  • Cloudflare Workers AI: 10K Neurons/day โ€” 50+ models at the edge

OmniRoute just routes your requests to them โ€” there's no "catch" or future billing.

๐Ÿ’ฐ How do I minimize my actual AI costs?

Free-First Strategy:

  1. Start with a 100% free combo:

     1. kr/claude-sonnet-4.5    (Kiro — unlimited free)
     2. if/kimi-k2-thinking     (Qoder — unlimited free)
     3. pol/gpt-5               (Pollinations — no key needed)

     Cost: $0/month

  2. Enable Prompt Compression — even lite mode saves ~15% passively.

  3. Add a cheap backup only if you need it:

     4. glm/glm-5.1  (\$0.5/1M tokens)

     Additional cost: only pay for what you actually use.

  4. Use subscription providers last — only if you already have them. OmniRoute helps maximize their value through quota tracking.

Result: Most users can operate at $0/month using only free tiers!

๐Ÿ—œ๏ธ Will compression affect response quality?

No. Compression only affects the input (your prompt), not the model's response. Each mode has been designed to preserve technical accuracy:

  • Lite (~15%): Only whitespace/formatting โ€” zero semantic change
  • Standard (~30%): Removes filler words ("please", "I think", "basically") โ€” same meaning
  • Aggressive (~50%): Summarizes old messages + compresses tool outputs โ€” core context preserved
  • Ultra (~75%): Heuristic pruning โ€” use only when token budget is critical

Code blocks, URLs, JSON, and structured data are always protected from compression via the preservation engine.
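A toy version of the lite pass shows the preservation idea: collapse whitespace outside fenced code and leave the fences untouched. This is illustrative only; the real preservation engine also covers URLs, JSON, and other structured data.

```typescript
// Collapse runs of spaces/tabs and excess blank lines, but keep fenced
// code blocks byte-for-byte intact.
function liteCompress(prompt: string): string {
  return prompt
    .split(/(```[\s\S]*?```)/g) // odd indices are fenced blocks (kept verbatim)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/[ \t]+/g, " ").replace(/\n{3,}/g, "\n\n"),
    )
    .join("");
}

const fence = "`".repeat(3);
const sample = `hello    world\n\n\n\n${fence}js\nconst  x = 1;\n${fence}`;
const out = liteCompress(sample);
```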

๐ŸŒ Does OmniRoute work in countries where AI is blocked?

Yes! OmniRoute has a 3-level proxy system:

  1. Global proxy โ€” all requests go through your proxy
  2. Per-provider proxy โ€” different proxy per provider
  3. Per-API-key proxy โ€” different proxy per key

Plus the 1proxy free marketplace for community-shared proxies. Users in Russia, China, Iran, and other restricted regions can access all 160+ providers through OmniRoute's proxy infrastructure.
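The precedence between the three levels can be sketched as most-specific-wins. The ordering here (per-key over per-provider over global) is an assumption based on the level descriptions; the hostnames are placeholders, and the Proxy Guide is authoritative:

```typescript
// Resolve the proxy for one request: the most specific configured level wins.
function resolveProxy(
  perKey: string | undefined,
  perProvider: string | undefined,
  global: string | undefined,
): string | undefined {
  return perKey ?? perProvider ?? global;
}

const proxy = resolveProxy(undefined, "socks5://eu.example:1080", "http://global.example:8080");
```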

See the Proxy Guide for setup instructions.


๐Ÿ› Troubleshooting

| Problem | Quick Fix |
|---|---|
| "Language model did not provide messages" | Provider quota exhausted → check quota tracker, use combo fallback |
| Rate limiting (429) | Add fallback combo: cc/claude → glm/glm-4.7 → if/kimi-k2-thinking |
| OAuth token expired | Auto-refreshed by OmniRoute. If stuck: delete + re-auth in Providers |
| `unsupported_country_region_territory` | Configure proxy in Settings → Proxy (see Proxy Guide) |
| Docker SQLite locks | Use `--stop-timeout 40` for clean WAL checkpoint on shutdown |
| Node.js runtime errors | Use Node.js >=20.20.2 <21, >=22.22.2 <23, or >=24.0.0 <25 (24 LTS recommended) |
| system-info for bug reports | Run `npm run system-info` and attach `system-info.txt` to your issue |

๐Ÿ“– Full troubleshooting guide: docs/TROUBLESHOOTING.md

๐Ÿ› ๏ธ Tech Stack

Click to expand tech stack details
  • Runtime: Node.js 20.20.2+, 22.22.2+, or 24.x LTS (24 LTS recommended)
  • Language: TypeScript 5.9 โ€” 100% TypeScript across src/ and open-sse/ (zero any in core modules since v2.0)
  • Framework: Next.js 16 + React 19 + Tailwind CSS 4
  • Database: better-sqlite3 (SQLite) + LowDB (JSON legacy) โ€” domain state, proxy logs, MCP audit, routing decisions, memory, skills
  • Schemas: Zod (MCP tool I/O validation, API contracts)
  • Protocols: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
  • Streaming: Server-Sent Events (SSE) + WebSocket bridge (/v1/ws)
  • Auth: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
  • Testing: Node.js test runner + Vitest (4,690+ test cases across 517 files โ€” unit, integration, E2E, security, ecosystem)
  • Platforms: Desktop (Electron), Android (Termux), PWA (any browser)
  • CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
  • Website: omniroute.online
  • Package: npmjs.com/package/omniroute
  • Docker: hub.docker.com/r/diegosouzapw/omniroute
  • Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing

๐Ÿ“– Documentation

๐Ÿ“˜ Getting Started

| Document | Description |
|---|---|
| User Guide | Providers, combos, CLI integration, deployment |
| Setup Guide | Full install methods, CLI tool configs, protocol setup, timeout tuning |
| CLI Tools Guide | Per-tool setup for Claude Code, Codex, Cursor, Cline, OpenClaw, Kilo, Copilot |
| Quick Start | 3-step install → connect → configure |

๐Ÿ”ง Operations & Deployment

| Document | Description |
|---|---|
| Docker Guide | Docker run, Compose profiles, Caddy HTTPS, tunnels, image tags |
| VM Deployment | Complete guide: VM + nginx + Cloudflare setup |
| Fly.io Deployment | Deploy to Fly.io with persistent storage |
| Termux Guide | Run OmniRoute on Android via Termux |
| PWA Guide | Progressive Web App install, caching, architecture |
| Uninstall Guide | Clean removal for all install methods |
| Environment Config | Complete .env variables and references |

๐Ÿง  Features & Architecture

| Document | Description |
|---|---|
| Architecture | System architecture, data flow, and internals |
| Compression Guide | 7-option pipeline: off / lite / standard / aggressive / ultra / RTK / stacked |
| RTK Compression | Command-output compression, filters, trust, verify, raw-output recovery |
| Compression Engines | Caveman, RTK, stacked pipelines, dashboard/API/MCP surfaces |
| Compression Rules Format | JSON rule-pack schemas for Caveman and RTK filters |
| Compression Language Packs | Language detection and Caveman rule-pack authoring |
| Resilience Guide | Circuit breakers, cooldowns, queue, anti-thundering herd, TLS spoofing |
| Auto-Combo Engine | 6-factor scoring, mode packs, self-healing |
| Proxy Guide | 3-level proxy system, 1proxy marketplace, registry CRUD |
| Free Tiers | 25+ free API providers consolidated directory |
| Features Gallery | Visual dashboard tour with screenshots |
| Codebase Documentation | Beginner-friendly codebase walkthrough |

๐Ÿค– Protocols & APIs

| Document | Description |
|---|---|
| API Reference | All endpoints with examples |
| OpenAPI Spec | OpenAPI 3.0 specification |
| MCP Server | 37 MCP tools, IDE configs, Python/TS/Go clients |
| MCP Server Guide | MCP installation, transports, and tool reference |
| A2A Server | JSON-RPC 2.0 protocol, skills, streaming, task mgmt |
| A2A Server Guide | A2A agent card, tasks, skills, and streaming |

๐Ÿ“‹ Project & Quality

| Document | Description |
|---|---|
| Contributing | Development setup and guidelines |
| Security Policy | Vulnerability reporting and security practices |
| i18n Guide | 40+ language support, translation workflow, RTL |
| Release Checklist | Pre-release validation steps |
| Coverage Plan | Test coverage strategy and 4,690+ test suite |

โญ Top Contributors

OmniRoute is shaped by a passionate open-source community. These individuals have made exceptional contributions that directly impact the quality, stability, and reach of the project. Thank you.

oyi77
🥇 190 commits • +72K lines
Analytics engine, SQL aggregations, proxy marketplace, test coverage

Chris Staley
🥈 72 commits • +5.7K lines
SSE stream hardening, Responses API, Gemini pagination, test regression fixes

zenobit
🥉 62 commits • +24K lines
CI/CD pipeline, i18n for 33 languages, Void Linux package, platform fixes

R.D. & Randi
🏅 107 commits • +28K lines
Endpoints page, tunnel integrations, Docker workflows, A2A status, compression UI

benzntech
🏅 20 commits • +7.5K lines
Electron desktop app, auto-updater, release build workflows, cross-platform CI

๐Ÿ™ These contributors' features, bug fixes, and infrastructure improvements are a core part of what makes OmniRoute reliable and feature-rich. Every pull request, every test case, and every i18n translation file matters. Open source is built by people like them.


๐Ÿ‘ฅ Contributors

Contributors

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Releasing a New Version

# Create a release โ€” npm publish happens automatically
gh release create v2.0.0 --title "v2.0.0" --generate-notes

๐Ÿ“Š Star History

Star History Chart

๐ŸŒ StarMapper

StarMapper

๐Ÿ™ Acknowledgments

Special thanks to 9router by decolua โ€” the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.

Special thanks to CLIProxyAPI by router-for-me โ€” the original Go implementation that inspired this JavaScript port.

Special thanks to Caveman by JuliusBrussee (โญ 51K+) โ€” the viral "why use many token when few token do trick" project whose caveman-speak compression philosophy inspired OmniRoute's standard compression mode and 30+ filler/condensation regex rules.

Special thanks to RTK - Rust Token Killer by RTK AI โ€” the high-performance command-output compression project whose terminal, build, test, git, and tool-output filtering model inspired OmniRoute's RTK engine, JSON filter DSL, raw-output recovery, and stacked RTK โ†’ Caveman compression pipeline.


๐Ÿ“„ License

MIT License - see LICENSE for details.


โฌ† Back to top ยท Built with โค๏ธ for the open-source AI community.

OmniRoute v3.8.0 ยท Node โ‰ฅ22.22.2 ยท MIT License ยท omniroute.online