๐Ÿš€ OmniRoute

July 2, 2026 ยท View on GitHub

OmniRoute Dashboard

๐Ÿš€ OmniRoute โ€” The Free AI Gateway

Never stop coding. Connect every AI tool to 237 providers โ€” 90+ free โ€” through one endpoint.

Plug Claude Code, Codex, Cursor, Cline, Copilot & Antigravity into FREE Claude / GPT / Gemini. Auto-fallback.

RTK + Caveman compression saves 15โ€“95% tokens. Never hit limits.


~1.6B documented free tokens/month โ€” up to ~2.1B in your first month with signup credits โ€” aggregated across the free tiers, plus a long tail of permanently-free, no-cap providers, and the compression above stretches every one further. (how we count โ†’)


โญ Star the repo if OMNIROUTE helped you save money and make your work easier. Stars

diegosouzapw%2FOmniRoute | Trendshift

237 AI Providers 90+ Free 1.6B Free Tokens/mo Token Savings 17 Strategies $0 to start


๐Ÿ’ฌ Join the community

Discord Telegram WhatsApp Global WhatsApp Brasil Website

Questions, provider tips, roadmap & support โ†’ Discord ยท Telegram ยท WhatsApp ๐ŸŒ Global / ๐Ÿ‡ง๐Ÿ‡ท Brasil


๐Ÿงฉ Available

npm version NPM Monthly Docker Hub License: MIT Docker Pulls Electron Downloads

๐Ÿš€ Quick Start โ€ข ๐ŸŽฏ Combos โ€ข ๐ŸŒ Providers โ€ข ๐Ÿ”Œ CLI & MCP โ€ข ๐Ÿ—œ๏ธ Compression โ€ข ๐ŸŒ Website

๐Ÿ’ฅ The Promise โ€ข ๐Ÿค” Why โ€ข ๐Ÿ† What Sets Apart โ€ข ๐Ÿค– Compatible CLIs โ€ข ๐Ÿ–ฅ๏ธ Where It Runs โ€ข ๐Ÿ”’ Private โ€ข ๐ŸŽฌ In Action โ€ข ๐Ÿ“š Explore More โ€ข ๐Ÿ“ง Support

๐ŸŒ In 42+ languages
๐Ÿ‡บ๐Ÿ‡ธ ๐Ÿ‡ง๐Ÿ‡ท ๐Ÿ‡ต๐Ÿ‡น ๐Ÿ‡ช๐Ÿ‡ธ ๐Ÿ‡ซ๐Ÿ‡ท ๐Ÿ‡ฎ๐Ÿ‡น ๐Ÿ‡ฉ๐Ÿ‡ช ๐Ÿ‡ณ๐Ÿ‡ฑ ๐Ÿ‡ท๐Ÿ‡บ ๐Ÿ‡บ๐Ÿ‡ฆ ๐Ÿ‡ต๐Ÿ‡ฑ ๐Ÿ‡จ๐Ÿ‡ฟ ๐Ÿ‡ธ๐Ÿ‡ฐ ๐Ÿ‡ท๐Ÿ‡ด ๐Ÿ‡ญ๐Ÿ‡บ
๐Ÿ‡ง๐Ÿ‡ฌ ๐Ÿ‡ฉ๐Ÿ‡ฐ ๐Ÿ‡ซ๐Ÿ‡ฎ ๐Ÿ‡ณ๐Ÿ‡ด ๐Ÿ‡ธ๐Ÿ‡ช ๐Ÿ‡จ๐Ÿ‡ณ ๐Ÿ‡น๐Ÿ‡ผ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‡ฐ๐Ÿ‡ท ๐Ÿ‡น๐Ÿ‡ญ ๐Ÿ‡ป๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ฉ ๐Ÿ‡ฒ๐Ÿ‡พ ๐Ÿ‡ต๐Ÿ‡ญ
๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ง๐Ÿ‡ฉ ๐Ÿ‡ต๐Ÿ‡ฐ ๐Ÿ‡ฎ๐Ÿ‡ท ๐Ÿ‡ธ๐Ÿ‡ฆ ๐Ÿ‡ฎ๐Ÿ‡ฑ ๐Ÿ‡น๐Ÿ‡ท ๐Ÿ‡ฆ๐Ÿ‡ฟ ๐Ÿ‡น๐Ÿ‡ฟ

๐Ÿ’ฐ ~1.6B Free Tokens / Month

Stacking free tiers by hand is painful โ€” dozens of SDKs, dozens of rate limits, and no idea how much you actually have. OmniRoute aggregates the documented free tiers of 40+ provider pools / 500+ models into one honest number and shows it live on the dashboard (/dashboard/free-tiers).

  • ~1.6B free tokens / month (steady) โ€” and up to ~2.1B in your first month with signup credits.
  • Pool-deduped, honest โ€” we count each shared free pool once, so the headline isn't inflated by rate-limit ceilings the way multi-billion competitor claims are. (Counting every rate limit 24/7 would read ~10B; we don't publish that.)
  • Plus the un-countable โ€” permanently-free, no-token-cap providers (SiliconFlow, Z.AI GLM-Flash, Kilo, OpenCode Zenโ€ฆ) and a $10 OpenRouter top-up that unlocks +24M/mo, both surfaced separately so they never inflate the headline.
  • Per-model breakdown, live used / remaining for the current month, and a transparent terms flag per provider.

Free-Tier Budget card (preview mockup)

Preview mockup โ€” a real screenshot lands once the /dashboard/free-tiers page is validated. Full methodology (pool dedupe, credit tiers, provider terms): docs/reference/FREE_TIERS.md.


๐Ÿ’ฅ The Promise

One endpoint. 237 providers. Never stop building โ€” and let OmniRoute pick the cheapest one that works.

๐Ÿšซ Never hit limits
Auto-fallback across 237 providers in milliseconds. Quota out? Next provider takes over โ€” zero downtime.
๐Ÿ’ธ Save up to 95% tokens
RTK + Caveman stacked compression cuts 15โ€“95% of eligible tokens (~89% avg on tool-heavy sessions).
๐Ÿ†“ \$0 to start
90+ providers with a free tier, 11 free forever (Kiro, Qoder, Pollinations, LongCatโ€ฆ). No card needed.
๐Ÿ”Œ Every tool works
24+ coding agents โ€” Claude Code, Codex, Cursor, Cline, Copilot, Antigravity โ€” through one config.
๐Ÿงฉ One endpoint
OpenAI โ†” Claude โ†” Gemini โ†” Responses API translation. Point any tool at /v1 and it just works.
๐Ÿ›ก๏ธ Production-grade
Circuit breakers, TLS stealth, MCP (95 tools), A2A, memory, guardrails, evals. 21,000+ tests.


๐Ÿค” Why OmniRoute?

Stop juggling 10 dashboards, dead API keys, and surprise bills.

โŒ The daily painโœ… How OmniRoute fixes it
๐Ÿ“‰ Subscription quota expires unused every monthMaximize subscriptions โ€” track quota, use every token before reset
๐Ÿ›‘ Rate limits stop you mid-coding4-tier auto-fallback โ€” Subscription โ†’ API โ†’ Cheap โ†’ Free, in milliseconds
๐Ÿ”ฅ Tool outputs (git diff, grep, logs) burn tokensRTK + Caveman compression โ€” save 15โ€“95% eligible tokens per request
๐Ÿ’ธ Expensive APIs ($20โ€“50/mo per provider)Cost-optimized routing โ€” auto-route to the cheapest viable model
๐Ÿงฐ Each AI tool wants its own setupOne endpoint, every tool, one dashboard
๐ŸŒ AI blocked in your country3-level proxy + TLS fingerprint stealth โ€” use AI from anywhere
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚        Your IDE / CLI  (Claude Code, Cursor, Clineโ€ฆ)       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                          โ”‚ http://localhost:20128/v1
                          โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                  OmniRoute โ€” Smart Router                  โ”‚
โ”‚  RTK + Caveman compression ยท 17 routing strategies         โ”‚
โ”‚  Circuit breakers ยท TLS stealth ยท MCP ยท A2A ยท Guardrails   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
        โ–ผ Tier 1      โ–ผ Tier 2      โ–ผ Tier 3       โ–ผ Tier 4
   SUBSCRIPTION     API KEY        CHEAP          FREE
   Claude Code,     DeepSeek,      GLM \$0.5,      Kiro, Qoder,
   Codex, Copilot   Groq, xAI      MiniMax \$0.2   Pollinations
   quota out? โ”€โ”€โ”€โ–ถ  budget hit? โ”€โ–ถ budget hit? โ”€โ–ถ always on

๐ŸŽฏ Combos โ€” The Flagship

A combo is a chain of models OmniRoute routes across automatically. Quota runs out, a provider fails, or costs spike โ€” the combo silently slides to the next model. This is what makes OmniRoute unbreakable. ๐Ÿ›ก๏ธ

โšก Zero-config โ€” just use auto

No combo to create. Set your model to auto (or a variant) and OmniRoute builds a virtual combo from your connected providers, scored live:

Model IDWhat it optimizes for
auto๐ŸŽฏ Balanced default (LKGP โ€” sticks to your last good provider)
auto/coding๐Ÿง‘โ€๐Ÿ’ป Quality-first weights for code generation
auto/fastโšก Lowest latency first
auto/cheap๐Ÿ’ฐ Cheapest per token first
auto/offline๐Ÿ”‹ Most quota / rate-limit headroom first
auto/smart๐Ÿ”ญ Quality-first + 10% exploration to discover better models

๐Ÿ”€ Or build your own โ€” 17 routing strategies

All 17 strategies โ€” mix & match per combo step:

#StrategyWhat it does
1priorityFirst-target ordered list โ€” drain each before the next ๐Ÿฅ‡
2fill-firstFill each target's quota fully before moving on
3weightedWeighted random by per-target weight
4round-robinCycle through targets in order
5p2cPower-of-two-choices random load balancing
6least-usedPick the target with the lowest current load
7randomUniform random pick (deduplicated)
8strict-randomRandom without de-duplicating repeats ๐ŸŽฒ
9cost-optimizedMinimize $ per request from live catalog pricing ๐Ÿ’ธ
10headroomPick the target with the most remaining quota
11reset-windowPrefer the target whose quota window resets soonest
12reset-awareRank by quota reset time โ€” short windows first ๐Ÿ“Š
13context-relayHand off context across targets for long conversations ๐Ÿง 
14context-optimizedPick the best fit for the current context size
15lkgpLast-Known-Good Path โ€” sticky to the last successful target
16auto9-factor live scoring across every connection ๐Ÿค–
17fusionFan out to a panel of models + a judge synthesizes one answer ๐Ÿงฌ

The Auto-Combo engine scores every candidate on 9 factors (health, quota, cost, latency, success rate, freshnessโ€ฆ) โ€” see docs/routing/AUTO-COMBO.md.

โš–๏ธ Quota-Share โ€” split one subscription across a team โœจ NEW

Running several keys against the same upstream account (one Codex Pro plan, one Kimi key, one GLM Coding seat)? A burst on one key can burn the whole 5-hour / hourly quota and lock everyone else out. Quota-Share distributes a provider's time-based quota fairly across the keys in a pool โ€” and it's work-conserving, so an idle member's slice is lent out instead of wasted.

KnobWhat it controls
โš–๏ธ Allocation weighteach key's slice of the pool โ€” e.g. 50 / 30 / 20
๐Ÿ“ Dimensionstrack % ยท requests ยท tokens ยท $, per 5h / 7d / per-model window
๐Ÿšฆ Policyhard (block over share) ยท soft (deprioritize) ยท burst (use idle headroom)
๐Ÿงฑ Capabsolute ceiling per key, independent of mode
Pool "team-codex"   ยท   1 Codex Pro account   ยท   3 keys   ยท   5-hour window
  โ”œโ”€ alice    weight 50  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   โ‰ค 50% of the shared 5h quota
  โ”œโ”€ bob      weight 30  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   โ‰ค 30%
  โ””โ”€ ci-bot   weight 20  โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘   โ‰ค 20%
Generous mode (<50% pool used) โ†’ idle shares are lent out
Strict mode  (โ‰ฅ50% pool used)  โ†’ each key held to its fair share

Enforced in the hot path before the request leaves OmniRoute, with per-(key, model) caps + session stickiness for prompt-cache integrity. ๐Ÿ“– Quota Sharing Engine

๐Ÿงฑ Resilience is built in (3 independent layers)

LayerScopeWhat it does
๐Ÿ”Œ Circuit breakerwhole providerStops hammering a provider that's failing upstream; auto-probes to recover
๐Ÿ’ค Connection cooldownone account / keySkips a rate-limited key while other keys keep serving
๐ŸŽฏ Model lockoutprovider + modelQuarantines just one quota-limited model, not the whole connection
Combo: "always-on"                         Strategy: priority
  1. cc/claude-opus-4-7   โ† subscription (use it fully)
  2. cx/gpt-5.5           โ† second subscription
  3. glm/glm-5.1          โ† cheap backup (\$0.5/1M)
  4. kr/claude-sonnet-4.5 โ† FREE, unlimited (never fails)
Result: 4 layers of fallback = zero downtime

๐Ÿ“– Auto-Combo Engine ยท Resilience Guide


๐Ÿ† What Sets OmniRoute Apart

FeatureOmniRouteOther routers
๐ŸŒ Providers23720โ€“100
๐Ÿ†“ Free providers90+ (11 free forever)1โ€“5
๐Ÿ”€ Routing strategies17 (priority, weighted, cost-optimized, context-relay, fusionโ€ฆ)1โ€“3
๐Ÿ—œ๏ธ Token compressionRTK + Caveman stacked (15โ€“95%)None / 20โ€“40%
๐Ÿงฐ Built-in MCP server95 tools, 3 transports, 30 scopesRare
๐Ÿค A2A agent protocol6 skills, JSON-RPC 2.0None
๐Ÿง  Memory (FTS5 + vector)YesRare
๐Ÿ›ก๏ธ Guardrails (PII, injection, vision)YesRare
โ˜๏ธ Cloud agentsCodex, Cursor, Devin, JulesNone
๐Ÿฅท TLS fingerprint stealthJA3/JA4 via wreq-jsNone
๐Ÿ–ฅ๏ธ Multi-platformWeb ยท Desktop ยท Termux ยท PWAWeb only
๐ŸŒ i18n42 locales0โ€“4

๐Ÿ“Š Detailed comparison vs LiteLLM, OpenRouter & Portkey โ†’ docs/comparison/OMNIROUTE_VS_ALTERNATIVES.md


โœจ What's New

Recent highlights from v3.8.20 โ†’ v3.8.43. Full history in CHANGELOG.md.

  • ๐Ÿ—œ๏ธ Compression hardening โ€” a default-on inflation guard (discard the stacked result and send the verbatim original whenever compression would grow the prompt), completed Caveman rule packs for German / French / Japanese (dedup + ultra) plus a new Chinese (ๆ–‡่จ€ / wรฉnyรกn) input pack with zh-vs-ja auto-detection, and RTK filters for Gradle & .NET (dotnet) build output. โ†’ Compression
  • ๐Ÿ’ธ Honest flat-rate cost โ€” subscription / coding-plan providers (ChatGPT Web, grok-web, the Minimax / Kimi / GLM / Alibaba Coding plans, Xiaomi MiMoโ€ฆ) now read $0 in cost analytics instead of an inflated per-token estimate, while budget / quota / routing keep estimating unchanged. โ†’ API Reference
  • โš–๏ธ Quota-Share routing โ€” a dedicated combo strategy that spreads load across accounts by available quota: Deficit-Round-Robin scheduling, per-connection max_concurrent with cooldown-wait queueing, multi-window usage buckets (5h / 7d / per-model), per-(key, model) caps, session stickiness for prompt-cache integrity, and proactive saturation from upstream token-usage headers. โ†’ Resilience Guide
  • ๐Ÿค– One-command CLI/agent setup โ€” a dedicated setup-* command configures each coding tool to route through OmniRoute (Claude Code, Codex, Cline, Continue, Cursor, Roo Code, Kilo Code, Crush, Goose, Qwen Code, Aider, OpenCode); omniroute launch / omniroute launch-codex are zero-config launchers. โ†’ CLI Integrations
  • ๐Ÿ›ฐ๏ธ Remote mode โ€” drive a remote OmniRoute from any machine with scoped access tokens (omniroute connect / omniroute contexts / omniroute tokens), plus an omniroute login antigravity helper that runs Google "native/desktop" OAuth on your own machine and pastes a credential blob into a remote/VPS install (where the loopback redirect is unreachable). โ†’ Remote Mode
  • ๐Ÿงญ Smarter auto-routing โ€” OpenRouter-style auto/<category>:<tier> combos (e.g. auto/coding:fast, auto/reasoning:pro), a Fusion strategy (fan out to a panel of models in parallel, then synthesize via a judge), task-aware routing (best-fit connection per task type), per-request X-Route-Model override, live Arena-ELO + models.dev model intelligence, per-step account allowlists, provider-wildcard combo steps, nested combo-ref execution, sticky weighted selection, and web_search-aware routing. โ†’ Auto-Combo
  • ๐Ÿ—œ๏ธ Pluggable compression โ€” an async pipeline of 10 composable engines with Compression Studios, an LLMLingua-2 ONNX engine and a heuristic/SLM two-tier Ultra, RTK, delegated Anthropic Context Editing, Output Styles (output-axis steering: terse-prose / less-code / terse-CJK), an adaptive context-budget dial (escalate only as far as needed to fit the context window), per-request x-omniroute-compression control, an opt-in offline eval harness, one-click Headroom proxy lifecycle management from the dashboard (Docker sidecar supported), a synthetic compression playground (Play lanes + A/B Compare with USD-capped fidelity verdicts), an opt-in per-step fidelity gate that rejects a lossy engine before it degrades the prompt, a best-of-N candidate encoder (GCF vs TOON โ€” keep whichever is shorter, with an A/B bytes/token table in the studio), CCR ranged/grep/stats retrieval (pull an exact byte/line slice or summary of a stored block instead of re-expanding it), a unified panel with named profiles + an active-profile selector, an opt-in per-engine pipeline circuit-breaker, an opt-in LLM-tier engine (a model pass for higher-ratio semantic compression), a read-lifecycle engine that collapses superseded file reads, usage-observed prefix freeze, a graduated CCR retrieval-feedback ramp, a preserveSystemPrompt mode enum, and a drag-reorder pipeline editor in the studio. โ†’ Compression
  • ๐Ÿ•ต๏ธ Transparent MITM decrypt (TPROXY) โ€” capture & translate traffic from CLIs that ignore proxy env vars, with a per-SNI certificate authority and a trust-store installer. โ†’ MITM/TPROXY
  • ๐Ÿ’ธ Cost telemetry everywhere โ€” X-OmniRoute-* cost/usage headers on every endpoint (including media), a non-token cost engine, a cache-HIT X-OmniRoute-Cost-Saved header, and per-key USD spend quotas. โ†’ API Reference
  • ๐Ÿง  Memory you control โ€” opt-in int8 vector quantization (Qdrant + sqlite-vec), opt-in typed memory decay (aged low-value memories fade on a per-type schedule), memory off by default, and a per-request x-omniroute-no-memory header. โ†’ Memory
  • ๐Ÿ›ก๏ธ Security โ€” a prompt-injection guard across every LLM route (backed by a red-team suite), plus a free DuckDuckGo last-resort web search. โ†’ Guardrails
  • ๐Ÿค More providers & agents โ€” Cursor Cloud Agent (a 4th cloud agent), CodeBuddy CN (copilot.tencent.com), a Google Flow video-generation provider, new gateways DGrid and Pioneer AI (Fastino Labs), inbound xAI Grok translators plus Grok Build (xAI) with an OAuth import-token flow, GPT-4 / GPT-4o-mini on the GitHub Copilot provider, multi-model Factory Droid, ZenMux Free (session-cookie free tier), Alibaba DashScope text-to-video (wan2.7-t2v), a refreshed 237-provider catalog (OrcaRouter, Wafer AI, OpenAdapter, dit.ai, TokenRouter, โ€ฆ), Vertex AI media generation (speech/transcription/music/video), a first-class Ollama local-provider card, the SenseNova free Token Plan (chat + text-to-image), and one-click account import from CLIProxyAPI (~/.cli-proxy-api/). โ†’ Providers
  • โšก Local performance & infra โ€” a one-click local Redis launcher (omniroute redis up, plus a dashboard Redis panel), one-click Cloudflare Workers and Deno Deploy relay deployers wired into the proxy pool, and an optional Bifrost Go sidecar that offloads the hottest relay path (BIFROST_BASE_URL, with automatic fallback to the TypeScript path on timeout) โ€” now with a relay-backend selector (OMNIROUTE_RELAY_BACKEND=ts|bifrost|auto) so the /v1/relay endpoint stays the stable surface while choosing the fastest backend internally. โ†’ Environment

๐Ÿค– Compatible CLIs & Coding Agents

One config โ€” http://localhost:20128/v1 โ€” and every AI IDE or CLI runs on free & low-cost models.

Claude Code
Claude Code
Codex CLI
Codex CLI
Cursor
Cursor
Copilot
Copilot
Continue
Continue
OpenCode
OpenCode
Kilo Code
Kilo Code
Droid
Droid
OpenClaw
OpenClaw
Kiro
Kiro
Command Code
Command
๏ผ‹ also works with ยท Cline ยท Antigravity ยท Windsurf ยท AMP ยท Hermes ยท Qwen CLI ยท Roo ยท Continue ยท any OpenAI-compatible tool

๐Ÿ“– Per-tool setup for all 24+ tools โ†’ docs/reference/CLI-TOOLS.md ยท ๐Ÿงฉ OpenCode plugin โ†’ @omniroute/opencode-provider


๐ŸŒ 237 AI Providers โ€” 90+ Free

The most complete catalog of any open-source router: 237 providers, 90+ with a free tier, 11 free forever.

๐Ÿข Every major lab โ€” through one endpoint

OpenAI
OpenAI
Anthropic
Anthropic
Gemini
Gemini
xAI Grok
xAI Grok
DeepSeek
DeepSeek
Mistral
Mistral
Qwen
Qwen
Meta Llama
Meta Llama
Groq
Groq
NVIDIA
NVIDIA
MiniMax
MiniMax
Cohere
Cohere
Perplexity
Perplexity
Hugging Face
HuggingFace
Together
Together
Fireworks
Fireworks
Cloudflare
Cloudflare
Baidu
Baidu

โ€ฆand 220+ more โ€” every icon resolves live from the dashboard's provider catalog. ๐Ÿ“– Provider Reference


๐Ÿ†“ Free Forever โ€” $0, no card

AgentRouter
AgentRouter
GPT-5, Claude, Gemini
\$100 free credits
Qoder AI
Qoder AI
Kimi-K2, DeepSeek-R1
Unlimited FREE
Pollinations
Pollinations
GPT-5, Claude, Llama 4
No key needed
LongCat
LongCat
LongCat-2.0
10M tokens one-time (KYC) ๐Ÿ”‘
Cloudflare AI
Cloudflare AI
50+ models
10K neurons/day
NVIDIA NIM
NVIDIA NIM
129 models
~40 RPM free
Cerebras
Cerebras
Qwen3 235B
1M tokens/day

๐Ÿ“– Full machine-readable catalog โ†’ docs/reference/PROVIDER_REFERENCE.md


๐Ÿ–ฅ๏ธ Where OmniRoute Runs โ€” Anywhere

Same app, your machine, your rules. From a global npm install to your phone via Termux.

PlatformInstallHighlights
๐Ÿ“ฆ npm (global)npm install -g omnirouteOne command, any OS
๐Ÿณ Dockerdocker run โ€ฆ diegosouzapw/omnirouteMulti-arch AMD64 + ARM64
๐Ÿ–ฅ๏ธ Desktop (Electron)npm run electron:buildNative window + system tray โ€” Windows / macOS / Linux
๐Ÿ’ช ARMnative arm64Raspberry Pi, ARM servers, Apple Silicon
๐Ÿ“ฑ Android (Termux)pkg install nodejs && npx -y omnirouteRuns on your phone, 24/7, no root
๐Ÿ“ฒ PWA"Add to Home Screen"Fullscreen, offline, installable from browser
๐Ÿงฉ OpenCode plugin@omniroute/opencode-providerNative OpenCode integration
๐Ÿ› ๏ธ From sourcenpm install && npm run devHack on it, contribute

๐Ÿ“– Docker Guide ยท Desktop ยท Termux ยท PWA ยท OpenCode


๐Ÿ”’ Private & Local-First

Your keys, your machine, your data. OmniRoute is a local proxy โ€” it never phones home.

  • ๐Ÿ  Runs 100% on your hardware โ€” npm, Docker, desktop, or your phone. No OmniRoute cloud sits in the request path.
  • ๐Ÿ” Credentials encrypted at rest โ€” API keys & OAuth tokens sealed with AES-256-GCM.
  • ๐Ÿšซ Zero telemetry by default โ€” your prompts go only to the providers you choose, nowhere else.
  • ๐Ÿ›ก๏ธ Hardened gateway โ€” API-key scoping, IP filtering, rate limits, prompt-injection guard, loopback-only process routes.
  • ๐Ÿ“œ MIT licensed & fully open-source โ€” audit every line, self-host forever.

๐Ÿ“– Authorization ยท Guardrails ยท Compliance


๐Ÿ”Œ Full CLI + A2A & MCP

OmniRoute isn't just a server โ€” it's a full command-line cockpit with 80+ commands, plus open agent protocols so an AI agent can drive OmniRoute by itself.

โŒจ๏ธ A real CLI (not just start)

omniroute               # serve gateway + dashboard (port 20128)
omniroute chat          # interactive TUI chat client (slash: /model /combo /skill /memory)
omniroute setup         # guided first-run wizard
omniroute doctor        # diagnose providers, ports, native deps

๐Ÿ›ฐ๏ธ Remote mode โ€” run the CLI here, OmniRoute on a VPS

OmniRoute on a server? Drive it from your laptop with the same CLI. Log in once with a scoped access token; every command then targets the remote.

omniroute connect 192.168.0.15            # password โ†’ scoped token, saved as a context
omniroute models list                     # โ† runs against the REMOTE server
omniroute configure codex                 # โ† picks a remote model, writes a local Codex profile
omniroute tokens create --name ci --scope read   # mint narrower tokens for other machines
omniroute contexts use default            # โ† switch back to the local server

Tokens are scoped read / write / admin; process-spawning routes stay loopback-only. ๐Ÿ“– Remote Mode

providers ยท oauth ยท keys ยท combo ยท nodes ยท models ยท cache ยท compression ยท cost ยท usage ยท quota ยท health ยท resilience ยท telemetry ยท logs ยท audit ยท mcp ยท a2a ยท cloud ยท memory ยท skills ยท eval ยท tunnel ยท backup ยท sync ยท webhooks ยท policy ยท pricing ยท translator ยท simulate โ€ฆ

๐Ÿค Connect an agent โ€” and it controls OmniRoute itself

Expose OmniRoute over MCP or A2A and any capable agent gets the keys to the whole gateway โ€” routing, providers, combos, cache, compression, memory โ€” autonomously.

ProtocolEndpointUse it for
๐Ÿงฐ MCP (stdio)omniroute --mcpPlug into Claude Desktop, Cursor, any MCP client
๐ŸŒŠ MCP (HTTP)http://localhost:20128/api/mcp/streamRemote MCP โ€” 95 tools, 30 scopes, full audit trail
๐Ÿ“ก MCP (SSE)http://localhost:20128/api/mcp/sseStreaming MCP transport
๐Ÿค A2Ahttp://localhost:20128/.well-known/agent.jsonAgent-to-agent, JSON-RPC 2.0 + SSE, 6 skills
# Give Claude Code the full OmniRoute toolset over MCP:
claude mcp add-server omniroute --type http --url http://localhost:20128/api/mcp/stream

๐Ÿ“– MCP Server ยท A2A Server ยท Agent Protocols


๐Ÿ—œ๏ธ Save 15โ€“95% Tokens โ€” Automatically

Why use many tokens when few tokens do the trick? Every request passes through OmniRoute's compression pipeline transparently โ€” no client changes. It's now a stack of 10 composable engines that run in order and mix & match per routing combo โ€” building on ideas from RTK, Caveman (โญ 78K+), LLMLingua-2, and Troglodita (PT-BR).

๐Ÿงฑ The 10-engine stack

Engines run in pipeline order; each is independently toggleable and configurable per combo:

#EngineWhat it does
1Session-DedupDrops content repeated across turns (content-addressed, cross-turn)
2CCRArchives large blocks behind retrieve markers, fetched on demand
3RTKSmart tool-result filtering, dedup & truncation (command-aware)
4HeadroomLossless tabular compaction of homogeneous JSON arrays (~30%+)
5RelevanceExtractive sentence scoring against the last user query
6CavemanRule-based prose compression (~65โ€“75% on output)
7LLMLingua-2ML semantic pruning via MobileBERT ONNX โ€” code-safe, async
8LiteWhitespace + image-URL trimming (latency-light baseline)
9AggressiveSummarization + progressive aging of old turns
10UltraHeuristic token pruning with an optional small-model (SLM) tier

Code blocks, URLs and structured data are always preserved byte-perfect. One-click presets combine the engines:

ModeSavingsBest for
๐Ÿชถ Lite~15%Always-on safe default
๐Ÿชจ Standard (Caveman)~30%Daily coding
โšก Aggressive~50%Long tool-heavy sessions
๐Ÿ”ฅ Ultra~75%Maximum savings
๐Ÿงฐ RTK60โ€“90%Shell/test/build/git output
๐Ÿ”— Stacked (RTK โ†’ Caveman)78โ€“95%Mixed prompts + tool logs

Real example โ€” Standard mode:

Before (69 tokens): "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."

After (19 tokens): "New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same answer. 72% fewer tokens. Zero accuracy loss. โœ…

PT-BR example โ€” Troglodita mode:

Antes (42 tokens): "O problema รฉ que o componente estรก re-renderizando porque uma nova referรชncia de objeto estรก sendo criada em cada ciclo de renderizaรงรฃo. Eu recomendaria usar useMemo."

Depois (12 tokens): "Re-render: ref nova cada ciclo (objeto inline recriado). Usar useMemo."

Mesma resposta. ~70% menos tokens. Precisรฃo tรฉcnica intacta. โœ…


๐Ÿ“– How it works โ€” pipeline, architecture & savings math

Client (10,000 tok) โ”€โ”€โ–ถ OmniRoute Compression (10 engines) โ”€โ”€โ–ถ Provider (~1,080 tok, up to 95% saved)

Default stacked combo runs RTK โ†’ Caveman. When both act on the same tool/context payload, savings compound:

combined = 1 โˆ’ (1 โˆ’ RTK) ร— (1 โˆ’ Caveman_input)
average  = 1 โˆ’ (1 โˆ’ 0.80) ร— (1 โˆ’ 0.46) = 89.2%
range    = 78.4 โ€“ 94.6%

Code blocks, URLs, JSON and structured data are always protected by the preservation engine.

๐ŸŽš๏ธ Beyond the engines โ€” output styles, the adaptive dial & per-request control

The 10 engines above shrink what goes in. Three more layers shape how, when, and what comes out:

  • ๐Ÿช„ Output Styles (output-axis steering) โ€” inject deterministic, cache-safe response-shaping instructions; combinable, each at lite / full / ultra intensity. Adding a style is a one-line registry entry:
    • Terse prose โ€” drop filler / articles / hedging; keep technical substance exact.
    • Less code โ€” "lazy senior dev" YAGNI: smallest working change, no unrequested scaffolding.
    • Terse CJK (ๆ–‡่จ€) โ€” classical-Chinese ultra-terse style (locale-gated to zh).
  • ๐ŸŽฏ Adaptive context-budget (the dial) โ€” instead of one on/off token threshold, escalate the cheapest, most-lossless engines only as far as needed to fit the model's context window. Policy: reserve-output (default, model-aware) ยท percentage ยท absolute. Mode: floor (guarantee fit) ยท replace-autotrigger (your explicit choice wins) ยท off (legacy threshold).
  • ๐ŸŽ›๏ธ Where compression is decided (precedence, high โ†’ low) โ€” per-request x-omniroute-compression header โ€บ routing-combo override โ€บ active named profile โ€บ adaptive / auto-trigger โ€บ panel default โ€บ off. The applied plan echoes back in the X-OmniRoute-Compression: <mode>; source=<source> response header.

Auto-trigger by token threshold, flip on the adaptive dial, pin a named profile, set a one-off per request, or assign a pipeline per routing combo โ€” whichever fits the workload. An opt-in offline eval harness (npm run eval:compression) scores fidelity vs. savings on a pinned corpus before you promote a change.

๐Ÿ“– COMPRESSION_GUIDE.md ยท RTK_COMPRESSION.md ยท COMPRESSION_ENGINES.md


โšก Quick Start

1) Install & run

npm install -g omniroute
omniroute

Dashboard at http://localhost:20128 ยท API at http://localhost:20128/v1.

2) Connect a FREE provider (no signup)

Dashboard โ†’ Providers โ†’ connect Kiro AI (free Claude, ~50 credits/month per account) or OpenCode Free (no auth) โ†’ done.

3) Point your coding tool

Base URL: http://localhost:20128/v1
API Key:  [copy from Dashboard โ†’ Endpoints]
Model:    auto            (zero-config smart routing โ€” or any provider/model)

4) Verify it's working

curl http://localhost:20128/v1/models -H "Authorization: Bearer YOUR_KEY"

You should see your connected models listed. ๐ŸŽ‰ That's it โ€” start coding, and OmniRoute auto-routes & falls back for you.

If your client cannot send custom headers, OmniRoute also exposes tokenized compatibility aliases:

OpenAI catalog:   http://localhost:20128/vscode/YOUR_KEY/
OpenAI models:    http://localhost:20128/vscode/YOUR_KEY/models
OpenAI chat:      http://localhost:20128/vscode/YOUR_KEY/chat/completions
OpenAI responses: http://localhost:20128/vscode/YOUR_KEY/responses
Ollama chat:      http://localhost:20128/vscode/YOUR_KEY/api/chat
Ollama tags:      http://localhost:20128/vscode/YOUR_KEY/api/tags

Use these only for clients that cannot attach Authorization: Bearer .... Header auth remains the preferred mode.


๐Ÿ“ฆ More install methods โ€” Docker, source, pnpm, Arch

๐Ÿณ Docker

docker run -d --name omniroute --restart unless-stopped --stop-timeout 40 \
  -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest

๐Ÿ› ๏ธ From source

cp .env.example .env && npm install
PORT=20128 npm run dev

๐Ÿ“ฆ pnpm

pnpm add -g omniroute@latest --allow-build=better-sqlite3 --allow-build=@swc/core && omniroute

๐Ÿง Arch Linux (AUR)

yay -S omniroute-bin && systemctl --user enable --now omniroute.service

๐Ÿ”ง Nix (Flake)

# Using Nix flakes
nix develop
npm run dev

# Or using devbox
devbox run npm run dev

๐Ÿ“– Docker Guide โ€” Compose profiles, Caddy HTTPS, Cloudflare tunnels.

๐Ÿฆญ Podman

# 1. Build the image
podman build --target runner-base -t omniroute:base .

# 2. Fix data directory permissions for rootless Podman
mkdir -p data && podman unshare chown 1000:1000 ./data

# 3. Set runtime in .env, then run (see contrib/podman/ for Quadlet)
echo "CONTAINER_HOST=podman" >> .env
podman compose --profile base up -d

๐Ÿ“– Podman Guide โ€” Quadlet setup, podman-compose, Quadlet.

โšก Faster / leaner install (skip the native build)

The native SQLite engine (better-sqlite3) is an optional dependency, so a global install never blocks on compiling from source: it uses a prebuilt binary when one matches your platform/Node, and otherwise falls back transparently to a pure-JS engine (node:sqlite on Node 22+, else the bundled sql.js WASM) โ€” no build tools required.

To skip the post-install native warm-up entirely (CI, headless, or slow machines):

OMNIROUTE_SKIP_POSTINSTALL=1 npm install -g omniroute   # CI=1 also skips it

For the fastest installs prefer pnpm (content-addressed store + hard links โ€” see above). For a dashboard-free, headless runtime use the Docker base profile (above) or the Termux guide. The CLI and the web dashboard are served by the same process on one port, so there is no separate CLI-only package today.


๐ŸŽฌ OmniRoute in Action

Guia em Portuguรชs
๐Ÿ‡ง๐Ÿ‡ท Portuguรชs
Guia completo
English Guide
๐Ÿ‡บ๐Ÿ‡ธ English
Complete walkthrough
ะ ัƒะบะพะฒะพะดัั‚ะฒะพ
๐Ÿ‡ท๐Ÿ‡บ ะ ัƒััะบะธะน
ะŸะพะปะฝะพะต ั€ัƒะบะพะฒะพะดัั‚ะฒะพ

๐ŸŽฌ Made a video about OmniRoute? Open an issue or discussion with the link โ€” we'll feature it here.


๐Ÿ“š Explore More

๐Ÿ’ฐ Pricing at a glance & the \$0 Free Stack (11 providers)
TierExampleCost
๐Ÿ’ณ SubscriptionClaude Code Pro / Codex / Copilot$10โ€“200/mo
๐Ÿ”‘ API Key (free tiers)NVIDIA NIM, Cerebras, GroqFREE
๐Ÿ’ฐ CheapGLM-5 $0.5/1M ยท MiniMax M2.5 $0.3/1Mpennies
๐Ÿ†“ Free ForeverKiro, Qoder, Qwen, Pollinations, LongCat$0

The $0 Free Stack โ€” combine into one unbreakable combo:

ProviderPrefixFree modelsQuota
Kirokr/Claude Sonnet 4.5, Haiku 4.5, Opus 4.650 credits/mo
Qoderif/kimi-k2-thinking, qwen3-coder-plus, deepseek-r1โ™พ๏ธ Unlimited
Qwenqw/qwen3-coder-plus/flash/nextโ™พ๏ธ Unlimited
Pollinationspol/GPT-5, Claude, Gemini, DeepSeek, Llama 4No key needed
LongCatlc/LongCat-2.010M one-time (KYC)
Cloudflare AIcf/50+ models10K neurons/day
NVIDIA NIMnvidia/129 models~40 RPM
Cerebrascerebras/Qwen3 235B, GPT-OSS 120B1M tok/day

๐Ÿ’ก The dashboard "cost" is a savings tracker, not a bill โ€” OmniRoute never charges you. A "$290 total cost" using free models means $290 saved.

๐Ÿ“– Complete free directory โ†’ docs/reference/FREE_TIERS.md โ€” 25+ providers, quotas, base URLs.

๐ŸŽฏ Use Cases โ€” ready-made combo playbooks

$0 forever:

1. kr/claude-sonnet-4.5   (Kiro โ€” ~50 credits/mo per acct)
2. if/kimi-k2-thinking    (Qoder โ€” unlimited)
3. pol/gpt-5              (Pollinations โ€” no key)
4. lc/LongCat-2.0         (10M one-time backup, KYC)
Compression: aggressive (~50%) โ†’ double your free quota ยท Cost: \$0/mo

24/7 no interruptions: chain 2 subscriptions โ†’ cheap โ†’ free for 5 layers of fallback. Blocked region: free providers + global/per-provider proxy โ†’ access AI from any country. Max savings: subscription + cheap backup + ultra compression (~75%) โ†’ ~$150โ€“300/mo saved for heavy users.

๐ŸŒ Bypass geo-blocks โ€” 3-level proxy + stealth

๐Ÿ‡ท๐Ÿ‡บ ๐Ÿ‡จ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ท ๐Ÿ‡จ๐Ÿ‡บ ๐Ÿ‡น๐Ÿ‡ท In a blocked region? OmniRoute's 3-level proxy (Global / Per-Provider / Per-Connection) proxies API requests, OAuth flows, connection tests, token refresh & model sync.

  • Protocols: HTTP/HTTPS, SOCKS5, authenticated proxies
  • ๐Ÿ†“ 1proxy marketplace โ€” hundreds of free validated proxies, quality scores, auto-rotation
  • Anti-detection โ€” TLS fingerprint spoofing (wreq-js), CLI fingerprint matching, proxy IP preservation

๐Ÿ“– docs/ops/PROXY_GUIDE.md

โœจ Full feature list โ€” 30+ capabilities (memory, evals, observability)

Routing: 17 strategies ยท task-aware smart routing ยท thinking budget controls ยท wildcard routing ยท system prompt injection. Compatibility: OpenAI โ†” Claude โ†” Gemini โ†” Responses API ยท auto OAuth refresh (PKCE, 8 providers) ยท multi-account round-robin ยท Batch + Files API ยท live OpenAPI 3.0. Protocols: MCP (95 tools, 3 transports, 30 scopes) ยท A2A (JSON-RPC 2.0, SSE, 6 skills) ยท ACP ยท cloud agents (Codex, Cursor, Devin, Jules). Plugins: custom plugin marketplace (system-configured registry URL with SSRF-guarded fetch) ยท install / enable / disable ยท Notion + Obsidian knowledge-base integrations (WebDAV file server, vault search, note CRUD). Embedded services: one-click install & lifecycle management of local sidecar services (CLIProxy, NineRouter). Quality & Ops: built-in Evals (golden-set: exact/contains/regex/custom) ยท guardrails (PII, injection, vision) ยท health dashboard ยท p50/p95/p99 telemetry ยท webhooks ยท compliance audit. AI Agent Skills: drop-in markdown manifests โ€” point any agent at a skills/*/SKILL.md manifest. 43 skills available.

๐Ÿ“– MCP Server ยท A2A Server ยท Resilience Guide ยท Features Gallery

๐Ÿ“– Setup, env vars & FAQ
Env varDefaultPurpose
PORT20128API + dashboard port
REQUIRE_API_KEYfalseRequire API key for all requests
DATA_DIR~/.omnirouteDatabase & config storage

Will I be charged by OmniRoute? No โ€” it's free, open-source software on your machine. You only pay paid providers directly. OmniRoute has no billing system. Are FREE providers really unlimited? Mostly โ€” Qoder, Pollinations, LongCat, and Cloudflare are free with no per-account credit cap. Kiro is free too but capped at ~50 credits/month per account. Stack multiple free providers in a combo and auto-fallback keeps you serving for $0. Will compression hurt quality? No โ€” it only compresses the input; code, URLs, JSON are always protected. Does it work where AI is blocked? Yes โ€” 3-level proxy + 1proxy marketplace reach all 237 providers.

๐Ÿ“– User Guide ยท API Reference ยท Environment Config

๐Ÿ› Troubleshooting
ProblemQuick fix
"Language model did not provide messages"Provider quota exhausted โ†’ use a combo fallback
Rate limiting (429)Add fallback: cc/claude โ†’ glm/glm-4.7 โ†’ if/kimi-k2-thinking
OAuth token expiredAuto-refreshed; if stuck, delete + re-auth in Providers
unsupported_country_region_territoryConfigure proxy in Settings โ†’ Proxy
Docker SQLite locksUse --stop-timeout 40 for clean WAL checkpoint
Node runtime errorsUse Node >=22.0.0 <23 or >=24.0.0 <27

๐Ÿ› Reporting a bug? Run npm run system-info and attach system-info.txt. ๐Ÿ“– docs/guides/TROUBLESHOOTING.md

๐Ÿ“ธ Dashboard screenshots
PageScreenshotPageScreenshot
ProvidersProvidersCombosCombos
AnalyticsAnalyticsHealthHealth
TranslatorTranslatorSettingsSettings
CLI ToolsCLI ToolsUsage LogsUsage

๐Ÿ“ง Support & Community

๐Ÿ’ฌ Chat with the community โ€” Discord, Telegram & WhatsApp (๐ŸŒ / ๐Ÿ‡ง๐Ÿ‡ท) links are at the top of this README.



๐Ÿ› ๏ธ Tech Stack

  • Runtime: Node.js 22.x or 24.x LTS (24 LTS recommended) โ€” >=22.0.0 <23 || >=24.0.0 <27
  • Language: TypeScript 6.0 โ€” 100% TypeScript across src/ and open-sse/ (zero any in core modules since v2.0)
  • Framework: Next.js 16 + React 19 + Tailwind CSS 4
  • Database: better-sqlite3 (SQLite) + LowDB (JSON legacy) โ€” domain state, proxy logs, MCP audit, routing decisions, memory, skills
  • Schemas: Zod (MCP tool I/O validation, API contracts)
  • Protocols: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
  • Streaming: Server-Sent Events (SSE) + WebSocket bridge (/v1/ws)
  • Auth: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
  • Testing: Node.js test runner + Vitest (21,000+ test cases across 2,586 files โ€” unit, integration, E2E, security, ecosystem)
  • Platforms: Desktop (Electron), Android (Termux), PWA (any browser)
  • CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
  • Website: omniroute.online
  • Package: npmjs.com/package/omniroute
  • Docker: hub.docker.com/r/diegosouzapw/omniroute
  • Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing

๐Ÿ“– Documentation

๐Ÿ“˜ Getting Started

DocumentDescription
User GuideProviders, combos, CLI integration, deployment
Setup GuideFull install methods, CLI tool configs, protocol setup, timeout tuning
CLI Tools GuidePer-tool setup for Claude Code, Codex, Cursor, Cline, OpenClaw, Kilo, Copilot
Remote ModeDrive a remote OmniRoute (VPS) from your laptop CLI via scoped access tokens
Claude Code ConfigPoint Claude Code at OmniRoute (local/remote) with launch + per-model profiles
Quick Start3-step install โ†’ connect โ†’ configure

๐Ÿ”ง Operations & Deployment

DocumentDescription
Docker GuideDocker run, Compose profiles, Caddy HTTPS, tunnels, image tags
Podman GuideQuadlet systemd integration, podman-compose, SELinux
VM DeploymentComplete guide: VM + nginx + Cloudflare setup
Fly.io DeploymentDeploy to Fly.io with persistent storage
Termux GuideRun OmniRoute on Android via Termux
PWA GuideProgressive Web App install, caching, architecture
Uninstall GuideClean removal for all install methods
Environment ConfigComplete .env variables and references

๐Ÿง  Features & Architecture

DocumentDescription
ArchitectureSystem architecture, data flow, and internals
Compression Guide7-option pipeline: off / lite / standard / aggressive / ultra / RTK / stacked
RTK CompressionCommand-output compression, filters, trust, verify, raw-output recovery
Compression EnginesCaveman, RTK, stacked pipelines, dashboard/API/MCP surfaces
Compression Rules FormatJSON rule-pack schemas for Caveman and RTK filters
Compression Language PacksLanguage detection and Caveman rule-pack authoring
Resilience GuideCircuit breakers, cooldowns, queue, anti-thundering herd, TLS spoofing
Auto-Combo Engine9-factor scoring, mode packs, self-healing
Proxy Guide3-level proxy system, 1proxy marketplace, registry CRUD
Free Tiers25+ free API providers consolidated directory
Features GalleryVisual dashboard tour with screenshots
Codebase DocumentationBeginner-friendly codebase walkthrough

๐Ÿค– Protocols & APIs

DocumentDescription
API ReferenceAll endpoints with examples
OpenAPI SpecOpenAPI 3.0 specification
MCP Server95 MCP tools, IDE configs, Python/TS/Go clients
MCP Server GuideMCP installation, transports, and tool reference
A2A ServerJSON-RPC 2.0 protocol, skills, streaming, task mgmt
A2A Server GuideA2A agent card, tasks, skills, and streaming

๐Ÿ“‹ Project & Quality

DocumentDescription
ContributingDevelopment setup and guidelines
ChangelogFull per-version release history
Security PolicyVulnerability reporting and security practices
i18n Guide40+ language support, translation workflow, RTL
Release ChecklistPre-release validation steps
Coverage PlanTest coverage strategy and 21,000+ test suite

โญ Top Contributors

OmniRoute is shaped by a passionate open-source community. These individuals have made exceptional contributions that directly impact the quality, stability, and reach of the project. Thank you.

oyi77
oyi77

๐Ÿฅ‡ 189 commits โ€ข +155K lines
Analytics engine, SQL aggregations,
proxy marketplace, test coverage
Chris Staley
Chris Staley

๐Ÿฅˆ 70 commits โ€ข +5.7K lines
SSE stream hardening, Responses API,
Gemini pagination, test regression fixes
zenobit
zenobit

๐Ÿฅ‰ 62 commits โ€ข +24K lines
CI/CD pipeline, i18n for 33 languages,
Void Linux package, platform fixes
R.D. & Randi
R.D. & Randi

๐Ÿ… 108 commits โ€ข +30K lines
Endpoints page, tunnel integrations,
Docker workflows, A2A status, compression UI
benzntech
benzntech

๐Ÿ… 22 commits โ€ข +7.5K lines
Electron desktop app, auto-updater,
release build workflows, cross-platform CI
herjarsa
herjarsa

๐Ÿ… 21 commits โ€ข +6K lines
Zero-latency combos, vision-bridge auto-routing,
catalog context-length, resilience 429 hints

๐Ÿ™ These contributors' features, bug fixes, and infrastructure improvements are a core part of what makes OmniRoute reliable and feature-rich. Every pull request, every test case, and every i18n translation file matters. Open source is built by people like them.



๐Ÿ‘ฅ 280+ Contributors

Contributors

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Releasing a New Version

# Create a release โ€” npm publish happens automatically
gh release create v3.8.2 --title "v3.8.2" --generate-notes

๐Ÿ“Š Stars

Star History Chart

๐ŸŒ StarMapper

StarMapper

๐Ÿ™ Acknowledgments

OmniRoute stands on the shoulders of giants. It started as a fork of 9router and a TypeScript port of the Go project CLIProxyAPI โ€” and from there, every subsystem below was inspired by an open-source project that got there first. Each one shaped a concrete piece of OmniRoute. This is our thank-you to all of them. ๐Ÿ™

โญ star counts as of June 2026 โ€” go give these projects a star.

๐Ÿงฌ Lineage & gateway

ProjectโญHow it inspired OmniRoute
9router ยท decolua19.0kThe original project this fork is built on โ€” extended here with multi-modal APIs and a full TypeScript rewrite.
CLIProxyAPI ยท router-for-me38.8kThe Go implementation that inspired this JavaScript / TypeScript port.
LiteLLM ยท BerriAI52.1kThe AI gateway whose public pricing dataset feeds our cost-tracking sync and whose provider-normalization model informed our routing.

๐Ÿ—œ๏ธ Context & token compression โ€” engines

ProjectโญHow it inspired OmniRoute
Caveman ยท JuliusBrussee78.2kThe viral "why use many token when few token do trick" project โ€” its caveman-speak philosophy powers our standard compression mode and 30+ filler/condensation rules.
RTK โ€“ Rust Token Killer ยท rtk-ai67.3kHigh-performance command-output compression โ€” inspired our RTK engine, JSON filter DSL, raw-output recovery and the stacked RTK โ†’ Caveman pipeline.
headroom ยท headroomlabs-ai54.5kReversible context-compression (SmartCrusher) โ€” inspired our headroom engine and the ccr retrieve-marker pattern.
LLMLingua ยท Microsoft6.4kPrompt-compression research (LLMLingua / LLMLingua-2) โ€” inspired our async, code-safe, fail-open llmlingua engine.
llmlingua-2-js ยท atjsh28The JS/ONNX port (MobileBERT / XLM-RoBERTa) used as the worker-thread backend for our LLMLingua engine.
Troglodita ยท Lenine Jรบnior16PT-BR token compression โ€” powers our pt-BR language pack: pleonasm reduction and filler removal tuned for Brazilian-Portuguese grammar.
ponytail ยท DietrichGebert68.8kThe viral "lazy senior dev" YAGNI-coder skill โ€” inspired our less-code Output Style: smallest-working-change steering that cuts generated code (the output-axis sibling to Caveman's terse prose).

๐Ÿงฉ Compact formats, token research & code-aware tooling

ProjectโญHow it inspired OmniRoute
TOON ยท toon-format24.7kToken-Oriented Object Notation โ€” its columnar, header-plus-rows model shaped our tabular compaction stage.
GCF โ€“ Graph Compact Format ยท Blackwell Systems14Schema-aware "JSON for LLMs" notation โ€” co-inspired our lossless homogeneous-array compaction with [N rows] markers.
token-optimizer-mcp ยท ooples421Brotli/SQLite cache + per-session context-delta โ€” inspired our session-dedup engine.
token-savior ยท Mibayy1.0kBash-output compaction + MCP profiles โ€” inspired our compression bail-out discipline and MCP tool-manifest reduction.
token-saver ยท ppgranger110Content-aware, per-file-type output compression with failure-aware bail-out โ€” validated our per-type dispatch and minimum-gain skip.
token-optimizer ยท alexgreensh1.5k"Find the ghost tokens" โ€” its offload + recoverable-handle pattern informed our CCR offload thinking.
TokenMizer ยท Shweta-Mishra-ai2A session-graph + cross-turn line-dedup blueprint that informed our session-dedup design.
OmniCompress ยท jessefreitas2Rust columnar-JSON + content-addressed retrieve + cross-message dedup โ€” validated our headroom/ccr/session-dedup engine design and the cache-stable "compressed form is position-independent" invariant.
mcp-compressor ยท Atlassian Labs89MCP tool-schema/description compression โ€” informed our MCP tool-manifest cardinality reduction.
RepoMapper ยท pdavis68181Aider-style repo-map ranking โ€” informed our repo-map / retrieval-ranking exploration.
quiet-shell-mcp ยท mrsimpson4Declarative shell-output reduction over MCP โ€” validated our declarative bash-output compaction.
ts-morph ยท David Sherret6.1kTypeScript Compiler API toolkit โ€” inspired our parser-based comment removal that preserves string, template and regex literals.

๐Ÿง  Memory & RAG

ProjectโญHow it inspired OmniRoute
Mem0 ยท mem0ai59.8kUniversal memory layer โ€” its proxy-as-write/read-boundary model shaped our memory architecture.
Letta (MemGPT) ยท letta-ai23.6kStateful agents with tiered memory โ€” inspired our Context Control & Recovery (CCR) tiered model.
WFGY ยท onestardao1.8kThe ProblemMap taxonomy of 16 recurring RAG/LLM failure modes โ€” the shared vocabulary in our troubleshooting guide.

๐Ÿ›ฐ๏ธ Traffic inspection, MITM & transparent proxy

ProjectโญHow it inspired OmniRoute
llm-interceptor ยท chouzz48MITM interception/analysis of coding-assistant โ†” LLM traffic โ€” our Traffic Inspector ports its SSE merge, conversation normalization, host passthrough and secret masking (MIT).
ProxyBridge ยท InterceptSuite5.3kTransparent per-process proxy routing โ€” inspired our crash-safe MITM teardown, socket idle-timeouts, /proc process attribution and TPROXY capture.

๐Ÿ“š Model data, observability & UI

ProjectโญHow it inspired OmniRoute
models.dev ยท SST / OpenCode5.6kOpen database of AI model specs, pricing and capabilities โ€” synced natively into our model catalog.
React Flow / xyflow ยท xyflow37.4kThe node-based graph library powering our real-time Compression Studio and Combo/Routing Studio.
LangGraph ยท LangChain36.1kLangGraph Studio's live workflow-graph visualization inspired our Studios' real-time cascade view.
Langfuse ยท Langfuse30.1kIts trace โ†’ span โ†’ generation observability model shaped our Compression Studio waterfall.
Kiali ยท Kiali3.6kIstio service-mesh observability โ€” inspired our circuit-breaker badges and error-edge visuals in the Routing/Combo Studio.
lobe-icons ยท LobeHub2.2kAI/LLM brand logos that render the provider icons across our dashboard.

๐Ÿ›ก๏ธ Security

ProjectโญHow it inspired OmniRoute
awesome-secure-defaults ยท tldrsec708A curated list of secure-by-default libraries that guides our security choices (Helmet.js, DOMPurify, ssrf-req-filter, safe-regex, Google Tink).

โค๏ธ Support

OmniRoute is free and open source, built and maintained in the open. If it saves you time or money, consider supporting development:

  • โญ Star the repo โ€” it genuinely helps visibility
  • ๐Ÿ’– GitHub Sponsors โ€” fund ongoing maintenance and new providers
  • ๐Ÿ› Report bugs and share feedback in Discussions

๐Ÿ“„ License

MIT License - see LICENSE for details.


โฌ† Back to top ยท Built with โค๏ธ for the open-source AI community.

OmniRoute v3.8.43 ยท Node โ‰ฅ22.0.0 ยท MIT License ยท omniroute.online