API Reference

June 29, 2026


Complete reference for all OmniRoute API endpoints.




Chat Completions

POST /v1/chat/completions
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "cc/claude-opus-4-6",
  "messages": [
    {"role": "user", "content": "Write a function to..."}
  ],
  "stream": true
}

Custom Headers

| Header | Direction | Description |
| --- | --- | --- |
| X-OmniRoute-No-Cache | Request | Set to true to bypass cache |
| x-omniroute-no-memory | Request | Set to true to skip memory + skills injection for this request (mirrors no-cache; avoids the per-call token/cost overhead) |
| X-OmniRoute-Progress | Request | Set to true for progress events |
| X-Session-Id | Request | Sticky session key for external session affinity |
| x_session_id | Request | Underscore variant also accepted (direct HTTP) |
| Idempotency-Key | Request | Dedup key (5s window) |
| X-Request-Id | Request | Alternative dedup key |
| X-OmniRoute-Cache | Response | HIT or MISS (non-streaming) |
| X-OmniRoute-Idempotent | Response | true if deduplicated |
| X-OmniRoute-Progress | Response | enabled if progress tracking on |
| X-OmniRoute-Session-Id | Response | Effective session ID used by OmniRoute |
| X-OmniRoute-Request-Id | Response | Request correlation id (when known) |
| X-OmniRoute-Version | Response | OmniRoute build version (always present) |
| X-OmniRoute-Cost-Saved | Response | USD the cache avoided on a HIT (cache hits only) |

Nginx note: if you rely on underscore headers (for example x_session_id), enable underscores_in_headers on;.
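As a minimal sketch, a client might assemble a chat request with the request headers listed above as follows. The code only builds the request object and does not send it. The helper name build_chat_request, its parameters, and the default base URL (borrowed from the localhost:20128 examples elsewhere in this reference) are illustrative assumptions, not part of OmniRoute.

```python
import json
import urllib.request


def build_chat_request(api_key, model, prompt, *, session_id=None,
                       idempotency_key=None, no_cache=False, stream=False,
                       base_url="http://localhost:20128"):
    """Build (but do not send) a POST /v1/chat/completions request.

    Hypothetical helper: only the header names and the body shape come
    from this reference; everything else is an assumption.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if no_cache:
        headers["X-OmniRoute-No-Cache"] = "true"   # bypass the cache
    if session_id is not None:
        headers["X-Session-Id"] = session_id        # sticky session key
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key  # dedup key (5s window)
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; any other HTTP client can reuse the same header names and body.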

Cost telemetry headers: non-streaming success responses also carry the X-OmniRoute-* cost-telemetry set, namely X-OmniRoute-Response-Cost (USD, fixed 10 decimals; 0.0000000000 for free/unpriced), X-OmniRoute-Tokens-In / X-OmniRoute-Tokens-Out, X-OmniRoute-Model, X-OmniRoute-Provider, X-OmniRoute-Latency-Ms, X-OmniRoute-Cache-Hit, and X-OmniRoute-Fallback-Attempts (only when > 0), plus X-OmniRoute-Request-Id and X-OmniRoute-Version. These are emitted by chat completions, /v1/responses, /v1/messages, and the media endpoints: /v1/embeddings, /v1/images/generations, /v1/audio/speech, /v1/audio/transcriptions, /v1/rerank, /v1/videos/generations, /v1/music/generations, and /v1/moderations (always cost 0). Media cost is computed per modality (per-image, per-second, per-character, per search-unit) when pricing is available; otherwise it is reported as 0 (fail-open).

Cache-hit cost semantics: on a semantic-cache HIT (X-OmniRoute-Cache-Hit: true) no upstream call is made, so X-OmniRoute-Response-Cost is 0.0000000000 (the incremental cost of serving the hit). The original/would-have-been cost is reported separately in X-OmniRoute-Cost-Saved. Billing consumers should sum X-OmniRoute-Response-Cost (hits cost nothing); cache analytics can aggregate X-OmniRoute-Cost-Saved.
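The billing rule above can be sketched as a small aggregation over captured response headers. This assumes both cost headers carry plain decimal strings (the reference only specifies that format for X-OmniRoute-Response-Cost); the function name and the sample values are made up for illustration.

```python
from decimal import Decimal


def summarize_costs(responses):
    """Sum billed cost and cache savings over a list of response-header dicts."""
    billed = Decimal("0")
    saved = Decimal("0")
    for headers in responses:
        # HTTP header names are case-insensitive, so normalize before lookup.
        h = {k.lower(): v for k, v in headers.items()}
        billed += Decimal(h.get("x-omniroute-response-cost", "0"))
        # Assumption: Cost-Saved is a decimal USD string, present on hits only.
        saved += Decimal(h.get("x-omniroute-cost-saved", "0"))
    return billed, saved


# Illustrative data: one upstream call, then a cache hit for the same request.
sample = [
    {"X-OmniRoute-Response-Cost": "0.0012500000"},
    {"X-OmniRoute-Response-Cost": "0.0000000000",
     "X-OmniRoute-Cache-Hit": "true",
     "X-OmniRoute-Cost-Saved": "0.0012500000"},
]
billed, saved = summarize_costs(sample)
```

Decimal is used instead of float so that the fixed 10-decimal values add up without binary rounding drift.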

x-omniroute-compression

Per-request override of the compression plan. Highest precedence — beats the routing-combo override, the active profile, auto-trigger, and the panel Default. Values:

| Value | Effect |
| --- | --- |
| off | No compression for this request. |
| default | The panel-derived Default profile (ignores the active profile). |
| engine:<id> | A single engine when enabled, e.g. engine:rtk. |
| <combo> | A named combo, matched by name (case-insensitive) first, then by id. |

Notes:

  • Unknown values are ignored (the request is never rejected); resolution falls through to the normal operator precedence.
  • If multiple combos share a name, pass the combo id for a deterministic match.
  • A combo whose name is off or default cannot be selected by name (those keywords are interpreted first); reference such a combo by its id.
  • The master compression switch is a hard gate: when compression is disabled globally, this header cannot enable it.

The applied plan is echoed back in the response header:

X-OmniRoute-Compression: <mode>; source=<source>

where <source> is one of request-header, routing-override, active-profile, auto-trigger, default, or off.
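A client that wants to log which plan was applied could split the echoed header roughly as below. This is a sketch that assumes the value always follows the `<mode>; source=<source>` layout shown above; the function name is hypothetical, and the exact strings used for `<mode>` are not specified in this reference.

```python
def parse_compression_header(value):
    """Split '<mode>; source=<source>' into (mode, source).

    Returns source=None when no source parameter is present.
    """
    mode, _, rest = value.partition(";")
    source = None
    for param in rest.split(";"):
        name, _, val = param.strip().partition("=")
        if name == "source":
            source = val
    return mode.strip(), source
```

For example, parse_compression_header("off; source=off") returns ("off", "off").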


Embeddings

POST /v1/embeddings
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "nebius/Qwen/Qwen3-Embedding-8B",
  "input": "The food was delicious"
}

Available providers: Nebius, OpenAI, Mistral, Together AI, Fireworks, NVIDIA, OpenRouter, GitHub Models.

# List all embedding models
GET /v1/embeddings

Image Generation

POST /v1/images/generations
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "openai/gpt-image-2",
  "prompt": "A beautiful sunset over mountains",
  "size": "1024x1024"
}

Available providers: OpenAI (GPT Image 2), xAI (Grok Image), Together AI (FLUX), Fireworks AI, Nebius (FLUX), Hyperbolic, NanoBanana, OpenRouter, SD WebUI (local), ComfyUI (local).

# List all image models
GET /v1/images/generations

List Models

GET /v1/models
Authorization: Bearer your-api-key

Returns all chat, embedding, and image models, plus combos, in OpenAI format.

No-thinking model variants

For thinking-capable Claude models, /v1/models also advertises a no-thinking variant whose id is prefixed with claude-3-omniroute-no-thinking/:

claude-3-omniroute-no-thinking/<provider>/<model>

Selecting this id (e.g. in a Claude Code config that always attaches a thinking block) resolves back to the real <provider>/<model> with reasoning suppressed — thinking:{type:"disabled"} on the /v1/messages path, or the reasoning/reasoning_effort fields dropped on the /v1/chat/completions path. The variant is only listed for Claude-family models that support thinking and honor disabled (so e.g. adaptive-only models that reject disabled are excluded). Operators can force the variant on or off per model via ModelSpec.noThinkingAlias.
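The id convention can be illustrated with two small helpers. They only manipulate the string prefix described above; whether a given model actually has a no-thinking variant is decided by what /v1/models lists, and the helper names are hypothetical.

```python
NO_THINKING_PREFIX = "claude-3-omniroute-no-thinking/"


def to_no_thinking(model_id):
    """Return the no-thinking variant id for a <provider>/<model> id."""
    if model_id.startswith(NO_THINKING_PREFIX):
        return model_id  # already a variant id
    return NO_THINKING_PREFIX + model_id


def split_no_thinking(model_id):
    """Return (real_model_id, thinking_suppressed) for any model id."""
    if model_id.startswith(NO_THINKING_PREFIX):
        return model_id[len(NO_THINKING_PREFIX):], True
    return model_id, False
```

A client would normally pick the variant id straight from the /v1/models response rather than constructing it; the helpers are mainly useful for mapping a variant back to its real id in logs.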


Compatibility Endpoints

| Method | Path | Format |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI |
| POST | /v1/messages | Anthropic |
| POST | /v1/responses | OpenAI Responses |
| POST | /v1/embeddings | OpenAI |
| POST | /v1/images/generations | OpenAI Images |
| POST | /v1/images/edits | OpenAI Images (edit/inpaint) |
| POST | /v1/videos/generations | OpenAI-style video generation |
| POST | /v1/music/generations | OpenAI-style music generation |
| POST | /v1/audio/transcriptions | OpenAI Audio (STT) |
| POST | /v1/audio/speech | OpenAI TTS (returns audio body) |
| POST | /v1/rerank | Cohere/Voyage-style rerank |
| POST | /v1/moderations | OpenAI Moderations |
| GET | /v1/models | OpenAI |
| POST | /v1/messages/count_tokens | Anthropic |
| GET | /v1beta/models | Gemini |
| POST | /v1beta/models/{...path} | Gemini generateContent |
| POST | /v1/api/chat | Ollama |
| GET | /api/v1/vscode/{token}/ | OpenAI catalog alias |
| GET | /api/v1/vscode/{token}/models | OpenAI models alias |
| POST | /api/v1/vscode/{token}/chat/completions | OpenAI tokenized alias |
| POST | /api/v1/vscode/{token}/responses | OpenAI Responses tokenized alias |
| POST | /api/v1/vscode/{token}/api/chat | Ollama tokenized alias |
| GET | /api/v1/vscode/{token}/api/tags | Ollama tags tokenized alias |

All POST routes follow the same shape: Bearer your-api-key + Zod-validated JSON body (v1RerankSchema, v1ModerationSchema, v1AudioSpeechSchema, etc., see src/shared/validation/schemas.ts). 4xx is returned on schema failure.

For clients that cannot attach Authorization: Bearer ..., OmniRoute also accepts API keys in the URL via either query-string compatibility (?token=..., ?apiKey=..., ?api_key=..., ?key=...) or the dedicated /api/v1/vscode/{token}/... endpoints documented below.

# Rerank
POST /v1/rerank      { "model": "cohere/rerank-3", "query": "...", "documents": ["..."] }

# Moderations
POST /v1/moderations { "model": "omni-moderation-latest", "input": "..." }

# TTS — returns audio/mpeg (or requested format) body
POST /v1/audio/speech { "model": "openai/tts-1", "input": "Hello", "voice": "alloy" }

# Image edit (multipart)
POST /v1/images/edits  -F image=@input.png -F prompt="..." -F mask=@mask.png

# Video / music generation (provider-prefixed model id)
POST /v1/videos/generations { "model": "runway/gen-3", "prompt": "..." }
POST /v1/music/generations  { "model": "suno/v3.5",   "prompt": "..." }

Dedicated Provider Routes

POST /v1/providers/{provider}/chat/completions
POST /v1/providers/{provider}/embeddings
POST /v1/providers/{provider}/images/generations

The provider prefix is auto-added if missing. Mismatched models return 400.


Files API

OpenAI-compatible files endpoint for batch input/output and file-purpose uploads.

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/files | Upload a file (multipart: file, purpose, expires_after[anchor], expires_after[seconds]); 512 MiB max |
| GET | /v1/files | List files for the authenticated API key |
| GET | /v1/files/[id] | Retrieve a file's metadata |
| DELETE | /v1/files/[id] | Delete a file |
| GET | /v1/files/[id]/content | Stream the raw file body back |

Auth: Bearer API key — files are scoped per-API-key via getApiKeyRequestScope.


Batches API

OpenAI-compatible batch processing.

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/batches | Create batch; body validated by v1BatchCreateSchema (input_file_id, endpoint, completion_window) |
| GET | /v1/batches | List batches |
| GET | /v1/batches/[id] | Retrieve batch status + request_counts |
| DELETE | /v1/batches/[id] | Delete a finished/failed batch |
| POST | /v1/batches/[id]/cancel | Cancel an in-progress batch |

Auth: Bearer API key. Batches are scoped per-API-key.


Search API

Web/search provider abstraction (Tavily, Brave, Exa, Serper, etc.).

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/search | List configured search providers + capabilities |
| POST | /v1/search | Run a search query; body validated by v1SearchSchema, supports caching/coalescing |
| GET | /v1/search/analytics | Per-provider hit/latency/cache stats |

Auth: Bearer API key (extractApiKey + isValidApiKey). Search policy enforced via enforceApiKeyPolicy.


WebSocket Streaming

GET /v1/ws?handshake=1

Validates a WebSocket upgrade handshake and returns the wire protocol example messages (request, cancel). Actual WS frames are handled by the bundled WS server outside the Next.js route table.

Auth: Bearer API key during handshake.

Responses API over WebSocket (codex only)

# Same host:port as the HTTP API (default 20128); upgrade the connection:
wscat -c "ws://localhost:20128/v1/responses?api_key=<OMNIROUTE_API_KEY>"
# (or: -H "Authorization: Bearer <OMNIROUTE_API_KEY>")

# First frame MUST be response.create:
{ "type": "response.create", "model": "gpt-5.5", "input": [ { "role": "user", "content": "hi" } ] }

A Responses-API-over-WebSocket proxy is wired exclusively to codex (ChatGPT backend). It listens on the same port as the API/dashboard at paths /v1/responses, /responses, and /api/v1/responses. On the first response.create frame it authenticates + prepares via the internal codex-responses-ws bridge, selects a codex OAuth connection, and tunnels to wss://chatgpt.com/backend-api/codex/responses via the wreq-js transport. Non-codex models are rejected (codex_ws_provider_required). For quota-share routing use model: "qtSd/<group>/codex/<model>". Implemented in app/server-ws.mjs + scripts/dev/responses-ws-proxy.mjs + src/app/api/internal/codex-responses-ws/route.ts.

Auth: Bearer API key during handshake. The bundled HTTP server (server-ws.mjs) must be the active entrypoint (it is, by default, when app/server-ws.mjs exists).

Model id: use the bare ChatGPT id (no codex/ prefix)

The OpenAI Codex CLI validates the model name client-side when supports_websockets = true and rejects provider-prefixed ids like codex/gpt-5.5 (The 'codex/gpt-5.5' model is not supported when using Codex with a ChatGPT account). Send the bare id (e.g. gpt-5.5). OmniRoute's bridge is codex-only, so it re-resolves a bare id as a codex model (resolveCodexWsModelInfo) before tunneling upstream — even though a bare gpt-5.5 would otherwise route to another provider over HTTP.
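For a hand-written WebSocket client, the bare-id rule can be applied when building the first frame. The sketch below strips a leading codex/ prefix on the client side as a convenience, which is an assumption rather than documented OmniRoute behaviour, and the function name is hypothetical.

```python
import json


def response_create_frame(model, user_text):
    """Build the first WebSocket frame (response.create) with a bare model id."""
    # Assumption: drop a leading provider prefix so the id matches the
    # bare form (e.g. "gpt-5.5") that this section recommends.
    if model.startswith("codex/"):
        model = model[len("codex/"):]
    return json.dumps({
        "type": "response.create",
        "model": model,
        "input": [{"role": "user", "content": user_text}],
    })
```

Quota-share ids of the form qtSd/<group>/codex/<model> do not start with codex/ and therefore pass through unchanged.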

Configuring the OpenAI Codex CLI

Point the Codex CLI at OmniRoute by adding a custom provider with WebSocket support to ~/.codex/config.toml (use a separate CODEX_HOME to avoid touching an existing config):

model = "gpt-5.5"                 # bare id — NOT "codex/gpt-5.5"
model_provider = "omniroute"

[model_providers.omniroute]
name = "OmniRoute (WS)"
base_url = "http://localhost:20128/v1"   # no trailing slash; the WS URL is derived (use https/wss in production)
wire_api = "responses"                    # only supported value since Feb 2026
supports_websockets = true                # enables the Responses-over-WS transport
env_key = "OMNIROUTE_API_KEY"             # holds the OmniRoute API key (Bearer)

Then export the key and run a test prompt from a shell:

export OMNIROUTE_API_KEY=sk-...           # an OmniRoute API key (any key if REQUIRE_API_KEY=false)
codex exec "Reply with only: PONG"

The CLI upgrades base_url + /responses to a WebSocket and OmniRoute tunnels it to the selected codex OAuth connection. Validated end-to-end against the local server: ChatGPT returns codex.rate_limits + response.created and streams the completion.


Quotas & Issues Reporting

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/quotas/check | Pre-validate quota for a provider + accountId before issuing a registered key |
| POST | /v1/issues/report | Report a quota/key issuance failure to GitHub (requires GITHUB_ISSUES_REPO + token) |

Auth: Bearer API key (isAuthenticated).


Semantic Cache

# Get cache stats
GET /api/cache/stats

# Clear all caches
DELETE /api/cache/stats

Response example:

{
  "semanticCache": {
    "memorySize": 42,
    "memoryMaxSize": 500,
    "dbSize": 128,
    "hitRate": 0.65
  },
  "idempotency": {
    "activeKeys": 3,
    "windowMs": 5000
  }
}

Dashboard & Management

Authentication

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/auth/login | POST | Login |
| /api/auth/logout | POST | Logout |
| /api/settings/require-login | GET/PUT | Toggle login required |

Provider Management

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/providers | GET/POST | List / create providers |
| /api/providers/[id] | GET/PUT/DELETE | Manage a provider |
| /api/providers/[id]/test | POST | Test provider connection |
| /api/providers/[id]/models | GET | List provider models |
| /api/providers/validate | POST | Validate provider config |
| /api/provider-nodes* | Various | Provider node management |
| /api/provider-models | GET/POST/PATCH/DELETE | Custom models (add, update, hide/show, delete) |

OAuth Flows

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/oauth/[provider]/[action] | Various | Provider-specific OAuth |

Routing & Config

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/models/alias | GET/POST | Model aliases |
| /api/models/catalog | GET | All models by provider + type |
| /api/combos* | Various | Combo management |
| /api/keys* | Various | API key management |
| /api/pricing | GET | Model pricing |

Usage & Analytics

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/usage/history | GET | Usage history |
| /api/usage/logs | GET | Usage logs |
| /api/usage/request-logs | GET | Request-level logs |
| /api/usage/[connectionId] | GET | Per-connection usage |
| /api/usage/token-limits | GET/POST/DELETE | Per-API-key token-limit budgets |

Settings

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/settings | GET/PUT/PATCH | General settings |
| /api/settings/proxy | GET/PUT | Network proxy config |
| /api/settings/proxy/test | POST | Test proxy connection |
| /api/settings/ip-filter | GET/PUT | IP allowlist/blocklist |
| /api/settings/thinking-budget | GET/PUT | Reasoning token budget |
| /api/settings/system-prompt | GET/PUT | Global system prompt |
| /api/settings/compression | GET/PUT | Global compression config |
| /api/settings/purge-request-history | POST | Clear request log rows and local call-log artifacts |

Context & Compression

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/compression/preview | POST | Preview off/lite/standard/aggressive/ultra/RTK/stacked compression |
| /api/compression/language-packs | GET | List available Caveman language packs |
| /api/compression/rules | GET | List Caveman rule metadata |
| /api/context/caveman/config | GET/PUT | Caveman-specific settings alias |
| /api/context/rtk/config | GET/PUT | RTK-specific settings, including custom filters and raw-output retention |
| /api/context/rtk/filters | GET | RTK filter catalog and custom-filter diagnostics |
| /api/context/rtk/test | POST | Run RTK preview/test against a text payload |
| /api/context/rtk/raw-output/[id] | GET | Read retained redacted raw output by pointer id |
| /api/context/combos | GET/POST | Compression combo list/create |
| /api/context/combos/[id] | GET/PUT/DELETE | Compression combo detail/update/delete |
| /api/context/combos/[id]/assignments | GET/PUT | Assign compression combos to routing combos |
| /api/context/analytics | GET | Compression analytics alias |

Monitoring

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/sessions | GET | Active session tracking |
| /api/rate-limits | GET | Per-account rate limits |
| /api/monitoring/health | GET | Health check + provider summary (catalogCount, configuredCount, activeCount, monitoredCount) |
| /api/cache/stats | GET/DELETE | Cache stats / clear |

Backup & Export/Import

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/db-backups | GET | List available backups |
| /api/db-backups | PUT | Create a manual backup |
| /api/db-backups | POST | Restore from a specific backup |
| /api/db-backups/export | GET | Download database as .sqlite file |
| /api/db-backups/import | POST | Upload .sqlite file to replace database |
| /api/db-backups/exportAll | GET | Download full backup as .tar.gz archive |

Cloud Sync

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/sync/cloud | Various | Cloud sync operations |
| /api/sync/initialize | POST | Initialize sync |
| /api/cloud/* | Various | Cloud management |

Tunnels

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/tunnels/cloudflared | GET | Read Cloudflare Quick Tunnel install/runtime status for the dashboard |
| /api/tunnels/cloudflared | POST | Enable or disable the Cloudflare Quick Tunnel (action=enable/disable) |
| /api/tunnels/ngrok | GET | Read ngrok Tunnel runtime status for the dashboard |
| /api/tunnels/ngrok | POST | Enable or disable the ngrok Tunnel (action=enable/disable) |

CLI Tools

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/cli-tools/claude-settings | GET | Claude CLI status |
| /api/cli-tools/codex-settings | GET | Codex CLI status |
| /api/cli-tools/droid-settings | GET | Droid CLI status |
| /api/cli-tools/openclaw-settings | GET | OpenClaw CLI status |
| /api/cli-tools/runtime/[toolId] | GET | Generic CLI runtime |

CLI responses include: installed, runnable, command, commandPath, runtimeMode, reason.

ACP Agents

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/acp/agents | GET | List all detected agents (built-in + custom) with status |
| /api/acp/agents | POST | Add custom agent or refresh detection cache |
| /api/acp/agents | DELETE | Remove a custom agent by id query param |

GET response includes agents[] (id, name, binary, version, installed, protocol, isCustom) and summary (total, installed, notFound, builtIn, custom).

Resilience & Rate Limits

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/resilience | GET/PATCH | Get/update request queue, connection cooldown, provider breaker, and wait settings |
| /api/resilience/reset | POST | Reset provider circuit breakers |
| /api/resilience/model-cooldowns | GET | List active per-(provider, connection, model) lockouts, sorted by remaining time |
| /api/resilience/model-cooldowns | DELETE | Clear a model lockout: body {provider, model}, or {all: true} to wipe everything |
| /api/rate-limits | GET | Per-account rate limit status |
| /api/rate-limit | GET | Global rate limit configuration |

All four /api/resilience/* routes require management auth (requireManagementAuth). See Resilience (extended) for a full breakdown of provider breaker vs connection cooldown vs model lockout.

Evals

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/evals | GET/POST | List eval suites / run evaluation |

Policies

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/policies | GET/POST/DELETE | Manage routing policies |

Compliance

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/compliance/audit-log | GET | Compliance audit log (last N) |

v1beta (Gemini-Compatible)

| Endpoint | Method | Description |
| --- | --- | --- |
| /v1beta/models | GET | List models in Gemini format |
| /v1beta/models/{...path} | POST | Gemini generateContent endpoint |

These endpoints mirror Gemini's API format for clients that expect native Gemini SDK compatibility.

Internal / System APIs

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/init | GET | Application initialization check (used on first run) |
| /api/tags | GET | Ollama-compatible model tags (for Ollama clients) |
| /api/restart | POST | Trigger graceful server restart |
| /api/shutdown | POST | Trigger graceful server shutdown |
| /api/system/env/repair | POST | Repair OAuth provider environment variables |

Note: These endpoints are used internally by the system or for Ollama client compatibility. They are not typically called by end users.

OAuth Environment Repair (v3.6.1+)

POST /api/system/env/repair
Content-Type: application/json

{
  "provider": "claude-code"
}

Repairs missing or corrupted OAuth environment variables for a specific provider. Returns:

{
  "success": true,
  "repaired": ["CLAUDE_CODE_OAUTH_CLIENT_ID", "CLAUDE_CODE_OAUTH_CLIENT_SECRET"],
  "backupPath": "/home/user/.omniroute/backups/env-repair-2026-04-11.bak"
}

Audio Transcription

POST /v1/audio/transcriptions
Authorization: Bearer your-api-key
Content-Type: multipart/form-data

Transcribe audio files using Deepgram or AssemblyAI.

Request:

curl -X POST http://localhost:20128/v1/audio/transcriptions \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@recording.mp3" \
  -F "model=deepgram/nova-3"

Response:

{
  "text": "Hello, this is the transcribed audio content.",
  "task": "transcribe",
  "language": "en",
  "duration": 12.5
}

Supported models: deepgram/nova-3, assemblyai/best.

Supported formats: mp3, wav, m4a, flac, ogg, webm.


Ollama Compatibility

For clients that use Ollama's API format:

# Chat endpoint (Ollama format)
POST /v1/api/chat

# Model listing (Ollama format)
GET /api/tags

Requests are automatically translated between Ollama and internal formats.

Tokenized VS Code / Headerless Aliases

Use these aliases when an integration cannot inject an Authorization header and needs the API key embedded in the base URL.

# OpenAI-style catalog alias
GET /api/v1/vscode/{token}/
GET /api/v1/vscode/{token}/models

# OpenAI-style chat aliases
POST /api/v1/vscode/{token}/chat/completions
POST /api/v1/vscode/{token}/responses

# Ollama-style aliases
POST /api/v1/vscode/{token}/api/chat
GET /api/v1/vscode/{token}/api/tags

Example:

curl https://your-host.example/api/v1/vscode/YOUR_API_KEY/models
curl -X POST https://your-host.example/api/v1/vscode/YOUR_API_KEY/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hello"}]}'

Notes:

  • The tokenized aliases reuse the same handlers as /v1/* and /api/tags; response shapes stay identical.
  • Prefer Authorization: Bearer ... whenever the client supports custom headers.
  • URL-based tokens may appear in reverse-proxy logs, browser history, and telemetry outside OmniRoute. Treat them as a compatibility option, not the default authentication mode.
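For integrations that can only set a base URL, the alias URL can be composed as in this sketch. The helper name is hypothetical, and percent-encoding the key is a defensive assumption: this reference does not say whether keys may contain characters that need encoding.

```python
from urllib.parse import quote


def tokenized_url(host, api_key, path):
    """Build an /api/v1/vscode/{token}/... URL for header-less clients."""
    # Assumption: encode the key so that characters such as "/" cannot
    # change the path structure.
    token = quote(api_key, safe="")
    return f"{host.rstrip('/')}/api/v1/vscode/{token}/{path.lstrip('/')}"
```

As noted above, a URL built this way carries the key in clear text, so it should stay out of shared logs and screenshots.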

Telemetry

# Get latency telemetry summary (p50/p95/p99 per provider)
GET /api/telemetry/summary

Response:

{
  "providers": {
    "claudeCode": { "p50": 245, "p95": 890, "p99": 1200, "count": 150 },
    "github": { "p50": 180, "p95": 620, "p99": 950, "count": 320 }
  }
}

Budget

# Get budget status for all API keys
GET /api/usage/budget

# Set or update a budget
POST /api/usage/budget
Content-Type: application/json

{
  "apiKeyId": "key-123",
  "dailyLimitUsd": 5.00,
  "weeklyLimitUsd": 30.00,
  "monthlyLimitUsd": 100.00,
  "warningThreshold": 0.8,
  "resetInterval": "monthly"
}

Schema notes (setBudgetSchema): apiKeyId is required; at least one of dailyLimitUsd, weeklyLimitUsd, or monthlyLimitUsd must be greater than zero. Optional fields: warningThreshold (0–1), resetInterval (daily | weekly | monthly), resetTime (HH:MM). The legacy {keyId, limit, period} shape returns 400 Bad Request.
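The schema notes can be mirrored in a client-side pre-check like the one below. It is a sketch of the rules as described here, not the actual setBudgetSchema; in particular, treating the 0 to 1 bounds as inclusive and HH:MM as a 24-hour time are assumptions.

```python
import re

RESET_INTERVALS = {"daily", "weekly", "monthly"}
LIMIT_FIELDS = ("dailyLimitUsd", "weeklyLimitUsd", "monthlyLimitUsd")


def budget_errors(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    if not payload.get("apiKeyId"):
        errors.append("apiKeyId is required")
    # At least one of the three limits must be a positive number.
    if not any((payload.get(field) or 0) > 0 for field in LIMIT_FIELDS):
        errors.append("at least one *LimitUsd field must be greater than zero")
    threshold = payload.get("warningThreshold")
    if threshold is not None and not 0 <= threshold <= 1:
        errors.append("warningThreshold must be between 0 and 1")
    interval = payload.get("resetInterval")
    if interval is not None and interval not in RESET_INTERVALS:
        errors.append("resetInterval must be daily, weekly, or monthly")
    reset_time = payload.get("resetTime")
    if reset_time is not None and not re.fullmatch(
            r"([01]\d|2[0-3]):[0-5]\d", reset_time):
        errors.append("resetTime must be HH:MM")
    return errors
```

Called with the legacy {keyId, limit, period} shape, the check reports both the missing apiKeyId and the missing limit, which matches the 400 the server returns for that shape.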

Token Limits

Per-API-key token budgets (distinct from the USD-based Budget above). Enforced inline on the request path: when a key's current window usage reaches its limit, requests are rejected with 429 Too Many Requests. Limits can be scoped to a specific model, a provider, or applied globally across the key; when several limits match a request, the most restrictive one wins.

# List a key's token limits (includes live window usage)
GET /api/usage/token-limits?apiKeyId=key-123

# Create or update a token limit
POST /api/usage/token-limits
Content-Type: application/json

{
  "apiKeyId": "key-123",
  "scopeType": "model",
  "scopeValue": "openai/gpt-4o",
  "tokenLimit": 1000000,
  "resetInterval": "monthly",
  "enabled": true
}

# Delete a token limit by id
DELETE /api/usage/token-limits?id=tl-abc

Schema notes (setTokenLimitSchema): apiKeyId and scopeType (model | provider | global) are required. scopeValue is required unless scopeType is global (e.g. a model id for model scope, a provider id for provider scope). tokenLimit must be a positive integer (coerced from string). Optional: id (omit to create, supply to update), resetInterval (daily | weekly | monthly, default monthly), resetTime (HH:MM), enabled (default true). GET responses enrich each limit with tokensUsed, remaining, windowStart, periodStartAt, and nextResetAt. This is a management-class endpoint (auth enforced centrally by the authz pipeline).
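To reason about which limit will bite first, a client can replay the scoping rule over the GET response. The sketch below reads "most restrictive" as "fewest remaining tokens", which is one plausible interpretation and not confirmed by this reference; the function name is hypothetical.

```python
def applicable_limit(limits, provider, model):
    """Pick the enabled, matching limit with the fewest remaining tokens."""
    def matches(limit):
        scope = limit["scopeType"]
        if scope == "global":
            return True
        if scope == "provider":
            return limit.get("scopeValue") == provider
        return scope == "model" and limit.get("scopeValue") == model

    candidates = [lim for lim in limits
                  if lim.get("enabled", True) and matches(lim)]
    # Assumption: "most restrictive" means the smallest `remaining` value.
    return min(candidates, key=lambda lim: lim["remaining"], default=None)
```

The limits argument is expected to look like the enriched GET response entries, each carrying scopeType, an optional scopeValue, and remaining.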

Request Processing

  1. Client sends request to /v1/*
  2. Route handler calls handleChat, handleEmbedding, handleAudioTranscription, or handleImageGeneration
  3. Model is resolved (direct provider/model or alias/combo)
  4. Credentials selected from local DB with account availability filtering
  5. For chat: handleChatCore checks semantic/signature cache and resolves combo compression settings
  6. Proactive compression runs before provider translation when enabled (lite, Caveman, RTK, or stacked)
  7. Provider executor sends upstream request
  8. Response translated back to client format (chat) or returned as-is (embeddings/images/audio)
  9. Usage, compression analytics, and request logs are recorded
  10. Fallback applies on errors according to combo rules

Full architecture reference: ARCHITECTURE.md


Combo Management

Higher-level routing combos (already summarized under /api/combos*) can also be mapped 1:1 from a model id pattern, allowing transparent redirection of an OpenAI-style model id to a combo.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/model-combo-mappings | List all model→combo mappings |
| POST | /api/model-combo-mappings | Create mapping; body: {pattern, comboId, priority?, enabled?, description?} |
| GET | /api/model-combo-mappings/[id] | Retrieve a single mapping |
| PUT | /api/model-combo-mappings/[id] | Update fields of an existing mapping |
| DELETE | /api/model-combo-mappings/[id] | Remove a mapping |

Auth: management session/API key (requireManagementAuth).


Webhooks

Outbound webhook subscriptions for OmniRoute events (request completion, quota exhaustion, key rotation, etc.).

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/webhooks | List webhooks (secrets are masked to <prefix>...) |
| POST | /api/webhooks | Create webhook; body: {url, events?: ["*"], secret?, description?} |
| GET | /api/webhooks/[id] | Retrieve a webhook |
| PUT | /api/webhooks/[id] | Update url/events/secret/description |
| DELETE | /api/webhooks/[id] | Remove a webhook |
| POST | /api/webhooks/[id]/test | Send a test payload to the webhook URL and return delivery status |

Auth: management session/API key (requireManagementAuth).


Registered Keys (Auto-Management)

Used by the auto-key management subsystem to issue and rotate API keys against a backing provider/account, with daily/hourly quotas.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/registered-keys | List registered keys (masked prefix only) |
| POST | /api/v1/registered-keys | Issue a new registered key; body: {name, provider?, accountId?, idempotencyKey?, expiresAt?, dailyBudget?, hourlyBudget?}. Returns the raw key once. Returns 429 on quota refusal. |
| GET | /api/v1/registered-keys/[id] | Retrieve a registered key's metadata (no raw material) |
| DELETE | /api/v1/registered-keys/[id] | Revoke a registered key |
| POST | /api/v1/registered-keys/[id]/revoke | Explicit revoke endpoint (same effect as DELETE) |

Auth: Bearer API key (isAuthenticated). See also /v1/quotas/check and /v1/issues/report.


Agents Protocol

Cloud agent tasks (Claude Code, Codex Cloud, OpenHands, etc.) executed remotely on behalf of OmniRoute users.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/agents/tasks | List tasks; optional ?provider=, ?status=, ?limit= (1–500, default 50) |
| POST | /api/v1/agents/tasks | Create task; body validated by CreateCloudAgentTaskSchema (providerId, prompt, source, options?). Returns 201 with task envelope |
| DELETE | /api/v1/agents/tasks?id=... | Delete a task |
| GET | /api/v1/agents/tasks/[id] | Read task; synchronously refreshes status from the upstream cloud agent when an external_id is set |
| POST | /api/v1/agents/tasks/[id] | Discriminated action: {action: "approve"}, {action: "message", message}, or {action: "cancel"} |
| DELETE | /api/v1/agents/tasks/[id] | Delete a specific task by id |

Auth: management auth required on every method (requireCloudAgentManagementAuth). Prior to v3.8.0 these were unauthenticated — see commit 588a0333 for the breaking change.

# Create a Claude Code cloud task
curl -X POST http://localhost:20128/api/v1/agents/tasks \
  -H "Authorization: Bearer your-management-key" \
  -H "Content-Type: application/json" \
  -d '{"providerId":"claude-code-cloud","prompt":"Fix the failing test","source":{"repo":"...","branch":"..."}}'
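The discriminated action body for POST /api/v1/agents/tasks/[id] can be built with a small guard so that malformed actions never leave the client. This is a sketch based only on the three shapes listed in the table; the helper name is hypothetical and the server-side schema may enforce more.

```python
def task_action(action, message=None):
    """Build the JSON-serializable body for a task action."""
    if action not in ("approve", "message", "cancel"):
        raise ValueError(f"unknown action: {action!r}")
    if action == "message":
        # Only the "message" action carries an extra field.
        if not message:
            raise ValueError("the 'message' action needs a message")
        return {"action": "message", "message": message}
    return {"action": action}
```

The returned dict can be serialized with json.dumps and sent with the same Authorization and Content-Type headers as the create-task example.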

Management Proxies

Outbound HTTP(S)/SOCKS proxies that can be assigned to providers, accounts, or globally.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/management/proxies | List proxies (with ?id= returns one; with ?id=&where_used=1 returns the assignment graph) |
| POST | /api/v1/management/proxies | Create proxy; body validated by createProxyRegistrySchema |
| PATCH | /api/v1/management/proxies | Update proxy; body validated by updateProxyRegistrySchema (requires id) |
| DELETE | /api/v1/management/proxies?id=...&force=1 | Delete proxy (use force=1 to detach assignments) |
| GET | /api/v1/management/proxies/assignments | List assignments; filterable by proxy_id, scope, scope_id; pass resolve_connection_id=<id> to resolve the active proxy for a connection |
| PUT | /api/v1/management/proxies/assignments | Assign; body validated by proxyAssignmentSchema ({scope, scopeId?, proxyId?}). Clears dispatcher cache |
| PUT | /api/v1/management/proxies/bulk-assign | Bulk-assign; body validated by bulkProxyAssignmentSchema ({scope, scopeIds[], proxyId?}) |
| GET | /api/v1/management/proxies/health?hours=24 | Aggregate proxy health (success/fail counts, latency) over a window |

Auth: management session/API key on every route (requireManagementAuth).

There are no per-id subroutes in the codebase: POST /api/v1/management/proxies/[id]/assignments and POST /api/v1/management/proxies/[id]/health do not exist, and their functionality is served by the flat /assignments and /health routes shown above.


Resilience (extended)

OmniRoute exposes three independent temporary-failure mechanisms; the management endpoints below let operators read and override them:

| Scope | State storage | Read | Reset / clear |
| --- | --- | --- | --- |
| Provider breaker | domain_circuit_breakers + in-memory | /api/monitoring/health | POST /api/resilience/reset |
| Connection cooldown | rateLimitedUntil on provider connections | /api/rate-limits, /api/providers/[id] | (re-enables lazily; clear via provider PUT) |
| Model lockout | In-memory model-availability registry | GET /api/resilience/model-cooldowns | DELETE /api/resilience/model-cooldowns |

PATCH /api/resilience accepts provider breaker overrides under providerBreaker.oauth and providerBreaker.apikey. Each profile supports degradationThreshold, failureThreshold, and resetTimeoutMs; the same fields are exposed in Dashboard → Settings → Resilience.
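A PATCH body for one breaker profile could be assembled as in the sketch below. Only the nesting (providerBreaker, then oauth or apikey) and the three field names come from this reference; the helper name is hypothetical and any values passed in are the caller's choice, not recommended defaults.

```python
import json

BREAKER_FIELDS = {"degradationThreshold", "failureThreshold", "resetTimeoutMs"}


def breaker_patch(profile, **overrides):
    """Build a PATCH /api/resilience body for one provider-breaker profile."""
    if profile not in ("oauth", "apikey"):
        raise ValueError("profile must be 'oauth' or 'apikey'")
    unknown = set(overrides) - BREAKER_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return json.dumps({"providerBreaker": {profile: overrides}})
```

For instance, breaker_patch("oauth", failureThreshold=5) yields a body that touches only the OAuth profile's failure threshold.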

# Clear a single model lockout
curl -X DELETE http://localhost:20128/api/resilience/model-cooldowns \
  -H "Cookie: auth_token=..." \
  -H "Content-Type: application/json" \
  -d '{"provider":"openai","model":"gpt-4o-mini"}'

# Wipe every lockout
curl -X DELETE http://localhost:20128/api/resilience/model-cooldowns \
  -H "Cookie: auth_token=..." \
  -d '{"all":true}'

Full conceptual reference and breaker defaults: see CLAUDE.md → "Resilience Runtime State".


Skills

Skill framework for extending OmniRoute with custom executable handlers, plus marketplace integrations.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/skills | List installed skills; filterable by ?q=, ?mode=on\|off\|auto, ?source=skillsmp\|skillssh\|local, paginated |
| GET | /api/skills/[id] | Retrieve one skill |
| PUT | /api/skills/[id] | Update skill (name, description, mode, schema, handler, tags) |
| DELETE | /api/skills/[id] | Uninstall a skill |
| POST | /api/skills/install | Install a skill from a raw manifest; body: {name, version, description, schema:{input, output}, handlerCode, apiKeyId?} |
| GET | /api/skills/executions | List recent skill executions (audit trail with inputs/outputs/duration) |
| GET | /api/skills/marketplace?q=... | Search/popular list from the SkillsMP marketplace (requires skillsmpApiKey setting) |
| POST | /api/skills/marketplace/install | Install a skill by id from SkillsMP |
| GET | /api/skills/skillssh?q=&limit= | Search the skills.sh registry |
| POST | /api/skills/skillssh/install | Install a skill by id from skills.sh |

Auth: management session/API key. Marketplace search routes accept either management auth or a Bearer API key (isAuthenticated).


Memory

Persistent conversational/factual memory store, scoped per API key / session.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/memory | List memories; filters: ?apiKeyId=, ?type=, ?sessionId=, ?q=, with offset/limit or page/limit pagination |
| POST | /api/memory | Create memory; body validated by Zod: {content, key, type?, sessionId?, apiKeyId?, metadata?, expiresAt?} |
| GET | /api/memory/[id] | Retrieve one memory |
| DELETE | /api/memory/[id] | Delete a memory |
| GET | /api/memory/health | Memory subsystem health (DB connectivity, embeddings backend, vector index status) |

Auth: management session/API key (requireManagementAuth). type enum: FACTUAL, EPISODIC, SEMANTIC, PROCEDURAL (see MemoryType in src/lib/memory/types.ts).
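The POST /api/memory body and the type enum above can be mirrored client-side. The sketch below is an assumption-laden illustration: `build_memory_body` is a hypothetical helper, and treating `content` and `key` as required non-empty strings is inferred only from the absence of `?` on those fields. The server-side Zod schema remains the authority.

```python
from enum import Enum
from typing import Optional


class MemoryType(str, Enum):
    # Values listed in this section (see MemoryType in src/lib/memory/types.ts).
    FACTUAL = "FACTUAL"
    EPISODIC = "EPISODIC"
    SEMANTIC = "SEMANTIC"
    PROCEDURAL = "PROCEDURAL"


OPTIONAL_FIELDS = ("sessionId", "apiKeyId", "metadata", "expiresAt")


def build_memory_body(content: str, key: str,
                      memory_type: Optional[str] = None, **optional: object) -> dict:
    """Assemble a POST /api/memory body (hypothetical client-side check)."""
    if not content or not key:
        raise ValueError("content and key are required")
    body: dict = {"content": content, "key": key}
    if memory_type is not None:
        # Enum lookup raises ValueError for values outside the documented enum.
        body["type"] = MemoryType(memory_type).value
    for name in OPTIONAL_FIELDS:
        if name in optional:
            body[name] = optional[name]
    return body


memory = build_memory_body("User prefers TypeScript", "pref.lang",
                           memory_type="FACTUAL", sessionId="s1")
```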


MCP Server

OmniRoute ships an embedded Model Context Protocol server with 3 transports (stdio, SSE, streamable-http) and scoped tools. The dashboard endpoints below read status/audit data and proxy the HTTP transports.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/mcp/status | Heartbeat, transport, online state, last call, top tools, 24h success rate |
| GET | /api/mcp/tools | List of MCP tools with name, description, scopes, phase, auditLevel, sourceEndpoints |
| GET | /api/mcp/sse | Open SSE stream for the SSE transport (returns 503 if MCP disabled or transport mismatch) |
| POST | /api/mcp/sse | Send JSON-RPC frame on the SSE transport |
| GET | /api/mcp/stream | Open SSE side of the Streamable HTTP transport (server-initiated messages) |
| POST | /api/mcp/stream | Send JSON-RPC frame on the Streamable HTTP transport |
| DELETE | /api/mcp/stream | End a Streamable HTTP session |
| GET | /api/mcp/audit | Query audit log; filters: ?limit=, ?offset=, ?tool=, ?success=true\|false, ?apiKeyId= |
| GET | /api/mcp/audit/stats | Aggregate audit stats (totals, success rate, avg duration, top tools) |

Auth: the sse/stream transports honor the MCP-specific auth surface (Bearer API key with mcp scope); the status/tools/audit* routes are readable from the dashboard (no extra auth required beyond reaching the dashboard host).

Both HTTP transports are gated by settings.mcpEnabled and settings.mcpTransport: a transport mismatch returns 400, and a disabled MCP server returns 503.


A2A Server

OmniRoute exposes an A2A (Agent-to-Agent) JSON-RPC 2.0 endpoint plus a REST wrapper for inspection/dashboard use.

JSON-RPC

POST /a2a
Authorization: Bearer your-api-key   # optional unless OMNIROUTE_API_KEY is set
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "message/send",
  "params": {
    "skill": "smart-routing",
    "messages": [{"role": "user", "content": "Route this coding task"}]
  }
}

Supported methods (all gated on settings.a2aEnabled):

| Method | Description |
| ------ | ----------- |
| message/send | Synchronous skill execution; returns {task, artifacts, metadata} |
| message/stream | Streaming SSE execution of the same skill set |
| tasks/get | Fetch a task by taskId |
| tasks/cancel | Cancel a task by taskId |

Built-in skills: smart-routing, quota-management, provider-discovery, cost-analysis, health-report.
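The JSON-RPC envelopes for the methods above can be built with a small helper. This is a sketch rather than a client library: the helper names are hypothetical, the `message/send` params mirror the request example in this section, and the `{"taskId": ...}` params shape for `tasks/get` is an assumption based on the method description.

```python
import json

BUILT_IN_SKILLS = {
    "smart-routing", "quota-management", "provider-discovery",
    "cost-analysis", "health-report",
}


def rpc(method: str, params: dict, request_id: int = 1) -> dict:
    """Wrap params in a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def message_send(skill: str, text: str, request_id: int = 1) -> dict:
    """Build a message/send request for one of the built-in skills."""
    if skill not in BUILT_IN_SKILLS:
        raise ValueError(f"not a built-in skill: {skill!r}")
    params = {"skill": skill, "messages": [{"role": "user", "content": text}]}
    return rpc("message/send", params, request_id)


def tasks_get(task_id: str, request_id: int = 1) -> dict:
    """Build a tasks/get request (params shape is an assumption)."""
    return rpc("tasks/get", {"taskId": task_id}, request_id)


send_req = message_send("smart-routing", "Route this coding task")
print(json.dumps(send_req, indent=2))
```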

Agent Card

GET /.well-known/agent.json

Returns the public A2A agent card (name, description, capabilities, skill catalog, auth scheme) — cached publicly for 1h. No auth required.

REST helpers

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/a2a/status | A2A enabled + task stats + cached agent card summary |
| GET | /api/a2a/tasks | List tasks; filters: ?state=submitted\|working\|completed\|failed\|cancelled, ?skill=, ?limit= (≤200), ?offset= |
| POST | /api/a2a/tasks | (Not implemented as a REST helper; create via JSON-RPC message/send) |
| GET | /api/a2a/tasks/[id] | Retrieve one task |
| POST | /api/a2a/tasks/[id]/cancel | Cancel a task |

Auth: the REST helpers run without management auth (dashboard-readable); the JSON-RPC /a2a route uses Bearer OMNIROUTE_API_KEY if configured.
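A sketch of building the GET /api/a2a/tasks query string with the documented state values and the ≤200 limit. The function name and its default `limit` of 50 are assumptions for illustration; clamping on the client is a convenience, not documented server behavior.

```python
from typing import Optional
from urllib.parse import urlencode

TASK_STATES = ("submitted", "working", "completed", "failed", "cancelled")
MAX_LIMIT = 200  # upper bound stated in the table above


def tasks_url(state: Optional[str] = None, skill: Optional[str] = None,
              limit: int = 50, offset: int = 0) -> str:
    """Return the relative URL for listing A2A tasks."""
    if state is not None and state not in TASK_STATES:
        raise ValueError(f"unknown state: {state!r}")
    params: dict = {}
    if state is not None:
        params["state"] = state
    if skill is not None:
        params["skill"] = skill
    # Clamp rather than fail so callers cannot exceed the documented maximum.
    params["limit"] = max(1, min(limit, MAX_LIMIT))
    params["offset"] = max(0, offset)
    return "/api/a2a/tasks?" + urlencode(params)


url = tasks_url(state="failed", limit=500)
```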


Cloud, Evals & Assess

| Method | Path | Description |
| ------ | ---- | ----------- |
| POST | /api/cloud/auth | Verify a Bearer key and return masked provider connections + model aliases for cloud sync clients |
| POST | /api/cloud/credentials/update | Update encrypted credentials for a cloud-synced provider |
| POST | /api/cloud/model/resolve | Resolve a logical model id to a concrete provider/model using the local routing table |
| GET | /api/cloud/models/alias | List model aliases as exposed to cloud sync |
| GET | /api/assess | Read latest assessment categorizations (per-provider/model) |
| POST | /api/assess | Run an assessment; body: {scope: {type:"all"} \| {type:"provider", providerId} \| {type:"model", modelId}, trigger?} |
| GET | /api/evals | List built-in eval suites + most recent runs |
| POST | /api/evals | Trigger an eval run |
| POST | /api/evals/suites | Create a custom eval suite; body validated by evalSuiteSaveSchema |
| GET | /api/evals/suites/[id] | Retrieve a custom eval suite |

Auth: /api/cloud/auth validates a Bearer key directly; the other /api/cloud/*, /api/evals/*, and /api/assess routes require management session/API key. /api/assess POST uses validateBody with a discriminated-union scope schema.
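The discriminated-union scope accepted by POST /api/assess has three shapes, each keyed by `type`. The sketch below builds them client-side; the helper names are hypothetical, and the rule that `providerId` and `modelId` must be non-empty is an assumption rather than a statement about the server schema.

```python
from typing import Optional


def assess_scope(kind: str, provider_id: Optional[str] = None,
                 model_id: Optional[str] = None) -> dict:
    """Return one variant of the scope union: all, provider, or model."""
    if kind == "all":
        return {"type": "all"}
    if kind == "provider":
        if not provider_id:
            raise ValueError("provider scope needs providerId")
        return {"type": "provider", "providerId": provider_id}
    if kind == "model":
        if not model_id:
            raise ValueError("model scope needs modelId")
        return {"type": "model", "modelId": model_id}
    raise ValueError(f"unknown scope type: {kind!r}")


def assess_body(scope: dict, trigger: Optional[str] = None) -> dict:
    """Wrap a scope in the POST /api/assess body; trigger is optional."""
    body: dict = {"scope": scope}
    if trigger is not None:
        body["trigger"] = trigger
    return body


provider_run = assess_body(assess_scope("provider", provider_id="openai"))
```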


ACP (Agent Client Protocol) Management

OmniRoute runs ACP-compatible CLI agents as child processes. These endpoints manage ACP agent detection and custom agent registration.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/acp/agents | List all known CLI agents (built-in + custom) with installation status, version, binary |
| POST | /api/acp/agents | Register a custom ACP agent or refresh cache; body: {id, name, binary, versionCommand, providerAlias, spawnArgs, protocol} or {action: "refresh"} |
| DELETE | /api/acp/agents | Remove a custom ACP agent; query param: ?id=<agentId> |

Response example (GET /api/acp/agents):

{
  "agents": [
    {
      "id": "claude",
      "name": "Claude Code CLI",
      "binary": "claude",
      "version": "1.0.45",
      "installed": true,
      "protocol": "stdio",
      "providerAlias": "claude",
      "isCustom": false
    },
    {
      "id": "my-custom-cli",
      "name": "My Custom CLI",
      "installed": false,
      "protocol": "stdio",
      "providerAlias": "my-provider",
      "isCustom": true
    }
  ],
  "cacheTtlMs": 60000,
  "cacheAge": 1234
}

Auth: Requires management session (dashboard auth_token cookie) or a management-scoped API key.

See ACP Framework for full details.


Analytics & Observability

Real-time analytics endpoints for monitoring routing, compression, and provider diversity. These power the /dashboard/analytics/* pages.

Auto-routing analytics

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/analytics/auto-routing | Aggregate auto-routing stats: total calls, strategy distribution, tier distribution, top providers |
| GET | /api/analytics/auto-routing?days=7 | Time-windowed stats (default 24h) |

Response example:

{
  "window": "24h",
  "totalCalls": 1234,
  "strategyBreakdown": {
    "rules": 800,
    "cost": 200,
    "latency": 150,
    "sla-aware": 50,
    "lkgp": 34
  },
  "tierBreakdown": {
    "ultra": 100,
    "pro": 500,
    "standard": 400,
    "free": 234
  },
  "topProviders": [
    { "provider": "openai", "calls": 500, "avgLatencyMs": 850 },
    { "provider": "anthropic", "calls": 300, "avgLatencyMs": 1200 }
  ]
}

Compression analytics

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/analytics/compression | Aggregate compression stats: tokens saved, savings %, mode distribution, engine usage |

Response example:

{
  "window": "24h",
  "totalOriginalTokens": 5000000,
  "totalCompressedTokens": 3500000,
  "totalSavings": 1500000,
  "savingsPct": 30.0,
  "modeBreakdown": {
    "lite": 400,
    "standard": 600,
    "aggressive": 100,
    "ultra": 50,
    "rtk": 84
  },
  "engineBreakdown": {
    "caveman": 800,
    "rtk": 434
  }
}
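The savings fields in the response example are consistent with simple arithmetic on the two token totals. The sketch below reproduces them; the function name is hypothetical, and rounding to one decimal place is an assumption about how `savingsPct` is presented.

```python
def compression_savings(original_tokens: int, compressed_tokens: int) -> tuple:
    """Return (tokens saved, savings percentage) for a pair of token totals."""
    saved = original_tokens - compressed_tokens
    # Guard against division by zero when no traffic was recorded.
    pct = round(100.0 * saved / original_tokens, 1) if original_tokens else 0.0
    return saved, pct


# Totals taken from the response example above.
saved, pct = compression_savings(5_000_000, 3_500_000)
print(saved, pct)
```

With the example totals this gives 1500000 tokens saved and 30.0 percent, matching `totalSavings` and `savingsPct`.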

Provider diversity tracking

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/analytics/diversity | Shannon entropy-based diversity tracking: prevents single points of failure by measuring provider spread |

Response example:

{
  "window": "24h",
  "shannonEntropy": 2.45,
  "maxEntropy": 3.17,
  "diversityRatio": 0.77,
  "providerUsage": {
    "openai": 0.4,
    "anthropic": 0.25,
    "google": 0.2,
    "kiro": 0.15
  },
  "warnings": ["OpenAI accounts for 40% of traffic — consider diversifying"]
}
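For readers unfamiliar with the metric, here is a sketch of Shannon entropy over provider traffic shares. It assumes base-2 logarithms, a maximum of log2(n) for n providers, and a ratio defined as entropy divided by maximum; the sample values above fit that reading (2.45 / 3.17 ≈ 0.77, and 3.17 ≈ log2(9)), but the service's exact formula is not confirmed here. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def shannon_entropy(shares: Sequence[float]) -> float:
    """Entropy in bits of a share distribution; zero shares contribute nothing."""
    return -sum(p * math.log2(p) for p in shares if p > 0)


def diversity(shares: Sequence[float], provider_count: Optional[int] = None) -> tuple:
    """Return (entropy, max entropy, ratio) for the given traffic shares."""
    active = sum(1 for p in shares if p > 0)
    n = provider_count if provider_count is not None else active
    h = shannon_entropy(shares)
    h_max = math.log2(n) if n > 1 else 0.0
    ratio = h / h_max if h_max else 0.0
    return h, h_max, ratio


# Shares from the providerUsage block above, treated as the full distribution.
h, h_max, ratio = diversity([0.4, 0.25, 0.2, 0.15])
print(round(h, 3), round(h_max, 3), round(ratio, 3))
```

Treating the four listed shares as the whole distribution gives about 1.90 bits against a maximum of 2.0, which differs from the sample's 2.45 and 3.17. That gap fits the sample having been computed over more providers than the four shown.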

Auth: Requires management session or management-scoped API key.


Admin Operations

Admin-only endpoints for operational management.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/admin/concurrency | Read current concurrency limits (global + per-provider) |
| POST | /api/admin/concurrency | Update concurrency limits; body: {global?: number, perProvider?: Record<string, number>} |

Auth: Requires management session with admin scope.


CLI Tools Management

Manage CLI tools that integrate with OmniRoute (antigravity, chipotle, commandCode, devin-cli, etc.). See Provider Reference for the full list.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/cli-tools/all-statuses | Status of all CLI tools (installed, version, last seen) |
| GET | /api/cli-tools/[id]/status | Status of a specific CLI tool (id can be: antigravity, chipotle, commandCode, devin-cli, etc.) |
| POST | /api/cli-tools/apply | Apply a CLI tool configuration to a provider connection |
| GET | /api/cli-tools/backups | List CLI tool configuration backups |
| POST | /api/cli-tools/backups | Create a backup of all CLI tool configurations |
| POST | /api/cli-tools/[id]/restore | Restore a CLI tool from a backup |
| GET | /api/cli-tools/antigravity-mitm | Antigravity MITM proxy status (the "antigravity-mitm" CLI tool) |
| POST | /api/cli-tools/antigravity-mitm/alias | Configure antigravity-mitm aliases |

Auth: Requires management session.


Agent Skills

Manage AI agent skills (similar to OpenAI's custom GPTs but for agents).

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/agent-skills | List all agent skills (built-in + custom) |
| GET | /api/agent-skills/[id] | Get a specific agent skill |
| POST | /api/agent-skills | Create a custom agent skill; body: {name, description, prompt, model?, temperature?} |
| PUT | /api/agent-skills/[id] | Update a custom agent skill |
| DELETE | /api/agent-skills/[id] | Delete a custom agent skill |
| GET | /api/agent-skills/[id]/raw | Get raw prompt + metadata (no execution) |
| POST | /api/agent-skills/generate | AI-generate a new skill from a natural language description |

Auth: Requires management session or management-scoped API key.


Cache Management

Manage the semantic cache and reasoning cache.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/cache | Cache overview: total entries, hit rate, size on disk |
| GET | /api/cache/entries | List cached entries (with pagination) |
| DELETE | /api/cache/entries | Delete cache entries (filter by query parameters) |
| GET | /api/cache/stats | Detailed cache statistics (per-provider, per-model) |
| GET | /api/cache/reasoning | Reasoning cache status (for reasoning replay) |
| DELETE | /api/cache/reasoning | Clear reasoning cache; query params: ?toolCallId=<id> (single) or ?provider=<p> or no params (all) |

Auth: Requires management session.


Memory System

Manage persistent memory (FTS5 + vector embeddings).

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/memory | List memory entries (filter by scope, type, search query) |
| POST | /api/memory | Create a new memory entry; body: {scope, type, content, metadata?} |
| GET | /api/memory/[id] | Get a specific memory entry |
| PUT | /api/memory/[id] | Update a memory entry |
| DELETE | /api/memory/[id] | Delete a memory entry |
| GET | /api/memory/search | Search memory (FTS5 + vector) |
| POST | /api/memory/clear | Clear memory entries (with filters) |
| GET | /api/memory/stats | Memory statistics (total entries, embedding coverage, etc.) |

Auth: Requires management session or management-scoped API key.


Webhooks

Manage webhook subscriptions for events.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/webhooks | List all webhook subscriptions |
| POST | /api/webhooks | Create a webhook subscription; body: {url, events[], secret?, active?} |
| GET | /api/webhooks/[id] | Get a specific webhook subscription |
| PUT | /api/webhooks/[id] | Update a webhook subscription |
| DELETE | /api/webhooks/[id] | Delete a webhook subscription |
| GET | /api/webhooks/events | List all available webhook event types |
| GET | /api/webhooks/[id]/deliveries | List delivery history for a webhook (success/failure log) |
| POST | /api/webhooks/[id]/test | Send a test event to a webhook |

Auth: Requires management session.

See Webhooks Framework for full event types.


Skills Framework

Manage Skills (the agentic extensions framework).

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/skills | List all installed skills (built-in + custom) |
| POST | /api/skills/install | Install a skill from a local path or URL |
| DELETE | /api/skills/[id] | Uninstall a skill |
| PUT | /api/skills/[id] | Enable or disable a skill; body: {enabled?: boolean, mode?: "on" \| "off" \| "auto"} |
| POST | /api/skills/executions | Execute a skill; body: {skillName, apiKeyId, input?, sessionId?} |
| GET | /api/skills/executions | List execution history for all skills (filter by ?apiKeyId=) |

Auth: Requires management session or management-scoped API key.

See Skills Framework for full details.


Plugins

Manage OmniRoute plugins (third-party extensions).

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/plugins | List installed plugins |
| POST | /api/plugins/install | Install a plugin from a local path or URL |
| DELETE | /api/plugins/[name] | Uninstall a plugin |
| POST | /api/plugins/[name]/activate | Activate a plugin |
| POST | /api/plugins/[name]/deactivate | Deactivate a plugin |
| GET | /api/plugins/[name]/config | Get plugin configuration |
| PUT | /api/plugins/[name]/config | Update plugin configuration |

Auth: Requires management session.

See Plugins Framework for full details.


Shadow Routing

Shadow / A-B comparison of providers is not a standalone REST surface — it is configured through combo routing (see Auto-Combo). Per-combo comparison metrics are served by GET /api/combos/metrics.


Guardrails

Inspect the runtime guardrails (PII detection, prompt injection detection, vision bridging). Guardrails run on every request; per-call opt-out is via the x-omniroute-disabled-guardrails request header — there is no persisted enable/disable surface.

| Method | Path | Description |
| ------ | ---- | ----------- |
| GET | /api/guardrails | List the registered guardrails and their status (name / enabled / priority) |
| POST | /api/guardrails/test | Dry-run the pre-call pipeline over a sample input; body: {input, disabledGuardrails?} |

Auth: Requires management session.

See Security > Guardrails for full details.



Authentication

  • Dashboard routes (/dashboard/*) use the auth_token cookie.
  • Login checks the saved password hash and falls back to INITIAL_PASSWORD.
  • requireLogin can be toggled via /api/settings/require-login.
  • /v1/* routes require a Bearer API key only when REQUIRE_API_KEY=true.

Breaking change (v3.8.0): /api/v1/agents/tasks/* and the cooldown management endpoints now require management auth (dashboard auth_token cookie or a management-scoped API key). Clients that previously called these routes unauthenticated will receive 401 Unauthorized. See commit 588a0333 (fix(auth): require management auth for agent and cooldown APIs).
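A sketch of how a client might choose credentials per route family under the rules above. The function name is hypothetical, and sending a management-scoped API key as a Bearer token is an assumption; the Cookie form matches the curl examples elsewhere in this reference.

```python
from typing import Optional


def auth_headers(path: str, api_key: Optional[str] = None,
                 auth_token: Optional[str] = None) -> dict:
    """Pick request headers for a route based on its prefix."""
    if path.startswith("/v1/"):
        # Bearer key is only enforced when REQUIRE_API_KEY=true; send it if we have one.
        return {"Authorization": f"Bearer {api_key}"} if api_key else {}
    if auth_token:
        # Dashboard session cookie, as in the curl examples.
        return {"Cookie": f"auth_token={auth_token}"}
    if api_key:
        # Assumption: management-scoped keys are also sent as Bearer tokens.
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError("management routes need a session cookie or a management-scoped API key")


chat_headers = auth_headers("/v1/chat/completions", api_key="your-api-key")
mgmt_headers = auth_headers("/api/resilience/model-cooldowns", auth_token="token-value")
```

Failing early on a management route without credentials surfaces the v3.8.0 requirement in the client instead of as a 401 from the server.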