Codex Proxy API Reference

May 9, 2026

Authentication

All proxy endpoints (chat, messages, responses) optionally accept Authorization: Bearer {proxy_api_key}. The Dashboard UI uses a cookie-based session (_codex_session).


API Proxy Endpoints

POST /v1/chat/completions

OpenAI-compatible chat completion.

// Request
{
  "model": "o4-mini",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "reasoning_effort": "medium"  // optional: low | medium | high | xhigh
}
  • Streaming: SSE with choice.delta events
  • Non-streaming: { id, choices, usage }
  • Errors: { error: { message, type, code } }
  • max_tokens, max_completion_tokens, and max_output_tokens are accepted for client compatibility but are not forwarded to Codex.
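The streaming behaviour above can be consumed roughly as follows. This is a minimal sketch, assuming the proxy follows the usual OpenAI SSE framing (`data: {json}` lines ending with a `data: [DONE]` sentinel); the function name `collect_chat_text` is hypothetical and the input is any iterable of already-decoded lines.

```python
import json
from typing import Iterable


def collect_chat_text(sse_lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style chat completion SSE lines."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        # Only "data:" lines carry payloads; skip blank keep-alives and comments.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

With an HTTP client that exposes the response body line by line, the same function can be fed directly from the stream.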

POST /v1/messages

Anthropic Messages API compatible.

// Request
{
  "model": "claude-sonnet-4-20250514",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 1024,
  "stream": true,
  "thinking": {"type": "enabled"}  // optional
}
  • Auth: x-api-key or Authorization: Bearer
  • Errors: { type: "error", error: { type, message } }

POST /v1beta/models/:model:generateContent

POST /v1beta/models/:model:streamGenerateContent

Google Gemini compatible.

// Request
{
  "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
  "generationConfig": {"temperature": 0.7, "maxOutputTokens": 1024},
  "systemInstruction": {"parts": [{"text": "You are helpful."}]}
}
  • Auth: x-goog-api-key header, key query param, or Bearer token
  • Errors: { error: { code, message, status } }
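The three auth options and the two Gemini paths can be combined as in the sketch below. The helper name `build_gemini_request` and the base URL used in the usage note are assumptions, since this reference does not state the proxy's listen address.

```python
from typing import Dict, Tuple
from urllib.parse import quote, urlencode


def build_gemini_request(base_url: str, model: str, api_key: str,
                         stream: bool = False,
                         auth: str = "header") -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a Gemini-compatible call through the proxy."""
    action = "streamGenerateContent" if stream else "generateContent"
    url = f"{base_url.rstrip('/')}/v1beta/models/{quote(model, safe='')}:{action}"
    headers = {"Content-Type": "application/json"}
    if auth == "header":
        headers["x-goog-api-key"] = api_key
    elif auth == "query":
        url += "?" + urlencode({"key": api_key})
    elif auth == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        raise ValueError("auth must be 'header', 'query', or 'bearer'")
    return url, headers
```

For example, `build_gemini_request("http://localhost:8080", "gpt-5.5", "my-key", stream=True)` yields a URL ending in `/v1beta/models/gpt-5.5:streamGenerateContent`; the host and port here are placeholders.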

POST /v1/responses

Native Codex Responses API passthrough (WebSocket transport).

// Request
{
  "model": "o4-mini",
  "instructions": "You are helpful.",
  "input": [{"type": "message", "content": "Hello"}],
  "stream": true,
  "reasoning": {"effort": "medium"},
  "tools": [],
  "previous_response_id": "resp_xxx"  // multi-turn
}
  • Streaming: SSE with response.created, response.output_text.delta, response.completed
  • Non-streaming: { response, usage, responseId }
  • Do not send max_output_tokens to native Codex. The proxy accepts it only for compatibility and strips it, because the real Codex backend rejects it with 400 Unsupported parameter: max_output_tokens.
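A client that may also talk to the native Codex backend directly can drop the unsupported field before sending. The sketch below is one way to do that; `prepare_responses_body` is a hypothetical helper name and it does not describe the proxy's internal implementation.

```python
from typing import Any, Dict


def prepare_responses_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a /v1/responses body without max_output_tokens."""
    # The native backend is documented above as rejecting this parameter.
    return {key: value for key, value in body.items() if key != "max_output_tokens"}
```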

image_generation tool

Declare {"type": "image_generation", ...} in tools[] to let the model invoke the server-side image generation backend (gpt-image-2). Requires a ChatGPT Plus or higher account — free plans have the tool silently stripped upstream and the model falls back to returning SVG text.

Supported fields (all optional except type):

| Field | Enum / range | Default | Notes |
| --- | --- | --- | --- |
| size | 1024x1024, 1024x1536, 1536x1024, 2048x2048, 2048x3072, 3072x2048, 3840x2160 (4K UHD), 2160x3840 (4K portrait), 2304x3072 (3:4), auto | auto | Width and height must both be divisible by 16. Longest edge ≤ 3840 px. Total pixel budget ≈ 8 MP (3072x3072 is rejected). Resolutions below 1024 px are also rejected (minimum pixel budget). |
| output_format | png / jpeg / webp | png | gif is rejected |
| output_compression | integer 0–100 | 100 | jpeg / webp only; PNG rejects any value other than 100 |
| background | auto / opaque | auto | transparent is rejected for this model |
| moderation | auto / low | auto | other values are rejected |
| partial_images | integer 0–3 | 0 | values above 3 are rejected |
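The size constraints in the Notes column can be checked client-side before a request is sent. The sketch below is illustrative only: the exact pixel limits are assumptions (the upper bound is taken as 3840×2160, the largest listed size, and the lower bound as 1024×1024), and it does not check membership in the enum list itself.

```python
import re
from typing import List

MAX_EDGE = 3840
MAX_PIXELS = 3840 * 2160  # assumption: "≈ 8 MP" read as the largest listed size
MIN_PIXELS = 1024 * 1024  # assumption: minimum budget read as 1024x1024


def check_size(size: str) -> List[str]:
    """Return a list of constraint violations; an empty list means it looks valid."""
    if size == "auto":
        return []
    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if not match:
        return ["size must be 'auto' or WIDTHxHEIGHT"]
    width, height = int(match.group(1)), int(match.group(2))
    problems = []
    if width % 16 or height % 16:
        problems.append("width and height must both be divisible by 16")
    if max(width, height) > MAX_EDGE:
        problems.append("longest edge exceeds 3840 px")
    if width * height > MAX_PIXELS:
        problems.append("total pixel budget exceeded")
    if width * height < MIN_PIXELS:
        problems.append("below minimum pixel budget")
    return problems
```

Upstream validation remains authoritative; a size that passes this check may still be rejected if it is not one of the listed values.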

Silently rewritten / hard-rejected fields:

  • model — whatever you send, upstream forces gpt-image-2.
  • quality — any value is echoed back as auto; the user-supplied value has no effect.
  • n — rejected (unknown_parameter); one image per call.
  • input_image, mask, input_fidelity, style, response_format — rejected.

Event stream order (when the model invokes the tool):

  1. response.created — echoes tools[] with upstream-normalized fields.
  2. response.output_item.added with {type: "image_generation_call", ...}.
  3. response.image_generation_call.in_progress, then .generating, then (optional) .partial_image × N.
  4. response.output_item.done — the completed image_generation_call with:
    • result — base64-encoded image bytes (PNG / JPEG / WebP by output_format).
    • revised_prompt — the final prompt the model actually used.
  5. response.completed.
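Step 4 of the sequence above is where the image bytes arrive. The sketch below pulls them out of a single decoded SSE payload. It assumes the completed call is nested under an `item` key, as in the OpenAI Responses event shape; that key and the helper name `extract_image` are not confirmed by this reference.

```python
import base64
import json
from typing import Optional, Tuple


def extract_image(event_json: str) -> Optional[Tuple[bytes, str]]:
    """Return (image_bytes, revised_prompt) from an output_item.done event, else None."""
    event = json.loads(event_json)
    if event.get("type") != "response.output_item.done":
        return None
    item = event.get("item") or {}  # assumption: the call object lives under "item"
    if item.get("type") != "image_generation_call" or not item.get("result"):
        return None
    return base64.b64decode(item["result"]), item.get("revised_prompt", "")
```

The returned bytes can be written to disk with an extension matching the requested output_format.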

Token accounting: response.completed.response.usage reports the host model's tokens; the image_generation tool's own tokens come back separately as response.completed.response.tool_usage.image_gen.{input_tokens, output_tokens, total_tokens}. The proxy passes both through verbatim, and tracks them as separate counters on the dashboard (total_image_input_tokens / total_image_output_tokens) so image-gen usage doesn't pollute host-model token charts.

Request accounting: the proxy also counts each image_generation request as a success or a failure. total_image_request_count increments when the upstream returned a real image (non-zero tool_usage.image_gen.output_tokens); total_image_request_failed_count increments when the tool was silently stripped (Free plan), the upstream returned an error, or the response came back empty. Both counters are surfaced in /admin/usage-stats/summary and the Dashboard's "Image Requests" card.
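The success rule described above can be applied on the client side to a decoded response.completed event. This is a sketch of that rule only, not the proxy's counter code; `classify_image_request` is a hypothetical name.

```python
from typing import Any, Dict


def classify_image_request(completed_event: Dict[str, Any]) -> str:
    """Return "success" when image_gen reports non-zero output tokens, else "failure"."""
    response = completed_event.get("response") or {}
    tool_usage = response.get("tool_usage") or {}
    image_usage = tool_usage.get("image_gen") or {}
    # A stripped tool or an empty response leaves output_tokens missing or zero.
    return "success" if image_usage.get("output_tokens", 0) > 0 else "failure"
```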

Edit mode (supply a reference image): put an input_image block in the user message content. data: URLs and HTTPS URLs both work.

{
  "model": "gpt-5.5",
  "stream": true,
  "input": [{
    "role": "user",
    "content": [
      {"type": "input_text", "text": "Make this sky a sunset."},
      {"type": "input_image", "image_url": "data:image/png;base64,AAA...", "detail": "high"}
    ]
  }],
  "tools": [{"type": "image_generation", "size": "1024x1024"}]
}

Legal content-part types (from upstream enum validation): input_text, input_image, output_text, refusal, input_file, computer_screenshot, summary_text.

OpenAI Chat compatibility accepts tools: [{"type":"image_generation"}], but the stable image payload is exposed by /v1/responses as image_generation_call.result. Use /v1/responses for clients that need the base64 image bytes.

Ollama-Compatible Bridge

The optional bridge runs on a separate listener, defaulting to http://127.0.0.1:11434. It is disabled by default and can be controlled through Dashboard settings or the admin API. Ollama endpoints are intentionally unauthenticated; keep the listener bound to localhost unless you explicitly trust the network. Browser CORS access is restricted to loopback origins (localhost, 127.x.x.x, and ::1) so non-local web pages cannot read bridge responses by default. The bridge injects the configured Codex Proxy API key for /v1/* passthrough requests, so exposing it beyond localhost also exposes the main proxy API without requiring clients to know that key.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/version | Version probe → { version } |
| GET | /api/tags | Model list in Ollama format |
| POST | /api/show | Model metadata and capabilities |
| POST | /api/chat | Chat completions, streaming as NDJSON by default |
| Any | /v1/* | OpenAI-compatible passthrough to the main proxy |

// POST http://127.0.0.1:11434/api/chat
{
  "model": "codex",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "think": "medium"  // optional: false | true | low | medium | high | xhigh
}

Supported request mappings:

| Ollama field | Upstream OpenAI field |
| --- | --- |
| messages[].images | content[].image_url data URLs |
| tools | tools |
| think | reasoning_effort |
| format: "json" | response_format: { type: "json_object" } |
| format: { ... } | strict JSON schema response format |
| options.temperature | temperature |
| options.top_p | top_p |
| options.num_predict | max_tokens |
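The mapping table can be read as a small translation function. The sketch below is an approximation under several assumptions that this reference does not confirm: images are treated as PNG when building data URLs, the schema name `ollama_format` is a placeholder, and boolean `think` values are left unmapped because their translation is not documented here.

```python
from typing import Any, Dict


def ollama_to_openai(req: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an Ollama /api/chat body into OpenAI chat-completions fields."""
    out: Dict[str, Any] = {
        "model": req["model"],
        "messages": [],
        "stream": req.get("stream", True),  # the bridge streams by default
    }
    for msg in req.get("messages", []):
        images = msg.get("images") or []
        if images:
            content: Any = [{"type": "text", "text": msg.get("content", "")}]
            content += [
                # Assumption: MIME type is unknown, so PNG is used for the data URL.
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img}"}}
                for img in images
            ]
        else:
            content = msg.get("content", "")
        out["messages"].append({"role": msg["role"], "content": content})
    if "tools" in req:
        out["tools"] = req["tools"]
    think = req.get("think")
    if isinstance(think, str):  # low | medium | high | xhigh
        out["reasoning_effort"] = think
    fmt = req.get("format")
    if fmt == "json":
        out["response_format"] = {"type": "json_object"}
    elif isinstance(fmt, dict):
        out["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "ollama_format", "strict": True, "schema": fmt},
        }
    options = req.get("options") or {}
    for src, dst in (("temperature", "temperature"), ("top_p", "top_p"),
                     ("num_predict", "max_tokens")):
        if src in options:
            out[dst] = options[src]
    return out
```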

Models

| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/models | List models (OpenAI format) |
| GET | /v1/models/catalog | Full catalog with reasoning efforts |
| GET | /v1/models/:id | Single model detail |
| GET | /v1/models/:id/info | Extended model info |
| GET | /v1beta/models | List models (Gemini format) |
| POST | /admin/refresh-models | Force refresh from upstream |

Model catalog entries can include token metadata:

| Field | Meaning |
| --- | --- |
| contextWindow | Static or backend-provided context window for display and client hints |
| maxContextWindow | Backend-provided maximum expandable context window, when reported |
| maxOutputTokens | Static or backend-provided maximum output tokens for display and client hints |
| truncationPolicyLimit | Backend-provided truncation policy limit, when reported |

Static catalog values are defined in config/models.yaml; dynamic entries from /backend-api/codex/models win when the same model ID is returned by upstream. On 2026-05-08, real Codex backend metadata returned context_window=272000, max_context_window=272000, truncation_policy.limit=10000 for gpt-5.5, and context_window=272000, max_context_window=1000000, truncation_policy.limit=10000 for gpt-5.4. Treat these as runtime Codex limits, not as proof that request-level context or max-token switches are supported.
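The precedence rule (dynamic entries win for a shared model ID) can be illustrated with a small merge. This sketch assumes a field-level overlay, where static fields that upstream does not report are kept; whether the proxy overlays fields or replaces whole entries is not stated here, and `merge_catalog` is a hypothetical name.

```python
from typing import Any, Dict, Iterable


def merge_catalog(static_entries: Iterable[Dict[str, Any]],
                  dynamic_entries: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge catalog entries keyed by model ID; dynamic values override static ones."""
    merged = {entry["id"]: dict(entry) for entry in static_entries}
    for entry in dynamic_entries:
        # Assumption: overlay field by field rather than replacing the entry.
        merged[entry["id"]] = {**merged.get(entry["id"], {}), **entry}
    return merged
```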


Account Management

CRUD

| Method | Path | Description |
| --- | --- | --- |
| GET | /auth/accounts | List all accounts |
| POST | /auth/accounts | Add single account ({ token?, refreshToken? }) |
| DELETE | /auth/accounts/:id | Delete account |
| PATCH | /auth/accounts/:id/label | Set label ({ label }) |

Batch Operations

| Method | Path | Description |
| --- | --- | --- |
| POST | /auth/accounts/import | Bulk import ({ accounts: [{token?, refreshToken?, label?}] }) |
| POST | /auth/accounts/batch-delete | Bulk delete ({ ids: [] }) |
| POST | /auth/accounts/batch-status | Bulk enable/disable ({ ids: [], status: "active" or "disabled" }) |

Health & Quota

| Method | Path | Description |
| --- | --- | --- |
| POST | /auth/accounts/health-check | Check accounts ({ ids?, stagger_ms?, concurrency? }) |
| POST | /auth/accounts/:id/refresh | Refresh single account |
| GET | /auth/accounts/:id/quota | Get quota/usage |
| POST | /auth/accounts/:id/reset-usage | Reset usage counters |

Export

| Method | Path | Description |
| --- | --- | --- |
| GET | /auth/accounts/export | Export accounts (?ids=a,b&format=minimal) |

Cookies (Cloudflare)

| Method | Path | Description |
| --- | --- | --- |
| GET | /auth/accounts/:id/cookies | Get stored cookies |
| POST | /auth/accounts/:id/cookies | Set cookies ({ cookies }) |
| DELETE | /auth/accounts/:id/cookies | Clear cookies |

OAuth & Login

| Method | Path | Description |
| --- | --- | --- |
| POST | /auth/login-start | Start OAuth → { authUrl, state } |
| GET | /auth/login | 302 redirect to Auth0 |
| POST | /auth/code-relay | OAuth code exchange ({ callbackUrl }) |
| GET | /auth/callback | OAuth callback handler |
| POST | /auth/device-login | Start device code flow |
| GET | /auth/device-poll/:deviceCode | Poll device authorization |
| POST | /auth/import-cli | Import from Codex CLI auth.json |
| POST | /auth/token | Manual token submit |
| GET | /auth/status | Auth status + pool summary |
| POST | /auth/logout | Clear all accounts |

Proxy Pool Management

CRUD

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/proxies | List proxies with health & assignments |
| POST | /api/proxies | Add proxy ({ url } or { host, port, username, password }) |
| PUT | /api/proxies/:id | Update proxy |
| DELETE | /api/proxies/:id | Delete proxy |

Health & Control

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/proxies/:id/check | Health check single proxy |
| POST | /api/proxies/check-all | Health check all proxies |
| POST | /api/proxies/:id/enable | Enable proxy |
| POST | /api/proxies/:id/disable | Disable proxy |

Assignments (Account ↔ Proxy)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/proxies/assignments | List all assignments |
| POST | /api/proxies/assign | Assign proxy to account ({ accountId, proxyId }) |
| DELETE | /api/proxies/assign/:accountId | Unassign |
| POST | /api/proxies/assign-bulk | Bulk assign ({ assignments: [] }) |
| POST | /api/proxies/assign-rule | Auto-assign by rule ({ rule: "round-robin", ... }) |

Import/Export

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/proxies/export | Export as YAML |
| POST | /api/proxies/import | Import YAML or plain text (host:port:user:pass) |
| GET | /api/proxies/assignments/export | Export assignments |
| POST | /api/proxies/assignments/import | Preview assignment import |
| POST | /api/proxies/assignments/apply | Apply assignment import |

Settings

| Method | Path | Description |
| --- | --- | --- |
| PUT | /api/proxies/settings | Update health check interval |

Admin & Settings

General

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/general-settings | Get all settings |
| POST | /admin/general-settings | Update settings (returns restart_required) |
| GET | /admin/settings | Get proxy API key |
| POST | /admin/settings | Set proxy API key |
| GET | /admin/rotation-settings | Get rotation strategy |
| POST | /admin/rotation-settings | Set rotation strategy |
| GET | /admin/quota-settings | Get quota settings |
| POST | /admin/quota-settings | Set quota settings |
| GET | /admin/ollama-settings | Get Ollama Bridge settings plus runtime status |
| POST | /admin/ollama-settings | Persist Ollama Bridge settings and restart the bridge |
| GET | /admin/ollama-status | Get Ollama Bridge runtime status |

Diagnostics

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health probe → { status, authenticated, pool } |
| POST | /admin/test-connection | Full connectivity diagnostics |
| GET | /debug/fingerprint | TLS fingerprint config (localhost only) |
| GET | /debug/diagnostics | System diagnostics (localhost only) |
| GET | /debug/models | Model store internals |

Official Codex App Server Bridge

Optional bridge to a local official codex app-server instance. This is the path for using official Codex app plugins such as the Chrome/browser plugin. It is disabled by default with official_agent.enabled: false.

All endpoints require official_agent.api_key; the bridge refuses requests when the dedicated official-agent API key is not configured. Do not reuse server.proxy_api_key here, because this bridge can drive local app-server plugins and approval flows.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /official-agent/apps | List official Codex apps/connectors from app/list |
| POST | /official-agent/threads | Start an app-server thread ({ model?, cwd? }) |
| POST | /official-agent/threads/:threadId/turns | Start a turn and stream app-server notifications as SSE |

approvalPolicy, when provided on a turn, must be one of untrusted, on-request, on-failure, or never.

Example turn using an official Chrome app mention:

{
  "text": "Open localhost:8080 and inspect the dashboard",
  "app": { "id": "chrome", "name": "Chrome" }
}

The bridge sends a text item plus a mention item with path: "app://{id}". Use /official-agent/apps to discover the real app id before hard-coding one.
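A turn body with an app mention and an approval policy can be assembled and validated as sketched below. The placement of approvalPolicy as a top-level key next to text and app is an assumption, and both helper names are hypothetical.

```python
from typing import Any, Dict, Optional

APPROVAL_POLICIES = ("untrusted", "on-request", "on-failure", "never")


def mention_path(app_id: str) -> str:
    """Path the bridge is described as attaching to the mention item."""
    return f"app://{app_id}"


def build_turn_body(text: str, app_id: Optional[str] = None,
                    app_name: Optional[str] = None,
                    approval_policy: Optional[str] = None) -> Dict[str, Any]:
    """Build a body for POST /official-agent/threads/:threadId/turns."""
    body: Dict[str, Any] = {"text": text}
    if app_id is not None:
        body["app"] = {"id": app_id, "name": app_name or app_id}
    if approval_policy is not None:
        if approval_policy not in APPROVAL_POLICIES:
            raise ValueError(f"approvalPolicy must be one of {APPROVAL_POLICIES}")
        body["approvalPolicy"] = approval_policy  # assumption: top-level key
    return body
```

The app id passed in should come from /official-agent/apps rather than a hard-coded value.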

Updates

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/update-status | Check update availability |
| POST | /admin/check-update | Trigger update check |
| POST | /admin/apply-update | Apply self-update (SSE progress stream) |

Usage Statistics

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/usage-stats/summary | Cumulative usage by account/model |
| GET | /admin/usage-stats/history | Time-series data (?granularity=hourly&hours=24) |

Quota Warnings

| Method | Path | Description |
| --- | --- | --- |
| GET | /auth/quota/warnings | Active quota warnings |

When quota.skip_exhausted is enabled, account acquisition filters out active accounts whose cached quota has rate_limit.limit_reached === true, secondary_rate_limit.limit_reached === true, or code_review_rate_limit.limit_reached === true. This happens before session affinity, so preferredEntryId cannot keep a request on an exhausted account. Near-full quota such as used_percent=99 is not skipped until upstream marks limit_reached or the account receives a 429 and enters rate_limited backoff. Secondary and code-review cache windows are cleared after their own reset_at passes.
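The ordering described above (filter exhausted accounts first, then apply session affinity) can be sketched as follows. The account dictionary layout and the fallback to the first remaining candidate are simplifying assumptions; the real rotation strategy is configurable and is not reproduced here.

```python
from typing import Any, Dict, List, Optional

LIMIT_KEYS = ("rate_limit", "secondary_rate_limit", "code_review_rate_limit")


def is_exhausted(quota: Dict[str, Any]) -> bool:
    """True when any cached window reports limit_reached === true."""
    return any((quota.get(key) or {}).get("limit_reached") is True for key in LIMIT_KEYS)


def eligible_accounts(accounts: List[Dict[str, Any]],
                      skip_exhausted: bool = True) -> List[Dict[str, Any]]:
    """Keep active accounts, dropping exhausted ones when skip_exhausted is on."""
    active = [a for a in accounts if a.get("status") == "active"]
    if not skip_exhausted:
        return active
    return [a for a in active if not is_exhausted(a.get("quota") or {})]


def pick_account(accounts: List[Dict[str, Any]],
                 preferred_entry_id: Optional[str] = None,
                 skip_exhausted: bool = True) -> Optional[Dict[str, Any]]:
    """Apply the quota filter before affinity so an exhausted account is never kept."""
    candidates = eligible_accounts(accounts, skip_exhausted)
    for account in candidates:
        if account["id"] == preferred_entry_id:
            return account
    # Assumption: fall back to the first candidate instead of a real rotation strategy.
    return candidates[0] if candidates else None
```

Note that an account at used_percent=99 stays eligible in this sketch, matching the rule that only limit_reached triggers the skip.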


Dashboard Auth

| Method | Path | Description |
| --- | --- | --- |
| POST | /auth/dashboard-login | Login with password → sets session cookie (rate limited: 5/min) |
| POST | /auth/dashboard-logout | Clear session |
| GET | /auth/dashboard-status | Check if login required |

Error Formats

Each protocol returns errors in its native format:

| Protocol | Format |
| --- | --- |
| OpenAI | { error: { message, type, code, param } } |
| Anthropic | { type: "error", error: { type, message } } |
| Gemini | { error: { code, message, status } } |
| Responses | { type: "error", error: { type, code, message } } |
| Admin | { error: "..." } |

Common HTTP status codes: 401 (not authenticated), 429 (rate limited), 503 (no available accounts).
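Because every format above carries an error key, a client that talks to several endpoints can extract a message with one helper. This is a minimal sketch based only on the shapes listed in the table; `error_message` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def error_message(body: Dict[str, Any]) -> Optional[str]:
    """Return the human-readable message from any of the listed error formats."""
    err = body.get("error")
    if isinstance(err, str):  # Admin format: { error: "..." }
        return err
    if isinstance(err, dict):  # OpenAI, Anthropic, Gemini, Responses
        return err.get("message")
    return None
```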