Codex Proxy API Reference

May 9, 2026 · View on GitHub

Authentication

All proxy endpoints (chat/messages/responses) optionally accept Authorization: Bearer {proxy_api_key}. Dashboard UI uses cookie-based session (_codex_session).

API Proxy Endpoints

POST /v1/chat/completions

OpenAI-compatible chat completion.

// Request
{
  "model": "o4-mini",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "reasoning_effort": "medium"  // optional: low | medium | high | xhigh
}

Streaming: SSE with choice.delta events
Non-streaming: { id, choices, usage }
Errors: { error: { message, type, code } }
max_tokens, max_completion_tokens, and max_output_tokens are accepted for client compatibility but are not forwarded to Codex.

POST /v1/messages

Anthropic Messages API compatible.

// Request
{
  "model": "claude-sonnet-4-20250514",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 1024,
  "stream": true,
  "thinking": {"type": "enabled"}  // optional
}

Auth: x-api-key or Authorization: Bearer
Errors: { type: "error", error: { type, message } }

POST /v1beta/models/:model:generateContent

POST /v1beta/models/:model:streamGenerateContent

Google Gemini compatible.

// Request
{
  "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
  "generationConfig": {"temperature": 0.7, "maxOutputTokens": 1024},
  "systemInstruction": {"parts": [{"text": "You are helpful."}]}
}

Auth: x-goog-api-key header, key query param, or Bearer token
Errors: { error: { code, message, status } }

POST /v1/responses

Native Codex Responses API passthrough (WebSocket transport).

// Request
{
  "model": "o4-mini",
  "instructions": "You are helpful.",
  "input": [{"type": "message", "content": "Hello"}],
  "stream": true,
  "reasoning": {"effort": "medium"},
  "tools": [],
  "previous_response_id": "resp_xxx"  // multi-turn
}

Streaming: SSE with response.created, response.output_text.delta, response.completed
Non-streaming: { response, usage, responseId }
Do not send max_output_tokens to native Codex. The proxy accepts it only for compatibility and strips it, because the real Codex backend rejects it with 400 Unsupported parameter: max_output_tokens.

Declare {"type": "image_generation", ...} in tools[] to let the model invoke the server-side image generation backend (gpt-image-2). Requires a ChatGPT Plus or higher account — free plans have the tool silently stripped upstream and the model falls back to returning SVG text.

Supported fields (all optional except type):

Field	Enum / range	Default	Notes
`size`	`1024x1024`, `1024x1536`, `1536x1024`, `2048x2048`, `2048x3072`, `3072x2048`, `3840x2160` (4K UHD), `2160x3840` (4K portrait), `2304x3072` (3:4), `auto`	`auto`	Width and height must both be divisible by 16. Longest edge ≤ 3840 px. Total pixel budget ≈ 8 MP (`3072x3072` rejected). Resolutions below 1024 px also rejected (min pixel budget)
`output_format`	`png` / `jpeg` / `webp`	`png`	`gif` is rejected
`output_compression`	integer 0–100	`100`	jpeg / webp only — PNG rejects any non-100
`background`	`auto` / `opaque`	`auto`	`transparent` is rejected for this model
`moderation`	`auto` / `low`	`auto`	other enums rejected
`partial_images`	integer 0–3	0	`>3` rejected

Silently rewritten / hard-rejected fields:

model — whatever you send, upstream forces gpt-image-2.
quality — any value is echoed back as auto; the user-supplied value has no effect.
n — rejected (unknown_parameter); one image per call.
input_image, mask, input_fidelity, style, response_format — rejected.

Event stream order (when the model invokes the tool):

response.created — echoes tools[] with upstream-normalized fields.
response.output_item.added — {type: "image_generation_call", ...}.
response.image_generation_call.in_progress → .generating → (optional) .partial_image × N.
response.output_item.done — the completed image_generation_call with:
- result — base64-encoded image bytes (PNG / JPEG / WebP by output_format).
- revised_prompt — the final prompt the model actually used.
response.completed.

Token accounting: response.completed.response.usage reports the host model's tokens; the image_generation tool's own tokens come back separately as response.completed.response.tool_usage.image_gen.{input_tokens, output_tokens, total_tokens}. The proxy passes both through verbatim, and tracks them as separate counters on the dashboard (total_image_input_tokens / total_image_output_tokens) so image-gen usage doesn't pollute host-model token charts.

Request accounting: the proxy also counts each image_generation request as success or failure. total_image_request_count increments when the upstream returned a real image (non-zero tool_usage.image_gen.output_tokens); total_image_request_failed_count increments when the tool was silently stripped (Free plan), the upstream returned an error, or the response came back empty. Both surfaces in /admin/usage-stats/summary and the Dashboard's "Image Requests" card.

Edit mode (supply a reference image): put an input_image block in the user message content. data: URLs and HTTPS URLs both work.

{
  "model": "gpt-5.5",
  "stream": true,
  "input": [{
    "role": "user",
    "content": [
      {"type": "input_text", "text": "Make this sky a sunset."},
      {"type": "input_image", "image_url": "data:image/png;base64,AAA...", "detail": "high"}
    ]
  }],
  "tools": [{"type": "image_generation", "size": "1024x1024"}]
}

Legal content-part types (from upstream enum validation): input_text, input_image, output_text, refusal, input_file, computer_screenshot, summary_text.

OpenAI Chat compatibility accepts tools: [{"type":"image_generation"}], but the stable image payload is exposed by /v1/responses as image_generation_call.result. Use /v1/responses for clients that need the base64 image bytes.

Ollama-Compatible Bridge

The optional bridge runs on a separate listener, defaulting to http://127.0.0.1:11434. It is disabled by default and can be controlled through Dashboard settings or the admin API. Ollama endpoints are intentionally unauthenticated; keep the listener bound to localhost unless you explicitly trust the network. Browser CORS access is restricted to loopback origins (localhost, 127.x.x.x, and ::1) so non-local web pages cannot read bridge responses by default. The bridge injects the configured Codex Proxy API key for /v1/* passthrough requests, so exposing it beyond localhost also exposes the main proxy API without requiring clients to know that key.

Method	Path	Description
GET	`/api/version`	Version probe → `{ version }`
GET	`/api/tags`	Model list in Ollama format
POST	`/api/show`	Model metadata and capabilities
POST	`/api/chat`	Chat completions, streaming as NDJSON by default
Any	`/v1/*`	OpenAI-compatible passthrough to the main proxy

// POST http://127.0.0.1:11434/api/chat
{
  "model": "codex",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "think": "medium"  // optional: false | true | low | medium | high | xhigh
}

Supported request mappings:

Ollama field	Upstream OpenAI field
`messages[].images`	`content[].image_url` data URLs
`tools`	`tools`
`think`	`reasoning_effort`
`format: "json"`	`response_format: { type: "json_object" }`
`format: { ... }`	strict JSON schema response format
`options.temperature`	`temperature`
`options.top_p`	`top_p`
`options.num_predict`	`max_tokens`

Models

Method	Path	Description
GET	`/v1/models`	List models (OpenAI format)
GET	`/v1/models/catalog`	Full catalog with reasoning efforts
GET	`/v1/models/:id`	Single model detail
GET	`/v1/models/:id/info`	Extended model info
GET	`/v1beta/models`	List models (Gemini format)
POST	`/admin/refresh-models`	Force refresh from upstream

Model catalog entries can include token metadata:

Field	Meaning
`contextWindow`	Static or backend-provided context window for display and client hints
`maxContextWindow`	Backend-provided maximum expandable context window, when reported
`maxOutputTokens`	Static or backend-provided maximum output tokens for display and client hints
`truncationPolicyLimit`	Backend-provided truncation policy limit, when reported

Static catalog values are defined in config/models.yaml; dynamic entries from /backend-api/codex/models win when the same model ID is returned by upstream. On 2026-05-08, real Codex backend metadata returned context_window=272000, max_context_window=272000, truncation_policy.limit=10000 for gpt-5.5, and context_window=272000, max_context_window=1000000, truncation_policy.limit=10000 for gpt-5.4. Treat these as runtime Codex limits, not as proof that request-level context or max-token switches are supported.

Account Management

CRUD

Method	Path	Description
GET	`/auth/accounts`	List all accounts
POST	`/auth/accounts`	Add single account (`{ token?, refreshToken? }`)
DELETE	`/auth/accounts/:id`	Delete account
PATCH	`/auth/accounts/:id/label`	Set label (`{ label }`)

Batch Operations

Method	Path	Description
POST	`/auth/accounts/import`	Bulk import (`{ accounts: [{token?, refreshToken?, label?}] }`)
POST	`/auth/accounts/batch-delete`	Bulk delete (`{ ids: [] }`)
POST	`/auth/accounts/batch-status`	Bulk enable/disable (`{ ids: [], status: "active"\|"disabled" }`)

Health & Quota

Method	Path	Description
POST	`/auth/accounts/health-check`	Check accounts (`{ ids?, stagger_ms?, concurrency? }`)
POST	`/auth/accounts/:id/refresh`	Refresh single account
GET	`/auth/accounts/:id/quota`	Get quota/usage
POST	`/auth/accounts/:id/reset-usage`	Reset usage counters

Export

Method	Path	Description
GET	`/auth/accounts/export`	Export accounts (`?ids=a,b&format=minimal`)

Cookies (Cloudflare)

Method	Path	Description
GET	`/auth/accounts/:id/cookies`	Get stored cookies
POST	`/auth/accounts/:id/cookies`	Set cookies (`{ cookies }`)
DELETE	`/auth/accounts/:id/cookies`	Clear cookies

Method	Path	Description
POST	`/auth/login-start`	Start OAuth → `{ authUrl, state }`
GET	`/auth/login`	302 redirect to Auth0
POST	`/auth/code-relay`	OAuth code exchange (`{ callbackUrl }`)
GET	`/auth/callback`	OAuth callback handler
POST	`/auth/device-login`	Start device code flow
GET	`/auth/device-poll/:deviceCode`	Poll device authorization
POST	`/auth/import-cli`	Import from Codex CLI auth.json
POST	`/auth/token`	Manual token submit
GET	`/auth/status`	Auth status + pool summary
POST	`/auth/logout`	Clear all accounts

Proxy Pool Management

CRUD

Method	Path	Description
GET	`/api/proxies`	List proxies with health & assignments
POST	`/api/proxies`	Add proxy (`{ url }` or `{ host, port, username, password }`)
PUT	`/api/proxies/:id`	Update proxy
DELETE	`/api/proxies/:id`	Delete proxy

Health & Control

Method	Path	Description
POST	`/api/proxies/:id/check`	Health check single proxy
POST	`/api/proxies/check-all`	Health check all proxies
POST	`/api/proxies/:id/enable`	Enable proxy
POST	`/api/proxies/:id/disable`	Disable proxy

Assignments (Account ↔ Proxy)

Method	Path	Description
GET	`/api/proxies/assignments`	List all assignments
POST	`/api/proxies/assign`	Assign proxy to account (`{ accountId, proxyId }`)
DELETE	`/api/proxies/assign/:accountId`	Unassign
POST	`/api/proxies/assign-bulk`	Bulk assign (`{ assignments: [] }`)
POST	`/api/proxies/assign-rule`	Auto-assign by rule (`{ rule: "round-robin", ... }`)

Import/Export

Method	Path	Description
GET	`/api/proxies/export`	Export as YAML
POST	`/api/proxies/import`	Import YAML or plain text (`host:port:user:pass`)
GET	`/api/proxies/assignments/export`	Export assignments
POST	`/api/proxies/assignments/import`	Preview assignment import
POST	`/api/proxies/assignments/apply`	Apply assignment import

Settings

Method	Path	Description
PUT	`/api/proxies/settings`	Update health check interval

Admin & Settings

General

Method	Path	Description
GET	`/admin/general-settings`	Get all settings
POST	`/admin/general-settings`	Update settings (returns `restart_required`)
GET	`/admin/settings`	Get proxy API key
POST	`/admin/settings`	Set proxy API key
GET	`/admin/rotation-settings`	Get rotation strategy
POST	`/admin/rotation-settings`	Set rotation strategy
GET	`/admin/quota-settings`	Get quota settings
POST	`/admin/quota-settings`	Set quota settings
GET	`/admin/ollama-settings`	Get Ollama Bridge settings plus runtime status
POST	`/admin/ollama-settings`	Persist Ollama Bridge settings and restart the bridge
GET	`/admin/ollama-status`	Get Ollama Bridge runtime status

Diagnostics

Method	Path	Description
GET	`/health`	Health probe → `{ status, authenticated, pool }`
POST	`/admin/test-connection`	Full connectivity diagnostics
GET	`/debug/fingerprint`	TLS fingerprint config (localhost only)
GET	`/debug/diagnostics`	System diagnostics (localhost only)
GET	`/debug/models`	Model store internals

Official Codex App Server Bridge

Optional bridge to a local official codex app-server instance. This is the path for using official Codex app plugins such as the Chrome/browser plugin. It is disabled by default with official_agent.enabled: false.

All endpoints require official_agent.api_key; the bridge refuses requests when the dedicated official-agent API key is not configured. Do not reuse server.proxy_api_key here, because this bridge can drive local app-server plugins and approval flows.

Method	Path	Purpose
GET	`/official-agent/apps`	List official Codex apps/connectors from `app/list`
POST	`/official-agent/threads`	Start an app-server thread (`{ model?, cwd? }`)
POST	`/official-agent/threads/:threadId/turns`	Start a turn and stream app-server notifications as SSE

approvalPolicy, when provided on a turn, must be one of untrusted, on-request, on-failure, or never.

Example turn using an official Chrome app mention:

{
  "text": "Open localhost:8080 and inspect the dashboard",
  "app": { "id": "chrome", "name": "Chrome" }
}

The bridge sends a text item plus a mention item with path: "app://{id}". Use /official-agent/apps to discover the real app id before hard-coding one.

Updates

Method	Path	Description
GET	`/admin/update-status`	Check update availability
POST	`/admin/check-update`	Trigger update check
POST	`/admin/apply-update`	Apply self-update (SSE progress stream)

Usage Statistics

Method	Path	Description
GET	`/admin/usage-stats/summary`	Cumulative usage by account/model
GET	`/admin/usage-stats/history`	Time-series data (`?granularity=hourly&hours=24`)

Quota Warnings

Method	Path	Description
GET	`/auth/quota/warnings`	Active quota warnings

When quota.skip_exhausted is enabled, account acquisition filters out active accounts whose cached quota has rate_limit.limit_reached === true, secondary_rate_limit.limit_reached === true, or code_review_rate_limit.limit_reached === true. This happens before session affinity, so preferredEntryId cannot keep a request on an exhausted account. Near-full quota such as used_percent=99 is not skipped until upstream marks limit_reached or the account receives a 429 and enters rate_limited backoff. Secondary and code-review cache windows are cleared after their own reset_at passes.

Dashboard Auth

Method	Path	Description
POST	`/auth/dashboard-login`	Login with password → sets session cookie (rate limited: 5/min)
POST	`/auth/dashboard-logout`	Clear session
GET	`/auth/dashboard-status`	Check if login required

Error Formats

Each protocol returns errors in its native format:

Protocol	Format
OpenAI	`{ error: { message, type, code, param } }`
Anthropic	`{ type: "error", error: { type, message } }`
Gemini	`{ error: { code, message, status } }`
Responses	`{ type: "error", error: { type, code, message } }`
Admin	`{ error: "..." }`

Common HTTP status codes: 401 (not authenticated), 429 (rate limited), 503 (no available accounts).