OpenAI-Compatible API Preview

May 20, 2026 ยท View on GitHub

Lemon exposes a preview OpenAI-compatible HTTP surface for clients that already know the /v1 API shape. The adapter is intentionally thin: it translates OpenAI-shaped requests into supervised Lemon runs instead of bypassing the router, run graph, events, approvals, checkpoints, or operator visibility.

Endpoints

EndpointStatusBehavior
GET /v1/healthPreviewReturns Lemon API health metadata.
GET /v1/capabilitiesPreviewReturns supported compatibility features and known gaps.
GET /v1/modelsPreviewReturns Lemon's model catalog in OpenAI list shape, including Lemon capability metadata such as supportsVision.
GET /v1/models/:model_idPreviewReturns one OpenAI-shaped model object from Lemon's catalog, including Lemon capability metadata such as supportsVision, or a 404 error.
POST /v1/chat/completionsPreviewAccepts OpenAI-style model, messages, redacted URL/file-id image metadata, data URL image pass-through, and opt-in allowlisted HTTPS image URL fetch, submits a Lemon run, returns queued metadata by default, assistant text when wait: true completes before the timeout, or text/event-stream chunks when stream: true.
POST /v1/responsesPreviewAccepts OpenAI-style model, input, redacted URL/file-id image metadata, data URL image pass-through, opt-in allowlisted HTTPS image URL fetch, and optional previous_response_id, submits a Lemon run, returns queued metadata by default, Responses-style output text when wait: true completes before the timeout, or Responses-style SSE events when stream: true.
GET /v1/responses/:response_idPreviewReads resp_<run_id> from the Lemon run store and returns a Responses-shaped object with completed output when available.
GET /v1/runs/:run_idPreviewReturns redacted Lemon run status, timestamps, event count, session key, and completion state.
POST /v1/runs/:run_id/cancelPreviewRequests cancellation through LemonRouter.abort_run/2 for non-terminal runs and returns a cancelling status.

Queued mode is the default. Clients can either use the returned lemon.runId with agent.wait, subscribe to /ws events, or set wait: true plus optional timeout_ms / timeoutMs to wait synchronously through the same agent.wait path.

stream: true returns text/event-stream and subscribes the HTTP process to the run topic. Chat Completions streams chat.completion.chunk objects for :delta events, emits redacted lemon.tool_progress events for :engine_action updates, and ends with data: [DONE]. Responses streams response.output_text.delta, redacted response.tool_progress, and response.completed events, also ending with data: [DONE].

Request Shape

POST /v1/chat/completions:

{
  "model": "zai:glm-5-turbo",
  "messages": [
    {"role": "system", "content": "be brief"},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "hello"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.test/image.png",
            "detail": "high"
          }
        }
      ]
    }
  ],
  "metadata": {
    "session_key": "agent:default:openai"
  },
  "wait": true,
  "timeout_ms": 60000
}

POST /v1/responses:

{
  "model": "openai:gpt-4o",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {"type": "input_text", "text": "hello"},
        {"type": "input_image", "image_url": "data:image/png;base64,..."}
      ]
    }
  ],
  "agent_id": "default",
  "previous_response_id": "resp_run_...",
  "wait": true
}

Response Boundary

Both generation endpoints return Lemon metadata:

{
  "lemon": {
    "runId": "run_...",
    "sessionKey": "agent:default:openai",
    "imageInputCount": 1,
    "status": "queued",
    "events": {
      "jsonRpc": "agent.wait",
      "webSocket": "/ws"
    }
  }
}

With wait: true, successful runs return lemon.status: "completed" and map the completed Lemon answer into assistant text. Wait timeouts return HTTP 504. Responses use resp_<run_id> ids; GET /v1/responses/:response_id reads the stored Lemon run and maps completed answers back into Responses output text. previous_response_id validates the prior response and defaults the follow-up run to the prior session key, so continuation stays on Lemon's supervised session/run-history path. GET /v1/runs/:run_id and POST /v1/runs/:run_id/cancel expose a minimal run status/cancellation boundary without returning raw run events or assistant answers from status calls.

Image input is now a split boundary. Chat Completions accepts image_url parts, and Responses accepts input_image parts with HTTP(S) URLs, data URLs, or file_id. By default, HTTP(S) URLs and file ids remain metadata-only: they are hashed, redacted, added as bounded [image input ...] placeholders, and never returned raw. Base64 data URLs are validated at the API boundary, size/count-limited, redacted from prompts and metadata, and threaded as runtime-only image blocks through RunRequest, the router, gateway, LemonRunner, and CodingAgent.Session.prompt/3 so native Lemon providers can receive Ai.Types.ImageContent. HTTPS image URL fetch is available only when explicitly enabled and host-allowlisted. In that mode, fetched image bytes are MIME-checked, size-limited, base64 encoded, redacted from prompt/metadata, and sent through the same runtime-only image path as data URLs. Only redacted image metadata and lemon.imageInputCount are exposed in HTTP responses or run-status payloads.

For known provider-prefixed models, runtime image bytes are rejected before run submission when Lemon's model catalog says the model is text-only. Metadata-only URL/file references remain allowed because they are redacted prompt context, not provider-visible image bytes. /v1/models and /v1/models/:model_id expose the same lemon.supportsVision capability so clients can choose a compatible model before sending data URL or allowlisted fetched-image requests.

URL image fetching is disabled unless :openai_compat_image_url_fetch or LEMON_OPENAI_COMPAT_IMAGE_URL_FETCH=true is set. Fetching requires an allowed host list through :openai_compat_image_url_allowed_hosts, LEMON_OPENAI_COMPAT_IMAGE_URL_ALLOWED_HOSTS, or LEMON_OPENAI_COMPAT_IMAGE_HOST_ALLOWLIST. The preview policy accepts HTTPS only and rejects non-allowlisted hosts before starting a Lemon run.

By default the preview /v1 surface is unauthenticated for local development. Set :openai_compat_api_token in :lemon_control_plane application config or LEMON_OPENAI_COMPAT_API_TOKEN / LEMON_OPENAI_COMPAT_TOKEN in the environment to require either Authorization: Bearer <token> or x-api-key: <token>.

This keeps the compatibility slice honest while the stable adapter adds broader Responses API fields, deployed editor UI coverage, and more provider-specific image transport proof beyond the passing OpenRouter vision proof.

Proof Lane

mix test apps/lemon_control_plane/test/lemon_control_plane/http/router_test.exs --seed 1

The latest focused lane passed locally on 2026-05-17 with 25 tests, 0 failures. It covers health, capabilities, model list shape, single-model retrieve shape, chat-completions run submission, responses run submission, synchronous wait completion, Responses output text mapping, wait timeout handling, session-key metadata, previous_response_id continuation metadata, stored response retrieval, unknown stored response errors, streaming flag metadata, redacted run status, unknown-run errors, run cancellation dispatch, Chat Completions SSE over run bus events, Responses SSE over run bus events, redacted tool-progress SSE events without raw tool detail, optional bearer auth, optional x-api-key auth, redacted image-input metadata normalization for Chat Completions and Responses, data URL image pass-through into runtime-only Lemon image blocks, opt-in allowlisted HTTPS image URL fetch into runtime-only image blocks, disallowed remote image host rejection before submission, supportsVision model metadata, known non-vision model rejection before runtime-image submission, prompt normalization, and validation errors without starting a real model run.

The adapter also has a deterministic live HTTP smoke that starts a local Bandit router, calls /v1 through :httpc, exercises auth, synchronous wait, redacted image-input metadata, data URL image pass-through, allowlisted remote image URL fetch, an external Node fetch client, an official OpenAI Node SDK client, an official OpenAI Python SDK client, single-model retrieval, stored Responses, previous-response continuation, streaming tool progress, redacted run status, and cancellation, then writes redacted proof JSON:

MIX_ENV=test mix run scripts/live_openai_compat_smoke.exs

The latest smoke passed locally on 2026-05-17 with completed_count: 14 and failed_count: 0; the nested external fetch client completed 7 checks, the official OpenAI Node SDK client completed 6 checks, and the official OpenAI Python SDK client completed 6 checks. All three external clients verify single-model retrieval plus lemon.supportsVision consistency, and both SDK clients cover Chat Completions and Responses streaming through SDK stream interfaces. The top-level proof also includes non_vision_image_rejection, which posts runtime image bytes to openai:o3-mini, expects the sanitized 400 response, and verifies the stub submitter was not called. The proof is written to .lemon/proofs/openai-compat-smoke-latest.json plus a timestamped archive. proofs.status, support bundles, and mix lemon.doctor --verbose consume those redacted result rows as openai_compat_* checks. The doctor check is openai_compat.api_preview; rerun the smoke if it warns or skips.

There is also an opt-in live provider vision proof over the same /v1 Responses path:

LEMON_TEST_ALLOW_LIVE_CREDENTIALS=1 \
LEMON_OPENAI_COMPAT_LIVE_VISION_MODEL=openrouter:openai/gpt-4o-mini \
MIX_ENV=test mix run scripts/live_openai_compat_vision_smoke.exs

The vision smoke starts a local Bandit router without stubs, submits a data URL image through /v1/responses with wait: true, then runs scripts/live_openai_compat_fetch_client.mjs in vision-only mode against that same local HTTP boundary. It writes redacted proof JSON to .lemon/proofs/openai-compat-vision-smoke-latest.json and skips without a model, live-credential opt-in, or resolvable provider credential. Credential preflight uses LemonAiRuntime.provider_has_credentials?/3, the same Lemon-owned resolver used by runtime calls, so env keys, encrypted secrets, OAuth/default-secret paths, and provider-specific credentials are checked consistently. If no model is configured, the script tries credential-ready defaults in this order: OpenRouter openai/gpt-4o-mini, OpenAI gpt-4o-mini, then Z.ai glm-4.6v. Its proof omits raw API keys, prompts, answers, and image bytes. On 2026-05-16 the live proof passed through OpenRouter openai/gpt-4o-mini with completed_count: 1, failed_count: 0, external Node fetch client vision sub-proof completed_count: 1 / answer_matched_red: true, and official OpenAI Node SDK vision sub-proof completed_count: 1 / answer_matched_red: true. Direct OpenAI returned account quota errors in this environment, and Z.ai's coding endpoint accepted the text-model credential but rejected image input, so those providers are not used as the current proof source.

The external client used by that smoke can also be pointed at a running Lemon HTTP server directly. Set LEMON_OPENAI_COMPAT_CHECKS=vision plus LEMON_OPENAI_COMPAT_IMAGE_BASE64 to run only the provider-backed vision sub-proof in the fetch or SDK client; omit those variables for the default deterministic external-client checks.

LEMON_OPENAI_COMPAT_BASE_URL=http://127.0.0.1:4000 \
LEMON_OPENAI_COMPAT_API_TOKEN=... \
node scripts/live_openai_compat_fetch_client.mjs

The OpenAI SDK client expects the openai package in the current Node working directory; the live smoke installs it in a temporary directory automatically. To point the SDK client at a running Lemon server directly:

LEMON_OPENAI_COMPAT_BASE_URL=http://127.0.0.1:4000 \
LEMON_OPENAI_COMPAT_API_TOKEN=... \
node scripts/live_openai_compat_openai_sdk_client.mjs

The Python SDK client expects the openai package in the Python environment; the live smoke runs it with uv run --with openai automatically. To point the Python SDK client at a running Lemon server directly:

LEMON_OPENAI_COMPAT_BASE_URL=http://127.0.0.1:4000 \
LEMON_OPENAI_COMPAT_API_TOKEN=... \
uv run --with openai python scripts/live_openai_compat_python_sdk_client.py

Together the clients exercise health, capabilities, synchronous Chat Completions, Responses continuation, stored Responses, raw Chat Completions and Responses SSE with tool-progress events, and official SDK Chat Completions and Responses streaming, returning only hashes and counters in proof JSON.