OpenAI Responses-Compatible Endpoint

June 11, 2026

LMDeploy provides a lightweight OpenAI Responses-compatible surface for easier integration with clients that use the newer Responses API.

Supported Endpoints

POST /v1/responses
GET /v1/models

Required Headers

For POST /v1/responses, include:

content-type: application/json
Authorization: Bearer <api_key> when the server is launched with --api-keys
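
The header rules above can be sketched in Python using only the standard library. This is a minimal sketch, not part of LMDeploy: the helper name build_responses_request and its parameters are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_responses_request(base_url, payload, api_key=None):
    """Build (but do not send) a POST request for /v1/responses.

    Hypothetical helper for illustration only.
    """
    headers = {"content-type": "application/json"}
    # Attach Authorization only when the server was launched with --api-keys.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Leaving api_key unset omits the Authorization header, which matches a server started without --api-keys.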

Notes and Current Limits

POST /v1/responses currently supports a text-first subset of the Responses API.
input may be a string or a list of Responses input items.
instructions, developer, and system messages are merged into a single leading system message for chat-template compatibility.
Function tools are converted to LMDeploy's OpenAI-compatible tool format. Tool calling requires launching the API server with a configured tool parser (--tool-call-parser ...).
parallel_tool_calls defaults to true. When it is false, LMDeploy follows vLLM-style compatibility and returns only the first parsed function call.
Non-function hosted tools, such as web_search, are accepted but ignored by LMDeploy.
Responses-only logprob serialization and stream obfuscation options are accepted but currently ignored.
background mode and previous_response_id are not supported.
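
Two of the behaviors listed above, the system-message merge and the parallel_tool_calls=false truncation, can be pictured with a short Python sketch. It illustrates the described behavior only and is not LMDeploy's implementation: the function names, the blank-line separator, the ordering of merged parts, and the assumption that message content is a plain string are all hypothetical.

```python
def merge_leading_system(instructions, input_items):
    """Collapse instructions plus developer/system items into one system message.

    Assumes each item is a dict with "role" and a plain-string "content".
    """
    system_parts = [instructions] if instructions else []
    rest = []
    for item in input_items:
        if item.get("role") in ("developer", "system"):
            system_parts.append(item["content"])
        else:
            rest.append(item)
    if not system_parts:
        return rest
    # The separator is an assumption; the source does not state how parts are joined.
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest


def limit_tool_calls(parsed_calls, parallel_tool_calls=True):
    """Keep every parsed call, or only the first when parallel calls are disabled."""
    return list(parsed_calls) if parallel_tool_calls else list(parsed_calls[:1])
```

The practical consequence for clients is that a developer message placed mid-conversation should not be expected to stay in position, since it ends up in the single leading system message.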

Example: `/v1/responses`

curl http://{server_ip}:{server_port}/v1/responses \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <api_key>" \
  -d '{
    "model": "Qwen/Qwen3.5-35B-A3B",
    "input": "Reply exactly: pong",
    "max_output_tokens": 32
  }'

The response contains an output list and a convenience output_text field:

{
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "pong"}]
  }],
  "output_text": "pong"
}
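
A client can read the text from either field. The sketch below prefers output_text and falls back to walking the output list. The helper name extract_output_text is hypothetical, and the fallback assumes the item and part shapes shown in the sample response above.

```python
def extract_output_text(response):
    """Return the assistant text from a parsed /v1/responses JSON body."""
    text = response.get("output_text")
    if isinstance(text, str):
        return text
    # Fallback: concatenate every output_text part of every message item.
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```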

Example: `/v1/responses` with tools

curl http://{server_ip}:{server_port}/v1/responses \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <api_key>" \
  -d '{
    "model": "Qwen/Qwen3.5-35B-A3B",
    "input": "Call the search tool with query lmdeploy.",
    "max_output_tokens": 128,
    "tools": [{
      "type": "function",
      "name": "search",
      "description": "Search docs",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {"type": "string"}
        },
        "required": ["query"]
      }
    }]
  }'
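
When the model decides to call the tool, the call has to be pulled out of the output list. The sketch below assumes the server follows the OpenAI Responses convention, where a call appears as an output item with type "function_call", a name, and arguments encoded as a JSON string. That item shape is an assumption here, since this page does not show a tool-call response, and the helper name extract_function_calls is hypothetical.

```python
import json


def extract_function_calls(response):
    """Return (name, arguments_dict) pairs for each function_call output item.

    Assumes the OpenAI Responses item shape; verify against real server output.
    """
    calls = []
    for item in response.get("output", []):
        if item.get("type") != "function_call":
            continue
        # Arguments are assumed to arrive as a JSON-encoded string.
        arguments = json.loads(item.get("arguments") or "{}")
        calls.append((item.get("name"), arguments))
    return calls
```

With parallel_tool_calls set to false, the returned list would hold at most one entry, because only the first parsed call is returned by the server.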

Streaming Events (SSE)

When stream=true, the endpoint returns text/event-stream events such as:

response.created
response.in_progress
response.output_item.added
response.content_part.added
response.output_text.delta
response.output_text.done
response.function_call_arguments.delta
response.function_call_arguments.done
response.output_item.done
response.completed
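
A streaming client usually only needs the text deltas. The sketch below reassembles the text from an iterable of SSE lines. It assumes each data: payload is a JSON object whose "type" field repeats the event name and whose "delta" field carries the text fragment, as in the OpenAI Responses streaming format. That payload shape is not spelled out on this page, and the helper name collect_stream_text is hypothetical.

```python
import json


def collect_stream_text(lines):
    """Accumulate text from response.output_text.delta events in an SSE stream."""
    chunks = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        data = line[len("data:"):].strip()
        if not data or data == "[DONE]":
            continue
        event = json.loads(data)
        if event.get("type") == "response.output_text.delta":
            chunks.append(event.get("delta", ""))
        elif event.get("type") == "response.completed":
            break  # final event of a successful stream
    return "".join(chunks)
```

The function accepts either str or bytes lines, so it can be fed directly from an HTTP response object iterated line by line.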

Codex Integration

For integration details, refer to codex.