OpenAI Responses-Compatible Endpoint
June 11, 2026
LMDeploy provides a lightweight OpenAI Responses-compatible surface for easier integration with clients that use the newer Responses API.
Supported Endpoints
- POST /v1/responses
- GET /v1/models
Required Headers
For POST /v1/responses, include:
- content-type: application/json
- Authorization: Bearer <api_key> when the server is launched with --api-keys
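As a minimal sketch of the header rules above, the following Python helper builds the headers for a request. The name build_headers is hypothetical and not part of LMDeploy; it adds the Authorization header only when a key is supplied, which corresponds to a server launched with --api-keys.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for POST /v1/responses.

    Illustrative helper only; this is not an LMDeploy API.
    """
    headers = {"content-type": "application/json"}
    if api_key:
        # Only needed when the server was launched with --api-keys.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Calling build_headers() with no argument yields only the content-type header, which suits a server started without API keys.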
Notes and Current Limits
- POST /v1/responses currently supports a text-first subset of the Responses API.
- input may be a string or a list of Responses input items.
- instructions, developer, and system messages are merged into a single leading system message for chat-template compatibility.
- Function tools are converted to LMDeploy's OpenAI-compatible tool format. Tool calling requires launching the API server with a configured tool parser (--tool-call-parser ...).
- parallel_tool_calls defaults to true. When it is false, LMDeploy follows vLLM-style compatibility and returns only the first parsed function call.
- Non-function hosted tools, such as web_search, are accepted but ignored by LMDeploy.
- Responses-only logprob serialization and stream obfuscation options are accepted but currently ignored.
- background mode and previous_response_id are not supported.
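To make the message-merging note concrete, here is a rough Python sketch of how instructions plus developer and system items could collapse into one leading system message. It is not LMDeploy's implementation: the function name merge_leading_system, the blank-line separator, and the restriction to message items with plain string content are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Union

SYSTEM_LIKE_ROLES = {"system", "developer"}


def merge_leading_system(
    input_: Union[str, List[Dict[str, Any]]],
    instructions: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Collapse instructions and system/developer items into one system message.

    Assumes every list item is a message dict with string `role` and `content`.
    """
    # A bare string input is treated as a single user message.
    items = [{"role": "user", "content": input_}] if isinstance(input_, str) else input_

    system_parts = [instructions] if instructions else []
    chat: List[Dict[str, str]] = []
    for item in items:
        if item["role"] in SYSTEM_LIKE_ROLES:
            system_parts.append(item["content"])
        else:
            chat.append({"role": item["role"], "content": item["content"]})

    if system_parts:
        # Assumption: parts are joined with a blank line; the real separator may differ.
        chat.insert(0, {"role": "system", "content": "\n\n".join(system_parts)})
    return chat
```

The practical consequence for clients is that a system or developer item placed later in the input list should be expected to take effect from the start of the conversation.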
Example: /v1/responses
curl http://{server_ip}:{server_port}/v1/responses \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <api_key>" \
  -d '{
    "model": "Qwen/Qwen3.5-35B-A3B",
    "input": "Reply exactly: pong",
    "max_output_tokens": 32
  }'
The response contains an output list and a convenience output_text field:
{
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "pong"}]
  }],
  "output_text": "pong"
}
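A client can read the text either from output_text or by walking the output list. The sketch below shows one way to do this in Python; extract_output_text is a hypothetical helper name, and the fallback path assumes message items carry output_text parts shaped like the example above.

```python
from typing import Any, Dict


def extract_output_text(response: Dict[str, Any]) -> str:
    """Return the assistant text from a Responses-style payload."""
    # Prefer the convenience field when the server provides it.
    if isinstance(response.get("output_text"), str):
        return response["output_text"]
    # Otherwise concatenate every output_text part of every message item.
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```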
Example: /v1/responses with tools
curl http://{server_ip}:{server_port}/v1/responses \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <api_key>" \
  -d '{
    "model": "Qwen/Qwen3.5-35B-A3B",
    "input": "Call the search tool with query lmdeploy.",
    "max_output_tokens": 128,
    "tools": [{
      "type": "function",
      "name": "search",
      "description": "Search docs",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {"type": "string"}
        },
        "required": ["query"]
      }
    }]
  }'
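This page does not show the response to a tools request, so the following Python sketch rests on an assumption: that tool calls appear in output as items of type function_call with name, call_id, and a JSON-encoded arguments string, as in the OpenAI Responses API. The helper name collect_function_calls is hypothetical.

```python
import json
from typing import Any, Dict, List


def collect_function_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return decoded function calls found in a Responses-style payload.

    Assumes OpenAI-style `function_call` output items; verify against
    the actual server response before relying on this shape.
    """
    calls = []
    for item in response.get("output", []):
        if item.get("type") != "function_call":
            continue
        calls.append({
            "call_id": item.get("call_id"),
            "name": item.get("name"),
            # Arguments arrive as a JSON string, not a dict.
            "arguments": json.loads(item.get("arguments") or "{}"),
        })
    return calls
```

With parallel_tool_calls set to false, the returned list would contain at most one entry, since only the first parsed function call is returned.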
Streaming Events (SSE)
When stream=true, the endpoint returns text/event-stream events such as:
- response.created
- response.in_progress
- response.output_item.added
- response.content_part.added
- response.output_text.delta
- response.output_text.done
- response.function_call_arguments.delta
- response.function_call_arguments.done
- response.output_item.done
- response.completed
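A streaming client has to split the event stream into events and stitch the text deltas together. The Python sketch below parses standard SSE framing (event: and data: lines separated by a blank line) from any iterable of text lines. The helper names are hypothetical, and two details are assumptions borrowed from the OpenAI Responses API rather than confirmed by this page: each data payload is a JSON object, and response.output_text.delta events carry the text fragment in a delta field.

```python
import json
from itertools import chain
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, payload) pairs from SSE-formatted text lines."""
    event, data = None, []
    # A trailing blank line flushes the last event if the stream lacks one.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            text = "\n".join(data)
            # Skip a "[DONE]" sentinel in case the server emits one.
            if text != "[DONE]":
                payload = json.loads(text)
                yield event or payload.get("type", ""), payload
            event, data = None, []


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text carried by response.output_text.delta events."""
    return "".join(
        payload.get("delta", "")
        for name, payload in iter_sse_events(lines)
        if name == "response.output_text.delta"
    )
```

Because iter_sse_events accepts any iterable of lines, it can be fed from an HTTP client's line iterator or from a recorded stream when debugging.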
Codex Integration
To use this endpoint with Codex, refer to the codex documentation.