OpenAI API embeddings endpoint {#ovmsdocsrestapiembeddings}

October 27, 2025

API Reference

OpenVINO Model Server now includes an embeddings endpoint that follows the OpenAI API. See the OpenAI API Reference for more information on the API. The endpoint is exposed at the following path:

http://server_name:port/v3/embeddings

Example request

curl http://localhost/v3/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gte-large",
    "input": ["This is a test"],
    "encoding_format": "float"
  }'
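The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the `BASE_URL` value (including port 8000) and the helper names `build_embeddings_request` and `embed` are assumptions made for illustration, so adjust them to your deployment.

```python
import json
import urllib.request

# Assumed base URL; replace host and port with the values of your deployment.
BASE_URL = "http://localhost:8000"


def build_embeddings_request(model, texts, encoding_format="float", base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v3/embeddings path."""
    payload = {"model": model, "input": texts, "encoding_format": encoding_format}
    return urllib.request.Request(
        url=f"{base_url}/v3/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(model, texts, **kwargs):
    """Send the request to a running server and return the parsed JSON body."""
    request = build_embeddings_request(model, texts, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, `embed("gte-large", ["This is a test"])` would return a dictionary shaped like the example response.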

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.03440694510936737,
        -0.02553200162947178,
        -0.010130723007023335,
        -0.013917984440922737,
...
        0.02722850814461708,
        -0.017527244985103607,
        -0.0053995149210095406
      ],
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}
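A response of this shape can be unpacked as sketched below. The helper name `extract_embeddings` and the shortened sample values are made up for illustration. Sorting by `index` is a defensive choice so that each vector lines up with its input string, not a documented requirement.

```python
def extract_embeddings(response):
    """Return the embedding vectors ordered by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Shortened, made-up response for two inputs, deliberately out of order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 1},
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 0},
    ],
    "usage": {"prompt_tokens": 6, "total_tokens": 6},
}

vectors = extract_embeddings(sample)
```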

Request

Generic

| Param | Type | Description |
|-------|------|-------------|
| model | string (required) | Name of the model to use. Name assigned to a MediaPipe graph configured to schedule generation using desired embedding model. |
| input | string/list of strings (required) | Input text to embed, encoded as a string or a list of strings |
| encoding_format | float or base64 (default: float) | The format to return the embeddings in |
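The field types listed above can be checked on the client before a request is sent. The following is a minimal sketch: `validate_request` is a hypothetical helper, and it only mirrors the types given in this section, not the server's actual schema validation.

```python
ALLOWED_ENCODINGS = ("float", "base64")


def validate_request(payload):
    """Return a list of problems found in an embeddings request payload."""
    problems = []
    if not isinstance(payload.get("model"), str):
        problems.append("model must be a string")
    value = payload.get("input")
    is_text = isinstance(value, str)
    is_text_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
    if not (is_text or is_text_list):
        problems.append("input must be a string or a list of strings")
    # encoding_format is optional and defaults to "float".
    if payload.get("encoding_format", "float") not in ALLOWED_ENCODINGS:
        problems.append("encoding_format must be 'float' or 'base64'")
    return problems
```

An empty list means the payload matches the documented types. Each entry otherwise names one field to fix.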

Unsupported params from OpenAI service:

  • user
  • dimensions
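When reusing payloads written for the OpenAI service, one option is to drop these parameters on the client side. This section does not state how the server reacts when they are sent, so the sketch below is only a cautious assumption, and `drop_unsupported` is a hypothetical helper name.

```python
UNSUPPORTED_PARAMS = ("user", "dimensions")


def drop_unsupported(payload):
    """Return a copy of the payload without the parameters listed as unsupported."""
    return {key: value for key, value in payload.items() if key not in UNSUPPORTED_PARAMS}
```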

Response

| Param | Type | Description |
|-------|------|-------------|
| data | array | A list of responses, one for each input string |
| data.embedding | array of float or base64 string | Embedding vector for a string |
| data.index | integer | Response index |
| model | string | Model name |
| usage | dictionary | Token counts for the request |
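When `encoding_format` is `base64`, `data.embedding` arrives as a string and has to be decoded. The sketch below assumes the string holds packed little-endian 32-bit floats, which is the convention used by the OpenAI service. This section does not confirm the byte layout, so verify it against your server before relying on it. The helper name `decode_base64_embedding` is hypothetical.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string into floats, assuming little-endian float32 values."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4  # 4 bytes per float32 value
    # struct.unpack raises struct.error if the byte length is not a multiple of 4.
    return list(struct.unpack(f"<{count}f", raw))
```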

Error handling

The endpoint returns an error for an invalid request in the following cases:

  • Any field has an incorrect format according to the schema
  • Any tokenized input text exceeds the maximum context length of the model. Make sure input documents are chunked to fit the model
  • The number of input documents exceeds the configured limit (default: 500)
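The last two conditions can be avoided on the client by chunking long documents and sending them in batches. The sketch below is an illustration under stated assumptions: `split_into_batches` and `chunk_by_words` are hypothetical helpers, the value 500 is the default mentioned above and may differ in your configuration, and splitting on whitespace only approximates token counts, which depend on the model's tokenizer.

```python
MAX_DOCUMENTS = 500  # default limit stated above; the configured value may differ


def split_into_batches(documents, batch_size=MAX_DOCUMENTS):
    """Split a list of documents into batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]


def chunk_by_words(text, max_words):
    """Crude chunking by whitespace-separated words, not by model tokens."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

Each batch can then be sent as the `input` of a separate request. Choose `max_words` conservatively, since one word often maps to more than one token.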

References

End to end demo with embeddings endpoint

Code snippets