Create embeddings

May 8, 2025 · View on GitHub



High-throughput, OpenAI-compatible text embedding & reranker powered by Infinity




  1. Quickstart
  2. Endpoint Configuration
  3. API Specification
    1. List Models
    2. Create Embeddings
    3. Rerank Documents
  4. Usage
  5. Further Documentation
  6. Acknowledgements

Quickstart

  1. ๐Ÿณ Pull an image โ€“ use the tag shown on the latest GitHub release page (e.g. runpod/worker-infinity-embedding:<version>)
  2. ๐Ÿ”ง Configure โ€“ set at least MODEL_NAMES (see Endpoint Configuration)
  3. ๐Ÿš€ Deploy โ€“ create a RunPod Serverless endpoint
  4. ๐Ÿงช Call the API โ€“ follow the example in the Usage section

Endpoint Configuration

All behaviour is controlled through environment variables:

| Variable | Required | Default | Description |
|---|---|---|---|
| MODEL_NAMES | Yes | – | One or more Hugging Face model IDs. Separate multiple IDs with a semicolon. Example: BAAI/bge-small-en-v1.5 |
| BATCH_SIZES | No | 32 | Per-model batch size; semicolon-separated list matching MODEL_NAMES. |
| BACKEND | No | torch | Inference engine for all models: torch, optimum, or ctranslate2. |
| DTYPES | No | auto | Precision per model (auto, fp16, fp8). Semicolon-separated, must match MODEL_NAMES. |
| INFINITY_QUEUE_SIZE | No | 48000 | Maximum number of items that can be queued inside the Infinity engine. |
| RUNPOD_MAX_CONCURRENCY | No | 300 | Maximum number of concurrent requests the RunPod wrapper will accept. |
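
The per-model variables line up positionally with MODEL_NAMES. The sketch below is a hypothetical helper (not the worker's actual parsing code) illustrating how the semicolon-separated values pair up and why the list lengths must match:

```python
# Illustrative only: pair semicolon-separated env values per model.
def parse_model_config(model_names, batch_sizes, dtypes):
    models = model_names.split(";")
    sizes = batch_sizes.split(";")
    types = dtypes.split(";")
    # Each per-model list must have exactly one entry per model ID.
    if not (len(models) == len(sizes) == len(types)):
        raise ValueError("BATCH_SIZES and DTYPES must match MODEL_NAMES")
    return [
        {"model": m, "batch_size": int(b), "dtype": d}
        for m, b, d in zip(models, sizes, types)
    ]

config = parse_model_config(
    "BAAI/bge-small-en-v1.5;intfloat/e5-large-v2",
    "32;16",
    "auto;fp16",
)
```

So a two-model deployment with mismatched list lengths (e.g. two models but a single DTYPES entry) would be rejected rather than silently misconfigured.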

API Specification

Two flavours, one schema.

  • OpenAI-compatible – a drop-in replacement for /v1/models and /v1/embeddings. Point any OpenAI client at your endpoint instead of the OpenAI API by replacing the base URL with https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1 and using your RunPod API key in place of an OpenAI key.
  • Standard RunPod โ€“ call /run or /runsync with a JSON body under the input key.
    Base URL: https://api.runpod.ai/v2/<ENDPOINT_ID>

Except for transport (path + wrapper object) the JSON you send/receive is identical. The tables below describe the shared payload.
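
To make the shared-payload point concrete, here is a small sketch (helper names are illustrative, not part of the worker) building the same embedding request for both flavours:

```python
# Same payload, two transports: only the path and wrapper object differ.
def openai_request(endpoint_id, payload):
    """OpenAI-compatible flavour: the payload IS the request body."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1/embeddings"
    return url, payload

def runsync_request(endpoint_id, payload):
    """Standard RunPod flavour: same payload, wrapped under the 'input' key."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    return url, {"input": payload}

payload = {"model": "BAAI/bge-small-en-v1.5", "input": "Hello world"}
oai_url, oai_body = openai_request("MY_ENDPOINT_ID", payload)
std_url, std_body = runsync_request("MY_ENDPOINT_ID", payload)
```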

List Models

| Method | Path | Body |
|---|---|---|
| GET | /openai/v1/models | – |
| POST | /runsync | { "input": { "openai_route": "/v1/models" } } |

Response

{
  "data": [
    { "id": "BAAI/bge-small-en-v1.5", "stats": {} },
    { "id": "intfloat/e5-large-v2", "stats": {} }
  ]
}
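
Given a response shaped like the example above, extracting the deployed model IDs is a one-liner:

```python
# Response shape copied from the example above.
models_response = {
    "data": [
        {"id": "BAAI/bge-small-en-v1.5", "stats": {}},
        {"id": "intfloat/e5-large-v2", "stats": {}},
    ]
}

# Collect the IDs you can pass as "model" in embedding requests.
model_ids = [m["id"] for m in models_response["data"]]
```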

Create Embeddings

Request Fields (shared)

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | One of the IDs supplied via MODEL_NAMES. |
| input | string \| array | Yes | A single text string or list of texts to embed. |

OpenAI route vs. Standard:

| Flavour | Method | Path | Body |
|---|---|---|---|
| OpenAI | POST | /v1/embeddings | { "model": "…", "input": "…" } |
| Standard | POST | /runsync | { "input": { "model": "…", "input": "…" } } |

Response (both flavours)

{
  "object": "list",
  "model": "BAAI/bge-small-en-v1.5",
  "data": [
    { "object": "embedding", "embedding": [0.01, -0.02 /* … */], "index": 0 }
  ],
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
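
A typical consumer pulls the vectors out of `data` (entries are indexed to match the order of the input texts) and compares them. The cosine-similarity helper below is an illustrative extra, not part of the API; the response dict mirrors the shape shown above with made-up numbers:

```python
import math

# Example response in the shape documented above (values are made up).
response = {
    "object": "list",
    "model": "BAAI/bge-small-en-v1.5",
    "data": [
        {"object": "embedding", "embedding": [0.01, -0.02, 0.03], "index": 0},
        {"object": "embedding", "embedding": [0.02, -0.01, 0.04], "index": 1},
    ],
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

# Sort by "index" so vectors line up with the input texts.
vectors = [
    item["embedding"]
    for item in sorted(response["data"], key=lambda d: d["index"])
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

similarity = cosine(vectors[0], vectors[1])
```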

Rerank Documents (Standard only)

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Any deployed reranker model. |
| query | string | Yes | The search/query text. |
| docs | array | Yes | List of documents to rerank. |
| return_docs | bool | No | If true, return the documents in ranked order (default: false). |

Call pattern

POST /runsync
Content-Type: application/json

{
  "input": {
    "model": "BAAI/bge-reranker-large",
    "query": "Which product has warranty coverage?",
    "docs": [
      "Product A comes with a 2-year warranty",
      "Product B is available in red and blue colors",
      "All electronics include a standard 1-year warranty"
    ],
    "return_docs": true
  }
}

Response contains either scores or the full docs list, depending on return_docs.
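
Since the exact response shape isn't specified here, the sketch below assumes a plausible form with one relevance score per input document, aligned with the docs array (higher = more relevant), and shows how a client would order documents itself when return_docs is false:

```python
# Docs from the request above; scores are assumed example values,
# positionally aligned with the docs list.
docs = [
    "Product A comes with a 2-year warranty",
    "Product B is available in red and blue colors",
    "All electronics include a standard 1-year warranty",
]
scores = [0.92, 0.05, 0.88]

# Pair each document with its score and sort best-first.
ranked = sorted(zip(scores, docs), reverse=True)
best_score, best_doc = ranked[0]
```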


Usage

Below are minimal curl snippets so you can copy-paste from any machine.

Replace <ENDPOINT_ID> with your endpoint ID and <API_KEY> with a RunPod API key.

OpenAI-Compatible Calls

# List models
curl -H "Authorization: Bearer <API_KEY>" \
     https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/models

# Create embeddings
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-small-en-v1.5","input":"Hello world"}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/embeddings

Standard RunPod Calls

# Create embeddings (wait for result)
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input":{"model":"BAAI/bge-small-en-v1.5","input":"Hello world"}}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync

# Rerank
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input":{"model":"BAAI/bge-reranker-large","query":"Which product has warranty coverage?","docs":["Product A comes with a 2-year warranty","Product B is available in red and blue colors","All electronics include a standard 1-year warranty"],"return_docs":true}}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync

Further Documentation


Acknowledgements

Special thanks to Michael Feil for creating the Infinity engine and for his ongoing support of this project.