liter-llm

May 27, 2026

A lighter, faster, safer universal LLM API client -- one Rust core, 14 native language bindings, 143 providers.

Why liter-llm?

A universal LLM API client, compiled from the ground up in Rust. No interpreter, no transitive dependency tree, no supply chain surface area. One binary, 14 native language bindings, 143 providers.

  • Compiled Rust core. No pip install supply chain. No .pth auto-execution hooks. No runtime dependency tree to compromise. The kind of supply chain attack that hit litellm in 2026 is structurally impossible here.
  • Secrets stay secret. API keys are wrapped in secrecy::SecretString -- zeroed on drop, redacted in logs, never serialized.
  • Polyglot from day one. Python, TypeScript, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, Dart, Swift, Zig, WebAssembly -- all thin wrappers around the same Rust core, plus a C/FFI surface for everything else. No reimplementation drift.
  • Observability built in. Production-grade OpenTelemetry with GenAI semantic conventions -- not an afterthought callback system.
  • Composable middleware. Rate limiting, caching, cost tracking, health checks, and fallback as Tower layers you stack like building blocks.
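The layering idea in the last bullet can be pictured with a small, library-free Python sketch. It is neither liter-llm's API nor Tower itself: `Handler`, `with_cache`, `with_fallback`, and the two fake providers are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict

# A handler takes a prompt and returns a completion string.
Handler = Callable[[str], str]


def with_cache(inner: Handler) -> Handler:
    """Wrap a handler so repeated prompts are answered from memory."""
    store: Dict[str, str] = {}

    def handler(prompt: str) -> str:
        if prompt not in store:
            store[prompt] = inner(prompt)
        return store[prompt]

    return handler


def with_fallback(primary: Handler, secondary: Handler) -> Handler:
    """Wrap two handlers so the second runs only if the first raises."""

    def handler(prompt: str) -> str:
        try:
            return primary(prompt)
        except RuntimeError:
            return secondary(prompt)

    return handler


calls = {"flaky": 0, "stable": 0}


def flaky_provider(prompt: str) -> str:
    # Hypothetical provider that is always down.
    calls["flaky"] += 1
    raise RuntimeError("provider unavailable")


def stable_provider(prompt: str) -> str:
    # Hypothetical provider that always answers.
    calls["stable"] += 1
    return f"echo: {prompt}"


# Layers stack from the inside out: fallback first, then cache on top.
stack = with_cache(with_fallback(flaky_provider, stable_provider))

first = stack("Hello")
second = stack("Hello")  # served by the cache layer; providers are not called again
```

Because each layer only sees a handler and returns a handler, the order of wrapping decides behavior: placing the cache outermost means a cached answer skips the fallback logic entirely.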

We give credit to litellm for proving the category -- our provider registry was bootstrapped from theirs. See ATTRIBUTIONS.md.

Feature Comparison

An honest look at where things stand. We're newer and leaner -- litellm has breadth we haven't matched yet, and we have depth they can't easily retrofit.

| Feature | liter-llm | litellm |
|---|---|---|
| Language | Rust (compiled, memory-safe) | Python |
| Bindings | 14 native (Rust, Python, TS, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, Dart, Swift, Zig, WASM) + C/FFI | Python (+ OpenAI-compatible proxy) |
| Providers | 143 (compiled at build time) | 100+ (runtime resolution) |
| Streaming | SSE + AWS EventStream binary protocol | SSE + AWS EventStream |
| Observability | Built-in OpenTelemetry (GenAI semconv) | 40+ callback integrations |
| API key safety | secrecy::SecretString (zeroed, redacted) | Plain strings |
| Middleware | Composable Tower stack | Built-in callback system |
| Proxy / Gateway | Yes (22 OpenAI-compatible endpoints, 35MB Docker) | Yes |
| Guardrails | -- | 10+ integrations, 4 execution modes (advanced: enterprise) |
| Semantic caching | -- | Redis + Qdrant backends |
| Virtual key mgmt | Yes (per-key model restrictions, RPM/TPM, budgets) | Yes (key rotation: enterprise) |
| Management API | Config-driven (REST admin API planned) | Multi-tenant (teams, budgets, keys; tiers + reporting: enterprise) |
| Fine-tuning API | -- | Enterprise only |
| Load balancer | Fallback + round-robin via Tower router | Full router with strategies |
| Cost tracking | Embedded pricing + OTEL spans | Per-key/team/model budgets |
| Rate limiting | Per-model RPM/TPM (Tower layer) | Per-key/user/team/model |
| Caching | In-memory LRU + 40+ backends via OpenDAL (S3, Redis, GCS, DynamoDB, disk, ...) | 7 backends (Redis, S3, GCS, disk, Qdrant) |
| Tool calling | Parallel tools, structured output, JSON schema | Full support |
| Embeddings | Yes | Yes |
| Batch API | Yes | Yes |
| Audio / Speech | Yes | Yes |
| Lifecycle hooks | onRequest/onResponse/onError per-client | Callback integrations |
| Budget enforcement | Per-model + global limits, hard/soft modes | Per-key/team budgets |
| Health checks | Automatic provider probes + cooldown | -- |
| Custom providers | Runtime API + TOML config file | Config + code-based |
| Config files | TOML with auto-discovery (liter-llm.toml) | YAML proxy config |
| Search / OCR | 12 search + 4 OCR providers | Yes |
| Image generation | Yes | Yes |

Key Features

  • 143 providers -- OpenAI, Anthropic, Google, AWS Bedrock, Groq, Mistral, Together AI, Fireworks, Perplexity, DeepSeek, Cohere, and 130+ more
  • 14 native bindings -- Rust, Python, TypeScript/Node.js, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, Dart, Swift, Zig, WebAssembly (plus a C/FFI surface shared across them)
  • First-class streaming -- SSE and AWS EventStream binary protocol with zero-copy buffers
  • TOML configuration -- liter-llm.toml with auto-discovery, custom providers, cache backends, middleware config
  • OpenTelemetry -- GenAI semantic conventions, cost tracking spans, HTTP-level tracing
  • Tower middleware -- Rate limiting, caching (40+ OpenDAL backends), cost tracking, budget enforcement, health checks, cooldowns, hooks, fallback -- all composable
  • Search & OCR -- Web search across 12 providers, document OCR across 4 providers
  • Tool calling -- Parallel tools, structured outputs, JSON schema validation
  • Embeddings -- Dimension selection, base64 format, multi-provider support
  • Per-request routing -- Automatic provider detection from model name prefix, custom provider registration at runtime
  • Schema-driven -- Provider registry and API types compiled from JSON schemas, no runtime lookups
  • Local LLM support -- Run models locally with Ollama, LM Studio, vLLM, llama.cpp, LocalAI, and llamafile via OpenAI-compatible APIs
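To make the "Per-request routing" bullet concrete, here is a minimal Python sketch of prefix-based provider detection with runtime registration. It is an illustration under assumptions, not the library's registry: `REGISTRY`, `register_provider`, and `resolve` are hypothetical names, and the two built-in base URLs are included only as example values.

```python
from typing import Dict, Tuple

# Hypothetical registry mapping a provider prefix to a base URL.
REGISTRY: Dict[str, str] = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com/v1",
}


def register_provider(prefix: str, base_url: str) -> None:
    """Add or replace a provider at runtime."""
    REGISTRY[prefix] = base_url


def resolve(model: str) -> Tuple[str, str]:
    """Split 'provider/model' and return (base_url, bare_model_name)."""
    # partition() splits on the first slash only, so model names that
    # themselves contain slashes are kept intact.
    prefix, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    if prefix not in REGISTRY:
        raise KeyError(f"unknown provider prefix: {prefix!r}")
    return REGISTRY[prefix], name


# Register a custom provider, then route a request to it by prefix.
register_provider("my-provider", "https://my-llm.example.com/v1")
base_url, bare_model = resolve("my-provider/some-model")
```

Switching providers then amounts to changing the prefix in the model string, which is the behavior the usage examples rely on.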

Proxy Server & CLI

Drop-in replacement for litellm's proxy -- 22 OpenAI-compatible endpoints. Install the liter-llm CLI (which ships both the proxy server and the MCP tool server) one of three ways:

# Homebrew (macOS / Linux)
brew install kreuzberg-dev/tap/liter-llm

# Pre-built binary (Linux x86_64/arm64, macOS arm64, Windows x86_64)
# Set VERSION and TARGET to match an asset name on the releases page first
curl -L "https://github.com/kreuzberg-dev/liter-llm/releases/latest/download/liter-llm-${VERSION}-${TARGET}.tar.gz" | tar xz

# Docker (35MB image)
docker run -p 4000:4000 -e LITER_LLM_MASTER_KEY=sk-your-key ghcr.io/kreuzberg-dev/liter-llm

Then call it like OpenAI:

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
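The same request can be built from Python with only the standard library. This is a sketch, not an official client: the URL, key, and model string are copied from the curl example, `build_chat_request` is a hypothetical helper, and the commented-out response handling assumes the OpenAI-style response shape used elsewhere in this README. It sets an explicit JSON content type, which JSON APIs commonly expect.

```python
import json
import urllib.request

# Values copied from the curl example; adjust to your deployment.
PROXY_URL = "http://localhost:4000/v1/chat/completions"
API_KEY = "sk-your-key"


def build_chat_request(model: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the proxy."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("openai/gpt-4o", "Hello")

# To actually send it, with the proxy running locally:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```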

Or with a TOML config file:

# liter-llm-proxy.toml
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"

[[models]]
name = "claude-sonnet"
provider_model = "anthropic/claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[keys]]
key = "sk-team-a"
models = ["gpt-4o"]
rpm = 100
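The [[keys]] entry above pairs a virtual key with an allowed model list and a requests-per-minute cap. How the proxy enforces these internally is not described here; the Python sketch below shows one common approach, a sliding 60-second window, purely as an illustration. `VirtualKey` and `allow` are hypothetical names, and the `now` parameter exists only so the behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Deque, List, Optional


class VirtualKey:
    """Hypothetical in-memory model of one [[keys]] entry."""

    def __init__(self, key: str, models: List[str], rpm: int) -> None:
        self.key = key
        self.models = models
        self.rpm = rpm
        self._stamps: Deque[float] = deque()  # times of recently accepted requests

    def allow(self, model: str, now: Optional[float] = None) -> bool:
        """Return True if this key may call `model` right now."""
        if model not in self.models:
            return False  # model restriction
        now = time.monotonic() if now is None else now
        # Drop timestamps that have left the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.rpm:
            return False  # RPM cap reached
        self._stamps.append(now)
        return True


team_a = VirtualKey(key="sk-team-a", models=["gpt-4o"], rpm=100)
```

With this shape, a request for a model outside the list is rejected before it counts against the rate limit, so a misconfigured caller cannot exhaust the key's quota.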

CLI:

liter-llm api --config liter-llm-proxy.toml    # Start proxy server
liter-llm mcp --transport stdio                 # Start MCP tool server

Features: Model routing, virtual API keys, per-key rate limiting (RPM/TPM), cost tracking, budget enforcement, response caching, SSE streaming, OpenAPI 3.1 spec at /openapi.json, MCP server with 22 tools, graceful shutdown.

Architecture

liter-llm/
├── crates/
│   ├── liter-llm/           # Rust core library
│   ├── liter-llm-py/        # Python (PyO3) core
│   ├── liter-llm-node/      # Node.js (NAPI-RS) core
│   ├── liter-llm-ffi/       # C-compatible FFI layer
│   ├── liter-llm-php/       # PHP (ext-php-rs) core
│   └── liter-llm-wasm/      # WebAssembly (wasm-bindgen) core
├── packages/
│   ├── python/               # Python package
│   ├── typescript/           # TypeScript/Node.js package
│   ├── go/                   # Go (cgo) module
│   ├── java/                 # Java (Panama FFI) package
│   ├── ruby/                 # Ruby (Magnus) gem
│   ├── elixir/               # Elixir (Rustler NIF) package
│   ├── csharp/               # .NET (P/Invoke) package
│   └── php/                  # PHP (Composer) package
└── schemas/                  # Provider registry and API schemas

Quick Start

Install in your language of choice:

| Language | Install |
|---|---|
| Python | pip install liter-llm |
| Node.js | pnpm add @kreuzberg/liter-llm |
| Rust | cargo add liter-llm |
| Go | go get github.com/kreuzberg-dev/liter-llm/packages/go |
| Java | dev.kreuzberg:liter-llm (Maven/Gradle) |
| Ruby | gem install liter_llm |
| PHP | composer require kreuzberg/liter-llm |
| C# | dotnet add package LiterLlm |
| Elixir | {:liter_llm, "~> 1.4.0-rc.27"} in mix.exs |
| Dart / Flutter | dart pub add liter_llm |
| Swift | See Swift package -- .binaryTarget from release notes |
| Kotlin (Android) | dev.kreuzberg:liter-llm-android (Maven Central) |
| Zig | See Zig package |
| WASM | pnpm add @kreuzberg/liter-llm-wasm |
| C/FFI | Build from source -- see FFI crate |

Usage

import asyncio
import os
from liter_llm import LlmClient

async def main():
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])

    # Chat with any provider using the provider/model prefix
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Switch providers by changing the prefix -- no other code changes
    client2 = LlmClient(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = await client2.chat(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
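The two calls above run one after the other. Since `chat` is awaited, independent requests could in principle run concurrently with `asyncio.gather`. The sketch below uses a hypothetical `StubClient` that only mimics the `chat(model=..., messages=...)` call shape shown above, so it runs without network access or API keys; it makes no claim about how the real client behaves under concurrency.

```python
import asyncio
from typing import Dict, List


class StubClient:
    """Hypothetical stand-in with the same chat(model=..., messages=...) call shape."""

    def __init__(self, label: str) -> None:
        self.label = label

    async def chat(self, model: str, messages: List[Dict[str, str]]) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"{self.label} answered {model}"


async def ask_both() -> List[str]:
    first_client = StubClient("client-1")
    second_client = StubClient("client-2")
    messages = [{"role": "user", "content": "Hello!"}]
    # gather() runs both coroutines concurrently and preserves argument order.
    return await asyncio.gather(
        first_client.chat(model="openai/gpt-4o", messages=messages),
        second_client.chat(model="anthropic/claude-sonnet-4-20250514", messages=messages),
    )


results = asyncio.run(ask_both())
```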

Or use a liter-llm.toml config file instead of passing everything in code:

api_key = "sk-..."
timeout_secs = 120

[cache]
max_entries = 512
ttl_seconds = 600
backend = "redis"
backend_config = { connection_string = "redis://localhost:6379" }

[budget]
global_limit = 50.0
enforcement = "hard"

[[providers]]
name = "my-provider"
base_url = "https://my-llm.example.com/v1"
model_prefixes = ["my-provider/"]
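The [cache] settings above bound the cache by entry count and by age. As a rough mental model of what `max_entries` and `ttl_seconds` mean for an in-memory LRU cache, here is a small Python sketch. `TtlLruCache` is a hypothetical class written for this example, not the library's cache, and the exact eviction and expiry rules of the real implementation may differ.

```python
import time
from collections import OrderedDict
from typing import Optional, Tuple


class TtlLruCache:
    """Hypothetical in-memory cache bounded by size and entry age."""

    def __init__(self, max_entries: int = 512, ttl_seconds: float = 600.0) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._items: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at >= self.ttl_seconds:
            del self._items[key]  # expired
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._items[key] = (now, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used


cache = TtlLruCache(max_entries=2, ttl_seconds=600.0)
cache.put("a", "1", now=0.0)
cache.put("b", "2", now=0.0)
cache.get("a", now=1.0)       # touch "a" so "b" becomes the eviction candidate
cache.put("c", "3", now=2.0)  # exceeds max_entries, evicts "b"
```

The explicit `now` argument is only there to make the example deterministic; in normal use the monotonic clock is read on each call.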

The same API is available in all 14 languages -- see the language READMEs below for idiomatic examples.

Core API

All bindings expose a unified chat() function:

| Language | Usage |
|---|---|
| Rust | DefaultClient::new(config).chat(messages, options).await |
| Python | LlmClient(api_key=...).chat(messages, config) |
| Node.js | new LlmClient({ apiKey }).chat(messages, config) |
| Go | client.Chat(ctx, messages, config) |
| Java | client.chat(messages, configJson) |
| Ruby | LiterLlm::LlmClient.new(api_key, config).chat(messages) |
| Elixir | LiterLlm.chat(messages, config) |
| PHP | LiterLlm\LlmClient::new($apiKey)->chat($messages, $config) |
| C# | new LlmClient(apiKey).ChatAsync(messages, config) |
| WASM | new LlmClient({ apiKey }).chat(messages, config) |
| C FFI | liter_llm_chat(client, messages_json, config_json) |

Language READMEs

| Language | README | Binding |
|---|---|---|
| Python | packages/python | PyO3 |
| TypeScript / Node.js | crates/liter-llm-node | NAPI-RS |
| Go | packages/go | cgo |
| Java | packages/java | Panama FFI |
| Ruby | packages/ruby | Magnus |
| Elixir | packages/elixir | Rustler NIF |
| PHP | packages/php | ext-php-rs |
| .NET (C#) | packages/csharp | P/Invoke |
| WebAssembly | crates/liter-llm-wasm | wasm-bindgen |
| C/C++ (FFI) | crates/liter-llm-ffi | C ABI |

Part of Kreuzberg.dev

  • Kreuzberg — document intelligence: text, tables, metadata from 90+ formats with optional OCR.
  • Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
  • kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • html-to-markdown — fast, lossless HTML→Markdown engine.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces all per-language bindings.
  • Discord — community, roadmap, announcements.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Join our Discord community for questions and discussion.

License

MIT -- see LICENSE for details.