liter-llm

May 27, 2026 · View on GitHub

A lighter, faster, safer universal LLM API client -- one Rust core, 14 native language bindings, 143 providers.

Why liter-llm?

A universal LLM API client, compiled from the ground up in Rust. No interpreter, no transitive dependency tree, no supply chain surface area. One binary, 14 native language bindings, 143 providers.

Compiled Rust core. No pip install supply chain. No .pth auto-execution hooks. No runtime dependency tree to compromise. The kind of supply chain attack that hit litellm in 2026 is structurally impossible here.
Secrets stay secret. API keys are wrapped in secrecy::SecretString -- zeroed on drop, redacted in logs, never serialized.
Polyglot from day one. Python, TypeScript, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, Dart, Swift, Zig, WebAssembly -- all thin wrappers around the same Rust core, plus a C/FFI surface for everything else. No reimplementation drift.
Observability built in. Production-grade OpenTelemetry with GenAI semantic conventions -- not an afterthought callback system.
Composable middleware. Rate limiting, caching, cost tracking, health checks, and fallback as Tower layers you stack like building blocks.

We give credit to litellm for proving the category -- our provider registry was bootstrapped from theirs. See ATTRIBUTIONS.md.

Feature Comparison

An honest look at where things stand. We're newer and leaner -- litellm has breadth we haven't matched yet, and we have depth they can't easily retrofit.

	liter-llm	litellm
Language	Rust (compiled, memory-safe)	Python
Bindings	14 native (Rust, Python, TS, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, Dart, Swift, Zig, WASM) + C/FFI	Python (+ OpenAI-compatible proxy)
Providers	143 (compiled at build time)	100+ (runtime resolution)
Streaming	SSE + AWS EventStream binary protocol	SSE + AWS EventStream
Observability	Built-in OpenTelemetry (GenAI semconv)	40+ callback integrations
API key safety	`secrecy::SecretString` (zeroed, redacted)	Plain strings
Middleware	Composable Tower stack	Built-in callback system
Proxy / Gateway	Yes (22 OpenAI-compatible endpoints, 35MB Docker)	Yes
Guardrails	--	10+ integrations, 4 execution modes (advanced: enterprise)
Semantic caching	--	Redis + Qdrant backends
Virtual key mgmt	Yes (per-key model restrictions, RPM/TPM, budgets)	Yes (key rotation: enterprise)
Management API	Config-driven (REST admin API planned)	Multi-tenant (teams, budgets, keys; tiers + reporting: enterprise)
Fine-tuning API	--	Enterprise only
Load balancer	Fallback + round-robin via Tower router	Full router with strategies
Cost tracking	Embedded pricing + OTEL spans	Per-key/team/model budgets
Rate limiting	Per-model RPM/TPM (Tower layer)	Per-key/user/team/model
Caching	In-memory LRU + 40+ backends via OpenDAL (S3, Redis, GCS, DynamoDB, disk, ...)	7 backends (Redis, S3, GCS, disk, Qdrant)
Tool calling	Parallel tools, structured output, JSON schema	Full support
Embeddings	Yes	Yes
Batch API	Yes	Yes
Audio / Speech	Yes	Yes
Lifecycle hooks	onRequest/onResponse/onError per-client	Callback integrations
Budget enforcement	Per-model + global limits, hard/soft modes	Per-key/team budgets
Health checks	Automatic provider probes + cooldown	--
Custom providers	Runtime API + TOML config file	Config + code-based
Config files	TOML with auto-discovery (`liter-llm.toml`)	YAML proxy config
Search / OCR	12 search + 4 OCR providers	Yes
Image generation	Yes	Yes

Key Features

143 providers -- OpenAI, Anthropic, Google, AWS Bedrock, Groq, Mistral, Together AI, Fireworks, Perplexity, DeepSeek, Cohere, and 130+ more
14 native bindings -- Rust, Python, TypeScript/Node.js, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, Dart, Swift, Zig, WebAssembly (plus a C/FFI surface shared across them)
First-class streaming -- SSE and AWS EventStream binary protocol with zero-copy buffers
TOML configuration -- liter-llm.toml with auto-discovery, custom providers, cache backends, middleware config
OpenTelemetry -- GenAI semantic conventions, cost tracking spans, HTTP-level tracing
Tower middleware -- Rate limiting, caching (40+ OpenDAL backends), cost tracking, budget enforcement, health checks, cooldowns, hooks, fallback -- all composable
Search & OCR -- Web search across 12 providers, document OCR across 4 providers
Tool calling -- Parallel tools, structured outputs, JSON schema validation
Embeddings -- Dimension selection, base64 format, multi-provider support
Per-request routing -- Automatic provider detection from model name prefix, custom provider registration at runtime
Schema-driven -- Provider registry and API types compiled from JSON schemas, no runtime lookups
Local LLM support — Run models locally with Ollama, LM Studio, vLLM, llama.cpp, LocalAI, and llamafile via OpenAI-compatible APIs

Proxy Server & CLI

Drop-in replacement for litellm's proxy -- 22 OpenAI-compatible endpoints. Install the liter-llm CLI (which ships both the proxy server and the MCP tool server) one of three ways:

# Homebrew (macOS / Linux)
brew install kreuzberg-dev/tap/liter-llm

# Pre-built binary (Linux x86_64/arm64, macOS arm64, Windows x86_64)
curl -L https://github.com/kreuzberg-dev/liter-llm/releases/latest/download/liter-llm-${VERSION}-${TARGET}.tar.gz | tar xz

# Docker (35MB image)
docker run -p 4000:4000 -e LITER_LLM_MASTER_KEY=sk-your-key ghcr.io/kreuzberg-dev/liter-llm

Then call it like OpenAI:

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

Or with a TOML config file:

# liter-llm-proxy.toml
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"

[[models]]
name = "claude-sonnet"
provider_model = "anthropic/claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[keys]]
key = "sk-team-a"
models = ["gpt-4o"]
rpm = 100

CLI:

liter-llm api --config liter-llm-proxy.toml    # Start proxy server
liter-llm mcp --transport stdio                 # Start MCP tool server

Features: Model routing, virtual API keys, per-key rate limiting (RPM/TPM), cost tracking, budget enforcement, response caching, SSE streaming, OpenAPI 3.1 spec at /openapi.json, MCP server with 22 tools, graceful shutdown.

Architecture

liter-llm/
├── crates/
│   ├── liter-llm/           # Rust core library
│   ├── liter-llm-py/        # Python (PyO3) core
│   ├── liter-llm-node/      # Node.js (NAPI-RS) core
│   ├── liter-llm-ffi/       # C-compatible FFI layer
│   ├── liter-llm-php/       # PHP (ext-php-rs) core
│   └── liter-llm-wasm/      # WebAssembly (wasm-bindgen) core
├── packages/
│   ├── python/               # Python package
│   ├── typescript/           # TypeScript/Node.js package
│   ├── go/                   # Go (cgo) module
│   ├── java/                 # Java (Panama FFI) package
│   ├── ruby/                 # Ruby (Magnus) gem
│   ├── elixir/               # Elixir (Rustler NIF) package
│   ├── csharp/               # .NET (P/Invoke) package
│   └── php/                  # PHP (Composer) package
└── schemas/                  # Provider registry and API schemas

Quick Start

Install in your language of choice:

Language	Install
Python	`pip install liter-llm`
Node.js	`pnpm add @kreuzberg/liter-llm`
Rust	`cargo add liter-llm`
Go	`go get github.com/kreuzberg-dev/liter-llm/packages/go`
Java	`dev.kreuzberg:liter-llm` (Maven/Gradle)
Ruby	`gem install liter_llm`
PHP	`composer require kreuzberg/liter-llm`
C#	`dotnet add package LiterLlm`
Elixir	`{:liter_llm, "~> 1.4.0-rc.27"}` in mix.exs
Dart / Flutter	`dart pub add liter_llm`
Swift	See Swift package -- `.binaryTarget` from release notes
Kotlin (Android)	`dev.kreuzberg:liter-llm-android` (Maven Central)
Zig	See Zig package
WASM	`pnpm add @kreuzberg/liter-llm-wasm`
C/FFI	Build from source -- see FFI crate

Usage

import asyncio, os
from liter_llm import LlmClient

async def main():
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])

    # Chat with any provider using the provider/model prefix
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Switch providers by changing the prefix -- no other code changes
    client2 = LlmClient(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = await client2.chat(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Or use a liter-llm.toml config file instead of passing everything in code:

api_key = "sk-..."
timeout_secs = 120

[cache]
max_entries = 512
ttl_seconds = 600
backend = "redis"
backend_config = { connection_string = "redis://localhost:6379" }

[budget]
global_limit = 50.0
enforcement = "hard"

[[providers]]
name = "my-provider"
base_url = "https://my-llm.example.com/v1"
model_prefixes = ["my-provider/"]

The same API is available in all 14 languages -- see the language READMEs below for idiomatic examples.

Core API

All bindings expose a unified chat() function:

Language	Usage
Rust	`DefaultClient::new(config).chat(messages, options).await`
Python	`LlmClient(api_key=...).chat(messages, config)`
Node.js	`new LlmClient({ apiKey }).chat(messages, config)`
Go	`client.Chat(ctx, messages, config)`
Java	`client.chat(messages, configJson)`
Ruby	`LiterLlm::LlmClient.new(api_key, config).chat(messages)`
Elixir	`LiterLlm.chat(messages, config)`
PHP	`LiterLlm\LlmClient::new($apiKey)->chat($messages, $config)`
C#	`new LlmClient(apiKey).ChatAsync(messages, config)`
WASM	`new LlmClient({ apiKey }).chat(messages, config)`
C FFI	`liter_llm_chat(client, messages_json, config_json)`

Language READMEs

Language	README	Binding
Python	packages/python	PyO3
TypeScript / Node.js	crates/liter-llm-node	NAPI-RS
Go	packages/go	cgo
Java	packages/java	Panama FFI
Ruby	packages/ruby	Magnus
Elixir	packages/elixir	Rustler NIF
PHP	packages/php	ext-php-rs
.NET (C#)	packages/csharp	P/Invoke
WebAssembly	crates/liter-llm-wasm	wasm-bindgen
C/C++ (FFI)	crates/liter-llm-ffi	C ABI

Part of Kreuzberg.dev

Kreuzberg — document intelligence: text, tables, metadata from 90+ formats with optional OCR.
Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
html-to-markdown — fast, lossless HTML→Markdown engine.
tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
alef — the polyglot binding generator that produces all per-language bindings.
Discord — community, roadmap, announcements.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Join our Discord community for questions and discussion.

License

MIT -- see LICENSE for details.