chuk-llm

February 18, 2026 · View on GitHub

The intelligent model capability engine. Production-ready Python library with dynamic model discovery, capability-based selection, real-time streaming, and Pydantic-native architecture.

from chuk_llm import quick_question
print(quick_question("What is 2+2?"))  # "2 + 2 equals 4."

✨ What's New in v0.14

Revolutionary Registry System:

  • 🧠 Dynamic Model Discovery - No more hardcoded model lists, automatic capability detection
  • 🎯 Intelligent Selection - Find models by capabilities, cost, and quality tier
  • πŸ” Smart Queries - find_best(requires_tools=True, quality_tier="cheap")
  • πŸ—οΈ Pydantic V2 Native - Type-safe models throughout, no dictionary goop
  • ⚑ Async-First Architecture - True async/await with sync wrappers for convenience
  • πŸ“Š Layered Capability Resolution - Heuristics β†’ YAML cache β†’ Provider APIs
  • πŸš€ Zero-Config - Pull a new Ollama model, use it immediately

Latest Models (December 2025):

  • 🤖 Gemini 2.5/3 Pro - 1M token context, adaptive thinking, multimodal (gemini-2.5-flash, gemini-3-pro-preview)
  • 🚀 Mistral Large 3 - 675B MoE, 41B active, Apache 2.0 (mistral-large-2512, ministral-8b-2512, ministral-14b-2512)
  • 💡 DeepSeek V3.2 - 671B MoE, ultra-efficient at $0.27/M tokens (deepseek-chat, deepseek-reasoner)

Performance:

  • ⚑ 52x faster imports - Lazy loading reduces import time from 735ms to 14ms
  • 🚀 112x faster client creation - Automatic thread-safe caching
  • 📊 <0.015% overhead - Negligible library overhead vs API latency

See REGISTRY_COMPLETE.md for architecture details.

Why chuk-llm?

  • 🧠 Intelligent: Dynamic registry selects models by capabilities, not names
  • πŸ” Auto-Discovery: Pull new models, use immediately - no configuration needed
  • ⚑ Lightning Fast: Massive performance improvements (see Performance)
  • πŸ› οΈ Clean Tools API: Function calling without complexity - tools are just parameters
  • πŸ—οΈ Type-Safe: Pydantic V2 models throughout, no dictionary goop
  • ⚑ Async-Native: True async/await with sync wrappers when needed
  • πŸ“Š Built-in Analytics: Automatic cost and usage tracking with session isolation
  • 🎯 Production-Ready: Thread-safe caching, connection pooling, negligible overhead

Quick Start

Installation

# Core functionality
pip install chuk_llm

# Or with extras
pip install chuk_llm[redis]  # Persistent sessions
pip install chuk_llm[cli]    # Enhanced CLI experience
pip install chuk_llm[all]    # Everything

Basic Usage

# Simplest approach - auto-detects available providers
from chuk_llm import quick_question
answer = quick_question("Explain quantum computing in one sentence")

# Provider-specific (auto-generated functions!)
from chuk_llm import ask_openai_sync, ask_claude_sync, ask_ollama_llama3_2_sync

response = ask_openai_sync("Tell me a joke")
response = ask_claude_sync("Write a haiku")
response = ask_ollama_llama3_2_sync("Explain Python")  # Auto-discovered!

Latest Models (December 2025)

from chuk_llm import ask

# Gemini 3 Pro - Advanced reasoning with 1M context
response = await ask(
    "Explain consciousness vs intelligence in AI",
    provider="gemini",
    model="gemini-3-pro-preview"
)

# Mistral Large 3 - 675B MoE, Apache 2.0
response = await ask(
    "Write a Python function for binary search",
    provider="mistral",
    model="mistral-large-2512"
)

# Ministral 8B - Fast, efficient, cost-effective
response = await ask(
    "Summarize this text",
    provider="mistral",
    model="ministral-8b-2512"
)

# DeepSeek V3.2 - Ultra-efficient at $0.27/M tokens
response = await ask(
    "Solve this math problem step by step",
    provider="deepseek",
    model="deepseek-chat"
)

Async & Streaming

import asyncio
from chuk_llm import ask, stream

async def main():
    # Async call
    response = await ask("What's the capital of France?")
    
    # Real-time streaming
    async for chunk in stream("Write a story"):
        print(chunk, end="", flush=True)

asyncio.run(main())

Function Calling (Tools)

from chuk_llm import ask
from chuk_llm.api.tools import tools_from_functions

def get_weather(location: str) -> dict:
    return {"temp": 22, "location": location, "condition": "sunny"}

# Tools are just a parameter!
toolkit = tools_from_functions(get_weather)
response = await ask(
    "What's the weather in Paris?",
    tools=toolkit.to_openai_format()
)
print(response)  # Returns dict with tool_calls when tools provided

CLI Usage

# Quick commands with global aliases
chuk-llm ask_gpt "What is Python?"
chuk-llm ask_claude "Explain quantum computing"

# Auto-discovered Ollama models work instantly
chuk-llm ask_ollama_gemma3 "Hello world"
chuk-llm stream_ollama_mistral "Write a long story"

# llama.cpp with automatic model resolution
chuk-llm ask "What is Python?" --provider llamacpp --model qwen3
chuk-llm ask "Count to 5" --provider llamacpp --model llama3.2

# Discover new models
chuk-llm discover ollama

🧠 Dynamic Registry System

The registry is the intelligent core of chuk-llm. Instead of hardcoding model names, it dynamically discovers models and their capabilities, then selects the best one for your needs.

Intelligent Model Selection

from chuk_llm.registry import get_registry
from chuk_llm import ask

# Get the registry (auto-discovers all available models)
registry = await get_registry()

# Find the best cheap model with tool support
model = await registry.find_best(
    requires_tools=True,
    quality_tier="cheap"
)
print(f"Selected: {model.spec.provider}:{model.spec.name}")
# Selected: groq:llama-3.3-70b-versatile

# Use the selected model with ask()
response = await ask(
    "Summarize this document",
    provider=model.spec.provider,
    model=model.spec.name
)

# Find best model for vision with large context
model = await registry.find_best(
    requires_vision=True,
    min_context=128_000,
    quality_tier="balanced"
)
# Returns: openai:gpt-4o-mini or gemini:gemini-2.0-flash-exp

# Custom queries with multiple requirements
from chuk_llm.registry import ModelQuery

results = await registry.query(ModelQuery(
    requires_tools=True,
    requires_vision=True,
    min_context=100_000,
    max_cost_per_1m_input=2.0,
    quality_tier="balanced"
))

How It Works

3-Tier Capability Resolution (sketched below):

  1. Heuristic Resolver - Infers capabilities from model name patterns (e.g., "gpt-4" → likely supports tools)
  2. YAML Cache - Tested capabilities stored in registry/capabilities/*.yaml for fast, reliable access
  3. Provider APIs - Queries provider APIs dynamically (Ollama /api/tags, Gemini models API, etc.)
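
These three tiers behave like a fallback chain: cheap name-based heuristics answer first, the YAML cache overrides them with tested values, and provider APIs fill in whatever is still unknown. A minimal illustrative sketch of that idea (toy code with made-up class names, not the library's actual resolvers):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Caps:
    supports_tools: Optional[bool] = None  # None means "unknown"

def heuristic(model_name: str) -> Caps:
    # Tier 1: infer from the name alone
    return Caps(supports_tools=("gpt-4" in model_name or "gpt-5" in model_name))

def yaml_cache(model_name: str, cache: dict) -> Caps:
    # Tier 2: previously tested capabilities loaded from capabilities/*.yaml
    return Caps(supports_tools=cache.get(model_name, {}).get("supports_tools"))

def resolve(model_name: str, cache: dict) -> Caps:
    caps = heuristic(model_name)
    tested = yaml_cache(model_name, cache)
    if tested.supports_tools is not None:  # tested values win over guesses
        caps.supports_tools = tested.supports_tools
    # Tier 3: ask the provider API (e.g., Ollama /api/tags) for anything still None
    return caps

print(resolve("gpt-4o-mini", {"gpt-4o-mini": {"supports_tools": True}}))
# Caps(supports_tools=True)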

Dynamic Discovery Sources:

  • OpenAI /v1/models API
  • Anthropic known models
  • Google Gemini models API
  • Ollama /api/tags (local models)
  • llama.cpp /v1/models (local GGUF + Ollama bridge)
  • DeepSeek /v1/models API
  • Moonshot AI /v1/models API
  • Groq, Mistral, Perplexity, and more

Provider APIs are cached on disk and refreshed periodically (or via chuk-llm discover), so new models appear without needing a chuk-llm release.

Benefits:

  • ✅ No hardcoded model lists - Pull new Ollama models, use immediately
  • ✅ Capability-based selection - Declare requirements, not model names
  • ✅ Cost-aware - Find cheapest model that meets requirements
  • ✅ Quality tiers - BEST, BALANCED, CHEAP classification
  • ✅ Extensible - Add custom sources and resolvers via protocols

Key Features

πŸ” Automatic Model Discovery

Pull new Ollama models and use them immediately - no configuration needed:

# Terminal 1: Pull a new model
ollama pull llama3.2
ollama pull mistral-small:latest

# Terminal 2: Use immediately in Python
from chuk_llm import ask_ollama_llama3_2_sync, ask_ollama_mistral_small_latest_sync
response = ask_ollama_llama3_2_sync("Hello!")

# Or via CLI
chuk-llm ask_ollama_mistral_small_latest "Tell me a joke"

🦙 llama.cpp Integration

Run local GGUF models with advanced control via llama.cpp server. Reuse Ollama's downloaded models without re-downloading!

CLI Usage (✨ Now fully supported!):

# Simple usage - model names automatically resolve to GGUF files
chuk-llm ask "What is Python?" --provider llamacpp --model qwen3
chuk-llm ask "Count to 5" --provider llamacpp --model llama3.2

# Streaming (default)
chuk-llm ask "Write a story" --provider llamacpp --model qwen3

# Non-streaming
chuk-llm ask "Quick question" --provider llamacpp --model qwen3 --no-stream

Python API (Simple - Recommended):

from chuk_llm import ask

# Model names automatically resolve to Ollama's GGUF files!
response = await ask(
    "What is Python?",
    provider="llamacpp",
    model="qwen3"  # Auto-resolves to ~/.ollama/models/blobs/sha256-xxx
)
print(response)

# Streaming
from chuk_llm import stream
async for chunk in stream("Tell me a story", provider="llamacpp", model="llama3.2"):
    print(chunk, end="", flush=True)

Python API (Advanced - Full Control):

from chuk_llm.registry.resolvers.llamacpp_ollama import discover_ollama_models
from chuk_llm.llm.providers.llamacpp_client import LlamaCppLLMClient
from chuk_llm.core import Message, MessageRole

# Discover Ollama models (finds GGUF blobs in ~/.ollama/models/blobs/)
models = discover_ollama_models()
print(f"Found {len(models)} Ollama models")  # e.g., "Found 48 Ollama models"

# Create client with auto-managed server
client = LlamaCppLLMClient(
    model=str(models[0].gguf_path),  # Reuse Ollama's GGUF!
    ctx_size=8192,
    n_gpu_layers=-1,  # Use all GPU layers
)

messages = [Message(role=MessageRole.USER, content="Hello!")]
result = await client.create_completion(messages=messages)
print(result["response"])

# Cleanup
await client.stop_server()

Key Features:

  • ✅ CLI Support - Full integration with chuk-llm CLI (model name resolution)
  • ✅ Ollama Bridge - Automatically discovers and reuses Ollama's downloaded models (no re-download!)
  • ✅ Auto-Resolution - Model names (qwen3, llama3.2) resolve to GGUF file paths automatically
  • ✅ Process Management - Auto-managed server lifecycle (start/stop/health checks)
  • ✅ OpenAI-Compatible - Uses standard OpenAI client (streaming, tools, etc.)
  • ✅ High Performance - Benchmarks show llama.cpp is 1.53x faster than Ollama (311 vs 204 tok/s)
  • ✅ Advanced Control - Custom sampling, grammars, GPU layers, context size
  • ✅ Cross-Platform - Works on macOS, Linux, Windows

Performance Comparison (same GGUF file, qwen3:0.6b):

  • llama.cpp: 311.4 tok/s
  • Ollama: 204.2 tok/s
  • llama.cpp is 1.53x faster!

See examples/providers/llamacpp_ollama_usage_examples.py and examples/providers/benchmark_ollama_vs_llamacpp.py for full examples.

📊 Automatic Session Tracking

Every call is automatically tracked for analytics:

from chuk_llm import ask_sync, get_session_stats

ask_sync("What's the capital of France?")
ask_sync("What's 2+2?")

stats = get_session_stats()
print(f"Total cost: ${stats['estimated_cost']:.6f}")
print(f"Total tokens: {stats['total_tokens']}")

🎭 Stateful Conversations

Build conversational AI with memory:

from chuk_llm import conversation

async with conversation() as chat:
    await chat.ask("My name is Alice")
    response = await chat.ask("What's my name?")
    # AI responds: "Your name is Alice"

⚑ Concurrent Execution

Run multiple queries in parallel for massive speedups:

import asyncio
from chuk_llm import ask

# 3-7x faster than sequential!
responses = await asyncio.gather(
    ask("What is AI?"),
    ask("Capital of Japan?"),
    ask("Meaning of life?")
)

Supported Providers

Providers are resolved through the registry system - most models are discovered dynamically from provider APIs, the rest from curated lists (marked Static below) - so your code never hardcodes model names.

| Provider | Discovery Method | Special Features | Status |
|---|---|---|---|
| OpenAI | /v1/models API | GPT-5 / GPT-5.1, o3-family reasoning, industry standard | ✅ Dynamic |
| Azure OpenAI | Deployment config | SOC2, HIPAA compliant, VNet, multi-region | ✅ Dynamic |
| Anthropic | Known models† | Claude 3.5 Sonnet, advanced reasoning, 200K context | ✅ Static |
| Google Gemini | Models API | Gemini 2.5/3 Pro, 1M token context, adaptive thinking, multimodal | ✅ Dynamic |
| Groq | /v1/models API | Llama 3.3, ultra-fast (our benchmarks: ~526 tok/s) | ✅ Dynamic |
| Ollama | /api/tags | Any local model, auto-discovery, offline, privacy | ✅ Dynamic |
| llama.cpp | /v1/models | Local GGUF models, Ollama bridge, advanced control | ✅ Dynamic |
| IBM watsonx | Known models† | Granite 3.3, enterprise, on-prem, compliance | ✅ Static |
| Perplexity | Known models† | Sonar, real-time web search, citations | ✅ Static |
| Mistral | Known models† | Large 3 (675B MoE), Ministral 3 (3B/8B/14B), Apache 2.0 | ✅ Static |
| DeepSeek | /v1/models API | DeepSeek V3.2 (671B MoE), ultra-efficient, $0.27/M tokens | ✅ Dynamic |
| Moonshot AI | /v1/models API | Kimi K2, 256K context, coding, Chinese language | ✅ Dynamic |
| OpenRouter | Known models† | Access to 100+ models via single API | ✅ Static |

† Static = discovered from curated model list + provider docs, not via /models endpoint

Capabilities (auto-detected by registry):

  • ✅ Streaming responses
  • ✅ Function calling / tool use
  • ✅ Vision / multimodal inputs
  • ✅ JSON mode / structured outputs
  • ✅ Async and sync interfaces
  • ✅ Automatic client caching
  • ✅ Session tracking
  • ✅ Conversation management

Configuration

Environment Variables

# API Keys - Cloud Providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."        # For Gemini 2.5/3 models
export GROQ_API_KEY="..."
export DEEPSEEK_API_KEY="..."      # For DeepSeek V3.2 (chat/reasoner)
export MOONSHOT_API_KEY="..."
export MISTRAL_API_KEY="..."       # For Mistral Large 3 & Ministral 3

# Azure Configuration
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"

# Local Servers
# (No API keys needed for Ollama or llama.cpp)

# Session Storage (optional)
export SESSION_PROVIDER=redis  # Default: memory
export SESSION_REDIS_URL=redis://localhost:6379/0

# Performance Settings
export CHUK_LLM_CACHE_CLIENTS=1      # Enable client caching (default: 1)
export CHUK_LLM_AUTO_DISCOVER=true   # Auto-discover new models (default: true)

Python Configuration

from chuk_llm import configure, ask_sync

configure(
    provider="azure_openai",
    model="gpt-4o-mini",
    temperature=0.7
)

# All subsequent calls use these settings
response = ask_sync("Hello!")

Client Caching (Advanced)

Automatic client caching is enabled by default for maximum performance:

from chuk_llm.llm.client import get_client

# First call creates client (~12ms)
client1 = get_client("openai", model="gpt-4o")

# Subsequent calls return cached instance (~125µs)
client2 = get_client("openai", model="gpt-4o")
assert client1 is client2  # Same instance!

# Disable caching for specific call
client3 = get_client("openai", model="gpt-4o", use_cache=False)

# Monitor cache performance
from chuk_llm.client_registry import print_registry_stats
print_registry_stats()
# Cache statistics:
# - Total clients: 1
# - Cache hits: 1
# - Cache misses: 1
# - Hit rate: 50.0%

Advanced Features

πŸ› οΈ Function Calling / Tool Use

ChukLLM provides a clean, unified API for function calling. Recommended approach: Use the Tools class for automatic execution.

from chuk_llm import Tools, tool

# Recommended: Class-based tools with auto-execution
class MyTools(Tools):
    @tool(description="Get weather for a city")
    def get_weather(self, location: str) -> dict:
        return {"temp": 22, "location": location, "condition": "sunny"}

    @tool  # Description auto-extracted from docstring
    def calculate(self, expr: str) -> float:
        """Evaluate a mathematical expression"""
        return eval(expr)

# Auto-executes tools and returns final response
tools = MyTools()
response = await tools.ask("What's the weather in Paris and what's 2+2?")
print(response)  # "The weather in Paris is 22Β°C and sunny. 2+2 equals 4."

# Sync version
response = tools.ask_sync("Calculate 15 * 4")
print(response)  # "15 * 4 equals 60"

Alternative: Direct API usage (for more control):

from chuk_llm import ask
from chuk_llm.api.tools import tools_from_functions

def get_weather(location: str) -> dict:
    """Get weather information for a location"""
    return {"temp": 22, "location": location}

# Create toolkit
toolkit = tools_from_functions(get_weather)

# Returns dict with tool_calls - you handle execution
response = await ask(
    "What's the weather in Paris?",
    tools=toolkit.to_openai_format()
)
print(response)  # {"response": "...", "tool_calls": [...]}

Streaming with Tools

from chuk_llm import stream

# Streaming with tools
async for chunk in stream(
    "What's the weather in Tokyo?", 
    tools=toolkit.to_openai_format(),
    return_tool_calls=True  # Include tool calls in stream
):
    if isinstance(chunk, dict):
        print(f"Tool call: {chunk['tool_calls']}")
    else:
        print(chunk, end="", flush=True)

🌳 Conversation Branching

from chuk_llm import conversation

async with conversation() as chat:
    await chat.ask("Planning a vacation")
    
    # Explore different options
    async with chat.branch() as japan_branch:
        await japan_branch.ask("Tell me about Japan")
    
    async with chat.branch() as italy_branch:
        await italy_branch.ask("Tell me about Italy")
    
    # Main conversation unaffected by branches
    await chat.ask("I'll go with Japan!")

📈 Provider Comparison

from chuk_llm import compare_providers

results = compare_providers(
    "Explain quantum computing",
    ["openai", "anthropic", "groq", "ollama"]
)

for provider, response in results.items():
    print(f"{provider}: {response[:100]}...")

🎯 Intelligent System Prompts

ChukLLM automatically generates optimized system prompts based on provider capabilities:

# Each provider gets optimized prompts
response = ask_claude_sync("Help me code", tools=tools)
# Claude gets: "You are Claude, an AI assistant created by Anthropic..."

response = ask_openai_sync("Help me code", tools=tools)  
# OpenAI gets: "You are a helpful assistant with function calling..."

CLI Commands

# Quick access to any model
chuk-llm ask_gpt "Your question"
chuk-llm ask_claude "Your question"
chuk-llm ask_ollama_llama3_2 "Your question"

# llama.cpp with automatic model resolution
chuk-llm ask "Your question" --provider llamacpp --model qwen3
chuk-llm ask "Your question" --provider llamacpp --model llama3.2

# Discover and test
chuk-llm discover ollama        # Find new models
chuk-llm test llamacpp          # Test llamacpp provider
chuk-llm test azure_openai      # Test connection
chuk-llm providers              # List all providers
chuk-llm models ollama          # Show available models
chuk-llm functions              # List all generated functions

# Advanced usage
chuk-llm ask "Question" --provider azure_openai --model gpt-4o-mini --json
chuk-llm ask "Question" --provider llamacpp --model qwen3 --no-stream
chuk-llm ask "Question" --stream --verbose

# Function calling / Tool use from CLI (an example tools file is sketched after this block)
chuk-llm ask "Calculate 15 * 4" --tools calculator_tools.py
chuk-llm stream "What's the weather?" --tools weather_tools.py --return-tool-calls

# Zero-install with uvx
uvx chuk-llm ask_claude "Hello world"
uvx chuk-llm ask "Question" --provider llamacpp --model qwen3

Performance

chuk-llm is designed for high throughput with negligible overhead:

Key Metrics

| Operation | Time | Notes |
|---|---|---|
| Import | 14ms | 52x faster than eager loading |
| Client creation (cached) | 125µs | 112x faster, thread-safe |
| Request overhead | 50-140µs | <0.015% of typical API call |
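
To sanity-check the import figure in the table above on your own machine (plain standard-library timing; numbers will vary with hardware and Python version):

import time

start = time.perf_counter()
import chuk_llm  # lazy loading defers provider SDK imports until first use
print(f"import chuk_llm took {(time.perf_counter() - start) * 1000:.1f} ms")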

Production Features

  • Automatic client caching - Thread-safe, 112x faster repeated operations
  • Lazy imports - Only load what you use
  • Connection pooling - Efficient HTTP/2 reuse
  • Async-native - Built on asyncio for maximum throughput
  • Smart caching - Model discovery results cached on disk

Benchmarks

Run comprehensive benchmarks:

uv run python benchmarks/benchmark_client_registry.py
uv run python benchmarks/llm_benchmark.py

See PERFORMANCE_OPTIMIZATIONS.md for detailed analysis and micro-benchmarks.

Architecture

ChukLLM uses a registry-driven, async-native architecture designed for scale:

πŸ—οΈ Core Design Principles

  1. Dynamic Registry - Models discovered and selected by capabilities, not names
  2. Pydantic V2 Native - Type-safe models throughout, no dictionary goop
  3. Async-First - Built on asyncio with sync wrappers for convenience
  4. Stateless Clients - Clients don't store conversation history; your application manages state
  5. Lazy Loading - Modules load on-demand for instant imports (14ms)
  6. Automatic Caching - Thread-safe client registry eliminates duplicate initialization

🔄 Request Flow

User Code
    ↓
import chuk_llm (14ms - lazy loading)
    ↓
get_client() (2µs - cached registry lookup)
    ↓
[Cached Client Instance]
    ↓
async ask() (~50µs - minimal overhead)
    ↓
Provider SDK (~50µs - efficient request building)
    ↓
HTTP Request (50-500ms - network I/O)
    ↓
Response Parsing (~50µs - orjson)
    ↓
Return to User

Total chuk-llm Overhead: ~150µs (<0.015% of API call)

πŸ” Session Isolation

Important: Conversation history is NOT shared between calls. Each conversation is independent:

from chuk_llm.llm.client import get_client
from chuk_llm.core.models import Message

client = get_client("openai", model="gpt-4o")

# Conversation 1
conv1 = [Message(role="user", content="My name is Alice")]
response1 = await client.create_completion(conv1)

# Conversation 2 (completely separate)
conv2 = [Message(role="user", content="What's my name?")]
response2 = await client.create_completion(conv2)
# AI won't know the name - conversations are isolated!

Key Insights:

  • ✅ Clients are stateless (safe to cache and share)
  • ✅ Conversation state lives in YOUR application (see the sketch below)
  • ✅ HTTP sessions shared for performance (connection pooling)
  • ✅ No cross-conversation or cross-user leakage
  • ✅ Thread-safe for concurrent use
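
Because state lives in your application, multi-turn context is carried by appending to the message list you pass in. A short sketch continuing the client example above (building the assistant turn from result["response"] is an assumption based on the response shape shown in the llama.cpp example):

conv = [Message(role="user", content="My name is Alice")]
result = await client.create_completion(conv)

# Append the assistant's reply, then the next user turn, and send the whole history
conv.append(Message(role="assistant", content=result["response"]))
conv.append(Message(role="user", content="What's my name?"))
result = await client.create_completion(conv)
# Now the model can answer "Alice" because the earlier turn is in the history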

See CONVERSATION_ISOLATION.md for detailed architecture.

📦 Module Organization

chuk-llm/
β”œβ”€β”€ api/                      # Public API (ask, stream, conversation)
β”œβ”€β”€ registry/                 # ⭐ Dynamic model registry (THE BRAIN)
β”‚   β”œβ”€β”€ core.py              # ModelRegistry orchestrator
β”‚   β”œβ”€β”€ models.py            # Pydantic models (ModelSpec, ModelCapabilities)
β”‚   β”œβ”€β”€ sources/             # Discovery sources (OpenAI, Ollama, Gemini, etc.)
β”‚   └── resolvers/           # Capability resolvers (Heuristic, YAML, APIs)
β”œβ”€β”€ core/                     # Pydantic V2 models (Message, Tool, ContentPart)
β”‚   β”œβ”€β”€ models.py            # Core Pydantic models
β”‚   β”œβ”€β”€ enums.py             # Type-safe enums (Provider, Feature, etc.)
β”‚   └── constants.py         # Constants
β”œβ”€β”€ llm/
β”‚   β”œβ”€β”€ providers/           # 15+ provider implementations
β”‚   β”œβ”€β”€ client.py            # Client factory with registry integration
β”‚   └── features.py          # Feature detection
β”œβ”€β”€ configuration/           # Unified configuration system
└── client_registry.py       # Thread-safe client caching

Used by the CHUK Stack

chuk-llm is the canonical LLM layer for the entire CHUK ecosystem:

  • chuk-ai-planner uses the registry to select planning vs drafting models by capability
  • chuk-acp-agent uses capability-based policies per agent (e.g., "requires tools + 128k context")
  • chuk-mcp-remotion uses it to pick video-script models with vision + long context

Instead of hardcoding "use GPT-4o", CHUK components declare what they need, and the registry finds the best available model.
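
For example, a component that needs tool calling and a 128K context declares just that and lets the registry choose (same find_best API shown earlier; the selected model will differ by environment):

from chuk_llm import ask
from chuk_llm.registry import get_registry

registry = await get_registry()
model = await registry.find_best(requires_tools=True, min_context=128_000)

response = await ask(
    "Draft a step-by-step migration plan",
    provider=model.spec.provider,
    model=model.spec.name,
)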

Documentation

Quick Comparison

| Feature | chuk-llm | LangChain | LiteLLM | OpenAI SDK |
|---|---|---|---|---|
| Import speed | ⚡ 14ms | 🐌 1-2s | 🐌 500ms+ | ⚡ Fast |
| Client caching | ✅ Auto (112x) | ❌ | ❌ | ❌ |
| Auto-discovery | ✅ | ❌ | ❌ | ❌ |
| Native streaming | ✅ | ⚠️ | ✅ | ✅ |
| Function calling | ✅ Clean API | ✅ Complex | ⚠️ Basic | ✅ |
| Session tracking | ✅ Built-in | ⚠️ Manual | ❌ | ❌ |
| Session isolation | ✅ Guaranteed | ⚠️ Varies | ⚠️ Unclear | ⚠️ Manual |
| CLI included | ✅ | ❌ | ⚠️ Basic | ❌ |
| Provider functions | ✅ Auto-generated | ❌ | ❌ | ❌ |
| Conversations | ✅ Branching | ✅ | ❌ | ⚠️ Manual |
| Thread-safe | ✅ | ⚠️ Varies | ⚠️ | ✅ |
| Async-native | ✅ | ⚠️ Mixed | ✅ | ✅ |
| Setup complexity | Simple | Complex | Simple | Simple |
| Dependencies | Minimal | Heavy | Moderate | Minimal |
| Performance overhead | <0.015% | ~2-5% | ~1-2% | Minimal |

Installation Options

| Command | Features | Use Case |
|---|---|---|
| pip install chuk_llm | Core + Session tracking | Development |
| pip install chuk_llm[redis] | + Redis persistence | Production |
| pip install chuk_llm[cli] | + Rich CLI formatting | CLI tools |
| pip install chuk_llm[all] | Everything | Full features |

License

Apache 2.0 License - see LICENSE file for details.

Support


Built with ❤️ for developers who just want their LLMs to work.