Octolib: Self-Sufficient AI Provider Library

June 15, 2026

© 2026 Muvon Un Limited (Hong Kong)

🚀 Overview

Octolib is a self-sufficient AI provider library with a unified, type-safe interface to multiple AI services. It offers intelligent model selection, robust error handling, and advanced features such as cross-provider tool calling and vision support.

✨ Key Features

  • 🔌 Multi-Provider Support: OpenAI, Anthropic, OpenRouter, Cerebras, NVIDIA NIM, Groq, BytePlus, Ollama, Together, Featherless, Google, Amazon, Cloudflare, DeepSeek, MiniMax, Moonshot AI (Kimi), Z.ai, OctoHub, Local, CLI proxies
  • 🛡️ Unified Interface: Consistent API across different providers
  • 🔍 Intelligent Model Validation: Strict provider:model format parsing with case-insensitive model support
  • 📋 Structured Output: JSON and JSON Schema support for OpenAI, OpenRouter, DeepSeek, Together, and Z.ai
  • 💰 Cost Tracking: Automatic token usage and cost calculation
  • 🖼️ Vision Support: Image and video attachment handling for compatible models (Moonshot Kimi K2.5)
  • 🧰 Tool Calling: Cross-provider tool call standardization
  • 🧩 CLI Provider: Use cli:<backend>/<model> (e.g. cli:codex/gpt-5.2-codex). Proxy-only: tools/MCP are not used or controllable.
  • ⏱️ Retry Management: Configurable exponential backoff
  • 🔒 Secure Design: Environment-based API key management
  • 🎯 Embedding Support: Multi-provider embedding generation with Jina, Voyage, Google, OpenAI, Together, OctoHub, Local (Ollama, llama.cpp, LM Studio, vLLM), FastEmbed, and HuggingFace
  • 🔄 Reranking: Document relevance scoring with cross-encoder models (Voyage AI, Cohere, Jina AI, Mixedbread, Local (llama.cpp, vLLM, TEI), HuggingFace)
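
The "configurable exponential backoff" mentioned in the list above can be pictured with a small, standalone sketch. The `RetryPolicy` type and its fields are hypothetical names chosen for illustration and are not taken from Octolib's API; the sketch only shows the general technique of doubling a base delay per attempt and capping it.

```rust
use std::time::Duration;

/// Hypothetical retry policy (illustrative only, not Octolib's actual type).
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    /// Delay before the first retry.
    pub base_delay: Duration,
    /// Upper bound applied to every computed delay.
    pub max_delay: Duration,
    /// Number of retries allowed before giving up.
    pub max_retries: u32,
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based): base * 2^attempt,
    /// capped at `max_delay`. Returns `None` once the budget is exhausted.
    pub fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_retries {
            return None;
        }
        // Saturate instead of overflowing for very large attempt numbers.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        let delay = self
            .base_delay
            .checked_mul(factor)
            .unwrap_or(self.max_delay);
        Some(delay.min(self.max_delay))
    }
}
```

A caller would typically sleep for `delay_for(attempt)` after a retryable failure and stop retrying when it returns `None`. Real implementations often add random jitter on top; that is omitted here to keep the sketch deterministic.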

📦 Quick Installation

# Add to Cargo.toml
octolib = { git = "https://github.com/muvon/octolib" }

🚀 Quick Start

use octolib::{ProviderFactory, ChatCompletionParams, Message};

async fn example() -> anyhow::Result<()> {
    // Parse model and get provider
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Create messages
    let messages = vec![
        Message::user("Hello, how are you?"),
    ];

    // Create completion parameters
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);

    // Get completion (requires OPENAI_API_KEY environment variable)
    let response = provider.chat_completion(params).await?;
    println!("Response: {}", response.content);

    Ok(())
}
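
The string passed to `get_provider_for_model` above follows the `provider:model` format. As a rough illustration of what strict parsing of that format could look like, here is a standalone sketch. `ModelSpec`, `parse_model_spec`, and `model_matches` are hypothetical helpers, not Octolib functions, and lowercasing the provider name is an assumption made for the example.

```rust
/// Hypothetical parsed form of a `provider:model` string (not an Octolib type).
#[derive(Debug, PartialEq, Eq)]
pub struct ModelSpec {
    pub provider: String,
    pub model: String,
}

/// Parse `provider:model`, rejecting strings with a missing or empty half.
pub fn parse_model_spec(spec: &str) -> Result<ModelSpec, String> {
    // Split on the first ':' only, so a model name containing ':' stays intact.
    let (provider, model) = spec
        .split_once(':')
        .ok_or_else(|| format!("expected `provider:model`, got `{spec}`"))?;
    let (provider, model) = (provider.trim(), model.trim());
    if provider.is_empty() || model.is_empty() {
        return Err(format!("provider and model must be non-empty in `{spec}`"));
    }
    Ok(ModelSpec {
        // Assumption: provider names are normalized to lowercase.
        provider: provider.to_ascii_lowercase(),
        // The model keeps its original spelling; comparisons ignore case.
        model: model.to_string(),
    })
}

/// Case-insensitive comparison of two model names.
pub fn model_matches(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}
```

With this sketch, `parse_model_spec("gpt-4o")` is an error because the provider prefix is missing, while `cli:codex/gpt-5.2-codex` yields the provider `cli` and the model `codex/gpt-5.2-codex`.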

📋 Structured Output

Get structured JSON responses with schema validation:

use octolib::{ProviderFactory, ChatCompletionParams, Message, StructuredOutputRequest};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct PersonInfo {
    name: String,
    age: u32,
    skills: Vec<String>,
}

async fn structured_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Check if provider supports structured output
    if !provider.supports_structured_output(&model) {
        return Err(anyhow::anyhow!("Provider does not support structured output"));
    }

    let messages = vec![
        Message::user("Tell me about a software engineer in JSON format"),
    ];

    // Request structured JSON output
    let structured_request = StructuredOutputRequest::json();
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
        .with_structured_output(structured_request);

    let response = provider.chat_completion(params).await?;

    if let Some(structured) = response.structured_output {
        let person: PersonInfo = serde_json::from_value(structured)?;
        println!("Person: {:?}", person);
    }

    Ok(())
}

🧩 CLI Provider (Proxy Mode)

Use local CLIs as a lightweight proxy. This mode is prompt-only; tool calling/MCP integration is not used or controllable.

let (provider, model) = ProviderFactory::get_provider_for_model("cli:codex/gpt-5.2-codex")?;
// or: "cli:claude/claude-sonnet-4-5"
// or: "cli:gemini/gemini-2.5-pro"
// or: "cli:cursor/auto"

Set a backend-specific command if it is not on PATH:

CLI_CODEX_COMMAND=/path/to/codex
CLI_CLAUDE_COMMAND=/path/to/claude
CLI_GEMINI_COMMAND=/path/to/gemini
CLI_CURSOR_COMMAND=/path/to/cursor-agent
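
The variables above follow a `CLI_<BACKEND>_COMMAND` naming pattern. The sketch below shows one way such a lookup could work; it is not Octolib's implementation. The helper names are hypothetical, and falling back to the bare backend name when no override is set is an assumption (the actual default binary may differ, for example `cursor-agent` for the `cursor` backend).

```rust
/// Split a CLI model string such as `codex/gpt-5.2-codex` into (backend, model).
pub fn split_cli_model(model: &str) -> Option<(&str, &str)> {
    let (backend, name) = model.split_once('/')?;
    if backend.is_empty() || name.is_empty() {
        return None;
    }
    Some((backend, name))
}

/// Name of the override variable for a backend, e.g. `CLI_CODEX_COMMAND`.
pub fn command_env_var(backend: &str) -> String {
    format!("CLI_{}_COMMAND", backend.to_ascii_uppercase())
}

/// Resolve the command to run. `lookup` abstracts the environment so the
/// function is easy to test; pass `|k| std::env::var(k).ok()` in real use.
pub fn resolve_command<F>(backend: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    lookup(&command_env_var(backend))
        // Treat an empty or whitespace-only override as unset.
        .filter(|v| !v.trim().is_empty())
        // Assumption: fall back to the backend name and rely on PATH.
        .unwrap_or_else(|| backend.to_string())
}
```

Passing the environment in as a closure keeps the resolution logic free of global state, so it can be exercised without mutating process environment variables.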

🧰 Tool Calling

Use AI models to call functions with automatic parameter extraction:

use octolib::{ProviderFactory, ChatCompletionParams, Message, FunctionDefinition, ToolCall};
use serde_json::json;

async fn tool_calling_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Define available tools/functions
    let tools = vec![
        FunctionDefinition {
            name: "get_weather".to_string(),
            description: "Get the current weather for a location".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }),
            cache_control: None,
        },
        FunctionDefinition {
            name: "calculate".to_string(),
            description: "Perform a mathematical calculation".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }),
            cache_control: None,
        },
    ];

    let mut messages = vec![
        Message::user("What's the weather in Tokyo and calculate 15 * 23?"),
    ];

    // Initial request with tools
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
        .with_tools(tools.clone());

    let response = provider.chat_completion(params).await?;

    // Check if model wants to call tools
    if let Some(tool_calls) = response.tool_calls {
        println!("Model requested {} tool calls", tool_calls.len());

        // Add assistant's response with tool calls to conversation
        let mut assistant_msg = Message::assistant(&response.content);
        assistant_msg.tool_calls = Some(serde_json::to_value(&tool_calls)?);
        messages.push(assistant_msg);

        // Execute each tool call and add results
        for tool_call in tool_calls {
            println!("Calling tool: {} with args: {}", tool_call.name, tool_call.arguments);

            // Execute the tool (your implementation)
            let result = match tool_call.name.as_str() {
                "get_weather" => {
                    let location = tool_call.arguments["location"].as_str().unwrap_or("Unknown");
                    json!({
                        "location": location,
                        "temperature": 22,
                        "unit": "celsius",
                        "condition": "sunny"
                    })
                }
                "calculate" => {
                    let expr = tool_call.arguments["expression"].as_str().unwrap_or("0");
                    // Simple calculation (in real app, use proper eval)
                    json!({
                        "expression": expr,
                        "result": 345  // 15 * 23
                    })
                }
                _ => json!({"error": "Unknown tool"}),
            };

            // Add tool result to conversation
            messages.push(Message::tool(
                &serde_json::to_string(&result)?,
                &tool_call.id,
                &tool_call.name,
            ));
        }

        // Get final response with tool results
        let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
            .with_tools(tools);

        let final_response = provider.chat_completion(params).await?;
        println!("Final response: {}", final_response.content);
    } else {
        println!("Direct response: {}", response.content);
    }

    Ok(())
}

Tool Calling Features:

  • ✅ Cross-provider support (OpenAI, Anthropic, Google, Amazon, OpenRouter)
  • ✅ Automatic parameter validation via JSON Schema
  • ✅ Multi-turn conversations with tool results
  • ✅ Parallel tool execution support
  • ✅ Standardized ToolCall and GenericToolCall formats across all providers
  • ✅ Provider-specific metadata preservation (e.g., Gemini thought signatures)
  • ✅ Clean conversion API with to_generic_tool_calls() method

🎯 Embedding Generation

Generate embeddings using multiple providers:

use octolib::embedding::{generate_embeddings, generate_embeddings_batch, InputType};

async fn embedding_example() -> anyhow::Result<()> {
    // Single embedding generation
    let embedding = generate_embeddings(
        "Hello, world!",
        "voyage",  // provider
        "voyage-3.5-lite"  // model
    ).await?;

    println!("Embedding dimension: {}", embedding.len());

    // Batch embedding generation
    let texts = vec![
        "First document".to_string(),
        "Second document".to_string(),
    ];

    let embeddings = generate_embeddings_batch(
        texts,
        "jina",  // provider
        "jina-embeddings-v4",  // model
        InputType::Document,  // input type for better embeddings
        16,  // batch size
        100_000,  // max tokens per batch
    ).await?;

    println!("Generated {} embeddings", embeddings.len());

    Ok(())
}

// Supported embedding providers:
// - Jina: jina-embeddings-v4, jina-clip-v2, etc.
// - Voyage: voyage-3.5, voyage-code-2, etc.
// - Google: gemini-embedding-001, text-embedding-005
// - OpenAI: text-embedding-3-small, text-embedding-3-large
// - FastEmbed: Local models (feature-gated)
// - HuggingFace: sentence-transformers models
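
Once embeddings have been generated, a common next step is comparing them. The following standalone sketch computes cosine similarity between two vectors. It assumes embeddings are slices of `f32`, which the example above does not state explicitly, and `cosine_similarity` is a helper written for this illustration rather than part of Octolib.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns `None` for mismatched lengths, empty input, or a zero vector,
/// since the similarity is undefined in those cases.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Values close to 1.0 indicate vectors pointing in nearly the same direction, and values near 0.0 indicate unrelated directions. Embeddings from different models or providers generally should not be compared with each other.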

🎯 Document Reranking

Improve search results by scoring document relevance with cross-encoder models:

use octolib::reranker::rerank;

async fn reranking_example() -> anyhow::Result<()> {
    let query = "What is machine learning?";
    let documents = vec![
        "Machine learning is a subset of AI.".to_string(),
        "Cooking recipes for beginners.".to_string(),
        "Deep learning uses neural networks.".to_string(),
    ];

    // Rerank documents by relevance to query
    let response = rerank(
        query,
        documents,
        "voyage",           // provider: voyage, cohere, jina, fastembed
        "rerank-2.5",       // model
        Some(2)             // top_k: return top 2 results
    ).await?;

    for (rank, result) in response.results.iter().enumerate() {
        println!("Rank {}: Score {:.4}", rank + 1, result.relevance_score);
        println!("  Document: {}", result.document);
    }

    println!("Total tokens used: {}", response.total_tokens);

    Ok(())
}

// Supported Providers:
//
// API-Based (require API keys):
// - Voyage AI (VOYAGE_API_KEY): rerank-2.5, rerank-2.5-lite, rerank-2, rerank-2-lite
// - Cohere (COHERE_API_KEY): rerank-english-v3.0, rerank-multilingual-v3.0
// - Jina AI (JINA_API_KEY): jina-reranker-v3, jina-reranker-v2-base-multilingual
//
// Local (no API keys, requires features):
// - FastEmbed (fastembed feature): bge-reranker-base, bge-reranker-large, jina-reranker-v1-turbo-en

๐Ÿ” OAuth Authentication

Octolib supports OAuth authentication for ChatGPT subscriptions and Anthropic:

OpenAI OAuth (ChatGPT Plus/Pro/Team/Enterprise):

export OPENAI_OAUTH_ACCESS_TOKEN="your_oauth_token"
export OPENAI_OAUTH_ACCOUNT_ID="your_account_id"

Anthropic OAuth:

export ANTHROPIC_OAUTH_TOKEN="your_bearer_token"

The library automatically detects OAuth credentials and prefers them over API keys. See examples/openai_oauth.rs and examples/anthropic_oauth.rs for full usage examples.
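
The preference order described above (OAuth credentials first, API key second) can be sketched as follows for the OpenAI case. This is an illustration under stated assumptions, not the library's detection logic: the `OpenAiCredential` enum and `select_openai_credential` function are hypothetical, and requiring both OAuth variables to be set before choosing OAuth is an assumption.

```rust
/// Hypothetical credential type for illustration (not an Octolib type).
#[derive(Debug, PartialEq, Eq)]
pub enum OpenAiCredential {
    OAuth { access_token: String, account_id: String },
    ApiKey(String),
}

/// Choose a credential, preferring OAuth over an API key.
/// `lookup` abstracts the environment; pass `|k| std::env::var(k).ok()`
/// in real use.
pub fn select_openai_credential<F>(lookup: F) -> Option<OpenAiCredential>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat empty values as unset.
    let get = |name: &str| lookup(name).filter(|v| !v.is_empty());

    // Assumption: OAuth is only usable when both variables are present.
    if let (Some(access_token), Some(account_id)) = (
        get("OPENAI_OAUTH_ACCESS_TOKEN"),
        get("OPENAI_OAUTH_ACCOUNT_ID"),
    ) {
        return Some(OpenAiCredential::OAuth { access_token, account_id });
    }
    get("OPENAI_API_KEY").map(OpenAiCredential::ApiKey)
}
```

Returning `None` when nothing is configured lets the caller report a clear "missing credentials" error instead of failing later during the request.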

🎯 Provider Support Matrix

| Provider | Structured Output | Vision | Tool Calls | Caching |
|---|---|---|---|---|
| OpenAI | ✅ JSON + Schema | ✅ Yes | ✅ Yes | ✅ Yes |
| OpenRouter | ✅ JSON + Schema | ✅ Yes | ✅ Yes | ✅ Yes |
| DeepSeek | ✅ JSON Mode | ❌ No | ❌ No | ✅ Yes |
| Moonshot AI (Kimi) | ✅ JSON Mode | ✅ kimi-k2.5 | ✅ Yes | ✅ Yes |
| MiniMax | ✅ JSON Mode | ❌ No | ✅ Yes | ✅ Yes |
| Anthropic | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Z.ai | ✅ JSON Mode | ❌ No | ✅ Yes | ✅ Yes |
| NVIDIA NIM | ✅ JSON + Schema | Per-model | ✅ Yes | ❌ No |
| Groq | ✅ JSON + Schema | Per-model | ❌ No | ✅ Select models |
| BytePlus | ✅ JSON + Schema | Per-model | ❌ No | ✅ Yes |
| Cerebras | ✅ JSON + Schema | ❌ No | ❌ No | ❌ No |
| Featherless | ✅ JSON + Schema | ❌ No | ❌ No | ❌ No |
| Google Vertex | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| Amazon Bedrock | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| OctoHub | Per-model | Per-model | ✅ Yes | ✅ Yes |
| Together | Per-model | Per-model | ✅ Yes | ❌ No |
| Cloudflare | ❌ No | ❌ No | ❌ No | ❌ No |
| Local | Per-model | Per-model | Per-model | ❌ No |
| Ollama | Per-model | Per-model | Per-model | ❌ No |

Structured Output Details

  • JSON Mode: Basic JSON object output
  • JSON Schema: Full schema validation with strict mode
  • Provider Detection: Use provider.supports_structured_output(&model) to check capability

🧠 Thinking/Reasoning Support

Octolib provides first-class support for models that produce thinking/reasoning content. Thinking is stored separately from the main response content, similar to how tool_calls are separate from content.

use octolib::{ProviderFactory, ChatCompletionParams, Message, ThinkingBlock};

async fn thinking_example() -> anyhow::Result<()> {
    // MiniMax and OpenAI o-series models support thinking
    let (provider, model) = ProviderFactory::get_provider_for_model("minimax:MiniMax-M2")?;

    let messages = vec![
        Message::user("Solve this complex math problem step by step"),
    ];

    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);
    let response = provider.chat_completion(params).await?;

    // Access thinking content (separate from response.content)
    if let Some(ref thinking) = response.thinking {
        println!("=== MODEL THINKING ({}) ===", thinking.tokens);
        println!("{}", thinking.content);
        println!("==========================");
    }

    // Final response (clean, no thinking prefix)
    println!("Response: {}", response.content);
    // Token usage breakdown
    if let Some(usage) = &response.exchange.usage {
        println!("Input tokens: {}", usage.input_tokens);
        println!("Cache read tokens: {}", usage.cache_read_tokens);
        println!("Cache write tokens: {}", usage.cache_write_tokens);
        println!("Output tokens: {}", usage.output_tokens);
        println!("Reasoning tokens: {}", usage.reasoning_tokens);
    }
    Ok(())
}

Supported Providers

| Provider | Thinking Format | Notes |
|---|---|---|
| MiniMax | Content blocks ({"type": "thinking"}) | Full thinking block extraction |
| OpenAI o-series | reasoning_content field | o1, o3, o4 models |
| OpenRouter | reasoning_details | Gemini and other providers |

Token Tracking

Thinking tokens are tracked separately in TokenUsage.reasoning_tokens:

if let Some(usage) = &response.exchange.usage {
    println!("Total tokens: {}", usage.total_tokens);
    println!("  - Input: {}", usage.input_tokens);
    println!("  - Cache Read: {}", usage.cache_read_tokens);
    println!("  - Cache Write: {}", usage.cache_write_tokens);
    println!("  - Output: {}", usage.output_tokens);
    println!("  - Reasoning: {}", usage.reasoning_tokens);
}
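
Token counts like these are what a cost calculation builds on. The sketch below shows the arithmetic under explicit assumptions and is not how Octolib computes cost: the `Usage` and `Pricing` structs are hypothetical stand-ins, the prices are supplied by the caller, the token buckets are assumed to be disjoint, and reasoning tokens are assumed to be billed at the output rate.

```rust
/// Token counts mirroring the fields printed above (types assumed to be u64).
pub struct Usage {
    pub input_tokens: u64,
    pub cache_read_tokens: u64,
    pub cache_write_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: u64,
}

/// Hypothetical prices in USD per one million tokens.
pub struct Pricing {
    pub input_per_m: f64,
    pub cache_read_per_m: f64,
    pub cache_write_per_m: f64,
    pub output_per_m: f64,
}

/// Estimate request cost in USD.
/// Assumptions: buckets do not overlap, and reasoning tokens are charged
/// at the output price.
pub fn estimate_cost(usage: &Usage, pricing: &Pricing) -> f64 {
    let per_m = |tokens: u64, price: f64| tokens as f64 / 1_000_000.0 * price;
    per_m(usage.input_tokens, pricing.input_per_m)
        + per_m(usage.cache_read_tokens, pricing.cache_read_per_m)
        + per_m(usage.cache_write_tokens, pricing.cache_write_per_m)
        + per_m(
            usage.output_tokens + usage.reasoning_tokens,
            pricing.output_per_m,
        )
}
```

Providers differ on whether cached or reasoning tokens are already included in the input and output counts, so the bucket assumptions should be checked against each provider's billing rules before relying on numbers like these.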

๐ŸŒ Supported Providers

ProviderStatusCapabilities
OpenAIโœ… Full SupportChat, Vision, Tools, Structured Output, Caching
Anthropicโœ… Full SupportClaude Models, Vision, Tools, Caching
OpenRouterโœ… Full SupportMulti-Provider Proxy, Vision, Caching, Structured Output
Groqโœ… Full SupportFast Inference, Structured Output, Caching
BytePlusโœ… Full SupportSeed Models, Structured Output, Caching
DeepSeekโœ… Full SupportOpen-Source AI Models, Structured Output, Caching
Moonshot AI (Kimi)โœ… Full SupportKimi K2 Series, Vision (kimi-k2.5), Tools, Structured Output, Caching, Thinking
MiniMaxโœ… Full SupportAnthropic-Compatible API, Tools, Caching, Thinking, Structured Output
Z.aiโœ… Full SupportGLM Models, Caching, Structured Output
NVIDIA NIMโœ… Full Support100+ Hosted Models, Tools, Structured Output, Reference Pricing
Together AIโœ… Full SupportMulti-Provider Proxy, Vision, Tools, Structured Output
Cerebrasโœ… Full SupportFast Inference, Structured Output
Featherlessโœ… Full SupportOpen-Weight Models (Qwen, Llama, Mistral, DeepSeek, RWKV), Subscription Billing
OctoHubโœ… SupportedLocal AI Serving
Google Vertex AIโœ… SupportedEnterprise AI Integration
Amazon Bedrockโœ… SupportedCloud AI Services
Cloudflare Workers AIโœ… SupportedEdge AI Compute
Local LLMโœ… SupportedOllama, LM Studio, LocalAI, Jan, vLLM
Ollamaโœ… SupportedLocal LLM Runner
CLI Proxyโœ… SupportedCodex, Claude, Gemini, Cursor

🔒 Privacy & Security

  • 🏠 Local-first design
  • 🔑 Secure API key management
  • 📁 Respects .gitignore
  • 🛡️ Comprehensive error handling

๐Ÿค Support & Community

โš–๏ธ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ by the Muvon team in Hong Kong