Claude Code API
April 3, 2026 · View on GitHub
🦀 cc-sdk v0.8.1 - Rust SDK for Claude Code
NEW: LLM Proxy — use CC subscription as direct LLM | WebSocket Reconnection | Python SDK v0.1.55 Full Parity
cc-sdk is a community-driven Rust SDK for Claude Code CLI:
- LLM Proxy —
llm::query("prompt")→ text (bypasses agent layer, uses CC subscription) - Full Agent SDK — permissions, hooks, MCP servers, file checkpointing
- WebSocket — production-grade reconnection with exponential backoff
- Budget Control —
max_budget_usd, cache token tracking
use cc_sdk::llm;
#[tokio::main]
async fn main() -> cc_sdk::Result<()> {
// Simple: use CC subscription as LLM proxy
let response = llm::query("Explain quantum entanglement", None).await?;
println!("{}", response.text);
Ok(())
}
use cc_sdk::{query, ClaudeCodeOptions};
use futures::StreamExt;
#[tokio::main]
async fn main() -> cc_sdk::Result<()> {
// Full agent mode with streaming
let options = ClaudeCodeOptions::builder()
.model("sonnet")
.max_budget_usd(10.0)
.build();
let mut stream = query("Hello, Claude!", Some(options)).await?;
while let Some(msg) = stream.next().await {
println!("{:?}", msg?);
}
Ok(())
}
👉 Full SDK Documentation | API Docs
A high-performance Rust implementation of an OpenAI-compatible API gateway for Claude Code CLI. Built on top of the robust cc-sdk, this project provides a RESTful API interface that allows you to interact with Claude Code using the familiar OpenAI API format.
🎉 Who's Using Claude Code API
- url-preview v0.6.0 - A Rust library for extracting structured data from web pages using LLMs. It leverages claude-code-api to provide Claude-powered web content extraction alongside OpenAI support.
✨ Features
- 🔌 OpenAI API Compatibility - Drop-in replacement for OpenAI API, works with existing OpenAI client libraries
- 🚀 High Performance - Built with Rust, Axum, and Tokio for exceptional performance
- 📦 Powered by claude-code-sdk-rs - Built on a robust SDK with full Claude Code CLI integration
- ⚡ Connection Pooling - Reuse Claude processes with optimized connection pooling for 5-10x faster responses
- 💬 Conversation Management - Built-in session support for multi-turn conversations
- 🖼️ Multimodal Support - Process images alongside text in your requests
- ⚡ Response Caching - Intelligent caching system to reduce latency and costs
- 🔧 MCP Support - Model Context Protocol integration for accessing external tools and services
- 📁 File Access Control - Configurable file system permissions for secure operations
- 🌊 Streaming Responses - Real-time streaming support for long-form content
- 🛡️ Robust Error Handling - Comprehensive error handling with automatic retries
- 📊 Statistics API - Monitor usage and performance metrics
- 🔄 Multiple Client Modes - OneShot, Interactive, and Batch processing modes
- 🔧 Tool Calling - OpenAI tools format support for AI tool integrations
- 🌐 WebSocket Bridge - Real-time bidirectional communication between CLI and external clients via WebSocket
🚀 Quick Start
Prerequisites
- Rust 1.75 or higher
- Claude CLI installed and configured
- (Optional) MCP servers for extended functionality
Installation
Option 1: Install from crates.io
cargo install claude-code-api
Then run:
RUST_LOG=info claude-code-api
# or use the short alias
RUST_LOG=info ccapi
Option 2: Build from source
git clone https://github.com/ZhangHanDong/claude-code-api-rs.git
cd claude-code-api-rs
Build the entire workspace (API server + SDK):
cargo build --release
Start the server:
./target/release/claude-code-api
Note: The API server automatically includes and uses claude-code-sdk-rs for all Claude Code CLI interactions.
The API server will start on http://localhost:8080 by default.
Quick Test
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-5-20251101",
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}'
🤖 Supported Models
| Model | ID | Alias |
|---|---|---|
| Opus 4.6 | claude-opus-4-6 | "opus" |
| Sonnet 4.6 | claude-sonnet-4-6 | "sonnet" |
| Haiku 4.5 | claude-haiku-4-5-20251001 | "haiku" |
# Use aliases — they always point to the latest version
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "sonnet", "messages": [{"role": "user", "content": "Hello"}]}'
📖 Core Features
1. OpenAI-Compatible Chat API
import openai
# Configure the client to use Claude Code API
client = openai.OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed" # API key is not required
)
response = client.chat.completions.create(
model="opus", # or "sonnet" for faster responses
messages=[
{"role": "user", "content": "Write a hello world in Python"}
]
)
print(response.choices[0].message.content)
2. Conversation Management
Maintain context across multiple requests:
# First request - creates a new conversation
response = client.chat.completions.create(
model="sonnet-4",
messages=[
{"role": "user", "content": "My name is Alice"}
]
)
conversation_id = response.conversation_id
# Subsequent request - continues the conversation
response = client.chat.completions.create(
model="sonnet-4",
conversation_id=conversation_id,
messages=[
{"role": "user", "content": "What's my name?"}
]
)
# Claude will remember: "Your name is Alice"
3. Multimodal Support
Process images with text:
response = client.chat.completions.create(
model="claude-opus-4-20250514",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "/path/to/image.png"}}
]
}]
)
Supported image formats:
- Local file paths
- HTTP/HTTPS URLs
- Base64 encoded data URLs
4. Streaming Responses
stream = client.chat.completions.create(
model="claude-opus-4-20250514",
messages=[{"role": "user", "content": "Write a long story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
5. MCP (Model Context Protocol)
Enable Claude to access external tools and services:
# Create MCP configuration
cat > mcp_config.json << EOF
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
},
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "your-token"
}
}
}
}
EOF
# Start with MCP support
export CLAUDE_CODE__MCP__ENABLED=true
export CLAUDE_CODE__MCP__CONFIG_FILE="./mcp_config.json"
./target/release/claude-code-api
6. Tool Calling (OpenAI Compatible)
Use tools for AI integrations:
response = client.chat.completions.create(
model="claude-3-5-haiku-20241022",
messages=[
{"role": "user", "content": "Please preview this URL: https://rust-lang.org"}
],
tools=[
{
"type": "function",
"function": {
"name": "url_preview",
"description": "Preview a URL and extract its content",
"parameters": {
"type": "object",
"properties": {
"url": {"type": "string", "description": "The URL to preview"}
},
"required": ["url"]
}
}
}
],
tool_choice="auto"
)
# Response will include tool_calls:
# {
# "choices": [{
# "message": {
# "role": "assistant",
# "tool_calls": [{
# "id": "call_xxx",
# "type": "function",
# "function": {
# "name": "url_preview",
# "arguments": "{\"url\": \"https://rust-lang.org\"}"
# }
# }]
# }
# }]
# }
This feature enables seamless integration with modern AI tools like url-preview and other OpenAI-compatible tool chains. url-preview v0.6.0+ uses this exact format to extract structured data from web pages using Claude.
Agent Tools & Permissions
- Tools control (SDK): set
allowed_tools/disallowed_toolsandpermission_modeinClaudeCodeOptionsto whitelist/blacklist and choose approval behavior. - Runtime approvals (SDK): implement
CanUseToolto decide{allow, input?/reason?}per tool call. - MCP (server): configure via
config/*.tomlormcp_config.jsonand use scripts underscript/(e.g.,start_with_mcp.sh). The API reuses the SDK’s MCP wiring. - Programmatic agents (SDK): define
agentsandsetting_sourcesto pass structured agent definitions to CLI.
See claude-code-sdk-rs/README.md for a full Rust example enabling tools and MCP.
🔧 Configuration
Environment Variables
# Server configuration
CLAUDE_CODE__SERVER__HOST=0.0.0.0
CLAUDE_CODE__SERVER__PORT=8080
# Claude CLI configuration
CLAUDE_CODE__CLAUDE__COMMAND=claude
CLAUDE_CODE__CLAUDE__TIMEOUT_SECONDS=300
CLAUDE_CODE__CLAUDE__MAX_CONCURRENT_SESSIONS=10
CLAUDE_CODE__CLAUDE__USE_INTERACTIVE_SESSIONS=true
# File access permissions
CLAUDE_CODE__FILE_ACCESS__SKIP_PERMISSIONS=false
CLAUDE_CODE__FILE_ACCESS__ADDITIONAL_DIRS='["/path1", "/path2"]'
# MCP configuration
CLAUDE_CODE__MCP__ENABLED=true
CLAUDE_CODE__MCP__CONFIG_FILE="./mcp_config.json"
CLAUDE_CODE__MCP__STRICT=false
CLAUDE_CODE__MCP__DEBUG=false
# Cache configuration
CLAUDE_CODE__CACHE__ENABLED=true
CLAUDE_CODE__CACHE__MAX_ENTRIES=1000
CLAUDE_CODE__CACHE__TTL_SECONDS=3600
# Conversation management
CLAUDE_CODE__CONVERSATION__MAX_HISTORY_MESSAGES=20
CLAUDE_CODE__CONVERSATION__SESSION_TIMEOUT_MINUTES=30
Configuration File
Create config/local.toml:
[server]
host = "0.0.0.0"
port = 8080
[claude]
command = "claude"
timeout_seconds = 300
max_concurrent_sessions = 10
use_interactive_sessions = false # Disabled by default due to stability issues
[file_access]
skip_permissions = false
additional_dirs = ["/Users/me/projects", "/tmp"]
[mcp]
enabled = true
config_file = "./mcp_config.json"
strict = false
debug = false
📦 Built on claude-code-sdk-rs
This API server is built on top of claude-code-sdk-rs, a powerful Rust SDK for Claude Code CLI that provides:
- Full Feature Parity with the official Python SDK
- Multiple Client Types:
query()- Simple one-shot queriesInteractiveClient- Stateful conversations with contextOptimizedClient- Advanced client with connection pooling and performance features
- Streaming Support - Real-time message streaming
- Complete Type Safety - Strongly typed with serde support
- Async/Await - Built on Tokio for high-performance async operations
Using the SDK Directly
If you prefer to build your own integration, you can use the SDK directly:
[dependencies]
cc-sdk = "0.8"
tokio = { version = "1", features = ["full"] }
futures = "0.3"
use cc_sdk::llm::{self, LlmOptions};
#[tokio::main]
async fn main() -> cc_sdk::Result<()> {
// LLM proxy mode — simplest way to use Claude
let resp = llm::query("Explain quantum computing in one sentence", None).await?;
println!("{}", resp.text);
// With custom system prompt
let opts = LlmOptions::builder()
.system_prompt("You are a Rust expert. Be concise.")
.model("sonnet")
.build();
let resp = llm::query("What's the difference between Arc and Rc?", Some(opts)).await?;
println!("{}", resp.text);
Ok(())
}
📚 API Endpoints
Chat Completions
POST /v1/chat/completions- Create a chat completion
Models
GET /v1/models- List available models
Conversations
POST /v1/conversations- Create a new conversationGET /v1/conversations- List active conversationsGET /v1/conversations/:id- Get conversation details
WebSocket Sessions
POST /v1/sessions- Create a new WebSocket session (spawns CLI with--sdk-url)GET /v1/sessions- List all active WebSocket sessionsGET /v1/sessions/{id}- Get session detailsDELETE /v1/sessions/{id}- Delete a session (kills CLI process)WS /ws/cli/{session_id}- WebSocket endpoint for CLI process connectionWS /ws/session/{session_id}- WebSocket endpoint for external client connection
Statistics
GET /stats- Get API usage statistics
Health Check
GET /health- Check service health
🛠️ Advanced Usage
Using with LangChain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed",
model="claude-opus-4-20250514"
)
response = llm.invoke("Explain quantum computing")
print(response.content)
Using with Node.js
const OpenAI = require('openai');
const client = new OpenAI({
baseURL: 'http://localhost:8080/v1',
apiKey: 'not-needed'
});
async function chat() {
const response = await client.chat.completions.create({
model: 'claude-opus-4-20250514',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
}
Using with curl
# Basic request
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-20250514",
"messages": [{"role": "user", "content": "Hello"}]
}'
# With conversation ID
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-20250514",
"conversation_id": "uuid-here",
"messages": [{"role": "user", "content": "Continue our chat"}]
}'
# With image
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-20250514",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What is this?"},
{"type": "image_url", "image_url": {"url": "/path/to/image.png"}}
]
}]
}'
🔒 Security
- File access is controlled through configurable permissions
- MCP servers run in isolated processes
- No API key required (relies on Claude CLI authentication)
- Supports CORS for web applications
- Request ID tracking for audit trails
⚡ Performance Optimization
Advanced Performance Features (via claude-code-sdk-rs)
The API leverages advanced performance optimizations from the underlying SDK:
Connection Pooling
- First request: 2-5 seconds (with pre-warmed connection pool)
- Subsequent requests: < 0.1 seconds (reusing existing connections)
- Concurrent handling: Multiple requests can share the connection pool
Client Modes
- OneShot Mode: Simple, stateless queries (default)
- Interactive Mode: Maintains conversation context across requests
- Batch Mode: Process multiple queries concurrently for high throughput
Performance Metrics
# Example performance improvements with OptimizedClient:
- Sequential queries: ~5s for 5 queries
- Batch processing: ~1.5s for 5 queries (3x speedup)
- With connection pooling: < 100ms per query after warm-up
Configuration for Performance
[claude]
max_concurrent_sessions = 10 # Increase for higher throughput
use_interactive_sessions = true # Enable for conversation context
timeout_seconds = 300 # Adjust based on query complexity
[cache]
enabled = true
max_entries = 1000
ttl_seconds = 3600
Best Practices
- Use the optimized REST endpoints that leverage
OptimizedClientfrom the SDK - Enable connection pooling for frequently used endpoints
- Use batch endpoints for processing multiple queries
- Monitor performance via the
/statsendpoint - Configure appropriate connection pool size based on load
For detailed performance tuning, see the SDK Performance Guide.
🐛 Troubleshooting
Common Issues
-
"Permission denied" errors
# Enable file permissions export CLAUDE_CODE__FILE_ACCESS__SKIP_PERMISSIONS=true # Or use the startup script ./start_with_permissions.sh -
MCP servers not working
# Enable debug mode export CLAUDE_CODE__MCP__DEBUG=true # Check MCP server installation npx -y @modelcontextprotocol/server-filesystem --version -
High latency on first request
- This is normal as Claude CLI needs to start up
- Subsequent requests will be faster due to process reuse
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add some amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built on top of claude-code-sdk-rs - The robust Rust SDK for Claude Code CLI
- Powered by Claude Code CLI from Anthropic
- Inspired by OpenAI's API design for maximum compatibility
- Web framework: Axum for high-performance HTTP serving
- Async runtime: Tokio for blazing-fast async I/O
📞 Support
Made with ❤️ by the Claude Code API team