Usage Guide

June 7, 2026

Per-provider setup and example commands for every supported model backend. The README carries a condensed version of this; the complete per-provider detail lives here.

See also: Supported Models table · Model Name Format.


Usage: Closed-Source API Models

Anthropic Claude

Get your API key at console.anthropic.com.

export ANTHROPIC_API_KEY=sk-ant-api03-...

# Default model (claude-opus-4-6)
cheetahclaws

# Choose a specific model
cheetahclaws --model claude-sonnet-4-6
cheetahclaws --model claude-haiku-4-5-20251001

# Enable Extended Thinking
cheetahclaws --model claude-opus-4-6 --thinking --verbose

OpenAI GPT

Get your API key at platform.openai.com.

export OPENAI_API_KEY=sk-...

cheetahclaws --model gpt-4o
cheetahclaws --model gpt-4o-mini
cheetahclaws --model gpt-4.1-mini
cheetahclaws --model o3-mini

Google Gemini

Get your API key at aistudio.google.com.

export GEMINI_API_KEY=AIza...

cheetahclaws --model gemini/gemini-3-flash-preview
cheetahclaws --model gemini/gemini-3.1-pro-preview

Kimi (Moonshot AI)

Get your API key at platform.moonshot.cn.

export MOONSHOT_API_KEY=sk-...

cheetahclaws --model kimi/moonshot-v1-32k
cheetahclaws --model kimi/moonshot-v1-128k

Qwen (Alibaba DashScope)

Get your API key at dashscope.aliyun.com.

export DASHSCOPE_API_KEY=sk-...

cheetahclaws --model qwen/Qwen3.5-Plus
cheetahclaws --model qwen/Qwen3-MAX
cheetahclaws --model qwen/Qwen3.5-Flash

Zhipu GLM

Get your API key at open.bigmodel.cn.

export ZHIPU_API_KEY=...

cheetahclaws --model zhipu/glm-4-plus
cheetahclaws --model zhipu/glm-4-flash   # free tier

DeepSeek

Get your API key at platform.deepseek.com.

export DEEPSEEK_API_KEY=sk-...

cheetahclaws --model deepseek/deepseek-chat
cheetahclaws --model deepseek/deepseek-reasoner

MiniMax

Get your API key at platform.minimaxi.chat.

export MINIMAX_API_KEY=...

cheetahclaws --model minimax/MiniMax-Text-01
cheetahclaws --model minimax/MiniMax-VL-01
cheetahclaws --model minimax/abab6.5s-chat
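Each provider above reads its key from an exported environment variable, so a missing or misspelled export is an easy mistake to make. The sketch below is a hypothetical pre-flight helper (the function name `check_provider_keys` is not part of CheetahClaws) that reports which of the variables named in this guide are visible to child processes in the current shell.

```shell
# Hypothetical pre-flight check, not part of CheetahClaws: reports which
# provider key variables from this guide are exported and non-empty.
check_provider_keys() {
  for name in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY \
              MOONSHOT_API_KEY DASHSCOPE_API_KEY ZHIPU_API_KEY \
              DEEPSEEK_API_KEY MINIMAX_API_KEY; do
    # printenv prints nothing for variables that are unset or not exported
    if [ -n "$(printenv "$name")" ]; then
      echo "$name: set"
    else
      echo "$name: missing"
    fi
  done
}

check_provider_keys
```

Only the key for the provider you actually select needs to be set; the others can stay missing.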

LiteLLM (AWS Bedrock / Azure / Vertex AI)

Use the litellm/ prefix when the upstream needs auth that's painful to wire by hand — AWS Bedrock SigV4 signing, Azure OpenAI deployment routing, or Google Vertex AI service-account JWTs. For plain OpenAI-shaped endpoints (vLLM, LM Studio, TGI, Together, Groq, …) prefer the zero-dependency custom/ adapter from Option C below.

pip install ".[litellm]"

# AWS Bedrock — uses your boto3 credential chain (AWS_PROFILE,
# ~/.aws/credentials, IAM role on EC2). No explicit api_key needed.
export AWS_REGION=us-east-1
cheetahclaws --model litellm/bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0

# Azure OpenAI — deployment-id routing via api_base + api_version pair.
export AZURE_API_KEY=...
export AZURE_API_BASE=https://my-resource.openai.azure.com
export AZURE_API_VERSION=2024-10-01-preview
cheetahclaws --model litellm/azure/my-gpt4o-deployment

# Google Vertex AI — Application Default Credentials.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
export VERTEXAI_PROJECT=my-project
export VERTEXAI_LOCATION=us-central1
cheetahclaws --model litellm/vertex_ai/gemini-2.0-flash

The model string format is litellm/<provider>/<model> — the first segment routes to this adapter, everything after is passed verbatim to litellm.completion(model=...). See LiteLLM docs for the full list of 100+ supported providers, and recipes.md for the troubleshooting table.
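To make the split concrete, here is a small shell sketch of how such a string divides at the first slash. The function name `split_model_string` is hypothetical and only illustrates the format; it is not how CheetahClaws itself parses the string.

```shell
# Hypothetical helper, not part of CheetahClaws: shows how a
# litellm/<provider>/<model> string divides at the first "/".
split_model_string() {
  model_string=$1
  adapter=${model_string%%/*}      # text before the first "/"
  passthrough=${model_string#*/}   # everything after the first "/"
  printf 'adapter=%s\nlitellm_model=%s\n' "$adapter" "$passthrough"
}

split_model_string "litellm/bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
# prints:
#   adapter=litellm
#   litellm_model=bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
```

Note that the second part keeps its own slash: `bedrock/...` is the string LiteLLM receives.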


Usage: Open-Source Models (Local)

Option A — Ollama

Ollama runs models locally with zero configuration. No API key required.

Step 1: Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com/download

Step 2: Pull a model

# Best for coding (recommended)
ollama pull qwen2.5-coder          # 4.7 GB (7B)
ollama pull qwen2.5-coder:32b      # 19 GB (32B)

# General purpose
ollama pull llama3.3               # 42 GB (70B)
ollama pull llama3.2               # 2.0 GB (3B)

# Reasoning
ollama pull deepseek-r1            # 4.7 GB (7B)
ollama pull deepseek-r1:32b        # 19 GB (32B)

# Other
ollama pull phi4                   # 9.1 GB (14B)
ollama pull mistral                # 4.1 GB (7B)

Step 3: Start the Ollama server (it starts automatically on macOS; on Linux, start it manually)

ollama serve     # starts on http://localhost:11434

Step 4: Run cheetahclaws

cheetahclaws --model ollama/qwen2.5-coder
cheetahclaws --model ollama/llama3.3
cheetahclaws --model ollama/deepseek-r1

Or run the script directly with Python:

python cheetahclaws.py --model ollama/qwen2.5-coder
python cheetahclaws.py --model ollama/llama3.3
python cheetahclaws.py --model ollama/deepseek-r1
python cheetahclaws.py --model ollama/qwen3.5:35b

List your locally available models:

ollama list

Then use any model from the list:

cheetahclaws --model ollama/<model-name>
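If you are unsure whether the server from Step 3 is running, you can query Ollama's HTTP API before launching. The sketch below assumes `curl` is installed and uses the `/api/tags` endpoint, which returns the locally available models as JSON; the function name `ollama_is_up` is hypothetical.

```shell
# Hypothetical helper: exits 0 when an Ollama server answers at the
# given base URL (default: the address from Step 3).
ollama_is_up() {
  base_url=${1:-http://localhost:11434}
  curl -fsS --max-time 3 "$base_url/api/tags" >/dev/null 2>&1
}

if ollama_is_up; then
  echo "Ollama is reachable"
else
  echo "Ollama is not reachable; try: ollama serve"
fi
```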

If a local model "just keeps talking" instead of editing files or running commands, it has emitted its tool calls as text rather than as structured calls. CheetahClaws auto-recovers the common text formats, <tool_call>…</tool_call> (Qwen/Hermes), <|tool_call|>… (Gemma), and [TOOL_CALLS]… (Mistral), so those calls still execute.

For best results, pick a function-calling model (qwen2.5-coder, llama3.3, mistral, phi4) and give concrete prompts (a path, a filename, an exact command). Small local models are inherently weaker at agentic tool use than cloud models, so they may still need more explicit instructions.

If a model has no tool template at all, the first tool-enabled request returns 500 and CheetahClaws falls back to chat-only mode (a yellow [warn] is printed). In that case, pull one of the recommended models instead.
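To show what a tool call emitted "as text" looks like, the sketch below pulls the payload out of a Qwen/Hermes-style `<tool_call>` span with `sed`. It is an illustration only, not CheetahClaws's recovery code: it handles a single span on one line, and the reply text, tool name, and arguments are made up.

```shell
# Illustrative only: NOT CheetahClaws's parser. Prints the payload of a
# single-line <tool_call>...</tool_call> span found in text on stdin.
extract_tool_call() {
  sed -n 's/.*<tool_call>\(.*\)<\/tool_call>.*/\1/p'
}

# Hypothetical model reply; the tool name and arguments are made up.
reply='I will read the file. <tool_call>{"name": "Read", "arguments": {"path": "main.py"}}</tool_call>'
printf '%s\n' "$reply" | extract_tool_call
# prints: {"name": "Read", "arguments": {"path": "main.py"}}
```

A reply with no such span produces no output, which is the "just keeps talking" case with nothing to recover.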


Option B — LM Studio

LM Studio provides a GUI to download and run models, with a built-in OpenAI-compatible server.

Step 1: Download LM Studio and install it.

Step 2: Search and download a model inside LM Studio (GGUF format).

Step 3: Go to the Local Server tab and click Start Server (default port: 1234).

Step 4: Run cheetahclaws with the lmstudio/ prefix:

cheetahclaws --model lmstudio/<model-name>
# e.g.:
cheetahclaws --model lmstudio/phi-4-GGUF
cheetahclaws --model lmstudio/qwen2.5-coder-7b

The model name should match what LM Studio shows in the server status bar.
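You can also read the name from the server itself: OpenAI-compatible servers such as LM Studio's list their models at `/v1/models`. The sketch below extracts the `id` fields with `grep` and `sed` so it needs no JSON tooling; the function name `model_ids` is hypothetical, and the canned response (with made-up contents) only stands in for a live server.

```shell
# Hypothetical helper: pull the "id" fields out of an OpenAI-style
# /v1/models response read from stdin, one id per line.
model_ids() {
  grep -o '"id" *: *"[^"]*"' | sed 's/.*: *"\([^"]*\)"/\1/'
}

# Against a running LM Studio server on the default port:
#   curl -s http://localhost:1234/v1/models | model_ids

# Canned response so the sketch runs without a server:
printf '%s\n' '{"data":[{"id":"qwen2.5-coder-7b","object":"model"},{"id":"phi-4-GGUF","object":"model"}]}' | model_ids
# prints:
#   qwen2.5-coder-7b
#   phi-4-GGUF
```

Whatever id is printed goes after the `lmstudio/` prefix.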


Option C — vLLM / Self-Hosted OpenAI-Compatible Server

For self-hosted inference servers (vLLM, TGI, llama.cpp server, etc.) that expose an OpenAI-compatible API:

Quick start for Option C:

Step 1: Start vLLM:

CUDA_VISIBLE_DEVICES=7 python -m vllm.entrypoints.openai.api_server \
     --model Qwen/Qwen2.5-Coder-7B-Instruct \
     --host 0.0.0.0 \
     --port 8000 \
     --enable-auto-tool-choice \
     --tool-call-parser hermes

Step 2: Start cheetahclaws:

export CUSTOM_BASE_URL=http://localhost:8000/v1
export CUSTOM_API_KEY=none
cheetahclaws --model custom/Qwen/Qwen2.5-Coder-7B-Instruct

# Example: vLLM serving Qwen2.5-Coder-32B
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-Coder-32B-Instruct \
    --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes

# Then run cheetahclaws pointing to your server:
cheetahclaws

Inside the REPL:

/config custom_base_url=http://localhost:8000/v1
/config custom_api_key=token-abc123    # skip if no auth
/model custom/Qwen/Qwen2.5-Coder-32B-Instruct

Or set via environment:

export CUSTOM_BASE_URL=http://localhost:8000/v1
export CUSTOM_API_KEY=token-abc123

cheetahclaws --model custom/Qwen/Qwen2.5-Coder-32B-Instruct

For a remote GPU server:

/config custom_base_url=http://192.168.1.100:8000/v1
/model custom/your-model-name

Using vLLM with the Web UI

Passing --web --model <name> persists the model to ~/.cheetahclaws/config.json before the server starts, so the Chat UI hits the right endpoint on its very first request:

export CUSTOM_BASE_URL=http://localhost:8000/v1
export CUSTOM_API_KEY=dummy            # vLLM doesn't validate but the OpenAI SDK requires non-empty
cheetahclaws --web --no-auth --port 8080 --model custom/qwen2.5-72b

If you skip --model, the Chat UI uses whatever was previously saved (it will not silently fall back to a default). Switch models on the fly from the Chat UI's Settings panel or with /model custom/<name> in the message box. The model name after custom/ must match the vLLM --served-model-name exactly.
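Because the match is exact, a short alias such as `qwen2.5-72b` only works if vLLM was started with that alias. The sketch below shows an assumed launch in comments (the model path and alias are examples) and a hypothetical check, `served_name_matches`, that compares the name after `custom/` against a list of served ids, case-sensitively and as a whole line.

```shell
# Assumed launch, shown for context only (model path and alias are examples):
#   python -m vllm.entrypoints.openai.api_server \
#       --model Qwen/Qwen2.5-72B-Instruct \
#       --served-model-name qwen2.5-72b \
#       --port 8000

# Hypothetical check: succeeds only if the name after custom/ appears
# verbatim (case-sensitive, whole line) in the ids read from stdin.
served_name_matches() {
  wanted=${1#custom/}
  grep -Fxq -- "$wanted"
}

# Canned ids stand in for the server's /v1/models listing.
printf '%s\n' 'qwen2.5-72b' | served_name_matches 'custom/qwen2.5-72b' && echo "match"
printf '%s\n' 'qwen2.5-72b' | served_name_matches 'custom/Qwen2.5-72B' || echo "no match"
# prints:
#   match
#   no match
```

Without `--served-model-name`, vLLM serves the model under the value passed to `--model`, so the full `Qwen/...` path is what goes after `custom/`.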