Deployment Guide

June 18, 2026 · View on GitHub

Last updated: 2026-01-24
Type: Operations Guide
Audience: DevOps, system administrators

Overview

LLM4Free can be deployed in multiple ways:

  • Docker — Containerized deployment (recommended)
  • Docker Compose — Multi-service orchestration
  • OpenAI-Compatible API Server — Local HTTP API
  • Bare Metal — Direct Python installation

Table of Contents

  1. Docker Setup
  2. Docker Compose
  3. OpenAI-Compatible API Server
  4. Environment Configuration
  5. Production Considerations
  6. Troubleshooting

Docker Setup

Pull Official Image

# Latest version
docker pull oevortex/llm4free:latest

# Specific version
docker pull oevortex/llm4free:2024.12.01

# Latest slim version (smaller)
docker pull oevortex/llm4free:slim

Run Container

# Interactive mode
docker run -it oevortex/llm4free:latest

# With API key mounted
docker run -it \
  -e OPENAI_API_KEY="your-api-key" \
  oevortex/llm4free:latest

# With port forwarding (for API server)
docker run -it \
  -p 8000:8000 \
  -e OPENAI_API_KEY="your-api-key" \
  oevortex/llm4free:latest \
  llm4free-server

Build Custom Image

FROM oevortex/llm4free:latest

# Add custom requirements
RUN pip install additional-package

# Set default command
CMD ["llm4free", "--help"]
# Build
docker build -t my-llm4free .

# Run
docker run -it my-llm4free

Docker Compose

Basic Setup

Create docker-compose.yml:

version: '3.8'

services:
  llm4free:
    image: oevortex/llm4free:latest
    container_name: llm4free-app
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
    volumes:
      - ./data:/app/data
    ports:
      - "8000:8000"
    stdin_open: true
    tty: true

  llm4free-server:
    image: oevortex/llm4free:latest
    container_name: llm4free-api
    command: llm4free-server --host 0.0.0.0 --port 8001
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
    ports:
      - "8001:8001"
    depends_on:
      - llm4free

Run Compose Stack

# Create .env file with your API keys
cat > .env << EOF
OPENAI_API_KEY=your-api-key-here
GROQ_API_KEY=your-groq-key-here
EOF

# Start services
docker-compose up -d

# View logs
docker-compose logs -f llm4free-server

# Stop services
docker-compose down

With Authentication (Optional)

services:
  llm4free-server:
    image: oevortex/llm4free:latest
    command: >
      llm4free-server
      --host 0.0.0.0
      --port 8001
      --api-key gsk_llm4free_prod_12345
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    ports:
      - "8001:8001"

OpenAI-Compatible API Server

What It Does

Runs a FastAPI server that proxies any LLM4Free provider through OpenAI-compatible endpoints.

Your App → llm4free-server (OpenAI API) → Any LLM4Free Provider

Start Server

# Simple start
llm4free-server

# With custom host/port
llm4free-server --host 0.0.0.0 --port 8001

# With debug mode
llm4free-server --debug

# With API key requirement
llm4free-server --api-key your-secret-key

Configure Providers

Create llm4free_config.json:

{
  "default_provider": "GROQ",
  "providers": {
    "GROQ": {
      "api_key": "${GROQ_API_KEY}",
      "model": "llama-3.1-70b-versatile"
    },
    "OpenAI": {
      "api_key": "${OPENAI_API_KEY}",
      "model": "gpt-4"
    }
  }
}

Use with OpenAI Client

from openai import OpenAI

# Point to your LLM4Free server
client = OpenAI(
    api_key="any-key-or-gsk_...",
    base_url="http://localhost:8000/v1"
)

# Use exactly like OpenAI
response = client.chat.completions.create(
    model="GROQ",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)

API Endpoints

Chat Completions

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{
    "model": "GROQ",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

List Models

curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer YOUR-API-KEY"

Health Check

curl http://localhost:8000/health

Environment Configuration

API Keys

Set environment variables:

# Linux/macOS
export OPENAI_API_KEY="sk-..."
export GROQ_API_KEY="gsk_..."
export COHERE_API_KEY="..."

# Windows PowerShell
$env:OPENAI_API_KEY = "sk-..."
$env:GROQ_API_KEY = "gsk_..."

# Windows CMD
set OPENAI_API_KEY=sk-...

Using .env File

Create .env:

# AI Provider Keys
OPENAI_API_KEY=sk-your-openai-key
GROQ_API_KEY=gsk_your-groq-key
COHERE_API_KEY=co_your-cohere-key
GEMINI_API_KEY=your-gemini-key

# Server Configuration
LLM4FREE_HOST=0.0.0.0
LLM4FREE_PORT=8000
LLM4FREE_DEBUG=false

# Timeout settings
REQUEST_TIMEOUT=30
STREAM_TIMEOUT=120

Load with:

# Shell
set -a
source .env
set +a

# Or in Python
from dotenv import load_dotenv
load_dotenv()

Configuration File

Create llm4free.yaml:

server:
  host: 0.0.0.0
  port: 8000
  debug: false
  workers: 4
  timeout: 30

providers:
  GROQ:
    api_key: ${GROQ_API_KEY}
    model: llama-3.1-70b-versatile
    timeout: 60
  
  OpenAI:
    api_key: ${OPENAI_API_KEY}
    model: gpt-4
    timeout: 30

logging:
  level: INFO
  format: json

Production Considerations

1. Security

Use HTTPS

# With self-signed certificate
openssl req -x509 -newkey rsa:4096 \
  -keyout key.pem -out cert.pem -days 365 -nodes

# Use with nginx reverse proxy
nginx

Environment Variables

# Never commit secrets
echo ".env" >> .gitignore

# Use secure secret management
# - Kubernetes Secrets
# - AWS Secrets Manager
# - HashiCorp Vault
# - Docker Secrets

Rate Limiting

# In nginx config
limit_req_zone $binary_remote_addr zone=llm4free:10m rate=10r/s;

server {
    location / {
        limit_req zone=llm4free burst=20 nodelay;
        proxy_pass http://llm4free:8000;
    }
}

2. Monitoring

Health Checks

# Docker
docker run --healthcheck=CMD curl -f http://localhost:8000/health

# Kubernetes
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 10

Logging

import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

Metrics

# Prometheus metrics endpoint
curl http://localhost:8000/metrics

3. Performance

Caching

from functools import lru_cache

@lru_cache(maxsize=100)
def get_provider_instance(provider_name: str):
    return initialize_provider(provider_name)

Connection Pooling

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("https://", adapter)

Load Balancing

# docker-compose with multiple instances
services:
  llm4free-1:
    image: oevortex/llm4free:latest
    ports:
      - "8001:8000"
  
  llm4free-2:
    image: oevortex/llm4free:latest
    ports:
      - "8002:8000"
  
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf

4. Scaling

Horizontal Scaling (Kubernetes)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm4free-deployment

spec:
  replicas: 3
  
  selector:
    matchLabels:
      app: llm4free
  
  template:
    metadata:
      labels:
        app: llm4free
    
    spec:
      containers:
      - name: llm4free
        image: oevortex/llm4free:latest
        ports:
        - containerPort: 8000
        
        env:
        - name: GROQ_API_KEY
          valueFrom:
            secretKeyRef:
              name: llm4free-secrets
              key: groq-api-key
        
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Troubleshooting

"Connection refused" when accessing API

# Check if server is running
docker ps | grep llm4free

# Check logs
docker logs llm4free-api

# Verify port is open
netstat -tulpn | grep 8000

# Test locally
curl http://localhost:8000/health

"API key not found" errors

# Verify environment variables are set
echo $GROQ_API_KEY
echo $OPENAI_API_KEY

# Check in Docker
docker exec llm4free-api env | grep -i api

# Set in docker-compose
environment:
  - GROQ_API_KEY=${GROQ_API_KEY}

High memory usage

# Monitor container memory
docker stats llm4free-api

# Limit memory in docker-compose
services:
  llm4free:
    mem_limit: 1g

SSL/TLS certificate errors

# Generate self-signed certificate
openssl req -x509 -nodes -days 365 \
  -newkey rsa:2048 \
  -keyout /etc/nginx/ssl/private.key \
  -out /etc/nginx/ssl/certificate.crt

# Use with proper nginx config

Deployment Checklist

  • API keys configured securely
  • HTTPS/SSL enabled
  • Health checks working
  • Logging configured
  • Monitoring in place
  • Backups configured
  • Rate limiting enabled
  • Firewall rules set
  • Documentation updated
  • Test deployment first

See Also