Chapter 8: Production Deployment

March 2, 2026

Welcome to Chapter 8: Production Deployment. In this part of Perplexica Tutorial: AI-Powered Search Engine, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Deploying Perplexica for production use requires careful planning across infrastructure, security, monitoring, and cost management. This chapter covers everything from single-command Docker deployment to multi-instance scaling with load balancing, SSL termination, API key security, observability, and cost optimization. By the end, you will have a production-ready deployment playbook.

Deployment Architecture

flowchart TD
    subgraph Internet["Internet"]
        U[Users]
        DNS[DNS - yourdomain.com]
    end

    subgraph Edge["Edge Layer"]
        CF[Cloudflare / CDN]
        SSL[SSL Termination]
    end

    subgraph Proxy["Reverse Proxy"]
        NX[Nginx / Caddy]
    end

    subgraph App["Application Tier"]
        P1[Perplexica Instance 1]
        P2[Perplexica Instance 2]
    end

    subgraph Search["Search Tier"]
        SX1[SearXNG Instance 1]
        SX2[SearXNG Instance 2]
    end

    subgraph Data["Data Tier"]
        DB[(SQLite / PostgreSQL)]
        VOL[Persistent Volume]
    end

    subgraph AI["AI Providers"]
        OA[OpenAI API]
        AN[Anthropic API]
        OL[Ollama - Local]
    end

    U --> DNS --> CF --> SSL --> NX
    NX --> P1 & P2
    P1 & P2 --> SX1 & SX2
    P1 & P2 --> DB
    DB --> VOL
    P1 & P2 --> OA & AN & OL

    classDef internet fill:#e1f5fe,stroke:#01579b
    classDef edge fill:#f3e5f5,stroke:#4a148c
    classDef proxy fill:#fff3e0,stroke:#ef6c00
    classDef app fill:#e8f5e8,stroke:#1b5e20
    classDef search fill:#e0f2f1,stroke:#004d40
    classDef data fill:#fce4ec,stroke:#b71c1c
    classDef ai fill:#f1f8e9,stroke:#33691e

    class U,DNS internet
    class CF,SSL edge
    class NX proxy
    class P1,P2 app
    class SX1,SX2 search
    class DB,VOL data
    class OA,AN,OL ai

Docker Deployment

Single-Command Quick Start

Starting with v1.11, Perplexica supports single-command Docker deployment without cloning the full repository.

# Quick start with a single docker run command (no repo clone needed)
docker run -d \
  --name perplexica \
  -p 3000:3000 \
  -p 3001:3001 \
  -v perplexica-data:/app/data \
  -v perplexica-config:/app/config \
  itzcrazykns/perplexica:latest

Full Docker Compose Setup

# docker-compose.yaml
version: '3.8'

services:
  perplexica-frontend:
    image: itzcrazykns/perplexica-frontend:latest
    build:
      context: ./ui
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
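    environment:
      # NEXT_PUBLIC_* values are inlined into the browser bundle, so they must be
      # reachable from the user's browser. Replace these Docker-internal hostnames
      # with your public domain when deploying behind the reverse proxy.
      - NEXT_PUBLIC_API_URL=http://perplexica-backend:3001
      - NEXT_PUBLIC_WS_URL=ws://perplexica-backend:3001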
    depends_on:
      - perplexica-backend
    restart: unless-stopped
    networks:
      - perplexica-net

  perplexica-backend:
    image: itzcrazykns/perplexica-backend:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3001:3001"
    volumes:
      - perplexica-data:/app/data
      - ./config.toml:/app/config.toml
    environment:
      - NODE_ENV=production
      - PORT=3001
    depends_on:
      - searxng
    restart: unless-stopped
    networks:
      - perplexica-net

  searxng:
    image: searxng/searxng:latest
    container_name: perplexica-searxng
    ports:
      - "32768:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:32768/
    restart: unless-stopped
    networks:
      - perplexica-net

volumes:
  perplexica-data:
    driver: local

networks:
  perplexica-net:
    driver: bridge

Production Dockerfile

# Dockerfile (backend)
FROM node:20-alpine AS builder

WORKDIR /app

# Install dependencies
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile --production=false

# Build application
COPY . .
RUN yarn build

# Production image
FROM node:20-alpine AS runner

WORKDIR /app

# Copy built application
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# Create data directory
RUN mkdir -p /app/data

# Non-root user for security
RUN addgroup --system --gid 1001 perplexica && \
    adduser --system --uid 1001 perplexica && \
    chown -R perplexica:perplexica /app

USER perplexica

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3001/api/health || exit 1

CMD ["node", "dist/index.js"]

Reverse Proxy Configuration

Nginx Configuration

# /etc/nginx/sites-available/perplexica.conf
upstream perplexica_backend {
    server 127.0.0.1:3001;
    # Add more instances for load balancing:
    # server 127.0.0.1:3002;
    # server 127.0.0.1:3003;
}

upstream perplexica_frontend {
    server 127.0.0.1:3000;
}

server {
    listen 80;
    server_name search.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name search.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/search.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/search.yourdomain.com/privkey.pem;

    # SSL hardening
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Frontend
    location / {
        proxy_pass http://perplexica_frontend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Backend API
    location /api/ {
        proxy_pass http://perplexica_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Rate limiting
        limit_req zone=api burst=20 nodelay;
    }

    # WebSocket
    location /ws {
        proxy_pass http://perplexica_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 86400;
    }
}

# Rate limiting zone
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

Caddy Configuration (Alternative)

# Caddyfile
search.yourdomain.com {
    # Automatic HTTPS via Let's Encrypt

    # Frontend
    handle {
        reverse_proxy localhost:3000
    }

    # Backend API
    handle /api/* {
        reverse_proxy localhost:3001
    }

    # WebSocket
    handle /ws {
        reverse_proxy localhost:3001
    }

    # Security headers
    header {
        X-Frame-Options "SAMEORIGIN"
        X-Content-Type-Options "nosniff"
        Strict-Transport-Security "max-age=31536000; includeSubDomains"
    }
}

Security Considerations

API Key Protection

flowchart TD
    subgraph Storage["Key Storage"]
        CT[config.toml - file permissions 600]
        EV[Environment Variables]
        VS[Vault / Secrets Manager]
    end

    subgraph Access["Access Control"]
        RD[Redact in API responses]
        NL[Never log API keys]
        RT[Key rotation schedule]
    end

    subgraph Network["Network Security"]
        SSL[TLS 1.2+ only]
        FW[Firewall rules]
        RL[Rate limiting]
        CO[CORS policy]
    end

    Storage --> Access --> Network

    classDef storage fill:#fce4ec,stroke:#b71c1c
    classDef access fill:#fff3e0,stroke:#ef6c00
    classDef network fill:#e8f5e8,stroke:#1b5e20

    class CT,EV,VS storage
    class RD,NL,RT access
    class SSL,FW,RL,CO network

Security Hardening Checklist

| Area | Action | Priority |
|---|---|---|
| API Keys | Store in config.toml with 600 permissions, never commit to git | Critical |
| HTTPS | Enforce TLS 1.2+ with valid certificate | Critical |
| Rate Limiting | 10 req/s per IP on API, 2 req/s per IP on search | High |
| CORS | Restrict to your domain only | High |
| File Uploads | Validate file type, enforce 10MB limit, scan for malware | High |
| Headers | X-Frame-Options, CSP, HSTS | Medium |
| Authentication | Add user auth if exposed to internet (not built-in) | Medium |
| Logging | Audit log API access, redact sensitive data | Medium |
| Updates | Automated security updates for Docker images | Medium |
| Backup | Daily database backup with encryption | Medium |
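
The "redact sensitive data" items above can be illustrated with a small helper that masks secrets before a config object is logged or returned from an API. This is a minimal sketch, not Perplexica's own code: the function names and the key-name pattern are assumptions, and the pattern is deliberately broad, so it may also mask harmless string fields whose names contain words like "token".

```typescript
// Hypothetical helper; not part of Perplexica's codebase.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Key names treated as sensitive. This pattern is an assumption; extend it for your config.
const SENSITIVE_KEY = /(api[_-]?key|secret|token|password)/i;

// Keep only the last four characters so operators can still tell keys apart.
function maskSecret(value: string): string {
  if (value.length <= 4) return "****";
  return "****" + value.slice(-4);
}

// Recursively copy a JSON value, masking string values stored under sensitive keys.
function redactSecrets(value: Json, parentKey: string = ""): Json {
  if (typeof value === "string") {
    return SENSITIVE_KEY.test(parentKey) ? maskSecret(value) : value;
  }
  if (Array.isArray(value)) {
    // Array items inherit the key of the array that contains them.
    return value.map((item) => redactSecrets(item, parentKey));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = redactSecrets(child, key);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Calling `redactSecrets(config)` before serializing a settings response or writing a log line returns a copy, so the in-memory config keeps its real keys.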

Adding Basic Authentication

Perplexica does not include built-in authentication, but you can add it at the reverse proxy layer.

# Generate htpasswd file
sudo apt install apache2-utils
htpasswd -c /etc/nginx/.htpasswd admin

# Add to nginx location blocks
location / {
    auth_basic "Perplexica";
    auth_basic_user_file /etc/nginx/.htpasswd;

    proxy_pass http://perplexica_frontend;
    # ... other proxy settings
}

Monitoring and Observability

Health Check Endpoint

// src/routes/health.ts
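import { Router } from 'express';
// db, chats, getSearXNGUrl, and getAvailableChatModelProviders are assumed to be
// imported from the project's own database, config, and provider modules;
// the exact paths depend on your Perplexica version.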

const router = Router();

router.get('/health', async (_req, res) => {
  const checks: Record<string, 'ok' | 'error'> = {};

  // Check database
  try {
    await db.select().from(chats).limit(1);
    checks.database = 'ok';
  } catch {
    checks.database = 'error';
  }

  // Check SearXNG
  try {
    const response = await fetch(`${getSearXNGUrl()}/healthz`, {
      signal: AbortSignal.timeout(5000),
    });
    checks.searxng = response.ok ? 'ok' : 'error';
  } catch {
    checks.searxng = 'error';
  }

  // Check at least one AI provider
  try {
    const providers = await getAvailableChatModelProviders();
    const hasProvider = Object.values(providers).some(
      (p) => Object.keys(p).length > 0,
    );
    checks.aiProviders = hasProvider ? 'ok' : 'error';
  } catch {
    checks.aiProviders = 'error';
  }

  const allHealthy = Object.values(checks).every((c) => c === 'ok');

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'healthy' : 'degraded',
    checks,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

export default router;

Prometheus Metrics

// src/lib/metrics.ts
import { Counter, Histogram, Gauge, register } from 'prom-client';

// Search metrics
export const searchRequestsTotal = new Counter({
  name: 'perplexica_search_requests_total',
  help: 'Total number of search requests',
  labelNames: ['focus_mode', 'status'],
});

export const searchLatency = new Histogram({
  name: 'perplexica_search_latency_seconds',
  help: 'Search request latency in seconds',
  labelNames: ['focus_mode'],
  buckets: [0.5, 1, 2, 5, 10, 30],
});

export const llmTokensUsed = new Counter({
  name: 'perplexica_llm_tokens_total',
  help: 'Total LLM tokens consumed',
  labelNames: ['provider', 'model', 'type'],
});

export const activeWebSockets = new Gauge({
  name: 'perplexica_active_websockets',
  help: 'Number of active WebSocket connections',
});

export const searxngLatency = new Histogram({
  name: 'perplexica_searxng_latency_seconds',
  help: 'SearXNG search latency',
  buckets: [0.1, 0.25, 0.5, 1, 2, 5],
});

// Metrics endpoint
export function getMetrics() {
  return register.metrics();
}

Monitoring Dashboard Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Search latency (p95) | 95th percentile search response time | > 10 seconds |
| Error rate | Percentage of failed search requests | > 5% |
| Active WebSockets | Current concurrent connections | > 100 per instance |
| LLM tokens/hour | AI provider token consumption rate | > budget limit |
| SearXNG latency | Meta-search response time | > 3 seconds |
| Database size | SQLite file size on disk | > 1 GB |
| Memory usage | Container memory consumption | > 80% of limit |
| CPU usage | Container CPU utilization | > 70% sustained |
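
To make the first two thresholds concrete, the sketch below computes a p95 latency with the nearest-rank method and checks it, together with the error rate, against the limits in the table. It is an illustration only: in practice Prometheus would evaluate these rules from the histogram and counter defined earlier. The `evaluateWindow` function and its input shape are hypothetical, and it assumes one latency sample is recorded per request, including failed ones.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical input: one latency sample per request in the evaluation window.
interface WindowStats {
  latenciesSeconds: number[];
  failedRequests: number;
}

interface AlertResult {
  p95Seconds: number;
  errorRatePercent: number;
  alerts: string[];
}

// Apply the "> 10 seconds" and "> 5%" thresholds from the table above.
function evaluateWindow(stats: WindowStats): AlertResult {
  const total = stats.latenciesSeconds.length;
  const p95Seconds = percentile(stats.latenciesSeconds, 95);
  const errorRatePercent = (stats.failedRequests / total) * 100;
  const alerts: string[] = [];
  if (p95Seconds > 10) alerts.push("search latency p95 above 10 seconds");
  if (errorRatePercent > 5) alerts.push("error rate above 5%");
  return { p95Seconds, errorRatePercent, alerts };
}
```

The nearest-rank method always returns an observed sample, which keeps the result easy to explain during an incident; Prometheus' `histogram_quantile` interpolates within buckets instead, so the two values can differ slightly.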

Scaling Strategy

Horizontal Scaling with Docker Compose

# docker-compose.prod.yaml
version: '3.8'

services:
  perplexica-backend:
    image: itzcrazykns/perplexica-backend:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
    environment:
      - NODE_ENV=production
      - PORT=3001
    volumes:
      - shared-data:/app/data
      - ./config.toml:/app/config.toml:ro
    networks:
      - perplexica-net

  searxng:
    image: searxng/searxng:latest
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
    volumes:
      - ./searxng:/etc/searxng:ro
    networks:
      - perplexica-net

volumes:
  shared-data:
    driver: local

networks:
  perplexica-net:
    driver: bridge

Cloud Deployment Options

| Platform | Service | Advantages | Estimated Cost |
|---|---|---|---|
| AWS | ECS Fargate | Auto-scaling, no server management | $50-200/month |
| Google Cloud | Cloud Run | Scale to zero, pay-per-request | $30-150/month |
| DigitalOcean | App Platform | Simple deployment, predictable pricing | $20-100/month |
| Hetzner | VPS + Docker | Best price/performance for EU | $10-50/month |
| Self-hosted | Docker Compose | Full control, no recurring cloud cost | Hardware cost only |

Cost Optimization

AI Provider Cost Management

flowchart LR
    subgraph Query["Incoming Query"]
        Q[User Query]
    end

    subgraph Router["Cost-Aware Router"]
        CX{Query Complexity?}
    end

    subgraph Models["Model Selection"]
        CHEAP[Groq / Haiku - $0.001/query]
        MID[GPT-4o Mini / Sonnet - $0.01/query]
        FULL[GPT-4o / Opus - $0.05/query]
        LOCAL[Ollama - Free]
    end

    Q --> CX
    CX -->|Simple factual| CHEAP
    CX -->|Moderate analysis| MID
    CX -->|Complex research| FULL
    CX -->|Development/testing| LOCAL

    classDef query fill:#e1f5fe,stroke:#01579b
    classDef router fill:#fff3e0,stroke:#ef6c00
    classDef model fill:#e8f5e8,stroke:#1b5e20

    class Q query
    class CX router
    class CHEAP,MID,FULL,LOCAL model
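
The routing decision in the diagram can be sketched as a small function. The diagram does not say how query complexity is measured, so the heuristic here (word count plus a few "research" keywords) is purely an assumption for illustration, as are the names `routeQuery` and `RESEARCH_HINTS`. The per-query costs are the estimates shown in the diagram.

```typescript
type Tier = "cheap" | "mid" | "full" | "local";

interface RouteDecision {
  tier: Tier;
  estimatedCostUSD: number;
}

// Per-query cost estimates taken from the diagram above.
const TIER_COST_USD: Record<Tier, number> = {
  cheap: 0.001,
  mid: 0.01,
  full: 0.05,
  local: 0,
};

// Hypothetical complexity signal: words that tend to indicate analysis or research.
const RESEARCH_HINTS =
  /\b(compare|analy[sz]e|trade-?offs?|step[- ]by[- ]step|in depth|comprehensive)\b/i;

function decision(tier: Tier): RouteDecision {
  return { tier, estimatedCostUSD: TIER_COST_USD[tier] };
}

// Pick a model tier for a query. Development traffic always goes to the local model.
function routeQuery(
  query: string,
  environment: "production" | "development",
): RouteDecision {
  if (environment === "development") return decision("local");

  const wordCount = query.trim().split(/\s+/).filter(Boolean).length;
  const hinted = RESEARCH_HINTS.test(query);

  if (hinted && wordCount > 25) return decision("full"); // long, research-style query
  if (hinted || wordCount > 12) return decision("mid"); // moderate analysis
  return decision("cheap"); // short factual lookup
}
```

A keyword heuristic is cheap but crude. A common alternative is to let the cheapest model classify the query first and route on its answer, at the cost of one extra low-priced call per request.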

Cost Optimization Strategies

| Strategy | Implementation | Monthly Savings |
|---|---|---|
| Use cheap models for query reformulation | Route reformulation step to Haiku/Mini | 60-80% on reformulation |
| Cache search results (15-min TTL) | In-memory cache by query hash | 30-50% on SearXNG load |
| Cache LLM responses (1-hour TTL) | Redis/memory cache for repeated queries | 20-40% on LLM costs |
| Rate limit per user | Nginx rate limiting + token budgets | Prevents runaway costs |
| Use Ollama for dev/staging | Local models for non-production | 100% savings in dev |
| Batch embedding requests | Group documents before embedding call | 20-30% on embeddings |
| Monitor and alert on spend | Track tokens/hour, set budget alerts | Avoids surprise bills |
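
The search-result caching strategy above could look roughly like the following in-memory TTL cache. This is a sketch under assumptions: `TtlCache` and `cacheKey` are hypothetical names, and the key is the normalized query string rather than a hash, which keeps the example free of crypto imports. The injectable clock exists only so expiry can be tested without waiting.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Minimal in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries are evicted lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Normalize case and whitespace so trivially different spellings share one entry.
function cacheKey(query: string, focusMode: string): string {
  const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
  return `${focusMode}:${normalized}`;
}

// 15-minute TTL, matching the table above.
const searchCache = new TtlCache<string[]>(15 * 60 * 1000);
```

Because the cache lives in process memory, each backend replica keeps its own copy. With several instances behind a load balancer, a shared store such as Redis (as the table suggests for LLM responses) gives a better hit rate.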

Budget Monitoring

// src/lib/budget.ts
interface BudgetConfig {
  dailyLimitUSD: number;
  alertThresholdPercent: number;
}

class BudgetMonitor {
  private dailySpend: number = 0;
  private lastReset: Date = new Date();
  private config: BudgetConfig;

  constructor(config: BudgetConfig) {
    this.config = config;
  }

  trackUsage(tokens: number, costPerToken: number): void {
    this.resetIfNewDay();
    this.dailySpend += tokens * costPerToken;

    const percentUsed = (this.dailySpend / this.config.dailyLimitUSD) * 100;

    if (percentUsed >= this.config.alertThresholdPercent) {
      console.warn(
        `Budget alert: ${percentUsed.toFixed(1)}% of daily limit used ($${this.dailySpend.toFixed(4)} / $${this.config.dailyLimitUSD})`,
      );
    }
  }

  canProceed(): boolean {
    this.resetIfNewDay();
    return this.dailySpend < this.config.dailyLimitUSD;
  }

  private resetIfNewDay(): void {
    const now = new Date();
    if (now.toDateString() !== this.lastReset.toDateString()) {
      this.dailySpend = 0;
      this.lastReset = now;
    }
  }
}
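
`trackUsage` above expects a per-token cost, while providers usually quote prices per million tokens and charge different rates for input and output. The sketch below shows one way to bridge that gap. The type and function names are hypothetical, and any prices you pass in are your own figures, not values taken from a provider.

```typescript
// Prices in USD per one million tokens, supplied by the caller.
interface ModelPrice {
  inputPerMillionUSD: number;
  outputPerMillionUSD: number;
}

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Estimate the cost of one request from its token counts.
function estimateCostUSD(usage: TokenUsage, price: ModelPrice): number {
  const inputCost = (usage.promptTokens * price.inputPerMillionUSD) / 1e6;
  const outputCost = (usage.completionTokens * price.outputPerMillionUSD) / 1e6;
  return inputCost + outputCost;
}

// True when adding this request would push the day's spend past the limit.
function wouldExceedDailyLimit(
  spentTodayUSD: number,
  requestCostUSD: number,
  dailyLimitUSD: number,
): boolean {
  return spentTodayUSD + requestCostUSD > dailyLimitUSD;
}
```

With the `BudgetMonitor` class, the same effect can be had by calling `trackUsage` twice per request, once with the prompt tokens and once with the completion tokens, each with its price divided by one million.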

Backup and Recovery

Database Backup Script

#!/bin/bash
# backup.sh - Run daily via cron
# Abort on the first error so a failed backup does not go unnoticed
set -euo pipefail

BACKUP_DIR="/backups/perplexica"
DB_PATH="/app/data/perplexica.db"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR"

# SQLite online backup (safe for concurrent access)
sqlite3 "$DB_PATH" ".backup '$BACKUP_DIR/perplexica_$DATE.db'"

# Compress
gzip "$BACKUP_DIR/perplexica_$DATE.db"

# Retain last 30 days
find "$BACKUP_DIR" -name "*.gz" -mtime +30 -delete

echo "Backup completed: perplexica_$DATE.db.gz"

# Crontab entry (not part of backup.sh): add with crontab -e
0 2 * * * /app/scripts/backup.sh >> /var/log/perplexica-backup.log 2>&1

Production Readiness Checklist

| Category | Item | Status |
|---|---|---|
| Infrastructure | Docker containers configured with resource limits | Required |
| Infrastructure | Persistent volume for SQLite database | Required |
| Infrastructure | Reverse proxy with SSL termination | Required |
| Infrastructure | Health check endpoint configured | Required |
| Security | API keys stored securely (file permissions 600) | Required |
| Security | HTTPS enforced with valid certificate | Required |
| Security | Rate limiting configured | Required |
| Security | Security headers set | Recommended |
| Security | Authentication layer for public deployments | Recommended |
| Monitoring | Health check monitoring | Required |
| Monitoring | Log aggregation | Recommended |
| Monitoring | Performance metrics (Prometheus) | Recommended |
| Monitoring | Budget alerts for AI provider costs | Recommended |
| Reliability | Automated database backups | Required |
| Reliability | Container auto-restart policy | Required |
| Reliability | Graceful shutdown handling | Recommended |
| Performance | Response caching configured | Recommended |
| Performance | SearXNG replicas for redundancy | Recommended |
| Performance | Database WAL mode enabled | Recommended |

Summary

Deploying Perplexica in production is a matter of wrapping the Docker containers with the right operational infrastructure: a reverse proxy for SSL and load balancing, persistent storage for the database, security hardening for API keys and network access, monitoring for health and performance, and cost controls for AI provider spending. The single-command Docker deployment makes getting started easy, while the modular architecture allows you to scale each tier independently as usage grows.

Key Takeaways

  • Docker-first deployment -- Perplexica ships with Docker Compose files that handle the frontend, backend, and SearXNG in a single command.
  • Reverse proxy is essential -- Nginx or Caddy provides SSL termination, WebSocket proxying, rate limiting, and load balancing.
  • Security is layered -- file permissions for API keys, HTTPS everywhere, rate limiting, security headers, and optional authentication at the proxy layer.
  • Monitoring drives reliability -- health checks, Prometheus metrics, and budget alerts keep the system observable and costs predictable.
  • Cost-aware model routing -- sending simple queries to cheap/fast models and reserving expensive models for complex research queries keeps AI spend down; the strategy table above estimates 60-80% savings on the query reformulation step alone.
  • Backup is non-negotiable -- daily SQLite backups with 30-day retention protect against data loss.

Congratulations

You have completed the Perplexica tutorial. You now have the knowledge to install, configure, extend, and deploy an AI-powered search engine. Whether you are building a personal research assistant, a team knowledge tool, or a customer-facing search product, Perplexica's open-source architecture gives you full control over every layer of the stack.

Go back to the Tutorial Index to review any chapter.


Built with insights from the Perplexica project.

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but drawing clear boundaries between the Perplexica application, its configuration, and SearXNG so that behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about production deployment as an operating subsystem of Perplexica, with explicit contracts for inputs, state transitions, and outputs.

Use the architecture diagrams and configuration examples in this chapter as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, a production deployment of Perplexica usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for Perplexica.
  2. Input normalization: shape incoming data so the configuration layer receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through SearXNG.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • Perplexica repository (github.com). Why it matters: it is the authoritative reference for implementation details.

Suggested trace strategy:

  • search upstream code for perplexica and config to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections