Chapter 8: Production Deployment

March 2, 2026

Welcome to Chapter 8: Production Deployment. In this part of PostHog Tutorial: Open Source Product Analytics Platform, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Throughout this tutorial you built an analytics stack from the ground up: event tracking (Chapter 2), user analytics (Chapter 3), session recordings (Chapter 4), feature flags (Chapter 5), dashboards (Chapter 6), and advanced analytics (Chapter 7). All of that work is only as valuable as the infrastructure running it. A misconfigured deployment loses events, a breach exposes user data, and an unmonitored system fails silently.

This final chapter covers everything you need to run PostHog in production with confidence: choosing between cloud and self-hosted, hardening your ingestion pipeline, monitoring system health, managing costs, ensuring compliance, and planning for scale.

What You Will Learn

  • Choose the right hosting model for your organization
  • Secure event ingestion and API access
  • Monitor ingestion health, latency, and error rates
  • Manage storage, retention, and costs
  • Implement backup and disaster recovery plans
  • Comply with GDPR, CCPA, and other privacy regulations
  • Plan for traffic growth and scaling

Hosting Models

Cloud vs. Self-Hosted

flowchart TD
    Decision{"Choose hosting<br/>model"}
    Decision -->|"Fast setup,<br/>managed infra"| Cloud["PostHog Cloud"]
    Decision -->|"Full data control,<br/>custom config"| Self["Self-Hosted"]

    Cloud --> C1["US or EU region"]
    Cloud --> C2["Managed upgrades"]
    Cloud --> C3["SOC 2 compliant"]
    Cloud --> C4["Pay per event"]

    Self --> S1["Kubernetes or Docker"]
    Self --> S2["Your infrastructure"]
    Self --> S3["You manage upgrades"]
    Self --> S4["Fixed infra cost"]

    classDef decision fill:#fff3e0,stroke:#ef6c00
    classDef cloud fill:#e1f5fe,stroke:#01579b
    classDef self fill:#e8f5e8,stroke:#1b5e20

    class Decision decision
    class Cloud,C1,C2,C3,C4 cloud
    class Self,S1,S2,S3,S4 self

| Factor | PostHog Cloud | Self-Hosted |
|---|---|---|
| Setup time | Minutes | Hours to days |
| Maintenance | Managed by PostHog | Your team |
| Data residency | US or EU | Any region |
| Upgrades | Automatic | Manual |
| Cost model | Per event | Infrastructure cost |
| SOC 2 / HIPAA | Available | Your responsibility |
| Customization | Limited | Full |
| Scale limit | Unlimited (pricing tiers) | Your infrastructure |
| Support | Included (paid plans) | Community + self-service |
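
The cost-model row can be turned into a rough break-even estimate. The sketch below assumes a purely linear per-event Cloud price and a fixed monthly self-hosting cost (infrastructure plus engineering time). Real PostHog Cloud pricing is tiered and includes a free allowance, so the function name and every number you pass in are hypothetical.

```python
def breakeven_events_per_month(
    infra_cost_per_month: float,
    ops_cost_per_month: float,
    cloud_price_per_million_events: float,
) -> float:
    """Monthly event volume above which self-hosting costs less than Cloud.

    Hypothetical linear model: Cloud cost = events * price per event,
    self-hosted cost = fixed infrastructure + operations cost.
    """
    if cloud_price_per_million_events <= 0:
        raise ValueError("cloud price must be positive")
    fixed_cost = infra_cost_per_month + ops_cost_per_month
    return fixed_cost / cloud_price_per_million_events * 1_000_000


# Hypothetical inputs, not real prices
print(breakeven_events_per_month(1500, 500, 50))  # 40000000.0
```

Leaving `ops_cost_per_month` at zero understates self-hosting, because your team takes on upgrades and maintenance.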

Self-Hosted Deployment Options

Docker Compose (Development / Small Teams)

# Clone the PostHog repository
git clone https://github.com/PostHog/posthog.git
cd posthog

# Start PostHog with Docker Compose
docker compose -f docker-compose.hobby.yml up -d

# Access PostHog at http://localhost:8000
# Complete the setup wizard in the browser

Kubernetes with Helm (Production)

# Add the PostHog Helm repository
helm repo add posthog https://posthog.github.io/charts-clickhouse/
helm repo update

# Create a values file for your deployment
cat > posthog-values.yaml << 'EOF'
cloud: aws  # or gcp, azure

ingress:
  enabled: true
  hostname: posthog.yourcompany.com
  tls:
    - secretName: posthog-tls
      hosts:
        - posthog.yourcompany.com

clickhouse:
  persistence:
    size: 200Gi
  resources:
    requests:
      memory: 8Gi
      cpu: 2
    limits:
      memory: 16Gi
      cpu: 4

kafka:
  persistence:
    size: 50Gi

postgresql:
  persistence:
    size: 20Gi

redis:
  master:
    persistence:
      size: 5Gi

web:
  replicaCount: 2
  resources:
    requests:
      memory: 1Gi
      cpu: 500m

worker:
  replicaCount: 2
  resources:
    requests:
      memory: 2Gi
      cpu: 1

plugins:
  replicaCount: 2
  resources:
    requests:
      memory: 1Gi
      cpu: 500m
EOF

# Install PostHog
helm install posthog posthog/posthog \
  --namespace posthog \
  --create-namespace \
  -f posthog-values.yaml

Self-Hosted Architecture

flowchart TD
    LB["Load Balancer<br/>(nginx / ALB)"]
    LB --> Web["Web Server<br/>(Django)"]
    LB --> Capture["Capture Server<br/>(Event Ingestion)"]

    Capture --> Kafka["Kafka"]
    Kafka --> Worker["Worker<br/>(Event Processing)"]
    Worker --> CH["ClickHouse<br/>(Events Storage)"]
    Worker --> PG["PostgreSQL<br/>(Metadata)"]

    Web --> CH
    Web --> PG
    Web --> Redis["Redis<br/>(Cache / Queues)"]

    Plugins["Plugin Server"] --> Kafka
    Plugins --> CH

    S3["Object Storage<br/>(S3 / GCS)"] --> CH
    S3 --> Recordings["Session Recordings"]

    classDef lb fill:#fff3e0,stroke:#ef6c00
    classDef app fill:#e1f5fe,stroke:#01579b
    classDef data fill:#e8f5e8,stroke:#1b5e20
    classDef storage fill:#f3e5f5,stroke:#4a148c

    class LB lb
    class Web,Capture,Worker,Plugins app
    class Kafka,CH,PG,Redis data
    class S3,Recordings storage

Securing Your Deployment

Ingestion Security

Protect the event ingestion endpoint from abuse, data poisoning, and unauthorized access.

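// Reverse proxy configuration for PostHog ingestion.
// Routing SDK traffic through your own domain keeps it from being blocked
// by ad-blockers that filter requests to third-party analytics hosts.

// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async rewrites() {
    return [
      {
        source: '/ingest/:path*',
        destination: 'https://app.posthog.com/:path*'
      }
    ]
  },
  // PostHog API paths end in a trailing slash (/e/, /decide/);
  // stop Next.js from redirecting them to the slash-less form
  skipTrailingSlashRedirect: true
}

module.exports = nextConfig

// In your client code, initialize PostHog against the proxy path
import posthog from 'posthog-js'

posthog.init('YOUR_PROJECT_API_KEY', {
  api_host: '/ingest',               // requests go to your own domain
  ui_host: 'https://app.posthog.com' // keeps links into the PostHog UI working
})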
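# Django view that proxies PostHog requests through your server
import httpx
from django.http import HttpResponse, HttpResponseNotAllowed
from django.views.decorators.csrf import csrf_exempt

POSTHOG_HOST = 'https://app.posthog.com'
ALLOWED_PATHS = ['/e/', '/decide/', '/engage/', '/s/']


# SDK requests carry no CSRF token, so exempt this view
# (applying csrf_exempt to an async view requires Django 5.0+)
@csrf_exempt
async def posthog_proxy(request):
    """Forward allowlisted PostHog requests from /ingest/... to PostHog."""
    if request.method != 'POST':
        return HttpResponseNotAllowed(['POST'])

    # Strip only the leading /ingest prefix (str.removeprefix needs Python 3.9+)
    path = request.path.removeprefix('/ingest')
    if not any(path.startswith(p) for p in ALLOWED_PATHS):
        return HttpResponse(status=404)

    # Keep the query string: the SDKs send parameters such as the
    # body's compression format there
    query = request.META.get('QUERY_STRING', '')
    url = f'{POSTHOG_HOST}{path}' + (f'?{query}' if query else '')

    async with httpx.AsyncClient() as client:
        response = await client.post(
            url,
            content=request.body,
            headers={
                'Content-Type': request.content_type,
                'X-Forwarded-For': request.META.get('REMOTE_ADDR', ''),
            },
            timeout=10.0,
        )

    return HttpResponse(
        content=response.content,
        status=response.status_code,
        content_type=response.headers.get('content-type'),
    )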

API Key Management

| Key type | Scope | Use case | Rotation |
|---|---|---|---|
| Project API key | Write events for one project (public by design) | Client-side SDKs | Quarterly |
| Personal API key | Full API access for one user | Scripts, CI/CD | Monthly |
| Service account | Scoped API access | Automated pipelines | On personnel change |

// Rotate API keys safely
// 1. Create the new key in PostHog settings
// 2. Deploy the new key to all environments
// 3. Monitor for errors (the old key is still active)
// 4. After 24h with no errors or traffic on the old key, revoke it

import posthog from 'posthog-js'

// Environment-based key management
const POSTHOG_API_KEY = process.env.POSTHOG_API_KEY
if (!POSTHOG_API_KEY) {
  throw new Error('POSTHOG_API_KEY environment variable is required')
}

posthog.init(POSTHOG_API_KEY, {
  api_host: process.env.POSTHOG_HOST || 'https://app.posthog.com'
})
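
Steps 3 and 4 of the rotation procedure reduce to a simple rule: wait out the quiet period after the rollout, then revoke only if the old key has also been idle for that long. This is a minimal sketch assuming you can tell which key a request used (for example from your reverse proxy logs); the function and its inputs are hypothetical and not part of any PostHog SDK.

```python
from datetime import datetime, timedelta
from typing import Optional

QUIET_PERIOD = timedelta(hours=24)


def can_revoke_old_key(
    new_key_deployed_at: datetime,
    last_old_key_request_at: Optional[datetime],
    now: datetime,
    quiet_period: timedelta = QUIET_PERIOD,
) -> bool:
    """Return True once the old key has been idle for the full quiet period.

    last_old_key_request_at is assumed to come from your own logs
    (None if the old key has not been seen since the rollout).
    """
    if now - new_key_deployed_at < quiet_period:
        return False  # rollout too recent; slow clients may still use the old key
    if last_old_key_request_at is None:
        return True
    return now - last_old_key_request_at >= quiet_period


rollout = datetime(2026, 3, 1, 12, 0)
print(can_revoke_old_key(rollout, rollout + timedelta(hours=20), rollout + timedelta(hours=30)))  # False
```

In the example the rollout is 30 hours old, but a client used the old key only 10 hours ago, so revocation waits.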

Network Security

| Control | Implementation | Purpose |
|---|---|---|
| HTTPS everywhere | TLS 1.2+ on all endpoints | Encryption in transit |
| CORS restrictions | Limit Access-Control-Allow-Origin | Prevent unauthorized origins |
| Rate limiting | 100 req/s per IP on ingestion | Prevent abuse |
| WAF | Cloudflare / AWS WAF rules | Block malicious requests |
| IP allowlisting | Restrict API access to known IPs | Protect management endpoints |
| CSP headers | Allow PostHog domains in CSP | Enable SDK and recordings |

# Nginx configuration for PostHog reverse proxy

# limit_req_zone is only valid in the http context, so declare it outside
# the server block (e.g. at the top level of a file included from http {})
limit_req_zone $binary_remote_addr zone=posthog:10m rate=100r/s;

server {
    listen 443 ssl;
    server_name posthog.yourcompany.com;

    ssl_certificate /etc/ssl/certs/posthog.crt;
    ssl_certificate_key /etc/ssl/private/posthog.key;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Ingestion endpoint
    location /e/ {
        limit_req zone=posthog burst=200 nodelay;
        proxy_pass http://posthog-backend:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Decide endpoint (feature flags)
    location /decide/ {
        limit_req zone=posthog burst=50 nodelay;
        proxy_pass http://posthog-backend:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Session recordings (larger payloads)
    location /s/ {
        client_max_body_size 10m;
        proxy_pass http://posthog-backend:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
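
The `rate=100r/s` and `burst=200` settings interact in a way that is easy to misread. nginx documents `limit_req` as a leaky bucket: each request adds one unit of "excess", the excess drains at the configured rate, and a request is rejected once the excess would exceed the burst. The class below is a simplified, hypothetical model of that per-IP accounting. It only models the accept/reject decision, which matches the `nodelay` case used above. Without `nodelay`, nginx would delay requests within the burst instead of forwarding them at once.

```python
from typing import Optional


class LeakyBucket:
    """Simplified per-client model of nginx limit_req (rate, burst, nodelay)."""

    def __init__(self, rate_per_s: float, burst: int) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.excess = 0.0
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is None:
            # First request from this client: nothing has accumulated yet
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add this request
        excess = max(self.excess - self.rate_per_s * (now - self.last) + 1, 0.0)
        if excess > self.burst:
            return False  # nginx answers 503 by default (see limit_req_status)
        self.excess = excess
        self.last = now
        return True


bucket = LeakyBucket(rate_per_s=100, burst=200)
print(sum(bucket.allow(0.0) for _ in range(300)))  # 201: first request plus a burst of 200
print(bucket.allow(1.0))  # True: one second drains 100 units of excess
```

A client that fires 300 requests at once therefore gets 201 through and 99 rejections. After that, it can sustain roughly 100 requests per second.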

Monitoring and Observability

What to Monitor

flowchart TD
    M["Monitoring"]
    M --> I["Ingestion Health"]
    M --> P["Performance"]
    M --> S["Storage"]
    M --> A["Application"]

    I --> I1["Events/second"]
    I --> I2["Ingestion latency"]
    I --> I3["Failed events"]
    I --> I4["Queue depth (Kafka)"]

    P --> P1["API response time"]
    P --> P2["Dashboard load time"]
    P --> P3["Query execution time"]
    P --> P4["Memory usage"]

    S --> S1["ClickHouse disk usage"]
    S --> S2["PostgreSQL size"]
    S --> S3["S3 recording volume"]
    S --> S4["Kafka log size"]

    A --> A1["Error rate (5xx)"]
    A --> A2["SDK initialization failures"]
    A --> A3["Feature flag evaluation latency"]
    A --> A4["Worker queue backlog"]

    classDef main fill:#e1f5fe,stroke:#01579b
    classDef detail fill:#fff3e0,stroke:#ef6c00

    class M main
    class I,P,S,A main
    class I1,I2,I3,I4,P1,P2,P3,P4,S1,S2,S3,S4,A1,A2,A3,A4 detail

Key Metrics Dashboard

| Metric | Source | Alert threshold | Severity |
|---|---|---|---|
| Events ingested / minute | PostHog internal | < 50% of baseline | Critical |
| Ingestion latency (p95) | PostHog internal | > 30 seconds | Warning |
| 5xx error rate | Load balancer | > 1% of requests | Critical |
| ClickHouse disk usage | Infrastructure | > 80% capacity | Warning |
| Kafka consumer lag | Kafka metrics | > 100k messages | Warning |
| API response time (p95) | Load balancer | > 5 seconds | Warning |
| SDK initialization failures | Client-side error tracking | > 5% of page loads | Critical |
| Feature flag evaluation time | PostHog internal | > 500ms | Warning |
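
The thresholds in the table can be encoded directly as alert rules. The sketch below assumes you already collect these metrics into a dictionary (the metric key names are hypothetical). It takes a baseline for event volume as a separate input, because the first rule is relative rather than absolute.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class AlertRule(NamedTuple):
    metric: str
    breached: Callable[[float], bool]
    severity: str


def build_rules(events_per_minute_baseline: float) -> List[AlertRule]:
    """Alert rules mirroring the key metrics table (metric names are hypothetical)."""
    return [
        AlertRule("events_per_minute", lambda v: v < 0.5 * events_per_minute_baseline, "critical"),
        AlertRule("ingestion_latency_p95_s", lambda v: v > 30, "warning"),
        AlertRule("error_rate_5xx", lambda v: v > 0.01, "critical"),
        AlertRule("clickhouse_disk_used_ratio", lambda v: v > 0.80, "warning"),
        AlertRule("kafka_consumer_lag", lambda v: v > 100_000, "warning"),
        AlertRule("api_response_p95_s", lambda v: v > 5, "warning"),
        AlertRule("sdk_init_failure_rate", lambda v: v > 0.05, "critical"),
        AlertRule("flag_eval_ms", lambda v: v > 500, "warning"),
    ]


def evaluate(metrics: Dict[str, float], rules: List[AlertRule]) -> List[Tuple[str, str]]:
    """Return (metric, severity) for every rule whose metric is present and breached."""
    return [
        (rule.metric, rule.severity)
        for rule in rules
        if rule.metric in metrics and rule.breached(metrics[rule.metric])
    ]


rules = build_rules(events_per_minute_baseline=1_000)
print(evaluate({"events_per_minute": 400, "error_rate_5xx": 0.002, "kafka_consumer_lag": 150_000}, rules))
# [('events_per_minute', 'critical'), ('kafka_consumer_lag', 'warning')]
```

This sketch skips missing metrics instead of treating them as breaches. Because an unmonitored system fails silently, you may want a separate alert for any metric that stops reporting.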

Health Check Endpoints

// Server-side health check for PostHog availability
import { PostHog } from 'posthog-node'

const client = new PostHog('YOUR_PROJECT_API_KEY', {
  host: 'https://app.posthog.com'
})

async function checkPostHogHealth(): Promise<{
  status: 'healthy' | 'degraded' | 'down'
  details: Record<string, unknown>
}> {
  const checks: Record<string, boolean> = {}

  // Check event capture. capture() only queues the event for a background
  // flush and does not throw on network errors, so a pass here means the
  // SDK accepted the event, not that PostHog received it.
  try {
    client.capture({
      distinctId: 'health-check',
      event: '$health_check',
      properties: { timestamp: new Date().toISOString() }
    })
    checks.capture = true
  } catch {
    checks.capture = false
  }

  // Check feature flag evaluation. The SDK may resolve to undefined rather
  // than throwing when flags cannot be fetched, so treat this as a smoke test.
  try {
    await client.isFeatureEnabled('health-check-flag', 'health-check')
    checks.featureFlags = true
  } catch {
    checks.featureFlags = false
  }

  // Check API availability. The private API needs a personal API key;
  // project API keys can only send events.
  try {
    const response = await fetch(
      'https://app.posthog.com/api/projects/YOUR_PROJECT_ID/',
      { headers: { 'Authorization': 'Bearer YOUR_PERSONAL_API_KEY' } }
    )
    checks.api = response.ok
  } catch {
    checks.api = false
  }

  const allHealthy = Object.values(checks).every(v => v)
  const anyHealthy = Object.values(checks).some(v => v)

  return {
    status: allHealthy ? 'healthy' : anyHealthy ? 'degraded' : 'down',
    details: checks
  }
}
import requests
from posthog import Posthog

posthog_client = Posthog(
    'YOUR_PROJECT_API_KEY',
    host='https://app.posthog.com',
)


def check_posthog_health() -> dict:
    """Run health checks against PostHog."""
    checks = {}

    # Check event capture. capture() queues the event for a background
    # thread; delivery errors are not raised here.
    try:
        posthog_client.capture(
            distinct_id='health-check',
            event='$health_check',
        )
        checks['capture'] = True
    except Exception:
        checks['capture'] = False

    # Check feature flags (the SDK may return None instead of raising
    # when flags cannot be fetched, so treat this as a smoke test)
    try:
        posthog_client.feature_enabled(
            key='health-check-flag',
            distinct_id='health-check',
        )
        checks['feature_flags'] = True
    except Exception:
        checks['feature_flags'] = False

    # Check API (requires a personal API key, not the project API key)
    try:
        response = requests.get(
            'https://app.posthog.com/api/projects/YOUR_PROJECT_ID/',
            headers={'Authorization': 'Bearer YOUR_PERSONAL_API_KEY'},
            timeout=5,
        )
        checks['api'] = response.ok
    except Exception:
        checks['api'] = False

    all_healthy = all(checks.values())
    any_healthy = any(checks.values())

    return {
        'status': 'healthy' if all_healthy else 'degraded' if any_healthy else 'down',
        'checks': checks,
    }

Storage and Retention

Data Categories and Retention

| Data type | Default retention | Configurable | Storage impact |
|---|---|---|---|
| Events | Unlimited | Yes | High (primary cost driver) |
| Person profiles | Unlimited | Yes | Medium |
| Session recordings | 30 days | Yes | High |
| Feature flag data | Unlimited | No | Low |
| Dashboard / insight config | Unlimited | No | Low |

Retention Configuration

flowchart LR
    Events["All Events"]
    Events --> Hot["Hot Storage<br/>0-90 days<br/>SSD / fast queries"]
    Hot -->|"after 90 days"| Warm["Warm Storage<br/>90-365 days<br/>HDD / slower queries"]
    Warm -->|"after 365 days"| Cold["Cold Storage<br/>365+ days<br/>S3 / archive only"]
    Cold -->|"past retention window"| Delete["Deleted<br/>Past retention window"]

    classDef hot fill:#ffebee,stroke:#c62828
    classDef warm fill:#fff3e0,stroke:#ef6c00
    classDef cold fill:#e1f5fe,stroke:#01579b
    classDef del fill:#f5f5f5,stroke:#9e9e9e

    class Hot hot
    class Warm warm
    class Cold cold
    class Delete del
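
The tiers in the diagram amount to a mapping from event age to storage class. The function below is a hypothetical sketch of that policy, with the retention window left as a parameter. On self-hosted ClickHouse, the same idea is usually expressed declaratively with table TTL rules (`TTL ... TO VOLUME` for moves, `TTL ... DELETE` for expiry), which the database applies in the background.

```python
from typing import Optional


def storage_tier(
    age_days: int,
    hot_days: int = 90,
    warm_days: int = 365,
    retention_days: Optional[int] = None,
) -> str:
    """Map an event's age to the tier shown in the diagram (hypothetical policy)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if retention_days is not None and age_days >= retention_days:
        return "deleted"  # past the retention window
    if age_days < hot_days:
        return "hot"      # SSD, fast queries
    if age_days < warm_days:
        return "warm"     # HDD, slower queries
    return "cold"         # object storage, archive only


for age in (10, 120, 400, 800):
    print(age, storage_tier(age, retention_days=730))
# 10 hot / 120 warm / 400 cold / 800 deleted
```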

Storage Cost Estimation

| Component | Growth rate | Typical size (10k DAU) | Cost factor |
|---|---|---|---|
| ClickHouse (events) | ~1 GB / 1M events | 50-100 GB / month | Primary |
| PostgreSQL (metadata) | Slow growth | 1-5 GB | Low |
| Object storage (recordings) | ~3 MB / recording | 50-200 GB / month | Secondary |
| Kafka (queue) | Temporary | 5-20 GB | Low |
| Redis (cache) | Stable | 1-2 GB | Low |
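
The growth rates in the table translate into a quick monthly estimate. The function below multiplies daily volume by the table's rules of thumb (~1 GB per million events, ~3 MB per recording, decimal units). The daily volumes in the example are hypothetical, and the result covers only ClickHouse and object storage.

```python
def monthly_storage_gb(
    events_per_day: int,
    recordings_per_day: int,
    gb_per_million_events: float = 1.0,
    mb_per_recording: float = 3.0,
    days: int = 30,
) -> dict:
    """Estimate new ClickHouse and recording storage added per month."""
    events_gb = events_per_day / 1_000_000 * gb_per_million_events * days
    recordings_gb = recordings_per_day * mb_per_recording * days / 1000
    return {
        "clickhouse_gb": events_gb,
        "recordings_gb": recordings_gb,
        "total_gb": events_gb + recordings_gb,
    }


# Hypothetical: 10k DAU x 200 events each, 1,000 recordings per day
print(monthly_storage_gb(2_000_000, 1_000))
# {'clickhouse_gb': 60.0, 'recordings_gb': 90.0, 'total_gb': 150.0}
```

Both figures fall inside the ranges in the table. Without a retention policy, the total grows by roughly this amount every month.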

Managing Storage Costs

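import json
from datetime import datetime, timedelta, timezone

import requests


def configure_data_retention(
    project_id: str,
    personal_api_key: str,
    event_retention_days: int = 365,
    recording_retention_days: int = 30,
) -> None:
    """Report retention targets and list persons who fall outside them.

    Retention itself is set in project settings (Cloud) or with ClickHouse
    TTLs (self-hosted); this function does not change either.
    """
    print(f"Event retention target: {event_retention_days} days")
    print(f"Recording retention target: {recording_retention_days} days")

    # Find persons not seen within the retention window. This assumes your
    # persons carry a $last_seen date property; deleting them would follow
    # through the persons API, one person at a time.
    cutoff = (datetime.now(timezone.utc) - timedelta(days=event_retention_days)).date().isoformat()
    response = requests.get(
        f'https://app.posthog.com/api/projects/{project_id}/persons/',
        headers={'Authorization': f'Bearer {personal_api_key}'},
        # Property filters are sent as a JSON-encoded list
        params={'properties': json.dumps([
            {'key': '$last_seen', 'value': cutoff, 'operator': 'is_date_before'},
        ])},
        timeout=30,
    )
    response.raise_for_status()

    # Results are paginated; this counts the first page only (follow 'next' for more)
    stale_persons = response.json()
    print(f"Found {len(stale_persons['results'])} persons (first page) "
          f"not seen in the last {event_retention_days} days")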

Backup and Disaster Recovery

Backup Strategy

| Component | Method | Frequency | Retention |
|---|---|---|---|
| ClickHouse | Snapshot + incremental | Daily | 30 days |
| PostgreSQL | pg_dump + WAL archiving | Daily + continuous | 30 days |
| Object storage | Cross-region replication | Continuous | Matches source |
| Configuration | Helm values + secrets in vault | On every change | Unlimited |

Backup Scripts

#!/bin/bash
# backup-posthog.sh - run daily as a cron job
set -euo pipefail

# Compute the date once so every artifact from this run shares one name
DATE="$(date +%Y-%m-%d)"
CH_BACKUP_NAME="posthog-${DATE//-/}"   # e.g. posthog-20260302
BACKUP_DIR="/backups/posthog/${DATE}"
mkdir -p "$BACKUP_DIR"

# PostgreSQL backup (password from ~/.pgpass or PGPASSWORD)
echo "Backing up PostgreSQL..."
pg_dump -h posthog-postgresql \
  -U posthog \
  -d posthog \
  --format=custom \
  --compress=9 \
  -f "$BACKUP_DIR/postgresql.dump"

# ClickHouse backup with the clickhouse-backup tool
# (upload requires remote storage to be configured for clickhouse-backup)
echo "Backing up ClickHouse..."
clickhouse-backup create "$CH_BACKUP_NAME"
clickhouse-backup upload "$CH_BACKUP_NAME"

# Helm values backup
echo "Backing up Helm configuration..."
helm get values posthog -n posthog > "$BACKUP_DIR/helm-values.yaml"

# Upload to S3
aws s3 sync "$BACKUP_DIR" "s3://your-backup-bucket/posthog/${DATE}/"

echo "Backup complete: $BACKUP_DIR"
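
The script above creates a new dated directory on every run but never removes old ones. The 30-day retention from the backup strategy table therefore has to be enforced separately. This is a minimal sketch of the selection step, assuming the `/backups/posthog/YYYY-MM-DD` layout the script uses. The helper name is hypothetical. Deleting the returned directories, locally and in the S3 bucket, is left to you, and for S3 a bucket lifecycle rule is a common alternative.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_backups(dir_names: Iterable[str], today: date, retention_days: int = 30) -> List[str]:
    """Return backup directory names (YYYY-MM-DD) older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in dir_names:
        try:
            backup_date = date.fromisoformat(name)
        except ValueError:
            continue  # skip anything that is not a dated backup directory
        if backup_date < cutoff:
            expired.append(name)
    return sorted(expired)


print(expired_backups(["2026-01-15", "2026-02-01", "2026-02-20", "notes"], date(2026, 3, 2)))
# ['2026-01-15']
```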

Disaster Recovery Plan

| Scenario | RPO | RTO | Procedure |
|---|---|---|---|
| Single pod failure | 0 | < 5 min | Kubernetes auto-restarts pod |
| Node failure | 0 | < 15 min | Kubernetes reschedules to another node |
| AZ failure | < 1 hour | < 1 hour | Multi-AZ ClickHouse replicas take over |
| Region failure | < 24 hours | < 4 hours | Restore from cross-region backup |
| Data corruption | < 24 hours | < 2 hours | Restore ClickHouse from snapshot |
| Complete loss | < 24 hours | < 8 hours | Full restore from backup to new cluster |
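
RPO targets should be checked against the backup schedule, not just the backup frequency. Assume each backup captures a consistent snapshot at its start time. The worst case is then a failure just before the next backup finishes, which loses one full interval plus the backup's own duration. The helper names below are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Data lost if failure strikes just before the next backup completes.

    Assumes each backup is a consistent snapshot taken when it starts, so the
    newest usable copy is one interval plus one backup duration old.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


target = timedelta(hours=24)
print(meets_rpo(timedelta(hours=24), timedelta(hours=2), target))  # False: up to 26h lost
print(meets_rpo(timedelta(hours=12), timedelta(hours=2), target))  # True: up to 14h lost
```

With daily ClickHouse snapshots, a strict 24-hour RPO is therefore not guaranteed. For PostgreSQL, continuous WAL archiving is what gives a far tighter RPO than the daily dumps alone.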

Testing Disaster Recovery

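# Restore the newest PostgreSQL dump into a scratch database
# (backup directories are named YYYY-MM-DD, so a lexical sort is chronological)
LATEST_DIR="$(ls -d /backups/posthog/*/ | sort | tail -n 1)"
dropdb -h localhost -U posthog --if-exists posthog_test
createdb -h localhost -U posthog posthog_test
pg_restore -h localhost \
  -U posthog \
  -d posthog_test \
  --no-owner \
  "${LATEST_DIR}postgresql.dump"

# Verify data integrity (one query, so psql prints every count)
psql -h localhost -U posthog -d posthog_test -c "
  SELECT
    (SELECT count(*) FROM posthog_dashboard)   AS dashboards,
    (SELECT count(*) FROM posthog_featureflag) AS feature_flags,
    (SELECT count(*) FROM posthog_insight)     AS insights;
"

# Test the ClickHouse restore on a separate test server, never on production:
# restoring replaces the data in the matching tables.
# Flags go before the backup name; this restores today's backup.
clickhouse-backup restore --tables='*.events' "posthog-$(date +%Y%m%d)"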

Compliance and Privacy

GDPR Compliance

flowchart TD
    R["User Rights"]
    R --> A["Right to Access"]
    R --> D["Right to Deletion"]
    R --> P["Right to Portability"]
    R --> O["Right to Object"]

    A --> A1["Export user data<br/>via API"]
    D --> D1["Delete user data<br/>via API"]
    P --> P1["Export in machine-<br/>readable format"]
    O --> O1["Opt-out of tracking<br/>via SDK"]

    classDef right fill:#e1f5fe,stroke:#01579b
    classDef impl fill:#e8f5e8,stroke:#1b5e20

    class R right
    class A,D,P,O right
    class A1,D1,P1,O1 impl

Implementing User Data Requests

// Handle GDPR data subject requests.
// The private API requires a personal API key; project API keys can only send events.
const POSTHOG_API = 'https://app.posthog.com/api/projects/YOUR_PROJECT_ID'
const AUTH_HEADERS = { 'Authorization': 'Bearer YOUR_PERSONAL_API_KEY' }

// Right to Access: export user data
async function exportUserData(userId: string): Promise<object> {
  const distinctId = encodeURIComponent(userId)

  // Get person data
  const personResponse = await fetch(
    `${POSTHOG_API}/persons/?distinct_id=${distinctId}`,
    { headers: AUTH_HEADERS }
  )
  if (!personResponse.ok) throw new Error(`Person lookup failed: ${personResponse.status}`)
  const person = await personResponse.json()

  // Get user events (first page only; follow `next` in the response to export everything)
  const eventsResponse = await fetch(
    `${POSTHOG_API}/events/?distinct_id=${distinctId}&limit=10000`,
    { headers: AUTH_HEADERS }
  )
  if (!eventsResponse.ok) throw new Error(`Event export failed: ${eventsResponse.status}`)
  const events = await eventsResponse.json()

  return {
    person: person.results[0] ?? null,
    events: events.results,
    exported_at: new Date().toISOString()
  }
}

// Right to Deletion: delete user data
async function deleteUserData(userId: string): Promise<void> {
  // Find the person
  const personResponse = await fetch(
    `${POSTHOG_API}/persons/?distinct_id=${encodeURIComponent(userId)}`,
    { headers: AUTH_HEADERS }
  )
  if (!personResponse.ok) throw new Error(`Person lookup failed: ${personResponse.status}`)
  const person = await personResponse.json()

  if (person.results.length === 0) {
    console.log('Person not found')
    return
  }

  // Delete the person; delete_events=true also queues deletion of their events
  const deleteResponse = await fetch(
    `${POSTHOG_API}/persons/${person.results[0].id}/?delete_events=true`,
    { method: 'DELETE', headers: AUTH_HEADERS }
  )
  if (!deleteResponse.ok) throw new Error(`Deletion failed: ${deleteResponse.status}`)

  console.log(`Deleted data for user: ${userId}`)
}
import requests


def handle_data_request(
    project_id: str,
    personal_api_key: str,
    user_id: str,
    request_type: str,  # 'access' or 'deletion'
) -> dict:
    """Handle GDPR data subject requests (access or deletion)."""
    if request_type not in ('access', 'deletion'):
        raise ValueError(f'Unsupported request type: {request_type}')

    headers = {'Authorization': f'Bearer {personal_api_key}'}
    base_url = f'https://app.posthog.com/api/projects/{project_id}'

    # Find the person
    person_response = requests.get(
        f'{base_url}/persons/',
        headers=headers,
        params={'distinct_id': user_id},
        timeout=30,
    )
    person_response.raise_for_status()
    persons = person_response.json()

    if not persons['results']:
        return {'status': 'not_found', 'user_id': user_id}

    person = persons['results'][0]

    if request_type == 'access':
        # Export the person's events (first page only; follow 'next' for more)
        events_response = requests.get(
            f'{base_url}/events/',
            headers=headers,
            params={'distinct_id': user_id, 'limit': 10000},
            timeout=30,
        )
        events_response.raise_for_status()
        return {
            'status': 'exported',
            'person': person,
            'events': events_response.json()['results'],
        }

    # Deletion: delete_events=true also queues deletion of the person's events
    delete_response = requests.delete(
        f'{base_url}/persons/{person["id"]}/',
        headers=headers,
        params={'delete_events': 'true'},
        timeout=30,
    )
    delete_response.raise_for_status()
    return {'status': 'deleted', 'user_id': user_id}

Privacy Compliance Checklist

| Requirement | Cloud | Self-Hosted | Implementation |
|---|---|---|---|
| Data encryption at rest | Included | Configure | Enable disk encryption |
| Data encryption in transit | Included | Configure | TLS 1.2+ on all endpoints |
| Data residency (EU/US) | Choose region | Your choice | Deploy in correct region |
| User consent | SDK config | SDK config | Consent banner + opt_in_capturing |
| Data deletion | API | API + DB | Implement deletion endpoint |
| Data portability | API | API | Implement export endpoint |
| Access logging | Included | Configure | Enable audit logs |
| Data minimization | SDK config | SDK config | property_denylist, masking |
| Retention policies | Project settings | ClickHouse TTL | Configure per data type |
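
For data minimization on server-side events, you can apply the same idea as posthog-js's `property_denylist` before calling `capture`. The helper below is hypothetical, not an SDK function. It drops denylisted keys, including inside nested dictionaries, and returns a new dictionary without modifying the input.

```python
from typing import Any, Dict, Iterable


def scrub_properties(properties: Dict[str, Any], denylist: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of properties with denylisted keys removed at any depth."""
    deny = set(denylist)
    cleaned: Dict[str, Any] = {}
    for key, value in properties.items():
        if key in deny:
            continue
        if isinstance(value, dict):
            value = scrub_properties(value, deny)
        cleaned[key] = value
    return cleaned


event_properties = {
    "plan": "pro",
    "email": "jane@example.com",
    "address": {"city": "Oslo", "street": "Example Street 1"},
}
print(scrub_properties(event_properties, ["email", "street"]))
# {'plan': 'pro', 'address': {'city': 'Oslo'}}
```

Pass the cleaned dictionary as `properties` to your server-side `capture` call.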

Scaling for Growth

Scaling Dimensions

| Dimension | Growth signal | Scaling action |
|---|---|---|
| Event volume | > 10M events/day | Add ClickHouse shards |
| Concurrent users | > 50 analysts | Add web replicas |
| Recording volume | > 10k recordings/day | Increase object storage; add workers |
| Query complexity | Dashboard load > 10s | Add ClickHouse replicas; optimize queries |
| Feature flag evaluations | > 100k evals/minute | Enable local evaluation; add decide replicas |
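
For the event-volume row, a rough shard count comes from dividing daily volume by what one shard handles comfortably. Multiplying by the replicas per shard shown in the scaling architecture below (Replica A + B) then gives the node count. The per-shard capacity is a hypothetical input you would measure on your own hardware. The example reuses the table's 10M events/day signal only as a sample value.

```python
import math
from typing import Tuple


def clickhouse_topology(
    events_per_day: int,
    events_per_shard_per_day: int,
    replicas_per_shard: int = 2,
) -> Tuple[int, int]:
    """Return (shards, total ClickHouse nodes) for a target daily event volume."""
    if events_per_shard_per_day <= 0 or replicas_per_shard < 1:
        raise ValueError("capacity and replica count must be positive")
    shards = max(1, math.ceil(events_per_day / events_per_shard_per_day))
    return shards, shards * replicas_per_shard


# Hypothetical: 25M events/day, each shard sized for ~10M events/day
print(clickhouse_topology(25_000_000, 10_000_000))  # (3, 6)
```

Replicas add read capacity and fault tolerance but not write capacity, which is why event volume is addressed with shards.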

Scaling Architecture

flowchart TD
    subgraph "Ingestion Tier"
        LB1["Load Balancer"]
        C1["Capture 1"]
        C2["Capture 2"]
        C3["Capture N"]
    end

    subgraph "Processing Tier"
        K["Kafka Cluster<br/>(3+ brokers)"]
        W1["Worker 1"]
        W2["Worker 2"]
        W3["Worker N"]
    end

    subgraph "Storage Tier"
        CH1["ClickHouse Shard 1<br/>(Replica A + B)"]
        CH2["ClickHouse Shard 2<br/>(Replica A + B)"]
        PG["PostgreSQL<br/>(Primary + Replica)"]
        S3["Object Storage"]
    end

    subgraph "Query Tier"
        Web1["Web 1"]
        Web2["Web 2"]
        Web3["Web N"]
    end

    LB1 --> C1 & C2 & C3
    C1 & C2 & C3 --> K
    K --> W1 & W2 & W3
    W1 & W2 & W3 --> CH1 & CH2
    W1 & W2 & W3 --> S3
    Web1 & Web2 & Web3 --> CH1 & CH2 & PG

    classDef ingest fill:#e1f5fe,stroke:#01579b
    classDef process fill:#fff3e0,stroke:#ef6c00
    classDef store fill:#e8f5e8,stroke:#1b5e20
    classDef query fill:#f3e5f5,stroke:#4a148c

    class LB1,C1,C2,C3 ingest
    class K,W1,W2,W3 process
    class CH1,CH2,PG,S3 store
    class Web1,Web2,Web3 query

Performance Tuning

# ClickHouse tuning for high-volume PostHog
# Query settings profiles belong in the users config
# (users.xml, or users.yaml, since ClickHouse also reads YAML)
profiles:
  default:
    max_memory_usage: 10000000000           # 10 GB per query
    max_execution_time: 60                   # 60-second query timeout
    max_threads: 8                           # parallel query threads
    max_insert_block_size: 1048576           # rows per insert block
    merge_tree_min_rows_for_concurrent_read: 20000

# Kafka consumer tuning
# These mirror the standard consumer properties (max.poll.records, etc.);
# the exact key names depend on your Kafka client and Helm chart
kafka:
  consumer:
    max_poll_records: 500
    max_poll_interval_ms: 300000
    session_timeout_ms: 30000
    fetch_min_bytes: 1048576                 # 1 MB minimum fetch
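
Consumer lag, the metric behind the 100k-message alert, is the sum over partitions of the log-end offset minus the consumer group's committed offset. Kafka's `kafka-consumer-groups.sh --describe` reports it per partition, so this sketch only shows the arithmetic. The offset dictionaries are hypothetical inputs. Treating a partition with no committed offset as unread from offset 0 is a conservative assumption.

```python
from typing import Dict

LAG_ALERT_THRESHOLD = 100_000  # from the key metrics table


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Total messages not yet consumed, summed over partitions."""
    total = 0
    for partition, end in end_offsets.items():
        # Assumption: no committed offset means the group has read nothing yet
        total += max(end - committed.get(partition, 0), 0)
    return total


lag = consumer_lag({0: 1_000, 1: 500, 2: 300}, {0: 900, 1: 500})
print(lag, lag > LAG_ALERT_THRESHOLD)  # 400 False
```

Lag that keeps rising while throughput stays flat usually means the workers cannot keep up. That matches the "High ingestion latency" row in the troubleshooting table.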

Go-Live Checklist

Pre-Launch

  • SDK initialized correctly in all environments (dev, staging, production)
  • Event taxonomy documented and validated with Live Events view
  • User identification working for authenticated and anonymous users
  • Session recordings enabled with appropriate masking
  • Feature flags tested end-to-end (client and server)
  • Privacy controls configured (consent, masking, data residency)

Infrastructure

  • HTTPS/TLS enabled on all endpoints
  • API keys rotated and stored in secrets manager
  • Rate limiting configured on ingestion endpoints
  • CORS restricted to your domains
  • WAF rules deployed (if applicable)
  • Reverse proxy configured (optional, for ad-blocker bypass)

Monitoring

  • Ingestion health dashboard created
  • Error rate alerts configured
  • Storage usage alerts configured
  • SDK initialization errors tracked
  • Feature flag evaluation latency monitored

Compliance

  • Privacy policy updated to include analytics and recordings
  • Cookie consent banner implemented
  • Data deletion endpoint implemented and tested
  • Data export endpoint implemented and tested
  • Retention policies configured for all data types
  • Access controls set up (RBAC for team members)

Backup and Recovery

  • Database backups scheduled and tested
  • Object storage replication enabled
  • Configuration backed up (Helm values, secrets)
  • Restore procedure documented and tested
  • RPO/RTO targets documented

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| Events not appearing | SDK misconfigured or network issue | Check API key, host, and browser console |
| High ingestion latency | Kafka consumer lag or ClickHouse overload | Scale workers; increase ClickHouse resources |
| Dashboard slow to load | Large date ranges or too many insights | Reduce date range; split dashboards |
| Recordings not playing | CSP blocking or SDK version mismatch | Update CSP; upgrade SDK |
| Feature flags returning default | Flags not loaded or wrong distinct_id | Check onFeatureFlags callback; verify ID |
| Out of disk space | ClickHouse or recordings consuming storage | Configure retention; increase disk |
| 5xx errors on ingestion | Worker crashes or OOM | Increase memory limits; check error logs |
| Slow feature flag evaluation | Network latency to PostHog | Enable local evaluation; cache flag values |

Summary

Running PostHog in production requires attention to security, monitoring, storage, compliance, and scalability. PostHog Cloud takes infrastructure operations off your hands. Whichever model you choose, you remain responsible for protecting user data, monitoring system health, planning for growth, and complying with privacy regulations. The checklists and configurations in this chapter give you a concrete starting point for a production-grade deployment.

Key Takeaways

  1. Choose the right hosting model -- PostHog Cloud for speed and simplicity; self-hosted for maximum data control and customization.
  2. Secure the ingestion pipeline -- use HTTPS, rate limiting, CORS, and a reverse proxy to protect your analytics endpoints.
  3. Monitor proactively -- track ingestion health, error rates, and storage usage. Set alerts before problems become outages.
  4. Plan retention and costs -- not all data needs to live forever. Set retention policies by data type and monitor storage growth.
  5. Compliance is continuous -- implement data deletion, export, and consent mechanisms from day one. Audit regularly.

Next Steps

Congratulations -- you have completed the PostHog tutorial. You now have the knowledge to build a complete, production-ready product analytics platform. Here is how to continue your journey:


Practice what you've learned:

  1. Deploy PostHog in a staging environment and run through the go-live checklist
  2. Set up monitoring dashboards for ingestion health and error rates
  3. Implement a data deletion endpoint for GDPR compliance
  4. Configure backup and test a restore procedure

Built with insights from the PostHog project.

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code. It is drawing clear boundaries between ingestion, storage, monitoring, and compliance, so that behavior stays predictable as the deployment grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 8: Production Deployment as an operating subsystem inside PostHog Tutorial: Open Source Product Analytics Platform, with explicit contracts for inputs, state transitions, and outputs.

Use the go-live checklist and the configuration examples in this chapter as your reference when adapting these patterns to your own deployment.

How it Works Under the Hood

Under the hood, a production PostHog deployment follows a repeatable control path:

  1. Context bootstrap: load runtime configuration (API keys, hosts, Helm values) and bring up the infrastructure prerequisites.
  2. Input normalization: validate incoming events at the proxy and capture endpoint so downstream consumers receive stable contracts.
  3. Core execution: move events through Kafka and the workers into ClickHouse, with metadata stored in PostgreSQL.
  4. Policy and safety checks: enforce rate limits, API key scopes, and failure boundaries.
  5. Output composition: serve dashboards, insights, feature flag decisions, and API responses to downstream consumers.
  6. Operational telemetry: emit the logs and metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • PostHog repository (github.com/PostHog/posthog). Why it matters: it is the authoritative reference for the services and configuration described in this chapter.

Suggested trace strategy:

  • search the upstream code for the services named in this chapter (capture, workers, plugin server) to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections