Prometheus Export

May 3, 2026 ยท View on GitHub

Ember can expose Caddy and FrankenPHP metrics in Prometheus format via an HTTP endpoint. This works alongside the TUI or in headless daemon mode.

Enabling the Metrics Endpoint

# TUI + metrics endpoint on port 9191
ember --expose :9191

# Headless mode (no TUI)
ember --expose :9191 --daemon

This starts an HTTP server with two endpoints:

  • /metrics: Prometheus text format (v0.0.4)
  • /healthz: JSON health check

Daemon Mode

The --daemon flag disables the TUI and requires --expose. Ember polls the Caddy admin API at the configured interval and serves metrics. This is ideal for sidecar deployments, CI environments, or any context where a terminal is not available.

ember --expose :9191 --daemon --addr http://caddy:2019

Logs are written to stderr. Use --log-format json for structured JSON logs suitable for log aggregation:

ember --expose :9191 --daemon --log-format json

Error Throttling

When the daemon cannot reach Caddy, it logs one error message immediately, then suppresses repeated errors for 30 seconds to avoid flooding logs. The suppressed count is included in the next log line. When connectivity is restored, a "fetch recovered" message is logged.

State Dump via SIGUSR1

Send SIGUSR1 to a running daemon to dump the full state snapshot to stderr as JSON. This is useful for debugging without interrupting the process:

kill -USR1 $(pgrep ember)

The dump includes threads, metrics, process info, and derived metrics. Not available on Windows.

TLS Certificate Reload via SIGHUP

Send SIGHUP to a running daemon to reload TLS certificates from disk without restarting:

kill -HUP $(pgrep ember)

This re-reads --ca-cert, --client-cert, and --client-key files and applies the new configuration. Useful for certificate rotation in long-running deployments. Not available on Windows.

Exported Metrics

FrankenPHP Thread Metrics

MetricTypeLabelsDescription
frankenphp_threads_totalgaugestate (busy, idle, other)Number of threads by state
frankenphp_thread_memory_bytesgaugeindexMemory usage per thread (only emitted for threads with memory > 0). Requires FrankenPHP 1.12.2+.

FrankenPHP Worker Metrics

MetricTypeLabelsDescription
frankenphp_worker_crashes_totalcounterworkerTotal worker crashes
frankenphp_worker_restarts_totalcounterworkerTotal worker restarts
frankenphp_worker_queue_depthgaugeworkerRequests waiting in queue
frankenphp_worker_requests_totalcounterworkerTotal requests processed

FrankenPHP Request Duration

MetricTypeLabelsDescription
frankenphp_request_duration_millisecondsgaugequantile (0.5, 0.9, 0.95, 0.99)Request duration percentiles

Caddy Host Metrics

MetricTypeLabelsDescription
ember_host_rpsgaugehostRequests per second
ember_host_latency_avg_millisecondsgaugehostAverage response time
ember_host_latency_millisecondsgaugehost, quantile (0.5, 0.9, 0.95, 0.99)Latency percentiles
ember_host_inflightgaugehostIn-flight requests
ember_host_status_rategaugehost, class (2xx, 3xx, 4xx, 5xx)Request rate by status class
ember_host_error_rategaugehostMiddleware error rate (handler-level errors, distinct from HTTP status codes)

Process Metrics

MetricTypeDescription
process_cpu_percentgaugeCPU usage of the monitored process
process_rss_bytesgaugeResident set size of the monitored process

Ember Self-Metrics

Ember emits its own scrape metrics so operators can observe the observer. They appear on /metrics whenever --expose is set, alongside the Caddy and FrankenPHP metrics.

MetricTypeLabelsDescription
ember_build_infogaugeversion, goversionConstant 1, exposes the build identifiers
ember_scrape_totalcounterstage (threads, metrics, process)Scrape attempts per sub-fetch (success + error)
ember_scrape_errors_totalcounterstageFailed scrape attempts per sub-fetch
ember_scrape_duration_secondsgaugestageDuration of the last scrape attempt, in seconds
ember_last_successful_scrape_timestamp_secondsgaugestageUnix timestamp of the last success per stage; 0 means never

Common alerts:

  • rate(ember_scrape_errors_total[5m]) > 0: Ember is failing to talk to Caddy.
  • time() - ember_last_successful_scrape_timestamp_seconds{stage="metrics"} > 60: no fresh metrics for over a minute.
  • ember_scrape_duration_seconds{stage="metrics"} > 1: scrape latency degraded.

Multi-instance label

When --addr is supplied more than once (only valid in --daemon and --json modes), every metric above gains an ember_instance="<name>" label. The instance name is either the explicit alias from name=url, or a slug derived from the host (web1.fr -> web1_fr). With a single instance no extra label is emitted, so existing dashboards and alerts keep working unchanged.

ember_build_info is the only metric that stays unlabelled even in multi-instance mode: there is one Ember binary regardless of how many Caddy instances it polls.

Example scrape config that promotes ember_instance to a regular Prometheus target label so PromQL filters look natural:

scrape_configs:
  - job_name: ember
    scrape_interval: 5s
    static_configs:
      - targets: ["ember:9191"]
    metric_relabel_configs:
      - source_labels: [ember_instance]
        target_label: instance
        action: replace

Custom Metric Prefix

Use --metrics-prefix to add a prefix to all metric names:

ember --expose :9191 --metrics-prefix myapp

This turns frankenphp_threads_total into myapp_frankenphp_threads_total, ember_host_rps into myapp_ember_host_rps, and so on.

The prefix must be a legal Prometheus metric name segment: letters, digits and underscores, not starting with a digit. Hyphens, dots and colons are rejected at startup so invalid names never reach a scraper.

Health Endpoint

GET /healthz returns a JSON response indicating whether Ember is collecting data successfully:

200 OK: Data is fresh:

{ "status": "ok", "last_fetch": "2026-03-16T10:00:00Z", "age_seconds": 1.2 }

503 Service Unavailable: Data is stale (older than 3x the polling interval, minimum 5 seconds). In multi-instance mode the threshold is computed per-instance using each instance's effective interval (the per-instance ,interval= suffix when set, otherwise the global --interval):

{ "status": "stale", "last_fetch": "2026-03-16T09:58:00Z", "age_seconds": 120 }

503 Service Unavailable: No data collected yet:

{ "status": "no data yet" }

In multi-instance mode the body switches to an aggregated form. The top-level status is the worst across instances (ok < stale < no data yet). The endpoint returns 200 only when every instance is ok:

{
  "status": "stale",
  "instances": [
    { "name": "web1", "addr": "https://web1.fr", "status": "ok",    "last_fetch": "2026-03-16T10:00:00Z", "age_seconds": 1.2 },
    { "name": "web2", "addr": "https://web2.fr", "status": "stale", "last_fetch": "2026-03-16T09:58:00Z", "age_seconds": 120 }
  ]
}

Tip: Use /healthz as a Kubernetes liveness probe to detect when Ember loses contact with Caddy.

Per-instance health (/healthz/<name>)

In multi-instance mode, GET /healthz/<name> returns the readiness of a single instance, with the same body shape as the single-instance /healthz:

{ "status": "ok", "last_fetch": "2026-03-16T10:00:00Z", "age_seconds": 1.2 }

The status code is 200 only when that instance is ok, 503 otherwise. The endpoint returns 404 when the instance name is unknown, when no name is provided (/healthz/), or in single-instance mode. Use it as a per-pod readiness probe when running Ember as a sidecar against several upstreams: a single bad upstream no longer fails every probe.

Authentication

Use --metrics-auth to protect the /metrics and /healthz endpoints with HTTP Basic Authentication:

ember --expose :9191 --daemon --metrics-auth admin:secret

When enabled, all requests to the metrics server must include valid credentials. Unauthenticated requests receive a 401 Unauthorized response.

Prometheus Scrape Configuration

Minimal prometheus.yml snippet:

scrape_configs:
  - job_name: ember
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9191"]

With authentication:

scrape_configs:
  - job_name: ember
    scrape_interval: 5s
    basic_auth:
      username: admin
      password: secret
    static_configs:
      - targets: ["localhost:9191"]

See Also