Telemetry Documentation
May 31, 2026
Status: This document covers currently implemented telemetry features for both Huginn Proxy and the eBPF agent.
Overview
Proxy
Huginn Proxy provides comprehensive telemetry through:
- Prometheus Metrics - 48 metrics covering connections, requests, TLS, fingerprinting, backends, active health checks, throughput, rate limiting, IP filtering, header manipulation, mTLS, config hot reload, TLS certificate hot reload, and fingerprint spoofing detection
- Health Check Endpoints - Kubernetes-ready: /health, /ready, /live, /metrics
All proxy telemetry is exposed on a separate observability server (configurable via telemetry.metrics_port).
eBPF Agent
The eBPF agent (DaemonSet) exposes the same four HTTP endpoints as the proxy for K8s compatibility, plus its own Prometheus metrics:
- Endpoints - /health, /ready, /live, /metrics (same JSON format as the proxy; /ready returns 503 when BPF map pins are missing)
- Metrics - tcp_syn_captured_total, tcp_syn_insert_failures_total, tcp_syn_malformed_total, agent_up, huginn_ebpf_agent_build_info
Configuration
Proxy
[telemetry]
metrics_port = 9090 # Port for metrics and health endpoints (default: disabled)
When metrics_port is configured, the /health, /ready, /live, and /metrics endpoints become available on that port.
eBPF Agent
The agent’s observability server is configured via environment variables:
| Variable | Required | Description |
|---|---|---|
| HUGINN_EBPF_METRICS_ADDR | Yes | Bind address (e.g. 127.0.0.1) |
| HUGINN_EBPF_METRICS_PORT | Yes | Port (e.g. 9091) |
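As a minimal sketch of how a helper script might turn these two variables into a scrape URL: the function name agent_metrics_url and its error handling are assumptions for illustration, not part of the agent.

```python
import os


def agent_metrics_url(env=os.environ):
    """Build the agent's /metrics URL from the two required variables."""
    required = ("HUGINN_EBPF_METRICS_ADDR", "HUGINN_EBPF_METRICS_PORT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Both variables are marked as required in the table above.
        raise KeyError("missing required variables: " + ", ".join(missing))
    port = int(env["HUGINN_EBPF_METRICS_PORT"])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"http://{env['HUGINN_EBPF_METRICS_ADDR']}:{port}/metrics"


example_env = {
    "HUGINN_EBPF_METRICS_ADDR": "127.0.0.1",
    "HUGINN_EBPF_METRICS_PORT": "9091",
}
print(agent_metrics_url(example_env))
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise without touching the real environment.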
Metrics Endpoint
Format: Prometheus text format (both proxy and agent)
Scraping: Compatible with Prometheus, Grafana Agent, etc.
- Proxy: http://<host>:<telemetry.metrics_port>/metrics (e.g. http://localhost:9090/metrics)
- eBPF agent: http://<HUGINN_EBPF_METRICS_ADDR>:<HUGINN_EBPF_METRICS_PORT>/metrics (e.g. http://127.0.0.1:9091/metrics)
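For ad-hoc checks without a Prometheus server, the text format can be read with a few lines of Python. The following is a simplified sketch, not a full parser: it ignores timestamps, does not unescape label values, and the HELP text in the sample is invented for illustration.

```python
import re

# name{label="value",...} value   (labels are optional)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


SAMPLE_TEXT = """\
# HELP huginn_connections_active Active connections currently open
# TYPE huginn_connections_active gauge
huginn_connections_active{protocol="h2"} 3
huginn_tls_connections_active 7
"""

parsed = parse_metrics(SAMPLE_TEXT)
for name, labels, value in parsed:
    print(name, labels, value)
```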
Example Prometheus Configuration
scrape_configs:
  - job_name: 'huginn-proxy'
    static_configs:
      - targets: [ 'localhost:9090' ]
    scrape_interval: 15s
  - job_name: 'huginn-ebpf-agent'
    static_configs:
      - targets: [ '127.0.0.1:9091' ]
    scrape_interval: 15s
Implemented Metrics
1. Throughput Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_bytes_received_total | Counter | Total bytes received from clients | protocol |
| huginn_bytes_sent_total | Counter | Total bytes sent to clients | protocol |
| huginn_backend_bytes_received_total | Counter | Total bytes received from backends | backend_address |
| huginn_backend_bytes_sent_total | Counter | Total bytes sent to backends | backend_address |
Labels:
- protocol: Connection protocol (http/1.1, h2, https)
- backend_address: Backend server address (e.g., backend-1:9000)
Example queries:
# Client throughput rate (bytes/sec received)
rate(huginn_bytes_received_total[5m])
# Client throughput rate (bytes/sec sent)
rate(huginn_bytes_sent_total[5m])
# Backend throughput rate (bytes/sec)
rate(huginn_backend_bytes_received_total[5m])
rate(huginn_backend_bytes_sent_total[5m])
# Total bandwidth usage (MB/s)
(rate(huginn_bytes_received_total[5m]) + rate(huginn_bytes_sent_total[5m])) / 1024 / 1024
# Per-backend bandwidth
sum by (backend_address) (rate(huginn_backend_bytes_received_total[5m]))
Note: Throughput metrics are based on Content-Length headers when available. Chunked transfer encoding (without
Content-Length) will not be counted.
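The practical effect of that note can be sketched as follows. This illustrates the stated behaviour only and is not the proxy's code; the function name counted_bytes and the lower-cased header dict are assumptions.

```python
def counted_bytes(headers):
    """Bytes a Content-Length-based counter would add for one message.

    `headers` is assumed to be a dict with lower-cased header names.
    """
    value = headers.get("content-length")
    if value is None:
        return 0  # chunked or unknown length: nothing is counted
    try:
        length = int(value)
    except ValueError:
        return 0  # unparsable header: treated like a missing one in this sketch
    return max(length, 0)


fixed = counted_bytes({"content-length": "1024"})
chunked = counted_bytes({"transfer-encoding": "chunked"})
print(fixed, chunked)
```

Workloads dominated by streaming or chunked responses will therefore show lower throughput in these counters than on the wire.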
2. Connection Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_connections_total | Counter | Total connections established | protocol |
| huginn_connections_active | Gauge | Active connections currently open | protocol |
| huginn_connections_rejected_total | Counter | Connections rejected due to limits | reason |
| huginn_tls_connections_active | Gauge | Active TLS connections | - |
Labels:
- protocol: Connection protocol (http/1.1, h2, https)
- reason: Rejection reason: limit_exceeded (active connections hit the configured maximum)
Example queries:
# Connection rate
rate(huginn_connections_total[5m])
# Active connections
huginn_connections_active
# Rejection rate
rate(huginn_connections_rejected_total[5m])
3. Request Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_entrypoint_requests_total | Counter | All requests arriving at the proxy, regardless of routing outcome | method, status_code, protocol |
| huginn_requests_total | Counter | Requests matched to a route and dispatched | method, status_code, protocol, route |
| huginn_requests_duration_seconds | Histogram | Duration of routed requests | method, status_code, protocol, route |
The two request counters model the same two layers as Traefik's entrypoint / router metrics:
- huginn_entrypoint_requests_total: incremented for every HTTP request the proxy receives, including those rejected before routing (IP block → 403, no matching route → 404). Use this for total load and overall status-code distribution visible to clients.
- huginn_requests_total: incremented only when a route matched. Carries the route label so you can break down traffic, latency, and error rates per route. Unrouted requests (403, 404) are not counted here.
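The two layers can be modelled in a few lines. This is a sketch of the counting rule described above, with hypothetical names (record, entrypoint, routed); it is not the proxy's implementation.

```python
from collections import Counter

entrypoint = Counter()  # stands in for huginn_entrypoint_requests_total
routed = Counter()      # stands in for huginn_requests_total


def record(method, status_code, protocol, route=None):
    """Count one request; `route` is None when no route matched."""
    entrypoint[(method, status_code, protocol)] += 1
    if route is not None:
        routed[(method, status_code, protocol, route)] += 1


record("GET", 200, "HTTP/1.1", route="/api")  # matched a route
record("GET", 404, "HTTP/1.1")                # no matching route
record("GET", 403, "HTTP/2.0")                # blocked before routing

unrouted = sum(entrypoint.values()) - sum(routed.values())
print(unrouted)
```

The difference between the two totals is the unrouted traffic, which is what the "Unrouted request rate" query computes over time.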
Labels:
- method: HTTP method (GET, POST, PUT, etc.)
- status_code: HTTP status code (200, 404, 500, etc.)
- protocol: HTTP version (HTTP/1.1, HTTP/2.0)
- route: Matched route prefix, only on huginn_requests_total (e.g., /api, /)
Example queries:
# Total request rate (all traffic arriving at the proxy)
rate(huginn_entrypoint_requests_total[5m])
# Routed request rate (matched a route)
rate(huginn_requests_total[5m])
# Unrouted request rate (404 no-match + 403 blocked)
rate(huginn_entrypoint_requests_total[5m]) - rate(huginn_requests_total[5m])
# Error rate (5xx) as seen by clients
# (aggregate both sides; an unaggregated division pairs each 5xx series with itself)
sum(rate(huginn_entrypoint_requests_total{status_code=~"5.."}[5m]))
/ sum(rate(huginn_entrypoint_requests_total[5m]))
# P95 latency (routed requests only)
histogram_quantile(0.95, rate(huginn_requests_duration_seconds_bucket[5m]))
# P99 latency
histogram_quantile(0.99, rate(huginn_requests_duration_seconds_bucket[5m]))
# Requests by route
sum by (route) (rate(huginn_requests_total[5m]))
# Latency by route (P95)
histogram_quantile(0.95,
sum by (route, le) (rate(huginn_requests_duration_seconds_bucket[5m]))
)
# Error rate by route (5xx from backends)
sum by (route) (rate(huginn_requests_total{status_code=~"5.."}[5m]))
/ sum by (route) (rate(huginn_requests_total[5m]))
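The by-route error ratio above aggregates both sides with sum by (route) before dividing. The reason is PromQL's one-to-one vector matching: a binary operator pairs series whose label sets are identical, so dividing a status-filtered selector by the unfiltered one would pair each 5xx series with itself. The sketch below mimics that matching rule with plain dicts; the rates are made-up numbers.

```python
def divide(lhs, rhs):
    """One-to-one matching: pair series whose label sets are identical."""
    # Zero denominators are skipped here; PromQL would yield NaN or Inf instead.
    return {labels: value / rhs[labels]
            for labels, value in lhs.items()
            if labels in rhs and rhs[labels] != 0}


def total(vector):
    """Equivalent of sum() without grouping labels."""
    return sum(vector.values())


# Per-second rates keyed by (method, status_code); illustrative values only.
rates = {
    ("GET", "200"): 90.0,
    ("GET", "404"): 8.0,
    ("GET", "502"): 2.0,
}
errors = {labels: v for labels, v in rates.items() if labels[1].startswith("5")}

per_series = divide(errors, rates)     # each 5xx series divided by itself
ratio = total(errors) / total(rates)   # aggregated error ratio
print(per_series, ratio)
```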
4. TLS Handshake Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_tls_handshakes_total | Counter | TLS handshakes completed | tls_version, cipher_suite |
| huginn_tls_handshake_duration_seconds | Histogram | TLS handshake duration | tls_version |
| huginn_tls_handshake_errors_total | Counter | TLS handshake errors | error_type |
| huginn_timeouts_total | Counter | Timeouts by type | timeout_type |
Labels:
- tls_version: TLS version negotiated (TLS1.2, TLS1.3)
- cipher_suite: TLS cipher suite used (e.g., TLS_AES_256_GCM_SHA384)
- error_type: Error type (handshake_timeout, invalid_certificate, protocol_error, etc.)
- timeout_type: Timeout type (tls_handshake, connection, idle)
Example queries:
# TLS handshake rate
rate(huginn_tls_handshakes_total[5m])
# TLS version distribution
sum by (tls_version) (rate(huginn_tls_handshakes_total[5m]))
# Cipher suite distribution
sum by (cipher_suite) (rate(huginn_tls_handshakes_total[5m]))
# TLS error rate
rate(huginn_tls_handshake_errors_total[5m])
# P95 handshake duration
histogram_quantile(0.95, rate(huginn_tls_handshake_duration_seconds_bucket[5m]))
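To make the P95 queries easier to interpret, here is a simplified sketch of how histogram_quantile estimates a quantile from cumulative buckets: it finds the bucket containing the target rank and interpolates linearly inside it. The bucket boundaries below are hypothetical (this document does not list the configured buckets), and the sketch omits edge cases that the real PromQL function handles.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == lower_count:
                # Rank falls in the +Inf bucket (or an empty one):
                # the best available answer is the previous upper bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    return lower_bound


# Hypothetical handshake-duration buckets in seconds.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
p50 = histogram_quantile(0.50, buckets)
print(p95, p50)
```

Because of the interpolation, the reported quantile is only as precise as the bucket layout: a P95 that lands in a wide bucket is an estimate, not a measured value.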
5. Fingerprinting Metrics
TLS Fingerprinting (JA4)
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_tls_fingerprints_extracted_total | Counter | TLS (JA4) fingerprints extracted | - |
| huginn_tls_fingerprint_extraction_duration_seconds | Histogram | TLS fingerprint extraction time | - |
| huginn_tls_fingerprint_failures_total | Counter | TLS fingerprint extraction failures | - |
HTTP/2 Fingerprinting (Akamai)
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_http2_fingerprints_extracted_total | Counter | HTTP/2 (Akamai) fingerprints extracted | - |
| huginn_http2_fingerprint_extraction_duration_seconds | Histogram | HTTP/2 fingerprint extraction time | - |
| huginn_http2_fingerprint_failures_total | Counter | HTTP/2 fingerprint failures | reason |
Labels:
- reason: Failure kind: extraction_failed (HTTP/2 connection where the fingerprint could not be extracted, e.g. malformed frames or connection closed before SETTINGS), not_http2 (HTTP/1.1 connection; Akamai fingerprinting does not apply)
TCP SYN Fingerprinting (p0f via eBPF)
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_tcp_syn_fingerprints_total | Counter | TCP SYN fingerprint lookups by outcome (hit, miss, malformed) | reason |
| huginn_tcp_syn_fingerprint_duration_seconds | Histogram | BPF map lookup and parse duration | reason |
| huginn_tcp_syn_fingerprint_failures_total | Counter | Malformed BPF map entries (undecodable TCP options) | - |
Labels:
- reason: Lookup result: hit (fingerprint found and injected), miss (no BPF map entry: keep-alive reuse, IPv6 peer, or stale entry), malformed (entry present but TCP options undecodable)
Note: TCP SYN fingerprinting requires the eBPF agent to be running and pinning BPF maps. The proxy reads from those maps; this metric covers the proxy-side lookup, not the agent-side capture (see eBPF Agent Metrics for capture counters).
Fingerprint Spoofing Detection
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_fingerprint_spoofing_attempts_total | Counter | Client-supplied proxy-authoritative fingerprint headers stripped (spoofing attempts) | header |
Labels:
- header: The header name the client attempted to supply (e.g. x-http2-akamai, x-tcp-p0f, x-tls-ja4)
Note: All eight proxy-authoritative fingerprint headers are stripped unconditionally on every request. This counter
is incremented only when the client actually sent one of those headers — i.e., when there was an active spoofing
attempt. A zero value means no clients have tried to forge fingerprints. The companion request header
x-fingerprint-spoofing-detected (forwarded to the backend) lists the spoofed names per-request; this metric
aggregates the same signal across requests for alerting.
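The strip-and-count behaviour described in this note can be sketched as below. Several things are assumptions made for illustration: the header set contains only the three names quoted in this document (the full set has eight), the comma-separated value of x-fingerprint-spoofing-detected is a guess at the format, and the function name is hypothetical.

```python
from collections import Counter

# Assumption: only the three header names quoted in this document.
FINGERPRINT_HEADERS = ("x-http2-akamai", "x-tcp-p0f", "x-tls-ja4")

# Stands in for huginn_fingerprint_spoofing_attempts_total{header=...}.
spoofing_attempts = Counter()


def strip_fingerprint_headers(headers):
    """Remove client-supplied fingerprint headers and count each attempt."""
    cleaned = {name.lower(): value for name, value in headers.items()}
    # Stripping is unconditional; the counter moves only if a header was present.
    spoofed = [h for h in FINGERPRINT_HEADERS if cleaned.pop(h, None) is not None]
    for name in spoofed:
        spoofing_attempts[name] += 1
    if spoofed:
        # Assumed value format: comma-separated header names.
        cleaned["x-fingerprint-spoofing-detected"] = ",".join(spoofed)
    return cleaned


forged = strip_fingerprint_headers({"X-TLS-JA4": "fake", "Accept": "*/*"})
clean = strip_fingerprint_headers({"Accept": "*/*"})
print(forged, clean, dict(spoofing_attempts))
```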
Example queries:
# Rate of spoofing attempts (any header)
rate(huginn_fingerprint_spoofing_attempts_total[5m])
# Which fingerprint headers are being targeted
sum by (header) (rate(huginn_fingerprint_spoofing_attempts_total[5m]))
Example queries (TLS and HTTP/2 extraction):
# TLS fingerprint extraction rate
rate(huginn_tls_fingerprints_extracted_total[5m])
# HTTP/2 fingerprint extraction rate
rate(huginn_http2_fingerprints_extracted_total[5m])
# HTTP/2 fingerprint failure rate (HTTP/2 connections only)
rate(huginn_http2_fingerprint_failures_total{reason="extraction_failed"}[5m])
# HTTP/1.1 connections (no HTTP/2 fingerprint applicable)
rate(huginn_http2_fingerprint_failures_total{reason="not_http2"}[5m])
# TLS fingerprint failures relative to successful extractions
rate(huginn_tls_fingerprint_failures_total[5m])
/ rate(huginn_tls_fingerprints_extracted_total[5m])
# P95 extraction duration (TLS)
histogram_quantile(0.95, rate(huginn_tls_fingerprint_extraction_duration_seconds_bucket[5m]))
6. Backend Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_backend_requests_total | Counter | Requests forwarded to backends | backend_address, status_code, protocol, route |
| huginn_backend_errors_total | Counter | Backend errors | backend_address, error_type, route |
| huginn_backend_duration_seconds | Histogram | Backend request duration | backend_address, route |
| huginn_backend_selections_total | Counter | Backend selection events | backend |
Labels:
- backend: Backend address selected at runtime (usually host:port, e.g., backend-a:9000)
- backend_address: Backend address (e.g., backend-1:9000)
- status_code: HTTP status code from backend
- error_type: Error type (connection_refused, timeout, dns_error, etc.)
- protocol: HTTP version used for backend request
- route: Route that triggered the backend request
Example queries:
# Backend request rate
rate(huginn_backend_requests_total[5m])
# Backend error rate (global)
sum(rate(huginn_backend_errors_total[5m]))
/ sum(rate(huginn_backend_requests_total[5m]))
# P95 backend latency
histogram_quantile(0.95, rate(huginn_backend_duration_seconds_bucket[5m]))
# Backend selection distribution
sum by (backend) (rate(huginn_backend_selections_total[5m]))
# Backend request distribution by route
sum by (backend_address, route) (rate(huginn_backend_requests_total[5m]))
# Backend errors by route
sum by (backend_address, route) (rate(huginn_backend_errors_total[5m]))
Active health checks are opt-in: set health_check on a [[backends]] entry (see SETTINGS.md). A probe is either a TCP connect or an HTTP GET over plain http://. The supervisor probes the backend; requests are short-circuited with 502 when the upstream is marked unhealthy (error_type = upstream_unhealthy in huginn_errors_total).
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_health_check_probes_total | Counter | Probes: TCP connect or HTTP round-trip (success and failure) | backend, result |
| huginn_health_check_gate_rejects_total | Counter | Client requests not forwarded because upstream is unhealthy (502) | backend_address |
Labels:
- backend: Upstream host:port (same as the backend key in the registry)
- result: ok (probe succeeded) or fail (timeout, refused, unexpected HTTP status, etc.)
- backend_address: Same as backend (Prometheus backend_address key for this counter)
Example queries:
# Probe success ratio per backend
sum by (backend) (rate(huginn_health_check_probes_total{result="ok"}[5m]))
/ sum by (backend) (rate(huginn_health_check_probes_total[5m]))
# 502s blocked by the health gate (per upstream)
sum by (backend_address) (rate(huginn_health_check_gate_rejects_total[5m]))
# Fail probes per backend
sum by (backend) (rate(huginn_health_check_probes_total{result="fail"}[5m]))
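The relationship between probe results and the gate counter can be sketched as follows. The names forward, healthy, and gate_rejects are hypothetical, and treating backends without a recorded state as healthy is an assumption based on health checks being opt-in.

```python
from collections import Counter

# Health state as maintained by the prober; values here are illustrative.
healthy = {"backend-1:9000": True, "backend-2:9000": False}

# Stands in for huginn_health_check_gate_rejects_total{backend_address=...}.
gate_rejects = Counter()


def forward(backend_address, send):
    """Short-circuit with 502 when the upstream is marked unhealthy."""
    # Assumption: backends with no recorded state (no health_check) pass the gate.
    if not healthy.get(backend_address, True):
        gate_rejects[backend_address] += 1
        return 502
    return send(backend_address)


def fake_send(_backend_address):
    return 200  # pretend the upstream answered successfully


blocked = forward("backend-2:9000", fake_send)
passed = forward("backend-1:9000", fake_send)
print(blocked, passed, dict(gate_rejects))
```

A rising gate-reject rate alongside failing probes points at the upstream; gate rejects without failing probes would suggest stale health state.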
7. Rate Limiting Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_rate_limit_requests_total | Counter | Total requests evaluated by rate limiter | strategy, route |
| huginn_rate_limit_allowed_total | Counter | Total requests allowed by rate limiter | strategy, route |
| huginn_rate_limit_rejected_total | Counter | Total requests rejected (429) by rate limiter | strategy, route |
Labels:
- strategy: Rate limiting strategy (ip, header, route, combined)
- route: Route prefix (e.g., /api, /)
Example queries:
# Rate limit evaluation rate
rate(huginn_rate_limit_requests_total[5m])
# Rate limit rejection rate
rate(huginn_rate_limit_rejected_total[5m])
# Rate limit rejection percentage
rate(huginn_rate_limit_rejected_total[5m])
/ rate(huginn_rate_limit_requests_total[5m]) * 100
# Rejections by strategy
sum by (strategy) (rate(huginn_rate_limit_rejected_total[5m]))
# Rejections by route
sum by (route) (rate(huginn_rate_limit_rejected_total[5m]))
# Allow rate by strategy
sum by (strategy) (rate(huginn_rate_limit_allowed_total[5m]))
8. Error Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_errors_total | Counter | Total errors by type | error_type, component |
Labels:
- error_type: Error category (config, tls, http, io, timeout)
- component: Component where error occurred (proxy, backend, fingerprint, etc.)
Example queries:
# Error rate by type
sum by (error_type) (rate(huginn_errors_total[5m]))
# Total error rate
rate(huginn_errors_total[5m])
9. IP Filtering Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_ip_filter_requests_total | Counter | Total requests evaluated by IP filter | - |
| huginn_ip_filter_allowed_total | Counter | Total requests allowed by IP filter | - |
| huginn_ip_filter_denied_total | Counter | Total requests denied by IP filter (403) | - |
Example queries:
# IP filter evaluation rate
rate(huginn_ip_filter_requests_total[5m])
# IP filter denial rate
rate(huginn_ip_filter_denied_total[5m])
# IP filter denial percentage
rate(huginn_ip_filter_denied_total[5m])
/ rate(huginn_ip_filter_requests_total[5m]) * 100
# Allow rate
rate(huginn_ip_filter_allowed_total[5m])
10. Header Manipulation Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_headers_added_total | Counter | Total headers added by header manipulation | context |
| huginn_headers_removed_total | Counter | Total headers removed by header manipulation | context |
Labels:
- context: Context where headers were manipulated (request, response)
Example queries:
# Headers added rate
rate(huginn_headers_added_total[5m])
# Headers removed rate
rate(huginn_headers_removed_total[5m])
# Headers added per context
sum by (context) (rate(huginn_headers_added_total[5m]))
# Headers removed per context
sum by (context) (rate(huginn_headers_removed_total[5m]))
11. mTLS Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_mtls_connections_total | Counter | Total connections with mTLS enabled (client certificate verified) | protocol |
Labels:
- protocol: TLS protocol version (e.g., TLSv1.2, TLSv1.3)
Example queries:
# mTLS connection rate
rate(huginn_mtls_connections_total[5m])
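# mTLS usage percentage
# (aggregate both sides; the two metrics carry different labels and would not match otherwise)
sum(rate(huginn_mtls_connections_total[5m]))
/ sum(rate(huginn_tls_handshakes_total[5m])) * 100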
# mTLS by protocol version
sum by (protocol) (rate(huginn_mtls_connections_total[5m]))
Note:
- This metric only counts successful TLS handshakes where a client certificate was present and verified.
- mTLS verification failures are captured in huginn_tls_handshake_errors_total.
- When mTLS is required but the client certificate is invalid or absent, the TLS handshake fails before this metric is recorded.
12. Config Hot Reload Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_config_reload_total | Counter | Config reload attempts | result |
| huginn_config_last_reload_timestamp_seconds | Gauge | Unix timestamp of the last successful reload | - |
| huginn_config_hash | Gauge | Semantic hash of the active DynamicConfig | - |
Labels:
- result: Outcome of the reload attempt: success or error
Notes:
- huginn_config_reload_total is incremented on every reload attempt triggered by SIGHUP or the filesystem watcher, regardless of outcome.
- huginn_config_last_reload_timestamp_seconds is only updated on success; use it together with huginn_config_reload_total{result="error"} to detect stuck reloads.
- huginn_config_hash changes whenever the deserialized DynamicConfig changes. It is unaffected by TOML formatting changes (whitespace, comments, field ordering within a table) since it hashes the parsed struct, not the raw file.
13. TLS Certificate Hot Reload Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_tls_cert_reload_total | Counter | TLS certificate load/reload attempts (includes initial load) | result |
| huginn_tls_cert_last_reload_timestamp_seconds | Gauge | Unix timestamp of the last successful TLS cert load or reload | - |
| huginn_tls_cert_hash | Gauge | FNV-1a hash of the currently active certificate chain (DER bytes) | - |
Labels:
- result: Outcome of the reload attempt: success or error
Notes:
- This metric trio is independent from the config hot reload trio (§12). Modifying the TOML config does not bump these gauges; replacing the cert/key files does. The two subsystems run in separate background tasks.
- The hash and timestamp gauges are populated immediately at boot with the initial certificate, so time() - huginn_tls_cert_last_reload_timestamp_seconds is meaningful from the first scrape (unlike §12, which only populates on the first hot reload event).
- huginn_tls_cert_hash hashes the certificate chain DER bytes. The private key is intentionally not part of the hash (cert/key are always rotated together, and hashing key material in a process-wide gauge is a security smell).
- A failed reload (cert/key parse error, validation failure, or ServerConfig rebuild error) bumps the result="error" counter but leaves the hash and timestamp gauges untouched, so dashboards continue to advertise the last good certificate that is actually serving traffic.
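For readers unfamiliar with FNV-1a, the following sketch shows the algorithm applied to a certificate chain. Two details are assumptions: the 64-bit variant (this document does not state the width) and feeding the chain as the concatenation of each certificate's DER bytes. The byte strings in the example are placeholders, not real certificates.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    digest = FNV64_OFFSET_BASIS
    for byte in data:
        digest ^= byte
        digest = (digest * FNV64_PRIME) & MASK_64
    return digest


def chain_hash(chain_der):
    """Hash a chain given as a list of DER-encoded certificates (assumed layout)."""
    return fnv1a_64(b"".join(chain_der))


old_chain = [b"leaf-v1", b"intermediate"]  # placeholder bytes
new_chain = [b"leaf-v2", b"intermediate"]
rotated = chain_hash(old_chain) != chain_hash(new_chain)
print(hex(fnv1a_64(b"a")), rotated)
```

FNV-1a is a fast non-cryptographic hash, which suits change detection but not integrity checking. Prometheus stores sample values as 64-bit floats, so a wide hash exposed as a gauge is best compared for change rather than read as an exact integer.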
Example queries:
- Detect a rotation in the last 5 minutes: changes(huginn_tls_cert_hash[5m]) > 0
- Alert on stuck reloads (rotation attempted but failed): rate(huginn_tls_cert_reload_total{result="error"}[5m]) > 0
- Cert age proxy (time since last successful load): time() - huginn_tls_cert_last_reload_timestamp_seconds
14. Build Info
| Metric | Type | Description | Labels |
|---|---|---|---|
| huginn_build_info | Gauge | Build information (always 1) | version, rust_version |
Labels:
- version: Proxy version (e.g., 0.0.1)
- rust_version: Rust version used to compile (e.g., 1.86)
Example queries:
# Get current version
huginn_build_info
# Check version across multiple instances
group by (version) (huginn_build_info)
Note: This metric always has value 1 and is used to expose version information as labels.
eBPF Agent Metrics
The eBPF agent (huginn-ebpf-agent) exposes a small set of metrics on its own observability server, in addition to the same health endpoints as the proxy.
Agent metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| tcp_syn_captured_total | Observable counter | Number of TCP SYN signatures successfully captured | - |
| tcp_syn_insert_failures_total | Observable counter | Number of TCP SYN map insert failures (e.g. LRU full) | - |
| tcp_syn_malformed_total | Observable counter | Number of malformed TCP packets (e.g. doff too short) that matched dst | - |
| agent_up | Gauge | 1 if the agent has pinned maps and is running | - |
| huginn_ebpf_agent_build_info | Gauge | Build information (always 1) | version, rust_version |
Grafana Dashboard Suggestions
Key Metrics to Monitor
Overview Panel:
- Request rate (all traffic): rate(huginn_entrypoint_requests_total[5m])
- Active connections: huginn_connections_active
- Error rate (client view): sum(rate(huginn_entrypoint_requests_total{status_code=~"5.."}[5m])) / sum(rate(huginn_entrypoint_requests_total[5m]))
- P95 latency: histogram_quantile(0.95, rate(huginn_requests_duration_seconds_bucket[5m]))
- Bandwidth (MB/s): (rate(huginn_bytes_received_total[5m]) + rate(huginn_bytes_sent_total[5m])) / 1024 / 1024
TLS Panel:
- Handshake rate: rate(huginn_tls_handshakes_total[5m])
- TLS version distribution: sum by (tls_version) (rate(huginn_tls_handshakes_total[5m]))
- Handshake duration P95: histogram_quantile(0.95, rate(huginn_tls_handshake_duration_seconds_bucket[5m]))
- TLS error rate: rate(huginn_tls_handshake_errors_total[5m])
Fingerprinting Panel:
- TLS fingerprints/sec: rate(huginn_tls_fingerprints_extracted_total[5m])
- HTTP/2 fingerprints/sec: rate(huginn_http2_fingerprints_extracted_total[5m])
- Extraction duration P95: histogram_quantile(0.95, rate(huginn_tls_fingerprint_extraction_duration_seconds_bucket[5m]))
Rate Limiting Panel:
- Rate limit evaluation rate: rate(huginn_rate_limit_requests_total[5m])
- Rate limit rejection rate: rate(huginn_rate_limit_rejected_total[5m])
- Rejection percentage: rate(huginn_rate_limit_rejected_total[5m]) / rate(huginn_rate_limit_requests_total[5m]) * 100
- Rejections by strategy: sum by (strategy) (rate(huginn_rate_limit_rejected_total[5m]))
Backend Panel:
- Backend request rate: sum by (backend_address) (rate(huginn_backend_requests_total[5m]))
- Backend error rate: sum(rate(huginn_backend_errors_total[5m])) / sum(rate(huginn_backend_requests_total[5m]))
- Backend latency P95: histogram_quantile(0.95, rate(huginn_backend_duration_seconds_bucket[5m]))
- Backend throughput: sum by (backend_address) (rate(huginn_backend_bytes_received_total[5m]) + rate(huginn_backend_bytes_sent_total[5m]))
- Health (opt-in):
  - Probe rate: sum by (backend) (rate(huginn_health_check_probes_total[5m]))
  - Fail ratio: sum by (backend) (rate(huginn_health_check_probes_total{result="fail"}[5m])) / sum by (backend) (rate(huginn_health_check_probes_total[5m]))
  - Gate 502s: sum by (backend_address) (rate(huginn_health_check_gate_rejects_total[5m]))
Hot Reload Panel (config and TLS cert grouped in one section, two parallel rows):
Config row:
- Reload success rate: rate(huginn_config_reload_total{result="success"}[1h])
- Reload error rate: rate(huginn_config_reload_total{result="error"}[1h])
- Time since last successful reload: time() - huginn_config_last_reload_timestamp_seconds
- Active config hash: huginn_config_hash
TLS certificate row (each panel aligned column-wise under its config counterpart):
- Cert reload success rate: rate(huginn_tls_cert_reload_total{result="success"}[1h])
- Cert reload error rate: rate(huginn_tls_cert_reload_total{result="error"}[1h])
- Time since last successful cert load: time() - huginn_tls_cert_last_reload_timestamp_seconds
- Active cert hash: huginn_tls_cert_hash (changes on every rotation; content hash for change detection, not a JA4-style client fingerprint)
- Detect rotation in last 5 min: changes(huginn_tls_cert_hash[5m]) > 0
eBPF Agent Panel (DaemonSet, one agent per node):
- Agent up: agent_up
- TCP SYN signatures captured: tcp_syn_captured_total
- TCP SYN insert failures: tcp_syn_insert_failures_total
- TCP SYN malformed: tcp_syn_malformed_total
- Agent version: huginn_ebpf_agent_build_info
Future Enhancements
The following telemetry features are planned but not yet implemented:
Metrics + Tracing for Pending Features
The following metrics are not implemented yet (the product may already include the related runtime behaviour):
- Backend connection pool: optional future gauges/counters (e.g. pool size, active/idle connections, reuse rate). The connection pool to upstreams already exists (see SETTINGS.md and FEATURES.md); only dedicated Prometheus series for it are still missing.
- Tracing: distributed request tracing and correlation (traceparent propagation, proxy spans, and request ID correlation) is planned but not implemented yet.
Grafana Dashboard
A pre-built Grafana dashboard covering all metrics in this document is available in
examples/grafana/dashboards/huginn-proxy.json.
To run it locally alongside the proxy:
docker compose -f examples/docker-compose.observability.yml up -d
Then open http://localhost:3000 and log in with admin / huginn. The dashboard loads automatically.
See examples/README.md for full setup instructions.