Architecture

May 17, 2026

Fleet is a thin, stateful wrapper around an OpenTelemetry Collector distribution. Agents push health scores via OTLP; a processor computes a per-cluster threshold from the rolling score map; a middleware injects that threshold into the response headers of every OTLP push.

                +-----------------+
 agent --OTLP-->|  ingero-fleet   |--piggyback response-->  agent
                |                 |
                |  threshold store|---GET /api/v1/threshold  (fallback)
                +-----------------+

The state that drives threshold computation is entirely in-process. No database, no external coordination, no cross-replica synchronization.

Multi-replica behavior

What happens today

Each Fleet replica maintains its own score map. An agent push reaches one replica (selected by DNS or the service mesh), and that replica's map is the only one that ever sees the score. The threshold the replica returns is therefore derived from whatever subset of agents happened to be routed to it.

Observed during the Day 2 EKS run (fleet/e2e-validation-report.md test K9): with a headless Service, agents tend to stick to one replica IP for their process lifetime. Go's standard resolver does not cache lookups in-process, so the stickiness most likely comes from the HTTP client reusing its keep-alive connection rather than from DNS caching. Small clusters then end up with replicas that see fewer than statistical_min active agents. Those replicas return {"threshold":0,"quorum_met":false} while others serve a valid threshold, so agents get inconsistent answers depending on which replica their request hit.

Recommended: single replica

Run Fleet with replicaCount: 1 (the chart default) and scale vertically for larger clusters. A single g4dn.xlarge-class node has been measured to carry 100+ pushing agents at 5s intervals with p99 handler latency under 20 ms.

Why per-replica computation can drift

Each replica computes its threshold from a sample (its own subset of agents), not the full population. For this to converge to the "right" answer:

Sample must be large enough to be statistically meaningful (rule of thumb: ≥30 nodes per replica).
Sample must be representative (no systematic bias in which agents land on which replica).
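Agents must redial occasionally so subsets shift over time. Go's http.DefaultTransport closes connections that stay idle longer than IdleConnTimeout (90s by default), so an agent that pushes more often than that keeps its connection and will not redial on its own.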

When these hold, replicas converge to similar thresholds. When they don't (small clusters, sticky load balancers, very short-lived agents), replicas drift and agents see inconsistent thresholds depending on which replica their request hit. This is why the chart defaults to replicaCount: 1 and why the multi-replica guidance below is structured around consistent-hash routing.

If you must run multiple replicas:

Put an L7 load balancer (Envoy, nginx, service mesh) in front with consistent-hash on the cluster_id query parameter. Every agent from one cluster will then land on the same replica.
Size statistical_min for the per-replica visible node count, not the cluster-wide count. For example, statistical_min=10 with 20 agents split evenly across 2 replicas is a degraded configuration: each replica sees 10 agents and reports healthy, but neither is aware of the full 20.
Alert on sum_over_replicas(ingero_fleet_active_nodes) < expected_total_nodes to catch replica starvation.
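To make the routing property concrete, here is a minimal sketch of mapping a cluster_id to a replica. It uses rendezvous (highest-random-weight) hashing as a simple stand-in; Envoy, nginx, and service meshes ship their own consistent-hash implementations, and this is not what any of them run internally. The function name pickReplica and the replica names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickReplica returns the replica with the highest hash(replica, clusterID)
// score. The same cluster_id always maps to the same replica while the
// replica set is unchanged, and removing a different replica does not move it.
// It returns "" when replicas is empty.
func pickReplica(clusterID string, replicas []string) string {
	var best string
	var bestScore uint64
	for i, r := range replicas {
		h := fnv.New64a()
		h.Write([]byte(r))
		h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
		h.Write([]byte(clusterID))
		if score := h.Sum64(); i == 0 || score > bestScore {
			best, bestScore = r, score
		}
	}
	return best
}

func main() {
	replicas := []string{"fleet-0", "fleet-1", "fleet-2"}
	for _, id := range []string{"prod-us", "prod-eu", "staging"} {
		fmt.Printf("cluster_id=%s -> %s\n", id, pickReplica(id, replicas))
	}
}
```

The point of hashing on cluster_id rather than on the client address is that every agent in a cluster contributes to the same score map, so the replica's sample is the whole cluster.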

Future direction

Native consistent-hash routing inside the extension so operators do not need an extra LB. Out of scope for this release.

Push path

Agent emits OTLP/HTTP POST to /v1/metrics?cluster_id=<id> (the cluster_id query parameter lets the extension match the push to its cluster at the HTTP layer; without it the middleware cannot tell which cluster the response header should reference).
ingeroprocessor folds the score into the in-memory map and recomputes the threshold.
The ingero_threshold extension (wired as an OTLP receiver middleware) writes ingero-threshold and ingero-quorum-met response headers on the reply.
The agent reads those headers from its own client response and seeds its threshold cache. No extra round trip.

The cluster's agents never need to call /api/v1/threshold during steady-state operation — that endpoint is a fallback for cold starts and for operators curl'ing from outside the fleet.

Observability

Fleet exposes zPages at /debug/servicez and OTel Collector's standard internal metrics at /metrics. Useful gauges:

ingero_fleet_active_nodes{cluster_id="..."} — per-cluster node count currently alive (TTL not expired).
ingero_fleet_quorum_met{cluster_id="..."} — 1 when active_nodes >= statistical_min, else 0.
ingero_fleet_threshold{cluster_id="..."} — last computed threshold.

Alert philosophy: do not alert on threshold == 0, which is normal during cold start. Do alert on quorum_met == 0 persisting beyond a few TTL cycles, and on active_nodes dropping below a per-cluster expected value.