Architecture
May 17, 2026 · View on GitHub
Fleet is a thin, stateful wrapper around an OpenTelemetry Collector distribution. Agents push health scores via OTLP; a processor computes a per-cluster threshold from the rolling score map; a middleware injects that threshold into the response headers of every OTLP push.
+-----------------+
agent --OTLP-->| ingero-fleet |--piggyback response--> agent
| |
| threshold store|---GET /api/v1/threshold (fallback)
+-----------------+
The state that drives threshold computation is entirely in-process. No database, no external coordination, no cross-replica synchronization.
Multi-replica behavior
What happens today
Each Fleet replica maintains its own score map. An agent push reaches one replica (selected by DNS or the service mesh), and that replica's map is the only one that ever sees the score. The threshold the replica returns is therefore derived from whatever subset of agents happened to hash to it.
Observed during the Day 2 EKS run (fleet/e2e-validation-report.md
test K9): with a headless Service and Go's default DNS-cached resolver,
agents tend to stick to one replica IP for their process lifetime, and
small clusters end up with replicas that have fewer than statistical_min
active agents. Those replicas return {"threshold":0,"quorum_met":false}
while others serve a valid threshold — agents get inconsistent answers
depending on which replica their request hit.
Recommended: single replica
Run Fleet with replicaCount: 1 (the chart default). Vertical scale
is the path to larger clusters. A single g4dn.xlarge-class node has
been measured to carry 100+ pushing agents at 5s intervals with p99
handler latency under 20 ms.
Why per-replica computation can drift
Each replica computes its threshold from a sample (its own subset of agents), not the full population. For this to converge to the "right" answer:
- Sample must be large enough to be statistically meaningful (rule of thumb: ≥30 nodes per replica).
- Sample must be representative (no systematic bias in which agents land on which replica).
- Agents must redial occasionally so subsets shift over time (Go's
default HTTP transport with
IdleConnTimeout: 30shandles this in practice).
When these hold, replicas converge to similar thresholds. When they
don't (small clusters, sticky load balancers, very short-lived
agents), replicas drift and agents see inconsistent thresholds
depending on which replica their request hit. This is why the chart
defaults to replicaCount: 1 and why the multi-replica guidance below
is structured around consistent-hash routing.
If you must run multiple replicas:
- Put an L7 load balancer (Envoy, nginx, service mesh) in front with
consistent-hash on the
cluster_idquery parameter. Every agent from one cluster will then land on the same replica. - Size
statistical_minfor the per-replica visible node count, not the cluster-wide count.statistical_min=10across 20 agents but 2 replicas is a degraded configuration — one replica sees 10 and is healthy, the other sees 10 and is also healthy, but there is no cross-replica awareness of the 20. - Alert on
sum_over_replicas(ingero_fleet_active_nodes) < expected_total_nodesto catch replica starvation.
Future direction
Native consistent-hash routing inside the extension so operators do not need an extra LB. Out of scope for this release.
Push path
- Agent emits OTLP/HTTP POST to
/v1/metrics?cluster_id=<id>(thecluster_idquery parameter lets the extension match the push to its cluster at the HTTP layer; without it the middleware cannot tell which cluster the response header should reference). ingeroprocessorfolds the score into the in-memory map and recomputes the threshold.- The
ingero_thresholdextension (wired as an OTLP receiver middleware) writesingero-thresholdandingero-quorum-metresponse headers on the reply. - The agent reads those headers from its own client response and seeds its threshold cache. No extra round trip.
The cluster's agents never need to call /api/v1/threshold during
steady-state operation — that endpoint is a fallback for cold starts
and for operators curl'ing from outside the fleet.
Observability
Fleet exposes zPages at /debug/servicez and OTel Collector's standard
internal metrics at /metrics. Useful gauges:
ingero_fleet_active_nodes{cluster_id="..."}— per-cluster node count currently alive (TTL not expired).ingero_fleet_quorum_met{cluster_id="..."}— 1 whenactive_nodes >= statistical_min, else 0.ingero_fleet_threshold{cluster_id="..."}— last computed threshold.
Alert philosophy: do not alert on threshold == 0 (cold-start normal).
Do alert on quorum_met == 0 persisting beyond a few TTL cycles, and
on active_nodes dropping below a per-cluster expected value.