Configuration Reference
May 17, 2026
Fleet uses standard OpenTelemetry (OTel) Collector YAML configuration. All Ingero-specific settings live under the `ingero` processor and the `ingero_threshold` extension.
Processor: ingero
```yaml
processors:
  ingero:
    threshold:
      k: 2.0                    # MAD sensitivity. Lower = more sensitive.
      ema_alpha: 0.2            # Threshold EMA smoothing. Range (0, 1].
    quorum:
      statistical_min: 5        # Min active nodes for valid threshold.
      coverage_fraction: 0.80   # Coverage alert fraction.
    push_interval: 10s          # Expected agent push interval.
    ttl_multiplier: 5           # Node expiry = push_interval * ttl_multiplier.
    max_expected_nodes: 10000   # Score map cap and world_size limit.
```
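The `threshold` and `quorum` settings above can be read as a small robust-statistics pipeline. The Python sketch below shows one plausible reading under stated assumptions: lower health scores are worse (consistent with `ingero_fleet_straggler_count` counting nodes below the threshold), the raw cutoff is the median minus `k` times MAD, MAD is scaled by 1.4826 so it estimates the standard deviation of normally distributed scores, and the EMA is applied to the raw cutoff at each recomputation. `ThresholdState` and the helper functions are hypothetical names, not Fleet internals.

```python
import math
import statistics
from typing import Optional, Sequence

# Assumption: raw MAD is scaled by 1.4826 so it estimates sigma for normal data.
MAD_SCALE = 1.4826


def mad(scores: Sequence[float]) -> float:
    """Scaled median absolute deviation of the active nodes' health scores."""
    med = statistics.median(scores)
    return MAD_SCALE * statistics.median(abs(s - med) for s in scores)


def raw_threshold(scores: Sequence[float], k: float) -> float:
    """Unsmoothed cutoff: nodes scoring below median - k * MAD are stragglers."""
    return statistics.median(scores) - k * mad(scores)


def ema(prev: Optional[float], new: float, alpha: float) -> float:
    """Exponential moving average; a larger alpha tracks new values faster."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("ema_alpha must be in (0, 1]")
    return new if prev is None else alpha * new + (1.0 - alpha) * prev


def expected_flag_fraction(k: float) -> float:
    """One-sided normal tail P(Z < -k): share of a normal fleet below the cutoff."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


class ThresholdState:
    """Hypothetical per-cluster state, recomputed once per push_interval."""

    def __init__(self, k: float = 2.0, ema_alpha: float = 0.2, statistical_min: int = 5):
        self.k = k
        self.ema_alpha = ema_alpha
        self.statistical_min = statistical_min
        self.threshold: Optional[float] = None

    def update(self, active_scores: Sequence[float]) -> dict:
        # Statistical quorum: too few active nodes means no threshold is served.
        if len(active_scores) < self.statistical_min:
            return {"quorum_met": False, "threshold": None}
        cutoff = raw_threshold(active_scores, self.k)
        self.threshold = ema(self.threshold, cutoff, self.ema_alpha)
        return {"quorum_met": True, "threshold": self.threshold}
```

Under the scaled-MAD assumption, `expected_flag_fraction(2.0)` is about 0.023, in line with the ~2.3% quoted for `threshold.k` in the parameter table; with unscaled MAD the same `k` would flag a larger share of the fleet.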
Parameter Details
| Parameter | Default | Range | Description |
|---|---|---|---|
| `threshold.k` | 2.0 | 1.0 - 4.0 | MAD multiplier. For normally distributed health scores, k=1.5 flags ~6.7% of the fleet, k=2.0 flags ~2.3%, and k=3.0 flags ~0.13%. |
| `threshold.ema_alpha` | 0.2 | (0, 1] | How quickly the threshold responds to changes. Higher = more responsive. |
| `quorum.statistical_min` | 5 | 1+ | Minimum number of active nodes before Fleet serves a threshold. Below this, Fleet returns `quorum_met: false`. |
| `quorum.coverage_fraction` | 0.80 | 0 - 1.0 | Fraction of `max_expected_nodes` expected to report. When coverage falls below this fraction, Fleet sets the `ingero_fleet_coverage_low` metric to 1. Informational only; it does not block the threshold. |
| `push_interval` | 10s | 1s+ | How often agents push. Also the MAD recomputation interval. Use 10s for <1K nodes, 30s for 1K-10K, and 60s for 10K+. |
| `ttl_multiplier` | 5 | 2+ | A node is expired from the score map after `push_interval * ttl_multiplier` with no push. Default: 50s at a 10s interval. |
| `max_expected_nodes` | 10000 | 1+ | Hard cap on score map entries. Also the denominator for the coverage quorum. |
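Node expiry and coverage follow directly from `push_interval`, `ttl_multiplier`, `max_expected_nodes`, and `coverage_fraction`. The sketch below models the score map as a dictionary of last-push timestamps and encodes the `push_interval` sizing guidance from the table. `ScoreMap` and `recommended_push_interval_s` are hypothetical, and rejecting new nodes once `max_expected_nodes` is reached is an assumed policy, since this reference does not say what happens at the cap.

```python
from dataclasses import dataclass, field
from typing import Dict


def recommended_push_interval_s(expected_nodes: int) -> int:
    """Sizing guidance: 10s below 1K nodes, 30s for 1K-10K, 60s for 10K+.

    Placing exactly 10K nodes in the 60s bucket is a judgment call.
    """
    if expected_nodes < 1_000:
        return 10
    if expected_nodes < 10_000:
        return 30
    return 60


@dataclass
class ScoreMap:
    """Hypothetical model of the score map: node ID -> last push time (seconds)."""

    push_interval_s: float = 10.0
    ttl_multiplier: int = 5
    max_expected_nodes: int = 10_000
    coverage_fraction: float = 0.80
    last_push: Dict[str, float] = field(default_factory=dict)

    @property
    def ttl_s(self) -> float:
        # Node expiry = push_interval * ttl_multiplier (50s with the defaults).
        return self.push_interval_s * self.ttl_multiplier

    def record_push(self, node_id: str, now: float) -> bool:
        # Assumption: once the hard cap is reached, pushes from unknown nodes are dropped.
        if node_id not in self.last_push and len(self.last_push) >= self.max_expected_nodes:
            return False
        self.last_push[node_id] = now
        return True

    def expire(self, now: float) -> None:
        # Drop nodes that have not pushed for longer than the TTL.
        self.last_push = {n: t for n, t in self.last_push.items() if now - t <= self.ttl_s}

    def coverage_low(self) -> bool:
        # Informational only: reporting nodes below coverage_fraction * max_expected_nodes.
        return len(self.last_push) < self.coverage_fraction * self.max_expected_nodes
```

With the defaults, the coverage check trips below 8,000 reporting nodes (0.80 × 10,000), and a node disappears from the map 50 seconds after its last push.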
Extension: ingero_threshold
```yaml
extensions:
  ingero_threshold:
    agent_endpoint: 0.0.0.0:8080     # Threshold API for agents.
    admin_endpoint: 127.0.0.1:8081   # Diagnostics (planned, loopback only).
```
| Parameter | Default | Description |
|---|---|---|
| `agent_endpoint` | `0.0.0.0:8080` | Address for the agent threshold API (`GET /api/v1/threshold`). |
| `admin_endpoint` | `127.0.0.1:8081` | Address for the diagnostics endpoint (not yet implemented). Keep it bound to loopback so it is not reachable from other hosts. |
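An agent that cannot use the piggyback path polls `GET /api/v1/threshold` on `agent_endpoint`. A minimal standard-library client might look like the sketch below. Only `quorum_met` is named in this reference; the JSON encoding, the `threshold` field name, and the host in the usage comment are assumptions.

```python
import json
import urllib.request
from typing import Optional, Tuple


def parse_threshold_response(body: bytes) -> Tuple[bool, Optional[float]]:
    """Decode the threshold API body into (quorum_met, threshold).

    Assumption: JSON with a boolean "quorum_met" and a numeric "threshold";
    the threshold is ignored when quorum is not met.
    """
    doc = json.loads(body)
    quorum_met = bool(doc.get("quorum_met", False))
    value = doc.get("threshold")
    if not quorum_met or value is None:
        return quorum_met, None
    return quorum_met, float(value)


def fetch_threshold(base_url: str, timeout_s: float = 5.0) -> Tuple[bool, Optional[float]]:
    """Poll GET /api/v1/threshold on the Fleet agent_endpoint."""
    url = base_url.rstrip("/") + "/api/v1/threshold"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return parse_threshold_response(resp.read())


# Usage (hypothetical host name):
# quorum_met, threshold = fetch_threshold("http://fleet.example.internal:8080")
```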
Middleware Configuration
To piggyback the threshold on OTLP push responses, add the extension as middleware on the OTLP receiver:
```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        middlewares:
          - id: ingero_threshold
      grpc:
        endpoint: 0.0.0.0:4317
        middlewares:
          - id: ingero_threshold
```
Without this middleware, agents must fall back to polling the `GET /api/v1/threshold` endpoint.
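From the agent's side, the fallback rule can be sketched as: use a threshold piggybacked on the push response when one is present, otherwise poll the GET endpoint, at most once per `push_interval`. This reference does not say how the piggybacked value is carried in the OTLP response, so the sketch takes it as an already-extracted argument. `ThresholdSource` and its rate-limiting policy are hypothetical.

```python
import time
from typing import Callable, Optional


class ThresholdSource:
    """Hypothetical agent-side cache: piggyback first, GET polling as fallback."""

    def __init__(
        self,
        poll: Callable[[], Optional[float]],
        push_interval_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._poll = poll                    # e.g. a wrapper around GET /api/v1/threshold
        self._push_interval_s = push_interval_s
        self._clock = clock
        self._last_poll: Optional[float] = None
        self.threshold: Optional[float] = None

    def on_push_response(self, piggybacked: Optional[float]) -> Optional[float]:
        # Preferred path: the push response already carried a threshold.
        if piggybacked is not None:
            self.threshold = piggybacked
            return self.threshold
        # Fallback: poll, but no more often than once per push_interval.
        now = self._clock()
        if self._last_poll is None or now - self._last_poll >= self._push_interval_s:
            self._last_poll = now
            polled = self._poll()
            if polled is not None:
                self.threshold = polled
        return self.threshold
```

Keeping the last known threshold between polls means a missing piggyback never leaves the agent without a value once it has received one.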
Complete Example
See examples/fleet-config.yaml for a working configuration.
Trace Forwarding
Fleet can forward OTLP traces emitted by Ingero agents (detection-event spans) to an operator-chosen backend such as Tempo, Jaeger, or Datadog APM. The traces pipeline is pure passthrough: Fleet does not modify, sample, or enrich span content. Fleet's custom processors (ingero, nccl, providerlookup) operate on metrics only.
Multi-node trace aggregation works because every agent tags its spans with the `ingero.cluster.id` resource attribute. The backend filters and groups by that attribute to produce a per-cluster view of detection events across nodes.
Cross-node trace context propagation is not yet supported: each agent emits its own root span per detection event, so spans from different nodes share the same `ingero.cluster.id` but carry distinct `trace_id` values. Operators correlate them manually by timestamp and cluster ID. Cross-node parent-child linking is planned for v0.14+.
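Manual correlation by timestamp and cluster ID can be scripted once spans are exported from the backend. This sketch assumes each span has been flattened into a dict with a `resource` mapping that holds `ingero.cluster.id` and a `start_time_s` float; that record shape, the `correlate` function, and the 5-second window are illustrative assumptions, not a Fleet or backend API.

```python
from collections import defaultdict
from typing import Dict, List


def correlate(spans: List[dict], window_s: float = 5.0) -> Dict[str, List[List[dict]]]:
    """Group detection spans by ingero.cluster.id, then split each cluster's spans
    into bursts where each span starts within window_s of the previous one."""
    by_cluster: Dict[str, List[dict]] = defaultdict(list)
    for span in spans:
        by_cluster[span["resource"]["ingero.cluster.id"]].append(span)

    result: Dict[str, List[List[dict]]] = {}
    for cluster_id, items in by_cluster.items():
        items.sort(key=lambda s: s["start_time_s"])
        bursts: List[List[dict]] = []
        for span in items:
            # Chain onto the current burst if close in time to its latest span.
            if bursts and span["start_time_s"] - bursts[-1][-1]["start_time_s"] <= window_s:
                bursts[-1].append(span)
            else:
                bursts.append([span])
        result[cluster_id] = bursts
    return result
```

Each burst collects spans with distinct `trace_id` values that likely describe the same cross-node incident; the window size trades missed links against false merges.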
To enable forwarding, uncomment the `otlp` exporter block in `examples/fleet-config.yaml`, point its `endpoint` at your backend, and change the traces pipeline's `exporters` from `[debug]` to `[otlp]`.
Fleet Metrics
Fleet emits the following metrics (available via the Prometheus exporter or Collector self-monitoring):
| Metric | Type | Labels | Description |
|---|---|---|---|
| `ingero_fleet_threshold` | Gauge | `cluster_id` | Current straggler threshold |
| `ingero_fleet_median` | Gauge | `cluster_id` | Fleet health score median |
| `ingero_fleet_mad` | Gauge | `cluster_id` | Fleet MAD value |
| `ingero_fleet_active_nodes` | Gauge | `cluster_id` | Nodes actively reporting |
| `ingero_fleet_idle_nodes` | Gauge | `cluster_id` | Nodes in idle state |
| `ingero_fleet_coverage_low` | Gauge | `cluster_id` | 1 if coverage quorum not met |
| `ingero_fleet_panic_mode` | Gauge | `cluster_id` | 1 if panic mode active |
| `ingero_fleet_straggler_count` | Gauge | `cluster_id` | Nodes below threshold |