Configuration Reference

May 17, 2026

Fleet uses standard OpenTelemetry (OTel) Collector YAML configuration. All Ingero-specific settings live under the ingero processor and the ingero_threshold extension.

Processor: ingero

processors:
  ingero:
    threshold:
      k: 2.0              # MAD sensitivity. Lower = more sensitive.
      ema_alpha: 0.2       # Threshold EMA smoothing. Range (0, 1].
    quorum:
      statistical_min: 5   # Min active nodes for valid threshold.
      coverage_fraction: 0.80  # Coverage alert fraction.
    push_interval: 10s     # Expected agent push interval.
    ttl_multiplier: 5      # Node expiry = push_interval * ttl_multiplier.
    max_expected_nodes: 10000  # Score map cap and world_size limit.

Parameter Details

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| threshold.k | 2.0 | 1.0 - 4.0 | MAD multiplier. k=1.5 flags ~6.7% of fleet, k=2.0 flags ~2.3%, k=3.0 flags ~0.13%. |
| threshold.ema_alpha | 0.2 | (0, 1] | How fast the threshold responds to changes. Higher = more responsive. |
| quorum.statistical_min | 5 | 1+ | Minimum active nodes before Fleet serves a threshold. Below this, Fleet returns quorum_met: false. |
| quorum.coverage_fraction | 0.80 | 0 - 1.0 | Fraction of max_expected_nodes expected to report. When not met, emits ingero_fleet_coverage_low metric. Informational only - does not block threshold. |
| push_interval | 10s | 1s+ | How often agents push. Also the MAD recomputation interval. Use 10s for <1K nodes, 30s for 1K-10K, 60s for 10K+. |
| ttl_multiplier | 5 | 2+ | A node is expired from the score map after push_interval * ttl_multiplier with no push. Default: 50s at 10s interval. |
| max_expected_nodes | 10000 | 1+ | Hard cap on score map entries. Also used for coverage quorum denominator. |
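
The derived values in the table can be checked with a little arithmetic. The following is an illustrative Python sketch, not Fleet code: the function names are hypothetical, and it assumes the coverage check compares the active node count against coverage_fraction * max_expected_nodes using a simple "at least" comparison.

```python
import math


def node_ttl_seconds(push_interval_s: float, ttl_multiplier: int) -> float:
    """Seconds without a push before a node is expired from the score map."""
    return push_interval_s * ttl_multiplier


def coverage_min_nodes(max_expected_nodes: int, coverage_fraction: float) -> int:
    """Smallest active-node count that satisfies the coverage quorum.

    Assumption: the quorum is met when active >= fraction * max_expected_nodes.
    The product is rounded first to avoid floating-point noise before ceil().
    """
    return math.ceil(round(max_expected_nodes * coverage_fraction, 9))


def quorum_status(
    active_nodes: int,
    statistical_min: int = 5,
    max_expected_nodes: int = 10000,
    coverage_fraction: float = 0.80,
) -> dict:
    """Evaluate both quorums for a given number of active nodes."""
    return {
        # Below statistical_min, no threshold is served.
        "quorum_met": active_nodes >= statistical_min,
        # Informational only: does not block the threshold.
        "coverage_low": active_nodes
        < coverage_min_nodes(max_expected_nodes, coverage_fraction),
    }
```

With the defaults, a node expires after 50 seconds of silence, and the coverage quorum needs 8000 active nodes. Setting max_expected_nodes close to the real fleet size keeps the coverage signal meaningful.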

Extension: ingero_threshold

extensions:
  ingero_threshold:
    agent_endpoint: 0.0.0.0:8080    # Threshold API for agents.
    admin_endpoint: 127.0.0.1:8081  # Diagnostics (planned, loopback only).

| Parameter | Default | Description |
| --- | --- | --- |
| agent_endpoint | 0.0.0.0:8080 | Address for the agent threshold API (GET /api/v1/threshold). |
| admin_endpoint | 127.0.0.1:8081 | Address for diagnostics endpoint (not yet implemented). Bind to loopback for security. |
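
As a rough illustration of how a client might poll the agent threshold API, here is a minimal Python sketch. It rests on assumptions: the response is taken to be a JSON object, and only the quorum_met field is documented in this reference. The "threshold" field name, the base URL, and both function names are hypothetical.

```python
import json
from typing import Optional
from urllib.request import urlopen

THRESHOLD_PATH = "/api/v1/threshold"


def parse_threshold(body: str) -> Optional[float]:
    """Return the threshold if quorum is met, otherwise None.

    Assumes a JSON object. "quorum_met" comes from this reference;
    "threshold" is a hypothetical field name used for illustration.
    """
    payload = json.loads(body)
    if not payload.get("quorum_met", False):
        return None
    value = payload.get("threshold")
    return float(value) if value is not None else None


def fetch_threshold(
    base_url: str = "http://127.0.0.1:8080", timeout_s: float = 2.0
) -> Optional[float]:
    """GET the threshold from agent_endpoint and parse the response."""
    with urlopen(base_url + THRESHOLD_PATH, timeout=timeout_s) as resp:
        return parse_threshold(resp.read().decode("utf-8"))
```

Treating quorum_met: false as "no threshold available" lets the caller keep its last known value instead of acting on a threshold computed from too few nodes.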

Middleware Configuration

To enable threshold piggyback on OTLP push responses, add the extension as middleware on the OTLP receiver:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        middlewares:
          - id: ingero_threshold
      grpc:
        endpoint: 0.0.0.0:4317
        middlewares:
          - id: ingero_threshold

Without this middleware, agents must fall back to polling the GET endpoint (GET /api/v1/threshold on agent_endpoint).

Complete Example

See examples/fleet-config.yaml for a working configuration.

Trace Forwarding

Fleet can forward OTLP traces emitted by Ingero agents (detection-event spans) to an operator-chosen backend such as Tempo, Jaeger, or Datadog APM. The traces pipeline is pure passthrough: Fleet does not modify, sample, or enrich span content. Fleet's custom processors (ingero, nccl, providerlookup) operate on metrics only.

Multi-node trace aggregation works because every agent tags its spans with the ingero.cluster.id resource attribute. The backend filters and groups by that attribute to produce a per-cluster view of detection events across nodes.

Cross-node trace context propagation is not in scope today: each agent emits its own root span per detection event, so spans from different nodes share the same ingero.cluster.id value but carry distinct trace_id values. Operators correlate manually by timestamp and cluster ID. Cross-node parent-child linking is planned for v0.14+.
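
Manual correlation by cluster ID and timestamp could look something like the Python sketch below. It is illustrative only: the span dictionaries, their keys (cluster_id, start_time_s, node), and the 30-second window are hypothetical stand-ins for whatever your trace backend exports.

```python
from collections import defaultdict


def correlate(spans, window_s: float = 30.0) -> dict:
    """Group spans by (cluster ID, fixed time bucket).

    Each span is a dict with hypothetical keys "cluster_id" and
    "start_time_s" (seconds since epoch). Spans in the same bucket
    are candidates for belonging to the same fleet-wide event.
    """
    groups = defaultdict(list)
    for span in spans:
        bucket = int(span["start_time_s"] // window_s)
        groups[(span["cluster_id"], bucket)].append(span)
    return dict(groups)


example_spans = [
    {"cluster_id": "train-a", "start_time_s": 100.0, "node": "n1"},
    {"cluster_id": "train-a", "start_time_s": 110.0, "node": "n2"},
    {"cluster_id": "train-b", "start_time_s": 105.0, "node": "n9"},
]
grouped = correlate(example_spans)
```

Fixed buckets are the simplest choice, but they split events that straddle a bucket boundary. A sliding window would avoid that at the cost of more bookkeeping.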

To enable forwarding, uncomment the otlp exporter block in examples/fleet-config.yaml, point endpoint at your backend, and switch the traces pipeline's exporters from [debug] to [otlp].

Fleet Metrics

Fleet emits these metrics (available via Prometheus exporter or self-monitoring):

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| ingero_fleet_threshold | Gauge | cluster_id | Current straggler threshold |
| ingero_fleet_median | Gauge | cluster_id | Fleet health score median |
| ingero_fleet_mad | Gauge | cluster_id | Fleet MAD value |
| ingero_fleet_active_nodes | Gauge | cluster_id | Nodes actively reporting |
| ingero_fleet_idle_nodes | Gauge | cluster_id | Nodes in idle state |
| ingero_fleet_coverage_low | Gauge | cluster_id | 1 if coverage quorum not met |
| ingero_fleet_panic_mode | Gauge | cluster_id | 1 if panic mode active |
| ingero_fleet_straggler_count | Gauge | cluster_id | Nodes below threshold |
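
To show how the median, MAD, threshold, and straggler count might relate, here is a minimal Python sketch. Fleet's exact formula is not stated in this reference, so the sketch makes several assumptions: health scores are higher-is-better, the raw threshold is median - k * MAD with an unscaled MAD, and threshold.ema_alpha is applied as a standard exponential moving average. Idle nodes and panic mode are not modeled, and all function names are hypothetical.

```python
from statistics import median


def mad(scores) -> float:
    """Median absolute deviation from the median (unscaled)."""
    m = median(scores)
    return median([abs(s - m) for s in scores])


def raw_threshold(scores, k: float = 2.0) -> float:
    """Assumed form: median - k * MAD, so lower scores are worse."""
    return median(scores) - k * mad(scores)


def ema(previous, current: float, alpha: float = 0.2) -> float:
    """Exponential moving average; alpha = 1 disables smoothing."""
    if previous is None:
        return current
    return alpha * current + (1 - alpha) * previous


def stragglers(scores_by_node: dict, threshold: float) -> list:
    """Nodes whose score falls below the threshold."""
    return sorted(n for n, s in scores_by_node.items() if s < threshold)


scores_by_node = {"n1": 0.90, "n2": 0.92, "n3": 0.91, "n4": 0.89, "n5": 0.50}
threshold = raw_threshold(list(scores_by_node.values()), k=2.0)
flagged = stragglers(scores_by_node, threshold)
```

In this example the median is 0.90 and the MAD is about 0.01, so the raw threshold is about 0.88 and only n5 is flagged. Using the median and MAD rather than the mean and standard deviation keeps a single badly degraded node from dragging the threshold down with it.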