Deployment Guide

May 18, 2026

Prerequisites

  • Ingero agent v0.10+ on each GPU node
  • Kubernetes 1.24+ (for Helm) or Docker (for standalone)

Option A: Add to Existing OTEL Collector

If you already run an OTEL Collector, add the Ingero modules to your OCB manifest:

# your-builder-config.yaml
processors:
  # ingero-version:builder-gomod-processor product=ingero-fleet channel=stable
  - gomod: github.com/ingero-io/ingero-fleet/processor/ingeroprocessor v1.0.1

extensions:
  # ingero-version:builder-gomod-extension product=ingero-fleet channel=stable
  - gomod: github.com/ingero-io/ingero-fleet/extension/ingerothresholdextension v1.0.1

Rebuild your collector:

ocb --config your-builder-config.yaml

Add the Ingero components to your collector config:

processors:
  ingero:
    threshold:
      k: 2.0
    push_interval: 10s

extensions:
  ingero_threshold:
    agent_endpoint: 0.0.0.0:8080

service:
  extensions: [ingero_threshold]
  pipelines:
    metrics:
      processors: [ingero]  # add to your existing pipeline

Option B: Kubernetes (Helm)

helm install ingero-fleet ./helm/ingero-fleet \
  --namespace ingero --create-namespace \
  --set replicaCount=2

Verify:

kubectl get pods -n ingero
kubectl port-forward -n ingero svc/ingero-fleet 8080:8080
curl "http://localhost:8080/api/v1/threshold?cluster_id=your-cluster"

Custom Values

helm install ingero-fleet ./helm/ingero-fleet \
  --set config.threshold.k=2.5 \
  --set config.push_interval=30s \
  --set config.max_expected_nodes=100 \
  --set replicaCount=3

See helm/ingero-fleet/values.yaml for all options.

Option C: Docker (Standalone)

docker run -d --name ingero-fleet \
  -p 4317:4317 -p 4318:4318 -p 8080:8080 \
  ghcr.io/ingero-io/ingero-fleet:latest

With custom config:

docker run -d --name ingero-fleet \
  -v "$(pwd)/fleet-config.yaml:/etc/ingero-fleet/config.yaml" \
  -p 4317:4317 -p 4318:4318 -p 8080:8080 \
  ghcr.io/ingero-io/ingero-fleet:latest

Configure Agents

Point each Ingero agent at Fleet by adding to ingero.yaml:

fleet:
  endpoint: https://fleet.example.com:4318
  cluster_id: your-cluster-name

Or via command-line flags:

sudo ingero trace --fleet-endpoint https://fleet:4318 --cluster-id prod-training

Verify

After agents start pushing, check the threshold API:

curl "http://fleet:8080/api/v1/threshold?cluster_id=your-cluster"

Expected response once quorum is met (5+ active nodes):

{"threshold":0.89,"quorum_met":true}
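
A readiness check can key off the quorum_met field rather than the threshold value itself. The following is a minimal sketch in Python, assuming the response has the shape shown above; the helper names are hypothetical, and the body returned before quorum is met is not documented here, so a missing threshold field is treated as "no threshold yet" (an assumption).

```python
import json
from typing import NamedTuple, Optional


class ThresholdStatus(NamedTuple):
    threshold: Optional[float]  # None when the body carries no threshold (assumed pre-quorum shape)
    quorum_met: bool


def parse_threshold_response(body: str) -> ThresholdStatus:
    """Parse the JSON body returned by GET /api/v1/threshold."""
    doc = json.loads(body)
    raw = doc.get("threshold")
    threshold = float(raw) if raw is not None else None
    # A missing quorum_met field is treated as "not met" (assumption).
    return ThresholdStatus(threshold, bool(doc.get("quorum_met", False)))


def is_ready(body: str) -> bool:
    """True only when Fleet reports that quorum has been met."""
    return parse_threshold_response(body).quorum_met
```

Feed it the output of the curl command above, for example from a deployment smoke test that waits until is_ready returns True.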

High Availability

The chart defaults to replicaCount: 1. To serve larger clusters, scale vertically first: a single g4dn.xlarge-class node carries 100+ pushing agents at 5s intervals with p99 handler latency under 20 ms.

helm install ingero-fleet ./helm/ingero-fleet --set replicaCount=1

Multi-replica HA (when you need it)

Each Fleet replica maintains its own in-memory score map. An agent push reaches ONE replica (selected by DNS or the service mesh); that replica's map is the only one that sees the score. Each replica computes its own threshold from its subset of agents.

For multi-replica deployments, put an L7 load balancer (Envoy, NGINX, or a service mesh) in front of Fleet and consistent-hash on the cluster_id query parameter. Every agent from one cluster then lands on the same replica, which eliminates cross-replica drift.

Size statistical_min for the per-replica visible node count, not the cluster-wide count. Alert on sum_over_replicas(ingero_fleet_active_nodes) < expected_total_nodes for replica starvation.
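
The starvation alert above can be approximated outside a metrics backend by scraping each replica and summing the gauge. This is a rough sketch in Python, assuming each replica exposes ingero_fleet_active_nodes in the Prometheus text exposition format; whether the metric carries labels is not stated here, so the parser accepts both forms. The function names are hypothetical.

```python
import re
from typing import Iterable

METRIC = "ingero_fleet_active_nodes"
# Matches `name 12` and `name{label="x"} 12`. Comment lines (# HELP, # TYPE)
# and longer metric names that merely share the prefix do not match.
_SAMPLE = re.compile(r"^" + re.escape(METRIC) + r"(?:\{[^}]*\})?\s+(\S+)")


def active_nodes(metrics_text: str) -> float:
    """Sum every sample of the gauge found in one replica's scrape."""
    total = 0.0
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:
            total += float(match.group(1))
    return total


def starved(replica_scrapes: Iterable[str], expected_total_nodes: int) -> bool:
    """True when the replicas together see fewer nodes than expected."""
    return sum(active_nodes(text) for text in replica_scrapes) < expected_total_nodes
```

Pass one scrape body per replica; a True result corresponds to the alert condition firing.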

Larger-cluster topologies (gateway-based shared state) are out of scope for this release. Talk to us if you're approaching the per-replica vertical-scale ceiling.

See docs/architecture_fleet.md for the full behavior model and rationale.

LB config snippets

The agent encodes cluster_id as a query parameter on every OTLP push URL (POST /v1/metrics?cluster_id=<id>). The threshold cache GET also includes it. All three load balancers below hash on that parameter so every push from a single cluster pins to one replica.
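
To make the URL shape concrete, here is a small Python sketch that builds the push and threshold URLs from a Fleet endpoint and a cluster ID. It is illustrative only, not the agent's actual code; with_cluster_id is a hypothetical helper. The point it shows is that the cluster ID should be percent-encoded the same way on every request so the load balancer always hashes the same key.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_cluster_id(endpoint: str, path: str, cluster_id: str) -> str:
    """Return endpoint + path with cluster_id attached as a query parameter."""
    parts = urlsplit(endpoint)
    # urlencode percent-encodes reserved characters in the cluster ID.
    query = urlencode({"cluster_id": cluster_id})
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


push_url = with_cluster_id("https://fleet.example.com:4318", "/v1/metrics", "prod-training")
threshold_url = with_cluster_id("http://fleet:8080", "/api/v1/threshold", "prod-training")
```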

NGINX

upstream ingero_fleet {
    hash $arg_cluster_id consistent;
    server 10.0.0.10:4318;
    server 10.0.0.11:4318;
    server 10.0.0.12:4318;
    keepalive 32;
}

server {
    listen 4318;
    location / {
        proxy_pass http://ingero_fleet;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_request_buffering off;
    }
}

hash $arg_cluster_id consistent reads the cluster_id query parameter and uses NGINX's ketama-style consistent hash. With three replicas and cluster_id="prod-cluster-eu", every push from that cluster lands on the same upstream regardless of source IP. Adding or removing one replica re-keys only about 1/N of clusters, where N is the replica count; the rest stay on their current replica.
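
The pinning and re-keying behavior can be explored with a toy ring. The sketch below is written in Python and is not NGINX's ketama implementation: the HashRing class, the SHA-256 hash, and the 160 virtual nodes per replica are all assumptions chosen for illustration. It demonstrates the two properties that matter here: a given cluster_id always maps to the same replica, and removing a replica only moves the clusters that were on it.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


class HashRing:
    """Toy consistent-hash ring (illustrative, not NGINX's algorithm)."""

    def __init__(self, replicas: Sequence[str], vnodes: int = 160) -> None:
        # Place `vnodes` points per replica on the ring, sorted by hash.
        self._points: List[Tuple[int, str]] = sorted(
            (self._hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def lookup(self, cluster_id: str) -> str:
        """Return the replica owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._keys, self._hash(cluster_id)) % len(self._keys)
        return self._points[idx][1]


replicas = ["10.0.0.10:4318", "10.0.0.11:4318", "10.0.0.12:4318"]
ring = HashRing(replicas)
smaller = HashRing(replicas[:2])  # same ring with the third replica removed

cluster_ids = [f"cluster-{i}" for i in range(500)]
moved = [c for c in cluster_ids if ring.lookup(c) != smaller.lookup(c)]
```

Here moved contains exactly the clusters that were pinned to the removed replica; every other cluster keeps its replica and therefore its in-memory score map.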

Envoy

static_resources:
  listeners:
    - name: ingero_fleet_listener
      address:
        socket_address: { address: 0.0.0.0, port_value: 4318 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingero_fleet
                route_config:
                  virtual_hosts:
                    - name: ingero_fleet
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: ingero_fleet
                            hash_policy:
                              - query_parameter: { name: cluster_id }
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: ingero_fleet
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: RING_HASH
      ring_hash_lb_config:
        minimum_ring_size: 1024
      load_assignment:
        cluster_name: ingero_fleet
        endpoints:
          - lb_endpoints:
              - endpoint: { address: { socket_address: { address: ingero-fleet-0.ingero-fleet, port_value: 4318 } } }
              - endpoint: { address: { socket_address: { address: ingero-fleet-1.ingero-fleet, port_value: 4318 } } }
              - endpoint: { address: { socket_address: { address: ingero-fleet-2.ingero-fleet, port_value: 4318 } } }

lb_policy: RING_HASH plus the query_parameter hash policy gives the same consistent-hash semantics as NGINX. minimum_ring_size controls the spread; 1024 is fine for fleets of <100 clusters.

Istio

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: ingero-fleet
  namespace: ingero
spec:
  host: ingero-fleet
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpQueryParameterName: cluster_id
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
        idleTimeout: 30s

Istio routes all in-mesh traffic to ingero-fleet through this DestinationRule. httpQueryParameterName: cluster_id is the same hash key as in the NGINX and Envoy configs, so agents whose traffic passes through the sidecar reach the matching replica without any agent-side config change.

Verifying the hash

After deploying any of the three configurations above, set up a synthetic check that sends ten pushes with the same cluster_id and asserts that they all land on one replica, as shown by that replica's push counter:

for i in $(seq 1 10); do
    curl -s "http://<lb>:4318/v1/metrics?cluster_id=test-hash" \
        -X POST -H 'Content-Type: application/json' \
        -d '{"resourceMetrics":[]}' > /dev/null
done

# Verify exactly one replica's `ingero_fleet_total_pushes` counter
# advanced by 10. Repeat for every replica behind the load balancer.
# No -it: the output is piped, so no TTY is needed.
kubectl -n ingero exec ingero-fleet-0 -- \
    curl -s localhost:8888/metrics | grep ingero_fleet_total_pushes
kubectl -n ingero exec ingero-fleet-1 -- \
    curl -s localhost:8888/metrics | grep ingero_fleet_total_pushes
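
The "exactly one counter advanced by 10" assertion can be automated once the counter values are collected before and after the loop. This is a minimal Python sketch, assuming you have already gathered ingero_fleet_total_pushes per replica into dictionaries; the function names are hypothetical, and the check only holds on a quiet test setup where no real agents are pushing at the same time.

```python
from typing import Dict


def counter_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the replicas whose push counter changed, with the size of the change."""
    return {
        replica: after[replica] - before.get(replica, 0)
        for replica in after
        if after[replica] != before.get(replica, 0)
    }


def hash_is_pinned(before: Dict[str, int], after: Dict[str, int], pushes: int = 10) -> bool:
    """True when a single replica absorbed all of the synthetic pushes."""
    deltas = counter_deltas(before, after)
    return len(deltas) == 1 and next(iter(deltas.values())) == pushes
```

A False result with the pushes split across replicas usually means the load balancer is not hashing on cluster_id.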

Cloud Quick Start Scripts

AWS (EKS)

# Assumes: EKS cluster with GPU nodes running Ingero agent
helm install ingero-fleet ./helm/ingero-fleet \
  --namespace ingero --create-namespace \
  --set replicaCount=2

TensorDock / LambdaLabs / Bare Metal

# Run Fleet on any machine reachable by GPU nodes
docker run -d --name ingero-fleet \
  -p 4317:4317 -p 4318:4318 -p 8080:8080 \
  ghcr.io/ingero-io/ingero-fleet:latest

# Point agents at it
sudo ingero trace --fleet-endpoint http://fleet-host:4318 --cluster-id my-cluster