Deployment Guide
May 18, 2026
Prerequisites
- Ingero agent v0.10+ on each GPU node
- Kubernetes 1.24+ (for Helm) or Docker (for standalone)
Option A: Add to Existing OTEL Collector
If you already run an OpenTelemetry (OTEL) Collector, add the Ingero modules to your OpenTelemetry Collector Builder (OCB) manifest:
# your-builder-config.yaml
processors:
# ingero-version:builder-gomod-processor product=ingero-fleet channel=stable
- gomod: github.com/ingero-io/ingero-fleet/processor/ingeroprocessor v1.0.1
extensions:
# ingero-version:builder-gomod-extension product=ingero-fleet channel=stable
- gomod: github.com/ingero-io/ingero-fleet/extension/ingerothresholdextension v1.0.1
Rebuild your collector:
ocb --config your-builder-config.yaml
Add the Ingero components to your collector config:
processors:
ingero:
threshold:
k: 2.0
push_interval: 10s
extensions:
ingero_threshold:
agent_endpoint: 0.0.0.0:8080
service:
extensions: [ingero_threshold]
pipelines:
metrics:
processors: [ingero] # add to your existing pipeline
Option B: Kubernetes (Helm)
helm install ingero-fleet ./helm/ingero-fleet \
--namespace ingero --create-namespace \
--set replicaCount=2
Verify:
kubectl get pods -n ingero
kubectl port-forward -n ingero svc/ingero-fleet 8080:8080
curl "http://localhost:8080/api/v1/threshold?cluster_id=your-cluster"
Custom Values
helm install ingero-fleet ./helm/ingero-fleet \
--set config.threshold.k=2.5 \
--set config.push_interval=30s \
--set config.max_expected_nodes=100 \
--set replicaCount=3
See helm/ingero-fleet/values.yaml for all options.
Option C: Docker (Standalone)
docker run -d --name ingero-fleet \
-p 4317:4317 -p 4318:4318 -p 8080:8080 \
ghcr.io/ingero-io/ingero-fleet:latest
With custom config:
docker run -d --name ingero-fleet \
-v $(pwd)/fleet-config.yaml:/etc/ingero-fleet/config.yaml \
-p 4317:4317 -p 4318:4318 -p 8080:8080 \
ghcr.io/ingero-io/ingero-fleet:latest
Configure Agents
Point each Ingero agent at Fleet by adding to ingero.yaml:
fleet:
endpoint: https://fleet.example.com:4318
cluster_id: your-cluster-name
Or via command-line flags:
sudo ingero trace --fleet-endpoint https://fleet:4318 --cluster-id prod-training
Verify
After agents start pushing, check the threshold API:
curl "http://fleet:8080/api/v1/threshold?cluster_id=your-cluster"
Expected response once quorum is met (5+ active nodes):
{"threshold":0.89,"quorum_met":true}
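The check above can also be scripted so that a rollout waits for quorum instead of relying on a manual curl. The following is a minimal sketch, not part of Ingero Fleet: the function names are hypothetical, and it assumes that before quorum the response either omits quorum_met or sets it to false (the guide only shows the post-quorum shape).

```python
import json
import time
import urllib.request
from typing import Optional, Tuple


def parse_threshold(body: str) -> Tuple[Optional[float], bool]:
    """Return (threshold, quorum_met) from a threshold API response body.

    Assumption: a missing quorum_met field means quorum is not yet met.
    """
    data = json.loads(body)
    return data.get("threshold"), bool(data.get("quorum_met", False))


def wait_for_quorum(url: str, timeout_s: float = 120.0, poll_s: float = 5.0) -> Optional[float]:
    """Poll the threshold endpoint until quorum_met is true, then return the threshold."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url, timeout=10) as resp:
            threshold, quorum_met = parse_threshold(resp.read().decode("utf-8"))
        if quorum_met:
            return threshold
        time.sleep(poll_s)
    raise TimeoutError(f"quorum not met within {timeout_s} seconds")
```

A call such as wait_for_quorum("http://fleet:8080/api/v1/threshold?cluster_id=your-cluster") would block until at least the quorum number of agents is pushing, or raise after the timeout.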
High Availability
Single replica (recommended for most clusters)
replicaCount: 1 is the chart default. Vertical scale is the path to larger clusters: a single g4dn.xlarge-class node carries 100+ pushing agents at 5s intervals with p99 handler latency under 20 ms.
helm install ingero-fleet ./helm/ingero-fleet --set replicaCount=1
Multi-replica HA (when you need it)
Each Fleet replica maintains its own in-memory score map. An agent push reaches ONE replica (selected by DNS or the service mesh); that replica's map is the only one that sees the score. Each replica computes its own threshold from its subset of agents.
For multi-replica: put an L7 load balancer with consistent-hash on the cluster_id query parameter (Envoy / nginx / service mesh) in front of Fleet. Every agent from one cluster lands on the same replica, eliminating cross-replica drift.
Size statistical_min for the per-replica visible node count, not the cluster-wide count. Alert on sum_over_replicas(ingero_fleet_active_nodes) < expected_total_nodes for replica starvation.
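To make the sizing rule concrete, here is a small sketch of the arithmetic. The cluster names, node counts, and placement are hypothetical, and the helper functions are illustrative only. The point is that each replica only sees the nodes of the clusters hashed to it, so the smallest per-replica count is the figure to size statistical_min against, while the starvation alert compares the sum across replicas with the expected total.

```python
from typing import Dict


def nodes_visible_per_replica(cluster_sizes: Dict[str, int],
                              placement: Dict[str, str]) -> Dict[str, int]:
    """Sum node counts per replica, given which replica each cluster_id hashes to."""
    visible: Dict[str, int] = {}
    for cluster_id, nodes in cluster_sizes.items():
        replica = placement[cluster_id]
        visible[replica] = visible.get(replica, 0) + nodes
    return visible


def replicas_starved(active_by_replica: Dict[str, int], expected_total_nodes: int) -> bool:
    """Mirror of the alert condition: sum over replicas < expected total."""
    return sum(active_by_replica.values()) < expected_total_nodes


# Hypothetical fleet: three clusters pinned to two replicas by the load balancer.
cluster_sizes = {"prod-a": 40, "prod-b": 12, "dev": 8}
placement = {"prod-a": "ingero-fleet-0", "prod-b": "ingero-fleet-1", "dev": "ingero-fleet-1"}

visible = nodes_visible_per_replica(cluster_sizes, placement)
smallest_view = min(visible.values())  # the per-replica count to size against
print(visible, smallest_view, replicas_starved(visible, expected_total_nodes=60))
```

In a real deployment the per-replica counts would come from each replica's ingero_fleet_active_nodes metric rather than from a hand-written table.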
Larger-cluster topologies (gateway-based shared state) are out of scope for this release. Talk to us if you're approaching the per-replica vertical-scale ceiling.
See docs/architecture_fleet.md for the full behavior model and rationale.
LB config snippets
The agent encodes cluster_id as a query parameter on every OTLP push
URL (POST /v1/metrics?cluster_id=<id>). The threshold cache GET also
includes it. All three load balancers below hash on that parameter so
every push from a single cluster pins to one replica.
NGINX
upstream ingero_fleet {
hash $arg_cluster_id consistent;
server 10.0.0.10:4318;
server 10.0.0.11:4318;
server 10.0.0.12:4318;
keepalive 32;
}
server {
listen 4318;
location / {
proxy_pass http://ingero_fleet;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_buffering off;
proxy_request_buffering off;
}
}
hash $arg_cluster_id consistent reads the cluster_id query
parameter and uses NGINX's ketama-style consistent hash. With three
replicas and cluster_id="prod-cluster-eu", every push from that
cluster lands on the same upstream regardless of source IP. Adding or
removing one replica re-keys roughly 1/N of clusters (N being the replica count) rather than all of them.
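The pinning and re-keying behavior can be illustrated with a small hash ring. This is a generic consistent-hash sketch, not NGINX's actual implementation: the hash function (SHA-256), the number of virtual nodes, and the class name are assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import List


def _point(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes per server."""

    def __init__(self, servers: List[str], vnodes: int = 160) -> None:
        self._ring = sorted((_point(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def lookup(self, cluster_id: str) -> str:
        """Return the server owning the first ring point after the key's position."""
        idx = bisect.bisect_right(self._points, _point(cluster_id)) % len(self._ring)
        return self._ring[idx][1]


servers = ["10.0.0.10:4318", "10.0.0.11:4318", "10.0.0.12:4318"]
clusters = [f"cluster-{i}" for i in range(1000)]

ring = HashRing(servers)
before = {c: ring.lookup(c) for c in clusters}

# Remove the third replica and see which clusters change upstream.
smaller = HashRing(servers[:2])
after = {c: smaller.lookup(c) for c in clusters}
moved = [c for c in clusters if before[c] != after[c]]

print(f"{len(moved)} of {len(clusters)} clusters moved")
```

Only clusters that were pinned to the removed replica move; every other cluster keeps its upstream. That property is what separates a consistent hash from a plain modulo hash, which would reshuffle most keys when the replica count changes.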
Envoy
static_resources:
listeners:
- name: ingero_fleet_listener
address:
socket_address: { address: 0.0.0.0, port_value: 4318 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingero_fleet
route_config:
virtual_hosts:
- name: ingero_fleet
domains: ["*"]
routes:
- match: { prefix: "/" }
route:
cluster: ingero_fleet
hash_policy:
- query_parameter: { name: cluster_id }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: ingero_fleet
connect_timeout: 5s
type: STRICT_DNS
lb_policy: RING_HASH
ring_hash_lb_config:
minimum_ring_size: 1024
load_assignment:
cluster_name: ingero_fleet
endpoints:
- lb_endpoints:
- endpoint: { address: { socket_address: { address: ingero-fleet-0.ingero-fleet, port_value: 4318 } } }
- endpoint: { address: { socket_address: { address: ingero-fleet-1.ingero-fleet, port_value: 4318 } } }
- endpoint: { address: { socket_address: { address: ingero-fleet-2.ingero-fleet, port_value: 4318 } } }
lb_policy: RING_HASH plus the query_parameter hash policy gives
the same consistent-hash semantics as NGINX. minimum_ring_size
controls the spread; 1024 is fine for fleets of <100 clusters.
Istio
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: ingero-fleet
namespace: ingero
spec:
host: ingero-fleet
trafficPolicy:
loadBalancer:
consistentHash:
httpQueryParameterName: cluster_id
connectionPool:
http:
h2UpgradePolicy: UPGRADE
idleTimeout: 30s
Istio routes all in-mesh traffic to ingero-fleet through this
DestinationRule. httpQueryParameterName: cluster_id is the same hash
key as NGINX and Envoy: agents picked up by the sidecar see the
matching replica without any agent-side config change.
Verifying the hash
After deploying any of the three above, set up a synthetic check that
sends ten pushes with the same cluster_id and asserts that all ten
show up in a single replica's process metrics:
for i in $(seq 1 10); do
curl -s "http://<lb>:4318/v1/metrics?cluster_id=test-hash" \
-X POST -H 'Content-Type: application/json' \
-d '{"resourceMetrics":[]}' > /dev/null
done
# Verify exactly one replica's `ingero_fleet_total_pushes` counter
# advanced by 10:
kubectl -n ingero exec ingero-fleet-0 -- \
curl -s localhost:8888/metrics | grep ingero_fleet_total_pushes
kubectl -n ingero exec ingero-fleet-1 -- \
curl -s localhost:8888/metrics | grep ingero_fleet_total_pushes
# Repeat for any remaining replicas (ingero-fleet-2, ...).
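Comparing counters by eye gets tedious with more than two replicas. The sketch below shows one way to automate the comparison. It assumes you have captured each replica's /metrics output before and after the ten pushes (for example with the kubectl commands above) and that ingero_fleet_total_pushes is exposed in the Prometheus text format. The function names are hypothetical.

```python
from typing import Dict


def counter_value(metrics_text: str, name: str = "ingero_fleet_total_pushes") -> float:
    """Sum all samples of `name` in Prometheus text exposition output."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line.startswith(name):
            continue  # comments (# HELP / # TYPE) and other metrics
        rest = line[len(name):]
        if rest.startswith("{"):
            rest = rest[rest.rfind("}") + 1:]  # drop the label set
        elif not rest.startswith((" ", "\t")):
            continue  # a different metric that merely shares the prefix
        fields = rest.split()
        if fields:
            total += float(fields[0])  # first field is the value; a timestamp may follow
    return total


def push_deltas(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, float]:
    """Per-replica increase of the push counter between two scrapes."""
    return {r: counter_value(after[r]) - counter_value(before.get(r, "")) for r in after}


def pinned_to_one_replica(deltas: Dict[str, float], expected: int = 10) -> bool:
    """True when exactly one replica advanced, and it advanced by `expected`."""
    advanced = [r for r, d in deltas.items() if d > 0]
    return len(advanced) == 1 and deltas[advanced[0]] == expected
```

This check only gives a clean answer when no real agents are pushing during the test window. On a live fleet, other clusters' pushes also advance the counters, so run it against a quiet environment or treat a failure as a prompt to look closer rather than as proof of a misconfigured hash.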
Cloud Quick Start Scripts
AWS (EKS)
# Assumes: EKS cluster with GPU nodes running Ingero agent
helm install ingero-fleet ./helm/ingero-fleet \
--namespace ingero --create-namespace \
--set replicaCount=2
TensorDock / LambdaLabs / Bare Metal
# Run Fleet on any machine reachable by GPU nodes
docker run -d --name ingero-fleet \
-p 4317:4317 -p 4318:4318 -p 8080:8080 \
ghcr.io/ingero-io/ingero-fleet:latest
# Point agents at it
sudo ingero trace --fleet-endpoint http://fleet-host:4318 --cluster-id my-cluster