Single-node Echo deployment

May 17, 2026

The Echo binary works the same on a single host as it does inside a cluster. This walkthrough sets up agent + Echo + Grafana on one Linux machine, which is the deployment shape the Ingero Grafana app plugin targets for single-host operators.

If you only need dashboards (no plugin, no MCP, no SQL queries), the agent + Prometheus + Grafana path is faster and is documented separately in the agent repo's quickstart. The setup below is for operators who want the plugin's MCP-tool panels and read-only SQL surface.

What you install

Component	Source	Purpose
`ingero` (agent)	https://github.com/ingero-io/ingero/releases	eBPF GPU profiler. Pushes OTLP to Echo.
`ingero-echo`	https://github.com/ingero-io/ingero-fleet/releases	OTLP receiver + DuckDB store + HTTP API.
Grafana	https://grafana.com/grafana/download	Dashboards + plugin host.
Ingero Grafana app plugin	(ships separately as a Grafana app plugin; install link will be published in the Grafana plugin catalog)	Bundled dashboards + datasource.

Quickstart

The shell below assumes a Linux host with an NVIDIA GPU + driver installed. Run as a user with sudo for the agent; Echo + Grafana run as your regular user.

# 1. Install the agent.
curl -sSL https://github.com/ingero-io/ingero/releases/latest/download/install.sh | bash

# 2. Install Echo.
curl -sSL https://github.com/ingero-io/ingero-fleet/releases/latest/download/install-echo.sh | bash

# 3. Mint a bearer token and start Echo on localhost.
#
# `--insecure-no-tls` serves plaintext. That is acceptable on
# loopback for a single-host trial where nothing leaves the box.
# Production deployments switch to TLS (see "Production
# hardening" below): TLS is required whenever Echo binds to
# anything other than loopback, and the binary refuses to start
# otherwise.
mkdir -p ~/.ingero
export INGERO_BEARER=$(openssl rand -hex 32)
ingero-echo serve \
  --otlp-addr        127.0.0.1:4317 \
  --mcp-addr         127.0.0.1:8081 \
  --health-addr      127.0.0.1:8080 \
  --db-path          ~/.ingero/echo.db \
  --auth-token       "$INGERO_BEARER" \
  --insecure-no-tls &

# 4. Start the agent, pointing at local Echo.
sudo ingero trace \
  --prometheus    :9090 \
  --otlp-endpoint http://127.0.0.1:4317 \
  --otlp-auth     "Bearer $INGERO_BEARER" &

# 5. Open Grafana, install the Ingero plugin, configure its
#    datasource:
#      - Endpoint: http://127.0.0.1:8081
#      - Bearer:   <paste $INGERO_BEARER value>
#    Click Test connection -> green. Dashboards auto-import.

About five minutes end-to-end. The dashboards include both single-host views (per-PID GPU op profile, memcpy bandwidth, throttle history) and cluster views (NCCL stragglers, per-node drill-down). On a single host the cluster views show one cluster of one node; the dashboards still work, just with a smaller fleet.
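Echo is started in the background in step 3, so step 4 can run before Echo is listening. The sketch below is a minimal readiness gate in Bash. `wait_until` is a hypothetical helper and is not part of Ingero. The health URL is the one used under "Verifying the install".

```shell
# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Re-runs CMD once per second until it exits 0.
# Returns 1 (with a message on stderr) if TIMEOUT_SECONDS elapse first.
wait_until() {
  local timeout=$1
  shift
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "wait_until: timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage between steps 3 and 4 of the quickstart:
#   wait_until 30 curl -fsS -o /dev/null http://127.0.0.1:8081/api/v1/health
```

If the gate times out, start by checking Echo's own output. The agent will likely fail in the same way.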

What this deployment is NOT

Multi-host: if you want a single Echo aggregating across multiple GPU hosts, install Ingero Fleet (the OTel Collector distribution this repo is named after) and follow docs/deployment_fleet.md. Fleet adds the peer-fan-in topology + multi-tenant ACL surface the single-node setup does not need.
Production-hardened by default: for simplicity, the quickstart passes the bearer on the command line, where other local users can read it from the process list. Production deployments source it from a file or env var, run Echo behind TLS, and run the binary under systemd.

Production hardening

When moving past the trial setup:

TLS on the MCP/httpapi listener. Drop the --insecure-no-tls flag and supply --tls-cert + --tls-key pointing at PEM-encoded files. The binary validates the cert + key at boot and refuses to start if they don't load. Required whenever Echo binds to anything other than loopback.
Bearer rotation. Three procedures, by deployment shape; see Bearer rotation procedures below.
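Persistent DB. The --db-path arg defaults to ~/.ingero/echo.db; back up that file the same way you back up any local store. DuckDB keeps the database in a single file, plus an echo.db.wal write-ahead log while changes are pending, so rsync works. Stop Echo first so the copy is consistent.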
Systemd unit. A reference unit lives at packaging/systemd/ingero-echo.service (if present in the release archive).
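The backup step can be sketched in Bash as follows. The sketch assumes Echo has already been stopped, so the file is not written mid-copy. It copies the database file, and also DuckDB's `.wal` sidecar if one was left behind (for example after an unclean shutdown). `backup_echo_db` is a hypothetical helper name. `cp -p` stands in for rsync so the sketch needs no extra tools; `rsync -a` would do the same job.

```shell
# backup_echo_db DB_PATH DEST_DIR
# Copies the DuckDB file and, if present, its .wal sidecar into DEST_DIR.
# Both copies get a UTC timestamp suffix. Run only while Echo is stopped.
backup_echo_db() {
  local db=$1 dest=$2 stamp name
  [ -f "$db" ] || { echo "backup_echo_db: no such file: $db" >&2; return 1; }
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  name=$(basename -- "$db")
  mkdir -p -- "$dest" || return 1
  cp -p -- "$db" "$dest/$name.$stamp" || return 1
  # A leftover WAL holds committed changes not yet checkpointed into the file.
  if [ -f "$db.wal" ]; then
    cp -p -- "$db.wal" "$dest/$name.wal.$stamp" || return 1
  fi
}

# Usage:
#   backup_echo_db ~/.ingero/echo.db /var/backups/ingero
```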

Bearer rotation procedures

Rotate bearer tokens at least every 90 days in production, and immediately if a token is suspected compromised. Three procedures are supported; pick one by operational shape. Procedure C (SIGHUP rotation) ships in v0.18.0 and is the recommended path for any deployment that already uses --auth-token-file.
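Deployments that keep the bearer in a file (the --auth-token-file setup used by Procedure C) can check the 90-day interval from cron using the file's modification time. Procedure C's write-then-rename step creates a fresh file on every rotation, so the modification time resets each time. Below is a minimal Bash sketch. `token_age_days` is a hypothetical helper, and it assumes GNU `stat`.

```shell
# token_age_days FILE
# Prints how many whole days ago FILE was last modified.
token_age_days() {
  local mtime now
  mtime=$(stat -c %Y -- "$1") || return 1
  now=$(date +%s)
  echo $(( (now - mtime) / 86400 ))
}

# Example cron check against a 90-day policy:
#   age=$(token_age_days /etc/ingero/echo-token) &&
#     [ "$age" -ge 90 ] && echo "echo-token is $age days old; rotate it" >&2
```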

Procedure A: brief restart (recommended for single-node)

Best for the single-host setup this guide describes. Echo is unavailable for under a second, and in-flight requests are dropped (Echo does not drain gracefully today; tracked for v0.18).

# 1. Mint the new token.
export INGERO_BEARER_NEW=$(openssl rand -hex 32)

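# 2. Stop the current Echo and wait for it to exit, so its ports
#    are free before the restart. -x matches the exact process
#    name rather than any command line that mentions ingero-echo.
pkill -TERM -x ingero-echo
while pgrep -x ingero-echo >/dev/null; do sleep 0.2; done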

# 3. Start Echo with the new token.
ingero-echo serve \
  --otlp-addr   127.0.0.1:4317 \
  --mcp-addr    127.0.0.1:8081 \
  --health-addr 127.0.0.1:8080 \
  --db-path     ~/.ingero/echo.db \
  --auth-token  "$INGERO_BEARER_NEW" \
  --insecure-no-tls &

# 4. Update the agent + Grafana datasource to use the new token.
#    Agent: restart with --otlp-auth "Bearer $INGERO_BEARER_NEW"
#    Grafana: edit the datasource bearer field, click Test
#             connection -> green.

Between steps 3 and 4, the Grafana plugin's next probe gets a 401 response carrying an X-Request-Id header for correlation. The plugin shows "Test connection failed: unauthorized"; the operator pastes the new bearer and re-tests. Once the agent's bearer is updated, its OTLP push retries with the next batch (default 5s interval).
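A hypothetical Bash helper can tell the 401 window apart from Echo still restarting. It probes the tool-catalog endpoint from "Verifying the install" with a given bearer and classifies the HTTP status. curl reports `000` for `%{http_code}` when it received no HTTP response at all, for example when the connection was refused. The helper names are illustrative and are not part of Ingero.

```shell
# classify_probe HTTP_CODE
# Maps a status code to a rotation state.
classify_probe() {
  case "$1" in
    200) echo ok ;;
    401) echo unauthorized ;;   # this bearer is not (or no longer) accepted
    000) echo unreachable ;;    # no HTTP response: Echo down or restarting
    *)   echo "unexpected:$1" ;;
  esac
}

# probe_echo URL TOKEN
# Issues one authenticated GET and prints the classified result.
probe_echo() {
  local url=$1 token=$2 code
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $token" "$url") || true
  classify_probe "$code"
}

# Usage after step 3 (expect "ok" for the new token, "unauthorized" for the old):
#   probe_echo http://127.0.0.1:8081/api/v1/tools/list "$INGERO_BEARER_NEW"
#   probe_echo http://127.0.0.1:8081/api/v1/tools/list "$INGERO_BEARER"
```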

Procedure B: parallel listeners (recommended for fleet / multi-tenant)

Best when zero-downtime matters more than operational simplicity. Two Echo instances run in parallel on sibling ports; the plugin's datasource flips from old to new; the old instance drains and shuts down. No in-flight loss.

# 1. Start a second Echo on a sibling port with the new token.
export INGERO_BEARER_NEW=$(openssl rand -hex 32)
ingero-echo serve \
  --otlp-addr   127.0.0.1:4318 \
  --mcp-addr    127.0.0.1:8082 \
  --health-addr 127.0.0.1:8083 \
  --db-path     ~/.ingero/echo.db.new \
  --auth-token  "$INGERO_BEARER_NEW" \
  --tls-cert    /etc/ingero/echo-tls.pem \
  --tls-key     /etc/ingero/echo-tls.key &

# 2. Reconfigure the agent (OTLP port 4318) and the Grafana
#    datasource (https://127.0.0.1:8082, new bearer) to point at
#    the new instance. Verify dashboards populate.
# 3. Drain the old instance for 5-10 minutes (the agent push interval
#    + Grafana panel refresh cycles need to settle).
# 4. Stop the old instance (matched by its MCP port), then merge the
#    data file if needed (see the caveat below).
pkill -TERM -f 'ingero-echo serve.*127\.0\.0\.1:8081'

Caveat: this procedure runs Echo against two DuckDB files. The new instance starts empty; the old retains historical events. Choices:

Reset retention on rotation. Discard old echo.db; new instance starts with empty history. Acceptable for most operational metrics use cases (historical drill-down loses pre-rotation data).
Merge files. Stop the old instance, DuckDB ATTACH the old file to the new instance, INSERT INTO new.events SELECT * FROM old.events. Practical for retention windows up to ~1M rows; larger fleets should script this.
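The merge step can be scripted with the DuckDB CLI. Below is a minimal Bash sketch, built on several assumptions:

- The `duckdb` CLI is installed.
- Both files come from the same Echo version, so the `events` columns line up for `SELECT *`.
- The new instance is also stopped while the merge runs, because DuckDB allows only one read-write process per file.

`merge_sql` is a hypothetical helper that only prints the SQL, which keeps the statements reviewable before they run.

```shell
# merge_sql OLD_DB_PATH
# Prints DuckDB SQL that copies every row of the old file's events table
# into the events table of whichever database the duckdb CLI has open.
merge_sql() {
  local old_path
  # Double any single quotes so the path is a valid SQL string literal.
  old_path=$(printf '%s' "$1" | sed "s/'/''/g")
  printf '%s\n' \
    "ATTACH '$old_path' AS old (READ_ONLY);" \
    "INSERT INTO events SELECT * FROM old.events;" \
    "DETACH old;"
}

# Usage, with both Echo instances stopped:
#   merge_sql "$HOME/.ingero/echo.db" | duckdb "$HOME/.ingero/echo.db.new"
```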

Helm deployments: the intended shape is a secondary: block in helm/ingero-echo/values.yaml with its own --auth-token value, plus a separate Service bound to the new port; roll the old Service out after the cutover. The v0.17.1 chart templates do not parameterize this yet, so the chart needs a follow-up before this procedure works under Helm (tracked on a v0.18 issue).

Procedure C: SIGHUP rotation (v0.18, recommended)

Shipped in v0.18.0. Zero-restart rotation with an accept-both grace window. Echo re-reads --auth-token-file on SIGHUP and serves BOTH the previous and the new bearer for the configured grace window (default 5 minutes). After grace the previous bearer is rejected on all three transports (OTLP gRPC, MCP HTTP, httpapi).

Prereqs:

Echo started with --auth-token-file <path> instead of (or in addition to) --auth-token. The file path is what Echo re-reads on each SIGHUP.
File mode must be 0600 (or stricter). Symlinks are refused. File owner must match Echo's effective UID, or be root.
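These prerequisites can be checked before sending SIGHUP. The minimal Bash sketch below mirrors the documented rules and reuses the failure codes from the failure-mode table in this section. `owner_mismatch` is this sketch's own label, since the table lists no code for that case. Echo's own checks on SIGHUP remain authoritative. `check_token_file` is a hypothetical helper and assumes GNU `stat`.

```shell
# check_token_file PATH
# Prints "ok" or a failure code, and returns non-zero on failure.
check_token_file() {
  local f=$1 mode owner
  if [ -L "$f" ]; then echo symlink_refused; return 1; fi
  if [ ! -f "$f" ]; then echo missing; return 1; fi
  mode=$(stat -c %a -- "$f") || return 1
  owner=$(stat -c %u -- "$f") || return 1
  # Any group or other permission bit set means looser than 0600.
  if (( (8#$mode & 8#077) != 0 )); then echo mode_too_open; return 1; fi
  # Owner must be the current (Echo) user or root.
  if [ "$owner" != "$(id -u)" ] && [ "$owner" != 0 ]; then
    echo owner_mismatch; return 1
  fi
  # Whitespace-only content counts as empty.
  if [ -z "$(tr -d '[:space:]' < "$f")" ]; then echo empty; return 1; fi
  echo ok
}

# Usage, run as Echo's user:
#   check_token_file /etc/ingero/echo-token
```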

# 1. Mint the new token and write it to the existing token file.
#    Write a temp file and rename it so a partial write never lands.
export INGERO_BEARER_NEW=$(openssl rand -hex 32)
(
  umask 0077  # new file is created 0600; the subshell keeps this local
  printf '%s\n' "$INGERO_BEARER_NEW" > /etc/ingero/echo-token.new
)
chmod 0600 /etc/ingero/echo-token.new  # covers a pre-existing .new file
mv -f /etc/ingero/echo-token.new /etc/ingero/echo-token

# 2. Signal Echo to re-read the token file. -x matches the exact
#    process name, so unrelated commands that merely mention
#    ingero-echo (an editor, tail -f on its log) are not signalled.
pkill -HUP -x ingero-echo
#    OR with the pidfile: kill -HUP "$(cat /run/ingero-echo.pid)"

# 3. Echo emits an audit line:
#      event=bearer_rotation_applied
#      live_hash=<sha256-full-hex of new bearer>
#      grace_hashes=[<sha256-full-hex of previous bearer>]
#      grace_expires_at=<rfc3339 timestamp>
#    Both the old and new bearer authenticate during the grace
#    window (default 5 minutes; configurable with --rotation-grace).

# 4. Update the agent + Grafana datasource to use the new token
#    inside the grace window. The old token expires automatically
#    at grace_expires_at with no further action; any client still
#    sending it after that point is rejected.
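The write-then-rename from step 1 can be packaged as a reusable Bash function. The sketch below is minimal, and `write_token_file` is a hypothetical name. It creates the temp file with `mktemp`, which makes it 0600 from the start. The temp file sits in the target directory, so the final `mv` is an atomic rename on the same filesystem. The function and `printf` are shell builtins, so the token never appears in another process's argument list.

```shell
# write_token_file PATH TOKEN
# Atomically replaces PATH with TOKEN plus a newline, mode 0600.
write_token_file() {
  local path=$1 token=$2 tmp
  [ -n "$token" ] || { echo "write_token_file: empty token" >&2; return 1; }
  tmp=$(mktemp "$(dirname -- "$path")/.echo-token.XXXXXX") || return 1
  if ! printf '%s\n' "$token" > "$tmp" || ! mv -f -- "$tmp" "$path"; then
    rm -f -- "$tmp"   # never leave a stray temp file holding a token
    return 1
  fi
}

# Usage, as root or Echo's user (to satisfy the owner rule):
#   INGERO_BEARER_NEW=$(openssl rand -hex 32)
#   write_token_file /etc/ingero/echo-token "$INGERO_BEARER_NEW"
#   pkill -HUP -x ingero-echo
```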

Failure modes (Echo logs at ERROR or WARN, and the previous accept-set stays in effect; there is no silent fallthrough):

Condition	Echo response
Token file missing	`event=bearer_rotation_failed code=missing`
Token file mode looser than 0600 (e.g. 0640, 0644, world-readable)	`event=bearer_rotation_failed code=mode_too_open`
Token file is a symlink	`event=bearer_rotation_failed code=symlink_refused`
Token file empty / whitespace only	`event=bearer_rotation_failed code=empty`
Token equals current live token	`event=bearer_rotation_noop code=identical_token` (WARN, not error)
Grace cap exceeded (default: 3 concurrent grace tokens)	`event=bearer_rotation_failed code=grace_cap_exceeded`
SIGHUP received without `--auth-token-file`	`event=bearer_rotation_skipped reason=no_token_file`
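Every row in the table above leaves an `event=bearer_rotation_*` line in the log. After a SIGHUP, the most recent such line shows whether the rotation applied. The minimal Bash sketch below assumes each event is logged on one line as space-separated key=value fields, as the event strings in the table suggest. `last_rotation_event` is a hypothetical helper, and the journalctl unit name assumes the reference systemd unit is installed.

```shell
# last_rotation_event [LOGFILE]
# Prints the most recent bearer-rotation event, with its code= or reason=
# field when present. Reads stdin when no file is given.
last_rotation_event() {
  grep -oE 'event=bearer_rotation_[a-z_]+( (code|reason)=[a-z_]+)?' "${1:--}" |
    tail -n 1
}

# Usage:
#   journalctl -u ingero-echo --since '10 min ago' | last_rotation_event
```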

Flags:

--auth-token-file <path>            (required for SIGHUP rotation)
--rotation-grace <duration>         (default 5m; 0 = hard cutover)
--rotation-max-grace-tokens <int>   (default 3)

Audit-log compatibility: every authenticated request's existing audit line carries the bearer_hash it matched against. To watch the old bearer drain across the grace window, grep the audit log for bearer_hash=<previous hash> (the value listed in grace_hashes) until grace_expires_at passes.
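To build that grep pattern from a token you still hold, hash it locally. The Bash sketch below assumes Echo hashes the raw token string with SHA-256 and lowercase hex, excluding the trailing newline that the token file carries. Confirm this by comparing the helper's output against `live_hash` in the `bearer_rotation_applied` line before relying on it. `bearer_hash` is a hypothetical helper, and the log path in the usage note is illustrative.

```shell
# bearer_hash TOKEN
# Prints the lowercase hex SHA-256 of TOKEN exactly as given (no newline).
bearer_hash() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Usage (count requests still using the previous bearer):
#   old_hash=$(bearer_hash "$INGERO_BEARER")
#   grep -c "bearer_hash=$old_hash" /var/log/ingero-echo.log
```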

Verifying the install

# Echo is alive:
curl -fsS http://127.0.0.1:8081/api/v1/health

# Capability negotiation (no auth needed):
curl -fsS http://127.0.0.1:8081/api/versions

# Tool catalog (bearer needed):
curl -fsS \
  -H "Authorization: Bearer $INGERO_BEARER" \
  http://127.0.0.1:8081/api/v1/tools/list

# Sample query through the SQL endpoint:
curl -fsS \
  -X POST \
  -H "Authorization: Bearer $INGERO_BEARER" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT COUNT(*) FROM events"}' \
  http://127.0.0.1:8081/api/v1/sql

If /api/v1/health returns 200 and the SQL response shows a non-zero row count, the agent is pushing successfully and the plugin will see populated dashboards.
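For queries beyond the row count, hand-writing the JSON body breaks as soon as the SQL contains a double quote, for example a quoted identifier. The minimal Bash sketch below builds the `{"sql": ...}` body used above. `sql_payload` is a hypothetical helper. It escapes only backslashes and double quotes, so it is limited to single-line SQL without tabs or control characters; `jq -n --arg` would be the more complete choice where jq is installed.

```shell
# sql_payload SQL
# Prints a {"sql": ...} JSON body with backslashes and double quotes escaped.
sql_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"sql":"%s"}\n' "$escaped"
}

# Usage:
#   curl -fsS -X POST \
#     -H "Authorization: Bearer $INGERO_BEARER" \
#     -H "Content-Type: application/json" \
#     -d "$(sql_payload 'SELECT COUNT(*) FROM events')" \
#     http://127.0.0.1:8081/api/v1/sql
```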

Limits

Single-node ceiling is ~5K events/sec sustained on commodity hardware. The agent's default batch size + push interval keep the receiver well under that.
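DuckDB store grows linearly with retained events. Default retention is forever; if disk pressure shows up, truncate the database or periodically DELETE FROM events. The SQL endpoint is read-only, so run deletes with the duckdb CLI while Echo is stopped (DuckDB allows only one read-write process per file). Deleted space is reused for new rows, but the file does not necessarily shrink.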
Echo runs as a single process; no high-availability story for single-node setups. Cluster deployments via Fleet handle this through the Fleet collector's redundancy.