Single-node Echo deployment
May 17, 2026
The Echo binary works the same on a single host as it does inside a cluster. This walkthrough sets up agent + Echo + Grafana on one Linux machine, which is the deployment shape the Ingero Grafana app plugin targets for single-host operators.
If you only need dashboards (no plugin, no MCP, no SQL queries), the agent + Prometheus + Grafana path is faster and is documented separately in the agent repo's quickstart. The setup below is for operators who want the plugin's MCP-tool panels and read-only SQL surface.
What you install
| Component | Source | Purpose |
|---|---|---|
| ingero (agent) | https://github.com/ingero-io/ingero/releases | eBPF GPU profiler. Pushes OTLP to Echo. |
| ingero-echo | https://github.com/ingero-io/ingero-fleet/releases | OTLP receiver + DuckDB store + HTTP API. |
| Grafana | https://grafana.com/grafana/download | Dashboards + plugin host. |
| Ingero Grafana app plugin | (ships separately as a Grafana app plugin; install link will be published in the Grafana plugin catalog) | Bundled dashboards + datasource. |
Quickstart
The shell below assumes a Linux host with an NVIDIA GPU + driver
installed. Run as a user with sudo for the agent; Echo + Grafana
run as your regular user.
# 1. Install the agent.
curl -sSL https://github.com/ingero-io/ingero/releases/latest/download/install.sh | bash
# 2. Install Echo.
curl -sSL https://github.com/ingero-io/ingero-fleet/releases/latest/download/install-echo.sh | bash
# 3. Mint a bearer token and start Echo on localhost.
#
# `--insecure-no-tls` ships plaintext-only on loopback and is
# fine for a single-host trial where nothing leaves the box.
# Production deployments switch to TLS (see "Production
# hardening" below); the binary refuses to start otherwise.
export INGERO_BEARER=$(openssl rand -hex 32)
ingero-echo serve \
--otlp-addr 127.0.0.1:4317 \
--mcp-addr 127.0.0.1:8081 \
--health-addr 127.0.0.1:8080 \
--db-path ~/.ingero/echo.db \
--auth-token "$INGERO_BEARER" \
--insecure-no-tls &
# 4. Start the agent, pointing at local Echo.
sudo ingero trace \
--prometheus :9090 \
--otlp-endpoint http://127.0.0.1:4317 \
--otlp-auth "Bearer $INGERO_BEARER" &
# 5. Open Grafana, install the Ingero plugin, configure its
# datasource:
# - Endpoint: http://127.0.0.1:8081
# - Bearer: <paste $INGERO_BEARER value>
# Click Test connection -> green. Dashboards auto-import.
About five minutes end-to-end. The dashboards include both single-host views (per-PID GPU op profile, memcpy bandwidth, throttle history) and cluster views (NCCL stragglers, per-node drill-down). On a single host the cluster views show one cluster of one node; the dashboards still work, just with a smaller fleet.
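Step 3 starts Echo in the background, so step 4 can race it on a slow host. A minimal sketch of a retry helper that waits for a probe command to succeed before continuing; `wait_until` and `WAIT_INTERVAL` are hypothetical names for illustration, not part of the Ingero tooling.

```shell
#!/usr/bin/env bash
# Retry a probe command until it succeeds or the attempt budget runs out.
# Hypothetical helper for illustration; not shipped with Echo.
#   $1      maximum number of attempts
#   $2...   the probe command and its arguments
# WAIT_INTERVAL (seconds between attempts) defaults to 1.
wait_until() {
  local attempts=$1
  shift
  local i
  for ((i = 0; i < attempts; i++)); do
    # Probe output is discarded; only the exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "${WAIT_INTERVAL:-1}"
  done
  return 1
}
```

Between steps 3 and 4 this could be used as `wait_until 30 curl -fsS http://127.0.0.1:8081/api/v1/health`, reusing the health URL from "Verifying the install".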
What this deployment is NOT
- Multi-host: if you want a single Echo aggregating across
multiple GPU hosts, install Ingero Fleet (the OTel Collector
distribution this repo is named after) and follow
docs/deployment_fleet.md. Fleet adds the peer-fan-in topology + multi-tenant ACL surface the single-node setup does not need.
- Production-hardened by default: the bearer is on the command line for simplicity above; production deployments source it from a file or env var, run Echo behind TLS, and put the binary behind systemd.
Production hardening
When moving past the trial setup:
- TLS on the MCP/httpapi listener. Drop the --insecure-no-tls flag and supply --tls-cert + --tls-key pointing at PEM-encoded files. The binary validates the cert/key pair at boot and refuses to start if they don't load. Required whenever Echo binds to anything other than loopback.
- Bearer rotation. Three procedures, by deployment shape; see Bearer rotation procedures below.
- Persistent DB. The --db-path arg defaults to ~/.ingero/echo.db; back up that file the same way you back up any local store. DuckDB is a single file so rsync works.
- Systemd unit. A reference unit lives at packaging/systemd/ingero-echo.service (if present in the release archive).
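The persistent-DB bullet can be sketched as a timestamped copy. This assumes the copy is taken while Echo is stopped, since the guide does not say whether a copy of a live file is consistent. `backup_db` is a hypothetical helper, and plain `cp` stands in for rsync to keep the example dependency-free.

```shell
#!/usr/bin/env bash
# Copy a DuckDB file into a backup directory under a UTC-timestamped name.
# Hypothetical helper for illustration; assumes Echo is not writing to the file.
#   $1  path to the database file (for example ~/.ingero/echo.db)
#   $2  destination directory (created if missing)
# Prints the path of the copy on success.
backup_db() {
  local src=$1 dest_dir=$2
  local stamp target
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  target="$dest_dir/$(basename "$src").$stamp"
  mkdir -p "$dest_dir" || return 1
  # -p preserves mode and timestamps so the copy stays as restricted as the source.
  cp -p "$src" "$target" || return 1
  printf '%s\n' "$target"
}
```

For example, `backup_db ~/.ingero/echo.db /var/backups/ingero`, where the destination directory is a placeholder.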
Bearer rotation procedures
Bearer tokens should be rotated periodically (90 days is a reasonable
floor for production deployments) and immediately if a token is
suspected compromised. Three procedures are supported; pick by
operational shape. Procedure C (SIGHUP rotation) lands in v0.18 and
is the recommended path for any deployment that already uses
--auth-token-file.
Procedure A: brief restart (recommended for single-node)
Best for the single-host setup this guide describes. Window of unavailability is sub-second; in-flight requests are dropped (Echo does not drain gracefully today; graceful drain is tracked for v0.18).
# 1. Mint the new token.
export INGERO_BEARER_NEW=$(openssl rand -hex 32)
# 2. Stop the current Echo.
pkill -TERM -f ingero-echo
# 3. Start Echo with the new token.
ingero-echo serve \
--otlp-addr 127.0.0.1:4317 \
--mcp-addr 127.0.0.1:8081 \
--health-addr 127.0.0.1:8080 \
--db-path ~/.ingero/echo.db \
--auth-token "$INGERO_BEARER_NEW" \
--insecure-no-tls &
# 4. Update the agent + Grafana datasource to use the new token.
# Agent: restart with --otlp-auth "Bearer $INGERO_BEARER_NEW"
# Grafana: edit the datasource bearer field, click Test
# connection -> green.
After step 3 and before step 4, the Grafana plugin's next probe
returns 401 with X-Request-Id for correlation. The plugin shows
"Test connection failed: unauthorized"; the operator pastes the new
bearer and re-tests. The agent's OTLP push retries with the next
batch (default 5s interval) once its bearer is updated.
Procedure B: parallel listeners (recommended for fleet / multi-tenant)
Best when zero-downtime matters more than operational simplicity. Two Echo instances run in parallel on sibling ports; the plugin's datasource flips from old to new; the old instance drains and shuts down. No in-flight loss.
# 1. Start a second Echo on a sibling port with the new token.
export INGERO_BEARER_NEW=$(openssl rand -hex 32)
ingero-echo serve \
--otlp-addr 127.0.0.1:4318 \
--mcp-addr 127.0.0.1:8082 \
--health-addr 127.0.0.1:8083 \
--db-path ~/.ingero/echo.db.new \
--auth-token "$INGERO_BEARER_NEW" \
--tls-cert /etc/ingero/echo-tls.pem \
--tls-key /etc/ingero/echo-tls.key &
# 2. Reconfigure the agent + Grafana datasource to point at the new
# instance (port 8082, new bearer). Verify dashboards populate.
# 3. Drain the old instance for 5-10 minutes (the agent push interval
# + Grafana panel refresh cycles need to settle).
# 4. Stop the old instance, merge the data file if needed (see note).
pkill -f 'ingero-echo.*8081'
Caveat: this procedure runs Echo against two DuckDB files. The new instance starts empty; the old retains historical events. Choices:
- Reset retention on rotation. Discard old echo.db; new instance starts with empty history. Acceptable for most operational metrics use cases (historical drill-down loses pre-rotation data).
- Merge files. Stop the old instance, use DuckDB's ATTACH to attach the old file to the new instance's database, then run INSERT INTO new.events SELECT * FROM old.events. Practical for retention windows up to ~1M rows; larger fleets should script this.
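The merge option can be scripted along these lines. This is a sketch under several assumptions: the `duckdb` CLI is installed, both files share the same `events` schema, and the two files contain no overlapping rows. DuckDB allows only one read-write process per file, so nothing else (including the new Echo) can hold the target file open while the merge runs. `merge_sql` and the `old_db` alias are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the SQL that copies events from an old Echo DuckDB file into
# whichever database the statements are executed against.
# Hypothetical helper for illustration; assumes identical events schemas.
#   $1  path to the old database file
merge_sql() {
  local old_path=$1 q="'"
  # Double any single quotes so the path is a valid SQL string literal.
  local escaped=${old_path//$q/$q$q}
  printf "ATTACH '%s' AS old_db (READ_ONLY);\n" "$escaped"
  printf 'INSERT INTO events SELECT * FROM old_db.events;\n'
  printf 'DETACH old_db;\n'
}
```

With the old instance stopped and the new file not open elsewhere, `merge_sql ~/.ingero/echo.db | duckdb ~/.ingero/echo.db.new` would run the statements against the new file.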
Helm deployments: add a secondary: block in
helm/ingero-echo/values.yaml with its own --auth-token value
and a separate Service to bind the new port. Remove the old Service
after the cutover. (The Helm values structure for this is not
shipped in the v0.17.1 templates yet; the chart needs a follow-up to
parameterize it. Track this on a v0.18 issue.)
Procedure C: SIGHUP rotation (v0.18, recommended)
Shipped in v0.18.0. Zero-restart rotation with an accept-both grace
window. Echo re-reads --auth-token-file on SIGHUP and serves
BOTH the previous and the new bearer for the configured grace
window (default 5 minutes). After grace the previous bearer is
rejected on all three transports (OTLP gRPC, MCP HTTP, httpapi).
Prereqs:
- Echo started with --auth-token-file <path> instead of (or in addition to) --auth-token. The file path is what Echo re-reads on each SIGHUP.
- File mode must be 0600 (or stricter). Symlinks are refused. File owner must match Echo's effective UID, or be root.
# 1. Mint the new token and write it to the existing token file.
# Use an atomic rename so a partial write never lands.
export INGERO_BEARER_NEW=$(openssl rand -hex 32)
umask 0077 # ensure 0600 mode on the new file
printf '%s\n' "$INGERO_BEARER_NEW" > /etc/ingero/echo-token.new
chmod 0600 /etc/ingero/echo-token.new
mv -f /etc/ingero/echo-token.new /etc/ingero/echo-token
# 2. Signal Echo to re-read the token file.
pkill -HUP -f ingero-echo
# OR with the pidfile: kill -HUP "$(cat /run/ingero-echo.pid)"
# 3. Echo emits an audit line:
# event=bearer_rotation_applied
# live_hash=<sha256-full-hex of new bearer>
# grace_hashes=[<sha256-full-hex of previous bearer>]
# grace_expires_at=<rfc3339 timestamp>
# Both the old and new bearer authenticate during the grace
# window (default 5 minutes; configurable with --rotation-grace).
# 4. Update the agent + Grafana datasource to use the new token
# inside the grace window. Once you confirm the new token works
# everywhere, the old token expires automatically at
# grace_expires_at; no further action required.
Failure modes (Echo logs ERROR or WARN; the previous accept-set stays in effect, with no silent fallthrough):
| Condition | Echo response |
|---|---|
| Token file missing | event=bearer_rotation_failed code=missing |
| Token file mode 0640/0644/world-readable | event=bearer_rotation_failed code=mode_too_open |
| Token file is a symlink | event=bearer_rotation_failed code=symlink_refused |
| Token file empty / whitespace only | event=bearer_rotation_failed code=empty |
| Token equals current live token | event=bearer_rotation_noop code=identical_token (WARN, not error) |
| Grace cap exceeded (default 3 concurrent grace tokens) | event=bearer_rotation_failed code=grace_cap_exceeded |
| SIGHUP received without --auth-token-file | event=bearer_rotation_skipped reason=no_token_file |
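The file-related rows of the table can be checked locally before sending SIGHUP. A minimal sketch follows. `check_token_file` is a hypothetical helper that mirrors the table's codes for readability; it is not Echo's own validation. It relies on GNU `stat`, which the Linux host in this guide provides, and it does not cover the owner check.

```shell
#!/usr/bin/env bash
# Pre-flight check for a bearer token file before signalling Echo.
# Prints "ok" or one of the failure codes from the table above.
# Hypothetical helper for illustration; uses GNU stat (Linux).
#   $1  path to the token file
check_token_file() {
  local path=$1 mode
  # Test for a symlink first: -f would follow the link and hide it.
  if [ -L "$path" ]; then echo symlink_refused; return 1; fi
  if [ ! -f "$path" ]; then echo missing; return 1; fi
  mode=$(stat -c '%a' "$path")
  # Any group or other permission bit counts as too open here.
  if [ $(( 8#$mode & 8#077 )) -ne 0 ]; then echo mode_too_open; return 1; fi
  # Reject files that are empty or contain only whitespace.
  if [ -z "$(tr -d '[:space:]' < "$path")" ]; then echo empty; return 1; fi
  echo ok
}
```

Step 2 of the procedure could then be guarded as `check_token_file /etc/ingero/echo-token && pkill -HUP -f ingero-echo`.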
Flags:
--auth-token-file <path> (required for SIGHUP rotation)
--rotation-grace <duration> (default 5m; 0 = hard cutover)
--rotation-max-grace-tokens <int> (default 3)
Audit-log compatibility: every authenticated request's existing
audit line carries the bearer_hash it matched against. Operators
can grep for bearer_hash=<old> grace_expires_at=... to watch the
old bearer drain across the grace window.
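To grep for a specific bearer, the operator needs its hash. A sketch, assuming the audit hash is the full hex SHA-256 of the raw token with no trailing newline; the audit line describes it as "sha256-full-hex", but the exact input bytes are not confirmed here. `bearer_hash` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the hex SHA-256 of a bearer token, hashed without a trailing newline.
# Hypothetical helper for illustration; whether Echo hashes exactly these
# bytes is an assumption.
#   $1  the bearer token
bearer_hash() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}
```

For example, `grep "bearer_hash=$(bearer_hash "$INGERO_BEARER")" echo.log` counts down the old bearer's remaining traffic, where `echo.log` is a placeholder for wherever Echo's log output is captured.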
Verifying the install
# Echo is alive:
curl -fsS http://127.0.0.1:8081/api/v1/health
# Capability negotiation (no auth needed):
curl -fsS http://127.0.0.1:8081/api/versions
# Tool catalog (bearer needed):
curl -fsS \
-H "Authorization: Bearer $INGERO_BEARER" \
http://127.0.0.1:8081/api/v1/tools/list
# Sample query through the SQL endpoint:
curl -fsS \
-X POST \
-H "Authorization: Bearer $INGERO_BEARER" \
-H "Content-Type: application/json" \
-d '{"sql":"SELECT COUNT(*) FROM events"}' \
http://127.0.0.1:8081/api/v1/sql
If /api/v1/health returns 200 and the SQL response shows a
non-zero row count, the agent is pushing successfully and the
plugin will see populated dashboards.
Limits
- Single-node ceiling is ~5K events/sec sustained on commodity hardware. The agent's default batch size + push interval keep the receiver well under that.
- DuckDB store grows linearly with retained events. Default retention is forever; truncate the database or DELETE FROM events periodically if disk pressure shows up.
- Echo runs as a single process; no high-availability story for single-node setups. Cluster deployments via Fleet handle this through the Fleet collector's redundancy.
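Because retention is unbounded by default, a periodic size check can flag disk pressure early. A minimal sketch; `db_over_limit` and the threshold are hypothetical, and it relies on GNU `stat` as found on the Linux host this guide assumes.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) when a database file is larger than a byte threshold.
# Hypothetical helper for illustration; uses GNU stat (Linux).
#   $1  path to the database file
#   $2  size limit in bytes
# Exit status: 0 over the limit, 1 at or under it, 2 if the file cannot be read.
db_over_limit() {
  local path=$1 limit_bytes=$2
  local size
  size=$(stat -c '%s' "$path" 2>/dev/null) || return 2
  [ "$size" -gt "$limit_bytes" ]
}
```

A cron job could run `db_over_limit ~/.ingero/echo.db 10737418240 && echo "echo.db exceeds 10 GiB"`, with the threshold chosen to suit the host's disk.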