docker-exporter

June 3, 2026

A Prometheus exporter for Docker container metrics, built in Rust for ARM64 homelabs running cgroup v2.

cAdvisor returns zero for container_memory_working_set_bytes and container_memory_rss on Raspberry Pi 5 — same cAdvisor build, same cgroup v2 + arm64 combination, no issue on other ARM64 SBCs (e.g. Rock 5B+). Memory dashboards silently lie. docker-exporter is a single-binary alternative that reads the Docker stats API directly, computes the working set correctly on both cgroup versions, talks to the socket read-only, and runs in ~7 MiB of RAM at idle.

Why this exists

The Pi 5 zero-metric bug isn't cAdvisor's only pain point: it also sits at 9–17% sustained CPU and 36–94 MB RAM per host, and needs six bind mounts and privileged mode. cAdvisor#2523 tracks more ARM-side issues. The tool isn't sized for SBC homelabs.

Existing alternatives were dormant or amd64-only. So I wrote a Rust replacement: single self-contained binary, no GC churn, ~7 MiB RAM and <1% CPU at steady state. Read-only socket, no privileged mode, cAdvisor-compatible metric names so dashboards swap without rewrites.

Features

  • Correct memory working set on cgroup v1 and v2 (usage − inactive_file on v2, usage − cache on v1)
  • Per-container CPU, memory, network, and disk I/O
  • Read-only Docker socket; runs as non-root (UID 65532) inside the container
  • On-demand collection — no background polling, stats fetched per scrape with a 5 s per-container timeout
  • Container exclusion filter via env var
  • /health and /ready endpoints (Docker HEALTHCHECK already wired in the image)

Resource use

  • Memory: ~7–10 MiB at idle, ~10–20 MiB scraping ~30 containers.
  • CPU: idle between scrapes (no background loop). Per scrape, work scales with container count and Docker daemon latency — typically a small single-digit percentage on a Raspberry Pi 5.
  • Scrape duration: usually 1–3 s, dominated by the Docker daemon's /containers/{id}/stats endpoint. Per-container concurrency with a 5 s timeout caps individual stalls.
  • Image: ~9 MB (static musl binary on distroless/static, non-root). Same on linux/amd64 and linux/arm64.

cAdvisor — when to pick which

docker-exporter is purpose-built for one job. cAdvisor monitors much more (host, processes, OOM events, hardware counters). Pick whichever matches your scope.

| Dimension | docker-exporter | cAdvisor |
| --- | --- | --- |
| Scope | Docker containers only | Containers + host + processes |
| Image size | ~9 MB (musl + distroless) | ~250 MB |
| RAM at idle (small host, ~10 containers) | ~7–10 MiB | ~80–150 MiB |
| cgroup v2 working set on ARM64 | Correct | Reports raw RSS (known issue) |
| Privileged container required | No (socket RO) | Yes (mounts cgroup, proc, sys) |
| Built-in UI | No | Yes |

If you already run cAdvisor and your dashboards work, keep it. If you're hitting the cgroup-v2 memory bug or want a small footprint on an SBC, this exporter gets you back to accurate dashboards without privileged mode.

Quick start

docker run -d \
  --name docker-exporter \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -p 9713:9713 \
  --restart unless-stopped \
  ghcr.io/dlepaux/docker-exporter:latest

Image is published for linux/amd64 and linux/arm64.

Docker Compose

services:
  docker-exporter:
    image: ghcr.io/dlepaux/docker-exporter:latest
    container_name: docker-exporter
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "9713:9713"

Configuration

All configuration is via environment variables. All are optional.

| Variable | Default | Description |
| --- | --- | --- |
| LISTEN_ADDR | 0.0.0.0:9713 | Address and port to bind the HTTP server |
| LOG_LEVEL | info | Log verbosity: trace, debug, info, warn, error |
| EXCLUDE_CONTAINERS | (empty) | Comma-separated container name patterns to exclude |
| RUST_LOG | (unset) | Standard tracing filter; overrides LOG_LEVEL if set |
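
A minimal sketch of how the variables above could be resolved at startup. The Config struct and load_config helper are hypothetical names for illustration, not taken from the exporter's source. The lookup is passed in as a closure so the defaults and the RUST_LOG precedence can be exercised without touching the real process environment.

```rust
use std::net::SocketAddr;

/// Resolved settings (hypothetical struct, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Config {
    pub listen_addr: SocketAddr,
    pub log_filter: String,
    pub exclude_containers: String,
}

/// Build a Config from any key -> value lookup.
/// Every variable is optional and falls back to the documented default.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let listen_raw = get("LISTEN_ADDR").unwrap_or_else(|| "0.0.0.0:9713".to_string());
    let listen_addr = listen_raw
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid LISTEN_ADDR {listen_raw:?}: {e}"))?;

    // RUST_LOG wins over LOG_LEVEL when both are set; "info" is the default.
    let log_filter = get("RUST_LOG")
        .or_else(|| get("LOG_LEVEL"))
        .unwrap_or_else(|| "info".to_string());

    // Empty string means "exclude nothing".
    let exclude_containers = get("EXCLUDE_CONTAINERS").unwrap_or_default();

    Ok(Config { listen_addr, log_filter, exclude_containers })
}
```

In a real binary the lookup would be `load_config(|key| std::env::var(key).ok())`.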

EXCLUDE_CONTAINERS matches against the container name (without the leading /). Each comma-separated entry is a glob pattern: * matches any run of characters, ? a single character, [abc] a character class. Patterns are whole-name anchored, so cache-* excludes cache-redis but not app-cache. A plain name with no wildcards (prometheus) is an exact match — existing exact values keep working unchanged, no migration needed. A malformed pattern fails loudly at startup rather than being silently ignored.

EXCLUDE_CONTAINERS=prometheus,cadvisor,debug-*,*-sidecar
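
The matching rules above can be sketched with a small std-only matcher. This is an illustration under assumptions, not the exporter's implementation (which may well use a glob crate): ExcludeFilter, Token, parse and matches are hypothetical names, character ranges such as [a-z] and escaping are left out, and trimming whitespace around entries is a guess.

```rust
/// One element of a parsed pattern (hypothetical type for this sketch).
enum Token {
    Any,              // `*`: any run of characters, including none
    One,              // `?`: exactly one character
    Class(Vec<char>), // `[abc]`: one character out of the set
    Lit(char),        // anything else matches itself
}

/// Parse one pattern; an unclosed `[` is reported as an error.
fn parse(pattern: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '*' => tokens.push(Token::Any),
            '?' => tokens.push(Token::One),
            '[' => {
                let mut set = Vec::new();
                loop {
                    match chars.next() {
                        Some(']') if !set.is_empty() => break,
                        Some(ch) => set.push(ch),
                        None => return Err(format!("unclosed '[' in pattern {pattern:?}")),
                    }
                }
                tokens.push(Token::Class(set));
            }
            other => tokens.push(Token::Lit(other)),
        }
    }
    Ok(tokens)
}

/// Whole-name match: both the pattern and the name must be fully consumed.
fn matches(tokens: &[Token], name: &[char]) -> bool {
    match tokens.split_first() {
        None => name.is_empty(),
        // `*` tries every possible split point (simple backtracking).
        Some((Token::Any, rest)) => (0..=name.len()).any(|i| matches(rest, &name[i..])),
        Some((tok, rest)) => match name.split_first() {
            None => false,
            Some((c, tail)) => {
                let ok = match tok {
                    Token::One => true,
                    Token::Class(set) => set.contains(c),
                    Token::Lit(l) => l == c,
                    Token::Any => unreachable!("handled above"),
                };
                ok && matches(rest, tail)
            }
        },
    }
}

pub struct ExcludeFilter {
    patterns: Vec<Vec<Token>>,
}

impl ExcludeFilter {
    /// Parse the comma-separated value once, failing on the first bad pattern.
    pub fn from_env_value(value: &str) -> Result<Self, String> {
        let patterns = value
            .split(',')
            .map(str::trim)
            .filter(|p| !p.is_empty())
            .map(parse)
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { patterns })
    }

    /// Docker reports names with a leading `/`; strip it before matching.
    pub fn is_excluded(&self, container_name: &str) -> bool {
        let name: Vec<char> = container_name.trim_start_matches('/').chars().collect();
        self.patterns.iter().any(|p| matches(p, &name))
    }
}
```

Parsing up front is what lets a malformed entry abort startup instead of surfacing on the first scrape. The recursive `*` handling is fine for short container names but is not tuned for adversarial patterns.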

Endpoints

| Path | Description |
| --- | --- |
| /metrics | Prometheus exposition (text/plain; v=0.0.4) |
| /health | Liveness: 200 ok if the Docker daemon is reachable |
| /ready | Readiness: 200 ready if the Docker daemon is reachable |

/health is already wired into the image's HEALTHCHECK, so docker ps will show health status out of the box.

Metrics

Exporter

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| docker_exporter_up | gauge | (none) | Docker daemon reachable (1 = up, 0 = down) |
| docker_exporter_scrape_duration_seconds | gauge | (none) | Duration of the last scrape in seconds |

CPU

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| container_cpu_usage_seconds_total | counter | id, image, name | Cumulative CPU usage in seconds |
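
Because container_cpu_usage_seconds_total is cumulative, the number you usually want is its rate of change between two scrapes (PromQL's rate() does roughly this for you). A hedged sketch of the arithmetic, with a hypothetical cpu_cores_used helper and made-up sample values:

```rust
/// Average number of CPU cores used between two samples of a cumulative
/// CPU-seconds counter. Returns None when no time has elapsed.
pub fn cpu_cores_used(prev_seconds: f64, curr_seconds: f64, elapsed_seconds: f64) -> Option<f64> {
    if elapsed_seconds <= 0.0 {
        return None;
    }
    // A drop means the counter restarted from zero (for example the container
    // was recreated), so only the new value counts as progress.
    let delta = if curr_seconds >= prev_seconds {
        curr_seconds - prev_seconds
    } else {
        curr_seconds
    };
    Some(delta / elapsed_seconds)
}
```

For example, a counter that moves from 100.0 to 103.0 over a 15 s scrape interval means the container used 0.2 cores on average (20% of one core).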

Memory

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| container_memory_usage_bytes | gauge | id, image, name | Current memory usage in bytes (includes cache) |
| container_memory_working_set_bytes | gauge | id, image, name | Working set (usage − inactive_file on v2, usage − cache on v1) |
| container_memory_cache | gauge | id, image, name | Cache (inactive_file on cgroup v2, cache on v1) |
| container_memory_limit_bytes | gauge | id, image, name | Memory limit in bytes |

Network

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| container_network_receive_bytes_total | counter | id, image, name, interface | Cumulative network bytes received |
| container_network_transmit_bytes_total | counter | id, image, name, interface | Cumulative network bytes transmitted |

Disk I/O

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| container_blkio_device_usage_total | counter | id, image, name, operation | Cumulative block I/O bytes (operation = read/write) |

Block I/O is only exposed when the container has reported non-zero bytes — containers that never touch disk produce no blkio series, keeping cardinality down.

Container info

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| container_state | gauge | id, image, name, state | 1 if state == "running", 0 otherwise |
| container_start_time_seconds | gauge | id, image, name | Container creation time as Unix timestamp |
| container_last_seen | gauge | id, image, name | Last scrape time as Unix timestamp |

Sample output

What curl http://localhost:9713/metrics looks like (trimmed, container names anonymized):

# HELP docker_exporter_up Whether the Docker daemon is reachable (1 = up, 0 = down)
# TYPE docker_exporter_up gauge
docker_exporter_up 1
# HELP docker_exporter_scrape_duration_seconds Duration of the last scrape in seconds
# TYPE docker_exporter_scrape_duration_seconds gauge
docker_exporter_scrape_duration_seconds 2.18
# HELP container_cpu_usage_seconds_total Cumulative CPU usage in seconds
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="abc...",image="nginx:alpine",name="web-app"} 145.26
container_cpu_usage_seconds_total{id="def...",image="postgres:17",name="db"} 406.22
container_cpu_usage_seconds_total{id="ghi...",image="redis:7-alpine",name="cache"} 468.98
# HELP container_memory_working_set_bytes Current memory working set in bytes (usage minus cache)
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{id="abc...",image="nginx:alpine",name="web-app"} 37371904
container_memory_working_set_bytes{id="def...",image="postgres:17",name="db"} 77758464
container_memory_working_set_bytes{id="ghi...",image="redis:7-alpine",name="cache"} 5701632
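
The sample lines above follow the Prometheus text exposition format: metric name, optional label set in braces, then the value. The exporter produces them through the prometheus crate's encoder; the std-only sketch below only illustrates the shape of one line, using hypothetical format_sample and escape_label_value helpers.

```rust
/// Escape a label value for the text format: backslash, double quote
/// and newline are the three characters that need escaping.
pub fn escape_label_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Render a single sample line such as
/// `container_cpu_usage_seconds_total{id="abc",name="web-app"} 145.26`.
pub fn format_sample(metric: &str, labels: &[(&str, &str)], value: f64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{}\"", escape_label_value(v)))
        .collect();
    if rendered.is_empty() {
        // Metrics without labels (like docker_exporter_up) omit the braces.
        format!("{metric} {value}")
    } else {
        format!("{metric}{{{}}} {value}", rendered.join(","))
    }
}
```

Each line carries the absolute counter value as reported by Docker, which is why nothing needs to be remembered between scrapes.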

Prometheus

scrape_configs:
  - job_name: docker
    static_configs:
      - targets: ["docker-exporter:9713"]

A 15 s scrape interval (the value in Prometheus's example configuration; the built-in default is 1m) is fine. Avoid going below 5 s: the Docker stats API has its own internal sampling window, so scraping faster won't give you fresher data, just more load on the daemon.

Architecture

The exporter has no background loop. On each GET /metrics it lists all containers (running and stopped, so container_state reports them all), then fetches per-container stats concurrently through bollard's Docker::stats call with stream = false and a 5 s timeout each. Failed or timed-out containers are logged and skipped; they don't fail the whole scrape.
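
The fan-out-with-timeout pattern can be sketched without any async runtime. The exporter itself runs on tokio and bollard; the version below swaps in plain std threads and a channel so it stays self-contained, and collect_with_timeout plus the fetch closure are hypothetical stand-ins for the real stats call.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Fetch stats for every id in parallel. Results that fail or do not arrive
/// before the timeout are logged and skipped, never failing the whole scrape.
pub fn collect_with_timeout<T, F>(ids: &[String], timeout: Duration, fetch: F) -> Vec<(String, T)>
where
    T: Send + 'static,
    F: Fn(&str) -> Result<T, String> + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    for id in ids {
        let (tx, id, fetch) = (tx.clone(), id.clone(), fetch.clone());
        thread::spawn(move || {
            let result = fetch(&id);
            // The receiver may already be gone if we finished too late.
            let _ = tx.send((id, result));
        });
    }
    drop(tx); // only worker threads hold senders now

    // All workers start together, so one shared deadline acts as a
    // per-container timeout.
    let deadline = Instant::now() + timeout;
    let mut collected = Vec::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok((id, Ok(stats))) => collected.push((id, stats)),
            Ok((id, Err(err))) => eprintln!("skipping {id}: {err}"),
            // Either the deadline passed or every worker has reported.
            Err(_) => break,
        }
    }
    collected
}
```

One difference worth noting: a timed-out thread in this sketch keeps running in the background until its fetch returns, whereas an async timeout can drop the pending request.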

Working set is computed at exposition time from the Docker stats payload as max(0, usage − cache), where the cache term is inactive_file on cgroup v2 and cache on v1. The Prometheus output is built directly as MetricFamily protos (no Registry), so counter values can be emitted as absolute numbers without keeping state across scrapes: the process holds zero shared mutable state between requests.
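
The working-set rule is small enough to show in full. A minimal sketch, assuming a hypothetical MemorySample struct that holds the relevant fields picked out of the stats payload (the real code reads them from bollard's types):

```rust
/// Which cgroup version produced the stats payload.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CgroupVersion {
    V1,
    V2,
}

/// Memory fields taken from one Docker stats response (hypothetical struct).
pub struct MemorySample {
    pub usage: u64,
    /// Present on cgroup v1 hosts.
    pub cache: Option<u64>,
    /// Present on cgroup v2 hosts.
    pub inactive_file: Option<u64>,
}

/// The value exported as container_memory_cache.
pub fn cache_bytes(sample: &MemorySample, version: CgroupVersion) -> u64 {
    let raw = match version {
        CgroupVersion::V1 => sample.cache,
        CgroupVersion::V2 => sample.inactive_file,
    };
    // Treating a missing field as zero is an assumption of this sketch.
    raw.unwrap_or(0)
}

/// usage minus cache, floored at zero so a momentarily larger cache
/// value cannot underflow the unsigned subtraction.
pub fn working_set_bytes(sample: &MemorySample, version: CgroupVersion) -> u64 {
    sample.usage.saturating_sub(cache_bytes(sample, version))
}
```

saturating_sub is the max(0, usage − cache) floor in unsigned arithmetic: a plain subtraction would panic in debug builds or wrap in release builds if cache ever exceeded usage.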

Scrape duration therefore tracks the Docker daemon's /containers/{id}/stats latency, not the exporter — the exporter itself adds <1 ms of overhead per scrape.

Troubleshooting

docker_exporter_up 0 or the container restarts immediately. The exporter calls docker.ping() at startup and exits non-zero if it fails. Three common causes:

  • Socket not mounted: confirm -v /var/run/docker.sock:/var/run/docker.sock:ro is on the run command and the path on the host actually exists (rootless Docker uses $XDG_RUNTIME_DIR/docker.sock instead).
  • Permission denied: the container runs as UID 65532 (distroless nonroot). The Docker socket is normally owned by root:docker. Add the docker group's GID to the container (--group-add $(getent group docker | cut -d: -f3)) or run the exporter on a host where the socket GID matches.
  • Daemon isn't listening on a Unix socket (e.g. only on TCP). bollard uses connect_with_socket_defaults() — Unix socket only at the moment.

A container doesn't appear in metrics.

  • Check EXCLUDE_CONTAINERS — the match is on the container name (without the leading /), comma-separated, glob-aware (a wildcard pattern like cache-* may be catching it; plain names match exactly).
  • The container was created after the last scrape and Prometheus hasn't pulled yet — wait one scrape interval.
  • Stopped containers do appear, with container_state{...} 0 and zero stats. If you see nothing at all for a container, the Docker daemon isn't returning it from /containers/json?all=true.

Memory differs from docker stats. docker stats shows usage − cache (working set). container_memory_usage_bytes here exposes the raw usage including cache; container_memory_working_set_bytes is the apples-to-apples comparison with docker stats. Any remaining difference between the working set and docker stats is usually one scrape window of drift.

Scrape duration is high (>3 s). The bottleneck is the Docker daemon, not the exporter. Containers under heavy I/O sometimes block on stats — the 5 s per-container timeout caps individual stalls. If you consistently see 4–5 s scrapes, your daemon is overloaded; check dockerd CPU and disk pressure.

Versioning and compatibility

  • Docker API: bollard 0.20 negotiates against API ≥ 1.41 (Docker 20.10+). Tested against Docker 27.x and 28.x daemons.
  • cgroup: tested on cgroup v1 and v2. Working-set computation handles both.
  • Architectures: linux/amd64, linux/arm64. No 32-bit ARM build (no demand; PRs welcome).
  • Rust: pinned to 1.85 stable in rust-toolchain.toml.
  • Versioning: SemVer, automated by release-please. Breaking changes to metric names or labels would be a major bump — none planned.

Development

cargo build           # debug build
cargo test            # unit + integration tests (some require a local Docker daemon)
cargo clippy          # lints
cargo fmt             # format
cargo build --release # release binary in target/release/docker-exporter

Tests that need a Docker daemon skip themselves (with a stderr note) when the socket isn't reachable, so cargo test is safe to run anywhere.

Acknowledgments

Built on:

  • bollard — Docker Engine API client for Rust
  • prometheus — Prometheus client / text encoder
  • axum — HTTP server
  • tokio — async runtime

Related projects:

  • cAdvisor — Google's container monitor; broader scope, larger footprint, broken cgroup-v2 memory on ARM64 (the reason this exporter exists).
  • docker_stats_exporter — Python equivalent; same scope, but no working-set fix and a heavier base image.

Security

Found a vulnerability? Please follow the disclosure policy in SECURITY.md — do not open a public issue.

License

MIT