kpod-metrics

March 30, 2026

eBPF-based pod-level kernel metrics collector for Kubernetes. Runs as a DaemonSet, attaches eBPF programs to kernel tracepoints, and exports per-pod CPU, network, memory, syscall, disk I/O, and filesystem metrics to Prometheus.

Demo

kpod-metrics demo

Architecture

Node (DaemonSet pod)
┌─────────────────────────────────────────────────┐
│  Spring Boot (JDK 21 + Virtual Threads)         │
│                                                  │
│  MetricsCollectorService (every 30s default)    │
│  ├── eBPF Collectors ──► JNI ──► BPF Maps      │
│  │   ├── CpuSchedulingCollector                 │
│  │   ├── NetworkCollector                       │
│  │   ├── MemoryCollector                        │
│  │   ├── SyscallCollector                       │
│  │   ├── BiolatencyCollector                    │
│  │   ├── CachestatCollector                     │
│  │   ├── TcpdropCollector                       │
│  │   ├── HardirqsCollector                      │
│  │   ├── SoftirqsCollector                      │
│  │   ├── ExecsnoopCollector                     │
│  │   └── BpfMapStatsCollector                   │
│  └── Cgroup Collectors ──► /sys/fs/cgroup       │
│      ├── DiskIOCollector                        │
│      ├── InterfaceNetworkCollector              │
│      ├── FilesystemCollector                    │
│      └── MemoryCgroupCollector                  │
│                                                  │
│  PodWatcher (K8s informer, node-scoped)         │
│  CgroupResolver (cgroup ID → pod metadata)      │
│  Prometheus exporter (:9090/actuator/prometheus) │
└─────────────────────────────────────────────────┘
         │ JNI (libkpod_bpf.so)
    ┌────▼────────────────────────┐
    │ Linux Kernel                │
    │ ├── cpu_sched.bpf.o        │
    │ ├── net.bpf.o              │
    │ ├── mem.bpf.o              │
    │ └── syscall.bpf.o          │
    │                             │
    │ Tracepoints: sched_switch,  │
    │ tcp_sendmsg, oom_kill,      │
    │ sys_enter/exit, ...         │
    └─────────────────────────────┘

eBPF programs are defined in Kotlin using kotlin-ebpf-dsl, which generates both the C code for kernel-side programs and Kotlin MapReader classes for userspace deserialization. Programs are compiled once with CO-RE (Compile Once, Run Everywhere) using kernel BTF, so no per-kernel compilation is needed.
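
As a rough illustration of the CgroupResolver step in the diagram: on cgroup v2, the ID returned by bpf_get_current_cgroup_id() normally matches the inode number of the cgroup's directory, so a userspace resolver can build an ID-to-path index by walking the cgroup tree. The sketch below shows that general technique only. It is an assumption, not kpod-metrics' actual resolver, and the function names cgroup_index and cgroup_path_for_id are hypothetical.

```shell
# Print "inode<TAB>path" for every directory under a cgroup root.
# Assumption: cgroup v2, where the kernel cgroup ID equals the directory inode.
cgroup_index() {
    find "$1" -type d -exec ls -di {} + |
        awk '{ id = $1; sub(/^[ ]*[0-9]+[ ]+/, ""); print id "\t" $0 }'
}

# Print the directory path whose inode matches the given cgroup ID.
cgroup_path_for_id() {
    cgroup_index "$1" | awk -F '\t' -v id="$2" '($1 "") == (id "") { print $2 }'
}

# Example on a node (the ID is a placeholder):
#   cgroup_path_for_id /sys/fs/cgroup 12345
```

Mapping the resulting path to a pod still requires matching the pod UID or container ID embedded in the path against the pods known from the Kubernetes API.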

Metrics

All metrics are labeled with namespace, pod, container, and node.
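
The metric names in this section are Micrometer-style dotted names. On the Prometheus endpoint they appear with underscores, and counters carry a _total suffix (for example kpod_cpu_context_switches_total in the E2E section). A minimal sketch of that mapping for counters and gauges, assuming Micrometer's default Prometheus naming convention; prom_name is a hypothetical helper, and timers and distribution summaries, which gain further suffixes, are deliberately left out:

```shell
# Convert a dotted metric name to its Prometheus form.
# $1 = dotted name, $2 = type (Counter or Gauge).
prom_name() {
    name=$(printf '%s' "$1" | tr '.' '_')
    if [ "$2" = "Counter" ]; then
        # Counters get a _total suffix unless the name already ends with it.
        case $name in
            *_total) ;;
            *) name="${name}_total" ;;
        esac
    fi
    printf '%s\n' "$name"
}

prom_name kpod.cpu.context.switches Counter   # kpod_cpu_context_switches_total
prom_name kpod.fs.usage.bytes Gauge           # kpod_fs_usage_bytes
```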

eBPF Metrics

| Metric | Type | Description |
| --- | --- | --- |
| kpod.cpu.runqueue.latency | DistributionSummary | Time spent waiting in the CPU run queue (seconds) |
| kpod.cpu.context.switches | Counter | Context switch count |
| kpod.net.tcp.bytes.sent | Counter | TCP bytes sent |
| kpod.net.tcp.bytes.received | Counter | TCP bytes received |
| kpod.net.tcp.retransmits | Counter | TCP retransmissions |
| kpod.net.tcp.connections | Counter | TCP connection count |
| kpod.net.tcp.rtt | DistributionSummary | TCP round-trip time (seconds) |
| kpod.mem.oom.kills | Counter | OOM kill events |
| kpod.mem.major.page.faults | Counter | Major page faults |
| kpod.syscall.count | Counter | Syscall invocations (+ syscall label) |
| kpod.syscall.errors | Counter | Syscall errors (+ syscall label) |
| kpod.syscall.latency | DistributionSummary | Syscall latency (+ syscall label) |
| kpod.net.tcp.drops | Counter | TCP packet drops |
| kpod.disk.io.latency | DistributionSummary | Block I/O latency (seconds) |
| kpod.mem.cache.accesses | Counter | Page cache accesses |
| kpod.mem.cache.additions | Counter | Page cache additions (misses) |
| kpod.mem.cache.dirtied | Counter | Page cache dirty pages |
| kpod.mem.cache.buf.dirtied | Counter | Buffer cache dirty pages |
| kpod.irq.hw.latency | DistributionSummary | Hardware interrupt latency (seconds) |
| kpod.irq.hw.count | Counter | Hardware interrupt count |
| kpod.irq.sw.latency | DistributionSummary | Software interrupt latency (seconds) |
| kpod.proc.execs | Counter | Process exec events |
| kpod.proc.forks | Counter | Process fork events |
| kpod.proc.exits | Counter | Process exit events |

Cgroup Metrics

| Metric | Type | Extra Labels | Description |
| --- | --- | --- | --- |
| kpod.disk.read.bytes | Counter | device | Bytes read from disk |
| kpod.disk.written.bytes | Counter | device | Bytes written to disk |
| kpod.disk.reads | Counter | device | Read operation count |
| kpod.disk.writes | Counter | device | Write operation count |
| kpod.net.iface.rx.bytes | Counter | interface | Interface bytes received |
| kpod.net.iface.tx.bytes | Counter | interface | Interface bytes transmitted |
| kpod.net.iface.rx.packets | Counter | interface | Interface packets received |
| kpod.net.iface.tx.packets | Counter | interface | Interface packets transmitted |
| kpod.net.iface.rx.errors | Counter | interface | Interface receive errors |
| kpod.net.iface.tx.errors | Counter | interface | Interface transmit errors |
| kpod.net.iface.rx.drops | Counter | interface | Interface receive drops |
| kpod.net.iface.tx.drops | Counter | interface | Interface transmit drops |
| kpod.fs.capacity.bytes | Gauge | mountpoint | Filesystem total capacity |
| kpod.fs.usage.bytes | Gauge | mountpoint | Filesystem used bytes |
| kpod.fs.available.bytes | Gauge | mountpoint | Filesystem available bytes |

Memory Cgroup Metrics

| Metric | Type | Description |
| --- | --- | --- |
| kpod.mem.cgroup.usage.bytes | Gauge | Current memory usage |
| kpod.mem.cgroup.peak.bytes | Gauge | Peak memory usage |
| kpod.mem.cgroup.cache.bytes | Gauge | Page cache usage |
| kpod.mem.cgroup.swap.bytes | Gauge | Swap usage |

Pod Lifecycle Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| kpod.container.restarts | Gauge | container | Container restart count from K8s API |

Self-Monitoring Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| kpod.collection.cycle.duration | Timer | | Full collection cycle duration |
| kpod.collector.duration | Timer | collector | Per-collector execution time |
| kpod.collector.errors.total | Counter | collector | Per-collector failure count |
| kpod.collector.skipped.total | Counter | collector | Interval-based collector skips |
| kpod.collection.timeouts.total | Counter | | Collection timeout count |
| kpod.discovery.pods.total | Gauge | | Discovered pods per cycle |
| kpod.cgroup.read.errors | Counter | collector | Cgroup read failures |
| kpod.bpf.program.load.duration | Timer | program | BPF program load time at startup |

BPF Map Diagnostics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| kpod.bpf.map.entries | Gauge | map | Current entry count in BPF map |
| kpod.bpf.map.capacity | Gauge | map | Max entries per map (10240) |
| kpod.bpf.map.update.errors.total | Counter | map | BPF map update failures |
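
The entry and capacity gauges are most useful as a ratio. A small sketch of that arithmetic, using the 10240 capacity stated above; the helper name map_utilization_pct and the sample entry count are made up for illustration:

```shell
# Integer percentage of a BPF map that is in use.
# $1 = current entries (kpod.bpf.map.entries), $2 = capacity (kpod.bpf.map.capacity)
map_utilization_pct() {
    entries=$1
    capacity=$2
    echo $(( entries * 100 / capacity ))
}

map_utilization_pct 8192 10240   # prints 80
```

A sustained value near 100 alongside a rising kpod.bpf.map.update.errors.total would suggest the map is saturated.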

Profiles

Control which metrics are collected via the kpod.profile setting:

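| Collector | minimal | standard | comprehensive |
| --- | --- | --- | --- |
| CPU scheduling | yes | yes | yes |
| Network TCP (eBPF) | - | yes | yes |
| TCP drops (eBPF) | - | yes | yes |
| Memory OOM | yes | yes | yes |
| Memory page faults | - | yes | yes |
| Block I/O latency (eBPF) | - | yes | yes |
| Page cache stats (eBPF) | - | yes | yes |
| Hardware IRQ latency (eBPF) | - | - | yes |
| Software IRQ latency (eBPF) | - | - | yes |
| Process exec/fork/exit (eBPF) | - | - | yes |
| Syscall tracing | - | - | yes |
| Disk I/O (cgroup) | yes | yes | yes |
| Interface network (cgroup) | - | yes | yes |
| Filesystem (cgroup) | - | yes | yes |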

Estimated cardinality per pod: minimal ~20, standard ~39, comprehensive ~69 time series.

Prerequisites

  • Linux kernel 4.18+ (5.2+ recommended for CO-RE/BTF)
  • Cgroup v2 (default on Kubernetes 1.25+)
  • Kubernetes 1.19+

The image ships two sets of compiled BPF programs: CO-RE and legacy. At startup, kpod-metrics checks for /sys/kernel/btf/vmlinux and automatically loads the CO-RE set if the file exists, or the legacy set otherwise.
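
The startup check can be approximated from a shell to predict which set a node will use. This is a sketch of the file-existence test only. The helper name bpf_mode and its optional path argument are hypothetical, added so the check can be pointed at a different path for testing.

```shell
# Print "co-re" when a BTF file exists at the given path, otherwise "legacy".
# Defaults to the path kpod-metrics is documented to check.
bpf_mode() {
    btf_path=${1:-/sys/kernel/btf/vmlinux}
    if [ -e "$btf_path" ]; then
        echo "co-re"
    else
        echo "legacy"
    fi
}

bpf_mode
```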

Kernel Version Support

| Kernel | Mode | How it works |
| --- | --- | --- |
| 5.2+ | CO-RE (recommended) | Uses BTF for portable BPF loading. All features supported. Most distros since RHEL 8.2, Ubuntu 20.04, Debian 11. |
| 4.18–5.1 | Legacy | Uses pre-compiled BPF programs with fixed struct offsets. All features supported, but BPF objects are not relocatable across kernel builds with non-standard tracepoint layouts. |
| < 4.18 | Not supported | Missing bpf_get_current_cgroup_id() helper required for per-pod attribution. |

Limitations of legacy mode (4.18–5.1):

  • Tracepoint context struct layouts are assumed to match the stable kernel ABI. Custom or patched kernels that alter tracepoint format fields may cause incorrect data or load failures.
  • No automatic struct relocation — if a field offset changes, the BPF program must be recompiled with an updated compat_vmlinux.h.

How to verify your kernel supports kpod-metrics:

# Check kernel version
uname -r

# Check if BTF is available (5.2+ with CONFIG_DEBUG_INFO_BTF=y)
ls /sys/kernel/btf/vmlinux

# Check cgroup v2
mount | grep cgroup2

Required kernel config (typically enabled by default on modern distros):

CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_DEBUG_INFO_BTF=y  # Required only for the CO-RE path; legacy mode works without it

Quick Start

Deploy with Helm

helm repo add kpod-metrics https://pjs7678.github.io/kpod-metrics
helm repo update
helm install kpod-metrics kpod-metrics/kpod-metrics \
  --namespace kpod-metrics --create-namespace

Or from a local clone:

helm install kpod-metrics ./helm/kpod-metrics \
  --namespace kpod-metrics --create-namespace

Try It Locally (kind)

Spin up a local demo cluster with a single command:

./scripts/quickstart.sh

This creates a kind cluster, installs kpod-metrics, and sets up port-forwarding so you can immediately view metrics. Run ./scripts/quickstart.sh --cleanup to tear it down.

Verify

# Check the DaemonSet is running
kubectl -n kpod-metrics get pods

# Check metrics are being exported
kubectl -n kpod-metrics port-forward ds/kpod-metrics 9090:9090
curl http://localhost:9090/actuator/prometheus | grep kpod
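
To check a single pod rather than eyeballing the full scrape, the output can be filtered and summed per metric. The sketch below assumes the standard Prometheus text format with the namespace, pod, container, and node labels described earlier; metric_for_pod is a hypothetical helper, and the sample values are invented.

```shell
# Sum all samples of one metric for one pod, reading scrape output on stdin.
# $1 = Prometheus metric name, $2 = pod name
metric_for_pod() {
    awk -v metric="$1" -v pod="pod=\"$2\"" '
        index($0, metric "{") == 1 && index($0, pod) > 0 { sum += $NF; seen = 1 }
        END { if (seen) print sum }
    '
}

# Invented sample input; prints 50 (two containers of pod web-1).
metric_for_pod kpod_cpu_context_switches_total web-1 <<'EOF'
# TYPE kpod_cpu_context_switches_total counter
kpod_cpu_context_switches_total{namespace="default",pod="web-1",container="app",node="node-a"} 42.0
kpod_cpu_context_switches_total{namespace="default",pod="web-1",container="sidecar",node="node-a"} 8.0
kpod_cpu_context_switches_total{namespace="default",pod="db-0",container="db",node="node-a"} 7.0
EOF
```

Against a live port-forward, the curl output can be piped into the same function.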

Service Topology

Auto-discovered service dependency graph from eBPF TCP peer data — no configuration, no sidecars.

Service Topology Demo

Edges show avg + p99 latency, request rate, and auto-detected protocol. Nodes show aggregated traffic, protocol mix, and TCP drops. See docs/topology.md for details.

# View topology API
kubectl -n kpod-metrics port-forward ds/kpod-metrics 9090:9090
curl http://localhost:9090/actuator/kpodTopology | python3 -m json.tool

Grafana Dashboard

A ready-made Grafana dashboard is included with 9 rows covering all metric categories. It auto-provisions via the Grafana sidecar when deployed with Helm:

grafana:
  dashboard:
    enabled: true   # default
    label: "1"      # matches Grafana sidecar default

For non-Helm setups, import grafana/kpod-metrics-dashboard.json directly via the Grafana UI.

Prometheus Operator

For clusters running the Prometheus Operator, enable the ServiceMonitor and PrometheusRule:

serviceMonitor:
  enabled: true
  interval: 30s

prometheusRule:
  enabled: true

This provisions 18 alerting rules, including high runqueue latency, TCP retransmits/drops, syscall error rate, filesystem full, BPF map health, container restart rate, crash loop detection, memory pressure, collector skip rate, and fork/exec bomb detection. It also provisions 17 recording rules for precomputed p50/p90/p99 aggregations.

OTLP Export

Push metrics to any OpenTelemetry-compatible collector alongside Prometheus scraping:

otlp:
  enabled: true
  endpoint: "http://otel-collector:4318/v1/metrics"
  headers:
    api-key: "my-api-key"
  step: 60000   # push interval in ms

When enabled, an OtlpMeterRegistry is created that pushes all kpod metrics via OTLP/HTTP. This works in parallel with Prometheus scraping — both registries receive the same metrics.

Configuration

All settings are under the kpod.* prefix. Configure via Helm values or environment variables.

Helm Values

image:
  repository: ghcr.io/pjs7678/kpod-metrics
  tag: "1.11.0"

resources:
  requests:
    cpu: 150m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

config:
  profile: standard          # minimal | standard | comprehensive | custom
  pollInterval: 30000        # Collection interval in ms
  discovery:
    mode: informer           # informer (K8s API) or kubelet (Kubelet API)
    kubeletPollInterval: 30  # seconds, for kubelet mode
  cgroup:
    root: /sys/fs/cgroup
    procRoot: /host/proc

grafana:
  dashboard:
    enabled: true            # Deploy Grafana dashboard ConfigMap
    label: "1"               # Sidecar label selector value

serviceMonitor:
  enabled: false             # Requires Prometheus Operator CRDs
  interval: 30s
  scrapeTimeout: 10s

prometheusRule:
  enabled: false             # Requires Prometheus Operator CRDs

Key Properties

| Property | Default | Description |
| --- | --- | --- |
| kpod.profile | standard | Metric collection profile |
| kpod.poll-interval | 30000 | Base collection interval (ms) |
| kpod.collection-timeout | 20000 | Max time per collection cycle (ms) |
| kpod.initial-delay | 10000 | Delay before first collection (ms) |
| kpod.node-name | ${NODE_NAME} | Node name for metric tags |
| kpod.cluster-name | "" | Cluster name for multi-cluster tag |
| kpod.discovery.mode | informer | Pod discovery: informer or kubelet |
| kpod.filter.namespaces | [] (all) | Namespaces to include (empty = all) |
| kpod.filter.exclude-namespaces | kube-system, kube-public | Namespaces to skip |
| kpod.filter.label-selector | "" | Label selector (key=value, key!=value, key) |
| kpod.filter.include-labels | app, app.kubernetes.io/name, ... | Pod labels to include as metric tags |
| kpod.bpf.enabled | true | Enable eBPF programs |
| kpod.otlp.enabled | false | Enable OTLP metrics export |
| kpod.otlp.endpoint | http://localhost:4318/v1/metrics | OTLP collector endpoint |
| kpod.otlp.headers | {} | OTLP request headers (e.g., API keys) |
| kpod.otlp.step | 60000 | OTLP push interval (ms) |

Per-Collector Intervals

Heavy collectors can run less frequently than the base poll-interval. Set per-collector intervals in milliseconds:

config:
  collectorIntervals:
    syscall: 60000      # every 60s instead of 30s
    biolatency: 60000
    hardirqs: 60000
    softirqs: 60000

Collectors without an explicit interval run every cycle. Use config.collectors.<name>: false to disable a collector entirely.

Additional properties:

| Property | Default | Description |
| --- | --- | --- |
| kpod.bpf.program-dir | /app/bpf | Path to compiled BPF objects |
| kpod.syscall.tracked-syscalls | read, write, openat, ... | Syscalls to trace (comprehensive profile) |
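
To make the per-collector interval behaviour in this section concrete, the simulation below walks through a few collection cycles for one collector. It assumes a collector runs when at least its interval has elapsed since its last run and is skipped otherwise. That rule is an assumption for illustration, not taken from the kpod-metrics source, and simulate is a hypothetical helper.

```shell
# Print "run" or "skip" for each cycle of a single collector.
# $1 = poll interval (ms), $2 = collector interval (ms), $3 = number of cycles
simulate() {
    poll=$1
    interval=$2
    cycles=$3
    now=0
    last=$(( 0 - interval ))   # ensures the collector runs on the first cycle
    out=""
    i=0
    while [ "$i" -lt "$cycles" ]; do
        if [ $(( now - last )) -ge "$interval" ]; then
            out="$out run"
            last=$now
        else
            out="$out skip"
        fi
        now=$(( now + poll ))
        i=$(( i + 1 ))
    done
    echo "${out# }"
}

simulate 30000 60000 4   # prints: run skip run skip
```

Under this model, a 60000 ms interval with the default 30000 ms poll interval halves how often the collector runs.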

Building

The build context requires this repo and kotlin-ebpf-dsl as sibling directories:

parent/
├── kpod-metrics/
└── kotlin-ebpf-dsl/

Run the build from the parent directory:

docker build -f kpod-metrics/Dockerfile -t kpod-metrics:latest .

The 5-stage Dockerfile handles:

  1. Codegen -- Gradle runs kotlin-ebpf-dsl to generate BPF C code and Kotlin MapReader classes
  2. BPF compile -- clang compiles generated .bpf.c into both CO-RE (5.2+) and legacy (4.18+) .bpf.o objects
  3. JNI build -- CMake compiles the JNI bridge (libkpod_bpf.so) against libbpf
  4. App build -- Gradle builds the Spring Boot executable JAR
  5. Runtime -- Eclipse Temurin JRE 21, minimal image with compiled artifacts

Local Development

Requires JDK 21 and kotlin-ebpf-dsl as a sibling directory:

./gradlew generateBpf  # Generate BPF C code + Kotlin MapReader classes
./gradlew build         # Compile + test (293 tests)
./gradlew bootJar       # Build executable JAR

BPF programs and JNI library must be cross-compiled in a Linux environment (the Dockerfile handles this).

BPF Code Generation

eBPF programs are defined as Kotlin DSL in src/bpfGenerator/kotlin/:

val memProgram = ebpfProgram("mem") {
    val counterKey = struct("counter_key") { u64("cgroup_id") }
    val oomKills = hashMap("oom_kills", counterKey, BpfScalar.U64, maxEntries = 10240)

    tracepoint("oom", "mark_victim") {
        val cgId = getCurrentCgroupId()
        val ptr = mapLookupElem(oomKills, cgId)
        ifNonNull(ptr) { atomicIncrement(it) }
    }
}

Running ./gradlew generateBpf produces:

  • build/generated/bpf/*.bpf.c -- kernel-side C programs
  • build/generated/kotlin/*MapReader.kt -- type-safe map deserialization

Collectors use generated MapReader layout classes instead of manual ByteBuffer parsing:

// Before (manual)
val cgroupId = ByteBuffer.wrap(keyBytes).order(ByteOrder.LITTLE_ENDIAN).long

// After (generated)
val cgroupId = MemMapReader.CounterKeyLayout.decodeCgroupId(keyBytes)

Testing

Unit Tests

./gradlew test  # 293 tests

Integration Test (minikube)

# Full test: minikube start, Docker build, Helm deploy, stress test, cleanup
./scripts/test-local-k8s.sh

# Reuse existing minikube and skip Docker build
./scripts/test-local-k8s.sh --skip-minikube --skip-build

# Cleanup only
./scripts/test-local-k8s.sh --teardown

The integration test validates: health endpoint, Prometheus metrics, cgroup collector output, pod stability under stress (zero restarts, <5s scrape latency, <10% error rate). It also runs the E2E test (below) as a non-blocking sub-step.

E2E Test (targeted workloads)

Deploys deterministic workload pods that generate specific kernel events, then asserts that kpod-metrics captures them as Prometheus metrics with correct pod labels.

# Full run: build, deploy, test, cleanup
./e2e/e2e-test.sh --cleanup

# Skip build, use existing image
./e2e/e2e-test.sh --skip-build --cleanup

# Test against an already-running deployment
./e2e/e2e-test.sh --skip-build --skip-deploy

| Flag | Description |
| --- | --- |
| --skip-build | Skip Docker image build (use existing image) |
| --skip-deploy | Skip helm install (use existing deployment) |
| --cleanup | Full teardown after test (helm uninstall + namespace delete) |
| --wait=N | Override metrics collection wait time in seconds (default: 25) |
| --port=N | Reuse an existing port-forward on this port |

Workloads (deployed to e2e-test namespace):

| Pod | Kernel Activity | Metrics Verified |
| --- | --- | --- |
| e2e-cpu-worker | 4 busy-loop forks, 100m CPU limit | kpod_cpu_context_switches_total |
| e2e-net-server / e2e-net-client | TCP connect/send loop | kpod_net_tcp_connections_total, kpod_net_iface_rx_bytes_total |
| e2e-syscall-worker | Tight cat /proc/self/status loop | kpod_syscall_count_total |
| e2e-mem-worker | dd 10MB allocations | kpod_fs_usage_bytes |

eBPF-based assertions are warn-only (BPF programs may not load on minikube). Cgroup-based assertions are required to pass.

Scaling

Tested for clusters up to 1,000 nodes / 100,000 pods.

| Component | Capacity |
| --- | --- |
| BPF map entries | 10,240 per map (LRU, auto-evicts) |
| API server load | 1 node-scoped watch per node |
| Batch JNI | Single syscall per map read |
| Kernel memory | ~15-20 MB per node |
| Collection cycle | ~500-1000ms per node |

For large clusters, use the standard profile (not comprehensive) to keep Prometheus cardinality under 4M time series.
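
The 4M figure follows from the per-pod estimates in the Profiles section multiplied by the pod count. A quick way to reproduce the arithmetic for the 100,000-pod case (estimate_series is a hypothetical helper, and the per-pod numbers are the approximate values quoted earlier):

```shell
# Total series = pods x estimated series per pod.
estimate_series() {
    echo $(( $1 * $2 ))
}

for entry in minimal:20 standard:39 comprehensive:69; do
    profile=${entry%%:*}
    per_pod=${entry##*:}
    echo "$profile: $(estimate_series 100000 "$per_pod") series"
done
# minimal: 2000000 series
# standard: 3900000 series
# comprehensive: 6900000 series
```

Only the minimal and standard profiles stay under the 4M mark at this scale.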

Project Structure

kpod-metrics/
├── bpf/
│   ├── vmlinux.h               # Kernel BTF headers for CO-RE
│   └── compat_vmlinux.h        # Minimal header for legacy (non-CO-RE) builds
├── jni/
│   ├── bpf_bridge.c            # JNI bridge (libbpf wrapper)
│   └── CMakeLists.txt
├── src/
│   ├── bpfGenerator/kotlin/    # eBPF program definitions (Kotlin DSL)
│   │   └── .../bpf/programs/
│   │       ├── Structs.kt      # Shared BPF struct definitions
│   │       ├── MemProgram.kt
│   │       ├── CpuSchedProgram.kt
│   │       ├── NetProgram.kt
│   │       ├── SyscallProgram.kt
│   │       └── GenerateBpf.kt  # Code generation entry point
│   ├── main/kotlin/
│   │   └── com/internal/kpodmetrics/
│   │       ├── bpf/            # BpfBridge, BpfProgramManager, CgroupResolver
│   │       ├── cgroup/         # CgroupReader, CgroupPathResolver
│   │       ├── collector/      # All metric collectors (eBPF + cgroup)
│   │       ├── config/         # MetricsProperties, profiles, auto-configuration
│   │       ├── discovery/      # PodProvider, PodCgroupMapper
│   │       ├── k8s/            # PodWatcher (K8s informer)
│   │       └── model/          # DTOs
│   └── test/kotlin/            # 293 unit tests
├── grafana/
│   └── kpod-metrics-dashboard.json  # Standalone Grafana dashboard (importable via UI)
├── helm/kpod-metrics/          # Helm chart (DaemonSet, RBAC, ConfigMap)
│   ├── dashboards/
│   │   └── kpod-metrics.json   # Dashboard JSON for Helm-managed ConfigMap
│   └── templates/
│       ├── grafana-dashboard-cm.yaml   # Grafana sidecar ConfigMap
│       ├── servicemonitor.yaml         # Prometheus Operator ServiceMonitor
│       ├── prometheusrule.yaml         # Prometheus Operator alerting rules
│       └── service.yaml                # Headless Service for ServiceMonitor
├── e2e/
│   ├── e2e-test.sh             # E2E targeted workload test
│   └── workloads.yaml          # CPU, network, syscall, memory workload pods
├── scripts/
│   ├── test-local-k8s.sh       # Integration test (minikube)
│   └── stress-workload.yaml
├── Dockerfile                  # 5-stage build (codegen → BPF → JNI → app → runtime)
├── build.gradle.kts
└── settings.gradle.kts         # Composite build with kotlin-ebpf-dsl

Comparison with Similar Tools

| Feature | kpod-metrics | Pixie | Hubble | Inspektor Gadget | Kepler |
| --- | --- | --- | --- | --- | --- |
| Per-pod kernel metrics | yes | yes | network only | per-gadget | energy only |
| eBPF-based | yes | yes | yes | yes | yes |
| Zero config topology | yes | yes | yes | no | no |
| Prometheus-native export | yes | via plugin | via plugin | via plugin | yes |
| OTLP export | yes | no | no | no | no |
| Lightweight DaemonSet | ~256 Mi | ~2 Gi | ~128 Mi | ~128 Mi | ~128 Mi |
| No sidecar required | yes | yes | yes | yes | yes |
| Kernel 4.18+ support | yes (legacy mode) | no (5.2+) | no (5.2+) | no (5.2+) | yes |
| Kotlin eBPF DSL | yes | no (C/C++) | no (C) | no (C) | no (C) |
| Grafana dashboard included | yes | own UI | own UI | no | yes |
| L7 protocol detection | yes (HTTP/Redis/MySQL/Kafka/MongoDB) | yes | yes | per-gadget | no |

When to choose kpod-metrics: You want a lightweight, Prometheus-native pod metrics collector with zero-config service topology, broad kernel support, and type-safe eBPF programs defined in Kotlin instead of C.

Tech Stack

  • Runtime: Kotlin 2.1.10, Spring Boot 3.4.3, JDK 21 (virtual threads)
  • eBPF: CO-RE programs generated by kotlin-ebpf-dsl, compiled with clang, loaded via libbpf + JNI
  • Metrics: Micrometer + Prometheus registry
  • K8s: Fabric8 Kubernetes Client 7.1.0
  • Build: Gradle 8.12 (composite build), multi-stage Docker
  • CI/CD: GitHub Actions — unit tests on PRs, image publish on merge to main

CI/CD

GitHub Actions runs two workflows:

  • CI (ci.yml) — Runs unit tests on every PR and push to main. Checks out the sibling kotlin-ebpf-dsl repo for the composite Gradle build.
  • Publish (publish.yml) — On push to main, builds the Docker image and pushes to ghcr.io/pjs7678/kpod-metrics with :latest and :<sha> tags.

docker pull ghcr.io/pjs7678/kpod-metrics:latest