procpower / energy_proc

April 8, 2026

A Linux kernel module that exposes per‑PID runtime, I/O and energy‑related statistics, including Intel RAPL readings, via /proc and debugfs. It is designed for lightweight, high‑frequency sampling so that userspace can build accurate power‑ or carbon‑aware schedulers, profilers and monitors. When the module is loaded on the host, the same metrics are also available inside every Docker container: each container sees only the processes in its own cgroup, so isolation is preserved. An extensible energy model is built in: collect samples with the helper script, retrain the weights, and tailor the score to your hardware or carbon‑intensity signals. The module also runs inside virtual machines; because raw RAPL energy counters are absent there, the model falls back to workload metrics and absolute energy estimates are less precise.

Table of Contents

  1. Features
  2. Requirements
  3. Quick Start
  4. Running Inside Virtual Machines
  5. Sampling Interval & Weighting
  6. Windowing (decoupled sampling vs reporting)
  7. Reading the Metrics
  8. Data Collection Helper
  9. Model
  10. Test
  11. Troubleshooting
  12. Contributing

Features

  • Per‑process metrics: CPU time, RSS memory, disk I/O, network packets, context switch/wakeup counts and (optionally) retired instructions via PMU.
  • Energy accounting: Integrates Intel RAPL MSRs (PP0/core and PSYS) to compute energy usage in µJ.
  • Dynamic sampling interval: Run‑time adjustable via the sample_ns module parameter.
  • Weight‑based energy model: Each metric has a tunable weight (w_cpu_ns, w_mem_bytes, …); the weighted sum is exported as energy=<fixed‑point‑milliJ> in each record.
  • Container aware: /proc/energy/cgroup only shows processes in the caller’s default cgroup; /proc/energy/all and debugfs/.../all are root‑only.
  • Low overhead: Uses RCU look‑ups, an rhashtable and a per‑CPU workqueue. Sampling at 100 ms costs <0.3 % CPU on a 16‑core host.
  • VM support: Works inside virtual machines, but with reduced accuracy (see Running Inside Virtual Machines).

Thank you

This work has been made possible by the Prototype Fund, Catalyst Fund and Green Coding Solutions. Funded by the BMFTR, funding reference (Förderkennzeichen) 16IS25S22.


Requirements

Component | Minimum | Notes
Linux kernel | 5.15 | Built‑in CONFIG_PERF_EVENTS, CONFIG_KPROBES, CONFIG_TRACEPOINTS, CONFIG_TASK_IO_ACCOUNTING, CONFIG_SCHEDSTATS (optional).
CPU | Intel ≥ Sandy Bridge, or Zen / other x86_64 | RAPL energy only on Intel; AMD Zen counters work, but there is no MSR‑based energy yet.
Tool‑chain | gcc, make, bc, bash |
Headers | linux-headers-$(uname -r) |
Debug FS | /sys/kernel/debug mounted |

Ubuntu 24.04 LTS / 25.04

sudo apt update
sudo apt install build-essential git bc libtraceevent-dev libtracefs-dev linux-headers-$(uname -r)

Fedora 42

sudo dnf install @development-tools kernel-devel kernel-headers elfutils-libelf-devel git bc trace-cmd

Other distros: install the equivalent of kernel‑headers and build‑essential.

💡 Tip: When compiling against a custom kernel tree, export KDIR=/path/to/linux before running make.

Quick Start

# Clone & build
$ git clone https://github.com/green-kernel/procpower.git && cd procpower/src
$ make

# Load with default 100 ms sampling
$ sudo make install

# See overall energy since boot (root‑only)
$ sudo cat /proc/energy/all

# Watch processes in your current cgroup (non‑root allowed)
$ watch -n0.5 cat /proc/energy/cgroup

# If Docker is installed, read the same file from inside a container
$ docker run --rm -it ubuntu cat /proc/energy/cgroup

# Unload when done
$ sudo make uninstall

Running Inside Virtual Machines

Many cloud or desktop hypervisors disable the vPMU by default, which prevents the instruction counter from being active inside guests.

Check the current state on the host:

$ cat /sys/module/kvm/parameters/enable_pmu
N

If it prints N, enable it on the host (all VMs must be shut down first):

# Intel host
sudo modprobe -r kvm_intel
sudo modprobe   kvm_intel enable_pmu=1

# AMD host
sudo modprobe -r kvm_amd
sudo modprobe   kvm_amd enable_pmu=1

To make it persistent:

echo 'options kvm_intel enable_pmu=1' | sudo tee /etc/modprobe.d/kvm_pmu.conf   # or kvm_amd

Inside the guest, make sure kernel.perf_event_paranoid allows kernel counters:

sudo sysctl -w kernel.perf_event_paranoid=-1   # Debugging only!
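Both checks above can be scripted. The following is a minimal Python sketch, not part of procpower, that reads the same two settings: the KVM parameter on the host and kernel.perf_event_paranoid in the guest (through its /proc/sys path). The function names are hypothetical, and the paths can be overridden for testing.

```python
from pathlib import Path
from typing import Optional

# Host side: the KVM parameter checked with `cat` above.
KVM_ENABLE_PMU = Path("/sys/module/kvm/parameters/enable_pmu")
# Guest side: procfs view of the kernel.perf_event_paranoid sysctl.
PERF_PARANOID = Path("/proc/sys/kernel/perf_event_paranoid")


def host_vpmu_enabled(path: Path = KVM_ENABLE_PMU) -> Optional[bool]:
    """Return True/False from the KVM parameter, or None if it cannot be read."""
    try:
        value = path.read_text().strip()
    except OSError:
        # Missing file (kvm not loaded, not a KVM host) or no permission.
        return None
    return value in ("Y", "y", "1")


def guest_perf_paranoid(path: Path = PERF_PARANOID) -> Optional[int]:
    """Return the current perf_event_paranoid level, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

On the host, host_vpmu_enabled() returning False corresponds to the N output shown earlier; inside the guest, guest_perf_paranoid() returns the level that sysctl kernel.perf_event_paranoid would print.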

Sampling Interval & Weighting

Change interval at load time

sudo insmod energy_proc.ko sample_ns=25000000      # 25 ms

Change at run time

# New value in nanoseconds (5 ms):
echo 5000000 | sudo tee /sys/module/energy_proc/parameters/sample_ns

Adjust individual weights

# Double the weight of network RX packets
sudo modprobe -r energy_proc
sudo insmod energy_proc.ko w_net_rx_packets=2

Each metric is multiplied by its weight, the products are summed, and the sum is exposed as energy=INT.FRAC, where FRAC has three decimal places (kilo‑scaling).
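As a worked illustration of that rule, the sketch below computes a weighted sum with integer arithmetic only (module parameters cannot be floats) and prints it as INT.FRAC. It assumes the integer sum is expressed in thousandths, which is how we read "three decimal places (kilo‑scaling)"; the metric names and weight values are hypothetical, not the module's defaults.

```python
from typing import Mapping


def weighted_sum(metrics: Mapping[str, int], weights: Mapping[str, int]) -> int:
    """Sum of weight * metric using integers only; metrics without a weight count as 0."""
    return sum(weights.get(name, 0) * value for name, value in metrics.items())


def format_fixed3(total: int) -> str:
    """Render an integer count of thousandths as INT.FRAC with three digits."""
    sign = "-" if total < 0 else ""
    whole, frac = divmod(abs(total), 1000)
    return f"{sign}{whole}.{frac:03d}"


# Hypothetical sample: metric values and weights are illustrative only.
metrics = {"cpu_ns": 2000, "net_rx_packets": 22, "net_tx_packets": 17}
weights = {"cpu_ns": 3, "net_rx_packets": 250, "net_tx_packets": 100}

total = weighted_sum(metrics, weights)   # 6000 + 5500 + 1700 = 13200
print(f"energy={format_fixed3(total)}")  # energy=13.200
```

Keeping everything in integers mirrors the constraint that kernel code avoids floating point; the decimal point is only inserted when the value is printed.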

Windowing (decoupled sampling vs reporting)

The module runs two independent loops:

  • Sampler (sample_ns): collects and accumulates per-PID metrics at a high frequency.
  • Window publisher (window_ns): periodically snapshots the current accumulated values into window_* fields and timestamps them. These snapshots are what /proc/energy/cgroup exposes.

This design lets you keep very fast internal sampling while publishing slower, steadier and more secure values to userspace.

Change window interval at load time

sudo insmod energy_proc.ko window_ns=1000000000   # 1 s
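To make the two loops concrete, here is a small Python simulation driven by a virtual clock rather than real timers. It is not the module's implementation (which runs in the kernel); the PidStats class, its field names and the tie‑break rule (a sample due at the same instant as a window is taken first) are assumptions chosen for illustration. Readers only see the window_* snapshot, so between publications they get the previous value even though the sampler keeps accumulating.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PidStats:
    cpu_ns: int = 0          # accumulated by the sampler every sample_ns
    window_cpu_ns: int = 0   # snapshot exposed to readers
    window_ts_ns: int = 0    # virtual time of the last snapshot


def simulate(sample_ns: int, window_ns: int, duration_ns: int,
             cpu_per_sample: int) -> Tuple[PidStats, List[Tuple[int, int]]]:
    """Run both loops on a virtual clock up to and including duration_ns."""
    stats = PidStats()
    published: List[Tuple[int, int]] = []
    next_sample, next_window = sample_ns, window_ns
    while True:
        now = min(next_sample, next_window)
        if now > duration_ns:
            break
        if now == next_sample:
            # Sampler: accumulate at high frequency.
            stats.cpu_ns += cpu_per_sample
            next_sample += sample_ns
        if now == next_window:
            # Window publisher: copy the accumulated value and timestamp it.
            stats.window_cpu_ns = stats.cpu_ns
            stats.window_ts_ns = now
            published.append((now, stats.window_cpu_ns))
            next_window += window_ns
    return stats, published


# 25 ms sampling, 100 ms windows, 300 ms of virtual time.
stats, published = simulate(25_000_000, 100_000_000, 300_000_000, 1000)
print(published)  # [(100000000, 4000), (200000000, 8000), (300000000, 12000)]
```

Four samples land in each window, but a reader polling at any moment sees only the value published at the last window boundary.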

Reading the Metrics

  • /proc/energy/cgroup – Metrics for tasks in the caller’s default cgroup (readable by unprivileged users).
  • /proc/energy/all – Metrics for all processes; requires CAP_SYS_ADMIN (root) so that containers cannot see processes outside their own cgroup.
  • /sys/kernel/debug/energy/all – Debugfs variant with extra module state; root only.
  • /sys/kernel/debug/energy/sys – System‑wide totals only, for faster output (needed only for model training).

A single line looks like:

pid=3187 energy=12.457 alive=1 kernel=0 cpu_ns=285931257 mem=10485760 instructions=8952846 wakeups=14 diski=0 disko=0 rx=22 tx=17 comm=python3

Field | Meaning
pid | Process ID
energy | Weighted score (kilo‑scaled)
alive | 1 if pid_alive() at time of sample
kernel | 1 for kernel threads / kthreads
cpu_ns | Cumulative user+sys CPU time (ns)
mem | Resident Set Size (bytes)
instructions | Retired instructions (if PMU available)
wakeups | Scheduler wake‑ups since process start
diski, disko | Bytes read / written by the task
rx, tx | Network packets received / transmitted
comm | Task name (TASK_COMM_LEN)

Note that pid 0 represents the whole system, not the idle process.
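A sketch of parsing these records in Python, for example in a monitor built on /proc/energy/cgroup. It assumes every field except comm is key=value with no spaces, that comm is always last (task names may contain spaces), and that energy uses the INT.FRAC form; parse_record, deltas and cpu_utilization are hypothetical helpers. Because cpu_ns and wakeups are cumulative, rates come from differencing two reads of the same task.

```python
from decimal import Decimal
from typing import Dict, Iterable, Union

Value = Union[int, Decimal, str]


def parse_record(line: str) -> Dict[str, Value]:
    """Parse one record line into a dict; comm is kept verbatim."""
    head, sep, comm = line.rstrip("\n").partition(" comm=")
    record: Dict[str, Value] = {}
    for token in head.split():
        key, _, value = token.partition("=")
        # energy is fixed-point text such as 12.457; Decimal keeps it exact.
        record[key] = Decimal(value) if key == "energy" else int(value)
    if sep:
        record["comm"] = comm
    return record


def deltas(prev: Dict[str, Value], curr: Dict[str, Value],
           fields: Iterable[str] = ("cpu_ns", "wakeups")) -> Dict[str, int]:
    """Difference cumulative counters between two reads of the same task."""
    # PIDs can be reused, so also require the same task name.
    if prev["pid"] != curr["pid"] or prev.get("comm") != curr.get("comm"):
        raise ValueError("records do not describe the same task")
    return {f: int(curr[f]) - int(prev[f]) for f in fields}


def cpu_utilization(delta_cpu_ns: int, interval_ns: int) -> float:
    """Fraction of one CPU used during the interval (can exceed 1.0)."""
    return delta_cpu_ns / interval_ns
```

Reading the file twice, one second apart, and passing the matching records to deltas() gives CPU nanoseconds per second, which cpu_utilization() turns into a fraction of one core.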

Data Collection Helper

The repository ships with energy-logger.sh, which periodically dumps /proc/energy/all to disk for offline analysis (e.g. weight regression).

  1. Pick a longer sampling interval to keep the file size manageable:
    echo 100000000 | sudo tee /sys/module/energy_proc/parameters/sample_ns   # 100 ms
    
  2. Start the collector:
    sudo ./energy-logger.sh
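To show what such a collector involves, here is a rough Python equivalent. It is not energy-logger.sh: the log layout (a # ts_ns= header before each dump) and the log_snapshots name are assumptions, and model.py may expect the format the real script writes, so use the shipped script for training data.

```python
import time
from pathlib import Path


def log_snapshots(src: Path, dst: Path, interval_s: float, count: int) -> None:
    """Append `count` timestamped dumps of `src` to `dst`, `interval_s` apart."""
    with dst.open("a") as out:
        for i in range(count):
            ts_ns = time.time_ns()
            data = src.read_text()
            out.write(f"# ts_ns={ts_ns}\n")
            out.write(data if data.endswith("\n") else data + "\n")
            out.flush()  # keep the log usable if the collector is killed
            if i + 1 < count:
                time.sleep(interval_s)
```

For example, log_snapshots(Path("/proc/energy/all"), Path("energy-sketch.log"), 0.1, 600) would record about one minute at 100 ms intervals (the output filename is hypothetical); it has to run as root because /proc/energy/all is root‑only.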
    

Model

We use a linear model to calculate the energy score for each process. You should train this model yourself:

  1. Change into the src directory.
  2. Run the data collection helper:
    echo 100000000 | sudo tee /sys/module/energy_proc/parameters/sample_ns
    sudo ./energy-logger.sh
  3. python3 -m venv venv
  4. source venv/bin/activate
  5. pip install -r requirements.txt
  6. python3 model.py tmp/energy-XXXX.log

You can then load the trained weights into the kernel module:

echo 1231232 | sudo tee /sys/module/energy_proc/parameters/PARAM   # PARAM is the weight name, e.g. w_cpu_ns

Module parameters cannot be floats; weights are fixed‑point values with three decimal places.
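Weights coming out of a regression are floats, so they have to be converted before being written. Assuming "three decimal places" means the value is scaled by 1000 (so 1.234 becomes 1234), a minimal Python sketch of that conversion, plus generating echo commands like the one above, could look like this. The parameter names in the usage line come from the Features list; the values are made up.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, List, Union

SCALE = 1000  # assumption: three implied decimal places

Weight = Union[float, str, int]


def to_fixed3(weight: Weight) -> int:
    """Convert a weight to an integer in thousandths, rounding half up."""
    # str() first so that e.g. 0.1 is read as "0.1", not its binary expansion.
    scaled = Decimal(str(weight)) * SCALE
    return int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def sysfs_commands(weights: Dict[str, Weight]) -> List[str]:
    """Build one `echo ... | sudo tee` line per module weight parameter."""
    return [
        f"echo {to_fixed3(w)} | sudo tee /sys/module/energy_proc/parameters/{name}"
        for name, w in weights.items()
    ]


for cmd in sysfs_commands({"w_cpu_ns": 1.234, "w_mem_bytes": 0.0005}):
    print(cmd)
```

Rounding through Decimal avoids surprises such as int(0.29 * 1000) yielding 289 because of binary floating point.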

Test

The repository includes a test script that builds the kernel module and checks that everything works. Run it on the host:

sudo ./test.sh

Troubleshooting

Symptom | Fix
insmod: ERROR: could not insert module: Operation not permitted | Ensure Secure Boot allows unsigned modules, or sign the module.
rapl MSRs not available – energy metrics off | The CPU is AMD or RAPL is disabled in the BIOS. The energy score is still computed from the other metrics.
perf_event_open: Permission denied inside VM | Enable PMU passthrough as described above.
Empty lines or zeros in /proc/energy/* | Is the sampling interval set too high? Confirm that the iterations counter in debugfs increases.

Contributing

Patches and 🍻 are welcome!

You can contribute here on GitHub or send a message to didi@ribalba.de.