Interpreting duct's resource statistics
May 19, 2026
duct records resource usage in two places:
- usage.jsonl: one JSON record per report interval (default: every 60 seconds), capturing per-process and session-total stats aggregated over that window.
- execution_summary (printed at exit and stored in info.json): a whole-run summary of peak and average values across the full execution.
The numbers in both come from the same sampling pipeline.
This document explains what those numbers actually measure, how con-duct plot renders them, and where they're trustworthy vs misleading.
How duct samples and aggregates
Duct polls the monitored process tree on two independent intervals:
- --sample-interval (default 1.0s): how often duct reads per-pid stats via ps -s <session_id>. Each read is a sample.
- --report-interval (default 60.0s): how often duct writes an aggregated record to usage.jsonl. Each record summarizes all the samples taken during that report window.
Aggregation within a report window uses max reduction:
- For each per-pid metric, the reported value is the maximum observed across all samples of that pid in the window.
- For each session-total metric (totals.rss, totals.pcpu, etc.), the reported value is the maximum observed across all samples' totals in the window.
Consequences worth knowing:
- Short spikes between samples are not recorded. A process that briefly allocates 10GB and frees it within a single sample interval is invisible to duct.
- Per-pid and session-total peaks may come from different sample moments. Per-pid max-reduction and total max-reduction are independent. The same record can have stats[A].rss = X (A's peak from one sub-sample) and totals.rss = Y (the peak simultaneous total from another sub-sample).
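The independence of the two reductions can be illustrated with a minimal sketch. The sample values and the `aggregate_window` helper are hypothetical and are not duct's actual code; the sketch only assumes that each sub-sample maps a pid to a metric value.

```python
from collections import defaultdict

# Hypothetical sub-samples inside one report window: {pid: rss}.
samples = [
    {"A": 500, "B": 100},
    {"A": 200, "B": 450},
    {"A": 300, "B": 300},
]


def aggregate_window(samples):
    """Max-reduce per-pid values and session totals independently."""
    per_pid_max = defaultdict(int)
    total_max = 0
    for sample in samples:
        for pid, value in sample.items():
            per_pid_max[pid] = max(per_pid_max[pid], value)
        # The total is computed per sub-sample, then max-reduced.
        total_max = max(total_max, sum(sample.values()))
    return dict(per_pid_max), total_max


per_pid, total = aggregate_window(samples)
print(per_pid)                 # {'A': 500, 'B': 450}
print(total)                   # 650
print(sum(per_pid.values()))   # 950
```

Here A peaks in the first sub-sample and B in the second, so the sum of per-pid peaks (950) exceeds the largest total that ever existed at one moment (650).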
CPU — pcpu
What it measures
On Linux, ps -o pcpu is computed per process as:
$ \text{pcpu} = ((\text{utime} + \text{stime}) / (\text{now} - \text{process\_start\_time})) \times 100 $
- utime + stime is cumulative CPU time consumed by the process since it started (kernel ticks from /proc/[pid]/stat).
- now - process_start_time is wall-clock time elapsed since the process started.
So pcpu is the fraction of wall time the process has spent on CPU, averaged from birth until the moment of sampling.
It is a lifetime average, not an instantaneous rate.
This differs from top(1), which shows CPU usage averaged over its refresh interval.
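As a rough illustration of the formula, the sketch below computes the lifetime average from cumulative CPU seconds and elapsed wall seconds. The function name `lifetime_pcpu` is made up for this example; ps itself works from kernel ticks rather than float seconds.

```python
def lifetime_pcpu(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Percent of wall time spent on CPU, averaged since process start."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


# Four busy threads for 30s accumulate 120s of CPU time.
print(lifetime_pcpu(120.0, 30.0))  # 400.0
# The same process after a further 30s of idling.
print(lifetime_pcpu(120.0, 60.0))  # 200.0
```

The second call shows the lifetime-average behaviour: the process is idle at that moment, yet the reported figure is still 200%.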
etime is integer seconds
ps -o etime reports elapsed time as an integer count of seconds (formatted [[DD-]HH:]MM:SS).
This has consequences for short-lived and freshly-spawned pids:
- During a pid's first second of life, etime reads as 00:00. ps's pcpu calculation divides by this etime, and the result during sub-second life is unstable.
- A pid sampled at sub-second age that has accumulated meaningful CPU work across multiple threads can yield extreme pcpu readings. Issue #399 included a single pid reporting 5347% pcpu at etime=3 on a 20-core machine, which is physically impossible: it came from a sub-second-young sub-sample where ps's calculation was racy.
This is why sample intervals shorter than 1.0s behave erratically.
Consecutive samples of the same pid often see the same integer etime, so derived measurements (like the --cpu ps-cpu-timepoint view, below) discard those points because Δetime = 0.
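To make the integer granularity concrete, here is a small sketch that converts the [[DD-]HH:]MM:SS format into seconds. The `parse_etime` helper is hypothetical and is not taken from duct's source.

```python
import re

# Optional days and hours, mandatory minutes and seconds.
_ETIME_RE = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def parse_etime(etime: str) -> int:
    """Convert a ps etime string ([[DD-]HH:]MM:SS) to whole seconds."""
    match = _ETIME_RE.match(etime.strip())
    if match is None:
        raise ValueError(f"unrecognized etime: {etime!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(parse_etime("00:00"))       # 0
print(parse_etime("02:03:04"))    # 7384
print(parse_etime("3-02:03:04"))  # 266584
```

Two samples taken 0.5s apart will frequently parse to the same integer, which is the Δetime = 0 case described above.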
Three scenarios to build intuition
Scenario A: long-running steady-state process at 100% CPU
t = 1s: cumulative CPU = 1.0s, elapsed = 1s → pcpu = 100%
t = 10s: cumulative CPU = 10.0s, elapsed = 10s → pcpu = 100%
t = 60s: cumulative CPU = 60.0s, elapsed = 60s → pcpu = 100%
For steady-state workloads, lifetime-average converges to instantaneous. This is why the mental model "pcpu = current CPU usage" works most of the time.
Scenario B: brief burst, then idle
A process that does 1 second of 100% CPU work, then sits idle:
t = 1s: cumulative CPU = 1.0s, elapsed = 1s → pcpu = 100%
t = 2s: cumulative CPU = 1.0s, elapsed = 2s → pcpu = 50%
t = 10s: cumulative CPU = 1.0s, elapsed = 10s → pcpu = 10%
t = 100s: cumulative CPU = 1.0s, elapsed = 100s → pcpu = 1%
After the burst, pcpu decays toward 0.
The process "remembers" past CPU work and slowly forgets as its elapsed time grows.
This is counterintuitive if you expected a real-time number.
Scenario C: the pathological summation case
Many short-lived, multi-threaded native child processes, as happens under tox when pip compiles C extensions:
Child 1 runs for 200ms on 4 cores, observed by sample at t=150ms:
cumulative CPU = 600ms, elapsed = 150ms → pcpu reported = 400%
…30 such children observed during a single sample…
sum across children at sample time = 30 × 400% = 12,000%
system physical ceiling (20 cores) = 2,000%
Each individual child's number is correct for what ps is answering ("fraction of wall time spent on CPU, averaged from start").
The problem is that summing lifetime-averages across processes that took turns on the CPU produces a total claiming work the system didn't have the cores to do.
The children ran sequentially, but the sum over the report window treats the spikes as simultaneous.
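The arithmetic of Scenario C can be reproduced in a few lines. The child records below are invented to match the scenario's numbers; nothing here reads real ps output.

```python
N_CORES = 20

# 30 hypothetical children, each observed with 600ms of CPU at 150ms of age.
children = [{"cpu_ms": 600, "elapsed_ms": 150} for _ in range(30)]

per_child_pcpu = [100 * c["cpu_ms"] / c["elapsed_ms"] for c in children]
summed_pcpu = sum(per_child_pcpu)
physical_ceiling = N_CORES * 100

print(per_child_pcpu[0])   # 400.0
print(summed_pcpu)         # 12000.0
print(physical_ceiling)    # 2000
```

Each per-child figure is a valid lifetime average; only the sum exceeds what 20 cores could deliver.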
When pcpu is reliable
| Workload shape | pcpu reliability |
|---|---|
| Single long-running steady-state process | Accurate |
| Few long-running processes at steady state | Accurate |
| Bursty processes that are long-running | Accurate at the average; misses burst structure |
| Many short-lived (few second) child processes | Unreliable: can inflate dramatically when summed |
| Multi-threaded native code bursts | Per-process pcpu correct; summed totals may overshoot |
Memory — rss and pmem
ps -o rss reports per-process resident set size: physical memory currently mapped into the process's address space, in kilobytes.
This counts:
- Private pages the process has allocated and touched.
- Shared pages (libraries, copy-on-write memory after fork()) that the process has mapped, counted independently in each process that maps them.
ps -o pmem is derived: rss divided by total system RAM, expressed as a percentage.
It inherits every property of rss and adds a host-dependent denominator.
The shared-page issue
When multiple processes share the same physical page, that page appears in each process's RSS, but the physical page exists only once.
Example: a Python parent process with 100MB RSS forks 10 child workers. Immediately after fork:
Parent RSS: 100MB
Child 1 RSS: 100MB
…
Child 10 RSS: 100MB
Sum of RSS across processes: 1100MB
Actual physical memory used: ~100MB (all shared with parent)
As children write to their copy of each page, copy-on-write triggers and the page becomes private.
At that point physical use genuinely grows.
So sum(rss) is a loose upper bound on actual usage: never less than true usage, often much more.
For a duct-monitored Python test suite with pytest-xdist spawning 8 workers, expect sum(rss) to overstate physical memory by 3-5×.
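The gap between sum(rss) and physical use can be sketched with a deliberately simplified copy-on-write model. It assumes every child maps all of the parent's pages and that a written page becomes a private copy without changing the child's RSS; real workloads also allocate new memory, which this ignores. The `fork_memory_model` function and the `written_fraction` parameter are hypothetical.

```python
def fork_memory_model(parent_mb, n_children, written_fraction):
    """Return (sum_rss_mb, physical_mb) under a simplified COW model.

    written_fraction is the share of inherited pages each child has
    written to, and therefore privately copied.
    """
    if not 0.0 <= written_fraction <= 1.0:
        raise ValueError("written_fraction must be between 0 and 1")
    # Every process counts all pages it maps, shared or not.
    sum_rss = parent_mb * (1 + n_children)
    # Physically: the parent's pages once, plus each child's private copies.
    physical = parent_mb + n_children * parent_mb * written_fraction
    return sum_rss, physical


# Immediately after fork, nothing has been written yet.
print(fork_memory_model(100, 10, 0.0))  # (1100, 100.0)

# After each child has written to 20% of its inherited pages.
sum_rss, physical = fork_memory_model(100, 10, 0.2)
print(round(sum_rss / physical, 2))
```

In this model the overstatement shrinks as children dirty more pages, and disappears only when every child has written to every inherited page.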
What con-duct plot renders
con-duct plot <usage> renders, per report-interval record:
- Per-pid traces: one faint dotted line per pid. CPU is on the primary y-axis. RSS is on a secondary axis (twinx) so the two scales don't fight. Color encodes metric, not pid identity.
- Envelope lines: summarize the per-pid cloud at each timestamp (one solid lower bound + one dashed upper bound).
- Optional host-memory annotation: when info.json is alongside the usage file, the rss legend label includes total host RAM (e.g. rss (host: 256.0GB)). Useful for SLURM contexts. Without info.json, the label is plain rss.
--cpu mode flag
duct stores pcpu (lifetime average from ps) per pid in usage.jsonl.
The plot can render this two ways:
- --cpu ps-pcpu (default): plot the raw lifetime ratio untransformed. "Lossless" view: every point on the chart is an unaltered ps reading. Useful when you want to see exactly what the sampler captured.
- --cpu ps-cpu-timepoint: at plot time, derive a per-interval estimate from consecutive (pcpu, etime) pairs: (curr_pcpu × curr_etime − prev_pcpu × prev_etime) / Δetime. This inverts ps's lifetime-average formula to extract an approximate instantaneous CPU rate. Motivated by Scenario C: lifetime averages of short-lived bursty processes overstate "current" usage by orders of magnitude.
Both modes have caveats:
- The raw ps-pcpu mode shows what ps reported, including lifetime-average inflation. A pid that ran on 4 cores for 150ms and went idle peaks at 400% in the first report interval that observed it, then decays toward its true average as etime grows in subsequent intervals.
- The derived ps-cpu-timepoint mode is approximate (delta math on max-reduced samples). It discards each pid's first observation (no prior point to delta against), so short-lived pids that appear in only one record drop out entirely. CPU bursts from those pids are not visible in the timepoint view, but remain visible in the ps-pcpu view via totals.pcpu.
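The timepoint derivation works because pcpu × etime is proportional to cumulative CPU time, so the difference between two observations is the CPU time spent between them. The sketch below applies the formula quoted above to one pid's observations; `cpu_timepoints` is a hypothetical name and this is not con-duct's actual implementation.

```python
def cpu_timepoints(observations):
    """Derive per-interval CPU rates from (pcpu, etime_seconds) pairs.

    observations must be in time order for a single pid. The first
    observation has no predecessor, so it yields no output point.
    """
    points = []
    for (prev_pcpu, prev_etime), (curr_pcpu, curr_etime) in zip(
        observations, observations[1:]
    ):
        delta = curr_etime - prev_etime
        if delta <= 0:
            continue  # same integer etime: nothing to divide by
        rate = (curr_pcpu * curr_etime - prev_pcpu * prev_etime) / delta
        points.append((curr_etime, rate))
    return points


# Burst then idle: 1s of CPU in the first second, nothing afterwards.
print(cpu_timepoints([(100.0, 1), (50.0, 2), (10.0, 10)]))
# [(2, 0.0), (10, 0.0)]

# Steady 100% CPU for the whole lifetime.
print(cpu_timepoints([(100.0, 1), (100.0, 10)]))
# [(10, 100.0)]

# A pid seen only once drops out entirely.
print(cpu_timepoints([(400.0, 1)]))
# []
```

The first example also shows the cost of discarding the first observation: the burst itself never appears in the derived series.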
Envelope semantics
The plot draws two envelopes over the per-pid trace cloud:
- Lower bound (solid): max-across-pids at each timestamp. Reads as "at least this much was in use."
- Upper bound (dashed): depends on what's being plotted.
  - RSS, and CPU in ps-pcpu mode: totals.* from the record. duct computes this as the peak simultaneous total observed across the report window's sub-samples.
  - CPU in ps-cpu-timepoint mode: sum-across-pids of the derived (instantaneous) values at each timestamp. Used here because totals.pcpu is a peak of lifetime averages and doesn't share units with the derived instantaneous values.
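A minimal sketch of the two envelope rules follows. The record layout (a "stats" mapping keyed by pid plus a "totals" mapping) is an assumption based on the stats[A].rss and totals.rss notation used earlier, and both function names are hypothetical.

```python
def raw_envelope(record, metric):
    """Envelope for RSS, or for CPU in ps-pcpu mode."""
    lower = max((s[metric] for s in record["stats"].values()), default=0)
    upper = record["totals"][metric]  # peak simultaneous total from duct
    return lower, upper


def timepoint_envelope(derived_by_pid):
    """Envelope for CPU in ps-cpu-timepoint mode.

    derived_by_pid maps pid -> derived instantaneous rate at one timestamp.
    """
    values = list(derived_by_pid.values())
    lower = max(values, default=0)
    upper = sum(values)  # totals.pcpu is not comparable, so sum instead
    return lower, upper


# Assumed record shape, with made-up numbers.
record = {
    "stats": {"101": {"rss": 500}, "102": {"rss": 450}},
    "totals": {"rss": 650},
}
print(raw_envelope(record, "rss"))                       # (500, 650)
print(timepoint_envelope({"101": 80.0, "102": 20.0}))    # (80.0, 100.0)
```

In both cases the lower bound is the busiest single pid, and the two modes differ only in where the upper bound comes from.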
Common questions
Why is the raw pcpu line in ps-pcpu mode so much higher than ncores × 100%?
Two compounding reasons, either of which can do it alone:
- Single-pid extremes from ps. For pids sampled at sub-second age, ps's cputime / etime calculation is unstable (see "etime is integer seconds"). Individual pids can briefly report thousands of percent.
- Summed lifetime-averages across many short-lived pids. Even if each pid's pcpu is finite, summing lifetime averages across processes that took turns on the cores produces a total claiming work the cores couldn't have done. See Scenario C. Most common in workloads that spawn many short-lived child processes involving native/multi-threaded code: pip install compiling C extensions, make -j, tox, any CI/build workflow.
Why is the ps-cpu-timepoint line lower than the ps-pcpu line?
ps-pcpu plots the lifetime average from ps.
A burst captured early in a pid's life pulls the reported pcpu high, and that pid's trace decays slowly as etime grows.
ps-cpu-timepoint instead estimates an instantaneous rate per report interval, so a burst contributes only to the interval that contained it.
Example: a pid that did 600ms of CPU on 4 cores in its first 150ms and was idle thereafter.
ps-pcpu shows ~400% in the first report interval and a decaying trace in subsequent intervals (until the pid dies or the trace falls off the chart).
ps-cpu-timepoint shows ~400% only in the burst interval and ~0% thereafter.
The timepoint view is more "honest" about current usage but loses the cumulative-effort information that ps-pcpu carries.
Why does totals.* not equal sum(per-pid max) in a record?
duct max-reduces per-pid stats and session totals independently within a report window.
A pid's reported rss is its max across sub-samples in the window; totals.rss is the max of the simultaneous total across those sub-samples.
The per-pid peaks may have happened at different moments, so summing them counts moments that never coexisted.
totals.* is the actual peak simultaneous footprint and is the right number for sizing.
My RSS chart grew a lot when I added more worker processes. Did memory usage really grow proportionally?
Probably not.
If the workers are forked children of a common parent, each child's RSS counts the shared pages it inherited.
Per-pid traces and their max envelope grow roughly linearly with child count even when physical memory grows much less.
The dashed totals.rss upper bound is closer to actual physical use, but still over-counts shared libraries linked by independent processes.
See The shared-page issue.