lauka
December 30, 2025
A minimal, predictable CLI to record Apple‑Silicon PMU counters and compare multiple commands.
Quick start
- Requires Zig 0.15.1
- Build with zig build --release=fast
- Running lauka record requires sudo (PMU counters need elevated access).
# Record a single command
lauka [options] -- <command>
# Compare two or more commands (aggregated per command)
lauka [options] -- '<cmd1>' '<cmd2>' ['<cmd3>' ...]
# Explicit subcommands (optional)
lauka record [options] -- '<cmd1>' '<cmd2>' ['<cmd3>' ...]
# List counters
lauka counters
lauka counters --details
Synopsis
Usage:
lauka [options] -- <command>
lauka [options] -- '<command_1>' '<command_2>' ['<command_3>' ...]
lauka <subcommand> [options]
Subcommands:
record Run one or more commands; aggregated stats per command (deltas vs first when multiple)
counters List available counters (names; optional descriptions & compatibility flags)
version Show version
help Show help for any command
Global options
- -n, --runs <N>: number of measured runs (default: 3, minimum: 3)
- --warmup <N>: warmup runs before measuring (default: 0)
- -m, --measurements <list>: comma-separated counters
- --color <when>: auto (default), never, ansi
- -h, --help: show help
Notes
- If any child exits non-zero, lauka forwards that exit code (for multiple commands, the first failing code).
- Everything after -- is passed verbatim to the child command list.
Commands
record
lauka [options] -- '<cmd1>' ['<cmd2>' ['<cmd3>' ...]]
Behavior
- Run each command --warmup times (ignored in stats), then --runs measured times.
- Compute per-metric: mean, stddev, min, max, and outliers.
mean,stddev,min,max, andoutliers. - Default execution is sequential: run all iterations of the first command, then the second, etc.
- Output shows a separate aggregated table for each command, and for commands after the first, a delta column vs the first (baseline) for each metric.
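The aggregation described above can be approximated outside the tool. The following is a minimal sketch, not lauka's implementation: `summarize` and `delta_pct` are hypothetical helper names, the standard deviation is assumed to be the sample form (n - 1 denominator), and the delta is assumed to be the relative change of the mean versus the baseline mean. Outlier detection and the uncertainty on the delta are left out because the README does not say how they are computed.

```shell
# Hypothetical helpers for illustration; they are not part of lauka.

# summarize: read one sample per line on stdin, print "mean stddev min max".
summarize() {
  awk '
    {
      x = $1 + 0
      if (NR == 1 || x < min) min = x
      if (NR == 1 || x > max) max = x
      # Welford update: numerically stable running mean and squared deviations.
      d = x - mean
      mean += d / NR
      m2 += d * (x - mean)
    }
    END {
      if (NR == 0) exit 1
      # Sample variance (n - 1 denominator) is an assumption here.
      var = (NR > 1) ? m2 / (NR - 1) : 0
      printf "%.2f %.2f %.2f %.2f\n", mean, sqrt(var), min, max
    }'
}

# delta_pct BASE_MEAN NEW_MEAN: relative change vs the baseline, in percent.
delta_pct() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) / b * 100 }'
}
```

For instance, `delta_pct 591 130` prints -78.0, which agrees with the wall_time delta in the two-command example output in this README (591ms baseline, 130ms for the second command).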
Examples
# Two commands, sequential (default)
lauka -n 9 -m core_active_cycle,inst_all,branch_mispred_nonspec,l1d_cache_miss_ld_nonspec -- \
'./old --opt=0' \
'./new --opt=1'
# Three commands
a="prog --size 1e6 --mode=A"; b="prog --size 1e6 --mode=B"; c="prog --size 1e6 --mode=C"
lauka -n 7 -m l1d_cache_miss_ld_nonspec,branch_mispred_nonspec -- "$a" "$b" "$c"
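The quoting in these examples matters because each command has to reach lauka as a single argument. The sketch below illustrates this with `count_args`, a hypothetical stand-in (not part of lauka) that only reports how many arguments it received; it assumes a POSIX sh or bash shell, where unquoted variables undergo word splitting.

```shell
# count_args is a hypothetical helper for illustration: it prints the number
# of arguments it received, i.e. how many items a tool would see after --.
count_args() {
  echo "$#"
}

a="prog --size 1e6 --mode=A"
b="prog --size 1e6 --mode=B"

# Quoted: each variable stays one argument, so this reports 2.
count_args "$a" "$b"

# Unquoted: the shell splits on spaces, so this reports 8.
count_args $a $b
```

With the unquoted form, the command list would no longer be two commands but eight separate words, which is the situation the quoting error under Common errors refers to.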
counters
lauka counters [--details] [--no-headers]
Behavior
- Default: print names only.
- With --details: include description and incompatibility flags for each counter.
Options
- -d, --details: show name, description, and incompatibility flags
- --no-headers: hide headers for --details output
Incompatibility flags (shown only with --details)
- incompat: none - no known constraint
- incompat: pair - has pairwise incompatibility constraints
- incompat: quad - has quad incompatibility constraints
Examples
lauka counters
lauka counters --details
Output
Default — aggregated tables
- One block per command.
- Columns: measurement, mean ± σ, min … max, outliers, and delta (compare mode only, for commands after the baseline).
- Colors follow --color: auto, never, ansi.
Example (two commands)
Benchmark 1 (9 runs): ./build-old
  measurement                mean ± σ          min … max           outliers
  wall_time                  591ms ± 7.6ms     583ms … 605ms       0 (0%)
  peak_rss                   137MB ± 0.3MB     136.6MB … 137.4MB   0 (0%)
  core_active_cycle          2.51G ± 22.1M     2.48G … 2.54G       0 (0%)
  inst_all                   3.62G ± 23.9M     3.53G … 3.69G       0 (0%)
  l1d_cache_miss_ld_nonspec  3.58M ± 31.7K     3.54M … 3.63M       0 (0%)
  branch_mispred_nonspec     21.4M ± 58.2K     21.3M … 21.5M       0 (0%)
Benchmark 2 (9 runs): ./build-new -O2
  measurement                mean ± σ          min … max           outliers  delta
  wall_time                  130ms ± 8.3ms     125ms … 141ms       0 (0%)    ⚡ −78.0% ± 0.5%
  peak_rss                   91.9MB ± 0.09MB   91.8MB … 92.1MB     0 (0%)    −32.9% ± 0.1%
  core_active_cycle          507M ± 2.35M      503M … 511M         0 (0%)    −79.8% ± 0.1%
  inst_all                   796M ± 10.7M      781M … 809M         0 (0%)    −78.0% ± 0.1%
  l1d_cache_miss_ld_nonspec  352K ± 7.7K       318K … 355K         0 (0%)    −90.2% ± 0.1%
  branch_mispred_nonspec     4.52M ± 11.5K     4.51M … 4.57M       2 (5%)    −78.9% ± 0.0%
The delta columns are relative to the first command (baseline). Signs and glyphs may be colorized depending on --color.
Output — counters
- Default: names only (one per line).
- With --details: add description and incompatibility flags.
Example (--details):
name                       incompat   description
core_active_cycle          none       Cycles while the core was active
inst_all                   pair,quad  All retired instructions
l1d_cache_miss_ld_nonspec  quad       Retired loads that missed in the L1D
branch_mispred_nonspec     quad       Retired branches mispredicted
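The --details listing can be filtered to pick counters without known constraints, for example when a requested set fails to schedule. This is a rough sketch under an assumption: it treats the output as whitespace-separated columns in the order shown in the example above (name, incompat, description), which may not hold exactly for the real output. `unconstrained_counters` is a hypothetical helper, not part of lauka.

```shell
# unconstrained_counters: read "name incompat description..." rows on stdin
# and print the names whose incompat column is "none" as one comma-separated
# line, the shape that -m / --measurements expects.
# Hypothetical helper; the column layout is assumed from the README example.
unconstrained_counters() {
  awk '$2 == "none" { print $1 }' | paste -sd, -
}
```

A possible use is `lauka counters --details --no-headers | unconstrained_counters`, with the result passed to -m.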
Exit codes
- 0: success
- 1: usage error (bad flags, missing command, quoting error)
- 2: PMU scheduling/collection error
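A script that wraps lauka can branch on these codes. The sketch below is one way to do that; `describe_status` and `run_and_describe` are hypothetical helpers, not part of lauka. Because lauka also forwards a failing child's exit code, a child that itself exits with 1 or 2 cannot be told apart from lauka's own errors by the code alone, and the messages reflect that.

```shell
# describe_status CODE: map an exit code to a short explanation.
# Hypothetical helper; the meanings come from the Exit codes list.
describe_status() {
  case "$1" in
    0) echo "success" ;;
    1) echo "usage error, or a child command exited with 1" ;;
    2) echo "PMU scheduling/collection error, or a child command exited with 2" ;;
    *) echo "forwarded child exit code $1" ;;
  esac
}

# run_and_describe CMD...: run a command, explain its exit code on stderr,
# and return the same code so callers can still act on it.
run_and_describe() {
  "$@"
  status=$?
  describe_status "$status" >&2
  return "$status"
}
```

For example: `run_and_describe sudo lauka -n 3 -- './new --opt=1'`.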
Common errors
error: minimum runs is 3 (got 2)
fix: use -n 3 or higher
error: command contains spaces/metacharacters; wrap it in quotes
example: lauka -- 'myapp --flag 1'
error: requested measurements could not be scheduled on Apple M3
tip: remove one of the conflicting counters or try a smaller set
If a run was interrupted or killed, the next run may show this:
error(lauka): SetTimerCount
error(lauka): SetTimerPeriod with Action ID = 1
error(lauka): SetTimer with Action ID = 1 and Timer ID = 1
error(lauka): SetTimerPet with Timer ID = 1
error(lauka): SetLightweightPet
workaround: ignore it; no change in recording behavior was detected. To clear it,
make one successful run of lauka record; subsequent runs should be clean.
Thanks
This tool couldn't exist without the following projects and their authors:
- reverse-engineered kperf API by ibireme, which made it possible to access PMU counters on Apple Silicon.
- poop by Andrew Kelley, a great tool for monitoring PMU counters on Linux.
- scoop by tensorush, who ported the reverse-engineered kperf API to Zig, wrapped it in a nice library, and created the PR that added the ability to fetch CPU counters on Macs to the poop tool.
This tool is basically a merge of poop and scoop, with a rewritten CLI and extended functionality.