lauka

December 30, 2025

A minimal, predictable CLI to record Apple‑Silicon PMU counters and compare multiple commands.

Quick start

Requires Zig 0.15.1
Build with zig build --release=fast
Running lauka record requires sudo (PMU counters need elevated access).

# Record a single command
lauka [options] -- <command>

# Compare two or more commands (aggregated per command)
lauka [options] -- '<cmd1>' '<cmd2>' ['<cmd3>' ...]

# Explicit subcommands (optional)
lauka record [options] -- '<cmd1>' '<cmd2>' ['<cmd3>' ...]

# List counters
lauka counters
lauka counters --details

Synopsis

Usage:
  lauka [options] -- <command>
  lauka [options] -- '<command_1>' '<command_2>' ['<command_3>' ...]
  lauka <subcommand> [options]

Subcommands:
  record      Run one or more commands; aggregated stats per command (deltas vs first when multiple)
  counters    List available counters (names; optional descriptions & compatibility flags)
  version     Show version
  help        Show help for any command

Global options

-n, --runs <N> — number of measured runs (default: 3, minimum: 3)
--warmup <N> — warmup runs before measuring (default: 0)
-m, --measurements <list> — comma‑separated counters
--color <when> — auto (default), never, ansi
-h, --help — show help

Notes

If any child exits non‑zero, lauka forwards that exit code (for multiple commands, the first failing code).
Everything after -- is passed verbatim to the child command list.
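
The quoting in the usage lines matters because the shell splits arguments before lauka sees them. The sketch below uses a hypothetical helper, show_args (not part of lauka), to make that splitting visible. It assumes lauka treats each argument after -- as one command, which is what the quoted usage forms suggest.

```shell
# show_args (hypothetical helper) prints each argument it receives on its
# own line, in brackets, so the shell's word splitting becomes visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Quoted: two arguments, i.e. two complete commands.
show_args './old --opt=0' './new --opt=1'

# Unquoted: also two arguments, but only one command was intended.
show_args ./old --opt=0
```

The first call prints [./old --opt=0] and [./new --opt=1]; the second prints [./old] and [--opt=0].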

Commands

`record`

lauka [options] -- '<cmd1>' ['<cmd2>' ['<cmd3>' ...]]

Behavior

Run each command --warmup times (results ignored in stats), then --runs times with measurement.
Compute per‑metric: mean, stddev, min, max, and outliers.
Default execution is sequential: run all iterations of the first command, then the second, etc.
Output shows a separate aggregated table for each command, and for commands after the first, a delta column vs the first (baseline) for each metric.
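
As a rough illustration of the per-metric aggregation described above, the sketch below computes mean, standard deviation, min, and max for a list of samples with awk. It is not lauka's implementation: the helper name summarize is hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and outlier detection is left out because the rule lauka uses is not documented here.

```shell
# summarize (hypothetical helper): read one numeric sample per line on stdin
# and print mean, sample standard deviation, min, and max.
summarize() {
  awk '
    {
      v = $1 + 0                      # force numeric interpretation
      if (NR == 1) { min = v; max = v }
      if (v < min) min = v
      if (v > max) max = v
      sum += v
      sumsq += v * v
    }
    END {
      if (NR == 0) exit 1             # no samples: nothing to report
      mean = sum / NR
      # Assumption: sample variance with an (n - 1) denominator.
      var = (NR > 1) ? (sumsq - NR * mean * mean) / (NR - 1) : 0
      if (var < 0) var = 0            # guard against rounding below zero
      printf "mean=%.2f stddev=%.2f min=%.2f max=%.2f\n", mean, sqrt(var), min, max
    }'
}

# Three made-up wall-time samples in milliseconds.
printf '%s\n' 580 590 600 | summarize
# prints: mean=590.00 stddev=10.00 min=580.00 max=600.00
```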

Examples

# Two commands, sequential (default)
lauka -n 9 -m core_active_cycle,inst_all,branch_mispred_nonspec,l1d_cache_miss_ld_nonspec -- \
  './old --opt=0' \
  './new --opt=1'

# Three commands
a="prog --size 1e6 --mode=A"; b="prog --size 1e6 --mode=B"; c="prog --size 1e6 --mode=C"
lauka -n 7 -m l1d_cache_miss_ld_nonspec,branch_mispred_nonspec -- "$a" "$b" "$c"

`counters`

lauka counters [--details] [--no-headers]

Behavior

Default: print names only.
With --details: include description and incompatibility flags for each counter.

Options

-d, --details — show name, description, and incompatibility flags
--no-headers - hide headers for --details output

Incompatibility flags (shown only with --details)

incompat: none — no known constraint
incompat: pair — has pairwise incompatibility constraints
incompat: quad — has quad incompatibility constraints

Examples

lauka counters
lauka counters --details

Output

Default — aggregated tables

One block per command.
Columns: measurement, mean ± σ, min … max, outliers, and delta (compare mode only, for commands after the baseline).
Colors follow --color: auto, never, ansi.

Example (two commands)

Benchmark 1 (9 runs): ./build-old

measurement                 mean ± σ          min … max            outliers
wall_time                   591ms ± 7.6ms     583ms … 605ms        0 (0%)
peak_rss                    137MB ± 0.3MB     136.6MB … 137.4MB    0 (0%)
core_active_cycle           2.51G ± 22.1M     2.48G … 2.54G        0 (0%)
inst_all                    3.62G ± 23.9M     3.53G … 3.69G        0 (0%)
l1d_cache_miss_ld_nonspec   3.58M ± 31.7K     3.54M … 3.63M        0 (0%)
branch_mispred_nonspec      21.4M ± 58.2K     21.3M … 21.5M        0 (0%)

Benchmark 2 (9 runs): ./build-new -O2

measurement                 mean ± σ          min … max            outliers    delta
wall_time                   130ms ± 8.3ms     125ms … 141ms        0 (0%)      ⚡ −78.0% ± 0.5%
peak_rss                    91.9MB ± 0.09MB   91.8MB … 92.1MB      0 (0%)      −32.9% ± 0.1%
core_active_cycle           507M ± 2.35M      503M … 511M          0 (0%)      −79.8% ± 0.1%
inst_all                    796M ± 10.7M      781M … 809M          0 (0%)      −78.0% ± 0.1%
l1d_cache_miss_ld_nonspec   352K ± 7.7K       318K … 355K          0 (0%)      −90.2% ± 0.1%
branch_mispred_nonspec      4.52M ± 11.5K     4.51M … 4.57M        2 (22%)     −78.9% ± 0.0%

The delta column is relative to the first command (baseline). Signs and glyphs may be colorized depending on --color.
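
The percentage in the delta column can be checked by hand from the two means. The sketch below assumes the delta is the relative change of the mean against the baseline mean, which matches the example numbers. The helper name rel_delta is hypothetical, and the ± part of the delta is not reproduced because its formula is not documented here.

```shell
# rel_delta (hypothetical helper): relative change of a mean against the
# baseline mean, as a signed percentage with one decimal place.
rel_delta() {
  awk -v base="$1" -v cur="$2" 'BEGIN {
    if (base == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%+.1f%%\n", (cur - base) / base * 100
  }'
}

rel_delta 591 130    # wall_time means in ms  -> -78.0%
rel_delta 137 91.9   # peak_rss means in MB   -> -32.9%
```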

Output — `counters`

Default: names only (one per line).
With --details: add description and incompatibility flags.

Example (--details):

name                              incompat  description
core_active_cycle                 none      Cycles while the core was active
inst_all                          pair,quad All retired instructions
l1d_cache_miss_ld_nonspec         quad      Retired loads that missed in the L1D
branch_mispred_nonspec            quad      Retired branches mispredicted
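
When picking counters for -m, it can help to start from those with no known constraints. The sketch below filters the --details table for rows whose incompat column is none. It assumes the whitespace-separated three-column layout shown in the example above; the real output may differ, and the helper name unconstrained_counters is hypothetical.

```shell
# unconstrained_counters (hypothetical helper): read a --details table on
# stdin and print the names of counters whose incompat column is "none".
# Assumes the layout from the example: name, incompat, description.
unconstrained_counters() {
  awk '$1 != "name" && $2 == "none" { print $1 }'
}

# Intended use (needs lauka installed):
#   lauka counters --details | unconstrained_counters | paste -sd, -
```

The header row is skipped by name rather than by position, so the filter should behave the same with or without --no-headers.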

Exit codes

0 – success
1 – usage error (bad flags, missing command, quoting error)
2 – PMU scheduling/collection error
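
In scripts, the exit status can be mapped back to this table. The sketch below is one way to do that with a hypothetical wrapper, describe_exit. Because lauka forwards a failing child's exit code, a status of 1 or 2 is presumably ambiguous between lauka's own error and the child's, and the messages say so.

```shell
# describe_exit (hypothetical wrapper): run the given command line, then
# describe its exit status on stderr and return the same status.
# Example: describe_exit sudo lauka -n 5 -- './old' './new'
describe_exit() {
  "$@"
  rc=$?
  case "$rc" in
    0) echo "success" >&2 ;;
    1) echo "usage error, or a child exited with 1" >&2 ;;
    2) echo "PMU scheduling/collection error, or a child exited with 2" >&2 ;;
    *) echo "child exited with $rc" >&2 ;;
  esac
  return "$rc"
}
```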

Common errors

error: minimum runs is 3 (got 2)
  fix: use -n 3 or higher

error: command contains spaces/metacharacters; wrap it in quotes
  example: lauka -- 'myapp --flag 1'

error: requested measurements could not be scheduled on Apple M3
  tip: remove one of the conflicting counters or try a smaller set

If lauka was interrupted or killed, the next run may print the following:

error(lauka): SetTimerCount
error(lauka): SetTimerPeriod with Action ID = 1
error(lauka): SetTimer with Action ID = 1 and Timer ID = 1
error(lauka): SetTimerPet with Timer ID = 1
error(lauka): SetLightweightPet

Workaround: these messages can be ignored; no change in recording behavior has been observed. To clear them, complete one successful lauka record run; subsequent runs should be clean.

Thanks

This tool couldn't exist without the following projects and their authors:

The reverse-engineered kperf API by ibireme, which made it possible to access PMU counters on Apple Silicon.
poop by Andrew Kelley, a great tool for monitoring PMU counters on Linux.
scoop by tensorush, who ported the reverse-engineered kperf API to Zig, wrapped it in a nice library, and created the PR that added the ability to fetch CPU counters on Macs to poop.

This tool is basically a merge of poop and scoop, with a rewritten CLI and extended functionality.