pg_ash

April 25, 2026


Active Session History for Postgres — lightweight wait event sampling with zero bloat.

The anti-extension. Pure SQL + PL/pgSQL that works on any Postgres 14+ — including RDS, Cloud SQL, AlloyDB, Supabase, Neon, and every other managed provider. No C extension, no shared_preload_libraries, no provider approval, no restart. Just \i and go.

pg_ash v1.4 investigation flow

Short walkthrough of the LLM-assisted investigation flow against a live row-lock spike in Postgres 18. Source: demos/.

Why

Postgres has no built-in session history. When something was slow an hour ago, there is nothing to look at. pg_ash samples pg_stat_activity every second and stores the results in a compact format queryable with plain SQL.

How it compares

| | pg_ash | pg_wait_sampling | pgsentinel | External sampling |
|---|---|---|---|---|
| Install | \i (pure SQL) | shared_preload_libraries | shared_preload_libraries (package or compile) | Separate infra |
| Works on managed (RDS, Cloud SQL, Supabase, ...) | Yes | Cloud SQL only (limited managed support) | Not known to be supported | Yes, with effort |
| Sampling rate | 1s (via pg_cron, system cron, or any scheduler) | 10ms (in-process) | 10ms (in-process) | 15–60s typical |
| Visibility | Inside Postgres | Inside Postgres | Inside Postgres | Outside only |
| Storage | Disk (~30 MiB/day) | Memory only | Memory only | External store |
| Historical queries | Yes (persistent) | Ring buffer (lost on restart) | Ring buffer (lost on restart) | Depends on setup |
| Pure SQL | Yes | No (C extension) | No (C extension) | No |
| Maintenance overhead | None | None | None | High |
| Requirements | None (pg_cron optional) | shared_preload_libraries (restart required) | shared_preload_libraries (restart required) | Agent + storage |

Quick start

-- prerequisites (optional but recommended)
create extension if not exists pg_stat_statements;  -- enables query text + execution metrics
-- pg_cron is optional: if installed, ash.start() uses it; otherwise see "Scheduling without pg_cron"

-- install (just run the SQL file — works on RDS, Cloud SQL, AlloyDB, etc.)
\i sql/ash-install.sql

-- start sampling (1 sample/second — uses pg_cron if available, otherwise prints external scheduling instructions)
select ash.start('1 second');

-- wait a few minutes, then query
select * from ash.top_waits('1 hour');
select * from ash.top_queries_with_text('1 hour');
select * from ash.top_by_type('1 hour');

-- stop sampling
select ash.stop();

-- uninstall (drops the ash schema and pg_cron jobs)
select ash.uninstall('yes');

Sampling intervals

ash.start(interval) accepts PostgreSQL interval values. The interval is converted to a pg_cron schedule:

| Interval range | Example input | pg_cron schedule | Description |
|---|---|---|---|
| 1–59 seconds | '1 second' .. '59 seconds' | N seconds | Every N seconds (native pg_cron format) |
| 1–59 minutes | '1 minute' .. '59 minutes' | */N * * * * | Every N minutes via cron syntax |
| 1–23 hours | '1 hour' .. '23 hours' | 0 */N * * * (or 0 * * * * for 1h) | Every N hours via cron syntax (max 23h) |

Notes:

  • Sub-minute intervals must be exact seconds (e.g., '30 seconds')
  • Minute and hour intervals must be exact (e.g., '5 minutes' works, '90 seconds' does not — use '1 minute' instead)
  • Hour intervals are limited to 23 hours maximum (cron syntax limitation). For daily sampling, use '23 hours' or consider a different approach
  • The default and recommended interval is '1 second' for high-resolution sampling
  • See select * from ash.status() for the current sampling interval
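The interval-to-schedule mapping above can be sketched in a few lines. This is an illustrative model of the rules in the table, not pg_ash's actual conversion code (the function name to_pg_cron_schedule is made up):

```python
from datetime import timedelta

def to_pg_cron_schedule(interval: timedelta) -> str:
    """Map a sampling interval to a pg_cron schedule string,
    following the table above (illustrative, not pg_ash's code)."""
    total = int(interval.total_seconds())
    if 1 <= total <= 59:
        return f"{total} seconds"              # native pg_cron seconds format
    if total % 3600 == 0 and 3600 <= total <= 23 * 3600:
        hours = total // 3600
        return "0 * * * *" if hours == 1 else f"0 */{hours} * * *"
    if total % 60 == 0 and 60 <= total <= 59 * 60:
        return f"*/{total // 60} * * * *"      # every N minutes
    raise ValueError(f"unsupported interval: {interval}")

print(to_pg_cron_schedule(timedelta(seconds=30)))   # 30 seconds
print(to_pg_cron_schedule(timedelta(minutes=5)))    # */5 * * * *
print(to_pg_cron_schedule(timedelta(hours=2)))      # 0 */2 * * *
```

Note that '90 seconds' falls through every branch and raises, matching the "must be exact" rules above.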

Privileges

The role that runs sampling (the owner of ash.take_sample(), or the pg_cron job owner) should be a superuser or a member of the built-in pg_read_all_stats role. Without that membership, pg_stat_activity.query_id is visible only for activity owned by the sampling role; queries run by other users come back with query_id = NULL, which ash records under the sentinel value 0.

This silently skews ash.top_queries*, ash.query_waits, and any per-query drill-downs — all of that "other-user" traffic collapses into a single query_id = 0 bucket. To grant the role:

-- as a superuser
grant pg_read_all_stats to <sampling_role>;

On managed services where pg_read_all_stats is already granted to the primary admin (RDS rds_superuser, Cloud SQL cloudsqlsuperuser, Supabase postgres), installing and running ash as that role is sufficient. Always verify with select pg_has_role(current_user, 'pg_read_all_stats', 'MEMBER');.

If the privilege probe itself errors (e.g. missing pg_roles access on a locked-down managed service), ash.start() does not abort — it emits a RAISE NOTICE 'privilege probe failed: ...' so the skipped check remains visible in server / CI logs.

Upgrade

-- from 1.0 to 1.1
\i sql/ash-1.0-to-1.1.sql

-- from 1.1 to 1.2
\i sql/ash-1.1-to-1.2.sql

-- from 1.2 to 1.3
\i sql/ash-1.2-to-1.3.sql

-- from 1.3 to 1.4
\i sql/ash-1.3-to-1.4.sql

-- check version
select * from ash.status();

Function reference

Admin

| Function | Description |
|---|---|
| ash.start(interval) | Start sampling (default: '1 second'). Uses pg_cron if available, otherwise prints external scheduling instructions. Also schedules rollup jobs |
| ash.stop() | Stop sampling and rollups (removes pg_cron jobs, sets sampling_enabled = false) |
| ash.status() | Sampling status, version, partition info, rollup metrics, debug_logging state |
| ash.take_sample() | Take one sample manually (called automatically by the scheduler) |
| ash.rotate() | Rotate sample partitions (called automatically, or manually for external schedulers). Runs pre-truncation rollup to prevent data loss |
| ash.rebuild_partitions(N, 'yes') | Change partition count (3–32). Destructive — all raw sample data is lost; requires 'yes' confirmation token. Rollup tables survive. Call ash.start() after to resume |
| ash.rollup_minute([batch]) | Aggregate raw samples into per-minute rollups. Watermark-based with catch-up. Default batch: 60 minutes |
| ash.rollup_hour() | Aggregate minute rollups into hourly rollups. Watermark-based |
| ash.rollup_cleanup() | Delete expired rollup rows per retention config |
| ash.set_debug_logging([bool]) | Enable/disable per-session RAISE LOG in take_sample() for diagnostics. Call with no argument to check current state |
| ash.uninstall('yes') | Drop the ash schema and remove pg_cron jobs |

Relative time (last N hours)

| Function | Description |
|---|---|
| ash.top_waits(interval, limit, width) | Top wait events ranked by sample count, with bar chart |
| ash.top_queries(interval, limit) | Top queries ranked by sample count |
| ash.top_queries_with_text(interval, limit) | Same as top_queries, with pg_stat_statements join (requires pg_stat_statements) |
| ash.query_waits(query_id, interval, width, color) | Wait profile for a specific query |
| ash.top_by_type(interval, width, color) | Breakdown by wait event type |
| ash.wait_timeline(interval, bucket) | Wait events bucketed over time |
| ash.samples_by_database(interval) | Per-database activity |
| ash.activity_summary(interval) | One-call overview: samples, peak backends, top waits, top queries |
| ash.timeline_chart(interval, bucket, top, width) | Stacked bar chart of wait events over time |
| ash.event_queries(event, interval, limit) | Top queries for a specific wait event (requires pg_stat_statements) |
| ash.samples(interval, limit) | Fully decoded raw samples with timestamps and query text |

All interval-based functions default to '1 hour'. Limit defaults to 10 (top 9 + "Other" rollup row).

Absolute time (incident investigation)

| Function | Description |
|---|---|
| ash.top_waits_at(start, end, limit, width) | Top waits in a time range, with bar chart |
| ash.top_queries_at(start, end, limit) | Top queries in a time range |
| ash.query_waits_at(query_id, start, end, width, color) | Query wait profile in a time range |
| ash.event_queries_at(event, start, end, limit) | Top queries for a wait event in a time range (requires pg_stat_statements) |
| ash.samples_at(start, end, limit) | Fully decoded raw samples in a time range |
| ash.top_by_type_at(start, end, width, color) | Breakdown by wait event type in a time range |
| ash.wait_timeline_at(start, end, bucket) | Wait timeline in a time range |
| ash.timeline_chart_at(start, end, bucket, top, width) | Stacked bar chart in a time range |

Start and end are timestamptz. Bucket defaults to '1 minute'.

Rollup readers

| Function | Description |
|---|---|
| ash.minute_waits(interval, limit) | Top wait events from minute rollups (default: last 1 hour) |
| ash.minute_waits_at(start, end, limit) | Same, absolute time range |
| ash.hourly_queries(interval, limit) | Top queries from hourly rollups with pg_stat_statements text (default: last 1 day) |
| ash.hourly_queries_at(start, end, limit) | Same, absolute time range |
| ash.daily_peak_backends(interval) | Peak and average backends per day (default: last 7 days) |
| ash.daily_peak_backends_at(start, end) | Same, absolute time range |

Rollup readers query rollup_1m / rollup_1h tables — they work even after raw samples have rotated away.

Helpers

| Function | Description |
|---|---|
| ash.ts_from_timestamptz(timestamptz) | Convert timestamptz to internal int4 epoch offset (useful for querying rollup tables directly) |
| ash.ts_to_timestamptz(int4) | Convert int4 epoch offset back to timestamptz |
| ash.decode_sample(integer[], smallint) | Decode a single packed ash.sample.data array. Pass slot for unambiguous query_id resolution |
| ash.decode_sample(int4) | Convenience: decode every ash.sample row at the given sample_ts (across all datids/slots). Returns (datid, wait_event, query_id, count) |
| ash.decode_sample_at(timestamptz) | Same as above but accepts timestamptz (converted via ts_from_timestamptz). Named with the _at suffix (consistent with samples_at / top_waits_at) to avoid decode_sample(unknown) overload ambiguity |

decode_sample / decode_sample_at have EXECUTE revoked from PUBLIC (per the privilege hardening in #45). Grant explicitly to roles that need them, e.g. grant execute on function ash.decode_sample(int4) to my_reader;.

Example

-- All decoded backends recorded at a specific moment, by database:
select db.datname, d.wait_event, d.query_id, d.count
from ash.decode_sample_at('2026-04-19 14:30:00+00'::timestamptz) d
join pg_database db on db.oid = d.datid
order by db.datname, d.wait_event;

Usage

Check status

select * from ash.status();
           metric           |             value
----------------------------+-------------------------------
 version                    | 1.4
 color                      | off
 num_partitions             | 3
 sampling_enabled           | true
 skipped_samples            | 0
 current_slot               | 0
 sample_interval            | 00:00:01
 rotation_period            | 1 day
 raw_retention              | 1 day + current partial
 include_bg_workers         | false
 debug_logging              | false
 installed_at               | 2026-02-16 08:30:00.000000+00
 rotated_at                 | 2026-02-16 08:30:00.000000+00
 time_since_rotation        | 00:09:03.123456
 last_sample_ts             | 2026-02-16 08:39:03+00
 samples_in_current_slot    | 56
 samples_total              | 56
 wait_event_map_count       | 11
 wait_event_map_utilization | 0.03%
 query_map_count            | 8
 rollup_1m_rows             | 540
 rollup_1m_oldest           | 2026-02-16 08:30:00+00
 rollup_1m_newest           | 2026-02-16 08:39:00+00
 rollup_1m_retention        | 30 days
 rollup_1h_rows             | 0
 rollup_1h_retention        | 1825 days
 pg_cron_available          | yes

What hurt recently?

-- quick overview of recent activity
select * from ash.activity_summary('5 minutes');
        metric        |                                            value
----------------------+---------------------------------------------------------------------------------------------
 time_range           | 00:05:00
 total_samples        | 56
 avg_active_backends  | 6.6
 peak_active_backends | 10
 peak_time            | 2026-02-16 08:38:16+00
 databases_active     | 1
 top_wait_1           | Client:ClientRead (46.77%)
 top_wait_2           | Timeout:PgSleep (11.83%)
 top_wait_3           | Lock:transactionid (9.68%)
 top_query_1          | -2835399305386018931 — COMMIT (29.73%)
 top_query_2          | 3365820675399133794 — UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE b (23.24%)
 top_query_3          | -4378765880691287891 — UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE t (11.35%)
-- top wait events (default limit 10: top 9 + Other)
select * from ash.top_waits('5 minutes');
     wait_event     | samples |  pct  |                  bar
--------------------+---------+-------+---------------------------------------
 Client:ClientRead  |     174 | 46.77 | ██████████████████████████████ 46.77%
 Timeout:PgSleep    |      44 | 11.83 | ████████ 11.83%
 Lock:transactionid |      36 |  9.68 | ██████ 9.68%
 CPU*               |      35 |  9.41 | ██████ 9.41%
 LWLock:WALWrite    |      31 |  8.33 | █████ 8.33%
 IdleTx             |      26 |  6.99 | ████ 6.99%
 IO:WalSync         |      19 |  5.11 | ███ 5.11%
 Lock:tuple         |       5 |  1.34 | █ 1.34%
 LWLock:LockManager |       2 |  0.54 | █ 0.54%
-- top queries with text from pg_stat_statements
select * from ash.top_queries_with_text('5 minutes', 5);
       query_id       | samples |  pct  | calls  | total_exec_time_ms | mean_exec_time_ms |                             query_text
----------------------+---------+-------+--------+--------------------+-------------------+---------------------------------------------------------------------
 -2835399305386018931 |     110 | 29.73 | 283202 |            1234.56 |              0.00 | commit
  3365820675399133794 |      86 | 23.24 | 283195 |          518349.35 |              1.83 | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2
  5457019535816659310 |      44 | 11.89 |     11 |          195225.25 |          17747.75 | select pg_sleep($1)
 -4378765880691287891 |      42 | 11.35 | 283195 |          113278.00 |              0.40 | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
-- breakdown by wait event type
select * from ash.top_by_type('5 minutes');
 wait_event_type | samples |  pct  |                       bar
-----------------+---------+-------+-------------------------------------------------
 Client          |     174 | 46.77 | ████████████████████████████████████████ 46.77%
 Timeout         |      44 | 11.83 | ██████████ 11.83%
 Lock            |      41 | 11.02 | █████████ 11.02%
 CPU*            |      35 |  9.41 | ████████ 9.41%
 LWLock          |      33 |  8.87 | ████████ 8.87%
 IdleTx          |      26 |  6.99 | ██████ 6.99%
 IO              |      19 |  5.11 | ████ 5.11%

Analyze a specific query

-- what is query 3365820675399133794 waiting on?
select * from ash.query_waits(3365820675399133794, '5 minutes');
     wait_event     | samples |  pct  |                       bar
--------------------+---------+-------+-------------------------------------------------
 Client:ClientRead  |      32 | 54.24 | ████████████████████████████████████████ 54.24%
 Lock:transactionid |      12 | 20.34 | ███████████████ 20.34%
 LWLock:WALWrite    |       6 | 10.17 | ████████ 10.17%
 CPU*               |       4 |  6.78 | █████ 6.78%
 IO:WalSync         |       3 |  5.08 | ████ 5.08%
 IdleTx             |       2 |  3.39 | ██ 3.39%
-- same, but during a specific time window
select * from ash.query_waits_at(3365820675399133794, '2026-02-16 08:38', '2026-02-16 08:40');

Drill into a wait event

-- which queries are stuck on Lock:transactionid?
select * from ash.event_queries('Lock:transactionid', '1 hour');

-- or by wait type (matches all events of that type)
select * from ash.event_queries('IO', '1 hour');

Browse raw samples

-- see the last 20 decoded samples with query text
select * from ash.samples('10 minutes', 20);
      sample_time       | database_name | active_backends |     wait_event     |       query_id       |                          query_text
------------------------+---------------+-----------------+--------------------+----------------------+--------------------------------------------------------------
 2026-02-16 11:18:51+00 | postgres      |               7 | CPU*               | -2835399305386018931 | END
 2026-02-16 11:18:51+00 | postgres      |               7 | CPU*               |  3365820675399133794 | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE ...
 2026-02-16 11:18:49+00 | postgres      |               5 | Client:ClientRead  |  9144568883098003499 | SELECT abalance FROM pgbench_accounts WHERE aid = $1
 2026-02-16 11:18:49+00 | postgres      |               5 | IO:WalSync         | -2835399305386018931 | END
 2026-02-16 11:18:49+00 | postgres      |               3 | Lock:transactionid | -2835399305386018931 | END
 2026-02-16 11:18:49+00 | postgres      |               5 | LWLock:WALWrite    | -2835399305386018931 | END
-- raw samples during an incident
select * from ash.samples_at('2026-02-14 03:00', '2026-02-14 03:05', 50);

Dump samples to CSV

Always go through ash.samples() / ash.samples_at() — the underlying ash.sample table stores a packed integer[] and cannot be joined directly. The defaults for ash.samples() are p_interval => '1 hour' and p_limit => 100; pass a large p_limit when dumping.

-- dump every sample from the last hour
\copy (select * from ash.samples('1 hour'::interval, 10000000)) to '/tmp/ash.csv' csv header

-- dump a specific incident window
\copy (select * from ash.samples_at('2026-02-14 03:00', '2026-02-14 03:05', 10000000)) to '/tmp/incident.csv' csv header

Use \copy (psql) rather than server-side COPY TO if /tmp isn't writable by the Postgres user (managed services), and check the exit status — silently redirecting stderr to /dev/null will hide errors like typos in table names.

Timeline chart

Visualize wait event patterns over time — spot spikes, correlate with deployments, see what changed.

select bucket_start, active, detail, chart
from ash.timeline_chart('5 minutes', '30 seconds', 3, 40);
      bucket_start       | active |                             detail                             |                           chart
-------------------------+--------+----------------------------------------------------------------+-----------------------------------------------------------
                         |        |                                                                | █ Client:ClientRead  ▓ LWLock:WALWrite  ░ IdleTx  · Other
 2026-02-16 08:37:30+00  |    2.0 | Other=2.0                                                      | ···········
 2026-02-16 08:38:00+00  |    7.0 | Client:ClientRead=2.3 LWLock:WALWrite=0.8 IdleTx=0.4 Other=3.5 | █████████████▓▓▓▓▓░░····················
 2026-02-16 08:38:30+00  |    6.6 | Client:ClientRead=4.0 LWLock:WALWrite=0.4 IdleTx=0.5 Other=1.7 | ███████████████████████▓▓░░░··········
 2026-02-16 08:39:00+00  |    5.3 | Client:ClientRead=3.3 LWLock:WALWrite=0.3 IdleTx=0.7 Other=1.0 | ███████████████████▓▓░░░░······

Each rank gets a distinct character — █ (rank 1), ▓ (rank 2), ░ (rank 3), ▒ (rank 4+), · (Other) — so the breakdown is visible without color.

-- zoom into a specific time window
select * from ash.timeline_chart_at(
  now() - interval '10 minutes', now(),
  '1 minute', 3, 50
);

Experimental: ANSI colors. Enable per-session or per-call — green = CPU*, blue = IO, red = Lock, pink = LWLock, cyan = IPC, yellow = Client, orange = Timeout, teal = BufferPin, purple = Activity, light purple = Extension, light yellow = IdleTx.

-- Option 1: enable once for the session (recommended)
set ash.color = on;

-- Option 2: per-call
select * from ash.top_waits('1 hour', p_color => true);

psql's table formatter escapes ANSI codes — to render colors, pipe through sed:

-- add to ~/.psqlrc for a reusable :color command
\set color '\\g | sed ''s/\\\\x1B/\\x1b/g'' | less -R'

-- then use it
select * from ash.top_waits('1 hour') :color
select * from ash.timeline_chart('1 hour') :color

Colors also render natively in pgcli, DataGrip, and other clients that pass raw bytes.

top_waits with colors:

top_waits with ANSI colors

timeline_chart with colors:

timeline_chart with ANSI colors

Example data generated with pgbench -c 8 -T 65 on Postgres 17 with concurrent lock contention and idle-in-transaction sessions.

Investigate an incident

Use the _at functions with absolute timestamps to zoom into a specific time window:

-- what happened between 3:00 and 3:10 am?
select * from ash.top_waits_at('2026-02-14 03:00', '2026-02-14 03:10');

-- which queries were running during the incident?
select * from ash.top_queries_at('2026-02-14 03:00', '2026-02-14 03:10');

-- minute-by-minute timeline of the incident
select * from ash.wait_timeline_at(
    '2026-02-14 03:00',
    '2026-02-14 03:10',
    '1 minute'
);

LLM-assisted investigation

pg_ash functions chain naturally for how an LLM investigates a problem — each answer tells it what to ask next.

Prompt: "There was a performance issue about 5 minutes ago. Investigate."

Step 1 — the LLM checks the big picture:

select * from ash.activity_summary('10 minutes');
        metric        |      value
----------------------+------------------
 time_range           | 00:10:00
 total_samples        | 600
 avg_active_backends  | 4.2
 peak_active_backends | 12
 top_wait_1           | Lock:tuple (68%)
 top_query_1          | 7283901445 (58%)

"Average 4.2 active sessions but peak 12 — something spiked. And Lock:tuple is the top wait event."

Step 2 — drill into the waits:

select * from ash.top_waits('10 minutes');
      wait_event     | samples | pct  |         bar
 --------------------+---------+------+---------------------
  Lock:tuple         |    2810 |  68% | ████████████████████
  CPU*               |     830 |  20% | ██████
  IO:DataFileRead    |     290 |   7% | ██
  Client:ClientRead  |     125 |   3% | █
  Other              |      83 |   2% |

"Lock:tuple is 68% of all waits. Multiple sessions fighting over the same rows."

Step 3 — see the timeline:

select * from ash.timeline_chart('10 minutes', '30 seconds', 5, 60);
      bucket_start       | active |                              chart                              |              detail
-------------------------+--------+-----------------------------------------------------------------+----------------------------------
                         |        | █ Lock:tuple  ▓ CPU*  ░ IO:DataFileRead  ▒ Client:ClientRead    |
 2026-02-17 14:00:00+00  |    2.1 | ▓▓▓▓░░···                                                       | CPU*=1.2 IO=0.5 Other=0.4
 2026-02-17 14:00:30+00  |    2.3 | ▓▓▓▓▓░░···                                                      | CPU*=1.4 IO=0.5 Other=0.4
 2026-02-17 14:01:00+00  |    3.8 | ██▓▓▓▓░░····                                                    | Lock=0.6 CPU*=1.5 IO=0.5 Other=1.2
 2026-02-17 14:01:30+00  |    8.5 | █████████████████▓▓▓░░···                                       | Lock=5.8 CPU*=1.3 IO=0.5 Other=0.9
 2026-02-17 14:02:00+00  |   12.0 | █████████████████████████████▓▓░░··                             | Lock=9.8 CPU*=1.1 IO=0.4 Other=0.7
 2026-02-17 14:02:30+00  |   11.2 | ███████████████████████████▓▓░░··                               | Lock=9.0 CPU*=1.0 IO=0.5 Other=0.7
 2026-02-17 14:03:00+00  |    4.1 | ███▓▓▓▓░░····                                                   | Lock=1.2 CPU*=1.5 IO=0.6 Other=0.8
 2026-02-17 14:03:30+00  |    2.0 | ▓▓▓▓░···                                                        | CPU*=1.1 IO=0.4 Other=0.5
 2026-02-17 14:04:00+00  |    2.2 | ▓▓▓▓░░···                                                       | CPU*=1.3 IO=0.5 Other=0.4

"The spike is clearly 14:01:30 to 14:02:30 — Lock:tuple dominates. Let me find which queries."

Step 4 — find the guilty queries:

select * from ash.event_queries_at(
  'Lock:tuple',
  '2026-02-17 14:01:30', '2026-02-17 14:02:30'
);
  query_id   | samples | pct  |         bar
-------------+---------+------+---------------------
  7283901445 |     412 |  85% | ████████████████████
  9102384756 |      53 |  11% | ███
  Other      |      19 |   4% | █

"Query 7283901445 accounts for 85% of the lock waits."

Step 5 — get the SQL text:

select * from ash.top_queries_with_text('10 minutes');
  query_id   | samples | pct  | mean_exec_time_ms | total_exec_time_ms |                    query_text
-------------+---------+------+-------------------+--------------------+--------------------------------------------------
  7283901445 |    2395 |  58% |            842.30 |          534120.50 | UPDATE orders SET status = $1 WHERE id = $2
  9102384756 |     530 |  13% |            215.60 |           42347.20 | UPDATE orders SET shipped_at = $1 WHERE id = $2

LLM's conclusion:

Root cause: multiple concurrent UPDATE orders ... WHERE id = $2 statements are contending on the same rows (Lock:tuple). Two different update patterns hit the orders table — status updates and shipping updates — and when they target overlapping rows, they serialize on tuple locks.

Mitigation options:

  1. Use SELECT ... FOR UPDATE SKIP LOCKED to skip already-locked rows and process them later
  2. Batch the status and shipping updates into a single statement to reduce lock duration
  3. If these run from a queue worker, reduce concurrency or partition the work by order ID range

How it works

Sampling

ash.take_sample() runs every second via pg_cron. It reads pg_stat_activity, groups active backends by (wait_event_type, wait_event, state), and encodes the result into a single integer[] per database:

{-5, 3, 101, 102, 103, -1, 2, 104, 105, -8, 1, 106}
 │   │  │              │  │  │           │  │  │
 │   │  └─ query_ids   │  │  └─ qids     │  │  └─ qid
 │   └─ count=3        │  └─ count=2     │  └─ count=1
 └─ wait_event_id=5    └─ weid=1         └─ weid=8

6 active backends across 3 wait events = 1 row, 12 array elements. Each query_id is one backend — if two backends run the same query, the same map_id appears twice (the count reflects total backends, not distinct queries). Full row size: 24 (tuple header) + 4 (sample_ts) + 4 (datid) + 2 (active_count) + 2 (slot) + 68 (array: 20-byte header + 12 × 4) + alignment = 106 bytes (measured with pg_column_size).
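As a cross-check of the layout, a minimal decoder for this packed format can be written directly from the diagram. This is illustrative only (the function name decode_packed is made up); the real decoding lives in ash.decode_sample(), which also resolves ids against the dictionary tables:

```python
def decode_packed(data: list[int]) -> list[tuple[int, int, list[int]]]:
    """Decode one packed sample array into (wait_event_id, backend_count,
    query_map_ids) groups, per the layout above. Map-id lookup is omitted."""
    groups, i = [], 0
    while i < len(data):
        weid = -data[i]                      # negative marker = wait_event_id
        count = data[i + 1]                  # backends in this wait state
        qids = data[i + 2 : i + 2 + count]   # one query map id per backend
        groups.append((weid, count, qids))
        i += 2 + count
    return groups

sample = [-5, 3, 101, 102, 103, -1, 2, 104, 105, -8, 1, 106]
print(decode_packed(sample))
# [(5, 3, [101, 102, 103]), (1, 2, [104, 105]), (8, 1, [106])]
```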

Dictionary tables

| Table | Purpose |
|---|---|
| ash.wait_event_map | Maps (state, wait_event_type, wait_event) to integer IDs |
| ash.query_map_0..{N-1} | Maps query_id (from pg_stat_activity) to integer IDs (one per sample partition, truncated on rotation) |
| ash.rollup_1m | Per-minute aggregated samples (30-day retention) |
| ash.rollup_1h | Per-hour aggregated rollups (5-year retention) |

Dictionaries are auto-populated by the sampler. Wait events are stable (~600 entries max across all Postgres versions). Query map grows as new queries appear and is garbage-collected based on last_seen.

Encoding version is tracked in ash.config.encoding_version, not in the array itself — zero per-row overhead.

Note on CPU*: When wait_event_type and wait_event are both NULL in pg_stat_activity, the backend is active but not in a known wait state. This is either genuine CPU work or an uninstrumented code path where Postgres does not report a wait event. The asterisk signals this ambiguity. See gaps.wait.events for details on uninstrumented wait events in Postgres — these gaps are being closed over time, making CPU* increasingly accurate.

Rotation

Skytools PGQ-style N-partition ring buffer (default N=3, configurable 3–32 via ash.rebuild_partitions(N, 'yes')). Physical tables (sample_0 through sample_{N-1}) rotate at rotation_period intervals. TRUNCATE replaces the oldest partition — zero dead tuples, zero bloat, no VACUUM needed for sample tables.

N-1 partitions hold data at any time. One is always empty, ready for the next rotation. Before truncation, rotate() calls rollup_minute() to aggregate endangered samples into rollup tables.

┌──────────┐  ┌───────────┐  ┌──────────┐    ┌───────────────┐
│ sample_0 │  │ sample_1  │  │ sample_2 │    │ sample_{N-1}  │
│ (today)  │  │(yesterday)│  │ (empty)  │... │ (readable)    │
│ writing  │  │ readable  │  │ next     │    │               │
└──────────┘  └───────────┘  └──────────┘    └───────────────┘
                              ↑ TRUNCATE + rotate
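The rotation step can be modeled as a tiny ring-buffer simulation. This is a sketch of the scheme described above, not the actual ash.rotate() implementation, and it omits the pre-truncation rollup:

```python
def rotate(slots: list[list], current: int) -> int:
    """One PGQ-style rotation: advance the write pointer into the partition
    that was kept empty, and clear (TRUNCATE) the oldest readable partition
    so that exactly one partition is always empty."""
    n = len(slots)
    nxt = (current + 1) % n          # the partition that was kept empty
    slots[(nxt + 1) % n].clear()     # truncate the oldest readable partition
    return nxt

# Simulate daily rotations with the default N=3 ring
slots, cur = [[], [], []], 0
for day in range(3):
    slots[cur].append(f"day{day}")   # samples written during the day
    cur = rotate(slots, cur)         # end-of-period rotation
print(slots, cur)                    # after the last rotation, only the
                                     # newest full day remains readable
```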

Reader optimization

Reader functions decode arrays inline using generate_subscripts() with direct array subscript access. This avoids per-row plpgsql function calls and is 9-17x faster than the CROSS JOIN LATERAL decode_sample() approach.

Storage

Raw samples

| Active backends | Storage/day | Max on disk (N-1 partitions, default N=3) |
|---|---|---|
| 10 | 11 MiB | 22 MiB |
| 50 | 30 MiB | 60 MiB |
| 100 | 50 MiB | 100 MiB |
| 200 | 100 MiB | 200 MiB |
| 500 | 245 MiB | 490 MiB |

At 500+ backends, TOAST LZ4 compression reduces actual storage. Increasing num_partitions increases the number of days kept, not the daily rate.

Rollup tables

| Level | Retention | Rows/db | Storage/db |
|---|---|---|---|
| 1-minute (rollup_1m) | 30 days | ~43,200 | ~43 MiB |
| 1-hour (rollup_1h) | 5 years | ~43,800 | ~77 MiB |

Total: ~120 MiB per database for 5 years of trend data.
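The row counts in the table follow directly from the retention windows, assuming roughly one row per bucket per database:

```python
# Quick arithmetic behind the rollup row counts (illustrative):
rollup_1m_rows = 24 * 60 * 30   # one row per minute, 30-day retention
rollup_1h_rows = 24 * 1825      # one row per hour, 1825-day (5-year) retention
print(rollup_1m_rows, rollup_1h_rows)  # 43200 43800
```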

Performance

Measured on Postgres 17, 50 backends, 1s sampling, jit = off (median of 10 runs, warm cache):

| Metric | Result |
|---|---|
| top_waits('1 hour') | 30 ms |
| top_waits('24 hours') | 6.1 s |
| top_queries_with_text('1 hour') | 31 ms |
| take_sample() overhead | 53 ms |
| WAL per sample | ~29 KiB (~2.4 GiB/day) |
| Rotation (1-day partition) | 9 ms |
| Dead tuples after rotation | 0 |

See issue #1 for full benchmarks — EXPLAIN ANALYZE output, backend scaling, multi-database tests, WAL analysis, and concurrency testing.

Requirements

  • Postgres 14+ (requires query_id in pg_stat_activity)
  • pg_cron 1.5+ (optional — for built-in scheduling; see Scheduling without pg_cron for alternatives)
  • pg_stat_statements (optional but recommended — enables query text and execution metrics; without it, top_queries_with_text(), event_queries(), and event_queries_at() will error, and top_queries(), samples() will return NULL for query_text)

Note on query_id: The default compute_query_id = auto only populates query_id when pg_stat_statements is in shared_preload_libraries. If query_id is NULL in pg_stat_activity, set:

alter system set compute_query_id = 'on';
-- requires reload: select pg_reload_conf();

Configuration

-- change sampling interval (default: 1 second)
select ash.stop();
select ash.start('5 seconds');

-- change rotation interval (default: 1 day)
update ash.config set rotation_period = '12 hours';

-- check current configuration
select * from ash.status();

Defaults

All configuration is in the ash.config singleton table:

| Setting | Default | Description |
|---|---|---|
| sample_interval | 1 second | Time between samples |
| rotation_period | 1 day | How often partitions rotate |
| num_partitions | 3 | Number of sample partitions (3–32) |
| include_bg_workers | false | Sample autovacuum, logical replication, parallel workers |
| debug_logging | false | RAISE LOG for every sampled session |
| rollup_1m_retention_days | 30 | How long to keep minute-level rollups |
| rollup_1h_retention_days | 1825 | How long to keep hourly rollups (5 years) |
| rollup_min_backend_seconds | 3 | Minimum backend-seconds for a query to appear in rollup query_counts |

Configurable partitions

By default, pg_ash uses 3 partitions (1 day of history + current partial). To keep more raw sample history, increase the partition count:

-- keep 7 days of raw samples (9 partitions × 1-day rotation = 7 readable days + current)
-- 'yes' is required because the call drops all raw sample data
select ash.rebuild_partitions(9, 'yes');

-- resume sampling after rebuild
select ash.start();

-- verify
select * from ash.status();
--  num_partitions  | 9
--  raw_retention   | 7 days + current partial

The retention formula is (N - 2) × rotation_period. The minimum is 3 (current + previous + one being truncated), the maximum is 32.

rebuild_partitions() is destructive — all raw samples are lost. To prevent accidents, the call requires a 'yes' confirmation token (e.g. ash.rebuild_partitions(9, 'yes')); calling it without 'yes' raises an error and changes nothing. Rollup tables survive. You must call ash.start() afterward to resume sampling.
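The retention arithmetic can be captured in a one-liner (a hypothetical helper mirroring the formula above, not part of pg_ash):

```python
def raw_retention_days(num_partitions: int, rotation_period_days: float = 1.0) -> float:
    """Readable raw-sample history in days: (N - 2) * rotation_period.
    Bounds match ash.rebuild_partitions (3..32)."""
    if not 3 <= num_partitions <= 32:
        raise ValueError("num_partitions must be between 3 and 32")
    return (num_partitions - 2) * rotation_period_days

print(raw_retention_days(3))   # 1.0 day  (the default)
print(raw_retention_days(9))   # 7.0 days (the example above)
```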

Raw samples rotate away after (N-2) × rotation_period. Rollup tables preserve aggregated data for long-term trend analysis:

  • rollup_1m: per-minute aggregates, kept for 30 days (~43 MiB/db)
  • rollup_1h: per-hour aggregates, kept for 5 years (~77 MiB/db)

Rollups are populated automatically when pg_cron is available (ash.start() schedules them). Without pg_cron, schedule externally:

# Every minute: aggregate raw samples into minute rollups
* * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_minute();"

# Every hour: aggregate minutes into hourly rollups
0 * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_hour();"

# Daily at 3am: delete expired rollup rows
0 3 * * * psql -qAtX -d mydb -c "SELECT ash.rollup_cleanup();"

Query rollup data with the rollup reader functions:

-- what were the top wait events in the last 6 hours? (from minute rollups)
select * from ash.minute_waits('6 hours');

-- top queries over the last week (from hourly rollups)
select * from ash.hourly_queries('7 days');

-- peak concurrency trend over the last 30 days
select * from ash.daily_peak_backends('30 days');

-- investigate a specific time range (even if raw samples are gone)
select * from ash.minute_waits_at('2026-03-01 02:00', '2026-03-01 03:00');

Rollups use backend-seconds as the count unit (Oracle ASH-compatible). Each sample appearance = 1 backend-second at 1s sampling interval.
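Backend-seconds scale with the sampling interval. A hedged sketch of the conversion (the helper is illustrative, not a pg_ash function):

```python
def backend_seconds(sample_count: int, sample_interval_s: float = 1.0) -> float:
    """Approximate backend-seconds from raw sample appearances.
    At the default 1 s interval each appearance ~ 1 backend-second;
    at a 5 s interval each appearance represents ~5 backend-seconds."""
    return sample_count * sample_interval_s

print(backend_seconds(120))      # 1 s sampling: 120 samples ~ 120 backend-seconds
print(backend_seconds(120, 5))   # 5 s sampling: 120 samples ~ 600 backend-seconds
```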

To change retention:

update ash.config set rollup_1m_retention_days = 14 where singleton;   -- keep 2 weeks
update ash.config set rollup_1h_retention_days = 365 where singleton;  -- keep 1 year

Debug logging

Enable per-session RAISE LOG output from take_sample() — useful for diagnosing connection pooler issues (e.g., PgBouncer mapping client_addr to pooler sessions):

-- check current state
select ash.set_debug_logging();

-- enable: each take_sample() call logs every active session to the Postgres log
select ash.set_debug_logging(true);

-- sample output in the Postgres server log:
-- LOG: ash.take_sample: pid=107 state=active wait_type=CPU* wait_event=CPU* backend_type=client backend query_id=-5287352711091412819
-- LOG: ash.take_sample: pid=108 state=idle in transaction wait_type=Client wait_event=ClientRead backend_type=client backend query_id=-6949053775937549307

-- disable
select ash.set_debug_logging(false);

pg_cron run history

pg_cron logs every job execution to cron.job_run_details. At 1-second sampling, this adds ~12 MiB/day of unbounded growth with no built-in purge.
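The ~12 MiB/day figure follows from one cron.job_run_details row per sample; the per-row size used here (~145 bytes) is an estimate, not something pg_cron reports:

```python
SAMPLES_PER_DAY = 86_400   # one take_sample() job run per second
EST_ROW_BYTES = 145        # assumed average cron.job_run_details row size

daily_mib = SAMPLES_PER_DAY * EST_ROW_BYTES / 2**20
print(f"~{daily_mib:.0f} MiB/day of job_run_details growth")
```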

Recommended: disable cron.log_run. Errors from failed jobs still appear in the Postgres server log (cron.log_min_messages defaults to WARNING) — you lose nothing important, only the job_run_details table entries.

alter system set cron.log_run = off;
-- requires Postgres restart (postmaster context)

If you need run history for other pg_cron jobs, keep cron.log_run on and schedule a periodic cleanup instead (as of pg_cron 1.6, logging cannot be configured per job):

select cron.schedule(
  'ash_purge_cron_log',
  '0 * * * *',
  $$delete from cron.job_run_details where end_time < now() - interval '1 day'$$
);

ash.start() will warn about this overhead.

Scheduling without pg_cron

pg_cron is optional. All core functions — ash.take_sample(), ash.rotate(), and all reporting — work without it. When pg_cron is not installed, ash.start('1 second') records the intended interval in ash.config and prints instructions for external scheduling.

You can call ash.take_sample() from any external scheduler:

System cron (1-minute minimum granularity):

# Every minute
* * * * * psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();"

# Every second (cron launches a loop each minute)
* * * * * for i in $(seq 1 59); do psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();"; sleep 1; done

Dedicated loop script (most reliable for 1-second sampling):

#!/bin/bash
# ash_sampler.sh — run via systemd, screen, tmux, or nohup
while true; do
  psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();" 2>/dev/null
  sleep 1
done

systemd timer (Linux, precise 1-second ticking):

# /etc/systemd/system/ash-sampler.service
[Service]
Type=oneshot
ExecStart=/usr/bin/psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();"
User=postgres

# /etc/systemd/system/ash-sampler.timer
[Timer]
OnActiveSec=0
OnUnitActiveSec=1s
AccuracySec=100ms
[Install]
WantedBy=timers.target
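With both unit files in place, activate the timer with the standard systemd commands (a sketch; unit names match the files above):

```shell
# pick up the new unit files, then start the 1-second timer
sudo systemctl daemon-reload
sudo systemctl enable --now ash-sampler.timer

# confirm it is ticking
systemctl list-timers ash-sampler.timer
```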

psql \watch (quick ad-hoc testing):

SELECT ash.take_sample() \watch 1

Any language (Python example):

import psycopg2, time

def connect():
    conn = psycopg2.connect("dbname=mydb")
    conn.autocommit = True
    return conn

conn = connect()
while True:
    try:
        with conn.cursor() as cur:
            cur.execute("SET statement_timeout='500ms'; SELECT ash.take_sample()")
    except psycopg2.OperationalError:
        conn = connect()  # reconnect after a dropped connection
    time.sleep(1)

Don't forget to also schedule rotation and rollups:

# System cron: rotate daily at midnight
0 0 * * * psql -qAtX -d mydb -c "SELECT ash.rotate();"

# Rollup: every minute, every hour, daily cleanup
* * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_minute();"
0 * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_hour();"
0 3 * * * psql -qAtX -d mydb -c "SELECT ash.rollup_cleanup();"

Privileges

pg_ash installs with a locked-down privilege model. Admin functions (ash.start(), ash.stop(), ash.rotate(), ash.take_sample(), ash.set_debug_logging(), ash.uninstall()) are restricted to the schema owner, and EXECUTE on all reader functions plus SELECT on reader tables (ash.sample, ash.query_map_all, ash.config, ash.wait_event_map, and per-slot partitions) is revoked from PUBLIC. The installing role retains full access.

Grant access to a monitoring or read-only role with the convenience helpers:

-- one call, minimum privileges: USAGE on schema ash, EXECUTE on every
-- public reader function, SELECT on the tables readers depend on
-- (sample + partitions, query_map_all + partitions, config, wait_event_map,
-- rollup_1m, rollup_1h). Idempotent.
create role grafana login password 'xxx';
select ash.grant_reader('grafana');

-- ...later, take it back. Symmetric undo of grant_reader().
select ash.revoke_reader('grafana');

Both helpers are owner-only, validate the role exists in pg_roles, quote the role name, and emit a RAISE NOTICE summarizing what changed.

Note: If you subsequently change the partition count via ash.rebuild_partitions(N, 'yes'), previously-granted reader roles will lose access to the new partition tables. Re-run ash.grant_reader(...) for each monitoring role after any rebuild_partitions call.

Note on pg_cron visibility: ash.grant_reader() does not grant USAGE ON SCHEMA cron, since pg_cron is not an ash object. When pg_cron is loaded but the monitoring role lacks USAGE on schema cron, ash.status() emits a single fallback row of the form cron_jobs = '<no cron.job access; grant USAGE ON SCHEMA cron TO <role>>' instead of per-job cron_job_* rows. To surface real cron job details, either run grant usage on schema cron to <role> (and grant select on cron.job to <role>) once, or simply ignore the row.

If you prefer manual control, the equivalent explicit grants are:

-- allow a monitoring role to call the readers
grant usage on schema ash to my_monitor_role;
grant execute on function ash.top_waits(interval, int, int)           to my_monitor_role;
grant execute on function ash.top_queries(interval, int)              to my_monitor_role;
grant execute on function ash.top_queries_with_text(interval, int)    to my_monitor_role;
grant execute on function ash.samples(interval, int)                  to my_monitor_role;
grant execute on function ash.status()                                to my_monitor_role;

-- or grant all reader functions at once
grant execute on all functions in schema ash to my_monitor_role;

-- grant direct read on raw tables for ad-hoc SQL (optional)
grant select on all tables in schema ash to my_monitor_role;

pg_stat_statements in a non-default schema

pgss reader functions (top_queries, top_queries_at, top_queries_with_text, samples, samples_at, event_queries, event_queries_at) need the pg_stat_statements schema on their search_path. The installer detects it automatically. If you install pg_stat_statements after pg_ash, or move it to a non-default schema, re-apply:

-- detect the pgss schema and re-apply search_path on pgss readers
select ash._apply_pgss_search_path();

Known limitations

  • Primary only — pg_ash requires writes (INSERT into sample tables, TRUNCATE on rotation), so it cannot run on physical standbys or read replicas. Install it on the primary; it samples all databases from there.
  • Observer-effect protection — the sampler pg_cron command includes SET statement_timeout = '500ms' to prevent take_sample() from becoming a problem on overloaded servers. If pg_stat_activity is slow (thousands of backends), the sample is canceled rather than piling up. Normal execution is ~50ms — the 500ms cap gives 10× headroom. Adjust in cron.job if needed.
  • Sampling gaps under heavy load — pg_cron runs in a single background worker, and under heavy load (lock storms, many concurrent sessions) it can't always keep up with the 1-second schedule. You may see gaps of 8s, 13s, or even 30s+ between samples — ironically during the most interesting moments. This is a fundamental pg_cron limitation, not a bug. If precise 1-second sampling matters, use an external sampler, which is more reliable under load.
  • 24-hour raw sample queries are slow (~6s for full-day scan) — use rollup reader functions (ash.minute_waits, ash.hourly_queries, ash.daily_peak_backends) for queries over long time ranges.
  • JIT protection built in — all reader functions use SET jit = off to prevent JIT compilation overhead (which can be 10-750x slower depending on Postgres version and dataset size). No global configuration needed.
  • Single-database install — pg_ash installs in one database and samples all databases from there. Per-database filtering works via the datid column.
  • query_map hard cap at 50k entries — on Postgres 14-15, volatile SQL comments (e.g., marginalia, sqlcommenter with session IDs or timestamps) produce unique query_id values that are not normalized. This can flood the query_map partitions. A hard cap of 50,000 entries per partition prevents unbounded growth — queries beyond the cap are tracked as "unknown." PG16+ normalizes comments, so this is rarely hit. Check query_map_count in ash.status() to monitor.
  • Parallel query workers counted individually — parallel workers share the same query_id as the leader but are counted as separate backends. This inflates the apparent "weight" of parallel queries in top_queries(). leader_pid grouping is not yet implemented.
  • WAL overhead — 1-second sampling generates ~29 KiB WAL per sample (~2.4 GiB/day), dominated by full_page_writes. This is significant for WAL-sensitive replication setups. Consider 5-second or 10-second sampling intervals (ash.start('5 seconds')) if WAL volume is a concern. The overhead scales linearly with sampling frequency.
  • Epoch overflow horizon (~2094)sample_ts is stored as int4 seconds since 2026-01-01 UTC and int4 is exhausted around 2094-01-19. Past that point, the ::int4 cast in ash.take_sample() raises ERROR: integer out of range and sampling hard-fails (it does NOT silently wrap). ash.status() exposes epoch_seconds_remaining so operators can plan a bigint migration of the column well before the horizon. See issue #37.
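Two of the figures above can be double-checked with quick arithmetic. The 29 KiB/sample number is taken from the WAL bullet; the epoch math assumes sample_ts stores whole seconds since 2026-01-01 UTC, as described above:

```python
from datetime import datetime, timedelta

# WAL overhead: ~29 KiB per sample at 1-second sampling
wal_per_day_gib = 29 * 1024 * 86_400 / 2**30
print(f"~{wal_per_day_gib:.1f} GiB WAL/day")   # ~2.4 GiB/day

# Epoch overflow: int4 seconds since 2026-01-01 run out at 2^31 - 1
epoch = datetime(2026, 1, 1)
overflow = epoch + timedelta(seconds=2**31 - 1)
print(overflow.date())                          # 2094-01-19
```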

License

Apache 2.0


pg_ash is part of SAMO — self-driving Postgres.