WAL Compaction and Truncation

May 18, 2026 · View on GitHub

Managing Write-Ahead Log growth in NornicDB

Last Updated: December 2025

Overview

NornicDB's Write-Ahead Log (WAL) supports automatic compaction to prevent unbounded growth. Without compaction, the WAL would grow indefinitely in long-running databases, consuming disk space and slowing recovery.

Problem Solved: WAL grows forever until manual snapshot + delete Solution: Automatic periodic snapshots with WAL truncation

Features

1. Automatic Compaction (Recommended)

Automatic compaction is the recommended approach for production deployments. Enable it through configuration:

YAML configuration:

database:
  wal_dir: "data/wal"
  wal_sync_mode: "batch"
  wal_snapshot_interval: "1h"       # Create snapshots hourly
  wal_auto_compaction_enabled: true  # Enabled by default
  wal_snapshot_dir: "data/snapshots"

Environment variables:

export NORNICDB_WAL_SNAPSHOT_INTERVAL=1h
export NORNICDB_WAL_AUTO_COMPACTION_ENABLED=true

Behavior:

Snapshots created at configured interval (default: 1 hour)
WAL truncated after each successful snapshot
Failures logged but don't crash the database
Automatic retry on next interval
Old snapshots saved to the snapshot directory as snapshot-<timestamp>.json

2. Manual WAL Truncation

For development or special cases, you can trigger truncation manually. Create a snapshot and then truncate the WAL to remove all entries before the snapshot point.

Safety Guarantees:

Atomic rename (crash-safe)
Old WAL remains intact until truncation succeeds
Can retry truncation if it fails
Recovery works from partial truncations

3. Disable Automatic Compaction

database:
  wal_auto_compaction_enabled: false

export NORNICDB_WAL_AUTO_COMPACTION_ENABLED=false

4. Retention Settings (Immutable Segments)

NornicDB stores WAL as immutable segments with a manifest. You can retain sealed segments for audit/ledger use cases.

YAML configuration:

database:
  wal_retention_max_segments: 24
  wal_retention_max_age: "168h" # 7 days

Environment variables:

export NORNICDB_WAL_RETENTION_MAX_SEGMENTS=24
export NORNICDB_WAL_RETENTION_MAX_AGE=168h
export NORNICDB_WAL_LEDGER_RETENTION_DEFAULTS=true

These settings retain sealed WAL segments after snapshots. Auto-compaction remains enabled by default to preserve existing behavior; retention is opt-in.

5. Txlog Query Procedures

You can query WAL entries directly via Cypher:

// Scan recent entries (no args = recent window)
CALL db.txlog.entries() YIELD txId, db, kind, seq, timestamp, payload
RETURN seq, kind, txId, timestamp, payload
ORDER BY seq;

// Read entries for a specific transaction
CALL db.txlog.byTxId('tx-123') YIELD txId, db, kind, seq, timestamp, payload
RETURN seq, kind, txId, timestamp, payload
ORDER BY seq;

db.txlog.entries accepts up to 4 optional positional args (filter parameters); pass none for a recent-window scan. db.txlog.byTxId takes a single transaction ID. The yield columns are fixed: txId, db, kind, seq, timestamp, payload.

How It Works

Compaction Process

When a compaction cycle runs, NornicDB flushes any pending writes, then creates a point-in-time snapshot of the current database state. Once the snapshot is safely persisted, the WAL is rewritten to contain only entries that arrived after the snapshot. The old WAL file is replaced via an atomic rename so the operation is crash-safe — at no point can a crash leave the WAL in a partial or corrupt state.

Crash Safety

The truncation process is crash-safe at every step:

Before rename: Old WAL is intact
During rename: Atomic operation (old or new, never partial)
After rename: New WAL is complete and synced

If a crash occurs:

Before rename: Old WAL used on recovery (full history)
After rename: New WAL used on recovery (snapshot + delta)

Recovery

With auto-compaction enabled:

Recovery = Latest Snapshot + Post-Snapshot WAL Entries

Example timeline:

T=0:   Database starts
T=1h:  Snapshot 1 created (100 nodes), WAL truncated
T=2h:  Snapshot 2 created (150 nodes), WAL truncated
T=2.5h: Crash occurs (170 nodes in database)

Recovery:
  Load Snapshot 2 (150 nodes)
  + Replay WAL since T=2h (20 new nodes)
  = 170 nodes recovered

Performance Impact

Disk Space

Before compaction:

WAL size grows unbounded:
  After 1 day:  ~10GB
  After 1 week: ~70GB
  After 1 month: ~300GB

After compaction (hourly):

WAL size bounded by interval:
  Maximum size: ~500MB (1 hour of writes)
  Average size: ~250MB
  Disk savings: 99%+

Recovery Time

Before compaction:

Recovery time = O(total history)
  1 day:  ~30 seconds
  1 week: ~3 minutes
  1 month: ~15 minutes

After compaction:

Recovery time = Snapshot load + O(interval writes)
  Load snapshot: ~2 seconds
  Replay WAL:    ~1 second
  Total:         ~3 seconds (constant!)

Runtime Overhead

Snapshot creation: ~2-5ms per 1000 nodes (async, doesn't block writes)
WAL truncation: ~10-50ms (happens every hour, negligible amortized cost)
Total overhead: <0.001% of runtime

Configuration

WAL Settings

Setting	Default	Description
`wal_dir`	`data/wal`	WAL directory
`wal_sync_mode`	`batch`	Sync mode: `immediate`, `batch`, or `none`
`wal_batch_sync_interval`	`100ms`	Batch sync frequency
`wal_max_file_size`	`100MB`	File rotation trigger (bytes)
`wal_max_entries`	`100000`	File rotation trigger (count)
`wal_snapshot_interval`	`1h`	Auto-compaction frequency
`wal_auto_compaction_enabled`	`true`	Enable/disable auto-compaction

Tuning Snapshot Interval

Aggressive (every 15 minutes):

Minimal WAL size
Faster recovery
More snapshot overhead
Good for: High-write, limited disk space

Moderate (every hour — default):

Balanced disk usage
Good recovery time
Low overhead
Good for: Most use cases

Conservative (every 6 hours):

Larger WAL size
Slower recovery
Minimal overhead
Good for: Low-write, plenty of disk space

Monitoring

NornicDB exposes compaction metrics that you can use to monitor WAL health:

Total snapshots created — number of successful compaction cycles since startup
Last snapshot time — timestamp of the most recent snapshot
WAL entry count — current number of entries in the active WAL
WAL bytes written — total bytes in the active WAL

These metrics are available through the server's diagnostics and can be monitored via the admin UI or log output.

Troubleshooting

Issue: WAL still growing despite auto-compaction

Check:

Verify auto-compaction is enabled in your configuration (wal_auto_compaction_enabled: true)

Check snapshot directory for recent files:

ls -lh data/snapshots/
# Should see snapshot-<timestamp>.json files

Check WAL size:
```
ls -lh data/wal/wal.log
```

Issue: Truncation errors

Symptom: Logs show "failed to truncate WAL"

Causes:

Disk full
Permission issues
WAL file locked by another process

Solution:

# Check disk space
df -h

# Check permissions
ls -l data/wal/
chmod 644 data/wal/wal.log

# Check for locks
lsof | grep wal.log

Issue: Slow recovery after crash

Check snapshot age:

ls -lt data/snapshots/ | head -1

If snapshot is old, auto-compaction may not be running. Verify your configuration and check server logs for compaction errors.

Best Practices

Always enable auto-compaction in production — this is the default and should not be disabled unless you have a specific reason.
Monitor snapshot creation — check server logs or metrics to confirm snapshots are being created at the expected interval.

Rotate old snapshots to avoid filling disk with historical snapshots:

find data/snapshots -name "snapshot-*.json" -mtime +7 -delete

Test recovery regularly — periodically verify that your latest snapshot can be loaded and that the WAL replays correctly.