pfc-archiver-clickhouse

May 20, 2026


A standalone daemon that runs alongside ClickHouse, watches for data older than a configurable retention window, compresses it to PFC format, and writes it to local storage or S3 — automatically.

Runs as a sidecar or cron job — no schema changes, no plugins, no ClickHouse modifications.


How it works

Every interval_seconds (default: 3600), pfc-archiver-clickhouse runs one archive cycle:

SCAN  ->  EXPORT  ->  COMPRESS  ->  UPLOAD  ->  VERIFY  ->  (optional DELETE)  ->  LOG
  1. SCAN: compute which time partitions in ClickHouse are older than retention_days
  2. EXPORT: read rows in partition_days-sized chunks via the HTTP interface (clickhouse-connect)
  3. COMPRESS: pipe through pfc_jsonl compress, producing a .pfc file plus its .pfc.bidx index
  4. UPLOAD: write to output_dir (local path or s3://bucket/prefix/)
  5. VERIFY: decompress and count rows; the count must match the exported count exactly
  6. DELETE (optional): remove archived rows from ClickHouse (only if delete_after_archive = true)
  7. LOG: write a JSON run log to log_dir
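
The SCAN step can be pictured as plain date arithmetic. The following is a minimal sketch, not the daemon's actual code: the function name `archive_windows` is hypothetical, and it assumes that only windows lying entirely before the retention cutoff are eligible.

```python
from datetime import datetime, timedelta, timezone


def archive_windows(oldest, now, retention_days=90, partition_days=30):
    """Return [start, end) windows that lie entirely before the retention cutoff.

    Assumption: a window is archived only once all of it is older than
    retention_days. The real tool may treat partial windows differently.
    """
    cutoff = now - timedelta(days=retention_days)
    step = timedelta(days=partition_days)
    windows = []
    start = oldest
    while start + step <= cutoff:
        windows.append((start, start + step))
        start += step
    return windows


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
oldest = datetime(2024, 1, 1, tzinfo=timezone.utc)
windows = archive_windows(oldest, now)
for start, end in windows:
    print(start.date(), "->", end.date())
```

With these inputs the cutoff is 2024-03-03, so two 30-day windows qualify and the third is left for a later cycle.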

Supported databases

| Database | Protocol | Default port |
|---|---|---|
| ClickHouse | HTTP (clickhouse-connect) | 8123 |
| ClickHouse Cloud | HTTPS (clickhouse-connect, secure = true) | 8443 |
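
To make the table concrete, here is a hedged sketch of how [db]-style settings could be turned into a clickhouse-connect client. The helper names `client_kwargs` and `connect` are hypothetical, and falling back to 8443 or 8123 when no port is given is an assumption drawn from the table, not confirmed behaviour of the archiver.

```python
def client_kwargs(db):
    """Map a [db]-style dict to keyword arguments for clickhouse_connect.get_client."""
    secure = bool(db.get("secure", False))
    return {
        "host": db.get("host", "localhost"),
        # Assumed fallback: HTTPS port when secure, plain HTTP port otherwise.
        "port": db.get("port", 8443 if secure else 8123),
        "username": db.get("user", "default"),
        "password": db.get("password", ""),
        "database": db.get("database", "default"),
        "secure": secure,
    }


def connect(db):
    # Imported lazily so client_kwargs can be used without the driver installed.
    import clickhouse_connect

    return clickhouse_connect.get_client(**client_kwargs(db))


print(client_kwargs({"host": "example.clickhouse.cloud", "secure": True})["port"])
```

Only `connect` needs a reachable server; `client_kwargs` can be checked on its own.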

Install

pip install pfc-archiver-clickhouse

# With S3 upload support
pip install "pfc-archiver-clickhouse[s3]"

# Or from source
git clone https://github.com/ImpossibleForge/pfc-archiver-clickhouse
cd pfc-archiver-clickhouse
pip install -r requirements.txt

Also required: pfc_jsonl binary on your PATH (or set via --pfc-binary):

# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS Apple Silicon (M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS Intel:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# Windows (PowerShell):
Invoke-WebRequest https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-windows-x64.exe `
  -OutFile "$env:LOCALAPPDATA\Microsoft\WindowsApps\pfc_jsonl.exe"

Quickstart

# 1. Copy and edit the config
cp config/clickhouse.toml myconfig.toml

# 2. Dry-run — safe, shows what would be archived
python pfc_archiver_clickhouse.py --config myconfig.toml --dry-run

# 3. Single cycle
python pfc_archiver_clickhouse.py --config myconfig.toml --once

# 4. Daemon mode
python pfc_archiver_clickhouse.py --config myconfig.toml

Configuration

[db]
host      = "localhost"
port      = 8123              # HTTP interface (8443 for HTTPS/TLS)
database  = "default"
user      = "default"
password  = ""
secure    = false             # true for ClickHouse Cloud

table     = "logs"            # table to archive
ts_column = "timestamp"       # timestamp column for time-range queries

# Delete mode (only used when delete_after_archive = true):
#   "delete"         — lightweight DELETE WHERE (ClickHouse 22.8+, default)
#   "drop_partition" — ALTER TABLE DROP PARTITION via system.parts (instant, atomic)
delete_mode = "delete"

[archive]
retention_days       = 90    # archive data older than this many days
partition_days       = 30    # one archive file per N days (roughly one toYYYYMM month)
output_dir           = "s3://my-bucket/cold-storage/"
verify               = true
delete_after_archive = false  # change to true only after testing!
log_dir              = "./archive_logs/"

# S3 options (if output_dir starts with s3://)
# s3_region    = "eu-central-1"
# s3_endpoint  = ""            # leave empty for AWS, or set MinIO URL
# s3_access_key = ""           # leave empty to use env vars / IAM role
# s3_secret_key = ""

[daemon]
interval_seconds = 3600       # run every hour

Delete modes

"delete" (default)

Uses ClickHouse lightweight DELETE (introduced in 22.8):

DELETE FROM `database`.`table` WHERE `ts_col` >= '...' AND `ts_col` < '...'

Works on any MergeTree table regardless of partitioning scheme.
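
As an illustration of the statement shape shown above, the following sketch builds the DELETE for a half-open time window. The helpers `quote_ident` and `delete_statement` are hypothetical, and the timestamp format is an assumption; the archiver may format values or pass parameters differently.

```python
from datetime import datetime


def quote_ident(name):
    """Backtick-quote an identifier, escaping backslashes and backticks."""
    return "`" + name.replace("\\", "\\\\").replace("`", "\\`") + "`"


def delete_statement(database, table, ts_col, start, end):
    """Build a lightweight DELETE covering start <= ts_col < end."""
    fmt = "%Y-%m-%d %H:%M:%S"  # assumed literal format for DateTime columns
    col = quote_ident(ts_col)
    return (
        f"DELETE FROM {quote_ident(database)}.{quote_ident(table)} "
        f"WHERE {col} >= '{start.strftime(fmt)}' "
        f"AND {col} < '{end.strftime(fmt)}'"
    )


sql = delete_statement("default", "logs", "timestamp",
                       datetime(2024, 1, 1), datetime(2024, 2, 1))
print(sql)
```

Using a half-open range (`>=` start, `<` end) means adjacent windows never delete the same row twice or skip a row on the boundary.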

"drop_partition"

Uses ALTER TABLE DROP PARTITION — instant and atomic. Ideal when the table is partitioned by time (e.g. PARTITION BY toYYYYMM(ts)):

-- Finds partition IDs via system.parts, then for each:
ALTER TABLE `database`.`table` DROP PARTITION '202401'

Requires the archived time range to align with partition boundaries.
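
The alignment requirement can be checked up front. This is a minimal sketch for a table partitioned with toYYYYMM, using a hypothetical helper `month_partitions`. It does not query system.parts as the daemon is described as doing, and simply derives the partition IDs a month-aligned range would cover.

```python
from datetime import date


def month_partitions(start, end):
    """Return toYYYYMM-style partition IDs covered by [start, end).

    Raises ValueError when the range does not begin and end on the first
    day of a month, since dropping a partition would then remove rows
    outside the archived range.
    """
    if start.day != 1 or end.day != 1 or start >= end:
        raise ValueError("range must start and end on the first day of a month")
    ids = []
    year, month = start.year, start.month
    while (year, month) < (end.year, end.month):
        ids.append(f"{year}{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ids


print(month_partitions(date(2023, 12, 1), date(2024, 2, 1)))
```

Note that a fixed 30-day window generally does not line up with calendar months, so a check of this kind would reject most such windows.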


Output format

Archive files are written as .pfc + .pfc.bidx pairs:

s3://my-bucket/cold-storage/
├── logs__20240101__20240201.pfc       ← compressed JSONL (~8-10% of original)
├── logs__20240101__20240201.pfc.bidx  ← block index for time-range queries
├── logs__20240201__20240301.pfc
└── logs__20240201__20240301.pfc.bidx

The .pfc.bidx file enables random access — query a time window with DuckDB and only the relevant blocks are decompressed. No full download needed.

INSTALL pfc FROM community;
LOAD pfc;
SELECT * FROM pfc_scan('s3://my-bucket/cold-storage/logs__*.pfc')
WHERE timestamp BETWEEN '2024-01-15' AND '2024-01-16';
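
The file names in the listing above appear to follow a `<table>__<start>__<end>.pfc` pattern. The sketch below reproduces that pattern and splits an output_dir value into bucket and prefix. Both helpers are hypothetical, and the naming rule is inferred from the example listing rather than from the tool's source.

```python
from datetime import date
from urllib.parse import urlparse


def archive_name(table, start, end):
    """Build a file name like logs__20240101__20240201.pfc (inferred pattern)."""
    return f"{table}__{start:%Y%m%d}__{end:%Y%m%d}.pfc"


def split_output_dir(output_dir):
    """Return ("s3", bucket, prefix) for s3:// URLs, else ("local", None, path)."""
    parsed = urlparse(output_dir)
    if parsed.scheme == "s3":
        return "s3", parsed.netloc, parsed.path.lstrip("/")
    return "local", None, output_dir


name = archive_name("logs", date(2024, 1, 1), date(2024, 2, 1))
kind, bucket, prefix = split_output_dir("s3://my-bucket/cold-storage/")
print(kind, bucket, prefix + name, prefix + name + ".bidx")
```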

Run as a systemd service

[Unit]
Description=PFC ClickHouse Archiver
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/pfc-archiver-clickhouse/pfc_archiver_clickhouse.py \
    --config /etc/pfc/clickhouse.toml
Restart=on-failure
RestartSec=60

[Install]
WantedBy=multi-user.target

Part of the PFC Ecosystem

→ View all PFC tools & integrations

| Direct integration | Why |
|---|---|
| pfc-export-clickhouse | One-shot CLI export instead of daemon; same ClickHouse connection, no scheduling needed |
| pfc-duckdb | Query the archives this daemon creates: DuckDB community extension, time-range queries without full decompress |
| pfc-gateway | HTTP REST query layer over .pfc archives; no DuckDB required |
| pfc-archiver-questdb | Same archiver pattern for QuestDB |
| pfc-archiver-cratedb | Same archiver pattern for CrateDB |

Disclaimer

pfc-archiver-clickhouse is an independent open-source project and is not affiliated with, endorsed by, or associated with ClickHouse, Inc. or the ClickHouse project.


License

pfc-archiver-clickhouse (this repository) is released under the MIT License — see LICENSE.

The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com