pfc-archiver-clickhouse
May 20, 2026
A standalone daemon that runs alongside ClickHouse, watches for data older than a configurable retention window, compresses it to PFC format, and writes it to local storage or S3 — automatically.
Runs as a sidecar or cron job — no schema changes, no plugins, no ClickHouse modifications.
How it works
Every interval_seconds (default: 3600), pfc-archiver-clickhouse runs one archive cycle:
SCAN -> EXPORT -> COMPRESS -> UPLOAD -> VERIFY -> (optional DELETE) -> LOG
- SCAN: compute which time partitions in ClickHouse are older than retention_days
- EXPORT: read rows in partition_days-sized chunks via HTTP interface (clickhouse-connect)
- COMPRESS: pipe through pfc_jsonl compress → .pfc + .pfc.bidx
- UPLOAD: write to output_dir (local path or s3://bucket/prefix/)
- VERIFY: decompress and count rows; must match exported count exactly
- DELETE (optional): remove archived rows from ClickHouse (only if delete_after_archive = true)
- LOG: write a JSON run log to log_dir
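The SCAN and VERIFY steps can be illustrated with a small sketch. This is not the project's actual code: the function names `archive_windows` and `safe_to_delete` are hypothetical, and it assumes that windows are half-open `[start, end)` ranges counted forward from the oldest row, and that a window is archived only once it lies entirely before the retention cutoff.

```python
from datetime import date, timedelta


def archive_windows(oldest, today, retention_days, partition_days):
    """Return half-open (start, end) windows that end on or before the cutoff.

    Hypothetical helper: the real SCAN step may pick window boundaries differently.
    """
    cutoff = today - timedelta(days=retention_days)
    step = timedelta(days=partition_days)
    windows = []
    start = oldest
    while start + step <= cutoff:
        windows.append((start, start + step))
        start += step
    return windows


def safe_to_delete(exported_rows, verified_rows, delete_after_archive):
    """Allow deletion only when it is enabled and the row counts match exactly."""
    return delete_after_archive and exported_rows == verified_rows


# Example: oldest row on 2024-01-01, cycle runs on 2024-07-01.
windows = archive_windows(date(2024, 1, 1), date(2024, 7, 1),
                          retention_days=90, partition_days=30)
for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())
```

With these inputs the cutoff is 2024-04-02, so three 30-day windows qualify and the partially covered fourth window waits for a later cycle.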
Supported databases
| Database | Protocol | Default port |
|---|---|---|
| ClickHouse | HTTP (clickhouse-connect) | 8123 |
| ClickHouse Cloud | HTTPS (clickhouse-connect, secure = true) | 8443 |
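As a rough sketch of how the two rows above differ in practice, the helper below turns a `[db]`-style dictionary into keyword arguments for `clickhouse_connect.get_client`. The helper name `connection_kwargs` and the host `example.clickhouse.cloud` are hypothetical, and choosing the default port from `secure` is an assumption based on the table, not confirmed behavior of the archiver.

```python
def connection_kwargs(db):
    """Map [db] settings to clickhouse_connect.get_client keyword arguments.

    Hypothetical helper: the port default (8443 when secure, else 8123)
    mirrors the table above and is an assumption about the tool.
    """
    secure = bool(db.get("secure", False))
    return {
        "host": db.get("host", "localhost"),
        "port": db.get("port", 8443 if secure else 8123),
        "username": db.get("user", "default"),
        "password": db.get("password", ""),
        "database": db.get("database", "default"),
        "secure": secure,
    }


local = connection_kwargs({"host": "localhost"})
cloud = connection_kwargs({"host": "example.clickhouse.cloud", "secure": True})
print(local["port"], cloud["port"])

# With clickhouse-connect installed, the result could be used like this:
#   import clickhouse_connect
#   client = clickhouse_connect.get_client(**cloud)
```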
Install
pip install pfc-archiver-clickhouse
# With S3 upload support
pip install "pfc-archiver-clickhouse[s3]"
# Or from source
git clone https://github.com/ImpossibleForge/pfc-archiver-clickhouse
cd pfc-archiver-clickhouse
pip install -r requirements.txt
Also required: the pfc_jsonl binary on your PATH (or set via --pfc-binary):
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS Apple Silicon (M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS Intel:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# Windows (PowerShell):
Invoke-WebRequest https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-windows-x64.exe `
-OutFile "$env:LOCALAPPDATA\Microsoft\WindowsApps\pfc_jsonl.exe"
Quickstart
# 1. Copy and edit the config
cp config/clickhouse.toml myconfig.toml
# 2. Dry-run — safe, shows what would be archived
python pfc_archiver_clickhouse.py --config myconfig.toml --dry-run
# 3. Single cycle
python pfc_archiver_clickhouse.py --config myconfig.toml --once
# 4. Daemon mode
python pfc_archiver_clickhouse.py --config myconfig.toml
Configuration
[db]
host = "localhost"
port = 8123 # HTTP interface (8443 for HTTPS/TLS)
database = "default"
user = "default"
password = ""
secure = false # true for ClickHouse Cloud
table = "logs" # table to archive
ts_column = "timestamp" # timestamp column for time-range queries
# Delete mode (only used when delete_after_archive = true):
# "delete" — lightweight DELETE WHERE (ClickHouse 22.8+, default)
# "drop_partition" — ALTER TABLE DROP PARTITION via system.parts (instant, atomic)
delete_mode = "delete"
[archive]
retention_days = 90 # archive data older than this many days
partition_days = 30 # one archive file per N days (matches toYYYYMM default)
output_dir = "s3://my-bucket/cold-storage/"
verify = true
delete_after_archive = false # change to true only after testing!
log_dir = "./archive_logs/"
# S3 options (if output_dir starts with s3://)
# s3_region = "eu-central-1"
# s3_endpoint = "" # leave empty for AWS, or set MinIO URL
# s3_access_key = "" # leave empty to use env vars / IAM role
# s3_secret_key = ""
[daemon]
interval_seconds = 3600 # run every hour
Delete modes
"delete" (default)
Uses ClickHouse lightweight DELETE (introduced in 22.8):
DELETE FROM `database`.`table` WHERE `ts_col` >= '...' AND `ts_col` < '...'
Works on any MergeTree table regardless of partitioning scheme.
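To make the half-open time range concrete, here is a minimal sketch that renders the statement above for one archive window. The helper `build_delete` is hypothetical; the timestamp literal format and the strict identifier check are assumptions chosen for safety, and the archiver may build its SQL differently.

```python
import re
from datetime import datetime

# Conservative identifier check so names can be placed inside backticks safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_delete(database, table, ts_col, start, end):
    """Render a lightweight DELETE covering the half-open range [start, end)."""
    for name in (database, table, ts_col):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"DELETE FROM `{database}`.`{table}` "
        f"WHERE `{ts_col}` >= '{start.strftime(fmt)}' "
        f"AND `{ts_col}` < '{end.strftime(fmt)}'"
    )


sql = build_delete("default", "logs", "timestamp",
                   datetime(2024, 1, 1), datetime(2024, 2, 1))
print(sql)
```

Using `>=` on the start and `<` on the end means adjacent windows never delete the same row twice and never skip a row that falls exactly on a boundary.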
"drop_partition"
Uses ALTER TABLE DROP PARTITION — instant and atomic. Ideal when the table is partitioned by time (e.g. PARTITION BY toYYYYMM(ts)):
-- Finds partition IDs via system.parts, then for each:
ALTER TABLE `database`.`table` DROP PARTITION '202401'
Requires the archived time range to align with partition boundaries.
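The alignment requirement can be checked up front. The sketch below assumes a table partitioned by `toYYYYMM(ts)` and computes the partition IDs arithmetically, which differs from the tool's described approach of looking them up in system.parts; the helper name `month_partitions` is hypothetical.

```python
from datetime import date


def month_partitions(start, end):
    """Return toYYYYMM-style partition IDs covered by [start, end).

    Raises ValueError when the range does not fall on month boundaries,
    since dropping a partition would then remove unarchived rows.
    """
    if start.day != 1 or end.day != 1 or start >= end:
        raise ValueError("range must start and end on the first day of a month")
    ids = []
    year, month = start.year, start.month
    while (year, month) < (end.year, end.month):
        ids.append(f"{year:04d}{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return ids


ids = month_partitions(date(2023, 12, 1), date(2024, 2, 1))
statements = [f"ALTER TABLE `default`.`logs` DROP PARTITION '{p}'" for p in ids]
for statement in statements:
    print(statement)
```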
Output format
Archive files are written as .pfc + .pfc.bidx pairs:
s3://my-bucket/cold-storage/
├── logs__20240101__20240201.pfc ← compressed JSONL (~8-10% of original)
├── logs__20240101__20240201.pfc.bidx ← block index for time-range queries
├── logs__20240201__20240301.pfc
└── logs__20240201__20240301.pfc.bidx
The .pfc.bidx file enables random access — query a time window with DuckDB and only the relevant blocks are decompressed. No full download needed.
INSTALL pfc FROM community;
SELECT * FROM pfc_scan('s3://my-bucket/cold-storage/logs__*.pfc')
WHERE timestamp BETWEEN '2024-01-15' AND '2024-01-16';
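The file names in the listing above appear to follow a `table__YYYYMMDD__YYYYMMDD` pattern. The sketch below builds and parses such names and picks the archives that overlap a query window, which can be useful when listing a bucket by hand. The helper names are hypothetical, and treating the second date as exclusive is an assumption inferred from one file's end date matching the next file's start date.

```python
import re
from datetime import date, datetime

_NAME = re.compile(r"^(?P<table>.+)__(?P<start>\d{8})__(?P<end>\d{8})\.pfc$")


def archive_names(table, start, end):
    """Return the (.pfc, .pfc.bidx) file names for one archive window."""
    stem = f"{table}__{start:%Y%m%d}__{end:%Y%m%d}"
    return stem + ".pfc", stem + ".pfc.bidx"


def parse_archive_name(name):
    """Split a .pfc file name into (table, start date, end date)."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not an archive name: {name!r}")
    start = datetime.strptime(match["start"], "%Y%m%d").date()
    end = datetime.strptime(match["end"], "%Y%m%d").date()
    return match["table"], start, end


def overlapping(names, query_start, query_end):
    """Return names whose [start, end) window intersects [query_start, query_end)."""
    hits = []
    for name in names:
        _, start, end = parse_archive_name(name)
        if start < query_end and query_start < end:
            hits.append(name)
    return hits


names = ["logs__20240101__20240201.pfc", "logs__20240201__20240301.pfc"]
hits = overlapping(names, date(2024, 1, 15), date(2024, 1, 17))
print(hits)
```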
Run as a systemd service
[Unit]
Description=PFC ClickHouse Archiver
After=network.target
[Service]
ExecStart=/usr/bin/python3 /opt/pfc-archiver-clickhouse/pfc_archiver_clickhouse.py \
--config /etc/pfc/clickhouse.toml
Restart=on-failure
RestartSec=60
[Install]
WantedBy=multi-user.target
Part of the PFC Ecosystem
| Direct integration | Why |
|---|---|
| pfc-export-clickhouse | One-shot CLI export instead of daemon — same ClickHouse connection, no scheduling needed |
| pfc-duckdb | Query the archives this daemon creates — DuckDB community extension, time-range queries without full decompress |
| pfc-gateway | HTTP REST query layer over .pfc archives — no DuckDB required |
| pfc-archiver-questdb | Same archiver pattern for QuestDB |
| pfc-archiver-cratedb | Same archiver pattern for CrateDB |
Disclaimer
pfc-archiver-clickhouse is an independent open-source project and is not affiliated with, endorsed by, or associated with ClickHouse, Inc. or the ClickHouse project.
License
pfc-archiver-clickhouse (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com