pfc-archiver-cratedb
May 3, 2026
A standalone daemon that runs alongside CrateDB, watches for data older than a configurable retention window, compresses it to PFC format, and writes it to local storage or S3.
Runs as a sidecar or cron job — no schema changes, no plugins, no database modifications.
How it works
Every interval_seconds (default: 3600), pfc-archiver-cratedb runs one archive cycle:
SCAN -> EXPORT -> COMPRESS -> UPLOAD -> VERIFY -> (optional DELETE) -> LOG
- SCAN: compute which time partitions are older than retention_days
- EXPORT: stream rows via PostgreSQL wire protocol in partition_days-sized chunks
- COMPRESS: pipe through pfc_jsonl compress → .pfc + .pfc.bidx + .pfc.idx
- UPLOAD: write to output_dir (local path or s3://bucket/prefix/)
- VERIFY: decompress and count rows; must match exported count exactly
- DELETE (optional): DELETE WHERE ts >= from AND ts < to (only if delete_after_archive = true)
- LOG: write a JSON run log to log_dir
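The SCAN step can be pictured as splitting the time range older than the retention cutoff into half-open [from, to) windows. The sketch below is only an illustration of that idea, not the archiver's actual code: the function name `partition_windows`, the midnight alignment, and the way the oldest timestamp is obtained are all assumptions.

```python
from datetime import datetime, timedelta, timezone

def partition_windows(oldest, now, retention_days=30, partition_days=1):
    """Return [from, to) windows that end at or before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    step = timedelta(days=partition_days)
    # Assumption: align the first window to midnight so boundaries are
    # stable from one run to the next.
    start = oldest.replace(hour=0, minute=0, second=0, microsecond=0)
    windows = []
    while start + step <= cutoff:
        windows.append((start, start + step))
        start += step
    return windows

# Hypothetical inputs: "now" and the oldest row timestamp in the table.
now = datetime(2026, 4, 14, 18, 0, tzinfo=timezone.utc)
oldest = datetime(2026, 3, 12, 7, 30, tzinfo=timezone.utc)
for lo, hi in partition_windows(oldest, now):
    print(lo.date(), "->", hi.date())
```

Each window maps directly onto the bounds used by the optional DELETE step (ts >= from AND ts < to), which is why the windows are half-open.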
Install
pip install pfc-archiver-cratedb
# With S3 output support
pip install "pfc-archiver-cratedb[s3]"
# Or from source
git clone https://github.com/ImpossibleForge/pfc-archiver-cratedb
pip install psycopg2-binary
The pfc_jsonl binary must be installed:
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS (Apple Silicon M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
License note:
pfc_jsonl is free for personal and open-source use. Commercial use requires a written license; see pfc-jsonl.
Quick start
# 1. Copy the example config
cp config/cratedb.toml my_config.toml
# 2. Edit the config
nano my_config.toml
# 3. Dry run (no writes, prints what would be archived)
python pfc_archiver.py --config my_config.toml --dry-run
# 4. Archive once and exit
python pfc_archiver.py --config my_config.toml --once
# 5. Run as a daemon (loops every interval_seconds)
python pfc_archiver.py --config my_config.toml
Configuration
[db]
host = "localhost"
port = 5432
user = "crate"
password = ""
dbname = "doc"
schema = "doc"
table = "logs"
ts_column = "ts"
batch_size = 10000
[archive]
retention_days = 30
partition_days = 1
output_dir = "./archives/" # local path or s3://bucket/prefix/
verify = true
delete_after_archive = false
log_dir = "./archive_logs/"
[daemon]
interval_seconds = 3600
See config/cratedb.toml for a fully annotated example.
Output format
Each archived partition window produces three files:
<table>_<YYYYMMDD>_<YYYYMMDD>.pfc
<table>_<YYYYMMDD>_<YYYYMMDD>.pfc.bidx
<table>_<YYYYMMDD>_<YYYYMMDD>.pfc.idx
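To locate the files for a given window from a script, the naming pattern above can be reproduced as follows. This is a sketch: the helper name `archive_names` is hypothetical, and it assumes the two dates in the name are the window's from and to dates, as the sample run log suggests.

```python
from datetime import date

# The three files listed above: data, block index, and index.
SUFFIXES = (".pfc", ".pfc.bidx", ".pfc.idx")

def archive_names(table, start, end):
    """Build the expected file names for one [start, end) window."""
    stem = f"{table}_{start:%Y%m%d}_{end:%Y%m%d}"
    return [stem + suffix for suffix in SUFFIXES]

print(archive_names("logs", date(2026, 3, 1), date(2026, 3, 2)))
```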
Log format
{
"ts": "2026-04-14T18:00:00",
"db": "cratedb://localhost:5432/doc",
"table": "logs",
"from": "2026-03-01T00:00:00",
"to": "2026-03-02T00:00:00",
"rows": 248721,
"jsonl_mb": 42.3,
"pfc_mb": 2.5,
"ratio_pct": 5.9,
"output": "./archives/logs_20260301_20260302.pfc",
"verified": true,
"deleted": false,
"status": "ok"
}
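A run log entry like the one above is easy to check from a monitoring script. The sketch below assumes that ratio_pct is the compressed size as a percentage of the JSONL size, which matches the sample values but is not confirmed by the source; the `check_entry` helper is hypothetical.

```python
import json

SAMPLE = """
{
  "ts": "2026-04-14T18:00:00",
  "table": "logs",
  "rows": 248721,
  "jsonl_mb": 42.3,
  "pfc_mb": 2.5,
  "ratio_pct": 5.9,
  "verified": true,
  "deleted": false,
  "status": "ok"
}
"""

def check_entry(raw):
    """Return (healthy, recomputed_ratio_pct) for one JSON log entry."""
    entry = json.loads(raw)
    # Assumption: ratio_pct = compressed size / raw JSONL size * 100.
    ratio = round(entry["pfc_mb"] / entry["jsonl_mb"] * 100, 1)
    healthy = entry["status"] == "ok" and entry["verified"] is True
    return healthy, ratio

print(check_entry(SAMPLE))
```

An entry with "deleted": true but "verified": false would be worth alerting on, since it would mean rows were removed without a matching row count.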
Run as a systemd service
[Unit]
Description=pfc-archiver-cratedb — PFC archive daemon
After=network.target
[Service]
Type=simple
User=pfc
WorkingDirectory=/opt/pfc-archiver-cratedb
ExecStart=/usr/bin/python3 /opt/pfc-archiver-cratedb/pfc_archiver.py --config /etc/pfc-archiver/cratedb.toml
Restart=on-failure
RestartSec=60
[Install]
WantedBy=multi-user.target
Querying cold archives
The SQL below is DuckDB syntax and uses the pfc community extension:
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;
-- Time-window query (only decompresses the relevant blocks)
SELECT *
FROM read_pfc_jsonl(
'./archives/logs_20260301_20260302.pfc',
ts_from = epoch(TIMESTAMPTZ '2026-03-01 14:00:00+00'),
ts_to = epoch(TIMESTAMPTZ '2026-03-01 15:00:00+00')
);
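When the query is generated from a script, the ts_from and ts_to bounds can be computed ahead of time. The sketch below assumes, based on the epoch(...) calls above, that both parameters take Unix epoch seconds; the `epoch_bounds` helper is hypothetical.

```python
from datetime import datetime, timezone

def epoch_bounds(start, end):
    """Convert a timezone-aware time window to whole epoch seconds."""
    if start.tzinfo is None or end.tzinfo is None:
        # Naive datetimes would be interpreted in local time, which
        # silently shifts the window.
        raise ValueError("use timezone-aware datetimes")
    return int(start.timestamp()), int(end.timestamp())

ts_from, ts_to = epoch_bounds(
    datetime(2026, 3, 1, 14, tzinfo=timezone.utc),
    datetime(2026, 3, 1, 15, tzinfo=timezone.utc),
)
print(ts_from, ts_to)
```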
Part of the PFC Ecosystem
→ View all PFC tools & integrations
| Direct integration | Why |
|---|---|
| pfc-export-cratedb | Same DB, different mode — exporter is one-shot CLI; archiver runs as a continuous daemon |
| pfc-archiver-questdb | Same concept for QuestDB |
Disclaimer
pfc-archiver-cratedb is an independent open-source project and is not affiliated with, endorsed by, or associated with Crate.io GmbH or the CrateDB project.
License
pfc-archiver-cratedb (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com