pfc-archiver-timescaledb
May 20, 2026
A standalone daemon that runs alongside TimescaleDB, watches for data older than a configurable retention window, compresses it to PFC format, and writes it to local storage or S3 — automatically.
Runs as a sidecar or cron job — no schema changes, no plugins, no TimescaleDB modifications.
How it works
Every interval_seconds (default: 3600), pfc-archiver-timescaledb runs one archive cycle:
SCAN -> EXPORT -> COMPRESS -> UPLOAD -> VERIFY -> (optional DROP_CHUNKS) -> LOG
- SCAN: compute which time partitions in the hypertable are older than retention_days
- EXPORT: read rows in partition_days-sized chunks via PostgreSQL wire protocol (port 5432)
- COMPRESS: pipe through pfc_jsonl compress → .pfc + .pfc.bidx + .pfc.idx
- UPLOAD: write to output_dir (local path or s3://bucket/prefix/)
- VERIFY: decompress and count rows; must match exported count exactly
- DROP_CHUNKS (optional): call drop_chunks() on the archived range (only if delete_after_archive = true)
- LOG: write a JSON run log to log_dir
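As an illustration of the SCAN step, here is a minimal sketch of how the archivable windows could be computed from retention_days and partition_days. The function name archive_windows and the oldest argument are hypothetical; the tool's actual scan logic may differ, for example in how it finds the earliest timestamp or aligns window boundaries.

```python
from datetime import datetime, timedelta, timezone


def archive_windows(oldest, now, retention_days, partition_days):
    """Return end-exclusive (start, end) windows that lie entirely before the cutoff.

    `oldest` is the earliest timestamp in the hypertable (hypothetical input;
    the real tool may discover it differently).
    """
    cutoff = now - timedelta(days=retention_days)
    step = timedelta(days=partition_days)
    windows = []
    start = oldest
    # Only emit a window once its whole range is older than the retention cutoff.
    while start + step <= cutoff:
        windows.append((start, start + step))
        start += step
    return windows


if __name__ == "__main__":
    now = datetime(2026, 5, 11, tzinfo=timezone.utc)
    oldest = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for start, end in archive_windows(oldest, now, retention_days=90, partition_days=7):
        print(start.date(), "->", end.date())
```

A window that straddles the cutoff is skipped in this sketch and picked up by a later cycle, once all of it is older than retention_days.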
Supported databases
| Database | Protocol | Default port |
|---|---|---|
| TimescaleDB 2.x | PostgreSQL wire (psycopg2) | 5432 |
Install
pip install pfc-archiver-timescaledb
# Or from source
git clone https://github.com/ImpossibleForge/pfc-archiver-timescaledb
cd pfc-archiver-timescaledb
pip install -r requirements.txt
The pfc_jsonl binary must be installed:
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS Apple Silicon (M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS Intel:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# Windows (PowerShell):
Invoke-WebRequest https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-windows-x64.exe `
-OutFile "$env:LOCALAPPDATA\Microsoft\WindowsApps\pfc_jsonl.exe"
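To confirm the binary is reachable before the first run, a small check like the following can help. It is only a sketch using the standard library; the helper name find_pfc_jsonl is hypothetical and not part of the archiver.

```python
import shutil


def find_pfc_jsonl(name="pfc_jsonl"):
    """Return the absolute path of the binary if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_pfc_jsonl()
    print(path if path else "pfc_jsonl not found on PATH")
```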
Python dependency:
pip install psycopg2-binary
Quick start
# 1. Copy the example config
cp config/timescaledb.toml my_config.toml
# 2. Edit the config
nano my_config.toml
# 3. Dry run (no writes, prints what would be archived)
python pfc_archiver_timescaledb.py --config my_config.toml --dry-run
# 4. Archive once and exit
python pfc_archiver_timescaledb.py --config my_config.toml --once
# 5. Run as a daemon (loops every interval_seconds)
python pfc_archiver_timescaledb.py --config my_config.toml
Configuration
All config is TOML. A complete example is in config/timescaledb.toml.
[db]
db_type = "timescaledb"
host = "localhost"
port = 5432 # Standard PostgreSQL port
user = "postgres"
password = "yourpassword"
dbname = "postgres"
schema = "public" # PostgreSQL schema (default: public)
table = "sensor_data" # hypertable to archive
ts_column = "time" # designated timestamp column (timestamptz)
[archive]
retention_days = 90 # archive data older than this many days
partition_days = 7 # export this many days per archive file
output_dir = "./archives/" # local path or s3://bucket/prefix/
verify = true # decompress + count rows after each archive
delete_after_archive = false # drop_chunks() after successful verify
log_dir = "./archive_logs/"
[daemon]
interval_seconds = 3600 # how often to run (in daemon mode)
partition_days and chunk alignment
TimescaleDB's drop_chunks() operates at the chunk level. For chunks to be dropped cleanly, partition_days should match your hypertable's chunk_time_interval (default: 7 days).
Check your hypertable's chunk interval:
SELECT * FROM timescaledb_information.dimensions;
If an archived partition does not fully cover a chunk, drop_chunks() drops nothing for that partition: the data stays in the DB, but the archive is still written and verified. This is safe, and you can adjust partition_days and re-run.
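The alignment rule can be pictured with a small sketch: a chunk is only eligible for dropping when its whole range falls inside the archived window. The function chunks_fully_covered and the chunk ranges below are illustrative assumptions, not output from TimescaleDB.

```python
from datetime import datetime, timezone


def chunks_fully_covered(chunks, window_start, window_end):
    """Return the chunks whose whole [start, end) range lies inside the window."""
    return [(s, e) for s, e in chunks if s >= window_start and e <= window_end]


if __name__ == "__main__":
    def utc(day):
        return datetime(2026, 1, day, tzinfo=timezone.utc)

    # One hypothetical 7-day chunk.
    chunks = [(utc(1), utc(8))]

    # Window matches the chunk: it is covered and could be dropped.
    print(chunks_fully_covered(chunks, utc(1), utc(8)))

    # Window is shifted by two days: no chunk is fully covered, so nothing is dropped.
    print(chunks_fully_covered(chunks, utc(3), utc(10)))
```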
Output format
Each archive cycle produces files named:
<schema>__<table>__<YYYYMMDD>__<YYYYMMDD>.pfc
<schema>__<table>__<YYYYMMDD>__<YYYYMMDD>.pfc.bidx
<schema>__<table>__<YYYYMMDD>__<YYYYMMDD>.pfc.idx
The .pfc file is a PFC-JSONL archive. The .bidx and .idx files are block indexes that let DuckDB decompress only the relevant time window — without reading the whole file.
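The naming pattern above can be reproduced with a short helper, which is handy when locating archives from other scripts. This is a sketch; archive_files is a hypothetical name, and the dates are formatted exactly as given, with no assumption about time zone handling inside the tool.

```python
from datetime import date


def archive_files(schema, table, start, end):
    """Build the three file names for one archive, following the pattern above."""
    base = f"{schema}__{table}__{start:%Y%m%d}__{end:%Y%m%d}"
    return [base + ext for ext in (".pfc", ".pfc.bidx", ".pfc.idx")]


if __name__ == "__main__":
    for name in archive_files("public", "sensor_data", date(2026, 1, 1), date(2026, 1, 8)):
        print(name)
```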
Log format
Each completed cycle appends a JSON entry to <log_dir>/archive_runs.jsonl:
{
  "ts": "2026-05-11T18:00:00+00:00",
  "table": "sensor_data",
  "from_ts": "2026-01-01T00:00:00+00:00",
  "to_ts": "2026-01-08T00:00:00+00:00",
  "rows": 312500,
  "jsonl_mb": 55.2,
  "output_mb": 4.9,
  "ratio_pct": 8.9,
  "deleted": false,
  "status": "ok"
}
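The run log lends itself to simple reporting. The sketch below totals rows and sizes across successful runs. It assumes each entry sits on a single line in the file (the entry above is shown expanded for readability) and that ratio_pct is output_mb divided by jsonl_mb, which is inferred from the example values rather than documented. summarize_runs is a hypothetical helper.

```python
import json


def summarize_runs(lines):
    """Aggregate rows and sizes over JSONL log lines whose status is "ok"."""
    rows = 0
    jsonl_mb = 0.0
    output_mb = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("status") != "ok":
            continue  # skip failed or partial runs
        rows += entry["rows"]
        jsonl_mb += entry["jsonl_mb"]
        output_mb += entry["output_mb"]
    # Overall compressed size as a percentage of the exported JSONL size.
    ratio_pct = round(100.0 * output_mb / jsonl_mb, 1) if jsonl_mb else None
    return {"rows": rows, "jsonl_mb": jsonl_mb, "output_mb": output_mb, "ratio_pct": ratio_pct}


if __name__ == "__main__":
    with open("archive_logs/archive_runs.jsonl", encoding="utf-8") as fh:
        print(summarize_runs(fh))
```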
Deleting archived data
delete_after_archive = false by default — pfc-archiver-timescaledb never modifies your TimescaleDB without explicit opt-in.
After confirming your archives are accessible via DuckDB, set delete_after_archive = true and restart. Only partitions that pass the row-count verify step will have their chunks dropped.
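The gating described here can be sketched as a small predicate: deletion is allowed only when the user opted in and the decompressed row count equals the exported count. The names count_jsonl_rows and safe_to_drop are hypothetical, and the decompressed data is passed in as a text stream because the pfc_jsonl decompress invocation is not shown in this README.

```python
import io


def count_jsonl_rows(stream):
    """Count non-empty lines in a text stream of JSONL."""
    return sum(1 for line in stream if line.strip())


def safe_to_drop(exported_rows, decompressed_stream, delete_after_archive):
    """Allow deletion only when opted in and the row counts match exactly."""
    verified = count_jsonl_rows(decompressed_stream) == exported_rows
    return delete_after_archive and verified


if __name__ == "__main__":
    data = io.StringIO('{"time": 1, "v": 0.5}\n{"time": 2, "v": 0.7}\n')
    print(safe_to_drop(exported_rows=2, decompressed_stream=data, delete_after_archive=True))
```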
How deletion works: Archived chunks are removed using TimescaleDB's native drop_chunks() function — far more efficient than row-level DELETE. It atomically removes entire chunk files.
Requires TimescaleDB 2.x.
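For reference, a drop_chunks() call bounded on both sides might be built as in the sketch below. It relies on the older_than and newer_than arguments of TimescaleDB 2.x and on %s placeholders as used by psycopg2. The helper drop_chunks_statement is hypothetical and not necessarily what the archiver executes, and it assumes plain lowercase schema and table names that need no quoting.

```python
from datetime import datetime, timezone


def drop_chunks_statement(schema, table, window_start, window_end):
    """Return (sql, params) suitable for a DB-API cursor such as psycopg2's."""
    sql = (
        "SELECT drop_chunks(%s::regclass, "
        "older_than => %s::timestamptz, "
        "newer_than => %s::timestamptz)"
    )
    relation = f"{schema}.{table}"  # assumes names that need no quoting
    # older_than is the upper bound of the window, newer_than the lower bound.
    return sql, (relation, window_end, window_start)


if __name__ == "__main__":
    start = datetime(2026, 1, 1, tzinfo=timezone.utc)
    end = datetime(2026, 1, 8, tzinfo=timezone.utc)
    sql, params = drop_chunks_statement("public", "sensor_data", start, end)
    print(sql)
    print(params)
```

With an open psycopg2 cursor, the result could be passed as cur.execute(sql, params). Only chunks that lie entirely inside the window are removed.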
Run as a systemd service
[Unit]
Description=pfc-archiver-timescaledb — PFC archive daemon for TimescaleDB
After=network.target
[Service]
Type=simple
User=pfc
WorkingDirectory=/opt/pfc-archiver-timescaledb
ExecStart=/usr/bin/python3 /opt/pfc-archiver-timescaledb/pfc_archiver_timescaledb.py \
--config /etc/pfc-archiver-timescaledb/timescaledb.toml
Restart=on-failure
RestartSec=60
[Install]
WantedBy=multi-user.target
sudo systemctl enable pfc-archiver-timescaledb
sudo systemctl start pfc-archiver-timescaledb
sudo journalctl -u pfc-archiver-timescaledb -f
Run as a Docker sidecar
# docker-compose.yml
services:
  timescaledb:
    image: timescale/timescaledb:latest-pg16
    environment:
      POSTGRES_PASSWORD: yourpassword
      POSTGRES_DB: mydb
    ports:
      - "5432:5432"
  pfc-archiver-timescaledb:
    image: ghcr.io/impossibleforge/pfc-archiver-timescaledb:latest
    volumes:
      - ./config/timescaledb.toml:/etc/pfc-archiver/config.toml
      - ./archives:/archives
      - ./archive_logs:/logs
    depends_on: [timescaledb]
Querying cold archives
Once archived, your .pfc files are queryable directly from DuckDB:
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;
-- Scan a single archive
SELECT *
FROM read_pfc_jsonl('./archives/public__sensor_data__20260101__20260108.pfc')
LIMIT 100;
-- Time-window query (only decompresses the relevant blocks)
SELECT *
FROM read_pfc_jsonl(
'./archives/public__sensor_data__20260101__20260108.pfc',
ts_from = epoch(TIMESTAMPTZ '2026-01-03 14:00:00+00'),
ts_to = epoch(TIMESTAMPTZ '2026-01-03 15:00:00+00')
);
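When many archives accumulate, it helps to pick only the files that overlap the window being queried before handing them to DuckDB. The sketch below does this from the file names alone. It assumes the dates in the names are UTC and that the end date is exclusive, which matches the example log entry but is not stated explicitly. archives_for_window is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# <schema>__<table>__<YYYYMMDD>__<YYYYMMDD>.pfc (index files are ignored)
NAME_RE = re.compile(r"^(?P<schema>.+?)__(?P<table>.+?)__(?P<start>\d{8})__(?P<end>\d{8})\.pfc$")


def parse_archive_name(name):
    """Return (start, end) as UTC datetimes, or None if the name does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    start = datetime.strptime(m["start"], "%Y%m%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(m["end"], "%Y%m%d").replace(tzinfo=timezone.utc)
    return start, end


def archives_for_window(names, ts_from, ts_to):
    """Select archive files whose [start, end) range overlaps [ts_from, ts_to)."""
    selected = []
    for name in names:
        parsed = parse_archive_name(name)
        if parsed is None:
            continue
        start, end = parsed
        if start < ts_to and ts_from < end:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = [
        "public__sensor_data__20260101__20260108.pfc",
        "public__sensor_data__20260108__20260115.pfc",
    ]
    ts_from = datetime(2026, 1, 3, 14, tzinfo=timezone.utc)
    ts_to = datetime(2026, 1, 3, 15, tzinfo=timezone.utc)
    print(archives_for_window(names, ts_from, ts_to))
```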
Part of the PFC Ecosystem
→ View all PFC tools & integrations
| Direct integration | Why |
|---|---|
| pfc-export-timescaledb | Same DB, different mode — exporter is one-shot CLI; archiver runs as a continuous daemon |
| pfc-archiver-questdb | Same concept for QuestDB |
| pfc-archiver-cratedb | Same concept for CrateDB |
Disclaimer
pfc-archiver-timescaledb is an independent open-source project and is not affiliated with, endorsed by, or associated with Timescale, Inc. or the TimescaleDB project. TimescaleDB is a trademark of Timescale, Inc.
License
pfc-archiver-timescaledb (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com