pfc-archiver-elasticsearch
May 21, 2026
A standalone daemon that runs alongside your Elasticsearch cluster, watches for indices older than a configurable retention window, compresses them to PFC format, and writes them to local storage or S3 — automatically.
No schema changes. No plugins. No Elasticsearch modifications.
Supports self-hosted Elasticsearch and Elastic Cloud (Cloud ID).
How it works
Every interval_seconds seconds (default: 3600, i.e. hourly), pfc-archiver-elasticsearch runs one archive cycle:
SCAN → EXPORT → COMPRESS → UPLOAD → VERIFY → (optional DELETE) → LOG
- SCAN: list all indices matching index_pattern, detect their date from the index name, select those older than retention_days
- EXPORT: stream all documents via search_after + Point-in-Time API → temp JSONL
- COMPRESS: pipe through pfc_jsonl compress → .pfc + .pfc.bidx
- UPLOAD: write archive to output_dir (local path or s3://bucket/prefix/)
- VERIFY: decompress and count rows; must match exported count exactly
- DELETE (optional): delete the source index from Elasticsearch (only if delete_after_archive = true)
- LOG: write a JSON run log to log_dir
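To make the EXPORT step more concrete, here is a minimal sketch of paging through an index with a Point-in-Time and search_after. It is not the archiver's actual code: the function names iter_index_docs and export_to_jsonl, the page size, and the keep-alive value are illustrative assumptions, and the calls use the keyword-argument style of elasticsearch-py 8.x (the 7.x client expects a body= dict instead).

```python
import json
from typing import Any, Dict, Iterator, TextIO


def iter_index_docs(es: Any, index: str, page_size: int = 1000,
                    keep_alive: str = "2m") -> Iterator[Dict[str, Any]]:
    """Yield every document's _source from `index` (hypothetical helper).

    `es` is expected to behave like an elasticsearch-py 8.x client.
    """
    pit_id = es.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after = None
    try:
        while True:
            params: Dict[str, Any] = {
                "size": page_size,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                # _shard_doc is the cheap tiebreaker sort for PIT searches.
                "sort": [{"_shard_doc": "asc"}],
            }
            if search_after is not None:
                params["search_after"] = search_after
            resp = es.search(**params)
            hits = resp["hits"]["hits"]
            if not hits:
                break
            for hit in hits:
                yield hit["_source"]
            # Resume after the last hit of this page.
            search_after = hits[-1]["sort"]
            # Elasticsearch may hand back a refreshed PIT id.
            if "pit_id" in resp:
                pit_id = resp["pit_id"]
    finally:
        es.close_point_in_time(id=pit_id)


def export_to_jsonl(es: Any, index: str, out: TextIO) -> int:
    """Write one JSON document per line to `out`; return the row count."""
    rows = 0
    for doc in iter_index_docs(es, index):
        out.write(json.dumps(doc, ensure_ascii=False) + "\n")
        rows += 1
    return rows
```

The returned row count is the number a later VERIFY step would compare against the decompressed archive.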
Date detection
The archiver detects each index's date from its name — no data scan needed. All common Elasticsearch naming conventions are supported:
| Index name | Detected date |
|---|---|
| logs-2024.01.15 | 2024-01-15 |
| filebeat-8.19.13-2024-01-15-000001 | 2024-01-15 |
| events-2024-01-15 | 2024-01-15 |
| metrics-20240115 | 2024-01-15 |
| logs-2024.01 | 2024-01-01 (first of month) |
| events-2024-06 | 2024-06-01 (first of month) |
Indices with no detectable date are skipped automatically. Elasticsearch internal indices (.kibana, .fleet, etc.) and closed indices are always skipped.
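One way to implement name-based detection for the patterns in the table is a pair of regular expressions: one for full dates, one for year-month names. This is an illustrative sketch, not the archiver's own parser; detect_index_date and both patterns are assumptions, and the real tool may accept more formats or resolve ambiguous names differently.

```python
import re
from datetime import date
from typing import Optional

# Full date: YYYY.MM.DD, YYYY-MM-DD or YYYYMMDD (same separator twice).
# The lookarounds stop the pattern from matching inside longer digit runs.
_FULL_DATE = re.compile(r"(?<!\d)(\d{4})([.-]?)(\d{2})\2(\d{2})(?!\d)")
# Year and month only: YYYY.MM or YYYY-MM (a separator is required).
_YEAR_MONTH = re.compile(r"(?<!\d)(\d{4})[.-](\d{2})(?!\d)")


def detect_index_date(index_name: str) -> Optional[date]:
    """Return the date encoded in an index name, or None if there is none."""
    for match in _FULL_DATE.finditer(index_name):
        year, _sep, month, day = match.groups()
        try:
            return date(int(year), int(month), int(day))
        except ValueError:
            continue  # e.g. month 13: not a real date, keep looking
    for match in _YEAR_MONTH.finditer(index_name):
        try:
            # Monthly indices are treated as the first day of the month.
            return date(int(match.group(1)), int(match.group(2)), 1)
        except ValueError:
            continue
    return None
```

Trying the full-date pattern first matters: otherwise logs-2024.01.15 would also match the year-month pattern and be dated to the first of the month.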
Install
pip install pfc-archiver-elasticsearch
Or from source:
git clone https://github.com/ImpossibleForge/pfc-archiver-elasticsearch
cd pfc-archiver-elasticsearch
pip install -r requirements.txt
The pfc_jsonl binary must be installed:
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS Apple Silicon (M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# macOS Intel (x64):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
# Windows (x64) — PowerShell:
Invoke-WebRequest -Uri https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-windows-x64.exe `
-OutFile "$env:LOCALAPPDATA\Microsoft\WindowsApps\pfc_jsonl.exe"
Requires Elasticsearch 7.12+ and elasticsearch-py 7.12–8.x.
Configuration
Copy config/elasticsearch.toml to your working directory and adjust to your setup:
[elasticsearch]
url = "http://localhost:9200"
api_key = "your-api-key-here"
# Elastic Cloud alternative:
# cloud_id = "my-deployment:dXMtZWFzdDQ..."
# api_key = "your-api-key-here"
[archive]
index_pattern = "logs-*"
retention_days = 90
output_dir = "s3://my-bucket/elasticsearch-cold/"
verify = true
delete_after_archive = false # Opt in explicitly when ready
[daemon]
interval_seconds = 3600
See config/elasticsearch.toml for the full reference with all options documented.
Usage
# Start the daemon
python pfc_archiver_elasticsearch.py --config config/elasticsearch.toml
# Dry run — scan and report, no data moved or deleted
python pfc_archiver_elasticsearch.py --config config/elasticsearch.toml --dry-run
# Single cycle then exit (for cron jobs)
python pfc_archiver_elasticsearch.py --config config/elasticsearch.toml --once
Example output
2026-05-21T14:00:00 INFO pfc-archiver-elasticsearch v0.1.0 starting
2026-05-21T14:00:00 INFO ES: http://localhost:9200 pattern: logs-* retention: 90d
2026-05-21T14:00:00 INFO Connected to Elasticsearch 8.17.0
2026-05-21T14:00:01 INFO Found 3 index(es) to archive (cutoff: 2026-02-20)
2026-05-21T14:00:01 INFO ── Index: logs-2024.01.15 (date: 2024-01-15, docs: 1,234,567) ──
2026-05-21T14:00:01 INFO Exporting 'logs-2024.01.15' ...
2026-05-21T14:00:28 INFO Exported 1,234,567 docs (210.3 MiB JSONL) — compressing ...
2026-05-21T14:00:31 INFO ✓ 1,234,567 docs | JSONL 210.3 MiB → PFC 19.1 MiB (9.1%) → logs-2024.01.15.pfc
2026-05-21T14:00:31 INFO Uploading s3://my-bucket/elasticsearch-cold/logs-2024.01.15.pfc ...
2026-05-21T14:00:33 INFO ✓ S3 upload complete
2026-05-21T14:00:33 INFO Verifying logs-2024.01.15.pfc (expected 1,234,567 rows) ...
2026-05-21T14:00:35 INFO ✓ Verified: 1,234,567 rows match
2026-05-21T14:00:35 INFO Cycle complete.
Authentication
| Method | Config keys |
|---|---|
| API key (recommended) | api_key = "KEY" |
| Basic auth | user = "elastic" + password = "changeme" |
| Elastic Cloud | cloud_id = "dep:dXMt..." + api_key = "KEY" |
| Custom TLS | ca_certs = "/path/to/ca.crt" |
| Dev/test | no_verify_certs = true |
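For orientation, the table above could map onto elasticsearch-py client arguments roughly as follows. This is a sketch under assumptions, not the archiver's implementation: client_kwargs is a hypothetical helper, the precedence between API key and basic auth is a guess, and the keyword names (hosts, cloud_id, api_key, basic_auth, ca_certs, verify_certs) are those of the 8.x client (7.x uses http_auth for basic auth).

```python
from typing import Any, Dict


def client_kwargs(es_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the [elasticsearch] config table into client keyword args."""
    kwargs: Dict[str, Any] = {}

    # Elastic Cloud deployments are addressed by Cloud ID instead of a URL.
    if "cloud_id" in es_cfg:
        kwargs["cloud_id"] = es_cfg["cloud_id"]
    else:
        kwargs["hosts"] = [es_cfg["url"]]

    # Assumed precedence: an API key wins over user/password if both are set.
    if "api_key" in es_cfg:
        kwargs["api_key"] = es_cfg["api_key"]
    elif "user" in es_cfg:
        kwargs["basic_auth"] = (es_cfg["user"], es_cfg["password"])

    # Custom CA bundle for self-signed or private certificates.
    if "ca_certs" in es_cfg:
        kwargs["ca_certs"] = es_cfg["ca_certs"]

    # Dev/test only: skip certificate verification entirely.
    if es_cfg.get("no_verify_certs"):
        kwargs["verify_certs"] = False

    return kwargs


# Usage (requires the elasticsearch package):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(**client_kwargs(config["elasticsearch"]))
```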
Deleting archived indices
delete_after_archive = false by default — pfc-archiver-elasticsearch never modifies your cluster without explicit opt-in.
After confirming your archives are accessible (via DuckDB, pfc-gateway, or pfc_jsonl query), set delete_after_archive = true and restart. Only indices that pass the row-count verify step will be deleted.
How deletion works: the archiver calls DELETE /index_name via the Elasticsearch API, which removes the index from the cluster entirely. Make sure your archives are safely stored and verified before enabling this.
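The guard described in this section can be sketched as a small function: delete only when the user has opted in and the verified row count equals the exported one. The name delete_if_verified and its parameters are illustrative assumptions. es.indices.delete(index=...) is the elasticsearch-py call that issues DELETE /index_name.

```python
from typing import Any


def delete_if_verified(es: Any, index: str, exported_rows: int,
                       verified_rows: int, delete_after_archive: bool) -> bool:
    """Delete `index` only after opt-in and a matching row count.

    Returns True if the index was deleted, False if it was left in place.
    """
    if not delete_after_archive:
        return False  # default: never touch the cluster
    if verified_rows != exported_rows:
        return False  # archive failed verification, keep the source index
    es.indices.delete(index=index)
    return True
```

The return value corresponds to the deleted field of the run log.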
Query the archives
# Time-range query (no Elasticsearch needed)
pfc_jsonl query logs-2024.01.15.pfc --from "2024-01-15T00:00:00" --to "2024-01-16T00:00:00"
# Via DuckDB
duckdb -c "
INSTALL pfc FROM community; LOAD pfc;
SELECT level, count(*) FROM pfc_read('logs-2024.01.15.pfc')
WHERE \"@timestamp\" >= '2024-01-15 08:00:00'
GROUP BY level;
"
Run as a systemd service
[Unit]
Description=pfc-archiver-elasticsearch — PFC archive daemon for Elasticsearch
After=network.target
[Service]
Type=simple
User=pfc
WorkingDirectory=/opt/pfc-archiver-elasticsearch
ExecStart=/usr/bin/python3 /opt/pfc-archiver-elasticsearch/pfc_archiver_elasticsearch.py \
--config /etc/pfc-archiver-elasticsearch/elasticsearch.toml
Restart=on-failure
RestartSec=60
[Install]
WantedBy=multi-user.target
sudo systemctl enable pfc-archiver-elasticsearch
sudo systemctl start pfc-archiver-elasticsearch
sudo journalctl -u pfc-archiver-elasticsearch -f
Run log
Each archived index produces one JSON entry in archive_logs/archive_runs.jsonl:
{
"ts": "2026-05-21T14:00:35+00:00",
"status": "ok",
"index": "logs-2024.01.15",
"index_date": "2024-01-15T00:00:00+00:00",
"rows": 1234567,
"jsonl_mb": 210.3,
"output_mb": 19.1,
"ratio_pct": 9.1,
"deleted": false
}
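Because the run log is line-delimited JSON, it is easy to post-process. The sketch below totals rows and sizes across runs. It assumes each entry occupies a single line in the file (the example above is pretty-printed for readability) and uses only the field names shown above; summarize_runs is a hypothetical helper, not part of the tool.

```python
import json
from typing import Any, Dict, Iterable


def summarize_runs(lines: Iterable[str]) -> Dict[str, Any]:
    """Aggregate run-log entries, given one JSON object per line."""
    summary: Dict[str, Any] = {
        "runs": 0, "ok": 0, "deleted": 0,
        "rows": 0, "jsonl_mb": 0.0, "output_mb": 0.0,
    }
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        summary["runs"] += 1
        if entry.get("status") != "ok":
            continue  # only successful runs count towards the totals
        summary["ok"] += 1
        summary["deleted"] += 1 if entry.get("deleted") else 0
        summary["rows"] += entry.get("rows", 0)
        summary["jsonl_mb"] += entry.get("jsonl_mb", 0.0)
        summary["output_mb"] += entry.get("output_mb", 0.0)
    # Overall compressed size as a percentage of the raw JSONL size.
    if summary["jsonl_mb"] > 0:
        summary["ratio_pct"] = round(
            100 * summary["output_mb"] / summary["jsonl_mb"], 1)
    return summary


# Usage:
#   with open("archive_logs/archive_runs.jsonl", encoding="utf-8") as f:
#       print(summarize_runs(f))
```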
Running tests
# Unit tests (no Elasticsearch needed)
pip install pytest "elasticsearch>=7.12.0,<9.0"
python -m pytest tests/test_archiver.py -v
# Integration tests (requires Docker)
docker run -d --name es-test \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-p 9200:9200 \
docker.elastic.co/elasticsearch/elasticsearch:8.17.0
# Wait ~30s, then:
python -m pytest tests/test_integration_elasticsearch.py -v
Part of the PFC Ecosystem
→ View all PFC tools & integrations
| Direct integration | Why |
|---|---|
| pfc-export-elasticsearch | Same DB, one-shot mode — export a specific index or time range on demand |
| pfc-archiver-timescaledb | Same concept for TimescaleDB |
| pfc-archiver-influxdb | Same concept for InfluxDB |
| pfc-gateway | Query exported archives via HTTP REST |
| pfc-duckdb | Query .pfc files directly from DuckDB |
Disclaimer
pfc-archiver-elasticsearch is an independent open-source project and is not affiliated with, endorsed by, or associated with Elasticsearch B.V. or the Elastic project. Elasticsearch and Elastic Cloud are trademarks of Elasticsearch B.V.
License
pfc-archiver-elasticsearch (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com