pfc-gateway

May 3, 2026 · View on GitHub


Bidirectional HTTP gateway for PFC cold archives — no DuckDB required.

pfc-gateway makes PFC archives on S3 (or local storage) queryable by any tool — Grafana, Python, curl, PowerBI — through a simple HTTP API. It also receives NDJSON from Fluent Bit, Vector, Telegraf, or any HTTP client and compresses it to .pfc archives automatically.

Part of the PFC Ecosystem.


What it does

[Fluent Bit / Vector / Telegraf / curl]

          ▼  POST /ingest — push NDJSON rows
     pfc-gateway  (this server)  ←─────────── also receives data

          ├── .pfc_buffer.jsonl  (live buffer)
          └── ingest_<ts>.pfc    (auto-rotated on size or time)

[Grafana / Python / curl / PowerBI / your own tools]

          ▼  POST /query — HTTP REST, no client library needed
     pfc-gateway  (this server)  ────────────► serves data

          ▼  pfc_jsonl s3-fetch — HTTP Range requests
     .pfc archives on S3 / local

          ▼  only ~4% of the archive is read per query
     NDJSON stream back to client

One server. Ingest from any tool. Query from any tool. No DuckDB, no custom plugins.


Install

# 1. Install pfc_jsonl binary (required)
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS (Apple Silicon / M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# 2. Install pfc-gateway
git clone https://github.com/ImpossibleForge/pfc-gateway
cd pfc-gateway
pip install fastapi uvicorn boto3 python-dateutil

# 3. Start the server
PFC_API_KEY=your-secret-key uvicorn pfc_gateway:app --host 0.0.0.0 --port 8765

License note: This tool requires the pfc_jsonl binary. pfc_jsonl is free for personal and open-source use — commercial use requires a separate license. See pfc-jsonl for details.

AWS credentials are read from the standard locations (~/.aws/credentials, environment variables, IAM role). No extra config needed.


Ingest — receive data from any HTTP source

Enable ingest by setting PFC_INGEST_DIR. The gateway appends rows to a buffer file and rotates it to a compressed .pfc file when a size or time threshold is reached.

# Start gateway with ingest enabled
PFC_API_KEY=secret PFC_INGEST_DIR=/data/pfc \
  uvicorn pfc_gateway:app --host 0.0.0.0 --port 8765

Send rows with curl

# JSON array
curl -s -X POST http://localhost:8765/ingest \
  -H "X-API-Key: secret" \
  -H "Content-Type: application/json" \
  -d '[{"ts":"2026-04-21T10:00:00Z","level":"INFO","msg":"server started"}]'

# NDJSON (Fluent Bit json_stream / Vector ndjson)
printf '{"ts":"2026-04-21T10:00:01Z","level":"WARN","msg":"high cpu"}\n' | \
  curl -s -X POST http://localhost:8765/ingest \
  -H "X-API-Key: secret" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @-

Fluent Bit HTTP output

[OUTPUT]
    Name              http
    Match             *
    Host              your-server
    Port              8765
    URI               /ingest
    # json format sends a JSON array; pfc-gateway auto-detects it
    Format            json
    Header            X-API-Key secret

Vector HTTP sink

[sinks.pfc_gateway]
type     = "http"
inputs   = ["your_source"]
uri      = "http://your-server:8765/ingest"
encoding.codec = "ndjson"

[sinks.pfc_gateway.request.headers]
X-API-Key = "secret"

Force-flush the buffer

curl -s -X POST http://localhost:8765/ingest/flush \
  -H "X-API-Key: secret"
# → {"flushed": true, "rows": 4821, "file": "/data/pfc/ingest_20260421T103045.pfc"}

Check buffer status

curl -s http://localhost:8765/ingest/status -H "X-API-Key: secret"
# → {"enabled": true, "buffer_rows": 142, "buffer_mb": 0.021,
#    "last_flush_age_sec": 312.4, "rotate_mb": 64, "rotate_sec": 3600, ...}

Query a PFC archive

curl (S3 file)

curl -s \
  -H "X-API-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8765/query \
  -d '{
    "file":    "s3://my-archive/pfc/logs_2026-03.pfc",
    "from_ts": "2026-03-05T10:00:00Z",
    "to_ts":   "2026-03-05T12:00:00Z",
    "filter":  {"level": "ERROR"}
  }'

Response: NDJSON stream

{"ts":"2026-03-05T10:14:32Z","level":"ERROR","message":"connection refused","host":"web-03"}
{"ts":"2026-03-05T11:02:19Z","level":"ERROR","message":"disk full","host":"db-01"}

curl (local file)

curl -s \
  -H "X-API-Key: your-secret-key" \
  -X POST http://localhost:8765/query \
  -H "Content-Type: application/json" \
  -d '{"file":"/data/archive/logs_march.pfc","from_ts":"2026-03-01","to_ts":"2026-04-01"}'

Python

import requests, json

resp = requests.post(
    "http://localhost:8765/query",
    headers={"X-API-Key": "your-secret-key"},
    json={
        "file":    "s3://my-archive/pfc/logs_2026-03.pfc",
        "from_ts": "2026-03-05T10:00Z",
        "to_ts":   "2026-03-05T12:00Z",
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line:  # skip empty keep-alive lines
        continue
    row = json.loads(line)
    print(row["ts"], row.get("level"), row.get("message"))

Query multiple files (multi-month)

curl -s \
  -H "X-API-Key: your-secret-key" \
  -X POST http://localhost:8765/query/batch \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      "s3://my-archive/pfc/logs_2026-01.pfc",
      "s3://my-archive/pfc/logs_2026-02.pfc",
      "s3://my-archive/pfc/logs_2026-03.pfc"
    ],
    "from_ts": "2026-01-15T00:00Z",
    "to_ts":   "2026-03-15T00:00Z"
  }'

Files are queried in order. Results stream back as a single combined NDJSON response.
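
From Python, the batch endpoint can be consumed the same way as the single-file query. The sketch below simply mirrors the curl request above and streams the combined response:

import requests, json

resp = requests.post(
    "http://localhost:8765/query/batch",
    headers={"X-API-Key": "your-secret-key"},
    json={
        "files": [
            "s3://my-archive/pfc/logs_2026-01.pfc",
            "s3://my-archive/pfc/logs_2026-02.pfc",
            "s3://my-archive/pfc/logs_2026-03.pfc",
        ],
        "from_ts": "2026-01-15T00:00Z",
        "to_ts":   "2026-03-15T00:00Z",
    },
    stream=True,  # rows arrive as one combined NDJSON stream, in file order
)
for line in resp.iter_lines():
    if not line:  # skip empty keep-alive lines
        continue
    print(json.loads(line))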


SQL Query via DuckDB (optional)

If DuckDB with the pfc extension is installed on the gateway server, you can run full SQL queries against .pfc archives:

curl -X POST http://localhost:8765/query/sql \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT json_extract_string(line, '"'"'$.level'"'"') AS level, COUNT(*) AS cnt FROM pfc_scan('"'"'/var/lib/pfc/logs.pfc'"'"') GROUP BY level ORDER BY cnt DESC"
  }'

Supports any DuckDB SQL — GROUP BY, AVG, JOIN across multiple files, window functions:

-- Avg latency per service, last hour
SELECT json_extract_string(line, '$.service') AS service,
       ROUND(AVG(json_extract(line, '$.latency_ms')::FLOAT), 1) AS avg_ms
FROM pfc_scan('/var/lib/pfc/logs.pfc')
GROUP BY service ORDER BY avg_ms DESC;

Check if SQL mode is available on your gateway instance:

curl http://localhost:8765/ -H "x-api-key: secret"
# {"status":"ok","version":"0.3.0","binary":"...","sql_mode":true}

sql_mode: false means DuckDB is not installed — standard /query still works normally.

Setup:

# Install DuckDB
curl -LO https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip
unzip -o duckdb_cli-linux-amd64.zip -d /usr/local/bin && chmod +x /usr/local/bin/duckdb
# Install pfc extension
duckdb -c "INSTALL pfc FROM community;"

Grafana Integration

pfc-gateway implements the Grafana SimpleJSON data source protocol.

Setup (takes 2 minutes):

  1. In Grafana → Settings → Data Sources → Add data source
  2. Search for SimpleJSON (install plugin if needed)
  3. URL: http://your-server:8765/grafana
  4. Custom HTTP Header: X-API-Key = your secret key
  5. Save & Test → should show "Data source is working"

In a dashboard panel:

  • Target: s3://my-archive/pfc/logs_2026-03.pfc
  • Optional filter: s3://my-archive/logs.pfc|{"level":"ERROR"}

Grafana's time range picker controls from_ts and to_ts automatically.
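
To sanity-check the data source outside Grafana, you can post a SimpleJSON-style request by hand. The sketch below is illustrative only: the /grafana/query sub-path and the range/targets payload shape are assumptions based on the generic SimpleJSON protocol, not confirmed details of this gateway.

import json, requests

# Illustrative SimpleJSON-style request (sub-path and payload shape assumed)
resp = requests.post(
    "http://your-server:8765/grafana/query",
    headers={"X-API-Key": "your-secret-key"},
    json={
        "range": {"from": "2026-03-05T10:00:00Z", "to": "2026-03-05T12:00:00Z"},
        "targets": [{"target": "s3://my-archive/pfc/logs_2026-03.pfc", "type": "table"}],
    },
)
print(json.dumps(resp.json(), indent=2))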


Live DB + cold archives in one dashboard

Without pfc-gateway: only the last 30 days (hot live data) are visible in Grafana.

With pfc-gateway:

  • Panel 1: Live DB data source (last 30 days)
  • Panel 2: pfc-gateway data source (months/years of cold PFC archives)

Both panels in the same Grafana dashboard. No re-import. No DuckDB.


API Reference

Query endpoints

POST /query

Field        Type    Description
file         string  Local path or s3://bucket/key.pfc
from_ts      string  ISO 8601 start time (inclusive)
to_ts        string  ISO 8601 end time (exclusive)
filter       object  Optional equality filter {"level": "ERROR"}
aws_profile  string  Optional AWS profile name

Returns: application/x-ndjson stream.

POST /query/batch

Same as /query but with files: [...] array instead of single file.

GET /

Health check. Returns {"status": "ok", "version": "0.3.0"}.

Ingest endpoints

POST /ingest

Accepts rows in three formats (auto-detected):

  • JSON array: [{...}, {...}]
  • Object with rows key: {"rows": [{...}, ...]}
  • Raw NDJSON: {...}\n{...}\n

Returns: {"accepted": N}

Requires PFC_INGEST_DIR to be set (returns 503 otherwise).
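
A minimal Python sketch pushing rows with the {"rows": [...]} wrapper (any of the three formats above works):

import requests

rows = [
    {"ts": "2026-04-21T10:00:02Z", "level": "INFO", "msg": "worker started"},
    {"ts": "2026-04-21T10:00:03Z", "level": "INFO", "msg": "queue drained"},
]
resp = requests.post(
    "http://localhost:8765/ingest",
    headers={"X-API-Key": "secret"},
    json={"rows": rows},  # object-with-rows format; a plain JSON array works too
)
print(resp.json())  # {"accepted": 2}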

POST /ingest/flush

Force-compresses the current buffer to a .pfc file immediately.

Returns: {"flushed": true, "rows": N, "file": "/path/to/ingest_<ts>.pfc"} or {"flushed": false, "reason": "empty"}.

GET /ingest/status

Returns buffer statistics: row count, byte size, age since last flush, last output file, rotation thresholds.


Environment Variables

Query / Auth

Variable            Default                   Description
PFC_API_KEY         (none — auth off)         API key for X-API-Key header
PFC_JSONL_BINARY    /usr/local/bin/pfc_jsonl  Path to pfc_jsonl binary
PFC_HOST            0.0.0.0                   Bind address
PFC_PORT            8765                      Port
PFC_PRESIGN_TTL     3600                      Pre-signed URL TTL in seconds
AWS_DEFAULT_REGION  eu-central-1              AWS region for S3

Ingest

Variable               Default              Description
PFC_INGEST_DIR         (none — ingest off)  Directory for buffer + output .pfc files
PFC_INGEST_ROTATE_MB   64                   Rotate when buffer reaches this size (MB)
PFC_INGEST_ROTATE_SEC  3600                 Rotate when buffer is older than this (seconds)
PFC_INGEST_PREFIX      ingest               Output filename prefix: ingest_<ts>.pfc

Standard AWS variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_PROFILE) are respected automatically.


Run as systemd service

# /etc/systemd/system/pfc-gateway.service
[Unit]
Description=pfc-gateway — PFC cold archive bidirectional gateway
After=network.target

[Service]
Type=simple
User=pfc
WorkingDirectory=/opt/pfc-gateway
ExecStart=/usr/bin/uvicorn pfc_gateway:app --host 0.0.0.0 --port 8765
Restart=on-failure
Environment=PFC_API_KEY=your-secret-key
Environment=AWS_DEFAULT_REGION=eu-central-1
# Omit the next line to disable ingest (systemd does not allow trailing comments)
Environment=PFC_INGEST_DIR=/data/pfc
Environment=PFC_INGEST_ROTATE_MB=64
Environment=PFC_INGEST_ROTATE_SEC=3600

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl enable --now pfc-gateway

Run as Docker container

docker run -d \
  -p 8765:8765 \
  -e PFC_API_KEY=your-secret-key \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  --name pfc-gateway \
  impossibleforge/pfc-gateway:latest

Architecture in the full PFC ecosystem

Your data sources

    ├── pfc-migrate     (one-shot export)
    ├── pfc-archiver-*  (autonomous daemon)
    ├── pfc-fluentbit   (live pipeline)
    └── pfc-gateway     (POST /ingest ← NEW)  ← this repo


    .pfc archives (local / S3 / Azure / GCS)

    ┌─────────┴──────────┐
    │                    │
    ▼                    ▼
pfc-duckdb          pfc-gateway  ← this repo
SQL queries         HTTP REST
(DuckDB needed)     (no DuckDB)
    │                    │
    ▼                    ▼
Python / CLI        Grafana / PowerBI / curl / own tools
                    Fluent Bit / Vector / Telegraf (ingest)

Tool            What                       DuckDB needed
pfc-migrate     One-shot export to .pfc    No
pfc-archiver-*  Autonomous archive daemon  No
pfc-fluentbit   Live pipeline → .pfc       No
pfc-duckdb      SQL queries on PFC files   Yes
pfc-gateway     HTTP REST — any tool       No

Part of the PFC Ecosystem

→ View all PFC tools & integrations

Direct integration  Why
pfc-duckdb          SQL alternative — query .pfc archives directly via DuckDB instead of HTTP
pfc-grafana         Grafana data source plugin that queries pfc-gateway
pfc-fluentbit       Send logs into pfc-gateway via Fluent Bit HTTP output
pfc-vector          High-performance Rust alternative for ingest

ImpossibleForge — github.com/ImpossibleForge Contact: info@impossibleforge.com


License

pfc-gateway (this repository) is released under the MIT License — see LICENSE.

The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com