pfc-ingest-watchdog

May 20, 2026


Automatic file watcher for the PFC Ecosystem. Monitors local folders or S3 prefixes for new files and converts them to .pfc automatically — no manual invocation needed.

Part of the PFC Ecosystem.


What it does

1. A new file arrives in the watched folder or S3 prefix.
2. pfc-ingest-watchdog detects it.
3. The watchdog calls pfc-convert or pfc-migrate (your choice).
4. The .pfc archive is ready for DuckDB / pfc-gateway queries.

The watchdog is tool-agnostic: it monitors and triggers. The actual conversion logic lives in the configured tool.

| Converter | When to use |
| --- | --- |
| pfc-convert | Apache CLF, nginx, CSV, NDJSON → JSONL → .pfc (schema changes) |
| pfc-migrate | gzip/zstd/bzip2/lz4 → .pfc (compression swap, content unchanged) |
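
The detect-and-trigger flow described above can be sketched in a few lines of Python. This is a minimal illustration for local folders, not the watchdog's actual implementation: `find_new_files`, `scan_directory_once`, and the `convert` callback are hypothetical names, and the real tool may detect and track files differently.

```python
import os
from typing import Callable, List, Set, Tuple


def find_new_files(source_dir: str, processed: Set[str], recursive: bool = False) -> List[str]:
    """Return files under source_dir that are not yet in the processed set."""
    found: List[str] = []
    if recursive:
        for root, _dirs, names in os.walk(source_dir):
            found.extend(os.path.join(root, name) for name in names)
    else:
        with os.scandir(source_dir) as entries:
            found = [entry.path for entry in entries if entry.is_file()]
    return sorted(path for path in found if path not in processed)


def scan_directory_once(
    source_dir: str,
    processed: Set[str],
    convert: Callable[[str], None],
    recursive: bool = False,
) -> Tuple[List[str], List[str]]:
    """Run one detect-and-convert pass.

    `convert` stands in for whatever invokes pfc-convert or pfc-migrate
    (hypothetical callback). A file is only marked as processed when the
    callback returns without raising.
    """
    converted: List[str] = []
    failed: List[str] = []
    for path in find_new_files(source_dir, processed, recursive):
        try:
            convert(path)
        except Exception:
            failed.append(path)
        else:
            processed.add(path)
            converted.append(path)
    return converted, failed
```

Keeping the conversion behind a callback mirrors the tool-agnostic split: the scanning code never needs to know which converter is configured.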

Installation

pip install pfc-ingest-watchdog

Requires either pfc-convert or pfc-migrate depending on your use case.

Both converters require the pfc_jsonl binary on the machine running the watchdog:

# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS Apple Silicon (M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS Intel:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# Windows (PowerShell):
Invoke-WebRequest https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-windows-x64.exe `
  -OutFile "$env:LOCALAPPDATA\Microsoft\WindowsApps\pfc_jsonl.exe"

License note: pfc_jsonl is free for personal and open-source use. Commercial use requires a written license — see pfc-jsonl.


Quick Start

1. Create a config file:

# watchdog.toml
[watcher]
mode           = "local"
converter      = "pfc-convert"
poll_interval  = 30
state_file     = "/var/run/pfc-watchdog/watchdog.state"

[source]
path           = "/var/log/apache2/"

[output]
path           = "/archive/pfc/"

[converter_options]
schema         = "apache"
on_error       = "skip"

2. Run:

# Watch continuously (30s interval)
pfc-ingest-watchdog --config watchdog.toml

# Scan once and exit
pfc-ingest-watchdog --config watchdog.toml --once

# Show what would be converted, do nothing
pfc-ingest-watchdog --config watchdog.toml --dry-run --verbose

Config Reference

[watcher]

| Key | Default | Description |
| --- | --- | --- |
| mode | "local" | "local" or "s3" |
| converter | "pfc-convert" | "pfc-convert" or "pfc-migrate" |
| poll_interval | 30 | Seconds between scans |
| state_file | "watchdog.state" | Tracks processed files (JSON) |
| audit_log | | Path to JSONL audit log (optional) |
| verbose | false | Verbose output |

[source]

| Key | Default | Description |
| --- | --- | --- |
| path | | Local directory to watch (mode = "local") |
| recursive | false | Recurse into subdirectories |

[output]

| Key | Default | Description |
| --- | --- | --- |
| path | | Output directory for .pfc files (mode = "local") |

[converter_options]

These are passed directly to the configured converter:

For pfc-convert:

| Key | Default | Description |
| --- | --- | --- |
| schema | "auto" | auto, apache, nginx, csv, or ndjson |
| on_error | "skip" | skip, fail, or log |
| output_format | "pfc" | pfc or jsonl |
| timestamp_field | | CSV timestamp column name (auto-detected if empty) |

For pfc-migrate:

| Key | Default | Description |
| --- | --- | --- |
| format | | Force format: gz, zst, bz2, or lz4 (auto-detected if empty) |

[s3] (for mode = "s3")

| Key | Required | Description |
| --- | --- | --- |
| source_bucket | | S3 bucket to watch |
| source_prefix | | Key prefix to scan (default: all) |
| dest_bucket | | Destination bucket (default: same as source) |
| dest_prefix | | Destination prefix |
| region | | AWS region |
| endpoint_url | | Custom endpoint (MinIO, etc.) |
| access_key | | AWS access key (default: env/IAM role) |
| secret_key | | AWS secret key |

Examples

Apache logs → PFC (local)

[watcher]
mode      = "local"
converter = "pfc-convert"

[source]
path = "/var/log/apache2/"

[output]
path = "/archive/pfc/"

[converter_options]
schema   = "apache"
on_error = "skip"

JSONL archives on S3 → PFC

[watcher]
mode      = "s3"
converter = "pfc-migrate"

[s3]
source_bucket = "my-logs"
source_prefix = "jsonl/incoming/"
dest_bucket   = "my-pfc-archive"
dest_prefix   = "pfc/"
region        = "eu-central-1"
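
For S3 mode, each source key has to be mapped to a destination key under `dest_prefix`. The sketch below shows one plausible mapping. It is an assumption for illustration only: `dest_key` is a hypothetical helper, and the naming rule used here (drop one trailing compression suffix, append `.pfc`) may differ from what pfc-migrate actually produces.

```python
import posixpath

# Compression suffixes matching the formats pfc-migrate is documented to accept.
COMPRESSION_SUFFIXES = (".gz", ".zst", ".bz2", ".lz4")


def dest_key(source_key: str, source_prefix: str, dest_prefix: str) -> str:
    """Map a source object key to a destination key (assumed naming rule)."""
    if not source_key.startswith(source_prefix):
        raise ValueError(f"{source_key!r} is not under prefix {source_prefix!r}")

    # Keep the path below the source prefix so sub-folders are preserved.
    relative = source_key[len(source_prefix):].lstrip("/")

    # Drop a single trailing compression suffix, if present.
    for suffix in COMPRESSION_SUFFIXES:
        if relative.endswith(suffix):
            relative = relative[: -len(suffix)]
            break

    return posixpath.join(dest_prefix, relative + ".pfc")
```

With the example config above, `dest_key("jsonl/incoming/2026/app.jsonl.gz", "jsonl/incoming/", "pfc/")` returns `"pfc/2026/app.jsonl.pfc"` under this assumed rule.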

CSV data → PFC (with audit log)

[watcher]
mode      = "local"
converter = "pfc-convert"
audit_log = "/var/log/pfc-watchdog/audit.jsonl"

[source]
path      = "/data/csv-exports/"
recursive = true

[output]
path      = "/archive/pfc/"

[converter_options]
schema          = "csv"
timestamp_field = "event_time"
on_error        = "log"

State File

The watchdog tracks which files have been processed in a JSON state file:

{
  "processed": [
    "/var/log/apache2/access.log.1.gz",
    "/var/log/apache2/access.log.2.gz"
  ],
  "updated_at": "2026-04-29T14:30:00+00:00"
}

Files in the state are never re-processed. Delete the state file to re-process everything.
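
If you want to inspect or pre-seed the state file from your own scripts, the following sketch reads and writes the JSON layout shown above. `load_processed` and `save_processed` are hypothetical helpers, not part of the package. Writing through a temporary file and `os.replace` is a design choice made here so that an interrupted write does not leave a truncated state file; whether the watchdog itself does this is not documented.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def load_processed(state_path: str) -> set[str]:
    """Return the set of processed paths, or an empty set if no state exists."""
    try:
        with open(state_path, encoding="utf-8") as fh:
            return set(json.load(fh).get("processed", []))
    except FileNotFoundError:
        return set()


def save_processed(state_path: str, processed: set[str]) -> None:
    """Write the state file atomically in the layout shown above."""
    payload = {
        "processed": sorted(processed),
        "updated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
        os.replace(tmp_path, state_path)  # atomic rename on the same filesystem
    except Exception:
        os.unlink(tmp_path)  # clean up the partial temp file
        raise
```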


Audit Log

Every converted file is recorded as a JSONL entry:

{"logged_at": "2026-04-29T14:30:01+00:00", "input": "/var/log/apache2/access.log.gz", "output": "/archive/pfc/access.pfc", "converter": "pfc-convert", "rows": 84231, "duration_s": 1.2}
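
Because the audit log is plain JSONL, it is easy to aggregate. The sketch below totals files, rows, and conversion time per converter, using only the field names visible in the sample entry above. `summarize_audit` is a hypothetical helper, and it assumes every entry carries a `converter` field.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_audit(lines: Iterable[str]) -> Dict[str, dict]:
    """Aggregate audit entries per converter: file count, rows, seconds."""
    totals = defaultdict(lambda: {"files": 0, "rows": 0, "duration_s": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        bucket = totals[entry["converter"]]
        bucket["files"] += 1
        bucket["rows"] += entry.get("rows", 0)
        bucket["duration_s"] += entry.get("duration_s", 0.0)
    return dict(totals)
```

Since an open file iterates line by line, you can pass the file object directly: `with open("/var/log/pfc-watchdog/audit.jsonl") as fh: summary = summarize_audit(fh)`.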

Python API

For programmatic integration:

from pfc_ingest_watchdog import Watchdog

wdog = Watchdog(
    mode       = "local",
    converter  = "pfc-convert",
    source     = "/var/log/apache2/",
    output     = "/archive/pfc/",
    options    = {"schema": "apache", "on_error": "skip"},
    state_file = "/var/run/pfc-watchdog/watchdog.state",
    audit_log  = "/var/log/pfc-watchdog/audit.jsonl",
)

# Single scan
converted, failed = wdog.scan_once()

# Continuous loop
wdog.run_loop(poll_interval=30)

# Dry run
converted, failed = wdog.scan_once(dry_run=True)
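
When a single scan is driven from cron or a CI job, it helps to turn the result into a process exit code. The helper below is a sketch under one assumption that the source does not confirm: that `failed` is falsy (an empty list or `0`) when nothing went wrong. `scan_exit_code` is a hypothetical name and works with any object that exposes `scan_once()`.

```python
import sys


def scan_exit_code(wdog) -> int:
    """Run one scan and return 0 on success, 1 if any conversion failed.

    Assumes `failed` is falsy (empty list or 0) when no file failed.
    """
    converted, failed = wdog.scan_once()
    print(f"converted={converted!r} failed={failed!r}", file=sys.stderr)
    return 1 if failed else 0
```

A script would then end with `sys.exit(scan_exit_code(wdog))`, where `wdog` is the `Watchdog` instance constructed above.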

Ecosystem

Incoming data                  watchdog                    PFC archive
─────────────────────────────────────────────────────────────────────────
/var/log/apache/*.log  ──────► pfc-convert ──────────────► /archive/*.pfc
s3://bucket/incoming/  ──────► pfc-migrate ──────────────► s3://bucket/pfc/
/data/exports/*.csv    ──────► pfc-convert ──────────────► /archive/*.pfc

After conversion, query the .pfc archives with DuckDB or pfc-gateway.


Part of the PFC Ecosystem


| Direct integration | Why |
| --- | --- |
| pfc-convert | Schema conversion: converts Apache CLF, CSV, NDJSON → JSONL → .pfc |
| pfc-migrate | Compression migration: swaps gzip/zstd/lz4 → .pfc, content unchanged |

Pipe mode (without watchdog)

pfc-convert and pfc-migrate can also be combined directly via pipe — no watchdog needed for one-shot conversions:

# Apache logs → JSONL → .pfc in one streaming step
pfc-convert convert access.log.gz --schema apache --stdout \
  | pfc-migrate convert --stdin --out archive.pfc

See pfc-convert and pfc-migrate for full documentation.
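
The same pipe can be driven from Python with `subprocess`, which is convenient inside a larger script. The sketch below reuses only the sub-commands and flags shown in the shell example above. `build_pipeline` and `run_pipeline` are hypothetical helper names, and both CLIs are assumed to be on `PATH`.

```python
import subprocess
from typing import List, Tuple


def build_pipeline(input_path: str, schema: str, output_path: str) -> Tuple[List[str], List[str]]:
    """Return the two argument lists used in the shell example above."""
    convert_cmd = ["pfc-convert", "convert", input_path, "--schema", schema, "--stdout"]
    migrate_cmd = ["pfc-migrate", "convert", "--stdin", "--out", output_path]
    return convert_cmd, migrate_cmd


def run_pipeline(input_path: str, schema: str, output_path: str) -> int:
    """Stream pfc-convert output into pfc-migrate; return a non-zero code on failure."""
    convert_cmd, migrate_cmd = build_pipeline(input_path, schema, output_path)
    producer = subprocess.Popen(convert_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(migrate_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return consumer_rc or producer_rc
```

Passing argument lists instead of a shell string avoids quoting problems with unusual file names.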


License

pfc-ingest-watchdog (this repository) is released under the MIT License — see LICENSE.

The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com