PGSync

May 2, 2026 · View on GitHub

PGSync

Real-time PostgreSQL to Elasticsearch/OpenSearch sync

Keep your relational database as the source of truth while powering lightning-fast search

PyPI version Python versions Build status codecov Downloads License

Docker Code style: black

Website · Documentation · Examples · Report Bug


What is PGSync?

PGSync is a change data capture tool that syncs data from PostgreSQL, MySQL, or MariaDB to Elasticsearch or OpenSearch in real-time. Define your document structure in JSON, and PGSync handles the rest — no custom code required.

%%{init: {'look': 'handDrawn', 'theme': 'neutral'}}%%
flowchart LR
    subgraph Source["🗄️ Source Database"]
        DB[(PostgreSQL<br/>MySQL<br/>MariaDB)]
    end

    subgraph CDC["⚡ Change Data Capture"]
        P[PGSync]
    end

    subgraph Search["🔍 Search Engine"]
        ES[(Elasticsearch<br/>OpenSearch)]
    end

    DB -->|WAL / Binlog| P
    P -->|Bulk Index| ES

Key Features

FeatureDescription
Real-time syncChanges propagate instantly via logical replication
Zero codeDefine mappings in JSON — no ETL pipelines to build
Nested documentsAutomatically denormalize complex relationships
Fault tolerantResumes from checkpoints after crashes
Transactionally consistentDocuments appear in commit order
Minimal overheadLightweight CDC with negligible database impact

Quick Start

Using Docker (Fastest)

docker run --rm -it \
  -e PG_URL=postgres://user:pass@host/db \
  -e ELASTICSEARCH_URL=http://localhost:9200 \
  -e REDIS_HOST=localhost \
  -v "$(pwd)/schema.json:/app/schema.json" \
  toluaina1/pgsync:latest -c schema.json -d -b

Using pip

pip install pgsync
# Bootstrap (one-time setup)
bootstrap --config schema.json

# Run sync
pgsync --config schema.json -d

Using Docker Compose

Default (Elasticsearch + Kibana):

git clone https://github.com/toluaina/pgsync
cd pgsync
docker-compose up

This starts PostgreSQL, Redis, Elasticsearch, Kibana, and PGSync configured for Elasticsearch.

For OpenSearch:

docker-compose --profile opensearch up

This starts PostgreSQL, Redis, OpenSearch, and PGSync configured for OpenSearch.

Ports:

  • PostgreSQL: 15432
  • Elasticsearch: 9201 (default)
  • Kibana: 5601 (default)
  • OpenSearch: 9400 (OpenSearch profile)

How It Works

1. Define your schema — Map tables to document structure:

{
  "table": "book",
  "columns": ["isbn", "title", "description"],
  "children": [{
    "table": "author",
    "columns": ["name"]
  }]
}

2. PGSync generates optimized queries — Complex JOINs handled automatically:

SELECT JSON_BUILD_OBJECT(
  'isbn', book.isbn,
  'title', book.title,
  'authors', (SELECT JSON_AGG(author.name) FROM author ...)
) FROM book

3. Get denormalized documents — Ready for search:

{
  "isbn": "9785811243570",
  "title": "Charlie and the Chocolate Factory",
  "authors": ["Roald Dahl"]
}

Changes to any related table automatically update the document in Elasticsearch/OpenSearch.


Requirements

ComponentVersion
Python3.10+
PostgreSQL9.6+ (or MySQL 5.7.22+ / MariaDB 10.5+)
Elasticsearch6.3.1+ (or OpenSearch 1.3.7+)
Redis3.1+ (or Valkey 7.2+) — optional in WAL mode

Database Setup

PostgreSQL

Enable logical decoding in postgresql.conf:

wal_level = logical
max_replication_slots = 1

Optionally limit WAL size:

max_slot_wal_keep_size = 100GB
MySQL / MariaDB

Enable binary logging in my.cnf:

server-id = 1
log_bin = mysql-bin
binlog_row_image = FULL
binlog_expire_logs_seconds = 604800

Create replication user:

CREATE USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;

Example

Consider a book library with related authors:

Book
isbn (PK)titledescription
9785811243570Charlie and the Chocolate FactoryWilly Wonka's famous...
97814713314351984George Orwell's chilling...
Author
id (PK)name
1Roald Dahl
4George Orwell

PGSync transforms this into search-ready documents:

[
  {
    "isbn": "9785811243570",
    "title": "Charlie and the Chocolate Factory",
    "authors": ["Roald Dahl"]
  },
  {
    "isbn": "9781471331435",
    "title": "1984",
    "authors": ["George Orwell"]
  }
]

Any change — updating an author's name, adding a new book, deleting a relationship — is automatically synced.


Transforms

PGSync supports built-in transforms to modify field values before indexing. Transforms are applied in order: replacerenameconcat.

Replace

Find and replace substrings within field values:

{
  "table": "product",
  "columns": ["code", "name"],
  "transform": {
    "replace": {
      "code": {
        "-": "/",
        "_": " "
      }
    }
  }
}
BeforeAfter
ABC-DEF_GHIABC/DEF GHI

Rename

Rename fields in the output document:

{
  "table": "book",
  "columns": ["id", "title"],
  "transform": {
    "rename": {
      "id": "book_id",
      "title": "book_title"
    }
  }
}

Concat

Combine multiple fields into a new field:

{
  "table": "user",
  "columns": ["first_name", "last_name"],
  "transform": {
    "concat": {
      "columns": ["first_name", "last_name"],
      "destination": "full_name",
      "delimiter": " "
    }
  }
}

Combined Example

Transforms can be combined and applied to nested children:

{
  "table": "book",
  "columns": ["isbn", "title"],
  "children": [{
    "table": "publisher",
    "columns": ["code", "name"],
    "transform": {
      "replace": { "code": { "-": "." } },
      "rename": { "name": "publisher_name" }
    }
  }],
  "transform": {
    "concat": {
      "columns": ["isbn", "title"],
      "destination": "search_text",
      "delimiter": " - "
    }
  }
}

Why PGSync?

ChallengePGSync Solution
Dual writes are error-proneCaptures changes from WAL — single source of truth
Complex JOIN queriesAuto-generates optimized SQL from your schema
Nested document updatesDetects changes in any related table
Data consistencyTransactionally consistent, ordered delivery
Crash recoveryCheckpoint-based resumption

Environment Variables

Full list at pgsync.com/env-vars

VariableDescription
PG_URLPostgreSQL connection string
ELASTICSEARCH_URLElasticsearch/OpenSearch URL
REDIS_HOSTRedis/Valkey host
REDIS_CHECKPOINTUse Redis for checkpoints (recommended for production)

One-Click Deploy

Deploy to DigitalOcean


Sponsors

DigitalOcean

Contributing

Contributions welcome! See CONTRIBUTING.rst for guidelines.

License

MIT — use it freely in your projects.