cdc stream replication peerdb fivetan airbyte pgstream - ghdrako/doc_snipets GitHub Wiki

Modern cloud data warehouses, such as BigQuery, can handle and transform vast amounts of data using just SQL. This capability has led many to switch from traditional extract, transform, load (ETL) to extract, load, transform (ELT), where data is first extracted, then loaded, and finally transformed within the warehouse itself. This change means the data warehouse, which is optimized for such tasks, executes the transformation workload instead of specialized ETL tools.

ELT tools focus on connecting to various data sources and extracting data without changing its structure. The transformation responsibility is handed over to the data warehouse. To assist in this process, tools such as dbt, Dataform, and SQLMesh offer frameworks that help organize and execute data transformations. However, the heavy lifting, meaning the actual data processing, is done by the data warehouse itself.
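As a rough illustration of the ELT pattern described above, the sketch below loads rows unchanged into a raw table and only then transforms them with SQL inside the database. It uses Python's built-in `sqlite3` as a stand-in for a warehouse such as BigQuery; the table names (`raw_orders`, `orders_by_customer`) and the sample rows are made up for this example.

```python
import sqlite3

# Hypothetical source rows, extracted as-is (no reshaping before load).
source_rows = [
    ("o1", "alice", "12.50"),
    ("o2", "bob", "7.00"),
    ("o3", "alice", "3.25"),
]

def run_elt(rows):
    # sqlite3 stands in for the warehouse in this sketch.
    conn = sqlite3.connect(":memory:")
    # Load: raw data lands untouched, amounts are still stored as text.
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform: casting and aggregation run as SQL inside the database.
    conn.execute(
        """
        CREATE TABLE orders_by_customer AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS total_amount
        FROM raw_orders
        GROUP BY customer
        """
    )
    result = conn.execute(
        "SELECT customer, total_amount FROM orders_by_customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result

totals = run_elt(source_rows)
print(totals)
```

In a real setup the transform step would typically live in a dbt, Dataform, or SQLMesh model rather than in application code; the split between "load raw" and "transform in SQL" is the part this sketch is meant to show.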

pgcapture - A scalable Netflix DBLog implementation for PostgreSQL

cloudquery

pgstream - PostgreSQL replication with DDL changes

Replicating Postgres data and schema changes to an Elasticsearch compatible store, with special handling of field IDs to minimise re-indexing caused by column renames.
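The idea of keying documents by stable field IDs can be sketched as follows. This is not pgstream's actual code or data model; the `col_1`-style IDs and the helper names are hypothetical. The point is only that a column rename touches the ID-to-name mapping, while documents already stored remain valid and need no re-indexing.

```python
# Hypothetical mapping from stable field ID to the current column name.
schema = {"col_1": "id", "col_2": "full_name"}

def to_document(row, schema):
    """Build a search document keyed by field ID instead of column name."""
    name_to_id = {name: fid for fid, name in schema.items()}
    return {name_to_id[col]: value for col, value in row.items()}

def rename_column(schema, old, new):
    """Apply a column rename: only the mapping changes, not the documents."""
    return {fid: (new if name == old else name) for fid, name in schema.items()}

def to_row(doc, schema):
    """Translate a stored document back to the current column names."""
    return {schema[fid]: value for fid, value in doc.items()}

doc = to_document({"id": 1, "full_name": "Ada"}, schema)
schema_v2 = rename_column(schema, "full_name", "name")

# The document written before the rename is still readable afterwards.
print(doc)
print(to_row(doc, schema_v2))
```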

PeerDB

Airbyte

Fivetran

PgSync - Replicating to Elasticsearch

Pg-capture

https://pg-capture.onrender.com/

Kuvasz

redpanda

sequin

Pg_flo

Key Features

  • Real-time Data Streaming - Capture inserts, updates, deletes, and DDL changes in near real-time
  • Fast Initial Loads - Parallel copy of existing data with automatic follow-up continuous replication
  • Powerful Transformations - Filter and transform data on-the-fly (see rules)
  • Flexible Routing - Route to different tables and remap columns (see routing)
  • Production Ready - Supports resumable streaming, DDL tracking, and more
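To make the filtering, routing, and column remapping features more concrete, here is a small sketch of the general idea applied to change events. It does not use pg_flo's configuration format or API; the event shape, the `RULES` structure, and the `apply_rules` function are assumptions made for illustration only.

```python
# Hypothetical routing rules: source table -> destination table and column remap.
RULES = {
    "users": {
        "destination": "customers",
        "columns": {"id": "customer_id", "email": "email_address"},
        # Events with any other operation are filtered out.
        "operations": {"INSERT", "UPDATE"},
    }
}

def apply_rules(event, rules):
    """Return the routed event, or None if the event is filtered out."""
    rule = rules.get(event["table"])
    if rule is None or event["op"] not in rule["operations"]:
        return None
    # Rename mapped columns; columns without a mapping keep their name.
    remapped = {
        rule["columns"].get(col, col): value
        for col, value in event["data"].items()
    }
    return {"table": rule["destination"], "op": event["op"], "data": remapped}

events = [
    {"table": "users", "op": "INSERT", "data": {"id": 1, "email": "a@example.com"}},
    {"table": "users", "op": "DELETE", "data": {"id": 1}},
    {"table": "audit_log", "op": "INSERT", "data": {"id": 9}},
]

routed = [out for out in (apply_rules(e, RULES) for e in events) if out is not None]
print(routed)
```

Only the first event survives here: the `DELETE` is dropped by the operation filter, and `audit_log` has no rule, so it is not routed anywhere in this sketch.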