Postgresql logical replication native - ghdrako/doc_snipets GitHub Wiki
- https://www.postgresql.fastware.com/blog/inside-logical-replication-in-postgresql
- https://www.postgresql.fastware.com/blog/logical-replication-tablesync-workers
- https://www.dolthub.com/blog/2024-03-08-postgres-logical-replication/
- https://github.com/dolthub/doltgresql/
- https://blog.peerdb.io/exploring-versions-of-the-postgres-logical-replication-protocol
- https://github.com/PeerDB-io/peerdb
- https://medium.com/event-driven-utopia/8-practical-use-cases-of-change-data-capture-8f059da4c3b7
- https://www.postgresql.org/docs/current/logical-replication.html
- https://www.depesz.com/2017/02/07/waiting-for-postgresql-10-logical-replication/
- https://www.depesz.com/2017/04/03/waiting-for-postgresql-10-logical-replication-support-for-initial-data-copy/
- https://www.depesz.com/2020/04/17/waiting-for-postgresql-13-add-logical-replication-support-to-replicate-into-partitioned-tables/
- http://amitkapila16.blogspot.com/2021/09/logical-replication-improvements-in.html
- https://www.postgresql.fastware.com/blog/logical-replication-permissions-in-postgresql-15
- https://www.postgresql.fastware.com/blog/bi-directional-replication-using-origin-filtering-in-postgresql
One requirement of logical replication is that the schema, table and column names are the same on the publisher and the subscriber. As a consequence, logical replication between two tables within the same database is not possible.
Logical replication directly reads from the write-ahead log (WAL), recording every database change, avoiding the need to intercept queries or periodically read the table. These changes are filtered, serialized and then sent to the standby servers where they can be applied.
Logical replication started by only allowing streaming of committed transactions. It then evolved to support in-flight transactions followed by two-phase commits and then parallel apply of in-flight transactions.
- Replication slot: A replication slot on the primary server reads changes from the WAL and passes them to the output plugin to be serialized and sent to the standby server (or ETL tool) to be applied. Periodically, the standby server sends a message to the primary confirming that it has read the WAL up to a certain point, at which point the slot can advance.
- Publication: A publication is essentially a filter on the WAL changes. Publications are very powerful and can filter by schema, table, and even particular columns of a table. You can also choose to publish inserts but not updates, and apply custom logic to filter out certain rows. When a standby starts reading from a replication slot, a set of publications is passed as input.
- Subscription: A subscription is the Postgres syntax for creating a logical replication connection to a primary server, replicating changes from a slot and a set of publications. The standby reads data from the primary and replicates it as long as the subscription is active. While this is Postgres specific, other tools end up behaving like subscribed standbys and receive the same output from the primary server.
- Output plugins: The replication slot passes raw WAL change data to an output plugin, which serializes it into a stream of messages. This helps with the interoperability of logical replication, as the message format is independent of the underlying database version or configuration. The de-facto output plugin is a Postgres project called pgoutput, but other plugins like wal2json and decoderbufs also enjoy community support.
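As a minimal sketch of the slot lifecycle, a slot can be created and inspected directly with SQL (the slot name is illustrative; test_decoding is used here because, unlike pgoutput, it does not require a publication to inspect output):

```sql
-- Create a logical slot using the test_decoding demo plugin.
SELECT slot_name, lsn
FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Inspect slot progress; confirmed_flush_lsn advances as the consumer
-- acknowledges WAL it has applied.
SELECT slot_name, plugin, confirmed_flush_lsn FROM pg_replication_slots;

-- Drop the slot when done; an abandoned slot pins WAL on the primary.
SELECT pg_drop_replication_slot('demo_slot');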
- https://www.postgresql.org/docs/current/protocol-replication.html
- https://www.postgresql.org/docs/current/protocol-logical-replication.html#PROTOCOL-LOGICAL-REPLICATION-PARAMS
- https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html
- https://postgrespro.ru/docs/postgresql/16/protocol-replication?lang=en
- https://postgrespro.ru/docs/postgresql/16/protocol-logical-replication?lang=en
- https://postgrespro.ru/docs/postgrespro/16/protocol-logicalrep-message-formats?lang=en
Protocol version. Currently versions 1, 2, 3, and 4 are supported. A valid version is required.
- Version 2 is supported only for server version 14 and above, and it allows streaming of large in-progress transactions.
- Version 3 is supported only for server version 15 and above, and it allows streaming of two-phase commits.
- Version 4 is supported only for server version 16 and above, and it allows streams of large in-progress transactions to be applied in parallel.
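The protocol version is passed as an option of the START_REPLICATION command on a replication connection (e.g. psql with "replication=database" in the connection string). A sketch, with placeholder slot and publication names; `streaming 'parallel'` requires proto_version 4:

```sql
START_REPLICATION SLOT "my_slot" LOGICAL 0/0
    (proto_version '4', streaming 'parallel', publication_names '"my_pub"')
```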
Discussion on DDL replication:
Two possible methods of logical replication:
- Capture commands before they reach the parser and transfer them to a second machine. This is the approach of [Slony](https://www.slony.info/), which is based on triggers.
- Get the changes after they have already been parsed and executed, by decoding the WAL. This is the method used by:
  - Native logical replication
  - The pglogical extension
The first method was the only one available for PostgreSQL versions prior to 9.4. Starting from 9.4 there is the pglogical extension, and since version 10.0 logical replication has become native.
Logical replication is based on the concept that WAL segments, after being processed through a logical decoding process that converts the physical information back into logical information, are made available through a publication mechanism. The primary then starts a publication process, and the replica starts a subscription process that, by connecting to the primary’s publication, passes the decoded instructions directly to the query executor of the replica machine.
Feature | pglogical | native logical replication |
---|---|---|
Row filtering on provider | yes | yes (>=15) |
Support for TRUNCATE | yes | yes (>=11) |
Column filtering | yes | yes (>=15, column lists) |
Conflict detection and resolution | yes | no |
Version support | yes | no |
Sequence replication | yes | no |
Delayed replication | yes | no |
Postgres-XL as a subscriber | yes | no |
Optional JSON output | yes | no |
Tables without PK | no | yes (REPLICA IDENTITY FULL) |
Table schema and DDL | yes | no |
Support for partitioned tables | yes | yes (>=13) |
The tables used for logical replication must have a replication identity, normally a UNIQUE/PRIMARY KEY. Data is replicated in real time once the initial snapshot of the tables is created. The publisher can choose to replicate changes for INSERT, UPDATE, DELETE or TRUNCATE, or any combination of them.
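For instance, restricting which operations a publication carries is done with the publish parameter (publication and table names here are illustrative):

```sql
-- Replicate only INSERT and DELETE for this table; UPDATE and TRUNCATE
-- on the publisher will not be sent to subscribers.
CREATE PUBLICATION pub_ins_del FOR TABLE orders
    WITH (publish = 'insert, delete');
```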
Syntax:
CREATE PUBLICATION name
[ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
| FOR ALL TABLES ]
[ WITH ( publication_parameter [= value] [, ... ] ) ]
REPLICA IDENTITY is a table-level parameter that controls the information written to WAL to identify the tuple that is being deleted or updated. When a table is created, REPLICA IDENTITY is set to DEFAULT. To change it, use ALTER TABLE ... REPLICA IDENTITY. To check the current setting:
SELECT CASE relreplident
WHEN 'd' THEN 'default'
WHEN 'n' THEN 'nothing'
WHEN 'f' THEN 'full'
WHEN 'i' THEN 'index'
END AS replica_identity
FROM pg_class
WHERE oid = 'mytablename'::regclass;
A published table must have a “replica identity” configured in order to be able to replicate UPDATE and DELETE operations, so that the appropriate rows to update or delete can be identified on the subscriber side. By default, this is the primary key, if there is one. Another unique index (with certain additional requirements) can also be set as the replica identity. If the table does not have any suitable key, it can be set to replica identity “full”, which means the entire row becomes the key. This, however, is very inefficient and should only be used as a fallback if no other solution is possible (since version 16 the subscriber can use indexed columns to find matching rows even when they are not part of the replica identity). If a replica identity other than “full” is set on the publisher side, a replica identity comprising the same or fewer columns must also be set on the subscriber side.
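The possible settings map directly onto ALTER TABLE (table and index names are illustrative):

```sql
-- Default: the primary key identifies rows for UPDATE/DELETE.
ALTER TABLE t REPLICA IDENTITY DEFAULT;

-- Use another unique index (it must be non-partial, non-deferrable,
-- and cover only NOT NULL columns).
ALTER TABLE t REPLICA IDENTITY USING INDEX t_code_key;

-- No suitable key: log the whole old row (inefficient fallback).
ALTER TABLE t REPLICA IDENTITY FULL;
```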
INSERT operations can proceed regardless of any replica identity.
Every publication can have multiple subscribers. Each table can be added to multiple publications if needed.
The individual tables can be added and removed dynamically using ALTER PUBLICATION. Both the ADD TABLE and DROP TABLE operations are transactional; so the table will start or stop replicating at the correct snapshot once the transaction has committed.
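Because ADD TABLE and DROP TABLE are transactional, they can be combined with other work in one transaction (names are illustrative):

```sql
BEGIN;
ALTER PUBLICATION my_pub ADD TABLE payments;
COMMIT;  -- replication of "payments" starts at this commit's snapshot

ALTER PUBLICATION my_pub DROP TABLE payments;  -- stops replicating it
```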
- All tables:
CREATE PUBLICATION events FOR ALL TABLES;
- specific tables:
CREATE PUBLICATION events FOR TABLE user, todo;
- Filter based on row conditions (Postgres v15+ only):
CREATE PUBLICATION user_event FOR TABLE user WHERE (active IS TRUE);
Example
postgres=# CREATE TABLE data(id int, rgb text);
CREATE TABLE
postgres=# CREATE PUBLICATION pub_data_all FOR TABLE data;
CREATE PUBLICATION
postgres=# CREATE PUBLICATION pub_data_blue FOR TABLE data WHERE (rgb = 'B');
CREATE PUBLICATION
postgres=# CREATE PUBLICATION pub_data_red FOR TABLE data WHERE (rgb = 'R');
CREATE PUBLICATION
postgres=# \d data
postgres=# \dRp+
- Specify Column list to replicate
Any column list must include the REPLICA IDENTITY columns for UPDATE or DELETE operations to be published. Furthermore, if the table uses REPLICA IDENTITY FULL, specifying a column list is not allowed (it will cause publication errors for UPDATE or DELETE operations). If a publication publishes only INSERT operations, then any column can be specified in the list.
REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }
DEFAULT - use primary key
CREATE PUBLICATION salary_information FOR TABLE employee (user_id, name, salary);
Column lists are applied before the publisher decides to publish the change. The publisher will skip replicating the columns that are not specified in the column list.
Example
postgres=# CREATE TABLE employee (emp_no INTEGER NOT NULL, name VARCHAR,
postgres(# phone VARCHAR, email VARCHAR,
postgres(# address VARCHAR, dept_no INTEGER,
postgres(# birthday DATE, salary INTEGER);
CREATE TABLE
postgres=# CREATE TABLE department (dept_no INTEGER NOT NULL, dept_name VARCHAR,
postgres(# mgr_no INT, location VARCHAR);
CREATE TABLE
postgres=# CREATE PUBLICATION pub_emp FOR TABLE employee;
CREATE PUBLICATION
-- Publication can be altered to include another table with specified columns.
postgres=# ALTER PUBLICATION pub_emp ADD TABLE department (dept_no, dept_name);
ALTER PUBLICATION
postgres=# CREATE PUBLICATION pub_emp_birthday FOR TABLE employee (emp_no, birthday);
CREATE PUBLICATION
postgres=# CREATE PUBLICATION pub_emp_email FOR TABLE employee (emp_no, email);
CREATE PUBLICATION
postgres=# \d employee
postgres=# \dRp+
The display table command (\d) now shows the column lists specified for the publications the table is a member of. The display publication command (\dRp+) now shows the column lists specified for each table.
select * from pg_catalog.pg_publication;
Syntax:
CREATE SUBSCRIPTION subscription_name
CONNECTION 'conninfo'
PUBLICATION publication_name [, ...]
[ WITH ( subscription_parameter [= value] [, ... ] ) ]
CREATE SUBSCRIPTION mysub
CONNECTION <conn stuff>
PUBLICATION foo
WITH (slot_name = 'my_slot', create_slot = false);
ALTER SUBSCRIPTION subscription_name DISABLE;
ALTER SUBSCRIPTION subscription_name ENABLE;
select pg_create_logical_replication_slot('YOUR_NEW_NAME_SLOT', 'pgoutput'); -- create a dedicated slot for the subscriber if you have two databases at the same server instance.
ALTER SUBSCRIPTION subscription_name SET (slot_name='YOUR_NEW_NAME_SLOT');
ALTER SUBSCRIPTION subscription_name REFRESH PUBLICATION;
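Once a subscription is running, progress can be checked from both sides with the statistics views (a sketch; run each query on the side indicated):

```sql
-- On the subscriber: apply worker status and last received LSN.
SELECT subname, received_lsn, latest_end_lsn
FROM pg_stat_subscription;

-- On the publisher: slot state and acknowledged WAL position.
SELECT slot_name, active, confirmed_flush_lsn
FROM pg_replication_slots;

-- On the publisher: per-connection replication progress.
SELECT application_name, state, replay_lsn
FROM pg_stat_replication;
```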
- https://pgdash.io/blog/postgres-replication-gotchas.html
- Tables used in logical replication must have a primary key/unique key (or REPLICA IDENTITY FULL).
- Only DML can be replicated; DDL cannot be replicated.
- TRUNCATE is also not replicated (before PostgreSQL 11).
- Sequences and views cannot be replicated.
- pg_logical_slot_get_changes - consume each change of the slot
- pg_logical_slot_peek_changes - view the changes without consuming them
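The difference is easy to see on a slot created with the test_decoding plugin (slot name is illustrative; NULL/NULL means "no LSN or row-count limit"):

```sql
-- Peek: returns pending changes but leaves them in the slot,
-- so running it twice returns the same rows.
SELECT lsn, data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Get: returns the same changes and advances the slot,
-- so a second call returns nothing new.
SELECT lsn, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
```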