CloudSQL logical replication pglogical
Resources:
- https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication
- https://cloud.google.com/sql/docs/postgres/replication/configure-replication-from-external
- https://www.2ndquadrant.com/en/resources-old/pglogical/pglogical-docs/
- https://wiki.postgresql.org/images/7/7b/Pglogical-tallinn-2016-petr-jelinek.pdf
Setup pglogical https://github.com/2ndQuadrant/pglogical
Terminology:
- Nodes - PostgreSQL database instances
- Providers and Subscribers - roles taken by Nodes
- Replication Set - a collection of tables
Architectural details:
- pglogical works on a per-database level, not whole server level like physical streaming replication
- One Provider may feed multiple Subscribers without incurring additional disk write overhead
- One Subscriber can merge changes from several origins and detect conflict between changes with automatic and configurable conflict resolution (some, but not all aspects required for multi-master).
- Cascading replication is implemented in the form of changeset forwarding.
- Allow connections. If the primary instance is a Cloud SQL instance, you can allow access from the replica's outgoing IP address by adding it as an authorized network (a gcloud sketch follows the command below).
# Get outgoing IP address of a replica instance
gcloud sql instances describe [REPLICA_NAME] --format="default(ipAddresses)"
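That address can then be authorized on the primary with gcloud sql instances patch. A minimal sketch; PRIMARY_NAME and 203.0.113.10/32 are placeholders, and --authorized-networks replaces the whole list, so include any networks that are already authorized:
# Authorize the replica's outgoing IP on the primary (placeholder values)
gcloud sql instances patch PRIMARY_NAME \
    --authorized-networks=203.0.113.10/32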
- Create a replication user on both sides
CREATE USER replication_user WITH REPLICATION
IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'secret';
-- for an existing user
ALTER USER existing_user WITH REPLICATION;
- Set up logical replication with pglogical by setting the following database flags (a gcloud sketch follows below):
  - Primary instance:
    cloudsql.logical_decoding=on
    cloudsql.enable_pglogical=on
  - Logical replica:
    cloudsql.enable_pglogical=on
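These flags can be set with gcloud; a sketch assuming the placeholder instance names primary-instance and replica-instance. Note that --database-flags overwrites all flags already set on the instance, and changing these flags may restart it:
# Primary: enable logical decoding and the pglogical extension
gcloud sql instances patch primary-instance \
    --database-flags=cloudsql.logical_decoding=on,cloudsql.enable_pglogical=on
# Logical replica: enable the pglogical extension
gcloud sql instances patch replica-instance \
    --database-flags=cloudsql.enable_pglogical=on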
- Install the pglogical extension on both sides, connecting as replication_user
CREATE EXTENSION pglogical;
source-instance$ SELECT pglogical.create_node(
node_name := 'primary',
dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=secret'
);
dest-instance$ SELECT pglogical.create_node(
node_name := 'replica',
dsn := 'host=<replica-ip> port=5432 dbname=postgres user=replication_user password=secret'
);
- Create a table with data to replicate
CREATE TABLE replica_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO replica_test (data) VALUES ('apple'), ('banana'), ('cherry');
The table must also be created on the replica instance.
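One way to copy the schema to the replica is a schema-only dump piped into it; this is only a sketch (create_subscription can also synchronize structure), reusing the <primary-ip>/<replica-ip> placeholders from above:
# Copy DDL only (no data) from the primary to the replica
pg_dump "host=<primary-ip> port=5432 dbname=postgres user=replication_user" --schema-only --no-owner \
    | psql "host=<replica-ip> port=5432 dbname=postgres user=replication_user"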
- Add the table to a replication set
SELECT pglogical.replication_set_add_table('default', 'replica_test', true);
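With many tables, pglogical.replication_set_add_all_tables can add a whole schema at once; a sketch assuming the tables live in the public schema:
-- Add every table in the public schema to the default set and synchronize their data
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public'], true);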
- Create the pglogical subscription
SELECT pglogical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=secret'
);
SELECT * FROM pglogical.show_subscription_status('test_sub');
-- SELECT pglogical.wait_for_subscription_sync_complete('test_sub');
Monitor the initial data copy:
SELECT datname, pid, state, query, age(clock_timestamp(), query_start) AS age
FROM pg_stat_activity
WHERE state <> 'idle'
AND query NOT LIKE '% FROM pg_stat_activity %'
AND query LIKE 'COPY%'
ORDER BY age;
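Ongoing lag can also be checked on the provider from its logical replication slots; a sketch using PostgreSQL 10+ function names:
-- Bytes of WAL each logical slot still has to send/confirm
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';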
- https://www.percona.com/blog/how-to-start-logical-replication-in-postgresql-for-specific-tables-based-on-a-pg_dump/
- https://github.com/2ndQuadrant/pglogical/issues/325
A commenter on the issue above overcame some of their problems by taking a slightly different approach:
- After importing the schema into the subscriber, drop all of the foreign key constraints. This lets you sync each table manually, which is typically faster and makes it easier and quicker to recover from unexpected errors, because you do not have to resync every table when you run into problems.
- Tweak your PostgreSQL config on both the provider and the subscriber. Specifically, look into tuning max_wal_size, checkpoint_timeout and synchronous_commit; excessive checkpoints are very expensive.
- Create a new subscription with synchronize_data := false
- Manually resync each table: SELECT * FROM pglogical.alter_subscription_resynchronize_table('subscription_name', 'table_name'); (a sketch of these last two steps follows after this list)
Once you have synced all tables individually, you can add the foreign keys back to the tables. You may also try dropping indexes on the subscriber and re-creating them once all tables are in sync.
Also make sure your instances are not I/O bound. They found that some tables (for some unknown reason) are very I/O intensive and slow, and others are fast.
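A minimal sketch of those last two steps, reusing the placeholder names from earlier (test_sub, replica_test):
-- Create the subscription without copying any data up front
SELECT pglogical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=secret',
    synchronize_data := false
);
-- Then trigger the initial copy table by table
SELECT pglogical.alter_subscription_resynchronize_table('test_sub', 'replica_test');
-- Check the per-table sync state
SELECT * FROM pglogical.show_subscription_table('test_sub', 'replica_test');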
Automatic assignment of replication sets for new tables
The event trigger facility can be used to define rules that automatically assign newly created tables to replication sets.
Example:
CREATE OR REPLACE FUNCTION pglogical_assign_repset()
RETURNS event_trigger AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        IF obj.object_type = 'table' THEN
            IF obj.schema_name = 'config' THEN
                PERFORM pglogical.replication_set_add_table('configuration', obj.objid);
            ELSIF NOT obj.in_extension THEN
                PERFORM pglogical.replication_set_add_table('default', obj.objid);
            END IF;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER pglogical_assign_repset_trg
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE', 'CREATE TABLE AS')
    EXECUTE PROCEDURE pglogical_assign_repset();
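To check which set a newly created table was assigned to, the pglogical.tables view can be queried (assuming your pglogical version exposes it; my_new_table is a placeholder):
-- Lists replicated tables together with the replication set they belong to
SELECT set_name, nspname, relname
FROM pglogical.tables
WHERE relname = 'my_new_table';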
Metric: cloudsql.googleapis.com/database/postgresql/replication/replica_byte_lag
Resource types: cloudsql_database
Other lag metrics, but probably only for binary (physical) replication:
| Name | Metric | Resource type | Description |
|------|--------|---------------|-------------|
| Network lag | cloudsql.googleapis.com/database/replication/network_lag | cloudsql_database | Indicates time taken from primary binary log to IO thread on replica. Only applicable to replicas. |
| Replication lag | cloudsql.googleapis.com/database/replication/replica_lag | cloudsql_database | Number of seconds the read replica is behind its primary (approximation). A negative value indicates that replication is inactive. |
If a suitable metric does not exist, build one from the logs - you can count how many times a given log entry appears:
resource.type="cloudsql_database" AND textPayload =~ "remaining connection slots are reserved for non-replication superuser connections"
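A sketch of turning that filter into something usable: preview the matching entries, then create a counter-type log-based metric from them (the metric name below is a placeholder):
# Preview matching log entries
gcloud logging read 'resource.type="cloudsql_database" AND textPayload=~"remaining connection slots are reserved for non-replication superuser connections"' --limit=20
# Create a log-based counter metric that can be charted and alerted on in Cloud Monitoring
gcloud logging metrics create replication_connection_slot_errors \
    --description="Count of 'remaining connection slots are reserved' errors" \
    --log-filter='resource.type="cloudsql_database" AND textPayload=~"remaining connection slots are reserved for non-replication superuser connections"'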