CloudSQL logical replication pglogical - ghdrako/doc_snipets GitHub Wiki

Resource

Terminology:

  • Nodes - PostgreSQL database instances
  • Providers and Subscribers - roles taken by Nodes
  • Replication Set - a collection of tables

Architectural details:

  • pglogical works on a per-database level, not whole server level like physical streaming replication
  • One Provider may feed multiple Subscribers without incurring additional disk write overhead
  • One Subscriber can merge changes from several origins and detect conflict between changes with automatic and configurable conflict resolution (some, but not all aspects required for multi-master).
  • Cascading replication is implemented in the form of changeset forwarding.
  1. Allow connections If the primary instance is a Cloud SQL instance, you can allow access from the replica's outgoing IP address by adding it as an authorized network.
# Get outgoing IP address of a replica instance
gcloud sql instances describe [REPLICA_NAME] --format="default(ipAddresses)"

  1. Create a replication user on both side
CREATE USER replication_user WITH REPLICATION
IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'secret';

# for existing user
ALTER USER existing_user WITH REPLICATION;
  1. Set up logical replication with pglogical
  • Primary instance
cloudsql.logical_decoding=on 
cloudsql.enable_pglogical=on
  • Logical replica
cloudsql.enable_pglogical=on
  1. Install the pglogical extension On both side using replication_user
CREATE EXTENSION pglogical;
  1. https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication#create-a-pglogical-node-on-each-instance
source-instance$ SELECT pglogical.create_node(
    node_name := 'primary',
    dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=secret'
);

dest-instance$ SELECT pglogical.create_node(
    node_name := 'replica',
    dsn := 'host=<replica-ip> port=5432 dbname=postgres user=replication_user password=secret'
);
  1. Create a table with data to replicate
CREATE TABLE replica_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO replica_test (data) VALUES ('apple'), ('banana'), ('cherry');

The table must also be created on the replica instance.

  1. Add the table to a replication set
SELECT pglogical.replication_set_add_table('default', 'replica_test', true);
  1. Create the pglogical subscription
SELECT pglogical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=replicapassword'
);

SELECT * FROM pglogical.show_subscription_status('test_sub');
-- SELECT pglogical.wait_for_subscription_sync_complete('subscription1');

Monitor initial load

SELECT datname, pid, state, query, age(clock_timestamp(), query_start) AS age 
FROM pg_stat_activity
WHERE state <> 'idle' 
    AND query NOT LIKE '% FROM pg_stat_activity %' 
    AND query LIKE 'COPY%'
ORDER BY age;

Slow initial load

overcame some of my issues with taking a slightly different approach:

  • After importing the schema into the subscriber - drop all of the foreign key constraints. This will allow you to manually sync each table which provides a few benefits - it's typically faster and it's easier/quicker to recover from unexpected errors because you don't have to resync every table if you run into problems
  • Tweak your PG config on provider+subscriber. Specifically, look into tuning max_wal_size, checkpoint_timeout and synchronous_commit. Excessive checkpoints are very expensive.
  • Create a new subscription with synchronize_data := 'false'
  • Manually resync each table: select * from pglogical.alter_subscription_resynchronize_table('subscription_name', 'table_name');

Once you have synced all tables individually, you can add the foreign keys back to the tables. You may also try dropping indexes on the subscriber and re-creating them once all tables are in sync.

Also - make sure your instances are not I/O bound. We have found that some tables (for some unknown reason) are very I/O instensive+slow, and others are fast.

Automatic assignment of replication sets for new tables

The event trigger facility can be used for describing rules which define replication sets for newly created tables.

Example:

CREATE OR REPLACE FUNCTION pglogical_assign_repset()
RETURNS event_trigger AS $$
DECLARE obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        IF obj.object_type = 'table' THEN
            IF obj.schema_name = 'config' THEN
                PERFORM pglogical.replication_set_add_table('configuration', obj.objid);
            ELSIF NOT obj.in_extension THEN
                PERFORM pglogical.replication_set_add_table('default', obj.objid);
            END IF;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER pglogical_assign_repset_trg
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE', 'CREATE TABLE AS')
    EXECUTE PROCEDURE pglogical_assign_repset();
Automatic assignment of replication sets for new tables

The event trigger facility can be used for describing rules which define replication sets for newly created tables.

Example:

CREATE OR REPLACE FUNCTION pglogical_assign_repset()
RETURNS event_trigger AS $$
DECLARE obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        IF obj.object_type = 'table' THEN
            IF obj.schema_name = 'config' THEN
                PERFORM pglogical.replication_set_add_table('configuration', obj.objid);
            ELSIF NOT obj.in_extension THEN
                PERFORM pglogical.replication_set_add_table('default', obj.objid);
            END IF;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER pglogical_assign_repset_trg
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE', 'CREATE TABLE AS')
    EXECUTE PROCEDURE pglogical_assign_repset();

Monitoring replication

Metric: cloudsql.googleapis.com/database/postgresql/replication/replica_byte_lag
Resource types:    cloudsql_database

Inne ale chyba tylko dla binarnej:

Name
    Network lag
Description
    Indicates time taken from primary binary log to IO thread on replica. Only applicable to replicas.
Metric
    cloudsql.googleapis.com/database/replication/network_lag
Resource types
    cloudsql_database
Name
    Replication lag
Description
    Number of seconds the read replica is behind its primary (approximation). A negative value indicates that replication is inactive.
Metric
    cloudsql.googleapis.com/database/replication/replica_lag
Resource types
    cloudsql_databas

Use logbsed metrics

Example Is there any predefined metrics which is used to monitor connection limit of cloud SQL in GCP and set alert on connection limit

Nie ma metryki to robimy ja logow - moznaliczyc ile razy sie taki log pojawil

resource.type="cloudsql_database" AND textPayload =~ "remaining connection slots are reserved for non-replication superuser connections"
⚠️ **GitHub.com Fallback** ⚠️