postgres physical replication - ghdrako/doc

Long queries can cause query conflicts in a replication scenario. For instance, when heavy updates on the primary change data that is simultaneously being selected on the standby server, the queries on the standby can be cancelled. The cancellation is caused by the need to apply WAL that affects the data being selected. This is because the standby does not have the same freedom as the primary, where dead tuples can be created. The standby is in recovery mode and does not create dead tuples. If it runs in synchronous mode, or with a delay that has already elapsed, it must apply the changes from the WAL, thereby modifying the data that the transaction with the long SELECT is reading. This causes the SELECT to be interrupted.

In PostgreSQL, there are two kinds of physical replication techniques:

Asynchronous replication: In asynchronous replication, the primary server (source) sends a continuous flow of data to the secondary one (target) without waiting for any acknowledgment from the target. This type of replication has the advantage of speed, but it carries a greater risk of data loss because the received data is not acknowledged.
Synchronous replication: In synchronous replication, the source sends the data to the target (the second server) and waits until the target acknowledges that the changes have been written correctly. Only when that acknowledgment arrives is the transfer considered complete.

Summary of the key information about WAL segments:

Each WAL segment has a fixed size, 16 MB by default.
By default, WAL files are deleted as soon as they are older than the latest checkpoint.
We can retain extra WAL segments using wal_keep_segments (replaced by wal_keep_size in PostgreSQL 13 and later).
WAL segments are stored in the pg_wal directory.

The wal_level directive

https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL

The wal_level directive sets what kind of information is stored in WAL segments. The default value is replica. With this value, the information stored in the WAL is sufficient to support archiving and physical replication.

wal_level=replica (the default) is used for physical replication, and wal_level=logical for logical replication.

Streaming replication


The idea behind streaming replication is to copy the WAL files from the primary server to another (replica) server.

Streaming replication works by continuously transferring Write-Ahead Log (WAL) data from the primary server to the standby server in real-time, keeping the standby's database nearly identical to the primary.

The replica server stays in a state of continuous recovery and continuously replays the WAL sent by the primary; in this way, the replica reproduces the primary's data at the binary level.

In a streaming replication context, a communication channel is open between the replica and the primary, and the primary sends the WAL segments through it. The replica server receives the WAL segments and replays them, remaining in a permanent recovery state.

Asynchronous replication

On the primary server

Modify listen_addresses so that the server listens on the network. If we set listen_addresses = '*', PostgreSQL will listen on all IP addresses; otherwise, we can specify a list of IP addresses separated by commas. This change requires a restart of the PostgreSQL service.

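Find the location of the pg_hba.conf file:

SHOW hba_file;

Add an entry that allows the replication user to connect from the replica. The example below uses md5 authentication, but the recommended method is scram-sha-256 for improved security. Refer to https://www.postgresql.org/docs/current/auth-password.html for more information.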

host replication replication_user <ip_replica>/32 md5

cat /var/lib/postgresql/data/pg_hba.conf
psql -c  "SELECT pg_reload_conf();"

Create a new user that is able to perform the replication

# Generate a random password for replication_user
export REP_USER_PASSWORD=$(openssl rand -hex 12)
echo "Created REP_USER_PASSWORD for replication_user"
echo "$REP_USER_PASSWORD"
# Remove any previous replication_user.sql for a clean starting point
rm -f replication_user.sql
# Write the CREATE USER statement to an SQL file,
# setting the password to the REP_USER_PASSWORD value
echo "CREATE USER replication_user
WITH ENCRYPTED PASSWORD '$REP_USER_PASSWORD'
REPLICATION LOGIN;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replication_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO replication_user;" >> replication_user.sql
# Store the credentials in .pgpass; libpq ignores the file unless its mode is 0600
rm -f .pgpass
echo "*:*:*:replication_user:$REP_USER_PASSWORD" >> .pgpass
chmod 0600 .pgpass

CREATE role replicarole WITH REPLICATION ENCRYPTED PASSWORD 'SuperSecret' LOGIN;

Modify the pg_hba.conf file so that from the replica machine with the user replicarole, it is possible to reach the primary machine:

host replication replicarole 192.168.122.11/32 scram-sha-256

To make this configuration active, we need to run a reload of the PostgreSQL server.

select pg_reload_conf();

On the replica server, we have to turn off the PostgreSQL service, destroy the PGDATA directory, and remake it – this time, empty and with the right permissions. To do this, we can use these statements:

root@pg2:/# systemctl stop postgresql
root@pg2:/# cd /var/lib/postgresql/16/
root@pg2:/# rm -rf main
root@pg2:/# mkdir main
root@pg2:/# chown postgres:postgres main
root@pg2:/# chmod 0700 main

The wal_keep_segments option

The postgresql.conf directive that tells PostgreSQL how many WAL segments to keep on disk is called wal_keep_segments. (In PostgreSQL 13 and later, it was replaced by wal_keep_size, which is specified in megabytes rather than in segments.) By default, wal_keep_segments is set to zero, because a fresh PostgreSQL installation has no replica. This means that PostgreSQL does not keep any extra WAL segments as a buffer, so if the replica (standby) goes down, it may no longer be able to realign itself when it comes back up: in the time it takes the replica to recover, the primary may have produced and deleted new WAL segments.

The first way to overcome this problem is to set wal_keep_segments to a value greater than zero in postgresql.conf. For example, with wal_keep_segments = 100, at least 100 WAL segment files are kept in the pg_wal directory, for a total occupied disk space of 100 * 16 MB = 1.6 GB. The primary always keeps these extra WAL segments, and if the replica goes down, it can realign itself once back up only if the primary has produced fewer than wal_keep_segments WAL segments in the meantime.

This solution is static: it offers a safety margin only as long as the time it takes the primary to produce wal_keep_segments WAL segments. It also has the disadvantage that the space occupied on disk is always equal to wal_keep_segments * 16 MB, even when the primary no longer needs to keep the WAL segments (because the replica has already processed them).

The advantage is that disk usage is bounded: if the network or the replica goes down, the primary keeps at most wal_keep_segments * 16 MB of extra WAL, so the outage cannot fill the primary's disk. If we don't have much disk space, we can use this solution, keeping in mind that once the primary has produced more than wal_keep_segments * 16 MB of WAL, the replica will no longer be able to synchronize, and we will have to rebuild it.
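The arithmetic behind this static buffer can be sketched in a few lines of shell. This is only an illustration: wal_keep_mb and replica_can_catch_up are hypothetical helper names, not PostgreSQL tools, and the 16 MB segment size is the assumed default.

```shell
# Hypothetical helper, not a PostgreSQL tool: estimate the disk space (in MB)
# that wal_keep_segments reserves in pg_wal.
# $1 = wal_keep_segments, $2 = segment size in MB (assumed default: 16).
wal_keep_mb() {
    echo $(( $1 * ${2:-16} ))
}

# Hypothetical helper: succeed if a replica that was offline while the primary
# produced $2 WAL segments can still catch up with wal_keep_segments = $1.
replica_can_catch_up() {
    [ "$2" -lt "$1" ]
}

wal_keep_mb 100    # prints 1600 (MB, about 1.6 GB)
```

With wal_keep_segments = 100, a replica that missed 40 segments can still realign itself, while one that missed 250 segments has to be rebuilt.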

The slot way

In PostgreSQL, there is another approach that can be used to solve the problem of storing WAL segments: the slot technique. Through the slot technique, we can tell PostgreSQL to keep all the WAL segments on the primary until they have been transferred to the replica servers. In this way, we have dynamic, variable, and fully automated management of the number of WAL segments that the primary server must keep as a buffer. This is a very easy way to manage our physical replicas. The trade-off is that if a replica stays disconnected, the primary keeps accumulating WAL segments for its slot, which can fill the disk.

Create a new slot

SELECT * FROM pg_create_physical_replication_slot('master_slot');

Drop a slot

select pg_drop_replication_slot('master_slot');

Asynchronous replication

By default, in PostgreSQL, physical replication is asynchronous.

The replica server will now have the PostgreSQL service turned off and the PGDATA data folder created, empty, and with the right permissions granted:

Go inside the PGDATA directory as the postgres system user:

root@pg2:# su - postgres
postgres@pg2:~$ cd /var/lib/postgresql/16/main

Run the pg_basebackup command. It takes a base backup of the primary machine onto the replica machine and prepares the replica to receive and replay the incoming WAL segments, so that the replica server remains in a state of permanent recovery:

postgres@pg2:~/16/main$ pg_basebackup -h 192.168.122.10 -U replicarole -p5432 -D /var/lib/postgresql/16/main -Fp -Xs -P -R -S master_slot

# Empty the data directory, then take the base backup (long-option form)
rm -rf /var/lib/postgresql/data/* && \
    pg_basebackup --host db01 \
    --username replication_user \
    --pgdata /var/lib/postgresql/data \
    --verbose \
    --progress \
    --wal-method stream \
    --write-recovery-conf \
    --slot=master_slot

If pg_basebackup doesn't start quickly, it is waiting for a checkpoint on the primary; to speed this up, we can go to the primary server and execute:

postgres=# checkpoint ;

The options used in the pg_basebackup command are:

-h: This specifies the host of the primary server that the replica connects to.
-U: This is the user created on the primary server used for replication.
-p: This is the port where the primary server listens.
-D: This is the PGDATA directory on the replica server.
-Fp: This writes the backup in plain format, maintaining the same data structure present on the primary.
-Xs: This opens a second connection to the primary server and starts the transfer of the WAL segments at the same time as the backup is performed.
-P: This shows the progress of the backup.
-S: This is the name of the replication slot created on the primary server.
-R: This creates the standby.signal file and adds the connection settings to the postgresql.auto.conf file:
postgres@pg2:~/16/main$ cat postgresql.auto.conf

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replicarole password=SuperSecret channel_binding=disable host=192.168.122.10 port=5432 sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any load_balance_hosts=disable'
primary_slot_name = 'master_slot'

Start the PostgreSQL service on the replica machine, and physical replication should work. As the root user, let's execute the following:

root@pg2:/var/lib/postgresql/16# systemctl start postgresql

Replica monitoring

https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-REPLICATION-VIEW

PostgreSQL offers a view through which we can monitor the status of replicas in real time; its name is pg_stat_replication. This view must be queried by connecting to the primary node.

\x
select * from pg_stat_replication ;
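The pg_stat_replication view reports WAL positions (for example in its sent_lsn and replay_lsn columns) as LSNs, written as two hexadecimal numbers separated by a slash. Inside the database, the pg_wal_lsn_diff() function computes the distance between two LSNs. The sketch below shows the same calculation in shell, for values copied out of the view. The helper names and the sample LSNs are made up for illustration.

```shell
# Hypothetical helper: convert an LSN such as 0/3000060 to a byte position.
lsn_to_bytes() {
    hi=${1%/*}    # hexadecimal part before the slash (upper 32 bits)
    lo=${1#*/}    # hexadecimal part after the slash (lower 32 bits)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Hypothetical helper: number of bytes by which LSN $2 is behind LSN $1.
lsn_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

lsn_lag_bytes 0/3000060 0/3000000    # prints 96
```

Passing sent_lsn as the first argument and replay_lsn as the second gives a rough idea of how many bytes of WAL the replica still has to replay.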

Synchronous replication

In asynchronous replication, the primary server does not wait for the replica server to actually replicate the data. In synchronous replication, when the primary performs a commit, it waits until the synchronous replicas confirm that they have received the change. After the commit returns, we are sure that the data is present on the primary and on all the synchronous replicas. When we want to achieve synchronous replication, it is good practice to have identical machines and a good network connection between them; otherwise, performance can suffer.

PostgreSQL settings

It is possible to change from asynchronous replication to synchronous replication.

Primary server

On the primary server, we have to check whether the synchronous_commit parameter is set to on. Now, synchronous_commit = on is the default value on a new PostgreSQL installation.

Add the synchronous_standby_names parameter, listing the names of all standby servers that will replicate the data synchronously. We can also use the '*' wildcard, which tells PostgreSQL that any standby server can act as a synchronous replica.

synchronous_standby_names = 'pg2'
synchronous_commit = on

After this, we need to restart our server:

# systemctl restart postgresql

Standby server

On the standby server, we have to add a parameter to the connection string to the primary so that the primary knows which standby the replication request comes from. We need to edit the postgresql.auto.conf file; it is currently as follows:

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replicarole password=SuperSecret channel_binding=disable host=192.168.122.10 port=5432 sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any load_balance_hosts=disable'
primary_slot_name = 'master_slot'

We need to change this to the following:

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replicarole password=SuperSecret channel_binding=disable host=192.168.122.10 port=5432 sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any load_balance_hosts=disable application_name=pg2'
primary_slot_name = 'master_slot'

We have added the application_name=pg2 option. After doing this, let’s restart the standby server.
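The application_name in primary_conninfo has to match one of the names listed in synchronous_standby_names on the primary. The following shell sketch checks this for the values shown above. Both helper names are hypothetical, and the parsing is deliberately simplified: it assumes unquoted key=value pairs and a plain comma-separated name list (no FIRST or ANY syntax).

```shell
# Hypothetical helper: print the application_name found in a conninfo string.
# Assumes space-separated key=value pairs without quoted values.
conninfo_app_name() {
    for kv in $1; do
        case $kv in
            application_name=*) echo "${kv#application_name=}" ;;
        esac
    done
}

# Hypothetical helper: succeed if standby name $1 appears in the
# comma-separated list $2, or if the list is the '*' wildcard.
is_listed_standby() {
    list=$(printf '%s' "$2" | tr -d ' ')
    [ "$list" = "*" ] && return 0
    case ",$list," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

name=$(conninfo_app_name 'host=192.168.122.10 port=5432 application_name=pg2')
is_listed_standby "$name" 'pg2' && echo "$name is a synchronous standby candidate"
```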

Status

select * from pg_stat_replication;

The output shows that the primary server and the standby server replicate synchronously: sync_state = sync.

Delayed replication

The delay occurs only on WAL records for transaction commits. Other records are replayed as quickly as possible, which is not a problem because MVCC visibility rules ensure their effects are not visible until the corresponding commit record is applied.

The delay occurs once the database in recovery has reached a consistent state, until the standby is promoted or triggered. After that the standby will end recovery without further waiting.

WAL records must be kept on the standby until they are ready to be applied. Therefore, longer delays will result in a greater accumulation of WAL files, increasing disk space requirements for the standby's pg_wal directory.

Warning: delayed replication is not suitable for synchronous replication. Synchronous replication is affected by this setting when synchronous_commit is set to remote_apply; every COMMIT will need to wait to be applied.

recovery_min_apply_delay defines how far the replica should lag behind the primary server. The value is an integer number of milliseconds, so the maximum is 2147483647 ms, about 24.8 days (not 24 hours) - https://pgpedia.info/r/recovery_min_apply_delay.html

postgresql.conf:

recovery_min_apply_delay = 5000

recovery_min_apply_delay = '2h'

Then we reload the PostgreSQL service on the replica server:

root@pg2:~# systemctl reload postgresql

Using a delay on the replica server means that WAL files are still received regularly from the primary server, but they are applied with the delay specified in the recovery_min_apply_delay parameter.
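Because recovery_min_apply_delay accepts either a bare number of milliseconds or a value with a time unit, it can help to normalize both forms before comparing them. The sketch below does this in shell. delay_to_ms is a hypothetical helper, and it only handles whole numbers followed by one of the units us, ms, s, min, h, or d.

```shell
# Hypothetical helper: convert a recovery_min_apply_delay value to milliseconds.
# A bare number is taken as milliseconds, the default unit of this setting.
delay_to_ms() {
    value=$1
    case $value in
        *us)  echo $(( ${value%us} / 1000 )) ;;
        *ms)  echo $(( ${value%ms} )) ;;
        *min) echo $(( ${value%min} * 60000 )) ;;
        *s)   echo $(( ${value%s} * 1000 )) ;;
        *h)   echo $(( ${value%h} * 3600000 )) ;;
        *d)   echo $(( ${value%d} * 86400000 )) ;;
        *)    echo "$value" ;;
    esac
}

delay_to_ms 5000    # prints 5000 (5 seconds)
delay_to_ms 2h      # prints 7200000 (2 hours)
```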

Promoting a replica server to a primary

It is possible to promote the replica node to a primary; to do this, on the replica node, as the postgres user, we execute this statement:

postgres@pg2:~$ pg_ctl promote -D /var/lib/postgresql/16/main

With recovery_min_apply_delay = 5000, the replica is 5 seconds behind the primary, because the default time unit of recovery_min_apply_delay is milliseconds.

Conflict with Recovery

When long queries run on a replica, they can be cancelled because they access row versions whose removal has been replicated from the primary. This can happen when VACUUM runs on the primary, triggered when the accumulation of dead tuples reaches a threshold. If a query on the replica is reading rows whose dead tuples VACUUM has removed, or an active transaction still references them, replaying that cleanup on the replica conflicts with the query. When that conflict happens, PostgreSQL cancels the replica query. The cancellation can cause users to experience errors, and it looks like this in the log:

ERROR:  canceling statement due to conflict with recovery

Is there a way to avoid that? One solution is to use the hot_standby_feedback parameter. Setting this parameter to on on the replica makes the primary instance aware of long-running queries on the replica. That can help avoid cancellations due to conflicts. Note that this setting applies only to replicas that use physical replication. The trade-off is that the primary postpones the cleanup of dead tuples that replica queries still need, which can lead to table bloat on the primary (see “PostgreSQL ERROR: Canceling Statement Due to Conflict with Recovery”[368] for more information).
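To judge whether these cancellations happen often enough to justify changing the configuration, we can count them in the replica's server log. A minimal sketch follows. count_recovery_conflicts is a hypothetical helper, and the log path in the comment is an assumption that depends on the installation.

```shell
# Hypothetical helper: count query cancellations caused by recovery conflicts
# in log text read from standard input.
count_recovery_conflicts() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c 'canceling statement due to conflict with recovery' || true
}

# Example usage (the log location is an assumption, adjust it to your system):
#   count_recovery_conflicts < /var/log/postgresql/postgresql-16-main.log
```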
