postgres backup pg_basebackup

When you start pg_basebackup, it waits for the server to perform a checkpoint and then starts copying the Postgres data directory over.

The data is copied while it is being modified, so the copied data files can be torn, incomplete, or inconsistent. The content of the base backup is a mixture of the data as it was at the beginning of the backup and the data as it was at the end. Since there is a transaction log entry for every change, any such inconsistency can be repaired using the PostgreSQL transaction log. In other words: corrupted base backup + WAL = consistent data.

To make sure that the base backup contains enough WAL to reach at least a consistent state, I recommend adding --wal-method=stream when you call pg_basebackup. It will open a second stream which fetches the WAL created during the backup. The advantage is that the backup now has everything it needs for a consistent restore without an external WAL archive.

pg_basebackup -D /target_dir -h master.example.com --checkpoint=fast --wal-method=stream -R
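
With --wal-method=stream the WAL collected during the backup is stored inside the backup itself, so a quick sanity check is possible right away (paths from the command above; the subdirectory is called pg_xlog rather than pg_wal on versions before 10):

# the WAL segments streamed during the backup should appear here
$ ls /target_dir/pg_wal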

basebackup

pg_basebackup makes a complete binary copy of a PostgreSQL instance. It does its job without affecting other clients of the database and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server.

pg_basebackup -h 192.168.0.191 -D /var/lib/postgresql/12/main -P -U replicator -Xs -R --checkpoint=fast 

As you can see, there are a couple of parameters:

  • -h: The master server's IP address.
  • -D: The directory to write the output to.
  • -P: Shows progress.
  • -U: The user to connect as.
  • --checkpoint=fast: Forces an immediate checkpoint so the copy can begin right away.
  • -Xs: Streams the write-ahead log while the backup is created.
  • -R: Facilitates setting up a standby server. When pg_basebackup is complete, check two files – standby.signal and postgresql.auto.conf. Both are created thanks to this parameter, as shown below.
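
What -R leaves behind can be checked right away; the paths come from the command above, and the exact contents of postgresql.auto.conf depend on your -h/-U settings:

# standby.signal is empty; its mere presence starts the server in standby mode
$ ls /var/lib/postgresql/12/main/standby.signal
# -R appends the connection settings for the primary here
$ grep primary_conninfo /var/lib/postgresql/12/main/postgresql.auto.conf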

A minimal script variant that writes a dated, tar-format backup and fetches the WAL at the end:

#!/bin/bash
set -e  # stop at the first failing command
# -Ft writes tar-format output; -X fetch collects the required WAL
# once the data copy has finished
pg_basebackup -D /backup/$(date +%Y%m%d) -Ft -X fetch
 
  • -X includes the required WAL in the backup.
  • "fetch" indicates that the WAL is collected at the end of the backup (as opposed to "stream", which collects it concurrently over a second connection).

To make this work, you need to have replication enabled in pg_hba.conf. For local connections, a line like this is enough:

local replication postgres peer
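
A peer line like that only covers local connections. For a remote backup such as the examples above, you also need a host entry in pg_hba.conf; the user, network range, and auth method below are illustrative:

host    replication    replicator    192.168.0.0/24    scram-sha-256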

And the following parameters in the postgresql.conf file:

wal_level = replica    # named "hot_standby" before PostgreSQL 9.6
max_wal_senders = 10
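
To confirm the values the running server actually uses, a quick check (any role that can query pg_settings will do):

$ psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level','max_wal_senders');"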

pg_basebackup can write your backups in two different formats:

  • plain
  • tar

Plain works well if you have one tablespace. With multiple tablespaces, by default it has to write every tablespace back to the location it already occupies, which obviously only works if you are running the backup across the network to another machine. Recent versions let you remap tablespaces to different locations, but the plain format rapidly becomes complicated to use with multiple tablespaces. The other format is tar: the destination is still a directory, but pg_basebackup writes tar files into it. The root tablespace goes into a file called base.tar (base.tar.gz if you enable compression), and there is one additional tar file per tablespace. In many ways that makes tar easier to deal with when you have more than one tablespace.
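
The remapping mentioned above is done with the --tablespace-mapping option (available since PostgreSQL 9.4); both paths here are illustrative:

$ pg_basebackup -D /restore/main -Fp -X stream \
      --tablespace-mapping=/data/ts1=/restore/ts1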

Always use -X (on PostgreSQL 9.6 and earlier you can also use -x, which is equivalent to -X fetch; the short option was removed in version 10) to include WAL files in the backup. You can use the following command:

$ pg_basebackup -X fetch -D /path/to/backupdir

pg_basebackup also supports compressed backups, using standard gzip compression; use the -Z option for that. One thing to remember is that with -Z the compression is done by pg_basebackup, not by PostgreSQL. So if you run pg_basebackup across the network, the data is sent uncompressed from PostgreSQL to pg_basebackup and only compressed as it is written to disk. Note: compression is only supported for the tar format, and it is CPU-bound.
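
A compressed tar-format backup could therefore look like this; the level 5 is just an example (-z picks the default gzip level):

$ pg_basebackup -D /backup/base -Ft -Z 5 -X stream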

You need all the transaction log (WAL) generated on your system from the beginning of the backup to its end. PostgreSQL basically takes an inconsistent copy of your data directory and then uses the transaction log to make it consistent. So if you do not have the transaction log, it is not a consistent backup, which means you don't have a valid backup.
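
The WAL range a backup needs is recorded in the backup_label file that pg_basebackup places in the backup, which makes for a quick sanity check (path from the earlier example):

# shows START WAL LOCATION and CHECKPOINT LOCATION, i.e. where replay must begin
$ cat /target_dir/backup_label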

BACKUP THROTTLING

By default, pg_basebackup operates at full speed. This can lead to trouble, because it may saturate your network connection or your disks. The fix is to throttle pg_basebackup itself:

 -r, --max-rate=RATE    maximum transfer rate to transfer data directory
                        (in kB/s, or use suffix 'k' or 'M')
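
For example, to cap the transfer at roughly 100 MB per second (the value is purely illustrative):

$ pg_basebackup -D /target_dir -h master.example.com --max-rate=100M -X stream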