postgres backup pg_basebackup

When you start pg_basebackup, it waits for the server to perform a checkpoint and then starts copying the Postgres data directory over.

The data is copied while it is being modified, so the copied data files can be torn, incomplete, or inconsistent. The content of the base backup is a mixture of the data as it was at the beginning of the backup and the data as it was at the end. Since there is a transaction log entry for every change, any such inconsistency can be repaired using the PostgreSQL transaction log. In other words: corrupted base backup + WAL = consistent data.

To make sure that the base backup contains enough WAL to reach at least a consistent state, I recommend adding --wal-method=stream when you call pg_basebackup. It will open a second stream which fetches the WAL created during the backup. The advantage is that the backup now has everything it needs for a consistent restore without an external WAL archive.

pg_basebackup -D /target_dir -h master.example.com --checkpoint=fast --wal-method=stream -R
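
With --wal-method=stream the WAL collected during the backup is stored inside the backup itself, so a quick sanity check is possible right away (paths from the command above; the subdirectory is called pg_xlog rather than pg_wal on versions before 10):

# the WAL segments streamed during the backup should appear here
$ ls /target_dir/pg_wal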

basebackup

pg_basebackup makes a complete binary copy of a PostgreSQL instance. It does its job without affecting other clients of the database and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server.

pg_basebackup -h 192.168.0.191 -D /var/lib/postgresql/12/main -P -U replicator -Xs -R --checkpoint=fast 

As you can see, there are a couple of parameters:

  • -h: The master server's IP address.
  • -D: The directory to write the output to.
  • -P: Shows progress.
  • -U: The user to connect as.
  • --checkpoint=fast: Forces an immediate checkpoint so the copy can begin right away.
  • -Xs: Streams the write-ahead log while the backup is created.
  • -R: Facilitates setting up a standby server. When pg_basebackup is complete, check two files – standby.signal and postgresql.auto.conf. Both are created thanks to this parameter, as shown below.
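
What -R leaves behind can be checked right away; the paths come from the command above, and the exact contents of postgresql.auto.conf depend on your -h/-U settings:

# standby.signal is empty; its mere presence starts the server in standby mode
$ ls /var/lib/postgresql/12/main/standby.signal
# -R appends the connection settings for the primary here
$ grep primary_conninfo /var/lib/postgresql/12/main/postgresql.auto.conf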

A minimal script variant that writes a dated, tar-format backup and fetches the WAL at the end:

#!/bin/bash
set -e  # stop at the first failing command
# -Ft writes tar-format output; -X fetch collects the required WAL
# once the data copy has finished
pg_basebackup -D /backup/$(date +%Y%m%d) -Ft -X fetch
 
  • -X includes the required WAL in the backup.
  • "fetch" indicates that the WAL is collected at the end of the backup (as opposed to "stream", which collects it concurrently over a second connection).

To make this work, you need to have replication enabled in pg_hba.conf. For local connections, a line like this is enough:

local replication postgres peer
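
A peer line like that only covers local connections. For a remote backup such as the examples above, you also need a host entry in pg_hba.conf; the user, network range, and auth method below are illustrative:

host    replication    replicator    192.168.0.0/24    scram-sha-256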

And the following parameters in the postgresql.conf file:

wal_level = replica    # named "hot_standby" before PostgreSQL 9.6
max_wal_senders = 10
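
To confirm the values the running server actually uses, a quick check (any role that can query pg_settings will do):

$ psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level','max_wal_senders');"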

pg_basebackup can write your backups in two different formats:

  • plain
  • tar

Plain works well if you have one tablespace. With multiple tablespaces, by default it has to write every tablespace back to the location it already occupies, which obviously only works if you are running the backup across the network to another machine. Recent versions let you remap tablespaces to different locations, but the plain format rapidly becomes complicated to use with multiple tablespaces. The other format is tar: the destination is still a directory, but pg_basebackup writes tar files into it. The root tablespace goes into a file called base.tar (base.tar.gz if you enable compression), and there is one additional tar file per tablespace. In many ways that makes tar easier to deal with when you have more than one tablespace.
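
The remapping mentioned above is done with the --tablespace-mapping option (available since PostgreSQL 9.4); both paths here are illustrative:

$ pg_basebackup -D /restore/main -Fp -X stream \
      --tablespace-mapping=/data/ts1=/restore/ts1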

Always use -X (on PostgreSQL 9.6 and earlier you can also use -x, which is equivalent to -X fetch; the short option was removed in version 10) to include WAL files in the backup. You can use the following command:

$ pg_basebackup -X fetch -D /path/to/backupdir

pg_basebackup also supports compressed backups, using standard gzip compression; use the -Z option for that. One thing to remember is that with -Z the compression is done by pg_basebackup, not by PostgreSQL. So if you run pg_basebackup across the network, the data is sent uncompressed from PostgreSQL to pg_basebackup and only compressed as it is written to disk. Note: compression is only supported for the tar format, and it is CPU-bound.
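
A compressed tar-format backup could therefore look like this; the level 5 is just an example (-z picks the default gzip level):

$ pg_basebackup -D /backup/base -Ft -Z 5 -X stream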

You need all the transaction log (WAL) generated on your system from the beginning of the backup to its end. PostgreSQL basically takes an inconsistent copy of your data directory and then uses the transaction log to make it consistent. So if you do not have the transaction log, it is not a consistent backup, which means you don't have a valid backup.
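
The WAL range a backup needs is recorded in the backup_label file that pg_basebackup places in the backup, which makes for a quick sanity check (path from the earlier example):

# shows START WAL LOCATION and CHECKPOINT LOCATION, i.e. where replay must begin
$ cat /target_dir/backup_label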

BACKUP THROTTLING

By default, pg_basebackup operates at full speed. This can lead to trouble, because it may saturate your network connection or your disks. The fix is to throttle pg_basebackup itself:

 -r, --max-rate=RATE    maximum transfer rate to transfer data directory
                        (in kB/s, or use suffix 'k' or 'M')
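
For example, to cap the transfer at roughly 100 MB per second (the value is purely illustrative):

$ pg_basebackup -D /target_dir -h master.example.com --max-rate=100M -X stream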