Restoring Complete Cluster Using PITR

Useful information for this disaster recovery scenario can be found in the PostgreSQL manual, for example at: http://www.postgresql.org/docs/9.1/static/continuous-archiving.html

The overall process is as follows:

Stop the server, if it’s running.

If the space to do so is available, the whole Cluster data directory and any tablespaces should be copied to a temporary location in case they are needed later.

It is very important to use the “-a” parameter to copy the files, so that the files retain their ownership

root@dc-u24Postgres16db-test:/var/lib/postgresql/16/main# cp -a pg_wal/* /tmp/pg_wal.backup/
root@dc-u24Postgres16db-test:/var/lib/postgresql/16/main# ls -lhtr /tmp/pg_wal.backup/
total 97M
-rw------- 1 postgres postgres  16M Jul  2 17:18 000000010000000000000001
-rw------- 1 postgres postgres  16M Jul  2 17:18 000000010000000000000002
-rw------- 1 postgres postgres  349 Jul  2 17:18 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres  16M Jul  2 17:26 000000010000000000000003
-rw------- 1 postgres postgres  16M Jul  2 17:26 000000010000000000000004
-rw------- 1 postgres postgres  370 Jul  2 17:26 000000010000000000000004.00000028.backup
-rw------- 1 postgres postgres  16M Jul  2 17:36 000000010000000000000005
drwx------ 2 postgres postgres 4.0K Jul  2 17:36 archive_status
-rw------- 1 postgres postgres  16M Jul  2 17:36 000000010000000000000006
root@dc-u24Postgres16db-test:/var/lib/postgresql/16/main#

Note

This precaution will require having enough free space to hold two copies of the existing databases. If enough space is not available, at least the contents of the pg_wal subdirectory of the Cluster data directory should be copied, as it may contain logs which were not archived before the system went down.

Clean out all existing files and subdirectories below the Cluster data directory and the root directories of any tablespaces being used.
Restore the database files from the backups. If tablespaces are used, it is strongly recommended to verify that the symbolic links in pg_tblspc/ were correctly restored. The PrefixLinks restore Job option can be useful here.
Any files present in pg_wal can be removed; these came from the backup and are therefore probably obsolete rather than current. Normally, this directory should be empty after a restore.
If there are unarchived WAL segment files that were saved in the step 2, they need to be copied back into pg_wal/ (it is best to copy, not move them, so that the unmodified ones are available if a problem occurs and the process needs to be done again).
The recovery command file recovery.conf.sample inside the Cluster data directory may need to be edited and renamed to postgresql.recovery.conf. It may be useful to temporarily modify pg_hba.conf to prevent ordinary users from connecting until the recovery has been verified.
Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode in case of a later crash) and then commence normal database operations.
```
# su postgres
$ cd /path/to/the/data/directory
$ mv recovery.conf.sample recovery.conf
$ vi recovery.conf
$ pg_ctl -D $PWD start
```
The contents of the databases restored should be verified to ensure it was recovered to the desired state. If not, return to step 1.
If all is well, users can be allowed to connect by restoring pg_hba.conf to its normal contents.

Warning

About tablespaces and symlinks:

When applying logs, PostgreSQL needs to create the tablespace directory to re-create the tablespace, and PostgreSQL doesn’t support the relocation. So, when replaying logs, it will overwrite or fail on this operation.