PostgreSQL plugin
Presentation
The PostgreSQL plugin is designed to simplify the backup and restore procedures for PostgreSQL clusters, so that the backup administrator does not need to know the internals of PostgreSQL backup techniques or write complex scripts. The plugin automatically takes care of backing up essential information such as configuration, user definitions, and tablespaces. The PostgreSQL plugin supports both Dump and Point In Time Recovery (PITR) backup techniques.
This plugin is available for 32 and 64-bit Linux platforms, and supports all officially supported PostgreSQL versions since version 8.4.
The PostgreSQL plugin is compatible with Copy/Migration jobs. Please read the Migration chapter for more information.
Choosing Between PITR and Dump
The following table helps choose between the backup techniques supported by the Bacula Enterprise PostgreSQL Plugin. Major functionalities, such as being able to restore databases to any point in time or being able to filter objects during backup or restore, should guide the backup design. It is quite common to combine the Dump and PITR techniques for the same Cluster. In the table, the Custom format corresponds to the Custom Dump format of pg_dump, and the Dump format of our plugin corresponds to the plain format of pg_dump.
Please note that regardless of the backup method used, no local disk space is needed for temporary files.
|  | Custom (1) | Dump | PITR |
| --- | --- | --- | --- |
| Can restore directly a single object (table, schema, …) | Yes | No | No |
| Backup speed | Slow | Slow | Fast |
| Restore speed | Slow | Very Slow | Fast |
| Backup size | Small | Small | Big |
| Can restore at any point in time | No | No | Yes |
| Incremental/Differential support | No | No | Yes |
| Can restore in parallel (2) | Yes | No | – |
| Online backup | Yes | Yes | Yes |
| Consistent | Yes | Yes | Yes |
| Can restore to previous major version of PostgreSQL (3) | No | Yes | No |
| Can restore to newer major version of PostgreSQL | Yes | Yes | No |
(1) Custom Dump format is the default.
(2) The most time-consuming parts of pg_restore (those which load data, create indexes, or create constraints) run as multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine, but it requires storing the Dump on disk first.
(3) To restore a plain SQL Dump to a previous version of PostgreSQL, you might have to edit the SQL file if you are using features that are not available in the previous version. Generally, restoring to a previous version of PostgreSQL is not supported or not guaranteed.
Configuration
As with all Bacula plugins, the Plugin Directory directive in the
FileDaemon resource of the bacula-fd.conf
file needs to be set:
FileDaemon {
Name = test-fd
...
Plugin Directory = /opt/bacula/plugins
}
Automatic Object Integration
Since Bacula version 16.0.7, a new solution, Automatic Object Integration (Scan Plugin), allows each object to be backed up separately with different Jobs to maximize throughput and resiliency. It is highly recommended to use this new solution for that purpose. See an example for MySQL.
PITR Configuration
PostgreSQL Configuration for PITR
On 8.3 and 8.4, only the archive_mode and archive_command settings need to be configured.
# on 8.3 - 8.4
archive_mode = on
archive_command = 'test ! -f /mnt/waldir/%f && cp %p /mnt/waldir/%f'
For 9.x, the wal_level, archive_mode, and archive_command settings need to be configured:
# on 9.0 - 9.1
wal_level = archive
archive_mode = on
archive_command = 'test ! -f /mnt/waldir/%f && cp %p /mnt/waldir/%f'
The /mnt/waldir directory should be purged from time to time once backup jobs have completed successfully. We do not recommend removing WAL files immediately after a backup job: if something goes wrong with the backup job and the files are no longer on the database server, it will not be possible to easily restart the backup job.
# This example will remove WAL files older than 14 days
find /mnt/waldir -type f -mtime +14 -exec rm -f {} \;
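The retention behavior of this find expression can be rehearsed on a scratch directory before pointing it at the real archive; the directory and WAL segment names below are made up for the demonstration:

```shell
# Rehearse the 14-day retention command in a throwaway directory.
waldir=$(mktemp -d)
touch -d '20 days ago' "$waldir/000000010000000000000001"  # older than 14 days
touch "$waldir/000000010000000000000002"                   # recent segment
find "$waldir" -type f -mtime +14 -exec rm -f {} \;
ls "$waldir"   # lists only 000000010000000000000002
```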
As shown below, the PostgreSQL plugin needs to know where WAL files are stored after archiving. In this example, archive_dir would point to /mnt/waldir.
Note that it may be useful to compress WAL files in the archive_command using something like:
archive_command = 'test ! -f /mnt/waldir/%f.gz && gzip -c %p > /mnt/waldir/%f.gz'
In this case, the restore_command in recovery.conf will need to be modified during the restore, as the PostgreSQL Plugin cannot reverse a custom archive_command automatically. It is good practice to record the needed restore_command as a comment in postgresql.conf.
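As a sanity check, the gzip-based archive_command and a matching restore_command can be exercised outside PostgreSQL; in this sketch, throwaway directories stand in for the real WAL and archive locations:

```shell
# Dry-run the gzip archive_command and its inverse on a dummy WAL file.
waldir=$(mktemp -d)   # plays the role of /mnt/waldir
pgwal=$(mktemp -d)    # plays the role of the cluster WAL directory
f=000000010000000000000001
echo "wal-payload" > "$pgwal/$f"

# archive side (%p = source path, %f = file name):
test ! -f "$waldir/$f.gz" && gzip -c "$pgwal/$f" > "$waldir/$f.gz"

# restore side, i.e. the restore_command to note in postgresql.conf:
gunzip -c "$waldir/$f.gz" > "$pgwal/restored"
cmp -s "$pgwal/$f" "$pgwal/restored" && echo "roundtrip ok"
```

The corresponding restore_command would therefore be something like `gunzip -c /mnt/waldir/%f.gz > %p`.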
PostgreSQL Plugin Configuration for PITR
With the PITR option, the PostgreSQL plugin uses Accurate mode information to handle Differential backups, thus the Accurate option in the backup Job resource needs to be enabled. If Accurate mode is not used in Differential level backups, orphan data files may occur in the Cluster directory when restoring. Note that using Accurate mode is not mandatory if only Full and Incremental backups are planned.
Job {
Name = "Postgresql-PITR"
Client = laptop1-fd
FileSet = FS_postgresql
Accurate = yes
...
}
FileSet {
Name = FS_postgresql
Include {
Options {
Signature = MD5
Compression = GZIP
}
Plugin = "postgresql: mode=pitr"
}
}
In PITR mode, the PostgreSQL plugin also accepts the parameters listed in the table below:
| Option | Comment | Default | Example |
| --- | --- | --- | --- |
| mode=pitr | Needed to enable PITR backup | custom |  |
| abort_on_error | Abort the job after a connection error with PostgreSQL (available with Bacula Enterprise 8.2.0 and later). | not set | abort_on_error |
| archive_dir | Should point to where you are archiving WAL with the archive_command | pg_wal |  |
| bin_dir | PostgreSQL binaries location |  | bin_dir=/opt/pg9.1/bin |
| fast_backup | Use fast backup option in pg_backup_start() procedure. It will create a peak of I/O if used. | false | fast_backup |
| pgpass | Path to PostgreSQL password file |  | pgpass=/etc/pgpass |
| user | PostgreSQL super user. A standard user cannot be used here. | postgres | user=dba |
| service | PostgreSQL connection information |  | service=main |
| timeout | Specify a custom timeout (in secs) for commands sent to PostgreSQL. | 300 | timeout=600 |
| tmp_dir | Where the plugin will create files and scripts for the database backup (available with Bacula Enterprise 6.6.6 and later). |  | tmp_dir=/othertmp |
| use_sudo | Use sudo to execute PostgreSQL commands. | not set | use_sudo |
| unix_user | Use a specific unix user to execute PostgreSQL commands. The unix user owning the cluster is required. | postgres | unix_user=bob |
If PostgreSQL clusters are planned to be restored using the file relocation feature of Bacula, it is useful to use the PrefixLinks directive in the prepared Restore Job.
For example, if symbolic links are used in the PGDATA cluster directory, as for pg_wal (1) or tablespaces, the default Restore Job will re-create those symbolic links pointing to their original locations rather than the restored data directory, which will break PostgreSQL's recovery process.
On the other hand, if restored files are planned to be moved to their original locations later, outside of Bacula's restore process, all files linked with absolute names will be broken.
Backup Level in PITR
When using PITR mode, depending on the Job level, the PostgreSQL Plugin will do the following:
For a Full backup, the Plugin will backup the data directory and all WAL files generated during the backup.
During an Incremental backup, the Plugin will force the switch of the current WAL, and will backup WAL files generated since the previous backup.
During a Differential backup, the Plugin will backup data files that changed since the latest Full backup, and it will backup WAL files generated during the backup.
Note that it is not possible to run two concurrent Full or Differential jobs at the same time.
Note that replaying a long list of WAL files may take considerable time on a large system with a lot of activity.
Schedule Consideration for PITR
In order to be able to restore to any point in time between the latest Incremental and a previous Full or Differential backup (see the (1) area in figure Backup Level Impact in PITR Mode), it is good practice to schedule an Incremental at the same time as Full or Differential backups. The Maximum Concurrent Jobs setting on the Client and the Job resource should allow two jobs to run concurrently.
Schedule {
Name = SCH_PostgreSQL
Run = Full 1st sun at 23:05
Run = Differential 2nd-5th sun at 23:05
Run = Incremental mon-sun at 23:05
}
Without this schedule configuration, it will not be possible to restore
to a specific point in time during the (1) period. Note that disabling
Allow Duplicate Jobs prevents starting two Full backup Jobs at the
same time.
Traditional Dump Configuration
With the Dump option, Bacula cannot perform Incremental or Differential backups.
Job {
Name = "Postgresql-dump"
Client = laptop1-fd
FileSet = FS_postgresql_dump
...
}
FileSet {
Name = FS_postgresql_dump
Include {
Options {
Signature = MD5
Compression = GZIP
}
Plugin = postgresql
}
}
With the above example, the plugin will detect and backup all databases of the Cluster.
FileSet {
Name = FS_postgresql
Include {
Options {
Signature = MD5
Compression = GZIP
}
Plugin = "postgresql: database=bacula"
Plugin = "postgresql: database=master"
}
}
In this example, the plugin will back up only the databases bacula and master.
In Dump mode, the PostgreSQL plugin also accepts the parameters listed in the table below:
| Option | Comment | Default | Example |
| --- | --- | --- | --- |
| dump_opt (1) | This string will be passed to the pg_dump command | -c -b -F p | dump_opt="-c" |
| user | PostgreSQL user to use for PostgreSQL commands | postgres | user=rob |
| unix_user | Unix user to use for PostgreSQL commands | set to user | unix_user=pg1 |
| service | pg_service to use for PostgreSQL commands |  | service=main |
| pgpass | Path to PostgreSQL password file |  | pgpass=/etc/pgpass |
| use_sudo | Use sudo to run PostgreSQL commands (when not root) |  | use_sudo |
| compress | pg_dump compression level, 0-9; 0 is off | 0 | compress=5 |
| database | Will back up only databases matching this string |  | database=prod* |
| exclude | Will exclude databases matching this string |  | exclude=test* |
| bin_dir | PostgreSQL binaries location |  | bin_dir=/opt/pg9.1/bin |
| tmp_dir | Where the plugin will create files and scripts for the database backup. | /tmp | tmp_dir=/othertmp |
| abort_on_error | Abort the job after a connection error with PostgreSQL. | not set | abort_on_error |
| timeout | Specify a custom timeout (in secs) for commands sent to PostgreSQL. | 60 seconds | timeout=120 |
(1) The dump_opt option cannot be used to back up remote servers. Please use PGSERVICE instead.
(2) Available with Bacula Enterprise 8.4.12 and later.
(3) Available with Bacula Enterprise 6.6.6 and later.
(4) Available with Bacula Enterprise 8.2.0 and later.
(5) Available with Bacula Enterprise 14.0.5 and later.
Note
The database and exclude options support regex.
FileSet {
Name = FS_postgresql_dump
Include {
Options {
Signature = MD5
}
Plugin = "postgresql: use_sudo user=rob dump_opt=\"-T temp\""
}
}
In this example, the PostgreSQL plugin will use the Unix account “rob” to perform a Custom Dump backup with the PostgreSQL “rob” account excluding tables named “temp”.
In order to use the sudo wrapper, the following option in /etc/sudoers needs to be commented out:
Defaults requiretty
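A minimal sudoers fragment for this setup could look like the following sketch; the file name and the "bacula" account name are assumptions to adapt to the local site policy:

```
# /etc/sudoers.d/bacula-postgresql (hypothetical file name)
# Allow the Bacula File Daemon user to run commands as "postgres"
# without a password; "Defaults requiretty" must not apply to it.
bacula ALL=(postgres) NOPASSWD: ALL
```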
Service Connection Information
The connection service file allows PostgreSQL connection parameters to be associated with a single service name. That service name can then be specified by a PostgreSQL connection, and the associated settings will be used.
The connection service file can be a per-user service file at
~/.pg_service.conf, can have its location specified by the
environment variable PGSERVICEFILE, or it can be a system-wide file
at /etc/pg_service.conf or in the directory specified by the
environment variable PGSYSCONFDIR. If service definitions with the
same name exist in both the user and the system file, the user file
takes precedence.
The file uses an INI file format where the section name represents the service name and the parameters are connection parameters; see http://www.postgresql.org/docs/9.1/static/libpq-connect.html for a list. For example:
# comment
[main]
port=5433
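A slightly fuller service definition might look like the following; the host, user, and dbname values are illustrative:

```
# ~/.pg_service.conf
[main]
host=localhost
port=5433
user=postgres
dbname=postgres
```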
If a password is needed to connect, the pgpass
option in the plugin
command string can be used to make the plugin define the needed
PGPASSFILE environment variable.
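The password file uses one hostname:port:database:username:password entry per line, and libpq ignores it unless its permissions are 0600. The expected layout can be checked with a scratch file (all values below are placeholders):

```shell
# Create a scratch password file in the libpq pgpass format and give
# it the restrictive 0600 permissions the library insists on.
pgpass=$(mktemp)
cat > "$pgpass" <<'EOF'
# hostname:port:database:username:password
localhost:5432:*:postgres:secret
EOF
chmod 600 "$pgpass"
stat -c '%a' "$pgpass"   # prints 600
```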
Testing Database Access
The estimate
command can be used to verify that the PostgreSQL
plugin is well configured.
* estimate listing job=pg-test
...
If estimate
or the job output displays the following error:
Error: Can't reach PostgreSQL server to get database config.
the next steps should be to verify that the Bacula Enterprise PostgreSQL
plugin can retrieve information using the psql
command as “postgres”
user on the Client.
To verify if the “postgres” user can connect to the PostgreSQL Cluster,
the psql -l
command can be used, and it should list all databases in
the Cluster:
postgres% psql -l
List of databases
Name | Owner |
-----------+----------+
postgres | xxx
template0 | xxx
template1 | xxx
If options such as -h localhost are needed on the psql command
line, a service file as described in Service Connection Information
will be required.
Installation
The PostgreSQL plugin is available as a Bacula Enterprise package for all supported platforms.
The plugin must be installed on the primary node in the PostgreSQL
Cluster. The PostgreSQL client software package, usually
“postgresql-client”, should also be installed, as it provides tools such
as pg_dump
and psql
which are used by the plugin, or may be
useful for diagnostic purposes.
In order to get information and / or data from the PostgreSQL database,
the Bacula Enterprise Plugin uses PostgreSQL standard commands using the
unix “postgres” user. This user account needs to be able to connect to
the Cluster without interactive password dialog. Such a configuration
(which is the default one) can be achieved using the following entry in
pg_hba.conf
. The following entry should be the first one in the
list:
local all postgres ident
If remote databases need to be accessed, or disabling the interactive
password authentication method is not an option, it is possible to
define a pgpass
file.
Estimate Information
The estimate command will display all information found by the
PostgreSQL plugin. Note that in Dump mode, Bacula cannot compute the
Dump size for databases, so it will display the database size instead.
Backup Information in Dump Mode
The PostgreSQL plugin will generate the following files for a Cluster containing the single database “test”:
@PG/main/roles.sql
@PG/main/postgresql.conf
@PG/main/pg_hba.conf
@PG/main/pg_ident.conf
@PG/main/tablespaces.sql
@PG/main/test/createdb.sql
@PG/main/test/schema.sql
@PG/main/test/data.sqlc
| File | Context | Comment |
| --- | --- | --- |
| roles.sql | global | List of all users, their passwords and specific options |
| postgresql.conf | global | PostgreSQL cluster configuration |
| pg_hba.conf | global | Client connection configuration |
| pg_ident.conf | global | Client connection configuration |
| tablespaces.sql | global | Tablespaces configuration for the PostgreSQL cluster |
| createdb.sql | database | Database creation script |
| schema.sql | database | Database schema creation script |
| data.sqlc | database | Database data in custom format, contains everything needed to restore |
| data.sql | database | Database data in plain dump format |
Restore Scenarios
Restoring Using Dumps
Restoring Users and Roles
To restore roles and users to a PostgreSQL Cluster, the roles.sql
file located in /@PG/<service>/roles.sql
needs to be selected.
Then, using where=/ or an empty where, the plugin will load this SQL file
into the database. If some roles already exist, errors will be printed to
the Job log. Note that it is possible to restore the roles.sql
file
to a local directory, edit it, and load it using psql
to restore
only a selection of its original contents.
Restoring the Database Structure
To restore only the database structure using the Bacula Enterprise
Postgresql plugin, the file createdb.sql
located in the database
directory needs to be selected during the restore process. To recreate
the SQL database schema, the schema.sql
file is used which contains
all commands needed to recreate the database schema. The schema.sql
file must be restored to disk and loaded manually into the database
using the psql
command.
Restoring a Single Database
To restore a single database with the Bacula Enterprise Postgresql plugin, the appropriate files from the database directory are selected during the restore process.
To restore the database with its original name, the selection should
only contain the data file (data.sqlc
or data.sql
). If the
createdb.sql
file is also selected, harmless messages might be
printed during the restore.
To restore a single database to a new name, the two files
createdb.sql
and data.sqlc
(or data.sql
) must be selected.
The where parameter is used to specify the new database name. If
where is set to a single word consisting only of a-z, 0-9
and _, Bacula will create the specified database and restore the
data into it.
* restore where=baculaold
...
cwd is: /
$ cd /@PG/main/bacula
cwd is: /@PG/main/bacula/
$ m data.sqlc
$ m createdb.sql
$ ls
schema.sql
*data.sqlc
*createdb.sql
If the restore process reports an error such as
ERROR: database "xxx" already exists, the createdb.sql file can be
skipped in the restore selection.
If the replace parameter is set to never, Bacula will check the database list, and will abort the Job if the database currently restored already exists.
Using replace=always is not recommended.
If the where parameter is a directory (containing /
), Bacula
will restore all files into this directory. Doing so, it is possible to
use pg_restore
directly and restore only particular contents, such
as triggers, tables, indexes, etc…
Note that some databases, such as template1, postgres, or
databases with active users, cannot be replaced.
Restoring Dump Files to a Directory
To restore SQL dumps to a directory, the where parameter needs to be set to indicate an existing directory.
* restore where=/tmp
Restoring a Single Table
To restore a single item such as a table, it is currently needed to
restore the dump file to a directory and use the pg_restore
command.
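For example, after restoring the custom-format dump to a directory with where=/tmp, a single table could then be extracted with pg_restore. The database name, table name, and restored path below are illustrative:

```
# Restore only the hypothetical table "accounts" from the dumped
# custom-format file into the existing database "test".
pg_restore -U postgres -d test -t accounts /tmp/@PG/main/test/data.sqlc
```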
Restoring the Complete Cluster Using PITR
Useful information for this disaster recovery scenario can be found in the PostgreSQL manual, for example at http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
The overall process is as follows:

1. Stop the server, if it is running.
2. If the space to do so is available, copy the whole Cluster data directory and any tablespaces to a temporary location in case they are needed later. Note that this precaution requires enough free space to hold two copies of the existing databases. If enough space is not available, at least the contents of the pg_wal subdirectory of the Cluster data directory should be copied, as it may contain WAL files which were not archived before the system went down.
3. Clean out all existing files and subdirectories below the Cluster data directory and the root directories of any tablespaces being used.
4. Restore the database files from the backups. If tablespaces are used, it is strongly recommended to verify that the symbolic links in pg_tblspc/ were correctly restored. The PrefixLinks restore Job option can be useful here.
5. Remove any files present in pg_wal; these came from the backup and are therefore probably obsolete rather than current. Normally, this directory should be empty after a restore.
6. If there are unarchived WAL segment files that were saved in step 2, copy them back into pg_wal/. (It is best to copy them, not move them, so that the unmodified files are still available if a problem occurs and the process needs to be done again.)
7. Edit the recovery command file recovery.conf.sample inside the Cluster data directory as needed and rename it to recovery.conf. It may be useful to temporarily modify pg_hba.conf to prevent ordinary users from connecting until the recovery has been verified.
8. Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode in case of a later crash) and then commence normal database operations.

   # su postgres
   $ cd /path/to/the/data/directory
   $ mv recovery.conf.sample recovery.conf
   $ vi recovery.conf
   $ pg_ctl -D $PWD start

9. Verify the contents of the restored databases to ensure the Cluster was recovered to the desired state. If not, return to step 1.
10. If all is well, users can be allowed to connect by restoring pg_hba.conf to its normal contents.
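When the WAL archive was written by the uncompressed archive_command shown earlier, a minimal recovery.conf could look like the following sketch; the target time is an illustrative value, and recovery_target_time can be omitted to recover to the end of the archived WAL:

```
# recovery.conf (PostgreSQL 11 and earlier)
restore_command = 'cp /mnt/waldir/%f %p'
# optional: stop WAL replay at a given point in time
recovery_target_time = '2023-06-01 12:00:00'
```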
Warning
About tablespaces and symlinks:
When applying WAL, PostgreSQL needs to create the tablespace directories to re-create the tablespaces, and it does not support relocating them. So, when replaying logs, it will either overwrite files at the original locations or fail on this operation.
Deduplication Tips
PostgreSQL's backup routines were not implemented with data
deduplication in mind, therefore the results may not be ideal. However,
the CLUSTER command may be used at times to enhance the
deduplication ratio, as it physically reorders the data according to the
index information. Please note that this command requires an
exclusive lock while it is running, and may also be quite heavy on CPU and
I/O resources.
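As an illustration, a table can be reordered along one of its indexes from the shell; the database, table, and index names below are made up:

```
# Physically rewrite "mytable" in the order of the index "mytable_pkey".
# Takes an exclusive lock on the table for the duration.
psql -U postgres -d mydb -c 'CLUSTER mytable USING mytable_pkey;'
```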
Limitations
- In PITR mode, each Job should contain a single Cluster. To back up multiple clusters installed on the same client, multiple jobs are needed.
- In Dump mode, PostgreSQL version 8.3 and earlier may not support automatic restore through the Bacula Enterprise PostgreSQL plugin. The pg_restore program can produce the following error: WARNING: invalid creation date in header. In this case, it is necessary to restore the data to a directory and run pg_restore in a terminal.
- The clean option requires PostgreSQL 9.1 or later.
The backup will include all files in the data directory and tablespaces, including the configuration files and any additional files placed in the directory by third parties, except certain temporary files managed by PostgreSQL. But only regular files and directories are copied, except that symbolic links used for tablespaces are preserved. Symbolic links pointing to certain directories known to PostgreSQL are copied as empty directories. Other symbolic links and special device files are skipped. See: https://www.postgresql.org/docs/current/protocol-replication.html for the precise details.
Copyright Notice
PostgreSQL is Copyright © 1996-2011 by the PostgreSQL Global Development Group and is distributed under the terms of the license of the University of California below.
Postgres95 is Copyright © 1994-5 by the Regents of the University of California.
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN “AS-IS” BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
(1) In old versions of PostgreSQL (9.x and earlier), pg_wal was named pg_xlog; it was renamed in version 10.