PostgreSQL Catalog Administration

Catalog Backup

The default backup script performs an SQL dump. You can replace it with a more advanced backup method supported by PostgreSQL; see https://www.postgresql.org/docs/current/continuous-archiving.html for suggestions. Make sure to test a full Catalog restore from time to time.
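
For reference, a minimal manual SQL dump can look like the following sketch; the database name, user, and output path are assumptions matching a default Bacula installation.

% pg_dump -U bacula bacula > /opt/bacula/working/bacula.sql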

The pg_dump tool can run concurrently with backup jobs, but we advise starting the Catalog Backup job when no other jobs are running, for two reasons. First, running jobs have a specific status set in the Catalog; when the Catalog is restored, the Director checks that status and marks those jobs as failed. Second, jobs that run while the Catalog is being backed up continue writing to the volumes, so the Catalog dump will not contain all the information the backup volumes hold. The bscan tool can be used to recreate Catalog records from backup volumes.
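
As a sketch, recreating Catalog records from a single volume might look like this; the configuration path, volume name, and storage device name are placeholders for your own setup (-s stores the records in the Catalog, -m updates media information, -v is verbose).

% bscan -v -s -m -c /opt/bacula/etc/bacula-sd.conf -V Volume001 FileStorage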

Cleaning up Catalog Database

There are two tables in the Catalog database that occupy most of the space. The File table stores a record for every file that was backed up, and the Path table stores a record for every directory path. The Path table is shared between all clients and jobs, while the File table holds the files backed up by each backup job.
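
To check how much space these tables actually occupy, you can use a standard size query; the lowercase table names assume Bacula's default schema.

SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
  FROM pg_class
 WHERE relname IN ('file', 'path');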

When files are stored or read, the Path table is loaded into memory; if it grows too large, it may affect the performance of the database server.

Usually, Bacula prunes records from the Catalog when backup jobs fall out of retention. Pruning jobs and their related files can leave orphaned records behind. To avoid accumulating too many orphans, it is advised to run the dbcheck program periodically to clean the Catalog.
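
As a sketch, a periodic cleanup run could look like this; -b selects batch mode, -f applies the fixes, and the configuration path is an assumption.

% dbcheck -b -f -c /opt/bacula/etc/bacula-dir.conf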

Vacuum and Analyze

When a record is deleted or updated, PostgreSQL does not remove the old row from the table but marks it as a dead tuple. Daily operations, and orphan removal in particular, leave dead tuples in a table. For the database to reuse the table rows occupied by dead tuples, the table must be vacuumed. Regular vacuuming ensures the database files do not grow out of proportion by keeping dead tuples on disk.
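
To see how many dead tuples the main Catalog tables currently carry, you can query the standard statistics view; the table names again assume Bacula's default schema.

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname IN ('file', 'path');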

Analyze collects statistics about column values in tables. The query planner uses these statistics to find the best query plan. Row manipulation changes the statistics, so running analyze regularly ensures optimal query performance.

Usually, it is sufficient to let autovacuum and autoanalyze deal with vacuuming and statistics, but for large tables the default settings may be inadequate. Automatic vacuum is controlled by two configuration options with the following defaults:

autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2

The autovacuum is triggered when:

pg_stat_user_tables.n_dead_tup >
    (pg_class.reltuples * autovacuum_vacuum_scale_factor)
    + autovacuum_vacuum_threshold
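
For example, with these defaults and a File table holding 100 million rows, autovacuum fires only after 0.2 * 100,000,000 + 50, i.e. roughly 20 million dead tuples, have accumulated.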

Analyze runs automatically after autovacuum and is similarly controlled with:

autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1

Here autovacuum_analyze_scale_factor specifies the fraction of the table that must change before autoanalyze is triggered.

As the number of files kept in the Catalog grows, the gap between subsequent autovacuum/autoanalyze runs grows too, which results in longer runtimes and possibly in autovacuum/autoanalyze not running at all. In that case we advise running vacuum and analyze manually on a daily basis, especially after pruning operations. You can schedule this task with cron or use the following Admin job during non-production hours (jobs can run concurrently).

Job {
  Name = "VACUUM"
  Type = Admin
  JobDefs = DefaultJob
  RunScript {
    Command = "vacuumdb -a -z -q"
    RunsOnClient = no
    RunsWhen = Before
  }
}
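
In this job, vacuumdb runs on the Director host (RunsOnClient = no); the -a option vacuums every database in the cluster, -z also collects planner statistics (analyze), and -q suppresses routine output.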

To avoid using the trust authentication method in pg_hba.conf, you may want to use a .pgpass file (with 0600 permission bits) in the bacula user's home directory.

% ls -l /opt/bacula/.pgpass
-rw------- 1 bacula 201 32 Feb  3 10:19 /opt/bacula/.pgpass

% cat /opt/bacula/.pgpass
*:*:bacula:bacula:yourpassword
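
Each .pgpass line has the form hostname:port:database:username:password; the wildcards above match any host and port for the bacula database and user.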

Note

For large Catalogs, we also advise not to rely on automatic pruning but to schedule maintenance jobs outside of the backup window to deal with record pruning. Please see Pruning for more information. If you run pruning and vacuum in a single Admin job, please note that Bacula cannot guarantee the order of execution of RunScript sections. If tasks must execute in a set order, put them in a shell script and run the script from the Admin job, as sketched below.
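
A minimal sketch of such a script, assuming default bconsole paths; the prune command shown is only an example and should be adapted to your retention strategy.

#!/bin/sh
# Prune first, then vacuum, in a guaranteed order.
/opt/bacula/bin/bconsole -c /opt/bacula/etc/bconsole.conf <<END_OF_DATA
prune expired volume yes
quit
END_OF_DATA
# Vacuum and analyze all databases quietly.
vacuumdb -a -z -q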

A different approach is to tweak the settings for the File table so that vacuum and analyze run automatically. Setting autovacuum_vacuum_scale_factor to zero removes it from the equation above, so autovacuum triggers as soon as autovacuum_vacuum_threshold is crossed. It is important to set autovacuum_vacuum_threshold to a value low enough that vacuum runs automatically after pruning.

For example, if 1 million file records are pruned or updated daily, the following settings seem appropriate:

ALTER TABLE file SET (autovacuum_vacuum_scale_factor = 0,
autovacuum_vacuum_threshold = 1000000,
autovacuum_analyze_scale_factor = 0,
autovacuum_analyze_threshold = 250000,
autovacuum_vacuum_cost_delay = 0);
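
To verify that the per-table overrides are in place, you can inspect the table's storage parameters with a standard catalog query.

SELECT relname, reloptions FROM pg_class WHERE relname = 'file';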
