PostgreSQL Catalog Administration
Catalog Backup
The default backup script performs an SQL dump. You can change this to other, more advanced backup methods supported by PostgreSQL; see https://www.postgresql.org/docs/current/continuous-archiving.html for further suggestions. Make sure to test a full Catalog restore from time to time.
The pg_dump tool can run concurrently with backup jobs, but we advise starting the Catalog Backup job when no other jobs are running, for two reasons. First, running jobs have a specific status set in the Catalog; when the Catalog is restored, the Director checks that status and marks those jobs as failed. Second, jobs running while the Catalog is backed up will continue writing to the volumes, so the Catalog dump will not hold all the information the backup volumes contain. Bscan is a tool one can use to recreate Catalog records from backup volumes.
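For reference, the core of such a dump is a single pg_dump invocation. This is a minimal sketch, assuming the Catalog database is named bacula and that the dump goes to Bacula's working directory; the default backup script may add further options around this.

% pg_dump --clean --no-owner bacula > /opt/bacula/working/bacula.sql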
Cleaning Up the Catalog Database
Two tables in the Catalog database occupy most of the space. The File table stores records of all the files that were backed up, and the Path table stores records of all the directory paths. The Path table is shared between all clients and jobs, while the File table stores the files backed up by each backup job.
When files are stored or read, the Path table is loaded into memory; if it grows too large, it may affect the performance of the database server.
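To gauge how much space these two tables actually occupy, you can query their on-disk size directly (a read-only check; table names assume the default Bacula schema):

SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
 WHERE relname IN ('file', 'path');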
Usually, Bacula prunes records from the Catalog when backup jobs fall out of retention. Pruning jobs and their related files can leave orphaned records behind. To avoid accumulating too many orphans, it is advised to run the dbcheck program periodically to clean the Catalog.
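A typical invocation would look like the following sketch; the configuration path and Catalog resource name (MyCatalog) are assumptions to adapt to your installation. The -b option runs dbcheck in batch (non-interactive) mode, and -f fixes the inconsistencies found rather than only reporting them.

% dbcheck -b -f -c /opt/bacula/etc/bacula-dir.conf -C MyCatalog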
Vacuum and Analyze
When a record is deleted or updated, PostgreSQL doesn't remove the old row from the table but marks it as a dead tuple. Daily operations, and orphan removal in particular, leave dead tuples in a table. For the database to reuse the table rows occupied by dead tuples, the table must be vacuumed. Regular vacuuming ensures the database files do not grow out of proportion from dead tuples accumulating on disk.
Analyze collects statistics about column values in tables. These statistics are used to find the best query plan. Row manipulation changes the statistics, so regular analyzing ensures optimal query performance.
Usually, it is sufficient to let autovacuum and autoanalyze deal with vacuuming and statistics. But for large tables, the default settings may be inadequate. Automatic vacuum is controlled by two configuration options with the following defaults:
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
The autovacuum is triggered when:
pg_stat_user_tables.n_dead_tup >
(pg_class.reltuples x autovacuum_vacuum_scale_factor)
+ autovacuum_vacuum_threshold
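To see how close a table is to that trigger point, you can inspect the statistics view directly (a read-only check; table names assume the default Bacula schema):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname IN ('file', 'path');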
Analyze runs automatically after autovacuum and is similarly controlled with:
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
Here, autovacuum_analyze_scale_factor specifies the fraction of the table that must change before autoanalyze is triggered.
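For completeness, the corresponding trigger condition for autoanalyze has the same form, based on the number of rows modified since the last analyze:

pg_stat_user_tables.n_mod_since_analyze >
(pg_class.reltuples x autovacuum_analyze_scale_factor)
+ autovacuum_analyze_threshold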
As the number of files kept in the Catalog grows, the gap between subsequent autovacuum/autoanalyze executions grows too, which results in longer runtimes and possibly the inability to run autovacuum/autoanalyze at all. In that case we advise running it manually on a daily basis, especially after pruning operations. You can schedule this task with cron or use the following Admin job during non-production hours (jobs can run concurrently).
Job {
  Name = "VACUUM"
  Type = Admin
  JobDefs = DefaultJob
  RunScript {
    # -a: all databases, -z: also run ANALYZE, -q: quiet output
    Command = "vacuumdb -a -z -q"
    RunsOnClient = no
    RunsWhen = Before
  }
}
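If you prefer to limit the manual pass to the two large tables rather than all databases, an equivalent command in psql would be (table names assume the default Bacula schema):

VACUUM ANALYZE file;
VACUUM ANALYZE path;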
To avoid using trust in pg_hba.conf, you may want to use a .pgpass file (with 0600 permission bits) in the bacula user's home directory. Each line of the file has the form hostname:port:database:username:password.
% ls -l /opt/bacula/.pgpass
-rw------- 1 bacula 201 32 Feb 3 10:19 /opt/bacula/.pgpass
% cat /opt/bacula/.pgpass
*:*:bacula:bacula:yourpassword
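To confirm the password file is picked up, you can run a quick non-interactive connection test; the host, user, and database names here match the sample entry above and may need adapting:

% psql -h localhost -U bacula -d bacula -c "SELECT 1"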
Note
For large Catalogs, we also advise not to rely on automatic pruning but to schedule maintenance jobs outside of the backup window to deal with record pruning. Please see Pruning for more information. If you run pruning and vacuum in a single Admin job, please note that Bacula cannot guarantee the order of execution for RunScript sections. If tasks must execute in a set order, use a shell script and run the script from the Admin job, as sketched below.
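A minimal sketch of such a script, assuming bconsole is installed under /opt/bacula/bin and using a placeholder client name that you would replace with your own:

#!/bin/sh
# catalog_maintenance.sh - run pruning first, then vacuum, in a fixed order.
# "myclient-fd" is a placeholder; substitute your real client name(s).
/opt/bacula/bin/bconsole <<'EOF'
prune jobs client=myclient-fd yes
prune files client=myclient-fd yes
EOF
# Reclaim the dead tuples the pruning just created.
vacuumdb --all --analyze --quiet

An Admin job can then invoke this script from a single RunScript section, preserving the required order.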
A different approach is to tweak settings for the File table so that vacuum and analyze run automatically. Eliminating autovacuum_vacuum_scale_factor from the equation above allows triggering autovacuum as soon as autovacuum_vacuum_threshold is crossed. It is important to set autovacuum_vacuum_threshold to a value that lets vacuum run automatically after pruning.
For example, if 1 million file records are pruned or updated daily, the following settings seem appropriate:
ALTER TABLE file SET (autovacuum_vacuum_scale_factor = 0,
autovacuum_vacuum_threshold = 1000000,
autovacuum_analyze_scale_factor = 0,
autovacuum_analyze_threshold = 250000,
autovacuum_vacuum_cost_delay = 0);
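You can verify that the per-table overrides took effect by reading the storage options back from pg_class (a read-only check):

SELECT relname, reloptions FROM pg_class WHERE relname = 'file';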