High Availability: Catalog Setup Example


This section provides an example of how to assemble and configure the components described in the previous section to build a distributed, clustered PostgreSQL architecture with a master node, several read replicas, and WAL-based streaming replication of the data.

Note

This is not intended to be an in-depth guide to PostgreSQL high availability. Instead, it aims to help the reader consolidate the main concepts and understand how the different layers interact.

To simplify the scenario, this example uses only three nodes:

  • One node with PostgreSQL, Patroni, Etcd, HAProxy and Keepalived.

  • A second node with the same stack; together with the previous one, these form the Catalog Cluster.

  • A third node containing the full Bacula stack; the Bacula Director connects remotely to the Catalog Cluster.

Note that it is possible to deploy all the components on separate hosts, and some components can also run in containers. The only strict recommendation is to run exactly one Patroni instance per PostgreSQL instance. Keepalived and HAProxy may be deployed on separate hosts, although they are commonly installed together, and the etcd cluster may run on the same nodes or on dedicated hosts.

For simplicity, the connections between services do not use SSL and the corresponding certificates, but all services can be configured to use them, and doing so is highly recommended in production.

The following example configurations are based on Ubuntu Server 24.04. However, all the tools involved are available in most other Linux distributions, and the different programs can also be built and installed from source code.

Architecture Overview

Nodes (physical or VMs on Ubuntu 24.04):

  • catalog01 192.168.1.151 — Patroni/PG/etcd member/haproxy/keepalived/bacula-fd

  • catalog02 192.168.1.152 — Patroni/PG/etcd member/haproxy/keepalived/bacula-fd

  • virtualip 192.168.1.160 — Virtual IP shared by catalog01 and catalog02

  • dirtoha 192.168.1.180 — bacula-dir/bacula-SD/bacula-fd/bweb

Components:

  • An etcd cluster across the two Catalog nodes

  • A Patroni service on each database node, managing the local PostgreSQL instance

  • PostgreSQL 16 on each node

  • HAProxy and Keepalived for VIP and routing (Bacula and BWeb connect to VIP)

  • A Bacula FD is added to each Catalog node for catalog backup purposes

  • A Bacula Director with BWeb connects remotely to the Catalog in HA mode

Replication behavior:

  • Patroni handles streaming replication (asynchronous by default; synchronous replication can be enabled by requiring at least one synchronous replica, as shown in the sketch after this list)

  • Replication slots are created automatically to prevent WAL segments from being removed before the replicas receive them
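The following is a minimal sketch of how synchronous replication could be enabled once the cluster described below is running, assuming the cluster name (bacula_cluster) and configuration path (/etc/patroni/patroni.yml) used later in this example; the settings are standard Patroni dynamic configuration options:

# Require at least one synchronous replica (run on any Catalog node once Patroni is up)
patronictl -c /etc/patroni/patroni.yml edit-config \
    -s synchronous_mode=true \
    -s synchronous_node_count=1

# Review the resulting dynamic configuration
patronictl -c /etc/patroni/patroni.yml show-config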

PostgreSQL HA Simplified Example

High-level Steps

  1. Prepare and install PostgreSQL and Bacula.

  2. Set up etcd on each Catalog node.

  3. Set up Patroni on each Catalog node.

  4. Set up HAProxy + Keepalived in Catalog nodes and configure VIP for clients (Bacula).

  5. Connect the remote Bacula node with the Catalog VIP.

  6. Perform tests and checks.

Setup

1. Preparation and installation of PostgreSQL and Bacula.

Update your system on all three nodes:

apt update
apt -y upgrade

Then install the Bacula Director on all three nodes. To install Bacula Director with BIM, follow the instructions presented here.

Some details to consider:

  • On both Catalog nodes, we use BIM to simplify the Catalog installation, but you can perform a manual installation if you prefer.

  • On both Catalog nodes, you also need to install the File Daemon. Click here for details on the File Daemon installation process.

  • On the Bacula Director node (dirtoha in this setup), we also install BWeb. The installation can initially be completed with a locally deployed Catalog (follow the guide here); we will switch to the remote Catalog later.

Next, on the Catalog nodes, open the ports required by the different services (this example uses the UFW firewall); a quick verification example follows the port list below:

# Enable ports
ufw allow 2379,2380,5432,5433,8008/tcp

# Enable multicast traffic in the interface and network where Keepalived service will work
ufw allow in on enp0s3 from 192.168.1.0/24 to 224.0.0.18 comment 'keepalived multicast'

# Enable firewall
ufw --force enable

Port association:

  • etcd serves client requests on port 2379

  • etcd uses port 2380 for internal communication between its nodes

  • PostgreSQL is served on port 5432

  • HAProxy proxies the PostgreSQL service on port 5433 (in production, HAProxy typically runs on separate hosts and may expose the standard PostgreSQL port on the frontend)

  • The Patroni REST API is served on port 8008

  • Keepalived uses multicast traffic to communicate between its nodes
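Once the corresponding services are up (in the following steps), you can confirm that the firewall rules are active and that the expected ports are listening; this is a generic check using standard tools:

# Show the active UFW rules
ufw status verbose

# List listening TCP sockets (2379, 2380, 5432, 5433 and 8008 should appear once all services run)
ss -ltn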

2. Set up etcd.

On each Catalog node, install etcd:

# Adjust the version with the most recent one or the one you prefer
ETCD_VER="3.5.11"
cd /tmp
wget -q "https://github.com/etcd-io/etcd/releases/download/v${ETCD_VER}/etcd-v${ETCD_VER}-linux-amd64.tar.gz"
tar xzf etcd-v${ETCD_VER}-linux-amd64.tar.gz
mv etcd-v${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
mkdir -p /var/lib/etcd
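Before creating the service units, you can optionally confirm that the binaries are in place (a simple check against the paths used above):

etcd --version
etcdctl version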

On node catalog01, create the systemd service unit with the following content in /etc/systemd/system/etcd.service:

[Unit]
Description=etcd key-value store
After=network.target

[Service]
User=root
Type=notify
Environment="ETCD_NAME=db1"
Environment="ETCD_DATA_DIR=/var/lib/etcd"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.1.151:2380"
Environment="ETCD_LISTEN_PEER_URLS=http://192.168.1.151:2380"
Environment="ETCD_ADVERTISE_PEER_URLS=http://192.168.1.151:2380"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.151:2379"
Environment="ETCD_LISTEN_CLIENT_URLS=http://192.168.1.151:2379,http://127.0.0.1:2379"
Environment="ETCD_INITIAL_CLUSTER=db1=http://192.168.1.151:2380,db2=http://192.168.1.152:2380"
Environment="ETCD_INITIAL_CLUSTER_STATE=new"
Environment="ETCD_INITIAL_CLUSTER_TOKEN=pgcluster"

ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

On catalog02, create the corresponding one, adjusting the name and the IP addresses:

[Unit]
Description=etcd key-value store
After=network.target

[Service]
User=root
Type=notify
Environment="ETCD_NAME=db2"
Environment="ETCD_DATA_DIR=/var/lib/etcd"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.1.152:2380"
Environment="ETCD_LISTEN_PEER_URLS=http://192.168.1.152:2380"
Environment="ETCD_ADVERTISE_PEER_URLS=http://192.168.1.152:2380"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.152:2379"
Environment="ETCD_LISTEN_CLIENT_URLS=http://192.168.1.152:2379,http://127.0.0.1:2379"
Environment="ETCD_INITIAL_CLUSTER=db1=http://192.168.1.151:2380,db2=http://192.168.1.152:2380"
Environment="ETCD_INITIAL_CLUSTER_STATE=new"
Environment="ETCD_INITIAL_CLUSTER_TOKEN=pgcluster"


ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Because the cluster is being created for the first time, we set the cluster state to new. Once it has been created, it can be changed to existing.

Once the unit files are in place, enable and start the service on both nodes:

systemctl daemon-reload
systemctl enable etcd
systemctl start etcd

You can verify the ETCD cluster health with:

root@catalog01:~# etcdctl member list
1aff923be86916cb, started, db1, http://192.168.1.151:2380, http://192.168.1.151:2380, false
6250d387d4559954, started, db2, http://192.168.1.152:2380, http://192.168.1.152:2380, false

root@catalog01:~# etcdctl endpoint health
127.0.0.1:2379 is healthy: successfully committed proposal: took = 3.427043ms
root@catalog01:~# etcdctl --endpoints=192.168.1.152:2379 put test3 test3
OK

root@catalog01:~# etcdctl --endpoints=192.168.1.151:2379 get test3
test3
test3
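The key written during the test above can then be removed; this is just cleanup using the standard etcdctl delete command:

etcdctl del test3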

Once everything is fine, change the cluster state to existing by modifying the corresponding value, again in /etc/systemd/system/etcd.service:

...
Environment="ETCD_INITIAL_CLUSTER_STATE=existing"
...

Then, reload systemd and restart the service:

systemctl daemon-reload
systemctl restart etcd

3. Set up Patroni.

Install the tool and its dependencies on both Catalog nodes, and create the required directories:

apt install -y python3 patroni python3-psycopg
mkdir -p /etc/patroni /var/log/patroni
useradd -r -M -d /var/lib/postgresql patroni

Patroni is able to store and distribute PostgreSQL configuration and can also set it up from scratch. In this example, however, we start from an existing PostgreSQL database installation created by the Bacula Installation Manager. As a result, the process is slightly different.

First, create the users that Patroni will use to connect to the PostgreSQL instances and manage status and replication. Connect to each PostgreSQL instance on each Catalog node (using the psql command) and run:

-- Patroni superuser
-- Replace pat_admin and PATRONI_SUPERUSER_PASSWORD accordingly
CREATE USER pat_admin WITH SUPERUSER ENCRYPTED PASSWORD 'Admin@Bacula25';

-- Patroni replication user
-- Replace PATRONI_REPLICATION_USERNAME and PATRONI_REPLICATION_PASSWORD accordingly
CREATE USER pat_replica WITH REPLICATION ENCRYPTED PASSWORD 'Admin@Bacula25';

-- Patroni rewind user, if you intend to enable use_pg_rewind in your Patroni configuration
-- Replace pat_rewind and PATRONI_REWIND_PASSWORD accordingly
CREATE USER pat_rewind WITH ENCRYPTED PASSWORD 'Admin@Bacula25';
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO pat_rewind;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO pat_rewind;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO pat_rewind;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO pat_rewind;
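Depending on how the existing installation configured pg_hba.conf, you may also need to allow replication and management connections between the Catalog nodes, as well as connections from the Director host, before Patroni and Bacula can connect remotely. The entries below are only an illustrative sketch for the network used in this example; adjust users, addresses and authentication methods to your environment:

# Example pg_hba.conf entries on each Catalog node (illustrative, adjust as needed)
host    replication    pat_replica    192.168.1.0/24    scram-sha-256
host    all            pat_admin      192.168.1.0/24    scram-sha-256
host    bacula         bacula         192.168.1.0/24    scram-sha-256

After editing the file, reload PostgreSQL (for example, systemctl reload postgresql) so the new rules take effect.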

Next, configure Patroni on each node. Create the configuration file /etc/patroni/patroni.yml.

For catalog01:

scope: bacula_cluster
namespace: /db/
name: catalog01

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.1.151:8008

etcd3:
    hosts:
       - 192.168.1.151:2379
       - 192.168.1.152:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true
            parameters:
                wal_level: replica
                hot_standby: "on"
                max_wal_senders: 10
                max_replication_slots: 10
                wal_keep_size: 256MB

postgresql:
    listen: 0.0.0.0:5432
    connect_address: 192.168.1.151:5432
    data_dir: /var/lib/postgresql/16/main
    bin_dir: /usr/lib/postgresql/16/bin/
    config_dir: /etc/postgresql/16/main/
    authentication:
        replication:
            username: pat_replica
            password: 'Admin@Bacula25'
        superuser:
            username: pat_admin
            password: 'Admin@Bacula25'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

log:
    type: plain
    dir: /var/log/patroni
    level: DEBUG

For catalog02:

scope: bacula_cluster
namespace: /db/
name: catalog02

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.1.152:8008

etcd3:
    hosts:
        - 192.168.1.151:2379
        - 192.168.1.152:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true
            parameters:
                wal_level: replica
                hot_standby: "on"
                max_wal_senders: 10
                max_replication_slots: 10
                wal_keep_size: 256MB

postgresql:
    listen: 0.0.0.0:5432
    connect_address: 192.168.1.152:5432
    data_dir: /var/lib/postgresql/16/main
    bin_dir: /usr/lib/postgresql/16/bin/
    config_dir: /etc/postgresql/16/main/
    authentication:
        replication:
            username: pat_replica
            password: 'Admin@Bacula25'
        superuser:
            username: pat_admin
            password: 'Admin@Bacula25'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

log:
    type: plain
    dir: /var/log/patroni
    level: DEBUG

Explaining every parameter is outside the scope of this setup example. The configuration files above primarily:

  • connect Patroni with the underlying PostgreSQL instance (address, port, data directory)

  • connect Patroni with the etcd cluster we configured beforehand

  • define the users Patroni uses to connect for management and replication

  • define the behavior of the cluster in terms of synchronization, replication, etc.

  • establish where the REST API is served so other Patroni nodes and other services can consume it

  • define the logging behavior

For a complete list of options, refer to the official Patroni documentation.
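Before starting the service, the configuration files can be sanity-checked; recent Patroni releases provide a --validate-config option, so the command below assumes a version that includes it:

patroni --validate-config /etc/patroni/patroni.yml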

Once the YAML configuration files are in place, create the service unit on both nodes so Patroni can be managed through systemd, in /etc/systemd/system/patroni.service:

[Unit]
Description=High availability PostgreSQL Cluster
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
Environment=PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lib/postgresql/16/bin/
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target

Now, disable the PostgreSQL service, as it will be managed by Patroni. We do not need the Bacula Director or Storage Daemon (bacula-dir and bacula-sd) on these nodes either, so we disable and stop them as well. Then, enable and start Patroni on both nodes:

systemctl daemon-reload

# Disable postgresql
systemctl disable postgresql

# Disable and stop bacula-dir and bacula-sd
systemctl disable bacula-dir bacula-sd
systemctl stop bacula-dir bacula-sd

systemctl enable patroni
systemctl start patroni

At this point, the cluster is running. However, we started from two nodes that both contain data, as we installed PostgreSQL through BIM. As a result, we now need to choose one of the nodes as the first primary (master) node and discard the data on the other.

To do this, connect to catalog02 (the node whose current data will be discarded and rebuilt from the leader) and run the following Patroni command:

patronictl -c /etc/patroni/patroni.yml reinit bacula_cluster catalog02

You can then check the Patroni cluster health:

root@catalog02:~# patronictl -c /etc/patroni/patroni.yml list
+ Cluster: bacula_cluster (7509821870589132491) ---+----+-----------+-----------------+
| Member    | Host           | Role    | State     | TL | Lag in MB | Pending restart |
+-----------+----------------+---------+-----------+----+-----------+-----------------+
| catalog01 | 127.0.0.1:5432 | Leader  | running   |  1 |           | *               |
| catalog02 | 127.0.0.1:5432 | Replica | streaming |  1 |         0 |                 |
+-----------+----------------+---------+-----------+----+-----------+-----------------+

root@catalog02:~# patronictl -c /etc/patroni/patroni.yml topology
+ Cluster: bacula_cluster (7509821870589132491) -----+----+-----------+-----------------+
| Member      | Host           | Role    | State     | TL | Lag in MB | Pending restart |
+-------------+----------------+---------+-----------+----+-----------+-----------------+
| catalog01   | 127.0.0.1:5432 | Leader  | running   |  1 |           | *               |
| + catalog02 | 127.0.0.1:5432 | Replica | streaming |  1 |         0 |                 |
+-------------+----------------+---------+-----------+----+-----------+-----------------+

You can also review the logs in /var/log/patroni/patroni.log:

# In the master we see
2025-11-12 15:49:11,422 INFO: Lock owner: catalog01; I am catalog01
...

# In the replica we see
2025-11-12 15:52:02,015 INFO: Lock owner: catalog01; I am catalog02
2025-11-12 15:52:02,015 DEBUG: does not have lock
...

Replication status can also be tracked directly in PostgreSQL on the master node:

postgres=# SELECT * FROM pg_stat_replication;
pid  | usesysid |   usename   | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+-------------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
1592 |    25202 | pat_replica | catalog02        | 192.168.1.152 |                 |       50708 | 2025-11-12 11:41:22.512199+00 |              | streaming | 0/60E0A38 | 0/60E0A38 | 0/60E0A38 | 0/60E0A38  |           |           |            |             0 | async      | 2025-11-12 15:50:37.078039+00

4. Set up HAProxy + Keepalived.

First, install both software packages on both Catalog nodes:

apt-get install -y haproxy keepalived

Now, configure HAProxy on each node by adding the following configuration to the end of the /etc/haproxy/haproxy.cfg file:

frontend postgres_frontend
    bind *:5433
    mode tcp
    default_backend postgres_backend

backend postgres_backend
    mode tcp
    option tcp-check
    option httpchk OPTIONS /primary     # patroni endpoint, we use OPTIONS, as GET generates tracebacks when used with Patroni
    http-check expect status 200        # only master returns 200
    timeout connect 5s
    timeout server 30s
    server catalog01 192.168.1.151:5432 port 8008 check verify none
    server catalog02 192.168.1.152:5432 port 8008 check verify none

This configuration means:

  • The frontend of the proxy (where clients such as Bacula connect) is served on port 5433

  • For the backend, we check the Patroni endpoint /primary to know which node is the master (and therefore where the traffic should be directed)

  • We list our two Catalog nodes so the check is performed on both

As a result, any request to HAProxy on port 5433 is automatically redirected to the PostgreSQL node that is acting as primary at that moment. HAProxy supports additional load-balancing modes, but that is out of the scope of this guide and is not required for most Bacula deployments.

Now, set up Keepalived. On catalog01, insert the following contents in /etc/keepalived/keepalived.conf:

global_defs {
    enable_script_security
    script_user keepalived_script
}

vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface enp0s3
    virtual_router_id 51
    priority 100 # The node with the highest priority becomes MASTER first. Max is 255
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass BacChaME # Maximum of 8 alphanumeric characters(!)
    }
    virtual_ipaddress {
        192.168.1.160
    }
    track_script {
        check_haproxy
    }
}

With this configuration, a node holds the Virtual IP only while HAProxy is up, which is verified with the check_haproxy.sh script. We also define the Virtual IP address itself and set priorities so that Keepalived knows how to behave when more than one node is alive.

In this setup:

  • catalog01 is the MASTER of the VRRP instance when available (state MASTER)

  • catalog01 has priority 100, higher than catalog02's 50, so it takes over the Virtual IP whenever it is available. With only two nodes this is redundant with the state setting, but keep in mind that the cluster can contain any number of nodes.

The HAProxy check script, /etc/keepalived/check_haproxy.sh, contains:

#!/bin/bash

# Port of the frontend to check (HAProxy)
PORT=5433

# Check service
if ! pidof haproxy > /dev/null; then
    echo "HAProxy is not running!"
    exit 1
fi

# Check if HAProxy is listening where we expect
if ! ss -ltn | grep -q ":${PORT}"; then
    echo "HAProxy is not listening on port ${PORT}"
    exit 2
fi

# All good
exit 0
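Because enable_script_security is set and the check runs as the dedicated keepalived_script user, that user must exist and the script must be executable and not writable by non-root users. The commands below are a minimal sketch; adjust them to your security policies:

# Create an unprivileged system user for the check script (skip if it already exists)
useradd -r -s /usr/sbin/nologin keepalived_script

# The script should be owned by root and executable
chown root:root /etc/keepalived/check_haproxy.sh
chmod 755 /etc/keepalived/check_haproxy.sh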

Now, configure Keepalived on node catalog02 as well, adjusting the state parameter and priority:

global_defs {
    enable_script_security
    script_user keepalived_script
}

vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface enp0s3
    virtual_router_id 51
    priority 50     # The node with the highest priority becomes MASTER first. Max is 255
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass BacChaME  # Maximum of 8 alphanumeric characters(!)
    }
    virtual_ipaddress {
        192.168.1.160
    }
    track_script {
        check_haproxy
    }
}

Enable and start both services on both nodes:

systemctl enable haproxy keepalived
systemctl start haproxy keepalived

You can check Keepalived status and confirm the roles on each node:

# catalog01 is master
root@catalog01:~# journalctl -u keepalived
nov 12 11:41:15 catalog01 systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
nov 12 11:41:17 catalog01 Keepalived_vrrp[946]: Script `check_haproxy` now returning 0
nov 12 11:41:19 catalog01 Keepalived_vrrp[946]: VRRP_Script(check_haproxy) succeeded
nov 12 11:41:19 catalog01 Keepalived_vrrp[946]: (VI_1) Entering BACKUP STATE
nov 12 11:41:20 catalog01 Keepalived_vrrp[946]: (VI_1) received lower priority (50) advert from 192.168.1.152 - discarding
nov 12 11:41:21 catalog01 Keepalived_vrrp[946]: (VI_1) received lower priority (50) advert from 192.168.1.152 - discarding
nov 12 11:41:24 catalog01 Keepalived_vrrp[946]: (VI_1) received lower priority (50) advert from 192.168.1.152 - discarding
nov 12 11:41:25 catalog01 Keepalived_vrrp[946]: (VI_1) Entering MASTER STATE

# catalog02 is in backup state
root@catalog02:~# journalctl -u keepalived
nov 12 11:41:12 catalog02 systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
nov 12 11:41:14 catalog02 Keepalived_vrrp[990]: Script `check_haproxy` now returning 0
nov 12 11:41:16 catalog02 Keepalived_vrrp[990]: VRRP_Script(check_haproxy) succeeded
nov 12 11:41:16 catalog02 Keepalived_vrrp[990]: (VI_1) Entering BACKUP STATE
nov 12 11:41:22 catalog02 Keepalived_vrrp[990]: (VI_1) Entering MASTER STATE
nov 12 11:41:25 catalog02 Keepalived_vrrp[990]: (VI_1) Master received advert from 192.168.1.151 with higher priority 100, our>
nov 12 11:41:25 catalog02 Keepalived_vrrp[990]: (VI_1) Entering BACKUP STATE
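At this point, you can also verify from any allowed host that connections through the Virtual IP and HAProxy reach the current primary. This quick manual check assumes the pat_admin user created earlier and a pg_hba.conf rule permitting the connection:

# Should return "f": the node answering behind the VIP is the primary, not a standby
psql "host=192.168.1.160 port=5433 user=pat_admin dbname=postgres" -c "SELECT pg_is_in_recovery();"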

5. Connect Bacula to the remote Catalog.

Now, switch to the node that runs Bacula and BWeb (dirtoha) and modify its configuration.

First, we need to modify the Catalog configuration so it points to our remote Catalog:

root@dirtoha:~# cat /opt/bacula/etc/conf.d/Director/jubu-dir/Catalog/BaculaCatalog.cfg
Catalog {
    Name = "BaculaCatalog"
    DbName = "bacula"
    Password = "xxxxxxx"
    User = "bacula"
    DB Address = 192.168.1.160
    DB Port = 5433
}
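After changing the Catalog resource, restart the Director so it reconnects using the new address (the same service name used earlier on the Catalog nodes):

systemctl restart bacula-dir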

Then, access BWeb and adjust the same configuration through the Configuration -> BWeb configuration page:

BWeb connection to the remote Catalog

Set the connection type to TCP/IP and then enter the same parameters used in the Bacula Director.

Next, add a new client named catalog-fd and set the Catalog VIP address:

Catalog FD

Adjust the bacula-fd.conf file on each Catalog node so that the password matches the one configured for this client.

Finally, change the predefined BaculaDirectorCatalog so it uses the catalog-fd client. This allows you to back up the remote Catalog. In production, it would also be necessary to adjust the Storage destination so backups are written outside the Bacula host, to a secured external location.

At this point, this Bacula instance uses a Catalog that is in high availability with all the benefits described at the beginning of this section.
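As a basic set of tests and checks (step 6 in the high-level list), you can trigger a controlled switchover and confirm that the Catalog remains reachable through the VIP. The commands below are only a sketch using the cluster name and VIP from this example; running a small backup and restore job from the Director is also a good end-to-end check:

# On a Catalog node: move the leader role to the other node (interactive confirmation)
patronictl -c /etc/patroni/patroni.yml switchover bacula_cluster

# Confirm the new roles
patronictl -c /etc/patroni/patroni.yml list

# From the Director node: confirm the Catalog is still reachable through the VIP
psql "host=192.168.1.160 port=5433 user=bacula dbname=bacula" -c "SELECT count(*) FROM Job;"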

Go back to: High Availability.