High Availability: Configuration Sync Setup Example

Community | Enterprise

This section provides an example of how to set up a replicated device to synchronize the Bacula configuration between two remote nodes. The technique used is DRBD. Note that there are other alternatives, as discussed in the previous section, and that DRBD can also be used to replicate other elements, even the backup data between Storage Daemons.

This clustering method may encompass both the configuration and the Catalog, even though the Catalog can utilize alternative replication techniques as detailed in the HA: Catalog article. To simplify the setup, this example covers only the Bacula configuration.

Note

This is not an in-depth manual about high availability or replication with DRBD. Instead, it is intended to help readers consolidate the main concepts and build a solid foundation for choosing the right architecture for each environment when designing Bacula clustering solutions.

The following example configurations are based on Ubuntu Server 24.04. However, all the tools involved are available on most Linux distributions.

Architecture Overview

Nodes (physical or VMs on Ubuntu 24.04):

  • dir01 192.168.1.180 — DRBD/bacula-dir

  • dir02 192.168.1.181 — DRBD/bacula-dir

The Catalog is remotely connected and implements its own HA layer.

The configuration will be replicated by using DRBD.

From a networking perspective, the HA services have the following port requirement:

  • DRBD uses TCP port 7788

This example synchronizes a single device resource (one disk). If more resources are used, each additional resource requires its own port.
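
For illustration only, a hypothetical second resource could be declared in its own file and on its own port (the resource name r1, the disk /dev/sdc, the DRBD device /dev/drbd1 and port 7789 are examples, not part of this setup; the resource file syntax is explained in step 2):

# vi /etc/drbd.d/r1.res
resource r1 {
   net {
      protocol C;
      cram-hmac-alg sha1;
      shared-secret "MyBaculaSecret";
   }
   on dir01 {
      # each additional resource listens on its own TCP port
      address 192.168.1.180:7789;
      volume 0 {
            device /dev/drbd1;
            disk /dev/sdc;
            meta-disk internal;
      }
   }
   on dir02 {
      address 192.168.1.181:7789;
      volume 0 {
            device /dev/drbd1;
            disk /dev/sdc;
            meta-disk internal;
      }
   }
}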

High-level Steps

  1. Prepare and install Bacula.

  2. Set up DRBD.

  3. Set up the cluster.

  4. Move the data.

  5. Integrate with Pacemaker.

Setup

1. Prepare and install Bacula.

Update your system on both nodes:

apt update
apt -y upgrade

Then install Bacula Director on both nodes. To install Bacula Director with BIM, follow the instructions detailed here.

2. Set up DRBD.

Install the software on both nodes using the package manager:

apt -y install drbd-utils

Select a device to replicate. It must be empty before being set up with DRBD. In this example, we use one disk mapped as /dev/sdb.

ls -l /dev/sdb
brw-rw---- 1 root disk 8, 16 nov 21 07:34 /dev/sdb

At this point, you could create an LVM layer to make the devices more flexible. If you do, reference the logical volumes in the next step instead of the raw disk (as this example does). The LVM layer can be a good choice in production environments, but we skip it here for simplicity.
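
If you prefer the LVM approach, a minimal sketch looks like the following (the volume group and logical volume names vg_bacula and lv_bacula are illustrative); the resulting logical volume would then replace /dev/sdb in the DRBD resource file:

# Run on both nodes
pvcreate /dev/sdb
vgcreate vg_bacula /dev/sdb
lvcreate -l 100%FREE -n lv_bacula vg_bacula
# In r0.res, use: disk /dev/vg_bacula/lv_bacula;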

Now create the configuration file of the DRBD resource (r0). This resource will be created on top of sdb and replicated between the two nodes.

# vi /etc/drbd.d/r0.res
resource r0 {
   net {
      # A : write completion is determined when data is written to the local disk and the local TCP transmission buffer
      # B : write completion is determined when data is written to the local disk and remote buffer cache
      # C : write completion is determined when data is written to both the local disk and the remote disk
      protocol C;

      cram-hmac-alg sha1;
      # any secret key for authentication among nodes. Please change this
      shared-secret "MyBaculaSecret";
   }
   disk {
      # we can limit bandwidth for sync (example is 30MB/sec)
      resync-rate 30M;
   }

   # on (hostname node 1)
   on dir01 {
      address 192.168.1.180:7788;
      volume 0 {
            # device name
            device /dev/drbd0;
            # specify disk to be used for the drbd device above
            disk /dev/sdb;
            # where to create metadata. Internal is ok for most common use case
            meta-disk internal;
      }
   }

   # on (hostname node 2)
   on dir02 {
      address 192.168.1.181:7788;
      volume 0 {
            device /dev/drbd0;
            disk /dev/sdb;
            meta-disk internal;
      }
   }
}

Ensure the DRBD kernel module can be loaded:

# modprobe drbd
# lsmod | grep drbd
drbd                  458752  0
lru_cache              16384  1 drbd
libcrc32c              12288  7 nf_conntrack,nf_nat,btrfs,nf_tables,drbd,raid456,sctp
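
The drbd service normally takes care of loading the module, but if you want it loaded at every boot regardless of the service, a minimal sketch using the standard modules-load.d mechanism is:

echo drbd > /etc/modules-load.d/drbd.conf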

3. Set up the cluster.

Start DRBD on both nodes.

# systemctl enable --now drbd
Synchronizing state of drbd.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable drbd
Created symlink /etc/systemd/system/multi-user.target.wants/drbd.service → /usr/lib/systemd/system/drbd.service.

Initialize the resource r0 we have configured. Create the DRBD metadata on both nodes:

# drbdadm create-md r0
initializing activity log
initializing bitmap (1600 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.

Then, on the node that will act as primary, start the initial synchronization:

# drbdadm -- --overwrite-data-of-peer primary r0

DRBD will then start synchronizing data between the two nodes:

# Node 1
:~# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 99EF066AEFB069BE05A5E7F
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
   ns:87360 nr:0 dw:0 dr:89496 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:52339804
   [>....................] sync'ed:  0.2% (51112/51196)M
   finish: 2:39:19 speed: 5,460 (5,460) K/sec

# Node 2
:~# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 99EF066AEFB069BE05A5E7F
0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
   ns:0 nr:1188864 dw:1188864 dr:0 al:8 bm:0 lo:1 pe:2 ua:0 ap:0 ep:1 wo:f oos:51238300
   [>....................] sync'ed:  2.3% (50036/51196)M
   finish: 0:24:27 speed: 34,888 (20,148) want: 41,040 K/sec

When synchronization finishes, you should see the following status:

:~# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 99EF066AEFB069BE05A5E7F
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
   ns:719056 nr:0 dw:1800496 dr:314226 al:267 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

If one node is disconnected, the remaining node will show a status similar to:

cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 99EF066AEFB069BE05A5E7F
0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
   ns:0 nr:4096 dw:4096 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Once connectivity is restored, the synchronization of the changed blocks will happen automatically and the UpToDate/UpToDate status will be reached again.
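
Besides /proc/drbd, drbdadm can also query the state of a single resource; the exact output format depends on your drbd-utils version:

drbdadm cstate r0
drbdadm dstate r0
# with newer drbd-utils
drbdadm status r0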

If you encounter a split-brain condition and your STONITH mechanisms did not work, several recovery actions are possible. However, you must decide which node’s data should be preserved. Refer to the official LINBIT documentation for the appropriate recovery procedure and commands for your scenario.
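
As an orientation only (double-check against the LINBIT documentation for your DRBD version before running anything), the commonly documented manual recovery discards the changes of one node, reconnects it, and lets the surviving node resynchronize it:

# On the node whose changes will be discarded (the split-brain "victim")
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On the node whose data is kept (the split-brain "survivor"), if it is not already connecting
drbdadm connect r0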

4. Move the data.

So far, the replicated disk is empty. The next steps create a filesystem on it, stop Bacula, and place the configuration on the shared resource.

Create and mount a filesystem on the replicated device:

# mkfs.ext4 /dev/drbd0
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 13106791 4k blocks and 3276800 inodes
Filesystem UUID: 0f581083-c9c1-4e8c-96be-87c5372f8968
Superblock backups stored on blocks:
   32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
   4096000, 7962624, 11239424

Allocating group tables: done
Writing inode tables: done
Creating journal (65536 blocks): done
Writing superblocks and filesystem accounting information: done
mkdir /tmp/drbd0
mount /dev/drbd0 /tmp/drbd0

Stop all Bacula services:

systemctl stop bacula-dir
systemctl stop bacula-sd
systemctl stop bacula-fd

Copy the Bacula configuration and working files onto the DRBD device:

rsync -aW --no-compress /opt/bacula/ /tmp/drbd0/
ls -l /tmp/drbd0/
total 40
drwx------ 2 bacula root   4096 nov  7 14:52 archive
drwxr-xr-x 2 root   root   4096 oct 30 15:38 bin
drwxr-x--- 2 bacula bacula 4096 nov  7 14:52 bsr
drwxr-xr-x 3 root   root   4096 may 29 11:06 docs
drwxr-x--- 5 bacula bacula 4096 oct 30 15:38 etc
drwxr-xr-x 2 root   root   4096 oct 30 15:38 lib
drwxr-x--- 2 bacula adm    4096 nov  7 12:46 log
drwxr-xr-x 2 root   root   4096 oct 30 15:42 plugins
drwxr-xr-x 3 root   root   4096 oct 30 15:38 scripts
drwxr-x--- 5 bacula bacula 4096 nov 24 11:16 working

Unmount the device so it can be mounted at the original location:

umount /tmp/drbd0

# Be very careful with this kind of command
rm -rf /opt/bacula/*

mount /dev/drbd0 /opt/bacula/
ls -l /opt/bacula/
total 40
drwx------ 2 bacula root   4096 nov  7 14:52 archive
drwxr-xr-x 2 root   root   4096 oct 30 15:38 bin
drwxr-x--- 2 bacula bacula 4096 nov  7 14:52 bsr
drwxr-xr-x 3 root   root   4096 may 29 11:06 docs
drwxr-x--- 5 bacula bacula 4096 oct 30 15:38 etc
drwxr-xr-x 2 root   root   4096 oct 30 15:38 lib
drwxr-x--- 2 bacula adm    4096 nov  7 12:46 log
drwxr-xr-x 2 root   root   4096 oct 30 15:42 plugins
drwxr-xr-x 3 root   root   4096 oct 30 15:38 scripts
drwxr-x--- 5 bacula bacula 4096 nov 24 11:16 working

Of course, the mount should persist across reboots. Normally, this could be done with a line in /etc/fstab; however, when DRBD is managed by Pacemaker or a similar tool, you should instead add the DRBD resource and the mount to the list of resources managed by the cluster, so the device is mounted before any of the dependent services start.

If you use BWeb or another component that requires additional directories, you can still move the information to the same device and use symlinks, or you can create another resource.

# symlink example for bweb

# rsync -aW --no-compress /opt/bweb /opt/bacula/

ls -l /opt/bacula/bweb/
total 32
drwxr-xr-x  2 root root    4096 oct 30 15:38 bin
drwxr-xr-x  2 root root    4096 oct 30 15:38 cgi
drwxr-xr-x  2 root root    4096 oct 30 15:41 etc
drwxr-xr-x 13 root root   12288 oct 30 15:38 html
drwxrwx---  2 root bacula  4096 may 29 11:09 spool
drwxr-xr-x  8 root root    4096 may 29 11:06 tpl

# Remove the original directory and replace it with a symlink pointing to the replicated copy
rm -rf /opt/bweb
ln -s /opt/bacula/bweb /opt/bweb
ls -l /opt/bweb/
total 60
drwxr-xr-x  2 root root    4096 oct 30 15:38 bin
drwxr-xr-x  2 root root    4096 oct 30 15:38 cgi
drwxr-xr-x  2 root root    4096 oct 30 15:41 etc
drwxr-xr-x 13 root root   36864 oct 30 15:38 html
drwxrwx---  2 root bacula  4096 may 29 11:09 spool
drwxr-xr-x  8 root root    4096 may 29 11:06 tpl

At this point, you have a shared device containing all Bacula and BWeb configuration.

If you also wanted to include the Catalog, you could apply the same approach: place the data directory of the database on the replicated device and reach it from the original location through a symlink (alternatively, modify the database configuration so it points directly to the new directory).
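
As a minimal sketch of that idea for a local PostgreSQL Catalog (the paths assume the Ubuntu default locations and are only illustrative; the Catalog in this example is actually remote), it could look like:

systemctl stop postgresql
rsync -aW --no-compress /var/lib/postgresql /opt/bacula/
mv /var/lib/postgresql /var/lib/postgresql.orig
ln -s /opt/bacula/postgresql /var/lib/postgresql
systemctl start postgresql

In a real setup, the database service would then also need to be managed by the cluster, just like the other Bacula services.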

5. Integrate with Pacemaker.

Including a DRBD resource in Pacemaker is common, and it is handled like any other resource agent in the Pacemaker stack.

Bacula cannot work unless the directories and files placed on the DRBD resource are available: services will fail to start on the non-active (secondary) node, or if the shared resource is not properly activated and mounted. As a result, when using DRBD it is also necessary to set up a cluster with Pacemaker or a similar tool.

For more information about Pacemaker and how to use it with Bacula, refer to HA: Clustering Setup. The steps below build on the configuration described in that article.

When DRBD is included in the stack of resources, the service is handled by Pacemaker, so we need to disable it on both nodes:

systemctl disable drbd

In the Pacemaker article, Bacula services such as bacula-dir and BWeb were already managed by Pacemaker. Any other service such as bacula-fd or bacula-sd should be integrated the same way in the resource agent stack once the entire Bacula directory resides on DRBD, because it will not be available on the passive node.
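
For example, assuming the same BaculaDir resource group used in that article, a local File Daemon could be added to the group like this (the resource name is illustrative):

pcs resource create bacula-fd systemd:bacula-fd --group BaculaDir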

Next, create the DRBD resource. This is done with two pcs commands, staged in a file and pushed to the cluster together:

# Save config to a local drbd_cfg file
pcs cluster cib drbd_cfg

# Create the DRBD resource and make it promotable (master/slave). It is important
# to stage both commands in the file before applying them to the cluster
pcs -f drbd_cfg resource create shared_disk ocf:linbit:drbd drbd_resource=r0 op monitor interval=30
pcs -f drbd_cfg resource promotable shared_disk meta promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true

# Push the configuration from the file to the real cluster
pcs cluster cib-push drbd_cfg --config

Now you can verify that the cloned resource was added and that one node is promoted while the other is unpromoted:

pcs resource status
* Resource Group: BaculaDir:
   * bacula-dir      (systemd:bacula-dir):    Started 192.168.1.181
   * VirtualIP       (ocf:heartbeat:IPaddr2):         Started 192.168.1.181
   * bweb    (systemd:bweb):  Started 192.168.1.181
* Clone Set: shared_disk-clone [shared_disk] (promotable):
   * Promoted: [ 192.168.1.181 ]
   * Unpromoted: [ 192.168.1.180 ]

For the next step, install an extra package on Ubuntu that provides the resource agent used to manage filesystems. The previous step handled the DRBD resource itself; now you must mount it at the directory Bacula uses.

apt install resource-agents-extra

Create the filesystem resource using the Filesystem agent:

pcs resource create fs_bacula_dir Filesystem device="/dev/drbd0" directory="/opt/bacula" fstype="ext4" op monitor interval=30s

Verify the updated status:

pcs resource status
* Resource Group: BaculaDir:
   * bacula-dir      (systemd:bacula-dir):    Started 192.168.1.181
   * VirtualIP       (ocf:heartbeat:IPaddr2):         Started 192.168.1.181
   * bweb    (systemd:bweb):  Started 192.168.1.181
* Clone Set: shared_disk-clone [shared_disk] (promotable):
   * Promoted: [ 192.168.1.181 ]
   * Unpromoted: [ 192.168.1.180 ]
* fs_bacula_dir      (ocf:heartbeat:Filesystem):      Started 192.168.1.181

Finally, add ordering constraints so DRBD is promoted first, then the filesystem is mounted, and then the Bacula services, BWeb, and the virtual IP are started:

# First DRBD, then Filesystem
pcs constraint order start shared_disk-clone then fs_bacula_dir
Adding shared_disk-clone fs_bacula_dir (kind: Mandatory) (Options: first-action=start then-action=start)

# The group containing services and virtual IP should be run just after the filesystem
pcs constraint order start fs_bacula_dir then BaculaDir
Adding fs_bacula_dir BaculaDir (kind: Mandatory) (Options: first-action=start then-action=start)
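
Depending on how the cluster was built in the previous article, you may also want colocation constraints, so that the filesystem is only mounted where DRBD is promoted and the service group always runs with the filesystem. A sketch with recent pcs syntax (older versions call the Promoted role Master):

pcs constraint colocation add fs_bacula_dir with Promoted shared_disk-clone INFINITY
pcs constraint colocation add BaculaDir with fs_bacula_dir INFINITY

# Optionally, order the mount after the promotion instead of after the start:
# pcs constraint order promote shared_disk-clone then start fs_bacula_dir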

With this, the DRBD setup is complete and integrated into the Pacemaker cluster.

Go back to: High Availability.