High Availability: Foundation


The high-availability (HA) strategy for Bacula is built upon a set of foundational principles designed to provide flexibility, resilience, and operational efficiency.

Architectural Flexibility

Bacula Enterprise prioritizes flexibility as a core design value. It does not mandate a built-in HA solution for any component. Instead, it integrates seamlessly with industry-standard HA technologies, particularly within the Linux ecosystem, allowing organizations to leverage existing infrastructure and expertise.

Software Flexibility

Bacula is technology-agnostic when it comes to implementing HA. While different industries may favor specific HA architectures (e.g., Pacemaker, DRBD, ZFS replication), any software solution that provides equivalent service can be used. This ensures that the HA implementation aligns with organizational preferences and operational standards.

Stateless

The HA solutions described in this document are stateless across cluster nodes. Consequently, in-flight operations, such as running backups, copy jobs, or restores, must be restarted after a failover. This behavior can be fully automated to minimize operational disruption, ensuring consistent completion of critical tasks.
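As a minimal sketch of how the restart step could be automated, the Python snippet below builds one bconsole `run` command for each job that had not finished at failover time. It is only an illustration: the `JobRecord` type and the `rerun_commands` helper are hypothetical, the job names are made up, and how the job list is obtained (for example from the Catalog) is left out. It assumes the Bacula job status codes `R` (running) and `C` (created, not yet running), and it covers backup-style jobs only; a restore would have to be resubmitted with its original file selection.

```python
from dataclasses import dataclass

# Job status codes treated as "not finished" in this sketch:
# "R" = running, "C" = created but not yet running.
UNFINISHED = {"R", "C"}


@dataclass
class JobRecord:
    """Hypothetical record of a job as known before the failover."""
    name: str
    status: str


def rerun_commands(jobs):
    """Return one bconsole `run` command per unfinished job name, in order."""
    seen = set()
    commands = []
    for job in jobs:
        if job.status in UNFINISHED and job.name not in seen:
            seen.add(job.name)
            # "yes" skips the interactive confirmation prompt.
            commands.append(f'run job="{job.name}" yes')
    return commands


if __name__ == "__main__":
    # Made-up job list for illustration only.
    interrupted = [
        JobRecord("BackupClient1", "R"),
        JobRecord("BackupCatalog", "T"),  # finished OK, not restarted
        JobRecord("Nightly", "C"),
    ]
    for command in rerun_commands(interrupted):
        print(command)
```

The resulting commands could then be passed to bconsole on the node that takes over, once the Director is running there.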

Shared Stack Principle

When deploying HA, it is recommended to reuse approaches and technologies consistent with the organization’s existing systems. Aligning Bacula’s HA strategy with the HA methodology used for other enterprise applications enables better reuse of expertise, streamlined operations, and reduced learning curves.

Restore Focus

Bacula Systems emphasizes that the primary goal of HA is to enable reliable restore operations during disruptions. The focus is on minimizing Recovery Time Objectives (RTOs) rather than achieving continuous 99.9% availability of the backup service itself. This approach ensures that business-critical data remains protected without introducing unnecessary operational complexity.
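To put an availability percentage in perspective, it can be converted into the downtime it permits over a period. The short Python sketch below does this arithmetic; the function name `allowed_downtime_hours` is only illustrative, and a year is assumed to have 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime (in hours) permitted by an availability target."""
    return (1 - availability_percent / 100) * period_hours


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

A 99.9% target, for instance, leaves roughly 8.76 hours of downtime per year, which is why an RTO measured in hours can be a reasonable goal for a backup service.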

Availability vs. Complexity

Higher levels of availability inevitably come with increased architectural complexity, resource requirements, and costs. Bacula Systems advocates for a pragmatic balance, prioritizing production continuity and data protection while avoiding excessive infrastructure overhead. The recommended HA strategy is therefore proportionate to operational risk and organizational priorities.

Foundation Protection

The most basic way to recover a Bacula environment from a disaster is to back up the Catalog and the configuration used by the Bacula Director.

From here, each step towards shorter downtime adds a layer of complexity that needs to be managed. More information about disaster recovery that is not directly related to HA can be found in the Disaster Recovery section.

Solution Comparison

Max. downtime	Solution	Notes
< 5 mins	High Availability cluster and database physical block level replication	Requires experience with clustering technologies such as HACMP, Heartbeat, Pacemaker, DRBD, etc.
< 5 mins	High Availability cluster and database logical block level replication	Requires experience with clustering technologies such as HACMP, Heartbeat, Pacemaker, and also Patroni, Keepalived, etcd, etc.
1-3 hours	Spare hardware and database replication	Requires a clear procedure to restore Bacula and the use of PostgreSQL internal replication.
< 12 hours	Spare hardware and database restore	Requires a clear procedure to restore Bacula and to restore the PostgreSQL Catalog from the last backup.
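One way to read the table is to start from the downtime the organization can tolerate and pick the least complex option that still fits. The Python sketch below encodes the four rows, with the upper bound of each downtime range in minutes, and does exactly that. The helper names `candidates` and `simplest` are illustrative, and treating the last matching row as the simplest is an assumption based on the order of the table.

```python
# (maximum downtime in minutes, solution), in the same order as the table:
# most complex first, simplest last.
SOLUTIONS = [
    (5, "High Availability cluster and database physical block level replication"),
    (5, "High Availability cluster and database logical block level replication"),
    (180, "Spare hardware and database replication"),
    (720, "Spare hardware and database restore"),
]


def candidates(rto_minutes):
    """Return every solution whose maximum downtime fits within the RTO."""
    return [name for max_downtime, name in SOLUTIONS if max_downtime <= rto_minutes]


def simplest(rto_minutes):
    """Return the least complex solution that meets the RTO, or None."""
    fitting = candidates(rto_minutes)
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for rto in (10, 240, 1440):
        print(f"RTO {rto} min -> {simplest(rto)}")
```

With a tolerated downtime of four hours, for example, the sketch selects spare hardware with database replication instead of a full HA cluster.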

If you lose your Catalog server, all records of jobs that ran after the last Catalog backup are lost. Keeping the job emails and Bootstrap files is sufficient to restore files, but it is not very convenient. To avoid this problem, you can use PostgreSQL Continuous Archiving and take binary Catalog backups instead of using the default SQL dump procedure. See https://www.postgresql.org/docs/current/continuous-archiving.html for more information.
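As a rough illustration of what enabling continuous archiving involves, the `postgresql.conf` fragment below follows the example given in the PostgreSQL documentation linked above. The archive directory `/mnt/server/archivedir` is the placeholder path used in that documentation, not a Bacula default, and it should be replaced with a location that survives the loss of the Catalog server.

```
# postgresql.conf (fragment): enable WAL archiving
wal_level = replica
archive_mode = on
# Copy each completed WAL segment to the archive, refusing to overwrite an existing file.
# /mnt/server/archivedir is a placeholder path.
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```

Changing `archive_mode` requires a PostgreSQL restart. A base backup, for example one taken with `pg_basebackup`, is also needed so that the archived WAL segments can be replayed on top of it during recovery.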