High Availability: Foundation
The high-availability (HA) strategy for Bacula is built upon a set of foundational principles designed to provide flexibility, resilience, and operational efficiency.
Architectural Flexibility
Bacula Enterprise prioritizes flexibility as a core design value. It does not mandate a built-in HA solution for any component. Instead, it integrates seamlessly with industry-standard HA technologies, particularly within the Linux ecosystem, allowing organizations to leverage existing infrastructure and expertise.
Software Flexibility
Bacula is technology-agnostic when it comes to implementing HA. While different industries may favor specific HA architectures (e.g., Pacemaker, DRBD, ZFS replication), any software solution that provides equivalent service can be used. This ensures that the HA implementation aligns with organizational preferences and operational standards.
Stateless
The HA solutions described in this document are stateless across cluster nodes: the runtime state of jobs is not shared between nodes. Consequently, in-flight operations such as running backups, copy jobs, or restores must be restarted after a failover. These restarts can be fully automated to minimize operational disruption and ensure that critical tasks complete.
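Restarting interrupted jobs can be automated in many ways; the sketch below only illustrates the selection step. It assumes a hypothetical list of job records read from the Catalog after failover, with hypothetical `kind` and `state` fields (not actual Catalog columns), and picks every job that was still running when the active node failed. Putting restores first is an assumption chosen to match the restore-focused goal of HA. Resubmitting the jobs to the Director (for example with the bconsole `run` command) is left out.

```python
from dataclasses import dataclass

# Hypothetical job record: field names are illustrative, not Bacula Catalog columns.
@dataclass
class JobRecord:
    name: str
    kind: str   # "restore", "backup" or "copy"
    state: str  # "running", "completed" or "failed"

# Assumed restart priority: restores first, then backups, then copy jobs.
PRIORITY = {"restore": 0, "backup": 1, "copy": 2}

def jobs_to_restart(jobs: list[JobRecord]) -> list[str]:
    """Return the names of jobs interrupted by the failover, restores first."""
    interrupted = [j for j in jobs if j.state == "running"]
    # sorted() is stable, so jobs of the same kind keep their original order.
    interrupted = sorted(interrupted, key=lambda j: PRIORITY.get(j.kind, len(PRIORITY)))
    return [j.name for j in interrupted]

if __name__ == "__main__":
    jobs = [
        JobRecord("NightlyFS", "backup", "running"),
        JobRecord("CopyToTape", "copy", "running"),
        JobRecord("RestoreHR", "restore", "running"),
        JobRecord("NightlyDB", "backup", "completed"),
    ]
    print(jobs_to_restart(jobs))  # ['RestoreHR', 'NightlyFS', 'CopyToTape']
```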
Restore Focus
Bacula Systems emphasizes that the primary goal of HA is to enable reliable restore operations during disruptions. The focus is on minimizing Recovery Time Objectives (RTOs) rather than on achieving continuous 99.9% availability of the backup service itself. This approach keeps business-critical data protected without introducing unnecessary operational complexity.
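For scale, assuming a 365-day year, 99.9% availability still allows about 8.76 hours of cumulative downtime per year:

```latex
(1 - 0.999) \times 365 \times 24\ \text{h} = 0.001 \times 8760\ \text{h} = 8.76\ \text{h}
```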
Availability vs. Complexity
Higher levels of availability inevitably come with increased architectural complexity, resource requirements, and costs. Bacula Systems advocates for a pragmatic balance, prioritizing production continuity and data protection while avoiding excessive infrastructure overhead. The recommended HA strategy is therefore proportionate to operational risk and organizational priorities.
Foundation Protection
The most basic way to be able to recover a Bacula environment after a disaster is to back up both the Catalog and the configuration used by the Bacula Director.
Starting from this baseline, each step towards shorter downtime adds a layer of complexity that must be managed. More information about disaster recovery, which is not directly related to HA, can be found in the Disaster Recovery section.
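As a minimal illustration of this baseline, the sketch below only builds the two commands involved: a PostgreSQL dump of the Catalog and an archive of the Director configuration. It assumes a Catalog database named `bacula` and a configuration directory of `/opt/bacula/etc`; both are assumptions to adapt to the local installation, and `/var/backups/bacula` is a hypothetical destination. The commands are printed, not executed.

```python
from pathlib import Path

# Assumed values for illustration only: adjust to the local installation.
CATALOG_DB = "bacula"                         # assumed Catalog database name
DIRECTOR_CONF_DIR = Path("/opt/bacula/etc")   # assumed Director configuration directory

def catalog_dump_cmd(db_name: str, out_file: Path) -> list[str]:
    """pg_dump in custom format (-Fc), which can later be restored with pg_restore."""
    return ["pg_dump", "-Fc", "-f", str(out_file), db_name]

def config_archive_cmd(conf_dir: Path, out_file: Path) -> list[str]:
    """gzip-compressed tar of the configuration directory, stored with a relative path."""
    return ["tar", "-czf", str(out_file), "-C", str(conf_dir.parent), conf_dir.name]

if __name__ == "__main__":
    backup_dir = Path("/var/backups/bacula")  # hypothetical destination
    for cmd in (
        catalog_dump_cmd(CATALOG_DB, backup_dir / "catalog.dump"),
        config_archive_cmd(DIRECTOR_CONF_DIR, backup_dir / "director-conf.tar.gz"),
    ):
        # Printed only; pass each list to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```

Whatever tooling is used, the resulting files are only useful for disaster recovery if copies are kept outside the Director host.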
Solution Comparison
| Max. downtime | Solution | Notes |
|---|---|---|
| < 5 mins | High Availability cluster and database physical block-level replication | Requires experience with clustering technologies such as HACMP, HeartBeat, Pacemaker, DRBD, etc. |
| < 5 mins | High Availability cluster and database logical block-level replication | Requires experience with clustering technologies such as HACMP, HeartBeat, and Pacemaker, and also with Patroni, Keepalived, etcd, etc. |
| 1-3 hours | Spare hardware and database replication | Requires a clear procedure to restore Bacula; uses PostgreSQL's built-in replication. |
| < 12 hours | Spare hardware and database restore | Requires a clear procedure to restore Bacula; the PostgreSQL Catalog is restored from the last backup. |
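The table can be read as a simple decision rule: given a target maximum downtime (RTO), choose the least complex solution that still meets it. The sketch below encodes the table under two assumptions: "1-3 hours" is taken at its upper bound of 180 minutes, and complexity is assumed to decrease as the tolerated downtime grows, in line with the Availability vs. Complexity principle. Solutions with the same downtime are all returned, since the table does not rank them.

```python
# Upper bounds of the "Max. downtime" column, in minutes ("1-3 hours" taken as 180).
SOLUTIONS = [
    ("HA cluster + physical block-level DB replication", 5),
    ("HA cluster + logical DB replication", 5),
    ("Spare hardware + database replication", 180),
    ("Spare hardware + database restore", 12 * 60),
]

def least_complex_options(rto_minutes: int) -> list[str]:
    """Solutions that meet the RTO, keeping only those with the longest allowed downtime.

    Assumes complexity decreases as the tolerated downtime grows.
    """
    fits = [(name, down) for name, down in SOLUTIONS if down <= rto_minutes]
    if not fits:
        return []  # no listed solution meets this RTO
    longest = max(down for _, down in fits)
    return [name for name, down in fits if down == longest]

if __name__ == "__main__":
    for rto in (2, 5, 240, 24 * 60):
        print(rto, least_complex_options(rto))
```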
If you lose your Catalog server, all records of jobs that ran after your previous Catalog backup will be lost. Keeping track of job report emails and Bootstrap files is sufficient to restore files, but it is not very convenient. To avoid this problem, you can use the PostgreSQL Continuous Archiving option and perform binary Catalog backups instead of the default SQL dump procedure. See https://www.postgresql.org/docs/current/continuous-archiving.html for more information.
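To make the exposure concrete, the sketch below compares which job records would be missing after restoring the Catalog from a nightly SQL dump versus recovering it with continuous archiving, where recovery can reach the last archived WAL segment. All job names and timestamps are hypothetical, and it simplifies by treating a job record as recoverable only if the job ended before the restore point.

```python
from datetime import datetime

def lost_job_records(restore_point: datetime, job_end_times: dict[str, datetime]) -> list[str]:
    """Jobs whose Catalog records are missing if the Catalog can only be
    recovered up to restore_point (simplified: jobs ending after it are lost)."""
    return sorted(name for name, end in job_end_times.items() if end > restore_point)

if __name__ == "__main__":
    # Hypothetical scenario: the Catalog server is lost at 18:00.
    dump_time = datetime(2024, 5, 6, 6, 0)    # nightly SQL dump
    last_wal = datetime(2024, 5, 6, 17, 55)   # last archived WAL segment
    jobs = {
        "NightlyFS": datetime(2024, 5, 6, 3, 30),
        "Midday": datetime(2024, 5, 6, 12, 15),
        "Evening": datetime(2024, 5, 6, 17, 40),
    }
    print("SQL dump only:", lost_job_records(dump_time, jobs))      # ['Evening', 'Midday']
    print("with WAL archiving:", lost_job_records(last_wal, jobs))  # []
```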