Verification Criteria
Bacula Enterprise can be configured to compare or to ignore different parts of a file’s metadata during Verify Jobs. This configuration affects all variants of Bacula’s Verify Jobs.
It is important to be aware that the actual contents of a file are never compared directly; instead, cryptographic checksums can be generated when a file is read, and these are stored along with the other metadata in the catalog database. We will discuss those checksums in the section Checksums.
The complete set of metadata stored in the catalog, and available for verification purposes, is described in detail in the Bacula Enterprise manual. 4 File ownership, permissions, and times are available. Note that permissions only cover the basic Unix file system permission bits; in particular, ACLs as used by Linux, Unix, or Windows file systems are not included. 5
FileSet {
  Name = LinuxSys
  Include {
    Options {
      Verify = pnugsicm1
    }
    File = /sbin
    File = /usr/sbin
    File = /boot
    File = /etc
    File = /lib
  }
}
Configuring which data to use during verification is done in the FileSet resource, more precisely in the Options sub-resources of the Include part of each FileSet. A FileSet might look like the one shown above, where rather paranoid Verify options are specified. In this example, the SHA1 checksum is used; using MD5 instead, for example to save some of the client's CPU cycles, would be achieved by replacing the 1 with a 5. SHA1 and MD5 are mutually exclusive: only one of them can be used, while all the other option letters may be freely added (or concatenated).
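To actually run a verification against this FileSet, a Job of type Verify referencing it is needed. The following is only a minimal sketch: the job, client, storage, pool, and messages names are assumptions and have to match the resources defined in your own configuration.

Job {
  Name = "VerifyLinuxSys"
  Type = Verify
  Level = Catalog          # compare the client's current files against the catalog baseline
  Client = linux1-fd       # assumed client name
  FileSet = LinuxSys
  Storage = File           # assumed storage resource
  Pool = Default           # assumed pool
  Messages = Standard
}

Running the same job once with Level = InitCatalog (for example via "run job=VerifyLinuxSys level=InitCatalog" in bconsole) records the reference data; subsequent Catalog runs report any deviation from it.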
Bacula Systems recommends always including ownership, permissions, size, inode, and creation time metadata as well as a cryptographic checksum for security-related verification (i.e. InitCatalog / Catalog verifies), and ownership, permissions, and a checksum for integrity-related verification.
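Expressed as Verify option strings, this recommendation might look roughly as follows. The letter assignments (u and g for ownership, p for permissions, s for size, i for inode, c for the inode change time st_ctime, and 1 or 5 for the checksum) follow the FileSet Options documentation referenced above; the concrete strings below are one possible reading of the recommendation, not a prescribed setting, and the two variants would live in two separate FileSets.

# security-related verification (InitCatalog / Catalog verifies)
Options {
  Verify = pugsic1     # ownership, permissions, size, inode, ctime, SHA1
}

# integrity-related verification
Options {
  Verify = pug1        # ownership, permissions, SHA1
}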
Checksums
Cryptographic checksums are an efficient way to verify data identity without doing a byte-by-byte comparison. The underlying idea is to use a hash function that generates a checksum considerably shorter than the full data. Because the checksum is shorter than the input data, some distinct inputs will necessarily map to the same checksum. The art of choosing the right checksumming algorithm is to pick one that will generate a different checksum for small as well as large changes to the input data. Bacula supports two checksumming algorithms that were both developed for use in cryptographic applications and, as such, are well understood and known to minimize the risk of hash collisions.
MD5 is the older of the two, creating a hash of 128 bits. Even though attacks are possible today, it is still considered reasonably safe for purposes of data integrity checking.
SHA1 is a newer algorithm, generating a checksum of 160 bits. SHA1 is more CPU-intensive, but reduces the risk of different data producing identical hashes, so it is considered more secure for data integrity purposes.
When using either of these algorithms on a very large number of files, hash collisions (different files with identical checksums) may happen. Adding other metadata, such as file size, file times, or inode number, as verification criteria reduces the risk of incorrectly identifying files. For data integrity verification this is less of a problem, but it needs to be considered when using Data Deduplication with Base Jobs in Bacula Enterprise and using the same file sets for both purposes. 6
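As a rough, back-of-the-envelope estimate (not taken from the Bacula documentation), the probability of at least one accidental collision among k files hashed with an n-bit function follows the birthday approximation

P(collision) ≈ 1 - exp(-k(k-1) / 2^(n+1)) ≈ k^2 / 2^(n+1)

so for MD5 (n = 128) and roughly a billion files (k ≈ 2^30) it is on the order of 2^-69, and smaller still for SHA1 (n = 160). Accidental collisions are therefore a theoretical rather than a practical concern; deliberately constructed collisions, which are feasible against MD5, are the main reason SHA1 is considered the more conservative choice.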
- 4
See chapter 12.7, pp. 139f of the published manual for Bacula Enterprise version 4.0.4
- 5
ACLs are backed up and restored, but they are not stored in Bacula's catalog; from Bacula's point of view, they are just a separate part of a file's data.
- 6
Base Jobs are not covered in this document. Refer to Bacula Systems if you are interested in using this file-based deduplication technology.