Catalyst Storage Plugin
Catalyst Storage Plugin, also known as Bacula Catalyst FileSystem (BCFS), is a filesystem client for HPE StoreOnce Catalyst Stores.
BCFS enables the mounting of a StoreOnce Catalyst Store as a filesystem specifically for the storage of Bacula’s backups. It offers the advantage of supporting Immutability, which is not available with StoreOnce NAS Share. Note that BCFS does not fully comply with POSIX standards; further details can be found in the section titled Limits of the BCFS below.
By default, access to the filesystem is restricted to the user who mounted it, and using the chown command to alter this access is not advised.
For best results, use the setuid=bacula option so that BCFS runs as the bacula user (not as root), and make sure that the mountpoint is owned by the bacula user.
Synopsis
To mount a filesystem:
bcfs <CONFIG-FILE> <mountpoint> [options]
The <CONFIG-FILE> file holds the Catalyst’s configuration parameters; see Configuration file below for the format and the parameters.
To unmount it:
fusermount3 -u <mountpoint> # Linux
umount <mountpoint> # OS X, FreeBSD or recent Linux
Options
- -o opt,[opt…]
mount options, see below for details. A variety of FUSE options can be given here as well. The allow_root option is forced by the program, and you must add -o direct_io to take full advantage of the fixed mode. See Mounting from fstab for more options.
- -h, --help
print help and exit.
- -V, --version
print version information and exit.
- -d, --debug
print debugging information.
- -v, --verbose
print debugging information.
- -f
do not daemonize, stay in foreground.
- -s
Single threaded operation.
- -o logfile=<PATH>
to specify the location of the BCFS log file (default is stderr).
- -o loglevel=<LEVEL>
to specify the level of the BCFS log file. Valid values are: debug, info, warning and error (default is error).
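The synopsis and options above can be combined programmatically. The following is a minimal sketch, not part of BCFS: the helper name build_bcfs_command is hypothetical, and the paths are taken from the examples elsewhere on this page. It only assembles the argument list and does not run anything.

```python
import shlex

def build_bcfs_command(config_file, mountpoint, options=(), foreground=False):
    """Assemble the argv for `bcfs <CONFIG-FILE> <mountpoint> [options]`.

    Hypothetical helper for illustration; it does not execute bcfs.
    """
    argv = ["bcfs", config_file, mountpoint]
    if foreground:
        argv.append("-f")  # -f: do not daemonize, stay in foreground
    if options:
        # -o takes a comma-separated list of mount options
        argv += ["-o", ",".join(options)]
    return argv

argv = build_bcfs_command(
    "/etc/bcfs.conf",
    "/backup",
    options=["setuid=bacula", "direct_io", "logfile=/tmp/bcfs.log", "loglevel=error"],
)
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run on a host where bcfs is installed.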
Configuration file
The configuration file, typically referred to as bcfs.conf, is structured as follows:
#
# Connection section
#
address = 10.0.99.242
store_name = test2
cmd_port = 9387
data_port = 9388
user = Admin
password = MySecretPassword
#
# Logging section
#
catalyst_log_file = /tmp/bcatalystfs.log
catalyst_log_size_mb = 50
# catalyst_log_level = debug|trace|info|quiet|error
catalyst_log_level = error
#
# Rule Section
#
rule_20 = fixed=4096:*.add
rule_50 = variable:*
Exercise caution, as this file contains a password and must be accessible by the BCFS process.
If you utilize the setuid=UID option when initializing the filesystem, ensure that the user with the specified UID has the necessary permissions to access this file.
This file is formatted as a .ini file. You may initiate a comment by placing a # (Number Sign) at the start of a line. Quote characters are treated like any other character; therefore, do not enclose your strings in quotes. The = (Equal) sign serves as a delimiter between the key and the value. Any spaces at the beginning or end of the key or value will be disregarded, while spaces within the values are considered significant.
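The format rules above can be illustrated with a small reader. This is a sketch under stated assumptions, not the parser BCFS uses: it assumes the key ends at the first = sign (the rule values shown above contain further = signs), it tolerates leading spaces before #, and it silently skips lines without a delimiter. The function name parse_bcfs_conf and the quoted sample password are hypothetical.

```python
def parse_bcfs_conf(text):
    """Parse bcfs.conf-style text into an ordered list of (key, value) pairs."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        # Assumption: the first '=' separates key and value.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no delimiter: ignored in this sketch
        # Surrounding spaces are dropped; quotes are kept as ordinary characters.
        pairs.append((key.strip(), value.strip()))
    return pairs

sample = """
# Connection section
address = 10.0.99.242
store_name = test2
password = "not what you want"
rule_20 = fixed=4096:*.add
rule_50 = variable:*
"""

for key, value in parse_bcfs_conf(sample):
    print(f"{key!r} -> {value!r}")
```

A list of pairs is returned rather than a dictionary so that the order in which the rules were written is preserved. Note how the quoted password keeps its quote characters, which is why quoting values is discouraged.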
The recognized fields are:
- address
This is the IP address or FQDN of your Catalyst Server (MANDATORY).
- store_name
This is the name of your Catalyst Store (MANDATORY).
- cmd_port
This is the command session port number, default is 9387.
- data_port
This is the data session port number, default is 9388.
- user
This is the user name used to connect to the Store of the Catalyst Server (MANDATORY).
- password
This is the password of the user above (MANDATORY).
- catalyst_log_file
This is the name of your local Catalyst log file. This log file is handled by the Catalyst API (MANDATORY). It is different from the FUSE log file; see Logging below for more.
- catalyst_log_size_mb
This is the maximum size of the Catalyst log file in MB. Do not specify the MB unit! When the file reaches the limit, the beginning of the file is overwritten; this is how the Catalyst log file works. Default is 50MB.
- catalyst_log_level
This is the log level of the Catalyst log file. From the most to the least verbose, you can use: debug, trace, info, quiet, error. Default is error.
- rule_XXX
Keys starting with rule_ are for rules that define the deduplication mode according to the pattern of the file name; see Deduplication Rules below.
Note
The BCFS daemon has its own log file that is controlled by the command line parameters.
Deduplication Rules
The XXX part in the rule_XXX has no other purpose than uniquely identifying the rule.
It is not used to define any priority order.
Each rule consists of two components separated by a : (colon).
The first part is the deduplication mode, and the second part is a shell globbing pattern for file names.
See the Unix glob(7) manpage for more.
The first part takes one of two forms:
- variable
meaning that the Catalyst will use variable block deduplication for the files matching the pattern.
- fixed=SIZE
meaning that the Catalyst will use fixed block deduplication for the files matching the pattern. SIZE is the size in bytes of the blocks and must be between OSCMN_MINIMUM_FIXED_CHUNK_SIZE and MaximumFixedChunkSize. These two values are displayed in the log file during the initialization of the BCFS client and can also be found in the sys/version special file. The minimum value is hardcoded in the API at 2048 (2KiB), and the maximum value is set at 8192 (8KiB) on our Catalyst test server. If the size is outside these limits, the error message displays them. The recommended value is 4096. Use the fixed mode only for files that will be accessed on an aligned block boundary, otherwise file input/output errors will occur.
Filenames matching the second part use the deduplication mode defined in the first part. Rules are applied in the order they are defined: the first one that matches applies. If no rule applies, the default is variable.
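The first-match behavior described above can be sketched as follows. This is an illustration only, not the BCFS implementation: the function names are hypothetical, and Python's fnmatch is used as a stand-in for shell globbing, which may differ from glob(7) in corner cases. A chunk size of 0 is used here for the variable mode.

```python
from fnmatch import fnmatchcase

def parse_rule(value):
    """Split 'MODE:PATTERN' into (mode, chunk_size, pattern)."""
    mode, _, pattern = value.partition(":")  # the first ':' ends the mode
    if mode == "variable":
        return ("variable", 0, pattern)
    if mode.startswith("fixed="):
        return ("fixed", int(mode[len("fixed="):]), pattern)
    raise ValueError(f"unknown deduplication mode: {mode!r}")

def dedup_mode(filename, rules):
    """Return (mode, chunk_size) for filename; the first matching rule wins."""
    for value in rules:
        mode, size, pattern = parse_rule(value)
        if fnmatchcase(filename, pattern):
            return (mode, size)
    return ("variable", 0)  # default when no rule applies

# Rule values in the order they appear in the configuration file.
rules = ["fixed=4096:*.add", "variable:*"]
print(dedup_mode("Vol0001.add", rules))
print(dedup_mode("Vol0001", rules))
```

Because evaluation stops at the first match, a catch-all pattern such as * only makes sense as the last rule.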
Examples
When using the Aligned Plugin and storing your Data and Metadata volumes on the BCFS mount point, it is advisable to deduplicate the Data part of the volume that has the .add extension and not deduplicate the metadata volume using these rules:
rule_20 = fixed=4096:*.add
rule_50 = variable:*
4096 is the value recommended by HPE support. When using Bacula without the Aligned Plugin, it is advisable to use Variable Block Deduplication for all the volumes that need deduplication. By prefixing these volumes with the identifier “DedupVolume”, the following rules can be applied:
rule_20 = variable:DedupVolume*
rule_50 = variable:*
Important
Remember that this is globbing, not regex patterns.
Usage
Logging
BCFS generates two distinct log files:
- The first log file is managed by the Catalyst library and is configured through the catalyst_log_file, catalyst_log_level, and catalyst_log_size_mb parameters within the configuration file. This log operates as a circular buffer, which renders the use of the tail command ineffective. Additionally, it contains a binary header of a few bytes.
- The second log file is of greater significance as it captures the BCFS logs. Its location is configured using -o logfile=PATH, and it should be kept at the error level, with the option to raise the level for more information. The level can be set using -o loglevel=<LEVEL>, where the permissible levels are debug, info, warning, and error.
It is crucial to ensure that the BCFS process is able to write to these log files, particularly when using the setuid=UID option.
Mounting from fstab
The BCFS can be mounted automatically from /etc/fstab, using the following line:
/etc/bcfs.conf /backup fuse.bcfs setuid=bacula,direct_io,nodev,noexec,noatime,loglevel=error,logfile=/tmp/bcfs.log 0 0
The first parameter is the configuration file. Notice that the file system type is bcfs (for backwards compatibility, you may also use fuse.bcfs).
The bcfs binary must be in the system path; it is usually installed into /usr/bin.
The setuid=bacula option changes the user ID of the process to bacula. If this adjustment is made, it is essential to ensure that the bacula user possesses the necessary read and write permissions for the mount point.
Additionally, it is crucial to confirm that the /sbin/nologin shell is not set for user bacula:
# grep bacula /etc/passwd /etc/shadow
/etc/passwd:bacula:x:990:986:Bacula:/opt/bacula/working:/bin/sh
/etc/shadow:bacula:!!:19550::::::
You can use the chsh command to change the shell and the passwd command to lock the account:
# chsh -s /bin/sh bacula
# passwd -l bacula
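The shell check above can also be scripted. The sketch below is an assumption-labeled illustration: the function names are hypothetical, and it works on a single /etc/passwd entry passed in as text (the sample entry is the one shown above) rather than reading the system database.

```python
def login_shell(passwd_line):
    """Return the shell field (the 7th colon-separated field) of a passwd entry."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields")
    return fields[6]

def has_usable_shell(passwd_line):
    """True when the account's shell is not a nologin shell."""
    return not login_shell(passwd_line).endswith("/nologin")

entry = "bacula:x:990:986:Bacula:/opt/bacula/working:/bin/sh"
print(login_shell(entry), has_usable_shell(entry))
```

On a live system, the standard pwd module (pwd.getpwnam("bacula").pw_shell) gives the same field without parsing the file by hand.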
The direct_io option disables the internal FUSE buffering and transmits all of Bacula’s I/O to BCFS.
The nodev and noexec options are recommended for enhancing security, while the noatime option prevents unnecessary updates. For the loglevel and logfile options, see Logging above.
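To make the structure of the fstab entry explicit, the following sketch splits the line shown in this section into its six fields and its individual mount options. The function name parse_fstab_line and the dictionary keys are hypothetical; the sketch assumes whitespace-separated fields without escaped spaces.

```python
def parse_fstab_line(line):
    """Split one fstab entry into named fields and a dict of mount options."""
    fs_spec, mountpoint, fs_type, options, dump, passno = line.split()
    opts = {}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        opts[key] = value if sep else None  # flags such as direct_io carry no value
    return {
        "config_file": fs_spec,   # for BCFS the first field is the configuration file
        "mountpoint": mountpoint,
        "type": fs_type,
        "options": opts,
        "dump": int(dump),
        "pass": int(passno),
    }

line = ("/etc/bcfs.conf /backup fuse.bcfs "
        "setuid=bacula,direct_io,nodev,noexec,noatime,"
        "loglevel=error,logfile=/tmp/bcfs.log 0 0")
entry = parse_fstab_line(line)
print(entry["config_file"], entry["mountpoint"], entry["type"])
print(sorted(entry["options"]))
```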
Querying the extra attributes
Extra information can be retrieved from the files themselves by using xattr.
The Catalyst tags will show a true value when present:
$ xattr -l /catalyst/foobar
dedup: var
chunksize: 0
Complete: true
BaculaEnterprise: true
- dedup
can be var or fixed depending on the rules you have set up in the Configuration file.
- chunksize
when dedup is set to fixed, this is the size you have set up in the configuration file, else it is 0.
- Complete or Incomplete
are tags handled by the Catalyst. Complete is synonymous with read-only, while Incomplete is synonymous with read/write.
- BaculaEnterprise
is a tag that is added to every object created by the BCFS. A file without this tag has been created by another application.
If xattr is not available, you can try getfattr:
$ getfattr -d -m ".*" /catalyst/foobar
getfattr: Removing leading '/' from absolute path names
# file: catalyst/foobar
BaculaEnterprise="true"
Complete="true"
chunksize="0"
dedup="var"
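For scripting, the getfattr output above can be turned into a dictionary. This is a minimal sketch with hypothetical function names; it assumes the name="value" layout shown in the sample and does not handle the escaping getfattr may apply to unusual values.

```python
import re

# One attribute per line, in the form name="value".
_ATTR = re.compile(r'^([^=#\s]+)="(.*)"$')

def parse_getfattr(output):
    """Collect name="value" lines from `getfattr -d` output into a dict."""
    attrs = {}
    for line in output.splitlines():
        match = _ATTR.match(line.strip())
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs

def access_state(attrs):
    """Map the Complete/Incomplete tags to the state described above."""
    if attrs.get("Complete") == "true":
        return "read-only"
    if attrs.get("Incomplete") == "true":
        return "read/write"
    return "unknown"

sample = '''getfattr: Removing leading '/' from absolute path names
# file: catalyst/foobar
BaculaEnterprise="true"
Complete="true"
chunksize="0"
dedup="var"
'''
attrs = parse_getfattr(sample)
print(attrs["dedup"], attrs["chunksize"], access_state(attrs))
```

On Linux, os.listxattr and os.getxattr from the standard library can read the same attributes directly from a mounted file.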
Querying the sys/ filesystem
The sys/ sub-directory comprises a set of virtual files that serve as an interface to the BCFS internals.
- command_arguments
holds the arguments on the command line at the time BCFS was started:
bcfs -oallow_root -osubtype=bcfs,fsname=/etc/bcfs.conf -o rw,direct_io,nodev,noexec,noatime,nosuid /catalyst
- configuration
holds the configuration (the password line has been removed):
address = 10.0.99.242
user = Admin
cmd_port = 9387
data_port = 9388
store_name = test2
catalyst_log_file = /tmp/bcatalystfs.log
catalyst_log_size_mb = 50
# catalyst_log_level = debug|trace|info|quiet|error
catalyst_log_level = error
rule_10 = variable:*.add
#rule_20 = fixed=4096:*.quatre
rule_90 = variable:*
- version
holds the BCFS version and information from the Catalyst’s library and server:
version: 1.0.1
fuse_version: 29
fuse_package_version: NA
read_bandwidth: high
write_bandwidth: low
OSCMN_MINIMUM_FIXED_CHUNK_SIZE: 2048
MaximumFixedChunkSize: 8192
ClientSoftwareVersion: RHEL_x64_2022-04-28T12.28.723_4.3.2-2217.20
ServerSerialNumber: 17TLGM2FH78D9WAM
ServerSoftwareVersion: 4.3.6-2323.20
DiskCapacity: 882079956992
SupportIsvPerObjectImmutability: 1
MaximumDataSessions: 256
FreeDataSessions: 61
- cmd_connections
holds information about recycled command connections:
0x55895bf8afb0 in_use=0 age=0s
0x55895bf8afc8 in_use=0 age=0s
0x55895bf8afe0 in_use=0 age=0s
0x55895bf8aff8 in_use=0 age=0s
0x55895bf8b010 in_use=0 age=14s
- data_connections
one line per data connection to the Catalyst. Any open file has one line:
0x7ff0dc0225d0 off=6 age=14s name=foobar
- store
holds the information about the store and the server that can change over time:
DiskCapacity: 882079956992
DiskSpaceFree: 870689128448
DedupeRatio: 3.9
UserDataSize: 25092777408
UserDataStored: 25092777408
DedupedDataSizeOnDisk: 6336564418
DedupeMetaSizeOnDisk: 501318090
DatabaseSizeOnDisk: 13312879
ScratchPadSizeOnDisk: 0
NumObjects: 51
SupportUserDataSizeQuotas: true
UserDataSizeLimit: 0 (unlimited)
SupportDedupedDataSizeOnDiskQuotas: true
DedupedDataSizeOnDiskLimit: 0 (unlimited)
last_update: 2024-11-20T13:36:20Z
last_update_epoch: 1732109780
The available DiskSpaceFree is shared among all the Catalyst stores. The DedupeRatio represents the ratio of UserDataSize to DedupedDataSizeOnDisk. The UserDataStored excludes the sparse regions from the total UserDataSize. The quota values, namely UserDataSizeLimit and DedupedDataSizeOnDiskLimit, relate to the two previous values. The last_update indicates the timestamp when these values were retrieved.
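The relationship between these fields can be checked with a few lines of code. The sketch below is an illustration under assumptions: it assumes the store file presents one "key: value" pair per line, the function names are hypothetical, and only a subset of the sample values above is used.

```python
from datetime import datetime, timezone

def parse_sys_file(text):
    """Parse 'key: value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # the first ':' ends the key
        if sep:
            info[key.strip()] = value.strip()
    return info

def dedupe_ratio(info):
    """UserDataSize divided by DedupedDataSizeOnDisk."""
    return int(info["UserDataSize"]) / int(info["DedupedDataSizeOnDisk"])

def last_update_utc(info):
    """Convert last_update_epoch to an ISO 8601 UTC timestamp."""
    moment = datetime.fromtimestamp(int(info["last_update_epoch"]), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")

sample = """UserDataSize: 25092777408
DedupedDataSizeOnDisk: 6336564418
last_update: 2024-11-20T13:36:20Z
last_update_epoch: 1732109780
"""
info = parse_sys_file(sample)
print(f"{dedupe_ratio(info):.2f}")
print(last_update_utc(info))
```

With the sample values the computed ratio is about 3.96, while the file reports 3.9; how BCFS rounds or truncates the value is not stated here.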
Limits of the BCFS
BCFS has been implemented on top of the Catalyst API to address the storage requirements of the Bacula volumes and the Aligned Plugin. In Bacula, the usage of the file based volumes mimics the use of the tape volumes, and the access patterns do not require the full range of features typically offered by contemporary filesystems. Additionally, the Catalyst API has its limitations and may not readily fulfill all the demands of a traditional filesystem. Nevertheless, it is anticipated that the Catalyst API will meet the storage needs for Bacula’s volumes and its Aligned Plugin.
The following outlines the primary distinctions between BCFS and a POSIX-compliant filesystem.
No sub-directory tree
The Catalyst API exclusively supports Stores and Items and does not accommodate structures resembling a directory tree. However, this does not pose a limitation for Bacula, as each volume is required to have a unique name and is frequently stored within the same directory. Bacula handles all management tasks, eliminating the need for users to directly access the archive directory for their daily operations.
$ mkdir ~/mnt/dir
mkdir: cannot create directory ‘/home/bac/mnt/dir’: Function not implemented
File cannot be renamed
The name of a Catalyst item is fixed and cannot be altered, which means the corresponding file cannot be renamed. However, the file can be copied under a new name, after which the original file can be removed.
$ mv ~/mnt/a4 ~/mnt/a99
mv: cannot move '/home/bac/mnt/a4' to '/home/bac/mnt/a99': Function not implemented
Deletion of open files
When a file in use is deleted, FUSE attempts to rename it to a hidden file to delete it later. However, if the file cannot be renamed, as indicated in the section titled File cannot be renamed above, the operation will not succeed.
$ rm ~/mnt/file
rm: cannot remove '/home/bac/mnt/file': Function not implemented
The content of a file cannot be modified
Modifying data within a file is possible, but it will truncate the file at the point of modification, resulting in the loss of any subsequent data. This is CRUCIAL to note, as attempting to run applications other than Bacula may lead to unforeseen consequences.
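The difference from a conventional filesystem can be pictured with a toy model. The sketch below is purely illustrative and does not touch BCFS: both functions are hypothetical and operate on in-memory bytes, modeling a write at a given offset as described above (everything after the modification point is lost) next to the usual in-place overwrite.

```python
def bcfs_write_model(content: bytes, offset: int, data: bytes) -> bytes:
    """Model of the behavior described above: data after the write is lost."""
    if not 0 <= offset <= len(content):
        raise ValueError("offset outside the existing content")
    return content[:offset] + data

def posix_write_model(content: bytes, offset: int, data: bytes) -> bytes:
    """Conventional in-place overwrite, for comparison."""
    if not 0 <= offset <= len(content):
        raise ValueError("offset outside the existing content")
    return content[:offset] + data + content[offset + len(data):]

original = b"AAAABBBBCCCC"
print(bcfs_write_model(original, 4, b"xx"))
print(posix_write_model(original, 4, b"xx"))
```

Appending at the end of the file gives the same result in both models, which matches the sequential, tape-like access pattern of Bacula volumes.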
Characters allowed in file names
The name may consist of the following characters: 0-9 a-z A-Z _ . + [ ] ^ | ? * ( ) # : ; = @ { }
Spaces are not permitted in the name.
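A volume name can be checked against this list before it is used. The following is a minimal sketch based solely on the characters listed above; the function name is hypothetical, and rejecting an empty name is an assumption.

```python
import string

# Characters taken from the list above; anything else is rejected.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "_.+[]^|?*()#:;=@{}")

def is_valid_bcfs_name(name):
    """True when every character of name is in the allowed list."""
    return bool(name) and all(ch in ALLOWED_CHARS for ch in name)

for candidate in ("Vol0001.add", "DedupVolume_0042", "my volume"):
    print(candidate, is_valid_bcfs_name(candidate))
```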
Inconsistency between atime, mtime and ctime
The Catalyst API does not provide support for the atime value; atime is always a copy of the mtime value. When the file is not open (nobody is reading or writing to it), these values come from the Catalyst server, using the Catalyst clock.
However, when a file is open, these three values are managed at the FUSE level in a consistent manner using the time of the FUSE host.
They are initialized using the Catalyst API. If there is a discrepancy between the clocks of the client and the server, inconsistencies may arise.
Immutability
The BCFS offers support for immutability, provided that this option is activated on the Catalyst Store.
First, verify in the StoreOnce Management Console that the option is enabled for your Catalyst Store. Check the details page of the Store to ensure that the value for Server Controlled Data Immutability Retention in the Security section is not set to No limits. If it is, enable it and set a value that corresponds with the MinimumVolumeProtectionTime directive in your Bacula configuration.
When you enable the Server Controlled Data Immutability feature, you need to specify the duration of the Immutable Period, which defaults to 30 days. The Immutable Period begins once the backup application marks the item as Complete. The Immutable Period includes a one-hour grace period during which you can delete or append data to the end of a Catalyst item. After this grace period, the Immutable Period officially starts. The volumes that are designated as read-only on the filesystem will have the Complete tag applied and will either be eligible for immutability or already be immutable.
About Deduplication
Two data segments of 4KiB will deduplicate due to their identical nature. Their similarity arises from sharing the same origin. It is highly unlikely for two data segments of a size significant enough to warrant consideration for deduplication to lack a common origin. Achieving a favorable deduplication ratio is not necessarily advantageous. The choice between duplicated and centralized data presents both advantages and disadvantages, and it is essential to determine which option best suits your needs.
The major sources of duplication are:
- Always running Full backups instead of Incremental.
- Running backups of Virtual Machines that share the same OS.
- Having multiple copies of identical data.
If you are aware of duplicated data and are experiencing a low deduplication ratio, it is advisable to take this matter seriously.
Additionally, the implementation of compression or encryption may disrupt the deduplication algorithm.
Bacula configuration
Using the Aligned driver with the Fixed mode and a 4KiB block size will give you the best deduplication performance on the Catalyst system. If your data source is misaligned, such as virtual machines with misaligned partitions, or if your data shifts over time, as with large numbers of uncompressed text documents, the Variable mode may deliver better results. Regardless, the Aligned driver consistently outperforms the File driver in terms of deduplication efficiency.
Here is a typical device configuration for an Aligned device:
Device {
Name = storeonce1
DeviceType = Aligned
MediaType = sobcfs
ArchiveDevice = /opt/bacula/archive/catalyst001
AlwaysOpen = no
LabelMedia = yes
AutomaticMount = yes
RemovableMedia = no
RandomAccess = yes
MaximumConcurrentJobs = 1
SetVolumeReadOnly = yes
MinimumVolumeProtectionTime = 30 days
#FileAlignment = 64k
}
The following directives are noteworthy:
- DeviceType = Aligned
Choose the Aligned driver for the best deduplication performance.
- MediaType = sobcfs
Do not forget to define a specific MediaType for the volumes in the same store.
- SetVolumeReadOnly = yes
This allows the device to activate the immutability on the volumes.
- MinimumVolumeProtectionTime = 30 days
This value must match the value configured on the Catalyst system.
- MaximumConcurrentJobs = 1
If you are using Aligned, set MaximumConcurrentJobs to 1, as the Aligned driver forces it to 1 anyway.
Do not modify the directives controlling the size of the chunks related to the Aligned driver; the default settings, particularly the 64K for FileAlignment, are ideal.
Of course, multiple devices or even an autochanger can use the same Catalyst Store.
Note
Compressing the data in the FileSet is pointless, as the Catalyst already compresses the chunks.
See also
Go back to the Storage Backend.