Parameters
One of the most important reasons to modify the default parameters is to select the services to include or exclude during the execution of BGuardian. By default, all services are included.
Additionally, some services allow sub-checks and those may also be excluded. The parameters to control these two features are:
service: For the main services
configuration_checks_exclude: To exclude checks from the configuration security service
After selecting the services to apply, there are also parameters which can control the results of those services.
Described below are all services and all of their specific parameters, as well as what actions are recommended to take if results are detected for each of them.
Fileset Common Parameters
The following parameters are applicable to the general behavior of the plugin:
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
mode | No | check | check, alert | alert | Run in normal check mode, or in alert mode to query current alerts or to add/remove alerts or ignores |
service | No | configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp | List (separated by ‘,’) of elements from: configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp | configurationsecurity, nocopy | Select the services that BGuardian will execute |
service_exclude | No | | List (separated by ‘,’) of elements from: configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp | configurationsecurity, nocopy | Select the services that BGuardian will exclude. Use either this parameter to exclude a list or the ‘service’ one to include a list, but do not use both |
config_file | No | | The path pointing to a file containing any combination of plugin parameters | /opt/bacula/etc/bguardian.settings | Defines a config file in which any parameter of the plugin can be configured, so that they do not need to be put directly in the Plugin line of the fileset |
log | No | | An existing path with enough permissions for File Daemon to create a file with the provided name | /tmp/bguardian.log | Generates an additional log besides what is shown in the job log. This parameter is included in the backend file, so, in general, the log is stored in the working directory by default |
debug | No | | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 8 | Generates the working/bguardian/bguardian-debug.log* files containing debug information, which is more complete the greater the debug number |
reports_path | No | | An existing path where BGuardian has permissions to write and keep the reports | /opt/bacula/bguardian/reports | Path to store BGuardian reports (html and json files). By default, they are stored inside the working directory |
reports_keep_number | No | 100 | Integer | 20 | Number of reports to keep. BGuardian will remove the oldest ones beyond this number |
alerts_path | No | | An existing path where BGuardian has permissions to write and keep the alerts | /opt/bacula/bguardian/alerts | Path to store BGuardian alerts (json files). By default, they are stored inside the working directory |
bconsole_timeout_seconds | No | 20 | Integer | 10 | Timeout, in seconds, for BConsole commands |
bweb_jobid_link | No | true | true, 1, yes, Yes ; false, 0, no, No | false | Enable/disable generation of the link to the BWeb job report inside the BGuardian report |
disable_events | No | false | true, 1, yes, Yes ; false, 0, no, No | true | Enable/disable Bacula Events generation on alert adding/removing situations |
silent | No | false | true, 1, yes, Yes ; false, 0, no, No | true | Enable/disable silent mode, where BGuardian will not produce console output |
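The inclusion and exclusion rules for services can be sketched in Python as follows. This is an illustrative model only, not BGuardian's own code: the helper name `resolve_services` is hypothetical, and the keyword list is copied from the `service` row of the table.

```python
# Service keywords as listed for the 'service' parameter.
ALL_SERVICES = [
    "configurationsecurity", "infected", "securityevents", "deviation",
    "successratio", "failedinarow", "empty", "lastgood", "nocopy",
    "noverify", "lowdedup", "orphanchain", "restorefrequency",
    "differentfilesystem", "nototp",
]


def _split(value):
    """Split a comma-separated parameter value into trimmed keywords."""
    return [item.strip() for item in value.split(",") if item.strip()]


def resolve_services(service=None, service_exclude=None):
    """Hypothetical helper: return the services that would be executed.

    It mirrors the documented rules only: every service runs by default,
    and 'service' and 'service_exclude' must not be combined.
    """
    if service and service_exclude:
        raise ValueError("use either 'service' or 'service_exclude', not both")
    requested = _split(service or service_exclude or "")
    unknown = [name for name in requested if name not in ALL_SERVICES]
    if unknown:
        raise ValueError("unknown service keyword(s): " + ", ".join(unknown))
    if service:
        return [name for name in ALL_SERVICES if name in requested]
    return [name for name in ALL_SERVICES if name not in requested]
```

Calling `resolve_services(service="configurationsecurity, nocopy")` returns only those two keywords, while `resolve_services(service_exclude="nocopy")` returns every service except `nocopy`.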
Configuration security service
This service is activated if the service parameter contains the keyword: configurationsecurity.
Alert code is: GC__[SUBSERVICE]. Subservice codes are detailed below.
The purpose of this service is to report the result of different checks regarding security, status and best practices related to how Bacula is configured and running in the environment.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
max_days_bacula_backup | No | 10 | Integer | 5 | Maximum number of days without a backup of the configuration or catalog before generating an alert |
max_days_copy | No | 15 | Integer | 3 | Maximum number of days without any copy job before generating an alert |
max_days_verify | No | 30 | Integer | 30 | Maximum number of days without any verify job before generating an alert |
max_days_restore | No | 30 | Integer | 60 | Maximum number of days without any restore job before generating an alert |
max_days_vacuum | No | 7 | Integer | 2 | Maximum number of days without running vacuum in dedup-enabled Storage Daemons |
dedup_free_space_min_bytes | No | 107374182400 | Long | 536870912000 | Minimum free space, in bytes, for dedup engines; the alert is raised below this limit |
permissions_base_paths | No | /opt/bacula | List of paths | /opt/bacula/etc | List of paths of Bacula installation files whose permissions are checked |
config_backup_job_name | No | BaculaDirectorConfigs | Job Name | BaculaConfig | Name of the job that backs up the Bacula configuration, used to check that it is run regularly |
catalog_backup_job_name | No | (auto-detected) | Job Name | BaculaCatalog | Name of the job that backs up the Bacula catalog, used to check that it is run regularly |
min_free_space_percent | No | 10 | Integer | 5 | Minimum percentage of free space for a given Storage Daemon before generating an alert |
configuration_checks_exclude | No | | List of checks separated by ‘,’ (see next section) | restore, verify, copy, malware, antivirus | List of subchecks to exclude from the execution of this Configuration security service |
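Because dedup_free_space_min_bytes is expressed as a plain byte count, a small conversion helper can make the values easier to read. The sketch below is only a convenience for preparing the parameter value; the function name `gib_to_bytes` is hypothetical and not part of BGuardian.

```python
GIB = 1024 ** 3  # number of bytes in one gibibyte


def gib_to_bytes(gib):
    """Convert a size in GiB to the byte count the parameter expects."""
    return int(gib * GIB)


# 100 GiB matches the documented default of dedup_free_space_min_bytes,
# and 500 GiB matches the value shown in the Example column.
print(gib_to_bytes(100))
print(gib_to_bytes(500))
```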
Configuration security service subchecks
Configuration security is a special service of BGuardian that checks many aspects of the configuration of the environment.
By default, it checks everything, but it is possible to exclude some checks by listing their keywords (separated by ‘,’) in the configuration_checks_exclude parameter.
Below we briefly describe the keyword and the function of each subcheck, together with the recommended action if a related issue is reported. The code is used by the alerts functionality and helps to quickly identify each issue:
passwords: Checks for duplicated passwords and for password strength inside the Bacula Director configuration.
Code: GC__PASSWOR
Action: Make your passwords unique and strong (8+ characters, including upper and lowercase letters, digits and symbols)
catalog_backup: Checks configuration and recent execution (max_days_bacula_backup parameter) of the backup of the catalog of Bacula (configurable by catalog_backup_job_name).
Code: GC__CATALOG
Action: Immediately run a backup of the catalog and make sure your schedule is frequent enough (once a week at least)
config_backup: Checks configuration and recent execution (max_days_bacula_backup parameter) of the backup of the configuration of Bacula (configurable by config_backup_job_name).
Code: GC__CONFIG_
Action: Immediately run a backup of the configuration and make sure your schedule is frequent enough (once a week at least)
restore: Checks execution of some recent restore (max_days_restore parameter). It is a best practice to run some restore of the different kinds of data from time to time.
Code: GC__RESTOR
Action: Run some restore for each kind of data and each kind of storage from time to time.
verify: Checks execution of some recent verify job (max_days_verify parameter). It is a best practice to verify the data of your jobs.
Code: GC__VERIFY
Action: Configure verify jobs for your data
copy: Checks execution of some recent copy job (max_days_copy parameter). It is a best practice to use a multi-tier strategy with your backups.
Code: GC__COPY
Action: Define a second storage tier and configure copy jobs to send your data there.
malware: Checks jobs that could activate the Malware protection function, but have not enabled it.
Code: GC__MALWAR
Action: Enable Malware detection for any reported system that could be susceptible to infection because of its location, usage pattern and kind of data
antivirus: Checks the existence of one antivirus job for each client.
Code: GC__ANTIVI
Action: Configure an antivirus job for your clients, especially for file servers.
consoles: Checks the usage of restricted consoles.
Code: GC__CONSOL
Action: If you have different users accessing your Bacula environment, configure a restricted console for each of them, using only the minimal needed permissions
events: Checks the activation of Events in Message resources for auditing purposes.
Code: GC__EVENTS
Action: Enable events messages in your configuration and store them at your convenience (it is recommended to store them in a file and also in the catalog)
dir_status: Checks the status of the Director daemon to see if there is any reported error with the service.
Code: GC__DIR_ST
Action: Review the status of your Director service and start or restart it
dir_address: Checks the usage of DirAddress setting to limit Director service to be listening on specific interfaces
Code: GC__DIR_AD
Action: Use the DirAddress setting, so you limit the service to be listening only on the required interface
sd_status: Checks the status of the Storage Daemon(s) to see if there is any reported error with the service or if there is no connectivity with some of them.
Code: GC__SD_STA
Action: Review the status of the affected Storage Daemon and the connectivity to it from the Director host. Start or restart the service if needed
sd_free: Checks free space in each Storage Daemon is above the threshold (defined by min_free_space_percent parameter).
Code: GC__SD_FREE
Action: Review the status of the affected Storage Daemon and start or restart it
dedup: Enable getting dedup status for Storage Daemons in order to make dedup checks around Global Endpoint Deduplication. (Requires to not exclude sd_status)
ded_errors: Check if GED is reporting some general error. (Requires to not exclude dedup)
Code: GC__DED_ER
Action: Review the message and the status of the affected Storage Daemon. Restart it, run vacuum procedures and re-check
ded_orphan: Check if GED is reporting some orphan reference. (Requires to not exclude dedup)
Code: GC__DED_OR
Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_vacuum: Check if GED vacuum procedure was executed recently enough. The number of days can be controlled with max_days_vacuum parameter. (Requires to not exclude dedup)
Code: GC__DED_VA
Action: Run vacuum as soon as possible.
ded_idx: Check if GED is marking some error with the indexes. (Requires to not exclude dedup)
Code: GC__DED_ID
Action: Review the status of the filesystem holding the indexes.
ded_miss: Check if GED is marking some missed reference. (Requires to not exclude dedup)
Code: GC__DED_MI
Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_free: Check the free space available in GED engine (it relies on parameter dedup_free_space_min_bytes). (Requires to not exclude dedup)
Code: GC__DED_FR
Action: Provide more space to your GED containers filesystem. You can also purge data from your dedup storages and then run vacuum process to try to recover some space.
ded_suspect: Check if GED is reporting some suspect reference. (Requires to not exclude dedup)
Code: GC__DED_SU
Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_derr: Check if GED is reporting errors in the dedup engine. (Requires to not exclude dedup)
Code: GC__DED_DE
Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_cerr: Check if GED is reporting errors in the containers. (Requires to not exclude dedup)
Code: GC__DED_CE
Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
fd_status: Checks the status of the client File Daemon(s) to see if there is any reported error with the service or if there is no connectivity with some of them.
Code: GC__FD_STA
Action: Review the status of the affected File Daemon and the connectivity to it from the Director host. Start or restart the service if needed
fips: Checks if FIPS is enabled on daemons supporting it (for DIR requires to not exclude ‘dir_status’ ; for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).
Code: GC__FIPS
Action: Consider enabling FIPS on the affected Daemon if your security posture needs to be very high
trace: Checks if trace is enabled in any daemon, with the risk of filling up a disk (for DIR requires to not exclude ‘dir_status’ ; for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).
Code: GC__TRACE
Action: Disable debug and trace from the affected Daemon as soon as possible if you are not doing debug activities anymore
versions: Checks if the FDs and/or SDs Bacula versions are aligned with the version of the Director (for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).
Code: GC__VERSIO
Action: Install a supported Bacula version in the affected Daemon. Storage Daemon and Director must be on the same version, while File Daemons can run an older version than the Director.
security_plugin: Checks if the security plugin is deployed in each FD (requires to not exclude ‘fd_status’).
Code: GC__SECURI
Action: Install the security plugin in the affected File Daemons
permissions: Checks if permissions are strong enough for the given path and subpaths (permissions_base_paths parameter).
Code: GC__PERMIS
Action: Correct the permissions on the affected paths. Usually you need to exclude the ‘others’ group from any Bacula directory and to protect the Bacula configuration from undesirable writes.
running_processes: Checks if processes are running as the root user, which is generally not recommended for secure environments.
Code: GC__RUNNIG
Action: Configure your daemons with the correct user. Director and Storage Daemon do not need to be run as root.
encryption: Check if encryption is used at all in the environment
Code: GC__ENCRYP
Action: Consider using encryption for any sensitive data or any untrusted storage.
volprotection: Check if volume protection is used at all in the environment
Code: GC__VOLPRO
Action: Consider enabling volume protection for your disk backups on Linux, as well as for any backup sent to NAS from NetApp, DataDomain or HPE StoreOnce.
pg_vacuum: Check if PostgreSQL vacuum process was executed recently enough on the key tables
Code: GC__PG_VAC
Action: Run Vacuum on the affected tables as soon as possible, during a low load window in your environment.
pg_analyze: Check if PostgreSQL analyze process was executed recently enough on the key tables
Code: GC__PG_ANA
Action: Run Analyze on the affected tables as soon as possible, during a low load window in your environment.
pg_config: Check if PostgreSQL configuration parameters are inside the recommended margins
Code: GC__PG_CON
Action: Correct the mentioned values to comply with the recommended configuration
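Several subchecks in the list above state that they require another check not to be excluded. The following Python sketch models only those hard prerequisites (dedup needs sd_status, every ded_* check needs dedup, security_plugin needs fd_status) to show how one exclusion cascades. The helper name `effectively_excluded` is hypothetical, and the partial, per-daemon dependencies of fips, trace and versions are deliberately left out.

```python
# Hard prerequisites taken from the subcheck descriptions: the check on the
# left cannot run when the check on the right is excluded.
REQUIRES = {
    "dedup": "sd_status",
    "security_plugin": "fd_status",
}
# Every GED subcheck depends on 'dedup'.
for _name in ("ded_errors", "ded_orphan", "ded_vacuum", "ded_idx", "ded_miss",
              "ded_free", "ded_suspect", "ded_derr", "ded_cerr"):
    REQUIRES[_name] = "dedup"


def effectively_excluded(configuration_checks_exclude):
    """Hypothetical helper: expand an exclude list with dependent checks."""
    excluded = {
        item.strip()
        for item in configuration_checks_exclude.split(",")
        if item.strip()
    }
    changed = True
    while changed:  # repeat until no further dependent check is added
        changed = False
        for check, prerequisite in REQUIRES.items():
            if prerequisite in excluded and check not in excluded:
                excluded.add(check)
                changed = True
    return sorted(excluded)
```

For example, excluding only `sd_status` also disables `dedup` and, through it, all the `ded_*` checks, which is worth keeping in mind before adding a status check to configuration_checks_exclude.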
Infected service
This service is activated if the service parameter contains the keyword: infected.
Alert code is: GIN
The purpose of this service is to report jobs where some virus, ransomware or malware was detected, so that a summary of them is easily available, while a new alert is also generated for any new entry.
Action: If you find any job containing some kind of malware or virus, you should quickly isolate that system from your network and run healing activities on it. Afterwards, run a new backup and check that no more viruses or malware are detected.
Security events service
This service is activated if the service parameter contains the keyword: securityevents.
Alert code is: GSE
This service will report any recent event registered in Bacula core with the security category.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
securityevents_days_since | No | 15 | Integer | 30 | Defines the number of days to consider, from today, for the report of security events |
Action: Review the nature of the event and act accordingly. If you find, for instance, many failed attempts to connect to the Director from BConsole, consider changing the location of your consoles or improving their security posture at the networking level.
Deviation service
This service is activated if the service parameter contains the keyword: deviation.
Alert code is: GDV
The purpose of this service is to analyze job executions statistically and find deviation from the expected values. Calculations are done over the size, number of files and duration of the jobs.
Depending on the kind of jobs and the nature of the data in your environment, you may need to adjust the following parameters to get the most out of the deviation results. The different thresholds can be adjusted. You can also decide whether results are selected by both regression deviation and deviation from the average (the default behavior), or whether deviation from the average is excluded because it generates too much information without pointing to issues in the environment (dev_include_by_avg parameter).
Parameters for deviation service are explained below:
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
dev_factor | No | 0.4 | Float | 0.25 | Defines which jobs trigger the deviation alert, that is, how large a deviation must be in relation to the calculated standard deviation to be considered significant. 1 means 200% of the standard deviation, 0.5 means 150% of the standard deviation. Example: if the standard deviation for size is 100 MB, with a value of 0.5 a job with a deviation from the average or regression of 160 MB will be selected, while jobs with a deviation from the average of 50 MB or 120 MB won’t be selected |
dev_severity_low_limit | No | 0.5 | Float | 0.8 | Defines the limit to consider a selected deviated job as severity Low |
dev_severity_medium_limit | No | 0.75 | Float | 0.9 | Defines the limit to consider a selected deviated job as severity Medium |
dev_min_regression_accuracy | No | 0.5 | Float | 0.4 | Minimum value of regression accuracy in order to use the regression analysis |
dev_min_dev_from_avg | No | 0.8 | Float | 2 | Minimum deviation from the average to consider a job as deviated for selection based on average |
dev_min_duration | No | 1200 | Integer | 3600 | Minimum duration of a job in order to consider deviation by duration as something significant to select the job |
dev_min_executions | No | 5 | Integer | 20 | Minimum number of executions of a given job in order to consider it for deviation analysis |
dev_include_by_avg | No | true | 1, true, yes, Yes ; 0, false, no, No | false | Enable/disable selection of results based on the average deviation |
dev_min_files | No | 50 | Integer | 100 | Minimum number of files in a job in order to consider deviation by number of files as something significant to select the job |
dev_min_size | No | 10485760 | Integer | 52428800 | Minimum size of a job in order to include it in deviation analysis |
Action: When this service reports results, you should review every result and analyze what caused the given deviation. A deviation can have many causes, such as a sudden influx of new information to back up, a sudden slowness problem or a controlled massive deletion. However, the same effect can be caused by ransomware activity or by uncontrolled or undesired user activity. If you find such an event, you will need to solve it and consider adjusting or re-running some backups. If there is an explanation, every issue can be marked to be ignored in further alerts through the ignoring mechanism.
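The dev_factor arithmetic from the parameter description can be reproduced with a short Python sketch. It assumes, based only on the documented example, that the significance threshold is (1 + dev_factor) times the standard deviation; the function names, the use of the absolute deviation and the strict comparison are illustrative assumptions rather than BGuardian's actual implementation.

```python
def deviation_threshold(std_dev, dev_factor=0.4):
    """Smallest deviation treated as significant: (1 + dev_factor) * std_dev."""
    return (1 + dev_factor) * std_dev


def is_selected(deviation, std_dev, dev_factor=0.4):
    """True when the absolute deviation exceeds the threshold (assumed rule)."""
    return abs(deviation) > deviation_threshold(std_dev, dev_factor)


# Documented example: standard deviation of 100 MB and dev_factor of 0.5.
for deviation_mb in (160, 120, 50):
    print(deviation_mb, is_selected(deviation_mb, 100, dev_factor=0.5))
```

With these inputs the threshold is 150 MB, so only the 160 MB deviation is selected, which matches the example given for dev_factor.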
Failed in a row service
This service is activated if the service parameter contains the keyword: failedinarow.
Alert code is: GFR
The purpose of this service is to report jobs that have failed N or more times in a row, according to the parameter failedrow_times.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
failedrow_times | No | 3 | Integer | 5 | Defines the threshold of times a given job must have failed in a row in order to be included in the report |
Action: Review as soon as possible the affected jobs and analyze the causes of the failure. You can run them manually to check the result and then adjust the configuration or the schedule according to the results of your analysis.
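As a minimal sketch of the rule behind failedrow_times, the Python snippet below counts the failures at the end of a job's execution history. It assumes the streak that matters is the most recent one and represents each execution as a boolean; both the data shape and the function names are hypothetical.

```python
def trailing_failures(results):
    """Count consecutive failures at the end of a chronological result list.

    'results' holds one boolean per execution, oldest first (True = success).
    """
    count = 0
    for succeeded in reversed(results):
        if succeeded:
            break
        count += 1
    return count


def should_report(results, failedrow_times=3):
    """True when the latest failure streak reaches the threshold."""
    return trailing_failures(results) >= failedrow_times
```

A history of `[True, False, False, False]` would be reported with the default threshold of 3, whereas `[False, False, False, True]` would not, because the latest execution succeeded.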
Restore frequency service
This service is activated if the service parameter contains the keyword: restorefrequency.
Alert code is: GRF
The purpose of this service is to report jobs whose restore frequency is below the factor established by the parameter: restore_factor.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
restore_factor | No | 0.15 | Float | 0.5 | Defines the threshold of restore frequency for a given job in terms of % |
Action: Review your backup policies and include some periodic restores in them, to ensure that your backup and restore strategies are correct and agile enough to be prepared for the time when an urgent restore is needed.
Success ratio service
This service is activated if the service parameter contains the keyword: successratio.
Alert code is: GSR
The purpose of this service is to report jobs whose success ratio is below the factor established by the parameter: success_factor.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
success_factor | No | 0.8 | Float | 0.75 | Defines the threshold of success ratio for the executions of a given job in terms of % |
success_severity_medium_limit | No | 0.4 | Float | 0.5 | Defines the limit to consider a selected job as severity Medium |
success_severity_low_limit | No | 0.6 | Float | 0.7 | Defines the limit to consider a selected job as severity Low |
Action: Review the affected jobs and analyze the causes of the failures. If they are apparently random, consider running a load analysis of your system in order to spread the load better over your network and hosts.
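The interplay between success_factor and the two severity limits can be illustrated with the Python sketch below. The mapping is an assumption inferred from the default values (0.8, 0.6 and 0.4): a ratio below success_factor is selected, then classified as Low, Medium or, below both limits, an assumed High level. The function names and the exact comparisons are hypothetical.

```python
def success_ratio(results):
    """Fraction of successful executions; 'results' is a list of booleans."""
    if not results:
        raise ValueError("at least one execution is required")
    return sum(1 for ok in results if ok) / len(results)


def classify(ratio, success_factor=0.8, low_limit=0.6, medium_limit=0.4):
    """Return None when the job is not selected, else an assumed severity."""
    if ratio >= success_factor:
        return None      # healthy enough, not reported
    if ratio >= low_limit:
        return "Low"
    if ratio >= medium_limit:
        return "Medium"
    return "High"        # assumed level below both documented limits
```

Under these assumptions, a job that succeeded in 3 of its last 4 executions has a ratio of 0.75 and would be reported with severity Low.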
No Copy service
This service is activated if the service parameter contains the keyword: nocopy.
Alert code is: GNC
The purpose of this service is to report jobs not included in any 2-tier policy, which means jobs that have never been copied or migrated.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
nocopy_grace_period_days | No | 10 | Integer | 10 | Jobs that are more recent than the days defined by this parameter won’t be included in the report |
Action: Review your backup policies and include a 2-tier storage to which the reported jobs are sent through the configuration and execution of Copy jobs.
No Verify service
This service is activated if the service parameter contains the keyword: noverify.
Alert code is: GNV
The purpose of this service is to report jobs not included in any verification policy, which means jobs that have never been verified.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
noverify_grace_period_days | No | 20 | Integer | 50 | Jobs that are more recent than the days defined by this parameter won’t be included in the report |
Action: Review your backup policies and include a verification phase where you run periodic verify jobs that cover the jobs listed in the report.
Empty service
This service is activated if the service parameter contains the keyword: empty.
Alert code is: GE
The purpose of this service is to report Full jobs that were successful but have no contents (no files and no bytes stored).
Action: Review the affected jobs, including the joblog and the configuration. You may need to adjust the configuration or to run again the affected jobs.
Last good service
This service is activated if the service parameter contains the keyword: lastgood.
Alert code is: GLG
The purpose of this service is to report jobs whose last successful execution is older than the number of days specified by the parameter: lastgood_max_since_days.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
lastgood_max_since_days | No | 5 | Integer | 10 | Jobs that have no successful execution more recent than the days defined by this parameter will be included into the report |
Action: Review as soon as possible the affected jobs and their latest executions. Run them manually if they have been missed, cancelled or failed.
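The lastgood_max_since_days rule amounts to a simple date comparison, sketched here in Python. The function name `is_stale` is hypothetical, and treating a job that has never succeeded as stale is an assumption consistent with the wording of the parameter description.

```python
from datetime import datetime, timedelta


def is_stale(last_success, now, lastgood_max_since_days=5):
    """True when the last successful run is older than the allowed days.

    'last_success' is a datetime, or None if the job never succeeded.
    """
    if last_success is None:
        return True  # assumption: no successful run at all is reported
    return now - last_success > timedelta(days=lastgood_max_since_days)


now = datetime(2024, 1, 10)
print(is_stale(datetime(2024, 1, 1), now))  # 9 days ago, beyond the default
print(is_stale(datetime(2024, 1, 8), now))  # 2 days ago, within the default
```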
Low Dedup
This service is activated if the service parameter contains the keyword: lowdedup.
Alert code is: GLD
The purpose of this service is to provide a report of jobs that do not deduplicate well enough and that are potentially misusing resources on information that is not a good candidate for deduplication. This information is also useful if you are running out of space and need to delete or migrate the information of some jobs that use a large amount of storage.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
dedup_ratio_min | No | 40 | Float | 60 | Defines the threshold of dedup ratio. Jobs with a lower ratio will be reported. The threshold represents a percentage and jobs store it as ‘compressratio’ in the catalog |
dedup_size_min | No | 52428800 | Long | 262144000 | Defines the limit in size to consider a selected low dedup job. Jobs with smaller size won’t be considered |
Action: Consider disabling deduplication for the affected jobs if the ratio is very poor for your needs. If running out of space, consider deleting (or moving to a different storage with a Migration) large jobs with a poor ratio that are old enough for your needs.
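How the two Low Dedup parameters combine can be sketched as a filter in Python. The tuple layout of the job records and the function name are hypothetical, the sample figures are made up for illustration, and the comparisons (ratio strictly below the minimum, size at or above the minimum) are assumptions derived from the parameter descriptions.

```python
def low_dedup_jobs(jobs, dedup_ratio_min=40.0, dedup_size_min=52428800):
    """Return names of jobs that are large enough and deduplicate poorly.

    'jobs' is an iterable of (name, ratio_percent, size_bytes) tuples.
    """
    return [
        name
        for name, ratio, size in jobs
        if size >= dedup_size_min and ratio < dedup_ratio_min
    ]


# Illustrative sample data, not taken from a real catalog.
sample = [
    ("vm-backup", 72.0, 900_000_000),      # good ratio: not reported
    ("media-archive", 12.5, 800_000_000),  # large and poor ratio: reported
    ("tiny-config", 5.0, 1_000_000),       # too small to be considered
]
print(low_dedup_jobs(sample))
```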
Orphan Chain
This service is activated if the service parameter contains the keyword: orphanchain.
Alert code is: GOC
Incremental and Differential jobs are part of a chain and depend on the information of a previous job. Sometimes, due to human mistakes or a bad retention policy, chains are broken because a job is recycled before a later job that depends on it. This service detects this situation and reports jobs that are ‘orphan’ in these terms.
It is important to note that, depending on the nature of the backup, Incremental or Differential jobs alone can still be useful. For instance, for any job that contains files, the information is still fully recoverable, and the job can also be based on an older Full or Incremental, so the restore job still works. However, for some Virtual Machine or Database plugins, an Incremental or Differential job without its predecessor may not contain recoverable information. In general, it is important to avoid having any orphan job, and this service is intended to help in that direction.
Action: If you ever detect an orphan job, review your backup policies regarding retention times and adjust them, if necessary, so that they do not automatically produce orphan jobs. Also check that you have other valid copies of the information for the affected jobs. If you don’t, run new jobs as soon as possible.
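A deliberately simplified Python sketch of orphan detection is shown below. It assumes the surviving jobs of a single job definition are given oldest first, and it flags any Incremental or Differential job that has no surviving Full before it. Real chains also depend on intermediate jobs, which this sketch does not model; the function name and data layout are hypothetical.

```python
def find_orphans(jobs):
    """Return ids of dependent jobs with no earlier surviving Full job.

    'jobs' is a list of (jobid, level) tuples, oldest first, where level
    is one of "Full", "Differential" or "Incremental".
    """
    orphans = []
    has_full = False
    for jobid, level in jobs:
        if level == "Full":
            has_full = True
        elif not has_full:
            orphans.append(jobid)  # nothing earlier to base this job on
    return orphans


# Job 10 lost its Full (for instance, it was recycled), so it is reported.
print(find_orphans([(10, "Incremental"), (11, "Full"), (12, "Incremental")]))
```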
Different filesystem
This service is activated if the service parameter contains the keyword: differentfilesystem.
Alert code is: GDFS
This service detects and reports jobs where the log message ‘xxx is a different filesystem. Will not descend from yyyy’ is produced. The paths reported there are compared with the list of excluded ‘known’ paths configured by the ‘different_fs_exclude’ parameter. If they do not match, the jobs are reported.
This situation happens when jobs use filesets with the OneFS option enabled. Depending on the environment, this behavior is absolutely desirable, but it can hide some undesired exclusions. This service helps to avoid that kind of situation.
Option | Required | Default | Values | Example | Description |
---|---|---|---|---|---|
different_fs_exclude | No | /proc, /sys, /tmp, /boot | List of paths separated by ‘,’ | /proc, /sys, /tmp, /boot, /myNonDesiredFS2, /mnt/myNonDesiredFS1 | List of known paths that are on different filesystems and that can safely be reported as paths that won’t be backed up |
Action: Review the path that was excluded and consider modifying your backup configuration if it is necessary to include it, or add it to the list to be excluded on this service otherwise.
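The comparison between logged paths and different_fs_exclude can be sketched in Python as follows. The regular expression is built from the message wording quoted in this section, so the real job log line may differ in detail. Exact path matching, the assumption that paths contain no spaces, and the function names are all illustrative choices, not BGuardian's actual logic.

```python
import re

# Pattern derived from the message wording quoted in this section.
MESSAGE = re.compile(
    r"(?P<path>\S+) is a different filesystem\. "
    r"Will not descend from (?P<parent>\S+)"
)


def _normalize(path):
    """Drop trailing slashes so '/tmp/' and '/tmp' compare equal."""
    return path.rstrip("/") or "/"


def unexpected_paths(log_lines, different_fs_exclude="/proc, /sys, /tmp, /boot"):
    """Return skipped paths that are not in the known exclusion list."""
    known = {
        _normalize(item.strip())
        for item in different_fs_exclude.split(",")
        if item.strip()
    }
    found = []
    for line in log_lines:
        match = MESSAGE.search(line)
        if match:
            path = _normalize(match.group("path"))
            if path not in known and path not in found:
                found.append(path)
    return found
```

Given one log line for `/proc` and one for `/data`, only `/data` is returned with the default exclusion list, which is the case that deserves a review of the fileset.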
NoTOTP service
This service is activated if the service parameter contains the keyword: nototp.
Alert code is: GNT
The purpose of this service is to report users that have not enabled the TOTP two-factor authentication mechanism.
Action: Consider enabling the TOTP two-factor authentication mechanism for the reported users in order to improve your security posture regarding BWeb access.