Parameters

One of the most important reasons to modify the default parameters is to select the services to include or exclude during the execution of BGuardian. By default, all services are included.

Additionally, some services allow sub-checks and those may also be excluded. The parameters to control these two features are:

service: For the main services
configuration_checks_exclude: To exclude checks from the configuration security service

After selecting the services to apply, there are also parameters which can control the results of those services.

Described below are all services and all of their specific parameters, as well as what actions are recommended to take if results are detected for each of them.

Fileset Common Parameters

The following parameters are applicable to the general behavior of the plugin:

Option	Required	Default	Values	Example	Description
mode	No	check	check, alert	alert	Run normal check mode or alert mode to query current alerts or add/remove alerts or ignores
service	No	configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp, objectlastgood	List (separated by ‘,’) of elements from: configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp, objectlastgood	configurationsecurity, nocopy	Select the services that BGuardian will execute
service_exclude	No		List (separated by ‘,’) of elements from: configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp, objectlastgood	configurationsecurity, nocopy	Select the services that BGuardian will exclude. Use this variable to exclude a list or the ‘service’ one to include a list, but do not use both
config_file	No		The path pointing to a file containing any combination of plugin parameters	/opt/bacula/etc/bguardian.settings	Defines a config file in which any parameter of the plugin can be set, so that the parameters do not need to be put directly in the Plugin line of the fileset
log	No		An existing path with enough permissions for File Daemon to create a file with the provided name	/tmp/bguardian.log	Generates an additional log besides what is shown in the job log. This parameter is included in the backend file, so by default the log is generally stored in the working directory
debug	No		0, 1, 2, 3, 4, 5, 6, 7, 8, 9	8	Generates the working/bguardian/bguardian-debug.log* files containing debug information, which is more complete with a greater debug number
reports_path	No		An existing path where BGuardian has permissions to write and keep the reports	/opt/bacula/bguardian/reports	Path to store BGuardian reports (html and json files). By default, they are stored inside working directory
reports_keep_number	No	100	Integer	20	Number of reports to keep. BGuardian will remove the older ones over this number
alerts_path	No		An existing path where BGuardian has permissions to write and keep the alerts	/opt/bacula/bguardian/alerts	Path to store BGuardian alerts (json files). By default, they are stored inside working directory
bconsole_timeout_seconds	No	20	Integer	10	Timeout, in seconds, for BConsole commands
bweb_jobid_link	No	true	true, 1, yes, Yes ; false, 0, no, No	false	Enable/disable generation of the link to BWeb job report inside the BGuardian report
disable_events	No	false	true, 1, yes, Yes ; false, 0, no, No	true	Enable/disable Bacula Events generation on alert adding/removing situations
silent	No	false	true, 1, yes, Yes ; false, 0, no, No	true	Enable/disable silent mode, where BGuardian will not produce console output
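
The rules for service and service_exclude can be checked before a value is written into the fileset. The following Python sketch is only an illustration, not part of BGuardian: the helper names are hypothetical, and the keyword set is copied from the table above.

```python
# Service keywords as listed in the common parameters table.
KNOWN_SERVICES = {
    "configurationsecurity", "infected", "securityevents", "deviation",
    "successratio", "failedinarow", "empty", "lastgood", "nocopy",
    "noverify", "lowdedup", "orphanchain", "restorefrequency",
    "differentfilesystem", "nototp", "objectlastgood",
}


def parse_list(value):
    """Split a comma-separated parameter value into trimmed keywords."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_service_selection(service="", service_exclude=""):
    """Return a list of problems found in a selection (hypothetical helper)."""
    problems = []
    # The documentation asks to use one of the two parameters, never both.
    if service and service_exclude:
        problems.append("use either 'service' or 'service_exclude', not both")
    for name, value in (("service", service), ("service_exclude", service_exclude)):
        for keyword in parse_list(value):
            if keyword not in KNOWN_SERVICES:
                problems.append(f"unknown keyword in {name}: {keyword}")
    return problems
```

An empty result means the selection only uses documented keywords and does not mix the include and exclude lists.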

Configuration Security Service

This service is activated if the service parameter contains the keyword: configurationsecurity.

Alert code is: GC__[SUBSERVICE]. Subservice codes are detailed below.

The purpose of this service is to report the result of different checks regarding security, status and best practices related to how Bacula is configured and running in the environment.

Option	Required	Default	Values	Example	Description
max_days_bacula_backup	No	10	Integer	5	Maximum number of days without a backup of the configuration or catalog before generating an alert
max_days_copy	No	15	Integer	3	Maximum number of days without any copy job before generating an alert
max_days_verify	No	30	Integer	30	Maximum number of days without any verify job before generating an alert
max_days_restore	No	30	Integer	60	Maximum number of days without any restore job before generating an alert
max_days_vacuum	No	7	Integer	2	Maximum number of days without running vacuum in dedup enabled storage daemons
dedup_free_space_min_bytes	No	107374182400	Long	536870912000	Limit on bytes of free space for dedup engines before raising the alert
permissions_base_paths	No	/opt/bacula	List of paths	/opt/bacula/etc	List of paths of the Bacula installation where permissions will be checked
config_backup_job_name	No	BaculaDirectorConfigs	Job Name	BaculaConfig	Name of the job of the Backup of the configuration of Bacula, to analyze if it’s regularly run
catalog_backup_job_name	No	(auto-detected)	Job Name	BaculaCatalog	Name of the job of the Backup of the catalog of Bacula, to analyze if it’s regularly run
min_free_space_percent	No	10	Integer	5	Minimum percentage of free space for a given Storage Daemon before generating an alert
configuration_checks_exclude	No		List of checks separated by ‘,’ (see next section)	restore, verify, copy, malware, antivirus	List of subchecks to exclude from the execution of this Configuration security service
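
The dedup_free_space_min_bytes value is given in plain bytes, which is hard to read at a glance. As a small aid, the sketch below (a hypothetical helper, not part of the plugin) converts GiB to bytes; the default and the example in the table correspond to 100 GiB and 500 GiB.

```python
GIB = 1024 ** 3  # bytes in one GiB


def gib_to_bytes(gib):
    """Convert a size in GiB to the byte count expected by the parameter."""
    return int(gib * GIB)


print(gib_to_bytes(100))  # 107374182400 (the default)
print(gib_to_bytes(500))  # 536870912000 (the example)
```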

Configuration Security Service Subchecks

Configuration security is a special service of BGuardian that checks many things related to the configuration of the environment.

By default, it will check everything, but it is possible to exclude some checks using the configuration_checks_exclude parameter and adding a list of keywords there (separated by ‘,’).

Below we briefly describe the keyword and the function of each of the subchecks, and the recommended action if a related issue is reported. The code is used by the alerts functionality and helps to quickly identify each issue:

passwords: Checks duplicated passwords and the strength of them inside Bacula Director configuration.
- Code: GC__PASSWOR
- Action: Make your passwords unique and use strong passwords (more than 8 characters, including uppercase and lowercase letters, digits and symbols)
catalog_backup: Checks configuration and recent execution (max_days_bacula_backup parameter) of the backup of the catalog of Bacula (configurable by catalog_backup_job_name).
- Code: GC__CATALOG
- Action: Immediately run a backup of the catalog and make sure your schedule is frequent enough (once a week at least)
config_backup: Checks configuration and recent execution (max_days_bacula_backup parameter) of the backup of the configuration of Bacula (configurable by config_backup_job_name).
- Code: GC__CONFIG_
- Action: Immediately run a backup of the configuration and make sure your schedule is frequent enough (once a week at least)
restore: Checks execution of some recent restore (max_days_restore parameter). It is a best practice to run some restore of the different kinds of data from time to time.
- Code: GC__RESTOR
- Action: Run some restore for each kind of data and each kind of storage from time to time.
verify: Checks execution of some recent verify job (max_days_verify parameter). It is a best practice to verify the data of your jobs.
- Code: GC__VERIFY
- Action: Configure verify jobs for your data
copy: Checks execution of some recent copy job (max_days_copy parameter). It is a best practice to use a multi-tier strategy with your backups.
- Code: GC__COPY
- Action: Define a second storage tier and configure copy jobs to send your data there.
malware: Checks jobs that could activate the Malware protection function, but have not enabled it.
- Code: GC__MALWAR
- Action: Enable Malware detection for any reported system that could be susceptible to infection because of its location, usage pattern and kind of data
antivirus: Checks the existence of one antivirus job for each client.
- Code: GC__ANTIVI
- Action: Configure an antivirus job for your clients, especially for file servers.
consoles: Checks the usage of restricted consoles.
- Code: GC__CONSOL
- Action: If you have different users accessing your Bacula environment, configure a restricted console for each of them, using only the minimal needed permissions
events: Checks the activation of Events in Message resources for auditing purposes.
- Code: GC__EVENTS
- Action: Enable events messages in your configuration and store them at your convenience (it is recommended to store them in a file and also in the catalog)
dir_status: Checks the status of the Director daemon to see if there is any reported error with the service.
- Code: GC__DIR_ST
- Action: Review the status of your Director service and start or restart it
dir_address: Checks the usage of DirAddress setting to limit Director service to be listening on specific interfaces
- Code: GC__DIR_AD
- Action: Use the DirAddress setting, so you limit the service to be listening only on the required interface
sd_status: Checks the status of the Storage Daemon(s) to see if there is any reported error with the service or if there is no connectivity with some of them.
- Code: GC__SD_STA
- Action: Review the status of the affected Storage Daemon and the connectivity to it from the Director host. Start or restart the service if needed
sd_free: Checks free space in each Storage Daemon is above the threshold (defined by min_free_space_percent parameter).
- Code: GC__SD_FREE
- Action: Review the status of the affected Storage Daemon and start or restart it
dedup: Enable getting dedup status for Storage Daemons in order to make dedup checks around Global Endpoint Deduplication. (Requires to not exclude sd_status)
ded_errors: Check if GED is reporting some general error. (Requires to not exclude dedup)
- Code: GC__DED_ER
- Action: Review the message and the status of the affected Storage Daemon. Restart it, run vacuum procedures and re-check
ded_orphan: Check if GED is reporting some orphan reference. (Requires to not exclude dedup)
- Code: GC__DED_OR
- Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_vacuum: Check if GED vacuum procedure was executed recently enough. The number of days can be controlled with max_days_vacuum parameter. (Requires to not exclude dedup)
- Code: GC__DED_VA
- Action: Run vacuum as soon as possible.
ded_idx: Check if GED is marking some error with the indexes. (Requires to not exclude dedup)
- Code: GC__DED_ID
- Action: Review the status of the filesystem holding the indexes.
ded_miss: Check if GED is marking some missed reference. (Requires to not exclude dedup)
- Code: GC__DED_MI
- Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_free: Check the free space available in GED engine (it relies on parameter dedup_free_space_min_bytes). (Requires to not exclude dedup)
- Code: GC__DED_FR
- Action: Provide more space to your GED containers filesystem. You can also purge data from your dedup storages and then run vacuum process to try to recover some space.
ded_suspect: Check if GED is reporting some suspect reference. (Requires to not exclude dedup)
- Code: GC__DED_SU
- Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_derr: Check if GED is reporting errors in the dedup engine. (Requires to not exclude dedup)
- Code: GC__DED_DE
- Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
ded_cerr: Check if GED is reporting errors in the containers. (Requires to not exclude dedup)
- Code: GC__DED_CE
- Action: Run vacuum as soon as possible. If it is not solved, run scrub process.
fd_status: Checks the status of the client File Daemon(s) to see if there is any reported error with the service or if there is no connectivity with some of them.
- Code: GC__FD_STA
- Action: Review the status of the affected File Daemon and the connectivity to it from the Director host. Start or restart the service if needed
fips: Checks if FIPS is enabled on daemons supporting it (for DIR requires to not exclude ‘dir_status’ ; for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).
- Code: GC__FIPS
- Action: Consider enabling FIPS to the affected Daemon if your security posture needs to be very high
trace: Checks if trace is enabled in any daemon, with the risk of filling a disk (for DIR requires to not exclude ‘dir_status’ ; for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).
- Code: GC__TRACE
- Action: Disable debug and trace from the affected Daemon as soon as possible if you are not doing debug activities anymore
versions: Checks if the FDs and/or SDs Bacula versions are aligned with the version of the Director (for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).
- Code: GC__VERSIO
- Action: Install a supported Bacula version in the affected Daemon. Storage Daemon and Director must be on the same version, while File Daemons can run an older version than the Director.
security_plugin: Checks if the security plugin is deployed in each FD (requires to not exclude ‘fd_status’).
- Code: GC__SECURI
- Action: Install the security plugin in the affected File Daemons
permissions: Checks if permissions are strong enough for the given path and subpaths (permissions_base_paths parameter).
- Code: GC__PERMIS
- Action: Correct the permissions on the affected paths. Usually you need to exclude the ‘others’ group from any bacula directory and to protect the bacula configuration from undesirable writes.
running_processes: Checks if running processes are running with root user, which is generally not recommended for secure environments.
- Code: GC__RUNNIG
- Action: Configure your daemons with the correct user. Director and Storage Daemon do not need to be run with root.
encryption: Check if encryption is used at all in the environment
- Code: GC__ENCRYP
- Action: Consider using encryption for any sensitive data or any untrusted storage.
volprotection: Check if volume protection is used at all in the environment
- Code: GC__VOLPRO
- Action: Consider enabling volume protection for your disk backups on Linux, as well as for any backup sent to NAS from NetApp, DataDomain or HPE StoreOnce.
pg_vacuum: Check if PostgreSQL vacuum process was executed recently enough on the key tables
- Code: GC__PG_VAC
- Action: Run Vacuum on the affected tables as soon as possible, during a low load window in your environment.
pg_analyze: Check if PostgreSQL analyze process was executed recently enough on the key tables
- Code: GC__PG_ANA
- Action: Run Analyze on the affected tables as soon as possible, during a low load window in your environment.
pg_config: Check if PostgreSQL configuration parameters are inside the recommended margins
- Code: GC__PG_CON
- Action: Correct the mentioned values to comply with the recommended configuration
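
Several subchecks above state that they require another subcheck not to be excluded. The Python sketch below illustrates how an exclusion could cascade; it is an illustration under assumptions, not BGuardian code. It only models the unconditional requirements (dedup on sd_status, the ded_* checks on dedup, security_plugin on fd_status) and assumes that a check whose requirement is excluded cannot run. The per-daemon requirements of fips, trace and versions are not modeled.

```python
# Unconditional requirements taken from the subcheck descriptions.
DEDUP_SUBCHECKS = [
    "ded_errors", "ded_orphan", "ded_vacuum", "ded_idx", "ded_miss",
    "ded_free", "ded_suspect", "ded_derr", "ded_cerr",
]
REQUIRES = {"dedup": "sd_status", "security_plugin": "fd_status"}
REQUIRES.update({name: "dedup" for name in DEDUP_SUBCHECKS})


def effectively_disabled(excluded):
    """Return the excluded checks plus every check that depends on them."""
    disabled = set(excluded)
    changed = True
    while changed:  # repeat until no new dependent check is found
        changed = False
        for check, requirement in REQUIRES.items():
            if requirement in disabled and check not in disabled:
                disabled.add(check)
                changed = True
    return disabled
```

With this model, putting sd_status in configuration_checks_exclude also disables dedup and all the ded_* checks, so it is worth reviewing the list before excluding a status check.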

Infected Service

This service is activated if the service parameter contains the keyword: infected.

Alert code is: GIN

The purpose of this service is to report jobs where a virus, ransomware or other malware was detected, so that a summary of them is easily available. A new alert is also generated for each new entry.

Action: If you find any job containing some kind of malware or virus, you should quickly isolate that system from your network and run healing activities on it. Afterwards, run a new backup and check that no more viruses or malware are detected.

Security Events Service

This service is activated if the service parameter contains the keyword: securityevents.

Alert code is: GSE

This service will report any recent event registered in Bacula core with the security category.

Option	Required	Default	Values	Example	Description
securityevents_days_since	No	15	Integer	30	Defines the number of days to consider, from today, for the report of security events

Action: Review the nature of the event and act accordingly. If you find, for instance, many failed attempts to connect to the Director from BConsole, consider changing the location of your consoles or improving the security posture at the networking level for them.

Deviation Service

This service is activated if the service parameter contains the keyword: deviation.

Alert code is: GDV

The purpose of this service is to analyze job executions statistically and find deviation from the expected values. Calculations are done over the size, number of files and duration of the jobs.

Depending on the kind of jobs and the nature of the data in your environment, you may need to adjust the following parameters to maximize the utility of the deviation results. It is possible to adjust the different thresholds, and to decide whether results are selected by both regression deviation and deviation from the average (default behavior), or whether deviation from the average is excluded (dev_include_by_avg parameter) because it generates too much information without pointing to issues in the environment.

Parameters for deviation service are explained below:

Option	Required	Default	Values	Example	Description
dev_factor	No	0.4	Float	0.25	Defines which jobs will trigger the deviation alert, that is, how large a deviation must be relative to the calculated standard deviation to be considered significant. 1 means 200% of the standard deviation, 0.5 means 150% of the standard deviation. Example: if the standard deviation for size is 100 MB, with a value of 0.5 the threshold is 150 MB: a job with a deviation from the average or regression of 160 MB will be selected, while jobs with a deviation from the average of 50 MB or 120 MB won’t be selected
dev_severity_low_limit	No	0.5	Float	0.8	Defines the limit to consider a selected deviated job as severity Low
dev_severity_medium_limit	No	0.75	Float	0.9	Defines the limit to consider a selected deviated job as severity Medium
dev_min_regression_accuracy	No	0.5	Float	0.4	Minimum value of regression accuracy in order to use the regression analysis
dev_min_dev_from_avg	No	0.8	Float	2	Minimum deviation from the average to consider a job as deviated for selection based on average
dev_min_duration	No	1200	Integer	3600	Minimum duration of a job in order to consider deviation by duration as something significant to select the job
dev_min_executions	No	5	Integer	20	Minimum number of executions of a given job in order to consider it for deviation analysis
dev_include_by_avg	No	true	1, true, yes, Yes ; 0, false, no, No	false	Enable/disable selection of results based on the average deviation
dev_min_files	No	50	Integer	100	Minimum number of files in a job in order to consider deviation by number of files as something significant to select the job
dev_min_size	No	10485760	Integer	52428800	Minimum size of a job in order to include it in deviation analysis
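
The dev_factor description implies a threshold of (1 + dev_factor) times the standard deviation. The Python sketch below reproduces the example from the table under that reading. The function names are hypothetical, and two details are assumptions: the comparison is strict, and negative deviations are treated like positive ones.

```python
def deviation_threshold(std_dev, dev_factor=0.4):
    """Deviation above which a job is selected: (1 + dev_factor) * std_dev."""
    return (1 + dev_factor) * std_dev


def is_selected(deviation, std_dev, dev_factor=0.4):
    """True if the absolute deviation exceeds the threshold (assumed rule)."""
    return abs(deviation) > deviation_threshold(std_dev, dev_factor)


# Example from the table: standard deviation of 100 MB, dev_factor of 0.5.
for deviation_mb in (160, 50, 120):
    print(deviation_mb, is_selected(deviation_mb, 100, dev_factor=0.5))
```

Only the 160 MB deviation exceeds the 150 MB threshold, which matches the selection described for dev_factor.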

Action: When this service reports results, you should review every result and analyze what caused the given deviation. A deviation can have many causes, such as a sudden amount of new information to back up, a sudden slowness problem, or some controlled massive deletion. However, the same effect can be caused by ransomware activity or even by uncontrolled or undesired user activity. If you find such an event, you will need to solve it and consider adjusting or re-running some backups. If there is an explanation, every issue can be marked to be ignored in further alerts through the ignoring mechanism.

Failed in a row Service

This service is activated if the service parameter contains the keyword: failedinarow.

Alert code is: GFR

The purpose of this service is to report jobs that have failed N or more times in a row, according to the parameter failedrow_times.

Option	Required	Default	Values	Example	Description
failedrow_times	No	3	Integer	5	Defines the threshold of consecutive times a given job must have failed in order to be included in the report
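
As an illustration of the failedrow_times rule, the Python sketch below counts consecutive failures in a job history. It is only a sketch: the status strings and function names are hypothetical, and it assumes the streak that matters is the most recent one.

```python
def failed_in_a_row(statuses):
    """Count consecutive failures at the end of a history (oldest first)."""
    count = 0
    for status in reversed(statuses):
        if status != "failed":
            break
        count += 1
    return count


def is_reported(statuses, failedrow_times=3):
    """True if the latest failure streak reaches the threshold."""
    return failed_in_a_row(statuses) >= failedrow_times
```

Under this assumption, a job that failed three times and then succeeded is no longer reported, because the streak was broken by the successful run.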

Action: Review as soon as possible the affected jobs and analyze the causes of the failure. You can run them manually to check the result and then adjust the configuration or the schedule according to the results of your analysis.

Restore Frequency Service

This service is activated if the service parameter contains the keyword: restorefrequency.

Alert code is: GRF

The purpose of this service is to report jobs whose restore frequency is below the factor established by the parameter: restore_factor.

Option	Required	Default	Values	Example	Description
restore_factor	No	0.15	Float	0.5	Defines threshold of restoring frequency for a given job in terms of %

Action: Review your backup policies and include some periodic restores on it, in order to ensure your backups and restore strategies are correct and agile enough to be correctly prepared for the time when an urgent restore comes.

Success Ratio Service

This service is activated if the service parameter contains the keyword: successratio.

Alert code is: GSR

The purpose of this service is to report jobs whose success ratio is below the factor established by the parameter: success_factor.

Option	Required	Default	Values	Example	Description
success_factor	No	0.8	Float	0.75	Defines threshold of success ratio for the executions of a given job in terms of %
success_severity_medium_limit	No	0.4	Float	0.5	Defines the limit to consider a selected deviated job as severity Medium
success_severity_low_limit	No	0.6	Float	0.7	Defines the limit to consider a selected deviated job as severity Low
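
A minimal Python sketch of the selection rule follows. It assumes the success ratio is the number of successful executions divided by the total number of executions, expressed as a fraction like the default 0.8. The function names are hypothetical and the severity limits are not modeled.

```python
def success_ratio(successful, total):
    """Fraction of successful executions (0.0 when there are none at all)."""
    return successful / total if total else 0.0


def below_success_factor(successful, total, success_factor=0.8):
    """True if the job has executions and its ratio is under the threshold."""
    return total > 0 and success_ratio(successful, total) < success_factor
```

For example, a job with 3 successful executions out of 4 has a ratio of 0.75 and would be selected with the default success_factor, but not with the example value of 0.75.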

Action: Review the affected jobs and analyze the causes of the failures. If they are apparently random, consider running a load analysis of your system in order to spread the load better over your network and hosts.

No Copy Service

This service is activated if the service parameter contains the keyword: nocopy.

Alert code is: GNC

The purpose of this service is to report jobs not included in any 2-tier policy, which means jobs that have never been copied or migrated.

Option	Required	Default	Values	Example	Description
nocopy_grace_period_days	No	10	Integer	10	Jobs that are more recent than the days defined by this parameter won’t be included in the report

Action: Review your backup policies and include a second storage tier to which the reported jobs are sent by configuring and running Copy jobs.

No Verify Service

This service is activated if the service parameter contains the keyword: noverify.

Alert code is: GNV

The purpose of this service is to report jobs not included in any verification policy, which means jobs that have never been verified.

Option	Required	Default	Values	Example	Description
noverify_grace_period_days	No	20	Integer	50	Jobs that are more recent than the days defined by this parameter won’t be included in the report

Action: Review your backup policies and include a verification phase where you run periodic verify jobs. Include the jobs listed in the report.

Empty Service

This service is activated if the service parameter contains the keyword: empty.

Alert code is: GE

The purpose of this service is to report Full jobs that were successful but have no contents (no files and no bytes stored).

Action: Review the affected jobs, including the joblog and the configuration. You may need to adjust the configuration or to run again the affected jobs.

Last Good Service

This service is activated if the service parameter contains the keyword: lastgood.

Alert code is: GLG

The purpose of this service is to report jobs whose last successful execution is older than the number of days specified by the parameter: lastgood_max_since_days.

Option	Required	Default	Values	Example	Description
lastgood_max_since_days	No	5	Integer	10	Jobs that have no successful execution that is more recent than the days defined by this parameter will be included into the report
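
The selection rule can be sketched in Python as follows. This is an illustration only: the input mapping and the function name are hypothetical, and it assumes that a job that has never succeeded is also reported.

```python
from datetime import datetime, timedelta


def stale_jobs(last_success_by_job, now, lastgood_max_since_days=5):
    """Return job names whose last success is older than the allowed window.

    last_success_by_job maps a job name to the datetime of its last
    successful execution, or to None if it has never succeeded.
    """
    cutoff = now - timedelta(days=lastgood_max_since_days)
    return sorted(
        name
        for name, last in last_success_by_job.items()
        if last is None or last < cutoff
    )
```

The same comparison applies to objects instead of jobs when the objectlastgood service is used, since both services read the same parameter.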

Action: Review as soon as possible the affected jobs and their latest executions. Run them manually if they have been missed, cancelled or failed. Troubleshoot any problem if they are not successful even with the manual execution.

Last Object Good Service

This service is activated if the service parameter contains the keyword: objectlastgood.

Alert code is: GOLG

The purpose of this service is to report objects whose last successful associated backup job execution is older than the number of days specified by the parameter: lastgood_max_since_days.

Please note that this service uses the same parameter as the Last Good service.

Option	Required	Default	Values	Example	Description
lastgood_max_since_days	No	5	Integer	10	Objects that have no successful protection that is more recent than the days defined by this parameter will be included into the report

Action: Review as soon as possible the affected objects and their associated jobs, to review their latest executions. Run them manually if they have been missed, cancelled or failed. Troubleshoot any problem if they are not successful even with the manual execution.

Low Dedup

This service is activated if the service parameter contains the keyword: lowdedup.

Alert code is: GLD

The purpose of this service is to report jobs that do not deduplicate well enough and that are potentially misusing resources for information that is not a good candidate for deduplication. This information is also useful if you are running out of space and need to delete or migrate the data of some jobs that use a large amount of storage.

Option	Required	Default	Values	Example	Description
dedup_ratio_min	No	40	Float	60	Defines the threshold of the dedup ratio. Jobs with a lower ratio will be reported. The threshold represents a percentage and jobs store it as ‘compressratio’ in the catalog
dedup_size_min	No	52428800	Long	262144000	Defines the limit in size to consider a selected low dedup job. Jobs with smaller size won’t be considered
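
The two parameters combine into one filter: a job is reported when it is large enough and its ratio is under the minimum. The Python sketch below illustrates this with hypothetical job records (the field names are assumptions, not catalog columns).

```python
def low_dedup_jobs(jobs, dedup_ratio_min=40.0, dedup_size_min=52428800):
    """Return names of jobs that are big enough and deduplicate poorly.

    Each job is a dict with hypothetical keys: name, size_bytes, dedup_ratio
    (a percentage, like the dedup_ratio_min threshold).
    """
    return [
        job["name"]
        for job in jobs
        if job["size_bytes"] >= dedup_size_min
        and job["dedup_ratio"] < dedup_ratio_min
    ]
```

The default dedup_size_min of 52428800 bytes is 50 MiB, so small jobs are ignored even if their ratio is poor.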

Action: Consider disabling deduplication for the affected jobs if the ratio is very poor for your needs. If running out of space, consider deleting (or moving to a different storage with a Migration) large jobs with a poor ratio that are old enough for your needs.

Orphan Chain

This service is activated if the service parameter contains the keyword: orphanchain.

Alert code is: GOC

Incremental and Differential jobs are part of a chain, and they depend on the information of a previous job. Sometimes, due to human mistakes or a bad retention policy, chains can be broken: a job that others depend on is recycled before the later jobs that depend on it. This service will detect this situation and report jobs that are ‘orphan’ in these terms.

It is important to note that, depending on the nature of the backup, Incremental or Differential jobs alone can still be useful. For instance, for any job that contains files, the information is still fully recoverable, and such jobs can also be based on an older Full or Incremental, so the restore job still works. However, for some Virtual Machine or Database plugins, an Incremental or Differential job without its predecessor may not contain recoverable information. In general, it is important to try to avoid having any orphan job, and this service is intended to help in that direction.
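
To make the idea of a broken chain concrete, here is a simplified Python sketch. It is not how BGuardian queries the catalog. It assumes a hypothetical chronological list of jobs for a single backup job, where a Differential depends on the latest Full and an Incremental depends on the job immediately before it.

```python
def find_orphans(chain):
    """Return the ids of available jobs whose predecessors are gone.

    chain is a chronological list of (jobid, level, available) tuples;
    level is "F", "D" or "I", and available is False once recycled.
    """
    orphans = []
    full_ok = False      # is the latest Full still available?
    previous_ok = False  # is the previous job available and itself intact?
    for jobid, level, available in chain:
        if level == "F":
            intact = available
            full_ok = intact
        elif level == "D":
            intact = available and full_ok
            if available and not full_ok:
                orphans.append(jobid)
        else:  # Incremental
            intact = available and previous_ok
            if available and not previous_ok:
                orphans.append(jobid)
        previous_ok = intact
    return orphans
```

In this model, an Incremental that follows a recycled Incremental is reported even if the Full is still present, because part of the chain between them is missing.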

Action: If you ever detect an orphan job, review your backup policies regarding retention times and adjust them, if necessary, so that orphan jobs are not produced automatically. Also check that you have other valid copies of the information for the affected jobs; if you don’t, run new jobs as soon as possible.

Different filesystem

This service is activated if the service parameter contains the keyword: differentfilesystem.

Alert code is: GDFS

This service will detect and report jobs where the log message ‘xxx is a different filesystem. Will not descend from yyyy’ is produced. Paths reported there will be compared with the list of excluded ‘known’ paths configured by the ‘different_fs_exclude’ parameter. If they do not match, jobs will be reported.

This situation happens when jobs use filesets with the OneFS option enabled. Depending on the environment, this behavior may be entirely desirable, but it can hide some undesired exclusions. This service helps to avoid that kind of situation.

Option	Required	Default	Values	Example	Description
different_fs_exclude	No	/proc, /sys, /tmp, /boot	List of paths separated by ‘,’	/proc, /sys, /tmp, /boot, /myNonDesiredFS2, /mnt/myNonDesiredFS1	List of known paths that are on different filesystems and there is no problem if they are reported as paths that won’t be backed up
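
The comparison against different_fs_exclude can be sketched in Python as below. This is an illustration under assumptions: the regular expression follows the message pattern quoted above, paths are compared by exact match after trimming a trailing slash, and paths containing spaces are not handled. The function names are hypothetical.

```python
import re

# Pattern based on the message quoted in this section.
MESSAGE = re.compile(
    r"(?P<path>\S+) is a different filesystem\. "
    r"Will not descend from (?P<parent>\S+)"
)


def normalize(path):
    """Trim surrounding spaces and a trailing slash, keeping '/' itself."""
    return path.strip().rstrip("/") or "/"


def unexpected_paths(joblog_lines, different_fs_exclude="/proc, /sys, /tmp, /boot"):
    """Return skipped paths that are not in the known exclusion list."""
    known = {normalize(p) for p in different_fs_exclude.split(",") if p.strip()}
    found = []
    for line in joblog_lines:
        match = MESSAGE.search(line)
        if match:
            path = normalize(match.group("path"))
            if path not in known and path not in found:
                found.append(path)
    return found
```

Any path returned by this kind of check is either a filesystem that should be added to the fileset, or a candidate to be appended to different_fs_exclude.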

Action: Review the path that was excluded and consider modifying your backup configuration if it is necessary to include it, or add it to the list to be excluded on this service otherwise.

NoTOTP Service

This service is activated if the service parameter contains the keyword: nototp.

Alert code is: GNT

The purpose of this service is to report users that have not enabled the TOTP two-factor authentication mechanism.

Action: Consider enabling the TOTP two-factor authentication mechanism for the reported users in order to improve your security posture regarding BWeb access.