Parameters

One of the most important reasons to modify the default parameters is to select the services to include or exclude during the execution of BGuardian. By default, all services are included.

Additionally, some services allow sub-checks and those may also be excluded. The parameters to control these two features are:

  • service: For the main services

  • configuration_checks_exclude: To exclude checks from the configuration security service

After selecting the services to apply, there are also parameters which can control the results of those services.

The sections below describe every service and its specific parameters, together with the recommended actions when a service reports results.

Fileset Common Parameters

The following parameters are applicable to the general behavior of the plugin:

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| mode | No | check | check, alert | alert | Run normal check mode, or alert mode to query current alerts or add/remove alerts or ignores |
| service | No | configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp | List (separated by ‘,’) of elements from: configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp | configurationsecurity, nocopy | Select the services that BGuardian will execute |
| service_exclude | No | | List (separated by ‘,’) of elements from: configurationsecurity, infected, securityevents, deviation, successratio, failedinarow, empty, lastgood, nocopy, noverify, lowdedup, orphanchain, restorefrequency, differentfilesystem, nototp | configurationsecurity, nocopy | Select the services that BGuardian will exclude. Use this variable to exclude a list or the ‘service’ one to include a list, but do not use both |
| config_file | No | | The path pointing to a file containing any combination of plugin parameters | /opt/bacula/etc/bguardian.settings | Defines a config file where any parameter of the plugin can be configured, so you don’t need to put them directly in the Plugin line of the fileset |
| log | No | | An existing path with enough permissions for File Daemon to create a file with the provided name | /tmp/bguardian.log | Generates an additional log besides what is shown in the job log. This parameter is included in the backend file, so, in general, the log is stored in the working directory by default |
| debug | No | | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 8 | Generates the working/bguardian/bguardian-debug.log* files containing debug information, which is more complete with a greater debug number |
| reports_path | No | | An existing path where BGuardian has permissions to write and keep the reports | /opt/bacula/bguardian/reports | Path to store BGuardian reports (html and json files). By default, they are stored inside the working directory |
| reports_keep_number | No | 100 | Integer | 20 | Number of reports to keep. BGuardian will remove the older ones over this number |
| alerts_path | No | | An existing path where BGuardian has permissions to write and keep the alerts | /opt/bacula/bguardian/alerts | Path to store BGuardian alerts (json files). By default, they are stored inside the working directory |
| bconsole_timeout_seconds | No | 20 | Integer | 10 | Timeout with BConsole commands |
| bweb_jobid_link | No | true | true, 1, yes, Yes ; false, 0, no, No | false | Enable/disable generation of the link to the BWeb job report inside the BGuardian report |
| disable_events | No | false | true, 1, yes, Yes ; false, 0, no, No | true | Enable/disable Bacula Events generation on alert adding/removing situations |
| silent | No | false | true, 1, yes, Yes ; false, 0, no, No | true | Enable/disable silent mode, where BGuardian will not produce console output |
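The interaction between the service and service_exclude parameters can be illustrated with a short Python sketch. It is not BGuardian code: the function name `effective_services` and the error handling are assumptions made for illustration, and only the keyword list and the "do not use both" rule come from the table above.

```python
# Service keywords as listed for the 'service' parameter.
ALL_SERVICES = [
    "configurationsecurity", "infected", "securityevents", "deviation",
    "successratio", "failedinarow", "empty", "lastgood", "nocopy",
    "noverify", "lowdedup", "orphanchain", "restorefrequency",
    "differentfilesystem", "nototp",
]


def _parse_list(value):
    """Split a comma-separated parameter value into clean keywords."""
    return [item.strip() for item in value.split(",") if item.strip()]


def effective_services(service=None, service_exclude=None):
    """Hypothetical helper: return the services that would be executed."""
    if service and service_exclude:
        # The documentation asks to use one parameter or the other.
        raise ValueError("use either 'service' or 'service_exclude', not both")
    for name in _parse_list(service or "") + _parse_list(service_exclude or ""):
        if name not in ALL_SERVICES:
            raise ValueError(f"unknown service keyword: {name}")
    if service:
        wanted = set(_parse_list(service))
        return [s for s in ALL_SERVICES if s in wanted]
    # No 'service' given: start from the default (all services).
    excluded = set(_parse_list(service_exclude or ""))
    return [s for s in ALL_SERVICES if s not in excluded]


print(effective_services(service="configurationsecurity, nocopy"))
print(len(effective_services(service_exclude="configurationsecurity, nocopy")))
```

With no parameters the sketch returns all fifteen services, which mirrors the documented default.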

Configuration security service

This service is activated if the service parameter contains the keyword: configurationsecurity.

Alert code is: GC__[SUBSERVICE]. Subservice codes are detailed below.

The purpose of this service is to report the result of different checks regarding security, status and best practices related to how Bacula is configured and running in the environment.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| max_days_bacula_backup | No | 10 | Integer | 5 | Maximum number of days without a backup of the configuration or catalog before generating an alert |
| max_days_copy | No | 15 | Integer | 3 | Maximum number of days without any copy job before generating an alert |
| max_days_verify | No | 30 | Integer | 30 | Maximum number of days without any verify job before generating an alert |
| max_days_restore | No | 30 | Integer | 60 | Maximum number of days without any restore job before generating an alert |
| max_days_vacuum | No | 7 | Integer | 2 | Maximum number of days without running vacuum in dedup enabled storage daemons |
| dedup_free_space_min_bytes | No | 107374182400 | Long | 536870912000 | Limit on bytes of free space for dedup engines before raising the alert |
| permissions_base_paths | No | /opt/bacula | List of paths | /opt/bacula/etc | List of paths of Bacula installation files whose permissions are checked |
| config_backup_job_name | No | BaculaDirectorConfigs | Job Name | BaculaConfig | Name of the job that backs up the configuration of Bacula, to analyze if it’s regularly run |
| catalog_backup_job_name | No | (auto-detected) | Job Name | BaculaCatalog | Name of the job that backs up the catalog of Bacula, to analyze if it’s regularly run |
| min_free_space_percent | No | 10 | Integer | 5 | Minimum percentage of free space for a given Storage Daemon before generating an alert |
| configuration_checks_exclude | No | | List of checks separated by ‘,’ (see next section) | restore, verify, copy, malware, antivirus | List of subchecks to exclude from the execution of this Configuration security service |

Configuration security service subchecks

Configuration security is a special service of BGuardian that checks many things related to the configuration of the environment.

By default, it will check everything, but it is possible to exclude some checks using the configuration_checks_exclude parameter and adding a list of keywords there (separated by ‘,’).

Below we briefly describe the keyword and the function of each of the checks, and the recommended action if a related issue is reported. The code is used by the alerts functionality and helps to quickly identify each issue:

  • passwords: Checks duplicated passwords and the strength of them inside Bacula Director configuration.

    • Code: GC__PASSWOR

    • Action: Make your passwords unique and use a strong keyword (+8 characters, include upper and lowercase, digits and symbols)

  • catalog_backup: Checks configuration and recent execution (max_days_bacula_backup parameter) of the backup of the catalog of Bacula (configurable by catalog_backup_job_name).

    • Code: GC__CATALOG

    • Action: Immediately run a backup of the catalog and make sure your schedule is frequent enough (once a week at least)

  • config_backup: Checks configuration and recent execution (max_days_bacula_backup parameter) of the backup of the configuration of Bacula (configurable by config_backup_job_name).

    • Code: GC__CONFIG_

    • Action: Immediately run a backup of the configuration and make sure your schedule is frequent enough (once a week at least)

  • restore: Checks execution of some recent restore (max_days_restore parameter). It is a best practice to run some restore of the different kinds of data from time to time.

    • Code: GC__RESTOR

    • Action: Run some restore for each kind of data and each kind of storage from time to time.

  • verify: Checks execution of some recent verify job (max_days_verify parameter). It is a best practice to verify the data of your jobs.

    • Code: GC__VERIFY

    • Action: Configure verify jobs for your data

  • copy: Checks execution of some recent copy job (max_days_copy parameter). It is a best practice to use a multi-tier strategy with your backups.

    • Code: GC__COPY

    • Action: Define a second storage tier and configure copy jobs to send your data there.

  • malware: Checks jobs that could activate the Malware protection function, but have not enabled it.

    • Code: GC__MALWAR

    • Action: Enable Malware detection for any reported system that could be susceptible to infection because of its location, usage pattern and kind of data

  • antivirus: Checks the existence of one antivirus job for each client.

    • Code: GC__ANTIVI

    • Action: Configure an antivirus job for your clients, especially for file servers.

  • consoles: Checks the usage of restricted consoles.

    • Code: GC__CONSOL

    • Action: If you have different users accessing your Bacula environment, configure a restricted console for each of them, using only the minimal needed permissions

  • events: Checks the activation of Events in Message resources for auditing purposes.

    • Code: GC__EVENTS

    • Action: Enable events messages in your configuration and store them at your convenience (it is recommended to store them in a file and also in the catalog)

  • dir_status: Checks the status of the Director daemon to see if there is any reported error with the service.

    • Code: GC__DIR_ST

    • Action: Review the status of your Director service and start or restart it

  • dir_address: Checks the usage of DirAddress setting to limit Director service to be listening on specific interfaces

    • Code: GC__DIR_AD

    • Action: Use the DirAddress setting, so you limit the service to be listening only on the required interface

  • sd_status: Checks the status of the Storage Daemon(s) to see if there is any reported error with the service or if there is no connectivity with some of them.

    • Code: GC__SD_STA

    • Action: Review the status of the affected Storage Daemon and the connectivity to it from the Director host. Start or restart the service if needed

  • sd_free: Checks free space in each Storage Daemon is above the threshold (defined by min_free_space_percent parameter).

    • Code: GC__SD_FREE

    • Action: Review the status of the affected Storage Daemon and start or restart it

  • dedup: Enable getting dedup status for Storage Daemons in order to make dedup checks around Global Endpoint Deduplication. (Requires to not exclude sd_status)

  • ded_errors: Check if GED is reporting some general error. (Requires to not exclude dedup)

    • Code: GC__DED_ER

    • Action: Review the message and the status of the affected Storage Daemon. Restart it, run vacuum procedures and re-check

  • ded_orphan: Check if GED is reporting some orphan reference. (Requires to not exclude dedup)

    • Code: GC__DED_OR

    • Action: Run vacuum as soon as possible. If it is not solved, run scrub process.

  • ded_vacuum: Check if GED vacuum procedure was executed recently enough. The number of days can be controlled with max_days_vacuum parameter. (Requires to not exclude dedup)

    • Code: GC__DED_VA

    • Action: Run vacuum as soon as possible.

  • ded_idx: Check if GED is marking some error with the indexes. (Requires to not exclude dedup)

    • Code: GC__DED_ID

    • Action: Review the status of the filesystem holding the indexes.

  • ded_miss: Check if GED is marking some missed reference. (Requires to not exclude dedup)

    • Code: GC__DED_MI

    • Action: Run vacuum as soon as possible. If it is not solved, run scrub process.

  • ded_free: Check the free space available in GED engine (it relies on parameter dedup_free_space_min_bytes). (Requires to not exclude dedup)

    • Code: GC__DED_FR

    • Action: Provide more space to your GED containers filesystem. You can also purge data from your dedup storages and then run vacuum process to try to recover some space.

  • ded_suspect: Check if GED is reporting some suspect reference. (Requires to not exclude dedup)

    • Code: GC__DED_SU

    • Action: Run vacuum as soon as possible. If it is not solved, run scrub process.

  • ded_derr: Check if GED is reporting errors in the dedup engine. (Requires to not exclude dedup)

    • Code: GC__DED_DE

    • Action: Run vacuum as soon as possible. If it is not solved, run scrub process.

  • ded_cerr: Check if GED is reporting errors in the containers. (Requires to not exclude dedup)

    • Code: GC__DED_CE

    • Action: Run vacuum as soon as possible. If it is not solved, run scrub process.

  • fd_status: Checks the status of the client File Daemon(s) to see if there is any reported error with the service or if there is no connectivity with some of them.

    • Code: GC__FD_STA

    • Action: Review the status of the affected File Daemon and the connectivity to it from the Director host. Start or restart the service if needed

  • fips: Checks if FIPS is enabled on daemons supporting it (for DIR requires to not exclude ‘dir_status’ ; for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).

    • Code: GC__FIPS

    • Action: Consider enabling FIPS to the affected Daemon if your security posture needs to be very high

  • trace: Checks if trace is enabled in any daemon, with the risk of filling up a disk (for DIR requires to not exclude ‘dir_status’ ; for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).

    • Code: GC__TRACE

    • Action: Disable debug and trace from the affected Daemon as soon as possible if you are not doing debug activities anymore

  • versions: Checks if the FDs and/or SDs Bacula versions are aligned with the version of the Director (for FDs requires to not exclude ‘fd_status’ ; for SDs requires to not exclude ‘sd_status’).

    • Code: GC__VERSIO

    • Action: Install a supported Bacula version in the affected Daemon. Storage Daemon and Director must be on the same version, while File Daemons can run an older version than the Director.

  • security_plugin: Checks if the security plugin is deployed in each FD (requires to not exclude ‘fd_status’).

    • Code: GC__SECURI

    • Action: Install the security plugin in the affected File Daemons

  • permissions: Checks if permissions are strong enough for the given path and subpaths (permissions_base_paths parameter).

    • Code: GC__PERMIS

    • Action: Correct the permissions on the affected paths. Usually you need to exclude the ‘others’ group from any Bacula directory and to protect the Bacula configuration from undesirable writes.

  • running_processes: Checks if running processes are running with root user, which is generally not recommended for secure environments.

    • Code: GC__RUNNIG

    • Action: Configure your daemons with the correct user. Director and Storage Daemon do not need to be run with root.

  • encryption: Check if encryption is used at all in the environment

    • Code: GC__ENCRYP

    • Action: Consider using encryption for any sensitive data or any untrusted storage.

  • volprotection: Check if volume protection is used at all in the environment

    • Code: GC__VOLPRO

    • Action: Consider enabling volume protection for your disk backups on Linux, as well as for any backup sent to NAS from NetApp, DataDomain or HPE StoreOnce.

  • pg_vacuum: Check if PostgreSQL vacuum process was executed recently enough on the key tables

    • Code: GC__PG_VAC

    • Action: Run Vacuum on the affected tables as soon as possible, during a low load window in your environment.

  • pg_analyze: Check if PostgreSQL analyze process was executed recently enough on the key tables

    • Code: GC__PG_ANA

    • Action: Run Analyze on the affected tables as soon as possible, during a low load window in your environment.

  • pg_config: Check if PostgreSQL configuration parameters are inside the recommended margins

    • Code: GC__PG_CON

    • Action: Correct the mentioned values to comply with the recommended configuration
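Several of the checks above state that they require another check not to be excluded. The following Python sketch shows how a list passed to configuration_checks_exclude could implicitly switch off other checks. It is an illustration only: the `REQUIRES` map is transcribed from the list above, the helper name is hypothetical, and treating the dependency as transitive (excluding sd_status also disables the ded_* checks) is an assumption. The per-daemon dependencies of fips, trace and versions are left out because they are only partial.

```python
# A check can only run if none of the checks it requires is excluded.
REQUIRES = {
    "dedup": ["sd_status"],
    "ded_errors": ["dedup"], "ded_orphan": ["dedup"], "ded_vacuum": ["dedup"],
    "ded_idx": ["dedup"], "ded_miss": ["dedup"], "ded_free": ["dedup"],
    "ded_suspect": ["dedup"], "ded_derr": ["dedup"], "ded_cerr": ["dedup"],
    "security_plugin": ["fd_status"],
}


def implicitly_disabled(excluded):
    """Return checks that cannot run because a prerequisite is excluded."""
    excluded = set(excluded)
    disabled = set()
    changed = True
    while changed:  # repeat until no further check gets disabled
        changed = False
        for check, needs in REQUIRES.items():
            if check in excluded or check in disabled:
                continue
            if any(n in excluded or n in disabled for n in needs):
                disabled.add(check)
                changed = True
    return sorted(disabled)


# Excluding 'dedup' also switches off the nine ded_* checks.
print(implicitly_disabled(["dedup"]))
```

Running such a helper over an exclusion list before deploying it can reveal checks that would silently stop reporting.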

Infected service

This service is activated if the service parameter contains the keyword: infected.

Alert code is: GIN

The purpose of this service is to report jobs where some virus, ransomware or malware was detected, so that a summary of them is easily available. A new alert is also generated for any new entry.

Action: If you find any job containing some kind of malware or virus, you should quickly isolate that system from your network and run healing activities on it. Afterwards, run a new backup and check that no more viruses or malware are detected.

Security events service

This service is activated if the service parameter contains the keyword: securityevents.

Alert code is: GSE

This service will report any recent event registered in Bacula core with the security category.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| securityevents_days_since | No | 15 | Integer | 30 | Defines the number of days to consider, from today, for the report of security events |

Action: Review the nature of the event and act accordingly. If you find, for instance, many failed attempts to connect to the Director from BConsole, consider changing the location of your consoles or improving their security posture at the network level.

Deviation service

This service is activated if the service parameter contains the keyword: deviation.

Alert code is: GDV

The purpose of this service is to analyze job executions statistically and find deviation from the expected values. Calculations are done over the size, number of files and duration of the jobs.

Depending on the kind of jobs and the nature of the data in your environment, you may need to adjust the following parameters to maximize the usefulness of the deviation results. You can adjust the different thresholds, and you can decide whether results are selected by both regression deviation and deviation from average (default behavior), or whether deviation from average is excluded because it generates too much information without pointing to issues in the environment (dev_include_by_avg parameter).

Parameters for deviation service are explained below:

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| dev_factor | No | 0.4 | Float | 0.25 | Defines which jobs trigger the deviation alert, that is, what relation to the calculated standard deviation is considered significant enough. 1 means 200% of the standard deviation, 0.5 means 150% of the standard deviation. Example: if the standard deviation for size is 100 MB, with a value of 0.5 a job with a deviation from the average or regression of 160 MB will be selected, while jobs with a deviation from the average of 50 MB or 120 MB won’t be selected |
| dev_severity_low_limit | No | 0.5 | Float | 0.8 | Defines the limit to consider a selected deviated job as severity Low |
| dev_severity_medium_limit | No | 0.75 | Float | 0.9 | Defines the limit to consider a selected deviated job as severity Medium |
| dev_min_regression_accuracy | No | 0.5 | Float | 0.4 | Minimum value of regression accuracy in order to use the regression analysis |
| dev_min_dev_from_avg | No | 0.8 | Float | 2 | Minimum deviation from the average to consider a job as deviated for selection based on average |
| dev_min_duration | No | 1200 | Integer | 3600 | Minimum duration of a job in order to consider deviation by duration as something significant to select the job |
| dev_min_executions | No | 5 | Integer | 20 | Minimum number of executions of a given job in order to consider it for deviation analysis |
| dev_include_by_avg | No | true | 1, true, yes, Yes ; 0, false, no, No | false | Enable/disable selection of results based on the average deviation |
| dev_min_files | No | 50 | Integer | 100 | Minimum number of files in a job in order to consider deviation by number of files as something significant to select the job |
| dev_min_size | No | 10485760 | Integer | 52428800 | Minimum size of a job in order to include it in deviation analysis |
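The arithmetic behind dev_factor can be checked with a small Python sketch. It only reproduces the worked example from the parameter description (a factor of 0.5 means 150% of the standard deviation). The function names are hypothetical, and using a strict "greater than" comparison is an assumption, since the description does not say how a job exactly on the threshold is treated.

```python
def deviation_threshold(std_dev, dev_factor=0.4):
    """Deviation a job must exceed to be selected: (1 + dev_factor) * std_dev."""
    return (1 + dev_factor) * std_dev


def is_selected(deviation, std_dev, dev_factor=0.4):
    """True if the absolute deviation is above the threshold (assumed strict)."""
    return abs(deviation) > deviation_threshold(std_dev, dev_factor)


# Worked example from the description: standard deviation 100 MB, factor 0.5.
print(deviation_threshold(100, 0.5))   # 150.0
for deviation in (160, 120, 50):
    print(deviation, is_selected(deviation, 100, 0.5))
```

Lowering dev_factor lowers the threshold, so more jobs are selected. Raising it restricts the report to the strongest outliers.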

Action: When this service reports results, you should review every result and analyze what caused the given deviation. A deviation can have many causes, like sudden new information to back up, a sudden slowness problem, some controlled massive deletion… However, the same effect can be caused by ransomware activity or by uncontrolled or undesired user activity. If you find such an event, you will need to solve it and consider adjusting or re-running some backups. If there is an explanation, every issue can be marked to be ignored in further alerts through the ignoring mechanism.

Failed in a row service

This service is activated if the service parameter contains the keyword: failedinarow.

Alert code is: GFR

The purpose of this service is to report jobs that have failed N or more times in a row, according to the parameter failedrow_times.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| failedrow_times | No | 3 | Integer | 5 | Defines the threshold of times a given job failed in a row in order to select it to be included in the report |
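As a rough illustration of the failedrow_times threshold, the Python sketch below counts the failures at the end of a job's chronological history. The status strings, the function names and the choice to look only at the most recent streak are assumptions made for this example, not a description of how BGuardian queries the catalog.

```python
def failed_in_a_row(statuses):
    """Count consecutive failures at the end of a chronological status list."""
    count = 0
    for status in reversed(statuses):
        if status != "failed":
            break
        count += 1
    return count


def jobs_to_report(history, failedrow_times=3):
    """Return the job names whose latest failure streak reaches the threshold."""
    return sorted(name for name, statuses in history.items()
                  if failed_in_a_row(statuses) >= failedrow_times)


history = {
    "backup-db": ["ok", "failed", "failed", "failed"],
    "backup-web": ["failed", "failed", "failed", "ok"],
}
print(jobs_to_report(history))  # ['backup-db']
```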

Action: Review as soon as possible the affected jobs and analyze the causes of the failure. You can run them manually to check the result and then adjust the configuration or the schedule according to the results of your analysis.

Restore frequency service

This service is activated if the service parameter contains the keyword: restorefrequency.

Alert code is: GRF

The purpose of this service is to report jobs whose restore frequency is below the factor established by the parameter: restore_factor.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| restore_factor | No | 0.15 | Float | 0.5 | Defines the threshold of restore frequency for a given job in terms of % |

Action: Review your backup policies and include some periodic restores in them, to ensure your backup and restore strategies are correct and agile enough for the time when an urgent restore is needed.

Success ratio service

This service is activated if the service parameter contains the keyword: successratio.

Alert code is: GSR

The purpose of this service is to report jobs whose success ratio is below the factor established by the parameter: success_factor.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| success_factor | No | 0.8 | Float | 0.75 | Defines the threshold of success ratio for the executions of a given job in terms of % |
| success_severity_medium_limit | No | 0.4 | Float | 0.5 | Defines the limit to consider a selected job as severity Medium |
| success_severity_low_limit | No | 0.6 | Float | 0.7 | Defines the limit to consider a selected job as severity Low |
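A minimal Python sketch of the success_factor comparison follows. It assumes the ratio is the number of successful executions divided by the total number of executions, expressed as a fraction between 0 and 1 to match the default of 0.8. The status strings and function names are hypothetical, and the severity limits are not modelled because their exact semantics are not described here.

```python
def success_ratio(statuses):
    """Fraction of executions that ended successfully (0.0 to 1.0)."""
    if not statuses:
        raise ValueError("job has no executions")
    return sum(1 for s in statuses if s == "ok") / len(statuses)


def low_success_jobs(history, success_factor=0.8):
    """Return {job name: ratio} for jobs whose ratio is below success_factor."""
    ratios = {name: success_ratio(s) for name, s in history.items()}
    return {name: r for name, r in ratios.items() if r < success_factor}


history = {
    "backup-db": ["ok", "ok", "ok", "failed"],   # ratio 0.75
    "backup-web": ["ok"] * 9 + ["failed"],       # ratio 0.9
}
print(low_success_jobs(history))  # {'backup-db': 0.75}
```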

Action: Review the affected jobs and analyze the causes of the failures. If they are apparently random, consider running a load analysis of your system in order to better spread the load over your network and hosts.

No Copy service

This service is activated if the service parameter contains the keyword: nocopy.

Alert code is: GNC

The purpose of this service is to report jobs not included in any 2-tier policy, which means jobs that have never been copied or migrated.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| nocopy_grace_period_days | No | 10 | Integer | 10 | Jobs that are more recent than the days defined by this parameter won’t be included in the report |

Action: Review your backup policies and add a second storage tier, sending the reported jobs to it by configuring and running Copy jobs.

No Verify service

This service is activated if the service parameter contains the keyword: noverify.

Alert code is: GNV

The purpose of this service is to report jobs not included in any verification policy, which means jobs that have never been verified.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| noverify_grace_period_days | No | 20 | Integer | 50 | Jobs that are more recent than the days defined by this parameter won’t be included in the report |

Action: Review your backup policies and include a verification phase where you run periodic verify jobs. Make sure it covers the jobs listed in the report.

Empty service

This service is activated if the service parameter contains the keyword: empty.

Alert code is: GE

The purpose of this service is to report Full jobs that were successful but have no contents (no files and no bytes stored).

Action: Review the affected jobs, including the joblog and the configuration. You may need to adjust the configuration or to run the affected jobs again.

Last good service

This service is activated if the service parameter contains the keyword: lastgood.

Alert code is: GLG

The purpose of this service is to report jobs whose last successful execution is older than the number of days specified by the parameter: lastgood_max_since_days.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| lastgood_max_since_days | No | 5 | Integer | 10 | Jobs that have no successful execution more recent than the days defined by this parameter will be included in the report |
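The lastgood_max_since_days rule can be sketched in Python as a simple date comparison. The data structure (a mapping from job name to the time of its last successful execution, or None if it never succeeded) and the function name are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def stale_jobs(last_success, now, lastgood_max_since_days=5):
    """Return jobs with no successful run within the allowed number of days."""
    limit = now - timedelta(days=lastgood_max_since_days)
    return sorted(name for name, when in last_success.items()
                  if when is None or when < limit)


now = datetime(2024, 6, 20, 12, 0)
last_success = {
    "backup-db": datetime(2024, 6, 19, 23, 0),   # yesterday: fine
    "backup-web": datetime(2024, 6, 10, 23, 0),  # ten days ago: reported
    "backup-mail": None,                         # never succeeded: reported
}
print(stale_jobs(last_success, now))  # ['backup-mail', 'backup-web']
```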

Action: Review as soon as possible the affected jobs and their latest executions. Run them manually if they have been missed, cancelled or failed.

Low Dedup

This service is activated if the service parameter contains the keyword: lowdedup.

Alert code is: GLD

The purpose of this service is to report jobs that do not deduplicate well enough and are potentially misusing resources on data that is not a good candidate for deduplication. This information is also useful if you are running out of space and need to delete or migrate the data of some jobs that use a large amount of storage.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| dedup_ratio_min | No | 40 | Float | 60 | Defines the threshold of dedup ratio. Jobs with a lower ratio will be reported. The threshold represents a percentage and jobs store it as ‘compressratio’ in the catalog |
| dedup_size_min | No | 52428800 | Long | 262144000 | Defines the limit in size to consider a selected low dedup job. Jobs with smaller size won’t be considered |
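The way the two Low Dedup parameters combine can be sketched as a filter in Python. The tuple layout and the function name are hypothetical. The only behavior taken from the table is that jobs smaller than dedup_size_min are ignored and that jobs with a ratio below dedup_ratio_min are reported.

```python
def low_dedup_jobs(jobs, dedup_ratio_min=40.0, dedup_size_min=52428800):
    """jobs: iterable of (name, size_bytes, ratio_percent) tuples."""
    return [name for name, size, ratio in jobs
            if size >= dedup_size_min and ratio < dedup_ratio_min]


jobs = [
    ("backup-video", 500 * 1024 * 1024, 5.0),   # large, poor ratio: reported
    ("backup-docs", 500 * 1024 * 1024, 70.0),   # large, good ratio
    ("backup-tiny", 1 * 1024 * 1024, 2.0),      # too small to be considered
]
print(low_dedup_jobs(jobs))  # ['backup-video']
```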

Action: Consider disabling deduplication for the affected jobs if the ratio is very poor for your needs. If running out of space, consider deleting (or moving to a different storage with a Migration) large jobs with a poor ratio that are old enough for your needs.

Orphan Chain

This service is activated if the service parameter contains the keyword: orphanchain.

Alert code is: GOC

Incremental and Differential jobs are part of a chain and depend on the information of a previous job. Sometimes, due to human mistakes or a bad retention policy, chains get broken: a job that later jobs depend on is recycled before the jobs that depend on it. This service will detect this situation and report jobs that are ‘orphan’ in these terms.

It is important to note that depending on the backup nature, Incremental or Differential jobs alone can be still useful. For instance, for any job that contains files, the information is still fully recoverable, and they can also be based on a previous older Full or Incremental having the restore job still working. However, for some Virtual Machine or Database plugins, it is possible that one Incremental or Differential job without their predecessor will not contain recoverable information. In general, it is important to try to avoid having any orphan job and this service is intended to help in that direction.
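The notion of an orphan job can be made concrete with a simplified Python sketch. It assumes a single chain given in chronological order, where a Differential depends on the most recent Full and an Incremental depends on the job immediately before it. The function name and data layout are hypothetical, only direct dependents of a missing job are flagged, and the real detection logic may differ.

```python
def find_orphans(jobs, available):
    """jobs: chronological list of (jobid, level) with level 'F', 'D' or 'I'.
    available: set of jobids that have not been recycled."""
    orphans = []
    last_full = None   # most recent Full seen so far
    previous = None    # most recent job of any level seen so far
    for jobid, level in jobs:
        if level == "D":
            parent = last_full
        elif level == "I":
            parent = previous
        else:
            parent = jobid  # a Full job does not depend on another job
        if jobid in available and (parent is None or parent not in available):
            orphans.append(jobid)
        if level == "F":
            last_full = jobid
        previous = jobid
    return orphans


chain = [(1, "F"), (2, "I"), (3, "I"), (4, "D"), (5, "I")]
# The Full job 1 was recycled while its dependents are still present.
print(find_orphans(chain, available={2, 3, 4, 5}))  # [2, 4]
```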

Action: If you ever detect an orphan job, review your backup policies regarding retention times and adjust them, if necessary, so that they do not automatically produce orphan jobs. Also check that you have other valid copies of the information for the affected jobs. If you don’t, run new jobs as soon as possible.

Different filesystem

This service is activated if the service parameter contains the keyword: differentfilesystem.

Alert code is: GDFS

This service will detect and report jobs where the log message ‘xxx is a different filesystem. Will not descend from yyyy’ is produced. Paths reported there will be compared with the list of excluded ‘known’ paths configured by the ‘different_fs_exclude’ parameter. If they do not match, jobs will be reported.

This situation happens when jobs use filesets with the OneFS option enabled. Depending on the environment, this behavior is absolutely desirable, but it can hide some undesired exclusions. This service helps to avoid that kind of situation.

| Option | Required | Default | Values | Example | Description |
|---|---|---|---|---|---|
| different_fs_exclude | No | /proc, /sys, /tmp, /boot | List of paths separated by ‘,’ | /proc, /sys, /tmp, /boot, /myNonDesiredFS2, /mnt/myNonDesiredFS1 | List of known paths that are on different filesystems and there is no problem if they are reported as paths that won’t be backed up |
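The comparison between the job log message and different_fs_exclude can be sketched in Python. The regular expression follows the message text as quoted in this section (‘xxx is a different filesystem. Will not descend from yyyy’). The function name, the exact-match comparison and the assumption that paths contain no spaces are illustrative choices, not the plugin's actual implementation.

```python
import re

# Pattern built from the message text quoted in this section.
MESSAGE = re.compile(
    r"(\S+) is a different filesystem\. Will not descend from (\S+)")


def unexpected_paths(log_lines, different_fs_exclude="/proc, /sys, /tmp, /boot"):
    """Return skipped paths that are not in the list of known exclusions."""
    known = {p.strip() for p in different_fs_exclude.split(",") if p.strip()}
    found = []
    for line in log_lines:
        match = MESSAGE.search(line)
        if match and match.group(1) not in known:
            found.append(match.group(1))
    return found


log_lines = [
    "/proc is a different filesystem. Will not descend from /",
    "/mnt/data is a different filesystem. Will not descend from /",
    "Backup OK",
]
print(unexpected_paths(log_lines))  # ['/mnt/data']
```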

Action: Review the path that was excluded and consider modifying your backup configuration if it is necessary to include it, or add it to the list to be excluded on this service otherwise.

NoTOTP service

This service is activated if the service parameter contains the keyword: nototp.

Alert code is: GNT

The purpose of this service is to report users that have not enabled the TOTP two-factor authentication mechanism.

Action: Consider enabling the TOTP two-factor authentication mechanism for the reported users in order to improve your security posture regarding BWeb access.
