Alerts

In addition to the reporting features described in the Reports section, BGuardian implements an alert framework that keeps track of issues across successive executions of the tool. The purpose of this framework is to ease the management of issues and, when BGuardian is run regularly, to provide the functions needed to see the status of an environment at a given point in time.

Alerts group issue information into larger entities. For example, the same job can have many records in the deviation service, one for each affected execution. The issue report shows each entry individually, while the alerts framework holds a single alert for that job name, grouping all the affected executions.

BGuardian allows precise selection of the services to run. However, the backup administrator will encounter situations where a service is still worth running, but some of its records have already been acknowledged and should no longer appear in the reports or in the alert lists. This is where the ignore feature comes in. Using the code of a given issue (or part of it in some situations), it is possible to mark an issue or a group of issues to be ignored in further executions.

This section describes the alert structure and the available commands, with some examples.

Alerts structure

Alerts are stored in the path defined by the ‘alerts_path’ parameter. Each alert is represented by a .json file whose name is made up of the code of the alert and its kind.

Alerts files
 $ ls -l
 total 56
 -rw-rw-r-- 1 bac bac  343 jun 16 12:31 GC__CATA.on.json
 -rw-rw-r-- 1 bac bac  342 jun 16 12:31 GC__CONF.on.json
 -rw-rw-r-- 1 bac bac  396 jun 16 12:31 GC__CONS.on.json
 -rw-rw-r-- 1 bac bac  324 jun 16 12:31 GC__COPY.on.json
 -rw-rw-r-- 1 bac bac  394 jun 16 12:31 GC__EVEN.on.json
 -rw-rw-r-- 1 bac bac  401 jun 16 12:31 GC__MALW.on.json
 -rw-rw-r-- 1 bac bac  482 jun 16 12:31 GC__PASS.on.json
 -rw-rw-r-- 1 bac bac  371 jun 16 12:31 GC__PERM.on.json
 -rw-rw-r-- 1 bac bac  312 jun 16 12:31 GC__REST.on.json
 -rw-rw-r-- 1 bac bac  406 jun 16 12:31 GC__SECU.on.json
 -rw-rw-r-- 1 bac bac  310 jun 16 12:31 GC__VERI.on.json
 -rw-rw-r-- 1 bac bac 1298 jun 16 12:31 GDV__guardianjob.on.json
 -rw-rw-r-- 1 bac bac  219 jun 16 12:31 GRF__guardianjob.on.json
 -rw-rw-r-- 1 bac bac  239 jun 16 12:31 GSR__guardianjob.on.json

The ‘on.json’ extension represents an active alert, while the ‘off.json’ extension represents an alert that will be ignored in future executions.
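
The naming scheme above can be read back from a file name with plain shell parameter expansion. The following is a minimal sketch, not part of BGuardian: the helper names `alert_code` and `alert_state` are hypothetical, and the only assumption is the `<code>.on.json` / `<code>.off.json` convention described in this section.

```shell
# Hypothetical helpers (not BGuardian commands).
# Assumption: alert files are named <code>.on.json or <code>.off.json.

# Print the alert code contained in an alert file name.
alert_code() {
    name=${1##*/}            # strip any leading directory
    name=${name%.on.json}    # strip the active suffix, if present
    name=${name%.off.json}   # strip the ignored suffix, if present
    printf '%s\n' "$name"
}

# Print whether the file represents an active or an ignored alert.
alert_state() {
    case "$1" in
        *.on.json)  printf 'active\n' ;;
        *.off.json) printf 'ignored\n' ;;
        *)          printf 'unknown\n' ;;
    esac
}

alert_code  GC__CATA.on.json
alert_state GC__CATA.on.json
alert_state GDV__guardianjob.off.json
```

Run against the names shown in the listing, the three calls print `GC__CATA`, `active` and `ignored`.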

List alerts

The following command lists active alerts:

Active alerts
# Text mode
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian list_text

# JSON mode
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian list

Example output in text format:

Active alerts
 GC__CONF | LOW | Service: configurationsecurity | Entity: config_backup | Details: {"CONFIG_BACKUP":{"passed":false,"description":"Catalog Backup Job executions","details":"Catalog Backup Job was not run last 10 days","subCheck":"CONFIG_BACKUP","severity":"HIGH"}}
 GC__CONS | LOW | Service: configurationsecurity | Entity: consoles | Details: {"CONSOLES":{"passed":false,"description":"Restricted consoles","details":"No restricted console was found. If external connections are allowed, it is recommended to use restricted consoles for them","subCheck":"CONSOLES","severity":"LOW"}}

The structure of an alert is similar to the structure of an issue. It is composed of:

Alert structure
Code | Severity | BGuardian service | Affected entity | Issue details
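
When the text output is post-processed in a script, the five fields can be separated on the ` | ` delimiter. The sketch below is illustrative only: `alert_field` is a hypothetical helper, the sample line reuses the GC__CONS alert with its Details abbreviated, and it assumes that only the last field (Details) may itself contain the delimiter.

```shell
# Hypothetical helper (not a BGuardian command).
# Assumption: fields are separated by " | " and only the fifth field
# (Details) may contain that separator.

# Print field N (1 to 5) of a text-mode alert line.
alert_field() {
    line=$1
    n=$2
    i=1
    # Drop the leading fields one at a time until field N is first.
    while [ "$i" -lt "$n" ] && [ "$i" -lt 5 ]; do
        line=${line#*" | "}
        i=$((i + 1))
    done
    # For fields 1 to 4, cut everything after the field itself.
    if [ "$n" -lt 5 ]; then
        line=${line%%" | "*}
    fi
    printf '%s\n' "$line"
}

# Sample line with abbreviated Details.
sample='GC__CONS | LOW | Service: configurationsecurity | Entity: consoles | Details: {"CONSOLES":{"passed":false}}'

alert_field "$sample" 1   # code
alert_field "$sample" 2   # severity
alert_field "$sample" 4   # affected entity
```

For structured processing, the JSON mode of the list command is likely the better starting point; its schema is not described here, so it is not sketched.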

The following command lists ignored alerts:

List ignored alerts
# Text mode
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian list_ignore_text

# JSON mode
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian list_ignore

Ignore alerts

The following command adds ‘ignores’: issues matching the ignore code are excluded from the active alerts and from further BGuardian executions.

Ignore an alert
# One single code
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian ignore code1

# Several codes at once
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian ignore code1, code2, code3

Using the codes shown in the List alerts section, we could ignore them with the following command:

Ignore some alerts
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian ignore GC__CONF, GC__CONS

Some services work with entities plus additional information. For example, the deviation service works with job names and with job ids:

Deviation issue
 ############### Service: Deviation ###############
 HIGH | GDV__GuardianJob__9 | Job: 9 GuardianJob F 2023-06-16 00:00:00
     | Size (read): 142,39 MiB  +4080,1%  | Size (write): 142,43 MiB
     | Files: 4,02 K | Duration: 3s
     | Executions: 5 | Average: 3,41 MiB r | 3,41 MiB w - 819  files - 0s
     | Details: Significant deviations found: Job size (bytes read) increased 4080,1% from the expected estimated value of: 3,41 MiB.

Here it is possible to ignore every execution of GuardianJob. To do so:

Ignore alert by entity
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian ignore GDV__GuardianJob

However, it is also possible to ignore only particular job ids:

Ignore alert by id
 sudo -u bacula /bin/bash /opt/bacula/bin/bguardian ignore GDV__GuardianJob__9

This way, new deviating results of the same job will still be considered in subsequent executions.
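
The difference between the two granularities can be pictured as prefix matching on the code. The sketch below only illustrates that intuition and does not reproduce BGuardian's actual matching logic: `ignore_covers` is a hypothetical helper, and it assumes that code parts are joined by `__`, that comparison is case sensitive, and that a shorter ignore code covers every issue code that extends it.

```shell
# Illustrative only (not BGuardian's implementation).
# Assumptions: code parts are joined by "__", matching is case sensitive,
# and an ignore code covers any issue code that extends it by more parts.

# Succeed when ignore code $1 covers issue code $2.
ignore_covers() {
    ignore=$1
    issue=$2
    case "$issue" in
        "$ignore"|"$ignore"__*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ignoring by entity covers every job id of that job.
ignore_covers GDV__GuardianJob GDV__GuardianJob__9 && echo "covered"

# Ignoring by id leaves other executions of the same job visible.
ignore_covers GDV__GuardianJob__9 GDV__GuardianJob__12 || echo "not covered"
```

Under these assumptions the first call prints `covered` and the second prints `not covered`, which mirrors the behaviour described for the entity-level and id-level ignores.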

Manually remove alerts

It is possible to remove active alerts using a very similar format:

Remove active alert
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian remove_alert GDV__GuardianJob__9

To remove something from the ignore list:

Remove ignore
sudo -u bacula /bin/bash /opt/bacula/bin/bguardian remove_ignore GC__CATA

Alerts recovery

When a situation marked by a given issue is solved, BGuardian will automatically remove the associated alert.

Events

To notify the backup administrator when an alert is created or recovered, BGuardian uses the Events feature. This means it will send a Message of type Event with the information about what happened.

Example of BGuardian event about permissions recovery (source ‘bguardian’):

Events
 *list events
 Automatically selected Catalog: MyCatalog
 Using Catalog "MyCatalog"
 +---------------------+---------------+---------------+-----------------------+------------------------------------------+
 | time                | daemon        | source        | type                  | events                                   |
 +---------------------+---------------+---------------+-----------------------+------------------------------------------+
 | 2023-06-16 13:20:57 | 127.0.0.1-dir | *Director*    | daemon                | Director configuration reloaded          |
 | 2023-06-16 13:21:34 | 127.0.0.1-dir | *Console*     | connection            | Connection from 127.0.0.1:8101           |
 | 2023-06-16 13:21:34 | 127.0.0.1-dir | **bguardian** | configurationsecurity | BGuardian alert [GC__PERM] was recovered |
 | 2023-06-16 13:21:34 | 127.0.0.1-dir | *Console*     | connection            | Disconnection from 127.0.0.1:8101        |
 | 2023-06-16 13:21:37 | 127.0.0.1-dir | *Console*     | connection            | Connection from 127.0.0.1:8101           |
 +---------------------+---------------+---------------+-----------------------+------------------------------------------+

Note that for the events feature to work, events must be enabled in the Message resources, as discussed in the Configuration section.
