Dedup 2 Index Rebuild

The Bacula Systems Support Team may ask you to rebuild your deduplication engine version 2 index. This article provides instructions on how to perform this process.

Prerequisites

Before proceeding with the rebuild, ensure that the storage space where the index is stored is no more than 40% full.

If the current storage location doesn’t meet this requirement, choose an alternative destination folder with enough free space to host the rebuilt index. The -o option of the `bdde-fsck` command specifies the full path of the new index.
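
One way to check the occupancy requirement before starting is sketched below. It assumes the index lives under /mnt/index (the example path used later in this article) and relies only on the POSIX `df -P` output format. The helper name usage_pct and the INDEX_DIR variable are illustrative and not part of Bacula.

```shell
#!/bin/sh
# Sketch only: INDEX_DIR and usage_pct are illustrative names, not part of Bacula.
INDEX_DIR="${INDEX_DIR:-/mnt/index}"   # assumed index location
MAX_PCT=40                             # occupancy limit from the prerequisites

# usage_pct DIR: print the used-space percentage (digits only) of the
# filesystem that holds DIR, taken from the "Capacity" column of df -P.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

if [ -d "$INDEX_DIR" ]; then
    pct=$(usage_pct "$INDEX_DIR")
    if [ "$pct" -le "$MAX_PCT" ]; then
        echo "OK: $INDEX_DIR is ${pct}% full"
    else
        echo "Too full: $INDEX_DIR is ${pct}% full; pick another path for -o"
    fi
fi
```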

Following these guidelines will help ensure a smooth index rebuild process. The subsequent sections will detail the step-by-step procedure for rebuilding your dedup index.

Instructions

1. Stop the Storage Daemon: the bdde-fsck and bdde-check tools require that the Storage Daemon is not running.

$ sudo systemctl stop bacula-sd

2. Rebuild the index: Use bdde-fsck.

In the example below, we use 10 threads (-T 10) to scan the container files in parallel and write the results to smaller metadata files.

These metadata files are stored in a subdirectory (verify) of the directory where the index file is located. Their names match the container files (for example, verify-00000000.ctn for beed2-00000000.ctn). The metadata files are much smaller (44 bytes per index record) than the container files themselves, and their total size should be smaller than the size of the current index file.

The container scanning phase of the process is designed to be restartable. In the event of a failure before completion, it can be restarted without the need to rescan containers that have already been successfully scanned.

The following information is required to set up the bdde-fsck command (example values are shown):

  • the filename of the index file: /mnt/index/beeindex.tch

  • the directory where the container files are stored: /mnt/containers

  • the output filename for the new index file: /mnt/index/repair.tch

  • the output filename for the repair log: /mnt/index/repairlog.txt

Before running bdde-fsck, set the LD_LIBRARY_PATH environment variable:

$ export LD_LIBRARY_PATH=/opt/bacula/lib

Then, run the tool in the background with /usr/bin/nohup, redirecting its output to the repair log:

$ /usr/bin/nohup /opt/bacula/bin/bdde-fsck -i /mnt/index/beeindex.tch -D /mnt/containers -s -R -o /mnt/index/repair.tch -g -T 10 -dt > /mnt/index/repairlog.txt 2>&1 &

3. Monitoring progress (optional): The rebuild can take from several hours to many days, depending on the amount of data in the containers and the performance of the container storage. The size of the verify metadata can be used to estimate progress.

For example:

First, determine the size in KiB of the verify directory:

$ du -sk /mnt/index/verify
38524

Second, use this result to compute the number of index entries that have been scanned from the containers:

$ expr 38524 \* 1024 / 44
896558

In this simple example, almost 900 thousand records have been scanned. Compare this number with the number of records in the current index to estimate progress. The record count of the current index, RNUM (record number), can be found in the SD tracelog or by running bdde-check on the current index:

$ /opt/bacula/bin/bdde-check -i /mnt/index/beeindex.tch
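
The arithmetic above can be wrapped in two small helpers, sketched here. The 44 bytes per record figure comes from this article; the total of 2000000 records is a made-up stand-in for the RNUM value reported by bdde-check, and the function names are illustrative. Because du reports allocated disk blocks, the result is only a rough estimate.

```shell
#!/bin/sh
# records_scanned KIB: estimate how many index records fit in KIB kibibytes
# of verify metadata, at 44 bytes per record (integer division).
records_scanned() {
    echo $(( $1 * 1024 / 44 ))
}

# progress_pct SCANNED TOTAL: whole-number percentage of TOTAL already scanned.
progress_pct() {
    echo $(( $1 * 100 / $2 ))
}

scanned=$(records_scanned 38524)    # 38524 KiB is the du -sk result shown above
total=2000000                       # hypothetical RNUM of the current index
echo "$scanned records scanned, about $(progress_pct "$scanned" "$total")% of $total"
```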

4. Proper completion: When the bdde-fsck job completes, there should be messages in the log file (/mnt/index/repairlog.txt):

: bdde-fsck.cc:595-0 syncing index
: bdde-fsck.cc:597-0 index closed
: bdde-fsck.cc:1103-0 Rebuild of the index ok
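
A quick way to check for the final message is sketched below. The message text is taken from the sample log lines above and may differ between versions, so treat a missing match as a reason to read the log, not as proof of failure. The helper name rebuild_ok and the LOG variable are illustrative.

```shell
#!/bin/sh
LOG="${LOG:-/mnt/index/repairlog.txt}"   # example log path from this article

# rebuild_ok LOGFILE: succeed if LOGFILE contains the final success message.
rebuild_ok() {
    grep -q "Rebuild of the index ok" "$1"
}

if [ -f "$LOG" ]; then
    if rebuild_ok "$LOG"; then
        echo "Success message found in $LOG"
    else
        echo "Success message not found; last lines of $LOG:"
        tail -n 20 "$LOG"
    fi
fi
```

This check does not replace the review by the support team described in this step.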

The user should submit the repairlog.txt along with the output from bdde-check for the new index file for review in a support ticket before moving on:

$ /opt/bacula/bin/bdde-check -i /mnt/index/repair.tch

Note

Depending on the errors or warnings found in the repairlog.txt file, additional steps (to be determined with the Bacula Systems Support Team) may be required.

5. Backup of metadata: Once the rebuild has been confirmed as correct, back up the metadata files to avoid rescanning the containers if another error occurs:

$ mkdir /mnt/index/verify.bkp
$ cp /mnt/index/verify/* /mnt/index/verify.bkp
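
Before relying on the backup, it can be compared with the source directory. This is a sketch using `diff -r`; the helper name backup_matches is illustrative, and the paths are the example paths from this article.

```shell
#!/bin/sh
# backup_matches SRC DST: succeed if both directories hold identical files.
backup_matches() {
    diff -r "$1" "$2" > /dev/null
}

SRC=/mnt/index/verify          # example metadata directory
DST=/mnt/index/verify.bkp      # example backup directory
if [ -d "$SRC" ] && [ -d "$DST" ]; then
    if backup_matches "$SRC" "$DST"; then
        echo "Backup matches $SRC"
    else
        echo "Backup differs from $SRC; copy again before continuing"
    fi
fi
```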

6. Copy the new index (repair.tch) over the current dedup2 index (beeindex.tch), then start the Storage Daemon (SD) with the new index file.

Note

Do not run any backup jobs or vacuum processes yet!

$ cp /mnt/index/repair.tch /mnt/index/beeindex.tch
$ sudo systemctl start bacula-sd

Send the last 1000 lines of the SD tracelog to the Bacula Systems Support Team for review before moving on:

$ tail -n 1000 /opt/bacula/working/bacula-sd.trace

7. Run a vacuum checkmiss to mark any unused space in the containers. Using bconsole:

    $ dedup vacuum checkmiss storage=<dedup2_storage_name>

Result

The Dedup2 index has been rebuilt, and the Storage Daemon is ready to resume normal operations.
