S3/Amazon Account Management

Important

The S3 driver is now deprecated. If you are still using it, move from the S3 driver to the Amazon driver; most of the directives and parameters are compatible between the two.

Create an Account

Amazon offers high-level S3 commands with the AWS Command Line Interface. The tool (aws) is mandatory for the Bacula Amazon driver to work. It can also be used to manually verify your account and list everything you have stored in your Amazon S3 buckets. The advantage of testing your setup with the aws command is that the same Amazon credentials are used to access S3 via aws as in the Bacula Cloud resource.

Go to the following link and create an S3/Amazon account.

Then you might want to use the S3/Amazon tutorial. Note that some of the information in the tutorial is repeated below for your convenience.

Install the AWS Command Line Interface

A guide which explains how to install the AWS Command Line Interface (CLI) can be found here:

We chose pip to install the AWS CLI. Amazon recommends having at least Python version 2.7.

Make sure that Python is installed:

[root@localsd1 ~]# python --version
Python 3.10.6
[root@localsd1 ~]#

The pip tool was not installed in our case, which can be checked with:

[root@localsd1 ~]# pip --help
-bash: pip: command not found
[root@localsd1 ~]#

Installation is easy (but produces warnings with Python versions < 2.7):

[root@localsd1 ~]# curl -O https://bootstrap.pypa.io/get-pip.py
[root@localsd1 ~]# python get-pip.py

Once you have pip, you can install the AWS CLI and check if the installation was successful:

[root@localsd1 ~]# pip install awscli
[root@localsd1 ~]# aws help

Define user and export credentials

This step is not mandatory if you plan to use aws only through the Bacula Amazon cloud driver, since Bacula will automatically use the Bacula Cloud resource credentials, but it is recommended if you want to use aws as a standalone command line interface.

To be able to access your S3/Amazon buckets, you will need to create an authorized user in the web interface: Services \(\rightarrow\) Security & Identity \(\rightarrow\) IAM. Attach the policy AmazonS3FullAccess to this user. Create access keys in the web portal (tab Security Credentials in IAM) and export them as a CSV file.
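The same can be done with the AWS CLI, provided administrative credentials are already configured on the machine; the user name bacula-s3 below is only an example:

[root@localsd1 ~]# aws iam create-user --user-name bacula-s3
[root@localsd1 ~]# aws iam attach-user-policy --user-name bacula-s3 \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
[root@localsd1 ~]# aws iam create-access-key --user-name bacula-s3

The create-access-key call prints the access key ID and secret access key; store them safely, as they are shown only once.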

Use the configure command:

[root@localsd1 ~]# aws configure

to store the credentials locally (see also http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html)
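The command asks interactively for the values from the exported CSV file; a typical session looks like this (the keys and region shown are the usual AWS placeholder examples):

[root@localsd1 ~]# aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json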

Create a Bucket in S3/Amazon

Log into your AWS console (https://aws.amazon.com/console/), select the correct region, and choose Services \(\rightarrow\) Storage & Content Delivery \(\rightarrow\) S3. Create a bucket and give it a unique name. In our test scenario we created a bucket called bsyssdsync with one folder volumes inside it.

You can also create the bucket with the AWS Command Line Interface:

[root@localsd1 ~]# aws s3api --endpoint-url='https://s3.amazonaws.com' \
  create-bucket --bucket bsyssdsync
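Note that the call above creates the bucket in the default us-east-1 region; for any other region the location has to be given explicitly, for example (eu-west-1 is only an illustration):

[root@localsd1 ~]# aws s3api create-bucket --bucket bsyssdsync \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1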

Copy and sync files

You can list your S3/Amazon buckets with:

[root@localsd1 ~]# aws s3 ls
2016-04-11 21:13:02 bsyssdsync
[root@localsd1 ~]#

To copy volumes from your local SD to the cloud, use:

[root@localsd1 ~]# cd /srv/bacula-storage
[root@localsd1 bacula-storage]# ls
Vol-0002  Vol-0005
[root@localsd1 bacula-storage]#
[root@localsd1 bacula-storage]# aws s3 cp Vol-0002 s3://bsyssdsync/volumes/
upload: ./Vol-0002 to s3://bsyssdsync/volumes/Vol-0002
[root@localsd1 bacula-storage]#

This of course only makes sense when you have only one job per volume and your volumes are marked Full by Bacula after the job. You could trigger the upload to the cloud in a RunScript after the job and then delete the local copy; a sketch of such a RunScript is shown below. You would, however, have to make sure that in the restore case all volumes are available again in the local file system.
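A minimal sketch of such a Job resource, assuming the Director runs on the same host as the Storage Daemon, the local volumes live in /srv/bacula-storage, and syncing the whole directory after the job is acceptable (the Job name and paths are examples only):

Job {
  Name = "BackupToS3"
  # ... usual Job directives (Client, FileSet, Pool, Storage, ...) ...
  RunScript {
    RunsWhen = After
    RunsOnClient = No
    Command = "aws s3 sync /srv/bacula-storage s3://bsyssdsync/volumes"
  }
}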

The aws s3 cp command works in both directions:

[root@localsd1 ~]# aws s3 cp <source> <destination>

and behaves like the UNIX cp command (more details: http://aws.amazon.com/cli/). However, when you have more than one job per volume, and volumes with fixed maximum size configured in Bacula, you will want to sync the directory with Bacula volumes to S3. The AWS CLI has a command for that:

[root@localsd1 ~]# aws s3 sync /srv/bacula-storage s3://bsyssdsync/volumes

In this case you would identify Full volumes with a Bacula Catalog query (see the sketch below) and delete them after all backups have been run and the volumes have been synced. Again, you will need to make sure that they are available when you want to restore data.
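A minimal sketch of such a catalog query, assuming a PostgreSQL catalog database named bacula and the standard Media table (adapt for MySQL or a different database name):

[root@localsd1 ~]# psql -U bacula -d bacula -t \
  -c "SELECT VolumeName FROM Media WHERE VolStatus = 'Full';"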

When it comes to Bacula reusing volumes (after the configured retention times have passed), you would probably use a different configuration approach in a cloud scenario: configure retention times to be very long (e.g. years), let the volumes be synced into the cloud and deleted from disk, and then use Amazon lifecycle mechanisms to tier the volumes to less expensive storage, from

General Purpose (Amazon S3 Standard)

\(\downarrow\)

Infrequent Access (Amazon S3 Standard - Infrequent Access)

\(\downarrow\)

Archive (Amazon Glacier)

and finally delete them. Learn more about Amazon Storage Classes: https://aws.amazon.com/s3/storage-classes/. Policies can be set for each bucket independently in the AWS web portal.
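Such a lifecycle policy can also be created with the AWS CLI; a minimal sketch, assuming the volumes are stored under the volumes/ prefix (the day counts and the final expiration are only illustrative):

[root@localsd1 ~]# cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-bacula-volumes",
      "Filter": { "Prefix": "volumes/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
[root@localsd1 ~]# aws s3api put-bucket-lifecycle-configuration \
  --bucket bsyssdsync --lifecycle-configuration file://lifecycle.json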

S3/Amazon Glacier Instant/Flexible Retrieval

Another effective strategy is to use Bacula direct restoration from S3/Amazon Glacier Instant/Flexible Retrieval (available with Bacula Enterprise 12.2.0 and later) or S3/Amazon Glacier Deep Archive (available with Bacula Enterprise 14.0.0 and later). Configure the bucket lifecycle to automatically handle the transition to S3 Glacier Instant/Flexible Retrieval or S3/Amazon Glacier Deep Archive after backup, as sketched below.
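A minimal sketch of such a lifecycle rule, transitioning parts under the volumes/ prefix to Glacier Flexible Retrieval one day after upload (use DEEP_ARCHIVE as the storage class for Deep Archive; the prefix and delay are only examples):

[root@localsd1 ~]# aws s3api put-bucket-lifecycle-configuration --bucket bsyssdsync \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "to-glacier",
        "Filter": { "Prefix": "volumes/" },
        "Status": "Enabled",
        "Transitions": [ { "Days": 1, "StorageClass": "GLACIER" } ]
      }
    ]
  }'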

Then use the TransferPriority and TransferRetention Cloud directives to configure rehydration before restoration (a sketch of the relevant Cloud resource follows). See S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive directives for details. Bacula will monitor the rehydration process until its completion and then proceed normally with parts download and restoration.
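The relevant part of the Cloud resource might then look like the sketch below; the directive names come from this section, but the values shown (priority level, retention period, credentials) are placeholders to be checked against the directive reference:

Cloud {
  Name = AmazonGlacierCloud
  Driver = "Amazon"
  HostName = "s3.amazonaws.com"
  BucketName = "bsyssdsync"
  AccessKey = "AKIAIOSFODNN7EXAMPLE"
  SecretKey = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  # Rehydration of parts stored in Glacier before a restore:
  TransferPriority = High        # retrieval speed/cost tier (placeholder value)
  TransferRetention = 5 days     # how long rehydrated parts remain available (placeholder)
}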

S3/Amazon Glacier Deep Archive

Bacula direct restoration can also be used to restore from S3/Amazon Glacier Deep Archive (available with Bacula Enterprise 12.2.0 and later; S3/Amazon Glacier Instant Retrieval and S3/Amazon Glacier Deep Archive support is provided as a separate option). Configure the bucket lifecycle to automatically handle the transition to S3/Amazon Glacier Deep Archive after backup, as sketched above with DEEP_ARCHIVE as the storage class.

Then use the same TransferPriority and TransferRetention Cloud directives as for Glacier to configure rehydration before restoration. See S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive directives for details. Bacula will monitor the rehydration process until its completion and then proceed normally with parts download and restoration.

Time coordination

S3/Amazon syncing appears to be sensitive to clock differences. If you get an S3 RequestTimeTooSkewed error during aws s3 sync, you should use Amazon NTP servers, for example:
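One way to do that, assuming a classic ntpd setup, is to point /etc/ntp.conf at the Amazon NTP pool and restart the time daemon (on EC2 instances the Amazon Time Sync Service at 169.254.169.123 is another option):

server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 2.amazon.pool.ntp.org iburst
server 3.amazon.pool.ntp.org iburst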

S3 Object Lock

If Object Lock is configured on your target bucket, do not use the S3 driver to back up to it, but rather the more recent Amazon driver, by changing the Driver keyword of the Cloud resource in your SD configuration from “S3” to “Amazon”.
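In the Cloud resource of your SD configuration this is a one-line change; a sketch (the resource name is an example, the other directives stay as they are):

Cloud {
  Name = S3Cloud
  # Driver = "S3"            # deprecated driver, not suitable for Object Lock buckets
  Driver = "Amazon"          # newer driver to be used instead
  # ... remaining directives unchanged ...
}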

Once the destination storage has immutability capabilities enabled, Bacula will work transparently with it. The only requirement is that the Bacula retention for the volumes involved is longer than the retention configured in the cloud.

Note

To read more about security and data immutability, click here.