Configuration

This solution is only available for Bacula Enterprise. For subscription inquiries, please reach out to sales@baculasystems.com.

The HDFS Plugin configuration is split into connection, snapshot selection, tuning, query, and restore parameters. The tables below cover only the parameters that the current implementation actually honors.

Generic Parameters

HDFS generic parameters

| Parameter | Default | Notes |
|---|---|---|
| path | plugin path | Base path used for logs and local plugin artifacts. |
| debug | 0 | Debug level from 0 to 9. Logging verbosity increases when the value is greater than zero. |
| abort_on_error | no | When enabled, any error aborts the job. |
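
As an illustration of the generic parameters, the following sketch raises the debug level and makes the job stop on the first error. The FileSet name is made up for this example, and the connection values (user, url, base_snap_dir) are placeholders reused from the Fileset Examples on this page, not recommended settings.

```conf
Fileset {
  Name = FS_Hdfs_debug
  Include {
    # debug=3 is an arbitrary level within the documented 0-9 range.
    # abort_on_error=yes makes any error abort the job.
    Plugin = "hdfs: user=hadoop url=hdfs://localhost:9000 base_snap_dir=/data debug=3 abort_on_error=yes"
  }
}
```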

Connection and Authentication Parameters

HDFS connection parameters

| Parameter | Default | Notes |
|---|---|---|
| url | none | HDFS endpoint URL. Mandatory. |
| user | none | User used to access the HDFS namespace. Mandatory. |
| config_file | none | Loads all or additional parameters from a configuration file. |
| core_site_path | none | Path to core-site.xml. |
| hdfs_site_path | none | Path to hdfs-site.xml. |
| ssl_client_path | none | Path to ssl-client.xml when HTTPS client settings are required. |
| user_principal | none | Kerberos principal used by the File Daemon when Kerberos is enabled. |
| keytab | none | Path to the Kerberos keytab file on the File Daemon host. |
| service_principal | none | Kerberos service principal used to reach the target HDFS services. |

The XML files do not need to remain on the HDFS cluster. You can copy core-site.xml, hdfs-site.xml, and ssl-client.xml from the HDFS system to any directory on the File Daemon host, then point the plugin to those local paths with the parameters above.

When several configuration sources are provided, the most specific setting wins. Direct plugin parameters take precedence over values loaded from config_file. For restore sessions, the restore-side options shown in BConsole apply to the restore job, while the connection and authentication parameters continue to describe the target HDFS cluster.
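
The precedence rule can be illustrated with a FileSet that loads shared settings from a file and overrides one of them on the plugin line. This is only a sketch: the path /etc/bacula/hdfs/plugin.conf and the FileSet name are hypothetical, the file is assumed to supply the mandatory user parameter, and its contents are not shown because this page does not describe the file syntax.

```conf
Fileset {
  Name = FS_Hdfs_config_file
  Include {
    # Hypothetical file path; assumed to define user (and possibly url).
    # If the file also defines url, the url given directly on this line
    # takes precedence over the value loaded from config_file.
    Plugin = "hdfs: config_file=/etc/bacula/hdfs/plugin.conf url=hdfs://nn01.example.com:8020"
  }
}
```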

Snapshot and Filesystem Parameters

HDFS snapshot parameters

| Parameter | Default | Notes |
|---|---|---|
| base_snap_dir | / | Directory to snapshot. The plugin always needs a value; when the parameter is omitted, "/" is used. |
| keep_snapshot | yes | When enabled, keeps the created snapshot after the job completes. |
| include | none | Glob include filter. Can be repeated. |
| regexinclude | none | Regular expression include filter. Can be repeated. |
| exclude | none | Glob exclude filter. Can be repeated. |
| regexexclude | none | Regular expression exclude filter. Can be repeated. |

If no include or exclude filters are provided, the plugin backs up the files reachable from the selected snapshot directory.
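
A sketch of repeated filters follows. It assumes that "can be repeated" means the same parameter may appear several times on the plugin line; the patterns and the FileSet name are illustrative only, and this page does not specify how overlapping include and exclude filters are combined.

```conf
Fileset {
  Name = FS_Hdfs_multi_filter
  Include {
    # Two glob includes and one glob exclude (illustrative patterns).
    # keep_snapshot=no disables keeping the snapshot after the job completes.
    Plugin = "hdfs: user=hadoop url=hdfs://localhost:9000 base_snap_dir=/data include=*/logs/* include=*/audit/* exclude=*.tmp keep_snapshot=no"
  }
}
```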

Advanced/Tuning Parameters

HDFS tuning parameters

| Parameter | Default | Notes |
|---|---|---|
| concurrent_threads | 500 | Controls worker concurrency. Values above 1000 are capped. |
| backup_queue_size | 1000 | Controls the handoff queue between fetcher and opener stages. Values above 5000 are capped. |

The code enforces safe bounds for these tuning values before they are used by the backup pipeline.
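
For example, the tuning parameters could be set on the plugin line as in the sketch below. The values 200 and 2000 are arbitrary choices within the documented caps, not tuning recommendations, and the FileSet name is made up for this example.

```conf
Fileset {
  Name = FS_Hdfs_tuned
  Include {
    # concurrent_threads: documented cap is 1000, so 200 is used as given.
    # backup_queue_size: documented cap is 5000, so 2000 is used as given.
    Plugin = "hdfs: user=hadoop url=hdfs://localhost:9000 base_snap_dir=/data concurrent_threads=200 backup_queue_size=2000"
  }
}
```

According to the table above, a larger request such as concurrent_threads=2000 would be capped at 1000 rather than rejected.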

Restore Parameters

These parameters are available in BConsole restore sessions through the Plugin Options menu.

HDFS restore parameters

| Parameter | Default | Notes |
|---|---|---|
| destination_path | empty | Destination path used when restoring to a live HDFS namespace. |
| url | none | HDFS endpoint URL used during restore. Mandatory. |
| user | none | User used to access the HDFS namespace during restore. Mandatory. |
| core_site_path | none | Path to the local copy of core-site.xml on the File Daemon host. |
| hdfs_site_path | none | Path to the local copy of hdfs-site.xml on the File Daemon host. |
| ssl_client_path | none | Path to the local copy of ssl-client.xml when HTTPS client settings are required. |
| user_principal | none | Kerberos principal used by the File Daemon when Kerberos is enabled. |
| keytab | none | Path to the Kerberos keytab file on the File Daemon host. |
| service_principal | none | Kerberos service principal used to reach the target HDFS services. |
| debug | none | Controls debug level from 0 to 9. |

The Bacula restore command uses the where option to choose the restore target. Leave where empty or set it to / to restore back into HDFS. Set where to a local path to write the restored files to the File Daemon host instead.

Fileset Examples

Example: full HDFS backup with snapshot directory selection.

Fileset {
  Name = FS_Hdfs
  Include {
    Plugin = "hdfs: user=hadoop url=hdfs://localhost:9000 base_snap_dir=/data"
  }
}

Example: filter files with a glob include and a regular expression exclude.

Fileset {
  Name = FS_Hdfs_filtered
  Include {
    Plugin = "hdfs: user=hadoop url=hdfs://localhost:9000 base_snap_dir=/data include=*/logs/* regexexclude=.*\.tmp\Z"
  }
}

Example: use local copies of core-site.xml and hdfs-site.xml on the File Daemon host.

Fileset {
  Name = FS_Hdfs_XML_Auth
  Include {
    Plugin = "hdfs: user=hadoop url=hdfs://mycluster core_site_path=/etc/bacula/hdfs/core-site.xml hdfs_site_path=/etc/bacula/hdfs/hdfs-site.xml base_snap_dir=/data/hadoop"
  }
}

Example: connect to a Kerberos-enabled cluster using XML files and service credentials.

Fileset {
  Name = FS_Hdfs_Kerberos
  Include {
    Plugin = "hdfs: user=hadoop url=hdfs://nn01.example.com:8020 core_site_path=/etc/bacula/hdfs/core-site.xml hdfs_site_path=/etc/bacula/hdfs/hdfs-site.xml user_principal=hadoop@EXAMPLE.COM keytab=/etc/bacula/hdfs/hadoop.keytab service_principal=hdfs/_HOST@EXAMPLE.COM base_snap_dir=/data/hadoop"
  }
}
