What to Do When Bacula Crashes
If you are running on a Linux system and you have a set of working configuration files, it is very unlikely that Bacula will crash. As with all software, however, it may crash someday, particularly if you are running on another operating system or using a new or unusual feature.
This chapter explains what you should do if one of the three Bacula daemons (Director, File, Storage) crashes. By crashing, we mean that the daemon terminates abnormally because of an error. There are many cases where Bacula detects an error (such as a PIPE error) and fails a job; these are not considered crashes. In addition, under certain conditions Bacula will detect a fatal error in the configuration, such as lack of permission to read or write the working directory. In that case, Bacula will deliberately crash itself with a SEGFAULT. Before doing so, however, it will normally display a message indicating why. For more details, please read on.
Traceback
Each of the three Bacula daemons has a built-in exception handler which, in case of an error, will attempt to produce a traceback. If successful, the traceback can also be emailed to you.
For this to work, you need to ensure that a few things are set up correctly on your system:
- You must have an installed copy of gdb (the GNU debugger), and it must be on Bacula's path. On some systems, such as Solaris, gdb may be replaced by dbx.
- The btraceback script installed with Bacula must be in the same directory as the daemon that dies, and it must be marked as executable.
- The btraceback file must specify the correct path to the btraceback.gdb script file.
- You must have a mail program on Bacula's path. By default, this mail program is bsmtp, and it must be configured correctly if you want to receive the traceback report by email.
- If you run either the Director or the Storage daemon under a non-root user ID, you will most likely need to modify the btraceback file so that gdb is called through something like sudo (raising it to root privileges); otherwise gdb will not have the permissions it needs to debug Bacula.
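The checklist above can be partly automated. The following POSIX shell function is a minimal sketch, assuming the daemons and btraceback are installed in a single directory such as /opt/bacula/bin; the name check_traceback_prereqs is hypothetical and not part of Bacula. The btraceback.gdb check is only a heuristic (it looks for literal paths ending in btraceback.gdb inside btraceback), and the sudo requirement for non-root daemons is not checked at all.

```shell
#!/bin/sh
# Sketch: check the traceback prerequisites listed above.
# Assumption: the daemons and btraceback live in one directory
# (/opt/bacula/bin in this chapter's examples). The sudo requirement
# for non-root daemons is NOT checked here.

# check_traceback_prereqs BIN_DIR  (hypothetical helper name)
# Prints one line per problem found and returns the number of problems.
check_traceback_prereqs() {
    bin_dir=$1
    problems=0

    # A debugger must be on the PATH: gdb, or dbx on Solaris.
    if ! command -v gdb >/dev/null 2>&1 && ! command -v dbx >/dev/null 2>&1; then
        echo "problem: neither gdb nor dbx is on the PATH"
        problems=$((problems + 1))
    fi

    # btraceback must sit next to the daemons and be executable.
    if [ ! -x "$bin_dir/btraceback" ]; then
        echo "problem: $bin_dir/btraceback is absent or not executable"
        problems=$((problems + 1))
    else
        # Heuristic: every literal btraceback.gdb path in btraceback must exist.
        gdb_refs=$(grep -o '[^ "]*btraceback\.gdb' "$bin_dir/btraceback" || true)
        if [ -z "$gdb_refs" ]; then
            echo "problem: no btraceback.gdb path found in $bin_dir/btraceback"
            problems=$((problems + 1))
        fi
        for ref in $gdb_refs; do
            if [ ! -f "$ref" ]; then
                echo "problem: btraceback refers to missing file $ref"
                problems=$((problems + 1))
            fi
        done
    fi

    # A mail program must be available: bsmtp by default, or mail.
    if ! command -v bsmtp >/dev/null 2>&1 && [ ! -x "$bin_dir/bsmtp" ] &&
       ! command -v mail >/dev/null 2>&1; then
        echo "problem: no mail program (bsmtp or mail) found"
        problems=$((problems + 1))
    fi

    return "$problems"
}

# Only run when given a directory, e.g.: sh check_prereqs.sh /opt/bacula/bin
if [ "$#" -gt 0 ]; then
    check_traceback_prereqs "$1"
fi
```

Run it with the binary directory as its only argument (the script file name is arbitrary); it prints one line per problem and exits with the number of problems found, so 0 means every check passed.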
If all the above conditions are met, the daemon that crashes will produce a traceback report, place it in the working directory of the failed daemon and, if mail is configured correctly, also send it by email. If these conditions are not met, you can either run the debugger by hand or try to correct the problems by editing the btraceback file.
Since each daemon has the same traceback code, a single btraceback file is sufficient if you are running more than one daemon on a machine.
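When btraceback cannot be made to work, gdb can be run by hand against the running daemon. The sketch below assumes gdb is on the PATH and that you have permission to attach to the daemon (usually as root); manual_traceback is a hypothetical helper name. The gdb options -quiet, -batch, -ex and -p and the commands info threads, thread apply all bt and detach are standard gdb features. It does not use btraceback.gdb, so its report may differ from the one btraceback produces.

```shell
#!/bin/sh
# Sketch: take a traceback by hand with gdb when btraceback cannot be used.
# The output file name only mirrors the bacula.PID.traceback naming used
# in this chapter; manual_traceback is a hypothetical helper.

# manual_traceback EXECUTABLE PID WORKDIR
# Writes gdb's output to WORKDIR/bacula.PID.traceback and prints that path.
manual_traceback() {
    exe=$1
    pid=$2
    out="$3/bacula.$pid.traceback"
    # Attach to the running process, list threads, print every thread's
    # stack, then detach so the daemon keeps running.
    gdb -quiet -batch \
        -ex "info threads" \
        -ex "thread apply all bt" \
        -ex "detach" \
        -p "$pid" "$exe" > "$out" 2>&1
    echo "$out"
}

if [ "$#" -eq 3 ]; then
    manual_traceback "$1" "$2" "$3"
fi
```

For example: sh manual_traceback.sh /opt/bacula/bin/bacula-dir 2103 /opt/bacula/working, where 2103 stands for the PID of the main daemon thread.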
Testing the Traceback
To test the traceback feature “manually”, start Bacula, then obtain the PID of the main daemon thread (the daemon runs multiple threads). The output produced here will look different depending on which OS and kernel version you are running:
[kern@rufus kern]$ ps fax --columns 132 | grep bacula-dir
2103 ? S 0:00 /opt/bacula/bin/bacula-dir -c /opt/bacula/etc/bacula-dir.conf
2104 ? S 0:00 \_ /opt/bacula/bin/bacula-dir -c /opt/bacula/etc/bacula-dir.conf
2106 ? S 0:00 \_ /opt/bacula/bin/bacula-dir -c /opt/bacula/etc/bacula-dir.conf
2105 ? S 0:00 \_ /opt/bacula/bin/bacula-dir -c /opt/bacula/etc/bacula-dir.conf
The PID of the main daemon thread in this case is 2103. While Bacula is running, call the btraceback script, giving it the path to the Bacula executable, the PID and the working directory. In this case, it is:
./btraceback /opt/bacula/bin/bacula-dir 2103 /opt/bacula/working
It should produce a file named (in this case) bacula.2103.traceback in the working directory, send an email showing the current state of the daemon (here, the Director), and then exit, leaving Bacula running as if nothing had happened. If this is not the case, you will need to correct the problem by modifying the btraceback script.
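The manual test can also be scripted. This is a sketch under a few assumptions: pgrep is available, the paths are the ones used in this chapter, and the report name follows the bacula.PID.traceback pattern shown above. pgrep -o -x selects the oldest process with exactly the given name, which corresponds to the main daemon thread in listings like the one above; on current Linux systems threads are usually not listed as separate processes at all, so there is typically only one match. The function names main_pid and test_traceback are hypothetical.

```shell
#!/bin/sh
# Sketch: automate the manual traceback test.
# Assumptions: daemon binaries and btraceback in /opt/bacula/bin, working
# directory /opt/bacula/working, and pgrep installed.

# main_pid NAME: PID of the oldest process named exactly NAME.
main_pid() {
    pgrep -o -x "$1"
}

# test_traceback DAEMON [BIN_DIR] [WORK_DIR]
test_traceback() {
    daemon=$1
    bin_dir=${2:-/opt/bacula/bin}
    work_dir=${3:-/opt/bacula/working}

    if ! pid=$(main_pid "$daemon"); then
        echo "$daemon does not appear to be running"
        return 1
    fi

    # Same call as the manual test: executable path, PID, working directory.
    "$bin_dir/btraceback" "$bin_dir/$daemon" "$pid" "$work_dir"

    # A non-empty report means the traceback was produced.
    report="$work_dir/bacula.$pid.traceback"
    if [ -s "$report" ]; then
        echo "traceback written to $report"
    else
        echo "no traceback at $report; check the btraceback script"
        return 1
    fi
}

if [ "$#" -gt 0 ]; then
    test_traceback "$@"
fi
```

Because every daemon shares the same traceback code, the same function can be used for each of them, for example test_traceback bacula-sd or test_traceback bacula-fd. Checking the mailbox for the emailed copy remains a manual step.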
A typical problem is that gdb (or dbx on Solaris) is not on the default path. Fix this by specifying the full path to it in the btraceback file. Another common problem is that the script has not been modified to give the bsmtp program an appropriate SMTP server, or the proper syntax for your SMTP server. If you use the mail program and it is not on the default path, sending the report will also fail. On some systems, it is preferable to use Mail rather than mail.
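To apply these fixes, you need the full path of each program and a way to test bsmtp on its own. The sketch below prints where gdb, dbx, bsmtp, mail and Mail are found on the current PATH, and defines send_test_mail, which pipes a one-line message into bsmtp using its -h (mail host), -f (sender) and -s (subject) options. The helper names, the SMTP_HOST, MAIL_TO and BSMTP variables and their defaults are placeholders, not Bacula settings. Because the daemon's PATH may differ from your login shell's, run it in an environment as close to the daemon's as possible.

```shell
#!/bin/sh
# Sketch: find full paths to write into btraceback, and test bsmtp alone.
# SMTP_HOST, MAIL_TO and BSMTP (and their defaults) are placeholders.

# full_path NAME: print NAME's absolute path if it is an executable file on PATH.
full_path() {
    p=$(command -v "$1" 2>/dev/null) || return 1
    case $p in
        /*) printf '%s\n' "$p" ;;
        *) return 1 ;;   # a builtin, alias or function rather than a file
    esac
}

# report_tools: one line per program that btraceback may need.
report_tools() {
    for tool in gdb dbx bsmtp mail Mail; do
        if path=$(full_path "$tool"); then
            echo "$tool: $path"
        else
            echo "$tool: not on PATH"
        fi
    done
}

# send_test_mail: bsmtp reads the message body from standard input.
send_test_mail() {
    to=${MAIL_TO:-root@localhost}
    echo "bsmtp test from $(uname -n)" |
        "${BSMTP:-/opt/bacula/bin/bsmtp}" -h "${SMTP_HOST:-localhost}" \
            -f "$to" -s "Bacula bsmtp test" "$to"
}

report_tools
```

If the test message arrives, the same -h value can be used for the bsmtp call in btraceback; if not, the problem lies with the SMTP settings rather than with the traceback itself.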
Getting Traceback on Other Systems
It should be possible to produce a similar traceback on systems other than Linux, using either gdb or some other debugger. Solaris with dbx installed works well. On other systems, you will need to modify the btraceback script to invoke the correct debugger, and possibly adapt the btraceback.gdb script so that it contains the appropriate commands for your debugger.
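The debugger selection in a modified btraceback could look like the following fragment. It is a hypothetical sketch: the preference lists (dbx then gdb on Solaris, where uname -s prints SunOS; gdb elsewhere) are assumptions to adapt to your systems, and the helper names are illustrative. Any debugger other than gdb also needs its own command file in place of btraceback.gdb.

```shell
#!/bin/sh
# Sketch: choose a debugger per OS, as a modified btraceback might.
# The preference lists below are illustrative assumptions.

# debugger_candidates OS: preferred debuggers for OS, most preferred first.
debugger_candidates() {
    case $1 in
        SunOS) echo "dbx gdb" ;;
        *)     echo "gdb" ;;
    esac
}

# pick_debugger [OS]: print the path of the first candidate on the PATH.
pick_debugger() {
    for dbg in $(debugger_candidates "${1:-$(uname -s)}"); do
        if path=$(command -v "$dbg" 2>/dev/null); then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

if dbg=$(pick_debugger); then
    echo "debugger: $dbg"
else
    echo "no supported debugger found on the PATH"
fi
```

Keeping the OS-to-debugger mapping in one function makes it easy to add another system without touching the rest of the script.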
See also
Go back to the Bacula Enterprise Troubleshooting chapter.