Locks and Threads

Bacula has a built-in lock manager called lmgr. It is a wrapper around the common pthread and mutex operations. The lock manager overrides the POSIX thread functions via src/lockmgr.h.
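As a rough illustration of how a header can redirect a POSIX call to a wrapper, here is a minimal sketch. The names traced_mutex_lock, last_lock_file and last_lock_line are hypothetical and are not Bacula's actual API; the sketch only assumes that the wrapper is defined before the macro, so that the wrapper itself still reaches the real pthread function.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names for illustration; not Bacula's actual API. */
static const char *last_lock_file = NULL;  /* call site of the last lock taken */
static int last_lock_line = 0;

/* Wrapper: lock through pthreads, then remember who asked. */
static int traced_mutex_lock(pthread_mutex_t *m, const char *file, int line)
{
    int rc = pthread_mutex_lock(m);  /* macro not defined yet: real function */
    if (rc == 0) {
        last_lock_file = file;
        last_lock_line = line;
    }
    return rc;
}

/* From here on, every pthread_mutex_lock(m) in the source is redirected. */
#define pthread_mutex_lock(m) traced_mutex_lock((m), __FILE__, __LINE__)

/* Usage: the caller writes plain pthread code and gets the tracing for free. */
static int example_traced_lock(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc = pthread_mutex_lock(&m);  /* expands to traced_mutex_lock(...) */
    assert(rc == 0 && last_lock_line > 0);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return last_lock_line;
}
```

Recording the file and line at the call site is what lets a later dump show entries such as jobq.c:324 next to each lock.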

It can detect deadlock situations at run time and reports them with the message:

Found a deadlock !!!!
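One common way to detect a deadlock at run time is to follow the chain "thread waits for lock, lock is held by thread" and see whether it comes back to its starting point. The sketch below shows that idea only; the owner and wanted tables and the function names are hypothetical, and the lock manager's real bookkeeping may differ.

```c
#include <assert.h>

#define NTHREADS 4
#define NLOCKS   4
#define NONE     (-1)

/* Hypothetical bookkeeping, not Bacula's actual data structures:
 * owner[l]  is the thread holding lock l (or NONE),
 * wanted[t] is the lock thread t is waiting for (or NONE). */
static int owner[NLOCKS];
static int wanted[NTHREADS];

static void reset_state(void)
{
    for (int i = 0; i < NLOCKS; i++)
        owner[i] = NONE;
    for (int i = 0; i < NTHREADS; i++)
        wanted[i] = NONE;
}

/* Return 1 if thread `start` is part of a cycle of waiting threads. */
static int in_wait_cycle(int start)
{
    assert(start >= 0 && start < NTHREADS);
    int t = start;
    for (int steps = 0; steps < NTHREADS; steps++) {
        int l = wanted[t];
        if (l == NONE)
            return 0;          /* t is not waiting: the chain ends */
        t = owner[l];
        if (t == NONE)
            return 0;          /* the lock is free: no deadlock */
        if (t == start)
            return 1;          /* the chain came back to where it began */
    }
    return 0;                  /* start waits on a cycle it is not part of */
}
```

A thread that holds a lock and asks for the same lock again is the shortest possible cycle: with owner[2] = 3 and wanted[3] = 2, in_wait_cycle(3) returns 1.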

The lock manager can also prevent deadlocks and help developers design their locking with the mutex list (lib/mutex_list.h). If all mutexes are acquired and released in a predefined order, deadlocks are not possible. A violation is reported with a message of this form:

ERROR: V out of order lock=%p %s:%i dumping lock
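The ordering rule can be sketched as follows, assuming each mutex carries a numeric priority and that a thread must take locks in non-decreasing priority order and release them in reverse order. The struct and function names are hypothetical, chosen for this example only.

```c
#include <assert.h>

#define MAX_HELD 16

/* Hypothetical per-thread record of the locks currently held,
 * in the order they were acquired. */
struct held_locks {
    const void *lock[MAX_HELD];
    int prio[MAX_HELD];
    int count;
};

/* Record an acquisition. Returns 0 (refused) when prio is lower than the
 * priority of the most recently acquired lock, i.e. a wrong lock order. */
static int acquire_in_order(struct held_locks *h, const void *lock, int prio)
{
    assert(h->count >= 0 && h->count <= MAX_HELD);
    if (h->count == MAX_HELD)
        return 0;                               /* table full */
    if (h->count > 0 && prio < h->prio[h->count - 1])
        return 0;                               /* out-of-order acquisition */
    h->lock[h->count] = lock;
    h->prio[h->count] = prio;
    h->count++;
    return 1;
}

/* Record a release. Returns 0 when lock is not the most recently acquired
 * one, i.e. an out-of-order release. */
static int release_in_order(struct held_locks *h, const void *lock)
{
    if (h->count == 0 || h->lock[h->count - 1] != lock)
        return 0;
    h->count--;
    return 1;
}
```

Because every thread climbs the priorities in the same direction, two threads can never each hold the lock the other one is waiting for.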

The lock manager also dumps the whole mutex map during a backtrace. The map can be analyzed easily to find the incorrect lock path. In the following example, one thread (0x7f67abe5f700) has requested the same lock (0x6a8b40) twice from two different locations:

threadid=0x7f67abe5f700 max=4 current=1
   lock=0x6a8b40 state=Granted priority=0 jobq.c:476
   lock=0x6a8b40 state=Wanted  priority=0 jobq.c:324

Using the thread id, we can locate in the backtrace what the thread was doing. In frame #5 below, jobq_remove at jobq.c:324 matches the Wanted entry of the dump:

Thread 982 (Thread 0x7f67abe5f700 (LWP 6211)):
#0  0x00007f6d1d8684ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6d1d863dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f6d1d863c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6d1dcebdcb in lmgr_p (m=m@entry=0x6a8b40 <job_queue>) at lockmgr.c:105
#4  0x00007f6d1dced2ed in bthread_mutex_lock_p ("jobq.c", 324) at lockmgr.c:1026
#5  0x0000000000430d82 in jobq_remove (jcr=jcr@entry=0x7f6cfcb542c8) at jobq.c:324
#6  0x000000000042ec9e in cancel_job (ua=ua@entry=0x7f6be807a7f8, jcr=jcr@entry=0x7f6cfcb542c8,
                            wait=wait@entry=60, cancel=cancel@entry=true) at job.c:746
#7  0x000000000042f10a in allow_duplicate_job (jcr=jcr@entry=0x7f6cfc361838) at job.c:1155
#8  0x00000000004317e5 in reschedule_job (je=0x7f6cfcac48d8, jq=0x6a8b40, jcr=0x7f6cfc361838) at jobq.c:675
#9  jobq_server (arg=arg@entry=0x6a8b40 <job_queue>) at jobq.c:494
#10 0x00007f6d1dcece25 in lmgr_thread_launcher (x=0x7f6cfc369578) at lockmgr.c:1184
#11 0x00007f6d1d861dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f6d1c289ead in clone () from /lib64/libc.so.6

We can also get information about the job run by thread 0x7f67abe5f700 by looking at the JCR dump list:

threadid=0x7f67abe5f700 JobId=936628 JobStatus=t jcr=0x7f6cfc361838
        name=XXX-SQL.2019-08-17_21.30.00_07
    use_count=1 killable=0
    JobType=B JobLevel=F
    sched_time=18-Aug-2019 03:13 start_time=18-Aug-2019 00:18
    end_time=18-Aug-2019 02:43 wait_time=01-Jan-1970 01:00
    db=0x7f69b0921e28 db_batch=0x7f6bb0033d38 batch_started=0
    wstore=0x7f6b246262e8 rstore=(nil) wjcr=(nil) client=0x7f6b240c5438
        reschedule_count=1 SD_msg_chan_started=0

BDB=0x7f69b0921e28 db_name=bacula db_user=bacula connected=true
    cmd="SELECT MediaId,VolumeName,VolJobs,...
    RWLOCK=0x7f69b0921e40 w_active=0 w_wait=0

The lock manager can also handle threads. By default on Linux, it is not safe to call pthread_kill on a non-existent thread: pthread_t is a pointer to a struct and, as detached threads are released automatically, trying to kill an old thread can raise a segmentation fault. With the lock manager, the replacement for pthread_kill checks that the thread is registered in the lock manager before killing it.
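The check-before-kill idea can be sketched with a small registry of live threads. This is an illustration under stated assumptions, not the lock manager's code: the registry, its fixed size, and the names registry_add, registry_remove and checked_pthread_kill are all hypothetical. It assumes every thread registers itself when it starts and unregisters just before it exits.

```c
#define _POSIX_C_SOURCE 200809L  /* for pthread_kill under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>

#define MAX_THREADS 64

/* Hypothetical registry of live threads; names are illustrative only. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t registry[MAX_THREADS];
static int registry_used[MAX_THREADS];

/* Called by a thread when it starts. Returns 0 on success, -1 if full. */
static int registry_add(pthread_t t)
{
    int rc = -1;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_THREADS; i++) {
        if (!registry_used[i]) {
            registry[i] = t;
            registry_used[i] = 1;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Called by a thread just before it exits. */
static void registry_remove(pthread_t t)
{
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_THREADS; i++) {
        if (registry_used[i] && pthread_equal(registry[i], t))
            registry_used[i] = 0;
    }
    pthread_mutex_unlock(&registry_lock);
}

/* pthread_kill replacement: signal only threads still in the registry. */
static int checked_pthread_kill(pthread_t t, int sig)
{
    int rc = ESRCH;                      /* default: no such thread */
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_THREADS; i++) {
        if (registry_used[i] && pthread_equal(registry[i], t)) {
            rc = pthread_kill(t, sig);   /* t cannot unregister meanwhile */
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Usage: signal 0 performs the checks without delivering a signal. */
static void example_checked_kill(void)
{
    pthread_t self = pthread_self();
    int rc = registry_add(self);
    assert(rc == 0);
    rc = checked_pthread_kill(self, 0);
    assert(rc == 0);
    registry_remove(self);
    rc = checked_pthread_kill(self, 0);
    assert(rc == ESRCH);                 /* unknown thread: never touched */
}
```

Holding the registry lock across the lookup and the call to pthread_kill is the important design choice here: a thread that unregisters under the same lock cannot disappear between the check and the signal.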

The lock manager also implements a ring of events. This list can be displayed in the backtrace file. It can be used, for example, to analyze the life of a mutex, whereas the backtrace itself is only a fixed picture at a given time.
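A ring of events is typically a fixed-size circular buffer in which the newest entry overwrites the oldest. The sketch below shows the principle only; the struct layout, the tiny RING_SIZE and the function names are assumptions made for this example, and it is not thread-safe as written.

```c
#include <assert.h>

#define RING_SIZE 4  /* deliberately tiny for illustration */

/* Hypothetical event record: what happened, and where in the source. */
struct event {
    const char *what;
    const char *file;
    int line;
};

struct event_ring {
    struct event ev[RING_SIZE];
    unsigned long next;  /* total number of events ever recorded */
};

/* Record an event, overwriting the oldest one once the ring is full. */
static void ring_add(struct event_ring *r, const char *what,
                     const char *file, int line)
{
    struct event *e = &r->ev[r->next % RING_SIZE];
    e->what = what;
    e->file = file;
    e->line = line;
    r->next++;
}

/* Number of events currently retained. */
static unsigned long ring_count(const struct event_ring *r)
{
    return r->next < RING_SIZE ? r->next : RING_SIZE;
}

/* i-th retained event, oldest first (0 <= i < ring_count(r)). */
static const struct event *ring_get(const struct event_ring *r, unsigned long i)
{
    unsigned long first = r->next - ring_count(r);
    assert(i < ring_count(r));
    return &r->ev[(first + i) % RING_SIZE];
}
```

Dumping the ring from oldest to newest gives a short history of what happened to a mutex just before the backtrace was taken, at a fixed memory cost.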
