Linux Kernel: Deadlocks and how to avoid them
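Deadlock Problem:
Scenario 1: Self deadlock - "re-acquire lock"
Say there's a thread A. It acquires lock X1 and then, before relinquishing it, tries to acquire X1 again. Kernel spinlocks are not recursive, so the second attempt spins forever waiting for a lock that the thread itself holds. This is a deadlock.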
thread_A()
{
. . .
spin_lock(&X1);
do_some_stuff();
spin_lock(&X1); // deadlock!! X1 is already held by this thread
. . .
spin_unlock(&X1);
}
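The kernel snippet above cannot be run outside the kernel, so here is a minimal userspace sketch of the same mistake. It assumes POSIX threads as a stand-in for kernel spinlocks, and the function name relock_same_mutex is made up for illustration. An error-checking mutex is used so that the second lock attempt reports the problem instead of hanging the program.

```c
#define _POSIX_C_SOURCE 200809L /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/*
 * Userspace analogue of Scenario 1 (assumption: a pthread mutex stands in
 * for the kernel spinlock X1). Returns the result of the second lock call.
 */
int relock_same_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t x1;
    int rc;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK makes a re-lock by the owner fail instead of blocking. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&x1, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&x1);      /* first acquisition succeeds */
    rc = pthread_mutex_lock(&x1); /* second acquisition by the same thread */

    pthread_mutex_unlock(&x1);    /* release the single hold we have */
    pthread_mutex_destroy(&x1);
    return rc;
}
```

With an error-checking mutex the second call returns EDEADLK. With a default mutex, or with a kernel spinlock, there is no such safety net and the thread simply never makes progress.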
Scenario 2: ABBA deadlock
Say there are two threads, A and B, executing simultaneously, and two locks, X1 and X2 (each protecting some critical region). A acquires X1 and B acquires X2, and then A tries to acquire X2 while B tries to acquire X1. Each thread now waits for a lock the other one holds, so both threads are deadlocked.
thread_A()
{
. . .
spin_lock(&X1);
do_some_stuff();
spin_lock(&X2); // deadlock if thread_B already holds X2 and is waiting for X1
. . .
}
thread_B()
{
. . .
spin_lock(&X2);
do_some_stuff();
spin_lock(&X1); // deadlock if thread_A already holds X1 and is waiting for X2
. . .
}
How to avoid deadlocks (Coding Guidelines):
These guidelines follow directly from the two kinds of deadlock described above:
1. Maintain lock order, i.e. wherever multiple locks are acquired, make sure they are always acquired in the same order. Otherwise you end up in the ABBA deadlock situation above.
2. Never re-acquire a lock that you are already holding. Be aware of the code that will execute in the current thread and of the functions that will be called: if a lock has been acquired in some upstream function, don't re-acquire it!
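As a rough illustration of both guidelines, here is a userspace sketch that again assumes POSIX threads in place of kernel spinlocks. The helpers lock_pair and unlock_pair and the demo function are hypothetical names, and ordering by address is just one possible way to fix a global order. Thread A asks for X1 then X2 and thread B asks for X2 then X1, but the helper always takes the lower-addressed lock first, so the ABBA pattern cannot occur.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define ITERATIONS 100000

static pthread_mutex_t X1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t X2 = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter; /* only touched while both X1 and X2 are held */

/* Hypothetical helper: always lock the lower-addressed mutex first. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    if (b != a) /* guideline 2: never re-acquire the same lock */
        pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (b != a)
        pthread_mutex_unlock(b);
}

struct lock_args {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
};

static void *worker(void *p)
{
    struct lock_args *args = p;

    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(args->first, args->second);
        shared_counter++;
        unlock_pair(args->first, args->second);
    }
    return NULL;
}

/* Runs thread A (X1, X2) and thread B (X2, X1); returns the final count. */
long run_ordered_locking_demo(void)
{
    struct lock_args a = { &X1, &X2 }; /* order requested by thread A */
    struct lock_args b = { &X2, &X1 }; /* order requested by thread B */
    pthread_t ta, tb;

    shared_counter = 0;
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return shared_counter;
}
```

Calling run_ordered_locking_demo() should return 2 * ITERATIONS once both threads finish. If the workers locked the two mutexes directly in the order they were given, the same program could hang whenever the two threads interleaved as in Scenario 2.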
How to detect / solve deadlocks:
1. Manually inspect the code to check against the coding guidelines above.
2. Run strace -p <pid> and check whether the process is stuck in a wait, for example a futex call that never returns.
3. If you suspect your application is *stuck*, run ps aux | grep <application name>. If the STAT column shows "D" (uninterruptible sleep), it *could* mean there's a deadlock in your code.
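4. The Linux kernel has a built-in lock validator called lockdep (enabled with CONFIG_PROVE_LOCKING). It can help pinpoint the line of code causing the deadlock. See this post (post incomplete, check my answer on stackoverflow instead) on how to detect deadlocks using lockdep.
5. Helgrind, a Valgrind tool that reports lock-ordering problems in userspace programs that use POSIX threads.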
Deadlocks, if they occur very rarely, don't always need fixing: tracking them down can take too much effort, and they won't affect system performance much. One can just reboot and get going. As Tom West said, "Not everything worth doing is worth doing well."