Analyzing Deadlocks with GDB

littleSnail.W

Last modified 2022-06-01 22:42:23

Column: 调试Debug (Debugging)   Tags: C language

First published 2022-05-31 23:56:05

Article link: https://blog.csdn.net/sinat_32152141/article/details/125003027

Contents

1. The four necessary conditions for deadlock
2. Deadlock prevention
3. Sample code
4. Debugging with gdb
5. Fixing the deadlock
6. Additional notes

1. The four necessary conditions for deadlock

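Mutual exclusion: a resource is used exclusively. At any moment it can be assigned to only one process; any other process that requests it must wait until the holder releases it.
No preemption: a resource obtained by a process cannot be forcibly taken away before the process has finished with it; only the process holding it can release it.
Hold and wait: a process requests the resources it needs a few at a time and keeps the resources already allocated to it while it requests new ones.
Circular wait: when a deadlock occurs there must be a chain of waiting processes {P1, P2, …, Pn} in which P1 waits for a resource held by P2, P2 waits for a resource held by P3, …, and Pn waits for a resource held by P1. This forms a cycle in which every process holds a resource that the previous process in the chain is requesting.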

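These are the four necessary conditions for deadlock: whenever a deadlock occurs, all four hold at the same time. Conversely, breaking any one of them is enough to rule deadlock out.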
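The circular-wait condition can be checked mechanically with a wait-for graph, in which an edge from thread i to thread j means i is blocked on a resource that j holds; a deadlock shows up as a cycle in that graph. The sketch below is a minimal illustration, not part of the original article. It assumes each thread is blocked on at most one lock, so the graph reduces to a single array, and the names has_circular_wait, MAX_THREADS and NO_WAIT are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 16
#define NO_WAIT (-1)

/* waits_for[i] == j       : thread i is blocked on a resource held by thread j.
 * waits_for[i] == NO_WAIT : thread i is not blocked.
 * Assumption: each thread has at most one pending lock request. */
bool has_circular_wait(const int waits_for[], int n, int start)
{
    bool visited[MAX_THREADS] = { false };

    assert(n > 0 && n <= MAX_THREADS);
    for (int cur = start; cur != NO_WAIT; cur = waits_for[cur]) {
        assert(cur >= 0 && cur < n);
        if (visited[cur])
            return true;    /* reached a thread already on the chain: a cycle */
        visited[cur] = true;
    }
    return false;           /* the chain ends at a thread that is not waiting */
}
```

For the two threads analyzed with gdb in section 4 (index 0 for LWP 3209, index 1 for LWP 3210), each waits for the other, so the array is {1, 0} and has_circular_wait reports a cycle from either starting thread.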

2. Deadlock prevention

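Deadlock can be prevented by breaking one of the four necessary conditions. Mutual exclusion is an inherent property of how the resources are used and cannot be changed, so prevention targets the other three: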

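Break “no preemption”: when a process cannot obtain all the resources it needs, it enters the waiting state, and the resources it already holds are implicitly released back to the system's resource list, where other processes can use them. The waiting process can restart only after it has reacquired its original resources together with the newly requested ones.
Break “hold and wait”: the first approach is static allocation, where each process requests all the resources it needs before it starts executing. The second is dynamic allocation, where a process may request resources only while it holds none.
Break “circular wait”: use ordered resource allocation. Number all resources in the system, giving larger numbers to scarce resources, and require every request to follow increasing order: a process may request a higher-numbered resource only after it has obtained the lower-numbered resources it needs.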
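The “no preemption” strategy can be approximated in user code with pthread_mutex_trylock: if the second mutex is busy, the thread releases the one it already holds and retries, so it never waits while holding a lock. A minimal sketch; the helper names lock_pair_backoff and unlock_pair are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Acquire two mutexes without ever waiting while holding one of them.
 * If the second mutex is busy, release the first one (the thread gives up
 * its resource voluntarily) and start over. */
void lock_pair_backoff(pthread_mutex_t *first, pthread_mutex_t *second)
{
    for (;;) {
        pthread_mutex_lock(first);          /* blocks only while holding nothing */
        int rc = pthread_mutex_trylock(second);
        if (rc == 0)
            return;                         /* both mutexes are now held */
        assert(rc == EBUSY);                /* any other error is a bug */
        pthread_mutex_unlock(first);        /* give back what we hold */
        sched_yield();                      /* let the current owner make progress */
    }
}

void unlock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
    pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
}
```

The trade-off: two threads backing off at the same moment can keep colliding (livelock), and sched_yield only makes that less likely. A fixed global lock order, the “circular wait” approach, avoids retries altogether.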

3. Sample code

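//deadlock.c  (build: gcc -g -pthread -o deadlock deadlock.c; -g lets gdb show file:line)
#include <sys/types.h>
#include <sys/syscall.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t lock_a;
pthread_mutex_t lock_b;

pid_t get_tid(void)	/* kernel thread ID (LWP); renamed to avoid clashing with glibc >= 2.30's gettid() */
{
	return (pid_t)syscall(SYS_gettid);
}

void* func1(void* arg)
{
	pthread_mutex_lock(&lock_a);	/* func1 takes lock_a first... */
	sleep(1);	/* ...gives func2 time to take lock_b... */
	pthread_mutex_lock(&lock_b);	/* ...then blocks forever: lock_b is held by func2 */
	printf("current tid:%d\n", (int)get_tid());
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}
void* func2(void* arg)
{
	pthread_mutex_lock(&lock_b);	/* opposite order: func2 takes lock_b first... */
	sleep(1);
	pthread_mutex_lock(&lock_a);	/* ...then blocks forever: lock_a is held by func1 */
	printf("current tid:%d\n", (int)get_tid());
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main()
{
	pthread_mutex_init(&lock_a, NULL);
	pthread_mutex_init(&lock_b, NULL);

	pthread_t tid[2] = {0};

	pthread_create(&tid[0], NULL, &func1, NULL);
	pthread_create(&tid[1], NULL, &func2, NULL);

	pthread_join(tid[0], NULL);	/* never returns: func1 and func2 are deadlocked */
	pthread_join(tid[1], NULL);

	return 0;
}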

4. Debugging with gdb

[root@localhost ~]# ps -aux|grep deadlock
root        3208  0.0  0.0  22936   980 pts/0    Sl+  23:46   0:00 ./deadlock
root        3229  0.0  0.0  12348  1124 pts/1    S+   23:47   0:00 grep --color=auto deadlock
[root@localhost ~]# gdb attach 3208
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-15.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 3208
[New LWP 3209]
[New LWP 3210]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f97f574963d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-151.el8.x86_64
(gdb) info threads 
  Id   Target Id                                   Frame 
* 1    Thread 0x7f97f5b7b740 (LWP 3208) "deadlock" 0x00007f97f574963d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
  2    Thread 0x7f97f537a700 (LWP 3209) "deadlock" 0x00007f97f575165d in __lll_lock_wait () from /lib64/libpthread.so.0
  3    Thread 0x7f97f4b79700 (LWP 3210) "deadlock" 0x00007f97f575165d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) t 2
[Switching to thread 2 (Thread 0x7f97f537a700 (LWP 3209))]
#0  0x00007f97f575165d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f97f575165d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f97f574a979 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x0000000000400835 in func1 (arg=0x0) at deadlock.c:20
#3  0x00007f97f574814a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f97f5477dc3 in clone () from /lib64/libc.so.6
(gdb) f 2
#2  0x0000000000400835 in func1 (arg=0x0) at deadlock.c:20
20	        pthread_mutex_lock(&lock_b);
(gdb) p lock_b
$1 = {__data = {__lock = 2, __count = 0, __owner = 3210, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\212\f\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) t 3
[Switching to thread 3 (Thread 0x7f97f4b79700 (LWP 3210))]
#0  0x00007f97f575165d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f97f575165d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f97f574a979 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x0000000000400891 in func2 (arg=0x0) at deadlock.c:30
#3  0x00007f97f574814a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f97f5477dc3 in clone () from /lib64/libc.so.6
(gdb) f 2
#2  0x0000000000400891 in func2 (arg=0x0) at deadlock.c:30
30	        pthread_mutex_lock(&lock_a);
(gdb) p lock_a
$2 = {__data = {__lock = 2, __count = 0, __owner = 3209, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\211\f\000\000\001", '\000' <repeats 26 times>, __align = 2}

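gdb shows that the deadlock process has three threads: the main thread (3208) and two child threads (3209 and 3210). The main thread is blocked in pthread_join (shown as __pthread_timedjoin_ex), and both child threads are blocked in pthread_mutex_lock (__lll_lock_wait). The __owner field of a locked mutex holds the kernel thread ID (LWP) of its current owner, so printing the two lock structures shows:
Thread 3209 (func1) holds lock_a and waits for lock_b, which is held by thread 3210.
Thread 3210 (func2) holds lock_b and waits for lock_a, which is held by thread 3209.
Each thread waits for a lock the other holds, so neither can proceed: a deadlock.
The attach error printed near the top of the session is harmless: gdb treats the word attach as the name of the program and 3208 as the process to attach to. gdb -p 3208 attaches without that message.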

5. Fixing the deadlock

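The fix breaks the “circular wait” condition: func1 and func2 now acquire the two locks in the same order (for example, lock_a before lock_b in both).
After this change the program runs to completion: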

[root@localhost deadlock]# ./deadlock
current tid:3281
current tid:3282
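The original post shows only the output of the fixed program, not its source. A minimal sketch of the fix, assuming the only change is that func2 takes lock_a first and lock_b second, exactly like func1. It is written without main; run_both is a hypothetical stand-in for the original main, and get_tid is the syscall(SYS_gettid) wrapper under a name that does not clash with glibc's own gettid().

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static pid_t get_tid(void)
{
    return (pid_t)syscall(SYS_gettid);    /* kernel thread ID (LWP) */
}

void *func1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);          /* global order: lock_a, then lock_b */
    sleep(1);
    pthread_mutex_lock(&lock_b);
    printf("current tid:%d\n", (int)get_tid());
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

void *func2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);          /* fixed: same order as func1 */
    sleep(1);
    pthread_mutex_lock(&lock_b);
    printf("current tid:%d\n", (int)get_tid());
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Start both threads and wait for them; returns 0 once both have finished. */
int run_both(void)
{
    pthread_t tid[2];
    int rc = pthread_create(&tid[0], NULL, func1, NULL);
    assert(rc == 0);
    rc = pthread_create(&tid[1], NULL, func2, NULL);
    assert(rc == 0);
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return 0;
}
```

With a single global order, whichever thread gets lock_a first also gets lock_b, and the other thread simply waits until both are released, so no cycle can form. The sleep(1) calls only make the run take about two seconds.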

6. Additional notes

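While printing thread IDs I ran into a problem: the value returned by pthread_self() did not match the thread IDs shown by top -H -p <pid> (the PID of the deadlock process). After some reading, the explanation is:
pthread_self() returns the POSIX thread ID, not the kernel's thread ID. It identifies a thread relative to the other threads of the same process, so it is unique only within that process; threads in different processes may get the same pthread_self() value.
gettid returns the kernel's thread ID, which is what top -H displays and what gdb shows as the LWP number.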
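A small sketch that prints both IDs side by side. kernel_tid is a syscall(SYS_gettid) wrapper like the one in deadlock.c, and print_thread_ids, report_tid and spawn_and_get_tid are hypothetical helpers. The cast of pthread_t to unsigned long assumes Linux/glibc, where pthread_t is an unsigned long; POSIX treats it as an opaque type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID: the number top -H and gdb's "LWP" display. */
static pid_t kernel_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Print the POSIX ID and the kernel ID of the calling thread. */
void print_thread_ids(const char *who)
{
    printf("%s: pthread_self()=%lu  gettid=%d  pid=%d\n",
           who, (unsigned long)pthread_self(), (int)kernel_tid(), (int)getpid());
}

static void *report_tid(void *out)
{
    print_thread_ids("worker");
    *(pid_t *)out = kernel_tid();     /* hand the kernel TID back to the creator */
    return NULL;
}

/* Run one worker thread and return its kernel TID. */
pid_t spawn_and_get_tid(void)
{
    pthread_t t;
    pid_t tid = 0;
    int rc = pthread_create(&t, NULL, report_tid, &tid);
    assert(rc == 0);
    pthread_join(t, NULL);            /* after join, the worker's write is visible */
    return tid;
}
```

On Linux the main thread's kernel TID equals the process ID, while each additional thread gets its own TID. That is why top -H lists 3208, 3209 and 3210 for the deadlock process, whereas pthread_self() prints large address-like values.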
