Debugging Multithreaded Deadlocks on Linux

To make the discussion concrete, here is a multithreaded program that deadlocks:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t g_smutex ; 

void * func(void *arg)
{
	int i=0;

	//lock

	pthread_mutex_lock( &g_smutex);
	
	for(i = 0 ;i < 0x7fffffff; i++)
	{

	}

	// BUG (deliberate): the matching pthread_mutex_unlock(&g_smutex) is missing
	
	return NULL;
}

int main()
{
	pthread_t  thread_id_01;
	pthread_t  thread_id_02;
	pthread_t  thread_id_03;
	pthread_t  thread_id_04;
	pthread_t  thread_id_05;
	
	pthread_mutex_init( &g_smutex, NULL );

	pthread_create(&thread_id_01, NULL, func, NULL);
	pthread_create(&thread_id_02, NULL, func, NULL);
	pthread_create(&thread_id_03, NULL, func, NULL);
	pthread_create(&thread_id_04, NULL, func, NULL);
	pthread_create(&thread_id_05, NULL, func, NULL);

	while(1)
	{
		sleep(0xfff);
	}
	return 0;
}

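Explanation: whichever thread acquires the mutex first runs func to completion and returns without calling pthread_mutex_unlock, so the remaining threads block in pthread_mutex_lock forever.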
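For comparison, here is a minimal sketch of the same worker with the unlock restored. The names func_fixed, g_fixed_mutex, g_done_count and run_fixed_workers are made up for this illustration and are not part of the original program; the counter only stands in for whatever the critical section really does.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names, used only in this sketch. */
static pthread_mutex_t g_fixed_mutex = PTHREAD_MUTEX_INITIALIZER;
static int g_done_count = 0;            /* protected by g_fixed_mutex */

/* Same shape as func() above, but the lock is released before returning. */
void *func_fixed(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_fixed_mutex);
    g_done_count++;                       /* stand-in for the real critical section */
    pthread_mutex_unlock(&g_fixed_mutex); /* the call the original forgot */
    return NULL;
}

/* Starts up to 16 threads, joins them all, and returns how many
   of them passed through the critical section. */
int run_fixed_workers(int n)
{
    pthread_t ids[16];
    int started = 0;

    if (n > 16)
        n = 16;
    g_done_count = 0;                     /* no worker is running yet */

    for (int i = 0; i < n; i++) {
        if (pthread_create(&ids[i], NULL, func_fixed, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(ids[i], NULL);

    pthread_mutex_lock(&g_fixed_mutex);
    int done = g_done_count;
    pthread_mutex_unlock(&g_fixed_mutex);
    return done;
}
```

Calling run_fixed_workers(5) returns once all five threads have finished, because each thread releases the mutex for the next one; with the original func the call would never return.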

Compile: gcc a.c -g -lpthread -o a.out

 

Method 1:

1. Start gdb on the executable with gdb a.out, then enter the r command to run the program.

gdb a.out

GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10

Copyright (C) 2015 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

and "show warranty" for details.

This GDB was configured as "i686-linux-gnu".

Type "show configuration" for configuration details.

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>.

Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".

Type "apropos word" to search for commands related to "word"...

Reading symbols from a.out...done.

(gdb) r

Starting program: /share/a.out

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

[New Thread 0xb7de6b40 (LWP 15436)]

[New Thread 0xb75e5b40 (LWP 15437)]

[New Thread 0xb6de4b40 (LWP 15438)]

[New Thread 0xb65e3b40 (LWP 15439)]

[New Thread 0xb5de2b40 (LWP 15440)]

[Thread 0xb7de6b40 (LWP 15436) exited]

2. While the program is running, press Ctrl+C to interrupt it.

^C

Program received signal SIGINT, Interrupt.

0xb7fdbbe8 in __kernel_vsyscall ()

3. Inspect the stack with info stack (an alias of bt). This command shows only the stack of the thread gdb currently has selected.

(gdb) info stack

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7e9c3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81

#2  0xb7e9c1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138

#3  0x08048679 in main () at New0001.c:46

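4. List all threads with info threads. The * marks the thread gdb currently has selected (here the main thread); it does not mean that this is the only thread running. Any of the other threads may be blocked or deadlocked, which the next step checks.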

(gdb) info threads

  Id   Target Id         Frame

  6    Thread 0xb5de2b40 (LWP 15440) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

  5    Thread 0xb65e3b40 (LWP 15439) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

  4    Thread 0xb6de4b40 (LWP 15438) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

  3    Thread 0xb75e5b40 (LWP 15437) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

* 1    Thread 0xb7de7700 (LWP 15432) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

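5. Run thread apply all bt. The thread apply all command makes gdb execute the given command for every thread; with bt it prints the full stack of each thread.

Note that when a process has many threads, switching to each one by its gdb thread id (the ids 1, 3, 4, 5, 6 above) is impractical: nobody wants to type the command 100 times for 100 threads.

So it is best to use thread apply all bt directly.

(gdb) thread apply all bt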

Thread 6 (Thread 0xb5de2b40 (LWP 15440)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb5de2b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 5 (Thread 0xb65e3b40 (LWP 15439)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb65e3b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 4 (Thread 0xb6de4b40 (LWP 15438)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb6de4b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 3 (Thread 0xb75e5b40 (LWP 15437)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb75e5b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 1 (Thread 0xb7de7700 (LWP 15432)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

---Type <return> to continue, or q <return> to quit---

#1  0xb7e9c3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81

#2  0xb7e9c1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138

#3  0x08048679 in main () at New0001.c:46

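6. The threads whose stack contains __lll_lock_wait are the ones blocked waiting for the lock.

Repeat the steps above several times. Threads that show up in __lll_lock_wait every time are very likely deadlocked.

Take thread 3, for example: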

Thread 3 (Thread 0xb75e5b40 (LWP 15437)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb75e5b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

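Frame #3,

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

is where the thread is blocked. Start reading the code from that line and look for a path on which the lock is never released.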
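When hunting for such a path, one option is to temporarily swap the suspect pthread_mutex_lock call for a timed attempt so that the program reports the stuck lock itself. The sketch below uses the POSIX call pthread_mutex_timedlock; the helper names lock_with_timeout, leak_lock and demo_leaked_lock are invented for this example, and the one-second timeout is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Tries to lock m, giving up after timeout_sec seconds.
   Returns 0 on success, ETIMEDOUT if the lock stayed held the whole time,
   or another error number on failure. */
int lock_with_timeout(pthread_mutex_t *m, int timeout_sec)
{
    struct timespec deadline;

    /* pthread_mutex_timedlock expects an absolute CLOCK_REALTIME deadline. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += timeout_sec;
    return pthread_mutex_timedlock(m, &deadline);
}

/* Thread body that takes the lock and returns without releasing it,
   mimicking func() from the example program. */
void *leak_lock(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/* Reproduces the bug on a private mutex and returns what a timed
   lock attempt reports afterwards. */
int demo_leaked_lock(void)
{
    pthread_mutex_t m;
    pthread_t t;

    pthread_mutex_init(&m, NULL);
    if (pthread_create(&t, NULL, leak_lock, &m) != 0)
        return -1;
    pthread_join(t, NULL);           /* the owner is gone, the mutex is still locked */

    /* The mutex is deliberately not destroyed: it is still locked. */
    return lock_with_timeout(&m, 1);
}
```

A return value of ETIMEDOUT from lock_with_timeout is a strong hint that some earlier code path kept the mutex; logging the file and line at that point narrows the search in the same way frame #3 does in gdb.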

Method 2:

Let the program run first, then open another session and find its process ID with ps -axu | grep <executable>:

ps -axu | grep a.out

root     15463  0.4  0.1  43320   732 pts/4    Sl+  19:29   0:03 ./a.out

root     15476  0.0  0.3   4540  1864 pts/6    S+   19:44   0:00 grep --color=auto a.out

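The output shows that the process ID is 15463.

1. Attach with gdb attach <pid> (gdb -p <pid> is the documented form),

  or start gdb and then run attach <pid>,

  or run gdb <executable> <pid>, which also attaches automatically.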

root@ubuntu:/share# gdb

GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10

Copyright (C) 2015 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

and "show warranty" for details.

This GDB was configured as "i686-linux-gnu".

Type "show configuration" for configuration details.

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>.

Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".

Type "apropos word" to search for commands related to "word".

(gdb)

Then attach to the process:

(gdb) attach 15463

Attaching to process 15463

Reading symbols from /share/a.out...done.

Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libpthread-2.21.so...done.

done.

[New LWP 15467]

[New LWP 15466]

[New LWP 15465]

[New LWP 15464]

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libc-2.21.so...done.

done.

Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/ld-2.21.so...done.

done.

0xb773abe8 in __kernel_vsyscall ()

2. List the threads.

(gdb) info threads

  Id   Target Id         Frame

  5    Thread 0xb7545b40 (LWP 15464) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  4    Thread 0xb6d44b40 (LWP 15465) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  3    Thread 0xb6543b40 (LWP 15466) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  2    Thread 0xb5d42b40 (LWP 15467) "a.out" 0xb773abe8 in __kernel_vsyscall ()

* 1    Thread 0xb7546700 (LWP 15463) "a.out" 0xb773abe8 in __kernel_vsyscall ()

3. Print a backtrace for every thread.

(gdb) thread apply all bt

Thread 5 (Thread 0xb7545b40 (LWP 15464)):

#0  0xb773abe8 in __kernel_vsyscall ()

#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb77091aa in start_thread (arg=0xb7545b40) at pthread_create.c:333

#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 4 (Thread 0xb6d44b40 (LWP 15465)):

#0  0xb773abe8 in __kernel_vsyscall ()

#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb77091aa in start_thread (arg=0xb6d44b40) at pthread_create.c:333

#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 3 (Thread 0xb6543b40 (LWP 15466)):

#0  0xb773abe8 in __kernel_vsyscall ()

#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb77091aa in start_thread (arg=0xb6543b40) at pthread_create.c:333

#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 2 (Thread 0xb5d42b40 (LWP 15467)):

#0  0xb773abe8 in __kernel_vsyscall ()

#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb77091aa in start_thread (arg=0xb5d42b40) at pthread_create.c:333

#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 1 (Thread 0xb7546700 (LWP 15463)):

#0  0xb773abe8 in __kernel_vsyscall ()

---Type <return> to continue, or q <return> to quit---

#1  0xb75fb3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81

#2  0xb75fb1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138

#3  0x08048679 in main () at New0001.c:46

4. Pick one of the threads stuck in __lll_lock_wait and examine it, for example the thread whose gdb id is 4.

 

(gdb) thread 4

[Switching to thread 4 (Thread 0xb6d44b40 (LWP 15465))]

#0  0xb773abe8 in __kernel_vsyscall ()

(gdb) bt

#0  0xb773abe8 in __kernel_vsyscall ()

#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb77091aa in start_thread (arg=0xb6d44b40) at pthread_create.c:333

#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Select frame 3 of that stack:

(gdb) frame 3

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

16 pthread_mutex_lock( &g_smutex);

The thread is blocked in the call that acquires the lock. Print the mutex to see who holds it:

(gdb) p  g_smutex

$1 = {__data = {__lock = 2, __count = 0, __owner = 15468, __kind = 0, __nusers = 1, {__elision_data = {__espins = 0,

        __elision = 0}, __list = {__next = 0x0}}},

  __size = "\002\000\000\000\000\000\000\000l<\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}

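The __owner field of the mutex is 15468, the thread ID (LWP) of the thread that holds the lock. No thread with that ID is still alive, so the owner exited without unlocking.

For reference, here are the only threads still alive, as listed in step 2: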

(gdb) info threads

  Id   Target Id         Frame

  5    Thread 0xb7545b40 (LWP 15464) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  4    Thread 0xb6d44b40 (LWP 15465) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  3    Thread 0xb6543b40 (LWP 15466) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  2    Thread 0xb5d42b40 (LWP 15467) "a.out" 0xb773abe8 in __kernel_vsyscall ()

* 1    Thread 0xb7546700 (LWP 15463) "a.out" 0xb773abe8 in __kernel_vsyscall ()
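The situation found above, a mutex whose owner has already exited, can also be detected by the program itself if the mutex is created as a robust mutex. This is not what the example program does; it is a sketch of an alternative, assuming a system that provides pthread_mutexattr_setrobust. The function names exit_holding_lock and lock_after_owner_died are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Thread body that locks the mutex and exits while still holding it. */
void *exit_holding_lock(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/* Creates a robust mutex, lets a thread die while owning it, then locks it
   from the caller. Returns the result of that lock call, which is expected
   to be EOWNERDEAD rather than blocking forever. Returns -1 on setup failure. */
int lock_after_owner_died(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    pthread_t owner;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&owner, NULL, exit_holding_lock, &m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    pthread_join(owner, NULL);       /* the owner has exited without unlocking */

    int rc = pthread_mutex_lock(&m);
    if (rc == EOWNERDEAD)
        pthread_mutex_consistent(&m); /* mark the protected state as usable again */
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

With a default (non-robust) mutex, as in the example program, the same lock call simply blocks, which is exactly what the backtraces show. A robust mutex turns that silent hang into an error code that can be logged.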

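Method 3 does not use gdb but the pstack tool.

Usage: pstack <pid>

Note that pstack does not support 64-bit.

In addition, on my Ubuntu system pstack fails to show the stacks for reasons I could not determine, even though it is installed: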

root@ubuntu:/share# pstack 15463
15463: ./a.out
(No symbols found in )
(No symbols found in /lib/i386-linux-gnu/libc.so.6)
(No symbols found in /lib/ld-linux.so.2)
0xb773abe8: _fini + 0x25f14 (0, 0, 0, 0, 0, 0) + 400d04fc
crawl: Input/output error
Error tracing through process 15463
