First, here is a multithreaded code example that deadlocks, to make the rest of the discussion easier to follow:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t g_smutex;

void *func(void *arg)
{
    int i = 0;

    /* Take the lock. */
    pthread_mutex_lock(&g_smutex);
    for (i = 0; i < 0x7fffffff; i++)
    {
    }
    /* Deliberate bug: the thread returns without unlocking. */
    return NULL;
}

int main()
{
    pthread_t thread_id_01;
    pthread_t thread_id_02;
    pthread_t thread_id_03;
    pthread_t thread_id_04;
    pthread_t thread_id_05;

    pthread_mutex_init(&g_smutex, NULL);
    pthread_create(&thread_id_01, NULL, func, NULL);
    pthread_create(&thread_id_02, NULL, func, NULL);
    pthread_create(&thread_id_03, NULL, func, NULL);
    pthread_create(&thread_id_04, NULL, func, NULL);
    pthread_create(&thread_id_05, NULL, func, NULL);

    /* Keep the process alive so that it can be inspected with gdb. */
    while (1)
    {
        sleep(0xfff);
    }
    return 0;
}
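Explanation: the first thread to run func acquires the lock and returns without ever calling pthread_mutex_unlock, so none of the other threads can ever acquire it.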
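For contrast, here is a minimal sketch of a worker that does release the lock. The names func_fixed, g_fixed_mutex, g_shared_counter and run_fixed_workers are hypothetical and exist only for this illustration; the one point it is meant to show is that every pthread_mutex_lock is paired with a pthread_mutex_unlock before the thread returns.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names for this sketch; only the lock/unlock pairing matters. */
static pthread_mutex_t g_fixed_mutex = PTHREAD_MUTEX_INITIALIZER;
static long g_shared_counter = 0;

/* Same shape as func, but the lock is released before returning. */
static void *func_fixed(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_fixed_mutex);
    for (int i = 0; i < 1000; i++) {
        g_shared_counter++;               /* work done while holding the lock */
    }
    pthread_mutex_unlock(&g_fixed_mutex); /* the call that func leaves out */
    return NULL;
}

/* Starts n workers (0 <= n <= 16), waits for them, returns the counter. */
static long run_fixed_workers(int n)
{
    pthread_t ids[16];
    int started = 0;

    assert(n >= 0 && n <= 16);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&ids[started], NULL, func_fixed, NULL) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(ids[i], NULL);
    return g_shared_counter;
}
```

With the unlock in place, all workers finish and pthread_join returns for each of them, which is exactly what never happens in the deadlocking example.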
Compile: gcc a.c -g -lpthread -o a.out
Method 1:
1. Start gdb on the executable with gdb a.out, then enter the r command to run the program
gdb a.out
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...done.
(gdb) r
Starting program: /share/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb7de6b40 (LWP 15436)]
[New Thread 0xb75e5b40 (LWP 15437)]
[New Thread 0xb6de4b40 (LWP 15438)]
[New Thread 0xb65e3b40 (LWP 15439)]
[New Thread 0xb5de2b40 (LWP 15440)]
[Thread 0xb7de6b40 (LWP 15436) exited]
2. Press Ctrl+C while the program is running
^C
Program received signal SIGINT, Interrupt.
0xb7fdbbe8 in __kernel_vsyscall ()
3. View the stack with info stack. This command only shows the stack of the thread that gdb currently has selected (here, the main thread)
(gdb) info stack
#0 0xb7fdbbe8 in __kernel_vsyscall ()
#1 0xb7e9c3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#2 0xb7e9c1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3 0x08048679 in main () at New0001.c:46
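4. Use info threads to list all threads. The entry marked with * is the thread gdb currently has selected, not necessarily the only one that is running. The unmarked threads are the ones to examine next, since they may well be blocked or deadlocked. Thread 2 is missing from the list because it has already exited (see the "exited" message in the output of step 1).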
(gdb) info threads
Id Target Id Frame
6 Thread 0xb5de2b40 (LWP 15440) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()
5 Thread 0xb65e3b40 (LWP 15439) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()
4 Thread 0xb6de4b40 (LWP 15438) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()
3 Thread 0xb75e5b40 (LWP 15437) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()
* 1 Thread 0xb7de7700 (LWP 15432) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()
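In the listing above every thread carries the same name, "a.out", so the threads can only be told apart by id. As an optional aid, the sketch below names the calling thread so that it is easier to recognise in a thread list. It assumes Linux with glibc, where pthread_setname_np and pthread_getname_np are available as GNU extensions; the helper names and the "worker-<index>" naming scheme are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* pthread_setname_np is a GNU extension */
#endif
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: name the calling thread "worker-<index>".
 * Linux limits thread names to 15 characters plus the terminating NUL,
 * so longer names are truncated by snprintf. Returns 0 on success. */
static int name_current_thread(int index)
{
    char name[16];
    snprintf(name, sizeof name, "worker-%d", index);
    return pthread_setname_np(pthread_self(), name);
}

/* Returns 1 if the calling thread's name equals `expected`, else 0. */
static int current_thread_is_named(const char *expected)
{
    char buf[16];
    if (pthread_getname_np(pthread_self(), buf, sizeof buf) != 0)
        return 0;
    return strcmp(buf, expected) == 0;
}
```

A worker would call name_current_thread(i) as the first statement of its thread function, before it tries to take any lock.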
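5. Run thread apply all bt. (thread apply all makes gdb run the given command on every thread; with bt as the command, it prints the full stack of every thread.)
Note: when the program has many threads, switching to each one by its gdb thread id (the 1, 2, 3, 4, 5, 6 above) is impractical. With 100 threads you would have to enter the command 100 times.
So it is better to use thread apply all bt directly.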
(gdb)thread apply all bt
Thread 6 (Thread 0xb5de2b40 (LWP 15440)):
#0 0xb7fdbbe8 in __kernel_vsyscall ()
#1 0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb7faa1aa in start_thread (arg=0xb5de2b40) at pthread_create.c:333
#5 0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 5 (Thread 0xb65e3b40 (LWP 15439)):
#0 0xb7fdbbe8 in __kernel_vsyscall ()
#1 0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb7faa1aa in start_thread (arg=0xb65e3b40) at pthread_create.c:333
#5 0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 4 (Thread 0xb6de4b40 (LWP 15438)):
#0 0xb7fdbbe8 in __kernel_vsyscall ()
#1 0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb7faa1aa in start_thread (arg=0xb6de4b40) at pthread_create.c:333
#5 0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 3 (Thread 0xb75e5b40 (LWP 15437)):
#0 0xb7fdbbe8 in __kernel_vsyscall ()
#1 0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb7faa1aa in start_thread (arg=0xb75e5b40) at pthread_create.c:333
#5 0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 1 (Thread 0xb7de7700 (LWP 15432)):
#0 0xb7fdbbe8 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
#1 0xb7e9c3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#2 0xb7e9c1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3 0x08048679 in main () at New0001.c:46
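6. The threads whose stacks contain __lll_lock_wait are the ones blocked waiting for a lock
Repeat the steps above a few times. Threads that show up in __lll_lock_wait every time are very likely deadlocked.
Take thread 3, for example: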
Thread 3 (Thread 0xb75e5b40 (LWP 15437)):
#0 0xb7fdbbe8 in __kernel_vsyscall ()
#1 0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb7faa1aa in start_thread (arg=0xb75e5b40) at pthread_create.c:333
#5 0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
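This frame is where the thread is blocked. Start from this line of the source and look for a code path that takes the lock without releasing it.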
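Frame #3 points at a pthread_mutex_lock call that never returns. As a complementary run-time aid, and a different technique from the gdb workflow described here, a program can use pthread_mutex_timedlock so that a lock which is never released shows up as an error code instead of a silent hang. The sketch below is only an illustration under that assumption; lock_with_timeout, leaky_worker, try_lock_after_leak and g_demo_mutex are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t g_demo_mutex = PTHREAD_MUTEX_INITIALIZER; /* hypothetical */

/* Try to take m, giving up after `seconds`.
 * Returns 0 on success or ETIMEDOUT if the lock stayed held. */
static int lock_with_timeout(pthread_mutex_t *m, int seconds)
{
    struct timespec deadline;

    assert(m != NULL && seconds >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline); /* timedlock takes an absolute time */
    deadline.tv_sec += seconds;
    return pthread_mutex_timedlock(m, &deadline);
}

/* Same mistake as func: lock, then return without unlocking. */
static void *leaky_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_demo_mutex);
    return NULL;
}

/* Lets one leaky worker run to completion, then tries the lock with a
 * one-second timeout and returns the result. Call it only once: a second
 * leaky worker would block forever on the leaked lock. */
static int try_lock_after_leak(void)
{
    pthread_t id;

    if (pthread_create(&id, NULL, leaky_worker, NULL) != 0)
        return -1;
    pthread_join(id, NULL);
    return lock_with_timeout(&g_demo_mutex, 1);
}
```

When the call returns ETIMEDOUT, the program can log the location and carry on or abort, which narrows down where to attach gdb.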
Method 2:
Start the program first, then open another session and find the program's process ID with ps -axu | grep <executable>:
ps -axu | grep a.out
root 15463 0.4 0.1 43320 732 pts/4 Sl+ 19:29 0:03 ./a.out
root 15476 0.0 0.3 4540 1864 pts/6 S+ 19:44 0:00 grep --color=auto a.out
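The output shows that the process ID is 15463.
1. Attach with gdb attach <pid> (gdb -p <pid> is the documented form),
or start gdb and then run attach <pid>,
or run gdb <executable> <pid>, which also attaches automatically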
root@ubuntu:/share# gdb
(gdb)
Attach to the process ID:
(gdb) attach 15463
Attaching to process 15463
Reading symbols from /share/a.out...done.
Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libpthread-2.21.so...done.
done.
[New LWP 15467]
[New LWP 15466]
[New LWP 15465]
[New LWP 15464]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libc-2.21.so...done.
done.
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/ld-2.21.so...done.
done.
0xb773abe8 in __kernel_vsyscall ()
2. View the thread information
(gdb) info threads
Id Target Id Frame
5 Thread 0xb7545b40 (LWP 15464) "a.out" 0xb773abe8 in __kernel_vsyscall ()
4 Thread 0xb6d44b40 (LWP 15465) "a.out" 0xb773abe8 in __kernel_vsyscall ()
3 Thread 0xb6543b40 (LWP 15466) "a.out" 0xb773abe8 in __kernel_vsyscall ()
2 Thread 0xb5d42b40 (LWP 15467) "a.out" 0xb773abe8 in __kernel_vsyscall ()
* 1 Thread 0xb7546700 (LWP 15463) "a.out" 0xb773abe8 in __kernel_vsyscall ()
3. Run bt on all threads
(gdb) thread apply all bt
Thread 5 (Thread 0xb7545b40 (LWP 15464)):
#0 0xb773abe8 in __kernel_vsyscall ()
#1 0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb77091aa in start_thread (arg=0xb7545b40) at pthread_create.c:333
#5 0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 4 (Thread 0xb6d44b40 (LWP 15465)):
#0 0xb773abe8 in __kernel_vsyscall ()
#1 0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb77091aa in start_thread (arg=0xb6d44b40) at pthread_create.c:333
#5 0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 3 (Thread 0xb6543b40 (LWP 15466)):
#0 0xb773abe8 in __kernel_vsyscall ()
#1 0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb77091aa in start_thread (arg=0xb6543b40) at pthread_create.c:333
#5 0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 2 (Thread 0xb5d42b40 (LWP 15467)):
#0 0xb773abe8 in __kernel_vsyscall ()
#1 0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb77091aa in start_thread (arg=0xb5d42b40) at pthread_create.c:333
#5 0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Thread 1 (Thread 0xb7546700 (LWP 15463)):
#0 0xb773abe8 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
#1 0xb75fb3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#2 0xb75fb1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3 0x08048679 in main () at New0001.c:46
4. Pick a thread that is in __lll_lock_wait and examine it, for example the thread with gdb id 4
(gdb) thread 4
[Switching to thread 4 (Thread 0xb6d44b40 (LWP 15465))]
#0 0xb773abe8 in __kernel_vsyscall ()
(gdb) bt
#0 0xb773abe8 in __kernel_vsyscall ()
#1 0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
#4 0xb77091aa in start_thread (arg=0xb6d44b40) at pthread_create.c:333
#5 0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
Select frame 3 of the stack
(gdb) frame 3
#3 0x080485b5 in func (arg=0x0) at New0001.c:16
16 pthread_mutex_lock( &g_smutex);
The thread is blocked in the call that takes the lock. Print the mutex to see who owns it:
(gdb) p g_smutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 15468, __kind = 0, __nusers = 1, {__elision_data = {__espins = 0,
__elision = 0}, __list = {__next = 0x0}}},
__size = "\002\000\000\000\000\000\000\000l<\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}
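The lock's owner is the thread with LWP id 15468, but no thread with that id exists any more. The owner therefore exited without unlocking the mutex.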
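The situation diagnosed here, an owner that terminates while still holding the mutex, is what POSIX robust mutexes are meant to report. The sketch below is an illustration of that separate technique, not something the example program does: with PTHREAD_MUTEX_ROBUST set, the next pthread_mutex_lock returns EOWNERDEAD instead of blocking forever. The names g_robust_mutex, leaky_owner and lock_after_owner_exit are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t g_robust_mutex;   /* hypothetical name */

/* Mirrors the bug in func: lock, then exit the thread without unlocking. */
static void *leaky_owner(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_robust_mutex);
    return NULL;
}

/* Returns the result of locking a robust mutex whose owner has exited. */
static int lock_after_owner_exit(void)
{
    pthread_mutexattr_t attr;
    pthread_t owner;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&g_robust_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_create(&owner, NULL, leaky_owner, NULL);
    assert(rc == 0);
    pthread_join(owner, NULL);           /* owner is gone, lock still held */

    rc = pthread_mutex_lock(&g_robust_mutex);
    if (rc == EOWNERDEAD) {
        /* The caller now holds the lock: mark it usable again, then release. */
        pthread_mutex_consistent(&g_robust_mutex);
        pthread_mutex_unlock(&g_robust_mutex);
    } else if (rc == 0) {
        pthread_mutex_unlock(&g_robust_mutex);
    }
    pthread_mutex_destroy(&g_robust_mutex);
    return rc;
}
```

A robust mutex does not fix the missing unlock. It only turns the hang into an error code that the next locker has to handle, so finding and fixing the leaking code path is still the real remedy.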
For reference, here are the remaining threads seen in step 2:
(gdb) info threads
Id Target Id Frame
5 Thread 0xb7545b40 (LWP 15464) "a.out" 0xb773abe8 in __kernel_vsyscall ()
4 Thread 0xb6d44b40 (LWP 15465) "a.out" 0xb773abe8 in __kernel_vsyscall ()
3 Thread 0xb6543b40 (LWP 15466) "a.out" 0xb773abe8 in __kernel_vsyscall ()
2 Thread 0xb5d42b40 (LWP 15467) "a.out" 0xb773abe8 in __kernel_vsyscall ()
* 1 Thread 0xb7546700 (LWP 15463) "a.out" 0xb773abe8 in __kernel_vsyscall ()
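The __owner field printed above holds a kernel thread id, the same kind of number that gdb shows as LWP, and 15468 is not among the LWP ids in this list. To correlate the two in your own program, a thread can log its kernel thread id when it starts or when it takes a lock. The sketch below assumes Linux and uses the raw gettid system call; current_lwp, report_lwp and lwp_of_new_thread are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() */
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id (LWP) of the calling thread, via the gettid syscall. */
static pid_t current_lwp(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread entry: stores the worker's LWP id through the pointer in arg. */
static void *report_lwp(void *arg)
{
    *(pid_t *)arg = current_lwp();
    return NULL;
}

/* Starts one worker and returns its LWP id, or -1 if it could not start. */
static pid_t lwp_of_new_thread(void)
{
    pthread_t id;
    pid_t lwp = -1;

    if (pthread_create(&id, NULL, report_lwp, &lwp) != 0)
        return -1;
    pthread_join(id, NULL);
    return lwp;
}
```

If each worker logs current_lwp() together with what it is about to do, the __owner value from p g_smutex can be looked up in the log even after the owning thread has exited.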
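Method 3 does not use gdb; it uses the pstack tool.
Usage: pstack <pid>
Note that the pstack I installed does not support 64-bit.
On my Ubuntu system pstack also failed to print the stacks for some reason, even though it was installed: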
root@ubuntu:/share# pstack 15463
15463: ./a.out
(No symbols found in )
(No symbols found in /lib/i386-linux-gnu/libc.so.6)
(No symbols found in /lib/ld-linux.so.2)
0xb773abe8: _fini + 0x25f14 (0, 0, 0, 0, 0, 0) + 400d04fc
crawl: Input/output error
Error tracing through process 15463