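The program hung, and I suspected a deadlock.
Run
gdb attach <pid>
(gdb -p <pid> is the equivalent documented form) to attach to the process, then enter this command in gdb:
info thread
It gives the following output: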
(gdb) info thread
Id Target Id Frame
4 Thread 0x7f466b8e1700 (LWP 10945) "jtnvragentserve" 0x00007f466cbd5b13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
3 Thread 0x7f466c0e2700 (LWP 11009) "jtnvragentserve" 0x00007f466cbccda3 in select () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x7f4653fff700 (LWP 9171) "jtnvragentserve" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
* 1 Thread 0x7f466e052780 (LWP 10942) "jtnvragentserve" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
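Threads 9171 and 10942 are both stuck in __lll_lock_wait, that is, blocked waiting for a mutex, which fits the deadlock suspicion.
Next, dump the stack of every thread with
thread apply all bt
which shows: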
(gdb) thread apply all bt
Thread 4 (Thread 0x7f466b8e1700 (LWP 10945)):
#0 0x00007f466cbd5b13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f466c38ed88 in ?? ()
#2 0x00007f465c000f80 in ?? ()
#3 0xffffffff64001c50 in ?? ()
#4 0x000000007fffffff in ?? ()
#5 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f466c0e2700 (LWP 11009)):
#0 0x00007f466cbccda3 in select () at ../sysdeps/unix/syscall-template.S:81
#1 0x0000000000455373 in _eXosip_read_message (excontext=0x144f770, max_message_nb=1, sec_max=1, usec_max=0) at udp.c:1580
#2 0x00000000004415a4 in eXosip_execute (excontext=0x144f770) at eXconf.c:791
#3 0x000000000044254e in _eXosip_thread (arg=0x144f770) at eXconf.c:1090
#4 0x00007f466d9b8182 in start_thread (arg=0x7f466c0e2700) at pthread_create.c:312
#5 0x00007f466cbd547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7f4653fff700 (LWP 9171)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f466d9ba657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f466d9ba480 in __GI___pthread_mutex_lock (mutex=0x1e34140) at ../nptl/pthread_mutex_lock.c:79
#3 0x00000000004d95be in jthread::JMutex::Lock() ()
#4 0x00000000004e9013 in jrtplib::RTPUDPv4Transmitter::WaitForIncomingData(jrtplib::RTPTime const&, bool*) ()
#5 0x000000000050789c in jrtplib::RTPPollThread::Thread() ()
#6 0x00000000004d9b67 in jthread::JThread::TheThread(void*) ()
#7 0x00007f466d9b8182 in start_thread (arg=0x7f4653fff700) at pthread_create.c:312
#8 0x00007f466cbd547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7f466e052780 (LWP 10942)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f466d9ba657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f466d9ba480 in __GI___pthread_mutex_lock (mutex=0x1e34140) at ../nptl/pthread_mutex_lock.c:79
#3 0x00000000004d95be in jthread::JMutex::Lock() ()
#4 0x00000000004e9263 in jrtplib::RTPUDPv4Transmitter::AbortWait() ()
#5 0x0000000000507645 in jrtplib::RTPPollThread::Stop() ()
#6 0x000000000050748b in jrtplib::RTPPollThread::~RTPPollThread() ()
#7 0x0000000000507514 in jrtplib::RTPPollThread::~RTPPollThread() ()
#8 0x00000000004e051b in void jrtplib::RTPDelete<jrtplib::RTPPollThread>(jrtplib::RTPPollThread*, jrtplib::RTPMemoryManager*) ()
#9 0x00000000004db950 in jrtplib::RTPSession::BYEDestroy(jrtplib::RTPTime const&, void const*, unsigned long) ()
#10 0x00000000004213f1 in JtGb28181NvrAgent::HandleSdpReq_CloseVideo (this=0x1442060, CallID=...) at ../JtGb28181NvrAgent.cpp:4600
#11 0x0000000000416497 in JtGb28181NvrAgent::Thread (this=0x1442060) at ../JtGb28181NvrAgent.cpp:2669
#12 0x0000000000424675 in JtGb28181NvrAgent::StartWork (this=0x1442060, Config=0x0, RunMode=0) at ../JtGb28181NvrAgent.cpp:5333
#13 0x000000000040d22c in main (argc=1, argv=0x7fff33f8b738) at ../main.cpp:174
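The backtraces show that threads 9171 and 10942 are both stuck inside jrtplib, in RTPUDPv4Transmitter::WaitForIncomingData and RTPUDPv4Transmitter::AbortWait respectively, and both are waiting on the same mutex (mutex=0x1e34140 in frame #2). The relevant functions are: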
int RTPUDPv4Transmitter::AbortWait()
{
    if (!init)
        return ERR_RTP_UDPV4TRANS_NOTINIT;

    MAINMUTEX_LOCK
    if (!created)
    {
        MAINMUTEX_UNLOCK
        return ERR_RTP_UDPV4TRANS_NOTCREATED;
    }
    if (!waitingfordata)
    {
        MAINMUTEX_UNLOCK
        return ERR_RTP_UDPV4TRANS_NOTWAITING;
    }

    AbortWaitInternal();

    MAINMUTEX_UNLOCK
    return 0;
}

void RTPUDPv4Transmitter::AbortWaitInternal()
{
#if (defined(WIN32) || defined(_WIN32_WCE))
    send(abortdesc[1],"*",1,0);
#else
    if (write(abortdesc[1],"*",1))
    {
        // To get rid of __wur related compiler warnings
    }
#endif // WIN32
}
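MAINMUTEX_LOCK acquires the lock and MAINMUTEX_UNLOCK releases it, so why was the lock never released? It turned out that a thread was being terminated with pthread_cancel. pthread_cancel can make the target thread exit at a cancellation point (write is one). Here the thread was terminated at the write call inside AbortWaitInternal, so MAINMUTEX_UNLOCK was never reached, the mutex stayed locked, and every later attempt to take it blocked forever.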
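The effect can be reproduced outside jrtplib. The following is a minimal sketch, not the library's code: g_mutex, g_pipe, worker and trylock_after_cancel are hypothetical names, and a blocking read() on an empty pipe stands in for the write() cancellation point. It assumes Linux with glibc and the default deferred cancellation type.

```cpp
#include <cerrno>
#include <pthread.h>
#include <unistd.h>

// Hypothetical stand-ins for the transmitter's main mutex and abort pipe.
static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static int g_pipe[2];

// Takes the lock, then blocks in read(), which is a cancellation point.
static void* worker(void*) {
    pthread_mutex_lock(&g_mutex);
    char c;
    ssize_t n = read(g_pipe[0], &c, 1);  // blocks: nothing is ever written
    (void)n;
    pthread_mutex_unlock(&g_mutex);      // skipped when the thread is cancelled in read()
    return nullptr;
}

// Cancels the worker while it holds the lock and returns what
// pthread_mutex_trylock() reports afterwards (-1 on setup failure).
int trylock_after_cancel() {
    if (pipe(g_pipe) != 0) return -1;
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return -1;

    // Wait until the worker owns the mutex.
    while (pthread_mutex_trylock(&g_mutex) == 0) {
        pthread_mutex_unlock(&g_mutex);
        usleep(1000);
    }

    pthread_cancel(t);         // request is acted on at the read() call
    pthread_join(t, nullptr);  // the worker is gone now

    int rc = pthread_mutex_trylock(&g_mutex);
    if (rc == 0) pthread_mutex_unlock(&g_mutex);
    close(g_pipe[0]);
    close(g_pipe[1]);
    return rc;
}
```

Calling trylock_after_cancel() once should return EBUSY: the thread that owned the mutex no longer exists, yet the mutex is still locked, which is the state threads 9171 and 10942 were waiting on.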
I changed the code to address this, and the problem went away.
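The post does not say what the change was. One possible approach, if the thread must remain cancellable, is to register a cleanup handler with pthread_cleanup_push so the mutex is released even when the thread is cancelled in the middle of the critical section. The sketch below assumes Linux with glibc; g_lock, g_fds, unlock_mutex, guarded_worker and trylock_after_cancel_with_cleanup are hypothetical names, not jrtplib code.

```cpp
#include <cerrno>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_fds[2];

// Cleanup handler: runs if the thread is cancelled while the handler is installed.
static void unlock_mutex(void* m) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
}

static void* guarded_worker(void*) {
    pthread_mutex_lock(&g_lock);
    pthread_cleanup_push(unlock_mutex, &g_lock);
    char c;
    ssize_t n = read(g_fds[0], &c, 1);  // cancellation point, blocks forever
    (void)n;
    pthread_cleanup_pop(1);             // normal path: pop the handler and run it (unlock)
    return nullptr;
}

// Same experiment as cancelling a thread that holds the lock, but with the
// cleanup handler in place. Returns the pthread_mutex_trylock() result.
int trylock_after_cancel_with_cleanup() {
    if (pipe(g_fds) != 0) return -1;
    pthread_t t;
    if (pthread_create(&t, nullptr, guarded_worker, nullptr) != 0) return -1;

    // Wait until the worker owns the mutex.
    while (pthread_mutex_trylock(&g_lock) == 0) {
        pthread_mutex_unlock(&g_lock);
        usleep(1000);
    }

    pthread_cancel(t);
    pthread_join(t, nullptr);

    int rc = pthread_mutex_trylock(&g_lock);
    if (rc == 0) pthread_mutex_unlock(&g_lock);
    close(g_fds[0]);
    close(g_fds[1]);
    return rc;
}
```

Here trylock_after_cancel_with_cleanup() should return 0, because the handler released the mutex during cancellation. Another option is to wrap the critical section in pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, ...) so the thread cannot be cancelled while it holds the lock. Either way, every lock taken by third-party code on the cancelled thread needs the same care.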
The lesson from this bug: unless you know every detail of your code and of everything it calls, do not casually use pthread_cancel to terminate a thread!
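A common alternative to pthread_cancel is cooperative shutdown: the worker checks a stop flag at points where it holds no locks, and the caller sets the flag and joins. The sketch below only illustrates the idea with standard C++ threads. PollWorker and run_and_stop are hypothetical names, and it is not what jrtplib or the original program does.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical worker that is stopped by a flag instead of pthread_cancel.
struct PollWorker {
    std::atomic<bool> stop{false};
    std::atomic<int> iterations{0};
    std::mutex mtx;
    std::thread th;

    void start() { th = std::thread([this] { run(); }); }

    void run() {
        while (!stop.load()) {
            {
                std::lock_guard<std::mutex> guard(mtx);  // released at end of scope
                ++iterations;                            // one short unit of work
            }
            // The flag is only checked here, where no lock is held.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    void shutdown() {
        stop.store(true);
        th.join();  // the thread leaves its loop on its own
    }
};

// Runs the worker for at least one iteration, stops it, then takes the mutex
// to show it was left unlocked. Returns the number of iterations performed.
int run_and_stop() {
    PollWorker w;
    w.start();
    while (w.iterations.load() == 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    w.shutdown();
    std::lock_guard<std::mutex> guard(w.mtx);  // would block forever if the lock had leaked
    return w.iterations.load();
}
```

The trade-off is latency: the worker only notices the request the next time it checks the flag, so blocking calls need a timeout or a wake-up mechanism such as the abort pipe that AbortWaitInternal writes to.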