Linux: tracking down a deadlock with gdb, and advice on using pthread_cancel

The program hung, and a deadlock was the suspected cause.

Attach gdb to the running process with

gdb attach pid

and then, at the gdb prompt, run

info thread

which prints the following:

(gdb) info thread
  Id   Target Id         Frame 
  4    Thread 0x7f466b8e1700 (LWP 10945) "jtnvragentserve" 0x00007f466cbd5b13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7f466c0e2700 (LWP 11009) "jtnvragentserve" 0x00007f466cbccda3 in select () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7f4653fff700 (LWP 9171) "jtnvragentserve" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
* 1    Thread 0x7f466e052780 (LWP 10942) "jtnvragentserve" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

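Threads 2 (LWP 9171) and 1 (LWP 10942) are both sitting in __lll_lock_wait, that is, blocked waiting to acquire a mutex, which fits the deadlock suspicion.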

Next, dump the stack of every thread with

thread apply all bt

which shows:

(gdb) thread apply all bt

Thread 4 (Thread 0x7f466b8e1700 (LWP 10945)):
#0  0x00007f466cbd5b13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f466c38ed88 in ?? ()
#2  0x00007f465c000f80 in ?? ()
#3  0xffffffff64001c50 in ?? ()
#4  0x000000007fffffff in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f466c0e2700 (LWP 11009)):
#0  0x00007f466cbccda3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000455373 in _eXosip_read_message (excontext=0x144f770, max_message_nb=1, sec_max=1, usec_max=0) at udp.c:1580
#2  0x00000000004415a4 in eXosip_execute (excontext=0x144f770) at eXconf.c:791
#3  0x000000000044254e in _eXosip_thread (arg=0x144f770) at eXconf.c:1090
#4  0x00007f466d9b8182 in start_thread (arg=0x7f466c0e2700) at pthread_create.c:312
#5  0x00007f466cbd547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f4653fff700 (LWP 9171)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f466d9ba657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f466d9ba480 in __GI___pthread_mutex_lock (mutex=0x1e34140) at ../nptl/pthread_mutex_lock.c:79
#3  0x00000000004d95be in jthread::JMutex::Lock() ()
#4  0x00000000004e9013 in jrtplib::RTPUDPv4Transmitter::WaitForIncomingData(jrtplib::RTPTime const&, bool*) ()
#5  0x000000000050789c in jrtplib::RTPPollThread::Thread() ()
#6  0x00000000004d9b67 in jthread::JThread::TheThread(void*) ()
#7  0x00007f466d9b8182 in start_thread (arg=0x7f4653fff700) at pthread_create.c:312
#8  0x00007f466cbd547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f466e052780 (LWP 10942)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f466d9ba657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f466d9ba480 in __GI___pthread_mutex_lock (mutex=0x1e34140) at ../nptl/pthread_mutex_lock.c:79
#3  0x00000000004d95be in jthread::JMutex::Lock() ()
#4  0x00000000004e9263 in jrtplib::RTPUDPv4Transmitter::AbortWait() ()
#5  0x0000000000507645 in jrtplib::RTPPollThread::Stop() ()
#6  0x000000000050748b in jrtplib::RTPPollThread::~RTPPollThread() ()
#7  0x0000000000507514 in jrtplib::RTPPollThread::~RTPPollThread() ()
#8  0x00000000004e051b in void jrtplib::RTPDelete<jrtplib::RTPPollThread>(jrtplib::RTPPollThread*, jrtplib::RTPMemoryManager*) ()
#9  0x00000000004db950 in jrtplib::RTPSession::BYEDestroy(jrtplib::RTPTime const&, void const*, unsigned long) ()
#10 0x00000000004213f1 in JtGb28181NvrAgent::HandleSdpReq_CloseVideo (this=0x1442060, CallID=...) at ../JtGb28181NvrAgent.cpp:4600
#11 0x0000000000416497 in JtGb28181NvrAgent::Thread (this=0x1442060) at ../JtGb28181NvrAgent.cpp:2669
#12 0x0000000000424675 in JtGb28181NvrAgent::StartWork (this=0x1442060, Config=0x0, RunMode=0) at ../JtGb28181NvrAgent.cpp:5333
#13 0x000000000040d22c in main (argc=1, argv=0x7fff33f8b738) at ../main.cpp:174

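The backtraces show where LWP 9171 and LWP 10942 are stuck: both are inside jrtplib, waiting on the same mutex (mutex=0x1e34140). Thread 2 is blocked in RTPUDPv4Transmitter::WaitForIncomingData and thread 1 in RTPUDPv4Transmitter::AbortWait. The relevant jrtplib functions are: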

int RTPUDPv4Transmitter::AbortWait()
{
	if (!init)
		return ERR_RTP_UDPV4TRANS_NOTINIT;
	
	MAINMUTEX_LOCK
	if (!created)
	{
		MAINMUTEX_UNLOCK
		return ERR_RTP_UDPV4TRANS_NOTCREATED;
	}
	if (!waitingfordata)
	{
		MAINMUTEX_UNLOCK
		return ERR_RTP_UDPV4TRANS_NOTWAITING;
	}

	AbortWaitInternal();
	
	MAINMUTEX_UNLOCK
	return 0;
}
void RTPUDPv4Transmitter::AbortWaitInternal()
{
#if (defined(WIN32) || defined(_WIN32_WCE))
	send(abortdesc[1],"*",1,0);
#else
	if (write(abortdesc[1],"*",1))
	{
		// To get rid of __wur related compiler warnings
	}
#endif // WIN32
}
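
MAINMUTEX_LOCK acquires the lock and MAINMUTEX_UNLOCK releases it, so how could the lock be left held? The cause turned out to be a pthread_cancel call that was used to stop a thread. With the default deferred cancellation, the target thread exits at its next cancellation point, and write is one. Here the thread was cancelled at the write call in AbortWaitInternal, so the MAINMUTEX_UNLOCK that follows was never executed. The mutex stayed locked, and every later attempt to lock it blocked forever.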
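
The failure mode can be reproduced outside jrtplib. The following is a minimal sketch, not the original code: demo_mutex, demo_pipe, leaky_worker and trylock_after_cancel are hypothetical names, and read on an empty pipe stands in for the cancellation point. It assumes Linux with glibc and the default deferred cancellation.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <unistd.h>

// Hypothetical stand-ins for the transmitter's main mutex and abort pipe.
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_pipe[2];

// Locks the mutex, then blocks in read(), which is a cancellation point.
static void* leaky_worker(void*) {
    pthread_mutex_lock(&demo_mutex);
    char c;
    ssize_t n = read(demo_pipe[0], &c, 1);  // the thread is cancelled here
    (void)n;
    pthread_mutex_unlock(&demo_mutex);      // never reached after cancellation
    return nullptr;
}

// Cancels the worker and reports what pthread_mutex_trylock() returns afterwards.
int trylock_after_cancel() {
    int rc = pipe(demo_pipe);
    assert(rc == 0);
    pthread_t tid;
    rc = pthread_create(&tid, nullptr, leaky_worker, nullptr);
    assert(rc == 0);
    (void)rc;
    usleep(100 * 1000);          // give the worker time to block in read()
    pthread_cancel(tid);
    pthread_join(tid, nullptr);  // the worker thread is gone ...
    return pthread_mutex_trylock(&demo_mutex);  // ... but its lock is still held
}
```

On such a system trylock_after_cancel() should return EBUSY: the cancelled thread no longer exists, yet the mutex it held is still locked, so a plain pthread_mutex_lock at that point would block forever, as in the backtraces above.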

The code was changed to address this, and the problem went away.
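
The post does not show the actual change. One possible approach, if a thread must remain cancellable while it holds a mutex, is to register a cleanup handler with pthread_cleanup_push so that the mutex is released when the thread is cancelled. The sketch below is an assumption about how that could look, not the author's fix; guarded_mutex, guarded_pipe, unlock_on_exit, guarded_worker and trylock_with_cleanup are hypothetical names.

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Hypothetical mutex and pipe, as in a minimal reproduction of the bug.
static pthread_mutex_t guarded_mutex = PTHREAD_MUTEX_INITIALIZER;
static int guarded_pipe[2];

// Cleanup handler: runs if the thread is cancelled between push and pop.
static void unlock_on_exit(void* m) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
}

static void* guarded_worker(void*) {
    pthread_mutex_lock(&guarded_mutex);
    pthread_cleanup_push(unlock_on_exit, &guarded_mutex);
    char c;
    ssize_t n = read(guarded_pipe[0], &c, 1);  // cancellation point
    (void)n;
    pthread_cleanup_pop(1);  // 1: also run the handler on the normal path
    return nullptr;
}

// Cancels the worker and reports what pthread_mutex_trylock() returns afterwards.
int trylock_with_cleanup() {
    int rc = pipe(guarded_pipe);
    assert(rc == 0);
    pthread_t tid;
    rc = pthread_create(&tid, nullptr, guarded_worker, nullptr);
    assert(rc == 0);
    usleep(100 * 1000);          // give the worker time to block in read()
    pthread_cancel(tid);
    pthread_join(tid, nullptr);
    rc = pthread_mutex_trylock(&guarded_mutex);
    if (rc == 0)
        pthread_mutex_unlock(&guarded_mutex);  // leave the mutex unlocked
    return rc;
}
```

Here trylock_with_cleanup() should return 0, because the handler released the mutex during cancellation. Note that pthread_cleanup_push and pthread_cleanup_pop are macros that must appear as a pair in the same lexical scope, and that this only protects code you can edit; a lock taken inside a third-party library is still exposed.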


The takeaway from this bug: unless you know every detail of the code and of everything it calls, do not casually use pthread_cancel to terminate a thread!
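
A common alternative to pthread_cancel is cooperative shutdown: the worker checks a stop flag at points where it holds no locks, and the stopping thread wakes it up and joins it. The class below is a sketch under that assumption; StoppableWorker and its members are hypothetical and unrelated to jrtplib's API, although the wake-up pipe plays the same role as the abort descriptor in the code above.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <poll.h>
#include <unistd.h>

// Hypothetical worker that is stopped cooperatively instead of being cancelled.
class StoppableWorker {
public:
    StoppableWorker() {
        int rc = pipe(wake_);
        assert(rc == 0);
        (void)rc;
        thread_ = std::thread([this] { Run(); });
    }
    ~StoppableWorker() {
        Stop();
        close(wake_[0]);
        close(wake_[1]);
    }
    // Asks the worker to exit, wakes it from poll(), and waits for it.
    void Stop() {
        if (!thread_.joinable()) return;
        stop_.store(true);
        ssize_t n = write(wake_[1], "*", 1);
        (void)n;
        thread_.join();
    }
    bool Running() const { return thread_.joinable(); }
    // Takes the mutex; this would hang if the worker had died holding it.
    int Iterations() {
        std::lock_guard<std::mutex> lock(mutex_);
        return iterations_;
    }

private:
    void Run() {
        while (!stop_.load()) {
            pollfd pfd{wake_[0], POLLIN, 0};
            poll(&pfd, 1, 1000);  // blocking wait, done without holding the lock
            std::lock_guard<std::mutex> lock(mutex_);
            ++iterations_;        // stands in for work on shared state
        }
    }

    int wake_[2];
    std::atomic<bool> stop_{false};
    std::mutex mutex_;
    int iterations_ = 0;          // protected by mutex_
    std::thread thread_;
};
```

Because the worker only ever leaves its loop at the top of an iteration, it can never exit in the middle of a critical section, and the lock_guard releases the mutex on every path out of the loop body.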

