Notes on Locating a Deadlock with gdb

1. Problem

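While testing a program, I suddenly noticed that it had stopped responding. I inspected the process with pstack: the call stacks barely changed between samples, and several threads were stuck in pthread_mutex_lock, so I suspected a deadlock.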

2. Diagnosis

First, attach gdb to the running process with gdb attach <pid> and list its threads.

gdb attach 25659 

(gdb) i threads

 Id   Target Id         Frame 
  18   Thread 0x7fd55f070700 (LWP 25538) "spc" 0x00007fd55fecd6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17   Thread 0x7fd55e86f700 (LWP 25539) "spc" 0x00007fd55fecda82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16   Thread 0x7fd55e06e700 (LWP 25540) "spc" 0x00007fd55fecda82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15   Thread 0x7fd55caca700 (LWP 25547) "spc" 0x00007fd55fecda82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14   Thread 0x7fd556b0f700 (LWP 25548) "spc" 0x00007fd55fecd6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13   Thread 0x7fd55630e700 (LWP 25549) "spc" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  12   Thread 0x7fd555b0d700 (LWP 25550) "spc" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  11   Thread 0x7fd55530c700 (LWP 25551) "spc" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  10   Thread 0x7fd554b0b700 (LWP 25556) "msgSchedule" 0x00007fd55fecda82 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  9    Thread 0x7fd54ffff700 (LWP 25557) "spc" 0x00007fd55fbf8d13 in epoll_wait () from /lib64/libc.so.6
  8    Thread 0x7fd54f7fe700 (LWP 25558) "spc" 0x00007fd55fbf8977 in epoll_pwait () from /lib64/libc.so.6
  7    Thread 0x7fd54effd700 (LWP 25572) "msgSchedule" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  6    Thread 0x7fd54e7fc700 (LWP 25573) "msgSchedule" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  5    Thread 0x7fd54dffb700 (LWP 25574) "msgSchedule" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  4    Thread 0x7fd54d7fa700 (LWP 25575) "msgSchedule" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  3    Thread 0x7fd54cff9700 (LWP 25576) "msgSchedule" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
  2    Thread 0x7fd51ffff700 (LWP 25577) "msgSchedule" 0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x7fd562f04740 (LWP 25537) "spc" 0x00007fd55fed1101 in sigwait () from /lib64/libpthread.so

Threads 2-7 and 11-13 are all blocked in __lll_lock_wait (), that is, waiting for a lock.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd51ffff700 (LWP 25577))]
#0  0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fd55fecbd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fd55fecbc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000481741 in __gthread_mutex_lock (__mutex=0x76a740 <sgw::TcpServerHandler::m_ssHandlerMutex>)
    at /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/gthr-default.h:748
#4  0x0000000000488c34 in std::mutex::lock (this=0x76a740 <sgw::TcpServerHandler::m_ssHandlerMutex>) at /usr/include/c++/4.8.2/mutex:134
#5  0x000000000048978c in std::lock_guard<std::mutex>::lock_guard (this=0x7fd51fffd590, __m=...) at /usr/include/c++/4.8.2/mutex:414
#6  0x00000000004825c9 in sgw::TcpServerHandler::setMessage (ss=..., buffer=0x7fd51fffdb10, length=216) at RNMServerAdapter.cpp:121
#7  0x000000000049abe4 in sgw::CMIController::messageReply (this=0x7fd538002658, type=sgw::RNMServerAdapter::REP_QUERY_VOIP_INFO, 
    replyInfo="{\"errCode\":0, \"voipInfo\":{\"priImsi\":\"460000960345390\",\"priMsisdn\":\"8613530950520\",\"voipMsisdn\":\"852580502155159\",\"activeFlag\":\"0\",\"vlrid\":\"\"}}") at CMIController.cpp:127
#8  0x00000000004ad684 in sgw::CMIController::queryVoipInfo (this=0x7fd538002658) at CMIController.cpp:930
#9  0x00000000004b2e2f in sgw::CMIController::buinessProcess (this=0x7fd538002658) at CMIController.cpp:1216
#10 0x00000000004b1992 in sgw::CMIController::run (this=0x7fd538002658) at CMIController.cpp:1126
#11 0x00007fd5623fc2ab in Poco::PooledThread::run (this=0x7fd538003680) at src/ThreadPool.cpp:199
#12 0x00007fd5623f97fb in Poco::(anonymous namespace)::RunnableHolder::run (this=0x7fd538003410) at src/Thread.cpp:56
#13 0x00007fd5623f94cb in Poco::ThreadImpl::runnableEntry (pThread=0x7fd5380036a8) at src/Thread_POSIX.cpp:345
#14 0x00007fd55fec9dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fd55fbf873d in clone () from /lib64/libc.so.6
(gdb) f 4
#4  0x0000000000488c34 in std::mutex::lock (this=0x76a740 <sgw::TcpServerHandler::m_ssHandlerMutex>) at /usr/include/c++/4.8.2/mutex:134
134           int __e = __gthread_mutex_lock(&_M_mutex);
(gdb) p _M_mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 25551, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
      __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\317c\000\000\001", '\000' <repeats 26 times>, __align = 2}

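Any of the waiting threads will do as a starting point; here I picked thread 2, printed its backtrace, and inspected the state of the mutex it is waiting on. __owner = 25551 shows that the lock is currently held by the thread whose LWP is 25551. In the thread list above, LWP 25551 is thread 11, so thread 11 is the next one to examine.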
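The value in __owner is the kernel thread ID of the locking thread, which is the same number gdb prints as LWP. As a minimal sketch, assuming Linux, a program can read this ID for the calling thread so that it can be logged at thread start and later matched against gdb output. The helper names `currentLwp` and `lwpOfNewThread` are made up for illustration and are not part of the original program.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>

// Kernel thread ID of the calling thread. On Linux this is the LWP number
// that gdb shows in "info threads". (Hypothetical helper name.)
long currentLwp() {
    return static_cast<long>(::syscall(SYS_gettid));
}

// Starts a short-lived thread and returns the LWP it observed for itself.
// (Hypothetical helper, only here to show that each thread has its own ID.)
long lwpOfNewThread() {
    long lwp = 0;
    std::thread worker([&lwp] { lwp = currentLwp(); });
    worker.join();
    assert(lwp != 0);  // the worker must have run before join() returned
    return lwp;
}
```

Logging `currentLwp()` once at the entry of each worker thread is one way to make it possible to tell from the logs which piece of code a given LWP was running.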

(gdb) thread 11
[Switching to thread 11 (Thread 0x7fd55530c700 (LWP 25551))]
#0  0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fd55fed01bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fd55fecbd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fd55fecbc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000481741 in __gthread_mutex_lock (__mutex=0x76a740 <sgw::TcpServerHandler::m_ssHandlerMutex>)
    at /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/gthr-default.h:748
#4  0x0000000000488c34 in std::mutex::lock (this=0x76a740 <sgw::TcpServerHandler::m_ssHandlerMutex>) at /usr/include/c++/4.8.2/mutex:134
#5  0x000000000048978c in std::lock_guard<std::mutex>::lock_guard (this=0x7fd55530ba90, __m=...) at /usr/include/c++/4.8.2/mutex:414
#6  0x00000000004823c2 in sgw::TcpServerHandler::~TcpServerHandler (this=0x7fd5440009a0, __in_chrg=<optimized out>)
    at RNMServerAdapter.cpp:102
#7  0x0000000000482905 in sgw::TcpServerHandler::onSocketWritable (this=0x7fd5440009a0, pNf=...) at RNMServerAdapter.cpp:147
#8  0x0000000000498362 in Poco::NObserver<sgw::TcpServerHandler, Poco::Net::WritableNotification>::notify (this=0x7fd5280d9a30, pNf=
    0x1130270) at /usr/local/include/Poco/NObserver.h:86
#9  0x00007fd5623a7691 in Poco::NotificationCenter::postNotification (this=0x7fd544003520, pNotification=...)
    at src/NotificationCenter.cpp:76
#10 0x00007fd561c30c3d in Poco::Net::SocketNotifier::dispatch (this=0x7fd5440034e0, pNotification=0x1130270) at src/SocketNotifier.cpp:80
#11 0x00007fd561c2ce46 in Poco::Net::SocketReactor::dispatch (this=0x1130c50, pNotifier=..., pNotification=0x1130270)
    at src/SocketReactor.cpp:267
#12 0x00007fd561c2cc44 in Poco::Net::SocketReactor::dispatch (this=0x1130c50, socket=..., pNotification=0x1130270)
    at src/SocketReactor.cpp:243
#13 0x00007fd561c2c2ba in Poco::Net::SocketReactor::run (this=0x1130c50) at src/SocketReactor.cpp:92
#14 0x00007fd5623f97fb in Poco::(anonymous namespace)::RunnableHolder::run (this=0x11263d0) at src/Thread.cpp:56
#15 0x00007fd5623f94cb in Poco::ThreadImpl::runnableEntry (pThread=0x10f8af8) at src/Thread_POSIX.cpp:345
#16 0x00007fd55fec9dc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007fd55fbf873d in clone () from /lib64/libc.so.6

(gdb) f 4
#4  0x0000000000488c34 in std::mutex::lock (this=0x76a740 <sgw::TcpServerHandler::m_ssHandlerMutex>) at /usr/include/c++/4.8.2/mutex:134
134           int __e = __gthread_mutex_lock(&_M_mutex);
(gdb) p _M_mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 25551, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
      __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\317c\000\000\001", '\000' <repeats 26 times>, __align = 2}
   

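Thread 11 is itself blocked waiting for this lock, yet __owner = 25551 is its own LWP: it already holds the mutex it is waiting for. Since std::mutex is not recursive, this suggests that the same thread locked it twice in a row. Searching the code for such a path turned up the deadlock location below.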

std::lock_guard<std::mutex> autoLock(m_ssHandlerMutex);
try {
    _socket.sendBytes(m_outputBuffer);
} catch (Poco::Exception& e) {
    poco_error(*m_pLogger, e.displayText());
    if (e.code() == EAGAIN || e.code() == EWOULDBLOCK) return;
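    // This is what causes the deadlock: delete this first runs the destructor
    // and then calls operator delete to free the memory.
    // autoLock still holds m_ssHandlerMutex at this point, so the destructor
    // blocks forever when it tries to lock the same mutex.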
    delete this;
    return;
}
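Locking a std::mutex that the calling thread already owns is undefined behaviour in C++. In the session above the mutex is a default pthread mutex (__kind = 0), and the second lock simply blocks. As a hedged illustration of the same situation, the sketch below uses a POSIX error-checking mutex instead, which reports the relock with EDEADLK rather than hanging. The function name `relockErrorCheckMutex` is hypothetical, and the original program does not use this mutex type.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Locks an error-checking mutex twice from the same thread and returns the
// result of the second pthread_mutex_lock() call. (Hypothetical helper.)
int relockErrorCheckMutex() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    int first = pthread_mutex_lock(&mutex);   // first lock succeeds
    assert(first == 0);
    (void)first;

    // Relock by the owning thread: an error-checking mutex returns EDEADLK
    // here, where a default mutex would block forever.
    int second = pthread_mutex_lock(&mutex);

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

A debug build that swaps in an error-checking mutex can turn this kind of silent hang into an error code that is visible at the call site.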
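
The destructor run by delete this locks the same mutex again (frame #6 in the backtrace of thread 11, RNMServerAdapter.cpp:102):
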
TcpServerHandler::~TcpServerHandler()
{

    int connNum;
    {
        std::lock_guard<std::mutex> autoLock(m_ssHandlerMutex);
        auto it = m_ssHandler.find(_socket);
        if (it != m_ssHandler.end()) m_ssHandler.erase(it);

        connNum = m_ssHandler.size();
    }

}
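One possible way to avoid this self-deadlock is to narrow the scope of the lock so that the guard is released before `delete this` runs. The sketch below is a simplified stand-in, not the original class: `Handler`, `s_mutex`, `s_registry`, `onWritable`, and `liveCount` are hypothetical names, and the `sendFails` flag only simulates `sendBytes` throwing a fatal exception.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical, simplified stand-in for TcpServerHandler.
class Handler {
public:
    Handler() {
        std::lock_guard<std::mutex> guard(s_mutex);
        s_registry.insert(this);
    }

    ~Handler() {
        // Same pattern as the original destructor: takes the shared mutex.
        std::lock_guard<std::mutex> guard(s_mutex);
        assert(s_registry.count(this) == 1);  // every live handler is registered
        s_registry.erase(this);
    }

    // sendFails simulates a fatal exception from the send call.
    void onWritable(bool sendFails) {
        bool fatal = false;
        {
            std::lock_guard<std::mutex> guard(s_mutex);
            // ... the send would happen here, under the lock ...
            fatal = sendFails;
        }  // guard is released here, before the destructor can run

        if (fatal) {
            delete this;  // safe: the destructor can now take s_mutex
            return;
        }
    }

    // Number of handlers currently registered.
    static std::size_t liveCount() {
        std::lock_guard<std::mutex> guard(s_mutex);
        return s_registry.size();
    }

private:
    static std::mutex s_mutex;
    static std::set<Handler*> s_registry;
};

std::mutex Handler::s_mutex;
std::set<Handler*> Handler::s_registry;
```

Calling `onWritable(true)` on a heap-allocated `Handler` destroys it without blocking, because the mutex is no longer held when the destructor runs. Switching to std::recursive_mutex would also stop the hang, but it leaves the mutex locked while the object is being destroyed, so narrowing the lock scope is usually the cleaner choice.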
