Debugging a deadlock caused by calling malloc and backtrace in a multithreaded Linux program

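       Recently I worked on a company project whose Server program was designed as a multithreaded program to improve its execution efficiency. During one test run the Server stopped responding entirely. My first guess was that it had deadlocked, so I used the pstack command to inspect the stack of every thread.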

# pstack <pid>

Thread 9 (Thread 0x7fe82b43a700 (LWP 29656)):
#0  0x00007fe82f3681bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fe82f363d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2  0x00007fe82f363c3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fe8301f671d in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#4  0x00007fe82d6e1a49 in __run_exit_handlers () from /lib64/libc.so.6
#5  0x00007fe82d6e1a95 in exit () from /lib64/libc.so.6
#6  0x00000000004048c3 in printRecvSignalNum (sign=<optimized out>) at AC.c:257
#7  <signal handler called>
#8  0x00007fe82f3681bb in __lll_lock_wait () from /lib64/libpthread.so.0
#9  0x00007fe82f363d02 in _L_lock_791 () from /lib64/libpthread.so.0
#10 0x00007fe82f363c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#11 0x000000000041f802 in cronometer (arg=<optimized out>) at timerlib.c:266
#12 0x00007fe82f361dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fe82d7a073d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7fe82ac39700 (LWP 29671)):
#0  0x00007fe82d7ae0fc in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fe82d72bf93 in _L_lock_14932 () from /lib64/libc.so.6
#2  0x00007fe82d729013 in malloc () from /lib64/libc.so.6
#3  0x00007fe8301f4078 in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#4  0x00007fe8301fa6db in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#5  0x00007fe8301f5ff4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#6  0x00007fe8301f9feb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#7  0x00007fe82d7dafc2 in do_dlopen () from /lib64/libc.so.6
#8  0x00007fe8301f5ff4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9  0x00007fe82d7db082 in __libc_dlopen_mode () from /lib64/libc.so.6
#10 0x00007fe82d7b4565 in init () from /lib64/libc.so.6
#11 0x00007fe82f366bb0 in pthread_once () from /lib64/libpthread.so.0
#12 0x00007fe82d7b467c in backtrace () from /lib64/libc.so.6
#13 0x00007fe82eb4ead9 in procAssertStackInfo () at cc_common.c:545
#14 0x00007fe82eb4f240 in procAssertEntry (file=0x0, func=0x0, line=0, exp_str=0x0, sign=11) at cc_common.c:597
#15 <signal handler called>
#16 0x00007fe82d72477d in malloc_consolidate () from /lib64/libc.so.6
#17 0x00007fe82d726385 in _int_malloc () from /lib64/libc.so.6
#18 0x00007fe82d729a14 in calloc () from /lib64/libc.so.6
#19 0x0000000000432287 in UpdateStasInfoIntoMySQL (listStas=listStas@entry=0x7fe82ac38b68, oldListStas=oldListStas@entry=0x7fe82ac37f90, pWtpHashNode=pWtpHashNode@entry=0x7fe81c01af10) at ACDisplay.c:3256
#20 0x0000000000432bc4 in UpdateStationListMySQL (listStas=0x7fe82ac38b68, listStas@entry=0x0, pWtpHashNode=0x7fe81c01af10, pWtpHashNode@entry=0x7fe82ac38ab8) at ACDisplay.c:3585
#21 0x0000000000434607 in UpdateStationList (listStas=0x0, listStas@entry=0x7fe82ac38b68, pWtpHashNode=0x7fe82ac38ab8, pWtpHashNode@entry=0x7fe81c01af10) at ACDisplay.c:4716
#22 0x000000000040c4c3 in ACEnterRun (pWtpHashNode=pWtpHashNode@entry=0x7fe81c01af10, msgPtr=msgPtr@entry=0x7fe82ac38d70, dataFlag=CW_FALSE) at ACRunState.c:499
#23 0x00000000004061c9 in CWManageWTP (arg=arg@entry=0x7fe82ac38da8) at ACMainLoop.c:428
#24 0x00000000004069c1 in CWHandleIncomingCapwapPkg (parg=0xe7ab60) at ACMainLoop.c:497
#25 0x000000000040e27f in CWConsumerThread (arg=<optimized out>) at Scheduler.c:234
#26 0x00007fe82f361dc5 in start_thread () from /lib64/libpthread.so.0
#27 0x00007fe82d7a073d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fe82a438700 (LWP 29672)):
#0  0x00007fe82f3681bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fe82f363d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fe82f363c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000041a5d5 in CWThreadMutexLock (theMutex=theMutex@entry=0xc5f228 <g_wtp_data_hash+5594600>) at CWThread.c:157
#4  0x000000000040697e in CWHandleIncomingCapwapPkg (parg=0xe7d500) at ACMainLoop.c:483
#5  0x000000000040e27f in CWConsumerThread (arg=<optimized out>) at Scheduler.c:234
#6  0x00007fe82f361dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fe82d7a073d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fe829c37700 (LWP 29673)):
#0  0x00007fe82d7ae0fc in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fe82d72b991 in _L_lock_4780 () from /lib64/libc.so.6
#2  0x00007fe82d7251f8 in _int_free () from /lib64/libc.so.6
#3  0x000000000041fc65 in timer_rem (id=8522, free_arg=0x41a367 <CWTimerFreeArgSingleThread>) at timerlib.c:524
#4  0x000000000041adf8 in CWTimerCancelSingleThread (idPtr=<optimized out>) at CWThread.c:909
#5  0x000000000040be4f in CWStopNeighborDeadTimer (pWtpManData=<optimized out>) at ACRunState.c:1920
#6  0x000000000040be91 in CWRestartNeighborDeadTimer (pWtpManData=0x7fe814065720) at ACRunState.c:1935
#7  0x000000000040c08e in ACEnterRun (pWtpHashNode=pWtpHashNode@entry=0x7fe814064be0, msgPtr=msgPtr@entry=0x7fe829c36d70, dataFlag=CW_FALSE) at ACRunState.c:259
#8  0x00000000004061c9 in CWManageWTP (arg=arg@entry=0x7fe829c36da8) at ACMainLoop.c:428
#9  0x00000000004069c1 in CWHandleIncomingCapwapPkg (parg=0xe7d460) at ACMainLoop.c:497
#10 0x000000000040e27f in CWConsumerThread (arg=<optimized out>) at Scheduler.c:234
#11 0x00007fe82f361dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fe82d7a073d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fe829436700 (LWP 29674)):
#0  0x00007fe82f3681bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fe82f363d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fe82f363c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000041a5d5 in CWThreadMutexLock (theMutex=theMutex@entry=0xc5f228 <g_wtp_data_hash+5594600>) at CWThread.c:157
#4  0x000000000040697e in CWHandleIncomingCapwapPkg (parg=0xe7d2a0) at ACMainLoop.c:483
#5  0x000000000040e27f in CWConsumerThread (arg=<optimized out>) at Scheduler.c:234
#6  0x00007fe82f361dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fe82d7a073d in clone () from /lib64/libc.so.6

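The pstack output shows that threads 9 through 5 are all blocked on locks. Going through the state of each thread in detail shows that the real source of the deadlock is thread 8.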

Thread 8:

At one point during its execution the thread called calloc to allocate heap memory.

#18 0x00007fe82d729a14 in calloc () from /lib64/libc.so.6

Before the allocation had finished (the thread was still inside malloc_consolidate, frame #16), it received signal 11 (SIGSEGV).

#14 0x00007fe82eb4f240 in procAssertEntry (file=0x0, func=0x0, line=0, exp_str=0x0, sign=11) at cc_common.c:597

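Our project installs its own handler for signal 11, and that handler calls backtrace. While it runs, backtrace calls malloc; in this trace that happens through the dlopen path in frames #3 to #12.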

#2  0x00007fe82d729013 in malloc () from /lib64/libc.so.6

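       This gives us the cause of the deadlock. While malloc was executing, it was interrupted by signal 11 and the handler ran backtrace, which called malloc again. The heap lock was therefore locked twice in a row by the same thread, so thread 8 blocked on itself. From that point on no thread can acquire the heap lock, and other threads will block in operations such as malloc or free. Let's check whether any thread is blocked in such an operation.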

Thread 6 (Thread 0x7fe829c37700 (LWP 29673)):
#0  0x00007fe82d7ae0fc in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fe82d72b991 in _L_lock_4780 () from /lib64/libc.so.6
#2  0x00007fe82d7251f8 in _int_free () from /lib64/libc.so.6

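        As expected, thread 6 is blocked in free. Looking at our code again, thread 6 already holds a lock defined by our own project and will never get the chance to release it. Threads 5, 7, and 9 are likewise all waiting for locks that will never be released.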
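
The self-deadlock on thread 8 can be illustrated with an ordinary non-recursive mutex. The sketch below is only an analogy: glibc protects its heap with an internal lock rather than a pthread_mutex_t, and the names heap_like_lock and relock_from_same_thread are hypothetical. The second acquisition uses pthread_mutex_trylock so that the program reports the conflict instead of hanging.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the allocator's internal lock. */
static pthread_mutex_t heap_like_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Take the lock, then try to take it again from the same thread, the way a
 * signal handler that re-enters malloc would. Returns the error code of the
 * second attempt. EBUSY means that a blocking pthread_mutex_lock() at this
 * point would never have returned.
 */
int relock_from_same_thread(void)
{
    int second;

    if (pthread_mutex_lock(&heap_like_lock) != 0)
        return -1;

    second = pthread_mutex_trylock(&heap_like_lock);
    if (second == 0)
        pthread_mutex_unlock(&heap_like_lock); /* only reachable for a recursive lock */

    pthread_mutex_unlock(&heap_like_lock);
    return second;
}
```

Calling relock_from_same_thread() should return EBUSY. With pthread_mutex_lock in place of pthread_mutex_trylock, the calling thread would wait on itself, which is the situation frames #0 to #2 of thread 8 show.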

        So how do we solve this problem?

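        From the analysis above, signal 11 is what triggered the deadlock. Signal 11 is usually caused by an out-of-bounds memory access, so we reviewed the recently developed code and fixed that bug. The risk of the Server deadlocking still remains, however. To eliminate that risk at the root, backtrace must not be called from the signal handler.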

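        Summary: a signal handler must be reentrant, which means it may only call functions that are themselves safe to re-enter. The definitions of reentrant and non-reentrant functions follow.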

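        Reentrant function: a function that can be entered again before an earlier invocation has finished. It can be called concurrently and can be interrupted. It uses only data on its own stack and does not depend on the task environment, so it is safe under multitask scheduling and there is no risk of corrupting data.
        Non-reentrant function: a function that must not be called concurrently or re-entered, otherwise the result is unpredictable. Such functions typically use static data structures, call malloc() or free(), or use standard I/O functions.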
