Analyzing and locating a production FS crash: an ldns library problem
– by yine 2018-04-10 15:33:05
I. Time of the failure
2018-04-10 09:54:07
II. Stack trace
warning: .dynamic section for "/usr/lib/x86_64-linux-gnu/librtmp.so.1" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/libldns.so.1" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/freeswitch -nc -nonat -nosql -u popo -g netease'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f2388bc9067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0 0x00007f2388bc9067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 28593
        selftid = 20809
#1 0x00007f2388bca448 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x3030303030207078, sa_sigaction = 0x3030303030207078}, sa_mask = {__val = {3475143045726351408, 2314885530819502128, 2314885530818453536, 8319937555149627424, 746872325959545721, 3775530756625032759, 3631650816742404144, 3472329422401517619, 3467895374536122416, 2319406791620833328, 3761104034442405222, 2314885530819704883, 2314885530818453536, 2314885530818453536, 4069054363051241248, 139789281265312}}, sa_flags = 65, sa_restorer = 0x7f233a740700}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f2388c071b4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f2388cf9cb3 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
        ap = {{gp_offset = 32, fp_offset = 32547, overflow_arg_area = 0x7f233a740710, reg_save_area = 0x7f233a7406a0}}
        fd = 2
        on_2 = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3 0x00007f2388c8caa7 in __GI___fortify_fail (msg=msg@entry=0x7f2388cf9c4a "buffer overflow detected") at fortify_fail.c:31
No locals.
#4 0x00007f2388c8acc0 in __GI___chk_fail () at chk_fail.c:28
No locals.
#5 0x00007f2388c8ca17 in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
No locals.
#6 0x00007f23822184c5 in ?? () from /usr/lib/libldns.so.1
No symbol table info available.
#7 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
III. FS log
popo@hzadg-ysf-01:~/DATA/logs/freeswitch$ grep "8e660ca2-d28a-4f09-a6f7-260bd25b75f4" freeswitch.log
8e660ca2-d28a-4f09-a6f7-260bd25b75f4 2018-04-10 09:54:10.237472 [NOTICE] switch_channel.c:1104 New Channel sofia/internal/test@59.111.165.135:53 [8e660ca2-d28a-4f09-a6f7-260bd25b75f4]
8e660ca2-d28a-4f09-a6f7-260bd25b75f4 2018-04-10 09:54:10.357473 [INFO] mod_dialplan_xml.c:637 Processing test <test>->test in context default
8e660ca2-d28a-4f09-a6f7-260bd25b75f4 2018-04-10 09:54:10.377454 [NOTICE] switch_ivr.c:2172 Transfer sofia/internal/test@59.111.165.135:53 to enum[test@default]
popo@hzadg-ysf-01:~/DATA/logs/freeswitch$
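IV. Locating the problem
The stack trace shows that the abort was raised from inside the libldns library (frame #6), and the FS log shows that the last thing executed before the crash was the enum application from the mod_enum module. After a lot of searching, a few leads finally turned up:
First, someone had filed this JIRA ticket against FS: https://freeswitch.org/jira/browse/FS-7624?attachmentViewMode=list
The FS author then reported the problem to the ldns authors: https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=678
The ldns author produced this patch: https://www.nlnetlabs.nl/bugs-script/attachment.cgi?id=285&action=diff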
V. Fix
Following the ldns author's explanation, one option is to add a macro definition that sets FD_SETSIZE to the larger value you need. The author wrote:
"From your back trace I see that the crash happens in ldns_sock_wait which uses select to wait for a socket to become readable or writable. The maximum number of sockets fed to select is FD_SETSIZE which is 1024 by default. In the issue report I read that this crash only occurs when the number of file descriptors in use is more than 1024."
The more direct fix is to upgrade the ldns library on debian8 to version 1.7.0.
In the changelog at https://git.nlnetlabs.nl/ldns/tree/Changelog?h=release-1.7.0&id=54822adfc9fffbe47107c1201df5bca917793fa4 the entry "bugfix #678: Use poll i.s.o. select to support > 1024 fds" is the fix for this bug. However, 1.7.0 is not available in the debian8 distribution, where the newest version is only 1.6.18, so the dependency has to be compiled by hand.
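Before rebuilding anything, it can help to confirm that the process really does hold descriptors numbered at or above the select() limit described in the quote. The sketch below is an illustration only: it assumes a Linux /proc filesystem, and the helper names fd_count, fd_max and check_select_limit are made up for this example. The relevant figure is the highest descriptor number rather than the count, because the __fdelt_chk frame in the stack trace aborts when a descriptor number is not below FD_SETSIZE.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FS or ldns. Linux /proc only.

# Number of descriptors the process currently holds open.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Highest descriptor number in use by the process.
fd_max() {
    ls "/proc/$1/fd" 2>/dev/null | sort -n | tail -n 1
}

# Compare the highest descriptor with the select() limit (default 1024).
# Returns 0 if below the limit, 1 if at or above it, 2 if /proc is unreadable.
check_select_limit() {
    pid=$1
    limit=${2:-1024}
    max=$(fd_max "$pid")
    if [ -z "$max" ]; then
        echo "cannot read /proc/$pid/fd"
        return 2
    fi
    if [ "$max" -ge "$limit" ]; then
        echo "pid $pid: highest fd $max >= $limit, select() is unsafe"
        return 1
    fi
    echo "pid $pid: highest fd $max < $limit"
}
```

A possible invocation is `check_select_limit "$(pidof -s freeswitch)"`, run as the FS user or as root, since another user's /proc/<pid>/fd is normally not readable.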
First, go to the /usr/lib/freeswitch/mod directory and check which ldns library mod_enum.so depends on:
/usr/lib/freeswitch/mod# ldd mod_enum.so
    linux-vdso.so.1 (0x00007ffde5fc5000)
    libldns.so.1 => /usr/lib/libldns.so.1 (0x00007f1c4e9f6000)
    libfreeswitch.so.1 => /usr/lib/libfreeswitch.so.1 (0x00007f1c4e351000)
    libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f1c4e0f0000)
    libcrypto.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f1c4dcf4000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1c4dad7000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c4d72c000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1c4d528000)
    libpq.so.5 => /usr/lib/x86_64-linux-gnu/libpq.so.5 (0x00007f1c4d2f7000)
    libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007f1c4d0f2000)
    libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f1c4ce29000)
    libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007f1c4cb7f000)
    libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f1c4c90b000)
    libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f1c4c69d000)
    libspeex.so.1 => /usr/lib/x86_64-linux-gnu/libspeex.so.1 (0x00007f1c4c484000)
    libspeexdsp.so.1 => /usr/lib/x86_64-linux-gnu/libspeexdsp.so.1 (0x00007f1c4c271000)
    libedit.so.2 => /usr/lib/x86_64-linux-gnu/libedit.so.2 (0x00007f1c4c038000)
    libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f1c4be01000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1c4bbf9000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1c4b9de000)
    libpng16.so.16 => /lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f1c4b7ab000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1c4b4a0000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1c4b19f000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1c4af89000)
    libodbc.so.2 => /usr/lib/x86_64-linux-gnu/libodbc.so.2 (0x00007f1c4ad21000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c4ee5b000)
    libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f1c4aad6000)
    libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f1c4a884000)
    libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f1c4a650000)
    librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f1c4a433000)
    libssh2.so.1 => /usr/lib/x86_64-linux-gnu/libssh2.so.1 (0x00007f1c4a20a000)
    libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f1c49f36000)
    libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f1c49d05000)
    libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f1c49b01000)
    liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f1c498f2000)
    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f1c496e2000)
    libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f1c494b8000)
    libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f1c492ae000)
    libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f1c490a2000)
    libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f1c48e9e000)
    libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f1c48c87000)
    libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f1c48a6b000)
    libgnutls-deb0.so.28 => /usr/lib/x86_64-linux-gnu/libgnutls-deb0.so.28 (0x00007f1c4874b000)
    libhogweed.so.2 => /usr/lib/x86_64-linux-gnu/libhogweed.so.2 (0x00007f1c4851c000)
    libnettle.so.4 => /usr/lib/x86_64-linux-gnu/libnettle.so.4 (0x00007f1c482ea000)
    libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f1c48067000)
    libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f1c47d85000)
    libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f1c47b3f000)
    libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f1c4792b000)
    libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f1c47719000)
    libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f1c47511000)
As the output shows, the second entry is the ldns dependency, which currently resolves to /usr/lib/libldns.so.1.
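With this many entries, it is easier to pull out just the line of interest than to scan the whole listing. A minimal sketch, assuming the usual `name => path (address)` layout of ldd output; the function name resolved_path is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the path that a
# library resolves to. $1 is a prefix of the library name, e.g. libldns.so
resolved_path() {
    awk -v lib="$1" 'index($1, lib) == 1 && $2 == "=>" { print $3 }'
}
```

For example, `ldd /usr/lib/freeswitch/mod/mod_enum.so | resolved_path libldns.so` prints only the path that libldns.so.1 is bound to.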
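VI. Replacing the library without polluting the system
# Call center media service: replacing a low-level dependency module
## Download the ldns source
1. Reference build notes: http://www.linuxfromscratch.org/blfs/view/svn/basicnet/ldns.html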
2. cd /home/popo/freeswitch/src
3. wget http://www.nlnetlabs.nl/downloads/ldns/ldns-1.7.0.tar.gz
4. wget http://www.openssl.org/source/openssl-1.1.0c.tar.gz
## Install openssl
1. cd /home/popo/freeswitch/bin && mkdir openssl-1.1.0c && mkdir ldns-1.7.0
2. Build the newer openssl version (run from inside the extracted openssl-1.1.0c source tree):
./config --prefix=/home/popo/freeswitch/bin/openssl-1.1.0c/openssl --openssldir=/home/popo/freeswitch/bin/openssl-1.1.0c/ssl && make && make install
## Install the newer ldns library
1. tar zxvf ldns-1.7.0.tar.gz && cd ldns-1.7.0
2. ./configure --prefix=/home/popo/freeswitch/bin/ldns-1.7.0 --with-ssl=/home/popo/freeswitch/bin/openssl-1.1.0c/openssl && make && make install
3. cd /home/popo/freeswitch/bin/ldns-1.7.0/lib
4. ln -s libldns.so.2.0.0 libldns.so.1
## Configure the user's environment variables
1. cd ~
2. vim .profile
3. Add the following lines to this file:
PATH=/home/popo/freeswitch/bin/openssl-1.1.0c/openssl/bin:$PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/popo/freeswitch/bin/openssl-1.1.0c/openssl/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/popo/freeswitch/bin/ldns-1.7.0/lib
export PATH LD_LIBRARY_PATH
4. Run . .profile to apply the changes
5. Verify that the new openssl is in effect: openssl version
6. Check that the environment variables are in effect: env
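Reading the full env output by eye is error-prone, so the check in the last step can be narrowed to the two directories added to LD_LIBRARY_PATH. This is a minimal sketch; the function name path_list_contains is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the colon-separated list $1 contains
# the exact directory $2.
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use after sourcing .profile: `path_list_contains "$LD_LIBRARY_PATH" /home/popo/freeswitch/bin/ldns-1.7.0/lib && echo ok`, and the same for the openssl lib directory.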
## Restart FS so that mod_enum picks up the new ldns library
1. sudo /etc/freeswitch restart
2. ldd /usr/lib/freeswitch/mod/mod_enum.so
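Note that ldd resolves libraries using the environment of the shell it is run in, which may differ from the environment the restarted FS process actually received (sudo, for instance, may not pass LD_LIBRARY_PATH through). As a cross-check, the libraries mapped by the running process can be read from its /proc maps file. The sketch below assumes Linux and the standard maps layout with the pathname in the sixth column; the function name mapped_libs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct mapped files in a maps file ($1)
# whose pathname contains the substring $2.
mapped_libs() {
    awk -v pat="$2" 'index($6, pat) { print $6 }' "$1" | sort -u
}
```

A possible invocation is `mapped_libs "/proc/$(pidof -s freeswitch)/maps" libldns`, run as the FS user or as root. If the replacement worked, the printed path should sit under /home/popo/freeswitch/bin/ldns-1.7.0/lib; the maps file normally shows the file behind the libldns.so.1 symlink, so expect libldns.so.2.0.0 there. mod_enum has to be loaded for any ldns mapping to appear.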
## Rollback
1. Remove the newly added entries from .profile
2. Restart FS to restore the original library dependency: sudo /etc/freeswitch restart
over!