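1. Introduction to AddressSanitizer
I turned to AddressSanitizer because a program I maintain at work had an out-of-bounds memory access that corrupted the internal data of a third-party memory management library, causing occasional core dumps. I tried valgrind first, but it kept failing with the error below. I could not find a fix online, so after comparing the alternatives I chose AddressSanitizer.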
valgrind: mmap(0xf10000, 1027244032) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
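AddressSanitizer is a memory error detector developed by Google. It is reported to perform considerably better than valgrind and works with either clang or GCC; GCC needs version 4.8 or later. GCC 4.8 has only limited AddressSanitizer support: some features are incomplete and the error reports are harder to read, so 4.9 or later is recommended. This time, however, I used GCC 4.8.3. For more details on AddressSanitizer, see its GitHub project page: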
https://github.com/google/sanitizers/wiki/AddressSanitizer
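2. Usage
Environment: CentOS 7.1, GCC 4.8.3
Library to install: libasan.x86_64. Newer versions of gcc may also need libubsan. Although AddressSanitizer is part of gcc, these two libraries are not installed by default.
Usage is simple: add the two options -fsanitize=address -fno-omit-frame-pointer when compiling the program. Note that the program must use the system's own memory allocator rather than a third-party one, because AddressSanitizer works by intercepting standard functions such as malloc and free. Some commonly used gcc options: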
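-fsanitize=address #enable out-of-bounds address checking
-fno-omit-frame-pointer #keep the frame pointer so that more detailed error information can be printed
-fsanitize=leak #enable memory leak checking
For details on GCC's options, see: https://gcc.gnu.org/onlinedocs/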
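To try the options on something small before rebuilding a large program, a deliberately buggy helper like the one below can be used. This is only a sketch: the file name demo.c, the build command in the comment, and the function fill_stack_buffer are made up for illustration and are not part of the program discussed in this post.

```c
/* Assumed build command (the file name demo.c is hypothetical):
 *   gcc -fsanitize=address -fno-omit-frame-pointer -g demo.c -o demo
 */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills the first n bytes of a 16-byte stack buffer.
 * Any n greater than 16 writes past the end of buf on purpose. */
static int fill_stack_buffer(size_t n)
{
    char buf[16] = {0};
    memset(buf, 'x', n);   /* out of bounds when n > sizeof(buf) */
    return buf[0];         /* read back so the buffer is actually used */
}
```

Calls with n up to 16 run normally. Calling fill_stack_buffer(17) from main in a binary built this way should make AddressSanitizer abort with a report laid out like the one in the next section, although the exact wording depends on the compiler version.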
3. A bug record
A crash caused by an sscanf call whose format specifies a maximum input length
20180418 19:44:47.979.036 root txn_checkpoint waste time 0 second or 3 millions[svdb.c:499]
=================================================================
==1597== ERROR: AddressSanitizer: unknown-crash on address 0x7ffbc1a3db30 at pc 0x7ffff4e5619f bp 0x7ffbc1a3d8c0 sp 0x7ffbc1a3d868
20180418 19:44:47.980.555 mng NODE(1-1-1-2, deviceid=111111222) didn't report itself status timeout,now set it status is down, and alarm[state_mng_svr.c:4607]
WRITE of size 21 at 0x7ffbc1a3db30 thread T18
20180418 19:44:47.980.831 mng mng client select nic(0) bond0 ip(172.16.0.58) to join multicast(239.73.220.45)[state_mng_clnt.c:188]
20180418 19:44:47.985.171 mng notify NODE 1-1-1-2 UP, and clear alarm[state_mng_svr.c:382]
#0 0x7ffff4e5619e (/usr/lib64/libasan.so.0.0.0+0xb19e)
#1 0x7ffff4e568b6 (/usr/lib64/libasan.so.0.0.0+0xb8b6)
#2 0x7ffff4e569e9 (/usr/lib64/libasan.so.0.0.0+0xb9e9)
#3 0x80b83c (/opt/fonsview/NE/ss/bin/ss+0x80b83c)
#4 0x7977aa (/opt/fonsview/NE/ss/bin/ss+0x7977aa)
#5 0x79b964 (/opt/fonsview/NE/ss/bin/ss+0x79b964)
#6 0x75bbf6 (/opt/fonsview/NE/ss/bin/ss+0x75bbf6)
#7 0x41716c (/opt/fonsview/NE/ss/bin/ss+0x41716c)
#8 0x7ffff4e64a97 (/usr/lib64/libasan.so.0.0.0+0x19a97)
#9 0x7ffff4116df4 (/usr/lib64/libpthread-2.17.so+0x7df4)
#10 0x7ffff256b1ac (/usr/lib64/libc-2.17.so+0xf61ac)
Address 0x7ffbc1a3db30 is located at offset 160 in frame <hem_http_decode_request> of T18's stack:
This frame has 5 object(s):
[32, 40) 'range_start'
[96, 104) 'range_end'
[160, 180) 'ctype'
[224, 256) 'fmt'
[288, 352) 'range_str'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
Thread T18 created by T0 here:
#0 0x7ffff4e55c3a (/usr/lib64/libasan.so.0.0.0+0xac3a)
#1 0x418107 (/opt/fonsview/NE/ss/bin/ss+0x418107)
#2 0x41101f (/opt/fonsview/NE/ss/bin/ss+0x41101f)
#3 0x406107 (/opt/fonsview/NE/ss/bin/ss+0x406107)
#4 0x7ffff2496af4 (/usr/lib64/libc-2.17.so+0x21af4)
Shadow bytes around the buggy address:
0x0ffff833fb10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ffff833fb20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ffff833fb30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ffff833fb40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ffff833fb50: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 f4
=>0x0ffff833fb60: f4 f4 f2 f2 f2 f2[00]00 04 f4 f2 f2 f2 f2 00 00
0x0ffff833fb70: 00 00 f2 f2 f2 f2 00 00 00 00 00 00 00 00 f3 f3
0x0ffff833fb80: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ffff833fb90: 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2 f2 f2
0x0ffff833fba0: 04 f4 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f3 f3 f3 f3
0x0ffff833fbb0: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap righ redzone: fb
Freed Heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
ASan internal: fe
==1597== ABORTING
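A note on reading the shadow dump: as the legend says, each shadow byte describes 8 application bytes. The sketch below assumes the default x86_64 Linux mapping, shadow = (addr >> 3) + 0x7fff8000, which the report itself does not state; the helper names are made up for illustration. Under that assumption, 0x7ffbc1a3db30 maps to shadow address 0x0ffff833fb66, which is the bracketed [00] in the row marked =>. The bytes 00 00 04 starting there describe 8 + 8 + 4 = 20 addressable bytes, the size of ctype, so a 21-byte write runs one byte into the redzone.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: default x86_64 Linux shadow offset; other platforms differ. */
#define ASAN_SHADOW_OFFSET 0x7fff8000ULL

/* Maps an application address to the address of its shadow byte. */
static uint64_t shadow_address(uint64_t app_addr)
{
    return (app_addr >> 3) + ASAN_SHADOW_OFFSET;
}

/* Counts the addressable application bytes described by consecutive shadow
 * bytes, stopping after the first partial byte or at the first poisoned one. */
static size_t addressable_bytes(const unsigned char *shadow, size_t count)
{
    size_t total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        if (shadow[i] == 0x00) {
            total += 8;               /* all 8 bytes usable */
        } else if (shadow[i] <= 0x07) {
            total += shadow[i];       /* only the first k bytes usable */
            break;
        } else {
            break;                    /* redzone or other poisoned value */
        }
    }
    return total;
}
```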
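As the report shows, the stack trace contains only instruction addresses and no function names, even though I compiled with -ggdb. It does still give one function name, hem_http_decode_request, the function whose stack frame contains the faulting address. To pin down the exact line, disassemble the binary with objdump and map the instruction address to a file name and line number. Here is sample code that shows the problem directly: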
char ctype[32]={0};
char fmt[32]={0};
snprintf(fmt,sizeof(fmt),"%%%ld[^&? ]",(long)sizeof(ctype));
sscanf(p_str,fmt,ctype); // the program aborts here
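The fix is simply to change (long)sizeof(ctype) to (long)sizeof(ctype)-1. The width given in fmt must not count the terminating null character: it limits only the number of characters stored, and sscanf then appends '\0', so a width equal to the buffer size lets it write one byte past the end. This agrees with the report, where ctype spans [160, 180), i.e. 20 bytes, and the faulting write is 21 bytes (the sample above is illustrative and uses a 32-byte buffer). Debugging with gdb confirms that 0x7ffbc1a3db30 is the address of the variable ctype.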
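The corrected pattern can be wrapped in a small helper so the width is always derived from the buffer size. This is a minimal sketch, not the code from the actual program: the name parse_token and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies the leading run of characters other than
 * '&', '?' and ' ' from src into dst, writing at most dst_size bytes
 * including the terminating '\0'. Returns 0 on success, -1 otherwise. */
static int parse_token(const char *src, char *dst, size_t dst_size)
{
    char fmt[32];

    if (dst_size < 2) {
        return -1;   /* no room for even one character plus '\0' */
    }
    /* The width is dst_size - 1 because sscanf stores up to that many
     * characters and then appends the terminating '\0'. */
    snprintf(fmt, sizeof(fmt), "%%%lu[^&? ]", (unsigned long)(dst_size - 1));
    dst[0] = '\0';
    return sscanf(src, fmt, dst) == 1 ? 0 : -1;
}
```

With an 8-byte buffer the generated format is "%7[^&? ]", so an over-long token is truncated to 7 characters instead of overflowing the buffer.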
4. Summary
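AddressSanitizer quickly exposed an out-of-bounds write in the code; according to the report above, the overrun was in a stack buffer. After the fix the program no longer dumped core. AddressSanitizer turned out to be quite convenient for locating out-of-bounds accesses, and when I have time I would like to try its memory leak checking as well.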