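While doing custom development on glusterfs, I found that running a few commands leaked memory. I first checked with valgrind, which quickly flagged two places. One came with a complete stack, but the other had only a single bare stack entry:
That was nowhere near enough to fix anything, so I turned to a finer-grained memory checking tool, gperftools.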
Preparation
Install gperftools
# Ubuntu: install google-perftools
apt install -y google-perftools libgoogle-perftools-dev
# CentOS: install gperftools (the package may come from the EPEL repository)
yum install -y gperftools gperftools-devel
Build glusterfs
When running configure for glusterfs, pass the --enable-tcmalloc option so that the build links against tcmalloc and can be analyzed with gperftools later.
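Before profiling, it can be worth confirming that the rebuilt binary really picks up tcmalloc. The following is a minimal sketch of such a check, not part of glusterfs or gperftools: `has_tcmalloc` is a hypothetical helper name, and it assumes the library shows up as `libtcmalloc` in `ldd` output.

```shell
#!/bin/sh
# Hypothetical helper (not provided by glusterfs or gperftools):
# reads `ldd` output on stdin and succeeds if a tcmalloc library is listed.
has_tcmalloc() {
    grep -q 'libtcmalloc'
}

# Example usage, assuming the default install location:
#   ldd /usr/local/sbin/glusterd | has_tcmalloc && echo "tcmalloc is linked"
```

If the check fails, the configure step most likely ran without --enable-tcmalloc, or the gperftools development package was missing at build time.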
Running the analysis
Set two environment variables when starting glusterd:
# HEAPPROFILE sets the path prefix of the generated profile files
# HEAPCHECK selects the heap checker mode to use
env HEAPPROFILE=/var/log/glusterfs/hprofs/heap_gluster.hprof HEAPCHECK=strict glusterd
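As far as I know, the profiler writes its dumps under the HEAPPROFILE prefix but does not create missing directories, so the target directory should exist before glusterd starts. A small sketch of that preparation step follows; `prepare_heap_profile_dir` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: creates the dump directory if needed and prints
# the full prefix to pass as HEAPPROFILE.
#   $1 = directory for the dumps, $2 = file name prefix
prepare_heap_profile_dir() {
    dir="$1"
    name="$2"
    mkdir -p "$dir" || return 1
    printf '%s/%s\n' "$dir" "$name"
}

# Example usage with the paths from this post:
#   prefix="$(prepare_heap_profile_dir /var/log/glusterfs/hprofs heap_gluster.hprof)"
#   env HEAPPROFILE="$prefix" HEAPCHECK=strict glusterd
```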
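Then run the commands that trigger the leak and stop the glusterd processes. The raw profile files (hprof) are generated automatically.
Next, use pprof to turn them into a readable report: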
# /usr/local/sbin/ is the default install location of glusterd; the glusterd binaries can be found there
pprof --text --stacks /usr/local/sbin/glusterd heap_gluster.hprof.0001.heap >> hp_glusterd.log
This file shows memory usage in much more detail, which was enough to pin down where the leak occurred and fix it.
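A long-running process can leave several numbered dumps (.0001.heap, .0002.heap, and so on), and the last one is usually the most interesting. The sketch below picks the newest dump for a given prefix; `latest_heap_dump` is a hypothetical helper, and it relies on the assumption that the zero-padded sequence numbers sort lexically in the same order they were written.

```shell
#!/bin/sh
# Hypothetical helper: prints the highest-numbered dump for a HEAPPROFILE
# prefix ($1). Returns non-zero if no dump exists.
latest_heap_dump() {
    latest=""
    # Glob results are sorted, so the last match has the highest number.
    for f in "$1".*.heap; do
        [ -e "$f" ] || continue
        latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example usage with the paths from this post:
#   dump="$(latest_heap_dump /var/log/glusterfs/hprofs/heap_gluster.hprof)"
#   pprof --text --stacks /usr/local/sbin/glusterd "$dump" > hp_glusterd.log
```

Note that the Ubuntu google-perftools package installs the tool as google-pprof rather than pprof, so the command name may need adjusting there.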
Addendum
The official documentation describes the HEAPCHECK modes as follows:
These are the legal values when running a whole-program heap check:
“Minimal” heap-checking starts as late as possible in a initialization, meaning you can leak some memory in your initialization routines (that run before main(), say), and not trigger a leak message. If you frequently (and purposefully) leak data in one-time global initializers, “minimal” mode is useful for you. Otherwise, you should avoid it for stricter modes.
“Normal” heap-checking tracks live objects and reports a leak for any data that is not reachable via a live object when the program exits.
“Strict” heap-checking is much like “normal” but has a few extra checks that memory isn’t lost in global destructors. In particular, if you have a global variable that allocates memory during program execution, and then “forgets” about the memory in the global destructor (say, by setting the pointer to it to NULL) without freeing it, that will prompt a leak message in “strict” mode, though not in “normal” mode.
“Draconian” heap-checking is appropriate for those who like to be very precise about their memory management, and want the heap-checker to help them enforce it. In “draconian” mode, the heap-checker does not do “live object” checking at all, so it reports a leak unless all allocated memory is freed before program exit. (However, you can use IgnoreObject() to re-enable liveness-checking on an object-by-object basis.)
For more detail, read the official documentation at https://github.com/gperftools/gperftools; I won't repeat it here.