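I am currently using the audit facility provided by Linux's auditd, and found that it does not keep up when the system generates a large number of system calls.
I used valgrind to profile it.
To isolate the cost, I am profiling auditd one stage at a time. First I turned off log writing with write_logs=no in /etc/audit/auditd.conf. Then, after auditd receives and processes the data that kauditd sends over netlink, it no longer forwards the event to /sbin/audispd. This is done in the distribute_event function in src/auditd.c by commenting out the forwarding code: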
void distribute_event(struct auditd_event *e)
{
	int attempt = 0, route = 1, proto;

	if (config.log_format == LF_ENRICHED)
		proto = AUDISP_PROTOCOL_VER2;
	else
		proto = AUDISP_PROTOCOL_VER;

	/* If type is 0, then its a network originating event */
	if (e->reply.type == 0) {
		// See if we are distributing network originating events
		if (!dispatch_network_events())
			route = 0;
		else {	// We only need the original type if its being routed
			e->reply.type = extract_type(e->reply.message);
			// Treat everything from the network as VER2
			// because they are already formatted. This is
			// important when it gets to the dispatcher which
			// can strip node= when its VER1.
			proto = AUDISP_PROTOCOL_VER2;
		}
	} else if (e->reply.type != AUDIT_DAEMON_RECONFIG)
		// All other local events need formatting
		format_event(e);
	else
		route = 0; // Don't DAEMON_RECONFIG events until after enqueue

	/* Make first attempt to send to plugins */
	/* Disabled for profiling: do not forward the event to audispd */
	//if (route && dispatch_event(&e->reply, attempt, proto) == 1)
	//	attempt++; /* Failed sending, retry after writing to disk */

	/* End of Event is for realtime interface - skip local logging of it */
	if (e->reply.type != AUDIT_EOE)
		handle_event(e); /* Write to local disk */

	/* Last chance to send...maybe the pipe is empty now. */
	/* Disabled for profiling: do not forward the event to audispd */
	//if ((attempt && route) || (e->reply.type == AUDIT_DAEMON_RECONFIG))
	//	dispatch_event(&e->reply, attempt, proto);

	/* Free msg and event memory */
	cleanup_event(e);
}
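After rebuilding, I started auditd under valgrind so that it processes events while being profiled:
Command: valgrind --tool=callgrind --separate-threads=yes /home/admin/auditd/sbin/auditd
The callgrind profile (the figure above) shows that malloc, free and snprintf account for a relatively large share of the cost.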
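To make the hot pattern concrete, here is a minimal sketch of the kind of per-event work that would produce such a profile: one malloc, one snprintf and one free for every event. This is not auditd's code. The names format_one and format_many, the MSG_MAX size and the message layout are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define MSG_MAX 8192 /* assumed buffer size, for illustration only */

/* Hypothetical stand-in for per-event formatting: allocate, format, free.
 * Returns the number of bytes the formatted message occupies. */
static size_t format_one(unsigned long serial, const char *payload)
{
    char *buf = malloc(MSG_MAX);
    if (buf == NULL)
        return 0;

    int n = snprintf(buf, MSG_MAX, "type=SYSCALL msg=audit(%lu): %s",
                     serial, payload);
    size_t len;
    if (n < 0)
        len = 0;                 /* encoding error */
    else if ((size_t)n >= MSG_MAX)
        len = MSG_MAX - 1;       /* output was truncated */
    else
        len = (size_t)n;

    free(buf);
    return len;
}

/* Format `count` synthetic events and return the total bytes produced.
 * Every iteration pays for malloc + snprintf + free. */
size_t format_many(unsigned long count, const char *payload)
{
    size_t total = 0;
    for (unsigned long i = 0; i < count; i++)
        total += format_one(i, payload);
    return total;
}
```

Calling format_many with a large count under callgrind should show the same three functions near the top, which makes it a convenient stand-in for comparing allocators without running the whole daemon.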
Since I had looked into memory allocators before, I tested tcmalloc first and compared the performance.
git clone https://github.com/gperftools/gperftools.git
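I built and installed it from the latest source.
auditd is run the same way as before, with the library loaded through LD_PRELOAD so that it overrides the allocator in libc.
Test result:
Memory allocation takes up almost none of the profile.
jemalloc: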
git clone https://github.com/jemalloc/jemalloc.git
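Test result:
Test scenario: transferring CentOS-7-x86_64-DVD-1810.iso (4.27 GB) with xftp.
Comparison: ptmalloc (the system default) performed worst. tcmalloc and jemalloc were roughly on par with each other, about 10% better than ptmalloc.
No statistics tool was used for this. Going by observed CPU usage, tcmalloc and jemalloc stayed at around 10% while ptmalloc was at 11.2%, a relative reduction of (11.2 - 10) / 11.2, or about 10.7%.
Any later optimization of auditd would need malloc wrapped in its own layer. The source calls malloc directly, so the change would be large.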
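As a rough idea of what such a wrapper layer could look like, the sketch below routes all allocations through a small table of function pointers, so the allocator can be swapped in one place. None of these names (struct audit_allocator, audit_set_allocator, audit_malloc, audit_free) exist in auditd. They are hypothetical, and the sketch ignores thread safety when switching allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table; not part of auditd. */
struct audit_allocator {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
};

/* Defaults to the libc allocator. */
static struct audit_allocator current_allocator = { malloc, free };

/* Install a different allocator. Incomplete tables are ignored.
 * Assumption: called once at startup, before any other thread allocates. */
void audit_set_allocator(const struct audit_allocator *a)
{
    if (a != NULL && a->alloc != NULL && a->release != NULL)
        current_allocator = *a;
}

/* Call sites would use these instead of malloc/free directly. */
void *audit_malloc(size_t size)
{
    return current_allocator.alloc(size);
}

void audit_free(void *ptr)
{
    if (ptr != NULL)
        current_allocator.release(ptr);
}
```

With this in place, each direct malloc/free call in the source would be replaced by audit_malloc/audit_free, which is where the bulk of the change would come from. LD_PRELOAD avoids that rewrite entirely, at the cost of replacing the allocator for the whole process.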