Configuration file
/etc/monitrc
Official configuration documentation
Reference for writing /etc/monitrc
https://mmonit.com/monit/documentation/monit.html
HTTP API
https://mmonit.com/documentation/http-api/Authentication
Custom configuration
check system $HOST
    if cpu usage > 95% for 5 times within 8 cycles then exec "/usr/bin/monitcpu"
        repeat every 2 cycles
    if memory usage > 90% for 5 times within 8 cycles then exec "/usr/bin/monitmem"
        repeat every 2 cycles
monit default log
/tmp/.monit.daemon.log: by default, monit saves the log of monitored events in this file
Sample content:
[CST May 21 11:48:52] error : 'LINUXHOST' cpu usage of 46.2% matches resource limit [cpu usage>15.0%]
[CST May 21 11:49:22] error : 'LINUXHOST' mem usage of 38.7% matches resource limit [mem usage>10.0%]
[CST May 21 11:49:22] error : 'LINUXHOST' cpu usage of 42.2% matches resource limit [cpu usage>15.0%]
[CST May 21 11:49:22] info : 'LINUXHOST' exec: '/usr/bin/monit_cpu'
[CST May 21 11:49:52] error : 'LINUXHOST' mem usage of 38.6% matches resource limit [mem usage>10.0%]
[CST May 21 11:49:52] info : 'LINUXHOST' exec: '/usr/bin/monit_memory'
Monitoring algorithm
Memory
Function:
Initializing the memory data
boolean_t init_process_info_sysdep(void) {
    ...
    FILE *f = fopen("/proc/meminfo", "r");
    if (f) {
        char line[STRLEN];
        systeminfo.mem_max = 0L;
        while (fgets(line, sizeof(line), f)) {
            if (sscanf(line, "MemTotal: %"PRIu64, &systeminfo.mem_max) == 1) {
                systeminfo.mem_max *= 1024;
                break;
            }
        }
        fclose(f);
        if (! systeminfo.mem_max)
            DEBUG("system statistic error -- cannot get real memory amount\n");
    } else {
        DEBUG("system statistic error -- cannot open /proc/meminfo\n");
    }
    ...
}
Collecting the memory usage data
boolean_t used_system_memory_sysdep(SystemInfo_T *si) {
    ...
    si->total_mem = systeminfo.mem_max - (uint64_t)(mem_free + buffers + cached + slabreclaimable) * 1024;
    ...
    si->swap_max = (uint64_t)swap_total * 1024;
    si->total_swap = (uint64_t)(swap_total - swap_free) * 1024;
}
Memory actually in use is computed as: mem_max - free - buffers - cached - slabreclaimable (slab memory that can be reclaimed)
Computing the usage percentage
boolean_t update_system_info() {
    ...
    if (! used_system_memory_sysdep(&systeminfo)) {
        LogError("'%s' statistic error -- memory usage data collection failed\n", Run.system->name);
        goto error2;
    }
    systeminfo.total_mem_percent = systeminfo.mem_max > 0ULL ? (100. * (double)systeminfo.total_mem / (double)systeminfo.mem_max) : 0.;
    systeminfo.total_swap_percent = systeminfo.swap_max > 0ULL ? (100. * (double)systeminfo.total_swap / (double)systeminfo.swap_max) : 0.;
    ...
}
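The used-memory formula and the percentage step above can be sketched as a small standalone C example. This is a minimal sketch and not monit's code: the helper names meminfo_field, used_memory_bytes and percent_used are hypothetical, and the input is assumed to be text in /proc/meminfo format with values in kB.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of monit): find "Key: <value> kB" in a
 * /proc/meminfo-style buffer. The key must start a line and be followed
 * directly by ':', so "Cached" does not match "SwapCached".
 * Returns 1 on success, 0 if the key is missing or unparsable. */
int meminfo_field(const char *text, const char *key, uint64_t *kb) {
    size_t keylen = strlen(key);
    for (const char *line = text; line && *line; ) {
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':')
            return sscanf(line + keylen + 1, "%" SCNu64, kb) == 1;
        line = strchr(line, '\n');   /* jump to the next line, if any */
        if (line)
            line++;
    }
    return 0;
}

/* Hypothetical helper: used = MemTotal - (MemFree + Buffers + Cached +
 * SReclaimable), converted from kB to bytes as in the formula above. */
int used_memory_bytes(const char *text, uint64_t *used, uint64_t *total) {
    uint64_t mem_total, mem_free, buffers, cached, sreclaimable;
    if (!meminfo_field(text, "MemTotal", &mem_total) ||
        !meminfo_field(text, "MemFree", &mem_free) ||
        !meminfo_field(text, "Buffers", &buffers) ||
        !meminfo_field(text, "Cached", &cached) ||
        !meminfo_field(text, "SReclaimable", &sreclaimable))
        return 0;
    uint64_t reclaimable = mem_free + buffers + cached + sreclaimable;
    if (reclaimable > mem_total)     /* guard against unsigned underflow */
        return 0;
    *total = mem_total * 1024;
    *used = (mem_total - reclaimable) * 1024;
    return 1;
}

/* Percentage with the same divide-by-zero guard as the excerpt above. */
double percent_used(uint64_t used, uint64_t total) {
    return total > 0 ? 100.0 * (double)used / (double)total : 0.0;
}
```

For example, with MemTotal 1000 kB, MemFree 100 kB, Buffers 50 kB, Cached 200 kB and SReclaimable 50 kB, the used amount is 600 kB, which is 60% of the total.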
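Monitoring requirement
The goal is to raise an alert when memory usage exceeds 90%, and to keep alerting while the condition persists.
So the configuration triggers the script when the problem is detected in 15 out of 20 check cycles: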
check system monitmem
    if memory usage > 90% for 15 times within 20 cycles then exec "/usr/bin/monitmem"
        repeat every 2 cycles
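"repeat every 2 cycles" means that once the condition has been triggered, /usr/bin/monitmem is executed again every two cycles for as long as the problem is still detected.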
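To make the "for 15 times within 20 cycles" wording concrete, the rule can be modelled as a sliding window over the most recent cycles. The following is an illustrative model only, assuming one pass/fail result per poll cycle; the Window type and the window_update function are hypothetical and are not taken from monit's source.

```c
#include <stdint.h>

/* Hypothetical model of an "N times within M cycles" rule. */
typedef struct {
    uint64_t history; /* bit i is 1 if the check failed i cycles ago */
    int count;        /* N: number of failed cycles required */
    int cycles;       /* M: window length in cycles (at most 64) */
} Window;

/* Record the result of one cycle and report whether the rule matches,
 * i.e. whether at least `count` of the last `cycles` results failed. */
int window_update(Window *w, int failed) {
    w->history = (w->history << 1) | (failed ? 1u : 0u);
    int hits = 0;
    for (int i = 0; i < w->cycles; i++)
        hits += (int)((w->history >> i) & 1u);
    return hits >= w->count;
}
```

With count = 15 and cycles = 20, starting from an empty history, the rule first matches on the 15th consecutive failed cycle and stops matching as soon as fewer than 15 of the last 20 cycles have failed.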
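In practice this is not ideal, because the alerts are too frequent.
What I actually want is one alert each time 15 out of 20 checks fail. I could not find a way to do this in monit's official documentation, so I came up with a workaround:
append the following commands at the end of the /usr/bin/monitmem script: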
monit unmonitor monitmem
sleep 30
monit monitor monitmem
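Checking with "monit status monitmem" shows that the status changes before and after "monit unmonitor monitmem" runs,
which indicates that the monit daemon starts monitoring monitmem afresh, so the 15-out-of-20 rule is evaluated again from the start.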