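Goal: I recently needed to query the number of business connections in a Postgres DB and send an SMS alert when it drops below a threshold.
Current setup: Nagios runs on its own server, and the business DB is on a separate DB server.
Approach: simple. Put a custom shell script on the business DB server and have Nagios call it periodically.
Steps: 1. On the DB server, write the shell script check_con.sh and place it under
~/nagios/libexec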
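#!/bin/sh
# Called through NRPE as: check_con.sh -c <critical> -w <warning> -p <name>
# so $2 is the critical threshold and $4 is the warning threshold.
SQL="select count(distinct tnm)
from my_bussiness_table
where register_datetime between now() - interval '2 hour' and now();"
# -A (unaligned) and -t (tuples only) make psql print only the number,
# so no temporary file or sed step is needed.
COUNT_C=$(su - postgres -c "psql -At -c \"$SQL\" XXdb")
# Anything that is not a plain non-negative integer is reported as UNKNOWN.
case "$COUNT_C" in
''|*[!0-9]*)
echo "UNKNOWN $COUNT_C"
exit 3
;;
esac
if [ "$COUNT_C" -le "$2" ];then
echo "Critical $COUNT_C"
exit 2
elif [ "$COUNT_C" -le "$4" ];then
echo "Warning $COUNT_C"
exit 1
else
echo "OK $COUNT_C"
exit 0
fi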
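The threshold comparison in check_con.sh can be tried without a database by isolating it in a small function. This is only a sketch: the name classify_count is made up for illustration, and it takes the count, the critical threshold, and the warning threshold as plain arguments instead of reading $2 and $4.

```shell
#!/bin/sh
# classify_count is a hypothetical helper, not part of Nagios or NRPE.
# Arguments: <count> <critical-threshold> <warning-threshold>
# Prints the status line and returns the matching plugin exit code.
classify_count() {
    count=$1
    crit=$2
    warn=$3
    # Reject empty or non-numeric input before the integer comparisons.
    case "$count" in
        ''|*[!0-9]*)
            echo "UNKNOWN $count"
            return 3
            ;;
    esac
    if [ "$count" -le "$crit" ]; then
        echo "Critical $count"
        return 2
    elif [ "$count" -le "$warn" ]; then
        echo "Warning $count"
        return 1
    fi
    echo "OK $count"
    return 0
}
```

Calling classify_count 1 0 1, for example, falls into the warning branch because the count is above the critical threshold but not above the warning threshold.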
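Then edit nrpe.cfg under ~/nagios/etc and add the command below. Because the command takes $ARG1$, NRPE also has to accept arguments: it must be built with command-argument support and nrpe.cfg needs dont_blame_nrpe=1.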
command[check_connect]=~/nagios/libexec/check_con.sh $ARG1$
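The script reads its thresholds from $2 and $4 even though NRPE passes only one macro. A rough sketch of why that works, assuming NRPE substitutes the text of $ARG1$ into the command line and the shell then splits it on whitespace (the variable names here are illustrative only):

```shell
#!/bin/sh
# The string below stands in for what NRPE substitutes for $ARG1$.
ARG1='-c 0 -w 1 -p db'
# Unquoted expansion splits the string into words, the same way the shell
# would when running: check_con.sh -c 0 -w 1 -p db
set -- $ARG1
CRIT=$2   # value following -c
WARN=$4   # value following -w
echo "critical=$CRIT warning=$WARN"
```

The mapping is purely positional, so the flags must be sent in the order -c, -w, -p for the script to pick up the right values.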
That completes step one.
Step two: log in to the Nagios server and edit its configuration.
1. Edit commands.cfg under /nagios/etc/objects
define command{
command_name check-connect
command_line $USER1$/check_connect -w $ARG1$ -c $ARG2$ -p $ARG3$
}
2. Edit server.cfg
define service{
use XXXX-service
host_name DB
service_description [proc] CONNECT
check_command check_nrpe!check_connect -a '-c 0 -w 1 -p db'
normal_check_interval 30
}
Finally, restart the services (nagios on the Nagios server, nrpe on the DB server) and you are done.
sudo systemctl restart nagios.service
sudo systemctl restart nrpe.service
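After the restart, the check can be exercised by hand from the Nagios server and its exit code compared against the plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is an illustration only: nagios_state_name is a made-up helper, and the check_nrpe path and host name in the usage comment are assumptions about this setup.

```shell
#!/bin/sh
# nagios_state_name is a hypothetical helper that maps a plugin exit code
# to the state name used by the Nagios plugin convention.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID" ;;   # outside the 0-3 range plugins should return
    esac
}

# Example usage on the Nagios server (path and host name are assumptions):
#   /nagios/libexec/check_nrpe -H DB -c check_connect -a '-c 0 -w 1 -p db'
#   nagios_state_name $?
```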