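Nagios series:
Nagios Monitoring Platform, Part 2: Monitoring Remote Linux Hosts with NRPE
When monitoring the local Linux host, we can simply edit the Nagios configuration files. When the host to be monitored is not the machine Nagios runs on, that is, a remote Linux host, we need the NRPE add-on.
NRPE architecture diagram: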
Steps on the remote host
Install dependencies:
# yum -y install openssl openssl-devel
Download Nagios Plugins and NRPE
# cd /tmp
# wget http://iweb.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz
# wget http://iweb.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.16/nagios-plugins-1.4.16.tar.gz
Create the nagios account
# useradd nagios
# passwd nagios
Install nagios-plugins
# cd /tmp
# tar xvfz nagios-plugins-1.4.16.tar.gz
# cd nagios-plugins-1.4.16
# export LDFLAGS=-ldl
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
# make
# make install
# chown nagios:nagios /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios/libexec/
Install NRPE
# cd /tmp
# tar xvfz nrpe-2.13.tar.gz
# cd nrpe-2.13
# ./configure
# make all
# make install-plugin
# make install-daemon
# make install-daemon-config
# yum install xinetd
# make install-xinetd
Configure NRPE to run as a daemon (under xinetd)
1. Edit /etc/xinetd.d/nrpe to allow the Nagios server to connect. For example, if the Nagios server's IP is 192.168.1.2:
only_from = 127.0.0.1 192.168.1.2
2. Append the following to the end of /etc/services:
nrpe 5666/tcp # NRPE
3. Start xinetd
# service xinetd restart
4. Verify that nrpe is listening
# netstat -at | grep nrpe
5. Test that nrpe is working
# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.13
6. Edit /usr/local/nagios/etc/nrpe.cfg
nrpe.cfg contains the commands used to monitor this remote host. My configuration is shown below:
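# Logged-in users
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
# CPU load
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
# Disk space (thresholds given without % are absolute free space, in MB by default; write 20% / 10% for percentages)
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sda
# Zombie processes
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
# Total processes
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
# Physical memory (custom script, listed below; it requires -w and -c as percent of used memory, and 80/90 are example thresholds)
command[check_mem]=/usr/local/nagios/libexec/check_mem -w 80 -c 90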
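After editing nrpe.cfg it can help to confirm which command names the daemon will actually accept, since the Nagios server refers to them by name. The following is a minimal sketch, not part of the original setup: the function name `list_nrpe_commands` and the scratch file are illustrative assumptions, and it only relies on the `command[name]=...` line format shown above.

```shell
#!/bin/bash
# Print the name of every command[...] entry defined in an nrpe.cfg-style file.
# list_nrpe_commands is a hypothetical helper, not something NRPE ships.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demo on a scratch file; pass /usr/local/nagios/etc/nrpe.cfg on a real host.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
# Logged-in users
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF
list_nrpe_commands "$demo_cfg"
rm -f "$demo_cfg"
```

Comment lines are skipped because they do not start with `command[`, so only real definitions are listed.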
Physical memory check script, /usr/local/nagios/libexec/check_mem:
#!/bin/bash
# check memory script
# by Barlow
# 2014-06-13
help() {
    echo "Usage: `basename $0` -w <warn%> -c <crit%>"
    echo "-w is WARNING % of used mem;-c is CRITICAL % of used mem!"
    exit 3
}
# Total memory
TOTAL=`free -m | head -2 |tail -1 |gawk '{print $2}'`
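# check memory
# (row 3 of `free -m` is the "-/+ buffers/cache" row on older procps;
# newer versions of free no longer print that row)
FREE=`free -m | head -3 |tail -1 |gawk '{print $4}'`
USED=`free -m | head -3 |tail -1 |gawk '{print $3}'`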
# to calculate free percent
# use the expression free * 100 / total
FREETMP=`expr $FREE \* 100`
USEDTMP=`expr $USED \* 100`
FREE_PERCENT=`expr $FREETMP / $TOTAL`
USED_PERCENT=`expr $USEDTMP / $TOTAL`
if [ $# -le 3 ];then
    help
elif ! [ $1 == "-w" ]&>/dev/null;then
    help
elif ! [ $3 == "-c" ]&>/dev/null;then
    help
fi
WARNIFNUM() {
    if ! [ "$WARN" == "$OPTARG" ];then
        help
    fi
}
CRITIFNUM() {
    if ! [ "$CRIT" == "$OPTARG" ];then
        help
    fi
}
while getopts "w:c:h" OPT; do
    case $OPT in
    "w")
        WARNTMP=$OPTARG
        WARN=$(echo $WARNTMP |bc 2>/dev/null)
        if ! [ "$WARN" == "$WARNTMP" ];then
            help
        fi
        ;;
    "c")
        CRITTMP=$OPTARG
        CRIT=$(echo $CRITTMP |bc 2>/dev/null)
        if ! [ "$CRIT" == "$CRITTMP" ];then
            help
        fi
        ;;
    "h")
        help;;
    esac
done
CRIT_LEVEL=`expr $TOTAL \* $CRIT \/ 100`
WARN_LEVEL=`expr $TOTAL \* $WARN \/ 100`
if [ $USED_PERCENT -gt $CRIT ];then
    echo "CRITICAL! Used Memory $USED MB ($USED_PERCENT%,Total=$TOTAL MB) | 'USED MEM'=${USED}MB;$WARN_LEVEL;$CRIT_LEVEL;0;$TOTAL"
    exit 2
fi
if [ $USED_PERCENT -gt $WARN ];then
    echo "WARNING! Used Memory $USED MB ($USED_PERCENT%,Total=$TOTAL MB) | 'USED MEM'=${USED}MB;$WARN_LEVEL;$CRIT_LEVEL;0;$TOTAL"
    exit 1
else
    echo "OK! Used Memory $USED MB ($USED_PERCENT%,Total=$TOTAL MB) | 'USED MEM'=${USED}MB;$WARN_LEVEL;$CRIT_LEVEL;0;$TOTAL"
    exit 0
fi
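The script above reports its result through its exit status, following the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The threshold logic can be isolated and tried without touching real memory figures. This is a rough sketch only: `classify_usage` is a hypothetical name, and the 80/90 thresholds and sample percentages are made-up inputs.

```shell
#!/bin/bash
# Classify a used-memory percentage against warning/critical thresholds,
# returning the matching Nagios plugin exit code.
# classify_usage is an illustrative helper, not part of check_mem itself.
classify_usage() {
    local used="$1" warn="$2" crit="$3"
    if [ "$used" -gt "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$used" -gt "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# Sample inputs (assumed values) with -w 80 -c 90.
for pct in 50 85 95; do
    if state=$(classify_usage "$pct" 80 90); then rc=0; else rc=$?; fi
    echo "used=${pct}% -> $state (exit code $rc)"
done
```

As in the original script, the comparison is strictly greater-than, so a value exactly equal to a threshold stays in the lower state.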
Steps on the Nagios server
Download and install NRPE
# cd /tmp
# wget http://iweb.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz
# tar xvfz nrpe-2.13.tar.gz
# cd nrpe-2.13
# ./configure
# make all
# make install-plugin
Test that it works:
# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.3
NRPE v2.13
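When several remote hosts are involved, the same connectivity test can be wrapped in a small function. The sketch below is one possible shape for it, under stated assumptions: `probe_nrpe` is a hypothetical helper, and it relies only on check_nrpe exiting with status 0 when the daemon answers with its version string, as in the test above.

```shell
#!/bin/bash
# Probe one host with check_nrpe and report whether the NRPE daemon answered.
# probe_nrpe is an illustrative wrapper; the first argument is the check_nrpe binary.
probe_nrpe() {
    local check_nrpe="$1" host="$2" output
    if output=$("$check_nrpe" -H "$host" 2>&1); then
        echo "$host: OK ($output)"
    else
        echo "$host: FAILED ($output)"
        return 1
    fi
}

# Path and address are the ones used in this article; adjust as needed.
CHECK_NRPE="/usr/local/nagios/libexec/check_nrpe"
if [ -x "$CHECK_NRPE" ]; then
    probe_nrpe "$CHECK_NRPE" 192.168.1.3 || true
else
    echo "check_nrpe not found at $CHECK_NRPE; skipping probe"
fi
```

A failure here usually points at the only_from setting in /etc/xinetd.d/nrpe or at a firewall blocking port 5666 on the remote host.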
Define the host and services for the remote host
1. Define the check_nrpe command
Append the following to /usr/local/nagios/etc/objects/commands.cfg:
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
        }
2. Create /usr/local/nagios/etc/objects/host.cfg (it must already be declared in nagios.cfg)
Example host definition:
define host{
        use             linux-server
        host_name       remotehost
        address         192.168.1.3
        }
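3. Create the services: vi /usr/local/nagios/etc/objects/services.cfg (it must already be declared in nagios.cfg)
For example, a service that monitors disk space on the remote host (other services use the same syntax):
define service{
        use                     generic-service
        host_name               remotehost
        service_description     sda disk space
        check_command           check_nrpe!check_disk
        }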
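To make the link between the service and the command definition concrete: in `check_nrpe!check_disk`, the part after the `!` becomes `$ARG1$`, and `$HOSTADDRESS$` comes from the host's `address`. The following sketch mimics that substitution in plain shell. It is an illustration only: `expand_check_nrpe` is a hypothetical function, and `$USER1$` is assumed to be the plugin directory used throughout this article.

```shell
#!/bin/bash
# Illustrate how Nagios turns "check_nrpe!check_disk" into a command line.
# USER1 is assumed to point at the plugin directory used in this article.
USER1="/usr/local/nagios/libexec"

expand_check_nrpe() {
    local hostaddress="$1" check_command="$2"
    # With a single argument, the text after the "!" is $ARG1$.
    local arg1="${check_command#*!}"
    echo "$USER1/check_nrpe -H $hostaddress -t 30 -c $arg1"
}

expand_check_nrpe 192.168.1.3 'check_nrpe!check_disk'
```

The resulting command line can be run by hand on the Nagios server to debug a service before Nagios schedules it.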
Then reload the Nagios configuration so the changes take effect
# service nagios reload
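The host.cfg and services.cfg files are only read if nagios.cfg lists them with `cfg_file=` lines. A minimal sketch of declaring them without creating duplicates follows. `ensure_cfg_file` is a hypothetical helper, and the script defaults to a scratch file so it is safe to try; setting `NAGIOS_CFG` to the real path is left to the reader.

```shell
#!/bin/bash
# Add a cfg_file= line to a nagios.cfg-style file unless it is already there.
# ensure_cfg_file is an illustrative helper, not a Nagios tool.
ensure_cfg_file() {
    local main_cfg="$1" object_cfg="$2"
    if grep -Fxq "cfg_file=$object_cfg" "$main_cfg"; then
        echo "already declared: $object_cfg"
    else
        echo "cfg_file=$object_cfg" >> "$main_cfg"
        echo "declared: $object_cfg"
    fi
}

# Defaults to a scratch file; set NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg for real use.
NAGIOS_CFG="${NAGIOS_CFG:-$(mktemp)}"
ensure_cfg_file "$NAGIOS_CFG" /usr/local/nagios/etc/objects/host.cfg
ensure_cfg_file "$NAGIOS_CFG" /usr/local/nagios/etc/objects/services.cfg

# On the real server, the configuration can be validated before a reload with:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

Running the validation first means a typo in a new object file is reported as an error instead of being discovered during the reload.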