B. Nagios client (monitored host)
1. Prepare the software packages
- wget http://syslab.comsenz.com/downloads/linux/nagios-plugins-1.4.13.tar.gz
- wget http://syslab.comsenz.com/downloads/linux/nrpe-2.12.tar.gz
2. Add the nagios account and prepare the installation directory
- useradd -s /sbin/nologin nagios
3. Compile and install NRPE
- tar -xzvf nrpe-2.12.tar.gz
- cd nrpe-2.12
- ./configure
- make all
- make install-plugin          # installs the check_nrpe plugin
- make install-daemon          # installs the nrpe daemon
- make install-daemon-config   # installs the sample nrpe.cfg
4. Install nagios-plugins
- cd ..
- tar -xzvf nagios-plugins-1.4.13.tar.gz
- cd nagios-plugins-1.4.13
- ./configure
- make && make install
- chown -R nagios:nagios /usr/local/nagios/
Verify the installation: the plugin files should now be present in this directory.
- ls /usr/local/nagios/libexec/
5. Configure NRPE
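- vim /usr/local/nagios/etc/nrpe.cfg
- Change "allowed_hosts=127.0.0.1" to "allowed_hosts=127.0.0.1,192.168.188.148"; the added IP is that of the Nagios server.
- Change "dont_blame_nrpe=0" to "dont_blame_nrpe=1" so that clients may pass arguments to commands (according to the comments in nrpe.cfg, this only takes effect if NRPE was configured with --enable-command-args).
6. Save the following NRPE start/stop script as /etc/init.d/nrpe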
- #!/bin/bash
- #
- # chkconfig: 2345 55 25
- # description: NRPE Daemon
- #
- # Source the function library (provides success, failure, killproc and status)
- . /etc/rc.d/init.d/functions
- RETVAL=0
- prog='nrpe'
- NRPE_CFG='/usr/local/nagios/etc/nrpe.cfg'
- NRPE_PRG='/usr/local/nagios/bin/nrpe'
- NRPE_OPT='-d'
- PID_FILE='/var/run/nrpe.pid'
- start()
- {
- echo -n $"Starting $prog: "
- [ -f "$PID_FILE" ] && rm -f "$PID_FILE"
- "$NRPE_PRG" -c "$NRPE_CFG" $NRPE_OPT
- # With -d, nrpe detaches into the background; record the daemon's PID
- pid=$(pgrep -f "$NRPE_PRG" | head -n 1)
- if [ -n "$pid" ] ; then
- echo "$pid" > "$PID_FILE"
- RETVAL=0
- success
- else
- RETVAL=1
- failure
- fi
- echo
- }
- stop()
- {
- echo -n $"Stopping $prog: "
- # killproc sends SIGTERM to the PID in $PID_FILE (escalating to SIGKILL if it does not exit) and prints OK/FAILED
- killproc -p "$PID_FILE" "$NRPE_PRG"
- RETVAL=$?
- echo
- rm -f "$PID_FILE"
- }
- case "$1" in
- start)
- start
- ;;
- stop)
- stop
- ;;
- restart)
- stop
- start
- ;;
- status)
- status -p "$PID_FILE" "$prog"
- RETVAL=$?
- ;;
- *)
- echo $"Usage: $0 {start|stop|restart|status}"
- RETVAL=1
- esac
- exit $RETVAL
7. Start NRPE
- chmod +x /etc/init.d/nrpe
- /etc/init.d/nrpe start
C. Adding the monitored host on the Nagios server
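1. Create a directory for the monitored hosts' configuration files
- mkdir /opt/hadoop/nagios/etc/servers
- vim /opt/hadoop/nagios/etc/nagios.cfg and append the line cfg_dir=/opt/hadoop/nagios/etc/servers
2. Add the configuration for the monitored host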
- vim /opt/hadoop/nagios/etc/servers/192.168.188.148.cfg
- define host{
- use linux-server
- host_name 192.168.188.148
- alias 192.168.188.148
- address 192.168.188.148
- }
- define service{
- use generic-service
- host_name 192.168.188.148
- service_description check_ping
- check_command check_ping!100.0,20%!200.0,50%
- max_check_attempts 5
- normal_check_interval 1
- }
- define service{
- use generic-service
- host_name 192.168.188.148
- service_description check_ssh
- check_command check_ssh
- max_check_attempts 5
- normal_check_interval 1
- }
3. Reload the Nagios server so the configuration takes effect
After Nagios is reloaded, the newly monitored host appears in the Nagios web interface.
- service nagios reload
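When more machines are added, writing each file in the servers directory by hand becomes repetitive. The sketch below generates a file containing the same host, check_ping and check_ssh definitions as above. The function name `gen_host_cfg` and its two arguments (host IP and target directory) are hypothetical; they are not part of Nagios or of the original setup.

```bash
#!/bin/bash
# Hypothetical helper (not part of Nagios): write <dir>/<ip>.cfg containing the
# host definition plus the check_ping and check_ssh services used above.
#   $1 = IP address of the monitored host
#   $2 = directory listed in cfg_dir (e.g. /opt/hadoop/nagios/etc/servers)
# Prints the path of the file it wrote. An existing file is overwritten.
gen_host_cfg() {
    local ip="$1"
    local dir="$2"
    local file="$dir/$ip.cfg"

    # Unquoted EOF so that $ip is expanded inside the here-document
    cat > "$file" <<EOF
define host{
        use                     linux-server
        host_name               $ip
        alias                   $ip
        address                 $ip
        }
define service{
        use                     generic-service
        host_name               $ip
        service_description     check_ping
        check_command           check_ping!100.0,20%!200.0,50%
        max_check_attempts      5
        normal_check_interval   1
        }
define service{
        use                     generic-service
        host_name               $ip
        service_description     check_ssh
        check_command           check_ssh
        max_check_attempts      5
        normal_check_interval   1
        }
EOF
    echo "$file"
}
```

For example, `gen_host_cfg 192.168.188.148 /opt/hadoop/nagios/etc/servers` followed by `service nagios reload`. Because the file is overwritten, services added to it later (such as the NRPE checks) would be lost, so this only suits freshly added hosts. Before reloading, running the Nagios binary with `-v` and the path of nagios.cfg (presumably `/opt/hadoop/nagios/bin/nagios -v /opt/hadoop/nagios/etc/nagios.cfg`, assuming the default bin/ layout under that prefix) checks the configuration for errors.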
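4. Add checks that use NRPE
- Add the following lines to /opt/hadoop/nagios/etc/objects/commands.cfg:
- define command{
- command_name check_nrpe
- command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
- }
Then add the following service to the monitored host's configuration file (e.g. /opt/hadoop/nagios/etc/servers/192.168.188.148.cfg), and make sure the NRPE service is running on the monitored host. check_load is one of the commands predefined in the sample nrpe.cfg.
- define service{
- use generic-service
- host_name 192.168.188.148
- service_description check_load
- check_command check_nrpe!check_load
- max_check_attempts 5
- normal_check_interval 1
- }
Reload Nagios to apply the configuration:
- service nagios reload
5. Custom monitoring script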
Write the script check_diskmount.sh on the monitored host:
- vim /usr/local/nagios/libexec/check_diskmount.sh
- #!/bin/bash
- # Count the mount entries that contain "/disk"
- num=$(grep -c '/disk' /proc/mounts)
- if [ "$num" -eq 12 ] ; then
- echo "OK - mount disk is $num"
- exit 0
- else
- # Nagios plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
- echo "CRITICAL - mount disk is $num"
- exit 2
- fi
Make it executable:
- chmod +x /usr/local/nagios/libexec/check_diskmount.sh
Register the script in the monitored host's NRPE configuration:
- vim /usr/local/nagios/etc/nrpe.cfg
- command[check_diskmount]=/usr/local/nagios/libexec/check_diskmount.sh
Restart NRPE:
- /etc/init.d/nrpe restart
Add the service on the Nagios server:
- vim /opt/hadoop/nagios/etc/servers/192.168.188.148.cfg
- define service{
- use generic-service
- host_name 192.168.188.148
- service_description check_diskmount
- check_command check_nrpe!check_diskmount
- max_check_attempts 3
- normal_check_interval 1
- }
Reload Nagios so the configuration takes effect:
- service nagios reload
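The script above hard-codes 12 disks and counts any line of /proc/mounts that contains "/disk". A slightly more reusable sketch is shown below. It assumes the data disks are mounted at paths beginning with /disk (so it checks only the mount-point field, which is stricter than the original grep), takes the expected count and the mounts file as optional arguments, and reports UNKNOWN when the file cannot be read. The function name and argument order are illustrative, not from the original setup.

```bash
#!/bin/bash
# Hypothetical parameterized variant of check_diskmount.sh.
#   $1 = expected number of /disk mounts (default 12, as in the original)
#   $2 = mounts table to read (default /proc/mounts; overridable for testing)
# Return codes follow the Nagios plugin convention: 0=OK, 2=CRITICAL, 3=UNKNOWN.
check_diskmount() {
    local expected="${1:-12}"
    local mounts="${2:-/proc/mounts}"

    if [ ! -r "$mounts" ]; then
        echo "UNKNOWN - cannot read $mounts"
        return 3
    fi

    # Field 2 of each /proc/mounts line is the mount point; count those under /disk*
    local num
    num=$(awk '$2 ~ /^\/disk/ {n++} END {print n+0}' "$mounts")

    if [ "$num" -eq "$expected" ]; then
        echo "OK - mount disk is $num"
        return 0
    else
        echo "CRITICAL - mount disk is $num (expected $expected)"
        return 2
    fi
}
```

Saved as a plugin file, the script would end with `check_diskmount "$@"` followed by `exit $?`, so that NRPE receives the status code. The `command[check_diskmount]=` line in nrpe.cfg can then pass a fixed count after the script path; arguments written directly in nrpe.cfg do not depend on dont_blame_nrpe.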