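At work we often need to put newly deployed machines under monitoring, and adding 100 new machines by hand is pure grunt work. The efficient approach is to write a script that adds them in bulk. This post walks through batch-adding hosts and services to Nagios.
First, group the machines to be monitored. Taking 5 machines as an example, create a host_service list with space-separated columns. The first column is the functional group, which describes the machine's main role; the second column is the IP; the third and following columns are the services to monitor. The list looks like this: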
cat nag_host_serv.txt
db 192.168.151.141 load cpu_idle disk disk_io ssh mysql
web 192.168.151.40 load cpu_idle disk disk_io ssh http
ad 192.168.151.2 load cpu_idle disk disk_io ssh
ad 192.168.151.23 load cpu_idle disk disk_io ssh
mcache 192.168.151.138 cpu_idle disk_io memcache_hits
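Every later step parses this list, so it can be worth sanity-checking the file before generating any configuration. The following is a minimal sketch and not part of the original script: the function name validate_host_list is hypothetical, and it only checks that each line has at least three fields and that the second field looks like a dotted-quad IPv4 address.

```shell
#!/bin/sh
# Hypothetical helper: validate the "<group> <ip> <service>..." list format.
# Prints one message per bad line and returns non-zero if any were found.
validate_host_list() {
    file=$1
    lineno=0
    bad=0
    while read -r group ip services; do
        lineno=$((lineno + 1))
        # skip blank lines
        [ -z "$group" ] && continue
        if [ -z "$ip" ] || [ -z "$services" ]; then
            echo "line ${lineno}: expected <group> <ip> <service>..."
            bad=1
        elif ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
            echo "line ${lineno}: '${ip}' does not look like an IPv4 address"
            bad=1
        fi
    done < "$file"
    return $bad
}
```

A call such as `validate_host_list nag_host_serv.txt && echo "list looks fine"` could be run before the batch script.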
Next comes the monitoring script. Before writing it, edit templates.cfg and add the templates the script will rely on.
more /usr/local/nagios/etc/objects/templates.cfg
#monitor for all etnet linux hosts
define host{
name etnet-host
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period workhours
notification_interval 120
notification_options d,u,r
contact_groups admins
register 0
}
#monitor for all etnet services
define service{
name etnet-service
use generic-service
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
notification_period workhours
notification_interval 120
notification_options w,u,c,r,f
contact_groups admins
register 0
}
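The above adds one host template and one service template. You can name them whatever you like, here etnet-host and etnet-service, but each template should inherit from the default generic-host or generic-service through the use directive.
Next, edit commands.cfg and add a check_nrpe command, because the remote machines are monitored through NRPE.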
tail -4 commands.cfg
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
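Note that every macro must be wrapped in $ on both sides (for example $USER1$, $HOSTADDRESS$ and $ARG1$); otherwise the service check will not work.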
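To make the macro substitution concrete, here is a small sketch of how a check_command such as check_nrpe!check_load resolves into a command line. It is an illustration only: the function name expand_check_nrpe is hypothetical, and the value used for $USER1$ assumes it points at /usr/local/nagios/libexec, as the plugin paths elsewhere in this post suggest.

```shell
#!/bin/sh
# Hypothetical helper: show the command line that results from
# check_command "check_nrpe!<arg>" for a given host address.
expand_check_nrpe() {
    user1=$1           # value of $USER1$ (assumed plugin directory)
    hostaddress=$2     # value of $HOSTADDRESS$
    check_command=$3   # e.g. check_nrpe!check_load
    rest=${check_command#*!}   # everything after the first "!"
    arg1=${rest%%!*}           # the first argument becomes $ARG1$
    echo "${user1}/check_nrpe -H ${hostaddress} -c ${arg1}"
}

expand_check_nrpe /usr/local/nagios/libexec 192.168.151.141 'check_nrpe!check_load'
# prints: /usr/local/nagios/libexec/check_nrpe -H 192.168.151.141 -c check_load
```

Running the printed command by hand on the Nagios server is a quick way to confirm that NRPE answers before the check is wired into the configuration.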
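Now on to monitoring the services. NRPE uses a client/server architecture, so first configure the NRPE daemon, which runs on the monitored machine.
Installing nagios-plugins and nrpe was already covered in http://blog.itpub.net/27181165/viewspace-775807/, so it is not repeated here. Below is the nrpe configuration: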
cat etc/nrpe.cfg |grep -v \#|sed '/^$/d'
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.151.133
dont_blame_nrpe=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/VolGroup02-LogVol00
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_ssh]=/usr/local/nagios/libexec/check_ssh -H 192.168.151.23 -p 22 -t 10
command[check_cpu_idle]=/usr/local/nagios/libexec/check_cpu_idle
command[check_disk_io]=/usr/local/nagios/libexec/check_disk_io
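allowed_hosts lists the hosts that are allowed to query this NRPE daemon, and each command[...] line defines one check that can be requested. Because the batch script later in this post generates check_nrpe!check_<service>, every service listed for a host needs a matching command[check_<service>] entry in that host's nrpe.cfg. The Nagios libexec directory ships with many plugins; if those are not enough you can write your own, which will be covered in a later post.
Once the nrpe daemon is running, Nagios can collect service data through it. nrpe can be started through xinetd or from an init script. The init script below was shared by another user and is used here as-is.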
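Since each monitored service depends on a command[check_<service>] line existing on the remote host, a quick comparison between the service list and nrpe.cfg can catch gaps early. The sketch below is an illustration under that assumption; the function name missing_nrpe_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every check_<service> name that has no
# matching "command[check_<service>]=" line in the given nrpe.cfg.
missing_nrpe_commands() {
    cfg=$1
    shift
    for svc in "$@"; do
        if ! grep -q "^command\[check_${svc}\]=" "$cfg"; then
            echo "check_${svc}"
        fi
    done
}
```

For example, `missing_nrpe_commands /usr/local/nagios/etc/nrpe.cfg load cpu_idle disk disk_io ssh mysql` would list the checks still to be defined on a db host.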
[root@localhost nagios]# cat /etc/init.d/nrpe
#!/bin/sh
#
# Source function library
if [ -f /etc/rc.d/init.d/functions ]; then
. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
. /etc/init.d/functions
elif [ -f /etc/rc.d/functions ]; then
. /etc/rc.d/functions
fi
# Source networking configuration.
. /etc/sysconfig/network
# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0
NrpeBin=/usr/local/nagios/bin/nrpe
NrpeCfg=/usr/local/nagios/etc/nrpe.cfg
LockFile=/var/lock/subsys/nrpe
# See how we were called.
case "$1" in
start)
# Start daemons.
echo -n "Starting nrpe: "
daemon $NrpeBin -c $NrpeCfg -d
echo
touch $LockFile
;;
stop)
# Stop daemons.
echo -n "Shutting down nrpe: "
killproc nrpe
echo
rm -f $LockFile
;;
restart)
$0 stop
$0 start
;;
status)
status nrpe
;;
*)
echo "Usage: nrpe {start|stop|restart|status}"
exit 1
esac
exit 0
Finally, the main topic: the batch-add script.
[root@localhost scripts]# cat nag_host_serv.sh
#!/bin/bash
#Author: Andy
#Time: 20140307
NAGIOS_OBJ=/usr/local/nagios/etc/objects
NAGIOS_CONF=/usr/local/nagios/etc
function add_host_serv()
{
while read groups ip services
do
#monitor linux host,etnet-host must be defined in templates.cfg
#add the host definition
cat >>${NAGIOS_OBJ}/${groups}.cfg <<EOF
#monitor host ${ip}
define host{
use etnet-host
host_name ${groups}-${ip}
alias ${ip}
address ${ip}
}
EOF
#add the service definitions
#monitor services,etnet-service must be defined in templates.cfg
for ser in ${services};do
#monitor linux services
cat >>${NAGIOS_OBJ}/${groups}.cfg <<EOF
#monitor service ${ser}
define service{
use etnet-service
host_name ${groups}-${ip}
service_description check_${ser}
check_command check_nrpe!check_${ser}
}
EOF
done
done< nag_host_serv.txt
}
#register the generated cfg files in nagios.cfg
function config_edit()
{
#append cfg files to nagios.cfg
for group in `cut -d " " -f 1 nag_host_serv.txt|sort -u`;do
resul=`cat ${NAGIOS_CONF}/nagios.cfg|grep -Ew "${group}.cfg"`
if [ -z "${resul}" ];then
sed -i "/templates.cfg/a cfg_file=${NAGIOS_OBJ}/${group}.cfg" ${NAGIOS_CONF}/nagios.cfg
fi
done
}
#==============================
add_host_serv
config_edit
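The script above appends with >>, so running it twice against the same list writes each definition twice. One way to reduce surprises is to preview the output first. The following is a sketch of a dry-run variant, not the original author's code: the function names render_definitions and preview_list are hypothetical, and the generated text mirrors the definitions produced by add_host_serv.

```shell
#!/bin/sh
# Hypothetical dry-run variant: print the definitions for one list line
# to stdout instead of appending them to ${group}.cfg.
render_definitions() {
    group=$1
    ip=$2
    shift 2
    cat <<EOF
define host{
use etnet-host
host_name ${group}-${ip}
alias ${ip}
address ${ip}
}
EOF
    for ser in "$@"; do
        cat <<EOF
define service{
use etnet-service
host_name ${group}-${ip}
service_description check_${ser}
check_command check_nrpe!check_${ser}
}
EOF
    done
}

# Preview every line of a list file without touching the Nagios config.
preview_list() {
    while read -r group ip services; do
        [ -z "$group" ] && continue
        # word splitting of $services is intentional: one argument per service
        render_definitions "$group" "$ip" $services
    done < "$1"
}
```

`preview_list nag_host_serv.txt | less` shows what would be written. After a real run, the configuration can be checked with `nagios -v` before reloading, for example `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` (the binary path is an assumption based on the install prefix used in this post).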
Source: ITPUB blog, http://blog.itpub.net/27181165/viewspace-1107510/. Please credit the source when reposting; otherwise legal liability may be pursued.