Nagios Monitoring Deployment (Part 3)

hzhuoquan


9. Define the items to monitor (also called services): create services.cfg

vi /usr/local/nagios/etc/services.cfg

# Check whether the host is alive
define service{
    host_name               nagios-server
    service_description     check-host-alive
    check_command           check-host-alive
    max_check_attempts      5
    normal_check_interval   5
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          sagroup
}

# Check the host's web service
define service{
    host_name               nagios-server
    service_description     check_tcp 80
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          sagroup
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_tcp!80
}

# Check the host's CPU load
define service{
    host_name               nagios-server
    service_description     cpu load
    check_command           check_nrpe!check_load
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          sagroup
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
}

# Check the number of processes on the host
define service{
    host_name               nagios-server
    service_description     total-procs
    check_command           check_nrpe!check_total_procs
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          sagroup
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
}

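Notes:

host_name: must be a host defined in the host configuration file hosts.cfg.

check_command: the command name must be defined in commands.cfg; for checks run through check_nrpe, the part after "!" is a command defined in nrpe.cfg on the monitored host.

max_check_attempts: the maximum number of retries; about 4 is typical.

normal_check_interval and retry_check_interval: the check intervals, in minutes.

notification_interval: how often an alert is re-sent once a fault has been detected, in minutes.

notification_options: the same option letters as in the contact configuration file.

contact_groups: a group name defined in contactgroup.cfg.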
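To make the interaction between max_check_attempts and retry_check_interval concrete, here is a small sketch of the arithmetic. It assumes the usual behaviour where, after the first failed check, the service is rechecked every retry_check_interval minutes until max_check_attempts is reached and only then are notifications sent. The function name minutes_to_hard_state is made up for this illustration.

```shell
#!/bin/sh
# Minutes between the first failed check and the point where notifications
# start, assuming one recheck every retry_check_interval minutes.
# $1 = max_check_attempts, $2 = retry_check_interval (minutes)
minutes_to_hard_state() {
    echo $(( ($1 - 1) * $2 ))
}

# Values taken from the service definitions above
minutes_to_hard_state 5 2   # check-host-alive: prints 8
minutes_to_hard_state 4 2   # the other three services: prints 6
```

Under that assumption, lowering retry_check_interval shortens the delay before the first alert, while lowering max_check_attempts makes a single flapping check more likely to page someone.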

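Note: the command named in check_command must be defined in commands.cfg.

To monitor other hosts, copy the definitions above and change the relevant options.

Add monitoring items for the following two servers in this way:

win2003 and linux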
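Rather than copying blocks by hand, the definitions can be generated. The sketch below prints a service definition for each extra host, reusing the timing and notification values from the nagios-server definitions above. The function name gen_service is hypothetical, and only the check_tcp!80 check is generated because it does not depend on an agent being installed on the target host.

```shell
#!/bin/sh
# Print one service definition. Timing and notification values are copied
# from the nagios-server definitions in this article.
# $1 = host_name, $2 = service_description, $3 = check_command
gen_service() {
    cat <<EOF
define service{
    host_name               $1
    service_description     $2
    check_command           $3
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          sagroup
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
}

EOF
}

# The host names must already be defined in hosts.cfg.
for h in win2003 linux; do
    gen_service "$h" "check_tcp 80" 'check_tcp!80'
done
```

The output goes to standard output, so it can be reviewed first and then appended to /usr/local/nagios/etc/services.cfg.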

IV. Install nrpe

tar xvf nrpe-2.12.tar.gz

cd nrpe-2.12

./configure --prefix=/usr/local/nrpe

make

make install

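# Copy files. After installation, /usr/local/nrpe/libexec contains only check_nrpe, and nagios/libexec does not have it. In addition, the commands defined by default in nrpe.cfg point to plugins under /usr/local/nrpe/libexec, so those plugins have to be copied there as well. If they are not copied, the command paths in nrpe.cfg must be changed instead; otherwise the check_command entries in services.cfg fail with a command-not-found error. Copy the following files: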

cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec

cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec

cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec

cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec

cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec

cp /usr/local/nagios/libexec/check_users /usr/local/nrpe/libexec
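The five plugin copies above can also be written as a loop that reports a missing plugin instead of failing halfway. This is only a sketch; the function name copy_plugins is invented for the example.

```shell
#!/bin/sh
# Copy the named plugins from one libexec directory to another and report
# any plugin that is missing or not executable in the source directory.
# $1 = source directory, $2 = destination directory, rest = plugin names
copy_plugins() {
    src=$1; dst=$2; shift 2
    for p in "$@"; do
        if [ -x "$src/$p" ]; then
            cp -p "$src/$p" "$dst/" && echo "copied $p"
        else
            echo "missing $src/$p" >&2
        fi
    done
}

# Usage (as root), with the paths from this article:
# copy_plugins /usr/local/nagios/libexec /usr/local/nrpe/libexec \
#     check_disk check_load check_ping check_procs check_users
```

Using cp -p keeps the execute permission and timestamps of the original plugin files.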

# Edit the nrpe configuration file; only the changed lines are shown

vi /usr/local/nrpe/etc/nrpe.cfg

# Address to bind to; used when nrpe runs as a standalone daemon
server_address=192.168.0.10
# Hosts allowed to connect: the nagios monitoring server
allowed_hosts=127.0.0.1,192.168.0.10
command[check_users]=/usr/local/nrpe/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nrpe/libexec/check_load -w 15,10,5 -c 30,25,20
# Commented out:
#command[check_hda1]=/usr/local/nrpe/libexec/check_disk -w 20 -c 10 -p /dev/hda1
# Added: check disk usage on all filesystems. Without a % sign, check_disk reads
# the thresholds as free space in its default unit (MB); write 20% and 10% for
# percentage thresholds.
command[check_df]=/usr/local/nrpe/libexec/check_disk -w 20 -c 10
command[check_zombie_procs]=/usr/local/nrpe/libexec/check_procs -w 5 -c 10 -s z
command[check_total_procs]=/usr/local/nrpe/libexec/check_procs -w 150 -c 200
# Number of IP connections
command[check_ips]=/usr/local/nrpe/libexec/ip_conn.sh 8000 10000
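Since a wrong plugin path is the most common cause of a "command not found" result, it can help to check every command[...] line before restarting nrpe. The following sketch lists each active command in an nrpe.cfg-style file and reports whether the plugin it points to exists and is executable. The function name check_nrpe_commands is hypothetical.

```shell
#!/bin/sh
# List every active command[...] entry in an nrpe.cfg-style file and report
# whether the plugin path after "=" exists and is executable.
# $1 = path to the configuration file
check_nrpe_commands() {
    # Keep only lines of the form "command[name]=/path args" (commented-out
    # lines start with "#" and are skipped) and emit "name path".
    sed -n 's/^command\[\([^]]*\)]=\([^ ]*\).*/\1 \2/p' "$1" |
    while read -r name path; do
        if [ -x "$path" ]; then
            echo "ok      $name -> $path"
        else
            echo "missing $name -> $path"
        fi
    done
}

# Usage with the path from this article:
# check_nrpe_commands /usr/local/nrpe/etc/nrpe.cfg
```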

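Notes:

● command[check_users]=/usr/local/nrpe/libexec/check_users -w 5 -c 10: by default the check_users plugin is expected in /usr/local/nrpe/libexec/, but that directory does not contain it after installation, so copy it over from /usr/local/nagios/libexec/. Alternatively, change the line to command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10. Otherwise, calling check_users fails because the command cannot be found.
PS: for convenience, I simply copied those files over from /usr/local/nagios/libexec.

● In the nrpe.cfg file above, the part inside the square brackets "[ ]" is the command name, that is, what can follow check_nrpe -c. The part after the equals sign is the path of the plugin that is actually run, together with its arguments. From top to bottom, the commands check logged-in users, system load, disk space, zombie processes, total processes, and the connection count.

● To monitor anything else, remember to define the corresponding command here. For example, to monitor swap usage on the host, with a warning when free space falls below 20% and a critical state when it falls below 10%, add this line to nrpe.cfg: command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%. Add further commands in the same way. Usage for any plugin is available through its help option, for example /usr/local/nagios/libexec/check_swap -h.

● command[check_ips]=/usr/local/nrpe/libexec/ip_conn.sh 8000 10000 checks the number of IP connections. The ip_conn.sh script has to be written by hand; its content is given below: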

#!/bin/sh
# Usage: ip_conn.sh WARN CRIT (both thresholds are supplied in nrpe.cfg)
#if [ $# -ne 2 ]
#then
#    echo "usage: $0 num1 num2"
#    exit 3
#fi

# Count established TCP connections (netstat prints the state in upper case)
ip_conns=`netstat -an | grep tcp | grep -ci established`

if [ "$ip_conns" -le "$1" ]
then
    echo "ok -connectcounts is $ip_conns"
    exit 0
elif [ "$ip_conns" -le "$2" ]
then
    echo "warning -connectcounts is $ip_conns"
    exit 1
else
    echo "critical -connectcounts is $ip_conns"
    exit 2
fi

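Because the two arguments the script needs are already supplied in nrpe.cfg, the script does not have to validate them. Once the current number of IP connections exceeds 8000, a warning alert is raised; above 10000, a "critical" alert is raised. Put the script in /usr/local/nrpe/libexec and make it executable.

Note: the script comes from Tian Yi's 《开源监控利器 nagios》 ("Nagios, an Open-Source Monitoring Tool").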
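The threshold logic can be tested without a live netstat by isolating it in a function. The sketch below is not the book's script; it is a variant that takes the measured value as an argument and also returns UNKNOWN (3) for missing or non-numeric input, following the usual Nagios plugin exit codes. The function name classify is made up for this example.

```shell
#!/bin/sh
# Map a measured value to a Nagios plugin state.
# Return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# $1 = measured value, $2 = warning threshold, $3 = critical threshold
classify() {
    value=$1; warn=$2; crit=$3
    # Reject missing or non-numeric arguments with UNKNOWN.
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*) echo "unknown - usage: classify VALUE WARN CRIT"; return 3 ;;
        esac
    done
    if [ "$value" -le "$warn" ]; then
        echo "ok - value is $value"; return 0
    elif [ "$value" -le "$crit" ]; then
        echo "warning - value is $value"; return 1
    else
        echo "critical - value is $value"; return 2
    fi
}

classify 7500 8000 10000
classify 9000 8000 10000 || echo "exit code $?"
classify 12000 8000 10000 || echo "exit code $?"
```

Nagios reads only the exit code to decide the state; the echoed text is what appears in the web interface and in notifications.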

Edit /usr/local/nagios/etc/objects/commands.cfg and append the following at the end:

########################################################################

# 'check_nrpe' command definition
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

This adds support for the check_nrpe command. Without it, a service entry such as "check_command check_nrpe!check_load" fails with an error saying that the check_nrpe command is not defined.
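To see what Nagios actually runs for such an entry, the macros in command_line can be expanded by hand. The sketch below splits a check_command value at the first "!" and fills in the template from the definition above. It assumes $USER1$ is the plugin directory /usr/local/nagios/libexec and uses 192.168.0.10, the address from nrpe.cfg in this article, as the host address. The function name expand_check_command is hypothetical.

```shell
#!/bin/sh
# Expand "$USER1$/<command> -H $HOSTADDRESS$ -c $ARG1$" for one service.
# $1 = check_command value, $2 = value of $USER1$, $3 = host address
expand_check_command() {
    name=${1%%!*}     # text before the first "!": the command_name
    arg1=${1#*!}      # text after the first "!": becomes $ARG1$
    printf '%s/%s -H %s -c %s\n' "$2" "$name" "$3" "$arg1"
}

expand_check_command 'check_nrpe!check_load' /usr/local/nagios/libexec 192.168.0.10
# prints: /usr/local/nagios/libexec/check_nrpe -H 192.168.0.10 -c check_load
```

Running the printed command on the monitoring server is a quick way to confirm that nrpe on the target host answers before Nagios is reloaded.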

This article comes from the "ky.blog" blog; please contact the author before reposting.
