Nagios Monitoring

Latest recommended article published on 2021-11-19 22:27:23

低调沉稳


  1--Install the EPEL repository

  rpm -ivh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

 
  2--Install the packages Nagios needs

  yum install -y httpd nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe 

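  3--Set the web login user name and password

  htpasswd -c /etc/nagios/passwd nagiosadmin    (enter the password when prompted; this walkthrough uses 123456)

  4--Edit the Nagios configuration file

  vim /etc/nagios/nagios.cfg    (no changes are needed for now)

  nagios -v /etc/nagios/nagios.cfg    (checks the configuration for errors)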

  5--Start the services

  service httpd restart;service nagios start 

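  If httpd fails to start with "httpd: (98)Address already in use: make_sock: could not bind to address [::]:80", port 80 is already held by another process. Stop that process or move it to another port, then start httpd again.

  The message "httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.3.132 for ServerName" means that ServerName is not set. To fix it, open /etc/httpd/conf/httpd.conf and uncomment the line "#ServerName localhost:80".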
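  To find out which process already holds port 80, the listener table can be filtered. The following is a minimal sketch, assuming netstat (from net-tools) is available, as it is on CentOS 6. The helper name listeners_on_port is hypothetical and not part of any package.

```shell
#!/bin/bash
# listeners_on_port PORT: read `netstat -tlnp` output on stdin and print
# the TCP lines whose local address ends in :PORT.
listeners_on_port() {
    local port="$1"
    # Column 4 of netstat -tln output is the local address,
    # for example 0.0.0.0:80 or :::80.
    awk -v p="$port" '$1 ~ /^tcp/ && $4 ~ (":" p "$") { print }'
}

# Usage (run as root so the PID/Program column is filled in):
#   netstat -tlnp | listeners_on_port 80
```

  The last column of each printed line names the process that has to be stopped or reconfigured before httpd can bind to the port.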

  6--Open the web interface

 
 http://192.168.1.132/nagios/ 

 
  7--Install the EPEL repository on the client

  rpm -ivh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

  8--Install the required packages on the client

  yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe 

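  9--Edit the client configuration file

  vim /etc/nagios/nrpe.cfg

  Change "allowed_hosts=127.0.0.1" to "allowed_hosts=127.0.0.1,ip", where ip is the IP address of the monitoring server. Change "dont_blame_nrpe=0" to "dont_blame_nrpe=1"; this setting lets the server pass arguments to NRPE commands, so enable it only for servers you trust.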
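  The two changes can also be applied without opening an editor. This is a sketch only, assuming GNU sed and that both lines still carry their stock values; the function name patch_nrpe_cfg is hypothetical.

```shell
#!/bin/bash
# patch_nrpe_cfg FILE SERVER_IP: append SERVER_IP to allowed_hosts and
# set dont_blame_nrpe=1, editing FILE in place (a .bak copy is kept).
patch_nrpe_cfg() {
    local file="$1" server_ip="$2"
    sed -i.bak \
        -e "s/^allowed_hosts=127\.0\.0\.1\$/allowed_hosts=127.0.0.1,${server_ip}/" \
        -e 's/^dont_blame_nrpe=0$/dont_blame_nrpe=1/' \
        "$file"
}

# Usage on the client, with the monitoring server from this walkthrough:
#   patch_nrpe_cfg /etc/nagios/nrpe.cfg 192.168.1.132
```

  Lines that have already been edited do not match the patterns and are left untouched, so running the function twice is harmless.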

  10--Start NRPE on the client

  /etc/init.d/nrpe start 

  11--On the monitoring server (192.168.1.132), add the monitored host (192.168.1.104)

  cd /etc/nagios/conf.d/    (on the server)

  vim 192.168.1.104.cfg    (create a new file)

  Reference: http://ask.apelearn.com/question/7155

  define host{
      use                     linux-server    ; Name of host template to use
                                              ; This host definition will inherit all variables that are defined
                                              ; in (or inherited by) the linux-server host template definition.
      host_name               192.168.1.104
      alias                   1.104
      address                 192.168.1.104
  }

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_ping
      check_command           check_ping!100.0,20%!200.0,50%
      max_check_attempts      5
      normal_check_interval   1
  }

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_ssh
      check_command           check_ssh
      max_check_attempts      5
      normal_check_interval   1
  }

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_http
      check_command           check_http
      max_check_attempts      5
      normal_check_interval   1
  }

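  12--Notes on the configuration file

  The file above monitors three services: ssh, ping, and http. These checks run on the server, which uses its local Nagios plugins to connect to the remote host over the network, so they work even if the client has neither nagios-plugins nor nrpe installed. Other checks, such as load and disk usage, need data from inside the remote host. The server asks NRPE on the client to run them, so the client must have the nrpe service and the matching plugin scripts (nagios-plugins) installed.

  max_check_attempts 5: when Nagios detects a problem it re-checks, and it alerts only if all 5 attempts fail. With a value of 1 it alerts as soon as the first check fails.

  normal_check_interval 1: the interval between regular checks, in minutes. When the directive is omitted, the value is inherited from the service template.

  notification_interval 60: if a problem stays unresolved, Nagios notifies the contacts again after this many minutes. Set it to 0 if a single notification per problem is enough.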
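  The effect of max_check_attempts on alert delay can be worked out with simple arithmetic. The sketch below assumes a retry interval of 2 minutes; the service definitions in this walkthrough do not set one, so that value is an assumption for illustration, and the function name minutes_to_hard_state is hypothetical.

```shell
#!/bin/bash
# minutes_to_hard_state ATTEMPTS RETRY_MINUTES
# The first failed check counts as attempt 1; every further attempt waits
# one retry interval, so the delay is (ATTEMPTS - 1) * RETRY_MINUTES.
minutes_to_hard_state() {
    local attempts="$1" retry="$2"
    echo $(( (attempts - 1) * retry ))
}

# Example: max_check_attempts 5 with an assumed retry interval of 2 minutes
#   minutes_to_hard_state 5 2
```

  Under these assumptions a failing service would be reported about 8 minutes after the first failed check, and immediately when max_check_attempts is 1.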

  13--Check the configuration before restarting

  On the server: nagios -v /etc/nagios/nagios.cfg

  service nagios restart 
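  Checking first and restarting only on success can be wrapped in a small function so that a broken configuration never takes the running daemon down. This is a sketch; VERIFY_CMD and RESTART_CMD are hypothetical variables introduced here so the function can be dry-run, and they default to the two commands of this step.

```shell
#!/bin/bash
# Both variables default to the commands used in this step and can be
# overridden for a dry run.
VERIFY_CMD=${VERIFY_CMD:-"nagios -v /etc/nagios/nagios.cfg"}
RESTART_CMD=${RESTART_CMD:-"service nagios restart"}

safe_restart() {
    # Restart only when the pre-flight check exits with status 0.
    if $VERIFY_CMD >/dev/null; then
        $RESTART_CMD
    else
        echo "configuration check failed; nagios not restarted" >&2
        return 1
    fi
}

# Usage on the server:
#   safe_restart
```

  When the check fails, run nagios -v by hand to read the error report, since the function discards that output.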

  14--Add more services

  On the server: vim /etc/nagios/objects/commands.cfg

  Add:

  define command{
      command_name    check_nrpe
      command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
  }
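  To see what this definition runs for a service such as check_nrpe!check_load, the macro substitution can be imitated by hand. The sketch below is illustrative only: the function name expand_check_nrpe is hypothetical, and the plugin directory passed in stands for the value of $USER1$, which is normally set in resource.cfg.

```shell
#!/bin/bash
# expand_check_nrpe PLUGIN_DIR HOST_ADDRESS CHECK_COMMAND
# For a single-argument check_command such as "check_nrpe!check_load",
# the text after the "!" is what Nagios substitutes for $ARG1$.
expand_check_nrpe() {
    local user1="$1" hostaddress="$2" check_command="$3"
    local arg1="${check_command#*!}"
    echo "${user1}/check_nrpe -H ${hostaddress} -c ${arg1}"
}

# Example:
#   expand_check_nrpe /usr/lib/nagios/plugins 192.168.1.104 'check_nrpe!check_load'
```

  The printed command line can be run on the server by hand, which is a quick way to debug a service that stays in an UNKNOWN state.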

  Then continue editing the host file: vim /etc/nagios/conf.d/192.168.1.104.cfg

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_load
      check_command           check_nrpe!check_load
      max_check_attempts      5
      normal_check_interval   1
  }

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_disk_sda1
      check_command           check_nrpe!check_hda1
      max_check_attempts      5
      normal_check_interval   1
  }

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_disk_sda3
      check_command           check_nrpe!check_hda3
      max_check_attempts      5
      normal_check_interval   1
  }

  15--On the client, define the NRPE commands

  vim /etc/nagios/nrpe.cfg

  command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
  command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
  command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
  command[check_hda3]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda3
  command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
  command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
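  In the check_disk lines, -w 20% and -c 10% are thresholds on free space: the plugin warns when less than 20% is free and goes critical when less than 10% is free. The sketch below mimics only that decision, using whole-number percentages; it is not the plugin's actual code, and disk_status is a hypothetical name.

```shell
#!/bin/bash
# disk_status FREE_PCT WARN_PCT CRIT_PCT
# Prints the Nagios status name and returns the matching exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
disk_status() {
    local free="$1" warn="$2" crit="$3"
    if [ "$free" -lt "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$free" -lt "$warn" ]; then
        echo WARNING
        return 1
    fi
    echo OK
    return 0
}

# Example with the thresholds used above (-w 20% -c 10%):
#   disk_status 15 20 10
```

  The same warning-then-critical ordering applies to the other commands, except that check_users, check_load and check_procs alert when the measured value rises above the threshold rather than below it.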

  16--Restart the services

  On the client: /etc/init.d/nrpe restart

  On the server: nagios -v /etc/nagios/nagios.cfg && /etc/init.d/nagios restart

  17--Logs

  On the server: cd /var/log/nagios/

  18--Configure e-mail alert contacts on the server

  vim /etc/nagios/objects/contacts.cfg 

  define contactgroup{
      contactgroup_name   admins
      alias               Nagios Administrators
      members             nagiosadmin
  }

  define contact{
      contact_name        123
      use                 generic-contact
      alias               aming
      email               2947557317@qq.com
  }

  define contact{
      contact_name        456
      use                 generic-contact
      alias               aaa
      email               wangerzheng@126.com
  }

  define contactgroup{
      contactgroup_name   common
      alias               common
      members             123,456
  }

  19--Attach the contact group to a service

  vim /etc/nagios/conf.d/192.168.1.104.cfg

  Update the check_load service so that it reads:

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_load
      check_command           check_nrpe!check_load
      max_check_attempts      5
      normal_check_interval   1
      contact_groups          common    ; the contact group that holds the e-mail addresses
      notification_period     24x7
      notification_options    w,r
  }

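  20--Important notification parameters

  notifications_enabled: whether notifications are sent. 1 enables them, 0 disables them. This is usually set globally in the main configuration file (nagios.cfg, as enable_notifications), with the same effect.

  notification_interval: introduced above; the minimum time between repeated notifications for the same problem. The default is 60 minutes. With a value of 0, no repeat notifications are sent.

  notification_period: the time period in which notifications may be sent. Critical hosts and services can use 24x7, and ordinary ones can be limited to working hours. Outside the defined period no notification is sent, whatever happens.

  notification_options: the states that trigger a notification. For hosts: d = DOWN, u = UNREACHABLE, r = recovery (back to OK), f = flapping, n = send no notifications. For services the letters are w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovery, f = flapping, so the w,r used above notifies on WARNING and on recovery.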

  21--Reference page with Nagios, Zabbix, and MySQL backup monitoring scripts

 
 http://ask.apelearn.com/question/8128 

 
 22--Where the Nagios monitoring scripts are stored

  ls /usr/lib/nagios/plugins/ 

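  23--Create check_disk.sh in the plugin directory

  vim /usr/lib/nagios/plugins/check_disk.sh

#!/bin/bash
# Report usage of every mounted filesystem and exit with the worst status
# found: 0 = OK, 1 = WARNING (more than 95% used), 2 = CRITICAL (more than 97% used).
# df -P keeps each filesystem on a single line, so the columns stay aligned.
row=`df -hP | wc -l`
for i in `seq 2 $row`
do
    line=`df -hP | sed -n "$i"p`
    ava=`echo "$line" | awk '{print $4}'`                 # space available
    u_per=`echo "$line" | awk '{print $5}' | tr -d '%'`   # percent used, without the % sign
    p_p=`echo "$line" | awk '{print $6}'`                 # mount point
    if [ "$u_per" -gt "97" ];then
        echo -n "$p_p CRITICAL $u_per% $ava "
        sta[$i]=2
    elif [ "$u_per" -gt "95" ];then
        echo -n "$p_p WARNING! $u_per% $ava "
        sta[$i]=1
    else
        echo -n "$p_p OK $u_per% $ava "
        sta[$i]=0
    fi
done
# The exit code is the highest status recorded above.
n=0
for j in `seq 2 $row`
do
    if [ "${sta[$j]}" -gt $n ];then
        n=${sta[$j]}
    fi
done
exit $n

  Run sh check_disk.sh, then echo $? to see the exit code.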
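  The same check can be done in a single pass over the df output instead of calling df once per column. The following is an alternative sketch, not the script used in this walkthrough; summarize_df is a hypothetical name, and the function reads df -P output on stdin so that it can be tried with sample data.

```shell
#!/bin/bash
# summarize_df WARN CRIT: read `df -P` output on stdin, print one
# "mount STATUS used%" entry per filesystem, and return the worst status
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
summarize_df() {
    local warn="$1" crit="$2" worst=0
    local _header fs blocks used avail pct mount status
    read -r _header    # discard the df header line
    while read -r fs blocks used avail pct mount; do
        pct="${pct%\%}"    # strip the trailing % sign
        if [ "$pct" -gt "$crit" ]; then
            status=2
            echo -n "$mount CRITICAL ${pct}% "
        elif [ "$pct" -gt "$warn" ]; then
            status=1
            echo -n "$mount WARNING ${pct}% "
        else
            status=0
            echo -n "$mount OK ${pct}% "
        fi
        # Remember the most severe status seen so far.
        if [ "$status" -gt "$worst" ]; then
            worst=$status
        fi
    done
    echo
    return "$worst"
}

# Usage with the thresholds of check_disk.sh:
#   df -P | summarize_df 95 97; echo $?
```

  Reading the table once avoids the case where a filesystem is mounted or unmounted between two df calls and the rows shift.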

  24--On the client, make the saved script executable

  chmod +x /usr/lib/nagios/plugins/check_disk.sh 

  25--On the client, register the script in nrpe.cfg

  vim /etc/nagios/nrpe.cfg    (add the command)

  command[check_disk]=/usr/lib/nagios/plugins/check_disk.sh

  26--Restart the client service: /etc/init.d/nrpe restart

  27--Test from the server

  /usr/lib/nagios/plugins/check_nrpe -H 192.168.1.104 -c check_disk

  The output should match what sh check_disk.sh prints on the client.
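  Nagios judges a plugin by its exit code rather than its text, so it helps to check the code as well when testing by hand. A small sketch that translates the code into the state name follows; status_name is a hypothetical helper.

```shell
#!/bin/bash
# status_name CODE: map a plugin exit code to the Nagios state name.
# 0, 1 and 2 are OK, WARNING and CRITICAL; 3 (and, in this sketch,
# anything else) is reported as UNKNOWN.
status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage on the server:
#   /usr/lib/nagios/plugins/check_nrpe -H 192.168.1.104 -c check_disk
#   status_name $?
```

  If the result is UNKNOWN, the usual suspects are allowed_hosts on the client, a missing command[...] entry in nrpe.cfg, or a script that is not executable.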

  28--On the server, add the matching service

  vim /etc/nagios/conf.d/192.168.1.104.cfg

  define service{
      use                     generic-service
      host_name               192.168.1.104
      service_description     check_disk
      check_command           check_nrpe!check_disk
      max_check_attempts      5
      normal_check_interval   1
  }

  29--Restart the service: service nagios restart
