1. Pre-installation preparation

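    Three machines running RHEL 6.2 x64: one acts as the server (192.168.5.203) and the other two are monitored hosts (192.168.5.204, with the http service installed and running, and 192.168.5.206, with the mysql service installed and running).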

    Note: the http service is monitored on 192.168.5.204 and the mysql service on 192.168.5.206.

    Packages needed on the server: nagios-3.2.3.tar.gz

                        nagios-plugins-1.4.14.tar.gz

                        httpd-2.2.23.tar.bz2

                        php-5.4.10.tar.gz 

                        nrpe-2.12.tar.gz

Download link: http://pan.baidu.com/s/1c0lHEH6


    Packages needed on the two clients: nagios-plugins-1.4.14.tar.gz

                              nrpe-2.12.tar.gz

                        

On the server:

1) Create the nagios user and group

[root@Nagios-Server ~]# pwd

/root

[root@Nagios-Server ~]# useradd -s /sbin/nologin nagios

[root@Nagios-Server ~]# mkdir /usr/local/nagios

[root@Nagios-Server ~]# chown -R nagios.nagios /usr/local/nagios/

2) Start the system's sendmail service

[root@Nagios-Server ~]# /etc/init.d/sendmail start

sendmail only needs to be started; no configuration is required.

2. Compile and install

[root@Nagios-Server ~]# tar zxvf nagios-3.2.3.tar.gz 

[root@Nagios-Server ~]# cd nagios-3.2.3

[root@Nagios-Server nagios-3.2.3]# ./configure --prefix=/usr/local/nagios

[root@Nagios-Server nagios-3.2.3]# make all

[root@Nagios-Server nagios-3.2.3]# make install

[root@Nagios-Server nagios-3.2.3]# make install-init

[root@Nagios-Server nagios-3.2.3]# make install-commandmode

[root@Nagios-Server nagios-3.2.3]# make install-config

[root@Nagios-Server nagios-3.2.3]# chkconfig --add nagios 

[root@Nagios-Server nagios-3.2.3]# chkconfig --level 35 nagios on

#echo "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg">>/etc/rc.local
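The echo ... >> /etc/rc.local pattern used here (and again later for apachectl and nrpe) appends a duplicate line every time it is rerun. Below is a minimal sketch of an idempotent variant. append_once is a hypothetical helper name introduced for illustration only; it is not part of Nagios.

```shell
# append_once FILE LINE: append LINE to FILE only if FILE does not already
# contain that exact line. (Hypothetical helper; the name is an assumption.)
append_once() {
    ao_file="$1"
    ao_line="$2"
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$ao_line" "$ao_file" 2>/dev/null; then
        printf '%s\n' "$ao_line" >> "$ao_file"
    fi
}
```

For example: append_once /etc/rc.local "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg". Running it a second time leaves the file unchanged.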

3. Install the Nagios plugins

[root@Nagios-Server ~]# tar zxvf nagios-plugins-1.4.14.tar.gz

[root@Nagios-Server ~]# cd nagios-plugins-1.4.14

[root@Nagios-Server nagios-plugins-1.4.14]# ./configure --prefix=/usr/local/nagios

[root@Nagios-Server nagios-plugins-1.4.14]# make

[root@Nagios-Server nagios-plugins-1.4.14]# make install

4. Install Apache and PHP

[root@Nagios-Server ~]# tar jxvf httpd-2.2.23.tar.bz2 

[root@Nagios-Server ~]# cd httpd-2.2.23

[root@Nagios-Server httpd-2.2.23]# ./configure --prefix=/usr/local/apache2

[root@Nagios-Server httpd-2.2.23]# make &&make install

[root@Nagios-Server ~]# tar zxvf php-5.4.10.tar.gz 

[root@Nagios-Server ~]# cd php-5.4.10

[root@Nagios-Server php-5.4.10]# ./configure --prefix=/usr/local/php \

> --with-gd --with-zlib --with-apxs2=/usr/local/apache2/bin/apxs

[root@Nagios-Server php-5.4.10]# make && make install

Configure Apache

1) First, in /usr/local/apache2/conf/httpd.conf, change the user that the Apache processes run as to nagios

Change to (around line 67):

User nagios

Group nagios

2) Then find DirectoryIndex (around line 168) and add index.php:

<IfModule dir_module>

      DirectoryIndex index.html index.php

</IfModule>

3) Add the following (around line 311):

 AddType application/x-httpd-php .php

4) Access to the Nagios web interface must be authenticated, so add the following authentication settings at the end of httpd.conf:

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"

<Directory "/usr/local/nagios/sbin">

     AuthType Basic

     Options ExecCGI

     AllowOverride None

     Order allow,deny 

     Allow from all 

     AuthName "Nagios Access"

     AuthUserFile /usr/local/nagios/etc/htpasswd

     Require valid-user

</Directory>

Alias /nagios "/usr/local/nagios/share"

<Directory "/usr/local/nagios/share">

     AuthType Basic

     Options None

     AllowOverride None

     Order allow,deny 

     Allow from all 

     AuthName "nagios Access"

     AuthUserFile /usr/local/nagios/etc/htpasswd

     Require valid-user

</Directory>

5) Create the Apache authentication file htpasswd (any user name and password will do; this guide uses ixdba)

[root@Nagios-Server ~]# /usr/local/apache2/bin/htpasswd \

> -c /usr/local/nagios/etc/htpasswd ixdba

New password: 

Re-type new password: 

Adding password for user ixdba

6) Start the Apache service

[root@Nagios-Server ~]# /usr/local/apache2/bin/apachectl start

#echo "/usr/local/apache2/bin/apachectl start" >>/etc/rc.local

3. On the server (192.168.5.203), install the NRPE add-on to monitor remote hosts

[root@Nagios-Server ~]# tar zxvf nrpe-2.12.tar.gz 

[root@Nagios-Server ~]# cd nrpe-2.12

[root@Nagios-Server nrpe-2.12]# ./configure

[root@Nagios-Server nrpe-2.12]# make all

[root@Nagios-Server nrpe-2.12]# make install-plugin

#echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local

4. Install the Nagios plugins and NRPE on the two monitored hosts

1) On the monitored host (192.168.5.204), install nagios-plugins

[root@localhost ~]# useradd -s /sbin/nologin nagios

[root@localhost ~]# tar zxvf nagios-plugins-1.4.14.tar.gz 

[root@localhost ~]# cd nagios-plugins-1.4.14

[root@localhost nagios-plugins-1.4.14]# ./configure

[root@localhost nagios-plugins-1.4.14]# make

[root@localhost nagios-plugins-1.4.14]# make install

[root@localhost nagios-plugins-1.4.14]# chown nagios.nagios /usr/local/nagios/

[root@localhost nagios-plugins-1.4.14]# chown -R nagios.nagios /usr/local/nagios/libexec/

2) On the monitored host (192.168.5.204), install nrpe

[root@localhost ~]# tar zxvf nrpe-2.12.tar.gz

[root@localhost ~]# cd nrpe-2.12

[root@localhost nrpe-2.12]# ./configure 

[root@localhost nrpe-2.12]# make all

[root@localhost nrpe-2.12]# make install-plugin

[root@localhost nrpe-2.12]# make install-daemon

[root@localhost nrpe-2.12]# make install-daemon-config

#echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local

Note: repeat steps 1) and 2) on 192.168.5.206.

3) On the monitored host (192.168.5.204), edit /usr/local/nagios/etc/nrpe.cfg (line 79) so that it reads as follows (192.168.5.206 needs the same change):

allowed_hosts=127.0.0.1,192.168.5.203    (separated by a comma, with no spaces)

Then start the nrpe process. Output like the following means it started successfully; the default port is 5666.

[root@Nagios-Linux ]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

[root@localhost nrpe-2.12]# ps -ef | grep nrpe

nagios   21885     1  0 Sep09 ?        00:00:08 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

[root@localhost nrpe-2.12]# netstat -tunl | grep 5666

tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN  
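The allowed_hosts edit in step 3) has to be repeated on every monitored host, so it can be worth scripting. A minimal sketch follows, assuming GNU sed (for -i) and an nrpe.cfg that already contains an allowed_hosts= line. set_allowed_hosts is a hypothetical helper name.

```shell
# set_allowed_hosts CFG HOSTS: rewrite the allowed_hosts line in an nrpe.cfg.
# HOSTS is a comma-separated list; spaces are stripped because the list
# must not contain any. (Hypothetical helper, for illustration only.)
set_allowed_hosts() {
    sah_cfg="$1"
    sah_hosts=$(printf '%s' "$2" | tr -d ' ')
    # '|' is used as the sed delimiter so the value may contain '/'.
    sed -i "s|^allowed_hosts=.*|allowed_hosts=${sah_hosts}|" "$sah_cfg"
}
```

For example: set_allowed_hosts /usr/local/nagios/etc/nrpe.cfg "127.0.0.1,192.168.5.203". The nrpe process still has to be restarted afterwards for the change to take effect.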

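4) On the server (192.168.5.203), test whether it can communicate with the clients by running the commands below. If the version number is printed, the server can reach the client.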

[root@Nagios-Server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.204

NRPE v2.12

[root@Nagios-Server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.206

NRPE v2.12

5) On the server (192.168.5.203), define a check_nrpe command

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/commands.cfg

define  command{

        command_name    check_nrpe

        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

}
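It helps to see how Nagios turns this definition into a real command line. In a service's check_command, the text after the "!" becomes $ARG1$, $HOSTADDRESS$ comes from the host's address, and $USER1$ comes from resource.cfg. The sketch below only imitates that substitution for illustration; expand_command_line is a hypothetical name, and Nagios does this internally.

```shell
# expand_command_line TEMPLATE USER1 HOSTADDRESS ARG1
# Prints TEMPLATE with the three macros used by check_nrpe substituted.
# This imitates the macro expansion for illustration only.
expand_command_line() {
    printf '%s\n' "$1" | sed \
        -e 's|\$USER1\$|'"$2"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$3"'|g' \
        -e 's|\$ARG1\$|'"$4"'|g'
}
```

With the template above, /usr/local/nagios/libexec as $USER1$, 192.168.5.204 as the address and check_tcp80 as the argument, the result is the same check_nrpe command that is run by hand on the server in this guide.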

6) On the monitored host (192.168.5.204), define the additional checks to run

check_tcp80 is the command name. It uses the /usr/local/nagios/libexec/check_tcp plugin; -p 80 is the port and 10 is the connection timeout in seconds (line 204).

[root@localhost ~]# vim /usr/local/nagios/etc/nrpe.cfg

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 

command[check_tcp80]=/usr/local/nagios/libexec/check_tcp -p 80 -t 10

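Note: after every change to nrpe.cfg, the nrpe process must be restarted for the change to take effect: kill the process, then start it again.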

[root@Nagios-Linux nagios]# ps -ef | grep nrpe

nagios    6508     1  0 09:32 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

[root@Nagios-Linux nagios]# kill 6508

[root@Nagios-Linux nagios]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

On the monitored host (192.168.5.206), define check_tcp3306 as the command name. It uses the /usr/local/nagios/libexec/check_tcp plugin; -p 3306 is the port and 10 is the connection timeout in seconds (line 204).

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 

command[check_tcp3306]=/usr/local/nagios/libexec/check_tcp -p 3306 -t 10
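The name inside command[...] is exactly what the server later passes to check_nrpe -c, so a typo here only shows up as a failed check on the server. A small sketch for listing the names a given nrpe.cfg actually defines is shown below; nrpe_command_names is a hypothetical helper name.

```shell
# nrpe_command_names CFG: print the name of every active command[...]
# entry in the given nrpe.cfg, one per line. Commented-out entries are
# skipped because the pattern is anchored at the start of the line.
nrpe_command_names() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$1"
}
```

For example: nrpe_command_names /usr/local/nagios/etc/nrpe.cfg should include check_tcp80 on 192.168.5.204 and check_tcp3306 on 192.168.5.206.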


7) On the server (192.168.5.203), run the commands below to confirm the checks work; TCP OK means they are working.

[root@Nagios-Server etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.204 -c check_tcp80

TCP OK - 0.000 second response time on port 80|time=0.000421s;;;0.000000;10.000000

[root@Nagios-Server etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.206 -c check_tcp3306

TCP OK - 0.000 second response time on port 3306|time=0.000431s;;;0.000000;10.000000



4. On the server (192.168.5.203), add the monitored hosts and services

1) templates.cfg (default definitions; no editing needed)

Location: /usr/local/nagios/etc/objects/templates.cfg

2) resource.cfg (only one active line, around line 26; the default is the line below)

#vim /usr/local/nagios/etc/resource.cfg

$USER1$=/usr/local/nagios/libexec     

3) commands.cfg (the check_nrpe command was already defined above; no further editing needed)

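4) hosts.cfg (does not exist by default and must be created by hand; this file defines the names and IP addresses of the monitored hosts. Do not forget the opening and closing braces.)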

[root@Nagios-Server objects]# pwd

/usr/local/nagios/etc/objects

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/hosts.cfg

define host{

        use             linux-server    ; use linux-server, defined by default in templates.cfg

        host_name       web             ; the host name can be anything

        alias           ixdba-web       ; the alias can be anything

        address         192.168.5.204   ; address of the monitored host

}

define host{

        use             linux-server

        host_name       mysql

        alias           ixdba-mysql

        address         192.168.5.206

}

define  hostgroup{                       ; host group definition

        hostgroup_name  sa-server       ; the host group name can be anything

        alias           sa server       ; host group alias

        members         web,mysql       ; the two hosts defined above

}
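When more hosts are added later, the host blocks above are easy to get wrong by hand (a missing brace is the usual culprit). Below is a minimal sketch that prints a definition in the same shape, assuming every host uses the linux-server template as in this guide. make_host_def is a hypothetical helper name.

```shell
# make_host_def NAME ALIAS ADDRESS: print a host definition in the same
# shape as the hand-written entries, using the linux-server template.
# (Hypothetical helper, for illustration only.)
make_host_def() {
    cat <<EOF
define host{
        use             linux-server
        host_name       $1
        alias           $2
        address         $3
}
EOF
}
```

For example: make_host_def web ixdba-web 192.168.5.204 >> /usr/local/nagios/etc/objects/hosts.cfg appends one block; the hostgroup members line still has to be updated by hand.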


5) services.cfg (does not exist by default and must be created by hand; this file defines the services monitored on each host)

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/services.cfg

define service{

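        use             local-service    ; use local-service, defined by default in templates.cfg

        host_name       web              ; the web host (192.168.5.204), defined in hosts.cfg

        service_description     PING    ; description of the check; any name that reflects the service will do

        check_command           check_ping!100.0,20%!500.0,60%

}      ; uses the server's check_ping; left to right: command!warning round-trip time (ms),packet loss!critical round-trip time (ms),packet loss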

        

define service{

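        use             local-service   ; use local-service, defined by default in templates.cfg

        host_name               web     ; the web host (192.168.5.204), defined in hosts.cfg

        service_description     web80  ; description of the check; any name that reflects the service will do

        check_command           check_nrpe!check_tcp80 ; this command is defined in nrpe.cfg on the monitored host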

}

define service{

        use             local-service

        host_name       mysql

        service_description     PING

        check_command           check_ping!100.0,20%!500.0,60%

}

define  service{

        use             local-service

        host_name       mysql

        service_description     mysql3306

        check_command   check_nrpe!check_tcp3306

}

define  servicegroup{                   ; service group definition (not essential)

        servicegroup_name       servergroup

        alias                   server-group

        members         web,PING,web,web80,mysql,PING,mysql,mysql3306

}
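The members value of a servicegroup is a flat list of host,service pairs (host1,service1,host2,service2,...), which is hard to read once it grows. The sketch below splits such a value back into pairs so it can be checked by eye; list_member_pairs is a hypothetical helper name.

```shell
# list_member_pairs MEMBERS: print one "host service" pair per line from a
# servicegroup members value (host1,service1,host2,service2,...).
list_member_pairs() {
    # tr puts each field on its own line; paste joins every two lines.
    printf '%s\n' "$1" | tr ',' '\n' | paste -d ' ' - -
}
```

Applied to the members line above, it prints four pairs: web PING, web web80, mysql PING and mysql mysql3306.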


6) contacts.cfg (defines contacts and contact groups)

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/contacts.cfg

define contact{

        contact_name                    nagiosadmin             ; contact name; the default is fine

        use                             generic-contact         ; inherit the generic-contact settings defined in templates.cfg

        alias                           Nagios Admin            ; Full name of user

        email                           15901392876@139.com     ; email address (a China Mobile mailbox is suggested so that SMS alerts can be enabled)

}

define contactgroup{

        contactgroup_name       admins     ; contact group name; keep the default

        alias                   Nagios Administrators

        members                 nagiosadmin

        }

            

7) timeperiods.cfg (defines the monitoring time periods; already defined by default, no changes needed)

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/timeperiods.cfg

define timeperiod{

        timeperiod_name 24x7

        alias           24 Hours A Day, 7 Days A Week

        sunday          00:00-24:00

        monday          00:00-24:00

        tuesday         00:00-24:00

        wednesday       00:00-24:00

        thursday        00:00-24:00

        friday          00:00-24:00

        saturday        00:00-24:00

        }

8) cgi.cfg (controls the CGI scripts; only the user's permissions need to be added here)

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/cgi.cfg

default_user_name=ixdba

authorized_for_system_information=ixdba

authorized_for_configuration_information=ixdba

authorized_for_system_commands=ixdba

authorized_for_all_services=ixdba

authorized_for_all_hosts=ixdba

authorized_for_all_service_commands=ixdba

authorized_for_all_host_commands=ixdba
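The seven authorized_for_* lines all get the same value, so they can be set in one pass. A minimal sketch follows, assuming GNU sed (for -i); grant_cgi_user is a hypothetical helper name. Note that it replaces the existing value rather than appending to it, and that default_user_name still has to be set separately.

```shell
# grant_cgi_user CFG USER: set every active authorized_for_* directive in
# the given cgi.cfg to USER. Commented-out lines are left untouched
# because the pattern is anchored at the start of the line.
grant_cgi_user() {
    sed -i 's/^\(authorized_for_[a-z_]*\)=.*/\1='"$2"'/' "$1"
}
```

For example: grant_cgi_user /usr/local/nagios/etc/cgi.cfg ixdba.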

9) nagios.cfg (the core Nagios configuration file)

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg    (add this line)

cfg_file=/usr/local/nagios/etc/objects/services.cfg    (add this line)
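Since hosts.cfg and services.cfg are created by hand, a cfg_file line can easily point at a file that does not exist yet. The sketch below lists such paths before Nagios is started. It assumes each cfg_file line in the real nagios.cfg contains only the path, with no trailing annotation; missing_cfg_files is a hypothetical helper name.

```shell
# missing_cfg_files NAGIOS_CFG: print each cfg_file= path in NAGIOS_CFG
# that does not exist on disk. No output means every file was found.
missing_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1" | while IFS= read -r mcf_path; do
        [ -f "$mcf_path" ] || printf '%s\n' "$mcf_path"
    done
}
```

For example: missing_cfg_files /usr/local/nagios/etc/nagios.cfg.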


use_authentication=1   (change 0 to 1, around line 78; note that this directive belongs in cgi.cfg, not nagios.cfg)


5. Verify the Nagios configuration files

[root@Nagios-Server ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

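If errors are reported, fix them according to the file and line number given in the message. (A correct configuration reports 0 warnings and 0 errors.)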

Checking for circular paths between hosts...

Checking for circular host and service dependencies...

Checking global event handlers...

Checking obsessive compulsive processor commands...

Checking misc settings...


Total Warnings: 0

Total Errors:   0

[root@Nagios-Server ~]# service nagios start
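The verify-then-start sequence above can be tied together so that Nagios is only started when the check reports no errors. The sketch below extracts the count from the "Total Errors:" line shown in the output above; total_errors is a hypothetical helper name, and the output format is assumed to match what this guide shows.

```shell
# total_errors: read `nagios -v` output on stdin and print the number
# from its "Total Errors:" line. Prints nothing if the line is absent.
total_errors() {
    sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

For example: [ "$(/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | total_errors)" = "0" ] && service nagios start.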

6. Log in to the monitoring interface at http://192.168.5.203/nagios with the user name ixdba and its password


Click Services to see the monitored services. localhost is the local host, which is monitored by default; the services for mysql (192.168.5.206) and web (192.168.5.204) are listed as well.

7. Simulate a failure of the http service on web and wait for the alert

#service httpd stop


An alert email and an SMS notification are sent as well.


8. Simulate recovery of web

#service httpd start


A recovery email and an SMS notification are sent as well.


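Note: service monitoring, alerting, and email and SMS notifications all work at this point. However, about 30 minutes passed between the web failure (11:29) and the alert (12:01), which is unacceptable.

So the Nagios check settings still need some tuning.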

Make the changes in templates.cfg:

[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/templates.cfg


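Line 72, check_interval: the interval between host checks; change to 1 (minute)

Line 73, retry_interval: the interval between retry checks; change to 1 (minute)

Line 74, max_check_attempts: the maximum number of check attempts for a host; change to 1

Line 76, notification_period: the time period during which failure notifications are sent; change to 24x7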


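Line 169, max_check_attempts: the maximum number of check attempts for a service; change to 2 (attempts, not minutes)

Line 170, normal_check_interval: the interval between service checks; change to 1 (minute)

Line 171, retry_check_interval: the interval between retry checks; change to 1 (minute)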


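Line 185, max_check_attempts: the maximum number of check attempts for a service; change to 2 (attempts, not minutes)

Line 186, normal_check_interval: the interval between service checks; change to 1 (minute)

Line 187, retry_check_interval: the interval between retry checks; change to 1 (minute)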
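To get a feel for what these three values do, here is a rough model of the longest time between a failure and Nagios treating it as confirmed: up to one normal check interval until the first failing check, plus one retry interval for each remaining attempt. It assumes the intervals are in minutes (the default interval_length of 60 seconds) and ignores check latency and mail or SMS delivery time, so it is an estimate rather than a guarantee. worst_case_minutes is a hypothetical helper name.

```shell
# worst_case_minutes CHECK_INTERVAL RETRY_INTERVAL MAX_CHECK_ATTEMPTS
# Rough upper bound, in minutes, between a failure and the point where
# the last check attempt has failed:
#   CHECK_INTERVAL + (MAX_CHECK_ATTEMPTS - 1) * RETRY_INTERVAL
worst_case_minutes() {
    echo $(( $1 + ($3 - 1) * $2 ))
}
```

With the service values set above (interval 1, retry 1, 2 attempts), worst_case_minutes 1 1 2 prints 2.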

9. Simulate the failure again; the alert now arrives much sooner.