I. Installing the Nagios Server
1.1 Environment
OS: CentOS 6.4
nagios-server: 172.16.27.57
nagios-client: 172.16.27.43
1.2 Installing the base packages
# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel zlib* gd*
1.3 Creating the nagios user and group
# useradd -s /sbin/nologin nagios
# mkdir /usr/local/nagios
# chown -R nagios.nagios /usr/local/nagios
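1.4 Installing Nagios
First download the required packages from my shared cloud drive: http://yun.baidu.com/share/link?shareid=2600302438&uk=1226090734
Download every file in the nagios directory.
For basic Nagios concepts and background, see nagios.ppt; I will not go over the concepts here.
Installing the Nagios core: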
[root@nagios-server opt]# tar -zxvf nagios-3.4.3.tar.gz
[root@nagios-server opt]# cd nagios
[root@nagios-server nagios]# ./configure --prefix=/usr/local/nagios
[root@nagios-server nagios]# make all
[root@nagios-server nagios]# make install
make install-init
- This installs the init script in /etc/rc.d/init.d
make install-commandmode
- This installs and configures permissions on the
directory for holding the external command file
make install-config
- This installs sample config files in /usr/local/nagios/etc
[root@nagios-server nagios]# make install-init
[root@nagios-server nagios]# make install-commandmode
[root@nagios-server nagios]# make install-config
# Start Nagios at boot
[root@nagios-server nagios]# chkconfig --add nagios
[root@nagios-server nagios]# chkconfig --level 35 nagios on
[root@nagios-server nagios]# chkconfig --list nagios
nagios 0:off 1:off 2:off 3:on 4:on 5:on 6:off
Installing the Nagios plugins:
[root@nagios-server nagios]# cd /opt/
[root@nagios-server opt]# tar -zxvf nagios-plugins-2.1.1.tar.gz
[root@nagios-server opt]# cd nagios-plugins-2.1.1
[root@nagios-server nagios-plugins-2.1.1]# ./configure --prefix=/usr/local/nagios
[root@nagios-server nagios-plugins-2.1.1]# make && make install
Installing Apache:
[root@nagios-server nagios-plugins-2.1.1]# cd /opt/
[root@nagios-server opt]# tar -zxvf httpd-2.2.23.tar.gz
[root@nagios-server opt]# cd httpd-2.2.23
[root@nagios-server httpd-2.2.23]# ./configure --prefix=/usr/local/apache2 --enable-so --enable-rewrite
[root@nagios-server httpd-2.2.23]# make && make install
Installing PHP:
[root@nagios-server httpd-2.2.23]# cd /opt/
[root@nagios-server opt]# yum install libxml2 libxml2-devel libpng-devel libtool libtool-ltdl-devel -y
[root@nagios-server opt]# tar -jxvf php-5.3.28.tar.bz2
[root@nagios-server opt]# cd php-5.3.28
[root@nagios-server php-5.3.28]# ./configure --prefix=/usr/local/php/ --with-apxs2=/usr/local/apache2/bin/apxs \
--with-gd --with-zlib --enable-sockets
[root@nagios-server php-5.3.28]# make
[root@nagios-server php-5.3.28]# make install
Configuring Apache:
[root@nagios-server php-5.3.28]# vim /usr/local/apache2/conf/httpd.conf
1.
<IfModule !mpm_netware_module>
<IfModule !mpm_winnt_module>
Change to:
LoadModule php5_module modules/libphp5.so
LoadModule rewrite_module modules/mod_rewrite.so
<IfModule !mpm_netware_module>
<IfModule !mpm_winnt_modul
2.
User daemon
Group daemon
Change to:
User nagios
Group nagios
3.
#ServerName www.example.com:80
Change to:
ServerName 127.0.0.1
4.
<IfModule dir_module>
DirectoryIndex index.html
</IfModule>
Change to:
<IfModule dir_module>
DirectoryIndex index.html index.php index.php3 default.php
</IfModule>
5.
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
Change to:
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddType application/x-httpd-php .php .php3 .htm .phtml .php4
AddType application/x-httpd-php-source .phps
6.
# Settings for Nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
# Directory that holds the Nagios CGI programs
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
# File used to authenticate access to this directory
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
# Directory that holds the Nagios HTML files
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
# Note: do not copy the comments into the configuration file as trailing text on a directive line; Apache only accepts comments on lines of their own and reports an error otherwise.
Creating the Apache authentication file:
[root@nagios-server php-5.3.28]# /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
# This creates an htpasswd authentication file under /usr/local/nagios/etc;
# from now on a username and password are required when visiting http://IP/nagios.
[root@nagios-server share]# cat /usr/local/nagios/etc/htpasswd
nagiosadmin:$apr1$GHjt0nqB$VNnfnBpdxXoiHYh.PXuMS0
Starting Apache:
[root@nagios-server share]# /usr/local/apache2/bin/apachectl start
httpd: Syntax error on line 57 of /usr/local/apache2/conf/httpd.conf: module rewrite_module is built-in and can't be loaded
If you see:
httpd: Syntax error on line 55 of /usr/local/apache2/conf/httpd.conf:
module rewrite_module is built-in and can't be loaded
the module is compiled into httpd and does not need to be loaded. Comment out the line:
#LoadModule rewrite_module modules/mod_rewrite.so
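One way to tell in advance whether the LoadModule line is needed is to look at the modules compiled into the httpd binary. The sketch below assumes the httpd path used in this install and relies on `httpd -l`, which lists compiled-in modules one per line; the helper name is_builtin and the HTTPD variable are only illustrative.

```shell
#!/bin/bash
# Sketch: decide whether a module needs a LoadModule line.
# `httpd -l` prints the compiled-in modules as source file names, e.g. "  mod_rewrite.c".

# Reads `httpd -l` style output on stdin; succeeds if the named module is compiled in.
is_builtin() {
    local module="$1"
    grep -qE "^[[:space:]]*${module}\.c[[:space:]]*$"
}

# Assumed install path from this article; override with HTTPD=... if yours differs.
HTTPD=${HTTPD:-/usr/local/apache2/bin/httpd}
if [ -x "$HTTPD" ]; then
    if "$HTTPD" -l | is_builtin mod_rewrite; then
        echo "mod_rewrite is built in: comment out its LoadModule line"
    else
        echo "mod_rewrite is not built in: keep its LoadModule line"
    fi
fi
```

If httpd is not found at that path, the script prints nothing and leaves the decision to you.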
##############################################
Accessing the Nagios web interface:
[root@nagios-server share]# service iptables stop # Stop the firewall first, or allow external access to port 80.
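If you would rather keep the firewall running, opening only the needed port is an alternative. This is a minimal sketch for the iptables service on CentOS 6; the function name open_port and the APPLY switch are my own, and by default the commands are only printed so you can review them first.

```shell
#!/bin/bash
# Sketch: allow one inbound TCP port instead of stopping iptables.
# Prints the commands by default; set APPLY=1 and run as root to execute them.
open_port() {
    local port="$1"
    local rule="iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
    if [ "${APPLY:-0}" = "1" ]; then
        # Insert the rule, then persist it across restarts of the iptables service.
        $rule && service iptables save
    else
        echo "$rule"
        echo "service iptables save"
    fi
}

open_port 80
```

The same helper could be called with 5666 on a client, since that is the port nrpe listens on later in this article.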
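You are first asked for the username and password:
Then the Nagios home page appears:
That is about it for installing the Nagios server. Follow the steps and there should be no problems. The hardest part, configuration, comes next.
Let's continue.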
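II. Configuring the Nagios Server
[root@nagios-server etc]# cd /usr/local/nagios/etc/
[root@nagios-server etc]# ls
cgi.cfg htpasswd nagios.cfg objects resource.cfg
# Let's go through the configuration files one by one.
a. cgi.cfg
# This file controls the CGI scripts. To run CGI actions from the Nagios web interface,
# such as restarting the Nagios process, disabling notifications, or stopping host checks, you need to configure cgi.cfg.
# Because the web interface authenticates the user nagiosadmin,
# it is enough to grant that user permission in cgi.cfg. The settings to change are listed below.
# To let other users run these actions too, append them to each list, as with hiyun9 here.
default_user_name=nagiosadmin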
authorized_for_system_information=nagiosadmin,hiyun9
authorized_for_configuration_information=nagiosadmin,hiyun9
authorized_for_system_commands=nagiosadmin,hiyun9
authorized_for_all_services=nagiosadmin,hiyun9
authorized_for_all_hosts=nagiosadmin,hiyun9
authorized_for_all_service_commands=nagiosadmin,hiyun9
authorized_for_all_host_commands=nagiosadmin,hiyun9
b. resource.cfg
$USER1$=/usr/local/nagios/libexec
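# This file holds a single line that sets the $USER1$ macro, the path to the Nagios plugins. If the plugins are installed
# somewhere else, change it here. Note that a macro must be defined before other configuration files can reference it.
c. nagios.cfg
# The main Nagios configuration file. I left almost everything as it was; for easier management
# I only added three directories to separate Linux machines, Windows machines, and network devices.
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg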
....
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_dir=/usr/local/nagios/etc/objects/linux
cfg_dir=/usr/local/nagios/etc/objects/windows
cfg_dir=/usr/local/nagios/etc/objects/switch
# The three directories mentioned above are referenced here
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
....
....
###service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata
host_perfdata_process_empty_results=1
service_perfdata_process_empty_results=1
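(The part of nagios.cfg marked in red in the original post is the pnp4nagios configuration.)
# If needed, study the comments in this configuration file for more detail.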
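The cfg_dir entries only work if the directories exist, and Nagios is expected to report an error during verification when one is missing. A minimal sketch for creating them follows; the function name make_object_dirs and the OBJECTS_DIR variable are assumptions of mine, while the directory names come from the nagios.cfg lines above.

```shell
#!/bin/bash
# Sketch: create the three cfg_dir directories referenced in nagios.cfg.
make_object_dirs() {
    local base="$1"
    local d
    for d in linux windows switch; do
        mkdir -p "${base}/${d}"
    done
}

# Assumed location from this article; nothing happens if it does not exist.
OBJECTS_DIR=${OBJECTS_DIR:-/usr/local/nagios/etc/objects}
if [ -d "$OBJECTS_DIR" ]; then
    make_object_dirs "$OBJECTS_DIR" && chown -R nagios:nagios "$OBJECTS_DIR"
fi
```

Host definition files such as the elk.cfg created later can then be dropped into the linux directory.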
# Next, the configuration files in the objects directory:
[root@nagios-server objects]# ls
commands.cfg localhost.cfg switch.cfg timeperiods.cfg
contacts.cfg printer.cfg templates.cfg windows.cfg
d. commands.cfg
# This file exists by default and works without changes; add new commands here whenever you need them.
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
#command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios//var/host-perfdata.out
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}
# The part marked in red in the original post is the pnp4nagios configuration
# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
#command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios//var/service-perfdata.out
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl
}
# Commands for the NRPE plugin
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
define command{
command_name check_nrpe2
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 60
}
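To make the macros in these command definitions concrete, the sketch below imitates how a command_line is filled in before it runs. It is only a rough simulation under my own assumptions (the helper expand_macros is hypothetical and handles just three macros); Nagios performs the real substitution internally.

```shell
#!/bin/bash
# Sketch: imitate macro substitution for a command_line from commands.cfg.
# Arguments: command_line template, value of $USER1$, host address, value of $ARG1$.
expand_macros() {
    local line="$1" user1="$2" hostaddress="$3" arg1="$4"
    line=${line//\$USER1\$/$user1}
    line=${line//\$HOSTADDRESS\$/$hostaddress}
    line=${line//\$ARG1\$/$arg1}
    printf '%s\n' "$line"
}

# check_nrpe!check_load on a host with address 172.16.27.43:
expand_macros '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
    /usr/local/nagios/libexec 172.16.27.43 check_load
# prints: /usr/local/nagios/libexec/check_nrpe -H 172.16.27.43 -c check_load
```

In a service definition, the text after the first "!" in check_command becomes $ARG1$, which is why check_nrpe!check_load ends up as "-c check_load".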
e. contacts.cfg
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
#host_notification_period 24x7
#service_notification_period 24x7
email jinyuchen@hengtiansoft.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
f. timeperiods.cfg
# In China it is enough to keep the first period, 24x7; consult the others as needed.
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
# 'workhours' timeperiod definition
define timeperiod{
timeperiod_name workhours
alias Normal Work Hours
monday 09:00-17:00
tuesday 09:00-17:00
wednesday 09:00-17:00
thursday 09:00-17:00
friday 09:00-17:00
}
# 'none' timeperiod definition
define timeperiod{
timeperiod_name none
alias No Time Is A Good Time
}
define timeperiod{
name us-holidays
timeperiod_name us-holidays
alias U.S. Holidays
january 1 00:00-00:00 ; New Years
monday -1 may 00:00-00:00 ; Memorial Day (last Monday in May)
july 4 00:00-00:00 ; Independence Day
monday 1 september 00:00-00:00 ; Labor Day (first Monday in September)
thursday 4 november 00:00-00:00 ; Thanksgiving (4th Thursday in November)
december 25 00:00-00:00 ; Christmas
}
# This defines a modified "24x7" timeperiod that covers every day of the
# year, except for U.S. holidays (defined in the timeperiod above).
define timeperiod{
timeperiod_name 24x7_sans_holidays
alias 24x7 Sans Holidays
use us-holidays ; Get holiday exceptions from other timeperiod
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
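}
g. templates.cfg
Nagios mainly monitors host resources and services, which are called objects in its configuration. To avoid defining the same settings for every object,
Nagios provides a template configuration file: shared attributes are defined once as templates and referenced many times. That is the role of templates.cfg.
define contact{
name generic-contact ; Name of the contact template
service_notification_period 24x7 ; Time period in which service problem notifications are sent; "24x7" is defined in timeperiods.cfg
host_notification_period 24x7 ; Time period in which host problem notifications are sent; "24x7" is defined in timeperiods.cfg
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email ; How to notify on service problems (email or SMS are possible); email is used here
host_notification_commands notify-host-by-email ; How to notify on host problems (email or SMS are possible); email is used here
register 0
}
Meaning of the notification options:
w: warning
u: unknown (for hosts, unreachable)
c: critical
d: down
r: recovery
f: flapping, the state changes frequently
s: scheduled downtime
n: none, no notifications are sent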
define host{
name generic-host ; Template name; it does not correspond directly to a real machine's host name
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1 ; 0 or 1: whether Nagios processes performance data. With 1, Nagios writes the collected data to a file for later extraction.
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7 ; Time period during which notifications may be sent to the contacts
register 0
}
define host{
name linux-server
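use generic-host ; Inherit all settings from generic-host
check_period 24x7
check_interval 5 ; Interval between host checks, 5 minutes here
retry_interval 1 ; Interval between retry checks, in minutes
max_check_attempts 10 ; Maximum number of checks. When a host looks abnormal, Nagios does not declare a problem at once but retries several times, since the cause may be brief network congestion or another transient issue
check_command check-host-alive ; Command used to check the host state; "check-host-alive" is defined in commands.cfg
notification_period workhours
notification_interval 30 ; Interval, in minutes, before Nagios notifies again while the problem remains unresolved
notification_options d,u,r
contact_groups admins ; Contact group to notify; "admins" is defined in contacts.cfg
register 0
}
define host{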
name windows-server
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
hostgroups windows-servers
register 0
}
define host{
name generic-printer
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period workhours
notification_interval 30
notification_options d,r
contact_groups admins
register 0
}
define host{
name generic-switch
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
register 0
}
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 10
retry_check_interval 2
contact_groups admins
notification_options w,u,c,r
notification_interval 60
notification_period 24x7
register 0
}
define service{
name local-service
use generic-service
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
register 0
}
## The following is the pnp4nagios configuration
define host {
name host-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=_HOST_
#action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$
#process_perf_data 1
}
define service {
name srv-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
#action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
#process_perf_data 1
}
h. localhost.cfg
# Define a host for the local machine
# New machines to be monitored are defined in the same way, so I will not repeat this.
define host{
use linux-server,host-pnp
host_name localhost
alias localhost
address 127.0.0.1
statusmap_image linux40.png
}
# Define an optional hostgroup for Linux machines
define hostgroup{
hostgroup_name linux-servers
alias Linux Servers
members localhost,linux_server ; (add other Linux machines to the group here)
}
# Define a service to "ping" the local machine
define service{
use local-service
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$ ; Calls pnp4nagios to draw the graph
}
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service
host_name localhost
service_description Current Users
check_command check_local_users!20!50
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 users.
define service{
use local-service
host_name localhost
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Define a service to check the load on the local machine.
define service{
use local-service
host_name localhost
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service
host_name localhost
service_description Swap Usage
check_command check_local_swap!20!10
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use local-service
host_name localhost
service_description SSH
check_command check_ssh
notifications_enabled 0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use local-service
host_name localhost
service_description HTTP
check_command check_http
notifications_enabled 0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
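Verifying the Nagios configuration files:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
This verification feature is very useful: the error messages usually name the offending configuration file and the line number, which makes Nagios much easier to configure. Warnings can usually be ignored, since most of them are only advisory.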
# service nagios start
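Since the verification step exits with a non-zero status when it finds errors, it can guard a restart. The following is a minimal sketch, not part of the Nagios distribution: the function name safe_restart and the RESTART_CMD override are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: restart Nagios only when the configuration check passes.
# Arguments: path to the nagios binary, path to nagios.cfg.
safe_restart() {
    local nagios_bin="$1" cfg="$2"
    local log
    log=$(mktemp)
    if "$nagios_bin" -v "$cfg" >"$log" 2>&1; then
        rm -f "$log"
        # RESTART_CMD is a hypothetical override; the default matches this article.
        ${RESTART_CMD:-service nagios restart}
    else
        echo "Configuration check failed; output saved in $log" >&2
        return 1
    fi
}

# Usage on the server (as root):
#   safe_restart /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
```

With this in place a broken object file leaves the running Nagios process untouched and points you at the saved verification output.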
If the following error appears:
[root@localhost etc]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios:This account is currently not available.
done.
In /etc/passwd, change
nagios:x:501:501::/home/nagios:/sbin/nologin
to
nagios:x:501:501::/home/nagios:/bin/bash
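The message "This account is currently not available" is what /sbin/nologin prints, presumably because the init script runs commands as the nagios user through su. Instead of editing /etc/passwd by hand, the same change can be made with usermod. The sketch below only inspects the current shell and prints a suggestion; login_shell_of is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: report the login shell of a user from a passwd-format file.
# Arguments: user name, optional passwd file (defaults to /etc/passwd).
login_shell_of() {
    local user="$1" passwd_file="${2:-/etc/passwd}"
    awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

current_shell=$(login_shell_of nagios)
case "$current_shell" in
    */nologin|*/false)
        echo "nagios has shell $current_shell; as root run: usermod -s /bin/bash nagios" ;;
    "")
        echo "no nagios user found" ;;
    *)
        echo "nagios has shell $current_shell; no change needed" ;;
esac
```

Giving a service account a login shell loosens security a little, so weigh that against the convenience of the stock init script.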
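Chinese localization (skip this if you prefer the English interface):
[root@nagios-server opt]# tar -zxvf nagios-cn-3.2.3.tar.gz
[root@nagios-server opt]# cd nagios-cn-3.2.3
[root@nagios-server nagios-cn-3.2.3]# ./configure --prefix=/usr/local/nagios/
[root@nagios-server nagios-cn-3.2.3]# make all
[root@nagios-server nagios-cn-3.2.3]# make install
After installation the monitoring pages look a bit ugly. The relevant directory is:
/usr/local/nagios/share/stylesheets
The stylesheets directory holds the CSS that defines the look of the interface; the original and the localized stylesheets can be swapped here.
[root@localhost share]# cd /usr/local/nagios/share/ssi
[root@train ssi]# mv common-footer.ssi common-footer.ssi.bak
This removes the footer text "感谢使用nagios-cn工程,工程代码主要源自Nagios工程和Nagiosgraph项目。" (roughly: "Thank you for using the nagios-cn project; its code derives mainly from the Nagios and Nagiosgraph projects.").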
Installing pnp4nagios for graphs:
Install rrdtool first:
[root@nagios-server nagios-cn-3.2.3]# cd /opt/
[root@nagios-server opt]# yum install rrdtool
[root@nagios-server opt]# tar -zxvf pnp4nagios-0.6.14.tar.gz
[root@nagios-server opt]# cd pnp4nagios-0.6.14
[root@nagios-server pnp4nagios-0.6.14]# yum install -y perl-Time-HiRes
[root@nagios-server pnp4nagios-0.6.14]# ./configure --prefix=/usr/local/pnp4nagios --with-nagios-user=nagios --with-nagios-group=nagios
[root@nagios-server pnp4nagios-0.6.14]# make all
[root@nagios-server pnp4nagios-0.6.14]# make fullinstall
[root@nagios-server pnp4nagios-0.6.14]# cd /usr/local/pnp4nagios/etc/
[root@nagios-server etc]# mv misccommands.cfg-sample misccommands.cfg
[root@nagios-server etc]# mv nagios.cfg-sample nagios.cfg
[root@nagios-server etc]# mv rra.cfg-sample rra.cfg
[root@nagios-server etc]# mv pages/web_traffic.cfg-sample pages/web_traffic.cfg
[root@nagios-server etc]# cd check_commands/
[root@nagios-server check_commands]# mv check_all_local_disks.cfg-sample check_all_local_disks.cfg
[root@nagios-server check_commands]# mv check_nrpe.cfg-sample check_nrpe.cfg
[root@nagios-server check_commands]# mv check_nwstat.cfg-sample check_nwstat.cf
Append the following to the end of /usr/local/apache2/conf/httpd.conf:
Alias /pnp4nagios "/usr/local/pnp4nagios/share"
<Directory "/usr/local/pnp4nagios/share">
AllowOverride None
Order allow,deny
Allow from all
#
# Use the same value as defined in nagios.conf
#
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
<IfModule mod_rewrite.c>
# Turn on URL rewriting
RewriteEngine On
Options FollowSymLinks
# Installation directory
RewriteBase /pnp4nagios/
# Protect application and system files from being viewed
RewriteRule ^(application|modules|system) - [F,L]
# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT,L]
</IfModule>
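</Directory>
Copy the files that show a graph preview when the mouse hovers over an icon:
[root@localhost share]# cp /opt/pnp4nagios-0.6.14/contrib/ssi/* /usr/local/nagios/share/ssi/
At this point, clicking through to the page (the small red sun icon) reports one problem:
PHP magic_quotes_gpc: PHP magic_quotes_gpc is deprecated
[root@localhost php]# cp /opt/php-5.3.28/php.ini-production /usr/local/php/lib/php.ini
[root@nagios-server opt]# cd /usr/local/pnp4nagios/share/
[root@localhost share]# mv install.php install.php.bak
# The configuration itself was given above.
# All configuration files are in the nagios package on Baidu cloud as etc.tar.gz; have a look if you need them.
The Nagios monitoring interface (after localization):
The Nagios host monitoring page:
The Nagios service monitoring page:
The graph page opened by clicking the red sun icon:
## I will not show the remaining pages one by one.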
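Installing the check_nrpe plugin (server side):
First, how NRPE works:
When Nagios needs to monitor services or resources on a remote Linux host:
Note: the NRPE daemon needs the Nagios plugins installed on the remote Linux host; without them the daemon cannot perform any checks.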
[root@nagios-server opt]# tar -zxvf nrpe-2.14.tar.gz
[root@nagios-server opt]# cd nrpe-2.14
[root@nagios-server nrpe-2.14]# ./configure
[root@nagios-server nrpe-2.14]# make all
[root@nagios-server nrpe-2.14]# make install-plugin
# To monitor switches you need SNMP support, so the plugins have to be rebuilt:
[root@nagios-server nrpe-2.14]# yum install net-snmp-devel net-snmp-utils
[root@nagios-server opt]# cd /opt/nagios-plugins-2.1.1
[root@nagios-server nagios-plugins-2.1.1]# ./configure --prefix=/usr/local/nagios --with-snmpget-command=/usr/bin/snmpwalk --with-snmpgetnext-command=/usr/bin/snmpwalk
[root@nagios-server nagios-plugins-2.1.1]# make
[root@nagios-server nagios-plugins-2.1.1]# find / -name check_snmp
/opt/nagios-plugins-2.1.1/plugins/check_snmp
[root@nagios-server nagios-plugins-2.1.1]# cp /opt/nagios-plugins-2.1.1/plugins/check_snmp /usr/local/nagios/libexec/
Installing the check_nrpe plugin (client side):
[root@elk ~]# useradd nagios
[root@elk ~]# passwd nagios
[root@elk ~]# cd /opt/
[root@elk opt]# scp root@172.16.27.57:/opt/nagios-plugins-2.1.1.tar.gz ./
[root@elk opt]# tar -zxvf nagios-plugins-2.1.1.tar.gz
[root@elk opt]# cd nagios-plugins-2.1.1
[root@elk nagios-plugins-2.1.1]# ./configure --prefix=/usr/local/nagios
[root@elk nagios-plugins-2.1.1]# make && make install
# After this step, /usr/local/nagios/ contains three directories: include, libexec and share.
[root@elk nagios-plugins-2.1.1]# chown nagios.nagios /usr/local/nagios/
[root@elk opt]# scp root@172.16.27.57:/opt/nrpe-2.14.tar.gz /opt/
[root@elk opt]# tar -zxvf nrpe-2.14.tar.gz
[root@elk opt]# cd nrpe-2.14
[root@elk nrpe-2.14]# ./configure
[root@elk nrpe-2.14]# make all
[root@elk nrpe-2.14]# make install-plugin
[root@elk nrpe-2.14]# make install-daemon
[root@elk nrpe-2.14]# make install-daemon-config
# Listing the nagios directory now shows five directories.
[root@elk nrpe-2.14]# ls /usr/local/nagios/
bin etc include libexec share
[root@elk nrpe-2.14]# vim /usr/local/nagios/etc/nrpe.cfg
# Add the IP of the Nagios server:
allowed_hosts=127.0.0.1,172.16.27.57
# Start nrpe:
[root@elk nrpe-2.14]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@elk nrpe-2.14]# ps -ef |grep nrpe
nagios 29271 1 0 11:16 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root 29273 7990 0 11:17 pts/2 00:00:00 grep nrpe
[root@elk nrpe-2.14]# netstat -lanp|grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 29271/nrpe
[root@elk nrpe-2.14]# /usr/local/nagios/libexec/check_nrpe -H localhost
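NRPE v2.14
# Getting the version string back means the check succeeded.
# Then test from the server as well; remember to stop the firewall (or allow the NRPE port, 5666):
[root@nagios-server opt]# /usr/local/nagios/libexec/check_nrpe -H 172.16.27.43
NRPE v2.14
# Getting the version string back means the check succeeded.
## Next, add a configuration file on the server to tell Nagios which client to monitor through NRPE.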
[root@nagios-server objects]# cd /usr/local/nagios/etc/objects/linux/
[root@nagios-server linux]# cp ../localhost.cfg ./elk.cfg # Copy an existing file and edit it; choose any file name you like
[root@nagios-server linux]# vim elk.cfg
define host{
use linux-server,host-pnp ; This host definition will inherit all variables that are defined
host_name elk
alias elk
hostgroups linux-servers
address 172.16.27.43
}
#define hostgroup{
# hostgroup_name linux-servers
# alias Linux Servers
# members localhost,elk
# }
define service{
use local-service
host_name elk
service_description PING
check_command check_ping!100.0,20%!500.0,60%
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service
host_name elk
service_description Root Partition
check_command check_nrpe!check_disk
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service
host_name elk
service_description Current Users
check_command check_nrpe!check_users
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service
host_name elk
service_description Total Processes
check_command check_nrpe!check_total_procs
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service
host_name elk
service_description Current Load
check_command check_nrpe!check_load
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service ; Name of service template to use
host_name elk
service_description Swap Usage
check_command check_nrpe!check_swap
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service ; Name of service template to use
host_name elk
service_description SSH
check_command check_ssh
notifications_enabled 0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
define service{
use local-service ; Name of service template to use
host_name elk
service_description HTTP
check_command check_http
notifications_enabled 0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
# Then edit nrpe.cfg on the client
[root@elk etc]# vim /usr/local/nagios/etc/nrpe.cfg
# Comment out the existing command[...] entries at the bottom and replace them with the following.
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
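The name that the server passes with check_nrpe -c is looked up among these command[...] entries on the client. As a rough illustration of that mapping, the sketch below extracts the command line for a given name from an nrpe.cfg style file. The helper lookup_nrpe_command and the NRPE_CFG variable are hypothetical; it prints only the first match and says nothing about how NRPE itself treats duplicate names.

```shell
#!/bin/bash
# Sketch: show which command line a check name maps to in nrpe.cfg.
# Arguments: path to nrpe.cfg, command name (the value given to check_nrpe -c).
lookup_nrpe_command() {
    local cfg="$1" name="$2"
    # Strip the "command[name]=" prefix and print what remains of the first match.
    sed -n "s/^command\[${name}\]=//p" "$cfg" | head -n 1
}

# Assumed path from this article; nothing is printed if the file is not readable.
NRPE_CFG=${NRPE_CFG:-/usr/local/nagios/etc/nrpe.cfg}
if [ -r "$NRPE_CFG" ]; then
    lookup_nrpe_command "$NRPE_CFG" check_load
fi
```

An empty result for a name used in a service definition is a hint that the entry is missing on the client.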
# elk must also be added to the hostgroup in localhost.cfg, or you can group hosts in your own way:
#define hostgroup{
# hostgroup_name linux-servers
# alias Linux Servers
# members localhost,elk
# }
[root@nagios-server linux]# sed -i 's/localhost/elk/g' elk.cfg # Replace every localhost with elk
[root@nagios-server ~]# service nagios restart
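Now look at the result in the web interface:
Because no web service is installed on elk, a warning is shown.
That concludes the Nagios configuration. With a bit of experimenting on your own, you will certainly manage as well.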
III. Developing a Nagios Plugin
Here is a very simple example to get things started; it should be clear at a glance. Nagios plugins can be written in many languages, such as shell or Python; I use shell here.
# As shown above, monitoring data on the client is collected by nrpe and passed to the Nagios server, so a plugin is also written on the client and hands its result to nrpe.
# Let's write a script that monitors the memcached service.
[root@elk libexec]# cd /usr/local/nagios/libexec/
[root@elk libexec]# vim check_memcached
#!/bin/bash
STATE_OK=0
STATE_CRITICAL=2
W=`netstat -lanp | grep 11211 | wc -l`
if [ $W -ge 1 ];then
echo "OK,Memcached is working!"
exit $STATE_OK;
else
echo "CRITICAL,Memcached is not working!"
exit $STATE_CRITICAL
fi
# The script is very simple: it checks whether memcached is alive (whether port 11211 shows up in netstat) and exits with 0 if so, otherwise with 2.
[root@elk libexec]# chmod 755 check_memcached
[root@elk libexec]# ./check_memcached
CRITICAL,Memcached is not working!
[root@elk libexec]# service memcached start
Starting memcached: [ OK ]
[root@elk libexec]# ./check_memcached
OK,Memcached is working!
[root@elk libexec]# service memcached stop
Stopping memcached: [ OK ]
[root@elk libexec]# ./check_memcached
CRITICAL,Memcached is not working!
# The test shows that the script can monitor whether memcached is running.
[root@elk libexec]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_memcached]=/usr/local/nagios/libexec/check_memcached -c 2
# The last line, check_memcached, is the newly added command (marked in red in the original post); the others are unchanged.
[root@elk libexec]# ps -ef |grep nrpe
nagios 30360 1 0 14:03 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root 32258 31396 0 15:51 pts/3 00:00:00 grep nrpe
[root@elk libexec]# kill -9 30360
[root@elk libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# The client side is done; now add the service on the server:
[root@nagios-server linux]# vim /usr/local/nagios/etc/objects/linux/elk.cfg
# Append this block at the end
define service{
use local-service ; Name of service template to use
host_name elk
service_description Check Memcached
check_command check_nrpe!check_memcached
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
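Here is the result:
When the memcached service is stopped:
When the memcached service is running:
That is the general approach to monitoring with custom scripts; you can explore further on your own, and the rest depends on your development skills.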
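As one possible refinement of the check_memcached idea, the sketch below matches the port exactly instead of grepping for the digits anywhere in the netstat output, and returns UNKNOWN when netstat is missing. The exit codes 0 to 3 are the usual Nagios plugin convention; the function names and the default port argument are my own assumptions, and this is a sketch rather than a finished plugin.

```shell
#!/bin/bash
# Nagios plugin exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Reads `netstat -lnt` style output on stdin; succeeds if a TCP socket
# is in LISTEN state on exactly the given port.
port_is_listening() {
    local port="$1"
    awk -v p="$port" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }'
}

# Prints a one-line status and returns the matching plugin exit code.
check_memcached() {
    local port="${1:-11211}"
    if ! command -v netstat >/dev/null 2>&1; then
        echo "UNKNOWN: netstat not found"
        return $STATE_UNKNOWN
    fi
    if netstat -lnt 2>/dev/null | port_is_listening "$port"; then
        echo "OK: memcached is listening on port $port"
        return $STATE_OK
    fi
    echo "CRITICAL: nothing is listening on port $port"
    return $STATE_CRITICAL
}

# When saved as a plugin file, end the script with these two lines:
#   check_memcached "$@"
#   exit $?
```

Because only the listening socket is tested, a hung memcached that still holds the port would pass; a deeper check would have to talk to the service itself.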
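Finally done. Setting this up myself did not take long, but writing the document took two days.
Thanks to the many experts who shared their documents online. I only drew on their work and summarized it, so any similarities are to be expected.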