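Nagios, a powerful monitoring tool for Linux, is an open-source system and network monitoring application. It tracks the state of Windows and Linux hosts, routers, switches and other devices, alerts operations staff by email as soon as a monitored device misbehaves, and sends a recovery notice by email or SMS once the state returns to normal. This walkthrough concentrates on the configuration that follows installation, combining nrpe, rrdtool and pnp4nagios to monitor the resources of a Linux client and inspect them graphically.
I. Requirement: on the Nagios host, monitor the local machine's resources and one Linux client, mainly the client's CPU, memory and partition usage, and view resource usage through a graphical interface.
Environment:
One Linux host, Linux_Server (hostname), IP 192.168.189.130
One Linux host, Linux_Client (hostname), IP 192.168.189.128
II. Implementation
1. Installing and configuring Nagios
1.1 Disable iptables and SELinux
[root@Linux_Server ~]# service iptables status
iptables: Firewall is not running.
[root@Linux_Server ~]# setenforce 0
1.2 Install the prerequisite packages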
[root@Linux_Server ~]# yum -y install httpd gcc glibc glibc-common gd gd-devel xinetd php php-mysql mysql-devel mysql-server gnutls
1.3 Create the nagios user and group
[root@Linux_Server ~]# useradd -s /sbin/nologin nagios
[root@Linux_Server ~]# mkdir /usr/local/nagios
[root@Linux_Server ~]# chown -R nagios:nagios /usr/local/nagios/
1.4 Upload the software needed for the installation
[root@Linux_Server ~]# mkdir /opt/soft
[root@Linux_Server ~]# cd /opt/soft/
[root@Linux_Server soft]# ls
cgilib-0.5.tar.gz nagios-3.5.0.tar.gz nagios-plugins-1.4.16.tar.gz nrpe-2.14.tar.gz pnp4nagios-0.6.14.tar.gz rrdtool-1.4.4.tar.gz
1.5 Install Nagios
[root@Linux_Server soft]# tar zxvf nagios-3.5.0.tar.gz
[root@Linux_Server soft]# cd nagios
[root@Linux_Server nagios]# ./configure --prefix=/usr/local/nagios
[root@Linux_Server nagios]# make all
[root@Linux_Server nagios]# make install
[root@Linux_Server nagios]# make install-init
[root@Linux_Server nagios]# make install-commandmode
[root@Linux_Server nagios]# make install-config
[root@Linux_Server nagios]# make install-webconf
[root@Linux_Server nagios]# chkconfig --add nagios
[root@Linux_Server nagios]# chkconfig nagios on
[root@Linux_Server nagios]# chkconfig httpd on
[root@Linux_Server nagios]# chkconfig --list nagios
nagios 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@Linux_Server nagios]# ls /usr/local/nagios/
bin etc libexec sbin share var
1.6 Install the Nagios plugins
[root@Linux_Server nagios]# cd /opt/soft/
[root@Linux_Server soft]# tar zxvf nagios-plugins-1.4.16.tar.gz
[root@Linux_Server soft]# cd nagios-plugins-1.4.16
[root@Linux_Server nagios-plugins-1.4.16]# ./configure --prefix=/usr/local/nagios/
[root@Linux_Server nagios-plugins-1.4.16]# make
[root@Linux_Server nagios-plugins-1.4.16]# make install
[root@Linux_Server nagios-plugins-1.4.16]# ls /usr/local/nagios/libexec/
check_apt check_disk check_http check_jabber
1.7 Configure Apache and PHP
[root@Linux_Server html]# vi /var/www/html/index.html
Enter: This is Apache
[root@Linux_Server nagios-plugins-1.4.16]# vi /etc/httpd/conf/httpd.conf
Change
User apache
Group apache
to
User nagios
Group nagios
Change
#ServerName www.example.com:80
to
ServerName 192.168.189.130:80
Change
DirectoryIndex index.html index.html.var
to
DirectoryIndex index.html index.html.var index.php
AddType application/x-httpd-php .php
Append the following to the end of the file:
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
# SSLRequireSSL
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "NagiosAccess"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
# SSLRequireSSL
Options None
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "NagiosAccess"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
[root@Linux_Server conf.d]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
[root@Linux_Server conf.d]# mv /etc/httpd/conf.d/nagios.conf /etc/httpd/conf.d/nagios.con_
[root@Linux_Server conf.d]# service httpd start
[root@Linux_Server etc]# service nagios restart
1.9 Browse to http://192.168.189.130/nagios to test
Click Services in the menu. Nagios is already monitoring some of the local host's resources, but only a limited set. The next step changes the configuration so that Nagios monitors the usage of every partition.
1.10 Edit the configuration to monitor the usage of all local partitions
[root@Linux_Server libexec]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 13G 3.1G 8.8G 27% /
/dev/sda1 485M 31M 429M 7% /boot
/dev/sda3 4.9G 138M 4.5G 3% /home
[root@Linux_Server libexec]# cd /usr/local/nagios/etc/objects/
[root@Linux_Server objects]# vi localhost.cfg
After
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
add the following:
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Boot Partition
check_command check_local_disk!20%!10%!/boot
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Home Partition
check_command check_local_disk!20%!10%!/home
}
[root@Linux_Server objects]# service nagios restart
Refresh the Nagios page and wait a moment. The new entries appear, confirming that Nagios now monitors the usage of every partition.
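The partition blocks differ only in their description and mount point, so they can be generated rather than typed by hand. The following is a minimal sketch and not part of the original setup: gen_disk_service is a hypothetical helper name, and the 20%/10% thresholds are simply copied from the definitions above.

```shell
#!/bin/sh
# Hypothetical helper: print a localhost service definition for one mount point.
# Thresholds mirror the tutorial: warning below 20% free, critical below 10% free.
gen_disk_service() {
    desc=$1    # service_description, e.g. "Boot Partition"
    mount=$2   # mount point, e.g. /boot
    cat <<EOF
define service{
        use                     local-service
        host_name               localhost
        service_description     $desc
        check_command           check_local_disk!20%!10%!$mount
        }
EOF
}

gen_disk_service "Boot Partition" /boot
gen_disk_service "Home Partition" /home
```

The output can be reviewed and then appended to localhost.cfg before restarting Nagios.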
By default neither Nagios nor NRPE ships a memory check; to monitor memory you have to supply your own script (shell, PHP or Perl). In addition, the check_procs binary built from nagios-plugins cannot be graphed by pnp4nagios: the check_procs.c source has to be modified and recompiled, and the newly built check_procs must replace the old file.
Memory check script download: http://down.51cto.com/data/264874
Discussion of the check_procs graphing problem: http://www.suiyiwen.com/question/4173
Upload the downloaded check_memory.pl to /usr/local/nagios/libexec/
[root@Linux_Server libexec]# chmod +x check_memory.pl
[root@Linux_Server objects]# vi /usr/local/nagios/etc/objects/commands.cfg
After
# 'check_nt' command definition
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
add the following:
# 'check_mem' command definition
define command{
command_name check_mem
command_line $USER1$/check_memory.pl -f -w 20 -c 10
}
[root@Linux_Server objects]# vi localhost.cfg
Append the following to the end of the file:
define service{
use local-service ; Name of service template to use
host_name localhost
service_description check_mem
check_command check_mem
notifications_enabled 0
}
[root@Linux_Server objects]# service nagios restart
1.12 Refresh the Nagios page and check the service list
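For readers who prefer to write their own memory check instead of downloading one, here is a rough sketch of what such a plugin has to do. It is not check_memory.pl: mem_check is a hypothetical function, and it assumes that the -f -w 20 -c 10 options used above mean "warn when free memory drops to 20%, go critical at 10%". What is standard is the plugin contract: exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, and optional performance data after a "|".

```shell
#!/bin/sh
# Hypothetical memory check following the Nagios plugin conventions.
# Arguments: free memory in percent, warning threshold, critical threshold.
mem_check() {
    free_pct=$1; warn=$2; crit=$3
    if [ "$free_pct" -le "$crit" ]; then
        state=CRITICAL; code=2
    elif [ "$free_pct" -le "$warn" ]; then
        state=WARNING; code=1
    else
        state=OK; code=0
    fi
    # Text after "|" is performance data: label=value[unit];warn;crit
    echo "MEMORY $state - ${free_pct}% free | free=${free_pct}%;${warn};${crit}"
    return $code
}

# On a real host the percentage could be derived from /proc/meminfo, e.g.:
#   awk '/^MemTotal/ {t=$2} /^MemFree/ {f=$2} END {printf "%d", f*100/t}' /proc/meminfo
mem_check 35 20 10
```

The performance data part is what a grapher such as pnp4nagios needs later on; a plugin that prints only the status text can be monitored but not plotted.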
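2. Installing and configuring NRPE
Out of the box, Nagios can only probe a target host from the outside, for example its TCP, HTTP or FTP services. Checking memory, disk, CPU and other local resources requires the NRPE add-on.
NRPE consists of two parts:
· the check_nrpe plugin, which lives on the monitoring host
· the NRPE daemon, which runs on the remote Linux host (usually the monitored machine)
The monitoring flow is as follows.
When Nagios needs to check a service or resource on a remote Linux host:
1. Nagios runs the check_nrpe plugin and tells it what to check;
2. check_nrpe connects to the remote NRPE daemon over SSL;
3. the NRPE daemon runs the corresponding Nagios plugin to perform the check;
4. the NRPE daemon returns the result to check_nrpe, which hands it to Nagios for processing.
Note: the NRPE daemon needs the Nagios plugins installed on the remote Linux host; without them the daemon cannot check anything.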
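To make step 1 concrete, the sketch below mimics how a check_command such as check_nrpe!check_load turns into the command line that Nagios runs. It is an illustration only: expand_nrpe_check is a hypothetical function, USER1 is assumed to be the plugin directory used in this walkthrough, and the command template is the check_nrpe definition added to commands.cfg later on.

```shell
#!/bin/sh
# Sketch of macro expansion for "check_nrpe!<remote command>".
# USER1 is assumed to point at the plugin directory used in this tutorial.
USER1=/usr/local/nagios/libexec

expand_nrpe_check() {
    check_command=$1           # e.g. check_nrpe!check_load
    host_address=$2            # value of $HOSTADDRESS$
    arg1=${check_command#*!}   # text after the first "!" becomes $ARG1$
    echo "$USER1/check_nrpe -H $host_address -c $arg1"
}

expand_nrpe_check 'check_nrpe!check_load' 192.168.189.128
```

The name passed with -c must match a command[...] entry in the client's nrpe.cfg; the daemon looks it up there and runs the plugin locally.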
2.1 Configuring NRPE on the client
2.1.1 Install NRPE
[root@Linux_Client ~]# service xinetd status
xinetd: unrecognized service
[root@Linux_Client ~]# yum -y install xinetd
[root@Linux_Client ~]# service xinetd start
[root@Linux_Client /]# mkdir /opt/soft/
[root@Linux_Client /]# cd /opt/soft/
[root@Linux_Client soft]# useradd -s /sbin/nologin nagios
[root@Linux_Client soft]# ls
nagios-plugins-1.4.16.tar.gz nrpe-2.14.tar.gz
[root@Linux_Client soft]# tar zxvf nagios-plugins-1.4.16.tar.gz
[root@Linux_Client soft]# cd nagios-plugins-1.4.16
[root@Linux_Client nagios-plugins-1.4.16]# yum -y install openssl-devel
[root@Linux_Client nagios-plugins-1.4.16]# ./configure --prefix=/usr/local/nagios
[root@Linux_Client nagios-plugins-1.4.16]# make
[root@Linux_Client nagios-plugins-1.4.16]# make install
[root@Linux_Client nagios-plugins-1.4.16]# ls /usr/local/nagios/
include libexec share
[root@Linux_Client nagios-plugins-1.4.16]# chown -R nagios:nagios /usr/local/nagios/
[root@Linux_Client nagios-plugins-1.4.16]# cd /opt/soft/
[root@Linux_Client soft]# tar zxvf nrpe-2.14.tar.gz
[root@Linux_Client soft]# cd nrpe-2.14
[root@Linux_Client nrpe-2.14]# ./configure
[root@Linux_Client nrpe-2.14]# make all
[root@Linux_Client nrpe-2.14]# make install-plugin
[root@Linux_Client nrpe-2.14]# make install-daemon
[root@Linux_Client nrpe-2.14]# make install-daemon-config
[root@Linux_Client nrpe-2.14]# ls /usr/local/nagios/
bin etc include libexec share
[root@Linux_Client nrpe-2.14]# make install-xinetd
[root@Linux_Client nrpe-2.14]# vi /etc/xinetd.d/nrpe
Change only_from = 127.0.0.1
to only_from = 127.0.0.1 192.168.189.130
[root@Linux_Client nrpe-2.14]# vi /etc/services
Append the following to the end of the file:
nrpe 5666/tcp # nrpe
[root@Linux_Client nrpe-2.14]# service xinetd restart
[root@Linux_Client nrpe-2.14]# netstat -an | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
[root@Linux_Client nrpe-2.14]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.14
2.1.2 Configure NRPE on the client
[root@Linux_Client ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 35G 3.2G 30G 10% /
/dev/sda1 2.9G 75M 2.7G 3% /boot
[root@Linux_Client ~]# vi /usr/local/nagios/etc/nrpe.cfg
Change
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
to
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_sda3]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -f -w 20 -c 10
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
Upload check_memory.pl to /usr/local/nagios/libexec
[root@Linux_Client libexec]# chmod +x check_memory.pl
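Since the server refers to these entries by name (check_nrpe!check_sda1 and so on), a quick listing of the names defined on the client helps catch typos before the server is configured. The sketch below is one possible way to do that; list_nrpe_commands is a hypothetical helper, and the sample file exists only for the demonstration.

```shell
#!/bin/sh
# List the command names defined in an NRPE configuration file.
# The default path is the one used in this tutorial.
list_nrpe_commands() {
    cfg=${1:-/usr/local/nagios/etc/nrpe.cfg}
    # Keep only "command[name]=..." lines and print the name inside the brackets.
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$cfg"
}

# Demonstration on a throwaway sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# comment lines are ignored
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
EOF
list_nrpe_commands "$sample"
rm -f "$sample"
```

Run without arguments on the client, it reads /usr/local/nagios/etc/nrpe.cfg and prints one command name per line.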
2.2 Install and configure the NRPE plugin on the server
2.2.1 Install NRPE
[root@Linux_Server soft]# tar zxvf nrpe-2.14.tar.gz
[root@Linux_Server soft]# cd nrpe-2.14
[root@Linux_Server nrpe-2.14]# ./configure
[root@Linux_Server nrpe-2.14]# make all
[root@Linux_Server nrpe-2.14]# make install-plugin
[root@Linux_Server nrpe-2.14]# /usr/local/nagios/libexec/check_nrpe -H 192.168.189.128
NRPE v2.14
2.2.2 Add a check_nrpe definition to commands.cfg
[root@Linux_Server nrpe-2.14]# vi /usr/local/nagios/etc/objects/commands.cfg
# 'check_mem' command definition
define command{
command_name check_mem
command_line $USER1$/check_memory.pl -f -w 20 -c 10
}
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
[root@Linux_Server nrpe-2.14]# vi /usr/local/nagios/etc/nagios.cfg
Change #cfg_dir=/usr/local/nagios//etc/servers
to cfg_dir=/usr/local/nagios//etc/servers
[root@Linux_Server nrpe-2.14]# cd /usr/local/nagios/etc/
[root@Linux_Server etc]# mkdir servers
[root@Linux_Server etc]# cd servers/
[root@Linux_Server servers]# touch 192.168.189.128.cfg
[root@Linux_Server servers]# vi 192.168.189.128.cfg
# Define a host for the monitored client
define host{
use linux-server
host_name Linux_Client
alias Linux_Client
address 192.168.189.128
}
define service{
use generic-service
host_name Linux_Client
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service
host_name Linux_Client
service_description Current Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name Linux_Client
service_description Boot Partition
check_command check_nrpe!check_sda1
}
define service{
use generic-service
host_name Linux_Client
service_description Root Partition
check_command check_nrpe!check_sda3
}
define service{
use generic-service
host_name Linux_Client
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name Linux_Client
service_description Check Zombie Procs
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name Linux_Client
service_description Current Users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name Linux_Client
service_description Check Swap
check_command check_nrpe!check_swap
}
define service{
use generic-service
host_name Linux_Client
service_description check mem
check_command check_nrpe!check_mem
}
[root@Linux_Server servers]# service nagios restart
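With this many hand-edited object files, a single typo can keep Nagios from starting. The nagios binary has a built-in verification mode (-v followed by the main configuration file), so one option is to restart only when that check passes. The wrapper below is a sketch under assumptions: safe_restart and the RESTART_CMD override are hypothetical names introduced here, and the default paths are the ones used in this walkthrough.

```shell
#!/bin/sh
# Verify the configuration before restarting, so a typo does not stop monitoring.
# NAGIOS_BIN and NAGIOS_CFG default to the paths used in this tutorial;
# RESTART_CMD is a hypothetical override added for illustration.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RESTART_CMD=${RESTART_CMD:-service nagios restart}

safe_restart() {
    # "nagios -v <main config>" runs the built-in pre-flight check.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Calling safe_restart in place of a bare service nagios restart leaves the running instance untouched when the check fails; running the -v command by hand then shows the offending file and line.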
3. Graphing with Nagios, rrdtool and PNP
3.1 Install the prerequisite packages
[root@Linux_Server soft]# yum -y install libart_lgpl-devel pango-devel* cairo-devel* libxml2-devel perl-Time-HiRes
3.2 Install cgilib, rrdtool and pnp4nagios
[root@Linux_Server soft]# cd /opt/soft/
[root@Linux_Server soft]# tar zxvf cgilib-0.5.tar.gz
[root@Linux_Server soft]# cd cgilib-0.5
[root@Linux_Server cgilib-0.5]# make
[root@Linux_Server cgilib-0.5]# cp libcgi.a /usr/local/lib
[root@Linux_Server cgilib-0.5]# cp cgi.h /usr/include/
[root@Linux_Server cgilib-0.5]# cd ..
[root@Linux_Server soft]# tar zxvf rrdtool-1.4.4.tar.gz
[root@Linux_Server soft]# cd rrdtool-1.4.4
[root@Linux_Server rrdtool-1.4.4]# ./configure --prefix=/usr/local/rrdtool
[root@Linux_Server rrdtool-1.4.4]# make
[root@Linux_Server rrdtool-1.4.4]# make install
[root@Linux_Server rrdtool-1.4.4]# cd ..
[root@Linux_Server soft]# tar zxvf pnp4nagios-0.6.14.tar.gz
[root@Linux_Server soft]# cd pnp4nagios-0.6.14
[root@Linux_Server pnp4nagios-0.6.14]# ./configure --prefix=/usr/local/pnp4nagios --with-rrdtool=/usr/local/rrdtool/bin/rrdtool
[root@Linux_Server pnp4nagios-0.6.14]# make all
[root@Linux_Server pnp4nagios-0.6.14]# make install
[root@Linux_Server pnp4nagios-0.6.14]# make install-webconf
[root@Linux_Server pnp4nagios-0.6.14]# make install-config
[root@Linux_Server pnp4nagios-0.6.14]# make install-init
[root@Linux_Server pnp4nagios-0.6.14]# cd /usr/local/pnp4nagios/etc/
[root@Linux_Server etc]# cp nagios.cfg-sample nagios.cfg
[root@Linux_Server etc]# cp misccommands.cfg-sample misccommands.cfg
[root@Linux_Server etc]# cp rra.cfg-sample rra.cfg
[root@Linux_Server etc]# cd pages/
[root@Linux_Server pages]# cp web_traffic.cfg-sample web_traffic.cfg
[root@Linux_Server pages]# cd ../check_commands/
[root@Linux_Server check_commands]# cp check_all_local_disks.cfg-sample check_all_local_disks.cfg
[root@Linux_Server check_commands]# cp check_nrpe.cfg-sample check_nrpe.cfg
3.3 Configure Nagios to support pnp4nagios
3.3.1 Edit the main configuration file nagios.cfg and enable performance data
[root@Linux_Server check_commands]# vi /usr/local/nagios/etc/nagios.cfg
Change
process_performance_data=0
#host_perfdata_command=process-host-perfdata
#service_perfdata_command=process-service-perfdata
to
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
3.3.2 Edit the commands file
[root@Linux_Server check_commands]# vi /usr/local/nagios/etc/objects/commands.cfg
Replace
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios//var/host-perfdata.out
}
# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios//var/service-perfdata.out
}
with
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}
# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl
}
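What process_perfdata.pl receives is the performance data part of each plugin's output, that is, whatever the plugin prints after the first "|". The sketch below splits a sample line the same way to show which part ends up in a graph. The two function names are hypothetical, and the sample line was written for illustration rather than captured from a real host.

```shell
#!/bin/sh
# Split one line of plugin output into the human-readable text and the
# performance data that follows the first "|".
plugin_text() {
    text=${1%%|*}
    printf '%s\n' "$text"
}
plugin_perfdata() {
    case $1 in
        *"|"*) perf=${1#*|} ;;
        *)     perf= ;;          # no perfdata: nothing to graph
    esac
    printf '%s\n' "$perf"
}

# Illustrative sample line (not real output).
line='DISK OK - free space: / 8800 MB|/=3100MB;10400;11700;0;13000'
plugin_text "$line"
plugin_perfdata "$line"
```

A plugin that prints nothing after the "|" leaves PNP with nothing to plot, which is the likely reason the stock check_procs mentioned earlier cannot be graphed.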
3.3.3 Edit templates.cfg and add the "little sun" (graph icon) templates so that the graphs are linked from the Nagios pages
[root@Linux_Server check_commands]# vi /usr/local/nagios/etc/objects/templates.cfg
After the last host template, add the following:
define host{
name host-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$
}
After the last service template, add the following:
define service{
name srv-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
Add the host-pnp and srv-pnp templates to the host's configuration file
[root@Linux_Server check_commands]# vi /usr/local/nagios/etc/objects/localhost.cfg
Change
define host{
use linux-server
to
define host{
use linux-server,host-pnp
Change
define service{
use local-service
to
define service{
use local-service,srv-pnp
[root@Linux_Server check_commands]# vi /etc/httpd/conf/httpd.conf
Append the following to the end of the file:
include conf.d/pnp4nagios.conf
[root@Linux_Server check_commands]# chown -R nagios:nagios /usr/local/nagios/
[root@Linux_Server check_commands]# chown -R nagios:nagios /usr/local/pnp4nagios/
[root@Linux_Server check_commands]# mv /usr/local/pnp4nagios/share/install.php /usr/local/pnp4nagios/share/install.php_
[root@Linux_Server php]# chmod 777 -R /var/lib/php/session/
[root@Linux_Server check_commands]# service nagios restart
[root@Linux_Server check_commands]# service httpd restart
PNP cannot graph the process checks. If process graphs are needed, modify the nagios-plugins source and recompile, as mentioned earlier; the details are not covered here.
The graphical monitoring in this walkthrough covers only the local host. To graph the client as well, add host-pnp and srv-pnp to the client's configuration file and restart Nagios.
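For the client file created earlier (servers/192.168.189.128.cfg), that change amounts to extending every use line. A possible way to preview it with sed is sketched below; add_pnp_templates is a hypothetical helper, it assumes the linux-server and generic-service templates used in this walkthrough, and it writes to standard output so the original file is left untouched.

```shell
#!/bin/sh
# Append the PNP templates to the "use" lines of a Nagios object file.
# Lines that already end in ",host-pnp" or ",srv-pnp" no longer match,
# so running it twice does not duplicate the template names.
add_pnp_templates() {
    sed -e '/^[[:space:]]*use[[:space:]]/s/linux-server[[:space:]]*$/linux-server,host-pnp/' \
        -e '/^[[:space:]]*use[[:space:]]/s/generic-service[[:space:]]*$/generic-service,srv-pnp/' \
        "$1"
}

# Demonstration on a throwaway sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
define host{
use linux-server
host_name Linux_Client
}
define service{
use generic-service
host_name Linux_Client
service_description Current Load
}
EOF
add_pnp_templates "$sample"
rm -f "$sample"
```

After reviewing the output, it can be saved over the client file and Nagios restarted.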
Email alerting is not covered in detail here.