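Introduction to Nagios
Nagios is an open-source tool for monitoring computer systems and networks. It can monitor the status of Windows, Linux, and Unix hosts, network devices such as switches and routers, and printers. When a host or service enters an abnormal state, it alerts the operations staff right away by email or SMS, and it sends a recovery notice by email or SMS once the state returns to normal.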
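Main features of Nagios
- Network service monitoring (SMTP, POP3, HTTP, NNTP, ICMP, SNMP, FTP, SSH)
- Host resource monitoring (CPU load, disk usage, system logs), including Windows hosts (via the NSClient++ plugin)
- User-written plugins can collect data over the network to monitor almost anything (temperature, alarms, ...)
- Nagios can be configured to run plugins and scripts on remote hosts
- Remote monitoring over SSH or an SSL-encrypted tunnel
- A simple plugin design makes it easy to write your own checks in many languages (shell scripts, C++, Perl, Ruby, Python, PHP, C#, and others)
- Many plugins for graphing data (Nagiosgraph, Nagiosgrapher, PNP4Nagios, and others)
- Parallelized service checks
- A hierarchy of network hosts can be defined, so that checks proceed level by level, starting from the parent host and working downward
- Notifications when a service or host has a problem, delivered by email, pager, SMS, or any user-defined plugin
- Custom event handlers that can reactivate a failed service or host
- Automatic log rotation
- Support for redundant monitoring
- A web interface for viewing current network status, notifications, problem history, log files, and so on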
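The event handler item in the list above can be made more concrete with a small sketch. It assumes the handler is registered so that Nagios passes the $SERVICESTATE$, $SERVICESTATETYPE$, and $SERVICEATTEMPT$ macros as its three arguments; the function name handler_action is hypothetical, and the sketch only decides whether to act rather than restarting anything.

```shell
#!/bin/sh
# Sketch of the decision logic of a service event handler.
# Assumption: Nagios passes $SERVICESTATE$, $SERVICESTATETYPE$ and
# $SERVICEATTEMPT$ as the three arguments.

# handler_action STATE STATETYPE ATTEMPT
# Prints "restart" when the service should be restarted, otherwise "none".
handler_action() {
    state=$1 state_type=$2 attempt=$3
    case "$state" in
        CRITICAL)
            # Act on the third soft failure, or once the state turns hard.
            if [ "$state_type" = "HARD" ]; then
                echo "restart"
            elif [ "$state_type" = "SOFT" ] && [ "$attempt" = "3" ]; then
                echo "restart"
            else
                echo "none"
            fi
            ;;
        *)
            # OK, WARNING and UNKNOWN need no recovery action here.
            echo "none"
            ;;
    esac
}
```

A real handler would replace the echo with whatever command restarts the affected service on your system.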
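How Nagios works
The job of Nagios is to monitor services and hosts, but the core itself contains no checking logic: all monitoring and checking is carried out by plugins.
Once started, Nagios periodically invokes plugins to check server status. It also maintains a queue: every status result returned by a plugin enters the queue, and Nagios reads from the head of the queue, processes each result, and displays the resulting status in the web interface.
Nagios offers many plugins that make it easy to monitor a wide range of services. After installation, all available plugins are placed in the libexec directory under the Nagios home directory; for example, check_disk checks disk space and check_load checks CPU load. Run ./check_xxx -h to see what a plugin does and how to use it.
Nagios recognizes four return states: 0 (OK) means normal, shown in green; 1 (WARNING) means a warning, shown in yellow; 2 (CRITICAL) means a very serious error, shown in red; 3 (UNKNOWN) means an unknown error, shown in dark yellow. Nagios determines the state of a monitored object from the value the plugin returns and displays it in the web interface, so that administrators can spot failures promptly.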
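As a rough illustration of the four return states described above, here is a minimal plugin skeleton. The function name check_value and the idea of comparing one integer against a warning and a critical threshold are assumptions made for the example; the bundled plugins are more elaborate.

```shell
#!/bin/sh
# Hypothetical plugin sketch: compares a numeric value against warning and
# critical thresholds and reports one of the four Nagios states.

# check_value VALUE WARN CRIT
# Prints a one-line status message and returns the matching plugin exit code.
check_value() {
    value=$1 warn=$2 crit=$3

    # Anything that is not a non-negative integer is reported as UNKNOWN (3).
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*) echo "UNKNOWN - invalid input"; return 3 ;;
        esac
    done

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}
```

A standalone plugin built on this would measure something, call check_value with the measurement and its thresholds, and then exit with the function's return code so that Nagios can read the state.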
Installing the Nagios server
Install the dependencies
shell > yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
Create the nagios user and group
shell > useradd -s /sbin/nologin nagios
shell > mkdir /usr/local/nagios
shell > chown -R nagios:nagios /usr/local/nagios
Compile and install Nagios
shell > tar zxvf nagios-4.0.8.tar.gz
shell > cd nagios-4.0.8
shell > ./configure --prefix=/usr/local/nagios
shell > make all
shell > make install
shell > make install-init
shell > make install-commandmode
shell > make install-config
shell > make install-webconf
Install the Nagios plugins
shell > tar zxvf nagios-plugins-2.0.3.tar.gz
shell > cd nagios-plugins-2.0.3
shell > ./configure --prefix=/usr/local/nagios
shell > make && make install
Configure the web interface
Create a web user and set its password
shell > /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd svoid
shell > cat /usr/local/nagios/etc/htpasswd
svoid:$apr1$UzFKNZdW$77G9wSbrv9W3d7w2qgFyW0
Edit the Apache configuration file. The CGI module must be enabled (LoadModule cgid_module modules/mod_cgid.so); otherwise the web page reports that Nagios Core is not running.
shell > vim /usr/local/apache2/conf/httpd.conf
====================================================================================
LoadModule cgid_module modules/mod_cgid.so
User nagios
Group nagios
# For security, the Nagios web pages should normally be accessible only after authentication, so append the following to the end of httpd.conf:
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
====================================================================================
shell > /usr/local/apache2/bin/apachectl start
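Visit http://<ip>/nagios/
Configuring Nagios
Configuring Nagios mainly involves the following:
- Define which hosts, host groups, services, and service groups to monitor;
- Define which command performs each check;
- Define the time periods during which monitoring applies;
- Define the contacts and contact groups to notify when a host or service has a problem.
How Nagios objects map to configuration files:
- Create hosts.cfg to define hosts and host groups
- Create services.cfg to define services
- Use the default contacts.cfg to define contacts and contact groups
- Use the default commands.cfg to define commands
- Use the default timeperiods.cfg to define monitoring time periods
- Use the default templates.cfg as the template file that other definitions reference
Create hosts.cfg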
shell > cat hosts.cfg
====================================================================================
define host{
use linux-server
host_name Nagios-Linux
alias Nagios-Linux
address 192.168.56.211
}
define hostgroup{
hostgroup_name mysql-servers
alias mysql servers
members Nagios-Linux
}
====================================================================================
shell > cat services.cfg
====================================================================================
define service{
use local-service
host_name Nagios-Linux
service_description check-host-alive
check_command check-host-alive
}
====================================================================================
shell > vim cgi.cfg
====================================================================================
default_user_name=svoid
authorized_for_system_information=nagiosadmin,svoid
authorized_for_configuration_information=nagiosadmin,svoid
authorized_for_system_commands=nagiosadmin,svoid
authorized_for_all_services=nagiosadmin,svoid
authorized_for_all_hosts=nagiosadmin,svoid
authorized_for_all_service_commands=nagiosadmin,svoid
authorized_for_all_host_commands=nagiosadmin,svoid
====================================================================================
shell > vim nagios.cfg
====================================================================================
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
object_cache_file=/usr/local/nagios/var/objects.cache
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
# Whether Nagios accepts commands issued from the web interface, such as restarting Nagios or stopping host/service checks
check_external_commands=1
command_check_interval=10s
interval_length=60
====================================================================================
Verify that the Nagios configuration files are correct
shell > /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
At this point the initial Nagios configuration can be viewed in the web console.
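The verification step above lends itself to scripting. The sketch below is one possible approach, not something Nagios ships: a hypothetical helper named verify_summary reads the output of nagios -v on standard input and succeeds only when the summary reports zero errors.

```shell
#!/bin/sh
# verify_summary: reads the output of "nagios -v" on stdin, prints the
# warning and error counts, and succeeds only when "Total Errors" is 0.
# Returns 2 when no summary line is found at all.
verify_summary() {
    output=$(cat)
    errors=$(printf '%s\n' "$output" |
        sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -n 1)
    warnings=$(printf '%s\n' "$output" |
        sed -n 's/^Total Warnings:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -n 1)

    if [ -z "$errors" ]; then
        echo "no summary found"
        return 2
    fi
    echo "warnings=${warnings:-0} errors=$errors"
    [ "$errors" -eq 0 ]
}
```

It could be used as `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | verify_summary`, followed by a restart of Nagios only when the helper succeeds.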
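Using NRPE to monitor "local information" on remote Linux hosts
How NRPE works
NRPE consists of two parts:
- The check_nrpe plugin, which lives on the monitoring host
- The NRPE daemon, which runs on the remote Linux host (usually the monitored machine)
When Nagios needs to check a service or resource on a remote Linux host:
- Nagios runs the check_nrpe plugin and tells it what to check;
- The check_nrpe plugin connects to the remote NRPE daemon over SSL;
- The NRPE daemon runs the corresponding Nagios plugin to perform the check;
- The NRPE daemon returns the result to the check_nrpe plugin, which hands it to Nagios for processing.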
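To illustrate the third step, the sketch below mimics how a daemon of this kind could map the requested command name to a local plugin command line, using the command[NAME]=... entries found in nrpe.cfg. This is an illustration only and not NRPE's own code; the function name nrpe_lookup is hypothetical.

```shell
#!/bin/sh
# Illustration only: looks up a command name in an nrpe.cfg-style file.
# A definition line looks like:
#   command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
# (the thresholds here are example values).

# nrpe_lookup CONFIG_FILE COMMAND_NAME
# Prints the command line defined for COMMAND_NAME and returns 0,
# or returns 1 when the name is not defined (commented lines are ignored).
nrpe_lookup() {
    cfg=$1 name=$2
    line=$(grep -F "command[$name]=" "$cfg" | grep -v '^[[:space:]]*#' | head -n 1)
    [ -n "$line" ] || return 1
    # Strip everything up to the first "=" to leave the plugin command line.
    printf '%s\n' "${line#*=}"
}
```

On the monitoring host, the matching request would be issued with the -c option, in the form `check_nrpe -H <host> -c check_load`, which is the same option the check_nrpe command definition in this article passes as $ARG1$.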
Installation on the monitored host (Nagios-Linux)
shell > useradd -s /sbin/nologin nagios
shell > mkdir /usr/local/nagios
shell > chown -R nagios:nagios /usr/local/nagios
Install nagios-plugins
shell > tar zxvf nagios-plugins-2.0.3.tar.gz
shell > cd nagios-plugins-2.0.3
shell > ./configure --prefix=/usr/local/nagios
shell > make && make install
shell > ls /usr/local/nagios/
include libexec share
Install NRPE
shell > tar zxvf nrpe-2.15.tar.gz
shell > cd nrpe-2.15
shell > ./configure
shell > make all
shell > make install-plugin
shell > make install-daemon
shell > make install-daemon-config
shell > ls /usr/local/nagios/
bin etc include libexec share
shell > make install-xinetd
shell > cat /etc/xinetd.d/nrpe
====================================================================================
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 192.168.56.211
}
====================================================================================
shell > cat /etc/services    # add an entry for the NRPE service
====================================================================================
nrpe 5666/tcp # nrpe
====================================================================================
shell > service xinetd restart
shell > netstat -an | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
Use the check_nrpe plugin to test whether NRPE is working.
shell > /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.15
Installation on the monitoring host (Nagios-server)
shell > tar zxvf nrpe-2.15.tar.gz
shell > cd nrpe-2.15
shell > ./configure
shell > make all
shell > make install-plugin
shell > /usr/local/nagios/libexec/check_nrpe -H 192.168.56.210
NRPE v2.15
Add a check_nrpe definition to commands.cfg
shell > vim /usr/local/nagios/etc/objects/commands.cfg
====================================================================================
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
====================================================================================
shell > vim /usr/local/nagios/etc/objects/svoid-a.cfg
====================================================================================
define host{
use linux-server
host_name svoid-a
alias svoid-a
address 192.168.56.211
}
define service{
use local-service ; Name of service template to use
host_name svoid-a
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service ; Name of service template to use
host_name svoid-a
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
====================================================================================
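The check_nrpe command defined in commands.cfg is meant to be referenced from service definitions in the form check_nrpe!<name>, where <name> must match a command defined in nrpe.cfg on the monitored host. As a sketch, the hypothetical helper below prints such a service definition; check_load is an assumed command name, and the local-service template is simply the one this article already uses.

```shell
#!/bin/sh
# nrpe_service HOST DESCRIPTION NRPE_COMMAND
# Prints a service definition that runs NRPE_COMMAND on HOST through the
# check_nrpe command defined in commands.cfg.
nrpe_service() {
    cat <<EOF
define service{
    use                 local-service
    host_name           $1
    service_description $2
    check_command       check_nrpe!$3
}
EOF
}
```

For example, `nrpe_service svoid-a "Current Load" check_load >> /usr/local/nagios/etc/objects/svoid-a.cfg` would append one NRPE-based check. This assumes the file is listed in a cfg_file line in nagios.cfg, and the configuration should be verified again afterward.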
References:
http://www.cnblogs.com/mchina/archive/2013/02/20/2883404.html
http://my.oschina.net/forlinux/blog/370502?p=1
Compiled from online sources
Svoid
2015-07-07
From the ITPUB blog: http://blog.itpub.net/29733787/viewspace-1735712/ (please credit the source when reposting; otherwise legal action may be taken).