Nagios Study Notes (work in progress)


weixin_34308389


Original link: http://blog.51cto.com/zener/385974


Part 1: Installing the platform

Download the required software:
http://www.nagios.org/download/

Create the nagios user and group
groupadd nagios
useradd -g nagios -d /usr/local/nagios -s /bin/bash nagios

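Create a nagcmd group for running external commands from the web interface, and add both the nagios user and the Apache user (nobody here) to it. The -a option appends the group instead of replacing the user's existing supplementary groups.
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd nobody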

tar zxvf nagios-3.0.3.tar.gz
cd nagios-3.0.3
./configure \
--prefix=/usr/local/nagios \
--with-command-group=nagcmd

make all

make install
make install-init
make install-config
make install-commandmode

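Some notes:
1. Unpack Nagios: tar zxvf nagios-3.0.3.tar.gz
2. Configure Nagios: cd nagios-3.0.3 ; ./configure --prefix=/usr/local/nagios (sets the installation directory)
3. Compile Nagios: make all
4. Install Nagios. Unlike most software, the Nagios installation takes several steps.

First, make install installs the main program, the CGIs, and the HTML files.
Second, make install-commandmode sets the permissions on the directory that holds the external command file.
Third, make install-config copies the sample configuration files into the Nagios installation directory.
There is also a make install-init step, which installs the Nagios init script so that Nagios can be started at boot. This is very convenient.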

Install the Nagios web configuration file into Apache's /etc/httpd/conf.d directory
make install-webconf

If Apache was compiled from source, the configuration can be appended manually
cat sample-config/httpd.conf >> /usr/local/apache/conf/httpd.conf

Create the user for logging in to the Nagios web interface
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Make sure cgi_module and alias_module are loaded, then restart Apache
service httpd restart

Install the Nagios plugins
tar zxvf nagios-plugins-1.4.12.tar.gz
cd nagios-plugins-1.4.12
./configure \
--prefix=/usr/local/nagios \
--with-nagios-group=nagcmd

make
make install

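Register the nagios service
chkconfig --add nagios
chkconfig nagios on

Check the configuration files for errors
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Start Nagios
service nagios start

Open http://IP/nagios and enter the user name and password to reach the Nagios pages.
At this point only the status of localhost can be viewed.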

Part 2: Configuration

The main Nagios configuration files are
nagios.cfg        //main configuration file
contacts.cfg      //contacts
contactgroups.cfg //contact groups
commands.cfg      //commands
host.cfg          //hosts
hostgroups.cfg    //host groups
templates.cfg     //templates
timeperiods.cfg   //monitoring time periods
services.cfg      //services

Changes needed in the main configuration file nagios.cfg:
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
interval_length=1 ; base time unit changed from 60s to 1s
command_check_interval=10s ; interval between external command checks; -1 means check as often as possible
date_format=iso8601 ; date format

objects/contacts.cfg defines the contacts:

define contact {
contact_name sa
alias System Administrator
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
email admin@test.com
}

Then define the contact group

define contactgroup {
contactgroup_name admins
alias Administrator Group
members sa ; separate additional contacts with ","
}

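Host monitoring configuration

define host {
host_name host_name ; Short name of the host.
alias alias ; Alias; can describe the host in more detail.
address address ; IP address; a hostname also works. If this is not defined, Nagios uses host_name to look up the host.
parents host_names ; Names of the parent nodes, i.e. the nodes between the Nagios server and the monitored host. They can be routers, switches, hosts, and so on.
hostgroups hostgroup_names ; Short names of the host groups this host belongs to.
check_command command_name ; Short name of the check command. If left empty, Nagios does not check whether the host is alive.
max_check_attempts integer ; Number of times to retry when the check command returns something other than "OK".
check_interval number ; Interval between regular checks.
active_checks_enabled [0/1] ; Whether active checks are enabled.
passive_checks_enabled [0/1] ; Whether passive checks are enabled.
check_period timeperiod_name ; Short name of the check time period. This is only a name; the actual time period is defined in another configuration file.
obsess_over_host [0/1] ; Whether checks of this host are "obsessed" over using the ochp_command.
check_freshness [0/1] ; Whether freshness checking is enabled. Freshness checking applies to hosts that use passive checks: Nagios periodically looks at the state reported for the host and, if that information is stale, forces an active host check.
freshness_threshold number ; Freshness threshold in seconds. If set to "0", Nagios determines the threshold automatically.
event_handler command_name ; Short name of the command to run when the host changes state (it can be defined in commands.cfg).
event_handler_enabled [0/1] ; Whether the event handler is enabled.
low_flap_threshold number ; Lower flapping threshold. Flapping means that the state of a host (or service) changes frequently within a period of time.
high_flap_threshold number ; Upper flapping threshold.
flap_detection_enabled [0/1] ; Whether flap detection is enabled.
process_perf_data [0/1] ; Whether performance data is processed.
retain_status_information [0/1] ; Whether status-related information about the host is retained across program restarts.
retain_nonstatus_information [0/1] ; Whether non-status information about the host is retained across program restarts.
contact_groups contact_groups ; Contact groups; every contact in these groups receives notifications about the host.
notification_interval integer ; Number of time units to wait before re-sending a notification. A time unit is interval_length seconds, so with the default interval_length of 60 the value is in minutes. If set to "0", no repeat notifications are sent.
notification_period timeperiod_name ; Time period during which notifications are sent. Very important hosts (services) use 24x7; ordinary hosts (services) use working hours. Outside the defined period no notification is sent, whatever happens.
notification_options [d,u,r,f] ; States that trigger a notification: d = DOWN, u = UNREACHABLE, r = recovery to OK, f = flapping
notifications_enabled [0/1] ; Whether notifications are enabled for this host: "1" enables, "0" disables. The main configuration file (nagios.cfg) has a corresponding program-wide switch, enable_notifications.
stalking_options [o,d,u] ; Stalking options: o = stalk on UP states, d = stalk on DOWN states, u = stalk on UNREACHABLE states
}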

Service monitoring configuration


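Service monitoring is configured much like host monitoring, so the directives are not explained one by one.

Intervals are calculated as follows:
normal_check_interval x interval_length seconds
retry_check_interval x interval_length seconds
notification_interval x interval_length seconds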
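
As a quick sanity check of the formula, here is a minimal Python sketch. The function name effective_seconds is made up for illustration; the sample values are the ones used elsewhere in these notes, together with interval_length=1.

```python
def effective_seconds(units: int, interval_length: int = 60) -> int:
    """Convert a Nagios interval given in time units into seconds.

    interval_length defaults to 60, the stock Nagios value.
    """
    return units * interval_length


# interval_length=1 as set in nagios.cfg in these notes
INTERVAL_LENGTH = 1
for name, units in [
    ("normal_check_interval", 30),
    ("retry_check_interval", 15),
    ("notification_interval", 3600),
]:
    print(f"{name}: {effective_seconds(units, INTERVAL_LENGTH)} s")
```

With the default interval_length of 60, the same notification_interval of 3600 would mean 60 hours rather than one hour, so the interval values have to be revisited whenever interval_length changes.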

Example of a host monitoring configuration

define host {
host_name web1
alias web1
address 192.168.0.101
contact_groups admins
check_command check-host-alive
max_check_attempts 5
notification_interval 0
notification_period 24x7
notification_options d,u,r
}

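Host web1 is monitored 24x7. By default its state is checked every 10 seconds; after five consecutive failures a notification is sent, and no repeat notifications are sent after that.

Example of a service monitoring configuration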

define service {
host_name web1
service_description check_http
check_period 24x7
max_check_attempts 3
normal_check_interval 30
contact_groups admins
retry_check_interval 15
notification_interval 3600
notification_period 24x7
notification_options w,u,c,r
check_command check_http
}

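Explanation: the HTTP service on host web1 is monitored 24x7 with a check interval of 30 seconds (interval_length=1, as set above). After a failed check it is rechecked every 15 seconds; after three consecutive failures the service is considered down and a notification is sent.
The contact group is admins. After the notification, checks return to the normal_check_interval of one every 30 seconds. If the service has still not recovered, a notification is sent every hour.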
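
The timing described above can be sketched as a small simulation. This is a simplified model, not Nagios's actual scheduler: it assumes the service starts failing at t=0, never recovers, and that checks run exactly on schedule. The function name check_timeline and the repeats parameter are illustrative.

```python
def check_timeline(max_check_attempts, retry_check_interval,
                   notification_interval, interval_length=1, repeats=2):
    """Return (seconds, event) pairs for a service that fails from t=0 on."""
    events = []
    t = 0
    for attempt in range(1, max_check_attempts + 1):
        hard = attempt == max_check_attempts
        label = ("HARD state, notification sent" if hard
                 else "SOFT state, retry scheduled")
        events.append((t, f"check {attempt}/{max_check_attempts} failed: {label}"))
        if not hard:
            # Retries use retry_check_interval, not normal_check_interval
            t += retry_check_interval * interval_length
    for _ in range(repeats):
        # Repeat notifications while the problem persists
        t += notification_interval * interval_length
        events.append((t, "still failing: notification re-sent"))
    return events


# Values from the check_http example: 3 attempts, 15 s retries, hourly re-notify
for seconds, event in check_timeline(3, 15, 3600):
    print(f"t={seconds:>5}s  {event}")
```

In this model the first notification goes out 30 seconds after the first failed check, and the repeats follow at one-hour steps.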

To check a different service, for example whether the ssh service is running, change these two lines:
service_description check_ssh
check_command check_ssh

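To make the configuration easier to manage, the files were reorganized as follows.
Added to nagios.cfg:
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

In the hosts directory, a configuration file was created for each type of host, for example: app.cfg cache.cfg mysql.cfg web.cfg
A hostgroup.cfg file was also created to group the hosts, for example: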

define hostgroup {
hostgroup_name app-hosts
alias APP Hosts
members app1,app2
}

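In the services directory, a configuration file was created for each kind of service, for example: disk.cfg http.cfg load.cfg mysql.cfg
A servicegroup.cfg file was also created to group the services, for example: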

define servicegroup {
servicegroup_name disk
alias DISK
members cache1,check_disk,cache2,check_disk
}
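
The members value of a servicegroup is a flat list of host,service pairs, which is easy to get wrong by hand. A minimal Python sketch for building and reading it back follows; the helper names servicegroup_members and parse_members are hypothetical, not part of Nagios.

```python
def servicegroup_members(pairs):
    """Flatten (host_name, service_description) pairs into a members value."""
    for host, service in pairs:
        if "," in host or "," in service:
            raise ValueError("names must not contain commas")
    return ",".join(f"{host},{service}" for host, service in pairs)


def parse_members(value):
    """Split a members value back into (host_name, service_description) pairs."""
    items = [item.strip() for item in value.split(",")]
    if len(items) % 2 != 0:
        raise ValueError("members must list host,service pairs")
    return list(zip(items[0::2], items[1::2]))


members = servicegroup_members([("cache1", "check_disk"), ("cache2", "check_disk")])
print("members", members)
print(parse_members(members))
```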

Part 3: Installing and configuring NRPE

Installation on the monitoring server:
First install openssl-devel
yum install openssl-devel
Installing for dependencies: e2fsprogs-devel krb5-devel

Then install NRPE:
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin

Define the command used by check_nrpe in objects/commands.cfg:

# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Installation on the monitored host:
openssl-devel is needed here as well
Create the nagios user and group
groupadd nagios
useradd -g nagios -d /usr/local/nagios -s /sbin/nologin nagios

First install nagios-plugins:
tar zxvf nagios-plugins-1.4.12.tar.gz
cd nagios-plugins-1.4.12
./configure --prefix=/usr/local/nagios
make
make install

Then install NRPE:
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config

chown -R nagios:nagios /usr/local/nagios

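Configure NRPE:
vi /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,<address or hostname of the Nagios monitoring server>

Start the NRPE daemon:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
This command can be added to /etc/rc.local so that the daemon starts at boot.

Check that NRPE works:
On the monitored host
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
On the monitoring server
/usr/local/nagios/libexec/check_nrpe -H <target host address>
Both should print the NRPE version: NRPE v2.12

The nrpe.cfg file on the monitored host contains a line like this:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
It checks the CPU load: the -w and -c values are the warning and critical thresholds for the 1-, 5-, and 15-minute load averages.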
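
To make the threshold triplets concrete, here is a rough Python sketch of how such a comparison could work. It is not the check_load source: the strict "greater than" comparison is an assumption, and the function names are illustrative. The return values follow the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
OK, WARNING, CRITICAL = 0, 1, 2


def parse_triplet(text):
    """Parse a '15,10,5'-style threshold string into three floats."""
    values = tuple(float(part) for part in text.split(","))
    if len(values) != 3:
        raise ValueError("expected three comma-separated values")
    return values


def load_state(loads, warn="15,10,5", crit="30,25,20"):
    """Compare (1 min, 5 min, 15 min) load averages against the thresholds.

    Assumption: a threshold is breached only when the load exceeds it.
    """
    warn_levels, crit_levels = parse_triplet(warn), parse_triplet(crit)
    state = OK
    for load, w, c in zip(loads, warn_levels, crit_levels):
        if load > c:
            return CRITICAL  # any critical breach wins
        if load > w:
            state = WARNING
    return state


print(load_state((1.0, 0.5, 0.2)))    # all below the warning levels
print(load_state((16.0, 3.0, 1.0)))   # 1-minute load above 15
print(load_state((1.0, 1.0, 21.0)))   # 15-minute load above 20
```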

With that in place, a service like the following can be defined on the monitoring server to watch the CPU load of the monitored host:

define service{
host_name remotehost
service_description check_load
...
check_command check_nrpe!check_load
}
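
How check_nrpe!check_load turns into a command line can be illustrated with a simplified Python sketch of macro substitution. Nagios's real macro handling is more involved; the function name, the $USER1$ value (assumed here to be the plugin directory /usr/local/nagios/libexec) and the host address are assumptions for the example.

```python
def expand_check_command(check_command, command_lines, macros):
    """Expand 'name!arg1!arg2' using the matching command_line template."""
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    # Everything after the command name becomes $ARG1$, $ARG2$, ...
    for index, arg in enumerate(args, start=1):
        values[f"ARG{index}"] = arg
    for key, value in values.items():
        line = line.replace(f"${key}$", value)
    return line


# command_line copied from the check_nrpe definition in these notes
command_lines = {
    "check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
}
macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed value from resource.cfg
    "HOSTADDRESS": "192.168.0.101",        # hypothetical address of remotehost
}
print(expand_check_command("check_nrpe!check_load", command_lines, macros))
```

The resulting command asks the NRPE daemon on the remote host to run whatever nrpe.cfg there defines as command[check_load].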

Part 4: Extending how notifications are sent (SMTP, MSN, Fetion)

1. Sending mail through SMTP
objects/commands.cfg contains the settings for the e-mail notification commands.
To send mail through an SMTP server, sendEmail can be used:
http://caspian.dotconf.net/menu/Software/SendEmail/

Installation
wget http://caspian.dotconf.net/menu/Software/SendEmail/sendEmail-v1.55.tar.gz
tar zxvf sendEmail-v1.55.tar.gz
mv sendEmail-v1.55/sendEmail /usr/local/bin/

Example of sending a mail:
sendEmail -f nagios@test.com -t admin@test.com -s smtp.test.com -u "test" -xu nagios@test.com -xp password -m "test."

vi objects/commands.cfg
Change the mail-sending part of notify-host-by-email and notify-service-by-email to:

/usr/local/bin/sendEmail -f nagios@test.com -t $CONTACTEMAIL$ -s smtp.test.com -u "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" -xu nagios@test.com -xp password
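
To show which option carries which piece of the notification, here is a small Python sketch that assembles the sendEmail argument list. Only the options already used in these notes appear (-f, -t, -s, -u, -xu, -xp, -m); the function name and the sample subject and body are illustrative, and the addresses and password are the placeholders from above.

```python
import shlex


def sendemail_argv(contact_email, subject, body,
                   sender="nagios@test.com",
                   smtp_server="smtp.test.com",
                   password="password"):
    """Build the sendEmail argument list for one notification."""
    return [
        "/usr/local/bin/sendEmail",
        "-f", sender,          # From address
        "-t", contact_email,   # recipient, i.e. $CONTACTEMAIL$
        "-s", smtp_server,     # SMTP server
        "-u", subject,         # subject line
        "-xu", sender,         # SMTP auth user
        "-xp", password,       # SMTP auth password
        "-m", body,            # message body
    ]


argv = sendemail_argv("admin@test.com",
                      "web1/check_http is CRITICAL",
                      "Connection refused")
# shlex.join shows how the list would have to be quoted on a shell command line
print(shlex.join(argv))
```

Keeping the arguments as a list avoids shell quoting problems when the subject or body contains spaces or quotes.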

2. Sending MSN notifications
There is a program written in Perl for this:
http://blog.chinaunix.net/u/24312/showart_1076329.html
yum install perl-XML-Simple
yum install perl-Hash-Merge
perl -MCPAN -e 'install Net::MSN'
cd msn
Edit $handle and $password in hello.pl
chmod +x hello.pl
Add the sender and the recipient to each other's contact lists, then send a test message:
./hello.pl admin@test.com hello
The first run produced this error:
could not find ParserDetails.ini in /usr/lib/perl5/vendor_perl/5.8.5/XML/SAX
It is explained here:
http://perl-xml.sourceforge.net/faq/#parserdetails.ini
Run
perl -MXML::SAX -e "XML::SAX->add_parser(q(XML::SAX::PurePerl))->save_parsers()"
mkdir /usr/local/nagios/lib
mv lib /usr/local/nagios/lib/msn
chown -R nagios:nagios /usr/local/nagios/lib
vi msn_send.pl
#!/usr/bin/perl
use lib "/usr/local/nagios/lib/msn";
my $handle = 'nagios@live.cn';
my $password = 'password';
chown nagios:nagios msn_send.pl
chmod +x msn_send.pl
mv msn_send.pl /usr/local/nagios/libexec/
Add the commands that send MSN notifications:
vi /usr/local/nagios/etc/objects/commands.cfg

define command{
command_name notify-host-by-msn
command_line /usr/local/nagios/libexec/msn_send.pl $CONTACTEMAIL$ "`/usr/bin/printf "%b" "***** Monitor *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n"`"
}
define command{
command_name notify-service-by-msn
command_line /usr/local/nagios/libexec/msn_send.pl $CONTACTEMAIL$ "`/usr/bin/printf "%b" "***** Monitor *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$"`"
}

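In use, it turned out that the process only exits once the recipient replies. Until then the check cannot continue, and neither can the checks of any other host or service.
The author says the person receiving the alerts has to be added as a contact, but that had in fact been done. Reportedly a timeout can be configured.
If a web server that can run PHP is available, sending MSN messages from PHP is a simple alternative:
http://www.fanatic.net.nz/2005/02/15/send-a-message-using-php/
Installation
wget http://downloads.fanatic.net.nz/dev/php/sendMsg.zip
unzip sendMsg.zip
mv sendMsg /path/to/web/dir/msn
Configuration
The default authentication method needs SSL support; to use the curl-based one instead (PHP must be compiled with --with-curl):
vi sendMsg.php
require_once('msnpauth-1.1.3.php');
To send Chinese text, first convert the character set to UTF-8 with iconv:
vi index.php
$sendMsg->sendMessage(iconv("GBK", "UTF-8", $_POST['message']), 'Times New Roman', '008000');
The sender and the recipient again have to add each other as contacts.
Open http://server/msn/index.php first to test whether sending works. If it does, write a script that sends the MSN message: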
vi /usr/local/nagios/libexec/msn_send.sh

#!/bin/sh

wget -O - -q --post-data="sender=nagios@live.cn&password=password&recipient=$1&message=$2" http://server/msn/index.php > /dev/null

chmod +x /usr/local/nagios/libexec/msn_send.sh
Then replace msn_send.pl with msn_send.sh in the MSN notification commands defined earlier, and it is ready to use.
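
One caveat with the wget call: the message is pasted into --post-data unencoded, so characters such as & or = in a notification would corrupt the form fields. A minimal Python sketch of building a properly URL-encoded body follows. The field names (sender, password, recipient, message) are the ones from the script above; the function name is illustrative.

```python
from urllib.parse import parse_qs, urlencode


def msn_post_data(recipient, message,
                  sender="nagios@live.cn", password="password"):
    """URL-encode the form fields that msn_send.sh posts to index.php."""
    return urlencode({
        "sender": sender,
        "password": password,
        "recipient": recipient,
        "message": message,
    })


data = msn_post_data("admin@test.com", "Host: web1 & State: DOWN")
print(data)
# Decoding shows that the message survives the round trip intact
print(parse_qs(data)["message"][0])
```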

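3. Sending SMS notifications
The simple way: register a China Mobile mailbox at www.139.com and enable SMS notification for incoming mail. The subject of each mail is then sent to the registered phone number.
There is also an implementation based on Fetion:
http://www.it-adv.net/
It depends on glibc-2.4, so it cannot be used on CentOS4/RHEL4 or Debian Etch.
Install the libraries it depends on:
tar zxvf lib.tar.gz
mv lib /usr/local/lib/fetion
echo "/usr/local/lib/fetion" > /etc/ld.so.conf.d/fetion-i386.conf
ldconfig
Then install the Fetion command-line client: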
tar zxvf fetion_linux_20080402.tar.gz -C /usr/local/bin/
vi /usr/local/bin/sms.sh

#!/bin/sh

/usr/local/bin/sms -f 159xxxxxxxx -p password -t $1 -m "$2"

chmod +x /usr/local/bin/sms.sh
Add the SMS notification commands the same way as the MSN notification commands.
Debian Etch can be upgraded to testing, which also updates glibc to 2.4.
sed -e 's|etch|testing|g' /etc/apt/sources.list > /etc/apt/sources.list~
mv /etc/apt/sources.list~ /etc/apt/sources.list
apt-get update
apt-get dist-upgrade

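4. Other Nagios monitoring topics

1) Brief description of the Nagios directories

bin	Nagios executables; the nagios file is the main program
etc	Nagios configuration files
sbin	Nagios CGI files, i.e. the files needed to run external commands
share	Nagios web page files
var	Nagios log files, the pid file, and similar files
var/archives	Log archive directory
var/rw	Holds the external command file
libexec	Nagios plugins

2) How to use the Nagios plugins

Monitoring Windows uses the check_nt plugin (all plugins live in /usr/local/nagios/libexec)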

[root@cxy ~]# ls /usr/local/nagios/libexec/
check_apt check_ftp check_mailq check_overcr check_tcp
check_breeze check_http check_mrtg check_ping check_time
check_by_ssh check_icmp check_mrtgtraf check_pop check_udp
check_clamd check_ide_smart check_nagios check_procs check_ups
check_cluster check_ifoperstatus check_nntp check_real check_users
check_dhcp check_ifstatus check_nntps check_rpc check_wave
check_dig check_imap check_nrpe check_sensors negate
check_disk check_ircd check_nt check_simap urlize

There are many plugins, and each plugin's help output shows how to write the monitoring command for it.

For example, the help for check_nt:

[root@cxy libexec]# pwd
/usr/local/nagios/libexec
[root@cxy libexec]# ./check_nt -h
Usage:check_nt -H host -v variable [-p port] [-w warning] [-c critical][-l params] [-d SHOWALL] [-t timeout]

# Monitoring CPU

CPULOAD =
Average CPU load on last x minutes.
Request a -l parameter with the following syntax:
-l <minutes range>,<warning threshold>,<critical threshold>.
<minute range> should be less than 24*60.
Thresholds are percentage and up to 10 requests can be done in one shot.
ie: -l 60,90,95,120,90,95

# The complete form is

check_nt!CPULOAD!-l 5,80,90

check_nt queries CPULOAD: an average load of 80% over the last 5 minutes is a warning, and 90% is critical.
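
The -l value for CPULOAD is a flat list of triplets, which the following Python sketch splits apart and validates against the limits quoted in the help text (minute range below 24*60, at most 10 requests). This only illustrates the parameter format and is not how check_nt itself parses it; the function name is made up.

```python
def parse_cpuload_params(params):
    """Split '-l' values into (minutes, warning %, critical %) triplets."""
    values = [int(part) for part in params.split(",")]
    if len(values) % 3 != 0:
        raise ValueError("expected <minutes>,<warning>,<critical> triplets")
    triplets = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    if len(triplets) > 10:
        raise ValueError("at most 10 requests per call")
    for minutes, _warning, _critical in triplets:
        if minutes >= 24 * 60:
            raise ValueError("minute range should be less than 24*60")
    return triplets


print(parse_cpuload_params("5,80,90"))
print(parse_cpuload_params("60,90,95,120,90,95"))
```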

Monitoring disk usage

USEDDISKSPACE =
Size and percentage of disk use.
Request a -l parameter containing the drive letter only.
Warning and critical thresholds can be specified with -w and -c.

# To monitor drive C with a warning at 80% and critical at 90%

check_nt!USEDDISKSPACE!-l c -w 80 -c 90
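
The effect of -w 80 -c 90 can be sketched in Python as follows. This is an illustration, not the plugin's code: it assumes a threshold counts as breached once usage reaches it ("greater than or equal"), and the function names are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def used_percent(total_bytes, free_bytes):
    """Percentage of the drive that is in use."""
    return 100.0 * (total_bytes - free_bytes) / total_bytes


def used_disk_state(percent, warning=80.0, critical=90.0):
    """Map a usage percentage to a plugin state (assumes >= comparison)."""
    if percent >= critical:
        return CRITICAL
    if percent >= warning:
        return WARNING
    return OK


percent = used_percent(total_bytes=100, free_bytes=15)
print(percent, used_disk_state(percent))
```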

5. Nagios file relationship diagram

