Setting up Nagios on Linux, with troubleshooting notes

Nagios setup procedure:

I. Setting up the monitoring center:

I. Preparation before installation
1. PHP support
# yum -y install httpd php gd gd-devel libpng libpng-devel libjpeg libjpeg-devel zlib zlib-devel openssl-devel
# service httpd restart
# chkconfig httpd on
# vim /var/www/html/index.php
<?php
phpinfo();
?>

2. Confirm that the mail system on this host works
# yum -y install postfix
# chkconfig postfix on
# service postfix restart
# echo "mail test..." |mail -s "nagios mail test" root@localhost //send to a local user
# echo "mail test..." |mail -s "nagios mail test" tianyun@126.com //test sending mail to an external address

II. Nagios on the monitoring center
1. Installing the main program
[root@master ~]# groupadd nagcmd
[root@master ~]# useradd nagios -G nagcmd //the nagcmd group is used to run external commands from the web interface
[root@master ~]# gpasswd -a apache nagcmd
[root@master ~]# grep nagcmd /etc/group
nagcmd:x:500:nagios,apache

[root@master ~]# tar xvf nagios-3.2.0.tar.gz
[root@master ~]# cd nagios-3.2.0
[root@master nagios-3.2.0]# ./configure --prefix=/usr/local/nagios \
--with-command-group=nagcmd
[root@master nagios-3.2.0]# make all
[root@master nagios-3.2.0]# make install //install the Nagios main program, CGIs and HTML files
[root@master nagios-3.2.0]# make install-init //create the /etc/rc.d/init.d/nagios init script
[root@master nagios-3.2.0]# make install-config //install the sample configuration files under /usr/local/nagios/etc
[root@master nagios-3.2.0]# make install-commandmode //set permissions on the Nagios external command directory
[root@master nagios-3.2.0]# make install-webconf //install the Nagios web configuration file into Apache's conf.d directory
[root@master ~]# ls /usr/local/nagios/ //list the installed Nagios files
bin etc libexec sbin share var
[root@master ~]# ls /usr/local/nagios/libexec/ //no plugins yet

2. Installing the plugins
[root@master ~]# tar xvf nagios-plugins-1.4.14.tar.gz
[root@master ~]# cd nagios-plugins-1.4.14
[root@master nagios-plugins-1.4.14]# ./configure \
--with-nagios-user=nagios \
--with-nagios-group=nagcmd \
--prefix=/usr/local/nagios
[root@master nagios-plugins-1.4.14]# make
[root@master nagios-plugins-1.4.14]# make install
[root@master ~]# ls /usr/local/nagios/libexec/
check_apt check_file_age check_log check_oracle check_tcp
check_breeze check_flexlm check_mailq check_overcr check_time
check_by_ssh check_ftp check_mrtg check_ping check_udp
check_clamd check_http check_mrtgtraf check_pop check_ups
check_cluster check_icmp check_nagios check_procs check_users
check_dhcp check_ide_smart check_nntp check_real check_wave
check_dig check_ifoperstatus check_nt check_rpc negate
check_disk check_ifstatus check_ntp check_sensors urlize
check_disk_smb check_imap check_ntp_peer check_smtp utils.pm
check_dns check_ircd check_ntp_time check_ssh utils.sh
check_dummy check_load check_nwstat check_swap

3. Apache access control
[root@master ~]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin

4. Start nagios and httpd
[root@master ~]# chkconfig nagios on
[root@master ~]# service nagios start
Starting nagios: done.
[root@master ~]# chkconfig httpd on
[root@master ~]# service httpd restart

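5. Test access
http://ip/nagios
Log in with the user name and password set above.
Check Hosts and Services in the navigation bar; if the monitoring of localhost shows up, this stage has succeeded!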
========================================================
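What the Nagios object files mean (/usr/local/nagios/etc/objects):
commands.cfg defines the commands Nagios can call;
contacts.cfg defines contacts;
localhost.cfg defines the objects for monitoring the local host;
printer.cfg defines printer monitoring;
switch.cfg defines switch monitoring;
templates.cfg defines templates;
timeperiods.cfg defines time periods;
windows.cfg defines monitored Windows hosts;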
========================================================
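check_command check-host-alive #command used for the check
check_interval 5 #interval between checks (in time units, 1 minute by default)
retry_interval 1 #interval between retries after a failed check
max_check_attempts 5 #maximum number of check attempts
check_period 24x7 #time period during which checks run
process_perf_data 0
retain_nonstatus_information 0
contact_groups sagroup #contact group
notification_interval 30 #interval between notifications
notification_period 24x7 #time period during which notifications are sent
notification_options d,u,r #notification options
#w = warning, u = unknown (for a host, u = unreachable)
#d = host is DOWN, f = flapping, n = send no notifications
#c = critical, r = recovery from a problem state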
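
As a quick reference for the option letters, the following sketch decodes a notification_options list into state names. It is only an illustration: describe_options is a hypothetical helper, not part of Nagios, and the letter meanings are the ones documented for Nagios 3 host and service objects.

```shell
# describe_options KIND LIST
# KIND is "host" or "service"; LIST is a comma-separated option string
# such as "d,u,r". Hypothetical helper for illustration only.
describe_options() {
    kind=$1
    printf '%s\n' "$2" | tr ',' '\n' | while read -r opt; do
        case "$kind:$opt" in
            host:d)    echo "d DOWN" ;;
            host:u)    echo "u UNREACHABLE" ;;
            service:w) echo "w WARNING" ;;
            service:u) echo "u UNKNOWN" ;;
            service:c) echo "c CRITICAL" ;;
            *:r)       echo "r RECOVERY" ;;
            *:f)       echo "f FLAPPING" ;;
            *:s)       echo "s DOWNTIME" ;;
            *:n)       echo "n NONE" ;;
            *)         echo "$opt (not valid for $kind)" ;;
        esac
    done
}

# Example: describe_options host d,u,r
```

Note that u means different things for the two object types, which is why the helper needs to know the kind.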

II. Monitoring the local host:

Define hosts and services
[root@master ~]# vim /usr/local/nagios/etc/objects/localhost.cfg
... ... ...
# Define a service to "ftp" the local machine
define service{
use local-service
host_name localhost
service_description FTP
check_command check_ftp
}

# Define a service to "nfs" the local machine
define service{
use local-service
host_name localhost
service_description NFS
check_command check_tcp!2049
}

Check the configuration and restart nagios
[root@master objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@master objects]# service nagios restart
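
Because a restart with a broken configuration leaves monitoring down, the two commands can be chained so that the restart only happens when the check passes. A minimal sketch, assuming that nagios -v exits with a non-zero status on configuration errors (the stock init script relies on the same behaviour); safe_restart and RESTART_CMD are hypothetical names introduced here.

```shell
# Paths follow the install prefix used in this article. RESTART_CMD is a
# hypothetical variable so the restart step can be swapped out.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RESTART_CMD=${RESTART_CMD:-service nagios restart}

# Restart only when the pre-flight check (nagios -v) exits with status 0.
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD
    else
        echo "configuration check failed, nagios not restarted" >&2
        return 1
    fi
}

# Example: safe_restart
```

When the check fails, run the nagios -v command by hand to see the full error report.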
==============================

III. 0. Define commands
1. Define hosts
2. Define services

[root@master objects]# pwd
/usr/local/nagios/etc/objects
[root@master objects]# mkdir web mysql oracle
[root@master objects]# vim /usr/local/nagios/etc/nagios.cfg
cfg_dir=/usr/local/nagios/etc/objects/web
cfg_dir=/usr/local/nagios/etc/objects/mysql
cfg_dir=/usr/local/nagios/etc/objects/oracle
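
If these lines are added by a script rather than by hand, it helps to make the step idempotent so that re-running it does not duplicate entries. A small sketch under that assumption; add_cfg_dir is a hypothetical helper name.

```shell
# add_cfg_dir CFG DIR: append "cfg_dir=DIR" to CFG unless that exact
# line is already present. Hypothetical helper for illustration.
add_cfg_dir() {
    line="cfg_dir=$2"
    grep -qxF -- "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Example (paths as used in this article):
# for d in web mysql oracle; do
#     mkdir -p "/usr/local/nagios/etc/objects/$d"
#     add_cfg_dir /usr/local/nagios/etc/nagios.cfg "/usr/local/nagios/etc/objects/$d"
# done
```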

Configuring Nagios to monitor remote hosts
I. Monitoring public resources of a remote host
[root@master objects]# pwd
/usr/local/nagios/etc/objects
[root@master objects]# cp localhost.cfg web/192.168.5.2.cfg
[root@master ~]# vim /usr/local/nagios/etc/objects/web/192.168.5.2.cfg

# Define a host for the local machine

define host{
use linux-server
host_name web-192.168.5.2
alias web-192.168.5.2
address 192.168.5.2
}

# Define a service for the local machine

define service{
use local-service
host_name web-192.168.5.2
service_description SSH
check_command check_ssh
}


define service{
use local-service
host_name web-192.168.5.2
service_description HTTP
check_command check_http
}

define service{
use local-service
host_name web-192.168.5.2
service_description FTP
check_command check_ftp
}

[root@master ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@master ~]# service nagios restart
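
Copying localhost.cfg and editing it by hand gets repetitive once there are many hosts. The sketch below prints an equivalent file for one host; make_remote_host_cfg and its NAME=COMMAND argument format are hypothetical, while the templates (linux-server, local-service) and the group-address naming are the ones used above.

```shell
# make_remote_host_cfg GROUP IP [DESCRIPTION=CHECK_COMMAND ...]
# Prints one host definition plus one service definition per argument.
# Hypothetical helper; redirect the output into objects/GROUP/IP.cfg.
make_remote_host_cfg() {
    group=$1
    ip=$2
    shift 2
    name="$group-$ip"
    cat <<EOF
define host{
        use                     linux-server
        host_name               $name
        alias                   $name
        address                 $ip
        }
EOF
    for spec in "$@"; do
        desc=${spec%%=*}   # text before the first "="
        cmd=${spec#*=}     # text after the first "="
        cat <<EOF

define service{
        use                     local-service
        host_name               $name
        service_description     $desc
        check_command           $cmd
        }
EOF
    done
}

# Example:
# make_remote_host_cfg web 192.168.5.2 SSH=check_ssh HTTP=check_http FTP=check_ftp \
#     > /usr/local/nagios/etc/objects/web/192.168.5.2.cfg
```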

II. Monitoring private resources of a host



Configuring the monitored host (Nagios client)
1. Install nrpe and the Nagios plugins
[root@client ~]# yum install openssl openssl-devel xinetd gcc make

[root@client ~]# useradd nagios
[root@client ~]# tar xvf nagios-plugins-1.4.14.tar.gz
[root@client ~]# cd nagios-plugins-1.4.14
[root@client nagios-plugins-1.4.14]# ./configure && make && make install

[root@client ~]# tar xvf nrpe-2.12.tar.gz
[root@client ~]# cd nrpe-2.12
[root@client nrpe-2.12]# ./configure && make && make install
[root@client nrpe-2.12]# make install-daemon-config
[root@client nrpe-2.12]# make install-xinetd
[root@client nrpe-2.12]# vim /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 192.168.5.240 //192.168.5.240 is the address of the Nagios monitoring center
}

[root@client nrpe-2.12]# vim /etc/services
nrpe 5666/tcp # NRPE //add this line
[root@client nrpe-2.12]# service xinetd restart
[root@client nrpe-2.12]# chkconfig xinetd on
[root@client nrpe-2.12]# netstat -tunpl | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 24380/xinetd


2. Configure monitoring of local private resources
[root@client nrpe-2.12]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/vg01-lv_root
command[check_home]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/vg01-lv_home
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 50% -c 40%
[root@client ~]# service xinetd restart
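
Before testing from the monitoring center, it can be useful to confirm on the client which plugin command line a given NRPE command name maps to. A minimal sketch, assuming the command[name]=... line format shown above and command names without regex metacharacters; nrpe_command is a hypothetical helper.

```shell
# nrpe_command CFG NAME: print the command line that nrpe.cfg assigns
# to command[NAME], or nothing if it is not defined.
nrpe_command() {
    sed -n "s/^command\[$2]=//p" "$1"
}

# Example: run the same check locally that NRPE would run.
# cmd=$(nrpe_command /usr/local/nagios/etc/nrpe.cfg check_load)
# [ -n "$cmd" ] && sh -c "$cmd"
```

Running the printed command directly as the nagios user is a simple way to separate plugin problems from NRPE connection problems.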
========================================================
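Configuring the monitoring center (Nagios server)
1. Install the nrpe plugin and test it
[root@master ~]# tar xvf nrpe-2.12.tar.gz //only the check_nrpe plugin is needed
[root@master ~]# cd nrpe-2.12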
[root@master nrpe-2.12]# ./configure && make all && make install
[root@master ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.7
NRPE v2.12

2. Monitoring private resources of a remote host
==Define the command
[root@master ~]# vim /usr/local/nagios/etc/objects/commands.cfg
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
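
To see how a service's check_command value turns into an actual command line, the macro substitution can be imitated in a few lines of shell. This is only a sketch of the idea, not how Nagios implements it: expand_check_command is a hypothetical helper, it handles just $USER1$, $HOSTADDRESS$ and $ARGn$, and the $USER1$ default assumes the install prefix used in this article.

```shell
# expand_check_command TEMPLATE CHECK_COMMAND HOSTADDRESS [USER1]
# TEMPLATE is a command_line value, CHECK_COMMAND is "name!arg1!arg2...".
expand_check_command() {
    awk -v tpl="$1" -v chk="$2" -v addr="$3" \
        -v user1="${4:-/usr/local/nagios/libexec}" '
    # Literal (non-regex) replacement of every occurrence of "from".
    function replace_all(s, from, to,    out, i) {
        out = ""
        while ((i = index(s, from)) > 0) {
            out = out substr(s, 1, i - 1) to
            s = substr(s, i + length(from))
        }
        return out s
    }
    BEGIN {
        n = split(chk, part, "!")    # part[1] is the command name
        line = replace_all(tpl, "$USER1$", user1)
        line = replace_all(line, "$HOSTADDRESS$", addr)
        for (k = 2; k <= n; k++)
            line = replace_all(line, "$ARG" (k - 1) "$", part[k])
        print line
    }'
}

# Example:
# expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
#     'check_nrpe!check_root' 192.168.5.7
```

The printed line is what can be pasted into a shell on the monitoring center to reproduce a check by hand.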

==Define the host and services
[root@master ~]# cd /usr/local/nagios/etc/objects/web/
[root@master web]# ls
192.168.5.11.cfg 192.168.5.7.cfg

[root@master web]# vim 192.168.5.7.cfg
# Define a host for the local machine

define host{
use linux-server
host_name web-192.168.5.7
alias web-192.168.5.7
address 192.168.5.7
}

# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service
host_name web-192.168.5.7
service_description Root Partition
check_command check_nrpe!check_root
}

# Define a service to check the swap usage on the monitored host.
# With the check_swap entry in the client's nrpe.cfg above: warning if less
# than 50% of swap is free, critical if less than 40% is free
define service{
use local-service
host_name web-192.168.5.7
service_description Swap Usage
check_command check_nrpe!check_swap
}

# Define a service to check the number of currently running procs
# on the monitored host. With the check_total_procs entry in nrpe.cfg
# above: warning if > 150 processes, critical if > 200 processes.
define service{
use local-service
host_name web-192.168.5.7
service_description Total Processes
check_command check_nrpe!check_total_procs
}

# Define a service to check the load on the local machine.
define service{
use local-service
host_name web-192.168.5.7
service_description Current Load
check_command check_nrpe!check_load
}

[root@master libexec]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@master libexec]# service nagios restart
IV. Nagios configuration files

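Goals:
1. Define contacts and contact groups
2. Define time periods
3. Choose different contacts and time periods for different projects
4. Define check intervals, notification intervals and so on for different projects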

/usr/local/nagios/etc/objects/contacts.cfg
define contact{
contact_name dba_zhangsan
use generic-contact
alias mysql dba zhangsan
email zhangsan@126.com
}

define contact{
contact_name dba_lisi
use generic-contact
alias mysql dba lisi
email lisi@126.com
}

define contact{
contact_name dba_tianyun
use generic-contact
alias mysql dba tianyun
email tianyun@126.com
}

define contact{
contact_name web_jack
use generic-contact
alias web jack
email jack@126.com
}

define contactgroup{
contactgroup_name dba-contact
alias mysql dba
members dba_zhangsan,dba_lisi,dba_tianyun
}

define contactgroup{
contactgroup_name web-contact
alias web admins
members web_jack
}
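
A contact group that lists a member with no matching contact definition makes the configuration check fail, so a quick cross-check can save a round trip. The sketch below assumes a file that holds only contact and contactgroup definitions, with members written as a comma-separated list without spaces; undefined_members is a hypothetical helper.

```shell
# undefined_members CFG: print every name that appears in a "members"
# line but has no "contact_name" line in the same file.
undefined_members() {
    cfg=$1
    contacts=$(awk '$1 == "contact_name" { print $2 }' "$cfg")
    awk '$1 == "members" { print $2 }' "$cfg" | tr ',' '\n' |
    while read -r member; do
        [ -n "$member" ] || continue
        printf '%s\n' "$contacts" | grep -qxF -- "$member" ||
            printf '%s\n' "$member"
    done
}

# Example: undefined_members /usr/local/nagios/etc/objects/contacts.cfg
```

Empty output means every group member is defined; the authoritative check is still nagios -v.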

/usr/local/nagios/etc/objects/templates.cfg
define host{
name mysql
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 10
notification_options d,u,r
contact_groups dba-contact
register 0
}

define host{
name web
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 10
notification_options d,u,r
contact_groups web-contact
register 0
}
V. Nagios plugins:

1. check_ping
commands.cfg:
# 'check_ping' command definition
define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 3
}

172.16.1.251.cfg:
define service{
use local-service,service-pnp
host_name Remote-Server
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}

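Macros:
$USER1$ the plugin installation directory, /usr/local/nagios/libexec/
$HOSTADDRESS$ the address of the monitored host (the address value in its host definition)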

Usage:check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
[-p packets] [-t timeout] [-4|-6]
./check_ping -H 192.168.10.6 -w 200,50% -c 300,60% -p 3
latency below 200ms and packet loss below 50%: OK
-w 200,50%: latency of 200ms or packet loss of 50%: WARNING
-c 300,60%: latency of 300ms or packet loss of 60%: CRITICAL
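
The threshold rules can be written out as a small function. This is a sketch of the decision logic only, assuming a threshold counts as crossed as soon as the measured value reaches it; classify_ping is a hypothetical helper and does not send any packets. The exit codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```shell
# classify_ping RTA_MS LOSS_PCT WRTA WPL CRTA CPL
# Prints the state and returns the matching plugin exit code.
classify_ping() {
    awk -v rta="$1" -v pl="$2" -v wrta="$3" -v wpl="$4" \
        -v crta="$5" -v cpl="$6" 'BEGIN {
        # "+ 0" forces numeric comparison of the passed-in strings.
        if (rta + 0 >= crta + 0 || pl + 0 >= cpl + 0) {
            print "CRITICAL"; exit 2
        } else if (rta + 0 >= wrta + 0 || pl + 0 >= wpl + 0) {
            print "WARNING"; exit 1
        }
        print "OK"; exit 0
    }'
}

# Example with the thresholds above (-w 200,50% -c 300,60%):
# classify_ping 12.5 0 200 50 300 60
```

Either condition alone is enough: high latency with no loss, or heavy loss with low latency, both raise the state.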

2. check_dns
# ./check_dns --help
Usage:check_dns -H host [-s server] [-a expected-address] [-A] [-t timeout] [-w warn] [-c crit]
-H, --hostname=HOST
The name or address you want to query
-s, --server=HOST
Optional DNS server you want to use for the lookup
-w, --warning=seconds
Return warning if elapsed time exceeds value. Default off
-c, --critical=seconds
Return critical if elapsed time exceeds value. Default off

# ./check_dns -H www.uplooking.com -s 192.168.10.18
DNS OK: 0.009 seconds response time. www.uplooking.com returns 192.168.10.180|time=0.009433s;;;0.000000
# ./check_dns -H www.uplooking.com -s 192.168.10.18 -w 0.0000001 -c 0.0000000000000000002
DNS CRITICAL: 0.009 seconds response time. www.uplooking.com returns 192.168.10.180|time=0.009414s;;;0.000000
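
In the output above, everything after the "|" is performance data in the form label=value;warn;crit;min;max, which is what graphing tools consume. A minimal sketch for pulling it apart, assuming labels without spaces; plugin_perfdata and perf_value are hypothetical helper names.

```shell
# plugin_perfdata OUTPUT: print the part after the first "|", or fail
# if the output carries no performance data.
plugin_perfdata() {
    case $1 in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     return 1 ;;
    esac
}

# perf_value PERFDATA LABEL: print the value (with its unit) of one
# label from a space-separated perfdata string.
perf_value() {
    for item in $1; do
        case $item in
            "$2"=*)
                v=${item#*=}               # drop "label="
                printf '%s\n' "${v%%;*}"   # drop ";warn;crit;min;max"
                return 0 ;;
        esac
    done
    return 1
}

# Example:
# pd=$(plugin_perfdata "$(./check_dns -H www.uplooking.com -s 192.168.10.18)")
# perf_value "$pd" time
```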

commands.cfg:
# 'check_dns' command definition
define command{
command_name check_dns
command_line $USER1$/check_dns -H $ARG1$ -s $HOSTADDRESS$ -w 0.1 -c 0.2
}
172.16.1.251.cfg:
define service{
use local-service,service-pnp
host_name dns-server
service_description check_dns
check_command check_dns!www.uplooking.com
}

3. check_mysql:
#./check_mysql -H 192.168.10.22 -u alice -p tianyun -d test
Uptime: 125 Threads: 2 Questions: 13 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 8 Queries per second avg: 0.104

commands.cfg:
# 'check_mysql' command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -d $ARG3$
}

172.16.1.22.cfg:
define service{
use local-service
host_name mysql1
service_description check_mysql
check_command check_mysql!alice!tianyun!test
}
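VI. Performance graphs

PNP generates performance graphs for Nagios
http://www.pnp4nagios.org
========================================================
I. What Nagios performance graphs are for:
Nagios monitors the instantaneous state of a service or host. When a system administrator needs to see how a host
performed and how its services responded over a period of time, as a graph, the only option is to dig through log data, which is both tedious and
abstract. Fortunately, PNP can do this work for us.

II. What PNP is and its installation environment:
PNP is a small open-source package based on PHP and Perl. It uses the rrdtool graphing tool to turn the data Nagios collects
into graphs that show how a host or service behaved over a period of time.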
# wget https://sourceforge.net/projects/pnp4nagios/files/latest --no-check-certificate

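III. Deploying PNP
==Install the software
1. Before installing PNP, install the following prerequisites
# yum -y install libxml2-devel pango-devel php-gd perl

2. Install rrdtool
Method 1:
# yum -y install rrdtool.x86_64

Method 2 (build from source, in the unpacked rrdtool source directory):
# ./configure --prefix=/usr/local/rrdtool
# make
# make install

3. Install pnp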
# tar xvf pnp4nagios-0.6.19.tar.gz
# cd pnp4nagios-0.6.19
# ./configure \
--with-rrdtool=/usr/local/rrdtool/bin/rrdtool \
--with-nagios-user=nagios \
--with-nagios-group=nagcmd \
--with-perl_lib_path=/usr/local/rrdtool/lib/perl/5.8.8/x86_64-linux-thread-multi/
# make all
# make install
# make fullinstall

4. Create the configuration files
# cd /usr/local/pnp4nagios/etc
# mv misccommands.cfg-sample misccommands.cfg
# mv nagios.cfg-sample nagios.cfg
# mv rra.cfg-sample rra.cfg

# cd pages
# mv web_traffic.cfg-sample web_traffic.cfg

# cd ../check_commands
# mv check_all_local_disks.cfg-sample check_all_local_disks.cfg
# mv check_nrpe.cfg-sample check_nrpe.cfg
# mv check_nwstat.cfg-sample check_nwstat.cfg

5. Restart the npcd service
# /etc/init.d/npcd restart
# chkconfig npcd on

==Configure nagios to use pnp
1. Edit the main nagios configuration file and enable performance data
# vim /usr/local/nagios/etc/nagios.cfg
# uncomment or set the following:
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

2. Edit commands.cfg and define the commands
# vim /usr/local/nagios/etc/objects/commands.cfg
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl
}

3. Add the "little sun" (graph icon) templates so the graphs are linked from the nagios pages
# vim /usr/local/nagios/etc/objects/templates.cfg
## append at the end
define host {
name host-pnp
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
register 0
}

define service {
name srv-pnp
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
register 0
}

4. Edit the file that defines the host and services, here localhost.cfg
# vim /usr/local/nagios/etc/objects/localhost.cfg
define host{
use linux-server,host-pnp //host-pnp added here
host_name web-server1
alias server1-192.168.0.104
address 192.168.0.104
}

define service{
use local-service,srv-pnp //srv-pnp added here
host_name web-server1
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}

5. Restart the httpd and nagios services
# service httpd restart
# service nagios restart

6. Test access

7. Delete the pnp installation test file
# rm -rf /usr/local/pnp4nagios/share/install.php

========================================================
Please check the documentation for information about the following error.
perfdata directory "/usr/local/pnp4nagios/var/perfdata/" is empty. Please check your Nagios config. Read FAQ online

file [line]:
application/models/data.php [109]:
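
When PNP reports an empty perfdata directory, a first step is to confirm whether any data files have been written at all. A minimal sketch, assuming PNP stores one .rrd and one .xml file per service under a per-host subdirectory of the path in the error message; perfdata_file_count is a hypothetical helper.

```shell
# perfdata_file_count [DIR]: count the .rrd and .xml files below DIR.
perfdata_file_count() {
    dir=${1:-/usr/local/pnp4nagios/var/perfdata}
    find "$dir" -type f \( -name '*.rrd' -o -name '*.xml' \) 2>/dev/null |
        wc -l | tr -d ' '
}

# Example:
# if [ "$(perfdata_file_count)" -eq 0 ]; then
#     echo "no perfdata yet: check process_performance_data=1 and the perfdata commands"
# fi
```

If the count stays at zero after a few check intervals, likely places to look are the perfdata settings in nagios.cfg, the two process-*-perfdata commands, and whether the nagios user can write to the directory.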

VII. The Nagios alerting mechanism
========================================================
I. Alerting methods Nagios supports
1. web
2. mail
3. SMS gateway
4. mobile phone SMS
a. a 139 mailbox (recommended)
b. fetion

II. fetion
http://www.it-adv.net

1. Install the fetion library and program
# mkdir /usr/local/fetion
# cd /usr/local/fetion/
[root@station11 fetion]# ll
-rwxr-xr-x 1 root root 503425 10-13 16:34 fetion
drwxr-xr-x 2 root root 4096 10-13 16:33 lib
# vim /etc/ld.so.conf
/usr/local/fetion/lib
# ldconfig
# chmod a+x /usr/local/fetion/fetion
# /usr/local/fetion/fetion
# /usr/local/fetion/fetion --mobile=13521234567 --pwd=xxxxTIANYUN --to=13912345678 --msg-utf8="test fetion" -EN

2. Integrate fetion with nagios
Define the commands:
[root@master ~]# vim /usr/local/nagios/etc/objects/commands.cfg
#command that sends an alert SMS when a host has a problem
define command{
command_name notify-host-by-sms
command_line /usr/local/fetion/fetion --mobile=13521234567 --pwd=xxxx --to=$CONTACTPAGER$ --msg-utf8="Host $HOSTSTATE$ alert for $HOSTNAME$! on '$DATETIME$'"
}
#command that sends an alert SMS when a service has a problem
define command{
command_name notify-service-by-sms
command_line /usr/local/fetion/fetion --mobile=13521234567 --pwd=xxxx --to=$CONTACTPAGER$ --msg-utf8="$HOSTADDRESS$ $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$"
}
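
Putting the fetion password directly into commands.cfg exposes it to anyone who can read that file, and a contact without a pager value produces an empty --to argument. One possible way around both is a small wrapper script, sketched below. It uses only the fetion flags shown above; send_sms, FETION_MOBILE, FETION_PWD and DRY_RUN are hypothetical names introduced for this example.

```shell
FETION_BIN=${FETION_BIN:-/usr/local/fetion/fetion}

# send_sms TO MESSAGE
# FETION_MOBILE and FETION_PWD (hypothetical variable names) hold the
# sender's number and password; DRY_RUN=1 prints instead of sending.
send_sms() {
    to=$1
    msg=$2
    if [ -z "$to" ]; then
        echo "contact has no pager number, nothing sent" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s --mobile=%s --pwd=*** --to=%s --msg-utf8=%s\n' \
            "$FETION_BIN" "$FETION_MOBILE" "$to" "$msg"
        return 0
    fi
    "$FETION_BIN" --mobile="$FETION_MOBILE" --pwd="$FETION_PWD" \
        --to="$to" --msg-utf8="$msg"
}

# Example:
# DRY_RUN=1 send_sms 13912345678 "Host DOWN alert for web-192.168.5.2!"
```

The dry-run mode makes it possible to test the notification path without sending real messages.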

III. Define contacts
[root@master ~]# vim /usr/local/nagios/etc/objects/contacts.cfg
define contact{
contact_name sasystem
use generic-contact
alias sa-system
email alice@qq.com
pager 13112345678
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin,sasystem
}

IV. Define the check frequency, notification frequency, notification methods...
[root@master ~]# vim /usr/local/nagios/etc/objects/templates.cfg
define contact{
name generic-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email,notify-service-by-sms
host_notification_commands notify-host-by-email,notify-host-by-sms
register 0
}

[root@master ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@master ~]# service nagios restart

V. Scheduled task (cron) to send SMS periodically
======================================================




That covers the main steps of setting up Nagios monitoring; I hope it helps!
