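Monitoring a Linux server with Nagios involves two parts: configuring the client (the monitored host) and configuring the server (the monitoring host). Each part has its own detailed steps, so take care not to mix the two up.
The process is as follows:
Client configuration:
Create the user:
useradd nagios (you can change this account's shell to /sbin/nologin in /etc/passwd)
passwd nagios
Install the plugins: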
- tar zxvf nagios-plugins-1.4.11.tar.gz
- cd nagios-plugins-1.4.11
- ./configure --prefix=/usr/local/nagios --enable-redhat-pthread-workaround
- make all
- make install
Fix the ownership:
chown -R nagios:nagios /usr/local/nagios
Install the NRPE service:
- tar zxvf nrpe-2.8.1.tar.gz
- cd nrpe-2.8.1
- ./configure --prefix=/usr/local/nagios --enable-ssl --enable-command-args
- make all (compile)
- make install-plugin (install the check_nrpe plugin)
- make install-daemon (install the nrpe daemon)
- make install-daemon-config (install the configuration file)
- make install-xinetd (install the xinetd script)
Add the monitoring server's IP:
vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.0.108
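Register the nrpe service:
vi /etc/services
nrpe 5666/tcp # nrpe
Restart the xinetd service:
service xinetd restart
The nrpe daemon is now started through xinetd.
Check that NRPE is listening:
netstat -ant|grep 5666
Test that NRPE works:
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.8.1 (this shows that check_nrpe can connect to the local nrpe daemon)
The middle part below can be skipped; jump straight to the second, working example at the end.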
Server configuration:
A. Copy the localhost.cfg template to monitor 192.168.112.101:
cp /usr/local/nagios/etc/localhost.cfg /usr/local/nagios/etc/objects/192.168.112.101.cfg
vi /usr/local/nagios/etc/objects/192.168.112.101.cfg
Edit the settings (host name, IP address, alias).
B. Add 192.168.112.101.cfg to the main Nagios configuration file:
vi /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.112.101.cfg
C. Verify the configuration, then restart:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios restart (restart Nagios so the configuration takes effect)
D. Open the Nagios monitoring page; the host should now appear there.
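Steps C and D are easy to get out of order: restarting with a broken configuration leaves Nagios down. A minimal sketch that only restarts when the check passes might look like this; `restart_if_valid` is a hypothetical helper name, not something shipped with Nagios, and both commands are passed in as strings.

```shell
#!/bin/sh
# Hypothetical helper: run a verification command and, only if it exits
# with status 0, run the restart command.
#   $1  verification command line
#   $2  restart command line
restart_if_valid() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Intended use on the monitoring server (left commented out on purpose):
# restart_if_valid "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#                  "service nagios restart"
```

The verification output is discarded here to keep the sketch short; when the check fails, run the `nagios -v` command by hand to read the error messages.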
Add a monitored service on the server (optional):
Add the check_nrpe command to commands.cfg:
- vi /usr/local/nagios/etc/commands.cfg
- define command{
- command_name check_nrpe
- command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
- }
Note: the $ARG1$ after -c is the name of the check command that the nrpe daemon will run. When you use check_nrpe in services.cfg, pass that name after a "!".
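To make the "!" convention concrete, here is a rough sketch of how a check_command value such as check_nrpe!check_load ends up as a command line. It is a simplification for illustration only: `expand_check_nrpe` is a hypothetical function, it handles just $HOSTADDRESS$ and $ARG1$, and it leaves $USER1$ unexpanded.

```shell
#!/bin/sh
# Hypothetical helper: mimic the macro substitution for the check_nrpe
# command definition above.
#   $1  check_command value, e.g. check_nrpe!check_load
#   $2  host address,        e.g. 192.168.0.21
expand_check_nrpe() {
    check_command=$1
    host_address=$2
    # Everything after the first "!" becomes $ARG1$.
    arg1=${check_command#*!}
    printf '$USER1$/check_nrpe -H %s -c %s\n' "$host_address" "$arg1"
}

expand_check_nrpe 'check_nrpe!check_load' 192.168.0.21
# prints: $USER1$/check_nrpe -H 192.168.0.21 -c check_load
```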
Additional background:
vi /usr/local/nagios/etc/nrpe.cfg
Find the following section:
- # The following examples use hardcoded command arguments...
- command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
- command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
- command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
- command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
- command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
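The name inside [] is the command name, i.e. what can follow check_nrpe's -c option; what follows the = is the plugin command line that actually runs. This closely resembles a command definition in commands.cfg, just written on one line. In other words, check_users is shorthand for /usr/local/nagios/libexec/check_users -w 5 -c 10.
The five commands defined above check, in order, the number of logged-in users, the CPU load, the disk space on hda1, the number of zombie processes, and the total number of processes. For a plugin's usage, run it with --help, e.g. ./check_load --help. Because -c only accepts commands defined in nrpe.cfg, only these five commands are available for now.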
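Since -c only accepts names defined in nrpe.cfg, it helps to be able to list them. The following is a small sketch, assuming the `command[name]=...` layout shown above; `list_nrpe_commands` is a hypothetical helper and the demo runs on a throwaway file rather than the real nrpe.cfg.

```shell
#!/bin/sh
# Print the command names defined in an nrpe.cfg-style file, one per line.
# Lines that do not start with "command[" (including commented-out ones)
# are skipped.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demo on a temporary file holding two of the definitions quoted above.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF
list_nrpe_commands "$demo_cfg"
rm -f "$demo_cfg"
```

On the client itself the call would be `list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg`.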
A second, working example:
2. Configure the monitoring server
1. Install Nagios
rpm -qa |grep gd
rpm -ql gd-devel-2.0.28-5.4E.el4_6.1
cd nagios-3.0.5
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib --with-gd-inc=/usr/include
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf #sets up the Apache (httpd) configuration automatically
2. Install nagios-plugins
cd nagios-plugins-1.4.11
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
make
make install
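3. Install NRPE
cd nagios-nrpe_2.8.1
./configure #OpenSSL support is enabled automatically by default
#The transfer is encrypted; if make fails later, check for openssl-devel and pass the SSL options explicitly
rpm -qa| grep ssl
openssl-devel-0.9.7a-43.17.el4_6.1
rpm -ql openssl-devel-0.9.7a-43.17.el4_6.1 | more
./configure --enable-ssl --with-ssl-lib=/lib/ (this requires OpenSSL to be installed)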
make all
make install-plugin
4. Define the external NRPE command in commands.cfg
vi /usr/local/nagios/etc/objects/commands.cfg
#add the following
#check nrpe
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
5. Register the Linux host to be monitored
vi /usr/local/nagios/etc/nagios.cfg
#add the following line in the middle of the file
cfg_file=/usr/local/nagios/etc/objects/mylinux.cfg
6. Create mylinux.cfg and define what to monitor
This file matters: if it is wrong, Nagios reports a configuration error.
vi /usr/local/nagios/etc/objects/mylinux.cfg
define host{
use linux-server
host_name mylinux
alias mylinux
address 192.168.0.21 ; client IP, i.e. the monitored host (explanatory note only; leave it out of a real file)
}
define service{
use generic-service
host_name mylinux
service_description check-swap
check_command check_nrpe!check_swap
}
define service{
use generic-service
host_name mylinux
service_description check-load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name mylinux
service_description check-disk
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name mylinux
service_description check-users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name mylinux
service_description total_procs
check_command check_nrpe!check_total_procs
}
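Every name after check_nrpe! has to match a command[...] entry in the client's nrpe.cfg, and a misspelling only shows up once the check runs. As a rough sketch, the two files can be cross-checked like this; `find_undefined_nrpe_checks` is a hypothetical helper, and it assumes you have a copy of the client's nrpe.cfg at hand.

```shell
#!/bin/sh
# Print every NAME used as check_nrpe!NAME in a Nagios object file that has
# no matching command[NAME]= line in an nrpe.cfg-style file.
# Prints nothing when all names are defined.
#   $1  object file (e.g. mylinux.cfg)
#   $2  nrpe.cfg copied from the monitored host
find_undefined_nrpe_checks() {
    object_cfg=$1
    nrpe_cfg=$2
    sed -n 's/.*check_nrpe!\([A-Za-z0-9_-]*\).*/\1/p' "$object_cfg" | sort -u |
    while read -r name; do
        grep -q "^command\[${name}]=" "$nrpe_cfg" || echo "$name"
    done
}

# Intended use (left commented out; the second path is a placeholder):
# find_undefined_nrpe_checks /usr/local/nagios/etc/objects/mylinux.cfg ./nrpe.cfg.from-client
```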
7. Other settings
chkconfig --add nagios #start Nagios automatically at boot
chkconfig nagios on
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg #check the Nagios configuration
vi /etc/selinux/config #disable SELinux
SELINUX=disabled
service iptables stop #stop the firewall, or instead open ports 80 and 5666
service nagios start
3. Configure the monitored host
1. Install nagios-plugins
useradd nagios
passwd nagios
tar -zxvf nagios-plugins-1.4.12.tar.gz
cd nagios-plugins-1.4.12
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
make
make install
2. Change ownership of the installation directory
chown -R nagios:nagios /usr/local/nagios
[root@client nagios]# ll
drwxr-xr-x 2 nagios nagios 4096 Jun 1 00:07 libexec
drwxr-xr-x 3 nagios nagios 4096 Jun 1 00:07 share
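3. Install NRPE on the client
tar -zxvf nagios-nrpe_2.8.1.orig.tar.gz
cd nagios-nrpe_2.8.1
./configure (SSL support is detected automatically)
#if make fails later, pass the SSL options explicitly
./configure --enable-ssl --with-ssl-lib=/usr/lib/ (this requires OpenSSL to be installed)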
make all
make install-plugin
make install-daemon
make install-daemon-config
4. Configure NRPE
vi /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=192.168.0.20,127.0.0.1,192.168.0.99
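The monitoring server's IP has to be on this list, otherwise check_nrpe is rejected. A small sketch for checking that before you start the daemon follows; `is_allowed_host` is a hypothetical helper that does a plain string comparison only (no network ranges), and the demo uses a temporary file instead of the real nrpe.cfg.

```shell
#!/bin/sh
# Exit with status 0 when the given IP appears in the allowed_hosts line of
# an nrpe.cfg-style file, non-zero otherwise. Exact match per entry.
#   $1  IP address to look for
#   $2  path to the configuration file
is_allowed_host() {
    ip=$1
    cfg=$2
    sed -n 's/^allowed_hosts=//p' "$cfg" | tr ',' '\n' | tr -d ' ' | grep -qxF "$ip"
}

# Demo with the allowed_hosts line from above.
demo_cfg=$(mktemp)
echo 'allowed_hosts=192.168.0.20,127.0.0.1,192.168.0.99' > "$demo_cfg"
if is_allowed_host 192.168.0.20 "$demo_cfg"; then
    echo "192.168.0.20 is allowed"
fi
rm -f "$demo_cfg"
```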
5. Start NRPE
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
#or, to start it at boot, add the same line to rc.local
vi /etc/rc.d/rc.local
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
6. Verify NRPE
netstat -an | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.8.1
#test from the monitoring server
/usr/local/nagios/libexec/check_nrpe -H 192.168.0.21
NRPE v2.8.1
#common errors
1. /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
Fix: in nrpe.cfg set allowed_hosts=192.168.0.20,127.0.0.1,192.168.0.99, then kill the nrpe process and start it again.
2. /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
Connection refused by host
The nrpe process is not running.
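The two errors above are easy to tell apart by their message text. As a minimal sketch, a wrapper can map the check_nrpe output to the likely cause named in this section; `diagnose_nrpe_output` is a hypothetical helper and covers only the messages quoted here.

```shell
#!/bin/sh
# Map a check_nrpe output string to the likely cause described above.
#   $1  the text printed by check_nrpe
diagnose_nrpe_output() {
    case $1 in
        *"Could not complete SSL handshake"*)
            echo "add the monitoring server IP to allowed_hosts, then restart nrpe" ;;
        *"Connection refused"*)
            echo "nrpe is not running" ;;
        "NRPE v"*)
            echo "ok" ;;
        *)
            echo "unknown" ;;
    esac
}

# Demo with one of the messages quoted above.
diagnose_nrpe_output "Connection refused by host"

# Intended use on a real host (left commented out):
# diagnose_nrpe_output "$(/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 2>&1)"
```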
7. Configure the monitored checks (the key step)
vi /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
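The -w and -c options set the warning and critical thresholds. As a rough illustration of what that means for a "more than" style check such as check_users -w 5 -c 10, the sketch below maps a value to a state and to the usual Nagios plugin exit status (0 OK, 1 WARNING, 2 CRITICAL). `status_for_count` is a hypothetical function, not part of any plugin, and checks like check_disk and check_swap work the other way around (they alert when free space falls below the threshold).

```shell
#!/bin/sh
# Hypothetical illustration of "more than" thresholds.
#   $1  measured value (integer)
#   $2  warning threshold  (-w)
#   $3  critical threshold (-c)
# Prints the state name and returns the matching plugin exit status.
status_for_count() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING
        return 1
    else
        echo OK
        return 0
    fi
}

# 7 logged-in users against -w 5 -c 10
status_for_count 7 5 10 || echo "exit status: $?"
# prints: WARNING
#         exit status: 1
```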
![](https://i-blog.csdnimg.cn/blog_migrate/ab910cf2a15648c3ee9c0cb45b7ee070.jpeg)
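Below are a few screenshots of my own configuration and monitoring:
Installation is covered above (omitted here).
The mylinux.cfg configuration file is also shown above, along with a few other small settings.
The Linux server being monitored successfully:
Screenshot of a correct nrpe configuration (note the paths to the command programs):
vim /usr/local/nagios/etc/nrpe.cfg
The monitored services are as follows: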