Monitoring Dell PowerEdge Series Servers

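Official reference documentation:
Installing OpenManage:
http://linux.dell.com/repo/hardware/latest/
Official check_openmanage installation documentation:
http://folk.uio.no/trondham/soft ... .html#main-features
The OpenManage installation and configuration procedure is as follows: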
#---------------------------------------------------------------
# DELL srvadmin install and server firmware update.
#---------------------------------------------------------------
# Reference Website:
http://linux.dell.com/repo/hardware/latest/
1. Complete this step before any of the steps below.
[root@ritto ~]# wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
2. Installing OpenManage Server Administrator. // install the OpenManage server-side software
[root@ritto ~]# yum -y install srvadmin-all
Exit terminal and then re-enter.
[root@ritto ~]# /opt/dell/srvadmin/sbin/srvadmin-services.sh start
[root@ritto ~]# netstat -nat | grep 1311
3. Installing firmware-tools to manage BIOS and firmware updates. // install the firmware tools
[root@ritto ~]# yum -y install dell_ft_install
[root@ritto ~]# yum -y install $(bootstrap_firmware)
4. Managing BIOS and firmware updates. // manage the BIOS and upgrade the firmware
# Inventory firmware version levels
   [root@ritto ~]# inventory_firmware
# Install any applicable updates
   [root@ritto ~]# update_firmware --yes
5. Allow iptables access (add one line). // open the firewall for remote access
[root@ritto ~]# vi /etc/sysconfig/iptables
:RH-Firewall-1-INPUT - [0:0] // chain declaration; it must be present for the two rules below to take effect
         -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
         -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 1311 -j ACCEPT
[root@ritto ~]# /etc/init.d/iptables restart
6. Added for CentOS. // enable the service at boot
[root@ritto ~]# yum -y remove OpenIPMI
[root@ritto ~]# yum -y install  OpenIPMI
[root@ritto ~]# chkconfig --level 3 dataeng on
[root@ritto ~]# service dataeng start
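# Dell server admin access URL: https://ip:1311 // monitor the host through the web interface
7. Reboot the system.
[root@ritto ~]# reboot    # reboot so the firmware updates are loaded
With these steps complete, OpenManage is installed and can be reached at https://ip:1311.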
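
The `netstat -nat | grep 1311` check from step 2 can be wrapped in a small helper so that a script can confirm the web interface is listening. This is only a minimal sketch: the function name `port_is_listening` is hypothetical, and it assumes the usual Linux `netstat -nat` column layout (local address in column 4, state in column 6).

```shell
#!/bin/bash
# Hypothetical helper: reads `netstat -nat` style output on stdin and
# succeeds only if some socket is in LISTEN state on the given port.
port_is_listening() {
    local port=$1
    awk -v port="$port" '
        # Column 4 is the local address (ip:port), column 6 is the state.
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live system (not run here):
#   netstat -nat | port_is_listening 1311 && echo "OpenManage web interface is listening"
```

Matching on the end of the local address avoids false hits on remote ports or on ports that merely contain the digits 1311.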

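Combined with Nagios, the information from OpenManage Server Administrator can be passed to Nagios, which sends email and SMS notifications when a server raises a warning.
The following describes how to monitor Dell server hardware with the Dell OpenManage plugin for Nagios.
First install the OpenManage management software on the monitored Linux server,
then install and configure the Nagios client software (nagios-plugins and nrpe) on it.
Server: 192.168.20.230 (Nagios server already installed)
Client: 172.16.10.69 (OpenManage already installed)
Installing and configuring the Nagios client software (nagios-plugins and nrpe) on the client
Nagios-plugins download: http://www.nagios.org/download/plugins/
NRPE download: http://www.nagios.org/download/addons/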

1. Add the user
[root@dbpi root]# useradd nagios
Set its password
[root@dbpi root]# passwd nagios
2. Install nagios-plugins
Unpack the archive
tar -zxvf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16
Compile and install
./configure
make
make install
This step creates two directories, libexec and share, under /usr/local/nagios/
ls /usr/local/nagios/
libexec share
Fix the directory ownership
chown nagios:nagios /usr/local/nagios
chown -R nagios:nagios /usr/local/nagios/libexec
3. Install nrpe
Unpack the archive
tar -zxvf nrpe-2.14.tar.gz
cd nrpe-2.14
Configure the build
./configure
The output looks like this
*** Configuration summary for nrpe 2.8.1 05-10-2007 ***:
General Options:
-------------------------
NRPE port: 5666
NRPE user: nagios
NRPE group: nagios
Nagios user: nagios
Nagios group: nagios
type 'make all' to compile the NRPE daemon and client.
The summary shows that the NRPE port is 5666. The next step is make all
make all
The output looks like this
*** Compile finished ***
If the NRPE daemon and client compiled without any errors, you
can continue with the installation or upgrade process.
Read the PDF documentation (NRPE.pdf) for information on the next
steps you should take to complete the installation or upgrade.
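Next, install the NRPE plugin, the daemon and the sample configuration file
Install the check_nrpe plugin
make install-plugin
Only the monitoring host needs the check_nrpe plugin; the monitored host does not. It is installed here purely for testing
Install the daemon
make install-daemon
Install the configuration file
make install-daemon-config
The nagios directory now contains four directories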
[root@dbpi nrpe-2.8.1]# ls /usr/local/nagios/
bin etc libexec share
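According to the installation documentation, the NRPE daemon runs as a service under xinetd, so xinetd must be installed first. Most systems ship with it by default.
4. Install the xinetd script. If xinetd is not installed on the machine, install it with yum first
[root@dbpi nrpe-2.8.1]# make install-xinetd
The output looks like this
/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe
This creates the file /etc/xinetd.d/nrpe
Edit the /etc/xinetd.d/nrpe script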
vi /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1
}
Append the monitoring host's address to the only_from line, separated by a space. After the change:
only_from = 127.0.0.1 192.168.20.230
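
The same change can be scripted so that it is safe to run more than once. This is a sketch under stated assumptions: the function name `add_only_from` is hypothetical, and it expects the `only_from = ...` line to be written with spaces around the equals sign, as in the sample file above.

```shell
#!/bin/bash
# Hypothetical helper: append an address to the only_from line of an xinetd
# service file unless that address is already listed.
add_only_from() {
    local file=$1 addr=$2 tmp
    tmp=$(mktemp) || return 1
    if awk -v addr="$addr" '
        /^[ \t]*only_from[ \t]*=/ {
            listed = 0
            # Compare every whitespace-separated field with the address.
            for (i = 1; i <= NF; i++) if ($i == addr) listed = 1
            if (!listed) $0 = $0 " " addr
        }
        { print }
    ' "$file" > "$tmp"; then
        # Overwrite in place so the file keeps its owner and mode.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Usage on the monitored host (not run here):
#   add_only_from /etc/xinetd.d/nrpe 192.168.20.230
```

xinetd still has to be restarted afterwards for the new address list to take effect.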
Edit /etc/services and add the NRPE service
vi /etc/services
Add the following
# Local services
nrpe 5666/tcp # nrpe
Restart the xinetd service
[root@dbpi nrpe-2.8.1]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
Check whether NRPE has started
[root@dbpi nrpe-2.8.1]# netstat -at|grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
[root@dbpi nrpe-2.8.1]# netstat -an|grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
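Port 5666 is now listening
5. Test locally that NRPE works properly
The check_nrpe plugin installed earlier for testing now comes into use. Run
/usr/local/nagios/libexec/check_nrpe -H localhost
It returns the current NRPE version,
which shows that check_nrpe can connect to the nrpe daemon locally
On the server, run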
[root@dbpi nrpe-2.8.1]# /var/www/html/nagios/libexec/check_nrpe -H 172.16.10.69
NRPE v2.8.1
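This shows that check_nrpe on the server can connect to the nrpe daemon, so the Nagios client setup is complete
Note: for the following steps to work, the local firewall must open port 5666 to the external monitoring host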
vi /etc/sysconfig/iptables
  -A RH-Firewall-1-INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
Download the OpenManage plugin for Nagios on the monitored host.
Official check_openmanage installation documentation:
http://folk.uio.no/trondham/soft ... .html#main-features
wget http://folk.uio.no/trondham/soft ... anage-3.7.11.tar.gz
tar zxvf check_openmanage-3.7.11.tar.gz
cd check_openmanage-3.7.11
[root@monitor ~/check_openmanage-3.5.7]# ls -lhtr
total 3504
-rw-r--r--  1 45150  55150 25K Mar 19  2010 check_openmanage.8
-rwxr-xr-x  1 45150  55150 1.1K Mar 19  2010 install.sh
-rwxr-xr-x  1 45150  55150 406B Mar 19  2010 install.bat
-rw-r--r--  1 45150  55150 17K Mar 19  2010 check_openmanage.pod
-rw-r--r--  1 45150  55150 6.5K Mar 19  2010 check_openmanage.php
-rwxr-xr-x  1 45150  55150 147K Mar 19  2010 check_openmanage
-rw-r--r--  1 45150  55150 2.7K Mar 19  2010 README
-rw-r--r--  1 45150  55150 533B Mar 19  2010 INSTALL
-rw-r--r--  1 45150  55150 34K Mar 19  2010 COPYING
-rw-r--r--  1 45150  55150 19K Mar 19  2010 CHANGES
-rwxr-xr-x  1 45150  55150 3.1M Mar 19  2010 check_openmanage.exe
-rw-r--r--  1 45150  55150 4.4K Mar 19  2010 check_openmanage.spec
The plugin can be used as soon as it is unpacked.
cp check_openmanage /usr/local/nagios/libexec/
cd /usr/local/nagios/libexec/
chown nagios:nagios check_openmanage
chmod 775 check_openmanage
check_openmanage can obtain the Dell server's hardware information in two ways:
1. By running locally on the server.
2. Over SNMP.
Here it is run locally and invoked through NRPE.
vi /usr/local/nagios/etc/nrpe.cfg
# Add one line.
command[check_openmanage]=/usr/local/nagios/libexec/check_openmanage
/etc/init.d/xinetd restart
The OpenManage plugin for Nagios is now fully installed on the client.
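
Instead of editing nrpe.cfg by hand, the command entry can be appended only when it is missing, so repeated runs do not create duplicates. This is a minimal sketch; `ensure_line` is a hypothetical helper name, not part of NRPE.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file unless an identical line exists.
ensure_line() {
    local file=$1 line=$2
    # -x matches whole lines, -F treats the text literally (the square
    # brackets in command[...] would otherwise be read as a regex).
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on the monitored host (not run here):
#   ensure_line /usr/local/nagios/etc/nrpe.cfg \
#       'command[check_openmanage]=/usr/local/nagios/libexec/check_openmanage'
#   /etc/init.d/xinetd restart
```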

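On the Nagios monitoring host, add the following configuration.
Nagios is installed in /var/www/html/nagios
Modify the configuration files
Edit the main Nagios configuration file nagios.cfg
vi /var/www/html/nagios/etc/nagios.cfg
Add the path of the new host's configuration file
cfg_file=/var/www/html/nagios/etc/objects/web1.cfg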
vi /var/www/html/nagios/etc/objects/commands.cfg
Append the following at the end to define the check_nrpe command
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
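The meaning is as follows
command_name check_nrpe
This names the command check_nrpe; service definitions refer to it by this name.
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
This defines the plugin program that is actually run. The command line must follow the usage of the check_nrpe command exactly; run check_nrpe -h to see it.
The $ARG1$ argument after -c is the check command passed to the nrpe daemon for execution. It must be one of the commands defined in nrpe.cfg. When check_nrpe is used in a service definition, this argument is appended after a "!"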
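
To make the macro substitution concrete, the sketch below imitates how a check_command value such as check_nrpe!check_openmanage ends up as a full command line. It is an illustration only, not Nagios code: the function name `expand_check_command` is hypothetical, and it handles a single argument ($ARG1$) and only the three macros used in this article.

```shell
#!/bin/bash
# Illustration only: mimic the substitution of $USER1$, $HOSTADDRESS$ and
# $ARG1$ in a command_line template.
expand_check_command() {
    local command_line=$1 check_command=$2 host_address=$3 user1=$4
    local arg1=""
    # Everything after the first "!" in check_command becomes $ARG1$.
    case $check_command in
        *'!'*) arg1=${check_command#*\!} ;;
    esac
    # The quoted patterns are matched literally, dollar signs included.
    command_line=${command_line//'$USER1$'/"$user1"}
    command_line=${command_line//'$HOSTADDRESS$'/"$host_address"}
    command_line=${command_line//'$ARG1$'/"$arg1"}
    printf '%s\n' "$command_line"
}

expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
    'check_nrpe!check_openmanage' 172.16.10.69 /var/www/html/nagios/libexec
# prints: /var/www/html/nagios/libexec/check_nrpe -H 172.16.10.69 -c check_openmanage
```

Here $USER1$ is assumed to point at the libexec directory under the Nagios installation path used in this article; the actual value comes from the Nagios resource file.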
Create a host configuration file
vi /var/www/html/nagios/etc/objects/web1.cfg
# Add the following.
# Check Dell server.
define host{
      use                   linux-server
      host_name             dell.web1
      alias                   dell_manage
      address                172.16.10.69
      }
define service{
      use             generic-service
      host_name             dell.web1
      service_description    DELL Server status
      check_command       check_nrpe!check_openmanage
      }
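
When several Dell servers need the same pair of definitions, the block above can be generated instead of copied by hand. This is a sketch only: the function name `print_dell_host_cfg` and the example host dell.web2 at 172.16.10.70 are hypothetical, and the linux-server and generic-service templates are assumed to exist as in the definitions above.

```shell
#!/bin/bash
# Hypothetical generator: print a host and a service definition that run
# check_openmanage through NRPE for one monitored server.
print_dell_host_cfg() {
    local host_name=$1 alias_name=$2 address=$3
    cat <<EOF
define host{
      use                   linux-server
      host_name             $host_name
      alias                 $alias_name
      address               $address
      }
define service{
      use                   generic-service
      host_name             $host_name
      service_description   DELL Server status
      check_command         check_nrpe!check_openmanage
      }
EOF
}

# Usage (hypothetical second server, not run here):
#   print_dell_host_cfg dell.web2 dell_manage2 172.16.10.70 \
#       > /var/www/html/nagios/etc/objects/web2.cfg
```

Each generated file still needs its own cfg_file entry in nagios.cfg.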
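Restart the Nagios service once the configuration is complete
Nagios can now monitor the hardware status of the Dell server and sends an alert when a fault occurs.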

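Appendix:
Run /usr/local/nagios/libexec/check_nrpe -h to see how the command is used
The usage is check_nrpe -H <monitored host> -c <check command to run>
Note: the check command after -c must be defined in nrpe.cfg; the NRPE daemon only runs commands defined there
Viewing the NRPE check commands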
cd /usr/local/nagios/etc
vi nrpe.cfg
Find the following section
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
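The name in square brackets is the command name, that is, what can follow the -c option of check_nrpe. The part after the equals sign is the plugin program that is actually executed (much like a command definition in commands.cfg, only written on a single line). In other words, check_users is shorthand for /usr/local/nagios/libexec/check_users -w 5 -c 10.
These five commands check the number of logged-in users, the CPU load, the capacity of hda1, zombie processes and the total number of processes. For the exact meaning of each, see the plugin's usage (run "<plugin name> -h")
Because -c accepts only commands defined in nrpe.cfg, only the commands defined there, such as these five, are available. They can be tried locally by running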
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_hda1
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_zombie_procs

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_total_procs
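
The names that check_nrpe accepts after -c can also be listed directly from nrpe.cfg rather than read off by eye. This is a small sketch; `list_nrpe_commands` is a hypothetical helper name, and it relies only on the command[name]=... syntax shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the name inside command[...] for every active
# (non-commented) command definition in an nrpe.cfg file.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Usage on the monitored host (not run here): run every defined command once.
#   for name in $(list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg); do
#       /usr/local/nagios/libexec/check_nrpe -H localhost -c "$name"
#   done
```

Lines that start with # do not match the pattern, so commented-out definitions are skipped.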

Monitoring script:

#!/bin/bash

# Program : check_dell_omreport
# Version : 1.0
# Date : Jul 28 2012
# Author : huky - alonerhu@yahoo.com.cn
# Summary : a simple nagios/icinga plugin that checks the status of chassis &
# storage on Dell PowerEdge servers with omreport in Dell Openmanager
# Licence : GPL - summary below, full text at http://www.fsf.org/licenses/gpl.txt

# Set the OpenManage installation path here; the default is /opt/dell/srvadmin
DELL_SRV_DIR=/opt/dell/srvadmin
PATH=$PATH:$DELL_SRV_DIR/oma/bin:$DELL_SRV_DIR/bin:$DELL_SRV_DIR/sbin
#OMREPORT=`find $DELL_SRV_DIR -name omreport 2> /dev/null`
STOR_CTRL=/tmp/dell.storage.ctr
LOG_FILE=/tmp/dell_omreport.log

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

if [ ! -d "$DELL_SRV_DIR" ]; then
    echo "Please install OpenManage and set DELL_SRV_DIR to its path" && exit $STATE_UNKNOWN
fi

/etc/init.d/dataeng status > /dev/null 2>&1
if [ $? -ne 0 ]; then
    echo "Please start the service dataeng" && exit $STATE_UNKNOWN
fi

#check chassis: keep the component lines whose severity is not Ok
omreport chassis | grep -v '^Ok' | grep ":" | sed '/COMPONENT/d' > $LOG_FILE

#check storage
omreport storage controller | grep "^ID" | cut -d":" -f2 > $STOR_CTRL
if [ ! -s $STOR_CTRL ]; then
    echo "Have you installed the package for storage?" >> $LOG_FILE
fi

for CONTR_ID in `cat $STOR_CTRL`
do
    omreport storage controller controller=$CONTR_ID | grep -2 ^Status | sed '/--/d' | awk '{if (NR%5==0){print $0} else {printf"%s ",$0}}' | grep -v Ok | tr -s " *" " " >> $LOG_FILE
done

if [ -s $LOG_FILE ]; then
    paste -s $LOG_FILE > $LOG_FILE.2
    # If every line matching "Critical" is a "Non-Critical" one, report WARNING; otherwise CRITICAL
    if [ `grep -c "Critical" $LOG_FILE` -eq `grep -c "\-Critical" $LOG_FILE` ]; then
        echo `cat $LOG_FILE.2` && exit $STATE_WARNING
    else
        echo `cat $LOG_FILE.2` && exit $STATE_CRITICAL
    fi
else
    echo "Machine is healthy" && exit $STATE_OK
fi
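
The decision at the end of the script can be tried without omreport or temporary files by feeding sample lines to a function. This sketch mirrors the script's comparison of "Critical" and "-Critical" line counts; the function name `classify_problems` and the sample lines are hypothetical. The printed names correspond to the plugin exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL).

```shell
#!/bin/bash
# Hypothetical helper: read the collected problem lines on stdin and print
# the state the script above would report.
classify_problems() {
    local input total noncritical
    input=$(cat)
    if [ -z "$input" ]; then
        echo OK
        return
    fi
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    total=$(printf '%s\n' "$input" | grep -c 'Critical' || true)
    noncritical=$(printf '%s\n' "$input" | grep -c -- '-Critical' || true)
    if [ "$total" -eq "$noncritical" ]; then
        echo WARNING
    else
        echo CRITICAL
    fi
}

printf 'Non-Critical : Fans\n' | classify_problems
# prints: WARNING
printf 'Critical : Power Supplies\nNon-Critical : Fans\n' | classify_problems
# prints: CRITICAL
```

As in the script, any non-empty input without a plain "Critical" line, including the "Have you installed the package for storage?" message, is reported as WARNING.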