Replacing Ping with Nmap for Nagios
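As administrators, we sometimes need to know the state of the network around us. A simple example: when a network device suppresses ICMP replies, an ordinary ping cannot tell whether the device is alive, and a tool like Nagios has the same difficulty. The example below addresses that situation. The goal is to combine nmap with Nagios for network scanning and monitoring. First, the basic nmap syntax:
Example: nmap -sP 192.168.1.6
This prints output like the following: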
    Starting Nmap 5.30BETA1 ( http://nmap.org ) at 2010-06-28 14:13 EDT
    Nmap scan report for argos (192.168.1.6)
    Host is up (0.000073s latency).
    Nmap done: 1 IP address (1 host up) scanned in 0.00 seconds
Or like this:
    nmap -sP 192.168.1.200
    Starting Nmap 5.30BETA1 ( http://nmap.org ) at 2010-06-28 14:15 EDT
    Note: Host seems down. If it is really up, but blocking our ping / probes, try -Pn
    Nmap done: 1 IP address (0 hosts up) scanned in 3.01 seconds
These are the outputs for the two cases: the first when the host is up, the second when the host is down.
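To make the difference between the two outputs concrete, here is a minimal sketch of how the two marker strings ("Host is up" and "Host seems down") could be told apart in shell. The function name classify_nmap_output is hypothetical, and the sample text is the "host is up" output shown above. Other nmap versions may word these messages differently.

```shell
#!/bin/sh
# classify_nmap_output is a hypothetical helper, not part of nmap or Nagios.
# It reads "nmap -sP" output on stdin and prints UP, DOWN, or UNKNOWN.
classify_nmap_output() {
    output=$(cat)
    case "$output" in
        *"Host seems down"*) echo "DOWN" ;;
        *"Host is up"*)      echo "UP" ;;
        *)                   echo "UNKNOWN" ;;
    esac
}

# Sample input: the "host is up" output from the example above.
classify_nmap_output <<'EOF'
Starting Nmap 5.30BETA1 ( http://nmap.org ) at 2010-06-28 14:13 EDT
Nmap scan report for argos (192.168.1.6)
Host is up (0.000073s latency).
Nmap done: 1 IP address (1 host up) scanned in 0.00 seconds
EOF
```

Anything that matches neither string falls through to UNKNOWN, which is the third state the scripts below have to deal with.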
Now consider the following simple script:
    #!/bin/sh
    /usr/local/bin/nmap -sP $1 | grep "Host seems down"
    if [ "$?" -eq 0 ]; then
        echo "NMAP PING: CRITICAL"
        exit 2
    fi
    echo "NMAP PING: OK"
    exit 0
This script is not quite good enough, because it does not handle the UNKNOWN case. Here is a version that covers all three states:
    #!/bin/sh
    NMAP="/usr/local/bin/nmap -sP"
    TMP=/var/tmp/nmap_ping.$$
    CHECK="Nmap Ping"

    $NMAP $1 > $TMP

    grep "Host seems down" $TMP
    if [ "$?" -eq 0 ]; then
        rm -f $TMP
        echo "$CHECK: CRITICAL"
        exit 2
    fi

    grep "Host is up" $TMP
    if [ "$?" -eq 0 ]; then
        rm -f $TMP
        echo "$CHECK: Ok"
        exit 0
    fi

    rm -f $TMP
    echo "$CHECK: UNKNOWN"
    exit 3
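The script now covers every state, but two problems remain. First, it repeats the temporary-file cleanup in every branch. Second, what if nmap does not run at all? Technically that would end up as UNKNOWN, but it is better to catch the failure right when nmap is invoked instead of waiting for the output checks. A function that only deletes the file would not be worth much. It is more useful to write one that takes the exit level and the message, cleans up, and exits: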
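    results_exit() {
        retval=$1
        msg=$2
        rm -f $TMP
        echo "$CHECK ${msg}"
        # exit, not return: return would only leave the function,
        # and the rest of the script would keep running
        exit $retval
    }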
With that exit function defined, the full script becomes easier to follow:
    #!/bin/sh
    NMAP="/usr/local/bin/nmap -sP"
    TMP=/var/tmp/nmap_ping.$$
    CHECK="Nmap Ping"

    # Clean up, print the status line, and end the script with the given code
    results_exit() {
        retval=$1
        msg=$2
        rm -f $TMP
        echo "$CHECK ${msg}"
        exit $retval
    }

    $NMAP $1 > $TMP || results_exit 255 "Could not execute $NMAP"

    grep "Host seems down" $TMP
    if [ "$?" -eq 0 ]; then
        results_exit 2 "CRITICAL"
    fi

    grep "Host is up" $TMP
    if [ "$?" -eq 0 ]; then
        results_exit 0 "Ok"
    fi

    results_exit 3 "Unknown"
Since the aim is to replace Nagios's ping check, here is a cleaner and more compact version of the script:
    #!/bin/sh
    NMAP="/usr/local/bin/nmap -sP"
    TMP=/var/tmp/nmap_ping.$$
    CHECK="Nmap Ping"

    # $1 = exit code, $2 = message; exit (not return) so the script stops here
    results_exit() {
        rm -f $TMP
        echo "$CHECK: ${2}"
        exit $1
    }

    $NMAP $1 > $TMP || results_exit 255 "Could not execute $NMAP"

    grep "Host seems down" $TMP
    [ $? -eq 0 ] && results_exit 2 "CRITICAL"

    grep "Host is up" $TMP
    [ $? -eq 0 ] && results_exit 0 "Ok"

    results_exit 3 "Unknown"
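As an aside, the same three-state logic can be sketched without a temporary file by keeping nmap's output in a shell variable. This is only an illustration under stated assumptions, not the script used in the rest of this article: the function name nmap_ping and the NMAP_BIN override are hypothetical, and a failed nmap run is reported with code 3 (UNKNOWN) here instead of 255, since Nagios plugin states run from 0 to 3.

```shell
#!/bin/sh
# Sketch only: a variable-based variant of the check above.
# NMAP_BIN is a hypothetical override so the function can be tried
# against a stand-in command; it defaults to the path used in this article.
NMAP_BIN="${NMAP_BIN:-/usr/local/bin/nmap}"
CHECK="Nmap Ping"

nmap_ping() {
    # $1 is the host address to probe
    output=$("$NMAP_BIN" -sP "$1" 2>/dev/null) || {
        echo "$CHECK: Could not execute $NMAP_BIN"
        return 3    # assumption: report a failed run as UNKNOWN
    }
    case "$output" in
        *"Host seems down"*) echo "$CHECK: CRITICAL"; return 2 ;;
        *"Host is up"*)      echo "$CHECK: Ok";       return 0 ;;
    esac
    echo "$CHECK: Unknown"
    return 3
}
```

In a plugin file the function would be followed by a call such as nmap_ping "$1" and then exit $?, so that the function's return value becomes the script's exit code. With no temporary file there is nothing to clean up on any path.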
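The script can now replace Nagios's ping check. For example, install it at a suitable path (the examples below use /usr/local/nagios/local/nmap_ping) and then define the command in command.cfg: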
    # 'check-host-alive': Nagios's original ping-based definition
    define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    }
Now replace it with the nmap version:
    # 'check-host-alive' command definition
    define command{
        command_name    check-host-alive
        command_line    /usr/local/nagios/local/nmap_ping $HOSTADDRESS$
    }

    # 'check-host-alive' command definition
    #define command{
    #    command_name    check-host-alive
    #    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    #    }
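This can of course be improved further. Assuming the script lives in /usr/local/nagios/local, that directory can be replaced by a $USERxx$ macro, just as the $USER1$ macro used above is defined in /usr/local/nagios/etc/resource.cfg. The Nagios versions current at the time of writing support values of xx up to 32. Using 6 for xx, for example, add the following two lines to resource.cfg: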
    # Set $USER6$ to our local script directory
    $USER6$=/usr/local/nagios/local
Next, update command.cfg:
    # 'check-host-alive' command definition
    define command{
        command_name    check-host-alive
        command_line    $USER6$/nmap_ping $HOSTADDRESS$
    }
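Before relying on the new command, it can help to run the script by hand and look at its exit code, since that code is what Nagios uses to decide the state. The sketch below is one way to do that. The helpers state_name and run_check are hypothetical names, and `sh -c 'exit 2'` merely stands in for the real plugin so the example runs without nmap.

```shell
#!/bin/sh
# state_name is a hypothetical helper: it maps a plugin exit code
# to the Nagios state name (0 to 3 are the defined plugin states).
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF RANGE ($1)" ;;
    esac
}

# run_check runs the given command, discards its output,
# and prints the state that its exit code corresponds to.
run_check() {
    rc=0
    "$@" > /dev/null 2>&1 || rc=$?
    state_name "$rc"
}

# Stand-in commands in place of the real plugin:
run_check true               # prints OK
run_check sh -c 'exit 2'     # prints CRITICAL
```

Against the real script the call would look like run_check /usr/local/nagios/local/nmap_ping 192.168.1.6, assuming the path used above. A code outside 0 to 3, such as the 255 passed when nmap cannot be executed, shows up as out of range here.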
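Nagios is an outstanding monitoring tool in its own right, but sometimes reality goes beyond what it can handle out of the box, for instance when devices block the probes it relies on. In those cases, nmap makes a good companion for network monitoring.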