Writing your own Nagios disk-monitoring script, check_disk

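Before I knew it, a month of my internship had gone by. My main job during that time was building a Nagios + Centreon monitoring platform. Doing it hands-on went fairly quickly, and although I hit plenty of bugs along the way, the setup went reasonably smoothly. After that I started writing my own disk-monitoring script.
First, why write a disk-monitoring script myself? To be honest, I am not entirely sure, since Nagios-plugins already ships a check_disk script. My mentor probably wanted to give me some practice, and also to end up with a script that fits our actual environment better.
The hardware involved: three servers forming a test cloud platform, two of them with RAID cards, two with SSDs, plus a number of HDDs. Yes, that is all, but for a beginner like me it was plenty to wrestle with.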


For hosts with a RAID card, MegaCli is a good choice. Download and install MegaCli yourself, then get started:

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL     # show RAID (logical drive) information
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL       # show RAID card information
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL           # show physical disk information

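Play with these commands and look at what they print. The output is long, so just skim it. If the host has no RAID card, running MegaCli on it prints nothing but Exit Code: 0x00
The one I mainly use is the third command, /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
Then I filter out the information I need: /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -E 'Device Id|Error|Media Type'
Device Id: needed when monitoring SSD lifetime; it is just an Id.
Error: the Error Count values are the error information to watch. 0 means no errors; anything else is cause for concern.
Media Type: the disk type. I mainly use it to find which Device Id the SSD in the host corresponds to, because apart from this I do not know how a Device Id maps to a disk or a partition. Here is my output: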

[root@cloud-13 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL  | grep -E 'Device Id|Error|Media Type'
Device Id: 0
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 1
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 2
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 3
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 4
Media Error Count: 0
Other Error Count: 0
Media Type: Solid State Device

So all that remains is to write code that checks the value after each Error Count, and that gives you the monitoring.
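The check described here could be sketched as a small Nagios-style plugin function. This is only a sketch, not the script actually deployed on these hosts: `check_error_counts` is a hypothetical name, treating any non-zero counter as CRITICAL is just one possible policy, and the exit codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN). It reads the MegaCli output on stdin, so it can also be tried against saved output.

```shell
#!/bin/bash
# Sketch: check MegaCli error counters and map the result to Nagios exit codes.
# check_error_counts is a hypothetical helper; it expects the output of
# `MegaCli64 -PDList -aALL` (filtered or unfiltered) on stdin.

check_error_counts() {
    awk '
        # Remember which physical disk the following counters belong to.
        /^Device Id:/ { id = $3; seen = 1 }
        # Matches "Media Error Count: N" and "Other Error Count: N";
        # the counter value is the last field on the line.
        /Error Count:/ {
            if ($NF + 0 != 0) bad = bad " dev" id ":" $1 "=" $NF
        }
        END {
            if (!seen)     { print "UNKNOWN - no physical disks reported"; exit 3 }
            if (bad != "") { print "CRITICAL - non-zero error counts:" bad; exit 2 }
            print "OK - all error counts are 0"
            exit 0
        }'
}

# Possible use on a host with a RAID card (path as used in this post):
#   /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | check_error_counts
#   exit $?
```

A host without a RAID card prints no Device Id lines, so the sketch reports UNKNOWN there rather than OK.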
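I mentioned SSD lifetime just now, so let me cover it here as well. smartctl can report an SSD's lifetime, along with many other results; SSD lifetime is only one part of them. On a host with a RAID card, however, smartctl needs the Device Id obtained above.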
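Because each disk's Media Type line follows its Device Id line in the MegaCli output shown earlier, the SSD Device Ids can be picked out mechanically. A minimal sketch, assuming that line layout; `ssd_device_ids` is a hypothetical helper name and reads the MegaCli output on stdin.

```shell
#!/bin/bash
# Sketch: print the Device Id of every disk whose Media Type is an SSD.
# ssd_device_ids is a hypothetical helper; stdin is `MegaCli64 -PDList -aALL` output.

ssd_device_ids() {
    awk -F': *' '
        # Split each line into "key" and "value" at the colon.
        $1 == "Device Id" { id = $2 }
        $1 == "Media Type" && $2 ~ /^Solid State Device/ { print id }
    '
}

# Possible use (path as used in this post):
#   /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | ssd_device_ids
# Each printed Id N is the number to pass to smartctl in its megaraid,N device type.
```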

[root@cloud-13 ~]# smartctl -a -d megaraid,4 /dev/sdc1
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.123.2.openstack.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/sdc1 [megaraid_disk_04] [SAT]: Device open changed type from 'megaraid' to 'sat'
Smartctl open device: /dev/sdc1 [megaraid_disk_04] [SAT] failed: SATA device detected,
MegaRAID SAT layer is reportedly buggy, use '-d sat+megaraid,N' to try anyhow

On my host it wants sat added, so I did as it said:

[root@cloud-13 ~]# smartctl -a -d sat+megaraid,4 /dev/sdc1
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.123.2.openstack.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     OCZ INTREPID 3600
Serial Number:    A21N8061423000004
LU WWN Device Id: 5 e83a97 100006dc5
Firmware Version: 1.4.6.0
User Capacity:    800,166,076,416 bytes [800 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ACS-2 (revision not indicated)
Local Time is:    Tue Aug 25 15:20:02 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       3964
 12 Power_Cycle_Count       0x0000   100   100
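
For monitoring, individual fields can be pulled out of this smartctl output with a small filter. A minimal sketch, assuming the attribute table layout shown above (attribute name in the second column, raw value in the tenth); `smart_raw_value` and `smart_health` are hypothetical helper names. Which attribute reflects SSD wear differs between vendors, so the attribute name is left as a parameter rather than guessed.

```shell
#!/bin/bash
# Sketch: extract fields from `smartctl -a` output supplied on stdin.
# Both function names are hypothetical helpers, not part of smartmontools.

# Print the RAW_VALUE column of the attribute named in $1.
smart_raw_value() {
    # Rows with fewer than 10 columns (e.g. cut-off lines) are ignored.
    awk -v name="$1" '$2 == name && NF >= 10 { print $10 }'
}

# Print the overall health verdict, e.g. PASSED.
smart_health() {
    awk -F': *' '/^SMART overall-health self-assessment test result/ { print $2 }'
}

# Possible use (device and Id as in this post):
#   smartctl -a -d sat+megaraid,4 /dev/sdc1 | smart_raw_value Power_On_Hours
#   smartctl -a -d sat+megaraid,4 /dev/sdc1 | smart_health
```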