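When troubleshooting NetApp hardware, the commands used most often are environment status in ONTAP and system sensors in the SP environment. This article describes in detail how to interpret the output of both commands.
In ONTAP, the output of environment status is in fact obtained by calling the SP sensors output.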
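The SP sensors use a threshold-based mechanism to determine whether a component has a problem.
system sensors displays information about the sensors in the system.
system sensors get sensor_name returns more detailed information about a specific sensor.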
A sensor's reading is evaluated against the following thresholds:
- LCR Lower Critical
- LNC Lower Noncritical
- UNC Upper Noncritical
- UCR Upper Critical
Some sensors do not have all four thresholds. For those sensors, the missing values are shown as na.
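The threshold check described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and it assumes that a reading at or beyond a threshold counts as crossing it, which the SP output itself does not state. Thresholds shown as na are treated as absent and skipped.

```python
from typing import Optional


def parse_threshold(text: str) -> Optional[float]:
    """Convert a threshold column to a number, or None when it is 'na'."""
    text = text.strip()
    return None if text.lower() == "na" else float(text)


def classify(reading: float,
             lcr: Optional[float],
             lnc: Optional[float],
             unc: Optional[float],
             ucr: Optional[float]) -> str:
    """Classify a reading against LCR/LNC/UNC/UCR.

    Assumption: 'at or beyond' a threshold counts as crossing it.
    Critical thresholds are checked before noncritical ones.
    """
    if lcr is not None and reading <= lcr:
        return "lower critical"
    if ucr is not None and reading >= ucr:
        return "upper critical"
    if lnc is not None and reading <= lnc:
        return "lower noncritical"
    if unc is not None and reading >= unc:
        return "upper noncritical"
    return "ok"
```

For example, with made-up thresholds LCR=0, LNC=5, UNC=42, UCR=52, a reading of 45 would be classified as upper noncritical, while a sensor whose thresholds are all na would always be classified as ok.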
To interpret System_FW_Status: the value has the form 0xAABB, and the combination of AA and BB indicates the condition of the sensor.
AA can be one of the following values:
01 System firmware error
02 System firmware hang
04 System firmware progress
BB can be one of the following values:
00 System software has properly shut down
01 Memory initialization in progress
02 NVMEM initialization in progress (when NVMEM is present)
04 Restoring memory controller hub (MCH) values (when NVMEM is present)
05 User has entered Setup
13 Booting the operating system or LOADER
1F BIOS is starting up
20 LOADER is running
21 LOADER is programming the primary BIOS firmware. You must not power down the system.
22 LOADER is programming the alternate BIOS firmware. You must not power down the system.
2F Data ONTAP is running
60 SP has powered off the system
61 SP has powered on the system
62 SP has reset the system
63 SP watchdog power cycle
64 SP watchdog cold reset
For example, a System_FW_Status sensor status of 0x042F means "system firmware progress (04),
Data ONTAP is running (2F)."
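The 0xAABB decoding can be sketched as a pair of lookup tables. The tables below are copied from the AA and BB lists in this article; the function name decode_fw_status is hypothetical, and values that are not in the lists are reported as unknown rather than guessed.

```python
# AA byte: overall firmware state (from the list in this article)
FW_STATE = {
    0x01: "System firmware error",
    0x02: "System firmware hang",
    0x04: "System firmware progress",
}

# BB byte: detailed condition (from the list in this article)
FW_DETAIL = {
    0x00: "System software has properly shut down",
    0x01: "Memory initialization in progress",
    0x02: "NVMEM initialization in progress",
    0x04: "Restoring memory controller hub (MCH) values",
    0x05: "User has entered Setup",
    0x13: "Booting the operating system or LOADER",
    0x1F: "BIOS is starting up",
    0x20: "LOADER is running",
    0x21: "LOADER is programming the primary BIOS firmware",
    0x22: "LOADER is programming the alternate BIOS firmware",
    0x2F: "Data ONTAP is running",
    0x60: "SP has powered off the system",
    0x61: "SP has powered on the system",
    0x62: "SP has reset the system",
    0x63: "SP watchdog power cycle",
    0x64: "SP watchdog cold reset",
}


def decode_fw_status(value: int) -> str:
    """Split 0xAABB into its AA and BB bytes and look each one up."""
    aa = (value >> 8) & 0xFF   # high byte
    bb = value & 0xFF          # low byte
    state = FW_STATE.get(aa, "unknown")
    detail = FW_DETAIL.get(bb, "unknown")
    return f"{state} ({aa:02X}), {detail} ({bb:02X})"


print(decode_fw_status(int("0x042F", 16)))
```

Run as is, this prints "System firmware progress (04), Data ONTAP is running (2F)", matching the example above.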
Interpreting System_Watchdog
0x0080 The state of this sensor has not changed
0x0081 Timer interrupt
0x0180 Timer expired
0x0280 Hard reset
0x0480 Power down
0x0880 Power cycle
For example, a System_Watchdog sensor status of 0x0880 means that a watchdog timeout occurred and
caused a system power cycle.
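A lookup of the System_Watchdog values can be sketched the same way. The table mirrors the list above; the function name decode_watchdog is hypothetical, and any value not listed is returned as unknown.

```python
# System_Watchdog status values, as listed in this article
WATCHDOG_STATUS = {
    0x0080: "The state of this sensor has not changed",
    0x0081: "Timer interrupt",
    0x0180: "Timer expired",
    0x0280: "Hard reset",
    0x0480: "Power down",
    0x0880: "Power cycle",
}


def decode_watchdog(value: int) -> str:
    """Return the meaning of a System_Watchdog status, or 'unknown'."""
    return WATCHDOG_STATUS.get(value, "unknown")


print(decode_watchdog(0x0880))
```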
Interpreting PSU1_Input_Type and PSU2_Input_Type
For DC power supplies, this value has no meaning.
0x01xx 220V PSU type
0x02xx 110V PSU type
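Since only the high byte matters for the PSU input type (the xx part is ignored), the check reduces to a one-byte lookup. This is a minimal sketch with a hypothetical function name. It cannot tell whether the PSU is a DC unit, so the caller has to skip the result in that case.

```python
# High byte of PSU1_Input_Type / PSU2_Input_Type, as listed in this article
PSU_INPUT_TYPE = {
    0x01: "220V PSU type",
    0x02: "110V PSU type",
}


def decode_psu_input_type(value: int) -> str:
    """Look up the high byte of 0xNNxx; the low byte (xx) is ignored."""
    high = (value >> 8) & 0xFF
    return PSU_INPUT_TYPE.get(high, "unknown")


print(decode_psu_input_type(0x0100))
```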
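Most NetApp storage systems are monitored by the SP, but some models are monitored by the BMC instead. The BMC has a corresponding command, sensor show, whose output has the same content, as shown in the example figure below.
On the BMC, the command is sensor show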