False alarms in HP-UX syslog (HPUX SFM cache refresh bug)


environment:

 OS: HP-UX B.11.31 U ia64

symptoms:

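 During a UPS discharge test in the machine room on New Year's Day, a vendor mistake cut power to the room, and one of my standby databases for MES data (on an HP rx2660 server) lost power and rebooted.

Afterwards, /var/adm/syslog/syslog.log reported an error (power supply failed) at around 10:30 every morning.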

syslog

Jan  3 10:09:19 sfcstb1 telnetd[27751]:  Time out occurred in the initial option negotiation

Jan  3 10:33:21 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286145 -a 

Jan  4 10:33:23 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286146 -a 

Jan  5 10:33:27 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286147 -a 

Jan  5 14:10:28 sfcstb1 su: + tb johnz-oracle

Jan  5 16:05:50 sfcstb1 su: - ta xiaofan-oracle

Jan  5 16:05:59 sfcstb1 su: + ta xiaofan-oracle

Jan  5 16:07:09 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan  5 16:27:35 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan  6 08:26:41 sfcstb1 su: + ta xiaofan-oracle

Jan  6 10:33:30 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286148 -a 

Jan  6 15:30:54 sfcstb1 su: - ta xiaofan-root

Jan  6 15:31:02 sfcstb1 su: + ta xiaofan-root

Jan  6 15:47:01 sfcstb1 su: + tc xiaofan-root

Jan  6 15:47:44 sfcstb1 su: + tc xiaofan-root

Jan  7 10:33:33 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286149 -a 

Jan  8 10:33:36 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286150 -a 

Jan  9 08:32:25 sfcstb1 su: + ta xiaofan-oracle

Jan  9 08:33:41 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan  9 10:33:39 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286151 -a 

Jan 10 08:53:22 sfcstb1 su: + ta xiaofan-oracle

Jan 10 08:53:58 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 10 08:58:46 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 10 08:59:00 sfcstb1  above message repeats 2 times

Jan 10 10:33:42 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286152 -a 
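To make the daily pattern easier to see, the EMS notifications can be pulled out of syslog.log and the gaps between them measured. The following is only a minimal sketch: the function names `ems_event_times` and `intervals_seconds` are made up for illustration, the regular expression assumes the line layout shown in the excerpt above, and the year is passed in because syslog does not record it. The sample lines are shortened copies of the entries above.

```python
import re
from datetime import datetime

# Matches the syslog timestamp, host name and "EMS [pid]:" tag as they
# appear in the excerpt above (layout assumed from that excerpt only).
EMS_LINE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ EMS \[\d+\]:")


def ems_event_times(lines, year=2012):
    """Return the timestamps of EMS notifications found in syslog lines."""
    times = []
    for line in lines:
        m = EMS_LINE.match(line)
        if m:
            # Collapse the double space syslog uses for single-digit days.
            stamp = " ".join(m.group(1).split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Return the gap in seconds between each pair of consecutive events."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = [
    "Jan  3 10:09:19 sfcstb1 telnetd[27751]:  Time out occurred in the initial option negotiation",
    "Jan  3 10:33:21 sfcstb1 EMS [5879]: ------ EMS Event Notification ------",
    "Jan  4 10:33:23 sfcstb1 EMS [5879]: ------ EMS Event Notification ------",
    "Jan  5 10:33:27 sfcstb1 EMS [5879]: ------ EMS Event Notification ------",
]

times = ems_event_times(sample)
print(intervals_seconds(times))  # [86402, 86404]
```

Each gap is a few seconds over 24 hours (86400 seconds), which looks more like a scheduled reminder than a power supply that keeps failing at random.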

 

sfcstb1:/tmp#  /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286152 -a 

 

CURRENT MONITOR DATA:

 

Event Time..........: Tue Jan 10 10:33:41 2012

Severity............: CRITICAL

Monitor.............: ia64_corehw

Event #.............: 103001

System..............: sfcstb1

 

Summary:

 

     Power Supply : Failure is detected.

 

Description of Error:

 

     The system has detected that one of the power supplies has failed. 

 

Probable Cause / Recommended Action:

 

     The power supply has failed. Contact your HP support representative to

     check the power supply.

    

          For information on the sensor that generated this event, refer to

          FRU ID in Event Details section.

 

Additional Event Data: 

     System IP Address...: 172.16.51.151

     Event Id............: 103001820120110103336

     Monitor Version.....: C.04.00.05

     Event Class.........: System

     Client Configuration File............:

     /var/stm/config/tools/monitor/default_ia64_corehw.clcfg

     Client Configuration File Version....: A.01.00

          Qualification criteria met.

               Number of events: 1

     Associated OS error log entry id(s)

          None

     Additional System Data:

          System Model Number.............: ia64 hp server rx2660

          EMS Version.....................: A.04.20

          STM Version.....................: NA

          System Serial Number............: SGH4843041

     Latest information on this event:

          http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#E103001

 

 

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S     v-v-v-v-v-v-v-v-v-v-v-v-v  

 

 

Event Details :

 

     Event Date  ...................: Mon Jan  2 17:02:49 2012

     Sensor Number .................: 0x41

     Sensor Type ...................: Power Supply

     Sensor Class ..................: Sensor specific

     Sensor Reading/Offset .........: 0x1 (Sensor Reading)

     Event  Type ...................: Assertion

     Entity ID .....................: 0xa

     Generic Message ...............:

       Power Supply Failure detected

     Entity FRU Id Info ............: (Sensor ID 0())

 

Error Details:

 

     Additional information on this event can be obtained from evweb

     logviewer (Refer SFM User Guide) with the following log id: 271804
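One detail in this output is worth comparing: the notification's "Event Time" is Jan 10, while the "Event Date" in the details section is Jan 2. A small sketch such as the one below can compute the gap. The helper name `parse_event_stamps` is hypothetical, and the field layout is assumed from the resdata output above, not from any documented format.

```python
import re
from datetime import datetime

# Field labels are followed by dots and/or spaces and then a colon,
# as in the resdata output above (layout assumed from that output).
FIELD = re.compile(r"^\s*(Event Time|Event Date)[\s.]*:\s*(.+?)\s*$")


def parse_event_stamps(text):
    """Return {'Event Time': datetime, 'Event Date': datetime} if present."""
    stamps = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            # Collapse the double space used for single-digit days.
            value = " ".join(m.group(2).split())
            stamps[m.group(1)] = datetime.strptime(value, "%a %b %d %H:%M:%S %Y")
    return stamps


resdata_output = """\
Event Time..........: Tue Jan 10 10:33:41 2012
Severity............: CRITICAL
     Event Date  ...................: Mon Jan  2 17:02:49 2012
     Sensor Type ...................: Power Supply
"""

stamps = parse_event_stamps(resdata_output)
age = stamps["Event Time"] - stamps["Event Date"]
print(age)  # 7 days, 17:30:52
```

The notification on Jan 10 therefore refers to a sensor event that was already more than a week old, which suggests an old record being reported again and not a new failure.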

 -------------------------------------------------------------------------------------------------------------------------------------------

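In fact, both an on-site check in the machine room and a check through the MP management port over the serial (COM) connection showed that the power supplies were normal, with no problem.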

-------------------------------------------------------------------------------------------------

Power supplies                State                        

-----------------------------------------------------------

Power Supply 1                Normal                         

Power Supply 2                Normal                         

 

Fans                State               Fans                State             

-------------------------------------------------------------------------------

Fan  1 (Mem)        Normal              Fan  7 (CPU)        Normal            

Fan  2 (Mem)        Normal              Fan  8 (CPU)        Normal            

Fan  3 (Mem)        Normal              Fan  9 (I/O)        Normal            

Fan  4 (Mem)        Normal              Fan 10 (I/O)        Normal            

Fan  5 (CPU)        Normal              Fan 11 (I/O)        Normal            

Fan  6 (CPU)        Normal              Fan 12 (I/O)        Normal     

 

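This was confusing, so I called HP technical support on the 800 hotline. Their first suggestion was to check whether the power cabling or the UPS supply was faulty. The SA confirmed that the UPS supply was fine, so I swapped the power outlet and the power cord, but the periodic false alarm persisted. The only change was that it now appeared at 14:33, the time at which I had swapped the power connection.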

 

------------------------------------------------------------ Errors logged while swapping the power outlet ------------------------------------------------------------

Jan 10 14:30:37 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286155 -a 

Jan 10 14:33:43 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286156 -a 

-------------------------------------------------------------------------------------------------------------------------------------------

Jan 11 08:23:52 sfcstb1 su: + ta xiaofan-oracle

Jan 11 14:33:46 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286157 -a 

Jan 12 08:45:56 sfcstb1 su: + ta xiaofan-oracle

Jan 12 08:47:00 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 12 08:54:31 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 12 08:55:07 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 12 08:59:00 sfcstb1  above message repeats 3 times

Jan 12 13:56:58 sfcstb1 su: + ta xiaofan-oracle

Jan 12 14:33:49 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286158 -a 

 

--------------------------------------------------------------------------------------------------------------------------------------------------

MP check: power supplies normal

Power supplies                State                        

-----------------------------------------------------------

Power Supply 1                Normal                         

Power Supply 2                Normal                         

 

Fans                State               Fans                State             

-------------------------------------------------------------------------------

Fan  1 (Mem)        Normal              Fan  7 (CPU)        Normal            

Fan  2 (Mem)        Normal              Fan  8 (CPU)        Normal            

Fan  3 (Mem)        Normal              Fan  9 (I/O)        Normal            

Fan  4 (Mem)        Normal              Fan 10 (I/O)        Normal            

Fan  5 (CPU)        Normal              Fan 11 (I/O)        Normal            

Fan  6 (CPU)        Normal              Fan 12 (I/O)        Normal    

 

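I contacted HP technical support again. This time their explanation was that the HP-UX SFM (System Fault Management) cache had recorded the power supply failure but was never refreshed, so the event was reported again in syslog.log at the same time every day. The solution they gave was either to refresh the SFM cache manually or to upgrade SFM.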

solution:

1. Refresh the cache manually

Disable the SFM provider module:


#cimprovider -d -m SFMProviderModule 

Remove the /var/opt/sfm/data/reminderEvent.dat and /var/opt/sfm/data/MemoryErrorCache.dat files.

Enable the SFM provider module:


#cimprovider -e -m SFMProviderModule 

2. Upgrade SFM
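The manual refresh in option 1 is order-sensitive: disable the provider, remove the two cache files, then enable the provider again. As a rough sketch, the sequence can be built as a list and printed for review before anything is run. The function name `refresh_plan` is made up for illustration, and using plain `rm` for the removal step is my assumption. The `cimprovider` options and file paths are the ones given above. The script only prints the commands and does not execute them.

```python
import shlex

SFM_MODULE = "SFMProviderModule"
CACHE_FILES = [
    "/var/opt/sfm/data/reminderEvent.dat",
    "/var/opt/sfm/data/MemoryErrorCache.dat",
]


def refresh_plan(module=SFM_MODULE, cache_files=CACHE_FILES):
    """Return the commands, in order, for a manual SFM cache refresh."""
    # Step 1: disable the provider module so the cache files are not in use.
    plan = [["cimprovider", "-d", "-m", module]]
    # Step 2: remove the cached event files (plain rm is an assumption).
    plan += [["rm", path] for path in cache_files]
    # Step 3: enable the provider module again.
    plan.append(["cimprovider", "-e", "-m", module])
    return plan


# Dry run: print the commands for review; nothing is executed here.
for cmd in refresh_plan():
    print(shlex.join(cmd))
```

Printing the plan first makes it easy to confirm the paths on the affected host before running the commands by hand as root.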

 

 

 
