False alarms in syslog on HP-UX (HPUX SFM cache refresh bug)



 OS: HP-UX B.11.31 U ia64


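 During a UPS discharge test in the server room on New Year's Day, a mistake by the vendor cut power to the room, and one of my machines, the standby DB for our MES data (an HP rx2660 server), lost power and rebooted.

Afterwards I noticed that /var/adm/syslog/syslog.log reported an error (power supply failed) every morning at about 10:30: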


Jan  3 10:09:19 sfcstb1 telnetd[27751]:  Time out occurred in the initial option negotiation

Jan  3 10:33:21 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286145 -a 

Jan  4 10:33:23 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286146 -a 

Jan  5 10:33:27 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286147 -a 

Jan  5 14:10:28 sfcstb1 su: + tb johnz-oracle

Jan  5 16:05:50 sfcstb1 su: - ta xiaofan-oracle

Jan  5 16:05:59 sfcstb1 su: + ta xiaofan-oracle

Jan  5 16:07:09 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan  5 16:27:35 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan  6 08:26:41 sfcstb1 su: + ta xiaofan-oracle

Jan  6 10:33:30 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286148 -a 

Jan  6 15:30:54 sfcstb1 su: - ta xiaofan-root

Jan  6 15:31:02 sfcstb1 su: + ta xiaofan-root

Jan  6 15:47:01 sfcstb1 su: + tc xiaofan-root

Jan  6 15:47:44 sfcstb1 su: + tc xiaofan-root

Jan  7 10:33:33 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286149 -a 

Jan  8 10:33:36 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286150 -a 

Jan  9 08:32:25 sfcstb1 su: + ta xiaofan-oracle

Jan  9 08:33:41 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan  9 10:33:39 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286151 -a 

Jan 10 08:53:22 sfcstb1 su: + ta xiaofan-oracle

Jan 10 08:53:58 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 10 08:58:46 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 10 08:59:00 sfcstb1  above message repeats 2 times

Jan 10 10:33:42 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286152 -a 
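The daily pattern in the excerpt above is easier to see once the unrelated su and telnetd lines are filtered out. The following is a minimal sketch, assuming the syslog line layout shown above (month, day and time in the first three fields); the function name `ems_event_times` is made up for illustration and is not an HP-UX tool.

```shell
# Print "Mon DD HH:MM:SS" for every EMS notification on the core_hw resource.
# $1 is the syslog file to scan (defaults to the HP-UX syslog location).
ems_event_times() {
    awk '
        # Keep only EMS notifications that mention the core hardware resource.
        index($0, "EMS Event Notification") &&
        index($0, "/system/events/ia64_corehw/core_hw") {
            print $1, $2, $3
        }
    ' "${1:-/var/adm/syslog/syslog.log}"
}
```

Calling `ems_event_times` with no argument scans /var/adm/syslog/syslog.log; if the alarm really is periodic, the times in the output should be almost identical from one day to the next.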


sfcstb1:/tmp#  /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286152 -a 




Event Time..........: Tue Jan 10 10:33:41 2012

Severity............: CRITICAL

Monitor.............: ia64_corehw

Event #.............: 103001

System..............: sfcstb1




     Power Supply : Failure is detected.


Description of Error:


     The system has detected that one of the power supplies has failed. 


Probable Cause / Recommended Action:


     The power supply has failed. Contact your HP support representative to

     check the power supply.


          For information on the sensor that generated this event, refer to

          FRU ID in Event Details section.


Additional Event Data: 

     System IP Address...:

     Event Id............: 103001820120110103336

     Monitor Version.....: C.04.00.05

     Event Class.........: System

     Client Configuration File............:


     Client Configuration File Version....: A.01.00

          Qualification criteria met.

               Number of events: 1

     Associated OS error log entry id(s)


     Additional System Data:

          System Model Number.............: ia64 hp server rx2660

          EMS Version.....................: A.04.20

          STM Version.....................: NA

          System Serial Number............: SGH4843041

     Latest information on this event:




v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S     v-v-v-v-v-v-v-v-v-v-v-v-v  



Event Details :


     Event Date  ...................: Mon Jan  2 17:02:49 2012

     Sensor Number .................: 0x41

     Sensor Type ...................: Power Supply

     Sensor Class ..................: Sensor specific

     Sensor Reading/Offset .........: 0x1 (Sensor Reading)

     Event  Type ...................: Assertion

     Entity ID .....................: 0xa

     Generic Message ...............:

       Power Supply Failure detected

     Entity FRU Id Info ............: (Sensor ID 0())


Error Details:


     Additional information on this event can be obtained from evweb

     logviewer (Refer SFM User Guide) with the following log id: 271804
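One detail in this output is worth pulling out: the header gives an Event Time of Jan 10, while the Event Details section gives an Event Date of Jan 2. That would be consistent with an old sensor event being re-reported rather than a new failure, although this reading of the two fields is my assumption and not something the output states. A small sketch for printing just those two lines, with a hypothetical helper name `event_times`:

```shell
# Read resdata output on stdin and print only the two timestamps:
# the "Event Time" header field and the "Event Date" field from Event Details.
event_times() {
    awk '
        /^ *Event Time/ { sub(/^[^:]*: */, ""); print "Event Time: " $0 }
        /^ *Event Date/ { sub(/^[^:]*: */, ""); print "Event Date: " $0 }
    '
}
```

It can be fed directly from the command shown in the syslog message, for example `/opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286152 -a | event_times`.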


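In reality, both an on-site check in the server room and a check through the MP management port over the serial (COM) connection showed that the power supplies were normal, with no fault at all: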


Power supplies                State                        


Power Supply 1                Normal                         

Power Supply 2                Normal                         


Fans                State               Fans                State             


Fan  1 (Mem)        Normal              Fan  7 (CPU)        Normal            

Fan  2 (Mem)        Normal              Fan  8 (CPU)        Normal            

Fan  3 (Mem)        Normal              Fan  9 (I/O)        Normal            

Fan  4 (Mem)        Normal              Fan 10 (I/O)        Normal            

Fan  5 (CPU)        Normal              Fan 11 (I/O)        Normal            

Fan  6 (CPU)        Normal              Fan 12 (I/O)        Normal     


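This was puzzling, so I called HP technical support on the 800 hotline. Their first suggestion was to check whether the power wiring was faulty and whether the UPS supply was abnormal.

I confirmed with the SA that the UPS supply was fine, then replaced the power outlet and the power cord, but the periodic false alarm persisted.

The only change was that it now fired at 14:33, the time at which I had swapped the power connection.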



Jan 10 14:30:37 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286155 -a 

Jan 10 14:33:43 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286156 -a 


Jan 11 08:23:52 sfcstb1 su: + ta xiaofan-oracle

Jan 11 14:33:46 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286157 -a 

Jan 12 08:45:56 sfcstb1 su: + ta xiaofan-oracle

Jan 12 08:47:00 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 12 08:54:31 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 12 08:55:07 sfcstb1 syslog: rm_log_init: fopen of file /etc/opt/resmon/log/client.log failed: Permission denied

Jan 12 08:59:00 sfcstb1  above message repeats 3 times

Jan 12 13:56:58 sfcstb1 su: + ta xiaofan-oracle

Jan 12 14:33:49 sfcstb1 EMS [5879]: ------ EMS Event Notification ------   Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw"     (Threshold:  >= " 3")    Execute the following command to obtain event details:   /opt/resmon/bin/resdata -R 385286155 -r /system/events/ia64_corehw/core_hw -n 385286158 -a 



Checked from the MP again, the power supplies are still normal:

Power supplies                State                        


Power Supply 1                Normal                         

Power Supply 2                Normal                         


Fans                State               Fans                State             


Fan  1 (Mem)        Normal              Fan  7 (CPU)        Normal            

Fan  2 (Mem)        Normal              Fan  8 (CPU)        Normal            

Fan  3 (Mem)        Normal              Fan  9 (I/O)        Normal            

Fan  4 (Mem)        Normal              Fan 10 (I/O)        Normal            

Fan  5 (CPU)        Normal              Fan 11 (I/O)        Normal            

Fan  6 (CPU)        Normal              Fan 12 (I/O)        Normal    


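I contacted HP technical support again. This time their explanation was that the HP-UX SFM (System Fault Management) cache had recorded the power supply failure but was never refreshed,

so the event is reported in syslog.log at the same time every day. The solution they gave was to refresh the SFM cache manually or to upgrade SFM.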



Disable the SFM provider module:

#cimprovider -d -m SFMProviderModule 

Remove the files /var/opt/sfm/data/reminderEvent.dat and /var/opt/sfm/data/MemoryErrorCache.dat.

Enable the SFM provider module:

#cimprovider -e -m SFMProviderModule 
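The three steps above can be wrapped in one function so that the provider is always re-enabled at the end. This is only a sketch, to be run as root: the `cimprovider` invocations and the two file paths are the ones given above, while the `RUN`, `SFM_DATA_DIR` and `SFM_BACKUP_DIR` variables, the backup directory, and the copy-before-remove step are my own additions and not part of HP's instructions.

```shell
# Sketch of the manual SFM cache refresh.
# RUN=echo (hypothetical switch) prints the commands instead of running them.
refresh_sfm_cache() {
    run=${RUN:-}
    data_dir=${SFM_DATA_DIR:-/var/opt/sfm/data}
    backup_dir=${SFM_BACKUP_DIR:-/var/tmp/sfm_cache_backup}   # assumed location

    # Step 1: disable the SFM provider module; stop if that fails.
    $run cimprovider -d -m SFMProviderModule || return 1

    # Step 2: remove the two cache files, keeping a copy first (my addition).
    for f in reminderEvent.dat MemoryErrorCache.dat; do
        if [ -f "$data_dir/$f" ]; then
            $run mkdir -p "$backup_dir" &&
            $run cp -p "$data_dir/$f" "$backup_dir/$f" &&
            $run rm "$data_dir/$f"
        fi
    done

    # Step 3: re-enable the provider module even if a file could not be removed.
    $run cimprovider -e -m SFMProviderModule
}
```

Running `RUN=echo refresh_sfm_cache` first shows what would be executed without touching the system; a plain `refresh_sfm_cache` then performs the refresh.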











