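Environment:
Operating system: HP-UX 11.31
Database: Oracle 11.2.0.3.6 RAC
Problem:
On node 1, applications kept getting disconnected on their own, reporting listener errors and failing to connect to the database.
Checking the logs on node 1: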
/oracle/app/grid11.2.0/log/racdb1/alertracdb1.log
2013-09-30 10:27:56.609
[crsd(6294)]CRS-2765:Resource 'ora.net1.network' has failed on server 'racdb1'.
2013-09-30 10:33:17.086
[crsd(6294)]CRS-2765:Resource 'ora.net1.network' has failed on server 'racdb1'.
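To see how often the resource fails, the CRS-2765 entries in the alert log can be tallied. The following is a minimal sketch, assuming the message format shown in the two entries above; the function name `count_failures` is made up for illustration, and the sample text is copied from this excerpt rather than read from the real log file.

```python
import re
from collections import Counter

# Matches message lines such as:
# [crsd(6294)]CRS-2765:Resource 'ora.net1.network' has failed on server 'racdb1'.
CRS_2765 = re.compile(
    r"CRS-2765:Resource '([^']+)' has failed on server '([^']+)'"
)


def count_failures(lines):
    """Count CRS-2765 failures per (resource, server) pair."""
    failures = Counter()
    for line in lines:
        match = CRS_2765.search(line)
        if match:
            failures[(match.group(1), match.group(2))] += 1
    return failures


# Sample input: the two alert log entries quoted above.
sample = """2013-09-30 10:27:56.609
[crsd(6294)]CRS-2765:Resource 'ora.net1.network' has failed on server 'racdb1'.
2013-09-30 10:33:17.086
[crsd(6294)]CRS-2765:Resource 'ora.net1.network' has failed on server 'racdb1'.
"""

counts = count_failures(sample.splitlines())
for (resource, server), n in counts.items():
    print(resource, server, n)
```

On the real system the same function could be fed the lines of the alert log file instead of the sample string.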
/oracle/app/grid11.2.0/log/racdb1/agent/crsd/orarootagent_root/orarootagent_root.log
2013-09-30 10:33:16.034: [ default][10854]ICMP Ping from 192.168.66.129 to 192.168.66.1
2013-09-30 10:33:17.069: [ora.net1.network][10854] {0:2:15033} [check] NetworkAgent::checkLink returned false
2013-09-30 10:33:17.070: [ AGFW][10] {0:2:15033} ora.net1.network racdb1 1 state changed from: ONLINE to: OFFLINE
2013-09-30 10:33:17.071: [ AGFW][10] {0:2:15033} Switching online monitor to offline one
2013-09-30 10:33:17.071: [ AGFW][10] {0:2:15033} Started implicit monitor for [ora.net1.network racdb1 1] interval=60000 delay=60000
2013-09-30 10:33:17.071: [ AGFW][10] {0:2:15037} Generating new Tint for unplanned state change. Original Tint: {0:2:15033}
2013-09-30 10:33:17.071: [ AGFW][10] {0:2:15037} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:1367803
2013-09-30 10:33:17.134: [ AGFW][10] {0:2:15037} Agent received the message: RESOURCE_START[ora.net1.network racdb1 1] ID 4098:116509
2013-09-30 10:33:17.134: [ AGFW][10] {0:2:15037} Preparing START command for: ora.net1.network racdb1 1
2013-09-30 10:33:17.134: [ AGFW][10] {0:2:15037} ora.net1.network racdb1 1 state changed from: OFFLINE to: STARTING
2013-09-30 10:33:17.140: [ora.net1.network][10855] {0:2:15037} [start] (:CLSN00107:) clsn_agent::start {
2013-09-30 10:33:17.140: [ora.net1.network][10855] {0:2:15037} [start] NetworkAgent::init enter {
2013-09-30 10:33:17.141: [ora.net1.network][10855] {0:2:15037} [start] Checking if lan900 Interface is fine
2013-09-30 10:33:17.211: [ AGFW][10] {0:2:15037} Agent received the message: RESOURCE_PROBE[ora.racdb1.vip 1 1] ID 4097:116510
2013-09-30 10:33:17.212: [ AGFW][10] {0:2:15037} Preparing CHECK command for: ora.racdb1.vip 1 1
2013-09-30 10:33:17.222: [ AGFW][10] {0:2:15037} Agent sending last reply for: RESOURCE_PROBE[ora.racdb1.vip 1 1] ID 4097:116510
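The agent log is noisy, so it can help to pull out only the "state changed" lines and read the ONLINE/OFFLINE flapping as a timeline. This is a rough sketch, assuming each entry sits on a single line in the format shown above; `state_changes` is a hypothetical helper name, and the sample lines are taken from this excerpt.

```python
import re

# Matches AGFW lines such as:
# 2013-09-30 10:33:17.070: [ AGFW][10] {0:2:15033} ora.net1.network racdb1 1
#   state changed from: ONLINE to: OFFLINE
STATE_CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*?\} "
    r"(\S+) (\S+) \d+ state changed from: (\w+) to: (\w+)"
)


def state_changes(lines):
    """Return (timestamp, resource, server, old_state, new_state) tuples."""
    changes = []
    for line in lines:
        match = STATE_CHANGE.match(line)
        if match:
            changes.append(match.groups())
    return changes


# Sample input: three lines from the orarootagent log excerpt above.
sample = [
    "2013-09-30 10:33:17.069: [ora.net1.network][10854] {0:2:15033} [check] NetworkAgent::checkLink returned false",
    "2013-09-30 10:33:17.070: [ AGFW][10] {0:2:15033} ora.net1.network racdb1 1 state changed from: ONLINE to: OFFLINE",
    "2013-09-30 10:33:17.134: [ AGFW][10] {0:2:15037} ora.net1.network racdb1 1 state changed from: OFFLINE to: STARTING",
]

transitions = state_changes(sample)
for ts, resource, server, old, new in transitions:
    print(ts, resource, server, old, "->", new)
```

In this excerpt the resource drops to OFFLINE one millisecond after checkLink returns false and is restarted about 64 ms later, which matches the intermittent offline/online pattern described in the support note below.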
BUG:
HP-UX: GI ora.net1.network Goes Offline/Online Intermittently with "NetworkAgent::checkLink returned false" (Doc ID 1534994.1)
Cause
The issue was investigated in Bug 16039587, the cause is HP-UX bug, basically the contention of address memory range lock on kernel memory causes poll(2) timeout and affects orarootagent process.
Solution
Apply OS kernel patch PHKL_42850.
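Before and after patching, it is worth confirming whether PHKL_42850 is actually present on the node. The sketch below is an assumption-laden illustration: it relies on the HP-UX command `swlist -l patch` listing installed patches with the patch ID as a separate token, and the helper names `read_swlist` and `patch_listed` are invented here. The sample line is illustrative only, not real `swlist` output.

```python
import re
import subprocess


def read_swlist():
    """Run `swlist -l patch` (HP-UX only) and return its output as text."""
    result = subprocess.run(
        ["swlist", "-l", "patch"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def patch_listed(swlist_text, patch_id):
    """Return True if patch_id appears as a whole word in the given text."""
    pattern = r"\b" + re.escape(patch_id) + r"\b"
    return re.search(pattern, swlist_text) is not None


# Illustrative input only; the real listing format may differ.
sample_text = "PHKL_42850 1.0 (illustrative description)"
print(patch_listed(sample_text, "PHKL_42850"))
```

On the HP-UX host itself, the check would be `patch_listed(read_swlist(), "PHKL_42850")`; the whole-word match avoids a false positive on a longer ID that merely starts with the same digits.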
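After the patch was applied, the system returned to normal. Applications no longer get disconnected, and the cluster resource no longer goes offline.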
From the ITPUB blog, link: http://blog.itpub.net/751371/viewspace-773639/. Please credit the source when reposting; otherwise legal action will be pursued.