CRS-1610 - 90% heartbeat fatal, eviction in 0.102 seconds

Linux AS 5.3, 64-bit
Oracle 10.2.0.4, 2 nodes
GFS file system


Node2 rebooted abnormally.

Node2 Linux log:


Feb  4 16:14:46   --- last log entry before the reboot
Feb  4 16:18:57   --- system back up (syslogd restarted)

Feb  4 16:14:14 hou249bbodb3112 snmpd[5979]: Received SNMP packet(s) from UDP: [127.0.0.1]:38732
Feb  4 16:14:29 hou249bbodb3112 snmpd[5979]: Connection from UDP: [127.0.0.1]:51532
Feb  4 16:14:29 hou249bbodb3112 snmpd[5979]: Received SNMP packet(s) from UDP: [127.0.0.1]:51532
Feb  4 16:14:30 hou249bbodb3112 snmpd[5979]: Connection from UDP: [127.0.0.1]:51532
Feb  4 16:14:46 hou249bbodb3112 snmpd[5979]: Connection from UDP: [127.0.0.1]:34969
Feb  4 16:14:46 hou249bbodb3112 snmpd[5979]: Received SNMP packet(s) from UDP: [127.0.0.1]:34969
Feb  4 16:14:46 hou249bbodb3112 snmpd[5979]: Connection from UDP: [10.13.8.110]:1048
Feb  4 16:14:46 hou249bbodb3112 snmpd[5979]: Received SNMP packet(s) from UDP: [10.13.8.110]:1048
Feb  4 16:18:57 hou249bbodb3112 syslogd 1.4.1: restart.
Feb  4 16:18:57 hou249bbodb3112 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb  4 16:18:57 hou249bbodb3112 kernel: Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol00 rhgb quiet)
Feb  4 16:18:57 hou249bbodb3112 kernel: Linux version 2.6.18-128.1.16.el5xen ( mockbuild@hs20-bc1-2.build.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Fri Jun 26 11:10:46 EDT 2009
Feb  4 16:18:57 hou249bbodb3112 kernel: BIOS-provided physical RAM map:
Feb  4 16:18:57 hou249bbodb3112 kernel:  Xen: 0000000000000000 - 00000003d7724000 (usable)
Feb  4 16:18:57 hou249bbodb3112 kernel: DMI 2.5 present.
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x10] lapic_id[0x10] enabled)
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x18] lapic_id[0x18] enabled)
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
Feb  4 16:18:57 hou249bbodb3112 kernel: ACPI: LAPIC (acpi_id[0x11] lapic_id[0x11] enabled)











CRS alert log (written on hou249bbodb3111, reporting on node2):


2010-01-25 13:12:27.741
[crsd(10609)]CRS-1012:The OCR service started on node hou249bbodb3111.
2010-01-25 13:12:28.464
[evmd(10607)]CRS-1401:EVMD started on node hou249bbodb3111.
2010-01-25 13:12:29.902
[crsd(10609)]CRS-1201:CRSD started on node hou249bbodb3111.
2010-01-25 14:10:32.055
[cssd(11203)]CRS-1612:node hou249bbodb3112 (2) at 50% heartbeat fatal, eviction in 14.078 seconds
2010-01-25 14:10:33.051
[cssd(11203)]CRS-1612:node hou249bbodb3112 (2) at 50% heartbeat fatal, eviction in 13.088 seconds
2010-01-25 14:15:01.135
[cssd(10708)]CRS-1605:CSSD voting file is online: /dev/sdc. Details in /u01/app/oracle/product/crs/log/hou249bbodb3111/cssd/ocssd.log.
2010-01-25 14:15:01.137
[cssd(10708)]CRS-1605:CSSD voting file is online: /dev/sdd. Details in /u01/app/oracle/product/crs/log/hou249bbodb3111/cssd/ocssd.log.
2010-01-25 14:15:01.169
[cssd(10708)]CRS-1605:CSSD voting file is online: /dev/sdg. Details in /u01/app/oracle/product/crs/log/hou249bbodb3111/cssd/ocssd.log.
[cssd(10708)]CRS-1601:CSSD Reconfiguration complete. Active nodes are hou249bbodb3111 hou249bbodb3112 .
2010-01-25 14:15:07.842
[crsd(10106)]CRS-1005:The OCR upgrade was completed. Version has changed from 185599488 to 185599488. Details in /u01/app/oracle/product/crs/log/hou249bbodb3111/crsd/crsd.log.
2010-01-25 14:15:07.843
[crsd(10106)]CRS-1012:The OCR service started on node hou249bbodb3111.
2010-01-25 14:15:08.430
[evmd(10057)]CRS-1401:EVMD started on node hou249bbodb3111.
2010-01-25 14:15:12.687
[crsd(10106)]CRS-1201:CRSD started on node hou249bbodb3111.
2010-02-04 16:15:11.137
[cssd(10708)]CRS-1612:node hou249bbodb3112 (2) at 50% heartbeat fatal, eviction in 14.102 seconds
2010-02-04 16:15:12.234
[cssd(10708)]CRS-1612:node hou249bbodb3112 (2) at 50% heartbeat fatal, eviction in 13.102 seconds
2010-02-04 16:15:19.129
[cssd(10708)]CRS-1611:node hou249bbodb3112 (2) at 75% heartbeat fatal, eviction in 6.102 seconds
2010-02-04 16:15:23.129
[cssd(10708)]CRS-1610:node hou249bbodb3112 (2) at 90% heartbeat fatal, eviction in 2.102 seconds
2010-02-04 16:15:24.125
[cssd(10708)]CRS-1610:node hou249bbodb3112 (2) at 90% heartbeat fatal, eviction in 1.112 seconds
2010-02-04 16:15:25.129
[cssd(10708)]CRS-1610:node hou249bbodb3112 (2) at 90% heartbeat fatal, eviction in 0.102 seconds
2010-02-04 16:15:26.006
[cssd(10708)]CRS-1607:CSSD evicting node hou249bbodb3112. Details in /u01/app/oracle/product/crs/log/hou249bbodb3111/cssd/ocssd.log.
[cssd(10708)]CRS-1601:CSSD Reconfiguration complete. Active nodes are hou249bbodb3111 .
2010-02-04 16:15:30.531
[crsd(10106)]CRS-1204:Recovering CRS resources for node hou249bbodb3112.
[cssd(10708)]CRS-1601:CSSD Reconfiguration complete. Active nodes are hou249bbodb3111 hou249bbodb3112 .  
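
The countdown in the CRS-1612 / CRS-1611 / CRS-1610 warnings above can be cross-checked against the time of the CRS-1607 eviction message. Below is a minimal Python sketch, assuming the alert log layout shown above (a timestamp line followed by the message line). The function name `projected_evictions` is made up for this example and is not part of any Oracle tool.

```python
import re
from datetime import datetime, timedelta

# Matches the CSSD heartbeat warnings (CRS-1610/1611/1612) as they appear above.
WARNING = re.compile(
    r"CRS-(161[0-2]):node (\S+) \((\d+)\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)

def projected_evictions(lines):
    """Yield (logged_at, node, percent, projected_eviction_time) per warning.

    Assumption: each message is preceded by its own timestamp line, so the
    most recent timestamp seen is paired with the next warning line.
    """
    stamp = None
    for line in lines:
        line = line.strip()
        try:
            stamp = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        except ValueError:
            pass  # not a timestamp line, try it as a message line
        match = WARNING.search(line)
        if match and stamp is not None:
            remaining = timedelta(seconds=float(match.group(5)))
            yield stamp, match.group(2), int(match.group(4)), stamp + remaining

# Two of the warnings from the log above.
sample = """\
2010-02-04 16:15:11.137
[cssd(10708)]CRS-1612:node hou249bbodb3112 (2) at 50% heartbeat fatal, eviction in 14.102 seconds
2010-02-04 16:15:25.129
[cssd(10708)]CRS-1610:node hou249bbodb3112 (2) at 90% heartbeat fatal, eviction in 0.102 seconds
""".splitlines()

results = list(projected_evictions(sample))
for logged_at, node, percent, deadline in results:
    print(f"{logged_at.time()}  {node}  {percent}%  projected eviction {deadline.time()}")
```

Both sample warnings project an eviction at about 16:15:25.2, and the CRS-1607 eviction message in the log is stamped 16:15:26.006.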

















Node1 crsd log:

hou249bbodb3111$ vi crsd.log

2010-01-25 14:16:35.867: [  CRSRES][1504274752] startRunnable: setting CLI values
2010-01-25 14:16:36.108: [  CRSRES][1504274752] Attempting to start `ora.hou249bbodb3111.gsd` on member `hou249bbodb3111`
2010-01-25 14:16:36.473: [  CRSRES][1537845568] Attempting to start `ora.wmb2bprd.db` on member `hou249bbodb3112`
2010-01-25 14:16:37.146: [  CRSRES][1504274752] Start of `ora.hou249bbodb3111.gsd` on member `hou249bbodb3111` succeeded.
2010-01-25 14:16:37.420: [  CRSRES][1537845568] Start of `ora.wmb2bprd.db` on member `hou249bbodb3112` succeeded.
2010-02-04 16:15:26.098: [ CRSCOMM][1537845568] CLEANUP: Searching for connections to failed node hou249bbodb3112
2010-02-04 16:15:26.098: [  CRSEVT][1537845568] Processing member leave for hou249bbodb3112, incarnation: 145375564
2010-02-04 16:15:26.099: [    CRSD][1537845568] SM: recovery in process: 8
2010-02-04 16:15:26.099: [  CRSEVT][1537845568] Do failover for: hou249bbodb3112
2010-02-04 16:15:26.857: [  CRSRES][1537845568]  startup = 0
2010-02-04 16:15:26.881: [  CRSRES][1537845568]  startup = 0
2010-02-04 16:15:26.896: [  CRSRES][1537845568]  startup = 0
2010-02-04 16:15:26.914: [  CRSRES][1537845568]  startup = 0
2010-02-04 16:15:26.926: [  CRSRES][1537845568]  startup = 0
2010-02-04 16:15:26.946: [  CRSRES][1537845568]  startup = 0
2010-02-04 16:15:27.029: [  CRSRES][1087633728] startRunnable: setting CLI values
2010-02-04 16:15:27.045: [  CRSRES][1087633728] Attempting to start `ora.hou249bbodb3112.vip` on member `hou249bbodb3111`
2010-02-04 16:15:27.071: [  CRSRES][1504274752] startRunnable: setting CLI values
2010-02-04 16:15:27.123: [  CRSRES][1504274752] Attempting to start `ora.wmb2bprd.db` on member `hou249bbodb3111`
2010-02-04 16:15:27.276: [  CRSRES][1504274752] Start of `ora.wmb2bprd.db` on member `hou249bbodb3111` succeeded.
2010-02-04 16:15:30.518: [  CRSRES][1087633728] Start of `ora.hou249bbodb3112.vip` on member `hou249bbodb3111` succeeded.
2010-02-04 16:15:30.531: [  CRSEVT][1537845568] Post recovery done evmd event for: hou249bbodb3112
2010-02-04 16:15:30.532: [    CRSD][1537845568] SM: recoveryDone: 0
2010-02-04 16:15:30.537: [  CRSEVT][1537845568] Processing RecoveryDone
2010-02-04 16:19:52.049: [  OCRUTL][1283971392]u_freem: mem passed is null
2010-02-04 16:19:54.405: [    CRSD][1094658368] SM: rE2Ec: 4
2010-02-04 16:19:54.406: [  CRSRES][1537845568] StopResource: setting CLI values
2010-02-04 16:19:54.869: [    CRSD][1537845568] SM:dE2Ec: all E2E cmds done. 0
"crsd.log" 8463L, 600108C                                                                     














Node1 Linux log:

Feb  4 16:14:46 hou249bbodb3111 snmpd[5985]: Connection from UDP: [10.13.8.110]:1048
Feb  4 16:14:46 hou249bbodb3111 snmpd[5985]: Received SNMP packet(s) from UDP: [10.13.8.110]:1048
Feb  4 16:14:57 hou249bbodb3111 kernel: qla2xxx 0000:0d:00.1: LIP reset occured (f7f7).
Feb  4 16:14:57 hou249bbodb3111 kernel: qla2xxx 0000:0d:00.1: LIP occured (f7f7).
Feb  4 16:14:57 hou249bbodb3111 kernel: qla2xxx 0000:0d:00.0: LIP reset occured (f7f7).
Feb  4 16:14:57 hou249bbodb3111 kernel: qla2xxx 0000:0d:00.0: LIP occured (f7f7).
Feb  4 16:15:06 hou249bbodb3111 openais[5501]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb  4 16:15:06 hou249bbodb3111 openais[5501]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Feb  4 16:15:06 hou249bbodb3111 openais[5501]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Feb  4 16:15:06 hou249bbodb3111 openais[5501]: [TOTEM] entering GATHER state from 2.
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] entering GATHER state from 0.
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] Creating commit token because I am the rep.
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] Saving state aru 16aa35 high seq received 16aa35
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] Storing new sequence id for ring ac
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] entering COMMIT state.
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] entering RECOVERY state.
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] position [0] member 172.16.223.111:
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] previous ring seq 168 rep 172.16.223.111
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] aru 16aa35 high delivered 16aa35 received flag 1
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] Did not need to originate any messages in recovery.
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [TOTEM] Sending initial ORF token
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ] New Configuration:
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ]  r(0) ip(172.16.223.111)
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ] Members Left:
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ]  r(0) ip(172.16.223.112)
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ] Members Joined:
Feb  4 16:15:11 hou249bbodb3111 openais[5501]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  4 16:15:11 hou249bbodb3111 kernel: dlm: closing connection to node 2
Feb  4 16:15:12 hou249bbodb3111 openais[5501]: [CLM  ] New Configuration:
Feb  4 16:15:12 hou249bbodb3111 openais[5501]: [CLM  ]  r(0) ip(172.16.223.111)
Feb  4 16:15:13 hou249bbodb3111 openais[5501]: [CLM  ] Members Left:
Feb  4 16:15:13 hou249bbodb3111 fenced[5520]: hou249bbodb3112priv not a cluster member after 1 sec post_fail_delay
Feb  4 16:15:14 hou249bbodb3111 fenced[5520]: fencing node "hou249bbodb3112priv"
Feb  4 16:15:14 hou249bbodb3111 openais[5501]: [CLM  ] Members Joined:
Feb  4 16:15:15 hou249bbodb3111 openais[5501]: [SYNC ] This node is within the primary component and will provide service.
Feb  4 16:15:15 hou249bbodb3111 openais[5501]: [TOTEM] entering OPERATIONAL state.
Feb  4 16:15:15 hou249bbodb3111 openais[5501]: [CLM  ] got nodejoin message 172.16.223.111
Feb  4 16:15:15 hou249bbodb3111 openais[5501]: [CPG  ] got joinlist message from node 1
Feb  4 16:15:28 hou249bbodb3111 fenced[5520]: fence "hou249bbodb3112priv" success
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Trying to acquire journal lock...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Trying to acquire journal lock...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Trying to acquire journal lock...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Looking at journal...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Looking at journal...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Looking at journal...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Acquiring the transaction lock...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Acquiring the transaction lock...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Acquiring the transaction lock...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Replaying journal...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Replayed 0 of 2 blocks
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: replays = 0, skips = 0, sames = 2
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Replaying journal...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Journal replayed in 1s
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata1.1: jid=0: Done
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Replaying journal...
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Replayed 0 of 1 blocks
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: replays = 0, skips = 1, sames = 0
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Replayed 0 of 38 blocks
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: replays = 0, skips = 12, sames = 26
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Journal replayed in 1s
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata.1: jid=0: Done
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Journal replayed in 1s
Feb  4 16:15:28 hou249bbodb3111 kernel: GFS: fsid=b2bgfs_cluster:gfs-b2bdata2.1: jid=0: Done
Feb  4 16:15:30 hou249bbodb3111 avahi-daemon[8110]: Registering new address record for 10.18.223.117 on bond1.
Feb  4 16:15:30 hou249bbodb3111 avahi-daemon[8110]: Withdrawing address record for 10.18.223.117 on bond1.
Feb  4 16:15:30 hou249bbodb3111 avahi-daemon[8110]: Registering new address record for 10.18.223.117 on bond1.

[ Last edited by tolywang on 2010-2-5 13:59 ]

From "ITPUB Blog", link: http://blog.itpub.net/35489/viewspace-626928/. If you repost this, please credit the source; otherwise legal liability will be pursued.

Reposted from: http://blog.itpub.net/35489/viewspace-626928/
