clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg



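On an 11gR2 RAC cluster, node 2 fails to start while node 1 keeps running normally. The private interconnect is reachable, and a check of the network configuration found no problems.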



Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

[grid@rac02 rac02]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager




[grid@rac02 rac02]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                                                   
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                                                   
ora.crf
      1        ONLINE  ONLINE       rac02                                       
ora.crsd
      1        OFFLINE OFFLINE                                                   
ora.cssd
      1        ONLINE  OFFLINE                               STARTING            
ora.cssdmonitor
      1        ONLINE  ONLINE       rac02                                       
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.drivers.acfs
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        OFFLINE OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       rac02                                       
ora.gpnpd
      1        ONLINE  ONLINE       rac02                                       
ora.mdnsd
      1        ONLINE  ONLINE       rac02  
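
The interesting rows in a listing like this are the ones where TARGET is ONLINE but STATE is not. As a rough sketch (the function name `stuck_resources` is made up for illustration, and the parsing assumes the two-line name/instance layout shown above), they can be filtered out with awk:

```shell
#!/bin/sh
# List resources whose TARGET is ONLINE but whose STATE is not ONLINE.
# Reads `crsctl stat res -t -init` output on stdin.
# Assumption: each resource name line (ora.*) is followed by one instance line.
stuck_resources() {
    awk '
        /^ora\./ { name = $1; next }          # resource name line
        name != "" && $1 ~ /^[0-9]+$/ {       # instance line: id TARGET STATE ...
            if ($2 == "ONLINE" && $3 != "ONLINE") print name, $3
            name = ""
        }
    '
}

# Only call crsctl when it is actually available on this host.
if command -v crsctl >/dev/null 2>&1; then
    crsctl stat res -t -init | stuck_resources
fi
```

Applied to the listing above, this should report ora.cluster_interconnect.haip, ora.cssd and ora.ctssd, which matches the picture of CSSD being stuck in STARTING and everything that depends on it staying down.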




[grid@rac02 rac02]$ ps -ef |grep d.bin
root      1615     1  0 10:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdmonitor
root      1640     1  0 10:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdagent
grid      1654     1  0 10:47 ?        00:00:01 /u01/app/11.2.0/grid/bin/ocssd.bin
root     29984     1  0 10:16 ?        00:00:03 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     30116     1  0 10:16 ?        00:00:01 /u01/app/11.2.0/grid/bin/oraagent.bin
grid     30130     1  0 10:16 ?        00:00:00 /u01/app/11.2.0/grid/bin/mdnsd.bin
grid     30142     1  0 10:16 ?        00:00:01 /u01/app/11.2.0/grid/bin/gpnpd.bin
root     30154     1  0 10:16 ?        00:00:00 /u01/app/11.2.0/grid/bin/orarootagent.bin
grid     30156     1  0 10:16 ?        00:00:03 /u01/app/11.2.0/grid/bin/gipcd.bin
root     30176     1  0 10:16 ?        00:00:13 /u01/app/11.2.0/grid/bin/osysmond.bin
root     30266     1  0 10:16 ?        00:00:01 /u01/app/11.2.0/grid/bin/ologgerd -m rac01 -r -d /u01/app/11.2.0/grid/crf/db/rac02



Alert log:

[grid@rac02 rac02]$ tail -50 alertrac02.log
2014-09-01 10:27:01.705
[cssd(30378)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2014-09-01 10:36:53.571
[/u01/app/11.2.0/grid/bin/cssdagent(30364)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-09-01 10:36:53.572
[cssd(30378)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 10:36:53.572
[cssd(30378)]CRS-1603:CSSD on node rac02 shutdown by user.
2014-09-01 10:36:59.095
[ohasd(29984)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac02'.
2014-09-01 10:37:02.378
[ohasd(29984)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-09-01 10:37:10.738
[cssd(1489)]CRS-1713:CSSD daemon is started in clustered mode
2014-09-01 10:37:16.517
[cssd(1489)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2014-09-01 10:37:17.813
[cssd(1489)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2014-09-01 10:47:09.791
[/u01/app/11.2.0/grid/bin/cssdagent(1475)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-09-01 10:47:09.792
[cssd(1489)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 10:47:09.792
[cssd(1489)]CRS-1603:CSSD on node rac02 shutdown by user.
2014-09-01 10:47:15.216
[ohasd(29984)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac02'.
2014-09-01 10:47:18.491
[ohasd(29984)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-09-01 10:47:26.857
[cssd(1654)]CRS-1713:CSSD daemon is started in clustered mode
2014-09-01 10:47:32.667
[cssd(1654)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2014-09-01 10:47:33.976
[cssd(1654)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2014-09-01 10:57:25.901
[/u01/app/11.2.0/grid/bin/cssdagent(1640)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-09-01 10:57:25.902
[cssd(1654)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 10:57:25.902
[cssd(1654)]CRS-1603:CSSD on node rac02 shutdown by user.
2014-09-01 10:57:31.327
[ohasd(29984)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac02'.
2014-09-01 10:57:34.605
[ohasd(29984)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-09-01 10:57:42.969
[cssd(1776)]CRS-1713:CSSD daemon is started in clustered mode
2014-09-01 10:57:48.832
[cssd(1776)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2014-09-01 10:57:50.141
[cssd(1776)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
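
The alert log shows a loop: CSSD starts (CRS-1713), acquires its lease and sees the voting file, and then the start is aborted (CRS-5818) about ten minutes later, after which the cycle repeats. The sketch below pairs each start with the following abort and prints the elapsed time. It matches on the CRS message codes only, so it does not depend on the language of the message text. The function name `cssd_start_cycles` is hypothetical, and the arithmetic assumes that both events fall on the same calendar day.

```shell
#!/bin/sh
# Pair each "CSSD started" event (CRS-1713) with the next aborted start
# (CRS-5818) and print the elapsed seconds. Reads the alert log on stdin.
# Assumption: a timestamp line precedes each message line, as in the
# excerpt above, and start/abort happen on the same day.
cssd_start_cycles() {
    awk '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $2; next }  # timestamp line
        /CRS-1713/ { started = ts }
        /CRS-5818/ && started != "" {
            printf "start %s abort %s elapsed %ds\n", started, ts, secs(ts) - secs(started)
            started = ""
        }
        # Convert HH:MM:SS.mmm to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    '
}

# Run against the alert log in the current directory when it exists.
if [ -r alertrac02.log ]; then
    cssd_start_cycles < alertrac02.log
fi
```

A constant gap between start and abort suggests that CSSD is not crashing on its own but is being stopped by its agent after failing to finish joining the cluster in time.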



ocssd log:

[grid@rac02 rac02]$ tail -50 /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 11:02:35.924: [    CSSD][1087342912]clssscSelect: cookie accept request 0x1c2940c0
2014-09-01 11:02:35.924: [    CSSD][1087342912]clssscevtypSHRCON: getting client with cmproc 0x1c2940c0
2014-09-01 11:02:35.924: [    CSSD][1087342912]clssgmRegisterClient: proc(4/0x1c2940c0), client(1/0x1c20d0c0)
2014-09-01 11:02:35.924: [    CSSD][1087342912]clssgmJoinGrock: global grock CRF- new client 0x1c20d0c0 with con 0x3457, requested num -1, flags 0x4000e00
2014-09-01 11:02:35.924: [    CSSD][1087342912]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x4000e00
2014-09-01 11:02:35.924: [    CSSD][1087342912]clssgmDiscEndpcl: gipcDestroy 0x3457
2014-09-01 11:02:35.925: [    CSSD][1087342912]clssgmDeadProc: proc 0x1c2940c0
2014-09-01 11:02:35.925: [    CSSD][1087342912]clssgmDestroyProc: cleaning up proc(0x1c2940c0) con(0x3428) skgpid  ospid 30176 with 0 clients, refcount 0
2014-09-01 11:02:35.925: [    CSSD][1087342912]clssgmDiscEndpcl: gipcDestroy 0x3428
2014-09-01 11:02:36.765: [    CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:36.888: [    CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431890, LATS 258023324, lastSeqNo 35431889, uniqueness 1408622107, timestamp 1409540520/917853354
2014-09-01 11:02:37.767: [    CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:37.891: [    CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431891, LATS 258024324, lastSeqNo 35431890, uniqueness 1408622107, timestamp 1409540521/917854364
2014-09-01 11:02:38.252: [    CSSD][1109973312]clssnmRcfgMgrThread: Local Join
2014-09-01 11:02:38.252: [    CSSD][1109973312]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2014-09-01 11:02:38.252: [    CSSD][1109973312]clssnmLocalJoinEvent: set curtime (258024684) for my node
2014-09-01 11:02:38.252: [    CSSD][1109973312]clssnmLocalJoinEvent: scanning 32 nodes
2014-09-01 11:02:38.252: [    CSSD][1109973312]clssnmLocalJoinEvent: Node rac01, number 1, is in an existing cluster with disk state 3
2014-09-01 11:02:38.253: [    CSSD][1109973312]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2014-09-01 11:02:38.253: [    CSSD][1108396352]clssnmSendingThread: sending join msg to all nodes
2014-09-01 11:02:38.253: [    CSSD][1108396352]clssnmSendingThread: sent 4 join msgs to all nodes
2014-09-01 11:02:38.769: [    CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:38.894: [    CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431892, LATS 258025324, lastSeqNo 35431891, uniqueness 1408622107, timestamp 1409540522/917855364
2014-09-01 11:02:39.771: [    CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:39.897: [    CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431893, LATS 258026334, lastSeqNo 35431892, uniqueness 1408622107, timestamp 1409540523/917856364
2014-09-01 11:02:40.773: [    CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:40.899: [    CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431894, LATS 258027334, lastSeqNo 35431893, uniqueness 1408622107, timestamp 1409540524/917857364
2014-09-01 11:02:40.932: [    CSSD][1087342912]clssscSelect: cookie accept request 0x1bebc480
2014-09-01 11:02:40.932: [    CSSD][1087342912]clssgmAllocProc: (0x1c2940c0) allocated
2014-09-01 11:02:40.932: [    CSSD][1087342912]clssgmClientConnectMsg: properties of cmProc 0x1c2940c0 - 1,2,3,4,5
2014-09-01 11:02:40.932: [    CSSD][1087342912]clssgmClientConnectMsg: Connect from con(0x34fe) proc(0x1c2940c0) pid(30176) version 11:2:1:4, properties: 1,2,3,4,5
2014-09-01 11:02:40.932: [    CSSD][1087342912]clssgmClientConnectMsg: msg flags 0x0000
2014-09-01 11:02:40.934: [    CSSD][1087342912]clssscSelect: cookie accept request 0x1c2940c0
2014-09-01 11:02:40.934: [    CSSD][1087342912]clssscevtypSHRCON: getting client with cmproc 0x1c2940c0
2014-09-01 11:02:40.934: [    CSSD][1087342912]clssgmRegisterClient: proc(4/0x1c2940c0), client(1/0x1c20d0c0)
2014-09-01 11:02:40.934: [    CSSD][1087342912]clssgmJoinGrock: global grock CRF- new client 0x1c20d0c0 with con 0x352d, requested num -1, flags 0x4000e00
2014-09-01 11:02:40.934: [    CSSD][1087342912]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x4000e00
2014-09-01 11:02:40.934: [    CSSD][1087342912]clssgmDiscEndpcl: gipcDestroy 0x352d
2014-09-01 11:02:40.935: [    CSSD][1087342912]clssgmDeadProc: proc 0x1c2940c0
2014-09-01 11:02:40.935: [    CSSD][1087342912]clssgmDestroyProc: cleaning up proc(0x1c2940c0) con(0x34fe) skgpid  ospid 30176 with 0 clients, refcount 0
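
The key message here is clssnmvDHBValidateNCopy: node 2 can read node 1's disk heartbeat on the voting disk, but receives no network heartbeat from it, so it cannot join the existing cluster. A small sketch for summarizing these lines follows. The function name `dhb_summary` is invented for illustration, and reading a rising `wrtcnt` as "the peer is still writing disk heartbeats" is an interpretation of the log fields rather than something the source states.

```shell
#!/bin/sh
# Summarize "disk HB, but no network HB" messages per peer node:
# how many were logged, and the first and last wrtcnt values seen.
# Reads ocssd.log on stdin.
dhb_summary() {
    awk '
        /clssnmvDHBValidateNCopy:.*has a disk HB, but no network HB/ {
            node = ""; cnt = ""
            for (i = 1; i < NF; i++) {
                if ($i == "node")    { node = $(i + 2); sub(/,$/, "", node) }
                if ($i == "wrtcnt,") { cnt  = $(i + 1); sub(/,$/, "", cnt) }
            }
            if (node == "") next
            if (!(node in n)) first[node] = cnt
            n[node]++
            last[node] = cnt
        }
        END {
            for (k in n)
                printf "%s: %d messages, wrtcnt %s -> %s\n", k, n[k], first[k], last[k]
        }
    '
}

# Path taken from the alert log messages above.
LOG=/u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
if [ -r "$LOG" ]; then
    dhb_summary < "$LOG"
fi
```

If the counter keeps advancing while the messages keep coming, the peer node and the shared storage are probably fine, and the place to look is the private interconnect path between the nodes.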


References:

http://www.itpub.net/thread-1766984-2-1.html

http://www.itpub.net/thread-1768708-1-1.html

http://www.killdb.com/2014/08/12/haip%e5%bc%82%e5%b8%b8%ef%bc%8c%e5%af%bc%e8%87%b4rac%e8%8a%82%e7%82%b9%e6%97%a0%e6%b3%95%e5%90%af%e5%8a%a8%e7%9a%84%e8%a7%a3%e5%86%b3%e6%96%b9%e6%a1%88.html


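Resolution: reseating the heartbeat (interconnect) cable and restarting CRS fixed the problem. This is probably an Oracle bug that needs a patch for a permanent fix; to be investigated further.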
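
For reference, the CRS restart after reseating the cable could look roughly like the sketch below. The exact commands used are not recorded in this note, so treat the sequence as an assumption. The `run` and `restart_crs` helpers are illustrative names, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Restart the clusterware stack on the affected node (as root on rac02).
# Dry run by default: commands are printed. Set RUN=1 to execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

restart_crs() {
    run crsctl stop crs -f    # force-stop whatever part of the stack is up
    run crsctl start crs      # start the stack again
    run crsctl check crs      # confirm that CRS, CSS and EVM respond
}

restart_crs
```

Once the stack is back, `crsctl check crs` should no longer report CRS-4535, CRS-4530 or CRS-4534.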



