clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg

Article link: https://blog.csdn.net/hellosunqi/article/details/39003697

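On an 11gR2 RAC cluster, node 2 fails to start while node 1 runs normally. The private interconnect is reachable, and a check of the network configuration found no problems.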

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

[grid@rac02 rac02]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

[grid@rac02 rac02]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME          TARGET  STATE       SERVER                STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
   1       OFFLINE OFFLINE
ora.cluster_interconnect.haip
   1       ONLINE  OFFLINE
ora.crf
   1       ONLINE  ONLINE    rac02
ora.crsd
   1       OFFLINE OFFLINE
ora.cssd
   1       ONLINE  OFFLINE                            STARTING
ora.cssdmonitor
   1       ONLINE  ONLINE    rac02
ora.ctssd
   1       ONLINE  OFFLINE
ora.diskmon
   1       OFFLINE OFFLINE
ora.drivers.acfs
   1       OFFLINE OFFLINE
ora.evmd
   1       OFFLINE OFFLINE
ora.gipcd
   1       ONLINE  ONLINE    rac02
ora.gpnpd
   1       ONLINE  ONLINE    rac02
ora.mdnsd
   1       ONLINE  ONLINE    rac02

[grid@rac02 rac02]$ ps -ef |grep d.bin
root    1615    1  0 10:47 ?       00:00:00 /u01/app/11.2.0/grid/bin/cssdmonitor
root    1640    1  0 10:47 ?       00:00:00 /u01/app/11.2.0/grid/bin/cssdagent
grid    1654    1  0 10:47 ?       00:00:01 /u01/app/11.2.0/grid/bin/ocssd.bin
root    29984    1  0 10:16 ?       00:00:03 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid    30116    1  0 10:16 ?       00:00:01 /u01/app/11.2.0/grid/bin/oraagent.bin
grid    30130    1  0 10:16 ?       00:00:00 /u01/app/11.2.0/grid/bin/mdnsd.bin
grid    30142    1  0 10:16 ?       00:00:01 /u01/app/11.2.0/grid/bin/gpnpd.bin
root    30154    1  0 10:16 ?       00:00:00 /u01/app/11.2.0/grid/bin/orarootagent.bin
grid    30156    1  0 10:16 ?       00:00:03 /u01/app/11.2.0/grid/bin/gipcd.bin
root    30176    1  0 10:16 ?       00:00:13 /u01/app/11.2.0/grid/bin/osysmond.bin
root    30266    1  0 10:16 ?       00:00:01 /u01/app/11.2.0/grid/bin/ologgerd -m rac01 -r -d /u01/app/11.2.0/grid/crf/db/rac02

Alert log:

[grid@rac02 rac02]$ tail -50 alertrac02.log
2014-09-01 10:27:01.705
[cssd(30378)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2014-09-01 10:36:53.571
[/u01/app/11.2.0/grid/bin/cssdagent(30364)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-09-01 10:36:53.572
[cssd(30378)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 10:36:53.572
[cssd(30378)]CRS-1603:CSSD on node rac02 shutdown by user.
2014-09-01 10:36:59.095
[ohasd(29984)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac02'.
2014-09-01 10:37:02.378
[ohasd(29984)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-09-01 10:37:10.738
[cssd(1489)]CRS-1713:CSSD daemon is started in clustered mode
2014-09-01 10:37:16.517
[cssd(1489)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2014-09-01 10:37:17.813
[cssd(1489)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2014-09-01 10:47:09.791
[/u01/app/11.2.0/grid/bin/cssdagent(1475)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-09-01 10:47:09.792
[cssd(1489)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 10:47:09.792
[cssd(1489)]CRS-1603:CSSD on node rac02 shutdown by user.
2014-09-01 10:47:15.216
[ohasd(29984)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac02'.
2014-09-01 10:47:18.491
[ohasd(29984)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-09-01 10:47:26.857
[cssd(1654)]CRS-1713:CSSD daemon is started in clustered mode
2014-09-01 10:47:32.667
[cssd(1654)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2014-09-01 10:47:33.976
[cssd(1654)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2014-09-01 10:57:25.901
[/u01/app/11.2.0/grid/bin/cssdagent(1640)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-09-01 10:57:25.902
[cssd(1654)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 10:57:25.902
[cssd(1654)]CRS-1603:CSSD on node rac02 shutdown by user.
2014-09-01 10:57:31.327
[ohasd(29984)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac02'.
2014-09-01 10:57:34.605
[ohasd(29984)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-09-01 10:57:42.969
[cssd(1776)]CRS-1713:CSSD daemon is started in clustered mode
2014-09-01 10:57:48.832
[cssd(1776)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2014-09-01 10:57:50.141
[cssd(1776)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK1; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
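
The alert log above repeats the same cycle: CSSD starts, acquires its lease, brings the voting file online, and then the start command is aborted with CRS-5818. A minimal Python sketch for measuring how long each cycle lasts is shown below. The helper names `event_times` and `intervals_seconds` are hypothetical, and the sample keeps only the timestamps and message codes from the log above, with the message text omitted.

```python
import re
from datetime import datetime

# Timestamps and message codes copied from the alert log above.
# The message text is omitted on purpose; only the CRS code is matched,
# so the sketch works regardless of the locale of the log.
ALERT_LOG = """\
2014-09-01 10:36:53.571
[/u01/app/11.2.0/grid/bin/cssdagent(30364)]CRS-5818: (message text omitted)
2014-09-01 10:37:10.738
[cssd(1489)]CRS-1713: (message text omitted)
2014-09-01 10:47:09.791
[/u01/app/11.2.0/grid/bin/cssdagent(1475)]CRS-5818: (message text omitted)
2014-09-01 10:57:25.901
[/u01/app/11.2.0/grid/bin/cssdagent(1640)]CRS-5818: (message text omitted)
"""

# In this alert log format, a timestamp sits alone on the line before its message.
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")


def event_times(log_text, code):
    """Return the timestamps of entries whose message line contains `code`."""
    times = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if TS_RE.match(line):
            current = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
        elif current is not None and code in line:
            times.append(current)
            current = None  # one message per timestamp
    return times


def intervals_seconds(times):
    """Return the gaps between consecutive timestamps, in seconds."""
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


aborts = event_times(ALERT_LOG, "CRS-5818")
print(intervals_seconds(aborts))  # prints [616.22, 616.11]
```

On this sample the aborts are a little over ten minutes apart, so each start attempt appears to run for about that long before cssdagent gives up. The source does not state the timeout value itself.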

ocssd log:

[grid@rac02 rac02]$ tail -50 /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log
2014-09-01 11:02:35.924: [ CSSD][1087342912]clssscSelect: cookie accept request 0x1c2940c0
2014-09-01 11:02:35.924: [ CSSD][1087342912]clssscevtypSHRCON: getting client with cmproc 0x1c2940c0
2014-09-01 11:02:35.924: [ CSSD][1087342912]clssgmRegisterClient: proc(4/0x1c2940c0), client(1/0x1c20d0c0)
2014-09-01 11:02:35.924: [ CSSD][1087342912]clssgmJoinGrock: global grock CRF- new client 0x1c20d0c0 with con 0x3457, requested num -1, flags 0x4000e00
2014-09-01 11:02:35.924: [ CSSD][1087342912]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x4000e00
2014-09-01 11:02:35.924: [ CSSD][1087342912]clssgmDiscEndpcl: gipcDestroy 0x3457
2014-09-01 11:02:35.925: [ CSSD][1087342912]clssgmDeadProc: proc 0x1c2940c0
2014-09-01 11:02:35.925: [ CSSD][1087342912]clssgmDestroyProc: cleaning up proc(0x1c2940c0) con(0x3428) skgpid  ospid 30176 with 0 clients, refcount 0
2014-09-01 11:02:35.925: [ CSSD][1087342912]clssgmDiscEndpcl: gipcDestroy 0x3428
2014-09-01 11:02:36.765: [ CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:36.888: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431890, LATS 258023324, lastSeqNo 35431889, uniqueness 1408622107, timestamp 1409540520/917853354
2014-09-01 11:02:37.767: [ CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:37.891: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431891, LATS 258024324, lastSeqNo 35431890, uniqueness 1408622107, timestamp 1409540521/917854364
2014-09-01 11:02:38.252: [ CSSD][1109973312]clssnmRcfgMgrThread: Local Join
2014-09-01 11:02:38.252: [ CSSD][1109973312]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2014-09-01 11:02:38.252: [ CSSD][1109973312]clssnmLocalJoinEvent: set curtime (258024684) for my node
2014-09-01 11:02:38.252: [ CSSD][1109973312]clssnmLocalJoinEvent: scanning 32 nodes
2014-09-01 11:02:38.252: [ CSSD][1109973312]clssnmLocalJoinEvent: Node rac01, number 1, is in an existing cluster with disk state 3
2014-09-01 11:02:38.253: [ CSSD][1109973312]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2014-09-01 11:02:38.253: [ CSSD][1108396352]clssnmSendingThread: sending join msg to all nodes
2014-09-01 11:02:38.253: [ CSSD][1108396352]clssnmSendingThread: sent 4 join msgs to all nodes
2014-09-01 11:02:38.769: [ CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:38.894: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431892, LATS 258025324, lastSeqNo 35431891, uniqueness 1408622107, timestamp 1409540522/917855364
2014-09-01 11:02:39.771: [ CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:39.897: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431893, LATS 258026334, lastSeqNo 35431892, uniqueness 1408622107, timestamp 1409540523/917856364
2014-09-01 11:02:40.773: [ CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:40.899: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431894, LATS 258027334, lastSeqNo 35431893, uniqueness 1408622107, timestamp 1409540524/917857364
2014-09-01 11:02:40.932: [ CSSD][1087342912]clssscSelect: cookie accept request 0x1bebc480
2014-09-01 11:02:40.932: [ CSSD][1087342912]clssgmAllocProc: (0x1c2940c0) allocated
2014-09-01 11:02:40.932: [ CSSD][1087342912]clssgmClientConnectMsg: properties of cmProc 0x1c2940c0 - 1,2,3,4,5
2014-09-01 11:02:40.932: [ CSSD][1087342912]clssgmClientConnectMsg: Connect from con(0x34fe) proc(0x1c2940c0) pid(30176) version 11:2:1:4, properties: 1,2,3,4,5
2014-09-01 11:02:40.932: [ CSSD][1087342912]clssgmClientConnectMsg: msg flags 0x0000
2014-09-01 11:02:40.934: [ CSSD][1087342912]clssscSelect: cookie accept request 0x1c2940c0
2014-09-01 11:02:40.934: [ CSSD][1087342912]clssscevtypSHRCON: getting client with cmproc 0x1c2940c0
2014-09-01 11:02:40.934: [ CSSD][1087342912]clssgmRegisterClient: proc(4/0x1c2940c0), client(1/0x1c20d0c0)
2014-09-01 11:02:40.934: [ CSSD][1087342912]clssgmJoinGrock: global grock CRF- new client 0x1c20d0c0 with con 0x352d, requested num -1, flags 0x4000e00
2014-09-01 11:02:40.934: [ CSSD][1087342912]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x4000e00
2014-09-01 11:02:40.934: [ CSSD][1087342912]clssgmDiscEndpcl: gipcDestroy 0x352d
2014-09-01 11:02:40.935: [ CSSD][1087342912]clssgmDeadProc: proc 0x1c2940c0
2014-09-01 11:02:40.935: [ CSSD][1087342912]clssgmDestroyProc: cleaning up proc(0x1c2940c0) con(0x34fe) skgpid  ospid 30176 with 0 clients, refcount 0
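
The recurring `clssnmvDHBValidateNCopy` message says that node 2 can see node 1's disk heartbeat (DHB) on the voting file but receives no network heartbeat from it. The Python sketch below pulls those messages out of an ocssd.log excerpt and checks whether the `wrtcnt` value keeps increasing. It assumes that `wrtcnt` is the disk heartbeat write counter, as its name suggests; the function name `missing_network_hb` is hypothetical, and the sample lines are copied from the log above.

```python
import re

# Three lines copied from the ocssd.log output above.
OCSSD_LOG = """\
2014-09-01 11:02:36.888: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431890, LATS 258023324, lastSeqNo 35431889, uniqueness 1408622107, timestamp 1409540520/917853354
2014-09-01 11:02:37.767: [ CSSD][1094592832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-09-01 11:02:37.891: [ CSSD][1091438912]clssnmvDHBValidateNCopy: node 1, rac01, has a disk HB, but no network HB, DHB has rcfg 268948430, wrtcnt, 35431891, LATS 258024324, lastSeqNo 35431890, uniqueness 1408622107, timestamp 1409540521/917854364
"""

# Capture the node number, node name, and wrtcnt from each matching message.
DHB_RE = re.compile(
    r"clssnmvDHBValidateNCopy: node (?P<num>\d+), (?P<name>[^,]+), "
    r"has a disk HB, but no network HB, .*?wrtcnt, (?P<wrtcnt>\d+)"
)


def missing_network_hb(log_text):
    """Map (node number, node name) to the wrtcnt values seen for that node."""
    seen = {}
    for line in log_text.splitlines():
        m = DHB_RE.search(line)
        if m:
            key = (int(m["num"]), m["name"])
            seen.setdefault(key, []).append(int(m["wrtcnt"]))
    return seen


for (num, name), counts in missing_network_hb(OCSSD_LOG).items():
    advancing = all(b > a for a, b in zip(counts, counts[1:]))
    print(f"node {num} ({name}): {len(counts)} messages, wrtcnt advancing: {advancing}")
# prints: node 1 (rac01): 2 messages, wrtcnt advancing: True
```

Under that assumption, a steadily advancing `wrtcnt` suggests that node 1 is alive and still writing to the voting file, which would point at the network heartbeat path rather than at shared storage.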