DRM causes a single-node crash on Oracle 11.2.0.4.0 RAC

DRM (Dynamic Resource Mastering) has too many bugs, so the recommendation is to disable it outright.

alert log:

Errors in file /oracle/app/oracle/diag/rdbms/gg/gg1/trace/gg1_lmon_60688126.trc:
ORA-29702: error occurred in Cluster Group Service operation
No connectivity to other instances in the cluster during startup. Hence, LMON is terminating the instance. Please check the LMON trace file for details. Also, please check the network logs of this instance along with clusterwide network health for problems and then re-start this instance.
LMON (ospid: 63112654): terminating the instance
Dumping diagnostic data in directory=[cdmp_20170814161033], requested by (instance=1, osid=63112654 (LMON)), summary=[abnormal instance termination].
Instance terminated by LMON, pid = 63112654
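LMON: the LMON processes of the instances communicate with each other periodically to check the health of every node in the cluster. When a node fails, LMON is responsible for cluster reconfiguration, GRD recovery, and related operations. The service it provides is called CGS (Cluster Group Services). LMON can cooperate with the underlying clusterware or work on its own. When LMON detects an instance-level split-brain, it notifies the underlying clusterware and expects the clusterware to resolve it. RAC does not assume that the clusterware will definitely solve the problem, however, so LMON does not wait indefinitely for the clusterware's result. If the wait times out, LMON automatically triggers IMR (Instance Membership Recovery). IMR can be regarded as the split-brain and I/O fencing mechanism that Oracle provides at the database layer.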


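LMON relies mainly on two heartbeat mechanisms for health checking:
1. The network heartbeat between nodes.
2. The disk heartbeat on the control file. Each node's CKPT process updates one block of the control file every 3 seconds. This activity can be observed through x$kcccp. SQL> select inst_id, cphbt from x$kcccp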
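As an illustration of how the disk heartbeat could be watched from outside the database, the sketch below compares two snapshots of the cphbt values returned by the query above and reports rows whose counter did not advance. The function name `stalled_heartbeats`, the row keys, and the sample values are all hypothetical. How a row is identified in your output is an assumption of this sketch, fetching the rows is left out, and this is not how LMON itself evaluates the heartbeat.

```python
from typing import Dict, List


def stalled_heartbeats(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return the row keys whose cphbt value did not increase between snapshots.

    Both arguments map a row key (hypothetical: whatever identifies a
    heartbeat row in your query output) to its cphbt value. The snapshots
    should be taken more than 3 seconds apart, the CKPT update interval
    mentioned in the text.
    """
    stalled = []
    for key, old_value in sorted(before.items()):
        new_value = after.get(key)
        # A missing row or an unchanged counter both count as "no heartbeat seen".
        if new_value is None or new_value <= old_value:
            stalled.append(key)
    return stalled


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the incident described here.
    first = {"row-1": 1000, "row-2": 2000}
    second = {"row-1": 1000, "row-2": 2002}
    print(stalled_heartbeats(first, second))
```

A counter that stays flat across several samples is only a hint that the heartbeat has stopped; it still needs to be confirmed against the alert log and the LMON trace.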

gg1_lmon_60688126.trc:

2017-08-14 16:07:40.381460 : kjfspseudorcfg: requested with reason 5(DRM Quiesce step stall)

* kjfcln: DRM aborted due to CGS rcfg.
*** 2017-08-14 16:07:44.621
=====================================================
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a50 rcfgtm 5 sec
*** 2017-08-14 16:07:49.605
=====================================================
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a55 rcfgtm 10 sec
*** 2017-08-14 16:07:54.581
=====================================================
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a5a rcfgtm 15 sec
............................................................................
*** 2017-08-14 16:08:59.675
=====================================================
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a9b rcfgtm 80 sec
*** 2017-08-14 16:09:04.694
=====================================================
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915aa0 rcfgtm 85 sec
kjxgmpoll: the CGS reconfiguration has spent 85 seconds.
kjxgmpoll: terminate the CGS reconfig.
Error: Cluster Group Service reconfiguration takes too long
LMON caught an error 29702 in the main loop
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
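The CGS reconfiguration itself was caused by the failed DRM operation: the trace shows the reconfiguration being requested with reason 5 (DRM Quiesce step stall), after which DRM was aborted because of the CGS reconfiguration.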
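The kjxgmpoll lines in the trace can be checked arithmetically: in every line shown, rcfgtm equals cur minus start when both hex values are read as second counters. The sketch below parses such lines and recomputes the elapsed time. Treating start and cur as Unix epoch seconds is an assumption based on that arithmetic, not on Oracle documentation (0x59915a4b decodes to 2017-08-14 08:07:39 UTC, which would line up with the 16:07 trace timestamps if the server runs at UTC+8). The helper name `parse_cgs_polls` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
# kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a50 rcfgtm 5 sec
POLL_RE = re.compile(
    r"kjxgmpoll: CGS state \((\d+) (\d+)\) "
    r"start (0x[0-9a-fA-F]+) cur (0x[0-9a-fA-F]+) rcfgtm (\d+) sec"
)


def parse_cgs_polls(trace_text):
    """Yield (start, cur, reported, computed) for each kjxgmpoll state line."""
    for match in POLL_RE.finditer(trace_text):
        start = int(match.group(3), 16)
        cur = int(match.group(4), 16)
        reported = int(match.group(5))
        yield start, cur, reported, cur - start


# Three of the lines from the trace above.
SAMPLE = """\
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a50 rcfgtm 5 sec
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915a9b rcfgtm 80 sec
kjxgmpoll: CGS state (20 1) start 0x59915a4b cur 0x59915aa0 rcfgtm 85 sec
"""

if __name__ == "__main__":
    for start, cur, reported, computed in parse_cgs_polls(SAMPLE):
        # Assumption: the hex values are Unix epoch seconds.
        began = datetime.fromtimestamp(start, tz=timezone.utc)
        print(f"start={began:%Y-%m-%d %H:%M:%S} UTC "
              f"reported={reported}s computed={computed}s")
```

Run against a full LMON trace, the same parser shows how long a reconfiguration has been stuck. In this incident the counter reached 85 seconds before LMON terminated the CGS reconfiguration and raised ORA-29702.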