Oracle RAC nodes restart automatically: frequent RAC node restarts with ORA-29702

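This post covers frequent node restarts on Oracle 10g RAC in a Windows environment. The logs show error ORA-29702, which matches a known Oracle bug. The bug affects specific versions, but the published workaround does not apply to Windows. Because RAC on Windows is uncommon, the trouble may also stem from platform instability and limited understanding of Oracle; upgrading is probably the main way out.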

An Oracle 10.2.0.4 RAC for Windows database was suffering frequent node restarts.

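According to the alert log, the restarts on this node usually happen just after the node starts up or while it is shutting down. (The ORA-29702 message text appears as question marks in the log; the English text of this error is "error occurred in Cluster Group Service operation".)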

Thu May 03 17:22:45 2012
cluster interconnect IPC tb version:Oracle 9i Winsock2 TCP/IP IPC
IPC Vendor 0 proto 0
Version 0.0
PMON started with pid=2, OS id=1616
DIAG started with pid=3, OS id=120
PSP0 started with pid=4, OS id=6104
LMON started with pid=5, OS id=3844
LMD0 started with pid=6, OS id=6120
LMS0 started with pid=7, OS id=3548
LMS1 started with pid=8, OS id=5688
LMS2 started with pid=9, OS id=3636
LMS3 started with pid=10, OS id=3588
MMAN started with pid=11, OS id=3168
DBW0 started with pid=12, OS id=3208
DBW1 started with pid=13, OS id=5784
LGWR started with pid=14, OS id=6208
CKPT started with pid=15, OS id=3100
SMON started with pid=16, OS id=5948
RECO started with pid=17, OS id=3748
CJQ0 started with pid=18, OS id=7152
MMON started with pid=19, OS id=4552
MMNL started with pid=20, OS id=6940
Thu May 03 17:22:46 2012
lmon registered with NM - instance id 1 (internal mem no 0)
Thu May 03 17:22:46 2012
Reconfiguration started (old inc 0, new inc 8)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Error: KGXGN aborts the instance (6)
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lmon_3844.trc:
ORA-29702: ???????????
LMON: terminating instance due to error 29702
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_pmon_1616.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_psp0_6104.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_dbw0_3208.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_mman_3168.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_dbw1_5784.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_ckpt_3100.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lgwr_6208.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_reco_3748.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_smon_5948.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lms1_5688.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lms0_3548.trc:
ORA-29702: ???????????
Instance terminated by LMON, pid = 3844
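
When the same failure recurs many times, it can help to pull just the termination events out of the alert log. The following is a minimal sketch, not an Oracle tool: it assumes the alert log uses the English timestamp layout seen above, and the names `find_terminations`, `TIMESTAMP` and `TERMINATION` are made up for this illustration.

```python
import re
from datetime import datetime

# Assumption: timestamps look like "Thu May 03 17:22:51 2012", as in the log above.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
# Matches lines such as "LMON: terminating instance due to error 29702".
TERMINATION = re.compile(r"^(\w+): terminating instance due to error (\d+)$")


def find_terminations(lines):
    """Return (timestamp, process, ora_code) for every termination line."""
    events = []
    current = None  # most recent timestamp line seen so far
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        match = TERMINATION.match(line)
        if match:
            events.append((current, match.group(1), int(match.group(2))))
    return events


# A few lines copied from the alert log above.
sample = r"""Thu May 03 17:22:46 2012
Reconfiguration started (old inc 0, new inc 8)
Error: KGXGN aborts the instance (6)
Thu May 03 17:22:51 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lmon_3844.trc:
ORA-29702: ???????????
LMON: terminating instance due to error 29702
""".splitlines()

for when, process, code in find_terminations(sample):
    print(f"{when:%Y-%m-%d %H:%M:%S} {process} terminated the instance, ORA-{code}")
```

Run against a full alert log, the same function would list every restart together with the background process that triggered it, which makes it easier to confirm that the terminations cluster around startup and shutdown.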

The CSSD log file, meanwhile, contains the following:

[ CSSD]2012-04-29 16:26:07.953 [7112] >TRACE: clssgmReconfigThread: completed for reconfig(13), with status(1)
2012-04-30 09:07:04.718: [ OCROSD]utgdv:11:could not read reg value ocrmirrorconfig_loc os error=操作系统找不到已输入的环境选项。
2012-04-30 09:07:04.718: [ OCROSD]utgdv:11:could not read reg value ocrmirrorconfig_loc os error=操作系统找不到已输入的环境选项。
[ CSSD]2012-04-30 09:07:04.765 >USER: Copyright 2012, Oracle version 10.2.0.4.0
[ CSSD]2012-04-30 09:07:04.765 >USER: CSS daemon log for node crct-oadb, number 1, in cluster crs
[ CSSD]2012-04-30 09:07:04.765 [3780] >TRACE: clssscmain: local-only set to false
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61180))
[ CSSD]2012-04-30 09:07:04.781 [3780] >TRACE: clssnmReadNodeInfo: added node 1 (crct-oadb) to cluster
[ CSSD]2012-04-30 09:07:04.781 [3780] >TRACE: clssnmReadNodeInfo: added node 2 (crct-oapt) to cluster
[ CSSD]2012-04-30 09:07:04.828 [3724] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use
[ CSSD]2012-04-30 09:07:04.828 [3724] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmNMInitialize: misscount set to (60)
[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 30000 ms, reconfig start (misscount) 60000 ms
[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0/\\.\votedsk1)
[ CSSD]2012-04-30 09:07:04.843 [3112] >TRACE: clssnmvDPT: spawned for disk 0 (\\.\votedsk1)
[ CSSD]2012-04-30 09:07:06.843 [3112] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0/\\.\votedsk1)
[ CSSD]2012-04-30 09:07:06.843 [4492] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (\\.\votedsk1) initial sleep interval (1000)ms
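
The CSSD trace lines above follow a regular shape (component tag, timestamp, optional thread id, level, message), so values such as the heartbeat thresholds can be extracted mechanically. This is a rough sketch that assumes only the layout visible in this excerpt; `parse_cssd` and `heartbeat_thresholds` are hypothetical helper names, and other clusterware versions may format their logs differently.

```python
import re

# Layout assumed from the excerpt above:
# [ CSSD]<timestamp> [<thread>] ><LEVEL>: <message>   (the thread id may be absent)
CSSD_LINE = re.compile(
    r"^\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
    r"(?: \[(?P<thread>\d+)\])? >(?P<level>\w+):\s+(?P<msg>.*)$"
)
THRESHOLDS = re.compile(
    r"impending reconfig (\d+) ms, reconfig start \(misscount\) (\d+) ms"
)


def parse_cssd(line):
    """Split one CSSD line into its fields, or return None for other lines."""
    match = CSSD_LINE.match(line.strip())
    return match.groupdict() if match else None


def heartbeat_thresholds(lines):
    """Return (impending_reconfig_ms, misscount_ms) from the first matching line."""
    for line in lines:
        record = parse_cssd(line)
        if record is None:
            continue  # e.g. OCROSD or clsdmt lines
        match = THRESHOLDS.search(record["msg"])
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


# Lines copied from the CSSD log above.
sample = [
    "[ CSSD]2012-04-30 09:07:04.765 >USER: Copyright 2012, Oracle version 10.2.0.4.0",
    "[ clsdmt]Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61180))",
    "[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmNMInitialize: misscount set to (60)",
    "[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmNMInitialize: Network heartbeat "
    "thresholds are: impending reconfig 30000 ms, reconfig start (misscount) 60000 ms",
]

print(heartbeat_thresholds(sample))
```

For this excerpt the thresholds are 30000 ms and 60000 ms, consistent with the `misscount set to (60)` line just before it.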

A search based on this information points to a known bug on 10.2.0.4: 10gR2/11gR1: Instances Abort With ORA-29702 When The Server is rebooted or shut down [ID 752399.1]. The bug affects versions 10.2.0.1 through 10.2.0.4, as well as 11.1.0.6 and 11.1.0.7.
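
The affected-version list can be expressed as a small check, for example when auditing several clusters. This sketch encodes only the versions named in this post; the names `AFFECTED_RANGE`, `AFFECTED_EXACT` and `is_affected` are illustrative, and the note itself remains the authoritative source.

```python
# Versions taken from the paragraph above, not from an official Oracle matrix.
AFFECTED_RANGE = ((10, 2, 0, 1), (10, 2, 0, 4))   # inclusive range
AFFECTED_EXACT = {(11, 1, 0, 6), (11, 1, 0, 7)}


def parse_version(text):
    """Turn '10.2.0.4.0' into (10, 2, 0, 4); only the first four fields are compared."""
    return tuple(int(part) for part in text.split(".")[:4])


def is_affected(version_text):
    """Return True if the version falls in the set described in this post."""
    version = parse_version(version_text)
    low, high = AFFECTED_RANGE
    return low <= version <= high or version in AFFECTED_EXACT


# The CSSD log above reports "Oracle version 10.2.0.4.0".
for candidate in ("10.2.0.4.0", "10.2.0.5", "11.1.0.7", "11.2.0.1"):
    print(candidate, is_affected(candidate))
```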

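Oracle's suggested fix is to replace the K96 link invoked by the operating system's rc scripts with a K19 link. K-prefixed links are stop scripts that run in ascending numeric order, so the change makes the clusterware stop earlier in the shutdown sequence. The system in question runs on Windows, however, so this workaround clearly does not apply. Short of upgrading to a newer version, there is probably no good alternative.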
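
For readers on Unix or Linux, where the workaround does apply, the renaming can be planned before anything is touched. The sketch below only computes the old and new link names and changes nothing on disk. The link name `K96init.crs` and the other entries are assumptions used for illustration, and `plan_relink` is a hypothetical helper; the real link names and directories should be taken from the Oracle note and the system itself.

```python
import re

# SysV-style stop links are named K<two digits><service name>.
K_LINK = re.compile(r"^K(\d{2})(.+)$")


def plan_relink(names, old_seq="96", new_seq="19"):
    """Return (old_name, new_name) pairs for stop links that use old_seq."""
    plan = []
    for name in names:
        match = K_LINK.match(name)
        if match and match.group(1) == old_seq:
            plan.append((name, f"K{new_seq}{match.group(2)}"))
    return plan


# Hypothetical directory listing; start links (S...) are left alone.
listing = ["K20example-service", "K96init.crs", "S96init.crs"]

for old, new in plan_relink(listing):
    print(f"rename {old} -> {new}")
```

Keeping the planning step separate from the actual rename makes it easy to review the change first, which matters for links that control shutdown ordering.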

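Production systems deployed on Windows are rare, and RAC on Windows is rarer still. Most teams that deploy this way are not only unfamiliar with Oracle but also unclear about the difference in stability between Windows and Linux, so the odds of running into problems are naturally much higher.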
