Oracle 11g RAC: a node fails to start with ORA-29702, problem and analysis

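Today I started a RAC cluster on virtual machines and found that one node would not come up no matter what I tried, while the other node was fine.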

SQL> startup nomount

ORA-29702: error occurred in Cluster Group Service operation

I tried crs_stat to check the state of the CRS components, but that failed as well.

-bash-4.1$ crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.

The end of the alert log shows that the instance was terminated because of error 29702.

SMON started with pid=20, OS id=12344

Sun May 11 04:10:28 2014

RECO started with pid=21, OS id=12346

Sun May 11 04:10:28 2014

MMON started with pid=22, OS id=12348

Sun May 11 04:10:28 2014

MMNL started with pid=23, OS id=12350

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

starting up 1 shared server(s) ...

USER (ospid: 12242): terminating the instance due to error 29702

Instance terminated by USER, pid = 12242

Oracle's own explanation of this error is as follows.

-bash-4.1$ oerr ora 29702

29702, 00000, "error occurred in Cluster Group Service operation"

// *Cause: An unexpected error occurred while performing a CGS operation.

// *Action: Verify that the LMON process is still active.

//          Check the Oracle LMON trace files for errors.

//          Also, check the related CSS trace file for errors.

The LMON trace file contains the following:

Trace file /u04/app/11.2.0/db/diag/rdbms/racdb/RACDB1/trace/RACDB1_lmon_12324.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u04/app/11.2.0/db/product/11.2.0/dbhome_1

System name:    Linux

Node name:      rac1

Release:        2.6.32-71.el6.x86_64

Version:        #1 SMP Wed Sep 1 01:33:01 EDT 2010

Machine:        x86_64

VM name:        VMWare Version: 6

Instance name: RACDB1

Redo thread mounted by this instance: 0 Oracle process number: 11

Unix process pid: 12324, image: oracle@rac1 (LMON)

*** 2014-05-11 04:10:27.777

*** SESSION ID:(130.1) 2014-05-11 04:10:27.777

*** CLIENT ID:() 2014-05-11 04:10:27.777

*** SERVICE NAME:() 2014-05-11 04:10:27.777

*** MODULE NAME:() 2014-05-11 04:10:27.777

*** ACTION NAME:() 2014-05-11 04:10:27.777

GES resources 5720 pool 3

GES enqueues 8361

GES IPC: Receivers 2  Senders 2

GES IPC: Buffers  Receive 1000  Send (i:1030 b:471) Reserve 301

GES IPC: Msg Size  Regular 1176  Batch 8376

Batching factor: enqueue replay 206, ack 229

Batching factor: cache replay 128 size per lock 64

*** 2014-05-11 04:10:28.644

kjxggin: CGS tickets = 1000

kgxgncin: CLSS init failed with status 3

kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS

kjxgmin: kgxgncin fails - (2)

kjxggin: generic group layer init fails

*** 2014-05-11 04:10:28.655

Global Enqueue Service Shutdown
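
The useful lines in this trace are the last few, where the group service layer fails to initialise. A minimal sketch for pulling such lines out of a long trace file is shown below. The function name lmon_cgs_failures is hypothetical, and the patterns are simply the message prefixes seen in the excerpt above, not an official list of Oracle messages.

```shell
# Hypothetical helper: print trace lines that look like a failed
# cluster group service (CGS/CLSS) initialisation.
# Patterns are taken from the LMON trace in this post only.
# With no file arguments it reads the trace from standard input.
lmon_cgs_failures() {
    grep -E 'kgxgncin|kjxgmin|kjxggin:.*fails' "$@"
}
```

It could be run as `lmon_cgs_failures /u04/app/11.2.0/db/diag/rdbms/racdb/RACDB1/trace/RACDB1_lmon_12324.trc`. An empty result only means none of these particular patterns occurred.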

On this node, neither crs_stat nor crsctl got anywhere.

-bash-4.1$ crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

-bash-4.1$ crs_start -all

CRS-0184: Cannot communicate with the CRS daemon.
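
When this check has to be repeated many times, it can help to reduce the output to a single exit status. The sketch below is only an illustration: crs_check_ok is a hypothetical name, and it treats the two failure phrases seen in the output above as the only signs of trouble, which is an assumption rather than a complete list.

```shell
# Hypothetical helper: read `crsctl check crs` output from stdin and
# return 0 only if there is some output and none of the failure
# phrases seen in this post appear in it.
crs_check_ok() {
    output=$(cat)
    # No output at all is treated as "not healthy".
    [ -n "$output" ] || return 1
    if printf '%s\n' "$output" | grep -Eq 'Cannot communicate|Communications failure'; then
        return 1
    fi
    return 0
}
```

A possible use is `crsctl check crs | crs_check_ok && echo "stack looks healthy"`.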

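Checking the processes shows that ohasd, its agents, and the gpnpd, mdnsd and gipcd daemons are running, although crsd.bin, ocssd.bin and evmd.bin are missing from the list, which matches the crsctl errors above.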

-bash-4.1$ ps -ef|grep d.bin

root      2103     1  0 May10 ?        00:00:51 /u04/app/11.2.0/grid/bin/ohasd.bin reboot

grid      2297     1  0 May10 ?        00:00:32 /u04/app/11.2.0/grid/bin/oraagent.bin

grid      2309     1  0 May10 ?        00:00:01 /u04/app/11.2.0/grid/bin/mdnsd.bin

grid      2320     1  0 May10 ?        00:00:36 /u04/app/11.2.0/grid/bin/gpnpd.bin

root      2330     1  0 May10 ?        00:00:14 /u04/app/11.2.0/grid/bin/orarootagent.bin

grid      2333     1  0 May10 ?        00:02:39 /u04/app/11.2.0/grid/bin/gipcd.bin

root      2348     1  1 May10 ?        00:12:00 /u04/app/11.2.0/grid/bin/osysmond.bin

root      2569     1  0 May10 ?        00:03:55 /u04/app/11.2.0/grid/bin/ologgerd -M -d /u04/app/11.2.0/grid/crf/db/rac1

grid     12569  9580  0 04:25 pts/1    00:00:00 grep d.bin

I then tried to stop CRS as the root user, but that reported errors too.

[root@rac1 bin]# ./crsctl disable crs

CRS-4621: Oracle High Availability Services autostart is disabled.

[root@rac1 bin]# ./crsctl stop crs

CRS-2796: The command may not proceed when Cluster Ready Services is not running

CRS-4687: Shutdown command has completed with errors.

CRS-4000: Command Stop failed, or completed with errors.

Trying to start it again also failed.

[root@rac1 bin]# ./crsctl enable crs

CRS-4622: Oracle High Availability Services autostart is enabled.

[root@rac1 bin]# ./crsctl start crs

CRS-4640: Oracle High Availability Services is already active

CRS-4000: Command Start failed, or completed with errors.

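Finally I found a workaround on MOS: manually kill the leftover CRS processes. In a production environment, of course, the proper fix is still to apply the PSU.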

[root@rac1 bin]# ps -fea | grep ohasd.bin | grep -v grep

root      2103     1  0 May10 ?        00:00:52 /u04/app/11.2.0/grid/bin/ohasd.bin reboot

[root@rac1 bin]# ps -fea | grep gipcd.bin | grep -v grep

grid      2333     1  0 May10 ?        00:02:41 /u04/app/11.2.0/grid/bin/gipcd.bin

[root@rac1 bin]# ps -fea | grep mdnsd.bin | grep -v grep

grid      2309     1  0 May10 ?        00:00:01 /u04/app/11.2.0/grid/bin/mdnsd.bin

[root@rac1 bin]# ps -fea | grep gpnpd.bin | grep -v grep

grid      2320     1  0 May10 ?        00:00:37 /u04/app/11.2.0/grid/bin/gpnpd.bin

[root@rac1 bin]# ps -fea | grep evmd.bin | grep -v grep

[root@rac1 bin]# ps -fea | grep crsd.bin | grep -v grep

[root@rac1 bin]# kill -9 2103 2333  2309 2320
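
The six separate ps commands above can be collapsed into one filter. This is a minimal sketch, assuming the Linux `ps -ef` column layout shown in this post, where the PID is field 2 and the command path is field 8. The function name gi_daemon_pids is hypothetical. It only prints PIDs and leaves the decision to kill anything to the operator.

```shell
# Hypothetical helper: read `ps -ef` output from stdin and print the
# PIDs of the daemons checked above (ohasd, gipcd, mdnsd, gpnpd,
# evmd, crsd). Assumes PID is field 2 and the binary path is field 8,
# as in the listings in this post.
gi_daemon_pids() {
    awk '$8 ~ /\/(ohasd|gipcd|mdnsd|gpnpd|evmd|crsd)\.bin$/ { print $2 }'
}
```

Running `ps -ef | gi_daemon_pids` should list the same PIDs that were collected by hand. Review the list before passing it to kill.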

Then I tried to start CRS again.

[root@rac1 bin]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@rac1 bin]# ./crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.
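
Right after the restart the CRS daemon is still unreachable, so some waiting is needed. Instead of re-running the check by hand, a small retry loop can poll until a command succeeds. This is a generic sketch: wait_until is a hypothetical name, and it assumes the command you pass in signals readiness through its exit status.

```shell
# Hypothetical helper: wait_until TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between
# attempts. Returns 0 as soon as COMMAND succeeds, 1 otherwise.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        # Do not sleep after the final failed attempt.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

For example, `wait_until 30 10 my_crs_check` would try a readiness check of your own (my_crs_check is a placeholder) every 10 seconds for up to 30 attempts.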

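The stack is slow to come up, so after waiting a little I started the database by hand. This time the startup went through without any problem.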

-bash-4.1$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sun May 11 04:41:03 2014

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount

ORACLE instance started.

Total System Global Area  638853120 bytes

Fixed Size                  2231072 bytes

Variable Size             482346208 bytes

Database Buffers          146800640 bytes

Redo Buffers                7475200 bytes

SQL> alter database mount;

Database altered.

SQL> alter database open;

Database altered.

SQL>

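Checking the CRS status again, everything that should be up is up. I created a small table and tested it from both nodes without problems. The details of the workaround are in MOS Doc ID 1233580.1.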

-bash-4.1$ crs_stat -t

Name           Type           Target    State     Host

------------------------------------------------------------

ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1

ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac2

ora.asm        ora.asm.type   OFFLINE   OFFLINE

ora.cvu        ora.cvu.type   OFFLINE   OFFLINE

ora.gsd        ora.gsd.type   OFFLINE   OFFLINE

ora....network ora....rk.type ONLINE    ONLINE    rac1

ora.oc4j       ora.oc4j.type  OFFLINE   OFFLINE

ora.ons        ora.ons.type   ONLINE    ONLINE    rac1

ora....SM1.asm application    OFFLINE   OFFLINE

ora....C1.lsnr application    ONLINE    ONLINE    rac1

ora.rac1.gsd   application    OFFLINE   OFFLINE

ora.rac1.ons   application    ONLINE    ONLINE    rac1

ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1

ora....SM2.asm application    OFFLINE   OFFLINE

ora....C2.lsnr application    ONLINE    ONLINE    rac2

ora.rac2.gsd   application    OFFLINE   OFFLINE

ora.rac2.ons   application    ONLINE    ONLINE    rac2

ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac2

ora.racdb.db   ora....se.type ONLINE    ONLINE    rac2

ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac2
