Instance Crash & Bug 13846531

一客户决策支持数据库在3月6日、9日节点1的实例Crash。

2.   故障分析

3月6日下午17点27分,发现决策支持数据库节点1 ASM实例后台报ORA-04031错误,该错误表示共享内存不足,检查ASM实例时发现已HANG,该节点已被集群踢出,分析是在跑批处理加载的时候ASM实例shared pool碎片太多导致ORA-04031错误,该故障现象与Bug 13846531 : ASM INSTANCE CRASHED WITH ORA-4031 ON EXADATA类似,需安装相应的patch或设置几个参数绕过。

3月9日下午14点28分,该数据库节点1再次出现Crash,日志如下:

Sat Mar 09 14:28:35 2013
Errors in file/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/trace/bjscj1dw1_lms1_8230.trc  (incident=384220):
ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey],[32767], [1123700], [0], [0], [], [], [], [], [], [], []
Incident details in:/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/incident/incdir_384220/bjscj1dw1_lms1_8230_i384220.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packagingdetails.
Sat Mar 09 14:28:38 2013
Dumping diagnostic data in directory=[cdmp_20130309142838],requested by (instance=1, osid=8230 (LMS1)), summary=[incident=384220].
Sat Mar 09 14:28:38 2013
Sweep [inc][384220]: completed
Sweep [inc2][384220]: completed
Errors in file/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/trace/bjscj1dw1_lms1_8230.trc:
ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey],[32767], [1123700], [0], [0], [], [], [], [], [], [], []
LMS1 (ospid: 8230): terminating the instance due to error 484
System state dump requested by (instance=1, osid=8230 (LMS1)),summary=[abnormal instance termination].
System State dumped to trace file/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/trace/bjscj1dw1_diag_8214.trc
Sat Mar 09 14:28:50 2013
Instance terminated by LMS1, pid = 8230
Sat Mar 09 14:28:56 2013
Starting ORACLE instance(normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0


从日志中可以看出,数据库在Crash前出现ORA-00600 [kjbldrmrpst:pkey] 错误,此时实例被LMS1进程终止,6秒后该实例被集群自动启动。从错误信息来看,数据库很可能遭遇Bug 13397104 : INSTANCE CRASH WITH ORA-600 [KJBLPKEYDRMQSCCHK:PKEY],该BUG与11g Dynamic Remastering新引入的Read-mostly locking功能有关,为了提高性能,该功能默认是开启的,若频繁引起实例重启,建议关闭该功能,绕开这个Bug。

关于ORA-04031错误,需在asm spfile增加两个隐含参数,如下所示:

Thefollowing work around has helped in reported incident. (The ideal solution isto fix the application code to use bind variables instead of literals in thecalls it make)

1.Set SGA_TARGET=2G  (optional)

2._enable_shared_pool_durations= false

(Thiswill allow unpinned SQLA, KGLH0, and KGLHD memory to be freed to make room fornew permanent allocations)

3._library_cache_advice = false

  (To eliminate the allocations from kglsim,this parameter setting will work:)

Pleaseset all the above three change in the ASM SPFILE restart clusterware( We needto restart ASM instance).

关于ORA-00600 [kjbldrmrpst:pkey],若频繁出现实例重启的现象,建议关闭Dynamic Remastering的Read-mostly locking功能,设置下列隐含参数可以关闭。详细请参考Note 13397104.1。

_gc_read_mostly_locking=FALSE

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值