LGWR (ospid: 14245): terminating the instance due to error 4021

背景:oracle ADG备库(数据库版本:11.2.0.4.0) 凌晨三点半出现宕机
alert日志如下:

Archived Log entry 120411 added for thread 1 sequence 113891 ID 0x26d463ff dest 1:
Thu Aug 29 03:33:31 2024
Errors in file /data/u01/app/oracle/diag/rdbms/mes2rydg/mes2rydg/trace/mes2rydg_lgwr_14245.trc:
ORA-04021: timeout occurred while waiting to lock object 
LGWR (ospid: 14245): terminating the instance due to error 4021
Thu Aug 29 03:33:31 2024
opiodr aborting process unknown ospid (14548) as a result of ORA-1092
Thu Aug 29 03:33:31 2024
ORA-1092 : opitsk aborting process
Thu Aug 29 03:33:31 2024
System state dump requested by (instance=1, osid=14245 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /data/u01/app/oracle/diag/rdbms/mes2rydg/mes2rydg/trace/mes2rydg_diag_14215_20240829033331.trc
Instance terminated by LGWR, pid = 14245      <<<---

查看mes2rydg_lgwr_14245.trc日志:

error 4021 detected in background process
ORA-04021: timeout occurred while waiting to lock object 
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-ksbrdp()+35
07<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__
libc_start_main()+256 
----- End of Abridged Call Stack Trace -----

*** 2024-08-29 03:33:31.428
LGWR (ospid: 14245): terminating the instance due to error 4021
ksuitm: waiting up to [5] seconds before killing DIAG(14215)  <<<---

查看mes2rydg_diag_14215_20240829033331.trc:

 SO: 0x3b025e8b80, type: 4, owner: 0x3b02207740, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3b02207740, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 1198 ser: 1 trans: (nil), creator: 0x3b02207740
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: , short-term DID: 
              txn branch: (nil)
              edition#: 0              oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
      Not in wait; last wait ended 0.000000 sec ago 
    There are 504 sessions blocked by this session.      <<<---
    Dumping one waiter:
      inst: 1, sid: 3539, ser: 16101
      wait event: 'library cache lock'
        p1: 'handle address'=0x3bbfba3470
        p2: 'lock address'=0x3c1ed0e6b8
        p3: '100*mode+namespace'=0x1004a0002
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 808 secs, waiter_cache_ver: 31858     <<<---
    Wait State:
      fixed_waits=0 flags=0x21 boundary=(nil)/-1
    Session Wait History:
        elapsed time of 0.000000 sec since last wait
     0: waited for 'library cache lock'
        handle address=0x3bbfba3470, lock address=0x3c3ff7b5f8, 100*mode+namespace=0x1004a0003
        wait_id=87251747 seq_num=24663 snap_id=1
        wait times: snap=15 min 0 sec, exc=15 min 0 sec, total=15 min 0 sec         <<<---
        wait times: max=15 min 0 sec
        wait counts: calls=301 os=301
        occurred after 0.000093 sec of elapsed time
     1: waited for 'library cache lock'
        handle address=0x3bbfba3470, lock address=0x3c3ff7b5f8, 100*mode+namespace=0x1004a0003
        wait_id=87251746 seq_num=24662 snap_id=1
        wait times: snap=0.308117 sec, exc=0.308117 sec, total=0.308117 sec
        wait times: max=15 min 0 sec
        wait counts: calls=2 os=2
        occurred after 0.000015 sec of elapsed time
     2: waited for 'rdbms ipc message'
        timeout=0x66, =0x0, =0x0
        wait_id=87251745 seq_num=24661 snap_id=1
        wait times: snap=0.039975 sec, exc=0.039975 sec, total=0.039975 sec

参考mos文档:ID 2183882.1

issue matches with bug 11712267 and bug 16717701

Since two bugs are matching with the case,

You can try with option (1) . As per Bug 11712267

change the cursor_sharing to force on Active dataguard (ADG).

Monitor your environment for sometime.

If it crashes again then follow with the option (2)
Option (2):

As per bug description

LGWR can request DBINSTANCE lock in X mode without any timeout which can lead to a hang / deadlock.

Both fixes are already included in 11.2.0.4 but the fix is DISABLED by default.
== > To ENABLE the fix one has to set == > "_adg_parselock_timeout" > to the number of centi-seconds == > LGWR should wait before backing off and retrying the request.

Value should be in centi seconds. == > I Don't think there is really any hard fast rule for a value - at default (0) it will not timeout.
A value representing a few seconds seems reasonable - if LGWR has been stuck for say 5 seconds waiting it seems reasonable guess it is not going to get the lock.

The param just causes it to abort the current attempt and retry If you want to play safe can start with a higher value then decrease later.
A higher value will just mean more sessions blocked for longer in case of the deadlock situation.
500 Seems reasonable , but I have no data to base it on.

There should be a statistic "ADG parselock X get attempts" If it gets set too small that value would likely increase a lot due to keep timing out and retrying.

This is a dynamic parameter

Follow option (1) .

change the cursor_sharing to force on ADG


If issue re-appears then follow option (2) as below

 

Please set "_adg_parselock_timeout" to 500 == >

SQL > alter system set "_adg_parselock_timeout"=500 scope=both sid='*';

Cursor_sharing参数有3个值可以设置:

 1)、EXACT:通常来说,exact值是Oracle推荐的,也是默认的,它要求SQL语句在完全相同时才会重用,否则会被重新执行硬解析操作。

 2)、SIMILAR:similar是在Oracle认为某条SQL语句的谓词条件可能会影响到它的执行计划时,才会被重新分析,否则将重用SQL。

 3)、FORCE:force是在任何情况下,无条件重用SQL。

查询cursor_sharing当前值
show parameter cursor_sharing

SQL> show parameter cursor_sharing
 
NAME             TYPE                   VALUE
----------     ------------       ------------
cursor_sharing   string            EXACT

查询_adg_parselock_timeout值(动态参数,可直接修改):
SELECT ksppinm, ksppstvl, ksppdesc
FROM x k s p p i x , x ksppi x, x ksppix,xksppcv y
WHERE x.indx = y.indx AND ksppinm = ‘_adg_parselock_timeout’;

SQL> SELECT   ksppinm, ksppstvl, ksppdesc  FROM   x$ksppi x, x$ksppcv y  
WHERE   x.indx = y.indx AND  ksppinm = '_adg_parselock_timeout';

KSPPINM                KSPPSTVL         KSPPDESC
---------------       -------------     --------------- 
_adg_parselock_timeout    0               timeout for parselock get on ADG in centiseconds

“ADG 上获取 parselock 的超时时间,单位为百分之一秒。”

在 Oracle 数据库中,parselock 是一种锁机制,用于在解析 SQL 语句时避免竞争条件。它确保在 SQL 语句的解析阶段,数据库的多个进程不会同时修改或访问相同的 SQL 语句,从而防止解析过程中出现冲突或数据不一致的情况。

  • 2
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值