Oracle RAC内部错误:ORA-00600[kjbmprlst:shadow]一例

一套Linux x86-64上的11.2.0.1 4节点RAC系统中LMS GCS服务进程遭遇到内部错误ORA-00600[kjbmprlst:shadow],导致节点实例意外终止,具体日志如下:
Fri Jul 08 02:04:43 2011
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc  (incident=1011732):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/PROD/PROD1/incident/incdir_1011732/PROD1_lms1_536_i1011732.trc
Fri Jul 08 02:04:44 2011
Trace dumping is performing id=[cdmp_20110708020444]
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 536): terminating the instance due to error 484
Fri Jul 08 02:04:45 2011
opiodr aborting process unknown ospid (27387) as a result of ORA-1092
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_diag_513.trc
Fri Jul 08 02:04:54 2011
Termination issued to instance processes. Waiting for the processes to exit
Fri Jul 08 02:04:58 2011
ORA-1092 : opitsk aborting process
该ORA-00600[kjbmprlst:shadow]错误定位为11.2.0.1上的Bug 10121589或Bug 9458781:
Bug 10121589  ORA-600 [kjbmprlst:shadow] can occur in RAC
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions BELOW 12.1
    Versions confirmed as being affected

        11.2.0.1

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in

        12.1 (Future Release)
        11.2.0.2 Bundle Patch 2 for Exadata Database
        11.2.0.1 Bundle Patch 7 for Exadata Database

Symptoms:

Related To:

    Internal Error May Occur (ORA-600)
    ORA-600 [kjbmprlst:shadow]

    RAC (Real Application Clusters) / OPS

Description

    An ORA-600 [kjbmprlst:shadow] can occur if the fix for bug 9979039
    is present.

    Note:
     One off patches for 10200390 should also include this fix.

Bug 9458781  Missing close message to master leaves closed lock dangling crashing the instance with assorted Internal error

Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions >= 11.2.0.1 but BELOW 11.2.0.2
    Versions confirmed as being affected

        11.2.0.1

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in

        11.2.0.2 (Server Patch Set)
        11.2.0.1 Bundle Patch 4 for Exadata Database

Symptoms:

Related To:

    Instance May Crash
    Internal Error May Occur (ORA-600)
    ORA-600 [KJBMPRLST:SHADOW]
    ORA-600 [KJBMOCVT:RID]
    ORA-600 [KJBRREF:PKEY]
    ORA-600 [KJBRASR:PKEY]

    RAC (Real Application Clusters) / OPS

Description

    A lock is closed without sending a message to the master.
    This causes closed lock dangling at the master crashing the instance with different internal errors.

    Reported internal errors so far are :
    - KJBMPRLST:SHADOW
    - KJBMOCVT:RID
    - KJBRREF:PKEY
    - KJBRASR:PKEY
该kjbmprlst:shadow内部函数用以管理kjbm shadow锁(/libserver10.a/kjbm.o )信息,存在某个已关闭的lock没有及时message给master node的代码漏洞,目前除了安装补丁外没有已验证的workaround办法(disable drm似乎是无效的):
oradebug lkdebug (track resources, take dumps)
KCL history
KJBL history
KJL history

PCM (GCS) and non-PCM (GES) resources are kept separate and use separate code paths.
GES:
Resource table: kjr and kjrt
Lock table: kjlt
Processes: kjpt
GCS:
Resource table: kjbr
Lock table: kjbl

DLM Structures (continued)
/* PCM resource structure */
typedef struct kjbr {                                /* 68 bytes on sun4u */
  kjsolk       hash_q_kjbr;                             /* hash list : hp */
  ub4          resname_kjbr[2];	                     /* the resource name */
  kjsolk       scan_q_kjbr; /* chain to lmd scan q of grantable resources */
  kjsolk       grant_q_kjbr;                 /* list of granted resources */
  kjsolk       convert_q_kjbr;       /* list of resources being converted */
  ub4          diskscn_bas_kjbr;         /* scn(base) known to be on disk */
  ub2          diskscn_wrap_kjbr;        /* scn(wrap) known to be on disk */
  ub2          writereqscn_wrap_kjbr;    /* scn(wrap) requested for write */
  ub4          writereqscn_bas_kjbr;     /* scn(base) requested for write */
  struct kjbl *sender_kjbr;                 /* lock elected to send block */
  ub2          senderver_kjbr;                  /* version# of above lock */
  ub2          writerver_kjbr;                  /* version# of lock below */
  struct kjbl *writer_kjbr;                /* lock elected to write block */
  ub1          mode_role_kjbr; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1          flags_kjbr;                        /* ignorewip, free etc. */
  ub1          rfpcount_kjbr;                      /* refuse ping counter */
  ub1          history_kjbr;                /* resource operation history */
  kxid         xid_kjbr;                          /* split transaction ID */
} kjbr ;

/* kjbl - PCM lock structure
** Clients and most of the DLM will use the KJUSER* or KJ_* modes and kscns  */

typedef struct kjbl {                                /* 52 bytes on sun4u */
  union {                     /* discriminate lock@master and lock@client */
    struct {                                           /* for lock@master */
      kgglk        state_q_kjbl;             /* link to chain to resource */
      kjbopqi     *rqinfo_kjbl;                             /* target bid */
      struct kjbr *resp_kjbl;                   /* pointer to my resource */
    } kjbllam;                                 /* KJB Lock Lock At Master */
    struct {                                           /* for lock@client */
      ub4         disk_base_kjbl;        /* disk version(base) for replay */
      ub2         disk_wrap_kjbl;        /* disk version(wrap) for replay */
      ub1         master_node_kjbl;                   /* master instance# */
      ub1         client_flag_kjbl;     /* flags specific to client locks */
      ub2         update_seq_kjbl;               /* last update to master */
    } kjbllac;                                 /* KJB Lock Lock At Client */
  } kjblmcd;                        /* KJB Lock Master Client Discrimnant */
  void  *remote_lockp_kjbl;           /* pointer to client lock or shadow */
  ub2    remote_ver_kjbl;                         /* remote lock version# */
  ub2        ver_kjbl;                                     /* my version# */
  ub2        msg_seq_kjbl;                         /* client->master seq# */
  ub2        reqid_kjbl;                         /* requestid for convert */
  ub2        creqid_kjbl; /* requestid for convert that has been cancelled */
  ub2        pi_wrap_kjbl;                     /* scn(wrap) of highest pi */
  ub4        pi_base_kjbl;                     /* scn(base) of highest pi */
  ub1        mode_role_kjbl; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1        state_kjbl;       /* _L|_R|_W|_S, notify, which q, lock type */
  ub1        node_kjbl;                       /* instance lock belongs to */
  ub1        flags_kjbl;                                /* lock flag bits */
  ub2        rreqid_kjbl;                               /* save the reqid */
  ub2         write_wrap_kjbl;        /* last write request version(wrap) */
  ub4         write_base_kjbl;        /* last write request version(base) */
  ub4         history_kjbl;                     /* lock operation history */
} kjbl;

PCM DLM locks that are owned by the local instance are allocated and embedded in an LE structure.
PCM DLM locks that are owned by remote instances and mastered by the local instance are allocated in SHARED_POOL.

PCM Locks and Resources
Fields of interest in the kclle structure: kcllerls or releasing; kcllelnm or name(id1,id2);
kcllemode or held-mode; kclleacq or acquiring; kcllelck or DLM lock.

Fields of interest in the kjbr structure: resname_kjbr[2] or resource name; grant_q_kjbr or grant queue;
convert_q_kjbr or convert queue; mode_role_kjbr, which is a bitwise merge of grant mode and
role-interpreted NULL(0x00), S(0x01), X(0x02), L0 Local (0x00), G0 Global without PI (0x08), G1 Global with PI (0x018).

The field mode_role_kjbl in kjbl is a bitwise merge of grant, request, and lock mode: 0x00 if grant NULL;
0x01 if grant S; 0x02 if grant X; 0x04 lock has been opened at master; 0x08 if global role (otherwise local);
0x10 has one or more PI; 0x20 if request CR; 0x40 if request S; 0x80 if request X.

Someone has to keep a list of all buffers and where they are mastered
This is called Global Resource Directory (GRD)
GRD is present on all the instances of the cluster
To find out the master:
select  b.dbablk, r.kjblmaster master_node
from x$le l, x$kjbl r, x$bh b
where b.obj =
and b.le_addr = l.le_addr
and l.le_kjbl = r.kjbllockp

Oracle Support宣称可以通过11.2.0.2 (Server Patch Set)11.2.0.1 Bundle Patch 4 for Exadata Database修复该bug,但是有迹象表明在11.2.0.2上仍可能发生该ORA-00600[kjbmprlst:shadow]内部错误,同时该bug更多地发生在超过2个节点的RAC系统中。  



本文转自maclean_007 51CTO博客,原文链接:http://blog.51cto.com/maclean/1277849

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
在本地自己的操作系统上,完全模拟生产环境,让学员跟着视频一步一步搭建一套在RHEL7操作系统上面的oracle19c的rac环境。同时学员还会学会DNS服务器,DNS客户端,NTP服务器,NTP客户端等操作系统知识。让学员在短时间内,oracle和操作系统知识,有一定的提升。让学员可以独自轻松安装一套oracle rac环境。一课程主题 模拟生产环境,用多路径共享存储,用虚拟软件安装一套RHEL7.5+oracle19c+rac+打补丁二课程特色 完成模拟生产环境,用openfiler软件模拟生产惠普的3par存储,用2个虚拟网卡模拟数据库服务器的2个HBA卡。课程以实践为主,从头到尾一步一步教学员怎样操作,实践性强模拟生产需求,完全可以把这套环境拿到生产环境安装三课程内容 1.课程结束成果演示    1.1 用workstation,安装一套rhel7+oracle19c+rac+multipath+共享存储+DNS服务器+DNS客户端+NTP服务器+NTP客户端的生产环境    1.2 怎样打oracle19c的rac补丁(包括GI补丁,oracle补丁,数据库补丁,OJVM补丁,bug补丁)2.安装openfiler软件,模拟共享存储+配置多路径访问    2.1安装openfiler软件   2.2配置openfiler软件(配置2个虚拟网卡,模拟服务器的2个HBA卡)   2.3创建ocr磁盘   2.4创建mgmt磁盘   2.5创建数据文件磁盘   2.6创建归档日志磁盘3.安装2台数据库服务器    3.1安装2台数据库服务器RHEL7.5   3.2配置服务器双网卡绑定+配置服务器心跳线4.安装多路径软件识别共享存储中的磁盘     4.1安装服务器本地yum源    4.2安装iscsi软件,配置多路径配置文件,识别共享存储中的磁盘    4.3识别ocr磁盘    4.4识别mgmt磁盘    4.5识别数据文件磁盘    4.6识别归档日志磁盘5.oracle19c的rac环境系统参数官方说明     5.1如何配置oracle19c的rac的系统参数(我们参考官方说明)    5.2oracle19c+rac环境Best Practices 官方说明文档6.安装oracle19c+rac之前的准备工作     6.1修改/etc/hosts文件    6.2配置DNS服务器+DNS客户端+NTP服务器+NTP客户端    6.3创建用户和组    6.4创建目录    6.5修改用户环境变量    6.6安装相关软件包    6.7配置ssh互信    6.9禁用服务器透明大页7.安装oracle+19c+rac软件    7.1安装GI软件   7.2创建ASM磁盘,主要是数据文件磁盘和归档日志磁盘   7.3安装数据库软件   7.4创建数据库实   7.5日常常用维护集群命令(启停数据库,启停集群,查看监听,教同学们怎样不死记命令,而且命令还正确)8.打补丁   8.1打GI和ORACLE的操作系统补丁  8.2打OJVM补丁  8.3打ORA600的bug补丁9.课程总结和成果演示  9.1课程总结和成果演示 四学习必要工具 安装workstation软件  官网下载openfiler,rhel7.5软件下载oracle软件(包括19.3的rac安装包,19.4的补丁)以上软件我都已经在视频里面做了下载地址说明五课程纠错1)rac的私有ip应该是不同的网段,我在视频中设置错误。Ensure all private Ethernet interfaces are set to different subnets on each node. If different subnets are not used and connectivity is lost, this can cause a node reboot within the cluster2)配置好multipath,以及多路径的别名后,还要增加如下配置文件。[root@hellorac1 rules.d]# cat /etc/udev/rules.d/99-persistent.rulesENV==data, OWNER:=grid, GROUP:=asmadmin, MODE:=660上面的配置文件增加成功后,运行如下命令:#udevadm control --reload-rules#/sbin/udevadm trigger --type=devices --action=change执行完成之后,会发现/dev/dm*相应的磁盘权限变成grid.asmadmin. 视频中只运行了udevadm control --reload-rules。3)安装GI部分1和部分2,先看部分2,再看部分1.特此纠正上面的3个错误。谢谢大家的支持和厚爱。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值