RAC PMON Crash: A Troubleshooting Case


cjh2002519


Node 2 of our RAC crashed, and the following entries were found in the alert log:

ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] [Address not mapped to object] [0x00008239D] [] []

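This error can be caused by a data dictionary bug triggered by PL/SQL Developer. However, that problem no longer exists from version 5 onward, and we are running PL/SQL Developer 8, so this cause does not apply.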

(k2g table)

error 602 detected in background process

ORA-00602: internal programming exception

ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] [Address not mapped to object] [0x00008239D] [] []

This problem really can bring the instance down, and the bug was only fixed in 11g.
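To check whether an alert-log entry matches a known bug, it helps to split the ORA-07445 line into its bracketed arguments. The following is a minimal Python sketch. It assumes the single-line format shown above, and the names parse_ora07445 and Ora07445 are illustrative, not an Oracle tool.

```python
import re
from typing import NamedTuple, Optional


class Ora07445(NamedTuple):
    """Bracketed arguments of one ORA-07445 line, in printed order."""
    function: str  # failing function, e.g. "kssdch()"
    offset: int    # number printed after "+", e.g. 2188 (-1 if not numeric)
    signal: str    # e.g. "SIGSEGV"
    reason: str    # e.g. "Address not mapped to object"
    address: str   # faulting address as printed, e.g. "0x00008239D"


_HEADER = re.compile(r"\s*ORA-0?7445\b")  # accepts both ORA-07445 and ORA-7445
_ARGS = re.compile(r"\[([^\]]*)\]")       # contents of each [...] group


def parse_ora07445(line: str) -> Optional[Ora07445]:
    """Return the parsed arguments, or None if the line is not a complete ORA-07445."""
    if not _HEADER.match(line):
        return None
    args = _ARGS.findall(line)
    if len(args) < 4:
        return None
    func, _, off = args[0].partition("+")
    return Ora07445(func, int(off) if off.isdigit() else -1, args[1], args[2], args[3])


if __name__ == "__main__":
    sample = ("ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] "
              "[Address not mapped to object] [0x00008239D] [] []")
    print(parse_ora07445(sample))
```

Our log and the bug report quoted below give the same function and offset, kssdch()+2188, but different addresses. For that reason the faulting address is usually left out when comparing error signatures.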

SIGSEGV
Typically, the signals seen are SIGBUS (signal 10, bus error) and SIGSEGV (signal 11, segmentation violation). There are other UNIX signals and exceptions that may happen, however, they are likely caused by OS problems rather than an Oracle problem. Examples of other signals are: SIGINT, SIGKILL, SIGSYS. A complete list is available in Note:1038055.6.

Error explanation

SIGSEGV

Segmentation violation. This signal can also result from an illegal pointer reference or an array bound error.
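The sketch below shows what "an illegal pointer reference" looks like from the operating system's side. It is POSIX-only Python and assumes CPython's ctypes module. It deliberately reads address 0 in a child process, and the child is killed with SIGSEGV, much like the Oracle background process was. The helper name crash_signal is illustrative.

```python
import signal
import subprocess
import sys

# Child program: ctypes.string_at(0) dereferences a NULL pointer (address 0),
# which is not mapped, so the kernel sends SIGSEGV to the child process.
_CHILD = "import ctypes; ctypes.string_at(0)"


def crash_signal() -> int:
    """Run the child and return the signal number that killed it (POSIX only)."""
    proc = subprocess.run([sys.executable, "-c", _CHILD], capture_output=True)
    # subprocess reports "killed by signal N" as a return code of -N on POSIX.
    if proc.returncode >= 0:
        raise RuntimeError(f"child was not killed by a signal (rc={proc.returncode})")
    return -proc.returncode


if __name__ == "__main__":
    sig = crash_signal()
    print(sig, signal.Signals(sig).name)
```

Only the child dies here; the parent keeps running. In our case the process that hit the error was PMON, and PMON terminated the whole instance with error 602.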

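So this still looks like a software (Oracle) error rather than an OS problem. A forum post mentioned flushing the shared pool (flush shared_pool) as a way to resolve it.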

Below is a bug report; I have picked out the key parts:

When attempting to cleanup after a SQL*Net connection is terminated, the following error occurs:

ORA-07445: exception encountered: core dump [kssdct()+94] [SIGSEGV] [Address not mapped to object] [0x00000240E] [] []

and then the instance is terminated, due to PMON reporting the below errors:

ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] [Address not mapped to object] [0x00000241E] [] []


  Oracle 10.2.0.5 on Linux x86-64.
  8 node RAC database
  Intermittent instance failures on one node. So far, two failures.

Judging by this sequence of events, it looks quite similar to our crash.

:
  ORA-602: internal programming exception
  ORA-7445: exception encountered: core dump [kssdch()+2188] [SIGSEGV]
  [Address not mapped to object] [0x0000708FB] [] []
  Thu Nov 18 01:02:04 GMT 2010
  PMON: terminating instance due to error 602

This problem is related to unpublished bug 9184754.

I was unable to look up its details.

==================

In any case, the problem has been fixed. Oracle's recommendation is:

Download and apply the one-off patch number Patch:9184754 on top of your version/platform combination, if available.

We compared the call stacks and they matched exactly. Always confirm that the call stack matches; if it does not, don't jump to conclusions.

Call stack : kssdct()
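The call-stack check can be partly mechanised. The Python sketch below assumes each stack has already been extracted as a list of frames written as name()+offset, top frame first. That input format is a hypothetical simplification, since real trace files are laid out differently. The sketch compares the top few function names. frame_names and stacks_match are illustrative names.

```python
import re
from typing import Iterable, List

# Captures the leading identifier of a frame, so "kssdct()+94" -> "kssdct".
_FRAME_NAME = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_$]*)")


def frame_names(frames: Iterable[str]) -> List[str]:
    """Reduce raw frames such as 'kssdct()+94' to bare function names."""
    names = []
    for frame in frames:
        m = _FRAME_NAME.match(frame)
        if m:
            names.append(m.group(1))
    return names


def stacks_match(ours: Iterable[str], reported: Iterable[str], depth: int = 5) -> bool:
    """True if the top `depth` function names of both stacks are identical.

    Offsets are ignored on purpose: they can differ between binary builds even
    when the code path is the same. An empty stack never counts as a match.
    """
    a = frame_names(ours)[:depth]
    b = frame_names(reported)[:depth]
    return bool(a) and a == b
```

For example, stacks_match(["kssdct()+94", "kssdch()+2188"], ["kssdct()+94", "kssdch()+2188"]) is True. A stricter check, closer to the advice above, compares the raw frame strings including offsets. That only makes sense when both systems run the same Oracle version.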

After applying the patch, the problem was resolved.

For details, see Doc ID 1281101.1.

Below are some other materials I looked up myself; consider them study notes.

==============================

Disable RAC

3. Change the working directory to $ORACLE_HOME/lib:

cd $ORACLE_HOME/lib

4. Run the following make command to relink the Oracle binaries without the RAC option:

make -f ins_rdbms.mk rac_off

make -f ins_rdbms.mk ioracle

==========================

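RAC can fail for three reasons. First, a node leaves the cluster normally. Second, a node's heartbeat dies (the heartbeat is recorded in the control file). Third, communication with a node is interrupted.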

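By default, RAC uses UDP for interconnect communication. UDP is lighter weight than TCP: it needs no three-way handshake and does no acknowledgement or retransmission, and packet loss on the private interconnect is rare.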

Causes of communication failure

a message is not received for a timeout period, then a “communication failure” is assumed. This is more relevant for UDP, as Reliable Shared Memory (RSM), Reliable DataGram protocol (RDG), and Hyper Messaging Protocol (HMP) do not need it, since the acknowledgment mechanisms are built into the cluster communication and protocol itself
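The timeout rule in this quote can be illustrated with plain UDP sockets. The Python sketch below is not Oracle's interconnect code. It only shows that, because UDP carries no acknowledgement, the receiver learns that a message was lost solely through a receive timeout. receive_or_fail and demo are illustrative names, and the 0.3 s timeout is arbitrary.

```python
import socket
from typing import List, Optional


def receive_or_fail(sock: socket.socket, timeout: float) -> Optional[bytes]:
    """Wait up to `timeout` seconds for one datagram.

    Returns the payload, or None if nothing arrives in time. At that point a
    UDP-based peer has to assume a communication failure.
    """
    sock.settimeout(timeout)
    try:
        data, _addr = sock.recvfrom(65535)
        return data
    except socket.timeout:
        return None


def demo(timeout: float = 0.3) -> List[str]:
    """Send one message over loopback UDP, then wait for a second one that never comes."""
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # let the OS pick a free port
        tx.sendto(b"block", rx.getsockname())
        for _ in range(2):
            msg = receive_or_fail(rx, timeout)
            results.append("received" if msg is not None else "communication failure")
    return results


if __name__ == "__main__":
    print(demo())  # ['received', 'communication failure']
```

With protocols that acknowledge at the transport level, such as RDG or HMP in the quote, the sender already knows about the loss, so this kind of receiver-side timeout is not needed.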

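UDP is an unreliable protocol. If packet loss occurs, it can be worked around by bypassing UDP, for example by setting _reliable_block_sends=TRUE. This may make the traffic go over TCP instead, but I don't know for sure yet.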

(user-mode IPC protocols such as RDG (on HP Tru64 UNIX TruCluster) or HP HMP are used,)

_lgwr_async_broadcasts = true: this parameter controls whether asynchronous broadcasts are allowed.

In 9i, every COMMIT required all nodes to write redo.

From the ITPUB blog, link: http://blog.itpub.net/21818314/viewspace-693195/. Please credit the source when reposting; otherwise legal liability will be pursued.

Reposted from: http://blog.itpub.net/21818314/viewspace-693195/
