ASMB bug (ORA-04030 kfmditer) causes a database crash

Original post, 2013-12-18 20:23:02
Symptoms:
One instance of the RAC cluster behind an important customer production system went down. The alert log showed:

Fri Jun 21 17:05:52 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc

Fri Jun 21 17:05:55 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31389/jyj1_rbal_11389_i31389.trc
Fri Jun 21 17:06:14 2013
Instance terminated by ASMB, pid = 11391
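When several background processes report ORA-04030 within seconds of each other, as in the excerpt above, it helps to list which process hit the error first and which heap and allocation tag it was using. The Python sketch below does that for an alert-log excerpt. It assumes trace files follow the `<sid>_<process>_<ospid>.trc` naming seen here (a SID that itself contains underscores would break `process_from_trace`), and all function names are hypothetical, not part of any Oracle tool.

```python
import re

# Header line of each alert-log error; captures the trace file path.
ERRORS_IN_FILE = re.compile(r"Errors in file (\S+?\.trc)")
# ORA-04030 line; captures the (heap, tag) pair, e.g. (callheap,kfmditer).
ORA_4030 = re.compile(r"ORA-04030: .*?\((?P<heap>[^,)]*),(?P<tag>[^)]*)\)")


def process_from_trace(path):
    """jyj1_asmb_11391.trc -> 'asmb'. Assumes <sid>_<process>_<ospid>.trc naming."""
    name = path.rsplit("/", 1)[-1]
    parts = name.split("_")
    return parts[1] if len(parts) >= 3 else name


def scan_alert_log(text):
    """Return (process, heap, tag) for each ORA-04030, in the order they appear."""
    hits = []
    current_trace = None
    for line in text.splitlines():
        header = ERRORS_IN_FILE.search(line)
        if header:
            current_trace = header.group(1)
            continue
        err = ORA_4030.search(line)
        if err and current_trace:
            hits.append((process_from_trace(current_trace), err.group("heap"), err.group("tag")))
            current_trace = None  # count each error block once
    return hits
```

On the excerpt above it reports ASMB first with `("callheap", "kfmditer")`, followed by RBAL with an empty heap and tag, which is consistent with ASMB being the process that terminated the instance.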

Checking the ASMB trace file:
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc
Fri Jun 21 17:05:52 2013
Trace dumping is performing id=[cdmp_20130621170552]
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc:
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
ASMB (ospid: 11391): terminating the instance due to error 4030
System state dump is made for local instance
System State dumped to trace file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_diag_11345.trc
Fri Jun 21 17:05:53 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lms0_11363.trc (incident=31301):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31301/jyj1_lms0_11363_i31301.trc
Fri Jun 21 17:05:53 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lmon_11359.trc (incident=31277):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31277/jyj1_lmon_11359_i31277.trc
Fri Jun 21 17:05:53 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lms1_11367.trc (incident=31309):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31309/jyj1_lms1_11367_i31309.trc
Fri Jun 21 17:05:54 2013
ORA-1092 : opitsk aborting process
Fri Jun 21 17:05:54 2013
License high water mark = 327
Fri Jun 21 17:05:55 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31389/jyj1_rbal_11389_i31389.trc
Fri Jun 21 17:06:14 2013
Instance terminated by ASMB, pid

jyj1_asmb_11391_i31397.trc:

Dump file /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /opt/app/ora11gR1db
System name: Linux
Node name: KSJYJ_DB01
Release: 2.6.18-164.el5
Version: #1 SMP Thu Sep 3 04:15:13 EDT 2009
Machine: x86_64
Instance name: jyj1
Redo thread mounted by this instance: 1
Oracle process number: 24
Unix process pid: 11391, image: oracle@KSJYJ_DB01 (ASMB)


*** 2013-06-21 17:05:52.045
*** SESSION ID:(532.1) 2013-06-21 17:05:52.046
*** CLIENT ID:() 2013-06-21 17:05:52.046
*** SERVICE NAME:(SYS$BACKGROUND) 2013-06-21 17:05:52.046
*** MODULE NAME:() 2013-06-21 17:05:52.046
*** ACTION NAME:() 2013-06-21 17:05:52.046
 
Dump continued from file: /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)

========= Dump for incident 31397 (ORA 4030) ========

*** 2013-06-21 17:05:52.046
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
 <- ksfdmp <- dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE
 <- 1774 <- dbkePostKGE_kgsf <- kgesev <- kgesec3 <- kghnospc
 <- kghalf <- kfmdIterInit <- kfkIterInit <- kfnbIostatiterOp <- 110
 <- kfnbRun <- ksbrdp <- opirip <- opidrv <- sou2o
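The short stack reads innermost frame first, with each `<-` meaning "called from", so a small helper makes it easy to see which code path requested the memory. The sketch below assumes, as is common when reading ORA-04030/ORA-04031 stacks, that `kghalf` is the heap allocation routine and `kghnospc` the routine that raises the out-of-space error; under that reading, the caller of `kghalf` is `kfmdIterInit`, matching the `kfmditer` tag in the error message. Unresolved numeric entries such as `1774` and `110` are kept as ordinary frames. The helper names are hypothetical.

```python
def parse_stack(stack_text):
    """Split an Oracle short stack ('a <- b <- c') into frames, innermost first.
    Line breaks and extra spaces copied from the trace are ignored."""
    return [f.strip() for f in stack_text.replace("\n", " ").split("<-") if f.strip()]


def callers(frames, func, depth=3):
    """Return up to `depth` frames above `func` on the stack, its direct caller first."""
    if func not in frames:
        return []
    i = frames.index(func)
    return frames[i + 1 : i + 1 + depth]
```

For the stack above, `callers(parse_stack(stack), "kghalf")` returns `['kfmdIterInit', 'kfkIterInit', 'kfnbIostatiterOp']`.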

Process state
-----------------------

SO: 0x940dd1b98, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x940dd1b98, name=process, file=ksu.h LINE:10286, pg=0
 (process) Oracle pid:24, ser:1, calls cur/top: 0x920f28eb8/0x920f28eb8
 flags: (0x6) SYSTEM
 int error: 0, call error: 0, sess error: 0, txn error 0
 (post info) last post received: 0 0 34
 last post received-location: ksr2.h LINE:594 ID:ksrpublish
 last process to post me: 950dfd540 47 2
 last post sent: 0 0 64
 last post sent-location: kso2.h LINE:316 ID:ksoreq_reply
 last process posted by me: 930e5c948 1 0
 (latch info) wait_event=0 bits=0
 Process Group: DEFAULT, pseudo proc: 0x950e4c060
 O/S info: user: oracle, term: UNKNOWN, ospid: 11391
 OSD pid info: Unix process pid: 11391, image: oracle@KSJYJ_DB01 (ASMB)
Dump of memory from 0x00000009D0DC0A10 to 0x00000009D0DC0C18


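Analysis:
Judging from the error (ORA-04030), I suspected an Oracle bug, because I had previously run into a similar bug in which the ASMB process leaked memory.
So I searched Metalink for the keywords "asmb 04030",
and the very first result matched the customer's problem: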
ASMB process grows raising ora-4030 intermittently (Doc ID 735180.1)
ASMB process grows on memory, eventually leading to ora-4030 errors
which causes DB crash.

The reported error:
ORA-04030: out of process memory when trying to allocate 552 Bytes (callheap,kfmditer)
 
In the ASMB process heapdump we can see most of memory chunks are for 'kfmditer',
example:

 BreakDown
 ~~~~~~~~~
 Type     Count   Sum        Average
 ~~~~     ~~~~~   ~~~        ~~~~~~~
 Free     285684  142841492  500.00
 kfmditer 285685  157698132  552.00   <-- our ASMB heapdump likewise showed that the vast majority of chunks were kfmditer

 Total = 300539624 bytes 293495.73k 286.62MB
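The Average column in this breakdown is simply Sum divided by Count, and the Total line is the sum of the two rows (142841492 + 157698132 = 300539624 bytes, about 286.62 MB). The Python sketch below, with hypothetical function names, parses such a breakdown and reports each type's average chunk size and share of the dump, so the dominant allocation tag in your own ASMB heapdump can be checked the same way. It assumes the whitespace-separated `Type Count Sum Average` layout shown here.

```python
import re

# One data row of a heapdump "BreakDown": Type Count Sum Average [annotation]
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+([\d.]+)")


def parse_breakdown(text):
    """Return {chunk_type: (count, total_bytes)}; header, ~~~ and Total lines are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return rows


def summarize(rows):
    """Return ({type: (average_chunk_bytes, percent_of_total)}, total_bytes)."""
    total = sum(size for _, size in rows.values())
    stats = {t: (size / count, 100.0 * size / total) for t, (count, size) in rows.items()}
    return stats, total


def dominant_type(rows, ignore=("Free",)):
    """Largest chunk type by total bytes, ignoring free chunks."""
    used = {t: size for t, (_, size) in rows.items() if t not in ignore}
    return max(used, key=used.get) if used else None
```

For the note's figures, kfmditer accounts for roughly 52.5% of the dumped heap, and almost all of the remainder is Free chunks.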
 
 This bug affects the 11.1 release line (the fix list below also includes a 10.2 patch set) and is fixed in the following releases:
 
 This issue is fixed in

11.2.0.1 (Base Release)
11.1.0.7.1 (Patch Set Update)
10.2.0.5 (Server Patch Set)
11.1.0.7 Patch 11 on Windows Platforms
11.1.0.7 RAC Recommended Patch Bundle #1
11.1.0.6 Patch 11 on Windows Platforms

If you do not want to upgrade to a new patch set, you can instead apply the one-off Patch 6851110, which also fixes this problem.
You can check if Patch 6851110 is available for your patchset release and
O/S environment.:  Patch 6851110
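The trace header shows the customer running 11.1.0.6.0, which is below every fixed release in the list above. The Python sketch below encodes that list as a minimum fixed release per release line, under the simplifying assumption that any later release in the same line also contains the fix; the Windows-specific patches and the RAC bundle are not modelled, and `has_fix` is a hypothetical helper, not an Oracle utility. Whether Patch 6851110 is actually installed still has to be confirmed from the Oracle inventory, for example with `opatch lsinventory`.

```python
from typing import Optional

# Minimum fixed release per release line, from the fix list in the note.
# Assumption: later releases in the same line also contain the fix.
FIXED_IN = {
    (10, 2): (10, 2, 0, 5),      # 10.2.0.5 Server Patch Set
    (11, 1): (11, 1, 0, 7, 1),   # 11.1.0.7.1 Patch Set Update
    (11, 2): (11, 2, 0, 1),      # 11.2.0.1 base release
}
ONE_OFF_PATCH = 6851110


def parse_version(version: str) -> tuple:
    """'11.1.0.6.0' -> (11, 1, 0, 6, 0)."""
    return tuple(int(part) for part in version.split("."))


def has_fix(version: str, installed_patches=()) -> Optional[bool]:
    """True if the release is at or above its line's fixed release, or if the
    one-off patch is installed; None if the release line is not in the list."""
    if ONE_OFF_PATCH in installed_patches:
        return True
    v = parse_version(version)
    fixed = FIXED_IN.get(v[:2])
    if fixed is None:
        return None
    return v >= fixed  # tuple comparison: (11,1,0,7) < (11,1,0,7,1)
```

With this sketch, `has_fix("11.1.0.6.0")` is `False`, and it becomes `True` once 6851110 is passed in `installed_patches`, which mirrors the choice made below.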

Solution:
We applied Patch 6851110 to the customer's database. After monitoring it for some time, the problem has not recurred.
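Part of the "monitor for a while" step can be automated on Linux by sampling the ASMB process's resident memory from `/proc/<pid>/status` and flagging steady growth. This is a sketch under stated assumptions: VmRSS can also include SGA pages the process has touched, so it is only a rough proxy for private memory (inside the database, the PGA columns of `V$PROCESS` are the more direct measure), and the 10 MB growth threshold in `looks_like_leak` is an arbitrary, hypothetical value. All function names are illustrative.

```python
import re

# "VmRSS:   20480 kB" line of /proc/<pid>/status
VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def rss_kb(status_text):
    """Extract VmRSS (resident set size in kB) from /proc/<pid>/status text."""
    m = VMRSS.search(status_text)
    return int(m.group(1)) if m else None


def read_rss_kb(pid):
    """Read the current VmRSS of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return rss_kb(f.read())


def looks_like_leak(samples_kb, min_growth_kb=10240):
    """Flag steady growth: no sample lower than the previous one, and total
    growth of at least min_growth_kb (hypothetical threshold)."""
    if len(samples_kb) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return non_decreasing and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

For example, call `read_rss_kb(pid)` periodically for the ASMB OS pid (11391 in this incident; it changes whenever the instance restarts) and pass the collected values to `looks_like_leak`.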
