Oracle-07445[kgghash]:Oracle BUG导致更新LOB字段时进程被KILL掉

原创 2014年01月07日 19:39:23
Oracle-07445[kgghash]:Oracle BUG导致更新LOB字段时进程被KILL掉

现象:
用户反馈有DB连接被意外KILL掉的情况,排查中发现alert日志中有如下信息:
Mon Apr 22 14:49:58 2013
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x2B1733615000] [PC:0x777EE03, kgghash()+367]
Errors in file /oracle/app/11gR1/diag/rdbms/portal/portal2/trace/portal2_ora_19188.trc (incident=186505):
ORA-07445: 出现异常错误: 核心转储 [kgghash()+367] [SIGSEGV] [ADDR:0x2B1733615000] [PC:0x777EE03] [Address not mapped to object] [] <<<<<<<<<
Incident details in: /oracle/app/11gR1/diag/rdbms/portal/portal2/incident/incdir_186505/portal2_ora_19188_i186505.trc
Mon Apr 22 14:55:08 2013
Thread 2 advanced to log sequence 4042
 Current log# 12 seq# 4042 mem# 0: +portalDG/portal/onlinelog/group_12.292.796513605
Mon Apr 22 14:55:08 2013
SUCCESS: diskgroup ARCHDG was mounted
Mon Apr 22 14:55:14 2013
SUCCESS: diskgroup ARCHDG was dismounted
Mon Apr 22 15:00:01 2013
Process 0x0x2147ed628 appears to be hung while dumping                <===进程被KILL掉
Current time = 1955749344, process death time = 1955689185 interval = 60000
Attempting to kill process 0x0x2147ed628 with OS pid = 19188
OSD kill succeeded for process 0x2147ed628
同时段的alert日志中确实有KILL process信息,并且在process被KILL前,该进程有ORA-07445错误报出,估计是Oracle的BUG。
查看portal2_ora_19188_i186505.trc日志信息:

Dump file /oracle/app/11gR1/diag/rdbms/portal/portal2/incident/incdir_186505/portal2_ora_19188_i186505.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/11gR1/db
System name: Linux
Node name: PORTAL02
Release: 2.6.18-194.el5
Version: #1 SMP Tue Mar 16 21:52:39 EDT 2010
Machine: x86_64
Instance name: portal2
Redo thread mounted by this instance: 2
Oracle process number: 213
Unix process pid: 19188, image: oracle@PORTAL02


*** 2013-04-22 14:49:58.322
*** SESSION ID:(1403.45735) 2013-04-22 14:49:58.322
*** CLIENT ID:() 2013-04-22 14:49:58.322
*** SERVICE NAME:(portal) 2013-04-22 14:49:58.322
*** MODULE NAME:(JDBC Thin Client) 2013-04-22 14:49:58.322
*** ACTION NAME:() 2013-04-22 14:49:58.322
 
Dump continued from file: /oracle/app/11gR1/diag/rdbms/portal/portal2/trace/portal2_ora_19188.trc
ORA-07445: 出现异常错误: 核心转储 [kgghash()+367] [SIGSEGV] [ADDR:0x2B1733615000] [PC:0x777EE03] [Address not mapped to object] []

========= Dump for incident 186505 (ORA 7445 [kgghash()+367]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x2B1733615000] [PC:0x777EE03, kgghash()+367]
Registers:
%rax: 0x000000001605399e %rbx: 0x0000000000000000 %rcx: 0x000000009df422ed
%rdx: 0x00000000895e2309 %rdi: 0x00002b1733615000 %rsi: 0x00000000096d62a0
%rsp: 0x00007fffb42e8a20 %rbp: 0x00007fffb42e8a30 %r8: 0x0000000000000000
 %r9: 0x00000000c55f2e0a %r10: 0x000000000777ee03 %r11: 0x000000002eeb4000
%r12: 0x0000000219382310 %r13: 0x0000000125d120e0 %r14: 0x00002b1733541e50
%r15: 0x0000000160c68fd0 %rip: 0x000000000777ee03 %efl: 0x0000000000010293
 kgghash()+356 (0x777edf8) movzbl 0xa(%rdi),%esi
 kgghash()+360 (0x777edfc) shl $0x10,%esi
 kgghash()+363 (0x777edff) add %esi,%eax
 kgghash()+365 (0x777ee01) jmp 0x777edb7
> kgghash()+367 (0x777ee03) movzbl (%rdi),%esi
 kgghash()+370 (0x777ee06) add %esi,%edx
 kgghash()+372 (0x777ee08) jmp 0x777ed19
 kgghash()+377 (0x777ee0d) add (%rdi),%edx
 kgghash()+379 (0x777ee0f) add 0x4(%rdi),%ecx

*** 2013-04-22 14:49:58.339
----- Current SQL Statement for this session (sql_id=bb8bnr20h93mk) -----
update oa9.sys_process set fd_detail=:1 , fd_descriptor=:2 where fd_id=:3

----- Call Stack Trace -----
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
 <- ssexhd <- restore_rt <- kgghash <- kkocsCreateBindSet <- kkocsInitAdaptiveCtx
 <- apadrv <- opitca <- PGOSF385_kksFullTypeCheck <- rpiswu2 <- kksSetBindType
 <- kksfbc <- opiexe <- kpoal8 <- opiodr <- ttcpip
 <- opitsk <- opiino <- opiodr <- opidrv <- sou2o
 <- opimai_real <- ssthrdmain <- main <- libc_start_main <- start

----------------------------------------
Cursor#35(0x2b1732bb2c28) state=BOUND curiob=0x2b1733541e50
 curflg=4c fl2=0 par=(nil) ses=0x214cd7880
----- Dump Cursor sql_id=bb8bnr20h93mk xsc=0x2b1733541e50 cur=0x2b1732bb2c28 -----
Dump Parent Cursor sql_id=bb8bnr20h93mk phd=0x21b32bee8 plk=0x21a12acb8
 sqltxt(0x21b32bee8)=update oa9.sys_process set fd_detail=:1 , fd_descriptor=:2 where fd_id=:3
hash=d464c012671d8361b5a174b881048e72
 parent=0x1e7cda730 maxchild=01 plk=0x21a12acb8 ppn=n
cursor instantiation=0x2b1733541e50 used=1366613397 exec_id=0 exec=0
 child#0(0x218b3bb40) pcs=0x1a2dff738
 clk=0x21ac20ed8 ci=0x1e737a828 pn=0x21bd41568 ctx=0x125d120e0
 kgsccflg=0 llk[0x2b1733541e58,0x2b1733541e58] idx=0
 xscflg=130672 fl2=11100008 fl3=2222600 fl4=100
----- Bind Byte Code (IN) -----
 Opcode = 14 Bind Skip Long
Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 0x1a3438220 Offsi = 48, Offsi = 32
 Opcode = 2 Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 0x1a3438250 Offsi = 48, Offsi = 64
 Opcode = 1 Unoptimized
Null Offset
----- Bind Info (kkscoacd) -----
 Bind#0
 oacdty=01 mxl=4001(5018) mxlc=00 mal=00 scl=00 pre=00
 oacflg=03 fl2=1000010 frm=01 csi=852 siz=4000 off=0
 kxsbbbfp=2b1733614060 bln=4000 avl=4001 flg=25
 value=" WZ3+"...
 Bind#1
 oacdty=01 mxl=2000(1620) mxlc=00 mal=00 scl=00 pre=00
 oacflg=03 fl2=1000010 frm=01 csi=852 siz=2128 off=0
 kxsbbbfp=2b17336137d0 bln=2000 avl=822 flg=05
 value="<process parent="oa">
<initial-node id="N1" name="开始" nextTargetNodeIds="N2" parent="oaInitNode">
<results>
<result priority="0" target="N2"/>
</results>
</initial-node>
<nodes>
<node id="N2" name="提交" nextTargetNodeIds="N4" parent="oaDraftNode">
<res"...
 Bind#2
 oacdty=01 mxl=128(64) mxlc=00 mal=00 scl=00 pre=00
 oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=2000
 kxsbbbfp=2b1733613fa0 bln=128 avl=32 flg=01
 value="13e3064a34e8924d0e196e044089dff5"
 Frames pfr (nil) siz=0 efr (nil) siz=0
 Cursor frame dump
 enxt: 2.0x00000060 enxt: 1.0x00000898
 pnxt: 1.0x00000060
 kxscphp=0x2b1733606828 siz=1008 inu=176 nps=96
 kxscbhp=0x2b17335a5660 siz=11456 inu=10696 nps=10512

分析:
ORA-07445通常是ORACLE的BUG引起的,在文档ORA-7445 [kgghash] [ID 310172.1]中发现有符合我们现象的BUG:
Bug 13463131 Dump (kgghash) from bind peeking,BUG 13463131的完整内容:
Bind peeking may dump in kgghash() when long character datatypes are
being used.
The stack will typically include the following calls:
   kkocsCreateBindSet->kgghash
<===== 与我们的CASE相符合,表 oa9.sys_process有CLOB字段,且在我们的CASE中是call stack为kgghash <- kkocsCreateBindSet <- kkocsInitAdaptiveCtx

The bind variable that causes the dump shows the avl value is greater than
the bln (in the cursor dump in the trace).

  Bind#7
   oacdty=96 mxl=4001(22518) mxlc=00 mal=00 scl=00 pre=00
   oacflg=01 fl2=1000010 frm=01 csi=873 siz=4000 off=0
   kxsbbbfp=0e42f060  bln=4000  avl=4001  flg=25
 
This situation can happen, for example, using a CLOB column and having
the client bind with CHAR.
<====抽取我们的dump文件相关内容,可以发现bln=4000 avl=4001,符合文档描述。
Bind#0
 oacdty=01 mxl=4001(5018) mxlc=00 mal=00 scl=00 pre=00
 oacflg=03 fl2=1000010 frm=01 csi=852 siz=4000 off=0
 kxsbbbfp=2b1733614060 bln=4000 avl=4001 flg=25 <<<<<<
 value=" WZ3+"...
========================================
Workaround
  Disable bind peeking (set "_optim_peek_user_binds" = false)
  but note that this may cause changes to execution plans so
  may not be practical.
<=====Oracle给出的解决方案是禁用bind peeking,但是带来的风险是可能导致SQL执行计划改变
另外可以升级至11.2.0.3之后再apply patch 13463131解决。
Note: This fix supersedes fixes of bug 7352414 , bug 8763922.



What is "Bind Peeking" ?
 
With 9i, a new feature "Bind Peeking" was introduced to try to get around the issues associated with the selectivity of a bind variable.
During hard parsing of a query using bind variables, the binds are "peeked at" and selectivity is generated based on the bind and the underlying column statistics.
This method is successful but if the bind selectivity of the initial execution of a statement varied from the selectivity of subsequent executions with different sets of binds this could sometimes lead to plans being enerated that were not representative of the general query usage and which performed poorly.

This feather is controlled by the hidden parameter "_OPTIM_PEEK_USER_BINDS". the default value is 'true'.

Peeking is enabled if all of the following conditions are met:
 * - optimizer_features_enabled is greater than 8.2
 * - there are user binds
 * - the cursor is fully bound
 * - the cursor is not remote mapped
 * - the client is using V8OCI
 * - it's a select or DML
 * - peeking is possible for this statement and the statement is being
 * loaded for the first time, or the original cursor was peeked and
 * this cursor is being reloaded

For more details, please refer to:

Document 387394.1 Query using Bind Variables is suddenly slow
Document 430208.1 Bind Peeking By Example
"_OPTIM_PEEK_USER_BINDS" [Hidden] Reference Note [ID 201672.1]


MTK Kernel启动流程源码解析 5 start_kernel 下

http://blog.csdn.net/xichangbao/article/details/52895769 一 start_kernel start_kernel函数是kerne...
  • ffmxnjm
  • ffmxnjm
  • 2017-04-18 17:57:11
  • 280

RAC11.2.0.4+单节点物理standby dataguard-rman恢复方式

1.1、dg名词和概念 (1)、primary数据库(主数据库) primary数据库既可以是单实例数据库,也可以是rac结构。 日志传输服务log transport service:包括LGWR/...
  • offbeatmine
  • offbeatmine
  • 2016-01-18 11:28:25
  • 917

kill掉oracle中进程

当执行oracle中sql语句发现存在锁时,可以查看当前的进程,然后再kill掉不想要的。 SELECT 'exec prc_kill_session(''' || username || ''',...
  • c2406070418
  • c2406070418
  • 2013-12-17 11:43:36
  • 578

oracle查看已经kill掉了session,但没有被kill的process

select * from v$process p where p.ADDR not in (select b.paddr from v$bgprocess b)   and p.ADDR not i...
  • fycghy0803
  • fycghy0803
  • 2013-11-20 16:28:52
  • 813

ALERT.LOG for ASM Shows "WARNING: failed to online diskgroup resource ora.GI.dg (unable to communica

APPLIES TO: OracleDatabase - Enterprise Edition - Version 11.2.0.1 to 12.1.0.1 [Release 11.2 to12.1...
  • wuweilong
  • wuweilong
  • 2015-01-04 22:48:26
  • 2888

RMAN异机恢复步骤及故障处理

一、测试机安装OS+Oracle Software,包括配置oracle用户组和环境变量(略) 二、开始异机恢复 1. 复制源库最新备份集、初始化参数、密码文件到测试机 [oracle@or...
  • aaron8219
  • aaron8219
  • 2014-10-12 21:20:14
  • 2360

记一次app主进程被kill的经历

出现问题最近在为河北电信做定制游戏平台开发的时候,遇到了一个很奇怪的问题,在某些盒子上,从平台打开某些游戏,玩上几分钟,然后该退出游戏,回到公司的游戏平台后,便出现了页面数据丢失的问题。 排查了几天...
  • qwe511455842
  • qwe511455842
  • 2017-03-24 14:05:13
  • 477

Oracle Kill被锁进程

  • 2013年06月07日 13:49
  • 2KB
  • 下载

S5P4418 使用SD卡启动Android系统

S5P4418的启动方式 我的板子是g4418box,默认的从eMMC启动,如果插入了SD卡那么就从SD启动,具体的板子需要看HW设计。 SD卡的layout 在layout之前我们需要知道各个分...
  • sy373466062
  • sy373466062
  • 2016-08-17 14:42:57
  • 4744

使用apt-mirror建立本地debian仓库源

先介绍一下环境: 主机:Win7 虚拟机:VirtualBox + Debian7 由于软件源的体积比较大,所以我又给虚拟机添加了一块50GB的虚拟硬盘(给虚拟机添加虚拟硬盘的方法参...
  • Michaelwubo
  • Michaelwubo
  • 2014-12-10 13:25:51
  • 8068
收藏助手
不良信息举报
您举报文章:Oracle-07445[kgghash]:Oracle BUG导致更新LOB字段时进程被KILL掉
举报原因:
原因补充:

(最多只允许输入30个字)