今天发生一起ORA-00600事件,记录一下
接到user报告后,首先产看alert
Fri Sep 30 14:40:57 WIT 2022
Errors in file /oracle/app/admin/IDHRISDB/udump/idhrisdb_ora_1833.trc:
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
进一步查看trace file,截取部分内容:
/oracle/app/admin/IDHRISDB/udump/idhrisdb_ora_1833.trc
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
ORACLE_HOME = /oracle/app/product/10.2.0
System name: Linux
Node name: hrisdb_prod
Release: 2.6.32-300.10.1.el5uek
Version: #1 SMP Wed Feb 22 17:37:40 EST 2012
Machine: x86_64
Instance name: IDHRISDB
Redo thread mounted by this instance: 1
Oracle process number: 406
Unix process pid: 1833, image: oracle@hrisdb_prod
*** ACTION NAME:() 2022-09-30 14:40:57.589
*** MODULE NAME:(SQL Developer) 2022-09-30 14:40:57.589
*** SERVICE NAME:(SYS$USERS) 2022-09-30 14:40:57.589
*** SESSION ID:(477.45483) 2022-09-30 14:40:57.589
*** 2022-09-30 14:40:57.589
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
Current SQL statement for this session:
SELECT *
from em_daywk a
WHERE
NOT EXISTS(
SELECT 1
FROM em_monwk b
WHERE A.fact_no = b.fact_no
AND A.pnl_no = b.pnl_no
AND A.card_date like b.card_ym||'%'
AND b.card_ym < '202209'
AND b.move_mk = '0')
AND A.card_date < '20220901'
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ssd_unwind_bp: unhandled instruction at 0x3cd16be instr=f
ksedst()+31 call ksedst1() 000000000 ? 000000001 ?
7FFF276087C0 ? 7FFF27608820 ?
7FFF27608760 ? 000000000 ?
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.5.0 and later [Release: 10.2 and later ]
Information in this document applies to any platform.
Symptoms
When executing a statement using a SELECT getting:
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
The execution plan from the ORA-600 trace file is showing hash join is used.
Call Stack is including:
kcblasm1 kcblasm kxhfFndFreeSlot kxhfNewBuffer qerhjGetNewBuffer ksxb1bqb kxhrPack Cause
The issue was investigated in:
Bug 9800302 - 10.2.0.5 GETTING ORA-00600 [KCBLASM_1] closed as duplicate of:Bug 7612454 - Abstract: DSS:PERF REGRESSIONS IN SERIAL DIRECT READS fixed in 11.2.
As per development team the number of slots available for direct I/Os (limited to 4096) forced the hash-join algorithm to operate on fewer number of slots and resulted in more spills to disk. This caused:
direct path IO to perform worse in 10.2.0.5 than earlier releases with more “direct path read” operations or ORA-600 [kcblasm_1] errors.
Solution
- Upgrade the database to 11.2.
OR - Apply &incFamilyProds=false&flag=search))" target=_blank>Patch 7612454 available on MOS. If a patch is not currently available on top of your database version and/or platform please raise a Service Request to request for it.
Please be sure that your database version qualifies for getting a new patch as per Note 209768.1 and Note 742060.1.
OR - Use the workaround of setting:
“_hash_join_enabled”= false
简单讲旧是10.2.0.5的select使用hash join引起的bug
解决方案:
1.升级到11.2
2.打补丁到7612454
3.disable hash_join
先执行opatch lsinv看看数据库打了哪些补丁,确实发现没有这个补丁
[oracle@hrisdb_prod OPatch]$ pwd
/oracle/app/product/10.2.0/OPatch
[oracle@hrisdb_prod OPatch]$ ./opatch lsinv
Invoking OPatch 10.2.0.4.9
Oracle Interim Patch Installer version 10.2.0.4.9
Copyright (c) 2009, Oracle Corporation. All rights reserved.
Oracle Home : /oracle/app/product/10.2.0
Central Inventory : /oracle/app/oraInventory
from : /etc/oraInst.loc
OPatch version : 10.2.0.4.9
OUI version : 10.2.0.5.0
OUI location : /oracle/app/product/10.2.0/oui
Log file location : /oracle/app/product/10.2.0/cfgtoollogs/opatch/opatch2022-09-30_16-00-22PM.log
Patch history file: /oracle/app/product/10.2.0/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /oracle/app/product/10.2.0/cfgtoollogs/opatch/lsinv/lsinventory2022-09-30_16-00-22PM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 4 10.2.0.5.0
There are 2 products installed in this Oracle Home.
Interim patches (2) :
Patch 12880299 : applied on Mon Jun 03 13:13:36 WIT 2019
Unique Patch ID: 14837224
Created on 10 May 2012, 03:33:54 hrs PST8PDT
Bugs fixed:
12880299
Patch 8350262 : applied on Wed Mar 12 22:50:25 WIT 2014
Unique Patch ID: 13011832
Created on 13 Sep 2010, 14:09:12 hrs GMT
Bugs fixed:
8350262
--------------------------------------------------------------------------------
OPatch succeeded.
看看trace file中记录的sql的执行计划
确实走hash_join,但因为是生产库,无法停机打补丁,先使用disable _hash_join的方式
SQL> alter session set "_hash_join_enabled" = true;
Session altered.
紧接着再次查看执行计划,现在是merge join了
先让用户这样执行,再找时间打补丁
同时执行计划中的用于建立build table的表返回太多的行,可能是触发bug的原因,因为hash join适合于小表join大表,所以跟前端交代要改写sql