今天本来是休假半天,正在外面办事的时候,突然收到pushmail的告警,oracle数据库中有500多个ora-的报错,心里咯噔了一下,怎么会有那么多的报错!但是又没收到数据库down的pushmail,难道是程序遇到bug了?赶紧打了电话给同事,请他们帮忙看看。
同时也顾不上吃午饭了,匆匆赶到了公司,了解情况。Team Leader已经把该故障升级为一级故障,已经影响到了部分应用模块的正常使用。同事已经在metalink上开了SR,另外也开了一个电话会议,向客户介绍了当前情况。
赶紧先把SR扫了一遍,大致了解了情况,是ora-600[19004]的报错,oracle要求提供explain plan和10053的trace,于是一边登录现网执行语句,查看情况,另一边也介入了电话会议,确认了影响的范围,安抚客户情绪,并表达了针对这个问题的初步诊断分析,以及对策。
在alertlog中,我们看到有大量的报错:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:13 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1789954.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:16 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1507510.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:29 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1884392.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:31 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1372166.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:35 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1388784.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:37 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1388784.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:37 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1884392.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:43 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1105974.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:17 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1482830.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:24 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1773588.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:30 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1527844.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:33 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1527844.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:39 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1773588.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:40 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1388784.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:47 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1589298.trc:
从tracefile中看,语句是类似的,只是带入的变量有所不同。其sql是:
d . valuename AS pccodename ,
a . workingscope AS workingscope ,
h . currencyname AS currencyname ,
a . globalpccode AS globalpccode ,
a . grade AS grade ,
a . fax AS fax ,
b . zmclpattern AS zmclpattern1 ,
a . orgtype AS orgtype ,
a . industrytype AS industry ,
a . scagreementtype AS scagreementtype ,
a . regioncode AS regioncode ,
a . booksetid AS booksetid ,
a . areaid AS areaid ,
a . citycode AS citycode ,
g . languagename AS languagename ,
a . relationtype AS relationtype ,
a . saleschannelid AS saleschannelid ,
a . corporateperson AS corporateperson ,
j . cityname AS cityname ,
a . attr3 AS attr3 ,
f . countryname AS countryname ,
a . isbottom AS isbottom ,
a . paymentmethod AS paymentmethod ,
n . orgname AS relatedorgname ,
a . bgorgid AS bgorgid ,
a . shipmentterm AS shipmentterm ,
a . shipmentmethod AS shipmentmethod ,
'' AS tyresize ,
c . areacode AS areacode ,
a . parentorgid AS parentorgid ,
'' AS selectedcustom ,
j . citycode AS citycode1 ,
a . comments AS comments ,
a . ownerorgid AS ownerorgid ,
a . registerfund AS registerfund ,
a . telephone AS telephone ,
a . orgid AS orgid ,
a . opendate AS opendate ,
o . valuename AS solutiontypename ,
a . orgname AS orgname ,
l . valuename AS statusname ,
a . solutiontype AS solutiontype ,
a . initflag AS initflag ,
a . datelastupdated AS datelastupdated ,
a . createdate AS createdate ,
a . targettype AS targettype ,
a . url AS url ,
a . attr1 AS attr1 ,
d . languagecode AS languagecode ,
a . portcode AS portcode ,
a . preferedcurrency AS preferedcurrency ,
c . areaname AS areaname ,
a . countrycode AS orgcountrycode ,
a . status AS status ,
i . saleschannelname AS saleschannelname ,
a . paymentterm AS paymentterm ,
e . valuename AS corptypename ,
b . orgcode AS parentorgcode ,
a . booksetopendate AS booksetopendate ,
a . relatedorgid AS relatedorgid ,
b . shortname AS parentorgname ,
a . corporationtype AS corporationtype ,
a . orgcode AS orgcode ,
a . shortname AS shortname ,
a . email AS email ,
a . defaultlanguage AS defaultlanguage ,
j . languagecode AS languagecode7 ,
1 AS COUNT
FROM table_abcd d ,
table_abcd o ,
table_112233 c ,
table_aaaa_bbbb_cc h ,
lang g ,
table_abcd l ,
table_xyzzxy b ,
saleschannel i ,
table_xyzzxy a ,
table_abcd_1234_a f ,
city j ,
table_xyzzxy n ,
table_abcd e
WHERE 1 = 1
AND h . languagecode = j . languagecode
AND a . defaultlanguage = g . languagecode
AND d . languagecode = h . languagecode
AND l . TYPE = ' Status '
AND a . status = l . VALUE
AND d . TYPE = ' GlobalPCCode '
AND n . orgid ( + ) = a . relatedorgid
AND d . languagecode = o . languagecode
AND o . TYPE = ' SolutionType '
AND a . solutiontype = o . VALUE
AND a . globalpccode = d . VALUE
AND d . languagecode = f . languagecode
AND i . saleschannelid ( + ) = a . saleschannelid
AND a . preferedcurrency = h . currencycode
AND j . citycode ( + ) = a . citycode
AND c . areaid ( + ) = a . areaid
AND e . TYPE = ' CorporationType '
AND b . orgid ( + ) = a . parentorgid
AND a . countrycode = f . countrycode
AND a . corporationtype = e . VALUE
AND d . languagecode = l . languagecode
AND d . languagecode = e . languagecode
AND d . languagecode = ' cn '
AND a . orgid = 100001419
这个sql无论在做explain plan for还是10053的时候,都会报600的错。
在metalink中查到,这个ora-600的报错是一个oracle的bug,在oracle 9208,10105,10203版本的几乎所有平台都会发生。这个bug已经有patch了:patch 4899105
当然,幸好oracle也提供了临时解决方法:
在故障现场,执行了以下2个语句之后,问题解决:
exec dbms_stats . delete_table_stats ( ownname => ' user_name ' , tabname => ' table_name ' , cascade_indexes => true ) ;
--重新收集不带直方图的统计信息:
exec dbms_stats . gather_table_stats ( ownname => ' user_name ' , tabname => ' table_name ' , cascade => true , method_opt => ' for all columns size 1 ' ) ;
临时的解决故障后,之后的步骤,就是和客户商量时间打上这个重要的patch了。
其实我还是有点纳闷的是,为什么会触发这个bug,难道是统计信息不够精确?但是我的数据库是使用oracle的默认收集统计信息的设置,而且某个月还会手工收集一次,应该不会有很旧的统计信息。另外,如果表的直方图信息有问题,这样的表应该不止一个,为什么直方图的信息会导致600的报错?为什么只是在执行一个sql的时候报错了600,而不是所有信息不准确的且multi join的sql都报错?
原文地址:http://www.oracleblog.org/working-case/suffering-ora-600-19004/