ORA-600(17069)错误的解决过程

在一个报表数据库后台发现了这个错误,详细的错误信息为:

Fri Feb 20 08:16:44 2009
Errors in file /u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc:
ORA-00600: internal error code, arguments: [17069], [0x6A5DEE1E0], [], [], [], [], [], []
Fri Feb 20 08:16:47 2009
Errors in file /u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc:
ORA-00600: internal error code, arguments: [17069], [0x6A5DEE1E0], [], [], [], [], [], []

进一步检查对应的trace文件:

bash-2.03$ more /u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc
/u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 – Production
ORACLE_HOME = /data/oracle/product/920
System name:    SunOS
Node name:      newreport
Release:        5.8
Version:        Generic_117350-26
Machine:        sun4u
Instance name: repdb01
Redo thread mounted by this instance: 1
Oracle process number: 35
Unix process pid: 5099, image: oracle@newreport (J015)
*** SESSION ID:(12.28191) 2009-02-20 08:16:44.060
*** 2009-02-20 08:16:44.060
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17069], [0x6A5DEE1E0], [], [], [], [], [], []
Current SQL statement for this session:
DECLARE job BINARY_INTEGER := :job; next_date DATE := :mydate;  broken BOOLEAN := FALSE; BEGIN P_GENERATE_REPDATA('FR20T000002000000
0000032'); :mydate := next_date; IF broken THEN :b := 1; ELSE :b := 0; END IF; END;
----- Call Stack Trace -----
calling                 call     entry                   argument values in hex     
location                type     point                   (? means dubious value)    
--------------------  -------- -------------------- ----------------------------
ksedmp()+328            CALL     ksedst()+0             FFFFFFFF7FFF6430 ?
                                                             000000000 ? 000000000 ?
                                                             00000003E ?
                                                            FFFFFFFF7FFF6CC8 ?
                                                             1031D56C8 ?
kgeriv()+208            PTR_CALL                      0000000000000000        000000000 ? 000103400 ?
                                                            0001035D9 ? 000102C00 ?
                                                             1035D9000 ? 1035D9C28 ?

从trace文件包含的进程名称j015来看,导致问题是一个JOB。从trace文件中包含的错误语句则更加证实了这一点。由于这个JOB在数据库中已经运行了很长时间,一直没有出现过错误。现在运行报错,肯定是由于其他的外部原因导致JOB运行的异常。

大部分的ORA-600错误在Metalink上都有详细的描述,于是查询了Metalink:文档Doc ID: 39616.1中汇总了ORA-600(17069)错误的已知Bug,不过这些Bug的描述都与当前问题的现象并不太相符。不过文档中还是包含了一些有价值的信息。文档中描述ORA-600(17069)错误的第二个参数代表Library Cache Object Handle,这里的值为0x6A5DEE1E0。

看起来问题可能和LATCH有关,但是根据错误信息所显示的地址,在V$LATCH和V$LATCH_CHILDREN视图中都没有找到有价值的信息。

从错误信息这个方向入手找不到什么有价值的信息了,现在只能回过头来检查发生错误的JOB。根据JOB的特性,在运行失败后这个JOB会自动再次执行,检查JOB运行时的V$LOCK信息:

SQL> SELECT ADDR, TYPE, ID1, ID2, LMODE, REQUEST, BLOCK
  2  FROM V$LOCK  WHERE SID = 75;
ADDR               TY        ID1        ID2      LMODE    REQUEST      BLOCK
---------------- -- ---------- ---------- ---------- ---------- ----------
0000000690342780            CU -1.703E+09            6            6            0            0
00000006903426F8            JQ            0           63            6            0            0

在V$LOCK中没有什么特别的信息,接着检查V$SESSION_WAIT,看看这个JOB在等待什么:

SQL> SELECT EVENT, P1TEXT, P1RAW, P2TEXT, P2RAW, STATE
  2  FROM V$SESSION_WAIT  WHERE SID = 75;
EVENT               P1TEXT            P1RAW              P2TEXT       P2RAW              STATE
----------------- --------------- ---------------- ------------ ---------------- -------
library cache pin handle address 00000006A5DEE1E0 pin address 00000006B1A971A8 WAITING

很明显,查询结果中P1RAW的值就是ORA-600(17069)错误的第二个参数,配合等待事件信息基本上可以确定问题就是出现在LIBRARY CACHE PIN的过程中。再次查看Metalink信息,Oracle指出这个错误的原因多半是:运行时间很长的PROCEDURE在执行过程中,所依赖的对象被编译或者删除了。

检查出错JOB所调用过程的状态:

SQL> SELECT OWNER, OBJECT_NAME, OBJECT_TYPE, STATUS
  2   FROM DBA_OBJECTS
  3   WHERE OWNER = 'FUJIANREP'
  4   AND OBJECT_NAME = 'P_GENERATE_REPDATA';
OWNER                              OBJECT_NAME                       OBJECT_TYPE        STATUS
------------------------------ ------------------------------ ------------------ -------
FUJIANREP                          P_GENERATE_REPDATA               PROCEDURE            INVALID

果然,出错过程的状态是不正常的。在修正错误前,首先将JOB置于BROKEN状态以避免JOB再次运行:

SQL> EXEC DBMS_JOB.BROKEN(63, TRUE)
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.

在操作系统级杀掉JOB对应的PROCESS:

SQL> SELECT SPID FROM V$PROCESS WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = 75);
SPID
------------
14927
SQL> HOST kill -9 14927

现在JOB调用已经被终止,可以手工重新编译过程了:

SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;
ALTER PROCEDURE P_GENERATE_REPDATA COMPILE
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object FUJIANREP.P_GENERATE_REPDATA

编译报错,错误信息指出没有获得编译对象所需的锁,而导致超时错误的发生。

由于从V$LOCK和V$LATCH视图中都无法获得有意义的信息,只能检查是否有其他人当前在访问P_GENERATE_REPDATA所依赖的对象:

SQL> SELECT * FROM V$ACCESS
  2   WHERE (OWNER, OBJECT) IN
  3  (SELECT REFERENCED_OWNER, REFERENCED_NAME FROM DBA_DEPENDENCIES
  4   WHERE OWNER = 'FUJIANREP' AND NAME = 'P_GENERATE_REPDATA');
         SID OWNER                              OBJECT                             TYPE
---------- ------------------------------ ------------------------------ ------------
          54 FUJIANREP                          CAT_BUYER                          SYNONYM
          54 FUJIANREP                          CAT_CATEGORY                      SYNONYM
          54 FUJIANREP                          CAT_DOSEAGE_FORM                 SYNONYM
          54                                FUJIANREP                          CAT_DRUG                          SYNONYM
          54                                FUJIANREP                          CAT_ENTERPRISE                   SYNONYM
          54                                FUJIANREP                          CAT_METRIC                        SYNONYM
          54                                FUJIANREP                          CAT_ORG                           SYNONYM
          54                                FUJIANREP                          CAT_PRODUCT                       SYNONYM
          54                                FUJIANREP                          CAT_QUALITY_DEFINE               SYNONYM
          54                                FUJIANREP                          GOV_CAT_BUYER                    TABLE
          54                                FUJIANREP                          GOV_CAT_ENTERPRISE               TABLE
          54                                FUJIANREP                          GOV_S_MO_BU                       TABLE
          54                                FUJIANREP                          GOV_S_MO_BU_EN                   TABLE
          54                                FUJIANREP                          GOV_S_MO_BU_PR                   TABLE
          54                                FUJIANREP                          GOV_S_MO_EN                       TABLE
          54                                FUJIANREP                          GOV_S_MO_ME                       TABLE
          54                                FUJIANREP                          GOV_S_MO_ME_CA                   TABLE
          54                                FUJIANREP                          GOV_S_MO_ME_PR                   TABLE
          54                                FUJIANREP                          GOV_S_MO_ORDER                   TABLE
          54                                FUJIANREP                          GOV_S_YE_ORDER                   TABLE
          54                                FUJIANREP                          GRP_HOSPITAL                      TABLE
          54                                FUJIANREP                          GRP_LEVEL                         TABLE
          54                                FUJIANREP                          ORD_ORDER                         TABLE
          54                                FUJIANREP                          ORD_ORDER_ITEM                   TABLE
          54                                FUJIANREP                          ORD_ORDER_ITEM_REP               CURSOR
          54                                FUJIANREP                          ORD_ORDER_RECEIVE                TABLE
          54                                FUJIANREP                          ORD_ORDER_RECEIVE_REP          SYNONYM
          54                                FUJIANREP                          ORD_ORDER_REP                    CURSOR
          54                                FUJIANREP                          ORD_ORDER_RETURN                 TABLE
          54                                FUJIANREP                          ORD_ORDER_RETURN_REP           CURSOR
          54                                FUJIANREP                          PLT_PLAT                          CURSOR
          54                                FUJIANREP                          USER_TAB_PARTITIONS            CURSOR
          54                                NDMAIN                             CAT_BUYER                         TABLE
          54                                NDMAIN                             CAT_CATEGORY                      TABLE
          54                                NDMAIN                             CAT_DOSEAGE_FORM                 TABLE
          54                                NDMAIN                             CAT_DRUG                          TABLE
          54                                NDMAIN                             CAT_ENTERPRISE                   TABLE
          54                                NDMAIN                             CAT_METRIC                        TABLE
          54                                NDMAIN                             CAT_ORG                           TABLE
          54                                NDMAIN                             CAT_PRODUCT                       TABLE
          54                                NDMAIN                             CAT_QUALITY_DEFINE              TABLE
          54                                NDMAIN                             ORD_ORDER                         VIEW
          54                                NDMAIN                             ORD_ORDER_ITEM                   VIEW
          54                                NDMAIN                             ORD_ORDER_RECEIVE                VIEW
          54                                NDMAIN                             ORD_ORDER_RETURN                 VIEW
          54                                NDMAIN                             PLT_PLAT                          TABLE
          54                                PUBLIC                             USER_TAB_PARTITIONS            SYNONYM
          54                                SYS                                 STANDARD                          PACKAGE
         145 SYS                                 STANDARD                           PACKAGE
          54                                SYS                                 SYS_STUB_FOR_PURITY_ANALYSIS        PACKAGE
          54                                SYS                                 USER_TAB_PARTITIONS            VIEW
51 rows selected.

过程依赖的对象果然被其他人所访问,检查这个会话的信息:

SQL> SELECT SID, SERIAL#, USERNAME, PROGRAM, TERMINAL

  2  FROM V$SESSION WHERE SID = 54;

       SID    SERIAL#                                USERNAME                            PROGRAM        TERMINAL

---------- ---------- ------------------------------ ------------ ----------

        54      26216                                FUJIANREP                          PlSqlDev.exe LIBY

没想到是同事通过pldevelop连接的会话,看看这个会话在做什么:

SQL> SELECT SQL_TEXT FROM V$SQL

  2  WHERE ADDRESS IN (SELECT SQL_ADDRESS FROM V$SESSION WHERE SID = 54);

SQL_TEXT---------------------------------------------------------------------

ALTER TABLE GOV_S_MO_EN TRUNCATE PARTITION P200901

居然是TRUNCATE分区的操作,难怪会导致过程处于INVALID状态,不过TRUNCATE操作应该不会持续很长时间,而导致问题产生的语句理论上应该已经运行很久了:

SQL> SELECT EVENT, P1TEXT, P1, P2TEXT, P2, P3TEXT, P3, SECONDS_IN_WAIT
  2  FROM V$SESSION_WAIT WHERE SID = 54;
EVENT                         P1TEXT    P1            P2TEXT              P2          P3TEXT         P3   SECONDS_IN_WAIT
------------------------- ------- ---- -------- -------- -------- ---- ---------------
db file sequential read   file#      1            block#         170158          blocks         1         3995643

这个TRUNCATE的等待时间已经超过10天了,很显然这是一个僵死的会话。应从后台Kill掉对应的进程:

SQL> SELECT SPID FROM V$PROCESS WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = 54);
SPID
------------
12974
SQL> HOST kill -9 12974

切换为FUJIANREP用户,再次编译过程:

SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;
Procedure altered.

至此问题解决。将JOB重新设置BROKEN即可。

SQL> EXEC DBMS_JOB.BROKEN(63, FALSE)
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.

问题解决后再次检查过程,发现TRUNCATE语句居然就是这个过程的一部分。这个过程会先执行TRUNCATE,然后执行插入等DML语句。

所有问题都搞清楚了:问题出在手工执行出错的过程中,可能由于网络的原因导致客户端与数据库端失去联系,数据库中的会话变成僵死状态停在了TRUNCATE TABLE语句处。而执行者只是中止了客户端的请求,并没有意识到后台进程的问题。

等到JOB的运行时间一到,尝试再次运行相同的过程时,发现过程处于运行状态,且对一些表持有LOCK和LATCH,于是引发了上面的ORA-600(17079)错误。这也是随后手工编译PROCEDURE报错ORA-4021的原因。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值