Both of these errors are caused by the same bug.
The database environment is 11.2.0.2 RAC for Solaris SPARC. The error messages are as follows:
2012-01-29 06:15:10.168000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc (incident=384590):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
Incident details in: /app/diag/rdbms/orcl/orcl1/incident/incdir_384590/orcl1_lms3_81_i384590.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-01-29 06:15:11.923000 +08:00
Dumping diagnostic data in directory=[cdmp_20120129061511], requested by (instance=1, sid=81 (LMS3)), summary=[incident=384590].
Sweep [inc][384590]: completed
Sweep [inc2][384590]: completed
2012-01-29 06:15:17.289000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due to error 484
2012-01-29 06:15:20.910000 +08:00
ORA-1092 : opitsk aborting process
2012-01-29 06:15:22.384000 +08:00
.
.
.
2012-04-17 04:26:44.373000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc (incident=432578):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /app/diag/rdbms/orcl/orcl1/incident/incdir_432578/orcl1_lms1_8678_i432578.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-04-17 04:26:45.864000 +08:00
Dumping diagnostic data in directory=[cdmp_20120417042645], requested by (instance=1, sid=8678 (LMS1)), summary=[incident=432578].
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
2012-04-17 04:26:47.359000 +08:00
Sweep [inc][432578]: completed
Sweep [inc2][432578]: completed
2012-04-17 04:26:53.095000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due to error 484
2012-04-17 04:26:56.593000 +08:00
ORA-1092 : opitsk aborting process
2012-04-17 04:26:58.088000 +08:00
Instance terminated by LMS1, pid = 8678
As the alert log shows, both the kjbrref:pkey error and the kjbmprlst:shadow error directly caused the instance to crash, so both are very serious problems. Both also occurred in an LMSn process.
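To check the same pattern across a longer alert log, a small script can count ORA-00600 errors by their first argument and list which process terminated the instance. This is only a minimal sketch: the function name `summarize_alert_log` and both regular expressions are illustrative assumptions based on the line formats shown above, not part of any Oracle tool.

```python
import re
from collections import Counter

# First bracketed argument of an ORA-00600 line, e.g. [kjbrref:pkey].
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")
# Lines such as "LMS3 (ospid: 81): terminating the instance due to error 484".
TERMINATE_RE = re.compile(
    r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)"
)


def summarize_alert_log(text):
    """Count ORA-00600 first arguments and collect instance terminations."""
    ora600_counts = Counter()
    terminations = []
    for line in text.splitlines():
        line = line.strip()
        m = ORA600_RE.search(line)
        if m:
            ora600_counts[m.group(1)] += 1
            continue
        m = TERMINATE_RE.match(line)
        if m:
            # (process name, OS pid, error number)
            terminations.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return ora600_counts, terminations


# A few lines copied from the alert log excerpt above.
sample = """\
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due to error 484
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due to error 484
"""

counts, terminations = summarize_alert_log(sample)
print(dict(counts))
print(terminations)
```

In practice the text would come from reading the real alert log file instead of the inline sample.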
*** 2012-01-29 06:15:10.194
*** SESSION ID:(1009.1) 2012-01-29 06:15:10.194
*** CLIENT ID:() 2012-01-29 06:15:10.194
*** SERVICE NAME:(SYS$BACKGROUND) 2012-01-29 06:15:10.194
*** MODULE NAME:() 2012-01-29 06:15:10.194
*** ACTION NAME:() 2012-01-29 06:15:10.194
Dump continued from file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
========= Dump for incident 384590 (ORA 600 [kjbrref:pkey]) ========
----- Beginning of Customized Incident Dump(s) -----
GCS RESOURCE 0xb92d0cfa0 hashq [0xbb35eddc8,0xc0f9b1f60] name[0x511ed.ca] pkey 136931.0
grant 0xb94a7e8f8 cvt 0x0 send 0x0@1,0 write 0x0,0@65536
flag 0x2 mdrole 0x1 mode 1 scan 0.0 role LOCAL
disk: 0x0000.00000000 write: 0x0000.00000000 cnt 0x0 hist 0x0
xid 0x0000.000.00000000 sid 3 pkwait 0s rmacks 0
refpcnt 0 weak: 0x0000.00000000
pkey 136931.0
hv 91 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 12, dom 0]
kjga st 0x4, step 0.35.0, cinc 18, rmno 6345, flags 0x20
lb 16384, hb 32767, myb 16957, drmb 16957, apifrz 1
GCS SHADOW 0xb94a7e8f8,626 resp[0xb92d0cfa0,0x511ed.ca] pkey 136931.0
grant 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
master 1 owner 2 sid 3 remote[0x68fde3ef0,11] hist 0x10c30086180431f
history 0x1f.0x6.0x1.0xc.0x6.0x1.0xc.0x6.0x1.0x0.
cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
disk: 0x0000.00000000 write request: 0x0000.00000000
pi scn: 0x0000.00000000 sq[0xb92d0cfd0,0xb92d0cfd0]
msgseq 0x1 updseq 0x0 reqids[11,0,0] infop 0x0 lockseq x67d9
GCS SHADOW END
GCS RESOURCE END
----- End of Customized Incident Dump(s) -----
*** 2012-01-29 06:15:10.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst1()+96 CALL skdstdst() FFFFFFFF7FFF4C00 ?
100670460 ? 000000000 ?
00000000A ? 000000001 ?
10BD552E0 ?
ksedst()+60 CALL ksedst1() 000000000 ? 000000001 ?
00010C1D1 ? 00010C000 ?
10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032 CALL ksedst() 000000000 ? 10B21A000 ?
10B21AA90 ? 10C1D2000 ?
00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800 PTR_CALL dbkedDefDump() 000000003 ? 000000002 ?
10A6ABAA8 ? 0000014B0 ?
10C1C9000 ? 000000003 ?
dbgexExplicitEndInc CALL dbgexPhaseII() 10C373D30 ?
()+728 FFFFFFFF7A634920 ?
FFFFFFFF7FFF8FDC ?
0018E0001 ? 10A6A2D98 ?
000001C00 ?
dbgeEndDDEInvocatio CALL dbgexExplicitEndInc 10A6A2C50 ?
nImpl()+704 () FFFFFFFF7A634920 ?
FFFFFFFF7FFF8F28 ?
FFFFFFFF7FFFC620 ?
000000000 ?
FFFFFFFFFE4E26A0 ?
kjbrref()+1496 CALL dbgeEndDDEInvocatio 10C373D30 ? 001B1D800 ?
n() FFFFFFFFFEC0AF31 ?
FFFFFFFF7FFFC620 ?
000002868 ? 0018E0001 ?
kjblreplay()+7380 CALL kjbrref() 000002868 ? 10C1CA3E0 ?
000021768 ? A681AFA10 ?
B92D0CFA0 ? C0F96F920 ?
kjbldrmrpst()+4864 CALL kjblreplay() 000000000 ? 000000001 ?
10C1CA0A0 ? BDA03C9B8 ?
000000000 ? 10C1E8890 ?
kjmprcfgsync()+1424 CALL kjbldrmrpst() A681AFA10 ? 000000001 ?
The other trace file:
*** 2012-04-17 04:26:44.389
*** SESSION ID:(673.1) 2012-04-17 04:26:44.389
*** CLIENT ID:() 2012-04-17 04:26:44.389
*** SERVICE NAME:(SYS$BACKGROUND) 2012-04-17 04:26:44.389
*** MODULE NAME:() 2012-04-17 04:26:44.389
*** ACTION NAME:() 2012-04-17 04:26:44.389
Dump continued from file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 432578 (ORA 600 [kjbmprlst:shadow]) ========
----- Beginning of Customized Incident Dump(s) -----
FUSION MSG 0xffffffff79c40b80,39 from 2 spnum 14 ver[38,11161] ln 144 sq[2,8]
REPLAY 1 [0x103699.c7, 151132.0] c[0x7e7bd3240,55] [0x494e,x38]
grant 2 convert 0 role x0
pi [0x0.0x0] flags 0x0 state 0x100
disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
msgRM# 11161 bkt# 18131 drmbkt# 18131
pkey 151132.0 undo 0 stat 5 masters[32768, 2->32768] reminc 38 RM# 11152
flg x0 type x0 afftime x8517cf38
nreplays by lms 0 = 4046
nreplays by lms 1 = 4105
nreplays by lms 2 = 4176
nreplays by lms 3 = 4214
nreplays by lms 4 = 4158
nreplays by lms 5 = 4162
hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
FUSION MSG DUMP END
GCS RESOURCE 0xbb93a40e8 hashq [0xba8f40298,0xc27d16700] name[0x103699.c7] pkey 151008.0
grant 0xb99d64f38 cvt 0x0 send 0x0@1,0 write 0x0,0@65536
flag 0x2 mdrole 0x1 mode 1 scan 0.0 role LOCAL
disk: 0x0000.00000000 write: 0x0000.00000000 cnt 0x0 hist 0x0
xid 0x0000.000.00000000 sid 1 pkwait 0s rmacks 0
refpcnt 0 weak: 0x0000.00000000
pkey 151008.0
hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
GCS SHADOW 0xb99d64f38,42 resp[0xbb93a40e8,0x103699.c7] pkey 151008.0
grant 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
master 1 owner 2 sid 1 remote[0x85fed2220,13] hist 0xb93e302087234c9f
history 0x1f.0x19.0xd.0x39.0x8.0x4.0xc.0x1f.0x39.0x1.
cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
disk: 0x0000.00000000 write request: 0x0000.00000000
pi scn: 0x0000.00000000 sq[0xbb93a4118,0xbb93a4118]
msgseq 0x1 updseq 0x0 reqids[13,0,0] infop 0x0 lockseq xf0d1
GCS SHADOW END
GCS RESOURCE END
----- End of Customized Incident Dump(s) -----
*** 2012-04-17 04:26:44.478
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst1()+96 CALL skdstdst() FFFFFFFF7FFF4D20 ?
100670460 ? 000000000 ?
00000000A ? 000000001 ?
10BD552E0 ?
ksedst()+60 CALL ksedst1() 000000000 ? 000000001 ?
00010C1D1 ? 00010C000 ?
10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032 CALL ksedst() 000000000 ? 10B21A000 ?
10B21AA90 ? 10C1D2000 ?
00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800 PTR_CALL dbkedDefDump() 000000003 ? 000000002 ?
10A6ABAA8 ? 0000014B0 ?
10C1C9000 ? 000000003 ?
dbgexExplicitEndInc CALL dbgexPhaseII() 10C373D30 ?
()+728 FFFFFFFF7A634920 ?
FFFFFFFF7FFF90FC ?
0018E0001 ? 10A6A2D98 ?
000001C00 ?
dbgeEndDDEInvocatio CALL dbgexExplicitEndInc 10A6A2C50 ?
nImpl()+704 () FFFFFFFF7A634920 ?
FFFFFFFF7FFF9048 ?
FFFFFFFF7FFFC740 ?
000000000 ?
FFFFFFFFFE4E26A0 ?
kjbmprlst()+13504 CALL dbgeEndDDEInvocatio 10C373D30 ? 001B1D800 ?
n() FFFFFFFFFEC0AF31 ?
FFFFFFFF7FFFC740 ?
0013F5000 ? 0018E0001 ?
kjmxmpm()+796 PTR_CALL kjbmprlst() 101782000 ? 00010C1CA ?
10C1EA000 ? 10C1CA000 ?
10A6A3000 ? 10A6A3000 ?
kjmpbmsg()+4584 CALL kjmxmpm() 00010A400 ? 000000000 ?
0852DA2C5 ? 00010C000 ?
10A7EE000 ? BE22AF0C0 ?
kjmsm()+11308 CALL kjmpbmsg() 00010A400 ? 00000009C ?
00010C000 ? 10A7EE000 ?
000000001 ? 000000027 ?
ksbrdp()+1236 PTR_CALL kjmsm() 000001888 ? 25916872D1 ?
000002000 ? 000000000 ?
00000024B ? 000001000 ?
opirip()+1008 CALL ksbrdp() 10BB56000 ? BD8C0B680 ?
000000001 ? 000001400 ?
00010B800 ? 10AC212D8 ?
opidrv()+780 CALL opirip() 10A6A3000 ? 380013D50 ?
000380002 ? 3800055C0 ?
380002000 ? 00010C000 ?
sou2o()+92 CALL opidrv() 000000032 ? 000000004 ?
FFFFFFFF7FFFF780 ?
0001EA190 ?
FFFFFFFF7AF42F10 ?
FFFFFFFF7FFFFBB8 ?
opimai_real()+516 CALL sou2o() FFFFFFFF7FFFF758 ?
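The two trace files are also very similar: even the names of the first several functions on the call stack at the point of the error are exactly the same.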
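The comparison of the two call stacks can also be done mechanically. The sketch below pulls the calling-location name out of each stack line that contains CALL or PTR_CALL and reports the leading frames the two stacks share. The helper names `frame_names` and `common_prefix` are hypothetical, and the regular expression is an assumption based on the stack layout shown in these traces. Names that the trace wraps onto a second line (such as dbgeEndDDEInvocatio) are kept in their truncated form, which is enough for comparing two dumps produced in the same format.

```python
import re

# Leading function name of a stack line, without the "()+offset" suffix.
FRAME_RE = re.compile(r"^(\S+?)(?:\(\)\+\d+)?\s+(?:PTR_CALL|CALL)\s")


def frame_names(stack_text):
    """Return the calling-location names, top of stack first."""
    names = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            names.append(m.group(1))
    return names


def common_prefix(a, b):
    """Return the leading elements that two lists have in common."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# First line of each frame, copied from the two traces above.
trace1 = """\
ksedst1()+96 CALL skdstdst() FFFFFFFF7FFF4C00 ?
ksedst()+60 CALL ksedst1() 000000000 ? 000000001 ?
dbkedDefDump()+2032 CALL ksedst() 000000000 ? 10B21A000 ?
dbgexPhaseII()+1800 PTR_CALL dbkedDefDump() 000000003 ? 000000002 ?
dbgexExplicitEndInc CALL dbgexPhaseII() 10C373D30 ?
dbgeEndDDEInvocatio CALL dbgexExplicitEndInc 10A6A2C50 ?
kjbrref()+1496 CALL dbgeEndDDEInvocatio 10C373D30 ? 001B1D800 ?
"""

trace2 = """\
ksedst1()+96 CALL skdstdst() FFFFFFFF7FFF4D20 ?
ksedst()+60 CALL ksedst1() 000000000 ? 000000001 ?
dbkedDefDump()+2032 CALL ksedst() 000000000 ? 10B21A000 ?
dbgexPhaseII()+1800 PTR_CALL dbkedDefDump() 000000003 ? 000000002 ?
dbgexExplicitEndInc CALL dbgexPhaseII() 10C373D30 ?
dbgeEndDDEInvocatio CALL dbgexExplicitEndInc 10A6A2C50 ?
kjbmprlst()+13504 CALL dbgeEndDDEInvocatio 10C373D30 ? 001B1D800 ?
"""

frames1 = frame_names(trace1)
frames2 = frame_names(trace2)
shared = common_prefix(frames1, frames2)
print(shared)
print(frames1[len(shared)], frames2[len(shared)])
```

The first frames that differ are the ones named in the two ORA-600 arguments, kjbrref and kjbmprlst.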
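A search on MOS confirmed this as Bug 12834027 ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking. The problem is fixed in the latest 11.2.0.3.1 PSU. Besides applying the patch, you can also consider setting the hidden parameter "_gc_read_mostly_locking"=FALSE to disable read-mostly object locking. Disabling DRM also prevents the error from occurring.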
From "ITPUB Blog", link: http://blog.itpub.net/4227/viewspace-730063/. If you repost this article, please credit the source; otherwise legal liability will be pursued.