一套HP-UX上的10.2.0.4系统出现ORA-00600[17175] Oracle600内部错误,相关的日志信息如下:
Wed Dec 1 01:57:55 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
Wed Dec 1 01:57:57 2010
Trace dumping is performing id=[cdmp_20101201015757]
Wed Dec 1 01:58:05 2010
LGWR: terminating instance due to error 472
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms1_3291.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms2_3293.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms3_3295.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms0_3289.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmon_3283.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmd0_3287.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Shutting down instance (abort)
License high water mark = 421

/u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: HP-UX
Node name: XGP2_db1
Release: B.11.31
Version: U
Machine: ia64
Instance name: xgp21
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 01:57:55.933
*** SESSION ID:(333.1) 2010-12-01 01:57:55.933
*** 2010-12-01 01:57:55.933
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict


ksedst 

在metalink上搜索600[17175]内部错误相关的文档,可以找到该错误的大量信息:

Keywords: ora-00600 [17175]

1. Bug 6250251: ORA-00600 17175 DURING KGI CLEANUP - DUMP - ORADEBUG
--ora-600 followed by ora-601 and instance crash with ORA-17175.
--Also, setting of heap check event triggers this problem. In this case
--it is event="10235 trace name context forever, level 27"

2. Bug 4216668 - Dump from INSERT / MERGE on internal columns (Doc ID 4216668.8)
--INSERT or MERGE commands might core dump if operating on object types and internal columns are involved.

3. Bug 7590297: ORA-600 [17175] [255] ORA-601: CLEANUP LOCK CONFLICT CRASHED THE DATABASE

4. SR 3-2296150050
--The error has occurred when Oracle was cleaning shared pool latch/heap information about the process
which died in middle.
--There is no data corruption associated with this error.
--This is evident from the function kghcln in the trace stack at which it failed.
--This problem is usually the symptom of some earlier problem with the latch.
--Either after a process has died, or a process has signaled an error while holding a shared pool latch,
and the index to the shared pool latch is invalid.

--There was a Bug 7590297 raised for this issue which could not be progressed due to unavailability of information.
--From few earlier known issues - This can be due to PMON may sometimes signal ORA-601
while trying to start up additional shared servers or dispatchers.
--There the workaround suggested was to Start the instance with max # of shared servers.

--Can you reproduce the problem?If the instance has been restated the issue may not persist as it is related to memory.
--If the issue persists then we have to perform the following to monitoring the instance to investigate further:

--1. Set the following event in parameter file:
--event="10257 trace name context forever, level 10"
--event="601 trace name SYSTEMSTATE level 10"

--The first event will cause PMON to dump info about shared server startup.
--The second event will cause PMON to do a system state dump when the 601 occurs.

--2. You should also have the track of this in intervals and save the historical results from:

--SQL> select e.total_waits, e.total_timeouts, e.time_waited from v$session_event e, v$session s
, v$bgprocess b where b.name='PMON' and s.paddr=b.paddr and e.sid=s.sid and e.event='process startup';


5. SR 3-2123025401
--=== ODM Solution / Action Plan ===
--Disabled NUMA for resolution

6. SR 7314313.994

Analysis:

Bug 6250251 and bug 4216668 are not applicable to this case.
Bug 7590297 is applicable to this case, as the call stack, error message are the same with this case.
But this patch is suspended as requested info is not available.

SR 3-2296150050: same error message, same DB version, similar call stack; closed without solution.
SR 3-2123025401: same error message, same DB version, similar call stack.
The issue happened twice in that SR and solved by disabling NUMA
SR 7314313.994: same error message, same DB version, similar call stack; closed without solution.

ERROR:
ORA-600 [17175] [a]


VERSIONS:
versions 9.2 to 10.1


DESCRIPTION:


This error occurs when we are cleaning up a shared pool latch (either after a process has died,
or a process has signaled an error while holding a shared pool latch),
and the index to the shared pool latch is invalid.


ARGUMENTS:
Arg [a] index of the latch recovery structure - usually 255

FUNCTIONALITY:
Generic Heap Manager


IMPACT:
INSTANCE HANG
PROCESS FAILURE
INSTANCE FAILURE
以下为Oracle GCS给出的行动计划,GCS认为绝大多数ORA-00600 [17xxx]是由memory相关的问题引起的,这些问题往往在重启实例后就可以得到解决。并建议可以设置shared_servers=max_shared_servers后进一步观察:
From the uploaded files it looks like you were reported with ORA-00600 [17175] errors
and crashed the instance.What is the current status after the restart of the database.
Are you still reported with the same errors and crashing the instance ?
Mostly the ORA-00600 [17xxx] errors are memory releated and might have got resolved after the database restart.


Further looking at the uploaded trace file the failing functions and the error closely
matches Bug 6958493and is closed as duplicate of BaseBug 6962340which is closed
as could not able to reproduce the error.


Also a smillar issue is reported inBug 3104250which is fixed in 10g, but that doesn't mean
you cannot get this error for a new reason and that the same workaround would fix it.
We need to implement the workaround and set: shared_servers=max_shared_servers
if the error reproduces again. If this is still repeated issue then we can file a new bug with development for the same.


ACTION PLAN
===========
1. Monitor the alertlog for the ORA-00600 [17175] errors for the next few days and if the database still crashes then please
set shared_servers=max_shared_servers and see if the problem resolves or not.