中午接到电话,客户的核心系统Oracle数据库实例宕机,远程过去在告警日志中看到大量的如下报错,报错很频繁:
......
Fri Jul 25 13:20:14 2014
Errors in file /u01/app/oracle/diag/rdbms/d012band/d012band/trace/d012band_smon_5964354.trc (incident=43361):
ORA-00600: internal error code, arguments: [13013], [5001], [268], [8452274], [7], [8452274], [17], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/d012band/d012band/incident/incdir_43361/d012band_smon_5964354_i43361.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri Jul 25 13:20:17 2014
Dumping diagnostic data in directory=[cdmp_20140725132017], requested by (instance=1, osid=5964354 (SMON)), summary=[incident=43361].
Starting background process SMCO
......
一段时间后,最后实例被PMON进程终止,导致核心系统宕机。
客户环境:Oracle Database 11.2.0.3.0 for IBM AIX 6.1,单机,IBM HACMP主备模式保护。
参考《 New and Improved: ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 1438920.1)》和《ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 28185.1)》文章;该报错是由于对某个表执行DML操作,该表对应的某个索引损坏导致的,解决的办法是找出操作的表和受损的索引,重建索引即可。
下面是对ORA-600的参数说明:
6 Argument format
=================
This format relates to Oracle Server 8.0.3 and above
Arg [a] Passcount
Arg [b] Data Object number
Arg [c] Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Decimal RDBA of block being updated (Typically same as [c])
Arg [f] Code
由此执行下面的SQL语句查询找出DML操作的表,以及表对应的索引:
SQL> SELECT OWNER,OBJECT_NAME,OBJECT_TYPE FROM DBA_OBJECTS WHERE DATA_OBJECT_ID=268;
发现问题对象是SYS用户下面的SMON_SCN_TIME表,SYS.SMON_SCN_TIME是Oracle更新非常频繁的一张表,该数据字典用于维护SCN和时间的关系。
SQL> SELECT OWNER,INDEX_NAME FROM DBA_INDEXES WHERE TABLE_NAME='SMON_SCN_TIME';
发现这个表对应有SMON_SCN_TIME_TIM_IDX和SMON_SCN_TIME_SCN_IDX两个索引。
对表进行分析,找出有问题的索引:
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE;
Table analyzed.
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;
table/Index Cross Reference Failure - see trace file
能够正常的分析表,但是不能正常的分析表上的索引。
参考《OERR: ORA-1499 table/Index Cross Reference Failure - see trace file (文档 ID 1499.1)》文章,尝试找出受影响的索引:
查看trace file:
BH (0x70000078fe2df98) file#: 2 rdba: 0x0081483e (2/84030) class: 1 ba: 0x70000078ce52000
set: 174 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 2272,19
dbwrid: 5 obj: 272 objn: 272 tsn: 1 afn: 2 hint: f
hash: [0x700000801ff36a0,0x700000801ff36a0] lru: [0x70000074bf81f40,0x70000074bf81ce0]
ckptq: [NULL] fileq: [NULL] objq: [0x7000007f59b4138,0x7000007f59b4138] objaq: [0x7000007f59b4128,0x7000007f59b4128]
use: [NULL] wait: [NULL] fast-cr-pins: 2
st: XCURRENT md: NULL fpin: 'kdiwh15: kdifxs' tch: 12
flags:
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 1 rdba: 0x0081483e (2/84030)
scn: 0x0000.2f4752d8 seq: 0x01 flg: 0x06 tail: 0x52d80601
frmt: 0x02 chkval: 0x0778 type: 0x06=trans data
0x0081483e要去掉0x,0x表示十六进制。
执行下面的SQL语句:
SELECT owner, segment_name, segment_type, partition_name
FROM DBA_SEGMENTS
WHERE header_file = (SELECT file#
FROM v$datafile
WHERE rfile# = dbms_utility.data_block_address_file(to_number('0080f8b2','XXXXXXXX'))
AND ts#= 1)
AND header_block = dbms_utility.data_block_address_block(to_number('0080f8b2','XXXXXXXX'));
可惜没有任何的查询结果。
继续参考《Instance Terminated With Error ORA-00474: SMON Process Terminated With Error (文档 ID 1361872.1)》文章,文章有下面一段描述:
CAUSE
ORA600 [13011] is raised due to indexes corruption. To verify the corruption run the following statements:
SQL> conn / as sysdba
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE;
Table analyzed.
-- It should come out clean giving message table analyzed.
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;
-- it should fail with Ora-1499 if at least one index is corrupted
ORA-1499就是我们前面遇到的报错:
Error: ORA 1499
Text: table/Index Cross Reference Failure - see trace file
-------------------------------------------------------------------------------
Cause: An error occurred when validating an index or a table using the
ANALYZE command.
One or more entries does not point to the appropriate cross-reference.
Action: Check the trace file for more descriptive messages about the problem.
Correct these errors.
如果在分析表的时候收到ORA-1499的错误,表示至少有一个索引是损坏的。
SOLUTION
Rebuild corrupted indexes:
SQL> conn as sysdba
SQL> ALTER INDEX SMON_SCN_TIME_TIM_IDX REBUILD ONLINE;
SQL> ALTER INDEX SMON_SCN_TIME_SCN_IDX REBUILD ONLINE;
then re-run
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;
Table analyzed.
Note: The last statement should not report any errors.
两个索引重建之后,ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE语句能顺利的分析,告警日志不再有报错产生。
总结:要学会充分利用MOS,根据不同的线索搜索不同的文章来看,不要在一篇文章上吊死。
......
Fri Jul 25 13:20:14 2014
Errors in file /u01/app/oracle/diag/rdbms/d012band/d012band/trace/d012band_smon_5964354.trc (incident=43361):
ORA-00600: internal error code, arguments: [13013], [5001], [268], [8452274], [7], [8452274], [17], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/d012band/d012band/incident/incdir_43361/d012band_smon_5964354_i43361.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri Jul 25 13:20:17 2014
Dumping diagnostic data in directory=[cdmp_20140725132017], requested by (instance=1, osid=5964354 (SMON)), summary=[incident=43361].
Starting background process SMCO
......
一段时间后,最后实例被PMON进程终止,导致核心系统宕机。
客户环境:Oracle Database 11.2.0.3.0 for IBM AIX 6.1,单机,IBM HACMP主备模式保护。
参考《 New and Improved: ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 1438920.1)》和《ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 28185.1)》文章;该报错是由于对某个表执行DML操作,该表对应的某个索引损坏导致的,解决的办法是找出操作的表和受损的索引,重建索引即可。
下面是对ORA-600的参数说明:
6 Argument format
=================
This format relates to Oracle Server 8.0.3 and above
Arg [a] Passcount
Arg [b] Data Object number
Arg [c] Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Decimal RDBA of block being updated (Typically same as [c])
Arg [f] Code
由此执行下面的SQL语句查询找出DML操作的表,以及表对应的索引:
SQL> SELECT OWNER,OBJECT_NAME,OBJECT_TYPE FROM DBA_OBJECTS WHERE DATA_OBJECT_ID=268;
发现问题对象是SYS用户下面的SMON_SCN_TIME表,SYS.SMON_SCN_TIME是Oracle更新非常频繁的一张表,该数据字典用于维护SCN和时间的关系。
SQL> SELECT OWNER,INDEX_NAME FROM DBA_INDEXES WHERE TABLE_NAME='SMON_SCN_TIME';
发现这个表对应有SMON_SCN_TIME_TIM_IDX和SMON_SCN_TIME_SCN_IDX两个索引。
对表进行分析,找出有问题的索引:
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE;
Table analyzed.
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;
table/Index Cross Reference Failure - see trace file
能够正常的分析表,但是不能正常的分析表上的索引。
参考《OERR: ORA-1499 table/Index Cross Reference Failure - see trace file (文档 ID 1499.1)》文章,尝试找出受影响的索引:
查看trace file:
BH (0x70000078fe2df98) file#: 2 rdba: 0x0081483e (2/84030) class: 1 ba: 0x70000078ce52000
set: 174 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 2272,19
dbwrid: 5 obj: 272 objn: 272 tsn: 1 afn: 2 hint: f
hash: [0x700000801ff36a0,0x700000801ff36a0] lru: [0x70000074bf81f40,0x70000074bf81ce0]
ckptq: [NULL] fileq: [NULL] objq: [0x7000007f59b4138,0x7000007f59b4138] objaq: [0x7000007f59b4128,0x7000007f59b4128]
use: [NULL] wait: [NULL] fast-cr-pins: 2
st: XCURRENT md: NULL fpin: 'kdiwh15: kdifxs' tch: 12
flags:
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 1 rdba: 0x0081483e (2/84030)
scn: 0x0000.2f4752d8 seq: 0x01 flg: 0x06 tail: 0x52d80601
frmt: 0x02 chkval: 0x0778 type: 0x06=trans data
0x0081483e要去掉0x,0x表示十六进制。
执行下面的SQL语句:
SELECT owner, segment_name, segment_type, partition_name
FROM DBA_SEGMENTS
WHERE header_file = (SELECT file#
FROM v$datafile
WHERE rfile# = dbms_utility.data_block_address_file(to_number('0080f8b2','XXXXXXXX'))
AND ts#= 1)
AND header_block = dbms_utility.data_block_address_block(to_number('0080f8b2','XXXXXXXX'));
可惜没有任何的查询结果。
继续参考《Instance Terminated With Error ORA-00474: SMON Process Terminated With Error (文档 ID 1361872.1)》文章,文章有下面一段描述:
CAUSE
ORA600 [13011] is raised due to indexes corruption. To verify the corruption run the following statements:
SQL> conn / as sysdba
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE;
Table analyzed.
-- It should come out clean giving message table analyzed.
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;
-- it should fail with Ora-1499 if at least one index is corrupted
ORA-1499就是我们前面遇到的报错:
Error: ORA 1499
Text: table/Index Cross Reference Failure - see trace file
-------------------------------------------------------------------------------
Cause: An error occurred when validating an index or a table using the
ANALYZE command.
One or more entries does not point to the appropriate cross-reference.
Action: Check the trace file for more descriptive messages about the problem.
Correct these errors.
如果在分析表的时候收到ORA-1499的错误,表示至少有一个索引是损坏的。
SOLUTION
Rebuild corrupted indexes:
SQL> conn as sysdba
SQL> ALTER INDEX SMON_SCN_TIME_TIM_IDX REBUILD ONLINE;
SQL> ALTER INDEX SMON_SCN_TIME_SCN_IDX REBUILD ONLINE;
then re-run
SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;
Table analyzed.
Note: The last statement should not report any errors.
两个索引重建之后,ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE语句能顺利的分析,告警日志不再有报错产生。
总结:要学会充分利用MOS,根据不同的线索搜索不同的文章来看,不要在一篇文章上吊死。
--end--
另附点击打开链接