UNDO相关问题总结(六)

本次主题也是undo相关问题的最后一篇,也可以归纳为数据库的非常规恢复,主要描述的数据库出现ORA-600 [4xxx]错误的处理方案。

问题描述
ORA-600 [4xxx] 错误基本都与回滚段相关,下面简单描述几种(仅供参考)
  • ORA-600 [4000] [a] : 这个错误表示oracle尝试撤销数据字典表(数据字典表ITL中存在未提交事务),而根据ITL中的内容寻找undo segment number时,出现错误和失败,其中参数 a 表示undo segment number,数据库无法启动(一般都是bootstrap$自举对象撤销时失败),一般需要通过bbed工具修改数据字典,undo相关问题中这类问题最为严重,处理也最为复杂。
  • ORA-600 [4193] [a] [b] A mismatch has been detected between Redo records and Rollback (Undo) records.We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied.This error is reported when this validation fails。这类问题处理相对简单,处理方法见后面。
  • ORA-600 [4194] [a] [b] A mismatch has been detected between Redo records and rollback (Undo) records. We are validating the Undo record number relating to the change being applied against the maximum undo record number recorded in the undo block.This error is reported when the validation fails。这类问题处理相对简单,处理方法见后面
  • ORA-600 [4194] : While backing out an undo record (i.e. at the time of rollback) we found a transaction id mis-match indicating either a corruption in the rollback segment or corruption in an object which the rollback segment is trying to apply undo records on.This would indicate a corrupted rollback segment。这类问题处理相对简单,处理方法见后面。
  • ORA-600 [4097] :  XID involved and seq# from undo header,the database crashed and system rollback segment has problems with transaction table. when accessing a rollback segment header to see if a transaction has been committed you observe the XID given is in the future of the transaction tracle。这类问题处理相对简单,处理方法见后面。
解决方法(仅供参考)
1.将回滚段管理方式改为手工,然后尝试启动数据库,如果不成功则尝试步骤2
   create pfile='xxxx' from spfile; 在pfile中添加如下条目
   UNDO_MANAGEMENT = MANUAL
   undo_tablespace=system
2.尝试设置一些event(10513 等)尝试启动数据库,未成功则执行步骤3
  在pfile中添加事件条目
  event='10513 trace name context forever,level 2 :10512 trace name context forever,level 1: 10511 trace name context forever,level 2:10510 trace name context forever,level 1' scope=spfile;
  事件说明
  •   10510 : Turn off SMON check to offline pending offline rollbacksegment
  •   10511 : Turn off SMON check to cleanup undo dictionary
  •   10512 : Turn off SMON check to shrink rollback segments
  •   10513 : Turn off SMON rollback uncommitted transaction(禁止smon进程进行事务回滚) 
3.使用_offline_rollback_segments/_corrupted_rollback_segments屏蔽回滚段,尝试启动数据库,未成功执行步骤4
   在pfile中添加问题回滚段,例如:  _offline_rollback_segments=('_SYSSMU1_3780397527$','_SYSSMU2_2232571081$')
    回滚段名称获取方式
  •      dba_rollback_segments,v$rollname (数据库打开状态)
  •      strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u (数据库关闭状态)
4.如果依然不能open数据库,考虑使用bbed工具提交事务,修改回滚段状态等操作,未成功执行步骤5
     ORA-600 [4000] 错误一般需要使用bbed修改数据字典,修改前一定要备份!
5.如果依然还不能open数据库,考虑使用dul,抽取数据
   注意事项:
  1.     数据库启动完成以后,一定要记得完成扫尾工作,将有问题的回滚段offline 并drop掉,新建回滚表空间,将原来的   undo表空间切换至新建的,将undo管理还原为自动模式,最后以spfile启动数据库。、
  2.     如果问题解决过程中使用了bbed修改了数据字典,最好检查一下数据字典的一致性(hcheck.sql脚本),观察是否有其它600错误出现,最后判断是否需要重建冲数据。

MOS 文档
ORA-600 [4000] [a]
VERSIONS:          
  version 6.0 to 9.2
DESCRIPTION:
  This has the potential to be a very serious error.
  It means that Oracle has tried to find an undo segment number in the
  dictionary cache and failed 
ARGUMENTS:
  Arg [a] Undo segment number
FUNCTIONALITY:     
  KERNEL TRANSACTION UNDO
IMPACT:            
  INSTANCE FAILURE - Instance will not restart
  STATEMENT FAILURE
SUGGESTIONS:
  As per Note 1371820.8, this can be seen when executing DML on tables residing
  in tablespaces transported from another database. 
  It is fixed in 8.1.7.4, 9.0.1.4 and 9.2.0.1 The workaround however is to
  create more rollback segments in the target database until the highest
  rollback segment number (select max(US#) from sys.undo$;) is at least
  as high as in equivalent max(US#) from the source database. 
  It has also been seen where memory has been corrupted so try shutting
  down and restarting the instance.
  If the database will not start contact Oracle Support Services
  immediately, providing the alert.log and associated trace files.
Known Bugs

NBBugFixedDescription
 1676156611.2.0.3.9, 11.2.0.3.BP22, 11.2.0.4, 12.1.0.2, 12.2.0.0Instance fails to start with ORA-600 [4000] [usn#]
 1391019011.2.0.3.BP15, 11.2.0.4, 12.1.0.1ORA-600 [4000] from plugged in tablespace in Exadata
 1474172711.2.0.2.9, 11.2.0.2.BP19, 11.2.0.3.BP12, 11.2.0.3.BP13, 12.1.0.1Fixes for bug 12326708 and 14624146 can cause problems - backout fix
+1042501011.2.0.3, 12.1.0.1Stale data blocks may be returned by Exadata FlashCache
*914554111.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.1OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
 12353983 ORA-600 [4000] with XA in RAC
 768785611.2.0.1ORA-600 [4000] from DML on transported ASSM tablespace
 291744111.1.0.6OERI [4000] during startup
 31157339.2.0.5, 10.1.0.2OERI[4000] / index corruption can occur during index coalesce
 29595569.2.0.5, 10.1.0.2STARTUP after an ORA-701 fails with OERI[4000]
 13718208.1.7.4, 9.0.1.4, 9.2.0.1OERI:4506 / OERI:4000 possible against transported tablespace
+4345967.3.4.2, 8.0.3.0ORA-600[4000] from altering storage of BOOTSTRAP$



ORA-600 [4193] [a] [b]
VERSIONS:          
  versions 6.0 to 10.1
DESCRIPTION:       
  A mismatch has been detected between Redo records and Rollback (Undo)
  records.  
  We are validating the Undo block sequence number in the undo block against
  the Redo block sequence number relating to the change being applied.  
  This error is reported when this validation fails.
ARGUMENTS:
  Arg [a] Undo record seq number
  Arg [b] Redo record seq number
 
FUNCTIONALITY:
  KERNEL TRANSACTION UNDO
 
IMPACT:
  PROCESS FAILURE
  POSSIBLE ROLLBACK SEGMENT CORRUPTION
 
SUGGESTIONS:       
 
  This error may indicate a rollback segment corruption.
 
  This may require a recovery from a database backup depending on
  the situation.
  For further analysis, please submit the trace files and alert.log to
  Oracle Support Services.
  Known Issues:

Known Bugs
NBBugFixedDescription
 1403424411.2.0.3.BP09, 11.2.0.4, 12.1.0.1Lost write type corruption using ASM in 11.2.0.3
 824076210.2.0.5, 11.1.0.7.10, 11.2.0.1Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction



ORA-600 [4194] [a] [b]
VERSIONS:          
  versions 6.0 to 10.1
 
DESCRIPTION:
  A mismatch has been detected between Redo records and rollback (Undo)
  records.
 
  We are validating the Undo record number relating to the change being
  applied against the maximum undo record number recorded in the undo block.
 
  This error is reported when the validation fails.
 
ARGUMENTS:
  Arg [a] Maximum Undo record number in Undo block
  Arg [b] Undo record number from Redo block
 
FUNCTIONALITY:     
  Kernel Transaction Undo called from Cache layer
 
IMPACT:            
  PROCESS FAILURE
  POSSIBLE ROLLBACK SEGMENT CORRUPTION
 
SUGGESTIONS:

  This error may indicate a rollback segment corruption.
 
  This may require a recovery from a database backup depending on
  the situation.
 
  If the Known Issues section below does not help in terms of identifying
  a solution, please submit the trace files and alert.log to Oracle
  Support Services for further analysis. 
 
  Known Issues:
 
Known Bugs

NBBugFixedDescription
 824076210.2.0.5, 11.1.0.7.10, 11.2.0.1Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction
 32105209.2.0.5, 10.1.0.2OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC
+7926108.0.6.0, 8.1.6.0Rollback segment corruption OERI:4194 can occur if block checking detects a corrupt block



ORA-600 [4137]
VERSIONS:
  versions 7.0 to 10.1  
DESCRIPTION:       

  While backing out an undo record (i.e. at the time of rollback) we found a
  transaction id mis-match indicating either a corruption in the rollback
  segment or corruption in an object which the rollback segment is trying to
  apply undo records on. This would indicate a corrupted rollback segment.

FUNCTIONALITY:     
Kernel Transaction Undo Recovery

IMPACT:            
  POSSIBLE PHYSICAL CORRUPTION in Rollback segments
SUGGESTIONS:

  Signalled during rollback (also rollback for consistent read).
  The consistency check that compares the transaction id of the
  transaction being rolled back against the transaction id in
  undo block being applied is failing.  
  A possible cause is a lost write to the undo segment.

  The main approach is to identify the file containing the bad undo
  segment block and treat it as if the file is corrupt. Consult
  the trace file for this information.  
  If in archivelog mode, restore the file & roll forward.  
  If in Noarchivelog mode, restore from a cold backup taken before
  the error was reported.  
  Alternatively, you can look at dba_rollback_segs data dictionary view.
  If the status column that describes what state the rollback segment is
  currently in is "needs recovery" then lookup the following article
  for posible solution.
 
Note:28812.1
  Rollback Segment Needs Recovery  
  If the Known Issues section below does not help in terms of identifying
  a solution, please submit the trace files and alert.log to Oracle
  Support Services for further analysis.  
  Known Issues:  

Known Bugs

NBBugFixedDescription
 824076210.2.0.5, 11.1.0.7.10, 11.2.0.1Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction
 6714918.1.6.0Rollback Segment corruption possible if RBS has > 32767 extents



Step by step to resolve ORA-600 4194 4193 4197 on database crash 

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.3 [Release 9.2 to 11.2]
Information in this document applies to any platform.
***Checked for relevance on 04-Dec-2013***

SYMPTOMS

The following error is occurring in the alert.log right before the database crashes.

ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Since we are adding a new undo record to our undo block, we would expect that the new record number is equal to the maximum record number in the undo block plus one. Before Oracle can add a new undo record to the undo block it validates that this is correct. If this validation fails, then an ORA-600 [4194] will be triggered.

CHANGES

This issue generally occurs when there is a power outage or hardware failure that initially crashes the database. On startup, the database does the normal roll forward (redo) and then rollback (undo), this is where the error is generated on the rollback.

CAUSE

This also can be cause by the following defect

Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK

Details: 
Undo corruption may be caused after a shrink and the same undo block may be used 
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

SOLUTION

Best practice to create a new undo tablespace.
This method includes segment check.

Create pfile from spfile to edit
>create pfile from spfile;

1. Shutdown the instance

2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'

3. >startup restrict pfile=<initsid.ora>

4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';

This is critical - we are looking for all undo segments to be offline - System will always be online.

If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.  There are many options from this moment and Oracle Support Analyst can offer different solutions for the bad undo segments.

If all offline then continue to the next step

5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;

6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;

7. >shutdown immediate;

8 >startup mount;

9 modify the pfile with the new undo tablespace name
>alter system set undo_tablespace = '<new tablespace>' scope=pfile;

10. >shutdown immediate;

11. >startup; 
 
        Startup using the normal spfile
================

The reason we create a new undo tablespace first is to use new undo segment numbers that are higher then the current segments being used.  This way when a transaction goes to do block clean-out the reference to that undo segment does not exist and continues with the block clean-out.

  
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值