oracle修复失败怎么办,【案例】Oracle异常恢复 由于坏块和不恰当恢复引起各种ORA-600案例完整思路...

天萃荷净

运维DBA反映Oracle数据库无法Open正常打开,查看alert日志发现大量各种Ora-00600错误,分析原因为由于坏块和不恰当恢复引起

1.故障后重启,无法启动主要表现在block坏块,引起的各种ORA-600等错误

Mon Mar 02 16:09:27 2015

ALTER DATABASE OPEN

Beginning crash recovery of 1 threads

parallel recovery started with 23 processes

Started redo scan

Completed redo scan

read 962 KB redo, 256 data blocks need recovery

Started redo application at

Thread 1: logseq 726, block 37343

Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0

Mem# 0: /u01/app/oracle/oradata/oa/redo03.log

Mon Mar 02 16:09:27 2015

RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3

Completed redo application of 0.27MB

Mon Mar 02 16:09:27 2015

RECOVERY OF THREAD 1 STUCK AT BLOCK 3104 OF FILE 3

Mon Mar 02 16:09:27 2015

RECOVERY OF THREAD 1 STUCK AT BLOCK 3613 OF FILE 3

Mon Mar 02 16:09:28 2015

RECOVERY OF THREAD 1 STUCK AT BLOCK 272 OF FILE 3

Mon Mar 02 16:09:28 2015

RECOVERY OF THREAD 1 STUCK AT BLOCK 2512 OF FILE 3

Hex dump of (file 2, block 92889) in trace file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc

Corrupt block relative dba: 0x00816ad9 (file 2, block 92889)

Bad header found during preparing block for write

Data in bad block:

type: 0 format: 0 rdba: 0x6ad90000

last change scn: 0x0000.00c6a052 seq: 0x1 flg: 0x00

spare1: 0x6 spare2: 0xa2 spare3: 0x5d7e

consistency value in tail: 0xa0520001

check value in block header: 0x0

block checksum disabled

Mon Mar 02 16:09:28 2015

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p007_4196.trc (incident=3833):

ORA-00600: internal error code, arguments: [4502], [1], [], [], [], [], [], [], [], [], [], []

Mon Mar 02 16:09:28 2015

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p013_4208.trc (incident=3881):

ORA-00600: internal error code, arguments: [2037], [4259067], [4244307968], [159], [243], [0], [2162032704], [100728832], [], [], [], []

Slave exiting with ORA-1172 exception

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p009_4200.trc:

ORA-01172: recovery of thread 1 stuck at block 3613 of file 3

ORA-01151: use media recovery to recover block, restore backup if needed

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p001_4184.trc:

ORA-01172: recovery of thread 1 stuck at block 2512 of file 3

ORA-01151: use media recovery to recover block, restore backup if needed

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:

ORA-10388: parallel query server interrupt (failure)

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc (incident=3697):

ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []

Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3697/oa_dbw2_4158_i3697.trc

Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0xD2DDB7, kcbs_shrink_pool()+705] [flags: 0x0, count: 1]

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_mman_4152.trc (incident=3673):

ORA-07445: exception encountered: core dump [kcbs_shrink_pool()+705] [SIGSEGV] [ADDR:0x0] [PC:0xD2DDB7] [SI_KERNEL(general_protection)] []

Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3673/oa_mman_4152_i3673.trc

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc:

Mon Mar 02 16:09:34 2015

Instance terminated by DBW2, pid = 4158

2.第二次重启后增加新错误ORA-00600[17182]

Mon Mar 02 16:39:50 2015

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p002_4321.trc (incident=4993):

ORA-00600: internal error code, arguments: [17182], [0x7F548C2BDBA8], [], [], [], [], [], [], [], [], [], []

进行了一些恢复处理后,日志中报错

主要体现在进行了不完全恢复,而且应该是对redo进行了重命名或者redo头损坏锁引起的一系列提示

Beginning crash recovery of 1 threads

Started redo scan

Completed redo scan

read 962 KB redo, 256 data blocks need recovery

Started redo application at

Thread 1: logseq 726, block 37343

Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0

Mem# 0: /u01/app/oracle/oradata/oa/redo03.log

RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3

Aborting crash recovery due to error 1172

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

ORA-01172: recovery of thread 1 stuck at block 1673 of file 3

ORA-01151: use media recovery to recover block, restore backup if needed

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

ORA-01172: recovery of thread 1 stuck at block 1673 of file 3

ORA-01151: use media recovery to recover block, restore backup if needed

ORA-1172 signalled during: alter database open...

Tue Mar 03 11:17:59 2015

Sweep [inc][17178]: completed

Sweep [inc][17177]: completed

Sweep [inc2][17178]: completed

Tue Mar 03 11:18:00 2015

ALTER DATABASE RECOVER database until cancel

Media Recovery Start

started logmerger process

Parallel Media Recovery started with 24 slaves

ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel ...

ALTER DATABASE RECOVER CONTINUE DEFAULT

Tue Mar 03 11:18:06 2015

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:

ORA-00266: name of archived log file needed

ORA-266 signalled during: ALTER DATABASE RECOVER CONTINUE DEFAULT ...

ALTER DATABASE RECOVER CANCEL

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'

Slave exiting with ORA-1547 exception

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'

ORA-10879 signalled during: ALTER DATABASE RECOVER CANCEL ...

Tue Mar 03 11:18:06 2015

Checker run found 4 new persistent data failures

Tue Mar 03 11:18:13 2015

alter database open resetlogs

RESETLOGS is being done without consistancy checks. This may result

in a corrupted database. The database should be recreated.

RESETLOGS after incomplete recovery UNTIL CHANGE 12986989

Resetting resetlogs activation ID 3278679642 (0xc36cae5a)

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

ORA-00367: checksum error in log file header

ORA-00322: log 1 of thread 1 is not current copy

ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/oa/redo01.log'

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

3.增加了_allow_resetlogs_corruption= TRUE之后数据库报ORA-600[2662]

Tue Mar 03 11:19:26 2015

SMON: enabling cache recovery

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc (incident=18195):

ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []

Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_18195/oa_ora_6864_i18195.trc

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc:

ORA-00704: bootstrap process failure

ORA-00704: bootstrap process failure

ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []

Error 704 happened during db open, shutting down database

USER (ospid: 6864): terminating the instance due to error 704

Instance terminated by USER, pid = 6864

ORA-1092 signalled during: alter database open...

opiodr aborting process unknown ospid (6864) as a result of ORA-1092

Tue Mar 03 11:19:29 2015

ORA-1092 : opitsk aborting process

4.undo已经被其offline,无法正常访问,导致系统报ORA-704和ORA-00376

Wed Mar 04 21:10:58 2015

SMON: enabling cache recovery

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-00376: file 3 cannot be read at this time

ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-00376: file 3 cannot be read at this time

ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'

Error 704 happened during db open, shutting down database

USER (ospid: 17074): terminating the instance due to error 704

Instance terminated by USER, pid = 17074

ORA-1092 signalled during: alter database open...

opiodr aborting process unknown ospid (17074) as a result of ORA-1092

Wed Mar 04 21:11:00 2015

ORA-1092 : opitsk aborting process

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检测,这里可以知道undo 不知道怎么折腾的数据文件scn较大而且还offline,

5.通过一些列方法(bbed,隐含参数等)调整数据库scn,强制启动数据库,报如下错误

Wed Mar 04 22:50:23 2015

SMON: enabling cache recovery

ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small

Error 704 happened during db open, shutting down database

USER (ospid: 17807): terminating the instance due to error 704

Instance terminated by USER, pid = 17807

ORA-1092 signalled during: alter database open resetlogs...

opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,该错误怀疑是文件头scn不够大,块延迟清理导致,进一步增加scn尝试,最后依旧是ORA-00704/ORA-00604/ORA-01555错误

Wed Mar 04 22:50:23 2015

SMON: enabling cache recovery

ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small

Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small

Error 704 happened during db open, shutting down database

USER (ospid: 17807): terminating the instance due to error 704

Instance terminated by USER, pid = 17807

ORA-1092 signalled during: alter database open resetlogs...

opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,在scn上做手脚估计难以解决给问题,对其启动过程做10046和errorstack分析发现

PARSING IN CURSOR #3 len=202 dep=2 uid=0 oct=3 lid=0 tim=1425481940448439 hv=3819099649 ad='64ff91af8' sqlid='3nkd3g3ju5ph1'

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null

END OF STMT

PARSE #3:c=1000,e=334,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1425481940448439

BINDS #3:

Bind#0

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=7f5b3253a6f0 bln=22 avl=01 flg=05

value=0

Bind#1

oacdty=01 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00

oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0

kxsbbbfp=7f5b3253a6b8 bln=32 avl=06 flg=05

value="PROPS$"

Bind#2

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=7f5b3253a688 bln=24 avl=02 flg=05

value=1

EXEC #3:c=0,e=640,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=2853959010,tim=1425481940449147

WAIT #3: nam='db file sequential read' ela= 5 file#=1 block#=345 blocks=1 obj#=37 tim=1425481940449186

WAIT #3: nam='db file sequential read' ela= 4 file#=1 block#=44528 blocks=1 obj#=37 tim=1425481940449221

WAIT #3: nam='db file sequential read' ela= 3 file#=1 block#=5505 blocks=1 obj#=37 tim=1425481940449247

*** 2015-03-04 23:12:20.450

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)

----- Error Stack Dump -----

ORA-00604: error occurred at recursive SQL level 2

ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small

----- Current SQL Statement for this session (sql_id=g64r07v2jn8nq) -----

SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'

这里可以发现是数据库在启动的过程中需要执行SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,而该语句递归调用了select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null 语句。既然这样通过一些方法避免数据库启动之时查询SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,果然数据库启动成功。

6.Ora-00600报错知识点补充

ORA-600 [4502] [a]

Arg [a] ITL entry with a lock count

Meaning: During ITL cleanout we clear all row locks but the ITL entry

still thinks there is an uncleared lock. Ie: ITL has a locked

row but there are no locked rows in the block

大体意思是数据库发现undo 的itl已经被清除,但是block中的itl依然存在,从而出现ORA-600[4502],引起该问题除bug外主要原因是坏块

ORA-600 [2037] [a] [b] [d] [e] [f] [g]

Arg [a] Relative Data Block Address (RDBA) that the redo vector is for

Arg [b] The Block format

Arg 1 RDBA in the block itself

Arg [d] The block type

Arg [e] The sequence number

Arg [f] Flags, if set

Arg [g] The return value from the block head/tail checker.

DESCRIPTION:

During recovery we are examining a block to ensure that it is not

corrupt prior to applying any change vectors.

The block has failed this check and this exception is raised

大体意思是在恢复过程中,正在检查的块,以确保它在应用任何变化向量之前不损坏。如果检查失败排除该异常ORA-600[2037],引起该问题除bug外主要原因是坏块

ORA-600 [kcbzpbuf_1],[a],[b]

Arg [a] Corruption reason

Arg [b] Calculate checksum flag

Corruption reason:

#define KCBH_GOOD 0 /* block is valid */

#define KCBH_ZERO 1 /* block header was entirely zero on disk */

#define KCBH_BROKEN 2 /* corruption could be from a partial disk write */

#define KCBH_CHKVAL 3 /* The check value for the block failed */

#define KCBH_CORRUPT 4 /* this is the wrong block or is not a data block */

#define KCBH_ZERONG 5 /* all zero block and it is not allowed */

Calculate checksum flag:

The possible values are 1 (Generate Checksum - db_block_checksum is enabled - default value)

0 (do not generate checksum - db_block_checksum=false)

kcbzpbuf_1是该错误的源码函数

ORA-600 [17182] [a] [b] [d] [e]

DESCRIPTION:

Oracle has detected that the magic number in a memory chunk header has been overwritten.

This is a heap (in memory) corruption and there is no underlying data corruption.

The error may occur in the one of the process specific heaps

(the Call heap, PGA heap, or session heap) or in the shared heap (SGA).

1

ORACLE 发现在内存中重要的块头被重新,但是没有基础数据损坏,大部分和数据块或者内存损坏有关系.

ORA-600 [4552] [a] [b] 1 [d] [e]

1

DESCRIPTION:

This assertion is raised because we are trying to unlock the rows in a

block, but receive an incorrect block type.

The second argument is the block type received.

ORACLE尝试对某行进行解锁但是接收到了不正确的数据块类型,Arg [b]是接收到的数据块类型

ORA-600 [2662] [a] [b] [d] [e]

DESCRIPTION:

A data block SCN is ahead of the current SCN.

The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN

stored in a UGA variable.

If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]

internal error.

ARGUMENTS:

Arg [a] Current SCN WRAP

Arg [b] Current SCN BASE

Arg 1 dependent SCN WRAP

Arg [d] dependent SCN BASE

Arg [e] Where present this is the DBA where the dependent SCN came from.

主要的含义就是oracle文件头scn比某个block dependent scn小从而出现该问题

--------------------------------------ORACLE-DBA----------------------------------------

最权威、专业的Oracle案例资源汇总之【案例】Oracle异常恢复 由于坏块和不恰当恢复引起各种ORA-600案例完整思路

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值