一次oracle数据库数据块损坏的处理过程

最新推荐文章于 2023-09-20 22:30:31 发布

hijk139

最新推荐文章于 2023-09-20 22:30:31 发布

阅读量6.5k

点赞数 1

分类专栏： oracle

本文链接：https://blog.csdn.net/hijk139/article/details/19074407

版权

oracle 专栏收录该内容

63 篇文章 0 订阅

订阅专栏

1.1 报错ORA-600 [kcratr_nab_less_than_odr],不能启动

1，现象描述：服务器存储断电，导致数据库down机，再次尝试启动数据库，数据库不能正常启动，数据库报错如下

SQL> alter database open;

alter database open
*

ERROR at line 1:

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

2，查看alert日志，数据库启动时，已开始日志恢复，但恢复时报错ORA-600，除了生成一个trc文件外，没有更详细的信息

Thu Feb 06 23:13:19 2014

ALTER DATABASE OPEN

Beginning crash recovery of 1 threads

 parallel recovery started with 3 processes

Started redo scan

Completed redo scan

 read1580 KB redo, 211 data blocks need recovery

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_4679.trc  (incident=375883):

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

Incident details in:/app/oracle/diag/rdbms/epm/epm1/incident/incdir_375883/epm1_ora_4679_i375883.trc

Aborting crash recovery due to error 600

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_4679.trc:

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_4679.trc:

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

ORA-600 signalled during: ALTER DATABASEOPEN...

Trace dumping is performingid=[cdmp_20140206231320]

3，查看trc文件，通过WARNING可以看出，数据库恢复时认为seq2022（对应日志文件）中的记录，应该恢复到523240，但是恢复到523468却被迫停止了。应该是控制文件和日志文件记录不完全一致导致，注意这几个数据正好对应到kcratr_nab_less_than_odr的几个参数：[kcratr_nab_less_than_odr],[1], [2022], [523240], [523468]

*** 2014-02-06 23:37:11.872

Successfully allocated 3 recovery slaves

Using 45 overflow buffers per recoveryslave

Thread 1 checkpoint: logseq 2022, block 2,scn 9187726619852

 cache-low rba: logseq 2022, block 520079

   on-disk rba: logseq 2022, block 523468, scn 9187726672777

 start recovery at logseq 2022, block 520079, scn 0


*** 2014-02-06 23:37:11.890

Started writing zeroblks thread 1 seq 2022blocks 523240-523247

----------------------------------------------

WARNING! Crash recovery of thread 1 seq2022 is

ending at redo block 523240 but should nothave ended before

redo block 523468

Incident 375884 created, dump file:/app/oracle/diag/rdbms/epm/epm1/incident/incdir_375884/epm1_ora_4971_i375884.trc

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

ORA-00600: internal error code, arguments:[kcratr_nab_less_than_odr], [1], [2022], [523240], [523468], [], [], [], [],[], [], []

4,查询metalink，已有相关案例，主要是由于存储突然断电，导致向在线日志文件中写日志失败。

This Problem is caused by Storage Problemof the Database Files. The Subsystem (eg. SAN) crashed while the Database wasopen. The Database then crashed since the Database Files were not accessibleanymore. This caused a lost Write into the Online RedoLogs and so InstanceRecovery is not possible and raising the ORA-600.

官网中提供的解决办法是通过备份的控制文件恢复数据库，然后resetlog启动，当然如果没有备份，也可以通过参考文件中重建控制文件的方法。

1. If you could restore your StorageEnvironment and the Online RedoLogs from the Time of the crash you can try amanual Recovery followed by a RESETLOGS:

 SQL> startup mount;

 SQL> recover database until cancel usingbackup controlfile;

 -> manually provide Online RedoLogcontaining the last (current) Sequence when asked, eg.

 ORA-00279: change 100000 generated atxx/xx/xxxx xx:xx:xx needed for thread 1

ORA-00289: suggestion :

/flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arc

ORA-00280: change 100000 for thread 1 is insequence #100


Specify log: {<RET>=suggested |filename | AUTO | CANCEL}

/ora/oradata/dbtest/redo04_1.rdo

Log applied.

Media recovery complete.

SQL> alter database open resetlogs;

5,通过metalink上的解决方法，根据提示输入在线日志文件名，数据库却再次报错ORA-00600 [2662] [2139]: internal errorcode, arguments: [2662], [2139], [791626372], [2139],

[oracle@localhost oradata]$ sqlplus"/as sysdba"

 SQL*Plus: Release 11.2.0.1.0 Production onThu Feb 6 17:53:47 2014

 Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount;

ORACLE instance started.

 Total System Global Area 6714322944 bytes

Fixed Size                  2226056 bytes

Variable Size            5033166968 bytes

Database Buffers         1660944384 bytes

Redo Buffers               17985536 bytes

Database mounted.

SQL> recover database until cancel usingbackup controlfile;

ORA-00279: change 9187726668895 generatedat 02/05/2014 01:00:04 needed for

thread 1

ORA-00289: suggestion :/app/archive_log/1_2022_804560942.dbf

ORA-00280: change 9187726668895 for thread1 is in sequence #2022


Specify log: {<RET>=suggested |filename | AUTO | CANCEL}

/app/oracle/oradata/epm/redo03.log

Log applied.

Media recovery complete.

SQL> alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-00603: ORACLE server session terminatedby fatal error

ORA-00600: internal error code, arguments:[2662], [2139], [791626372], [2139],

[791626610], [12583120], [], [], [], [],[], []

ORA-00600: internal error code, arguments:[2662], [2139], [791626371], [2139],

[791626610], [12583120], [], [], [], [],[], []

ORA-01092: ORACLE instance terminated.Disconnection forced

ORA-00600: internal error code, arguments:[2662], [2139], [791626368], [2139],

[791626610], [12583120], [], [], [], [],[], []

Process ID: 9658

Session ID: 1705 Serial number: 5

6，再次查看alert日志，报错基本和上述相同，并提示，生成了一个trc文件，网上查询，确定为resetlogs仍然存在数据库坏块

[oracle@localhost trace]$ tail -falert_epm1.log

ORA-00600: internal error code, arguments:[2662], [2139], [791626371], [2139], [791626610], [12583120], [], [], [], [],[], []

ORA-01092: ORACLE instance terminated.Disconnection forced

ORA-00600: internal error code, arguments:[2662], [2139], [791626368], [2139], [791626610], [12583120], [], [], [], [],[], []

Incident details in:/app/oracle/diag/rdbms/epm/epm1/incident/incdir_411735/epm1_ora_9658_i411735.trc

Errors in file/app/oracle/diag/rdbms/epm/epm1/incident/incdir_411735/epm1_ora_9658_i411735.trc:

ORA-00603: ORACLE server session terminatedby fatal error

ORA-00600: internal error code, arguments:[2662], [2139], [791626372], [2139], [791626610], [12583120], [], [], [], [],[], []

ORA-00600: internal error code, arguments:[2662], [2139], [791626371], [2139], [791626610], [12583120], [], [], [], [],[], []

ORA-01092: ORACLE instance terminated.Disconnection forced

ORA-00600: internal error code, arguments:[2662], [2139], [791626368], [2139], [791626610], [12583120], [], [], [], [],[], []

7.重启启动数据库，数据库已能正常启动，但启动后很快就down掉，

SQL> startup mount;

ORACLE instance started.

Total System Global Area 6714322944 bytes
Fixed Size                  2226056 bytes
Variable Size            5033166968 bytes
Database Buffers         1660944384 bytes
RedoBuffers               17985536 bytes
Database mounted.

SQL> alter database open;

Database altered.

SQL> select * from v$instance; 

select * from v$instance

*

ERROR at line 1:

ORA-03135: connection lost contact

Process ID: 10171

Session ID: 1705 Serial number: 5

1.2 启动后down，报错ORA-600 [4194]

8，查看日志，日志报错,ORA-01595,ora-600 [4194]

Block recovery completed at rba 2.137.16,scn 2139.791646684

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_smon_10046.trc:

ORA-01595: error freeing extent (49) ofrollback segment (6))

ORA-00600: internal error code, arguments:[4194], [], [], [], [], [], [], [], [], [], [], []

9，继续查询metalink，关于ora-600[4194]，文章ID 1428786.1有详细讲解，原因是由于断电或者硬件故障，数据库实例恢复rollback时报错，

Symptoms

The following error is occurring in thealert.log right before the database crashes.

ORA-00600: internal error code, arguments:[4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch hasbeen detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number inUndo block

Arg [b] - Undo record number from Redoblock

 Since we are adding a new undo record toour undo block, we would expect that the new record number is equal to themaximum record number in the undo block plus one. Before Oracle can add a newundo record to the undo block it validates that this is correct. If thisvalidation fails, then an ORA-600 [4194] will be triggered.

Changes

This issue generally occurs when there is apower outage or hardware failure that initially crashes the database. Onstartup, the database does the normal roll forward (redo) and then rollback(undo), this is where the error is generated on the rollback.

10，metalink的解决方案是重建undo表空间，基本思路是先把undo设置为manual管理方式，重建undo，然后重启库即可

SQL> Create pfile='/tmp/corrupt.ora'from spfile ;

vi /tmp/corrupt.ora

*.Undo_management=Manual

启动数据库到mount状态

SQL> Startup mountpfile='/tmp/corrupt.ora'

ORACLE instance started.

 

Total System Global Area 6714322944 bytes

Fixed Size                  2226056 bytes

Variable Size            5033166968 bytes

Database Buffers         1660944384 bytes

Redo Buffers               17985536 bytes

Database mounted.

SQL> Show parameter undo

NAME                                 TYPE        VALUE

----------------------------------------------- ------------------------------

undo_management                      string      MANUAL

undo_retention                       integer     10800

undo_tablespace                      string      UNDOTBS1

SQL> Alter database open ;

Database altered.

SQL> Create rollback segment r01 ;


Rollback segment created.

SQL> Alter rollback segment r01 online ;

 Rollback segment altered.

SQL> Create undo tablespace undotbs_2datafile '/app/oracle/oradata/epm/undotbs_2.dbf' size 200M ;

Tablespace created.

 SQL> alter system set undo_tablespace ='undotbs_2' scope=spfile;

 System altered.

再次重启数据库，此时也能正确启动

SQL> startup

ORACLE instance started.

 Total System Global Area 6714322944 bytes

Fixed Size                  2226056 bytes

Variable Size            5033166968 bytes

Database Buffers         1660944384 bytes

Redo Buffers               17985536 bytes

Database mounted.

Database opened.

1.3 数据库坏块的处理

11，此时，查看数据库日志，有数据库坏块方面的报错，虽然只有只报出一个数据库坏块，实际上可能存在更多的坏块

Thu Feb 06 18:52:57 2014

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_j000_13117.trc  (incident=460023):

ORA-01578: ORACLE data block corrupted(file # 2, block # 31061)

ORA-01110: data file 2:'/app/oracle/oradata/epm/sysaux01.dbf'

Errors in file /app/oracle/diag/rdbms/epm/epm1/trace/epm1_j000_13117.trc  (incident=460024):

ORA-01578: ORACLE data block corrupted(file # 2, block # 31061)

ORA-01110: data file 2:'/app/oracle/oradata/epm/sysaux01.dbf'

Errors in file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_j000_13117.trc  (incident=460025):

ORA-01578: ORACLE data block corrupted(file # , block # )

ORA-01578: ORACLE data block corrupted(file # 2, block # 31061)

ORA-01110: data file 2:'/app/oracle/oradata/epm/sysaux01.dbf

12，验证数据库坏块，常用的有两种方法，两种方法本质上是一致的，本文采用的是rman命令

1），rman命令

run {

allocate channel d1 type disk;

allocate channel d2 type disk;

backup validate check logical database;

}

验证某个文件

RMAN> backup validate datafile 4;

Starting backup at 06-FEB-14

using channel ORA_DISK_1

channel ORA_DISK_1: starting full datafilebackup set

channel ORA_DISK_1: specifying datafile(s)in backup set

input datafile file number=00004name=/app/oracle/oradata/epm/users01.dbf

channel ORA_DISK_1: backup set complete,elapsed time: 00:04:25

List of Datafiles

=================

File Status Marked Corrupt Empty BlocksBlocks Examined High SCN

---- ------ -------------- --------------------------- ----------

4   FAILED 0              46754        477122          9187726907487

 File Name: /app/oracle/oradata/epm/users01.dbf

 Block Type Blocks Failing Blocks Processed

 ---------- -------------- ----------------

 Data       0              292182         

 Index      0              45604          

 Other      2              92580          

 
validate found one or more corrupt blocks

See trace file/app/oracle/diag/rdbms/epm/epm1/trace/epm1_ora_13757.trc for details

Finished backup at 06-FEB-14

2），dbv命令

验证某个数据库文件

dbvFILE=/app/oracle/oradata/epm/users01.dbf

Page 439168 is influx - most likely media corrupt

Corrupt block relative dba: 0x0106b380(file 4, block 439168)

Fractured block found during dbv:

Data in bad block:

 type: 32 format: 2 rdba: 0x0106b380

 lastchange scn: 0x085b.2ef4cb1d seq: 0xe flg: 0x04

 spare1: 0x0 spare2: 0x0 spare3: 0x0

 consistency value in tail: 0x72a7201b

 check value in block header: 0xfd1d

 computed block checksum: 0xb9af

 

Page 439296 is influx - most likely mediacorrupt

Corrupt block relative dba: 0x0106b400(file 4, block 439296)

Fractured block found during dbv:

Data in bad block:

 type: 32 format: 2 rdba: 0x0106b400

 lastchange scn: 0x085b.2ef4cb24 seq: 0x19 flg: 0x04

 spare1: 0x0 spare2: 0x0 spare3: 0x0

 consistency value in tail: 0x72ac2025

 check value in block header: 0xfd1f

 computed block checksum: 0xb9b4

 
DBVERIFY - Verification complete

Total Pages Examined         : 477120

Total Pages Processed (Data) : 292182

Total Pages Failing   (Data) : 0

Total Pages Processed (Index): 45604

Total Pages Failing   (Index): 0

Total Pages Processed (Other): 92578

Total Pages Processed (Seg)  : 0

Total Pages Failing   (Seg) : 0

Total Pages Empty            : 46754

Total Pages Marked Corrupt   : 2

Total Pages Influx           : 2

Total Pages Encrypted        : 0

Highest block SCN            : 791817065 (2139.791817065)

13，此时，全库逻辑导出，保护好现有的数据

expdp userid=\"/ as sysdba\"full=y dumpfile=epm20140206_3.dmp directory=dpdata1 LOGFILE=epm20140206_3.log PARALLEL=3

14，通过rman命令,共检验出22个数据库坏块

SQL> select * fromv$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO

---------- ---------- ---------------------------- ---------

        4     439296          1                  0 FRACTURED

        4     439168          1                  0 FRACTURED

        8    2610431          1                  0 FRACTURED

        8    2547178          1                  0 FRACTURED

        8    2547114          1                  0 FRACTURED

        8    2547050          1                  0 FRACTURED

        8    2546986          1                  0 FRACTURED

        8    2546922          1                  0 FRACTURED

        8    2546890          1                  0 FRACTURED

        8    2546858          1                  0 FRACTURED

        8    2546826          1                  0 FRACTURED

      FILE#    BLOCK#     BLOCKSCORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------------------------- ---------

        8    2546794          1                  0 FRACTURED

        8    2546762          1                  0 FRACTURED

        8    2546730          1                  0 FRACTURED

        8    2546698          1                  0 FRACTURED

        8    2459433          1                  0 FRACTURED

        8    2459305          1                  0 FRACTURED

        8    1596687          1                  0 FRACTURED

        9     876808          1                  0 FRACTURED

        9     662038          1                  0 FRACTURED

        9     345491          1                  0 FRACTURED

        9     281617          1                  0 FRACTURED

15，进一步定位到数据库对象

查看坏块对应的对象

Selecttablespace_name,segment_type,owner,segment_name From dba_extents Wherefile_id=2 and 31061 between block_id and block_id+blocks-1;

根据坏块生产查看对象的sql

select 'Select tablespace_name,segment_type,owner,segment_nameFrom dba_extents Where file_id=' || FILE# || ' and ' || BLOCK# ||  ' between block_id and block_id+blocks-1;'from v$database_block_corruption;

16，对于索引坏块，处理比较简单，直接rebuild即可,如果索引坏块较多，可以选择重建所有索引

重建索引

SQL> alter INDEX EPM.IDX_QRTZ_T_NEXT_FIRE_TIMErebuild online;

对于表比较复杂

1)，可以通过备份恢复数据块，本例由于备份有问题，未成功

RMAN> blockrecover datafile 2 block31061 from backupset;
 
Starting recover at 06-FEB-14

using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)

channel ORA_DISK_1: specifying block(s) torestore from backup set

restoring blocks of datafile 00002

channel ORA_DISK_1: reading from backuppiece /app/backup/rman_bak/rman.c0_20140122_1629_1_EPM.bak

channel ORA_DISK_1: ORA-19870: error whilerestoring backup piece /app/backup/rman_bak/rman.c0_20140122_1629_1_EPM.bak

ORA-19501: read error on file"/app/backup/rman_bak/rman.c0_20140122_1629_1_EPM.bak", block number154496 (block size=8192)

ORA-27061: waiting for async I/Os failed

Linux-x86_64 Error: 5: Input/output error

Additional information: -1

Additional information: 1048576

2)，如果没有备份，则可以通过如下方法跳过坏块，当然会丢失部分数据

SQL> ALTER SESSION SET EVENTS

 2  '10231 TRACE NAME CONTEXTFOREVER, LEVEL 10';

 Session altered.

 SQL> create tableEPM.REQ_RESPSB_SUPPLIER_BAK as select * from EPM.REQ_RESPSB_SUPPLIER;

 Table created.

 SQL> select count(*) fromEPM.REQ_RESPSB_SUPPLIER_BAK;

  COUNT(*)

----------

   188786

17，至此，数据库启动成功，业务也基本恢复正常了

参考文档

metalink:ORA-600 [kcratr_nab_less_than_odr]during Instance Recovery after Database Crash (Doc ID 1299564.1)

metalink:Step by step to resolve ORA-6004194 4193 4197 on database crash (Doc ID 1428786.1)

http://www.eygle.com/archives/2010/05/ora-00600_kcratr1_lostwrt.html

http://www.xifenfei.com/2347.html

http://www.askmaclean.com/archives/ora-6004194%E9%94%99%E8%AF%AF%E4%B8%80%E4%BE%8B.html

hijk139

关注

1
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
一次oracle数据库数据块损坏的处理过程

1.1 报错ORA-600 [kcratr_nab_less_than_odr],不能启动1，现象描述：服务器存储断电，导致数据库down机，再次尝试启动数据库，数据库不能正常启动，数据库报错如下SQL> alter database open;alter database open*ERROR at line 1:ORA-00600: internal error
复制链接

扫一扫