oracle艰难的恢复（二）

最新推荐文章于 2023-06-13 16:25:24 发布

skate

最新推荐文章于 2023-06-13 16:25:24 发布

阅读量5.1k

点赞数

分类专栏： Oracle Backup && Recover 文章标签： oracle 数据库 structure table file database

本文链接：https://blog.csdn.net/wyzxg/article/details/5898120

版权

Oracle Backup && Recover 专栏收录该内容

19 篇文章 0 订阅

订阅专栏

author：skate

time：2010-09-20

前面已经说了192已经恢复失败，现在要说说245数据库恢复。

192数据库已经没有办法了，那接下来看看245的数据库是否可以恢复

由于一个失误，操作log没有保留下来，所以只能凭记忆了，一些比较重要的错误记得，小错误可能就不记得，反正恢复的过程挺不容易啊

通过查看监听的状态及数据库启动的错误提示：类似文件不存在。类似下面的错误

ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9925
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 13: Permission denied

但是在os层还能看见相应的数据文件，我于是重启了245服务器，在启动时候提示文件系统不一致，需要检查验证，用fsck命令检查修复文件系统，遇到类似下面的错误:

Illegal block #16 (812151909） in inode 8601655 , cleared.

too many illegal blocks in inode 1721953 clear inode <y> ?

.......

是交互式询问，我的回答都是‘Y’,最后reboot系统，这回系统启动是正常的，然后在启动数据库，报类似如下的错误：

ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/free/oracle/oradata/orcl/system01.dbf'
ORA-01207: file is more recent than control file - old control file

这个错误是oracle数据库比较常见的错误，引起的原因很多，像由于数据库突然掉电（或shutdown abort），那再次启动数据库就会报这个错误。根据错误提示是控制文件太旧，控制文件对于oracle数据库来说是非常重要的，他就像军队的指挥部，记录了数据库的全部信息，如数据文件，日志文件，归档文件，备份文件等。那为什么oracle在启动的时候会报这个错误呢？要是理解oracle启动原理，就比较好理解了，因为oracle在启动的时候要做数据库的一致性检查。而控制文件是实时记录数据库的动态信息，如检查点等，如果数据库在突然的异常，导致当前的数据库信息无法实时的更新到控制文件中，那控制文件相对于数据库就滞后了。那oracle在启动检查时发现不一致，就会报错了。

这个错误通常有两种方法：

1. 一个是数据库为核心，来恢复控制文件，最后达到数据库一致；

2. 另一个是以控制文件为核心，来恢复数据库，最后达到数据库一直，但是这样一来就有可能丢失数据。

为了保证不丢失数据，最好的方法是数据文件为核心，来恢复控制文件。

方法一：用旧的控制文件恢复，最后用resetlogs打开数据库。

具体步骤：

1：startup mount;
2：recover database using backup controlfile until cancel; ///然后根据情况，指定archive log和redo file。
3：alter database open resetlogs;

因为这里丢失一个archivelog，alertlog日志里显示如下：

Media Recovery Log /oracle/product/9.2.0/arch/1_9998.dbf
Errors with log /oracle/product/9.2.0/arch/1_9998.dbf.

所以无法恢复成功，记得dbsnake的blog说可以跳过丢失归档恢复，哪天测试下。既然不成功，那就用第二个方法

方法二：创建控制文件，然后open数据库。

具体步骤：

1.startup mount
2.alter database backup controlfile to trace //或者：alter database backup controlfile to '/oracle/backup/245new.sql'
3.在用trace文件里，提炼创建controlfile的sql或脚本
4.shutdown
5.startup nomount //再次用第2步骤创建控制文件
6.recover database
7.alter database open

结果还是提示丢失一个archivelog，和前面类似，看样子那个log是很长时间以前的了，那怎么办，无法进行丢失归档log恢复，那就只能让数据库不做一致性检查，通过隐含参数（ _allow_resetlogs_corruption= TRUE）打开数据库，结果打开后数据库又报：

Errors in file /u01/app/oracle/admin/orcl/udump/orcl_ora_19297.trc:
ORA-00600: internal error code, arguments: [2662], [0], [2168198369], [0], [2168310274], [8388617], [], []
Sun Sep 19 13:32:37 2010
Errors in file /u01/app/oracle/admin/orcl/udump/orcl_ora_19297.trc:
ORA-00600: internal error code, arguments: [2662], [0], [2168198369], [0], [2168310274], [8388617], [], []

ORA-600 [2662] "Block SCN is ahead of Current SCN"，说明当前数据库的数据块的SCN大于当前的SCN，
主要是和存储在UGA变量中的dependent SCN进行比较，如果当前的SCN小于它，数据库就会产生这个ORA-600 [2662]的错误了。

这个错误一共有五个参数，分别代表不同的含义，

ORA-600 [2662] [a] [b] [c] [d] [e]

Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

产生这个600的原因一般有以下几个：

1）使用隐含参数_ALLOW_RESETLOGS_CORRUPTION后resetlogs打开数据库
2）硬件错误引起数据库没法写控制文件和重做日志文件
3）错误的部分恢复数据库
4）恢复了控制文件但是没有使用recover database using backup controlfile进行恢复
5）数据库crash后设置了_DISABLE_LOGGING隐含参数
6）在并行服务器环境中DLM存在问题

我这个里是因为用了_ALLOW_RESETLOGS_CORRUPTION，强制打开数据库才会出现这个错误。

可以通过Oracle的内部事件来调整并增进SCN.

1.通过immediate trace name方式(在数据库Open状态下)

alter session set events 'IMMEDIATE trace name ADJUST_SCN level x';

2.通过10015事件(在数据库无法打开，mount状态下)

alter session set events '10015 trace name adjust_scn level x';

注:level 1为增进SCN 10亿 (1 billion) (1024*1024*1024),通常Level 1已经足够。也可以根据实际情况适当调整。

我通过执行 alter session set events '10015 trace name adjust_scn level 10';

然后重新启动数据库（带隐含参数）又报如下的错误：

ORA-00600: internal error code, arguments: [4193], [12410], [12320], [], [], [], [], []

The ORA-600[4194]：
is signalled when the transaction table and the undo block is out of sync. Maybe your rollback segments corrupted.
A mismatch has been detected between Redo records and rollback (Undo) records.
This error may indicate a rollback segment corruption.This may require a recovery from a database backup depending on the situation.

说明：600-[4000]到[5000]的错误都是和rollback 相关的

对于oracle回滚段相关的错误，可以用如下隐含参数

_offline_rollback_segments：Oracle还会尝试进行数据块的延迟清除，
_corrupted_rollback_segments： Oracle假设事务已经回滚，而不会进行其他的检查

我这里用的是_corrupted_rollback_segments，那这个参数里的回滚段都填什么呢？对于启动的数据库可以查视dba_rollback_segs;对于没有启动的数据库，可以通过linux操作系统命令strings从system表空间的数据文件中获取回滚段信息。windows系统的话，可以用ftp把system数据文件传到linux系统，然后再用strings命令查看，类似：

> strings system01.dbf | grep _SYSSMU

通过添加两个隐含参数，数据库可以起来了

_allow_resetlogs_corruption= TRUE
_corrupted_rollback_segments= _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$,
_SYSSMU1$, _SYSSMU12$, _SYSSMU13$, _SYSSMU14$, _SYSSMU15$, _SYSSMU16$, _SYSSMU17$, _SYSSMU18$, _SYSSMU19$, _SYSSMU20$, _SYSSMU11$

接下来就是重建回滚表空间，用如下的命令删除回滚表空间

sql> DROP TABLESPACE UNDOTBS1 INCLUDING CONTENTS AND DATAFILES;

如果不删除的话，可以通过如下语句把标记“ NEEDS RECOVERY ”的回滚段删除

sql> drop rollback segment _SYSSMU1$
....

把问题的回滚段都删除了，也可以不用重创建undo表空间，只是以后用回滚段是从你删除的位置开始计数，例如我删除到_SYSSMU10$，那下次启动就是从_SYSSMU11$开始，可以会有潜在为风险，但我觉得不会有大碍的，我这里就没有重建undo表空间

sql> DROP TABLESPACE UNDOTBS1 INCLUDING CONTENTS AND DATAFILES;

再新建undo空间

SQL> CREATE UNDO TABLESPACE UNDOTBS1 DATAFILE '/oracle/oradata/orcl/UNDOTBS1.DBF' SIZE2000M;

因为数据库是强制打开了，所以问题一大堆。在我删除回滚段是就如下错误

ORA-00600: internal error code, arguments: [25012], [17], [76], [], [], [], [], []

还报如下错误：
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 3724)
ORA-01110: data file 1: '/oracle/oradata/orcl/system01.dbf'

如下是ORA-600 [25012]的metalink说明：
#######################################################################3
ORA-600 [25012] "Relative to Absolute File Number Conversion Error"

PURPOSE:
This article discusses the internal error "ORA-600 [25012]", what
it means and possible actions. The information here is only applicable
to the versions listed and is provided only for guidance.

ERROR:
ORA-600 [25012] [a]

VERSIONS:
versions 8.0 to 10.1

DESCRIPTION:

We are trying to generate the absolute file number given a tablespace
number and relative file number and cannot find a matching file number
or the file number is zero.

ARGUMENTS:
Arg [a] Tablespace Number
Arg Relative file number

FUNCTIONALITY:
KERNEL FILE MANAGEMENT TABLESPACE COMPONENT

IMPACT:
POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:

The possibility of physical corruption exists.

Obtain the trace files and alert.log for this error and log an iTAR
with Oracle Support Services for diagnosis.

##################################################################

通过说明可以看出是文件号的相对和绝对号转换的错误，一般是因为PHYSICAL CORRUPTION导致

我重新启动数据库，如下错误不停的出现：

ORA-00600: internal error code, arguments: [25012], [17], [76], [], [], [], [], []

和

ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 3724)
ORA-01110: data file 1: '/oracle/oradata/orcl/system01.dbf'

大概过了半小时，又报如下错误：

ORA-00600: internal error code, arguments: [6002], [6], [36], [1], [0], [], [], []

如下是ORA-600 [6002]的metalink说明：
###########################################################
Doc ID: Note:47449.1
Subject: ORA-600 [6002] "Index block check"
Type: REFERENCE
Status: PUBLISHED
Content Type: TEXT/PLAIN
Creation Date: 02-OCT-1997
Last Revision Date: 15-OCT-2004

Note: For additional ORA-600 related information please read [NOTE:146580.1]

PURPOSE:

This article discusses the internal error "ORA-600 [6002]", what it means and possible actions.
The information here is only applicable to the versions listed and is provided only for guidance.

ERROR:

ORA-600 [6002] [a] [b] [c] [d]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:

Oracle was trying to insert a key and key data into a b*tree index. In order to do this, it had to
first find the correct leaf block to do the insert. Once the correct leaf block is retrieved, Oracle validates t
he block by checking the data size and number of columns in the key. If there is a mismatch then ORA-600 [6002] is reported.

ARGUMENTS:
Arg [a] Number of bytes in keydata
Arg [b] Number of bytes in the index layer of the leaf header
Arg [c] Number of columns in index search key structure
Arg [d] Number of columns in the index layer fo the leaf header

FUNCTIONALITY:

Kernel Data layer Index

IMPACT:

PROCESS FAILURE Possible INDEX corruption. No corruption to underlying data.

SUGGESTIONS:

Check for problem indexes associated with any tables involved in the failing SQL statement.

ANALYZE TABLE <tablename> VALIDATE STRUCTURE CASCADE;

Drop and recreate any indexes that are reported as being a problem. If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.

Known issues:

Bug# 3785200 * See [NOTE:3785200.8] * This bug is alerted in [NOTE:281047.1] Corruption possible in
automatic space managed segments Fixed: 9.2.0.6, 10.1.0.2 Bug# 1475310 * See [NOTE:1475310.8] * This bug is alerted
in [NOTE:125149.1] ALTER INDEX .. REBUILD ONLINE/ MOVE IOT ONLINE can corrupt the index Fixed: 8.1.7.1, 9.0.1.0
Bug# 1475310 ALTER INDEX .. REBUILD ONLINE/ MOVE IOT ONLINE can corrupt the index Fixed: 8.1.7.1, 9.0.1.0

###########################################################
从这个说明上可以看到是索引有坏块了。

system表空间的数据块损修复坏比较麻烦，看损坏的是什么，如果是索引还好，如果是核心的数据表那就难了。对坏块的恢复
思想就是跳过坏块或者用bbed修正坏块，

如果损坏是表，丢数据可以非常大，一种方法是跳过坏块，不让oracle检测坏块

如果有rman备份就简单了，一条命令就ok

RMAN>blockrecover datafile 5 block 12;

可是这里没有备份啊，那就先确定损坏的是什么

SQL> select segment_name, segment_type, owner, tablespace_name
2    from dba_extents et
3   where et.file_id = &fileid
4     and &bad_block_id between et.block_id and et.block_id+blocks-1
5 ;

SEGMENT_NAME SEGMENT_TYPE OWNER TABLESPACE_NAME
------------------------- ------------------ ------------------------------ ------------------------------
SMON_SCN_TO_TIME CLUSTER SYS SYSTEM

SQL>

SMON_SCN_TO_TIME是SMON_SCN_TIME表的基簇，SMON_SCN_TIME表用以记录数据库中scn对应的时间戳。我们直接查看用
以创建数据字典的sql.bsq文件（$ORACLE_HOME/rdbms/admin/sql.bsq），可以进一步了解他们的结构:

[root@localhost ~]# cat /u01/app/oracle/product/9.2.0/rdbms/admin/sql.bsq |grep -A 34 "create cluster smon_scn_to_time"
create cluster smon_scn_to_time (
thread number                         /* the thread number */
)
/
create index smon_scn_to_time_idx on cluster smon_scn_to_time
/
create table smon_scn_time (
thread number,                        /* the thread number */
time_mp number,                        /* time this recent scn represents */
time_dp date,                          /* time converted into oracle date */
scn_wrp number,                        /* scn.wrp */
scn_bas number                         /* scn.bas */
) cluster smon_scn_to_time (thread)
/
create table aw$
(awname varchar2("M_IDEN"),              /* name of AW */
owner# number not null,                /* owner of AW */
awseq# number not null)                /* aw sequence number */
/
create unique index aw_ind$ on aw$(awname, owner#)
/
create table ps$
( awseq# number not null,                 /* aw sequence number */
psnumber number(10),                    /* pagespace number */
psgen number(4),                        /* pagespace generation */
mapoffset number,                       /* offset of map */
maxpages number,                        /* max pages in ps */
almap raw(8),                           /* pointer to almap */
header raw(200),                        /* internal header */
gelob blob)                             /* erase list */
lob (gelob) store as (disable storage in row)
/
create unique index i_ps$ on ps$ (awseq#, psnumber, psgen)
/
create sequence psindex_seq$ /* sequence for pagespace index */
[root@localhost ~]#

可以进一步通过validate验证所有这些对象，如下

SQL> analyze table SMON_SCN_TIME validate structure;
SQL> analyze table SMON_SCN_TIME validate structure cascade;
和
SQL> analyze cluster SMON_SCN_TO_TIME validate structure;
SQL> analyze cluster SMON_SCN_TO_TIME validate structure cascade;

我的验证结果是：

SYS@orcl> analyze table smon_scn_time validate structure;
analyze table smon_scn_time validate structure
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 3724)
ORA-01110: data file 1: '/oracle/oradata/orcl/system01.dbf'

SYS@orcl> analyze table smon_scn_time validate structure cascade;
analyze table smon_scn_time validate structure cascade
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 3724)
ORA-01110: data file 1: '/oracle/oradata/orcl/system01.dbf'

SYS@orcl> analyze cluster SMON_SCN_TO_TIME validate structure;
analyze cluster SMON_SCN_TO_TIME validate structure
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 3724)
ORA-01110: data file 1: '/oracle/oradata/orcl/system01.dbf'

我查询表smon_scn_time，也报如上错误
select * from sys.smon_scn_time;

报错如下：

ORA-01578: ORACLE data block corrupted (file # 1, block # 3724)
ORA-01110: data file 1: '/oracle/oradata/orcl/system01.dbf'

因为报ORA-01578，说明数据块已经被标识为"software corrupt"，查询dba_tables.skip_corrupt字段
也可以验证，如果没被标识坏块，可以通过dbms_repair完成。

那就通过dbms_repair.skip_corrupt_blocks或者10231事件，让扫描是跳过坏块，可以抢救完好数据

1）dbms_repair.skip_corrupt_blocks：
        execute dbms_repair.skip_corrupt_blocks('&schema','&tablename');

    清除标志：
        execute dbms_repair.skip_corrupt_blocks('&schema','&tablename',flags=>dbms_repair.noskip_flag);
2）通过10231事件：

alter session set events '10231 trace name context forever,level 10';

通过执行会话级10231，跳过坏块

SYS@orcl> alter session set events '10231 trace name context forever,level 10'

这样就可以读取smon_scn_time所有完好数据块的内容

create table smon_scn_time_bak as
select * from sys.smon_scn_time

删除簇
SYS@orcl> drop cluster smon_scn_to_time including tables cascade constraints;

Cluster dropped.

创建簇
SYS@orcl> create cluster smon_scn_to_time (
2 thread number
3 );

Cluster created.

创建簇索引
SYS@orcl> create index smon_scn_to_time_idx on cluster smon_scn_to_time;

Index created.

创建表

SYS@orcl> create table smon_scn_time (
2    thread number,                        /* the thread number */
3    time_mp number,                        /* time this recent scn represents */
4    time_dp date,                          /* time converted into oracle date */
5    scn_wrp number,                        /* scn.wrp */
6    scn_bas number                         /* scn.bas */
7 ) cluster smon_scn_to_time (thread);

Table created.

把数据copy回来
insert into sys.smon_scn_time select * from smon_scn_time_bak

重启数据库
SYS@orcl> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SYS@orcl> startup
ORACLE instance started.

Total System Global Area 5253640632 bytes
Fixed Size                   750008 bytes
Variable Size            3154116608 bytes
Database Buffers         2097152000 bytes
Redo Buffers                1622016 bytes
Database mounted.
Database opened.

到此问题算是ok，245数据库可以正常使用了

如果受损的是索引，只需alter index .....rebuild online 即可:

1>非分区索引：
对于非分区索引，可以采用删除、重建等方式；重建中，如果不带online,会依据旧的索引（包含坏块的）重建索

   2>分区索引：
     可以采用：
      alter index xxx rebuild paration yyyy；--因为分区索引的重建不会使用旧索引，所以不需要online参数：

查看该分区索引的所在分区：

--查看索引所属表

select table_owner,table_name, partitioned
from dba_indexes
where owner='&OWNER' and index_name='&segment_name';

--查看分区索引所在的分区：

select segment_name, segment_type, owner, tablespace_name
from dba_extents et
where et.file_id = &fileid
and &bad_block_id between et.block_id and et.block_id+blocks-1

如果损坏的是索引，除了处理索引外，还得看看是否有与该索引想关联的约束：

            select owner, constraint_name, constraint_type, table_name
         from dba_constraints
        where r_owner = '&TABLE_OWNER'
          and r_constraint_name = '&INDEX_NAME';

小知识点：

簇的简单说明：

cluster 可以将两个或多个表捆绑在一起,,cluster是一种数据库结构,在这个结构中,可以将两个或者多
个表储存在相同的数据块或段中,加入到cluster中的每个表行将物理地存储在相同的块中,好像这些表在
cluster key 处连接起来了一样

在创建cluster的时候有两种选择:

* index cluster(默认,需要在其中创建cluster index)
* hash cluster

创建cluster的步骤:

创建簇
SQL> create cluster ab_cluster (sex char(2) ) tablespace users;

Cluster created

创建簇表
SQL> create table aaa(id varchar2(4) ,name varchar2(4),sex char(2),age number(2)) cluster ab_cluster (sex);

Table created

SQL> create index ab_cluster_idx on cluster ab_cluster tablespace users;

Index created