引题:朋友一时兴起使用了rm**,删除了oracle数据文件后找我帮忙,我在帮朋友恢复数据库时,遇到了当recover时,报错不能找到28739号归档日志,这样我就不能同步scn,更不能打开数据库了。这是归档日志不连续的典型案例,我最后告诉他要做好心理准备。事情还没有完,这个真实案例引发了我的思考,如果当时在朋友没有做rman拯救措施的情况下,可不可能不使用rman即可恢复数据文件呢!最后我找到了答案:)

案例

1.系统solaris SunOS TJLT-YDWG6 5.9 Generic_122300-25 sun4u sparc SUNW,Sun-Fire-V890

DB  OracleDatabase 10g Enterprise Edition Release 10.2.0.1.0 - 64bi

2.案情描述

现场工程师使用了rm -rf *.dbf命令把所有的数据文件全部删除了

现在有5月4日的备份

restore database until time "to_date('2012-05-04 12:00:00','yyyy-mm-dd hh24:mi:ss')";  进行恢复显示finish restore complete没有问题已经把文件 复制回来了

进行同步

RMAN> recover database until time "to_date('2012-05-04 11:00:00','yyyy-mm-dd hh24:mi:ss')";                

Starting recover at 2012-07-26 14:02:42

using channel ORA_DISK_1

starting media recovery

unable to find archive log

archive log thread=1 sequence=28739  缺少28739号的归档日志,导致undotbs01.dbf文件不一致

Oracle Error:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 2 needs more recovery to be consistent

ORA-01110: data file 2: '/opt/oradata/kpidb/undotbs01.dbf'

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of recover command at 07/26/2012 14:02:51

RMAN-06054: media recovery requesting unknown log: thread 1 seq 28739 lowscn 1513525474

Leonarding

2012.7.26

在我们工作中可能会经常发生这样类似的突发状况,在遇到此情况下首先要做的就是冷静,上面发生的问题到了我这里之后,我就发现数据库已经变成了mount状态,在使用文件句柄方式恢复数据文件已经为时已晚,所以我采用了常规的恢复方式,没想到啊没想到,归档日志还不全,立马我整个人都“斯巴达”了,最后告诉朋友做DBA是需要勇气的。

下面我用自己的测试库演示一下操作系统rm级别的删除数据文件后,数据库仍然处于open状态的时候使用文件句柄来恢复被rm删除的数据文件,并最终顺利打开数据库的实验!我是一个比较严谨的人,所以一上来我先做个了“压缩全库备份”做到有备无患

描述一下场景:

操作系统:Enterprise Linux Enterprise Linux AS release 4 (October Update 8)

数据库版本:Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

一、备份重于一切

因此在演示之前我们先做一下备份

RMAN> show all;

RMAN配置参数区,如下都是默认值

RMAN configuration parameters are:

冗余配置保留政策:冗余数是1

CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default

开启增量备份:关

CONFIGURE BACKUP OPTIMIZATION OFF; # default

默认备份设备是磁盘

CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default

控制文件自动备份:关

CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default

控制文件自动备份目录和格式:%F 【备份设备:Disk】

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default

备份的并行度:1,备份类型为备份集

CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default

数据文件采用复制方式备份

CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

归档日志采用复制方式备份

CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

最大值:无限制

CONFIGURE MAXSETSIZE TO UNLIMITED; # default

加密数据库:关

CONFIGURE ENCRYPTION FOR DATABASE OFF; # default

加密算法采用AES128

CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default

归档日志删除策略:空

CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default

快照控制文件名

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/10.2.0/db_1/dbs/snapcf_LEO.f'; # default

配置RMAN默认备份介质存放目录到/home/oracle/backup

RMAN> configure channel device type disk format '/home/oracle/backup/DB_LEO_%U';

new RMAN configuration parameters:

CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT  '/home/oracle/backup/DB_LEO_%U';

new RMAN configuration parameters are successfully stored

配置控制文件自动备份并保存到/home/oracle/backup/control目录

RMAN> configure controlfile autobackup on;             启动控制文件自动备份

new RMAN configuration parameters:

CONFIGURE CONTROLFILE AUTOBACKUP ON;

new RMAN configuration parameters are successfully stored

RMAN> configure controlfile autobackup format for device type disk to 设置控制文件自动备份目录和格式

'/home/oracle/backup/control/cf_LEO_%F';

new RMAN configuration parameters:

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO

'/home/oracle/backup/control/cf_LEO_%F';

new RMAN configuration parameters are successfully stored

显示配置后RMAN环境变量

RMAN configuration parameters are:

CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default

CONFIGURE BACKUP OPTIMIZATION OFF; # default

CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default

CONFIGURE CONTROLFILE AUTOBACKUP ON;

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/home/oracle/backup/control/cf_LEO_%F';

CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default

CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT  '/home/oracle/backup/DB_LEO_%U';

CONFIGURE MAXSETSIZE TO UNLIMITED; # default

CONFIGURE ENCRYPTION FOR DATABASE OFF; # default

CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default

CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/10.2.0/db_1/dbs/snapcf_LEO.f'; # default

使用rman命令压缩备份数据库

RMAN> backup as compressed backupset full database format

2> '/home/oracle/backup/full_bk1_%u%p%s.rmn' include current controlfile

3> plus

4> archivelog format '/home/oracle/backup/arch_bk1_%u%p%s.rmn' delete all input;(删除备份过的旧归档日志)

Starting backup at 26-JUL-12

current log archived

allocated channel: ORA_DISK_1

channel ORA_DISK_1: sid=318 devtype=DISK

channel ORA_DISK_1: starting compressed archive log backupset

channel ORA_DISK_1: specifying archive log(s) in backup set

input archive log thread=1 sequence=5 recid=1 stamp=784059130

input archive log thread=1 sequence=6 recid=6 stamp=788723549

input archive log thread=1 sequence=7 recid=4 stamp=788723547

input archive log thread=1 sequence=8 recid=5 stamp=788723548

channel ORA_DISK_1: starting piece 1 at 26-JUL-12

channel ORA_DISK_1: finished piece 1 at 26-JUL-12

piece handle=/home/oracle/backup/arch_bk1_01nh3iuq11.rmn tag=TAG20120726T235048 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:16

channel ORA_DISK_1: deleting archive log(s)

archive log filename=/home/oracle/arch/LEO/1_5_784052660.dbf recid=1 stamp=784059130

archive log filename=/home/oracle/arch/LEO/1_6_784052660.dbf recid=6 stamp=788723549

archive log filename=/home/oracle/arch/LEO/1_6_784052660.dbf recid=2 stamp=784059169

archive log filename=/home/oracle/arch/LEO/1_7_784052660.dbf recid=4 stamp=788723547

archive log filename=/home/oracle/arch/LEO/1_7_784052660.dbf recid=3 stamp=784059221

archive log filename=/home/oracle/arch/LEO/1_8_784052660.dbf recid=5 stamp=788723548

channel ORA_DISK_1: starting compressed archive log backupset

channel ORA_DISK_1: specifying archive log(s) in backup set

input archive log thread=1 sequence=1 recid=8 stamp=789695445

channel ORA_DISK_1: starting piece 1 at 26-JUL-12

channel ORA_DISK_1: finished piece 1 at 26-JUL-12

piece handle=/home/oracle/backup/arch_bk1_02nh3ivb12.rmn tag=TAG20120726T235048 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:09

channel ORA_DISK_1: deleting archive log(s)

archive log filename=/home/oracle/arch/LEO/1_1_788725257.dbf recid=8 stamp=789695445

channel ORA_DISK_1: starting compressed archive log backupset

channel ORA_DISK_1: specifying archive log(s) in backup set

input archive log thread=1 sequence=1 recid=7 stamp=788725258

channel ORA_DISK_1: starting piece 1 at 26-JUL-12

channel ORA_DISK_1: finished piece 1 at 26-JUL-12

piece handle=/home/oracle/backup/arch_bk1_03nh3ivk13.rmn tag=TAG20120726T235048 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02

channel ORA_DISK_1: deleting archive log(s)

archive log filename=/home/oracle/arch/LEO/1_1_788723547.dbf recid=7 stamp=788725258

Finished backup at 26-JUL-12


Starting backup at 26-JUL-12

using channel ORA_DISK_1

channel ORA_DISK_1: starting compressed full datafile backupset

channel ORA_DISK_1: specifying datafile(s) in backupset

input datafile fno=00001 name=/u01/app/oracle/oradata/LEO/file1/system01.dbf

input datafile fno=00003 name=/u01/app/oracle/oradata/LEO/file1/sysaux01.dbf

input datafile fno=00002 name=/u01/app/oracle/oradata/LEO/file1/undotbs01.dbf

input datafile fno=00005 name=/u01/app/oracle/oradata/LEO/file1/example01.dbf

input datafile fno=00004 name=/u01/app/oracle/oradata/LEO/file1/users01.dbf

channel ORA_DISK_1: starting piece 1 at 26-JUL-12

channel ORA_DISK_1: finished piece 1 at 26-JUL-12

piece handle=/home/oracle/backup/full_bk1_04nh3ivo14.rmn tag=TAG20120726T235119 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:04:56

channel ORA_DISK_1: starting compressed full datafile backupset

channel ORA_DISK_1: specifying datafile(s) in backupset

including current control file in backupset

channel ORA_DISK_1: starting piece 1 at 26-JUL-12

channel ORA_DISK_1: finished piece 1 at 26-JUL-12

piece handle=/home/oracle/backup/full_bk1_05nh3j9015.rmn tag=TAG20120726T235119 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02

Finished backup at 26-JUL-12


Starting backup at 26-JUL-12

current log archived

using channel ORA_DISK_1

channel ORA_DISK_1: starting compressed archive log backupset

channel ORA_DISK_1: specifying archive log(s) in backup set

input archive log thread=1 sequence=2 recid=9 stamp=789695779

channel ORA_DISK_1: starting piece 1 at 26-JUL-12

channel ORA_DISK_1: finished piece 1 at 26-JUL-12

piece handle=/home/oracle/backup/arch_bk1_06nh3j9316.rmn tag=TAG20120726T235619 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02

channel ORA_DISK_1: deleting archive log(s)

archive log filename=/home/oracle/arch/LEO/1_2_788725257.dbf recid=9 stamp=789695779

Finished backup at 26-JUL-12


Starting Control File and SPFILE Autobackup at 26-JUL-12

piece handle=/home/oracle/backup/control/cf_LEO_c-1558319476-20120726-00 comment=NONE

Finished Control File and SPFILE Autobackup at 26-JUL-12

ll       看一下我们的备份文件已经生成,下面可以安心做实验啦

-rw-r----- 1 oracle oinstall 11180032 Jul 26 23:51 arch_bk1_01nh3iuq11.rmn

-rw-r----- 1 oracle oinstall 7031808 Jul 26 23:51 arch_bk1_02nh3ivb12.rmn

-rw-r----- 1 oracle oinstall 1030144 Jul 26 23:51 arch_bk1_03nh3ivk13.rmn

-rw-r----- 1 oracle oinstall    8704 Jul 26 23:56 arch_bk1_06nh3j9316.rmn

drwxr-xr-x 2 oracle oinstall    4096 Jul 26 23:56 control

-rw-r----- 1 oracle oinstall 57221120 Jul 26 23:56 full_bk1_04nh3ivo14.rmn

-rw-r----- 1 oracle oinstall 1097728 Jul 26 23:56 full_bk1_05nh3j9015.rmn

二、模拟数据文件删除

SYS@LEO> select status from v$instance;  查看数据库是否是open状态,是open没有问题

STATUS

------------

OPEN

SYS@LEO> select name from v$datafile;   查看存在的数据文件

NAME

--------------------------------------------------------------------------------

/u01/app/oracle/oradata/LEO/file1/system01.dbf

/u01/app/oracle/oradata/LEO/file1/undotbs01.dbf

/u01/app/oracle/oradata/LEO/file1/sysaux01.dbf

/u01/app/oracle/oradata/LEO/file1/users01.dbf

/u01/app/oracle/oradata/LEO/file1/example01.dbf

SYS@LEO> host rm /u01/app/oracle/oradata/LEO/file1/example01.dbf 删除example01.dbf数据文件

我们检查一下删除example01.dbf文件后数据库是不是还处在open状态

SYS@LEO> select status from v$instance;   呵呵 就像我们在开始说的一下,open木有问题

STATUS

------------

OPEN

我们检查一下删除example01.dbf文件后表空间处于什么状态呢?

SYS@LEO> select TABLESPACE_NAME,status from dba_tablespaces;

TABLESPACE_NAME               STATUS

------------------------------ ---------

SYSTEM                        ONLINE

UNDOTBS                       ONLINE

SYSAUX                         ONLINE

TEMPTS1                       ONLINE

USERS                          ONLINE

TEMP1                         ONLINE

TEMP2                         ONLINE

EXAMPLE                       ONLINE     我们看到EXAMPLE还是联机状态,为什么是这样呢,我们不是已经把文件删除了嘛!对喽可能有人已经想到了,我们是从操作系统层面上直接删除的(没有通过数据库删除),所以数据库此时还认为这个文件没有变化,数据字典中记录的还是完好的状态

conn ls/ls                                          切换用户

数据文件被删除,所以创建表失败(当我们往数据文件上写数据时,数据库才会检查文件状态)

LS@LEO> create table t1 tablespace example as select * from liusheng;

create table t1 tablespace example as select * from liusheng

ERROR at line 1:

ORA-01116: error in opening database file 5

ORA-01110: data file 5: '/u01/app/oracle/oradata/LEO/file1/example01.dbf'

ORA-27041: unable to open file

Linux Error: 2: No such file or directory

Additional information: 3

注:如果我们在其他数据文件上创建表时,是可以正常创建的,因为其他数据文件没有被破坏

例LS@LEO> create table t2 tablespace users as select * from liusheng;

Table created.

三、通过文件句柄恢复数据文件

查询dbwr进程pid

[oracle@secdb1 bdump]$ ps -ef|grep dbw|grep -v grep

oracle  26745    1 0 Jul26 ?       00:00:01 ora_dbw0_LEO

注:1.可能有的朋友对 grep –v grep这句不是很明白,-v选项反向过滤oracle   2405  421 0 01:03 pts/9   00:00:00 grep dbw这行记录的意思,只保留ora_dbw0_LEO这行记录,方便观察

2. dbwr进程会打开所有访问数据文件的句柄。在proc目录中可以查到,目录名是进程PID,fd(file discription)表示文件描述符。

cd /proc/26745/fd

ls -l

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 0 -> /dev/null

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 1 -> /dev/null

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 10 -> /u01/app/oracle/admin/LEO/adump/ora_26737.aud

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 11 -> /dev/zero

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 12 -> /dev/zero

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 13 -> /u01/app/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 14 -> /u01/app/oracle/product/10.2.0/db_1/dbs/hc_LEO.dat

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 15 -> /u01/app/oracle/product/10.2.0/db_1/dbs/lkLEO

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 16 -> /u01/app/oracle/oradata/LEO/file1/control01.ctl

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 17 -> /u01/app/oracle/oradata/LEO/file2/control02.ctl

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 18 -> /u01/app/oracle/oradata/LEO/file3/control03.ctl

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 19 -> /u01/app/oracle/oradata/LEO/file1/system01.dbf

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 2 -> /dev/null

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 20 -> /u01/app/oracle/oradata/LEO/file1/undotbs01.dbf

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 21 -> /u01/app/oracle/oradata/LEO/file1/sysaux01.dbf

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 22 -> /u01/app/oracle/oradata/LEO/file1/users01.dbf

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 23 -> /u01/app/oracle/oradata/LEO/file1/example01.dbf (deleted)

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 24 -> /u01/app/oracle/oradata/LEO/file1/temp01.dbf

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 25 -> /u01/app/oracle/oradata/LEO/file1/temp1_01.dbf

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 26 -> /u01/app/oracle/oradata/LEO/file1/temp1_02.dbf

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 3 -> /dev/null

lr-x------ 1 oracle oinstall 64 Jul 27 01:09 4 -> /dev/null

l-wx------ 1 oracle oinstall 64 Jul 27 01:09 5 -> /u01/app/oracle/admin/LEO/udump/leo_ora_26737.trc

l-wx------ 1 oracle oinstall 64 Jul 27 01:09 6 -> /u01/app/oracle/admin/LEO/bdump/alert_LEO.log

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 7 -> /u01/app/oracle/product/10.2.0/db_1/dbs/lkinstLEO (deleted)

l-wx------ 1 oracle oinstall 64 Jul 27 01:09 8 -> /u01/app/oracle/admin/LEO/bdump/alert_LEO.log

lrwx------ 1 oracle oinstall 64 Jul 27 01:09 9 -> /u01/app/oracle/product/10.2.0/db_1/dbs/hc_LEO.dat

注:“23 -> /u01/app/oracle/oradata/LEO/file1/example01.dbf (deleted)”被删除数据文件会被标示(deleted),我们使用的是redhat linux,如果在solaris系统中使用lsof命令查看句柄和文件对应关系,我们通过copy文件句柄方式恢复数据文件到原位置

cp /proc/26745/fd/23 /u01/app/oracle/oradata/LEO/file1/example01.dbf

确认example01.dbf数据文件已经恢复成功

ll /u01/app/oracle/oradata/LEO/file1/example01.dbf

-rw-r----- 1 oracle oinstall 209723392 Jul 27 01:22 /u01/app/oracle/oradata/LEO/file1/example01.dbf

四、数据文件recover

LS@LEO> alter database datafile 5 offline; 先把example01.dbf文件脱机

Database altered.

LS@LEO> recover datafile 5;                  完成介质恢复,实质同步控制文件、redo日志scn号

Media recovery complete.

LS@LEO> alter database datafile 5 online; 再把example01.dbf文件联机

Database altered.

五、测试是否可以正常创建表t1

LS@LEO> create table t1 tablespace example as select * from liusheng;

Table created.                                                                                              成功创建

LS@LEO> select * from t1;

 ORDERID NAME      LS_DATE

---------- ---------- -------------------

       1 ls1       1981-01-02 00:00:00

       1 ls2       1998-01-03 00:00:00

       1 ls3       1999-01-04 00:00:00

       1 ls4       2000-01-05 00:00:00

       1 ls5       2000-01-06 00:00:00

       1 ls6       2001-01-07 00:00:00

       1 ls7       2001-01-08 00:00:00

       1 ls8       2002-01-09 00:00:00

       1 ls9       2002-01-10 00:00:00

       1 ls10      2011-01-11 00:00:00

10 rows selected.

小结:当我们在Linux系统中不小心rm了数据文件时,一定要冷静,不要做关闭数据库、重启操作系统等危险操作,因为在不了解数据库运行状态的前提下做这些往往是徒劳的,还可能造成无法挽回的后果。此时我们不妨在数据库open状态下使用文件句柄方式来恢复被我们rm掉的数据文件。因为只要数据文件被某个进程使用着,那么这个进程就会一直持有这个数据文件的句柄,那么所指向的数据文件依然可以读写,我们可以从proc->pid->fd目录中找到被删除的数据文件句柄(deleted)copy到原位置来恢复,这也是在恢复窗口时间很短的情况下快速恢复的好方法。


oracle视频教程请关注:http://u.youku.com/user_video/id_UMzAzMjkxMjE2.html