Oracle 数据库归档缺失,介质恢复失败,如何处理?

图片.png

数据库版本:

Oracle 11.2.0.4.0

问题现象:

cjc01.dbf数据文件损坏,通过RMAN进行恢复,由于缺失需要的归档文件,导致介质恢复失败,数据库无法OPEN。
OPEN报错如下:

ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/oracle/app/oracle/oradata/cjc/cjc01.dbf'

介质恢复报错如下,由于缺失thread 1 sequence 15 archived log,导致介质恢复失败:

RMAN> recover datafile 5;

Starting recover at 2024-10-19 11:09:42
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 16 is already on disk as file /arch/cjc_1_16_1177245832.arc
archived log for thread 1 with sequence 17 is already on disk as file /arch/cjc_1_17_1177245832.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/19/2024 11:09:42
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 15 and starting SCN of 1128934 found to restore

问题分析:

数据文件损坏,通常需要RMAN+归档进行完全恢复,如果需要的归档文件同时损坏或有缺失,介质恢复一般无法完成,数据库不能正常启动,如何在归档缺失的情况下强制完成介质恢复,启动数据库呢?

解决方案:

由于数据库启动时,会检查每个数据文件头的 checkpoint scn,通过备份restore的数据文件cjc01.dbf 的 checkpoint scn比其他数据文件小,所以需要介质恢复,那么通过bbed等工具强制修改cjc01.dbf数据文件头部的checkpoint scn,改成和其他数据文件相同,是否能欺骗数据库完成介质恢复,启动数据库呢?

实验过程如下:

1.归档模式

SQL> archive log list;
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       /arch
Oldest online log sequence     13
Next log sequence to archive   15
Current log sequence	       15

2.创建测试数据

SQL> create tablespace cjc datafile '/oracle/app/oracle/oradata/cjc/cjc01.dbf' size 10M autoextend on;
SQL> create user cjc identified by "a" default tablespace cjc;
SQL> grant dba to cjc;
SQL> conn cjc/a
SQL> create table t1(id int);
SQL> insert into t1 values(1);
SQL> insert into t1 values(2);
SQL> commit;
SQL> select * from t1;
	ID
----------
	 1
	 2

3.rman备份数据文件cjc01.dbf

SQL> col name for a50
SQL> select * from v$dbfile order by 1;
     FILE# NAME
---------- --------------------------------------------------
	 1 /oracle/app/oracle/oradata/cjc/system01.dbf
	 2 /oracle/app/oracle/oradata/cjc/sysaux01.dbf
	 3 /oracle/app/oracle/oradata/cjc/undotbs01.dbf
	 4 /oracle/app/oracle/oradata/cjc/users01.dbf
	 5 /oracle/app/oracle/oradata/cjc/cjc01.dbf

[oracle@cjc-db-01 ~]$ mkdir /oradata/back
[oracle@cjc-db-01 ~]$ rman target /
RMAN> backup datafile 5 format '/oradata/back/rman_5_cjc01_%U';
...
input datafile file number=00005 name=/oracle/app/oracle/oradata/cjc/cjc01.dbf
...
piece handle=/oradata/back/rman_5_cjc01_0137v7oe_1_1 tag=TAG20241019T105510 comment=NONE
...

[oracle@cjc-db-01 ~]$ ls -lrth /oradata/back/
total 1.2M
-rw-r----- 1 oracle oinstall 1.2M Oct 19 10:55 rman_5_cjc01_0137v7oe_1_1

4.查看归档序列号

select sequence#,status from v$archived_log;
 SEQUENCE# S
---------- -
......
	13 A
	14 A
11 rows selected.

5.归档切换

alter system switch logfile;

6.新增数据和表

SQL> conn cjc/a
SQL> select * from t1;
SQL> insert into t1 values(3);
SQL> commit;
SQL> create table t2 as select * from t1;

7.归档切换

alter system switch logfile;
alter system switch logfile;

8.查看归档序列号

select sequence#,status from v$archived_log;
 SEQUENCE# S
---------- -
......
	13 A
	14 A
	15 A
	16 A
	17 A

14 rows selected.

9.模拟故障:

(1)停库

SQL> conn / as sysdba
SQL> shutdown immediate

(2)数据文件丢失;

[oracle@cjc-db-01 cjc]$ pwd
/oracle/app/oracle/oradata/cjc
[oracle@cjc-db-01 cjc]$ mv cjc01.dbf cjc01.dbf.bak

(3)归档丢失

[oracle@cjc-db-01 arch]$ pwd
/arch
[oracle@cjc-db-01 arch]$ ls -lrth
total 323M
......
-rw-r----- 1 oracle oinstall  50M Oct 13 13:00 cjc_1_13_1177245832.arc
-rw-r----- 1 oracle oinstall  13M Oct 13 15:53 cjc_1_14_1177245832.arc
-rw-r----- 1 oracle oinstall  29M Oct 19 10:59 cjc_1_15_1177245832.arc
-rw-r----- 1 oracle oinstall  21K Oct 19 11:01 cjc_1_16_1177245832.arc
-rw-r----- 1 oracle oinstall 2.5K Oct 19 11:01 cjc_1_17_1177245832.arc

[oracle@cjc-db-01 arch]$ mv cjc_1_15_1177245832.arc cjc_1_15_1177245832.arc.bak

10.启动数据库失败

SQL> startup
ORACLE instance started.

Total System Global Area  563691520 bytes
Fixed Size		    2255232 bytes
Variable Size		  222299776 bytes
Database Buffers	  335544320 bytes
Redo Buffers		    3592192 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/oracle/app/oracle/oradata/cjc/cjc01.dbf'

11.恢复数据库失败,缺失需要的thread 1 sequence 15 archived log

[oracle@cjc-db-01 arch]$ rman target /
RMAN> restore datafile 5;
RMAN> recover datafile 5;

Starting recover at 2024-10-19 11:09:42
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 16 is already on disk as file /arch/cjc_1_16_1177245832.arc
archived log for thread 1 with sequence 17 is already on disk as file /arch/cjc_1_17_1177245832.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/19/2024 11:09:42
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 15 and starting SCN of 1128934 found to restore

无法 open 数据库

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/oracle/app/oracle/oradata/cjc/cjc01.dbf'

12.检查控制文件记录的CHECKPOINT_CHANGE#,数据文件头记录的CHECKPOINT_CHANGE#

SQL> select file#,status,CHECKPOINT_CHANGE# from v$datafile;

     FILE# STATUS  CHECKPOINT_CHANGE#
---------- ------- ------------------
	 1 SYSTEM	      1137224
	 2 ONLINE	      1137224
	 3 ONLINE	      1137224
	 4 ONLINE	      1137224
	 5 ONLINE	      1137224

可以看到,FILE#=5 CHECKPOINT_CHANGE# 为1136379,小于其他文件的1137224

SQL> select file#,status,recover,CHECKPOINT_CHANGE# from v$datafile_header;

     FILE# STATUS  REC CHECKPOINT_CHANGE#
---------- ------- --- ------------------
	 1 ONLINE  NO		  1137224
	 2 ONLINE  NO		  1137224
	 3 ONLINE  NO		  1137224
	 4 ONLINE  NO		  1137224
	 5 ONLINE  YES		  1136379

查看日志序列号

SQL> select thread#,group#,sequence#,FIRST_CHANGE#,NEXT_CHANGE#,status from v$log;

   THREAD#     GROUP#  SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# STATUS
---------- ---------- ---------- ------------- ------------ ----------------
	 1	    1	      16       1136991	    1137048 INACTIVE
	 1	    3	      18       1137051	 2.8147E+14 CURRENT
	 1	    2	      17       1137048	    1137051 INACTIVE

将 1136379 和 1137224 转换为16进制。

select to_char(1136379,'XXXXXXXXXXXX'),to_char(1137224,'XXXXXXXXXXXX') from dual;
TO_CHAR(11363 TO_CHAR(11372
------------- -------------
       1156FB	     115A48

13.BBED修改文件头检查点SCN和RBA

[oracle@cjc-db-01 ~]$ bbed

BBED: Release 2.0.0.0.0 - Limited Production on Sat Oct 19 12:27:27 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

查看数据文件

BBED> info
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /oracle/app/oracle/oradata/cjc/system01.dbf                      96000
     2  /oracle/app/oracle/oradata/cjc/sysaux01.dbf                      66560
     3  /oracle/app/oracle/oradata/cjc/undotbs01.dbf                      8960
     4  /oracle/app/oracle/oradata/cjc/users01.dbf                         640
     5  /oracle/app/oracle/oradata/cjc/cjc01.dbf                          1280

查看 file 1 文件头检查点SCN,和 v$datafile_header 查询值相同

BBED> set file 1 block 1 
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x00115a48   ---16进制转成10进制是 1137224
      ub2 kscnwrp                           @488      0x0000

查看 file 2 文件头检查点SCN,和 v$datafile_header 查询值相同

BBED> set file 2 block 1 
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x00115a48   ---16进制转成10进制是 1137224
      ub2 kscnwrp                           @488      0x0000

查看 file 5 文件头检查点SCN,和 v$datafile_header 查询值相同,和 file 1,file 2值不同

BBED> set file 5 block 1
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x001156fb   ---16进制转成10进制是 1136379
      ub2 kscnwrp                           @488      0x0000

将 file 5 block 1 的 kscnbas 修改为 0x00115a48

BBED> modify /x 485a1100 offset 484 
BBED> p kcvfhckp
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x00115a48
      ub2 kscnwrp                           @488      0x0000

查看file 1文件头检查点RBA

BBED> set file 1 block 1 
BBED> p kcvfhckp
......   
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000012
         ub4 kcrbabno                       @504      0x0000008d
         ub2 kcrbabof                       @508      0x0010

查看file 2文件头检查点RBA,和 file 1相同

BBED> set file 2 block 1 
BBED> p kcvfhckp
......
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000012
         ub4 kcrbabno                       @504      0x0000008d
         ub2 kcrbabof                       @508      0x0010

查看file 5文件头检查点RBA,和file 1,file 2不同

BBED> set file 5 block 1
BBED> p kcvfhckp
......
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000000f
         ub4 kcrbabno                       @504      0x0000de69
         ub2 kcrbabof                       @508      0x0010

将 file 5文件头检查点RBA改成和file 1,file 2相同
将 file 5 block 1 的 kcrbaseq 改成 0x00000012
将 file 5 block 1 的 kcrbabno 改成 0x0000008d

BBED> modify /x 12 offset 500 
BBED> modify /x 8d00 offset 504
BBED>  p kcvfhckp
......
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000012
         ub4 kcrbabno                       @504      0x0000008d
         ub2 kcrbabof                       @508      0x0010
......

提交修改

BBED> sum apply

14.再次执行介质恢复,成功

[oracle@cjc-db-01 arch]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Sat Oct 19 12:56:01 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CJC (DBID=3819394245, not open)

RMAN> recover datafile 5;

Starting recover at 2024-10-19 12:56:09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=18 device type=DISK

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 2024-10-19 12:56:10

检查数据文件头CHECKPOINT_CHANGE#,已修改为1137224

SQL> select file#,status,recover,CHECKPOINT_CHANGE# from v$datafile_header;

     FILE# STATUS  REC CHECKPOINT_CHANGE#
---------- ------- --- ------------------
	 1 ONLINE  NO		  1137224
	 2 ONLINE  NO		  1137224
	 3 ONLINE  NO		  1137224
	 4 ONLINE  NO		  1137224
	 5 ONLINE  NO		  1137224

15.可以正常启动数据库

SQL> alter database open;
Database altered

16.数据验证

查询,cjc.t1表丢失1条数据,cjc.t2表全部丢失,也就是15 号归档及以后所有数据都全部丢失

SQL> select * from cjc.t1;

	ID
----------
	 1
	 2

SQL> select * from cjc.t2;
select * from cjc.t2
                  *
ERROR at line 1:
ORA-08103: object no longer exists

SQL> select * from cjc.t100;
select * from cjc.t100
                  *
ERROR at line 1:
ORA-00942: table or view does not exist

查看 ORA-08103 错误说明:

[oracle@cjc-db-01 ~]$ oerr ora 08103
08103, 00000, "object no longer exists"
// *Cause:  The object has been deleted by another user since the operation
//          began, or a prior incomplete recovery restored the database to
//          a point in time during the deletion of the object.
// *Action: Delete the object if this is the result of an incomplete
//          recovery.

翻译如下:
自操作开始以来,该对象已被其他用户删除,或者之前的不完全恢复将数据库还原到删除该对象期间的某个时间点。

The object has been deleted by another user since the operation began, or a prior incomplete recovery restored the database to a point in time during the deletion of the object.

如果这是不完全恢复的结果,请删除该对象。

Delete the object if this is the result of an incomplete recovery.

可以看到该表

set line 300
col owner for a10
col object_name for a10
col created for a25
col LAST_DDL_TIME for a25
select owner,object_name,created,LAST_DDL_TIME  from dba_objects where owner='CJC';
OWNER	   OBJECT_NAM CREATED			LAST_DDL_TIME
---------- ---------- ------------------------- -------------------------
CJC	   T1	      2024-10-19 10:51:12	2024-10-19 10:51:12
CJC	   T2	      2024-10-19 10:59:52	2024-10-19 10:59:52

SQL> conn cjc/a
SQL> select * from tab;

TNAME			       TABTYPE	CLUSTERID
------------------------------ ------- ----------
T1			       TABLE
T2			       TABLE

SQL> select * from t1;

	ID
----------
	 1
	 2

无法查询数据

SQL> select * from t2;
select * from t2
              *
ERROR at line 1:
ORA-08103: object no longer exists

可以删除t2

SQL> drop table t2 purge;
Table dropped.

其他:
除了v$datafile_header 和 bbed 可以查询 数据文件头 CHECKPOINT_CHANGE# 和 RBA以外,Dump Data File Header也可以查询到相同的信息,方法如下:

SQL> select spid from v$process where addr in (select paddr from v$session where sid = (select sid from v$mystat where rownum=1)); 
SPID
------------------------
4870

SQL> alter session set events 'immediate trace name FILE_HDRS level 10';

[oracle@cjc-db-01 trace]$ ls -lrth *4870*
-rw-r----- 1 oracle oinstall 186 Oct 19 11:13 cjc_ora_4870.trm
-rw-r----- 1 oracle oinstall 39K Oct 19 11:13 cjc_ora_4870.trc

查看trace,可以看到 CHECKPOINT_CHANGE# , RBA 和上面查询的值一致。

[oracle@cjc-db-01 trace]$ vi cjc_ora_4870.trc
......
DUMP OF DATA FILES: 5 files in database

DATA FILE #1:
  name #7: /oracle/app/oracle/oradata/cjc/system01.dbf
.....
Checkpointed at scn:  0x0000.00115a48 10/19/2024 11:05:37  ---CHECKPOINT_CHANGE#
 thread:1 rba:(0x12.8d.10)                                 ---RBA
......

DATA FILE #5:
  name #9: /oracle/app/oracle/oradata/cjc/cjc01.dbf
.....
Checkpointed at scn:  0x0000.001156fb 10/19/2024 10:55:10   ---CHECKPOINT_CHANGE#
 thread:1 rba:(0xf.de69.10)                                 ---RBA
......

注意:
此方法仅供参考,有丢失数据的风险,请勿用于生产环境。
###chenjuchao 20241019###
欢迎关注我的公众号《IT小Chen》

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值