Oracle 11g Data Guard环境中的归档管理

Oracle 11g Data Guard环境中的归档管理

在11g里面,随着ASM、RAC、Data Guard(包括Active Data Guard)的成熟,使用RAC+ASM+Data Guard越来越成为一种可靠的、维护简单、稳定的高可用性和容灾保护方案。这篇文章谈谈如何管理Oracle 11g Data Guard环境中的归档日志。

归档日志是重要的,不然就不必提到这篇文章,备份恢复需要它,而Data Guard也需要它。在早期版本的Data Guard环境中,常常面临着归档日志管理问题。在Data Guard环境里面,对归档日志管理需要达到以下几个方面的要求或者说是需求:

  • 不能够随意删除掉归档日志,归档日志丢失会导致Data Guard需要重新搭建。
  • 不能随意使用RMAN删除归档日志,否则同样会导致Data Guard需要重新搭建。
  • 在使用RMAN备份后,如果归档没有被传送或应用到备库上,那么RMAN不应该删除归档日志,否则Data Guard需要的归档就必须从备份里面还原出来,增加了维护工作量。
  • 对RMAN的备份脚本没有特别的要求,否则脚本意外改动,可能会导致Data Guard需要的归档日志被删除。
  • 归档应尽量保存在磁盘上,以避免Data Guard长时间维护时归档被删除。
  • 备库的归档日志不需要花精力去维护,自动删除已经应用过的归档日志。

幸运的是,在11g环境里面,上述的几点很容易就满足,那就是只需要做到以下几点。

  • 使用快速恢复区(fast recovery area),在10g版本的文档中称为闪回恢复区(flash recovery area),老实说,一直不太明白为什么取名叫闪回恢复区,难道是因为10g有了数据库闪回功能?在RAC中,毫无疑问快速恢复区最好是置放在ASM上。
  • 为快速恢复区指定合适的空间。首先我们需要预估一个合理的归档保留时间长。比如由于备份系统问题或Data Guard备库问题、维护等,需要归档保留的时间长度。假设是24小时,再评估一下在归档量最大的24小时之内,会有多少量的归档?一般来说是在批量数据处理的时候归档量最大,假设这24小时之内归档最大为200G。注意对于RAC来说是所有节点在这24小时的归档量之和。最后为快速恢复区指定需要的空间量,比通过参数db_recovery_file_dest_size指定快速恢复区的大小。这里同样假设快速恢复区们存放归档日志。
  • 在备库上指定快速恢复区以及为快速恢复区指定合适的大小,在备库上指定快速恢复区的大小主要考虑的是:切换成为主库后归档日志容量;如果主库归档容量压力大,备库能否存储更多的归档日志以便可以通过备库来备份归档日志。
  • 对主库和备份使用RMAN配置归档删除策略:CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

完成了上述几个步骤,那么归档管理的要求基本上就达到了。通过这样的设置,实现的效果如下:

  • 归档日志如果没有应用到备库,那么在RMAN中使用backup …. delete inputs alldelete archivelog all不会将归档日志删除。但但是请注意如果是使用delete force命令则会删除掉归档,不管归档有没有被应用到备库。
  • 如果归档日志已经应用到了备库,那么在RMAN中使用backup …. delete inputs alldelete archivelog all可以删除归档日志,在正常情况下,由于归档日志可能很快应用到Data Guard,所以在RMAN备份之后可以正常删除归档日志。RMAN也不需要使用特别的备份脚本,也不必担心人为不小心使用。delete archivelog all命令删除了归档。
  • 备库的归档日志存储到快速恢复区中,备库的快速恢复区空间紧张时,会自动删除已经应用过的较早的归档日志以释放空间,这样便可以实现备库的归档日志完全自动管理。
  • 如果由于备份异常或Data Guard异常,在快速恢复区空间紧张时,Oracle在切换日志时,会自动删除掉已经应用过的归档日志,以释放空间。但是如果归档日志没有应用到Data Guard,那么归档日志不会被删除。这种情况下,快速恢复区的归档可能会增加到空间耗尽,最后就会出现数据库不能归档,数据库挂起的问题。

注意上面最后一点,当快速恢复区空间紧张时,Oracle开始删除归档日志,删除的条件还包括归档日志已经应用到备库,这种情况下如果归档日志还没有备份,也会被删除掉。这里的问题是,文档中描述的快速恢复区空间紧张,具体是指什么时间?也就是快速恢复区的空间消耗多少百分比的时候才算是空间紧张?在MOS文章《Files being deleted in the flash recovery area, messagesin the alert log Deleted Oracle managed file <filename> (Doc ID1369341.1)》里面有提到,空间使用率达到80%以后就开始删除文件(归档日志)。

Oracle在往快速恢复区存储文件时,其步骤大概是这样的:Oracle估计需要的空间大小(切换日志时就是归档日志大小),然后将这个大小与当前的占用空间大小相加,看是否超过了80%,如果超过了,那么就回收空间(回收的空间应大于等于新建文件需要的空间大小,也就是回收的空间以够用为原则)。如果不能回收空间(比如归档日志没有被应用到备库),那就只能继续占用新的空间,直到空间耗尽。

这里的问题是,假设快速恢复区设定了200G空间,那么在使用到80%,也就是160G的时候就开始回收空间。那么我们在估算空间时,就应该上浮20%。比如我们要求保留24小时归档,这24小时之内归档量最大是200G,那么我们应该为快速恢复区设置240G左右的容量。

那么,这个80%的比率能够更改吗以便延迟Oracle删除归档日志的时间吗?答案是肯定的。没有相应的数据库参数来设定,但是可以通过事件来设置,事件号是19823:

oerr ora 19823

19823, 00000, "soft limit recovery area spacepressure percentage"

// *Document: NO

// *Cause: Set on allinstances to alter recovery area space pressure

//         trigger percentage.

// *Action: level 1to 100 indicates the percentage when the space

//          pressure has to be triggered.

下在是一个测试:
测试环境:主库是Oracle 11.2.0.3for Linux两节点RAC,备库是Oracle 11.2.0.3 for linux单实例库。测试是在主库的节点1上进行的,其在线日志大小为512MB,快速恢复区指定的大小为16GB。
当前主库的FRA(快速恢复区)的使用率已经接近于80%:

select * from V$RECOVERY_AREA_USAGE;

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES

-------------------- ------------------ ------------------------- ---------------

CONTROL FILE                          0                         0               0

REDO LOG                          15.33                         0              13

ARCHIVED LOG                      64.04                     63.81              45

BACKUP PIECE                        .24                         0               1

IMAGE COPY                            0                         0               0

FLASHBACK LOG                         0                         0               0

FOREIGN ARCHIVED LOG                  0                         0               0

在主库上创建一个表,并插入一行数据:

create table t1 ( idnumber, name char(1000)) tablespaceTBS_USERS;

insert into t1 values (1,'a');

然后执行下在的测试代码:

begin

  for i in 1..300000 loop

    update t1 setname=to_char(i);

  end loop;

  commit;

end;

/

这段代码产生的日志量大约在690MB左右:

SQL> select * fromv$sesstat where sid=155 and statistic#=178;

       SID STATISTIC#      VALUE
---------- ---------- ----------
       155        178  696908584

多次执行上述代码,发现FRA(快速恢复区)的空间使用率基本上在80%左右。alert日志也有相应的删除较早的归档日志的信息:

Thu Jan 02 12:28:50 2014

Thread 1 advanced to log sequence 981 (LGWR switch)

  Current log# 12 seq# 981 mem# 0:+DATA1/ractest/onlinelog/group_12.299.835542549

  Current log# 12 seq# 981 mem# 1:+DG_FLA/ractest/onlinelog/group_12.298.835542551

Thu Jan 02 12:28:50 2014

LNS: Standby redo logfile selected for thread 1 sequence 981 for destination LOG_ARCHIVE_DEST_2

Thu Jan 02 12:28:50 2014

Deleted Oracle managed file +DG_FLA/ractest/archivelog/2014_01_02/thread_2_seq_309.424.835783855

Deleted Oracle managed file +DG_FLA/ractest/archivelog/2014_01_02/thread_1_seq_947.426.835783855

Deleted Oracle managed file +DG_FLA/ractest/archivelog/2014_01_02/thread_1_seq_948.437.835784237

Archived Log entry 2645 added for thread 1 sequence 980 ID 0xc8804744 dest 1:

上面的日志也可以看到其过程是:切换日志;删除不需要的最老的归档日志;生成新的归档日志。

现在我们利用事件19823将这个比率调到95%看看会是什么样子:

SQL> alter systemset event='19823 trace name context forever,level 95' scope=spfile sid='*';

然后重启主库。再运行上面的测试代码,发现Oracle不再删除归档日志,而是到接近95%的空间使用率时再开始删除归档日志:

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES

-------------------- ------------------ ------------------------- ---------------

CONTROL FILE                          0                         0               0

REDO LOG                          15.33                         0              13

ARCHIVED LOG                      68.99                     65.72              49

BACKUP PIECE                        .24                         0               1

IMAGE COPY                            0                         0               0

FLASHBACK LOG                         0                         0               0

FOREIGN ARCHIVED LOG                  0                         0               0

.............

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES

-------------------- ------------------ ------------------------- ---------------

CONTROL FILE                          0                         0               0

REDO LOG                          15.33                         0              13

ARCHIVED LOG                      78.62                      59.9              55

BACKUP PIECE                        .24                         0               1

IMAGE COPY                            0                         0               0

FLASHBACK LOG                         0                         0               0

FOREIGN ARCHIVED LOG                  0                         0               0

从上面的最后一次对v$recovery_area_usage的查询数据可以看到,此时空间利用率达到了94.19%,离95%已经很接近(在线日志的大小是512MB,占快速恢复区的3.1%,如果在快速恢复区里面多一个文件就会超过95%)。

接下来我们将这个比率调整到50%,看看是什么结果:

SQL> alter systemset event='19823 trace name context forever,level 50' scope=spfile sid='*';

然后重启主库。再运行上面的测试代码,发现Oracle在删除归档日志,但是每次均删除的日志只需要容纳要新增的文件即可,不会一下子删除到使利用率到50%以下:

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES

-------------------- ------------------ ------------------------- ---------------

CONTROL FILE                          0                         0               0

REDO LOG                          15.33                         0              13

ARCHIVED LOG                      72.47                     48.57              54

BACKUP PIECE                        .24                         0               1

IMAGE COPY                            0                         0               0

FLASHBACK LOG                         0                         0               0

FOREIGN ARCHIVED LOG                  0                         0               0

然后一直使用alter system switch logfile命令,每执行一次,Oracle会删除一个归档日志,到最后快速恢复区的空间利用率到接近于50%。

Thu Jan 02 12:56:29 2014

Thread 1 advanced to log sequence 1004 (LGWR switch)

  Current log# 12 seq# 1004 mem# 0:+DATA1/ractest/onlinelog/group_12.299.835542549

  Current log# 12 seq# 1004 mem# 1:+DG_FLA/ractest/onlinelog/group_12.298.835542551

Thu Jan 02 12:56:30 2014

Deleted Oracle managed file +DG_FLA/ractest/archivelog/2014_01_02/thread_1_seq_963.317.835788195

Thu Jan 02 12:56:30 2014

LNS: Standby redo logfile selected for thread 1 sequence 1004 for destination LOG_ARCHIVE_DEST_2

Archived Log entry 2703 added for thread 1 sequence 1003 ID 0xc8804744 dest 1:

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES

-------------------- ------------------ ------------------------- ---------------

CONTROL FILE                          0                         0               0

REDO LOG                          15.33                         0              13

ARCHIVED LOG                      33.29                     28.86              65

BACKUP PIECE                        .24                         0               1

IMAGE COPY                            0                         0               0

FLASHBACK LOG                         0                         0               0

FOREIGN ARCHIVED LOG                  0                         0               0   

 出自:http://ju.outofmemory.cn/entry/62897


附加:

<filename> (文档 ID 1369341.1)

In this Document

 Symptoms
 Changes
 Cause
 Solution


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2.0 and later
Information in this document applies to any platform.
***Checked for relevance on 04-Sep-2013***

SYMPTOMS

Seeing messages like this in the alert log showing files being deleted from the flash recovery area (FRA) and wondering what is happening?

2011-09-30 10:03:43.574000 -05:00
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11747.266.763204175
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11751.462.763204177
2011-09-30 10:03:45.876000 -05:00
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11739.782.763204163
2011-09-30 10:03:47.034000 -05:00
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11734.748.763204167
2011-09-30 10:05:09.361000 -05:00
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11735.469.763204169
2011-09-30 10:05:51.057000 -05:00
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11755.592.763204153
Deleted Oracle managed file +BKUPDG/rpreba/archivelog/2011_09_30/thread_1_seq_11753.627.763204159


They cannot understand why this was done. Space pressure? 
There was 20% free in the FRA before, during, and after we restored the logs. 
Why was Oracle automatically deleting these archivelogs that we needed? 
The db_recovery_file_dest_size is correct and matches the disk group size perfectly.

 

CHANGES

They had restored some archivelogs that had been deleted, as a the Goldengate product needed them.
After restoring the archivelogs they needed, they  noticed that the +FRA diskgroup was still at 20% free. 

All of the requested logs were now on disk again.  Later, though, when trying to access some of the 
very same restored archivelogs, they found they had been mysteriously deleted. 


CAUSE

With Oracle 11.2 and onwards, Oracle will start to purge the files in the FRA when the SPACE_USED reaches 80% of SPACE_LIMIT.

SOLUTION

With Oracle 11.2 and onwards, Oracle will start to purge the files in the FRA when the SPACE_USED reaches 80% of SPACE_LIMIT.

Two queries you can run to look at space usage in the FRA are:

select * from V$FLASH_RECOVERY_AREA_USAGE;

SELECT  substr(name, 1, 30) name
      , space_limit/(1073741824) AS Quota_GB
      , space_used/(1073741824)  AS Used_GB
      , space_reclaimable/(1073741824) AS Reclaimable_GB
      , number_of_files AS files
  FROM V$RECOVERY_FILE_DEST ;





  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值