在Oracle体系中,Online Redo Log、Archived Redo Log、Check Point和Switch Log动作之间是存在着千丝万缕的关系。作为我们非理论研究者而言,过多的强调技术细节意义不是很大。但是在实际中,进行诊断和问题解决的时候,对原理的理解还是有一定的作用的。
本篇从两个场景入手,研究一下Redo Log切换过程中,check point和相关状态情况。
1、实验环境介绍
本篇实验在Oracle 11R2上进行测试。
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
当前为归档archive模式,三个日志组。
SQL> select log_mode from v$database;
LOG_MODE
------------
ARCHIVELOG
SQL> select group#, sequence#, archived, status, first_change# from v$log;
GROUP# SEQUENCE# ARCHIVED STATUS FIRST_CHANGE#
---------- ---------- -------- ---------------- -------------
1 139 YES INACTIVE 3466294
2 140 NO CURRENT 3466703
3 138 YES INACTIVE 3464619
我们通过alter log来监控系统行为。我们可以使用log_checkpoints_to_alert来控制检查点发出的时候,默认是不写入到alert_log中。
SQL> show parameter log_checkpoints;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_checkpoints_to_alert boolean FALSE
SQL> alter system set log_checkpoints_to_alert=true;
System altered
SQL> show parameter log_checkpoints;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_checkpoints_to_alert boolean TRUE
2、三种常见的Checkpoint行为
我们经常能够看到的Checkpoint一共有三种。分别为:全局检查点(Global Checkpoint)、增量检查点(Incremental Checkpoint)和日志切换检查点(Switch Log Checkpoint)。
当我们发出alter system checkpoint或者启动关闭的时候,会有主动检查点动作。
SQL> alter system checkpoint;
System altered
Sun Aug 04 03:52:53 2013
Beginning global checkpoint up to RBA [0x8c.23a.10], SCN: 3467262
Completed checkpoint up to RBA [0x8c.23a.10], SCN: 3467262
增量检查点主要是针对Global Checkpoint所带来的性能问题而引入的一种优化机制。检查点Check Point引入的目的,是为了保证进行实例恢复(一旦出现Instance Crash)的时候,消耗的时间可以被接受。存在一个Check Point在Redo Log中,相当于是“声明”了之前的Redo Log Entry对应的Dirty Blocks都已经写入到了数据文件中。
增量检查点是Oracle运行阶段中自动进行的。
Sun Aug 04 04:13:13 2013
Incremental checkpoint up to RBA [0x99.17.0], current log tail at RBA [0x99.169.0]
两种检查点(Global Checkpoint和Incremental Checkpoint)运行都是一次性的,也就是一次性完成,不是一个持续过程。Switch Log Checkpoint在进行Online Redo Log进行切换的时候开始,结束之后一段时间才能结束。
Beginning log switch checkpoint up to RBA [0x8d.2.10], SCN: 3467294
Thread 1 advanced to log sequence 141 (LGWR switch)
Current log# 3 seq# 141 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 141 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Sun Aug 04 03:53:56 2013
Archived Log entry 232 added for thread 1 sequence 140 ID 0xe96e7a48 dest 1:
Sun Aug 04 03:59:08 2013
Completed checkpoint up to RBA [0x8d.2.10], SCN: 3467294
从开始到结束,中间持续了若干分钟,而且这分钟里面,数据库的事务操作没有结束。这就是第三类Check Point。
下面我们进入正题,研究一下Redo Log切换过程。
3、非覆盖过程Redo Log切换行为
首先我们先看看Archive Log模式下的时候,日志非覆盖的情形。
当前日志组online为2,为current状态。其他日志状态为Inactive状态。
SQL> select group#, sequence#, archived, status, first_change# from v$log;
GROUP# SEQUENCE# ARCHIVED STATUS FIRST_CHANGE#
---------- ---------- -------- ---------------- -------------
1 139 YES INACTIVE 3466294
2 140 NO CURRENT 3466703
3 138 YES INACTIVE 3464619
常见的Online Redo Log Group状态是四个:Active、Inactive、Current和Unusable。当一个日志组被新加入到系统中,没有被写入过的时候,状态为Unusable。Current状态表示当前日志组。Active是说日志组已经不是当前日志组,但是Redo Log Entry对应的Dirty Block还没有完全写入到数据文件中。Inactive表示Dirty Block已经完全写入。
此时,我们切换日志。
SQL> alter system switch logfile;
System altered
SQL> select group#, sequence#, archived, status, first_change# from v$log;
GROUP# SEQUENCE# ARCHIVED STATUS FIRST_CHANGE#
---------- ---------- -------- ---------------- -------------
1 139 YES INACTIVE 3466294
2 140 YES ACTIVE 3466703
3 141 NO CURRENT 3467294
刚刚的Current日志状态变化为Active。表示还没有完全写入,此时Alert Log中显示出刚刚的变化。
Beginning log switch checkpoint up to RBA [0x8d.2.10], SCN: 3467294
Thread 1 advanced to log sequence 141 (LGWR switch)
Current log# 3 seq# 141 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 141 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Sun Aug 04 03:53:56 2013
Archived Log entry 232 added for thread 1 sequence 140 ID 0xe96e7a48 dest 1:
此时,立即生成的log记录中,有开始log switch checkpoint开始的记录,之后切换日志。同时,有Archived Log的信息。
在刚刚的信息v$log中,我们也的确看到了group 2日志组Archived Log状态是Yes,而状态是Active。这就告诉我们:在Active状态时候,Oracle Archived是可以进行归档的。注意:此时没有发生切换日志覆盖动作。
注意,此时时间是3:53。过一会之后,才结束log switch checkpoint。
[oracle@bspdev ~]$ date
Sun Aug 4 03:54:15 CST 2013
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;
TO_CHAR(SYSDATE,'YYYY-MM-DDHH2
------------------------------
2013-08-04 03:55:18
--延后信息显示在日志上
Sun Aug 04 03:53:56 2013
Archived Log entry 232 added for thread 1 sequence 140 ID 0xe96e7a48 dest 1:
Sun Aug 04 03:59:08 2013
Completed checkpoint up to RBA [0x8d.2.10], SCN: 3467294
基本上,接近6分钟才结束这个过程。日志组group 2状态也变为Inactive。
SQL> select group#, sequence#, archived, status, first_change# from v$log;
GROUP# SEQUENCE# ARCHIVED STATUS FIRST_CHANGE#
---------- ---------- -------- ---------------- -------------
1 139 YES INACTIVE 3466294
2 140 YES INACTIVE 3466703
3 141 NO CURRENT 3467294
那么如果我们由于生成redo log entry的量过大、速度过快,形成切换循环到下一个redo log的时候,归档或者active尚未完成,是什么现象呢?我们继续分析。
4、覆盖过程Redo Log切换行为
Redo Log切换和归档,是Oracle运维和性能调优的一个方面。Redo Log设置不合理,可能会给系统带来一定的性能问题。
比较常见的问题根源是:日志设置相对于业务量而言,体积较小,日志切换速度高。同时,低优先级DBWR或者ARCH来不及将日志写入或者归档。从而引起阻塞等待。
我们当前状态如下:
SQL> set timing on;
SQL> select group#, sequence#, archived, status, first_change# from v$log;
GROUP# SEQUENCE# ARCHIVED STATUS FIRST_CHANGE#
---------- ---------- -------- ---------------- -------------
1 142 YES INACTIVE 3467843
2 143 YES ACTIVE 3467851
3 144 NO CURRENT 3467864
Executed in 0.015 seconds
我们为了模拟情况,使用脚本连续进行switch log,形成日志来不及归档或者写入脏块的情形。
SQL> alter system switch logfile;
System altered
Executed in 0.015 seconds
SQL> alter system switch logfile;
System altered
Executed in 2.262 seconds
SQL> alter system switch logfile;
System altered
Executed in 2.964 seconds
SQL> alter system switch logfile;
System altered
Executed in 0.234 seconds
SQL> alter system switch logfile;
System altered
Executed in 1.139 seconds
SQL> alter system switch logfile;
System altered
Executed in 0.015 seconds
SQL> alter system switch logfile;
System altered
Executed in 0.234 seconds
SQL> alter system switch logfile;
System altered
Executed in 1.731 seconds
SQL> alter system switch logfile;
System altered
Executed in 1.341 seconds
连续的switch log动作,我们从执行时间上,可以看到极度的不平衡。时间最短可以有0.015s,而最长时间达到3s。明显这个过程中,由于archive和db block写入,引起一系列的等待或者阻塞。
对应最后的日志信息如下:
SQL> select group#, sequence#, archived, status, first_change# from v$log;
GROUP# SEQUENCE# ARCHIVED STATUS FIRST_CHANGE#
---------- ---------- -------- ---------------- -------------
1 151 YES INACTIVE 3467978
2 152 YES INACTIVE 3467981
3 153 NO CURRENT 3467984
Executed in 0.015 seconds
Sequence从144变换到153。这个阶段alert log信息如下:
Sun Aug 04 04:03:14 2013
Beginning log switch checkpoint up to RBA [0x90.2.10], SCN: 3467864
Thread 1 advanced to log sequence 144 (LGWR switch)
Current log# 3 seq# 144 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 144 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Sun Aug 04 04:03:14 2013
Archived Log entry 235 added for thread 1 sequence 143 ID 0xe96e7a48 dest 1:
Sun Aug 04 04:06:39 2013
Beginning log switch checkpoint up to RBA [0x91.2.10], SCN: 3467958
Thread 1 advanced to log sequence 145 (LGWR switch)
Current log# 1 seq# 145 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 145 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
Sun Aug 04 04:06:39 2013
Archived Log entry 236 added for thread 1 sequence 144 ID 0xe96e7a48 dest 1:
Thread 1 cannot allocate new log, sequence 146
Checkpoint not complete
Current log# 1 seq# 145 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 145 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
Sun Aug 04 04:06:42 2013
Completed checkpoint up to RBA [0x90.2.10], SCN: 3467864
Beginning log switch checkpoint up to RBA [0x92.2.10], SCN: 3467962
Thread 1 advanced to log sequence 146 (LGWR switch)
Current log# 2 seq# 146 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 146 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Sun Aug 04 04:06:42 2013
Archived Log entry 237 added for thread 1 sequence 145 ID 0xe96e7a48 dest 1:
Thread 1 cannot allocate new log, sequence 147
Checkpoint not complete
Current log# 2 seq# 146 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 146 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Completed checkpoint up to RBA [0x92.2.10], SCN: 3467962
Completed checkpoint up to RBA [0x91.2.10], SCN: 3467958
Beginning log switch checkpoint up to RBA [0x93.2.10], SCN: 3467966
Thread 1 advanced to log sequence 147 (LGWR switch)
Current log# 3 seq# 147 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 147 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Sun Aug 04 04:06:45 2013
Archived Log entry 238 added for thread 1 sequence 146 ID 0xe96e7a48 dest 1:
Beginning log switch checkpoint up to RBA [0x94.2.10], SCN: 3467969
Thread 1 advanced to log sequence 148 (LGWR switch)
Current log# 1 seq# 148 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 148 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
Completed checkpoint up to RBA [0x94.2.10], SCN: 3467969
Beginning log switch checkpoint up to RBA [0x95.2.10], SCN: 3467972
Thread 1 advanced to log sequence 149 (LGWR switch)
Current log# 2 seq# 149 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 149 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Archived Log entry 239 added for thread 1 sequence 147 ID 0xe96e7a48 dest 1:
Archived Log entry 240 added for thread 1 sequence 148 ID 0xe96e7a48 dest 1:
Beginning log switch checkpoint up to RBA [0x96.2.10], SCN: 3467975
Thread 1 advanced to log sequence 150 (LGWR switch)
Current log# 3 seq# 150 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 150 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Completed checkpoint up to RBA [0x96.2.10], SCN: 3467975
Beginning log switch checkpoint up to RBA [0x97.2.10], SCN: 3467978
Thread 1 advanced to log sequence 151 (LGWR switch)
Current log# 1 seq# 151 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 151 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
ORACLE Instance wilson - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 152
All online logs needed archiving
Current log# 1 seq# 151 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 151 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
Archived Log entry 241 added for thread 1 sequence 149 ID 0xe96e7a48 dest 1:
Beginning log switch checkpoint up to RBA [0x98.2.10], SCN: 3467981
Thread 1 advanced to log sequence 152 (LGWR switch)
Current log# 2 seq# 152 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 152 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Archived Log entry 242 added for thread 1 sequence 150 ID 0xe96e7a48 dest 1:
Archived Log entry 243 added for thread 1 sequence 151 ID 0xe96e7a48 dest 1:
Thread 1 cannot allocate new log, sequence 153
Checkpoint not complete
Current log# 2 seq# 152 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 152 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Completed checkpoint up to RBA [0x97.2.10], SCN: 3467978
Completed checkpoint up to RBA [0x95.2.10], SCN: 3467972
Completed checkpoint up to RBA [0x93.2.10], SCN: 3467966
Sun Aug 04 04:06:50 2013
Beginning log switch checkpoint up to RBA [0x99.2.10], SCN: 3467984
Thread 1 advanced to log sequence 153 (LGWR switch)
Current log# 3 seq# 153 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 153 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Completed checkpoint up to RBA [0x99.2.10], SCN: 3467984
Archived Log entry 244 added for thread 1 sequence 152 ID 0xe96e7a48 dest 1:
Sun Aug 04 04:06:53 2013
Completed checkpoint up to RBA [0x98.2.10], SCN: 3467981
从日志里面,我们看到几个现象:
首先,switch log check point是可以进行嵌套过程的。一个log switch check point没有结束,可能有一个新的check point开始。
Sun Aug 04 04:03:14 2013
Beginning log switch checkpoint up to RBA [0x90.2.10], SCN: 3467864 –第一个check point动作
Thread 1 advanced to log sequence 144 (LGWR switch)
Current log# 3 seq# 144 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_3_870n4lsg_.log
Current log# 3 seq# 144 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_3_870n4o31_.log
Sun Aug 04 04:03:14 2013
Archived Log entry 235 added for thread 1 sequence 143 ID 0xe96e7a48 dest 1:
Sun Aug 04 04:06:39 2013
Beginning log switch checkpoint up to RBA [0x91.2.10], SCN: 3467958 –第二个check point动作开始
Thread 1 advanced to log sequence 145 (LGWR switch)
Current log# 1 seq# 145 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 145 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
Sun Aug 04 04:06:39 2013
Archived Log entry 236 added for thread 1 sequence 144 ID 0xe96e7a48 dest 1:
Thread 1 cannot allocate new log, sequence 146
Checkpoint not complete
Current log# 1 seq# 145 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 145 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
Sun Aug 04 04:06:42 2013
Completed checkpoint up to RBA [0x90.2.10], SCN: 3467864 –第一个check point结束
Beginning log switch checkpoint up to RBA [0x92.2.10], SCN: 3467962
Thread 1 advanced to log sequence 146 (LGWR switch)
Current log# 2 seq# 146 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 146 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Sun Aug 04 04:06:42 2013
Archived Log entry 237 added for thread 1 sequence 145 ID 0xe96e7a48 dest 1:
Thread 1 cannot allocate new log, sequence 147
Checkpoint not complete
Current log# 2 seq# 146 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_2_870n48hc_.log
Current log# 2 seq# 146 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_2_870n4dtl_.log
Completed checkpoint up to RBA [0x92.2.10], SCN: 3467962 –第三个check point结束
Completed checkpoint up to RBA [0x91.2.10], SCN: 3467958 –第二个check point结束
第二个是当切换很频繁的时候,日志check point会有不能推动等待情况。Alert log中存在check point not complete的时候,对应还常出现“Thread 1 cannot allocate new log”。
如果切换过程中要求覆盖当前日志没有归档,Oracle是有等待情况。一旦出现这样的情况,就说明需要进行归档过程的调整。
beginning log switch checkpoint up to RBA [0x97.2.10], SCN: 3467978
Thread 1 advanced to log sequence 151 (LGWR switch)
Current log# 1 seq# 151 mem# 0: /u01/oradata/WILSON/onlinelog/o1_mf_1_870n42n1_.log
Current log# 1 seq# 151 mem# 1: /u01/flash_recovery_area/WILSON/onlinelog/o1_mf_1_870n44z3_.log
ORACLE Instance wilson - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 152
All online logs needed archiving
我们发现:如果一个日志要被覆盖,要保证日志已经归档,并且状态已经写入。
5、结论
Check Point、Switch Log和Redo Log是Oracle中比较复杂的问题。网络上很多人经常讨论检查点之类的细节问题,笔者认为:不同版本的oracle在这些方法上有差异,针对性能调优过程,过于细节的内容也没有太大的意义,没必要过于纠结。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/17203031/viewspace-767731/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/17203031/viewspace-767731/