[20170322]关于IMU.txt-CSDN博客

[20170322]关于IMU.txt

--昨天测试实例crash恢复时,别人测试无法重复我前面的测试,最终发现IMU导致这个问题.
--我本人对IMU的东西很少探究.摘抄一段:

从Oracle10g开始，Oracle在数据库中引入了In Memory Undo(可以被缩写为IMU)的新技术，使用这一技术，数据库会在共享内存中
(Shared Pool)开辟独立的内存区域用于存储Undo信息，这样就可以避免Undo信息以前在Buffer Cache中的读写操作，从而可以进一步的
减少Redo生成，同时可以大大减少以前的UNDO SEGMENT的操作。IMU中数据通过暂存、整理与收缩之后也可以写出到回滚段，这样的写出
提供了有序、批量写的性能提升。

IMU机制与前面日志提到的PVRS紧密相关，由于每个IMU Buffer的大小在64～128KB左右，所以仅有特定的小事务可以使用，每个事务会被
绑定到一个独立的空闲的IMU Buffer，同时相关的Redo信息会写入PVRS中，同样每个IMU Buffer会由一个独立的In Memory Undo Latch保
护，当IMU Buffer或PVRS写满之后，数据库需要写出IMU中的信息。

一个新引入的隐含参数可以控制该特性是否启用，这个参数是_in_memory_undo，在Oracle 10g中这个参数的缺省值是TRUE(不同版本和平
台参数的初始设置可能不同)：

--//实际上就是减少redo allocation 拴锁调用,采用一种批量方式写入redo日志.redo信息写入private redo,IMU单独使用一个独立的空
--//闲的IMU Buffer,这样当IMU Buffer或PVRS写满之后，数据库需要写出IMU中的信息以及相关redo信息。最明显感觉写入redo的顺序是
--//发生了变化(这点有许多文章介绍).

IMU的内存在Shared Pool中分配，回想一下Redo Log Buffer的内存使用与功能，实际上IMU技术在某种程度上也是参考了Log Buffer的机
制，通过以下查询可以获得系统当前分配的IMU内存：

SYS@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SYS@book> select * from v$sgastat where name ='KTI-UNDO';
POOL NAME BYTES
------------ ---------- -------
shared pool KTI-UNDO 2042760

In Memory Undo池缺省的会分配3个，用以提供更好的并发：

SYS@book> @ &r/hide imu_pool
NAME DESCRIPTION DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE
----------- --------------------- ------------- ------------- ------------
_imu_pools in memory undo pools TRUE 3 3

--//有几种情况关闭IMU特性.
1.打开flashback on功能.
2.开启附加日志.alter database add supplemental log data;
3.rac环境.
4._in_memory_undo=false.
...

--昨天的测试我发现很久才写出到日志文件,看看什么情况会触发写出.(注这个测试无法做到精确,测试时尽量避开一些大事务避免日志切
换,以及awr报表整点生成的时间,我测试环境基本我自己使用比较好控制).

0.环境:
SYS@book> @ &r/ver1
PORT_STRING VERSION BANNER
--------------------- ---------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

create table t1 ( id number,name varchar2(10));
create table t2 ( id number,name varchar2(10));
create table t3 ( id number,name varchar2(10));
create table t4 ( id number,name varchar2(10));

insert into t1 select rownum,'ABCDEF' from dual connect by level<=10;
insert into t2 select rownum,'ABCDEF' from dual connect by level<=10;
insert into t3 select rownum,'ABCDEF' from dual connect by level<=10;
insert into t4 select rownum,'ABCDEF' from dual connect by level<=10;

commit ;
alter system checkpoint ;

1.测试1:
SCOTT@book> alter system archive log current ;
System altered.

SCOTT@book> @ &r/logfile
GROUP# STATUS     TYPE       MEMBER                       IS_ GROUP# THREAD# SEQUENCE#       BYTES BLOCKSIZE MEMBERS ARC STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
------ ---------- ---------- ---------------------------- --- ------ ------- --------- ----------- --------- ------- --- ---------- ------------- ------------------- ------------ -------------------
     1            ONLINE     /mnt/ramdisk/book/redo01.log NO       1       1       701    52428800       512       1 YES ACTIVE       13276937278 2017-03-22 08:58:13 13276941089 2017-03-22 09:42:21
     2            ONLINE     /mnt/ramdisk/book/redo02.log NO       2       1       702    52428800       512       1 NO CURRENT      13276941089 2017-03-22 09:42:21 2.814750E+14
     3            ONLINE     /mnt/ramdisk/book/redo03.log NO       3       1       700    52428800       512       1 YES INACTIVE     13276936571 2017-03-22 08:58:10 13276937278 2017-03-22 08:58:13
--//当前/mnt/ramdisk/book/redo02.log.
--//测试方法就是查询当前日志查询特定字符串.
--//另外必须注意一个问题,就是日志是循环写入的,必要是使用大事务清除里面的日志信息.
update tx set object_name=object_name;
update tx set object_name=object_name;
update tx set object_name=object_name;
commit ;
--//多执行几次就覆盖了.

--//session 1:
SCOTT@book> update t1 set name='2ZZZZt1' ;
10 rows updated.

--//不提交.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 2ZZZZt[1234]
--//没有输出.

--//session 1:
SCOTT@book> rollback ;
Rollback complete.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 2ZZZZt[1234] | uniq -c
      8 2ZZZZt1
      1 2ZZZZt1@
      1 ns2ZZZZt1/
--//可以发现回滚都会触发写入redo文件.

SCOTT@book> update t1 set name='2ZZZZt1' ;
10 rows updated.

--//不提交.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 2ZZZZt[1234] | uniq -c
      8 2ZZZZt1
      1 2ZZZZt1@
      1 ns2ZZZZt1/

--//session 1:
SCOTT@book> commit ;
Commit complete.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 2ZZZZt[1234] | uniq -c
      8 2ZZZZt1
      1 2ZZZZt1@
      1 ns2ZZZZt1/
      6 2ZZZZt1
      1 _R2ZZZZt1S
      1 _N2ZZZZt1S
      1 2ZZZZt1E
      1 NS2ZZZZt1V

--//可以发现提交都会触发写入redo文件.
--//测试一可以发现提交回滚会触发写日志文件.

2.测试2:
--//大事务也会触发提前写日志,因为IMU缓存,private不足.
--//session 1:
SCOTT@book> update t1 set name='3ZZZZt1' ;
10 rows updated.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 3ZZZZt[1234] | uniq -c
--//没有输出!!

--//session 1:
SCOTT@book> update tx set object_name=object_name where rownum<=200;
200 rows updated.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 3ZZZZt[1234] | uniq -c
10 3ZZZZt1

--//你可以发现如果事务很大,会自动切换会传统模式.

3.测试3:
--//这个测试不贴出来了.我个人认识错误._imu_pools=3,我以为4个会话会出现提前写出日志的情况实际上不会.
--//大家可以自行测试.

4.测试4:
--//别的会话发出commit是否会执行alter system checkpoint是否会写出日志.

SCOTT@book> alter system archive log current ;
System altered.

--//session 1,不提交:
SCOTT@book> update t1 set name='4ZZZZt1' ;
10 rows updated.

--//session 5:
$ strings /mnt/ramdisk/book/redo03.log | grep 4ZZZZt[1234] | uniq -c
--//没有输出.

--//session 2:
SCOTT@book> commit;
Commit complete.

--//session 5:
$ strings /mnt/ramdisk/book/redo03.log | grep 4ZZZZt[1234] | uniq -c
--//没有输出.

--//session 2:
SCOTT@book> alter system checkpoint ;
System altered.

--//session 5:
$ strings /mnt/ramdisk/book/redo02.log | grep 4ZZZZt[1234] | uniq -c
10 4ZZZZt1
--//可以发现alter system checkpoint ;会触发这类日志写出.

--//别的会话执行提交(包括回滚,因为回滚最后一条命令实际上就是提交).
--//而alter system checkpoint ;因为要写脏块,要写出这类日志,也会触发.
--//另外说明一下:我测试遇到alter system checkpoint ;不写日志情况,不过大部分都会触发.

5.测试5:
SCOTT@book> alter system archive log current ;
System altered.

--//测试切换日志的情况.

--//session 1,不提交:
SCOTT@book> update t1 set name='6ZZZZt1' ;
10 rows updated.

--//session 5:
$ strings /mnt/ramdisk/book/redo01.log | grep 6ZZZZt[1234] | uniq -c
--//没有输出.

--//session 2,切换日志:
SCOTT@book> alter system archive log current ;
System altered.

--//session 5:
$ strings /mnt/ramdisk/book/redo01.log | grep 6ZZZZt[1234] | uniq -c
10 6ZZZZt1

--//可以发现切换日志时,触发写出.因为这些事务的scn小于新建在线日志的scn,必须写出.

6.做一个小结:
1.测试1,事务提交或者回滚必须写出这个是必须的.
2.测试2,大事务会提前写出,因为IMU buffer,private redo大小的限制,这个很好理解.
3.测试4,别的会话提交或者回滚不会触发写出,但是alter system checkpoint大多数情况下会触发写出.我遇到小部分没有写出的情况不
知道为什么?
4.测试5,日志切换一定会写出,因为新建的在线日志scn号大于那些在IMU buffer事务的scn号.
这很容易理解为什么归档日志要比在线日志小很多.也就是Preemptive redolog switches.

--//找到一个链接:
httpS://blogs.sap.com/2012/10/22/oracle-insights-into-redo-handling-preemptive-redolog-switches-and-log-file-switch-private-strand-flush-incomplete/

We have seen that the redo handling has changed in Oracle 10g, but what has this to do with "preemptive redo log
switching"? Well at first we need to know a little detail about how many private and public redo threads are created or
used. You will have at least 2 public log buffer (redo) threads, if you have more than one CPU. The maximum amount of
private redo threads is defined as transactions / 10. All of these values are based on the currently used algorithms
which can change of course by every patch or version. However Oracle will dynamically adjust the number of active public
and private redo threads.

So let's assume the following scenario with the new redo handling approach. You got a session <X>, that deletes a couple
of data sets (with the fictive SCN 20) but does not finish immediately. So the redo information is currently located in
the private redo area. The session idles for some time and a log switch occurs at fictive SCN 30. A few minutes after
the log switch session <X> will finish and wants to copy its information from the private area to the public area and
write it to the redo log files, but the current log file has a greater starting SCN.

Oracle is aware of this issue (become very nasty in case of recovery – think about the "write-ahead logging" approach)
and it assumes the worst-case. So in case of a redo log switch – Oracle calculates the maximum needed space for all
active private and public redo threads and started the log file switch at that point when all of this data fits at the
end of the current log file. If the redo threads do not contain any content, the free space will be not used and you
will see smaller archive redo log files in case of a log switch. Just a simple fictive calculation:

--//里面提到的计算公式:
2 active public redo threads with 3 MB each
20 active Private redo threads with 64 KB each

So in worst case your archive log files are round about 7.5 MB smaller (6144 KB + 1280 KB = 7424 KB) than the redo log files.
--//说明感觉应该使用128K,估计它的环境是64K.我的测试机器:

SELECT INDX
      ,last_buf_kcrfa
      ,PNEXT_BUF_KCRFA_CLN nxtbufadr
      ,NEXT_BUF_NUM_KCRFA_CLN nxtbuf#
      ,BYTES_IN_BUF_KCRFA_CLN "B/buf"
      ,PVT_STRAND_STATE_KCRFA_CLN state
      ,STRAND_NUM_ORDINAL_KCRFA_CLN strand#
      ,PTR_KCRF_PVT_STRAND stradr
      ,INDEX_KCRF_PVT_STRAND stridx
      ,SPACE_KCRF_PVT_STRAND strspc
      ,TXN_KCRF_PVT_STRAND txn
      ,TOTAL_BUFS_KCRFA totbufs#
      ,STRAND_SIZE_KCRFA strsz
FROM X$KCRFSTRAND ;

INDX LAST_BUF_KCRFA   NXTBUFADR           NXTBUF#      B/buf      STATE    STRAND# STRADR               STRIDX     STRSPC        TXN   TOTBUFS#      STRSZ
---- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------- ---------- ---------- ----------
   0 0000000060594E00 0000000060237E00        134          0          0 3735928559 00                        0          0          0       7024    3596288
   1 0000000060902E00 0000000060903000       7023          0          0 3735928559 00                        0          0          0       7024    3596288
   2 00               00                        0          0          0 3735928559 00000000819F4054 3735928559     126464          0        249     132096
   3 00               00                        0          0          0 3735928559 0000000081A15054 3735928559     126464          1        249     132096
   4 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
   5 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
   6 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
   7 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
   8 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
   9 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
10 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
11 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
12 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
13 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
14 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
15 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
16 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
17 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
18 rows selected.

--//前面2个是public redo,大小3596288.而后面16个是private redo.大小132096.
--//3596288/1024=3512KB
--//132096/1024=129KB.
--//3512*2+16*128=9072KB

--//按照2 active public redo threads with 3 MB each计算:
--//6144+16*128=8192KB
--//我的在线日志大小50M*1024=51200K. 多少情况下写满42M上下.
--//可以通过另外方法验证:public redo数量受参数_log_parallelism_max控制,这个参数受cpu实例显示.
--//实际上这个值与cpu数量有关.floor(cpu_count/16). 对于我的测试环境是2(24个CPU).

SYS@book> alter system set "_log_parallelism_max"=1 scope=spfile ;
System altered.

--//这样可以减少到1个.重启数据库:
SELECT INDX
      ,last_buf_kcrfa
      ,PNEXT_BUF_KCRFA_CLN nxtbufadr
      ,NEXT_BUF_NUM_KCRFA_CLN nxtbuf#
      ,BYTES_IN_BUF_KCRFA_CLN "B/buf"
      ,PVT_STRAND_STATE_KCRFA_CLN state
      ,STRAND_NUM_ORDINAL_KCRFA_CLN strand#
      ,PTR_KCRF_PVT_STRAND stradr
      ,INDEX_KCRF_PVT_STRAND stridx
      ,SPACE_KCRF_PVT_STRAND strspc
      ,TXN_KCRF_PVT_STRAND txn
      ,TOTAL_BUFS_KCRFA totbufs#
      ,STRAND_SIZE_KCRFA strsz
FROM X$KCRFSTRAND ;

      INDX LAST_BUF_KCRFA   NXTBUFADR           NXTBUF#      B/buf      STATE    STRAND# STRADR               STRIDX     STRSPC        TXN   TOTBUFS#      STRSZ
---------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------- ---------- ---------- ----------
         0 0000000060903E00 0000000060233A00        100          0          0 3735928559 00                        0          0          0      14056    7196672
         1 00               00                        0          0          0 3735928559 00000000819F3054 3735928559     126464          0        249     132096
         2 00               00                        0          0          0 3735928559 0000000081A15054 3735928559     126464          1        249     132096
         3 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
         4 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
         5 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
         6 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
         7 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
         8 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
         9 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        10 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        11 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        12 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        13 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        14 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        15 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
        16 00               00                        0          0          0 3735928559 00                        0          0          0        249     132096
17 rows selected.
--//这样public redo数量等于1.按照前面的公式计算归档写满可以增加3M.测试看看.

create table tx as select * from dba_objects;
update tx set object_name=object_name;
update tx set object_name=object_name;
update tx set object_name=object_name;
update tx set object_name=object_name;
update tx set object_name=object_name;
update tx set object_name=object_name;
update tx set object_name=object_name;
--//执行多次.

$ ls -ltr /u01/app/oracle/archivelog/book | grep "2017-03-22 11:"
-rw-r----- 1 oracle oinstall 50312704 2017-03-22 11:03:37 1_717_896605872.dbf
-rw-r----- 1 oracle oinstall 50179584 2017-03-22 11:03:40 1_718_896605872.dbf
-rw-r----- 1 oracle oinstall 50181632 2017-03-22 11:04:27 1_719_896605872.dbf
-rw-r----- 1 oracle oinstall 45753856 2017-03-22 11:04:29 1_720_896605872.dbf
-rw-r----- 1 oracle oinstall 39139328 2017-03-22 11:04:35 1_721_896605872.dbf
-rw-r----- 1 oracle oinstall 50179584 2017-03-22 11:07:21 1_722_896605872.dbf

$ ls -ltr -h /u01/app/oracle/archivelog/book | grep "2017-03-22 11:"
-rw-r----- 1 oracle oinstall 48M 2017-03-22 11:03:37 1_717_896605872.dbf
-rw-r----- 1 oracle oinstall 48M 2017-03-22 11:03:40 1_718_896605872.dbf
-rw-r----- 1 oracle oinstall 48M 2017-03-22 11:04:27 1_719_896605872.dbf
-rw-r----- 1 oracle oinstall 44M 2017-03-22 11:04:29 1_720_896605872.dbf
-rw-r----- 1 oracle oinstall 38M 2017-03-22 11:04:35 1_721_896605872.dbf
-rw-r----- 1 oracle oinstall 48M 2017-03-22 11:07:21 1_722_896605872.dbf
--//归档文件可以达到48M.比原来大一些.

7.补充log file switch (private strand flush incomplete)

Message "log file switch (private strand flush incomplete)"

The approach of the "preemptive redolog switches" describes how to ensure enough free space in the redo log file, but
what has this to do with the message "log file switch (private strand flush incomplete)".

Once again: Oracle is using the "write-ahead logging" approach. This means that all of the corresponding redo data has
to be written to the redo log files before the database writer (DBWR) can write its data to the data files (which
happens by a log file switch for checkpointing).

What happens if the DBWR wants to write a data block, that is currently associated with a private redo thread? The DBWR
is aware of this and copies the associated private redo data to the public redo area and applies it to the data before
writing it. The foreground session (that initially allocated the private redo thread for data change) will be suspended,
if it tries to generate more redo data while the DBWR is doing its necessary activities and an alert log file entry "log
file switch (private strand flush incomplete)" is written. This is the story behind that particular alert log entry.The
redo log behavior changes by this situation, but let's disregard these internals right now.

--//这个事件实际上就与Preemptive redolog switches有关,因为在切换日志时要把当前privare redo与IMU buffer内容写当前日志,
--//如果你redo磁盘IO很慢,就会在alert.log文件中大量出现这个等待事件.

--//曾经跟zergduan有关一次讨论,对方认为日志切换就会出现,实际上我的测试环境出现这个事件很少:

$ grep -i "private" alert_book.log
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
Private strand flush not complete
--//一共13次.注alert提示是Private strand flush not complete.

$ grep -C4 "^Private" alert_book.log |grep 2017
Wed Mar 08 09:01:23 2017
Sat Mar 11 13:36:33 2017
Sat Mar 11 14:01:36 2017
Sat Mar 11 17:36:47 2017
Sat Mar 11 18:01:43 2017
Sun Mar 12 06:00:11 2017
Wed Mar 15 16:57:56 2017
Wed Mar 15 16:58:22 2017
Fri Mar 17 12:05:27 2017
Fri Mar 17 12:05:30 2017
Fri Mar 17 17:04:20 2017
Wed Mar 22 08:36:48 2017
Wed Mar 22 08:36:48 2017
Wed Mar 22 09:46:45 2017
Wed Mar 22 09:46:45 2017
Wed Mar 22 09:52:54 2017
Wed Mar 22 09:52:54 2017
Wed Mar 22 10:10:28 2017
Wed Mar 22 10:10:28 2017
Wed Mar 22 10:37:40 2017
Wed Mar 22 10:37:40 2017
Wed Mar 22 11:04:26 2017
Wed Mar 22 11:04:26 2017
--//今天出现最多,是因为测试我执行一些大事务.

$ grep -C4 "^Private" alert_book.log
--
Wed Mar 22 09:46:45 2017
ALTER SYSTEM ARCHIVE LOG
Wed Mar 22 09:46:45 2017
Thread 1 cannot allocate new log, sequence 703
Private strand flush not complete
Current log# 2 seq# 702 mem# 0: /mnt/ramdisk/book/redo02.log
Beginning log switch checkpoint up to RBA [0x2bf.2.10], SCN: 13276941372
Thread 1 advanced to log sequence 703 (LGWR switch)
Current log# 3 seq# 703 mem# 0: /mnt/ramdisk/book/redo03.log
--

--//实际上DBWR也会触发log file switch (private strand flush incomplete).不一定在日志切换.

--//我仔细看了生产系统的情况,大部分都是产生大量日志来不及归档,Private strand flush not complete前面总有一行
--//Thread 1 cannot allocate new log, sequence NNN.

$ grep -i ^private alert_hrp430.log |wc
931 4655 31654

$ grep -B 1 -i private alert_hrp430.log | grep Thread|wc
931 7448 44201

--//也就是在线日志设置有点小,出现大事务时来不及归档,总之我个人认为磁盘IO慢也是一个原因 ,小事务密集也是一个原因.

--//写的有点乱...