优化器革命之-Dynamic Sampling（二）

最新推荐文章于 2021-04-04 08:42:01 发布

cotchte0421

最新推荐文章于 2021-04-04 08:42:01 发布

阅读量211

点赞数

说了这么多，我们先来看一下动态采样的威力到底如何？
create table t as
select mod(num, 100) c1, mod(num, 100) c2, mod(num, 75) c3, mod(num, 30) c4
from (select level num from dual connect by level <= 10001);

创建了一张表T，字段C1和C2的值保持联动。
select c1,c2 from t where rownum<20;

C1 C2
---------- ----------
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
表不收集统计信息，我们先来看看只查询C1列的情况：
select count(*) from t where c1 = 10;

COUNT(*)
----------
100

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 1300 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C1"=10)

Note
-----
- dynamic sampling used for this statement (level=2)

非常好，动态采样已经非常精准的估计出了返回的基数为100，跟实际值之间丝毫无差。当然你的环境下可能会遭遇些许的误差，这是正常的。
我们收集一下表的统计信息：
begin
dbms_stats.gather_table_stats(ownname =>'test',
tabname => 't',
no_invalidate => FALSE,
estimate_percent => 100,
force => true,
degree => 5,
method_opt => 'for all columns size 1',
cascade => true);
end;
/

select count(*) from t where c1 = 10 and c2=10;

COUNT(*)
----------
100

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 1 | 6 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------

其实通过两个谓词过滤后，有100条的记录符合条件，但是优化器根据数据字典里的统计信息评估后，认为只返回1条记录。
因为优化器并不知道C1，C2的值是联动的，它只会傻傻的按照公式来计算基数：
cardinality = (selectivity c1) *???? (selectivity c2) *???? # rows in table
= 1/100 * 1/100 * 10001 = 1
我们把采样级别设置为4（在设置为4的情况下，由于我们的查询是多谓词查询，即使表上有统计信息，也会使用到动态采样）
alter session set optimizer_dynamic_sampling=4;

Session altered.

set autotrace on

select count(*) from t where c1 = 10 and c2=10;

COUNT(*)
----------
100

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 600 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C1"=10 AND "C2"=10)

Note
-----
- dynamic sampling used for this statement (level=4)

GOOD，优化器使用动态采样后，精确的估计出了基数值。如果你使用的是11G之后的版本，还可以通过扩展的统计信息收集来解决多列之间有数据依赖的问题：

select dbms_stats.create_extended_stats(ownname=>user,
tabname => 'DEPEND_TEST', extension => '(c1, c2)' ) AS c1_c2_correlation
from dual ;

begin
dbms_stats.gather_table_stats(ownname =>'test',
tabname => 't',
no_invalidate => FALSE,
estimate_percent => 100,
force => true,
degree => 5,
method_opt => 'for all columns size 1',
cascade => true);
end;
/

alter session set optimizer_dynamic_sampling=2;

select count(*) from t where c1 = 10 and c2=10;

COUNT(*)
----------
100

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 600 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C1"=10 AND "C2"=10)

我们看到通过11G的扩展统计信息收集也解决了多列之间有数据依赖情况下，基数计算不准的问题，但是，但是扩展的统计信息只对于做等值查询有效，我们看看下面的情况：
select c1,c2 from t where c1 = 10 and c2>10;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 90 | 540 | 8 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 90 | 540 | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("C1"=10 AND "C2">10)

虽然我们收集了扩展的统计信息，但是由于C2做的是非等值查询，扩展的统计信息失效了。这个时候我们依然只能求助于动态采样了。
alter session set optimizer_dynamic_sampling=4;

select c1,c2 from t where c1 = 10 and c2>10;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 6 | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("C1"=10 AND "C2">10)

Note
-----
- dynamic sampling used for this statement (level=4)

动态采样总是那么好使，依然为我们计算出了精准的基数。

需要注意的是，手工设置统计信息的表，不管是在session/system级，还是在cursor级，还是在segment级别启用动态采样，都将不会有效：
begin
dbms_stats.set_table_stats(ownname => user,
tabname => 't',
numrows => 100);
end;
/
上面对表T进行了行数的统计信息设定。

select /*+ DYNAMIC_SAMPLING(10) */
count(*) as cnt
from
t
where
attr1 = 1
and id > 0
;

CNT
----------
100000

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 24 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
|* 2 | TABLE ACCESS FULL| T | 1 | 26 | 24 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("ATTR1"=1 AND "ID">0)
采样了级别为10，由于查询条件是多个谓词，即使表上存在统计信息也符合动态采样的条件（前面已经有过论述），但是执行计划的输出表示，这个查询没有使用到动态采样。10053的跟踪结果也说明了这一点。这里不再贴出。
我们重新收集统计信息看看：

test@DLSP>begin
2 dbms_stats.gather_table_stats(ownname =>'test',
3 tabname => 't',
4 no_invalidate => FALSE,
5 estimate_percent => 100,
6 force => true,
7 degree => 5,
8 method_opt => 'for all columns size 1',
9 cascade => true);
10 end;
11 /

PL/SQL procedure successfully completed.

test@DLSP>set autotrace on
test@DLSP>select /*+ DYNAMIC_SAMPLING(10) */
2 count(*) as cnt
3 from
4 t
5 where
6 attr1 = 1
7 and id > 0
8 ;

CNT
----------
100000

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 3537 (1)| 00:00:43 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 100K| 781K| 3537 (1)| 00:00:43 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("ATTR1"=1 AND "ID">0)

Note
-----
- dynamic sampling used for this statement (level=10)

可以看到优化器已经使用了动态采样，执行计划输出的Note部分表明了这一点。我们看看能不能骗过ORACLE，在统计信息收集后，再手工设置统计信息，看看能不能动态采样。

test@DLSP>begin
2 dbms_stats.set_table_stats(ownname => user,
3 tabname => 't',
4 numrows => 100);
5 end;
6 /

PL/SQL procedure successfully completed.

test@DLSP>select /*+ dynamic_sampling(5) */
2 count(*) as cnt
3 from
4 t
5 where
6 attr1 = 1
7 and id > 0
8 ;

CNT
----------
100000

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 3519 (1)| 00:00:43 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 80 | 3519 (1)| 00:00:43 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("ATTR1"=1 AND "ID">0)
可以看到没有使用到，ORACLE不是那么好骗的。
这一点要引起注意，如果你的统计信息是手工设定的，动态采样技术将不能为你服务。
文章有点长了，请关注下一篇！！

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/22034023/viewspace-1222339/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/22034023/viewspace-1222339/

cotchte0421

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
优化器革命之-Dynamic Sampling（二）

说了这么多，我们先来看一下动态采样的威力到底如何？create table t asselect mod(num, 100) c1, mod(num, 100) c2, mod(num, 75) c3, mod(num,...
复制链接

扫一扫