说了这么多,我们先来看一下动态采样的威力到底如何?
create table t as
select mod(num, 100) c1, mod(num, 100) c2, mod(num, 75) c3, mod(num, 30) c4
from (select level num from dual connect by level <= 10001);
创建了一张表T,字段C1和C2的值保持联动。
select c1,c2 from t where rownum<20;
C1 C2
---------- ----------
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
表不收集统计信息,我们先来看看只查询C1列的情况:
select count(*) from t where c1 = 10;
COUNT(*)
----------
100
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 1300 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10)
Note
-----
- dynamic sampling used for this statement (level=2)
非常好,动态采样已经非常精准的估计出了返回的基数为100,跟实际值之间丝毫无差。当然你的环境下可能会遭遇些许的误差,这是正常的。
我们收集一下表的统计信息:
begin
dbms_stats.gather_table_stats(ownname =>'test',
tabname => 't',
no_invalidate => FALSE,
estimate_percent => 100,
force => true,
degree => 5,
method_opt => 'for all columns size 1',
cascade => true);
end;
/
select count(*) from t where c1 = 10 and c2=10;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 1 | 6 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
其实通过两个谓词过滤后,有100条的记录符合条件,但是优化器根据数据字典里的统计信息评估后,认为只返回1条记录。
因为优化器并不知道C1,C2的值是联动的,它只会傻傻的按照公式来计算基数:
cardinality = (selectivity c1) *???? (selectivity c2) *???? # rows in table
= 1/100 * 1/100 * 10001 = 1
我们把采样级别设置为4(在设置为4的情况下,由于我们的查询是多谓词查询,即使表上有统计信息,也会使用到动态采样)
alter session set optimizer_dynamic_sampling=4;
Session altered.
set autotrace on
select count(*) from t where c1 = 10 and c2=10;
COUNT(*)
----------
100
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 600 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
Note
-----
- dynamic sampling used for this statement (level=4)
GOOD,优化器使用动态采样后,精确的估计出了基数值。如果你使用的是11G之后的版本,还可以通过扩展的统计信息收集来解决多列之间有数据依赖的问题:
select dbms_stats.create_extended_stats(ownname=>user,
tabname => 'DEPEND_TEST', extension => '(c1, c2)' ) AS c1_c2_correlation
from dual ;
begin
dbms_stats.gather_table_stats(ownname =>'test',
tabname => 't',
no_invalidate => FALSE,
estimate_percent => 100,
force => true,
degree => 5,
method_opt => 'for all columns size 1',
cascade => true);
end;
/
alter session set optimizer_dynamic_sampling=2;
select count(*) from t where c1 = 10 and c2=10;
COUNT(*)
----------
100
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 600 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
我们看到通过11G的扩展统计信息收集也解决了多列之间有数据依赖情况下,基数计算不准的问题,但是,但是扩展的统计信息只对于做等值查询有效,我们看看下面的情况:
select c1,c2 from t where c1 = 10 and c2>10;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 90 | 540 | 8 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 90 | 540 | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"=10 AND "C2">10)
虽然我们收集了扩展的统计信息,但是由于C2做的是非等值查询,扩展的统计信息失效了。这个时候我们依然只能求助于动态采样了。
alter session set optimizer_dynamic_sampling=4;
select c1,c2 from t where c1 = 10 and c2>10;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 6 | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"=10 AND "C2">10)
Note
-----
- dynamic sampling used for this statement (level=4)
动态采样总是那么好使,依然为我们计算出了精准的基数。
需要注意的是,手工设置统计信息的表,不管是在session/system级,还是在cursor级,还是在segment级别启用动态采样,都将不会有效:
begin
dbms_stats.set_table_stats(ownname => user,
tabname => 't',
numrows => 100);
end;
/
上面对表T进行了行数的统计信息设定。
select /*+ DYNAMIC_SAMPLING(10) */
count(*) as cnt
from
t
where
attr1 = 1
and id > 0
;
CNT
----------
100000
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 24 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
|* 2 | TABLE ACCESS FULL| T | 1 | 26 | 24 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ATTR1"=1 AND "ID">0)
采样了级别为10,由于查询条件是多个谓词,即使表上存在统计信息也符合动态采样的条件(前面已经有过论述),但是执行计划的输出表示,这个查询没有使用到动态采样。10053的跟踪结果也说明了这一点。这里不再贴出。
我们重新收集统计信息看看:
test@DLSP>begin
2 dbms_stats.gather_table_stats(ownname =>'test',
3 tabname => 't',
4 no_invalidate => FALSE,
5 estimate_percent => 100,
6 force => true,
7 degree => 5,
8 method_opt => 'for all columns size 1',
9 cascade => true);
10 end;
11 /
PL/SQL procedure successfully completed.
test@DLSP>set autotrace on
test@DLSP>select /*+ DYNAMIC_SAMPLING(10) */
2 count(*) as cnt
3 from
4 t
5 where
6 attr1 = 1
7 and id > 0
8 ;
CNT
----------
100000
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 3537 (1)| 00:00:43 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 100K| 781K| 3537 (1)| 00:00:43 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ATTR1"=1 AND "ID">0)
Note
-----
- dynamic sampling used for this statement (level=10)
可以看到优化器已经使用了动态采样,执行计划输出的Note部分表明了这一点。我们看看能不能骗过ORACLE,在统计信息收集后,再手工设置统计信息,看看能不能动态采样。
test@DLSP>begin
2 dbms_stats.set_table_stats(ownname => user,
3 tabname => 't',
4 numrows => 100);
5 end;
6 /
PL/SQL procedure successfully completed.
test@DLSP>select /*+ dynamic_sampling(5) */
2 count(*) as cnt
3 from
4 t
5 where
6 attr1 = 1
7 and id > 0
8 ;
CNT
----------
100000
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 3519 (1)| 00:00:43 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 80 | 3519 (1)| 00:00:43 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ATTR1"=1 AND "ID">0)
可以看到没有使用到,ORACLE不是那么好骗的。
这一点要引起注意,如果你的统计信息是手工设定的,动态采样技术将不能为你服务。
文章有点长了,请关注下一篇!!
create table t as
select mod(num, 100) c1, mod(num, 100) c2, mod(num, 75) c3, mod(num, 30) c4
from (select level num from dual connect by level <= 10001);
创建了一张表T,字段C1和C2的值保持联动。
select c1,c2 from t where rownum<20;
C1 C2
---------- ----------
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
表不收集统计信息,我们先来看看只查询C1列的情况:
select count(*) from t where c1 = 10;
COUNT(*)
----------
100
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 1300 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10)
Note
-----
- dynamic sampling used for this statement (level=2)
非常好,动态采样已经非常精准的估计出了返回的基数为100,跟实际值之间丝毫无差。当然你的环境下可能会遭遇些许的误差,这是正常的。
我们收集一下表的统计信息:
begin
dbms_stats.gather_table_stats(ownname =>'test',
tabname => 't',
no_invalidate => FALSE,
estimate_percent => 100,
force => true,
degree => 5,
method_opt => 'for all columns size 1',
cascade => true);
end;
/
select count(*) from t where c1 = 10 and c2=10;
COUNT(*)
----------
100
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 1 | 6 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
其实通过两个谓词过滤后,有100条的记录符合条件,但是优化器根据数据字典里的统计信息评估后,认为只返回1条记录。
因为优化器并不知道C1,C2的值是联动的,它只会傻傻的按照公式来计算基数:
cardinality = (selectivity c1) *???? (selectivity c2) *???? # rows in table
= 1/100 * 1/100 * 10001 = 1
我们把采样级别设置为4(在设置为4的情况下,由于我们的查询是多谓词查询,即使表上有统计信息,也会使用到动态采样)
alter session set optimizer_dynamic_sampling=4;
Session altered.
set autotrace on
select count(*) from t where c1 = 10 and c2=10;
COUNT(*)
----------
100
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 600 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
Note
-----
- dynamic sampling used for this statement (level=4)
GOOD,优化器使用动态采样后,精确的估计出了基数值。如果你使用的是11G之后的版本,还可以通过扩展的统计信息收集来解决多列之间有数据依赖的问题:
select dbms_stats.create_extended_stats(ownname=>user,
tabname => 'DEPEND_TEST', extension => '(c1, c2)' ) AS c1_c2_correlation
from dual ;
begin
dbms_stats.gather_table_stats(ownname =>'test',
tabname => 't',
no_invalidate => FALSE,
estimate_percent => 100,
force => true,
degree => 5,
method_opt => 'for all columns size 1',
cascade => true);
end;
/
alter session set optimizer_dynamic_sampling=2;
select count(*) from t where c1 = 10 and c2=10;
COUNT(*)
----------
100
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 100 | 600 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
我们看到通过11G的扩展统计信息收集也解决了多列之间有数据依赖情况下,基数计算不准的问题,但是,但是扩展的统计信息只对于做等值查询有效,我们看看下面的情况:
select c1,c2 from t where c1 = 10 and c2>10;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 90 | 540 | 8 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 90 | 540 | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"=10 AND "C2">10)
虽然我们收集了扩展的统计信息,但是由于C2做的是非等值查询,扩展的统计信息失效了。这个时候我们依然只能求助于动态采样了。
alter session set optimizer_dynamic_sampling=4;
select c1,c2 from t where c1 = 10 and c2>10;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 6 | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"=10 AND "C2">10)
Note
-----
- dynamic sampling used for this statement (level=4)
动态采样总是那么好使,依然为我们计算出了精准的基数。
需要注意的是,手工设置统计信息的表,不管是在session/system级,还是在cursor级,还是在segment级别启用动态采样,都将不会有效:
begin
dbms_stats.set_table_stats(ownname => user,
tabname => 't',
numrows => 100);
end;
/
上面对表T进行了行数的统计信息设定。
select /*+ DYNAMIC_SAMPLING(10) */
count(*) as cnt
from
t
where
attr1 = 1
and id > 0
;
CNT
----------
100000
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 24 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
|* 2 | TABLE ACCESS FULL| T | 1 | 26 | 24 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ATTR1"=1 AND "ID">0)
采样了级别为10,由于查询条件是多个谓词,即使表上存在统计信息也符合动态采样的条件(前面已经有过论述),但是执行计划的输出表示,这个查询没有使用到动态采样。10053的跟踪结果也说明了这一点。这里不再贴出。
我们重新收集统计信息看看:
test@DLSP>begin
2 dbms_stats.gather_table_stats(ownname =>'test',
3 tabname => 't',
4 no_invalidate => FALSE,
5 estimate_percent => 100,
6 force => true,
7 degree => 5,
8 method_opt => 'for all columns size 1',
9 cascade => true);
10 end;
11 /
PL/SQL procedure successfully completed.
test@DLSP>set autotrace on
test@DLSP>select /*+ DYNAMIC_SAMPLING(10) */
2 count(*) as cnt
3 from
4 t
5 where
6 attr1 = 1
7 and id > 0
8 ;
CNT
----------
100000
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 3537 (1)| 00:00:43 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 100K| 781K| 3537 (1)| 00:00:43 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ATTR1"=1 AND "ID">0)
Note
-----
- dynamic sampling used for this statement (level=10)
可以看到优化器已经使用了动态采样,执行计划输出的Note部分表明了这一点。我们看看能不能骗过ORACLE,在统计信息收集后,再手工设置统计信息,看看能不能动态采样。
test@DLSP>begin
2 dbms_stats.set_table_stats(ownname => user,
3 tabname => 't',
4 numrows => 100);
5 end;
6 /
PL/SQL procedure successfully completed.
test@DLSP>select /*+ dynamic_sampling(5) */
2 count(*) as cnt
3 from
4 t
5 where
6 attr1 = 1
7 and id > 0
8 ;
CNT
----------
100000
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 3519 (1)| 00:00:43 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 80 | 3519 (1)| 00:00:43 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ATTR1"=1 AND "ID">0)
可以看到没有使用到,ORACLE不是那么好骗的。
这一点要引起注意,如果你的统计信息是手工设定的,动态采样技术将不能为你服务。
文章有点长了,请关注下一篇!!
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22034023/viewspace-1222339/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/22034023/viewspace-1222339/