1、前导零
创建一个包含2000000行数据的表,它有一个id列,采用序号并利用0来进行填充其他位。因此典型的存储值将是A00000000000000001,系统中绝大部分使用这种策略的查询,类似于where id={string constant}的形式;但是,如果它们使用了基于区间的谓词,可能将出现一些奇怪的性能问题。
SQL> create table t1
2 nologging
3 pctfree 0
4 as
5 with generator as (
6 select
7 rownum id
8 from all_objects
9 where rownum <= 2000
10 )
11 select
12 /*+ ordered use_nl(v2) */
13 trunc((rownum-1)/10000) grp_id,
14 'A' || lpad(rownum, 17, '0') id
15 from
16 generator v1,
17 generator v2
18 where
19 rownum <= 2000000
20 ;
表已创建。
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user,
4 tabname=> 't1',
5 cascade=> true,
6 estimate_percent=> null,
7 method_opt=>'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select * from t1 where id between 'A00000000000060000' and 'A00000000000070000';
已选择10001行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 374 | 1190 (3)| 00:00:15 |
|* 1 | TABLE ACCESS FULL| T1 | 17 | 374 | 1190 (3)| 00:00:15 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<='A00000000000070000' AND "ID">='A00000000000060000')
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
7520 consistent gets
6849 physical reads
0 redo size
325111 bytes sent via SQL*Net to client
7741 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10001 rows processed
针对该查询得到的基数是非常低的(17),但是很明显却返回了10001行。下面重新创建直方图,直方图的默认值为75个桶。
SQL> set autotrace off;
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user,
4 tabname=> 't1',
5 cascade=> true,
6 estimate_percent=> null,
7 method_opt=>'for all columns size 75'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select * from t1 where id between 'A00000000000060000' and 'A00000000000070000';
已选择10001行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8924 | 191K| 1190 (3)| 00:00:15 |
|* 1 | TABLE ACCESS FULL| T1 | 8924 | 191K| 1190 (3)| 00:00:15 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<='A00000000000070000' AND "ID">='A00000000000060000')
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
7520 consistent gets
6849 physical reads
0 redo size
325111 bytes sent via SQL*Net to client
7741 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10001 rows processed
直方图的默认值为75个桶,执行计划显示估计的基数为8924——这个结果至少大体上是正确的。创建直方图是一种方法,如果id列的值没有字符,是可以转换为数字的字符串(00000000000000001),那么在id列上面创建一个函数索引也是一种办法:create index t1_i1 on t1(grp_id, to_number(id));
2、致命的默认值
即时是数据库应用程序正确的使用了日期类型,也仍然需要避免null值。为了不让任何列为null,每个可空的列都有一个默认值。因此,大部分独立于数据库的开发人员会选择一个什么样的值来表示null日期呢?如果表示很久以后的日期呢?比如4000年12月31日。
SQL> create table t1
2 as
3 with generator as (
4 select
5 rownum id
6 from all_objects
7 where rownum <= 2000
8 )
9 select
10 /*+ ordered use_nl(v2) */
11 decode(
12 mod(rownum - 1,1000),
13 0,to_date('4000-12-31','yyyy-mm-dd'),
14 to_date('2000-01-01','yyyy-mm-dd') + trunc((rownum - 1)/100)
15 ) date_closed
16 from
17 generator v1,
18 generator v2
19 where rownum <= 1827 * 100;
表已创建。
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user,
4 tabname=> 't1',
5 cascade=> true,
6 estimate_percent=> null,
7 method_opt=>'for all columns size 1' --直方图的默认值为1桶。
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select *
2 from t1
3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and
4 to_date('2003-12-31', 'yyyy-mm-dd');
已选择36463行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 291 | 2328 | 61 (4)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 291 | 2328 | 61 (4)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DATE_CLOSED"<=TO_DATE(' 2003-12-31 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED">=TO_DATE(' 2003-01-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
2759 consistent gets
0 physical reads
0 redo size
494301 bytes sent via SQL*Net to client
27145 bytes received via SQL*Net from client
2432 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
36463 rows processed --这里实际放回了36463行记录,但是oracle却计算错误了(291)。
SQL> set autotrace off;
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user,
4 tabname=> 't1',
5 cascade=> true,
6 estimate_percent=> null,
7 method_opt=>'for all columns size 11' --直方图的默认值为11桶。
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select *
2 from t1
3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and
4 to_date('2003-12-31', 'yyyy-mm-dd');
已选择36463行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36320 | 283K| 61 (4)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 36320 | 283K| 61 (4)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED"<=TO_DATE(' 2003-12-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
2759 consistent gets
0 physical reads
0 redo size
494301 bytes sent via SQL*Net to client
27145 bytes received via SQL*Net from client
2432 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
36463 rows processed --36463:36320这次oracle计算得比较准确了。
可以看见加大了直方图的桶数之后,CBO估算的行数就比较接近真实值了,那我们再加大直方图的桶数试一下呐!
SQL> set autotrace off;
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=> user,
4 tabname=> 't1',
5 cascade=> true,
6 estimate_percent=> null,
7 method_opt=>'for all columns size 75'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace trace;
SQL> select *
2 from t1
3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and
4 to_date('2003-12-31', 'yyyy-mm-dd');
已选择36463行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36345 | 283K| 61 (4)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 36345 | 283K| 61 (4)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED"<=TO_DATE(' 2003-12-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
2759 consistent gets
0 physical reads
0 redo size
494301 bytes sent via SQL*Net to client
27145 bytes received via SQL*Net from client
2432 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
36463 rows processed --36463:36345 加大了直方图的桶数之后,CBO估算返回的行数误差更小了。
3、离散数据的风险
考虑一个包含period列的计数系统——存储1~12月的数据,并额外再加一个月,其对应的数值为99(同时包含了第二种选择,即这个特殊的月给定的值为13)。
SQL> create table t1
2 as
3 with generator as (
4 select
5 rownum id
6 from all_objects
7 where rownum <= 1000
8 )
9 select
10 /*+ ordered use_nl(v2) */
11 mod(rownum-1,13) period_01,
12 mod(rownum-1,13) period_02
13 from
14 generator v1,
15 generator v2
16 where
17 rownum <= 13000
18 ;
表已创建。
SQL> update t1 set
2 period_01 = 99,
3 period_02 = 13
4 where
5 period_01 = 0;
已更新1000行。
SQL> commit;
提交完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace on exp;
SQL> select count(*) from t1 where period_01 between 4 and 6;
COUNT(*)
----------
3000 --这里实际有3000行,但是oracle估算的是1663行,不准确!
执行计划
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 1663 | 4989 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("PERIOD_01"<=6 AND "PERIOD_01">=4)
SQL> select count(*) from t1 where period_02 between 4 and 6;
COUNT(*)
----------
3000 --这里实际有3000行,但是oracle估算的是4167行,不准确!
执行计划
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 4167 | 12501 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("PERIOD_02"<=6 AND "PERIOD_02">=4)
SQL> set autotrace off;
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 254' --重新收集直方图。
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace on exp;
SQL> select count(*) from t1 where period_01 between 4 and 6;
COUNT(*)
----------
3000
执行计划
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 3000 | 9000 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("PERIOD_01"<=6 AND "PERIOD_01">=4)
SQL> select count(*) from t1 where period_02 between 4 and 6;
COUNT(*)
----------
3000
执行计划
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 3000 | 9000 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("PERIOD_02"<=6 AND "PERIOD_02">=4)
4、函数索引
SQL> create index idx_t1_PERIOD_01 on t1(upper(PERIOD_01));
索引已创建。
SQL> select i.index_name,i.index_type from user_indexes i;
INDEX_NAME INDEX_TYPE
------------------------------ ---------------------------
IDX_T1_PERIOD_01 FUNCTION-BASED NORMAL
SQL> select c.TABLE_NAME,c.COLUMN_NAME from user_tab_cols c;
TABLE_NAME COLUMN_NAME
------------------------------ ------------------------------
T1 PERIOD_01
T1 PERIOD_02
T1 SYS_NC00003$
需要牢记的是,如果创建了一个基于函数的索引,那么实际上是在虚拟列上创建了索引,当收集关于该表及其索引的统计信息时,同时也就收集了虚拟列上的统计信息。这在类似情况下,诸如upper(PERIOD_01)='xxx'的谓词将被优化为:SYS_NC00003$='xxx'。
5、相互关联的列
如果在谓词中使用相互依赖(相关)的列容易将问题复杂化,前提是这些相关联的列同时出现在where子句中。
SQL> create table t1
2 nologging
3 as
4 select
5 trunc(dbms_random.value(0,25))n1,
6 rpad('x',40)ind_pad,
7 trunc(dbms_random.value(0,20))n2,
8 lpad(rownum,10,'0')small_vc,
9 rpad('x',200)padding
10 from
11 all_objects
12 where
13 rownum <= 10000
14 ;
表已创建。
SQL> update t1 set n2 = n1;
已更新10000行。
SQL> commit;
提交完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname=> 'T1',
5 cascade=> true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select small_vc
2 from t1
3 where n1 = 2
4 and n2 = 2;
已选择420行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16 | 272 | 66 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 16 | 272 | 66 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N1"=2 AND "N2"=2)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
401 consistent gets
0 physical reads
0 redo size
9059 bytes sent via SQL*Net to client
712 bytes received via SQL*Net from client
29 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
420 rows processed
根据以上执行计划,CBO估算将会返回16行数据,实际确实返回了420行,如果这个表要与多表关联,基数一旦算错,必然导致整个SQL的执行计划全部出错,从而导致SQL性能下降。
接下来再次在相关列上执行查询,不过sql语句中包含了dynamic_sampling提示。
SQL> select /*+ dynamic_sampling(t1 1) */
2 small_vc
3 from t1
4 where n1 = 2
5 and n2 = 2;
已选择420行。
执行计划
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 370 | 6290 | 66 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 370 | 6290 | 66 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N1"=2 AND "N2"=2)
Note
-----
- dynamic sampling used for this statement (level=2)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
401 consistent gets
0 physical reads
0 redo size
9059 bytes sent via SQL*Net to client
712 bytes received via SQL*Net from client
29 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
420 rows processed