1、两表连接
oracle有两个公式用于连接基数的计算:
假设我们对表t1和t2进行连接,连接列分别是c1和c2。
Join Selectivity =
((num_rows(t1) - num_nulls(t1.c1)) / num_rows(t1)) *
((num_rows(t2) - num_nulls(t2.c2)) / num_rows(t2)) /
greater(num_distinct(t1.c1),num_distinct(t2.c2)) --greater 较大的
Join Cardinality =
Join Selectivity *
filtered cardinality(t1) * filtered cardinality(t2)
下面对其进行解释。
SQL> create table t1
2 as
3 select
4 trunc(dbms_random.value(0, 25))filter,
5 trunc(dbms_random.value(0, 30))join1,
6 lpad(rownum,10)v1,
7 from('x',100) padding
rpad('x',100) padding
8 from
9 all_objects
10 where
11 rownum <= 10000;
表已创建。
SQL> create table t2
2 as
3 select
4 trunc(dbms_random.value(0, 50))filter,
5 trunc(dbms_random.value(0, 40))join1,
6 lpad(rownum,10)v1,
7 rpad('x',100)padding
8 from
9 all_objects
10 where
11 rownum <= 10000;
表已创建。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't2',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
在本例中我们有:
t1.filter 25个不同的值
t2.filter 50个不同的值
t1.join1 30个不同的值
t2.join1 40个不同的值
由于两个表都有10000行,因此t1的过滤基数将是400(10000行除以不同值的个数25),t2的过滤基数将是200(10000行除以不同值的个数50)。
因为任意一个表中都没有空值,因此根据连接基数公式可以得到:
Join Selectivity =
((10000 - 0) / 10000) *
((10000 - 0) / 10000) /
greater(30,40) = 1/40
Join Cardinality = 1/40 * 400 * 200 = 2000
当通过autotrace运行时,可以看到下面的计划:
SQL> set autotrace traceonly explain;
SQL> select t1.v1, t2.v1
2 from t1, t2
3 where t1.filter = 1
4 and t2.join1 = t1.join1
5 and t2.filter = 1;
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000 | 68000 | 67 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 2000 | 68000 | 67 (2)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 200 | 3400 | 33 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 6800 | 33 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JOIN1"="T1"."JOIN1")
2 - filter("T2"."FILTER"=1)
3 - filter("T1"."FILTER"=1)
其中显示出表t1的过滤基数为400,表t2的过滤基数为200,并且连接基数为2000,与我们预测的结果相同。
下面将测试修改得复杂些,在t1表的连接列上每20行取一个空值,t2表的连接列上每30行取一个控制。
SQL> set autotrace off;
SQL> update t1 set join1 = null
2 where mod(to_number(v1),20) = 0;
已更新500行。
SQL> commit;
提交完成。
SQL> update t2 set join1 = null
2 where mod(to_number(v1),30) = 0;
已更新333行。
SQL> commit;
提交完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't2',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
Join Selectivity =
((10000 - 500) / 10000) *
((10000 - 333) / 10000) / 40 = 0.022959125
Join Cardinality = 0.022959125 * 400 * 200 = 1836.73
SQL> set autotrace traceonly explain;
SQL> select t1.v1, t2.v1
2 from t1, t2
3 where t1.filter = 1
4 and t2.join1 = t1.join1
5 and t2.filter = 1;
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1837 | 62458 | 67 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 1837 | 62458 | 67 (2)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 200 | 3400 | 33 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 6800 | 33 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JOIN1"="T1"."JOIN1")
2 - filter("T2"."FILTER"=1)
3 - filter("T1"."FILTER"=1)
可以在连接列上存在空值的基础上,在filter列中再引入一些空值。
SQL> set autotrace off;
SQL> update t1 set filter = null where mod(to_number(v1),50) = 0;
已更新200行。
SQL> update t2 set filter = null where mod(to_number(v1),100) = 0;
已更新100行。
SQL> commit;
提交完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't2',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
从前面的示例中已经得到了连接选择率为0.022959125,因此现在要做的就是算出当过滤谓词作用的列上含有空值时每个表的过滤基数。
t1:1/25 * (1000 - 200) = 392
t2:1/50 * (1000 - 100) = 198
Join Cardinality = 0.022959125 * 392* 198 = 1781.995446
SQL> set autotrace traceonly explain;
SQL> select t1.v1, t2.v1
2 from t1, t2
3 where t1.filter = 1
4 and t2.join1 = t1.join1
5 and t2.filter = 1;
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1782 | 60588 | 67 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 1782 | 60588 | 67 (2)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 198 | 3366 | 33 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T1 | 392 | 6664 | 33 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JOIN1"="T1"."JOIN1")
2 - filter("T2"."FILTER"=1)
3 - filter("T1"."FILTER"=1)
2、三表连接
SQL> create table t1
2 as
3 select
4 trunc(dbms_random.value(0, 40))join1,
5 trunc(dbms_random.value(0, 40))join2,
6 trunc(dbms_random.value(0, 40))join3,
7 trunc(dbms_random.value(0, 40))join4,
8 lpad(rownum,10)v1,
9 rpad('x',100)padding
10 from
11 all_objects
12 where
13 rownum <= 10000;
表已创建。
SQL> create table t2
2 as
3 select
4 trunc(dbms_random.value(0, 36)) join1,
5 trunc(dbms_random.value(0, 38)) join2,
6 trunc(dbms_random.value(0, 42)) join3,
rpad('x',100) padding
lpad(rownum,10) v1,
rpad('x',100) padding
from
all_objects
11 where
12 rownum <= 10000;
表已创建。
SQL> create table t3
2 as
3 select
4 trunc(dbms_random.value(0, 37)) join2,
5 trunc(dbms_random.value(0, 39)) join3,
6 trunc(dbms_random.value(0, 41)) join4,
7 lpad(rownum,10) v1,
8 rpad('x',100) padding
9 from
10 all_objects
11 where
12 rownum <= 10000;
表已创建。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't2',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't3',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> select t1.v1, t2.v1, t3.v1
2 from t1, t2, t3
3 where t2.join1 = t1.join1 -- 36 / 40 distinct value
4 and t2.join2 = t1.join2 -- 38 / 40 distinct value
5 --
6 and t3.join2 = t2.join2 -- 37 / 38 distinct value
7 and t3.join3 = t2.join3 -- 39 / 42 distinct value
8 --
9 and t3.join4 = t1.join4;-- 40 / 41 distinct value
执行计划
----------------------------------------------------------
Plan hash value: 1573120526
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9551 | 559K| 38 |
|* 1 | HASH JOIN | | 9551 | 559K| 38 |
| 2 | TABLE ACCESS FULL | T3 | 10000 | 195K| 4 |
|* 3 | HASH JOIN | | 62500 | 2441K| 18 |
| 4 | TABLE ACCESS FULL| T1 | 10000 | 195K| 4 |
| 5 | TABLE ACCESS FULL| T2 | 10000 | 195K| 4 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."JOIN2"="T2"."JOIN2" AND "T3"."JOIN3"="T2"."JOIN3"
AND "T3"."JOIN4"="T1"."JOIN4")
3 - access("T2"."JOIN1"="T1"."JOIN1" AND "T2"."JOIN2"="T1"."JOIN2")
Note
-----
- cpu costing is off (consider enabling it)
这一查询的连接顺序是t1->t2->t3。首先将t3散列进入内存,然后将t1散列进入内存,然后开始读t2。对于t2的每一行,oracle探查t1散列寻找匹配,因此第一个连接是t1->t2;如果第一个探查是成功的,那么orange探查t3散列寻找匹配,因此第二个连接是t2->t3——从技术上讲我们应该说第二个连接是(t1->t2)->t3。
我们的任务是找出优化器如何为t1->t2散列连接计算出中间基数62500,以及然后它如何把t3连接到中间结果集而得到基数9551。
首先把t2连接到t1——他的确只是重复应用公式并使用正确值的问题。
Join Selectivity =
{join1 bit} *
{join2 bit}=
((10000 - 0) / 10000) *
((10000 - 0) / 10000) /
greater(36,40) *
((10000 - 0) / 10000) *
((10000 - 0) / 10000) /
greater(38,40) =
1/1600
Join Cardinality =
1/1600 *
10000 * 10000 =
62500
得到了一个62500的中间表,还要将第三个表与它连接——这次有3个连接列。
Join Selectivity =
{join2 bit} *
{join3 bit} *
{join4 bit}=
((10000 - 0) / 10000) *
((10000 - 0) / 10000) /
greater(37,38) *
((10000 - 0) / 10000) *
((10000 - 0) / 10000) /
greater(39,42) *
((10000 - 0) / 10000) *
((10000 - 0) / 10000) /
greater(40,41) =
1/65436
Join Cardinality =
1/65436 *
62500 * 10000 =
9551
3、实现问题
将所有引用到的数据放入一个含有type列的表中,对于连接到这一引用表的大部分简单连接而言,优化器计算出的基数都极不合理。
SQL> SQL> create table t1
2 as
3 with generator as (
4 select --+ materialize
5 rownum id
6 from all_objects
7 where rownum <= 3000
8 )
9 select
10 /*+ ordered use_nl(v2) */
11 trunc(dbms_random.value(0,20)) class1_code,
12 trunc(dbms_random.value(0,25)) class2_code,
13 rownum id,
14 lpad(rownum,10,'0') small_vc
15 from
16 generator v1,
17 generator v2
18 where
19 rownum <= 500000;
表已创建。
SQL> create table type1 as
2 select
3 rownum-1 id,
4 'CLASS1' type,
5 lpad(rownum-1,10,'0') description
6 from all_objects
7 where rownum <= 20;
表已创建。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> alter table type1 add constraint type1_pk primary key(id, type);
表已更改。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'type1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
创建了一个500 000行的表,将那些无意义的代码数字扩展成可识别的描述。所要做的全部工作就是我们从大表中选取一些行,连接到一个引用表而将代码翻译成描述,以便基于描述消除数据。
SQL> select t1.small_vc, type1.description
2 from t1, type1
3 where t1.id between 1000 and 1999
4 and type1.id = t1.class1_code
5 and type1.type = 'CLASS1'
6 and type1.description = lpad(0, 10, '0');
已选择47行。
执行计划
----------------------------------------------------------
Plan hash value: 1949750837
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 2000 | 26 |
|* 1 | HASH JOIN | | 50 | 2000 | 26 |
|* 2 | TABLE ACCESS FULL| TYPE1 | 1 | 21 | 2 |
|* 3 | TABLE ACCESS FULL| T1 | 1001 | 19019 | 23 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TYPE1"."ID"="T1"."CLASS1_CODE")
2 - filter("TYPE1"."TYPE"='CLASS1' AND
"TYPE1"."DESCRIPTION"='0000000000')
3 - filter("T1"."ID">=1000 AND "T1"."ID"<=1999)
Note
-----
- cpu costing is off (consider enabling it)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
1872 consistent gets
1862 physical reads
0 redo size
1665 bytes sent via SQL*Net to client
449 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
47 rows processed
可以看见计算出的基数的50,而实际返回了47行。现在创建包含两个数据集的引用表。
SQL> create table type2 as
2 select
3 rownum-1id,
4 'CLASS1'type,
5 lpad(rownum-1,10,'0')description
6 from all_objects
7 where rownum <= 20
8 union all
9 select
10 rownum-1id,
11 'CLASS2'type,
12 lpad(rownum-1,10,'0')description
13 from all_objects
14 where rownum <= 25;
表已创建。
SQL> update type2 set description = lpad(rownum-1,10,'0');
已更新45行。
SQL> commit;
提交完成。
SQL> alter table type2 add constraint type2_pk primary key(id, type);
表已更改。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'type2',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select t1.small_vc, type2.description
2 from t1, type2
3 where t1.id between 1000 and 1999
4 and type2.id = t1.class1_code
5 and type2.type = 'CLASS1'
6 and type2.description = lpad(0, 10, '0');
已选择47行。
执行计划
----------------------------------------------------------
Plan hash value: 2354741939
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 1000 | 26 |
|* 1 | HASH JOIN | | 25 | 1000 | 26 |
|* 2 | TABLE ACCESS FULL| TYPE2 | 1 | 21 | 2 |
|* 3 | TABLE ACCESS FULL| T1 | 1001 | 19019 | 23 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TYPE2"."ID"="T1"."CLASS1_CODE")
2 - filter("TYPE2"."TYPE"='CLASS1' AND
"TYPE2"."DESCRIPTION"='0000000000')
3 - filter("T1"."ID">=1000 AND "T1"."ID"<=1999)
Note
-----
- cpu costing is off (consider enabling it)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
1872 consistent gets
1862 physical reads
0 redo size
1665 bytes sent via SQL*Net to client
449 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
47 rows processed
计算出的基数是错误的,可以看见计算出的基数的25,而实际返回了47行。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'type2',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for columns TYPE size 75'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> select t1.small_vc, type2.description
2 from t1, type2
3 where t1.id between 1000 and 1999
4 and type2.id = t1.class1_code
5 and type2.type = 'CLASS1'
6 and type2.description = lpad(0, 10, '0');
已选择47行。
执行计划
----------------------------------------------------------
Plan hash value: 2354741939
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22 | 880 | 26 |
|* 1 | HASH JOIN | | 22 | 880 | 26 |
|* 2 | TABLE ACCESS FULL| TYPE2 | 1 | 21 | 2 |
|* 3 | TABLE ACCESS FULL| T1 | 1001 | 19019 | 23 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TYPE2"."ID"="T1"."CLASS1_CODE")
2 - filter("TYPE2"."TYPE"='CLASS1' AND
"TYPE2"."DESCRIPTION"='0000000000')
3 - filter("T1"."ID">=1000 AND "T1"."ID"<=1999)
Note
-----
- cpu costing is off (consider enabling it)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
1872 consistent gets
1862 physical reads
0 redo size
1665 bytes sent via SQL*Net to client
449 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
47 rows processed
收集了75个桶的统计信息之后,结果更差了,22 : 47。简单的解决方法,将表做list分区表重新创建,在type列上进行区分。
SQL> create table type3(
2 id, type, description
3 )
4 partition by list (type) (
5 partition p1 values('CLASS1'),
6 partition p2 values('CLASS2')
7 )
8 as
9 select
10 id, type, description
11 from
12 type2;
表已创建。
SQL> alter table type3 add constraint type3_pk primary key(id, type) using index local;
表已更改。
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'type3',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /
PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;
SQL> select t1.small_vc, type3.description
2 from t1, type3
3 where t1.id between 1000 and 1999
4 and type3.id = t1.class1_code
5 and type3.type = 'CLASS1'
6 and type3.description = lpad(0, 10, '0');
已选择47行。
执行计划
----------------------------------------------------------
Plan hash value: 4192438172
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 2000 | 26 | | |
|* 1 | HASH JOIN | | 50 | 2000 | 26 | | |
| 2 | PARTITION LIST SINGLE| | 1 | 21 | 2 | 1 | 1 |
|* 3 | TABLE ACCESS FULL | TYPE3 | 1 | 21 | 2 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | T1 | 1001 | 19019 | 23 | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TYPE3"."ID"="T1"."CLASS1_CODE")
3 - filter("TYPE3"."DESCRIPTION"='0000000000')
4 - filter("T1"."ID">=1000 AND "T1"."ID"<=1999)
Note
-----
- cpu costing is off (consider enabling it)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
1872 consistent gets
1862 physical reads
0 redo size
1665 bytes sent via SQL*Net to client
449 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
47 rows processed