SQL优化-关于ORDERED-HASH中错误选择连接字段对性能的影响

最新推荐文章于 2021-07-04 15:43:52 发布

niuyang1314

最新推荐文章于 2021-07-04 15:43:52 发布

阅读量5.6k

点赞数

分类专栏： Oracle 文章标签：优化 sql crm access table filter

Oracle 专栏收录该内容

37 篇文章 0 订阅

订阅专栏

关于ORDERED-HASH中错误选择连接字段对性能的影响

在BOSS项目的优化过程中,遇到了这么一个情况:

SELECT">heyf@VOUCHER>SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82';

Elapsed: 00:00:20.21

#这个SQL足足花了20秒才跑完. 而且这中间的数据量并不是很大. 那为什么需要花费这么长时间呢?

#我们先来分析一下数据:

select">heyf@VOUCHER>select count(*) from crm_customer c where c.service_id = '82';

COUNT(*)
----------
28494

select">heyf@VOUCHER>select count(*) from crm_contract ;

COUNT(*)
----------
20302

select">heyf@VOUCHER>select count(*) from crm_contact_record ;

COUNT(*)
----------
139774

select">heyf@VOUCHER>select count(distinct t.customer_id) from crm_train_record t
2 where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
3 and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
4 and t.train_type = 'y' ;

COUNT(DISTINCTT.CUSTOMER_ID)
----------------------------
2262

#执行计划如下:

SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82'

Elapsed: 00:00:20.21

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 7570 |
| 1 | SORT GROUP BY | | 1 | 14 | |
| 2 | VIEW | | 50 | 700 | 7570 |
|* 3 | FILTER | | | | |
| 4 | SORT GROUP BY | | 50 | 4650 | 7570 |
|* 5 | HASH JOIN | | 998 | 92814 | 7559 |
|* 6 | HASH JOIN | | 65 | 4745 | 7128 |
|* 7 | TABLE ACCESS FULL | CRM_CONTRACT | 64 | 1664 | 19 |
| 8 | MERGE JOIN CARTESIAN| | 10621 | 487K| 7108 | --注意这里的 MERGE JOIN CARTESIAN
|* 9 | TABLE ACCESS FULL | CRM_CUSTOMER | 160 | 4960 | 68 |
| 10 | BUFFER SORT | | 67 | 1072 | 7040 | -- T
|* 11 | TABLE ACCESS FULL | CRM_TRAIN_RECORD | 67 | 1072 | 44 |
| 12 | TABLE ACCESS FULL | CRM_CONTACT_RECORD | 137K| 2689K| 430 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("CR"."GMT_START">=MAX("T"."GMT_TRAIN") AND
"CR"."GMT_START"<TRUNC(MAX("T"."GMT_TRAIN")+6))
5 - access("CON"."CUSTOMER_ID"="CR"."CUSTOMER_ID")
6 - access("T"."CUSTOMER_ID"="CON"."CUSTOMER_ID" AND
"C"."CUSTOMER_ID"="CON"."CUSTOMER_ID")
7 - filter("CON"."PROD_ID"=1)
9 - filter("C"."SERVICE_ID"='82')
11 - filter("T"."GMT_TRAIN">=TO_DATE(' 2008-01-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."GMT_TRAIN"<TO_DATE(' 2009-02-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."TRAIN_TYPE"='y')

Note: cpu costing is off

疑问:
为什么慢?
为什么执行计划没有按我指定的来走? (秩序不对)
为什么会是:MERGE JOIN CARTESIAN (CRM_CUSTOMER , T) 而不是HASH?

仔细看了查询条件:
t.customer_id = con.customer_id
and con.customer_id = cr.customer_id
and c.customer_id = con.customer_id

由于我们在HINT里指定了ORDERED ,也就是让表根据: C T CON CR 的顺序走. 而我们的条件中没有C与T的条件.
所以执行计划认为需要走MERGE JOIN CARTESIAN,然后再与CON表进行HASH.

所以我们把条件稍微调整一下,使C和T表有关联:
t.customer_id = c.customer_id
(当然这样的逻辑是一样的)

SELECT /*+ ordered use_hash(c t cr con ) */ count( distinct c.customer_id)
FROM crm_customer c,
(select t.customer_id,max(t.gmt_train) gmt_train from crm_train_record t
where t.gmt_train >= to_date('2008-02-01', 'yyyy-MM-dd') - 5
and t.gmt_train < to_date('2009-02-26','yyyy-MM-dd')+1
and t.train_type = 'y'
group by t.customer_id) t
,crm_contract con
,crm_contact_record cr
WHERE
con.prod_id = 1
and t.customer_id = c.customer_id
and con.customer_id = cr.customer_id
and cr.gmt_start >= t.gmt_train
and cr.gmt_start < trunc(t.gmt_train + 6)
and c.customer_id = con.customer_id
and c.service_id = '82'

Elapsed: 00:00:00.31

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 571 |
| 1 | SORT GROUP BY | | 1 | 14 | |
| 2 | VIEW | | 22 | 308 | 571 |
|* 3 | FILTER | | | | |
| 4 | SORT GROUP BY | | 22 | 2046 | 571 |
|* 5 | HASH JOIN | | 423 | 39339 | 564 |
|* 6 | HASH JOIN | | 27 | 1971 | 133 |
|* 7 | HASH JOIN | | 67 | 3149 | 113 |
|* 8 | TABLE ACCESS FULL| CRM_CUSTOMER | 160 | 4960 | 68 |
|* 9 | TABLE ACCESS FULL| CRM_TRAIN_RECORD | 67 | 1072 | 44 |
|* 10 | TABLE ACCESS FULL | CRM_CONTRACT | 64 | 1664 | 19 |
| 11 | TABLE ACCESS FULL | CRM_CONTACT_RECORD | 137K| 2689K| 430 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("CR"."GMT_START">=MAX("T"."GMT_TRAIN") AND
"CR"."GMT_START"<TRUNC(MAX("T"."GMT_TRAIN")+6))
5 - access("C"."CUSTOMER_ID"="CR"."CUSTOMER_ID")
6 - access("C"."CUSTOMER_ID"="CON"."CUSTOMER_ID")
7 - access("T"."CUSTOMER_ID"="C"."CUSTOMER_ID")
8 - filter("C"."SERVICE_ID"='82')
9 - filter("T"."GMT_TRAIN">=TO_DATE(' 2008-01-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."GMT_TRAIN"<TO_DATE(' 2009-02-27 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "T"."TRAIN_TYPE"='y')
10 - filter("CON"."PROD_ID"=1)

Note: cpu costing is off

-- 当然在这里你调整表的顺序,一样能优化这个SQL.
-- 但是在这里希望大家注意到的是:
-- 当你使用ORDERED的时候,千万要注意表的顺序以及条件中连接字段的使用.
-- 不然会导致执行计划错误,SQL缓慢

一个重复的例子:

create table t1 as select object_id from dba_objects ;
create table t2 as select * from t1 ;
create table t3 as select * from t1 ;

explain">heyf@VOUCHER>explain plan for
2 select /*+ ordered */count(*) from
3 t1,t3,t2
4 where t1.object_id=t2.object_id
5 and t2.object_id=t3.object_id;

Explained.

@plan">heyf@VOUCHER>@plan

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 3960 |
| 1 | SORT AGGREGATE | | 1 | 39 | |
|* 2 | HASH JOIN | | 1307 | 50973 | 3960 |
| 3 | TABLE ACCESS FULL | T2 | 1307 | 16991 | 3 |
| 4 | MERGE JOIN CARTESIAN| | 1708K| 42M| 3924 |
| 5 | TABLE ACCESS FULL | T1 | 1307 | 16991 | 3 |
| 6 | BUFFER SORT | | 1307 | 16991 | 3921 |
| 7 | TABLE ACCESS FULL | T3 | 1307 | 16991 | 3 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID" AND
"T2"."OBJECT_ID"="T3"."OBJECT_ID")

Note: cpu costing is off

21 rows selected.

explain">heyf@VOUCHER>explain plan for
2 select /*+ ordered */count(*) from
3 t1,t3,t2
4 where t1.object_id=t3.object_id
5 and t2.object_id=t3.object_id;

Explained.

@plan">heyf@VOUCHER>@plan

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 11 |
| 1 | SORT AGGREGATE | | 1 | 39 | |
|* 2 | HASH JOIN | | 1307 | 50973 | 11 |
| 3 | TABLE ACCESS FULL | T2 | 1307 | 16991 | 3 |
|* 4 | HASH JOIN | | 1307 | 33982 | 7 |
| 5 | TABLE ACCESS FULL| T1 | 1307 | 16991 | 3 |
| 6 | TABLE ACCESS FULL| T3 | 1307 | 16991 | 3 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
4 - access("T1"."OBJECT_ID"="T3"."OBJECT_ID")

Note: cpu costing is off

20 rows selected.