首先,来理解一下标量子查询:处于select之后from之前的子查询称为标量子查询 .比如:select num1,cal,(select name from t2 where t2.id = t1.id)from t1;举这个例子只是为了方便理解标量的含义。当然定义为返回单列的选择语句,或者返回一行的表达式的子查询称为标量子查询。
标量子查询的缺点十分明显:驱动表固定是外表t1, t1返回的结果集传值t2得到结果。所以如果t1表过大(或者以后t1表逐渐增长以后会变得很大)。将会引起很大的性能问题【数据仓库跑批流程应该禁用标量子查询】
今天这条SQL运行了5个小时没出结果。。。。。。。(不得不说耐心真好,我一般最多等十分钟)
SELECT /*+ NO_USE_HASH(C,B)*/
C.CCCC_AAAA_NO,
C.PRIM_ACCT,
ACCOUNT_SYSTEM,
CUSTOMER_TYPE,
CUSTOMER_STATUS,
CREATE_DT,
HOME_BRANCH_NO,
COMPANY_SIZE,
NOTICE_IND,
NOTICE_CUST_NO,
STMT_FREQUENCY,
STMT_CYCLE,
STMT_DAY,
ID_NO,
ID_TYPE,
SHORT_NAME,
EMAIL_ADD1,
EMAIL_ADD2,
CREDIT_RANKING,
TITLE_CODE,
NAME1,
ADD1,
POSTCODE,
PHONE_NO_RES,
PHONE_RES_EXT,
PHONE_NO_BUS,
PHONE_BUS_EXT,
FAX_NO,
TELEX_NO,
PCODE_RGSTER,
REGSTR_ADD1,
REGSTR_ADD2,
PHONE_RGSTR_NO,
PHONE_RGSTR_EXT,
BIRTH_DATE_1,
SEX_CODE,
EMPLOYER_NAME,
EMPLOYED_FROM,
EMPLOYER_ADDR,
OCCUP_DESCRIP,
OCCUPATION_CODE,
INCOME,
INCOME_WMY,
COMPANY_NO,
BUSINESS_NO,
LICENCE_NO,
BOSS_NAME,
BOSS_BDAY,
BUS_RGSTR_DATE,
CAPITAL_AMT,
CONTACT_REL_1,
PHONE_NO_1,
ADD2,
ADD3,
ADD4,
MOBILE_NO,
FXSP_TYPE,
INDUSTRY_CODE,
BUS_SECTOR_CODE,
CUST_SUB_TYPE,
DEP_STMT_TYPE,
ID_ISSUE_DATE,
ID_EXP_DATE,
REGISTRY_ADD,
ID_ISSUE_PLAC,
LST_MNT_DATE,
B.BRANCH_NO
FROM CUSM_M C
INNER JOIN (SELECT
DISTINCT CCCCCCCC_NO,
(SELECT SJJGM
FROM JGDY H
WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO
FROM CB_ACCT) B ON C.CCCC_AAAA_NO = B.CCCCCCCC_NO;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 2079508004
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18M| 14G| | 1793K (1)| 00:04:41 |
|* 1 | INDEX SKIP SCAN | JJJJ_IDX3 | 1 | 8 | | 1 (0)| 00:00:01 |
| 2 | MERGE JOIN | | 18M| 14G| | 1793K (1)| 00:04:41 |
| 3 | SORT JOIN | | 18M| 397M| | 147K (1)| 00:00:24 |
| 4 | VIEW | | 18M| 397M| | 147K (1)| 00:00:24 |
| 5 | HASH UNIQUE | | 18M| 380M| 1107M| 147K (1)| 00:00:24 |
| 6 | TABLE ACCESS STORAGE FULL| CC_ACCTT | 36M| 760M| | 71431 (1)| 00:00:12 |
|* 7 | SORT JOIN | | 19M| 14G| 35G| 1645K (1)| 00:04:18 |
| 8 | TABLE ACCESS STORAGE FULL | CUSM_M | 19M| 14G| | 306K (1)| 00:00:48 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("H"."JGM"=:B1)
filter("H"."JGM"=:B1)
7 - access("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
filter("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
通过SQL和PLAN都可以很容易的找出标量子查询
1.select之后from之前,当然这个SQL的标量隐藏在了内联视图里面了
(SELECT
DISTINCT CCCCCCCC_NO,
(SELECT SJJGM
FROM JGDY H
WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO
FROM CB_ACCT) B
2.PLAN的id=1和2那两步,缩进一样,而且没有连接方式的父亲节点
|* 1 | INDEX SKIP SCAN | JJJJ_IDX3 | 1 | 8 | | 1 (0)| 00:00:01 |
| 2 | MERGE JOIN | | 18M| 14G| | 1793K (1)| 00:04:41 |
通过这两点均可以判断SQL里面包含标量.如果SQL特别长就直接看PLAN就行了
标量是否产生性能问题,注意取决于主表(外表)返回的行数.其实我们都知道这种数据仓库跑批的表不可能小。象征性的查一下
我之前的博客里面发过这个脚本 http://blog.csdn.net/skybig1988/article/details/71125223 也可以自己定制,很简单
可以看出表的行数很大,不适合走标量(>10000行)
对于标量子查询,只能通过改写【标量子查询可以等价改写为外连接】
当然此处的标量改写十分简单.有些复杂的比如 聚合类、不等值、树形查询的标量千万需要注意改写前后是否等价
SELECT /*+ NO_USE_HASH(C,B)*/
C.CCCC_AAAA_NO,
C.PRIM_ACCT,
ACCOUNT_SYSTEM,
CUSTOMER_TYPE,
CUSTOMER_STATUS,
CREATE_DT,
HOME_BRANCH_NO,
COMPANY_SIZE,
NOTICE_IND,
NOTICE_CUST_NO,
STMT_FREQUENCY,
STMT_CYCLE,
STMT_DAY,
ID_NO,
ID_TYPE,
SHORT_NAME,
EMAIL_ADD1,
EMAIL_ADD2,
CREDIT_RANKING,
TITLE_CODE,
NAME1,
ADD1,
POSTCODE,
PHONE_NO_RES,
PHONE_RES_EXT,
PHONE_NO_BUS,
PHONE_BUS_EXT,
FAX_NO,
TELEX_NO,
PCODE_RGSTER,
REGSTR_ADD1,
REGSTR_ADD2,
PHONE_RGSTR_NO,
PHONE_RGSTR_EXT,
BIRTH_DATE_1,
SEX_CODE,
EMPLOYER_NAME,
EMPLOYED_FROM,
EMPLOYER_ADDR,
OCCUP_DESCRIP,
OCCUPATION_CODE,
INCOME,
INCOME_WMY,
COMPANY_NO,
BUSINESS_NO,
LICENCE_NO,
BOSS_NAME,
BOSS_BDAY,
BUS_RGSTR_DATE,
CAPITAL_AMT,
CONTACT_REL_1,
PHONE_NO_1,
ADD2,
ADD3,
ADD4,
MOBILE_NO,
FXSP_TYPE,
INDUSTRY_CODE,
BUS_SECTOR_CODE,
CUST_SUB_TYPE,
DEP_STMT_TYPE,
ID_ISSUE_DATE,
ID_EXP_DATE,
REGISTRY_ADD,
ID_ISSUE_PLAC,
LST_MNT_DATE,
B.BRANCH_NO
FROM CUSM_M C
INNER JOIN (SELECT DISTINCT CCCCCCCC_NO,
sjjgm BRANCH_NO
FROM CB_ACCT LEFT JOIN jgdy ON cb_acct.branch_no=jgm
) B ON C.CCCC_AAAA_NO = B.CCCCCCCC_NO;
Plan hash value: 2285049241
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36M| 28G| | 1834K (1)| 00:04:47 |
| 1 | MERGE JOIN | | 36M| 28G| | 1834K (1)| 00:04:47 |
| 2 | SORT JOIN | | 36M| 829M| | 188K (1)| 00:00:30 |
| 3 | VIEW | | 36M| 829M| | 188K (1)| 00:00:30 |
| 4 | HASH UNIQUE | | 36M| 1037M| 1384M| 188K (1)| 00:00:30 |
|* 5 | HASH JOIN RIGHT OUTER | | 36M| 1037M| | 71501 (1)| 00:00:12 |
| 6 | INDEX FULL SCAN | JJJJ_IDX3 | 1241 | 9928 | | 1 (0)| 00:00:01 |
| 7 | TABLE ACCESS STORAGE FULL| CC_ACCTT | 36M| 760M| | 71431 (1)| 00:00:12 |
|* 8 | SORT JOIN | | 19M| 14G| 35G| 1645K (1)| 00:04:18 |
| 9 | TABLE ACCESS STORAGE FULL | CUSM_M | 19M| 14G| | 306K (1)| 00:00:48 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))
8 - access("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
filter("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
改写之后标量消失.SQL运行了7分钟出结果。但是这个SQL里面没有不等值连接,走MERGE JOIN显然毫无意义。明显走HASH是最好的选择
一直不理解SQL上面的/*+ NO_USE_HASH(C,B)*/ 的意义,最后开发回应说这个 HINT是为了让SQL走嵌套循环,因为走NL比较快。听到这个理由我也是呵呵了!
这里我简单的说一下NL、HASH、SMJ在实际工作中该如何选择:
嵌套循环:
看SQL语句的返回条数 太大的话一般都是错误的
看驱动表返回的行数 一般不能超过1w 最好在1k 以内(但是这个取决于服务器性能,可能性能好的服务器临界值超过20w都可行)
看被驱动表的链接列 是否包含在索引里面 (必须包含在索引里面)
看到distinct ,group by ,sum()一般不走嵌套循环(数据量超级多才去group by)当然数据量少的话也可以走NL
哈希连接只能用于等值连接
排序合并连接唯一的作用:非等值连接
去掉/*+ NO_USE_HASH(C,B)*/ 之后.SQL运行了30秒便出结果
SELECT
C.CCCC_AAAA_NO,
C.PRIM_ACCT,
ACCOUNT_SYSTEM,
CUSTOMER_TYPE,
CUSTOMER_STATUS,
CREATE_DT,
HOME_BRANCH_NO,
COMPANY_SIZE,
NOTICE_IND,
NOTICE_CUST_NO,
STMT_FREQUENCY,
STMT_CYCLE,
STMT_DAY,
ID_NO,
ID_TYPE,
SHORT_NAME,
EMAIL_ADD1,
EMAIL_ADD2,
CREDIT_RANKING,
TITLE_CODE,
NAME1,
ADD1,
POSTCODE,
PHONE_NO_RES,
PHONE_RES_EXT,
PHONE_NO_BUS,
PHONE_BUS_EXT,
FAX_NO,
TELEX_NO,
PCODE_RGSTER,
REGSTR_ADD1,
REGSTR_ADD2,
PHONE_RGSTR_NO,
PHONE_RGSTR_EXT,
BIRTH_DATE_1,
SEX_CODE,
EMPLOYER_NAME,
EMPLOYED_FROM,
EMPLOYER_ADDR,
OCCUP_DESCRIP,
OCCUPATION_CODE,
INCOME,
INCOME_WMY,
COMPANY_NO,
BUSINESS_NO,
LICENCE_NO,
BOSS_NAME,
BOSS_BDAY,
BUS_RGSTR_DATE,
CAPITAL_AMT,
CONTACT_REL_1,
PHONE_NO_1,
ADD2,
ADD3,
ADD4,
MOBILE_NO,
FXSP_TYPE,
INDUSTRY_CODE,
BUS_SECTOR_CODE,
CUST_SUB_TYPE,
DEP_STMT_TYPE,
ID_ISSUE_DATE,
ID_EXP_DATE,
REGISTRY_ADD,
ID_ISSUE_PLAC,
LST_MNT_DATE,
B.BRANCH_NO
FROM CUSM_M C
INNER JOIN (SELECT DISTINCT CCCCCCCC_NO,
sjjgm BRANCH_NO
FROM CB_ACCT LEFT JOIN jgdy ON cb_acct.branch_no=jgm
) B ON C.CCCC_AAAA_NO = B.CCCCCCCC_NO
Plan hash value: 967350049
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36M| 28G| | 1059K (1)| 00:02:46 |
|* 1 | HASH JOIN | | 36M| 28G| 1244M| 1059K (1)| 00:02:46 |
| 2 | VIEW | | 36M| 829M| | 188K (1)| 00:00:30 |
| 3 | HASH UNIQUE | | 36M| 1037M| 1384M| 188K (1)| 00:00:30 |
|* 4 | HASH JOIN RIGHT OUTER | | 36M| 1037M| | 71501 (1)| 00:00:12 |
| 5 | INDEX FULL SCAN | JJJJ_IDX3 | 1241 | 9928 | | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL| CC_ACCTT | 36M| 760M| | 71431 (1)| 00:00:12 |
| 7 | TABLE ACCESS STORAGE FULL | CUSM_M | 19M| 14G| | 306K (1)| 00:00:48 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C"."CCCC_AAAA_NO"="B"."CCCCCCCC_NO")
4 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))
其实这个SQL还可以继续优化,ID=5这一步INDEX FULL SCAN是单块读改成全表扫描可以提升100+倍,加上一体机本身的全表扫描优化TABLE ACCESS STORAGE FULL。提升会更多!!!
如上可知标量子查询是一个非常恐怖的用法。当外部表返回的数据量不大时。完全不会引起性能问题。但是此时隐患已经埋下
随着外部表数据量的增加。标量的性能会慢慢受到影响,一旦过了这个临界值。性能下降的非常明显和可怕。所以在数据仓库
中应该用外连接代替标量,避免给程序埋下隐患。