原创不易,有用记得转发哦!~
点击上方"Python大数据与SQL优化笔记"关注
在SQL优化中有一个比较实用而又有效的方法,就是建中间临时表,相信如果你是数据库开发人员,你肯定使用过这种方法,那么我们今天说的with as这种优化技巧和中间临时表就有相同的点,其实都有点用空间换时间的意思。
在oracle一般会在重复使用两次的地方自动将with as转化为临时表,这个思想有点类上面说的中间临时表,比如有些sql语句我们无法将其优化,将sql语句查询返回的结果集(with as也是如此)物化为一个实体的表,而这个表的访问成本大大小于之前返回该结果集的方式(比如访问该结果集要采取全表扫描、多表关联hash join、nested loop等)
如果看到with as的执行计划中存在了TEMP TABLE TRANSFORMATION和LOAD AS SELECT,这里是将with as组成的结果集转换为了一个临时表,而后面的查询则可以使用这个临时表,由于这个临时表一般比原来的表或者表连接的数据量小,所以无论IO还是cpu成本都相对原sql语句都减小了。
同样即使with as转换的结果集我们只访问一次,也可以加上hint materialize将其构造为临时表。
下面我们来说说一个例子
create table t_0424_1 as select * from dba_objects;SQL> set autot traceSQL> set linesize 200SQL> SELECT COUNT (*) cnt, object_type 2 FROM t_0424_1 WHERE object_type = 'TABLE' GROUP BY object_type UNION ALL SELECT COUNT (*), object_type FROM t_0424_1 WHERE object_type = 'INDEX' GROUP BY object_type UNION ALL SELECT COUNT (*), object_type FROM t_0424_1 WHERE object_type = 'VIEW' 3 GROUP BY object_type; 4 5 6 7 8 9 10 11 12 13 14Execution Plan----------------------------------------------------------Plan hash value: 1264367756----------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 710K| 7630K| 7543 (68)| 00:01:31 || 1 | UNION-ALL | | | | | || 2 | SORT GROUP BY NOSORT| | 153K| 1645K| 2514 (2)| 00:00:31 ||* 3 | TABLE ACCESS FULL | T_0424_1 | 153K| 1645K| 2514 (2)| 00:00:31 || 4 | SORT GROUP BY NOSORT| | 224K| 2413K| 2514 (2)| 00:00:31 ||* 5 | TABLE ACCESS FULL | T_0424_1 | 224K| 2413K| 2514 (2)| 00:00:31 || 6 | SORT GROUP BY NOSORT| | 332K| 3571K| 2514 (2)| 00:00:31 ||* 7 | TABLE ACCESS FULL | T_0424_1 | 332K| 3571K| 2514 (2)| 00:00:31 |----------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 3 - filter("OBJECT_TYPE"='TABLE') 5 - filter("OBJECT_TYPE"='INDEX') 7 - filter("OBJECT_TYPE"='VIEW')Note----- - dynamic sampling used for this statement (level=2)Statistics---------------------------------------------------------- 0 recursive calls 0 db block gets 25035 consistent gets 0 physical reads 0 redo size 679 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed CNT OBJECT_TYPE---------- ------------------- 23120 TABLE 30784 INDEX 41512 VIEW
我们看到上面的执行计划里t_0424_1被扫描了3次,那么我们是不是可以用中间临时表来减少表的扫描IO大小呢?我们来看看传统的方法,中间临时表
create table temp_table as SELECT COUNT (*) cnt, object_type FROM t_0424_1 GROUP BY object_type 原sql改下下面的: SELECT cnt, object_type FROM temp_table WHERE object_type = 'TABLE' UNION ALL SELECT cnt, object_type FROM temp_table WHERE object_type = 'INDEX' UNION ALL SELECT cnt, object_type FROM temp_table WHERE object_type = 'VIEW'; SQL> set autot traceSQL> SELECT cnt, object_type FROM temp_table WHERE object_type = 'TABLE' 2 3 4 UNION ALL SELECT cnt, object_type 5 6 FROM temp_table WHERE object_type = 'INDEX' UNION ALL SELECT cnt, object_type FROM temp_table WHERE object_type = 'VIEW'; 7 8 9 10 11Execution Plan----------------------------------------------------------Plan hash value: 549471565---------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 3 | 72 | 9 (67)| 00:00:01 || 1 | UNION-ALL | | | | | ||* 2 | TABLE ACCESS FULL| TEMP_TABLE | 1 | 24 | 3 (0)| 00:00:01 ||* 3 | TABLE ACCESS FULL| TEMP_TABLE | 1 | 24 | 3 (0)| 00:00:01 ||* 4 | TABLE ACCESS FULL| TEMP_TABLE | 1 | 24 | 3 (0)| 00:00:01 |---------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("OBJECT_TYPE"='TABLE') 3 - filter("OBJECT_TYPE"='INDEX') 4 - filter("OBJECT_TYPE"='VIEW')Note----- - dynamic sampling used for this statement (level=2)Statistics---------------------------------------------------------- 0 recursive calls 0 db block gets 10 consistent gets 0 physical reads 0 redo size 679 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed CNT OBJECT_TYPE---------- ------------------- 23120 TABLE 30784 INDEX 41512 VIEW
看到吗?逻辑读从25035 降到了10,是不是很神奇,有细心的会问,表还不是被扫描三次了,没有减少啊。这就说到我们SQL优化的核心了就是减少IO扫描,不管减少扫描次数,还是减少扫描大小,其实都是为了减少IO的扫描次数,上面的中间临时表TEMP_TABLE比原表t_0424_1少了太多,所以同样扫描三次的IO是差距很大的,这就是典型的空间换时间。
那下面说说我们今天的主题了,能不能用with as 来优化上面的SQL呢?当然可以,把三段的共同部分提取到with as字句里,其实就是上面的中间临时表。
SQL修改为:WITH temp AS (SELECT /*+materialize*/ COUNT (*) cnt, object_type FROM t_0424_1 GROUP BY object_type) SELECT cnt, object_type FROM temp t WHERE t.object_type = 'TABLE' UNION ALL SELECT cnt, object_type FROM temp t WHERE t.object_type = 'INDEX' UNION ALL SELECT cnt, object_type FROM temp t WHERE t.object_type = 'VIEW'; SQL> WITH temp AS (SELECT /*+materialize*/ COUNT (*) cnt, object_type FROM t_0424_1 2 3 GROUP BY object_type) 4 SELECT cnt, object_type FROM temp t WHERE t.object_type = 'TABLE' UNION ALL SELECT cnt, object_type FROM temp t WHERE t.object_type = 'INDEX' UNION ALL SELECT cnt, object_type 5 FROM temp t WHERE t.object_type = 'VIEW'; 6 7 8 9 10 11 12 13 14 CNT OBJECT_TYPE---------- ------------------- 23120 TABLE 30784 INDEX 41512 VIEWExecution Plan----------------------------------------------------------Plan hash value: 632282849---------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 14M| 331M| 5935 (68)| 00:01:12 || 1 | TEMP TABLE TRANSFORMATION | | | | | || 2 | LOAD AS SELECT | SYS_TEMP_0FD9D679D_210BBB7 | | | | || 3 | HASH GROUP BY | | 4823K| 50M| 2659 (8)| 00:00:32 || 4 | TABLE ACCESS FULL | T_0424_1 | 4823K| 50M| 2507 (2)| 00:00:31 || 5 | UNION-ALL | | | | | ||* 6 | VIEW | | 4823K| 110M| 1978 (2)| 00:00:24 || 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D679D_210BBB7 | 4823K| 50M| 1978 (2)| 00:00:24 ||* 8 | VIEW | | 4823K| 110M| 1978 (2)| 00:00:24 || 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D679D_210BBB7 | 4823K| 50M| 1978 (2)| 00:00:24 ||* 10 | VIEW | | 4823K| 110M| 1978 (2)| 00:00:24 || 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D679D_210BBB7 | 4823K| 50M| 1978 (2)| 00:00:24 |---------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 6 - filter("T"."OBJECT_TYPE"='TABLE') 8 - filter("T"."OBJECT_TYPE"='INDEX') 10 - filter("T"."OBJECT_TYPE"='VIEW')Note----- - dynamic sampling used for this statement (level=2)Statistics---------------------------------------------------------- 2 recursive calls 8 db block gets 8357 consistent gets 1 physical reads 600 redo size 679 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed
逻辑读从25035 降到了8357, 减了三倍,但是比我们上面的中间临表的逻辑读要高,这个就要说到with as 和中间临时表的区别了,中间临时表是固态的表,数据不会随原表变化而变化,所以如果原表变化了,那么这个表就要重新建,增加了手动操作,而with as 没有这种限制,所以在项目开发中是用with as 还是中间临时表,看个人需要了。
下面来个实际项目上的案例,案例来源于网络:
SELECT t1.contactid, t1.skillid, t2.turntime starttime, t2.intime FROM xxxxx1 t1, ( SELECT t.contactid, MIN (t.starttime) turntime, MAX (t.starttime) intime FROM xxxxx1 t WHERE t.eventid = 806355209 AND t.contactstatus = 2 GROUP BY t.contactid) t2 WHERE t1.contactid = t2.contactid AND t1.starttime = t2.intime AND t1.eventid = 806355209 AND t1.contactstatus = 2;Execution Plan----------------------------------------------------------Plan hash value: 2829800415--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 145 | | 76953 (2)| 00:15:24 ||* 1 | FILTER | | | | | | || 2 | HASH GROUP BY | | 1 | 145 | | 76953 (2)| 00:15:24 ||* 3 | HASH JOIN | | 114K| 15M| 8272K| 76943 (2)| 00:15:24 ||* 4 | TABLE ACCESS FULL| xxxxx1 | 109K| 6977K| | 38029 (2)| 00:07:37 ||* 5 | TABLE ACCESS FULL| xxxxx1 | 109K| 8587K| | 38029 (2)| 00:07:37 |--------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("T1"."STARTTIME"=MAX("T"."STARTTIME")) 3 - access("T1"."CONTACTID"="T"."CONTACTID") 4 - filter("T"."EVENTID"=806355209 AND "T"."CONTACTSTATUS"=2) 5 - filter("T1"."EVENTID"=806355209 AND "T1"."CONTACTSTATUS"=2)Statistics---------------------------------------------------------- 666 recursive calls 0 db block gets 324498 consistent gets 336648 physical reads 0 redo size 52016865 bytes sent via SQL*Net to client 397768 bytes received via SQL*Net from client 36118 SQL*Net roundtrips to/from client 21 sorts (memory) 0 sorts (disk) 541747 rows processed with as改写之后: #t表的结果集比原表xxxxx1小,下面扫描t的时候,就能减少IO扫描,当然我们也 #可以用中间临时表去优化 WITH t AS (SELECT contactid, skillid, starttime FROM xxxxx1 WHERE eventid = 806355209 AND contactstatus = 2)SELECT t1.contactid, t1.skillid, t2.turntime starttime, t2.intime FROM t t1, ( SELECT t.contactid, MIN (t.starttime) turntime, MAX (t.starttime) intime FROM t GROUP BY t.contactid) t2 WHERE t1.contactid = t2.contactid AND t1.starttime = t2.intimeExecution Plan----------------------------------------------------------Plan hash value: 1354327316------------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |------------------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 183 | | 41280 (2)| 00:08:16 || 1 | TEMP TABLE TRANSFORMATION | | | | | | || 2 | LOAD AS SELECT | | | | | | ||* 3 | TABLE ACCESS FULL | xxxxx1 | 109K| 7299K| | 38029 (2)| 00:07:37 ||* 4 | HASH JOIN | | 1 | 183 | 9984K| 3251 (1)| 00:00:40 || 5 | VIEW | | 109K| 8694K| | 227 (2)| 00:00:03 || 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D66ED_3363B5A2 | 109K| 6225K| | 227 (2)| 00:00:03 || 7 | VIEW | | 109K| 10M| | 1942 (1)| 00:00:24 || 8 | HASH GROUP BY | | 109K| 6977K| 8232K| 1942 (1)| 00:00:24 || 9 | VIEW | | 109K| 6977K| | 227 (2)| 00:00:03 || 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D66ED_3363B5A2 | 109K| 6225K| | 227 (2)| 00:00:03 |------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 3 - filter("EVENTID"=806355209 AND "CONTACTSTATUS"=2) 4 - access("T1"."CONTACTID"="T2"."CONTACTID" AND "T1"."STARTTIME"="T2"."INTIME")Statistics---------------------------------------------------------- 1937 recursive calls 5572 db block gets 173790 consistent gets 173540 physical reads 1768 redo size 52132593 bytes sent via SQL*Net to client 397768 bytes received via SQL*Net from client 36118 SQL*Net roundtrips to/from client 44 sorts (memory) 0 sorts (disk) 541747 rows processed
这个sql语句如果想避免xxxxx1的全表扫描很难,但是我们注意到全表扫描了两次xxxxx1表,我们尝试利用下with as物化这个表的部分结果集为临时表来减小IO成本。就是把扫描原表变成扫描一个比较小的表,减小IO扫描。
这里我们用with as改写后,逻辑读和cost成本都降低了接近一半的样子,
其实在with as的改写后,oracle的优化器会计算是否用临时表的方式来完成查询,如果临时表的执行计划计算而来的cost较大,oracle还是会选择之前的方式,这就是CBO,但是前提是优化器的选择是对的,统计信息准确等等。
今天说到这里,望指正。
------------------------------------------------------------------------------------
点击关注有百G python视频教程大礼包送给你哦!~
福利
学习Python加小编微信拉你入群学习
请注明:加群,否则不会通过哦
微信:pythonislover88