记一个分析函数优化SQL的案例以及分析函数详解

首先来看一个优化案例,应用开发人员让审核一条SQL,他们说执行时间较长:

原有SQL的写法:
select t.identify as IDENTIFY,
       t.flat_date as FLAT_DATE,
       t.flat_time as FLAT_TIME,
       t.tps as MAXTPS,
       round((select avg(tps)
               from mon_ctpinfo t
              where t.flat_date = 'cluster2'
                and t.identify = '20170601'),
             3) as AVGTPS
  from mmm_cinfo t
 where t.tps = (select max(tps)
                  from mmm_cinfo t
                 where t.flat_date = 'cluster2'
                   and t.identify = '20170601')
   and t.flat_date = 'cluster2'
   and t.identify = '20170601'
 group by t.identify, t.flat_date, t.flat_time, t.tps;

单从SQL的写法上来看,同一张表肯定会被扫描3次,懂SQL优化的人都知道该SQL的写法真心不咋地。请看SQL未改写前的执行计划:

Plan hash value: 317034915
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    30 |     5  (20)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE             |                    |     1 |    21 |            |          |       |       |
|   2 |   PARTITION LIST EMPTY      |                    |     1 |    21 |     2   (0)| 00:00:01 |INVALID|INVALID|
|   3 |    PARTITION RANGE EMPTY    |                    |     1 |    21 |     2   (0)| 00:00:01 |INVALID|INVALID|
|*  4 |     TABLE ACCESS FULL       | mmm_cinfo          |     1 |    21 |     2   (0)| 00:00:01 |INVALID|INVALID|
|   5 |  HASH GROUP BY              |                    |     1 |    30 |     5  (20)| 00:00:01 |       |       |
|   6 |   PARTITION LIST EMPTY      |                    |     1 |    30 |     2   (0)| 00:00:01 |INVALID|INVALID|
|   7 |    PARTITION RANGE EMPTY    |                    |     1 |    30 |     2   (0)| 00:00:01 |INVALID|INVALID|
|*  8 |     TABLE ACCESS FULL       | mmm_cinfo          |     1 |    30 |     2   (0)| 00:00:01 |INVALID|INVALID|
|   9 |      SORT AGGREGATE         |                    |     1 |    21 |            |          |       |       |
|  10 |       PARTITION LIST EMPTY  |                    |     1 |    21 |     2   (0)| 00:00:01 |INVALID|INVALID|
|  11 |        PARTITION RANGE EMPTY|                    |     1 |    21 |     2   (0)| 00:00:01 |INVALID|INVALID|
|* 12 |         TABLE ACCESS FULL   | mmm_cinfo          |     1 |    21 |     2   (0)| 00:00:01 |INVALID|INVALID|
------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter("T"."IDENTIFY"='20170601' AND "T"."FLAT_DATE"='cluster2')
   8 - filter("T"."IDENTIFY"='20170601' AND "T"."FLAT_DATE"='cluster2' AND "T"."TPS"= (SELECT MAX("TPS") 
              FROM "MON_ctpinfo" "T" WHERE "T"."IDENTIFY"='20170601' AND "T"."FLAT_DATE"='cluster2'))
  12 - filter("T"."IDENTIFY"='20170601' AND "T"."FLAT_DATE"='cluster2')

分析思路:
①乍一看,求avg(tps)的聚合子查询像是标量子查询,仔细瞅瞅,发现子查询里的表并没有和外面的表进行查询,故不是标量子查询,此种写法不妥。
②整体看,该sql有两个子查询,这两个子查询与主查询的表是同一个,按照此SQL的写法,会扫描同一张表mmm_cinfo三次,其实没必要,查看执行计划也可以一眼看出,同一张表被全表扫描了3次。
③改写如下

select t1.IDENTIFY,
       t1.FLAT_DATE,
       t1.FLAT_TIME,
       t1.MAXTPS,
       t1.AVGTPS
  from (select identify as IDENTIFY,
                flat_date as FLAT_DATE,
                flat_time as FLAT_TIME,
                tps as MAXTPS,
                round(avg(tps) over(partition by identify, flat_date), 3) as AVGTPS,
                max(tps) over(partition by by identify, flat_date) as mtps
          from mmm_cinfo t
         where t.flat_date = ''
           and t.identify = '') t1
 where t1.MAXTPS = t1.mtps
 group by t1.IDENTIFY,t1.FLAT_DATE,t1.FLAT_TIME,t1.MAXTPS,t1.AVGTPS;
 

④查看SQL改写后的执行计划,并查看SQL执行的结果是否一致

Plan hash value: 3134020514
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                    |     1 |    56 |     1 (100)| 00:00:01 |       |       |
|   1 |  HASH GROUP BY           |                    |     1 |    56 |     1 (100)| 00:00:01 |       |       |
|*  2 |   VIEW                   |                    |     1 |    56 |     5 (100)| 00:00:01 |       |       |
|   3 |    WINDOW BUFFER         |                    |     1 |    30 |            |          |       |       |
|*  4 |     FILTER               |                    |       |       |            |          |       |       |
|   5 |      PARTITION LIST ALL  |                    |  6009K|   171M|  8144   (2)| 00:01:38 |     1 |    12 |
|   6 |       PARTITION RANGE ALL|                    |  6009K|   171M|  8144   (2)| 00:01:38 |     1 |  LAST |
|   7 |        TABLE ACCESS FULL | mmm_cinfo          |  6009K|   171M|  8144   (2)| 00:01:38 |     1 |   132 |
---------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("T1"."MAXTPS"="T1"."MTPS")
   4 - filter(NULL IS NOT NULL AND NULL IS NOT NULL)

⑤有执行计划可知,表MON_ctpinfo只访问了一次。很大的提升了SQL的执行性能。而且执行时间提升了95%以上。

接着我们来学习一下oracle常用的分析函数,不过,细说分析函数前,还需要提一下大家常用的聚合函数。
常用聚合函数

SELECT deptno,
	AVG(sal) as 平均值,
	MIN(sal) as 最小值,
	MAX(sal) as 最大值,
	SUM(sal) as 工资合计,
	COUNT(*) as 总行数,
	COUNT(comm) as "获得提成的人数",
	AVG(comm) 错误的人均提成算法,
	AVG(coalesce(comm,0)) "正确的人均提成" /* 需要把空值转换为0 */
 FROM emp
 GROUP BY deptno;

分析函数
①Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.
分析函数是oracle专门解决复杂报表统计需求的功能强大的函数,它可以在数据中进行分组然后计算基于组的某种统计值,与普通聚合函数不同的是,分析函数每个分组返回多行统计值,也就是每个分组中的每一行都有统计值。基于rows的组被称为一个窗口,该窗口有analytic_clause部分定义。针对于每一行,可以定义行的滑动窗口,该窗口可以决定用于执行计算的行范围。窗口大小可以基于物理行数或逻辑间隔(如时间)。

②Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.
分析函数是除ORDER BY子句之外在查询中执行的最后一组操作。在处理分析函数之前,所有关联和所有WHERE、GROUP BY和HAVING子句都已完成。因此,分析函数只能出现在select list或ORDER BY子句中。

③Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.

分析函数的详细语法如下:
analytic_function::=
在这里插入图片描述
analytic_function:指定分析函数的名称。
arguments:分析函数有0到3个参数。参数可以是任何数据类型或任何可以隐式转换为数据类型的非数值数据类型。Oracle确定该参数具有最高数值优先级,会隐式地将其余参数转换为该数据类型。返回类型也是该数据类型,除非个别函数另有说明。

analytic_clause::=
在这里插入图片描述
使用OVER analytic_子句指示函数对查询结果集进行操作。该子句在FROM、WHERE、GROUP BY和HAVING子句之后计算。您可以在select list或ORDER BY子句中使用该子句指定分析函数。要基于分析函数过滤查询的结果,请将这些函数嵌套在父查询中,然后过滤嵌套子查询的结果。需要注意的亮点:①不能通过在该子句的任何部分指定任何分析函数来嵌套分析函数。不过,您可以在子查询中指定一个分析函数,并在其上计算另一个分析函数。②您可以使用用户定义的分析函数以及内置的分析函数指定OVER 后面的子句。

query_partition_clause::=
在这里插入图片描述

order_by_clause::=
在这里插入图片描述

windowing_clause ::=
在这里插入图片描述

参考资料:https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions004.htm#SQLRF06174

  • 2
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值