oracle标量子查询简介和表连接改写

源自http://www.dbaxiaoyu.com/archives/2113


Oracle允许在select子句中包含单行子查询,这个也就是oracle的标量子查询,标量子查询有点类似于外连接,当使用到外连接时我们可以灵活的将其转化为标量子查询。

SQL> create table t1 as select * from all_users;

Table created.

SQL> create table t2 as select * from all_objects;

Table created.

SQL> select a.object_id,(select b.username from t1 b where a.owner=b.username) f
rom t2 a;

49812 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1364172329

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 53276 |  1560K|   152   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    17 |     2   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T2   | 53276 |  1560K|   152   (1)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("B"."USERNAME"=:B1)

Note
-----
   - dynamic sampling used for this statement

Statistics
----------------------------------------------------------
         13  recursive calls
          0  db block gets
       4282  consistent gets
          0  physical reads
          0  redo size
    1176699  bytes sent via SQL*Net to client
      37012  bytes received via SQL*Net from client
       3322  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      49812  rows processed

标量子查询其实还是一个子查询,那么它究竟是如何查询的:首先走的是外部的查询,比如上一个sql语句执行计划,先全表扫描的T2 a,然后取T2 a表的每一行数据就去和T1 b去过滤,过滤条件是a.owner=b.username,如果符合则返回子查询的值,如果不符合则用null补充。当然这个时候还有个类似的filter去重的运算,对于t2 a中重复的数据行不用再去和t1 b去过滤。

而上面这个标量子查询的sql语句其实是等价于下面外连接sql语句的:

SQL> select a.object_id,b.username from t2 a,t1 b
  2  where a.owner=b.username(+)  ;

49812 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 535089106

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      | 53276 |  2445K|   155   (2)| 00:00:02 |
|*  1 |  HASH JOIN RIGHT OUTER|      | 53276 |  2445K|   155   (2)| 00:00:02 |
|   2 |   TABLE ACCESS FULL   | T1   |    23 |   391 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL   | T2   | 53276 |  1560K|   152   (1)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."OWNER"="B"."USERNAME"(+))

Note
-----
   - dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
       4043  consistent gets
          0  physical reads
          0  redo size
    1176659  bytes sent via SQL*Net to client
      37012  bytes received via SQL*Net from client
       3322  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      49812  rows processed

而如果标量子查询中如果主查询的一行对应子查询返回有多个值,这个是不允许的

SQL> select a.username,b.object_id from t1 a,t2 b
  2  where a.username=b.owner(+);

29742 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1823443478

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 53276 |  2445K|   155   (2)| 00:00:02 |
|*  1 |  HASH JOIN OUTER   |      | 53276 |  2445K|   155   (2)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| T1   |    23 |   391 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   | 53276 |  1560K|   152   (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."USERNAME"="B"."OWNER"(+))

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       2726  consistent gets
          0  physical reads
          0  redo size
     654542  bytes sent via SQL*Net to client
      22294  bytes received via SQL*Net from client
       1984  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      29742  rows processed

SQL> select a.username,(select b.object_id from t2 b where a.username=b.owner) f
rom t1 a;
select a.username,(select b.object_id from t2 b where a.username=b.owner) from t
1 a
                   *
ERROR at line 1:
ORA-01427: single-row subquery returns more than one row

这里由于a.username=b.owner,其中b.owner有多个相同的值,所以这里返回的b.object_id可能有多个值,这里就出现上述的ora-01427错误。

标量子查询中也可以有聚合函数的出现:

SQL> select a.username,(select max(b.object_id) from t2 b where b.owner=a.userna
me) from t1 a;

23 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 367820

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    23 |   391 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T2   |   533 | 15990 |   152   (1)| 00:00:02 |
|   3 |  TABLE ACCESS FULL | T1   |    23 |   391 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("B"."OWNER"=:B1)

Note
-----
   - dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
      15929  consistent gets
          0  physical reads
          0  redo size
       1206  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         23  rows processed

SQL> select a.username,max_id
from t1 a,(select owner,max(object_id) max_id from t2 group by owner) b
where a.username=b.owner(+)

但是我们注意到上述标量子查询却存在一个问题,就是无法将子查询展开为表连接,换句话说无法采用灵活的hash join outer的关联方式。

关于标量子查询和表关联的性能简介:
如果主查询返回的数据较多,而子查询中又没有高效的索引,关联列对应的主查询表又没有较多的重复值,那么这个标量子查询的执行成本是很大的,如上面的标量子查询和外连接的sql语句中可以看出外连接IO成本要明显小于标量子查询。

但是标量子查询oracle内部确是有优化的,优化器cache了中间的结果,如果结果集不大,子查询中又有高效的索引,那么这个标量子查询可能会比常规的表关联更加高效。

小鱼列出几种常会涉及到的标量子查询和表连接的sql改写:

1 最简单的标量子查询
table :a(a1,a2),b(a1,b2)
select a2,(select b2 from b where b.a1=a.a1) from a

表连接:
select a2,b2 from a,b where a.a1=b.a1(+);

2 标量子查询带有聚合函数
table :a(a1,a2),b(a1,b2)
select a2,(select sum(b2) from b where b.a1 = a.a1) from a

表连接1:
SELECT a2, x.sum_value
  FROM a,
       ( SELECT SUM (b2) sum_value, a1
            FROM b
        GROUP BY a1) x
 WHERE a.a1 = x.a1(+);

3 包含行转列的标量子查询改写
select a.*,
       (select data_object_id
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX') col1,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'TABLE') col2,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'JAVA') col3,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'PROCEDURE') col4,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'FUNCTION') col5
  from a
 where a.created > sysdate - 100;

改写的SQL:
select a.*,
       (case when b.object_type = 'INDEX' then b.data_object_id else null end) col1,
       (case when b.object_type = 'TABLE' then b.data_object_id else null end) col2,
       (case when b.object_type = 'JAVA' then b.data_object_id else null end) col3,
       (case when b.object_type = 'PROCEDURE' then b.data_object_id else null end) col4,
       (case when b.object_type = 'FUNCTION' then b.data_object_id else null end) col5
  from a
  left join b on a.object_id = b.object_id
 where a.created > sysdate - 100;

4 标量子查询同时包含行转列和聚合函数
select a.*,
       (select max(data_object_id)
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX') col1,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'TABLE') col2,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'JAVA') col3,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'PROCEDURE') col4,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'FUNCTION') col5
  from a
 where a.created > sysdate - 100;

改写的SQL为:
select a.*,
       (case when x.object_type='INDEX' then max_dataobj  else null end) col1,
       (case when x.object_type='TABLE' then max_dataobj  else null end) col2,
       (case when x.object_type='JAVA' then max_dataobj  else null end) col3,
       (case when x.object_type='PROCEDURE' then max_dataobj  else null end) col4,
       (case when x.object_type='FUNCTION' then max_dataobj  else null end) col5
  from a left join (select max(date_object_id) max_dataobj,object_type from b where object_type in ('INDEX', 'TABLE', 'JAVA', 'PROCEDURE', 'FUNCTION')) x on a.object_id=x.object_id
 where a.created > sysdate - 100;

5 标量子查询中出现rownum=1或者rownum<2

原则上标量子查询中出现rownum表示该SQL本来就是不严谨的,加上ROWNUM=1更多是为了防止标量子查询中返回多行而出现错误:
select a.*,
       (select data_object_id
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX' and rownum=1) col1,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'TABLE' and rownum=1) col2,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'JAVA' and rownum=1) col3,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'PROCEDURE' and rownum=1) col4,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'FUNCTION' and rownum=1) col5
  from a
 where a.created > sysdate - 100;

比如上面的SQL语句中对每个标量子查询都添加了rownum=1的限制,那么上述这个SQL语句如何改写为表的外连接了

这里首先取其中一个标量子查询来做分析:
select data_object_id
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX' and rownum=1

比如b表中有两行数据都满足b.object_id = a.object_id and b.object_type = 'INDEX'条件,rownum=1后oracle会根据b表的执行计划取到第一条后就返回,这两行数据都有可能取到,具体取哪一行要决定B表的访问方式是索引扫描还是全表扫描等,而在这个SQL本意中无论取哪一条都是满足业务需求的

那么这个标量子查询则可以简化为:
select max(data_object_id)
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX'
or
select min(data_object_id)
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX'

那么原SQL的标量子查询可以简化为
select a.*,
       (select max(data_object_id)
           from b
          where b.object_id = a.object_id
            and b.object_type = 'INDEX') col1,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'TABLE' and rownum=1) col2,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'JAVA' and rownum=1) col3,
       (select max(data_object_id)
          from b
         where b.object_id = a.object_id
           and b.object_type = 'PROCEDURE' and rownum=1) col4,
       (select data_object_id
          from b
         where b.object_id = a.object_id
           and b.object_type = 'FUNCTION' and rownum=1) col5
  from a
 where a.created > sysdate - 100;

如何改写包含聚合函数的标量子查询之前已经介绍过,这里直接改写为如下SQL语句:
select a.*, b.col1, b.col2, b.col3, b.col4, b.col5
from a left join
(select b.object_id,
        max(case when object_type = 'INDEX' then data_object_id else null end) col1,
        max(case when object_type = 'TABLE' then data_object_id else null end) col2,
        max(case when object_type = 'JAVA' then data_object_id else null end) col3,
        max(case when object_type = 'PROCEDURE' then data_object_id else null end) col4,
        max(case when object_type = 'FUNCTION' then data_object_id else null end) col5
        from b
        where b.object_type in
               ('INDEX', 'TABLE', 'JAVA', 'PROCEDURE', 'FUNCTION')
        group by b.object_id) x
on a.object_id = x.object_id
where a.created > sysdate - 100;

其实这个b.object_type in ('INDEX', 'TABLE', 'JAVA', 'PROCEDURE', 'FUNCTION')写不写也都符合业务逻辑

6 关于标量子查询改写后逻辑校验:
select a.job,
       a.deptno,
       (select distinct dname from dept2 b where b.deptno=a.deptno) as dname
       from emp a

有以下两个改写可供参考:
select distinct a.job,a.deptno,b.dname
from emp a
left join dept2 b on b.deptno=a.deptno;

select a.job,a.deptno,b.dname。
from emp a
left join (select dname,deptno from dept2 group by dname,deptno)b
on b.deptno=a.deptno

这里xiaoyu觉得第二种写法是完全复合业务逻辑的

简要分析下,对于原标量子查询中的(select distinct dname from dept2 b where b.deptno=a.deptno) as dname,由于标量子查询中只能返回单行,换句话说就是每个满足b.deptno=a.deptno条件的数据只能返回一行distinct dname,那么就是select dname,deptno from dept2 group by dname,deptno生成的数据不会有(dname=x deptno=a)和(dname=y deptno=a)的数据,因为这类数据在原SQL语句中如果存在是会报错的,那么可以确定的是对于dept2表只要deptno确定了,dname就确定了,所以这个改写不会改变原SQL的含义。

7 不等连接的标量子查询改写:
select a.liceneid,
       a.data_source,
       a.street,
       (select min(contdate)
          from ct
         where ct.licenedid = a.licenceid
           and ct.data_source = a.data_source
           and trunc(contdate) >= a.opensaledate) as mincontdate,
       (select min(buydate)
          from ct
         where ct.licenedid = a.licenceid
           and ct.data_source = a.data_source
           and trunc(buydate) >= a.opensaledate) as minbuydate
from a

这个SQL主要是在标量子查询中用了聚合函数min和非等值关联trunc(contdate) >= a.opensaledate和trunc(buydate) >= a.opensaledate,聚合函数可以参考我们之前的写法用group by先行构造,但是非等值连接如何改写了。

select a.liceneid,
       a.data_source,
       a.street,
       min(case when trunc(contdate) >= a.opensaledate then contdate end ) as mincontdate,
       min(case when trunc(buydate) >= a.opensaledate then buydate end ) as mincontdate
from a left join ct on ct.licenedid = a.licenceid and ct.data_source = a.data_source

由于有min聚合函数,必须要分组,但是上面这个SQL如果对a.liceneid,a.data_source,a.street三个字段分组,则不满足原SQL的含义,那么这个SQL究竟如何改写。

这里参考oracle 查询优化改写案例书籍,要改写这类不等连接的标量子查询,可以参考下面的写法:
with x
select
a.rowid as rid,
min(case when trunc(ct.contdate)>=a.opensaledate then ct.contdate end) as mincontdate,
min(case when trunc(ct.buydate)>=a.opensaledate then ct.contdate end) as minbuydate
from ct inner join a
on (ct.licenedid = a.licenceid  and ct.data_source = a.data_source)
group by a.rowid
select a.liceneid,
       a.data_source,
       a.street,
       x.mincontdate,
       x. minbuydate
from a left join x on a.rowid=x.rid;

这个改写思路其实可以这么理解:首先构造x表,这个表存储了都是满足了ct.licenedid = a.licenceid、and ct.data_source = a.data_source、trunc(ct.contdate)>=a.opensaledate和trunc(ct.buydate)&gt;=a.opensaledate条件的a.rowid、min(contdate)和min(buydate)数据,同样有min函数如果需要将这些满足条件的数据输出需要再次关联一次a表,而比较容易的就是将满足条件的rowid去和a表重复做left join,满足a.rowid=x.rid就全部输出,不满足在的就补全null

其实还可以用分析函数更简单的改写为
select a.liceneid,
       a.data_source,
       a.street,
       (case when trunc(ct.contdate) >= a.opensaledate then min(ct.contdate)over(partition by ct.contdate) end) as mincontdate,
       (case when trunc(ct.buydate) >= a.opensaledate then min(ct.buydate)over(partition by ct.buydate) end) as mincontdate
from a left join ct on ct.licenedid = a.licenceid and ct.data_source = a.data_source
  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值