Oracle Partition Outer Join，外链接的补充，实现稠化报表。

最新推荐文章于 2021-04-07 20:03:16 发布

nayi_224

最新推荐文章于 2021-04-07 20:03:16 发布

阅读量725

点赞数

分类专栏： oracle 文章标签： oracle 分区外链接 partition outer join 稠化报表

本文链接：https://blog.csdn.net/nayi_224/article/details/80172497

版权

oracle 专栏收录该内容

25 篇文章 1 订阅

订阅专栏

摘要

本文主要阐述了为什么要使用Partition Outer Join，Partition Outer Join的使用方法，以及Partition Outer Join常见的应用场景。

为什么要使用Partition Outer Join语法

Partition Outer Join，分组（区）外链接。是oracle对于普通外链接的一种补充。掌握这个语法的关键在于了解在什么情况下传统的外链接无法满足需要以及如何使用分组外链接更好的去解决问题。

建立测试数据

create table nayi_180502_sale_old(    --销售表
       prod_name varchar2(50),        --产品名称
       month_num number,              --月份
       sale number                    --销售额
       );

insert into nayi_180502_sale_old
  select 'A', 1, 10
    from dual
  union all
  select 'A', 2, 20
    from dual
  union all
  select 'A', 4, 30 from dual;

create table nayi_180502_sale_month(
       month_num number, 
       month_name varchar2(10)
       );

insert into nayi_180502_sale_month
select 1, '一月' from dual
union all
select 2, '二月' from dual
union all
select 3, '三月' from dual
union all
select 4, '四月' from dual
;

select*from nayi_180502_sale_old;
select*from nayi_180502_sale_month;

PROD_NAME	MONTH_NUM	SALE
A	1	10
A	2	20
A	4	30

MONTH_NUM	MONTH_NAME
1	一月
2	二月
3	三月
4	四月

我们经常需要获取一个带有全部月份信息的结果集，这种时候需要用到左（右）外链接。

select *
  from nayi_180502_sale_month t1, nayi_180502_sale_old t2
 where t1.month_num = t2.month_num(+)
 order by t1.month_num;

MONTH_NUM	MONTH_NAME	PROD_NAME	SALE
1	一月	A	10
2	二月	A	20
3	三月
4	四月	A	30

对于这样的一般需求，外链接是可以胜任的。
但是实际数据格式与需求可能会更复杂些。
这张销售表中不止有A产品，还有B产品。而想要得到的结果集是每一种产品的每一个月份都生成一行。

对于这种需求，原来的写法显然是行不通的。

create table nayi_180502_sale_new as 
select*from nayi_180502_sale_old t
union all
select 'B', 3, 100 from dual;

select *
  from nayi_180502_sale_month t1, nayi_180502_sale_new t2
 where t1.month_num = t2.month_num(+)
 order by t1.month_num
;

MONTH_NUM	MONTH_NAME	PROD_NAME	SALE
1	一月	A	10
2	二月	A	20
3	三月	B	100
4	四月	A	30

一个经典的做法是使用月份与产品做笛卡尔积所生成的临时码表做外链接。

select t1.month_num,
       t1.month_name,
       t2.prod_name,
       t2.sale
  from (select *
          from (select distinct t1.prod_name from nayi_180502_sale_new t1) t1,
               nayi_180502_sale_month t2) t1,
       nayi_180502_sale_new t2
 where t1.month_num = t2.month_num(+)
   and t1.prod_name = t2.prod_name(+)
 order by t1.prod_name, t1.month_num;

MONTH_NUM	MONTH_NAME	PROD_NAME	SALE
1	一月	A	10
2	二月	A	20
3	三月
4	四月	A	30
1	一月
2	二月
3	三月	B	100
4	四月

这是确实可行的方法，但是一个很严重的问题是，它对nayi_180502_sale_new扫描了两次，这可能会带来很糟糕的执行效率。并且由于逻辑不直接，在更复杂的情况下会使得sql变得臃肿，难以维护。

Partition Outer Join语法正是为解决这种问题而存在的。它有一个oracle中很经典的“分组”的概念，这个概念与分析函数中的分组（partition by语句）的概念几乎一致。
我们可以理解为，在不使用Partition Outer Join语法时，将全部销售数据作为一个组。使用后则根据维度对数据分为多个组，这些组会分别被月份表外链接后再连接起来。

select *
  from nayi_180502_prod_month t2
  left outer join nayi_180502_sale_new t1 partition by(t1.prod_name) 
  on (t1.month_num = t2.month_num)
;

PROD_NAME	MONTH_NUM	MONTH_NAME	SALE
A	1	一月	10
A	2	二月	20
A	3	三月
A	4	四月	30
B	1	一月
B	2	二月
B	3	三月	100
B	4	四月

Partition Outer Join使用方法

具体语法

SELECT …..
FROM table_reference
PARTITION BY (expr [, expr ]… )
RIGHT OUTER JOIN table_reference

SELECT …..
FROM table_reference
LEFT OUTER JOIN table_reference
PARTITION BY {expr [,expr ]…)

从这个官方在线文档中我们可以看出。

partition by 语句需要紧贴在外链接的表后面。
“(expr [, expr ]… )”可以分为多个维度。
并不能用于full join。
{expr [,expr ]…) 文档中把(写成了{。

一个非常值得注意的是文档中语句的顺序。右外联的时候partition从句写在了上面，左外联的时候写到了下面。也就是说，partition by 语句需要紧贴在外链接的表后面。如果不这样做，比如

select *
  from nayi_180502_prod_month t2 partition by( t1.month_num) 
  left outer join nayi_180502_sale_new t1 
  on (t1.month_num = t2.month_num);

在查看执行计划后会发现它与没写partition by( t1.month_num)的执行计划是一样的。
如果位置正确，但是引用了其他表的值，则会直接报错，比如

select *
  from nayi_180502_prod_month t2 
  left outer join nayi_180502_sale_new t1 partition by( t2.month_num) 
  on (t1.month_num = t2.month_num);

一眼看上去可能会觉得它与分析函数中的partition by用法完全一样，但是…

一些更有创造性的写法。

partition by(1)，结果根没写它一样。
partition by(decode(t1.prod_name, '一个不存在的值', 'aaaaa', t1.prod_name))。一般我们会觉得这与t1.prod_name是等效的，但是在这个语法中会得到完全意想不到的结果——一个无限长度的结果集。同样的，使用rownum伪列也会得到类似的结果。
将表连接所用的列写入其中。比如

select t2.month_num code_month_num, t2.month_name, t1.*
  from nayi_180502_prod_month t2
  left outer join nayi_180502_sale_new t1 partition by(t1.prod_name, t1.month_num )
  on (t1.month_num = t2.month_num)
;

CODE_MONTH_NUM	MONTH_NAME	PROD_NAME	MONTH_NUM	SALE
1	一月	A	1	10
2	二月	A	2	20
3	三月	A	4
4	四月	A	4	30
1	一月	B	3
2	二月	B	3
3	三月	B	3	100
4	四月	B	3

可以看出partition从句中的列会单独加入到结果集中。这个结果看起来可以使用first_value()来实现。
当然，作为替代，这个decode是可以写在内部视图中的。

select *
  from nayi_180502_sale_month t2
  left outer join (select decode(t1.prod_name, 'FFF', 'FFF', t1.prod_name) prod_name,
                          t1.month_num,
                          t1.sale
                     from nayi_180502_sale_new t1) t1 partition by(t1.prod_name) on (t1.month_num =
                                                                                    t2.month_num);

这么写是完全可以的。

看来我还没有理解这个语法的深层原理，无法解释这些另类的情况，它与分析函数中的partition by并不完全一样。不过只要按正常套路写，就不会出什么问题。

Partition Outer Join常见的应用场景

一些来自官方在线文档的更详细的解释。
https://docs.oracle.com/cd/B28359_01/server.111/b28313/analysis.htm#DWHSG02013

Data Densification for Reporting
Data is normally stored in sparse form. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. However, you may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales value next to it. Moreover, time series calculations can be performed most easily when data is dense along the time dimension. This is because dense data will fill a consistent number of rows for each period, which in turn makes it simple to use the analytic windowing functions with physical offsets. Data densification is the process of converting sparse data into dense form.To overcome the problem of sparsity, you can use a partitioned outer join to fill the gaps in a time series or any other dimension. Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query. Oracle logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause. The result of a partitioned outer join is a UNION of the outer joins of each of the partitions in the logically partitioned table with the table on the other side of the join.

Note that you can use this type of join to fill the gaps in any dimension, not just the time dimension. Most of the examples here focus on the time dimension because it is the dimension most frequently used as a basis for comparisons.

这段话对使用场景进行了解释。
它首先提到了一个Densification for Reporting——稠化报表的概念。用图来解释的话。

非稠化（稀疏）报表：