SQL入门-进阶教程

前言

接上一次课程《SQL入门-基础教程》,本次课程是对上一次课程的深入,主要包括分组、聚合、连接、分析&窗口函数等方面,目的是通过讲解常用的语法和函数,争取用最短的时间对SQL有个整体的认识,每一块内容的详细讲解会在后面出一个系列的文章进行详细的详解,敬请关注…

好了,我们开始吧

00、预备知识

先讲几个数据仓库中的概念,目的是将口径拉齐,大家有个初步的了解即可。

1、事实表

发生在现实世界中的操作型事件,其所产生的可度量数值,比如,门店交易数据、仓库出库量数据、供应链进销存数据等

2、维度表

维度表就是你观察该事物的角度(维度),比如商品信息表、门店信息表、物流商品表等

3、大宽表

指业务主题相关的指标、维度、属性关联在一起的一张表,一般是由N张事件表和N张维度表关联生成

一、数据准备

1、运行环境

本次课程的运行环境为开源的Hue系统HUE链接
用户名:demo 密码:demo
PS:如果有公司的Hive SQL环境也可以,本次课程中的SQL都可以在Hive下面执行

2、数据表准备

本次使用with函数来代替数据表,在学习过程中with函数中的数据不需要改变,只需要在下面写自己的逻辑即可。

with dw_wms_outbound_info_v100 as(
  select '10700001' as order_no, '0011' as shop_no, '5501' as product_code, '2020-07-31' as order_date, 10 as payment_amount, '20200901' as dt union all
  select '10700001' as order_no, '0011' as shop_no, '5502' as product_code, '2020-08-01' as order_date, 90 as payment_amount, '20200901' as dt union all
  select '10700002' as order_no, '0011' as shop_no, '5501' as product_code, '2020-08-02' as order_date, 20 as payment_amount, '20200901' as dt union all
  select '10700002' as order_no, '0011' as shop_no, '5503' as product_code, '2020-08-02' as order_date, 30 as payment_amount, '20200901' as dt union all
  select '10700003' as order_no, '0011' as shop_no, '5501' as product_code, '2020-08-02' as order_date, 20 as payment_amount, '20200901' as dt union all
  select '10700004' as order_no, '0012' as shop_no, '6601' as product_code, '2020-07-31' as order_date, 40 as payment_amount, '20200901' as dt union all
  select '10700004' as order_no, '0012' as shop_no, '6601' as product_code, '2020-08-01' as order_date, 50 as payment_amount, '20200901' as dt
)
  
, dim_store_info_v100 as (
  select '0011' as shop_no, '北京一店' as shop_name, '20200901' as dt  union all
  select '0012' as shop_no, '北京二店' as shop_name, '20200901' as dt
)
  
, dim_sku_info_v100 as (
  select '5501' as product_code, '可乐350ml' as product_name, '20200901' as dt  union all
  select '5502' as product_code, '雪碧500ml' as product_name, '20200901' as dt  union all
  select '5504' as product_code, '芬达243ml' as product_name, '20200901' as dt  union all
  select '6601' as product_code, '巧克力'    as product_name, '20200901' as dt
)
 
---------------------上面的SQL不要动-----在下面写自己的SQL--------------------
 
select
  t1.dt,
  t1.order_no,
  t1.shop_no,
  t2.shop_name,
  t1.product_code,
  t3.product_name,
  t1.order_date,
  t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_store_info_v100 t2 on t2.dt = '20200901' and t1.shop_no = t2.shop_no
left join dim_sku_info_v100 t3 on t3.dt = '20200901' and t1.product_code = t3.product_code
where t1.dt = '20200901'

如下图所示:
在这里插入图片描述

3、数据表的关系

如下图所示:dw_wms_outbound_info_v100表为事实表,dim_store_info_v100和dim_sku_info_v100表为维度表,分别通过shop_noproduct_code关联起来。
在这里插入图片描述

二、进阶语法

1、 with … as 语法

当我们在写一些结构相对复杂的SQL语句时,可能某个子查询在多个层级多个地方存在重复使用的情况,这个时候我们可以使用 with as 语句将其独立出来,极大提高SQL可读性,简化SQL~

样例:参考本次课程的[数据表准备SQL]

2、分组&聚合

2.0 语法结构

基本语法如下:
在这里插入图片描述

2.1 分组(group by)

分组顾名思义,即将原来的一整块数据按某种维度分成几小块,语法关键字为 group by,对比excel的透视表功能。
语法:group by col1, col2, …

例如:按门店和商品对数据进行分组

select
    shop_no,
    product_code
from dw_wms_outbound_info_v100
where dt = '20200901'
group by
    shop_no,
    product_code

查询结果:

shop_noproduct_code
00115501
00115502
00126601
00115503

2.2 聚合

聚合是在每个分组内进行一些统计,如在分组内的最大值(max)、最小值(min)、平均值(avg)、数据条数(count)等聚合函数

例:统计每个分组内的数据条数?(使用count函数来统计)

select
    shop_no,
    product_code,
    count(*) as record_cnt
from dw_wms_outbound_info_v100
where dt = '20200901'
group by
    shop_no,
    product_code

查询结果:

shop_noproduct_coderecord_cnt
001155013
001155021
001266012
001155031

注:count(*)代表的是统计组内所有的行,详细使用参考下面的聚合函数使用

2.2 分组过滤(having)

当在gropu by 子句中使用having 子句时,查询结果中只返回满足having条件的组

例:统计每笔订单支付的金额大于50的订单号?(使用count函数来统计)

select
  order_no,
  sum(payment_amount) as payment_amount
from dw_wms_outbound_info_v100
group by order_no
having payment_amount > 50

查询结果:

order_nopayment_amount
1070000490
10700001100

注:sum(payment_amount)代表的是统计组内所有的行的支付金额加总

3、表连接

3.0 语法结构

在这里插入图片描述

需要从多个数据表中读取数据,这时我们就可以使用SQL语句中的连接(JOIN),基本格式为 join ... on...and,如果不显示指定join类型,默认为inner,
在两个或多个数据表中查询数据。大家可以想象excel中的vlook使用过程。
</br>
PS:
1. 如下图中的A和B为两个表名
2. 下面图来源于网上博客

3.1 内连接(Inner join)

产生的结果是A和B的交集(相同列里面的相同值)
语法:[inner] join … on 条件1 and 条件2 and …

如下图所示:
在这里插入图片描述

例:查询商品表dim_sku_info_v100中所有商品在门店有销售的记录
SQL:

select
  t1.dt,
  t1.order_no,
  t1.shop_no,
  t1.product_code,
  t3.product_name,
  t1.order_date,
  t1.payment_amount
from dw_wms_outbound_info_v100 t1
inner join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
 
 
结果:
 
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt      |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700001|0011   |5501        |5501         |可乐350ml   |2020-07-31|10            |
|20200901|10700001|0011   |5502        |5502         |雪碧500ml   |2020-08-01|90            |
|20200901|10700002|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700003|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-07-31|40            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-08-01|50            |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
 
说明:连接条件为商品编码,商品55035504没有显示出来

3.2 外连接(Full Outer Join)

产生的结果是A和B的并集(如果没有相同的值会用null作为值)
语法:full join … on 条件1 and 条件2 and …

在这里插入图片描述

例:查询商品表dim_sku_info_v100中所有商品在门店的记录,如果销售记录中的商品编码不在dim_sku_info_v100表中也要显示
SQL:
 
select
  t1.dt,
  t1.order_no,
  t1.shop_no,
  t1.product_code,
  t3.product_code as product_code2,
  t3.product_name,
  t1.order_date,
  t1.payment_amount
from dw_wms_outbound_info_v100 t1
full join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
 
结果:
 
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt      |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700002|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700003|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700001|0011   |5501        |5501         |可乐350ml   |2020-07-31|10            |
|20200901|10700001|0011   |5502        |5502         |雪碧500ml   |2020-08-01|90            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-07-31|40            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-08-01|50            |
|NULL    |NULL    |NULL   |NULL        |5504         |芬达243ml   |NULL      |NULL          |
|20200901|10700002|0011   |5503        |NULL         |NULL        |2020-08-02|30            |
+--------+--------+-------+------------+-------------+------------+----------+--------------+

3.3 左连接(Left outer join)

产生表A的完全集,根据on条件,B表中匹配的则有值(没有匹配的则以null值取代)
语法:left join … on 条件1 and 条件2 and …

在这里插入图片描述

例:查询门店所有销售记录的商品名称
SQL:
 
select
  t1.dt,
  t1.order_no,
  t1.shop_no,
  t1.product_code,
  t3.product_code as product_code2,
  t3.product_name,
  t1.order_date,
  t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
 
结果:
 
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt      |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700001|0011   |5501        |5501         |可乐350ml   |2020-07-31|10            |
|20200901|10700001|0011   |5502        |5502         |雪碧500ml   |2020-08-01|90            |
|20200901|10700002|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700002|0011   |5503        |NULL         |NULL        |2020-08-02|30            |
|20200901|10700003|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-07-31|40            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-08-01|50            |
+--------+--------+-------+------------+-------------+------------+----------+--------------+

3.4 右连接(Right Outer Join)

产生表B的完全集,根据on条件,A表中匹配的则有值(没有匹配的则以null值取代),与Left Join相反
语法:right join … on 条件1 and 条件2 and …

在这里插入图片描述

例:查询在商品表中所有商品在门店的销售记录
SQL:
 
select
  t1.dt,
  t1.order_no,
  t1.shop_no,
  t1.product_code,
  t3.product_code as product_code2,
  t3.product_name,
  t1.order_date,
  t1.payment_amount
from dw_wms_outbound_info_v100 t1
right join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
 
结果:
 
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt      |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700003|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700002|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700001|0011   |5501        |5501         |可乐350ml   |2020-07-31|10            |
|20200901|10700001|0011   |5502        |5502         |雪碧500ml   |2020-08-01|90            |
|NULL    |NULL    |NULL   |NULL        |5504         |芬达243ml   |NULL      |NULL          |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-08-01|50            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-07-31|40            |
+--------+--------+-------+------------+-------------+------------+----------+--------------+

3.5 笛卡尔积(cross join)

cross对两个表执行笛卡尔乘积(可以理解为没有连接条件)。它为左表行和右表行的每种可能的组合返回一行,一般出现笛卡尔积时大概率现出了逻辑错误,比如连接条件写错表名,导致两个连接时没有连接条件。除非特殊情况下需要使用笛卡尔积

例:查询门店所有销售记录的商品名称

ps:注意下图中的连接条件

SQL:
 
select
  t1.dt,
  t1.order_no,
  t1.shop_no,
  t1.product_code,
  t3.product_code as product_code2,
  t3.product_name,
  t1.order_date,
  t1.payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t1.product_code
 
结果:如下
 
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|dt      |order_no|shop_no|product_code|product_code2|product_name|order_date|payment_amount|
+--------+--------+-------+------------+-------------+------------+----------+--------------+
|20200901|10700001|0011   |5501        |5501         |可乐350ml   |2020-07-31|10            |
|20200901|10700001|0011   |5501        |5502         |雪碧500ml   |2020-07-31|10            |
|20200901|10700001|0011   |5501        |5504         |芬达243ml   |2020-07-31|10            |
|20200901|10700001|0011   |5501        |6601         |巧克力      |2020-07-31|10            |
|20200901|10700001|0011   |5502        |5501         |可乐350ml   |2020-08-01|90            |
|20200901|10700001|0011   |5502        |5502         |雪碧500ml   |2020-08-01|90            |
|20200901|10700001|0011   |5502        |5504         |芬达243ml   |2020-08-01|90            |
|20200901|10700001|0011   |5502        |6601         |巧克力      |2020-08-01|90            |
|20200901|10700002|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700002|0011   |5501        |5502         |雪碧500ml   |2020-08-02|20            |
|20200901|10700002|0011   |5501        |5504         |芬达243ml   |2020-08-02|20            |
|20200901|10700002|0011   |5501        |6601         |巧克力      |2020-08-02|20            |
|20200901|10700002|0011   |5503        |5501         |可乐350ml   |2020-08-02|30            |
|20200901|10700002|0011   |5503        |5502         |雪碧500ml   |2020-08-02|30            |
|20200901|10700002|0011   |5503        |5504         |芬达243ml   |2020-08-02|30            |
|20200901|10700002|0011   |5503        |6601         |巧克力      |2020-08-02|30            |
|20200901|10700003|0011   |5501        |5501         |可乐350ml   |2020-08-02|20            |
|20200901|10700003|0011   |5501        |5502         |雪碧500ml   |2020-08-02|20            |
|20200901|10700003|0011   |5501        |5504         |芬达243ml   |2020-08-02|20            |
|20200901|10700003|0011   |5501        |6601         |巧克力      |2020-08-02|20            |
|20200901|10700004|0012   |6601        |5501         |可乐350ml   |2020-07-31|40            |
|20200901|10700004|0012   |6601        |5502         |雪碧500ml   |2020-07-31|40            |
|20200901|10700004|0012   |6601        |5504         |芬达243ml   |2020-07-31|40            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-07-31|40            |
|20200901|10700004|0012   |6601        |5501         |可乐350ml   |2020-08-01|50            |
|20200901|10700004|0012   |6601        |5502         |雪碧500ml   |2020-08-01|50            |
|20200901|10700004|0012   |6601        |5504         |芬达243ml   |2020-08-01|50            |
|20200901|10700004|0012   |6601        |6601         |巧克力      |2020-08-01|50            |
+--------+--------+-------+------------+-------------+------------+----------+--------------+
 
 
Time taken: 0.815 seconds, Fetched 28 row(s)

可见,由于出现了笛卡尔积,由于写错将条件写为(t1.product_code = t1.product_code),正确为t1.product_code = t2.product_code,最终数据最从7条变为了28条,数据量增长了4倍,

问题:如果A表有10万条数据,B表有10万条数据,如果发生逻辑错误,产生了笛卡尔积的操作,结果会是多少条数据?
10万? 100万? 10亿? 100亿?

4、连接后聚合

实际操作中常会对多表连接后的结果执行聚合操作,语法格式为:
在这里插入图片描述
SQL的执行顺序为 from -> join -> group -> having -> select -> order -> limit 在后面的课程中会专门对SQL的执行顺序进行详细讲解

例:查询门店每天商品的销售额,按日期+商品名称聚合
SQL:

select
  t1.order_date,
  t3.product_name,
  sum(t1.payment_amount) as payment_amount
from dw_wms_outbound_info_v100 t1
left join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
group by
  t1.order_date,
  t3.product_name
 
结果:如下
 
+----------+------------+--------------+
|order_date|product_name|payment_amount|
+----------+------------+--------------+
|2020-07-31|可乐350ml   |10            |
|2020-08-02|NULL        |30            |
|2020-08-01|雪碧500ml   |90            |
|2020-08-02|可乐350ml   |40            |
|2020-08-01|巧克力      |50            |
|2020-07-31|巧克力      |40            |
+----------+------------+--------------+

5、union[distinct | all]

UNION将多个SELECT语句的结果集合并为一个独立的结果集,distinct会对结果进行去重 (默认为 distinct,去重操作会影响性能),all保留所有

语法:
select_statement UNION [ALL | DISTINCT] 
select_statement UNION [ALL | DISTINCT] 
...
select_statement
例:使用union 来对0011门店和0012门店的有销售商品进行汇总,
SQL:
 
select
  shop_no,product_code
from dw_wms_outbound_info_v100
where shop_no = '0011'
 
 
union
 
select
  shop_no,product_code
from dw_wms_outbound_info_v100
where shop_no = '0012'
 
结果:去重
 
+-------+------------+
|shop_no|product_code|
+-------+------------+
|0011   |5501        |
|0011   |5502        |
|0012   |6601        |
|0011   |5503        |
+-------+------------+

四、进阶函数

1、聚合函数

函数名用途
count([DISTINCT] col)统计行数
avg(col)平均数
min(col)最小值
max(col)最大值
sum(col)求和
percentile_approx(col, p)返回组中列的精确第p个百分点 p必须介于0和1之间
stddev(col)标准差
collect_set(col)字段去重后的集合
collect_list(col)字段不去重的集合
corr(col1, col2)返回两列的皮尔逊相关系数

解释:

  1. count(统计行数):对分组内的数据条数进行统计,有如下几种方式:
  2. count(1):该种方式会对分组所有的记录进行统计,没有进行任何的过滤(1可以换成任何整数,习惯使用1
  3. count(*):该方式同count(1)
  4. count(col):根据某列进行统计,如果某一条记录该字段为null,那么此记录不会统计
  5. count(distinct col):与count(col)相比,该方法会对字段的值进行去重后统计

例如,首先查询门店所有销售记录的商品名称,并根据商品名称字段进行统计数据条数

SQL:
 
select
  order_date as od,
  count(1) as count1,
  count(*) as count2,
  count(product_code) as count3,
  count(product_name) as count4,
  round(avg(payment_amount),1) as avg,
  min(payment_amount) as min,
  max(payment_amount) as max,
  sum(payment_amount) as sum,
  round(percentile_approx(payment_amount, 0.5),1) as percentile,
  round(stddev(payment_amount),1) as stddev,
  collect_set(shop_no) as set,
  collect_list(shop_no) as list
from (
  select
    t1.dt,
    t1.order_no,
    t1.shop_no,
    t1.product_code,
    t3.product_code as product_code2,
    t3.product_name,
    t1.order_date,
    t1.payment_amount
  from dw_wms_outbound_info_v100 t1
  left join dim_sku_info_v100 t3 on t1.product_code = t3.product_code
) t1
group by order_date
 
查询结果:
 
+----------+------+------+------+------+----+---+---+---+----------+------+---------------+----------------------+
|od        |count1|count2|count3|count4|avg |min|max|sum|percentile|stddev|set            |list                  |
+----------+------+------+------+------+----+---+---+---+----------+------+---------------+----------------------+
|2020-08-02|3     |3     |3     |2     |23.3|20 |30 |70 |20        |5.8   |["0011"]       |["0011","0011","0011"]|
|2020-08-01|2     |2     |2     |2     |70.0|50 |90 |140|50        |28.3  |["0011","0012"]|["0011","0012"]       |
|2020-07-31|2     |2     |2     |2     |25.0|10 |40 |50 |10        |21.2  |["0011","0012"]|["0012","0011"]       |
+----------+------+------+------+------+----+---+---+---+----------+------+---------------+----------------------+

2、分析&窗口函数

常用的函数:

函数名用途
row_number() OVER([partition_by_clause] order_by_clause)分组排序,同一组内相同值会给不同序号
rank() OVER([partition_by_clause] order_by_clause)组内从1开始按order by排序后,将相同的值输出为相同的值,其它值不变(好乱,看下面样例吧)
dense_rank() OVER([partition_by_clause] order_by_clause)返回从1开始的递增整数序列。输出序列为ORDER BY表达式的重复值生成重复整数
lag(expr [, offset] [, default]) OVER ([partition_by_clause] order_by_clause)该函数使用前一行的列值返回表达式的值。您可以指定一个整数偏移量,它指定一个行位置,位于当前行之前的某些行数
lead与lag对应,记住一个就可以

2.1 row_number | rank | dense_rank

例如 :根据支付金额排序
SQL:
 
select
    order_date,
    order_no,
    shop_no,
    payment_amount,
    row_number() over(order by payment_amount) as row_number,
    rank() over(order by payment_amount) as rank,
    dense_rank() over(order by payment_amount) as dense_rank
from dw_wms_outbound_info_v100
 
 
结果:
+----------+--------+-------+--------------+----------+----+----------+
|order_date|order_no|shop_no|payment_amount|row_number|rank|dense_rank|
+----------+--------+-------+--------------+----------+----+----------+
|2020-07-31|10700001|0011   |10            |1         |1   |1         |
|2020-08-02|10700002|0011   |20            |2         |2   |2         |
|2020-08-02|10700003|0011   |20            |3         |2   |2         |
|2020-08-02|10700002|0011   |30            |4         |4   |3         |
|2020-07-31|10700004|0012   |40            |5         |5   |4         |
|2020-08-01|10700004|0012   |50            |6         |6   |5         |
|2020-08-01|10700001|0011   |90            |7         |7   |6         |
+----------+--------+-------+--------------+----------+----+----------+

2.2 lag | lead

例如 :计算每家门店每天相对于昨天的销售额的变化?

SQL:
 
select
  t1.order_date,
  t1.shop_no,
  t1.payment_amount,
  -- 以下为昨天的销售额
  lag(payment_amount) over(partition by shop_no order by t1.order_date) as yest_payment_amount,
  lag(payment_amount,1, 0) over(partition by shop_no order by t1.order_date) as yest_payment_amount2,
  lag(payment_amount,2, 0) over(partition by shop_no order by t1.order_date) as yest_payment_amount3,
  lead(payment_amount,1, 0) over(partition by shop_no order by t1.order_date) as yest_payment_amount4,
  lag(payment_amount,1, 0) over(partition by shop_no order by t1.order_date desc) as yest_payment_amount5
from (
  select
    order_date,
    shop_no,
    sum(payment_amount) as payment_amount --当天销售额
  from dw_wms_outbound_info_v100
  group by
    order_date,
    shop_no
) t1
 
 
结果:
+----------+-------+--------------+-------------------+--------------------+--------------------+--------------------+--------------------+
|order_date|shop_no|payment_amount|yest_payment_amount|yest_payment_amount2|yest_payment_amount3|yest_payment_amount4|yest_payment_amount5|
+----------+-------+--------------+-------------------+--------------------+--------------------+--------------------+--------------------+
|2020-07-31|0012   |40            |NULL               |0                   |0                   |50                  |50                  |
|2020-08-01|0012   |50            |40                 |40                  |0                   |0                   |0                   |
|2020-07-31|0011   |10            |NULL               |0                   |0                   |90                  |90                  |
|2020-08-01|0011   |90            |10                 |10                  |0                   |70                  |70                  |
|2020-08-02|0011   |70            |90                 |90                  |10                  |0                   |0                   |
+----------+-------+--------------+-------------------+--------------------+--------------------+--------------------+--------------------+

2.3 over

与over 配置使用的聚合函数:

  1. COUNT
  2. SUM
  3. MIN
  4. MAX
  5. AVG

使用样例:

SQL:
 
select
  t1.order_date,
  t1.order_no,
  t1.shop_no,
  t1.product_code,
  t1.payment_amount,
  --统计每天门店每天的单量
  count(order_no) over(partition by order_date, shop_no) as shop_order_cnt,
  --统计每天门店每天的销售额
  sum(payment_amount) over(partition by order_date, shop_no) as payment_amount_sum,
  --统计每天所有门店单品最高支付金额
  max(payment_amount) over(partition by order_date) as payment_amount_max
from dw_wms_outbound_info_v100 t1
 
结果:
 
+----------+--------+-------+------------+--------------+--------------+------------------+------------------+
|order_date|order_no|shop_no|product_code|payment_amount|shop_order_cnt|payment_amount_sum|payment_amount_max|
+----------+--------+-------+------------+--------------+--------------+------------------+------------------+
|2020-08-02|10700002|0011   |5503        |30            |3             |70                |30                |
|2020-08-02|10700002|0011   |5501        |20            |3             |70                |30                |
|2020-08-02|10700003|0011   |5501        |20            |3             |70                |30                |
|2020-08-01|10700004|0012   |6601        |50            |1             |50                |90                |
|2020-08-01|10700001|0011   |5502        |90            |1             |90                |90                |
|2020-07-31|10700004|0012   |6601        |40            |1             |40                |40                |
|2020-07-31|10700001|0011   |5501        |10            |1             |10                |40                |
+----------+--------+-------+------------+--------------+--------------+------------------+------------------+
 

3、复杂类型

函数名说明demo结果
map(key1, value1, …)根据给定的key-value生成mapselect map(‘a’, ‘001’, ‘b’,‘002’) as res
select map(‘a’, ‘001’, ‘b’,‘002’, ‘a’, ‘003’)[‘a’] as res
{“a”:“001”,“b”:“002”}
003
named_struct(name1, val1, name2, val2, …)创建struct结构select named_struct(‘expression’,‘1*1’,‘unit’,10).expression1*1
array(val1, val2, …)根据给定的元素生成array生成数组:select array(‘a’, ‘b’) as res
查询元素:select array(‘a’, ‘b’)[0] as res
[“a”,“b”]
a

4、集合函数

函数名说明demo结果
array_containsReturns TRUE if the array contains value.select array_contains(array(‘a’, ‘b’), ‘b’) as res
select array_contains(array(‘a’, ‘b’), ‘c’) as res
true
false
size返回数组或map元素的个数select size(array(‘a’, ‘b’)) as res
select size(map(‘a’, ‘b’)) as res
2
1
sort_array对数组进行排序select sort_array( array(‘b’, ‘c’,‘a’ ) ) as res[“a”,“b”,“c”]

参考文档

1、Windowing and Analytics Functions
2、HUE–hive函数大全

©️2020 CSDN 皮肤主题: 数字20 设计师:CSDN官方博客 返回首页