count(distinct ) over(partition by order by)替换成size(collect_set() over(partition by order by))

这个函数的大致意思是:在分组内求去重后的数量

为什么不用count(distinct ),不支持原因

例子:

数据准备:

SELECT '1' as id ,'201808' as m,'a' as k
union all
SELECT '2' as id ,'201808' as m,'a' as k
union all
SELECT '1' as id ,'201809' as m,'a' as k
union all
SELECT '1' as id ,'201808' as m,'b' as k
union all
SELECT '2' as id ,'201809' as m,'b' as k


id代表人编号,m代表月份,k代表其他key键。

idmk
1201808a
2201808a
1201809a
1201808b
2201809b


需求:本月累计人数(即9月份的客户要包含9月以前的客户数)

预计结果

k          m            n

a        201808     2

a        201809     2

b        201808    1

b        201809    2

第一步:

SELECT
k,
m,
size(collect_set(id) OVER(PARTITION BY k ORDER BY m asc rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )) as n
from
(
SELECT '1' as id ,'201808' as m,'a' as k
union all
SELECT '2' as id ,'201808' as m,'a' as k
union all
SELECT '1' as id ,'201809' as m,'a' as k
union all
SELECT '1' as id ,'201808' as m,'b' as k
union all
SELECT '2' as id ,'201809' as m,'b' as k
)t
order by k,m

 

kmn
a2018081
a2018082
a2018092
b2018081
b2018092


第二步:取出分组内最后一条即可

SELECT
k,
m,
n,
row_number() over(PARTITION BY k,m ORDER BY n desc)rk
from
(
SELECT
k,
m,
size(collect_set(id) OVER(PARTITION BY k ORDER BY m asc rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )) as n
from
(
SELECT '1' as id ,'201808' as m,'a' as k
union all
SELECT '2' as id ,'201808' as m,'a' as k
union all
SELECT '1' as id ,'201809' as m,'a' as k
union all
SELECT '1' as id ,'201808' as m,'b' as k
union all
SELECT '2' as id ,'201809' as m,'b' as k
)t
order by k,m
)having rk = 1
 

结果为:

kmnrk
a20180821
a20180921
b20180811
b20180921

 

若需要了解rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,请参考https://mp.csdn.net/postedit/82909903

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值