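Requirements Overview
Our company recently built a learning feature. The feature offers many learning contents (Java, C, Python, and so on), and every day many users study one or more of them. The resulting data looks like this:
Date | Content | User |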
---|---|---|
2022-01-01 | java | u1 |
2022-01-02 | java | u1 |
2022-01-02 | java | u2 |
2022-01-01 | C | u1 |
2022-01-01 | C | u3 |
2022-01-01 | Python | u4 |
2022-01-02 | Python | u4 |
2022-01-02 | Python | u5 |
2022-01-02 | Python | u6 |
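Expected Output
For each day and each learning content, we want the cumulative number of distinct users who have studied that content up to and including that day. Rows whose cumulative distinct user count is less than or equal to 1 must be filtered out. The expected output is:
Date | Content | Cumulative distinct users |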
---|---|---|
2022-01-02 | java | 2 |
2022-01-01 | C | 2 |
2022-01-02 | Python | 3 |
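As of 2022-01-01, java has 1 distinct user, C has 2, and Python has 1, so the 2022-01-01 rows for java and Python are filtered out.
Basic Approach
Deduplicate the (date, content) pairs of the detail table into a temporary table b, and treat the detail table itself as table a. Inner join a to b on two conditions: a's pdate is less than or equal to b's pdate, and the content is the same. Each row of b (a cutoff date for one content) then collects every detail row of that content up to and including that date. Finally, group by the cutoff date and content, count the distinct users in each group, and use HAVING to drop the groups whose count is not greater than 1. Because b is built from the detail rows, a content gets a row only for dates on which it has at least one record, which is why C has no 2022-01-02 row in the expected output.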
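To make the join step concrete, here is a minimal sketch that runs only the non-equi join on the three java rows from the table above and lists the resulting pairs before any aggregation. The CTE names `detail` and `cutoff`, the aliases `cutoff_date` and `detail_date`, and the column name `user_id` are illustrative choices, not part of the original query (`user` is a reserved word in some engines, so it is renamed here).

```sql
-- Illustrative only: show which detail rows each cutoff date collects.
with detail as (
    select '2022-01-01' as pdate, 'java' as icate, 'u1' as user_id
    union all
    select '2022-01-02' as pdate, 'java' as icate, 'u1' as user_id
    union all
    select '2022-01-02' as pdate, 'java' as icate, 'u2' as user_id
),
cutoff as (
    -- one row per (date, content) pair that appears in the detail data
    select distinct pdate, icate from detail
)
select
    b.pdate as cutoff_date,  -- the "as of" date
    a.pdate as detail_date,  -- the date of the matched detail row
    a.user_id
from detail a
join cutoff b
  on a.pdate <= b.pdate      -- ISO dates compare correctly as strings
 and a.icate = b.icate
order by cutoff_date, detail_date, user_id;
-- Pairs produced:
--   cutoff 2022-01-01: (2022-01-01, u1)
--   cutoff 2022-01-02: (2022-01-01, u1), (2022-01-02, u1), (2022-01-02, u2)
```

Counting distinct `user_id` per `cutoff_date` gives 1 for 2022-01-01 and 2 for 2022-01-02. Grouping by `detail_date` instead would count each day's own users rather than the running total; on this small sample the two happen to coincide, but they differ as soon as a user from an earlier day does not return.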
Implementation
with tmp as ( -- base data
    select '2022-01-01' as pdate, 'java' as icate, 'u1' as user
    union all
    select '2022-01-02' as pdate, 'java' as icate, 'u1' as user
    union all
    select '2022-01-02' as pdate, 'java' as icate, 'u2' as user
    union all
    select '2022-01-01' as pdate, 'C' as icate, 'u1' as user
    union all
    select '2022-01-01' as pdate, 'C' as icate, 'u3' as user
    union all
    select '2022-01-01' as pdate, 'Python' as icate, 'u4' as user
    union all
    select '2022-01-02' as pdate, 'Python' as icate, 'u4' as user
    union all
    select '2022-01-02' as pdate, 'Python' as icate, 'u5' as user
    union all
    select '2022-01-02' as pdate, 'Python' as icate, 'u6' as user
),
tmp1 as ( -- distinct (date, content) pairs, one row per cutoff date
    select distinct pdate, icate from tmp
)
select
    b.pdate, b.icate, count(distinct a.user) as icount
from
    tmp a
join
    tmp1 b
on
    a.pdate <= b.pdate -- every detail row up to and including the cutoff date
and
    a.icate = b.icate
group by
    b.pdate, b.icate -- group by the cutoff date (b), not the detail row's date (a)
having
    count(distinct a.user) > 1; -- aggregate repeated because not every engine accepts the alias here
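The range join compares each detail row with every later date of the same content, so the intermediate result can grow quickly on real data. One possible alternative, sketched here as an assumption and not as the original author's method, keeps only each user's first study date per content, counts the new users per day, and takes a running sum with a window function. It assumes an engine that supports window functions; the names `first_seen`, `new_users`, `days`, `running`, `first_date`, `new_cnt`, and `user_id` are illustrative.

```sql
with tmp as ( -- same sample data, with the user column renamed to user_id
    select '2022-01-01' as pdate, 'java' as icate, 'u1' as user_id
    union all
    select '2022-01-02' as pdate, 'java' as icate, 'u1' as user_id
    union all
    select '2022-01-02' as pdate, 'java' as icate, 'u2' as user_id
    union all
    select '2022-01-01' as pdate, 'C' as icate, 'u1' as user_id
    union all
    select '2022-01-01' as pdate, 'C' as icate, 'u3' as user_id
    union all
    select '2022-01-01' as pdate, 'Python' as icate, 'u4' as user_id
    union all
    select '2022-01-02' as pdate, 'Python' as icate, 'u4' as user_id
    union all
    select '2022-01-02' as pdate, 'Python' as icate, 'u5' as user_id
    union all
    select '2022-01-02' as pdate, 'Python' as icate, 'u6' as user_id
),
first_seen as ( -- first date on which each user studied each content
    select icate, user_id, min(pdate) as first_date
    from tmp
    group by icate, user_id
),
new_users as ( -- number of first-time users per content and day
    select icate, first_date as pdate, count(*) as new_cnt
    from first_seen
    group by icate, first_date
),
days as ( -- every (date, content) pair present in the detail data
    select distinct pdate, icate from tmp
),
running as ( -- running total of first-time users = cumulative distinct users
    select
        d.pdate,
        d.icate,
        sum(coalesce(n.new_cnt, 0)) over (
            partition by d.icate
            order by d.pdate
            rows between unbounded preceding and current row
        ) as icount
    from days d
    left join new_users n
      on n.icate = d.icate
     and n.pdate = d.pdate
)
select pdate, icate, icount
from running
where icount > 1;
```

On the sample data this yields the same three rows as the expected output: (2022-01-02, java, 2), (2022-01-01, C, 2), and (2022-01-02, Python, 3). The `days` CTE with a left join keeps dates on which a content had only returning users, so the set of output rows stays the same as with the join-based query.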