HQL之种树题解

一、数据

user_low_carbon(user_id String,data_dt String,low_carbon int)
u_001 2017/1/1 10
plant_carbon(plant_id string,plant_name String,low_carbon int)

二、需求一

1.蚂蚁森林植物申领统计

问题:假设2017年1月1日开始记录低碳数据(user_low_carbon),
假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,
剩余的能量全部用来领取“p002-沙柳”?。
统计在10月1日累计申领“p002-沙柳”?排名前10的用户信息;以及他比后一名多领了几颗沙柳。
得到的统计结果如下表样式:
user_id??plant_count?less_count(比后一名多领了几颗沙柳)

①求用户从2017年1月1日开始,到2017年10月1日一共收集了多少了碳

select  user_id,sum(low_carbon) sumCarbon
from user_low_carbon
where year(regexp_replace(data_dt,'/','-'))=2017 and month(regexp_replace(data_dt,'/','-'))  between 1 and 9
group by user_id  //t1

②求胡杨的单价

select low_carbon hycarbon from  plant_carbon where plant_id='p004'  //t2

③求沙柳的单价

select low_carbon slcarbon from  plant_carbon where plant_id='p002'  //t3

④求 (用户的碳总量-1颗胡杨的单价)/沙柳单价,之后向下取整,这是每个用户供领取多少课沙柳
按照沙柳的总量降序排序,取前11名的信息

select user_id,sumCarbon,floor((sumCarbon-hycarbon)/slcarbon) slcount
from 
t1 join t2 join t3
where sumCarbon>=hycarbon 
order by slcount desc
limit 11    //t4

⑤再从top11中取前10名的信息,再计算每一名比后一名多领了多少颗

select t4.*,slcount-lead(slcount,1,0) over(order by slcount desc) diffcount
from t4
limit 10

组合:

select t4.*,slcount-lead(slcount,1,0) over(order by slcount desc) diffcount
from
(select user_id,sumCarbon,floor((sumCarbon-hycarbon)/slcarbon) slcount
from 
(select  user_id,sum(low_carbon) sumCarbon
from user_low_carbon
where year(regexp_replace(data_dt,'/','-'))='2017' and month(regexp_replace(data_dt,'/','-'))  between 1 and 9
group by user_id )t1 
join 
(select low_carbon hycarbon from  plant_carbon where plant_id='p004')t2 
join
(select low_carbon slcarbon from  plant_carbon where plant_id='p002') t3
order by slcount desc
limit 11 )t4
limit 10

三、需求二

1、数据

user_low_carbon(user_id String,data_dt String,low_carbon int)
u_001 2017/1/1 10
plant_carbon(plant_id string,plant_name String,low_carbon int)

2.问题

问题:查询user_low_carbon表中每日流水记录,条件为:
用户在2017年,连续三天(或以上)的天数里,
每天减少碳排放(low_carbon)都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_low_carbon表中的记录流水。
例如用户u_002符合条件的记录如下,因为2017/1/2~2017/1/5连续四天的碳排放量之和都大于等于100g:

条件1: 每天都得收集超过100g

条件2: 连续3天或以上达标

①求每个用户每天一共收集了多少碳
②对用户收集的总的碳量进行过滤,只剩用户收集的超过100g的日期


select user_id,regexp_replace(data_dt,'/','-') mydate,sum(low_carbon) sumCarbon
from user_low_carbon
where year(regexp_replace(data_dt,'/','-'))=2017
group by user_id,data_dt
having sumCarbon>100    //t1

③再判断这些日期是否满足连续3天这个条件
何谓连续3天?
如何判断当前记录属于连续3天?
如果当前这条记录的日期符合连续3天,那么这条日期应该属于以下情况中的任意一种:
当前此条日期位于连续3天中的第一天,用当前日期减去后一条的日期=-1,用当前日期减去后两条的日期=-2;
当前此条日期位于连续3天中的第二天,用当前日期减去后一条的日期=-1,用当前日期减去前一条的日期=1;
当前此条日期位于连续3天中的第三天,用当前日期减去前一条的日期=1,用当前日期减去前两条的日期=22;

select user_id,mydate,
lag(mydate,1,'1970-1-1') over(partition by user_id order by mydate) pre1day,
lag(mydate,2,'1970-1-1') over(partition by user_id order by mydate) pre2day,
lead(mydate,1,'1970-1-1') over(partition by user_id order by mydate) after1day,
lead(mydate,2,'1970-1-1') over(partition by user_id order by mydate) after2day
from t1   //t2

做差判断

select  user_id,regexp_replace(mydate,'-','/') data_dt
from t2
where 
(datediff(mydate,after1day)=-1 and datediff(mydate,after2day)=-2)
or
(datediff(mydate,pre1day)=1 and datediff(mydate,after1day)=-1)
or
(datediff(mydate,pre1day)=1 and datediff(mydate,pre2day)=2)  //t3

④将复合条件的日期,再和原表进行关联,求出每日的具体流水

select ul.*
from t3 join user_low_carbon ul
on t3.user_id=ul.user_id and t3.data_dt=ul.data_dt

总的

select ul.*
from 
(select  user_id,regexp_replace(mydate,'-','/') data_dt
from
(select user_id,mydate,
lag(mydate,1,'1970-1-1') over(partition by user_id order by mydate) pre1day,
lag(mydate,2,'1970-1-1') over(partition by user_id order by mydate) pre2day,
lead(mydate,1,'1970-1-1') over(partition by user_id order by mydate) after1day,
lead(mydate,2,'1970-1-1') over(partition by user_id order by mydate) after2day
from
(select user_id,regexp_replace(data_dt,'/','-') mydate,sum(low_carbon) sumCarbon
from user_low_carbon
where year(regexp_replace(data_dt,'/','-'))=2017
group by user_id,data_dt
having sumCarbon>100 ) t1)t2
where 
(datediff(mydate,after1day)=-1 and datediff(mydate,after2day)=-2)
or
(datediff(mydate,pre1day)=1 and datediff(mydate,after1day)=-1)
or
(datediff(mydate,pre1day)=1 and datediff(mydate,pre2day)=2))t3 join user_low_carbon ul
on t3.user_id=ul.user_id and t3.data_dt=ul.data_dt

3.解法2

思路二: 在3天连续上进行改进求法
3天连续=连续+3天

总结规律:
假设A列,起始值为a,每次递增x
假设B列,起始值为b,每次递增y
如果A列和B列都连续递增,那么A列和B列,每两行之间的差值,都以 y-x 递增
A B 做差
a b b-a
a+x b+y (b-a)+(y-x)
a+2x b+2y (b-a)+2(y-x)
①如何判断日期是连续的
日期如果是连续的,那么排序后,紧邻的两行日期做差差值一定为1

借助一个参考列,参考列一定是连续的,且以1为单位递增

举例: col1 差值
u_002 2017-1-2 1 2017-1-1
u_002 2017-1-3 2 2017-1-1
u_002 2017-1-4 3 2017-1-1
u_002 2017-1-6 4 2017-1-2
u_002 2017-1-7 5 2017-1-2
u_002 2017-1-9 6 2017-1-3
u_002 2017-1-10 7 2017-1-3
u_002 2017-1-11 8 2017-1-3

u_005 2017-1-2 1
u_005 2017-1-3 2

②如何判断此日期已经3天连续
做差后的数据,以差值和用户为单位分组,如果组内count(*)>=3,3天连续

N天连续:
做差后的数据,以差值和用户为单位分组,如果组内count(*)>=N,N天连续


select user_id,regexp_replace(data_dt,'/','-') mydate,sum(low_carbon) sumCarbon
from user_low_carbon
where year(regexp_replace(data_dt,'/','-'))=2017
group by user_id,data_dt
having sumCarbon>100    //t1

以用户为单位分区,求出参考列和差值

select user_id,mydate,row_number() over(partition by user_id) rn,
date_sub(mydate,row_number() over(partition by user_id)) diff
from t1    //t2

判断3天连续,以差值和用户为单位分组,如果组内count(*)>=3,3天连续
group by 无法取到日期数据,使用窗口函数取

select user_id,mydate,count(*) over(partition by user_id,diff) mycount
from t2  //t3

过滤出3天连续的数据


select user_id,regexp_replace(mydate,'-','/') data_dt
from t3
where mycount>=3  //t4

将复合条件的日期,再和原表进行关联,求出每日的具体流水

select ul.*
from t4 join user_low_carbon ul
on t4.user_id=ul.user_id and t4.data_dt=ul.data_dt

总的

select ul.*
from 
(select user_id,regexp_replace(mydate,'-','/') data_dt
from 
(select user_id,mydate,count(*) over(partition by user_id,diff) mycount
from (select user_id,mydate,row_number() over(partition by user_id) rn,
date_sub(mydate,row_number() over(partition by user_id)) diff
from (select user_id,regexp_replace(data_dt,'/','-') mydate,sum(low_carbon) sumCarbon
from user_low_carbon
where year(regexp_replace(data_dt,'/','-'))=2017
group by user_id,data_dt
having sumCarbon>100)t1)t2)t3
where mycount>=3)t4 join user_low_carbon ul
on t4.user_id=ul.user_id and t4.data_dt=ul.data_dt
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值