一有连续问题,就想到row_number,因为row_number可以创造一个连续的数序列,再通过目标字段与row_number的计算判断连续 !!!
(以下代码为DB2)
1、判断连续的核心:
- 连续的日期之间差是1,连续的符号之间差是1
- 如果连续,则这几行日期-符号的结果差是相同的,就要求排序不能有占位,如果有占位了,排序的行之间的查不为1,
- 故不能有rank(),只能为row_number或者dense_rank()
2、登陆连续三天:
快速理解:就用登陆日期-row_number列,如果这一列值相等,就是连续的
自己的实践:问题:求2023年2月连续5天以上产生报销费用的机构
1)首先筛选满足其他条件的数据
(下图是部门报销表举例看其中的60208439的部分)
60208439机构在0207、0208及0213、0214、0215有连续报销,0208产生了两次,0210产生了三次
2)制造连续
前置条件!!!
- 首先确定相同日期的登陆是算一天,还是算多天
- 如果算一天,就要distinct后,再row_number排序,否则
看到0210其实没有连续登录,但按照row_number也会判断为连续,只有distinct,才能满足每个日期不同,连续日期的差与连续数的差相等
- 如果算多天,就用dense_rank()而不是rank,因为rank会占位,使得排序之间的差不是1
第一种:如果相同日期的报销只是算一天
SELECT COMPANY_SAP_CODE,DHR_DEPT,cha,count(CHA)
FROM
(
SELECT a.*,ROW_NUMBER()over(PARTITION BY COMPANY_SAP_CODE,DHR_DEPT ORDER BY dat_e asc) AS rn,
int(DAT_E) - ROW_NUMBER()over(PARTITION BY COMPANY_SAP_CODE,DHR_DEPT ORDER BY dat_e asc) AS cha --在hive中用date_sub函数
FROM
(SELECT
DISTINCT
COMPANY_SAP_CODE,
DHR_DEPT,
date(TIME_M) AS dat_e
FROM table
WHERE COMPANY_SAP_CODE = '4200'
) a
)
GROUP BY COMPANY_SAP_CODE,DHR_DEPT,cha
HAVING count(cha)>2
第二种:如果相同日期的报销算多天
SELECT COMPANY_SAP_CODE,DHR_DEPT,cha,count(CHA)
FROM
(SELECT
COMPANY_SAP_CODE,DHR_DEPT,dat_e,rn,int(dat_e)-rn AS CHA
FROM
(SELECT
COMPANY_SAP_CODE,
DHR_DEPT,
date(TIME_M) AS dat_e,
DENSE_RANK()over(PARTITION BY DHR_DEPT ORDER BY date(TIME_M) asc) AS rn--相同日期的登陆算为几天
--ROW_NUMBER()over(PARTITION BY DHR_DEPT ORDER BY date(TIME_M) asc) AS rn--
--INT(TIME_M) - ROW_NUMBER()over(PARTITION BY DHR_DEPT ORDER BY date(TIME_M) asc) AS cal
FROM table
WHERE COMPANY_SAP_CODE = '4200')
) GROUP BY COMPANY_SAP_CODE,DHR_DEPT,cha
HAVING COUNT(cha)>2
里面的子查询部分:
最后的结果:
20230411更新:在尚硅谷的这道题中,也是先要判断同一天登录的算不算连续,还要注意在hive中,不能直接用date-rn,只能是用date_sub()
SELECT d.user_id from
(SELECT c.user_id,c.cha,count(c.cha) as cha
from
(
SELECT b.user_id,b.create_date,date_sub(b.create_date,b.rn) as cha
from
(
SELECT a.user_id,a.create_date,
row_number()over(partition by a.user_id order by a.create_date) as rn
from
(select DISTINCT user_id,create_date from order_info) a
)b
)c group by c.user_id,c.cha
HAVING count(c.cha)>3
)d