HQL:求最高在线人数以及最高峰时间段
问题描述:
数据为主播ID,开播时间以及关播时间。
求该平台某一天主播同时在线人数最高为多少以及出现最高峰的时间段是哪个时间?
原始数据:
id startDate endDate
1001 2021-06-10 12:12:12 2021-06-10 17:12:12
1002 2021-06-10 12:12:12 2021-06-10 17:12:12
1003 2021-06-10 12:14:12 2021-06-10 17:12:12
1004 2021-06-10 12:12:12 2021-06-10 15:12:12
1005 2021-06-10 13:12:12 2021-06-10 17:12:12
1001 2021-06-10 19:12:12 2021-06-10 21:12:12
建表语句:
create table player
(
id string,
startDate string,
endDate string
)
row format delimited fields terminated by '\t';
分析:
1)因为这里有开播时间和关播时间,因此我们可以将这些数据当作流式数据做处理。
所以第一步:将startDate和endDate两列不同的数据合并到一列中,标记好上播和关播,并且按照时间排序。
1001,2021-06-10 12:12:12,login
1002,2021-06-10 12:12:12,login
1004,2021-06-10 12:12:12,login
1003,2021-06-10 12:14:12,login
1005,2021-06-10 13:12:12,login
1004,2021-06-10 15:12:12,logout
1001,2021-06-10 17:12:12,logout
1002,2021-06-10 17:12:12,logout
1003,2021-06-10 17:12:12,logout
1005,2021-06-10 17:12:12,logout
1001,2021-06-10 19:12:12,login
1001,2021-06-10 21:12:12,logout
子查询:
select id,
startDate actiontime,
'login' flag
from player
union all
select id,
endDate actiontime,
'logout' flag
from player
order by actiontime;
2)按照时间依次进行在线人数的统计,如果标记为上播,则人数+1,如果标记为关播,则人数-1。
1001,2021-06-10 12:12:12,login,1
1002,2021-06-10 12:12:12,login,2
1004,2021-06-10 12:12:12,login,3
1003,2021-06-10 12:14:12,login,4
1005,2021-06-10 13:12:12,login,5
1004,2021-06-10 15:12:12,logout,4
1001,2021-06-10 17:12:12,logout,3
1002,2021-06-10 17:12:12,logout,2
1003,2021-06-10 17:12:12,logout,1
1005,2021-06-10 17:12:12,logout,0
1001,2021-06-10 19:12:12,login,1
1001,2021-06-10 21:12:12,logout,0
子查询:
select id,
actiontime,
flag,
sum(if(flag = 'login', 1, -1)) over (order by actiontime rows between unbounded preceding and current row ) ct
from (
select id,
startDate actiontime,
'login' flag
from player
union all
select id,
endDate actiontime,
'logout' flag
from player
order by actiontime
) t1;
3)求出最高在线人数。
5
子查询:
select max(ct) max_ct
from (
select id,
actiontime,
flag,
sum(if(flag = 'login', 1, -1))
over (order by actiontime rows between unbounded preceding and current row ) ct
from (
select id,
startDate actiontime,
'login' flag
from player
union all
select id,
endDate actiontime,
'logout' flag
from player
order by actiontime
) t1
) t2;
4)求出每一次在线人数的开始时间和结束时间(每一个在线人数的时间范围)。
2021-06-10 12:12:12,2021-06-10 12:12:12,1
2021-06-10 12:12:12,2021-06-10 12:12:12,2
2021-06-10 12:12:12,2021-06-10 12:14:12,3
2021-06-10 12:14:12,2021-06-10 13:12:12,4
2021-06-10 13:12:12,2021-06-10 15:12:12,5
2021-06-10 15:12:12,2021-06-10 17:12:12,4
2021-06-10 17:12:12,2021-06-10 17:12:12,3
2021-06-10 17:12:12,2021-06-10 17:12:12,2
2021-06-10 17:12:12,2021-06-10 17:12:12,1
2021-06-10 17:12:12,2021-06-10 19:12:12,0
2021-06-10 19:12:12,2021-06-10 21:12:12,1
2021-06-10 21:12:12,9999-99-99,0
子查询:
select actiontime,
lead(actiontime, 1, '9999-99-99') over (order by actiontime) lead_time,
ct
from (
select id,
actiontime,
flag,
sum(if(flag = 'login', 1, -1))
over (order by actiontime rows between unbounded preceding and current row ) ct
from (
select id,
startDate actiontime,
'login' flag
from player
union all
select id,
endDate actiontime,
'logout' flag
from player
order by actiontime
) t1
) t2;
5)过滤出最高在线人数的时间范围。
2021-06-10 13:12:12,2021-06-10 15:12:12,5
最终查询:
select actiontime, -- 高峰段开始时间
lead_time, -- 高峰段结束时间
ct -- 最高在线人数
from (
select actiontime,
lead(actiontime, 1, '9999-99-99') over (order by actiontime) lead_time,
ct
from (
select id,
actiontime,
flag,
sum(if(flag = 'login', 1, -1))
over (order by actiontime rows between unbounded preceding and current row ) ct
from (
select id,
startDate actiontime,
'login' flag
from player
union all
select id,
endDate actiontime,
'logout' flag
from player
order by actiontime
) t1
) t2
) t3
join (
select max(ct) max_ct
from (
select id,
actiontime,
flag,
sum(if(flag = 'login', 1, -1))
over (order by actiontime rows between unbounded preceding and current row ) ct
from (
select id,
startDate actiontime,
'login' flag
from player
union all
select id,
endDate actiontime,
'logout' flag
from player
order by actiontime
) t1
) t2
) t4 on t3.ct = t4.max_ct;