最近在工作中遇到这样一个需求:
从一天24小时中找出客流量最大的90分钟。
基础数据格式如下:
一个字段是时段,一个字段是客流
如果通过python或者java循环遍历,非常easy,这里主要是通过sql实现。
用到了窗口函数
ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]
RANGE BETWEEN [ {m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]
详细可以参考impala分析函数
实现:
1、滑动累计求和:
sum(od_cnt)over(partition BY line_name,up_down ORDER BY begin_time ROWS BETWEEN CURRENT ROW AND 8 FOLLOWING)AS slide_sum
2、根据滑动的值进行排名或者取最大
rank()over(partition BY line_name,up_down ORDER BY slide_sum)