HIVE卡口流量需求分析

HIVE卡口流量需求分析

目录

HIVE卡口流量需求分析

1.创建表格 插入数据

2.需求

3.总结:


1.创建表格 插入数据

CREATE TABLE learn3.veh_pass(
id STRING COMMENT "卡口编号",
pass_time STRING COMMENT "进过时间",
pass_num int COMMENT "过车数"
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE;

load data local inpath "/usr/local/soft/hive-3.1.2/data/veh_pass.txt" INTO TABLE learn3.veh_pass; 

2.需求

需求1:查询四月的设备及其设备种类总数

(如果是查询当前月可以使用语句:substr(pass_time,1,7) = substr(current_date,1,7))


-- 写法1
SELECT
T.id
,count(*) OVER()
FROM (
SELECT
id 
,pass_time
FROM learn3.veh_pass
WHERE substr(pass_time,1,7) = substr(current_date,1,7)
) T GROUP BY T.id


-- 错误写法
SELECT
DISTINCT id 
,count(*) OVER()
FROM (
SELECT
id 
,pass_time
FROM learn3.veh_pass
WHERE substr(pass_time,1,7) = substr(current_date,1,7)
)T 


-- 写法2:
SELECT
T1.id
,count(*) OVER()
FROM (
SELECT
DISTINCT T.id
FROM (
SELECT
id 
,pass_time
FROM learn3.veh_pass
WHERE substr(pass_time,1,7) = "2022-04"
)T )T1;

如果求的是四月每辆车的出现次数

select
t1.id
,count(*)
from
(
select
v.id
,v.pass_time
from learn3.veh_pass v
where substr(pass_time,1,7) = "2022-04"
) t1 group by t1.id;



+---------------------+-----------------+
|        t1.id        | count_window_0  |
+---------------------+-----------------+
| 451000000000071117  | 5               |
| 451000000000071116  | 5               |
| 451000000000071115  | 5               |
| 451000000000071114  | 5               |
| 451000000000071113  | 5               |
+---------------------+-----------------+

+---------------------+
|         id          |
+---------------------+
| 451000000000071113  |
| 451000000000071114  |
| 451000000000071115  |
| 451000000000071116  |
| 451000000000071117  |
+---------------------+



3.总结:

OVER():会为每条数据都开启一个窗口,默认窗口大小就是当前数据集的大小

OVER(PARTITION BY)会按照指定的字段进行分区,在获取一条数据时,窗口大小为整个分区的大小,之后根据分区中的数据进行计算

OVER(PARTITION BY ... ORDER BY ...)根据给定的分区,在获取一条数据时,窗口大小为整个分区的大小,并且对分区中的数据进行排序


-- 需求2:查询所有流量明细及所有设备月流量总额

SELECT
T1.id
,T1.pass_time
,T1.pass_num
,SUM(T1.pass_num) OVER(PARTITION BY SUBSTRING(T1.pass_time,1,7)) as total_pass
FROM learn3.veh_pass T1;


需求3:按设备编号日期顺序展示明细 并求
  

OVER中的取数据格式
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

OVER():指定分析函数工作的数据窗口大小,这个数据窗口大小可能会随着行的改变而变化。
CURRENT ROW:当前行
n PRECEDING:往前n行数据
n FOLLOWING:往后n行数据
UNBOUNDED :起点,
UNBOUNDED PRECEDING 表示从前面的起点, 
UNBOUNDED FOLLOWING 表示到后面的终点

     

假设我们现在要取当前行 当前行的前一行数据和后一行数据 我们可以写、

ROW BETWEEN 1 PRECEDING and 1 FOLLOWING

1)从第一天开始到当前天数 对流量进行累加
    

SELECT
T1.*
,SUM(T1.pass_num) OVER(ORDER BY T1.pass_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM (
SELECT 
*
FROM learn3.veh_pass ORDER BY pass_time
) T1;

    

2)昨天与当前天流量累加
      


SELECT
T1.*
,SUM(T1.pass_num) OVER(ORDER BY T1.pass_time ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM (
SELECT 
*
FROM learn3.veh_pass ORDER BY pass_time
) T1;

3)当前天数的前一天与后一天流量累加
        

SELECT
T1.*
,SUM(T1.pass_num) OVER(ORDER BY T1.pass_time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM (
SELECT 
*
FROM learn3.veh_pass ORDER BY pass_time
) T1;

4)当前天与下一天的累加和
        

SELECT
T1.*
,SUM(T1.pass_num) OVER(ORDER BY T1.pass_time ROWS BETWEEN CURRENT ROW  AND 1 FOLLOWING)
FROM (
SELECT 
*
FROM learn3.veh_pass ORDER BY pass_time
) T1;

5)当前天数与之后所有天流量累加和

SELECT
T1.*
,SUM(T1.pass_num) OVER(ORDER BY T1.pass_time ROWS BETWEEN CURRENT ROW  AND UNBOUNDED FOLLOWING)
FROM (
SELECT 
*
FROM learn3.veh_pass ORDER BY pass_time
) T1;

需求4:查询每个设备编号上次有数据日期和下一次有数据日期
 

LAG(col,n,default_val):往前第n行数据
LEAD(col,n, default_val):往后第n行数据
NTILE(n):把有序窗口的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。

SELECT
T1.*
, LAG(T1.pass_time,1,"2022-01-01") OVER(PARTITION BY T1.id ORDER BY T1.pass_time) as before_time
, LEAD(T1.pass_time,1,"2022-12-31") OVER(PARTITION BY T1.id ORDER BY T1.pass_time) as after_time
FROM learn3.veh_pass T1;

  • 19
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值