Hive通过炸裂函数实现按小时差X小时炸出X行数据

Hive通过炸裂函数实现按小时差X小时炸出X行数据

输入数据如下:
在这里插入图片描述
期待sql处理后的结果:
在这里插入图片描述

Hive SQL:

SELECT
	uid,
  start_date as start_date,
  next_mid_date as end_date
from 
(
  SELECT
    uid,
    start_date,
    end_date,
    mid_date,
    lead(mid_date,1,end_date) over (partition by uid,start_date order by pos) as next_mid_date,
    pos
  FROM
  (
    SELECT
      uid,
      start_date,
      end_date,
      mid_date,
      pos
    FROM
    (
      select 
        tmp.*, 
        t.*, 
        from_unixtime(unix_timestamp(start_date)+pos*3600,'yyyy-MM-dd HH:00:00') as mid_date
    from(
        select 
            4001 as uid
      ,'2023-12-17 07:19:36' as start_date
            ,'2023-12-17 07:23:10' as end_date
    UNION ALL
            select 
            4001 as uid
      ,'2023-12-17 07:32:30' as start_date
            ,'2023-12-17 09:24:56' as end_date
    UNION ALL
            select 
            4001 as uid
      ,'2023-12-17 09:25:57' as start_date
            ,'2023-12-17 09:49:49' as end_date
    UNION ALL
            select 
            4002 as uid
      ,'2023-12-18 19:25:57' as start_date
            ,'2023-12-18 20:49:49' as end_date
    UNION ALL
            select 
            4002 as uid
      ,'2023-12-18 21:25:57' as start_date
            ,'2023-12-18 23:49:49' as end_date
    )tmp
    lateral view posexplode( split( space( ceil( (unix_timestamp(end_date)-unix_timestamp(start_date))/3600 )), '' ) ) t as pos, val
    )
    where mid_date < end_date  
  )
)
where pos = 0

union all 

SELECT
	uid,
  mid_date as start_date,
  next_mid_date as end_date
from 
(
  SELECT
    uid,
    start_date,
    end_date,
    mid_date,
    lead(mid_date,1,end_date) over (partition by uid,start_date order by pos) as next_mid_date,
    pos
  FROM
  (
    SELECT
      uid,
      start_date,
      end_date,
      mid_date,
      pos
    FROM
    (
      select 
        tmp.*, 
        t.*, 
        from_unixtime(unix_timestamp(start_date)+pos*3600,'yyyy-MM-dd HH:00:00') as mid_date
    from(
        select 
            4001 as uid
      ,'2023-12-17 07:19:36' as start_date
            ,'2023-12-17 07:23:10' as end_date
    UNION ALL
            select 
            4001 as uid
      ,'2023-12-17 07:32:30' as start_date
            ,'2023-12-17 09:24:56' as end_date
    UNION ALL
            select 
            4001 as uid
      ,'2023-12-17 09:25:57' as start_date
            ,'2023-12-17 09:49:49' as end_date
          UNION ALL
            select 
            4002 as uid
      ,'2023-12-18 19:25:57' as start_date
            ,'2023-12-18 20:49:49' as end_date
    UNION ALL
            select 
            4002 as uid
      ,'2023-12-18 21:25:57' as start_date
            ,'2023-12-18 23:49:49' as end_date
    )tmp
    lateral view posexplode( split( space( ceil( (unix_timestamp(end_date)-unix_timestamp(start_date))/3600 )), '' ) ) t as pos, val
    )
    where mid_date < end_date  
  )
)
where pos != 0

1.unix_timestamp(date): 这是一个函数,它返回一个给定日期的UNIX时间戳。UNIX时间戳是从1970年1月1日开始的秒数。

2.unix_timestamp(end_date) - unix_timestamp(start_date): 这部分代码计算两个日期之间的秒数差。

3.ceil(…/3600): 这部分代码将上述计算的秒数差除以3600(一个小时的秒数)。ceil函数确保结果是一个整数,即使结果是3601秒,它也会四舍五入为1小时。

4.space(…, ‘’): 这个函数用于生成一个字符串,该字符串包含由两个参数确定的空格。在这里,第一个参数是上一步计算的结果(小时数),第二个参数是一个空字符串。因此,它将生成一个由多个空格字符组成的字符串,长度等于小时数。

5.split(…, ’ '): 这个函数将上述生成的空格字符串分割成一个数组。每个空格代表一个小时,所以分割后的数组包含从开始时间到结束时间的每个小时。

6.posexplode(…): 这是一个Hive的特殊函数,它对数组进行操作,并为数组中的每个元素生成两个字段:一个是索引(在这里是小时的序号),另一个是值(在这里是小时的名称)。

7.lateral view: 这是Hive SQL的一个关键字,用于与posexplode函数一起使用。它允许我们将posexplode的结果与原始表中的行相关联。

8.t as pos, val: 这部分代码为posexplode的结果定义了别名。t是索引字段的别名,pos是小时的序号,val是小时的名称。

  • 10
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值