hive 拉链表

拉链表的概念

以订单表为例,表中90%的数据基本不会随着时间而变化,只有最近一段时间内的数据会有变化 
对于这种类型的表,我们往往需要保存好每一条数据的每一天的状态

方案一:可以每天保存一份全量表,并长期存储,这样可以实现每天状态的保存,
也方便查询任何一天中数据的状态;弊端:由于表中90%的数据都不会变化,因此,
各天的全量表,其实大量数据都是相同的,存储冗余度太高
方案二:使用拉链表模型,来实现每条数据每天状态的变化情况。
优点:既能保留每天状态,又比较节省存储空间;弊端:使用、查询的时候,略增加了一点复杂性

实例:

2.8日原始数据

linux>vi zip.txt
10001,zhangsan,888,未支付,2023-02-01,2023-02-02
10001,zhangsan,888,已支付,2023-02-03,2023-02-03
10001,zhangsan,888,已发货,2023-02-04,2023-02-07
10001,zhangsan,888,已完成,2023-02-07,9999-12-31
10002,lisi,800,未支付,2023-02-06,2023-02-07
10002,lisi,800,已支付,2023-02-08,9999-12-31

2.9日新增数据

linux>vi incr.txt
10002,lisi,800,已发货
10003,wangwu,888,未支付

变化过程

10001,zhangsan,888,未支付,2023-02-01,9999-12-31	#第一次生成数据,2.1号状态‘未支付’,结束时间为9999-12-31

10001,zhangsan,888,未支付,2023-02-01,2023-02-02	#第二次用户支付 状态发生变化,第一次结束时间闭合,支付时间为2.2号
10001,zhangsan,888,已支付,2023-02-03,9999-12-31	#第二次状态变化,状态已支付完成,结束时间为9999-12-31

hive建表导入数据

hive>create table zip1(
id int,
name string,
money string,
OrderStatus string,
StartTime string,
EndTime string
)
partitioned by (dt string)
row format delimited fields terminated by ',';
load data local inpath '/root/tmp_data/test/zip.txt' into table zip1 partition(dt='2023-02-08');



hive>create table incr1(
id int,
name string,
money string,
OrderStatus string
)
partitioned by (dt string)
row format delimited fields terminated by ',';
load data local inpath '/root/tmp_data/test/incr.txt' into table incr1 partition(dt='2023-02-09');

hive>create table MergeZipper(
id int,
name string,
money string,
OrderStatus string,
StartTime string,
EndTime string
)
partitioned by (dt string)
row format delimited fields terminated by ',';

计算拉链并写入到MergeZipper表中

hive>with zips as(
select
id,
name,
money,
OrderStatus,
StartTime,
EndTime
from zip1 where dt='2023-02-08'
),
incrs as (
select
id,
name,
money,
OrderStatus
from incr1 where dt='2023-02-09'
)
INSERT into table MergeZipper   partition(dt='2023-02-09')
select
z.id,
z.name,
z.money,
z.OrderStatus,
z.StartTime,
if( z.EndTime='9999-12-31' and i.id is not null,'2023-02-08',z.EndTime) as EndTime
from zips z left join incrs i
on (z.id=i.id)
union all
select
i.id,i.name,i.money,i.OrderStatus,'2023-02-09' as StartTime,'9999-12-31' as EndTime
from incrs i;

或者

hive>with zips as(
select
id,
name,
money,
OrderStatus,
StartTime,
EndTime
from zip1 where dt='2023-02-08'
),
incrs as (
select
id,
name,
money,
OrderStatus
from incr1 where dt='2023-02-09'
)
INSERT overwrite table MergeZipper   partition(dt='2023-02-09')
select
z.id,
z.name,
z.money,
z.OrderStatus,
z.StartTime,
if( z.EndTime='9999-12-31' and i.id is not null,'2023-02-08',z.EndTime) as EndTime
from zips z left join incrs i
on (z.id=i.id)
union all
select
i.id,i.name,i.money,i.OrderStatus,'2023-02-09' as StartTime,'9999-12-31' as EndTime
from incrs i;

查询拉链表数据

hive>select * from MergeZipper where  StartTime<='2023-02-08' and EndTime>='2023-02-08';

+-----------------+-------------------+--------------------+--------------------------+------------------------+----------------------+-----------------+-
-+| mergezipper.id  | mergezipper.name  | mergezipper.money  | mergezipper.orderstatus  | mergezipper.starttime  | mergezipper.endtime  | mergezipper.dt  |
+-----------------+-------------------+--------------------+--------------------------+------------------------+----------------------+-----------------+-
-+| 10001           | zhangsan          | 888                | 已完成                      | 2023-02-07             | 9999-12-31           | 2023-02-09     
 || 10002           | lisi              | 800                | 已支付                      | 2023-02-08             | 2023-02-08           | 2023-02-09     
 || 10001           | zhangsan          | 888                | 已完成                      | 2023-02-07             | 9999-12-31           | 2023-02-10     
 || 10002           | lisi              | 800                | 已支付                      | 2023-02-08             | 2023-02-08           | 2023-02-10     
 |+-----------------+-------------------+--------------------+--------------------------+------------------------+----------------------+-----------------+-
-+4 rows selected (0.163 seconds)


 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值