拉链表:记录一个事物从开始到当前的状态变化
REFERENCE:
一个简单的拉链表,记录了每个用户随着时间的变化其等级的变动情况
user_id | name | level | start_time | end_time |
---|---|---|---|---|
1 | 田家炳 | A | 2000-01-01 | 9999-12-31 |
2 | 张国荣 | B | 2000-01-01 | 2000-01-02 |
2 | 张国荣 | A | 2000-01-02 | 9999-12-31 |
3 | 尊龙 | B | 2000-01-02 | 9999-12-31 |
简单实现流程
-
创建 2000-01-01 的数据
CREATE TABLE IF NOT EXISTS a( user_id BIGINT ,name STRING ,level STRING ,time STRING ); INSERT INTO a VALUES (1,'田家炳','A','2000-01-01'), (2,'张国荣','B','2000-01-01'); SELECT * FROM a;
user_id name level time 1 田家炳 A 2000-01-01 2 张国荣 B 2000-01-01 -
创建 2000-01-02 的数据
CREATE TABLE IF NOT EXISTS b( user_id BIGINT ,name STRING ,level STRING ,time STRING ); INSERT INTO b VALUES (1,'田家炳','A','2000-01-02'), (2,'张国荣','A','2000-01-02'), (3,'尊龙','B','2000-01-02'); SELECT * FROM b;
user_id name level time 1 田家炳 A 2000-01-02 2 张国荣 A 2000-01-02 3 尊龙 B 2000-01-02 -
创建历史表存储 2000-01-01 的数据,并对格式进行整理
CREATE TABLE IF NOT EXISTS level_his( user_id BIGINT ,name STRING ,level STRING ,start_time STRING ,end_time STRING ); INSERT OVERWRITE TABLE level_his SELECT * FROM ( SELECT DISTINCT(user_id),name,level,'2000-01-01' as start_time,'9999-12-31' as end_time FROM a ) AS t; SELECT * from level_his;
user_id name level start_time end_time 1 田家炳 A 2000-01-01 9999-12-31 2 张国荣 B 2000-01-01 9999-12-31 -
创建变动表,找到 2000-01-02 表里面变动和新增的
-- level 不相同,查的是变更的 -- 左表 level 为空,查的是右表的新增 SELECT b.* FROM a RIGHT JOIN b ON a.user_id = b.user_id WHERE a.level != b.level OR a.level IS NULl;
CREATE TABLE IF NOT EXISTS level_update( user_id BIGINT ,name STRING ,level STRING ,time STRING ); INSERT OVERWRITE TABLE level_update SELECT b.* FROM a RIGHT JOIN b ON a.user_id = b.user_id WHERE a.level != b.level OR a.level IS NULl; SELECT * FROM level_update;
user_id name level time 2 张国荣 A 2000-01-02 3 尊龙 B 2000-01-02 -
最终结果合并
将变动表中的数据插入到历史表
INSERT OVERWRITE TABLE level_his -- 之前已存在,发生变更前的记录 SELECT a.user_id ,a.name ,a.level ,a.start_time ,(CASE WHEN a.end_time ='9999-12-31' AND b.user_id IS NOT NULL THEN '2000-01-02' ELSE a.end_time END) AS end_time FROM level_his AS a LEFT JOIN level_update b ON a.user_id=b.user_id UNION -- 之前已经存在,发生变更后的记录;之前不存在,新增的记录 SELECT user_id ,name ,level ,'2000-01-02' AS start_time ,'9999-12-31' AS end_time FROM level_update;
user_id name level start_time end_time 1 田家炳 A 2000-01-01 9999-12-31 2 张国荣 B 2000-01-01 2000-01-02 user_id name level start_time end_time 2 张国荣 A 2000-01-02 9999-12-31 3 尊龙 B 2000-01-02 9999-12-31