- 本方案适合小表全量导入时
绪论、需求说明
- 将ODS维表数据全量抽取到DIM层,保留数据所有的历史变化状态
create table T_YYBZB_TGH_BANKINFO
(
id int(8),
bank_id int(8),
bank_name varchar(200),
source_date varchar(200)
);
insert into T_YYBZB_TGH_BANKINFO (ID, BANK_ID, BANK_NAME)values (11, 11, '工商银行(广州)','20210101');
1、创建Hive拉链表
(1)编写建表脚本
- 维表中添加三列,分别为start_time,end_time,status
create table dim.dim_t_yybzb_tgh_bankinfo1_di
(
id int,
bank_id int,
bank_name string,
start_time string,
end_time string,
flag int
)
row format delimited fields terminated by ','
(2)配置SQL组件
-
数据源:Spark
-
sql类型:非查询
(3)部署上线
在这里插入图片描述
(4)运行脚本
- 进入画布
- 运行建表脚本
- 注意:一次性脚本都是在画布中单独运行,不要直接调度整个JOB
2、初始化导入数据
(1)编写初始化SQL脚本,将数据抽到tmp表中
insert overwrite table tmp.tmp_t_yybzb_tgh_bankinfo1_di
select id,bank_id,bank_name,
substr(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyyMMdd HH:mm:ss'),1, 7) as start_time,
'99991231' as end_date,
'1' as flag
from ods.ods_t_yybzb_tgh_bankinfo_di
(2)编写SQL脚本,将tmp表数据抽到dim层
INSERT OVERWRITE TABLE dim.dim_t_yybzb_tgh_bankinfo1_di
select * from tmp.tmp_t_yybzb_tgh_bankinfo1_di
(3)其余步骤同上
3、拉链表更新配置
(1)整体任务流概览
(2)编写业务逻辑脚本, 更新start_time,end_time
INSERT OVERWRITE TABLE tmp.tmp_t_yybzb_tgh_bankinfo1_di
SELECT * FROM
(
SELECT A.id,
A.bank_id,
A.bank_name,
CASE
WHEN A.end_time = '99991231' AND B.id IS NOT NULL THEN '20170101'
ELSE A.end_time
END AS t_end_time
FROM dim.dim_t_yybzb_tgh_bankinfo1_di AS A
LEFT JOIN ods.ods_t_yybzb_tgh_bankinfo_di AS B
ON A.id = B.id
UNION
SELECT C.id,
C.bank_id,
C.bank_name,
'20170102' AS start_time,
'99991231' AS end_time
FROM ods.ods_t_yybzb_tgh_bankinfo1_di AS C
) AS T
(3)将临时表数据刷回dim层
INSERT OVERWRITE TABLE dim.dim_t_yybzb_tgh_bankinfo1_di
select * from tmp.tmp_t_yybzb_tgh_bankinfo1_di
(4)配置depedent组件
- 依赖一:昨天的datax作业
- 依赖二:昨天的dim的数据同步作业
(5)配置SQL组件
- SQL组件一:配置业务逻辑SQL将数据导入tmp表
- SQL组件二:配置数据同步SQL将数据导入dim表
(6)定时调度设置
- 上线部署
- 定时设置: 每天6点定时的调度