Updating a Full Table from an Incremental Table in Hive

鸭梨山大哎

Article link: https://blog.csdn.net/u010711495/article/details/114215855

Requirement

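Insert into u2 every record whose ID exists in u1 but not in u2, and update the u2 records that share an ID with u1, using the u1 values.

Do not be misled by the order in which the requirement is worded: logically, the update comes first and the insert second.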

Data source

-- u1: incremental table (new and changed rows)
drop table if exists u1;
create table if not exists u1
(
    id   int,
    name string
)
    row format delimited
        fields terminated by ','
;
-- u2: full table to be brought up to date
drop table if exists u2;
create table if not exists u2
(
    id   int,
    name string
)
    row format delimited
        fields terminated by ','
;

-- Load the sample files from the local file system
load data local inpath '/data/u1.txt' into table u1;
load data local inpath '/data/u2.txt' into table u2;

Data set

Contents of the u1 file:
1,a
2,b
3,c
4,d
7,y
8,u

Contents of the u2 file:
2,bb
3,cc
7,yy
9,pp
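
Before overwriting u2, it can help to preview what the requirement implies for each ID. The query below is only a sketch, assuming both tables are loaded with the sample data above; the `action` column and its labels are illustrative names, not part of the original solution.

```sql
-- Classify every ID by what the requirement asks for:
--   insert: ID exists only in u1
--   update: ID exists in both tables (u1's name should win)
--   keep:   ID exists only in u2
select coalesce(u1.id, u2.id) as id,
       case
           when u2.id is null then 'insert'
           when u1.id is null then 'keep'
           else 'update'
       end                    as action
from u1
         full outer join u2 on u1.id = u2.id
where coalesce(u1.id, u2.id) is not null
order by id;
```

With the sample data, IDs 1, 4 and 8 are classified as insert, IDs 2, 3 and 7 as update, and ID 9 as keep.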

SQL implementation

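-- Insert into u2 every record whose ID exists only in u1,
-- and update the u2 records that share an ID with u1.
-- The full outer join covers all three cases:
--   ID only in u1: new row, taken from u1
--   ID in both:    u1's name replaces u2's
--   ID only in u2: row kept unchanged
-- Rows without an ID (for example, produced by a blank line in a data file) are skipped.
with a as (select coalesce(u1.id, u2.id)                                    as id,
                  case when u1.id is not null then u1.name else u2.name end as `name`
           from u1
                    full outer join u2 on u1.id = u2.id
           where coalesce(u1.id, u2.id) is not null
)
insert overwrite table u2
select id, `name`
from a;

Verify the result

select * from u2 order by id;
+----+----+
|id  |name|
+----+----+
|1   |a   |
|2   |b   |
|3   |c   |
|4   |d   |
|7   |y   |
|8   |u   |
|9   |pp  |
+----+----+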
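
Another way to express the requirement is to take every u1 row as it is, then add only the u2 rows whose ID does not appear in u1. The following is a sketch under the same table definitions; the subquery alias `merged` is an arbitrary name chosen for illustration.

```sql
-- u1 rows always win; u2 contributes only the IDs that u1 lacks.
insert overwrite table u2
select id, name
from (select u1.id, u1.name
      from u1
      where u1.id is not null
      union all
      -- anti-join: u2 rows with no matching ID in u1
      select u2.id, u2.name
      from u2
               left join u1 on u2.id = u1.id
      where u1.id is null
        and u2.id is not null) merged;
```

With the sample data, this leaves u2 with IDs 1, 2, 3, 4, 7, 8 and 9, where IDs 2, 3 and 7 carry u1's names (b, c, y) and ID 9 keeps pp. Using `union all` avoids the deduplication pass that `union` performs, which is safe here because the two branches cannot return the same ID.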