Hive create table if not exists: How to update existing data in Hive based on a condition (if it exists) and insert new data (if it does not)...

Latest recommended article published on 2022-10-15 20:42:24

weixin_39890629

Article link: https://blog.csdn.net/weixin_39890629/article/details/111744252

I want to update existing rows based on a condition (the row with the higher priority should win) and insert new rows where none exist yet.

I have already written a query for this, but it produces duplicate rows. Here is the full explanation of what I have and what I want to achieve:

What I have:

Table 1 (sample1), columns: id, info, priority

hive> select * from sample1;

1 123 1.01

2 234 1.02

3 213 1.03

5 213423 1.32

Time taken: 1.217 seconds, Fetched: 4 row(s)

Table 2 (sample2), columns: id, info, priority

hive> select * from sample2;

1 1234 1.05

2 23412 1.01

3 21 1.05

4 1232 1.1

2 3432423 1.6

3 34324 1.4

I want the final table to have exactly one row per id, taken from the row with the greatest priority:

1 1234 1.05

2 3432423 1.6

3 34324 1.4

4 1232 1.1

5 213423 1.32
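
The target rule (pool the rows of both tables, then keep the highest-priority row for each id) can be stated independently of Hive. The following is a minimal Python sketch using the sample rows above. The function name `merge_by_priority` and the tuple layout are assumptions made for illustration only; they are not part of the original tables.

```python
# Rows as (id, info, priority), copied from the sample tables above.
sample1 = [(1, "123", 1.01), (2, "234", 1.02), (3, "213", 1.03), (5, "213423", 1.32)]
sample2 = [
    (1, "1234", 1.05), (2, "23412", 1.01), (3, "21", 1.05),
    (4, "1232", 1.1), (2, "3432423", 1.6), (3, "34324", 1.4),
]


def merge_by_priority(*tables):
    """Hypothetical helper: keep, for each id, the row with the greatest priority."""
    best = {}
    for table in tables:
        for row in table:
            row_id, _info, priority = row
            # Replace the stored row only when this one has a strictly higher priority.
            if row_id not in best or priority > best[row_id][2]:
                best[row_id] = row
    # ids are unique at this point, so sorting the tuples orders the rows by id.
    return sorted(best.values())


merged = merge_by_priority(sample1, sample2)
for row in merged:
    print(*row)
```

With the sample data this yields one row per id (1 to 5), matching the desired table listed above.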

The query that I have written is this:

insert overwrite table sample1

select a.id,

case when cast(TRIM(a.prio) as double) > cast(TRIM(b.prio) as double) then a.info else b.info end as info,

case when cast(TRIM(a.prio) as double) > cast(TRIM(b.prio) as double) then a.prio else b.prio end as prio

from sample1 a

join

sample2 b

on a.id=b.id where b.id in (select distinct(id) from sample1)

union all

select * from sample2 where id not in (select distinct(id) from sample1)

union all

select * from sample1 where id not in (select distinct(id) from sample2);

After running this query, I am getting this result:

hive> select * from sample1;

1 1234 1.05

2 234 1.02

3 21 1.05

2 3432423 1.6

3 34324 1.4

5 213423 1.32

4 1232 1.1

How do I modify the present query to achieve the correct result? Is there any other method that I can follow to reach the same end result? I am using Hadoop 2.5.2 with Hive 1.2.1, on a 6-node cluster with 5 slaves and 1 NameNode.
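
A likely cause of the extra rows, sketched in Python under the assumption that the tables hold exactly the rows shown above: sample2 contains two rows each for ids 2 and 3, so the inner join emits one output row per matching pair rather than one row per id. The helper name `run_original_query` is hypothetical; it only mirrors the three UNION ALL branches of the query and is not Hive code.

```python
# Rows as (id, info, priority), copied from the sample tables above.
sample1 = [(1, "123", 1.01), (2, "234", 1.02), (3, "213", 1.03), (5, "213423", 1.32)]
sample2 = [
    (1, "1234", 1.05), (2, "23412", 1.01), (3, "21", 1.05),
    (4, "1232", 1.1), (2, "3432423", 1.6), (3, "34324", 1.4),
]


def run_original_query(a_rows, b_rows):
    """Hypothetical emulation of the three UNION ALL branches of the query."""
    a_ids = {row[0] for row in a_rows}
    b_ids = {row[0] for row in b_rows}
    out = []
    # Branch 1: inner join on id. Every matching (a, b) pair yields a row,
    # so an id that occurs twice in sample2 yields two rows.
    for a in a_rows:
        for b in b_rows:
            if a[0] == b[0]:
                out.append(a if a[2] > b[2] else b)
    # Branch 2: sample2 rows whose id is absent from sample1.
    out += [b for b in b_rows if b[0] not in a_ids]
    # Branch 3: sample1 rows whose id is absent from sample2.
    out += [a for a in a_rows if a[0] not in b_ids]
    return out


result = run_original_query(sample1, sample2)
duplicated_ids = sorted({r[0] for r in result if [x[0] for x in result].count(r[0]) > 1})
print(len(result), duplicated_ids)
```

The emulation returns seven rows, with ids 2 and 3 appearing twice, the same set of rows as the output above. Reducing the data to one row per id, either before the join or after pooling both tables, removes the duplicates. In Hive this is commonly written as a UNION ALL of both tables followed by ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority DESC) and a filter on the first row; windowing functions are available in Hive 1.2.1.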

Solution

Adding to the previous good answers:

Try this as well:

insert overwrite table UDB.SAMPLE1

select

COALESCE(id2,id )

,COALESCE(info2,info)

,COALESCE(priority2, priority)

from

UDB.SAMPLE1 TAB1

full outer JOIN

(