hive 如果表不存在则创建_HIVE如何根据某些条件更新现有数据(如果存在)以及如何插入新数据(如果不存在)...

I want to update the existing data if it exists based on some condition(data with higher priority should be updated) and insert new data if not exists.

I have already written a query for this but somehow it is duplicating the number of rows. Here is the full explanation of what I have and what I want to achieve:

What I have:

Table 1 - columns - id,info,priority

hive> select * from sample1;

OK

1 123 1.01

2 234 1.02

3 213 1.03

5 213423 1.32

Time taken: 1.217 seconds, Fetched: 4 row(s)

Table 2: columns - id,info,priority

hive> select * from sample2;

OK

1 1234 1.05

2 23412 1.01

3 21 1.05

4 1232 1.1

2 3432423 1.6

3 34324 1.4

What I want is the final table should have only 1 row per id with the data according to the greatest priority:

1 1234 1.05

2 3432423 1.6

3 34324 1.4

4 1232 1.1

5 213423 1.32

The query that I have written is this:

insert overwrite table sample1

select a.id,

case when cast(TRIM(a.prio) as double) > cast(TRIM(b.prio) as double) then a.info else b.info end as info,

case when cast(TRIM(a.prio) as double) > cast(TRIM(b.prio) as double) then a.prio else b.prio end as prio

from sample1 a

join

sample2 b

on a.id=b.id where b.id in (select distinct(id) from sample1)

union all

select * from sample2 where id not in (select distinct(id) from sample1)

union all

select * from sample1 where id not in (select distinct(id) from sample2);

After running this query, I am getting this result:

hive> select * from sample1;

OK

1 1234 1.05

2 234 1.02

3 21 1.05

2 3432423 1.6

3 34324 1.4

5 213423 1.32

4 1232 1.1

How do I modify the present query to achieve the correct result. Is there any other method/process that I can follow to achieve the end result. I am using hadoop 2.5.2 along with HIVE 1.2.1 . I am working on a 6 node cluster with 5 slaves and 1 NN.

解决方案

adding to previously good answers!

try this also:

insert overwrite table UDB.SAMPLE1

select

COALESCE(id2,id )

,COALESCE(info2,info)

,COALESCE(priority2, priority)

from

UDB.SAMPLE1 TAB1

full outer JOIN

(

select id2, info2, priority2

from

(

select

id as id2

,info as info2

,priority as priority2

,row_number() over (partition by id order by priority desc) rn

from UDB.SAMPLE2

)TAB2_wt

where TAB2_wt.rn =1

)TAB2

on TAB2.id2 = TAB1.id

;

select * from SAMPLE1;

+-----+----------+-----------+--+

| id | info | priority |

+-----+----------+-----------+--+

| 1 | 1234 | 1.05 |

| 2 | 3432423 | 1.6 |

| 3 | 34324 | 1.4 |

| 4 | 1232 | 1.1 |

| 5 | 213423 | 1.32 |

+-----+----------+-----------+--+

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值