hive2.2之后的merge into

参考

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge

https://community.cloudera.com/t5/Community-Articles/Hive-ACID-Merge-by-Example/ta-p/245402

 

说明

Hive在2.2版本之后开始支持Merge操作,并且Merge只能在支持ACID的表上执行

 

语法

MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>

 

建表语句

CREATE DATABASE merge_data;
CREATE TABLE merge_data.transactions(
ID int,
TranValue string,
last_update_user string)
PARTITIONED BY (tran_date string)
CLUSTERED BY (ID) into 5 buckets
STORED AS ORC TBLPROPERTIES ('transactional'='true');
CREATE TABLE merge_data.merge_source(
ID int,
TranValue string,
tran_date string)
STORED AS ORC;

导入数据

INSERT INTO merge_data.merge_source VALUES 
(1, 'value_01', '20170410'),
(4, NULL, '20170410'),
(7, 'value_77777', '20170413'),
(8, NULL, '20170413'),
(8, 'value_08', '20170415'),
(11, 'value_11', '20170415');
 
INSERT INTO merge_data.transactions PARTITION (tran_date) VALUES
(1, 'value_01', 'creation', '20170410'),
(2, 'value_02', 'creation', '20170410'),
(3, 'value_03', 'creation', '20170410'),
(4, 'value_04', 'creation', '20170410'),
(5, 'value_05', 'creation', '20170413'),
(6, 'value_06', 'creation', '20170413'),
(7, 'value_07', 'creation', '20170413'),
(8, 'value_08', 'creation', '20170413'),
(9, 'value_09', 'creation', '20170413'),
(10, 'value_10','creation', '20170413');

在导入目标表数据会遇到以下问题 

错误一:

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

set hive.exec.dynamic.partition.mode=nonstrict;

错误二:

FAILED: SemanticException [Error 10265]: This command is not allowed on an ACID table merge_data.transactions with a non-ACID transaction manager. Failed command: INSERT INTO merge_data.transactions PARTITION (tran_date) VALUES

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

错误三:

FAILED: RuntimeException [Error 10264]: To use DbTxnManager you must set hive.support.concurrency=true
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:1186)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1324)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

set hive.support.concurrency=true;

修改完以上三个参数之后,导入数据成功

 

Merge操作

MERGE INTO merge_data.transactions AS T 
USING merge_data.merge_source AS S
ON T.ID = S.ID and T.tran_date = S.tran_date
WHEN MATCHED AND (T.TranValue != S.TranValue AND S.TranValue IS NOT NULL) THEN UPDATE SET
     TranValue = S.TranValue
    ,last_update_user = 'merge_update'
WHEN MATCHED AND S.TranValue IS NULL THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (
    S.ID
    , S.TranValue
    , 'merge_insert'
    , S.tran_date
);

查看结果

insert into table merge_data.transactions PARTITION (tran_date)
select ID,TranValue,last_update_user,tran_date from merge_data.transactions_tmp
cluster by id;

cluster by (id) - 指定getPartition以哪个字段来进行hash散列,并且排序字段也是指定的字段,默认以正序进行排序
distribute by(uid) – 指定getPartition以哪个字段来进行hash散列
sort by(uid asc) – 指定排序字段,以及排序规则

  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
在Oracle中,MERGE INTO语句可以将数据从一个表复制到另一个表,同时执行插入、更新和删除操作。如果你想要在Hive中实现类似的功能,可以使用INSERT OVERWRITE语句和Hive的JOIN操作。 假设你有两个Hive表:table1和table2,它们具有相同的列名称和数据类型。你可以按照以下步骤使用INSERT OVERWRITE和JOIN来实现类似于Oracle MERGE INTO语句的操作: 1. 首先使用INSERT OVERWRITE语句将table2中的数据覆盖到table1中: ``` INSERT OVERWRITE TABLE table1 SELECT t2.* FROM table2 t2; ``` 2. 接下来,使用LEFT OUTER JOIN和WHERE子句来查找需要更新的记录: ``` UPDATE table1 SET column1 = t2.column1, column2 = t2.column2, ... FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.key_column = t2.key_column WHERE t2.key_column IS NOT NULL; ``` 这将使用table2中匹配的行更新table1中的行。请注意,我们在LEFT OUTER JOIN中使用了WHERE子句来查找匹配的行,并排除了那些没有匹配的行。 3. 最后,使用NOT EXISTS和INSERT INTO语句来插入在table1中不存在的新行: ``` INSERT INTO TABLE table1 SELECT t2.* FROM table2 t2 WHERE NOT EXISTS ( SELECT 1 FROM table1 t1 WHERE t1.key_column = t2.key_column ); ``` 这将从table2中选择那些在table1中不存在的记录,并将它们插入到table1中。 需要注意的是,Hive中的INSERT OVERWRITE和INSERT INTO语句不能在同一个语句中使用,因此我们必须将它们分开使用。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值