CDH Hive行级更新操作--实现update和delete
一、问题出现原因
在Hive表执行update和delete操作时,报如下错误:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
如图所示:
经过查阅资料发现,要实现update和delete操作,需要开启事务功能,详细见官网:https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions
在Configuration配置有详细的操作,如图所示:
Minimally, these configuration parameters must be set appropriately to turn on transaction support in Hive:
Client Side
hive.support.concurrency – true
hive.enforce.bucketing – true (Not required as of Hive 2.0)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Server Side (Metastore)
hive.compactor.initiator.on – true (See table below for more details)
hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
The following sections list all of the configuration parameters that affect Hive transactions and compaction. Also see Limitations above and Table Properties below.
二、CDH详细操作配置
2.1 hive --> 配置 --> 搜索 hive-site
2.2 修改服务的和客户端的配置
服务端配置
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads </name>
<value>1</value>
</property>
客户端配置
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
修改后如下图所示:
修改完配置保存后,需要重启集群
重启Hive后,也需要重新启动客户端进行操作,重新执行以下update操作,却发现报以下错误:
hive> update t_user set name='aaa' where id=1;
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table test.t_user that does not use an AcidOutputFormat or is not bucketed
如图所示:
解决方法:
如果一个表要实现update和delete功能,该表就必须支持ACID,而支持ACID,就必须满足以下条件:
1.表的存储格式必须是ORC(STORED AS ORC);
2.表必须进行分桶(CLUSTERED BY (col_name, col_name, …) INTO num_buckets BUCKETS);
3.Table property中参数transactional必须设定为True(tblproperties(‘transactional’=‘true’));
然后修改建表语句:
create table if not exists t_user1 (
id int,
name string
) COMMENT '用户表'
clustered by (id) into 2 buckets
row format delimited fields terminated by '|'
stored as orc TBLPROPERTIES('transactional'='true')
执行操作:
insert into t_user1 values(1,'qwr');
update t_user1 set name='qaz' where id=1;
delete from t_user1 where id=1;