Hive Transaction(Hive 事务管理)

Hive 事务在 Hive 3 得到增强。

hive-site.xml 配置

<property>
   <name>hive.txn.manager</name>
   <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
   <description>
     Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive
     transactions, which also requires appropriate settings for hive.compactor.initiator.on,
     hive.compactor.worker.threads, hive.support.concurrency (true),
     and hive.exec.dynamic.partition.mode (nonstrict).
     The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides
     no transactions.
   </description>
 </property>
 <property>
   <name>hive.support.concurrency</name>
   <value>true</value>
 </property>
 <property>
   <name>hive.compactor.initiator.on</name>
   <value>true</value>
   <description>
     Whether to run the initiator and cleaner threads on this metastore instance or not.
     Set this to true on one instance of the Thrift metastore service as part of turning
     on Hive transactions. For a complete list of parameters required for turning on
     transactions, see hive.txn.manager.
   </description>
 </property>
 <property>
   <name>hive.compactor.worker.threads</name>
   <value>2</value>
   <description>
     How many compactor worker threads to run on this metastore instance. Set this to a
     positive number on one or more instances of the Thrift metastore service as part of
     turning on Hive transactions. For a complete list of parameters required for turning
     on transactions, see hive.txn.manager.
     Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions
     themselves. Increasing the number of worker threads will decrease the time it takes
     tables or partitions to be compacted once they are determined to need compaction.
     It will also increase the background load on the Hadoop cluster as more MapReduce jobs
     will be running in the background.
   </description>
 </property>
 <property>
   <name>hive.exec.dynamic.partition.mode</name>
   <value>nonstrict</value>
   <description>
     In strict mode, the user must specify at least one static partition
     in case the user accidentally overwrites all partitions.
     In nonstrict mode all partitions are allowed to be dynamic.
   </description>
 </property>

Hive 创建的表自动是事务表的配置。

<property>
   <name>metastore.strict.managed.tables</name>
   <value>false</value>
   <description>
     Whether strict managed tables mode is enabled. With this mode enabled, only transactional tables (both full and insert-only) are allowed to be created as managed tables
   </description>
 </property>
 <property>
   <name>hive.create.as.insert.only</name>
   <value>false</value>
   <description>
     Whether the eligible tables should be created as ACID insert-only by default. Does not apply to external tables, the ones using storage handlers, etc.
   </description>
 </property>
  <property>
   <name>metastore.create.as.acid</name>
   <value>false</value>
   <description>
     Whether the eligible tables should be created as full ACID by default. Does not apply to external tables, the ones using storage handlers, etc.
   </description>
 </property>

测试

创建事务表

create table t1(c1 int,c2 int) stored as orc tblproperties('transactional'='true');

执行以下操作。执行之后,可以看到每个操作在表的目录下生成相应的 delta 目录。

insert into t1 values(1,1),(2,2),(3,3);
insert into t1 values(4,4);
insert into t1 values(5,5);
insert into t1 values(6,6);
insert into t1 values(7,7);
insert into t1 values(8,8);
insert into t1 values(9,9);
insert into t1 values(10,10);
insert into t1 values(11,11);
insert into t1 values(12,12);
insert into t1 values(13,13);

delete from t1 where c1=13;
insert into t1 values(13,14);
delete from t1 where c1=13;
insert into t1 values(13,15);

insert_only 事务

insert_only 事务不要求表必须是 orc 格式,可以是任何格式,如 parquet。
t2 表仅支持 insert,不支持 delete, update。
insert 语句成功, delete 失败。

create table t2(c1 int,c2 int) stored as orc tblproperties('transactional'='true','transactional_properties'='insert_only');

insert into t2 values(1,1),(2,2),(3,3);
delete from t2 where c1=3;

修改现有表支持事务

全部事务支持

表 t3 从创建的时候没有支持事务,用 alter table 支持事务,之后可以执行 insert, delete 操作。

create table t3(c1 int,c2 int) stored as orc;
alter table t3 set  tblproperties('transactional'='true','transactional_properties'='default');
insert into t3 values(1,1),(2,2),(3,3);
delete from t3 where c1=3;

insert_only 事务支持

create table t4(c1 int,c2 int) stored as orc;
alter table t4 set  tblproperties('transactional'='true','transactional_properties'='insert_only');
insert into t4 values(1,1),(2,2),(3,3);

Hive 官方文档https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

常见问题

问题原因总结:

  1. 一个表,可以同时运行多个 SQL 进行 insert 和 select 操作。但是不允许同时运行多个 SQL 进行 UPDATE, DELETE 或者 MERGE操作, 因为他们都会生成 delete_delta 目录。如两个 SQL,它们同时 update 不同的记录也会报错。
  2. SQL 解析阶段获取 一个表的 write_id list,但是在执行之前获取锁,在执行之后释放锁,导致SQL 解析和执行之前会出现并发错误。

问题列表

  1. 作业报错
    如果在执行的时候,发生了 compaction,compaction 之后删除了原来的文件,导致抛出 FileNotFoundException: File does not exist:.
  2. 2 个会话同时执行 insert overwrite 抛出 LockException
  • 创建表
create table t1(c1 int) stored as orc tblproperties('transactional'='true');

在两个窗口里分别用 beeline 连接 hiveserver。

  • 在会话1,执行以下命令,函数 timesleep 会 sleep 10秒钟,返回 10001。
insert overwrite table t1 select default.timesleep(10000);
  • 在会话2,执行以下命令,会话2 的命令在会话1之前执行完毕。
insert overwrite table t1 select 1;

会话1抛出以下异常

ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:126.  Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:126.  Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)

由于函数在编译阶段给出具体值。

select * from t1;
+--------+
| t1.c1  |
+--------+
| 1      |
| 2      |
| 3      |
+--------+

以下两个语句:
会话1:

insert overwrite table t2 select c1,default.timesleep(10000 * c1) from t1;

会话2:

insert overwrite table t2 select c1,default.timesleep(10 * c1) from t1;

会话2 会在 commit 的时候失败。

ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:163.  Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:163.  Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)

  1. 在执行 insert overwrite 时,在另外的会话执行drop table,导致第 1 个会话抛出表找不到异常。示例如下:
0: jdbc:hive2://localhost:10000/default> insert overwrite table t1 select default.timesleep(10000);
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 't1' (state=42S02,code=10001)

  • show transactions 可以显示 aborted 的 transaction。
show transactions;
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
|      txnid      |       state        |  startedtime   |  lastheartbeattime   |    user     |          host          |
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
| Transaction ID  | Transaction State  | Started Time   | Last Heartbeat Time  | User        | Hostname               |
| 126             | ABORTED            | 1646189665000  | 1646189665000        | houzhizhen  | localhost.localdomain  |
| 130             | OPEN               | 1646189960000  | 1646189973000        | houzhizhen  | localhost.localdomain  |
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+

  1. 两个会话同时 insert 不会有冲突。

参考资料:
官方文档:Hive+Transactions
PPT Transactional Operations in Apache Hive:Present and Future

  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Hive的保留关键字包括:ADD,ADMIN,AFTER,ALL,ALTER,ANALYZE,AND,ARCHIVE,ARRAY,AS,ASC,BEFORE,BETWEEN,BIGINT,BINARY,BOOLEAN,BOTH,BUCKET,BUCKETS,BY,CASCADE,CASE,CAST,CHANGE,CLUSTER,CLUSTERED,CLUSTERSTATUS,COLLECTION,COLUMN,COLUMNS,COMMENT,COMPACT,COMPACTIONS,COMPUTE,CONCATENATE,CONF,CONTINUE,CREATE,CROSS,CURRENT,CURRENT_DATE,CURRENT_TIMESTAMP,DATA,DATABASE,DATABASES,DATE,DATETIME,DAY,DBPROPERTIES,DECIMAL,DEFERRED,DEFINED,DELIMITED,DEPENDENCY,DESC,DESCRIBE,DFS,DIRECTORIES,DIRECTORY,DISABLE,DISTRIBUTE,DIV,DROP,ELSE,ENABLE,END,ESCAPED,EXCLUSIVE,EXISTS,EXPLAIN,EXPORT,EXTENDED,EXTERNAL,FAILED,FALSE,FETCH,FIELDS,FILE,FILEFORMAT,FIRST,FOLLOWING,FOR,FROM,FULL,FUNC,FUNCTION,GE,GENERATE,GET,GLOBAL,GRANT,GROUP,GROUPING,HAVING,HOLD_DDLTIME,HOUR,IDX,IF,IGNORE,IMPORT,IN,INCLUDE,INDEX,INDEXES,INNER,INPATH,INPUTDRIVER,INPUTFORMAT,INSERT,INTERSECT,INT,BIGINTO,BIGINTO,BIGINTO,BIGINTO,BIGINTO,BIGINTO,BIGINTO,BIGINTO,IOSFORMATTED,IOWRITE,KAFKA,KEYS,KILL,LARGE,LAST,LEFT,LIMIT,LINES,LIST,LOCATION,LOCK,LOCKS,LOGICAL,MAP,MAPJOIN,MAPRED,MASTER,MATCHED,MATERIALIZE,MERGE,MINUS,MINUTE,MONTH,MSCK,NATIVE,NATURAL,NO,NOT,NULL,NULLS,OF,OFFLINE,OFFSET,OLD,ON,ONLY,OPEN,OR,ORDER,OUT,OUTER,OUTPUTDRIVER,OUTPUTFORMAT,OVER,OVERWRITE,OWNER,PARTIALSCAN,PARTITION,PARTITIONED,PARTITIONS,PERCENT,PLUS,PRECEDING,PRECISION,PRESERVE,PRETTY,PRINT,PROCEDURE,PROTOSHUFFLE,PUBLIC,RAWREAD,RAWWRITE,RCFILE,READ,READONLY,READS,REBUILD,RECORDREADERCLASS,RECORDREPLACEMENTCHARACTER,RECORDWRITERCLASS,REDUCE,REGEXP,RELOAD,RENAME,REPAIR,REPLACE,REPLICATION,RESTRICT,RESUME,RETURN,RETURNS,REVOKE,RIGHT,RLIKE,ROLE,ROLES,ROLLBACK,ROLLUP,ROW,ROWS,SCHEMA,SELECT,SEQUENCEFILE,SEPARATOR,SERDE,SERDEPROPERTIES,SET,SHARED,SHOW,SHUTDOWN,SKIP,SMALLINT,SORT,SORTED,START,STATS,STATUS,STRAIGHT_JOIN,STRING,STRUCT,TABLE,TABLES,TABLESAMPLE,TBLPROPERTIES,TEMPORARY,TERMINATED,THEN,TIME,TIMESTAMP,TINYINT,TO,TOUCH,TRAILING,TRANSACTION,TTL,TUPLE,TYPE,TYPES,UNARCHIVE,UNBOUNDED,UNCACHE,UNION,UNIQUE,UNLOCK,UNSET,UNSIGNED,UPDATE,URI,USE,USER,USING,UTC,UTCTIMESTAMP,VALUE,VALUED,VALUES,VARCHAR,VIEWS,WHEN,WHERE,WHILE,WINDOW,WITH,WORK,WRITE,YEAR.

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值