hive acid及事务表踩坑学习实录

目录

一、概念

二、表类型和ACID特性对照图

三、事务表的适用场景

四、开启事务的配置

五、参数详解


一、概念

什么是ACID?

ACID就是常见数据库事务的四大特性

Atomicity(原子性)、Consistency(一致性)、Isolation(隔离性)、Durability(持久性)。

但是我们知道hive是在hdfs上以文件存储,只支持文件的新增和删除,不支持delete和update。

从Hive 0.13开始,Hive可以支持行级别的ACID语义

事务表:支持原子性update操作的Hive表

事务表的2种类型

1.ORC格式,支持CRUD操作(增删改查)

2.TextFile文件格式,只符合insert-only特性

存储在事务表中的数据会被分成两种类型的文件

1.base文件,用来存放平常的数据

2.delta文件,用来存储新增、更新、删除的数据。每一个事务处理数据的结果都会单独新建一个delta文件夹用来存储数据。

二、表类型和ACID特性对照图

表类型ACID文件格式插入更新/删除
托管表:CRUD事务ORC
托管表:仅插入式事务任意格式没有
托管表:临时没有任意格式没有
外部表没有任意格式没有

三、事务表的适用场景

事务表适用于数仓中的行级数据更新删除需求比较频繁的表

如果事务表太多,并且存在大量的更新操作,metastore后台启动的合并线程会定期的提交MapReduce Job,也会一定程度上增重集群的负担

注意事项

1.不支持 BEGIN、COMMIT、ROLLBACK 等语句,所有的语句都是自动提交

2.仅支持ORC格式

四、开启事务的配置

客户端方面的修改
会话中设置或者修改HiveServer2的配置文件:

hive.support.concurrency – true
hive.enforce.bucketing – true (Hive 2.0之后就不用专门设置了)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager


服务端方面
主要修改MetaStore上的配置

hive.compactor.initiator.on – true
hive.compactor.worker.threads – 压缩任务的数量

五、参数详解

Configuration key

Values

Location

Notes

hive.txn.manager

Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Client/
HiveServer2

DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions.

hive.txn.strict.locking.modeDefault: true

Client/ HiveServer2

In strict mode non-ACID resources use standard R/W lock semantics, e.g. INSERT will acquire exclusive lock. In non-strict mode, for non-ACID resources, INSERT will only acquire shared lock, which allows two concurrent writes to the same partition but still lets lock manager prevent DROP TABLE etc. when the table is being written to (as of Hive 2.2.0).

hive.txn.timeout

Default: 300

Client/
HiveServer2/

Metastore 

Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. It's critical that this property has the same value for all components/services.5

hive.txn.heartbeat.threadpool.sizeDefault: 5

Client/

HiveServer2

The number of threads to use for heartbeating (as of Hive 1.3.0 and 2.0.0).
hive.timedout.txn.reaper.startDefault: 100sMetastoreTime delay of first reaper (the process which aborts timed-out transactions) run after the metastore starts (as of Hive 1.3.0). Controls AcidHouseKeeperServcie above.

hive.timedout.txn.reaper.interval

Default: 180sMetastore

Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of Hive 1.3.0). Controls AcidHouseKeeperServcie above.

hive.txn.max.open.batch

Default: 1000

Client

Maximum number of transactions that can be fetched in one call to open_txns().1

hive.max.open.txnsDefault: 100000

HiveServer2/

 Metastore

Maximum number of open transactions. If current open transactions reach this limit, future open transaction requests will be rejected, until the number goes below the limit. (As of Hive 1.3.0 and 2.1.0.)
hive.count.open.txns.intervalDefault: 1s

HiveServer2/

 Metastore

Time in seconds between checks to count open transactions (as of Hive 1.3.0 and 2.1.0).
hive.txn.retryable.sqlex.regexDefault: "" (empty string)

HiveServer2/

 Metastore

Comma separated list of regular expression patterns for SQL state, error code, and error message of retryable SQLExceptions, that's suitable for the Hive metastore database (as of Hive 1.3.0 and 2.1.0).

For an example, see Configuration Properties.

hive.compactor.initiator.on

Default: false

Value required for transactions: true (for exactly one instance of the Thrift metastore service)

Metastore

Whether to run the initiator and cleaner threads on this metastore instance. Prior to Hive 1.3.0 it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet).

As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.

hive.compactor.worker.threads

Default: 0

Value required for transactions: > 0 on at least one instance of the Thrift metastore service

Metastore

How many compactor worker threads to run on this metastore instance.2

hive.compactor.worker.timeout

Default: 86400

Metastore

Time in seconds after which a compaction job will be declared failed and the compaction re-queued.

hive.compactor.cleaner.run.intervalDefault: 5000MetastoreTime in milliseconds between runs of the cleaner thread. (Hive 0.14.0 and later.)

hive.compactor.check.interval

Default: 300

Metastore

Time in seconds between checks to see if any tables or partitions need to be compacted.3

hive.compactor.delta.num.threshold

Default: 10

Metastore

Number of delta directories in a table or partition that will trigger a minor compaction.

hive.compactor.delta.pct.threshold

Default: 0.1

Metastore

Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.

hive.compactor.abortedtxn.threshold

Default: 1000

Metastore

Number of aborted transactions involving a given table or partition that will trigger a major compaction.

hive.compactor.aborted.txn.time.thresholdDefault: 12hMetastoreAge of table/partition's oldest aborted transaction when compaction will be triggered. Default time unit is: hours. Set to a negative number to disable.

hive.compactor.max.num.delta

Default: 500MetastoreMaximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0).4

hive.compactor.job.queue

Default: "" (empty string) Metastore Used to specify name of Hadoop queue to which Compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of Hive 1.3.0).
Compaction History

hive.compactor.history.retention.succeeded

Default: 3MetastoreNumber of successful compaction entries to retain in history (per partition).

hive.compactor.history.retention.failed

Default: 3MetastoreNumber of failed compaction entries to retain in history (per partition).

hive.compactor.history.retention.attempted

Default: 2MetastoreNumber of attempted compaction entries to retain in history (per partition).

hive.compactor.initiator.failed.compacts.threshold

Default: 2MetastoreNumber of of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually initiated compaction succeeds auto initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed.

hive.compactor.history.reaper.interval

Default: 2mMetastoreControls how often the process to purge historical record of compactions runs.
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值