[Hive] - Hive Parameters Explained

weixin_30950887

Original link: http://www.cnblogs.com/liuming1992/p/4930831.html

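Hive parameters fall into three categories. The first is system variables, which hold system-level environment information. The second is env variables, which hold the current user's environment variables. The third is Hive parameter variables, which are defined in hive-site.xml and by the current Hive session. This third category in turn consists of Hadoop HDFS parameters (taken directly from Hadoop), MapReduce parameters, metastore storage parameters, metastore connection parameters, and Hive runtime parameters.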
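
To make the three categories concrete, here is a minimal Python sketch that groups `name=value` lines by category. It assumes the lines look like the output of Hive's `set -v` command, where system variables carry a `system:` prefix and environment variables an `env:` prefix; everything else is treated as a Hive parameter. The function names and the sample values are hypothetical and only for illustration.

```python
from typing import Dict, Iterable


def classify_variable(name: str) -> str:
    """Return 'system', 'env', or 'hive' for one variable name."""
    if name.startswith("system:"):
        return "system"
    if name.startswith("env:"):
        return "env"
    # No prefix: assume a Hive parameter (hive-site.xml or session level).
    return "hive"


def group_variables(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group 'name=value' lines into the three categories."""
    groups: Dict[str, Dict[str, str]] = {"system": {}, "env": {}, "hive": {}}
    for line in lines:
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip lines that are not name=value pairs
        groups[classify_variable(name)][name] = value
    return groups


# Hypothetical sample lines, not taken from a real cluster.
sample = [
    "system:user.name=hadoop",
    "env:HOME=/home/hadoop",
    "hive.exec.parallel=false",
    "mapred.reduce.tasks=-1",
]
grouped = group_variables(sample)
print(grouped)
```

Note that Hadoop and MapReduce parameters such as `mapred.reduce.tasks` land in the third group, which matches the description above of what that category contains.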

Hive-0.13.1-cdh5.3.6 parameter reference
Parameter	Default	Meaning (usage)
datanucleus.autoCreateSchema	true	Creates the necessary schema on startup if one doesn't exist. Set this to false after creating it once.
datanucleus.autoStartMechanismMode	checked	Throw an exception if the metadata tables are incorrect. Allowed values: checked, unchecked.
datanucleus.cache.level2	false	Whether to use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server.
datanucleus.cache.level2.type	SOFT	Type of level 2 cache: SOFT = soft-reference-based cache, WEAK = weak-reference-based cache, none = no cache.
datanucleus.connectionPoolingType	BoneCP	Connection pool used for the metastore database.
datanucleus.fixedDatastore	false
datanucleus.identifierFactory	datanucleus1	Name of the identifier factory to use when generating table/column names etc.
datanucleus.plugin.pluginRegistryBundleCheck	LOG	Defines what happens when plugin bundles are found and are duplicated [EXCEPTION|LOG|NONE]
datanucleus.rdbms.useLegacyNativeValueStrategy	true
datanucleus.storeManagerType	rdbms	Metadata store type.
datanucleus.transactionIsolation	read-committed	Default transaction isolation level for identity generation.
datanucleus.validateColumns	false	Validates the existing schema against code. Turn this on if you want to verify the existing schema.
datanucleus.validateConstraints	false	Whether to validate constraints on existing tables.
datanucleus.validateTables	false	Whether to validate existing tables.
dfs.block.access.key.update.interval	600
hive.archive.enabled	false	Whether archiving operations are permitted.
hive.auto.convert.join	true	Whether Hive enables the optimization of converting a common join into a map join based on the input file size.
hive.auto.convert.join.noconditionaltask	true	Whether Hive enables the optimization of converting a common join into a map join based on the input file size. If this parameter is on, and the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller than the specified size, the join is directly converted to a map join (there is no conditional task).
hive.auto.convert.join.noconditionaltask.size	10000000	If hive.auto.convert.join.noconditionaltask is off, this parameter does not take effect. However, if it is on, and the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller than this size, the join is directly converted to a map join (there is no conditional task). The default is 10MB.
hive.auto.convert.join.use.nonstaged	false	For conditional joins, if the input stream from a small alias can be directly applied to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently, this does not work with vectorization or the Tez execution engine.
hive.auto.convert.sortmerge.join	false	Whether the join is automatically converted to a sort-merge join if the joined tables pass the criteria for sort-merge join.
hive.auto.convert.sortmerge.join.bigtable.selection.policy	org.apache.hadoop.hive.ql.optimizer.AvgPartitionSizeBasedBigTableSelectorForAutoSMJ
hive.auto.convert.sortmerge.join.to.mapjoin	false	Whether to convert a sort-merge join to a map join.
hive.auto.progress.timeout	0	How long to run the autoprogressor for script/UDTF operators (in seconds). Set to 0 for forever.
hive.autogen.columnalias.prefix.includefuncname	false	Whether column aliases auto-generated by Hive include the function name; not included by default.
hive.autogen.columnalias.prefix.label	_c	Main part (prefix) of Hive's auto-generated column aliases.
hive.binary.record.max.length	1000	Maximum length of a Hive binary record.
hive.cache.expr.evaluation	true	If true, the evaluation result of a deterministic expression referenced twice or more will be cached. For example, in a filter condition like ".. where key + 10 > 10 or key + 10 = 0", "key + 10" will be evaluated/cached once and reused for the following expression ("key + 10 = 0"). Currently, this is applied only to expressions in select or filter operators.
hive.cli.errors.ignore	false
hive.cli.pretty.output.num.cols	-1
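hive.cli.print.current.db	false	Whether to show the name of the current database; not shown by default.
hive.cli.print.header	false	Whether to print header information for query results, such as column names; not printed by default.
hive.cli.prompt	hive	Hive command-line prompt prefix. The client must be restarted after changing it.
hive.cluster.delegation.token.store.class	org.apache.hadoop.hive.thrift.MemoryTokenStore	Class that stores Hive cluster delegation tokens.
hive.cluster.delegation.token.store.zookeeper.znode	/hive/cluster/delegation	ZooKeeper znode used for Hive delegation token storage.
hive.compactor.abortedtxn.threshold	1000	Number of aborted transactions on a table or partition that triggers a major compaction.
hive.compactor.check.interval	300	Interval between compaction checks, in seconds.
hive.compactor.delta.num.threshold	10	Number of delta directories in a table or partition that triggers a minor compaction.
hive.compactor.delta.pct.threshold	0.1	Size of the delta files relative to the base, as a fraction, that triggers a major compaction.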
hive.compactor.initiator.on	false
hive.compactor.worker.threads	0
hive.compactor.worker.timeout	86400	Timeout, in seconds.
hive.compat	0.12	Compatibility version.
hive.compute.query.using.stats	false
hive.compute.splits.in.am	true
hive.conf.restricted.list	hive.security.authenticator.manager,hive.security.authorization.manager
hive.conf.validation	true
hive.convert.join.bucket.mapjoin.tez	false
hive.counters.group.name	HIVE
hive.debug.localtask	false
hive.decode.partition.name	false
hive.default.fileformat	TextFile	Default file format; TextFile by default.
hive.default.rcfile.serde	org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe	SerDe class used for RCFile.
hive.default.serde	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	Default SerDe class.
hive.display.partition.cols.separately	true	Whether partition columns are displayed separately.
hive.downloaded.resources.dir	/tmp/${hive.session.id}_resources	Directory where Hive stores downloaded resources.
hive.enforce.bucketing	false	Whether bucketing is enforced.
hive.enforce.bucketmapjoin	false	Whether bucket map join is enforced.
hive.enforce.sorting	false	Whether sorting is enforced on insert.
hive.enforce.sortmergebucketmapjoin	false
hive.entity.capture.transform	false
hive.entity.separator	@	Separator used to construct names of tables and partitions. For example, dbname@tablename@partitionname
hive.error.on.empty.partition	false	Whether to throw an exception if a dynamic partition insert generates empty results.
hive.exec.check.crossproducts	true	Whether to check for cross products (Cartesian products).
hive.exec.compress.intermediate	false	Whether intermediate results are compressed; compression follows Hadoop's mapred.output.compress* settings.
hive.exec.compress.output	false	Whether the final output is compressed.
hive.exec.concatenate.check.index	true
hive.exec.copyfile.maxsize	33554432
hive.exec.counters.pull.interval	1000
hive.exec.default.partition.name	__HIVE_DEFAULT_PARTITION__
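hive.exec.drop.ignorenonexistent	true	Whether to ignore the error when dropping something that does not exist; ignored by default. If it is not ignored, the drop reports an error.
hive.exec.dynamic.partition	true	Whether dynamic partitioning is allowed. If it is, partition values need not be specified explicitly when writing data.
hive.exec.dynamic.partition.mode	strict	Dynamic partition mode. strict requires at least one static partition value; nonstrict allows all partitions to be dynamic.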
hive.exec.infer.bucket.sort	false
hive.exec.infer.bucket.sort.num.buckets.power.two	false
hive.exec.job.debug.capture.stacktraces	true
hive.exec.job.debug.timeout	30000
hive.exec.local.scratchdir	/tmp/hadoop
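hive.exec.max.created.files	100000	Maximum number of HDFS files created by an MR job.
hive.exec.max.dynamic.partitions	1000	Maximum total number of dynamic partitions.
hive.exec.max.dynamic.partitions.pernode	100	Maximum number of dynamic partitions created per MR node.
hive.exec.mode.local.auto	false	Whether Hive is allowed to run in local mode.
hive.exec.mode.local.auto.input.files.max	4	Maximum number of input files for local mode.
hive.exec.mode.local.auto.inputbytes.max	134217728	Maximum number of input bytes for local mode.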
hive.exec.orc.default.block.padding	true
hive.exec.orc.default.buffer.size	262144
hive.exec.orc.default.compress	ZLIB
hive.exec.orc.default.row.index.stride	10000
hive.exec.orc.default.stripe.size	268435456
hive.exec.orc.dictionary.key.size.threshold	0.8
hive.exec.orc.memory.pool	0.5
hive.exec.orc.skip.corrupt.data	false
hive.exec.orc.zerocopy	false
hive.exec.parallel	false	Whether parallel execution is allowed; disabled by default.
hive.exec.parallel.thread.number	8	Number of threads for parallel execution; 8 by default.
hive.exec.perf.logger	org.apache.hadoop.hive.ql.log.PerfLogger
hive.exec.rcfile.use.explicit.header	true
hive.exec.rcfile.use.sync.cache	true
hive.exec.reducers.bytes.per.reducer	1000000000	Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.
hive.exec.reducers.max	999	Maximum number of reducers allowed. Takes effect when mapred.reduce.tasks is set to a negative value.
hive.exec.rowoffset	false
hive.exec.scratchdir	/etc/hive-hadoop
hive.exec.script.allow.partial.consumption	false
hive.exec.script.maxerrsize	100000
hive.exec.script.trust	false
hive.exec.show.job.failure.debug.info	true
hive.exec.stagingdir	.hive-staging
hive.exec.submitviachild	false
hive.exec.tasklog.debug.timeout	20000
hive.execution.engine	mr	Execution engine: mr or Tez (Hadoop 2).
hive.exim.uri.scheme.whitelist	hdfs,pfile
hive.explain.dependency.append.tasktype	false
hive.fetch.output.serde	org.apache.hadoop.hive.serde2.DelimitedJSONSerDe
hive.fetch.task.aggr	false
hive.fetch.task.conversion	minimal
hive.fetch.task.conversion.threshold	-1
hive.file.max.footer	100
hive.fileformat.check	true
hive.groupby.mapaggr.checkinterval	100000
hive.groupby.orderby.position.alias	false
hive.groupby.skewindata	false
hive.hadoop.supports.splittable.combineinputformat	false
hive.hashtable.initialCapacity	100000
hive.hashtable.loadfactor	0.75
hive.hbase.generatehfiles	false
hive.hbase.snapshot.restoredir	/tmp
hive.hbase.wal.enabled	true
hive.heartbeat.interval	1000
hive.hmshandler.force.reload.conf	false
hive.hmshandler.retry.attempts	1
hive.hmshandler.retry.interval	1000
hive.hwi.listen.host	0.0.0.0
hive.hwi.listen.port	9999
hive.hwi.war.file	lib/hive-hwi-${version}.war
hive.ignore.mapjoin.hint	true
hive.in.test	false
hive.index.compact.binary.search	true
hive.index.compact.file.ignore.hdfs	false
hive.index.compact.query.max.entries	10000000
hive.index.compact.query.max.size	10737418240
hive.input.format	org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
hive.insert.into.external.tables	true
hive.insert.into.multilevel.dirs	false
hive.jobname.length	50
hive.join.cache.size	25000
hive.join.emit.interval	1000
hive.lazysimple.extended_boolean_literal	false
hive.limit.optimize.enable	false
hive.limit.optimize.fetch.max	50000
hive.limit.optimize.limit.file	10
hive.limit.pushdown.memory.usage	-1.0
hive.limit.query.max.table.partition	-1
hive.limit.row.max.size	100000
hive.localize.resource.num.wait.attempts	5
hive.localize.resource.wait.interval	5000
hive.lock.manager	org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.mapred.partitioner	org.apache.hadoop.hive.ql.io.DefaultHivePartitioner
hive.mapred.reduce.tasks.speculative.execution	true
hive.mapred.supports.subdirectories	false
hive.metastore.uris	thrift://hh:9083
hive.metastore.warehouse.dir	/user/hive/warehouse
hive.multi.insert.move.tasks.share.dependencies	false
hive.multigroupby.singlereducer	true
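hive.zookeeper.clean.extra.nodes	false	Whether to clean up extra nodes at the end of the session.
hive.zookeeper.client.port	2181	ZooKeeper client port.
hive.zookeeper.quorum		IP addresses of the ZooKeeper servers.
hive.zookeeper.session.timeout	600000	ZooKeeper client session timeout.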
hive.zookeeper.namespace	hive_zookeeper_namespace
javax.jdo.PersistenceManagerFactoryClass	org.datanucleus.api.jdo.JDOPersistenceManagerFactory
javax.jdo.option.ConnectionDriverName	Changed to: com.mysql.jdbc.Driver
javax.jdo.option.ConnectionPassword	Changed to: hive
javax.jdo.option.ConnectionURL	xxx
javax.jdo.option.ConnectionUserName	xxx
javax.jdo.option.DetachAllOnCommit	true
javax.jdo.option.Multithreaded	true
javax.jdo.option.NonTransactionalRead	true

Reprinted from: https://www.cnblogs.com/liuming1992/p/4930831.html