Hive on Spark Optimization

Latest recommended article published on 2024-07-29 20:37:45

大数据供成屎

Tags: hive, spark, hadoop

Article link: https://blog.csdn.net/lbg20211023/article/details/125681961

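This article describes optimization strategies for Hive on Spark in detail, covering the official Hive recommendations, cluster planning, YARN configuration, container resource settings, executor parameter tuning, join and GROUP BY optimization, data skew handling, parallelism control, and small-file optimization, with the aim of improving Hive on Spark performance and efficiency.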

Summary generated by CSDN using AI.

1. Hive on Spark settings recommended by the official Hive documentation

 mapreduce.input.fileinputformat.split.maxsize=750000000
 hive.vectorized.execution.enabled=true
 
 hive.cbo.enable=true
 hive.optimize.reducededuplication.min.reducer=4
 hive.optimize.reducededuplication=true
 hive.orc.splits.include.file.footer=false
 hive.merge.mapfiles=true
 hive.merge.sparkfiles=false
 hive.merge.smallfiles.avgsize=16000000
 hive.merge.size.per.task=256000000
 hive.merge.orcfile.stripe.level=true
 hive.auto.convert.join=true
 hive.auto.convert.join.noconditionaltask=true
 hive.auto.convert.join.noconditionaltask.size=894435328
 hive.optimize.bucketmapjoin.sortedmerge=false
 hive.map.aggr.hash.percentmemory=0.5
 hive.map.aggr=true
 hive.optimize.sort.dynamic.partition=false
 hive.stats.autogather=true
 hive.stats.fetch.column.stats=true
 hive.vectorized.execution.reduce.enabled=false
 hive.vectorized.groupby.checkinterval=4096
 hive.vec