Hive Optimization Parameters

xhu_406

Last modified 2023-08-11 17:02:33

Tags: mapreduce hadoop hive

First published 2021-06-28 16:28:42

Article link: https://blog.csdn.net/qq_22093679/article/details/118301707

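set hive.cbo.enable=true; -- Enable the cost-based optimizer (CBO)

set hive.exec.parallel=true; -- Run independent stages of a query in parallel

set hive.exec.parallel.thread.number=8; -- Maximum number of jobs that may run in parallel (default 8)

set hive.auto.convert.join=true; -- Let Hive convert a common join into a map join when one side is small enough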

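## Dynamic partitioning

set hive.exec.dynamic.partition.mode=nonstrict; -- nonstrict allows every partition column to be dynamic; under strict, an insert with no static partition fails

set hive.exec.max.dynamic.partitions=1000; -- Maximum number of dynamic partitions that may be created (default 1000)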
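
The two dynamic-partition settings above can be modeled in a few lines. The following Python sketch is not Hive code: `check_dynamic_insert` and its arguments are hypothetical names used only to illustrate the rules, namely that strict mode rejects an insert in which every partition column is dynamic, and that an insert fails when it would create more partitions than hive.exec.max.dynamic.partitions allows.

```python
def check_dynamic_insert(partition_spec, partitions_created,
                         mode="strict", max_dynamic_partitions=1000):
    """Illustrative model of Hive's dynamic-partition checks (not Hive's code).

    partition_spec maps each partition column to a static value, or to None
    when the column is filled dynamically from the query result.
    """
    static_cols = [col for col, value in partition_spec.items()
                   if value is not None]
    # strict mode: at least one partition column must have a static value.
    if mode == "strict" and not static_cols:
        raise ValueError(
            "strict mode requires at least one static partition column")
    # Models the hive.exec.max.dynamic.partitions limit.
    if partitions_created > max_dynamic_partitions:
        raise ValueError("too many dynamic partitions: %d > %d"
                         % (partitions_created, max_dynamic_partitions))
    return True
```

Under these assumptions, `check_dynamic_insert({"dt": None}, 10, mode="nonstrict")` passes, while the same call with `mode="strict"` raises an error because `dt` has no static value.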

## Mapper-stage settings

set hive.vectorized.execution.enabled=true; -- Enable vectorized query execution

set hive.map.aggr=true; -- Enable map-side aggregation

set mapreduce.map.output.compress=true; -- Compress map output

set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec; -- Codec used to compress map output

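## Reducer-stage settings

set mapred.reduce.tasks=999; -- Fix the number of reduce tasks (newer Hadoop releases name this property mapreduce.job.reduces)

set hive.exec.reducers.max=999; -- Upper bound on the number of reducers when Hive estimates the count itself

set mapreduce.input.fileinputformat.list-status.num-threads=16; -- Number of threads used to list input files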
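
When the reducer count is not fixed explicitly, Hive derives it from the input size and caps it at hive.exec.reducers.max. The Python sketch below is a rough model of that estimate, not Hive's implementation. `estimate_reducers` is a hypothetical helper, and the 256 MB figure for hive.exec.reducers.bytes.per.reducer (a property not listed above) is an assumed value; check the default for your Hive version.

```python
def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=256 * 1024 * 1024,  # assumed setting
                      max_reducers=999):  # hive.exec.reducers.max from above
    """Rough model of Hive's reducer estimate when no fixed count is set."""
    # Ceiling division: one reducer per bytes_per_reducer of input.
    wanted = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always at least one reducer, never more than the configured maximum.
    return max(1, min(max_reducers, wanted))
```

With these assumed values, 10 GB of input would get 40 reducers, while 1 TB would hit the cap of 999.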

## YARN resource configuration

(Source: https://blog.csdn.net/Jasoncbk/article/details/108720706)
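● yarn.scheduler.minimum-allocation-mb: default 1024 MB. The minimum memory allocated for each container request. A request for less memory than this is raised to this value; a NodeManager configured with less memory than this value is shut down by the ResourceManager.
● yarn.scheduler.maximum-allocation-mb: default 8192 MB. The maximum memory allocated for each container request. A request above this value throws an InvalidResourceRequestException.
● yarn.scheduler.minimum-allocation-vcores: default 1. The minimum number of virtual CPU cores allocated for each container request; a request below this value is raised to it. In addition, a NodeManager configured with fewer virtual cores than this value is shut down by the ResourceManager.
● yarn.scheduler.maximum-allocation-vcores: default 4. The maximum number of virtual CPU cores allocated for each container request; a request above this value throws an InvalidResourceRequestException.

If the jobs you submit process large volumes of data, pay attention to the settings above.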
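
The minimum and maximum rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not YARN code: `normalize_request` and `InvalidResourceRequestError` are hypothetical names, the defaults are the ones shipped in yarn-default.xml, and real schedulers may additionally round requests up to a multiple of an increment, which is left out here.

```python
class InvalidResourceRequestError(Exception):
    """Python stand-in for YARN's InvalidResourceRequestException."""


def normalize_request(memory_mb, vcores,
                      min_mb=1024, max_mb=8192,
                      min_vcores=1, max_vcores=4):
    """Apply the scheduler min/max allocation rules to one container request."""
    # Requests above either maximum are rejected outright.
    if memory_mb > max_mb or vcores > max_vcores:
        raise InvalidResourceRequestError(
            "requested %d MB / %d vcores exceeds maximum %d MB / %d vcores"
            % (memory_mb, vcores, max_mb, max_vcores))
    # Requests below a minimum are raised to that minimum.
    return max(memory_mb, min_mb), max(vcores, min_vcores)
```

For example, a request for 512 MB and 1 vcore would be granted 1024 MB and 1 vcore, while a request for 16384 MB would be rejected.
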
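YARN can also limit the hardware resources that containers use, through the following settings:
● yarn.nodemanager.resource.percentage-physical-cpu-limit: default 100. The percentage of the node's physical CPU that all containers together may use. For example, a 16-core machine has a maximum CPU usage of 1600%; with this value at 100, all containers together can use up to 1600%, and with it at 50 they can use up to 800%.
● yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: default false, which enables CPU sharing mode. In sharing mode a container may use idle CPU in addition to the CPU it was allocated.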
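
The arithmetic in the percentage-physical-cpu-limit example can be checked with a short Python sketch. `usable_cpu_percent` is a hypothetical helper written for this illustration; it is not part of YARN.

```python
def usable_cpu_percent(physical_cores, cpu_limit_percent=100):
    """Total CPU (in percent, 100 per core) that all containers may use."""
    # Each core contributes 100%; the limit scales that total.
    return physical_cores * 100 * cpu_limit_percent // 100
```

For the 16-core machine in the text, `usable_cpu_percent(16, 100)` gives 1600 and `usable_cpu_percent(16, 50)` gives 800.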

Further reading on performance tuning: "Hive tuning" on Juejin (掘金)
