Hive on Tez Parameter Tuning

liuwei063608

Tags: hive, hadoop, big data

Article link: https://blog.csdn.net/liuwei063608/article/details/125151614

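Hive on Tez Tuning
I. Configuration parameter tuning
1. Enable vectorized execution for ORC tables:
Vectorized query execution processes rows in batches of 1024 instead of one row at a time, which greatly reduces the CPU cost of scans, filters, aggregations and joins (the data must be stored in ORC format).
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true; -- not yet supported by the Hive version in the current environment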
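
To confirm that a query is really vectorized, the plan can be inspected. The following is a minimal sketch, assuming Hive 2.3 or later (where the EXPLAIN VECTORIZATION form is available); the table orders_orc and its columns are hypothetical names used only for illustration.

```sql
-- Hypothetical ORC table, used only for illustration.
CREATE TABLE IF NOT EXISTS orders_orc (
  id     BIGINT,
  money  DECIMAL(18,2),
  status STRING
) STORED AS ORC;

SET hive.vectorized.execution.enabled = true;

-- Assumes Hive 2.3+; older releases do not accept the VECTORIZATION keyword.
EXPLAIN VECTORIZATION SUMMARY
SELECT status, SUM(money)
FROM orders_orc
GROUP BY status;
```

On older releases a plain EXPLAIN can be used instead; a vectorized vertex is normally marked with "Execution mode: vectorized" in its output.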
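2. Optimize predicate pushdown for ORC tables
Push filters down to the ORC reader so that data is filtered as early as possible, which improves execution efficiency.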
SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
SET hive.optimize.index.filter=true;
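
One way to see whether a predicate reaches the storage layer is to look at the query plan. This is a sketch under assumptions: orders_orc and its columns are hypothetical, and the exact plan text varies between Hive versions.

```sql
SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
SET hive.optimize.index.filter=true;

-- orders_orc and its columns are hypothetical names.
-- With the settings above, the TableScan operator in the plan is expected
-- to carry the predicate (shown as filterExpr), so the ORC reader can skip
-- stripes and row groups whose min/max statistics rule out id = 42.
EXPLAIN
SELECT id, money
FROM orders_orc
WHERE id = 42;
```

Pushdown tends to pay off most when the data is sorted or clustered on the filter column, because the min/max ranges of the row groups then overlap less.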
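3. Optimize simple LIMIT queries
For simple SQL such as SELECT id, money FROM m limit 10; the following parameter makes Hive answer the query with a Fetch task. Simple column queries like this one then no longer launch a MapReduce job, which reduces resource usage.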
set hive.fetch.task.conversion=more;
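
A quick way to check whether the conversion applies is to look at the plan of the query from the text. This is only a sketch; the exact EXPLAIN output depends on the Hive version.

```sql
SET hive.fetch.task.conversion=more;

-- m is the table name used in the text above.
-- If the conversion applies, the plan should contain only a Fetch Operator
-- stage and no Map or Reduce vertices.
EXPLAIN
SELECT id, money FROM m LIMIT 10;
```

Where available, hive.fetch.task.conversion.threshold caps the input size for which this conversion is attempted, so a very large table may still run as a regular job.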
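4. Optimize count(1) queries
For queries of the form select count(*), count(1) from a table, Hive reads the row count directly from the statistics stored in the metastore instead of scanning the whole table, which greatly improves efficiency.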
set hive.compute.query.using.stats=true;
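
The shortcut only works when the table has usable statistics. A minimal sketch, assuming a hypothetical table orders_orc:

```sql
-- Hypothetical table name; gathers basic statistics (row count, file count, sizes).
ANALYZE TABLE orders_orc COMPUTE STATISTICS;

SET hive.compute.query.using.stats=true;

-- Can now be answered from metastore statistics, provided Hive still
-- considers them accurate for this table.
SELECT COUNT(*) FROM orders_orc;
```

If files are added to the table directory outside of Hive, the stored statistics can become stale, so it is worth re-running ANALYZE after such loads.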

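5. Tez settings for multi-tenancy
In HiveServer2, the following multi-tenant Tez setting improves performance by relaxing some of the isolation between JDBC queries and sharing sessions: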
tez.am.container.session.delay-allocation-millis=1000
6. Enable CBO (cost-based optimization)
set hive.cbo.enable=true;
set hive.stats.autogather=true;
set hive.compute.query.using.stats=true; -- required, so that statistics stored in the metastore are used
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
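
CBO can only use column statistics that have actually been collected. The sketch below shows one way to gather and inspect them; orders_orc and the column id are hypothetical names.

```sql
-- Column-level statistics (distinct values, min/max, null counts) for the optimizer.
ANALYZE TABLE orders_orc COMPUTE STATISTICS FOR COLUMNS;

-- Inspect what was collected for one column.
DESCRIBE FORMATTED orders_orc id;
```

For tables loaded through Hive with hive.stats.autogather=true, basic statistics are collected on insert, but column statistics may still need an explicit ANALYZE, depending on the Hive version.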
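7. Tez container size
Set the Tez container size to a multiple of the YARN container size (4 GB here). The value is in MB and takes no unit suffix:
set hive.tez.container.size=4096;

Set how much of this memory may be used for tables that are loaded as hash maps for map joins (about one third of the Tez container size is recommended):
SET hive.auto.convert.join.noconditionaltask.size=1436549120; -- 1370 MB
Note: this size is expressed in bytes, both here and in hive-site.xml.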
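
Because one setting is in MB and the other in bytes, it helps to work the conversion out explicitly. The queries below are only a worked calculation; they assume the 4096 MB container from the text.

```sql
-- One third of a 4096 MB container, rounded down to whole MB, then converted to bytes.
SELECT CAST(4096 / 3 AS INT)               AS third_of_container_mb,
       CAST(4096 / 3 AS INT) * 1024 * 1024 AS third_of_container_bytes;

-- The text above rounds the figure up to 1370 MB.
SELECT 1370 * 1024 * 1024 AS mapjoin_threshold_bytes;
```

One third of 4096 MB is 1365 MB, or 1431306240 bytes; the rounded 1370 MB corresponds to 1436549120 bytes.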

Dynamic partition settings:
set hive.exec.dynamic.partition.mode=nonstrict;

To avoid the risk of many partition columns producing many broken files in ORC:
set hive.optimize.sort.dynamic.partition=true;
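
A minimal sketch of a dynamic-partition load with these settings follows. The tables staging_orders and orders_by_day are hypothetical, and hive.exec.dynamic.partition=true is added on the assumption that it is not already enabled.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.optimize.sort.dynamic.partition=true;

-- Hypothetical tables: staging_orders is the source, orders_by_day the target.
CREATE TABLE IF NOT EXISTS orders_by_day (
  id    BIGINT,
  money DECIMAL(18,2)
) PARTITIONED BY (dt STRING) STORED AS ORC;

-- The dynamic partition column goes last in the SELECT list.
INSERT OVERWRITE TABLE orders_by_day PARTITION (dt)
SELECT id, money, dt
FROM staging_orders;
```

With sorting enabled, rows arrive at each reducer grouped by partition key, so only one ORC writer needs to be open at a time. That keeps memory pressure down when a single insert touches many partitions.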

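Size of the small table in a map join: the default is 10 MB; raise it to 20 MB (the value is in bytes). This is the same parameter as the one set above, so keep only the value you intend to use:
set hive.auto.convert.join.noconditionaltask.size=20971520; -- 20 MB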
8. Setting the number of reducers on Tez
set hive.tez.auto.reducer.parallelism=true; -- Tez estimates the data size and sets the parallelism automatically
set hive.tez.min.partition.factor=0.25;
set hive.tez.max.partition.factor=2.0;
set hive.exec.reducers.bytes.per.reducer=1073741824; -- 1 GB
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=16;
set hive.execution.engine=tez;
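
To get a feel for how these values interact, here is a rough worked calculation. It assumes a reduce stage receiving 100 GB, and it assumes that Hive first estimates the reducer count from bytes per reducer and that Tez may then adjust the count between the minimum and maximum partition factors; the exact behaviour depends on the Hive and Tez versions.

```sql
-- Assumed shuffle input: 100 GB (107374182400 bytes); 1 GB per reducer as set above.
SELECT CEIL(107374182400 / 1073741824)        AS estimated_reducers,
       CEIL(107374182400 / 1073741824) * 2.0  AS upper_bound_with_max_factor,
       CEIL(107374182400 / 1073741824) * 0.25 AS lower_bound_with_min_factor;
```

Under these assumptions the estimate is about 100 reducers, with room to go up to about 200 or down to about 25 at run time. The estimate is also capped by hive.exec.reducers.max.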
9. LLAP parameters
set hive.llap.io.use.lrfu=true;
hive.llap.io.cache.orc.size (default 1 GB)
hive.llap.io.threadpool.size (default 10 threads)
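10. Enable automatic switching to Hive local mode, so that operations on small amounts of data run locally
set hive.exec.mode.local.auto=true; -- enable local MR; the default is false

set hive.exec.mode.local.auto.inputbytes.max=50000000;
-- Maximum input size for local MR: local MR is used when the input is smaller than this value. The default is 134217728 (128 MB). It is set to 50000000 here; the value does not have to be a multiple of 128 MB.

set hive.exec.mode.local.auto.input.files.max=10;
-- Maximum number of input files for local MR: local MR is used when the number of input files is smaller than this value. The default is 4.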
11. Enable JVM reuse
set mapred.job.reuse.jvm.num.tasks=10; -- MapReduce property: each JVM is reused for up to 10 tasks
