1,USE TEZ
set hive.execution.engine=tez;
2,USE ORCFILE
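The notes give no statement for this step. A minimal sketch of moving data into an ORC-backed table, assuming a hypothetical text-format table `tweets_text` with columns `sender`, `topic` and `body` (the table names and the `body` column are illustrative, not from the original):

```sql
-- Hypothetical source table: tweets_text (sender STRING, topic STRING, body STRING).
-- Create an ORC-backed copy; STORED AS ORC selects the ORC file format.
CREATE TABLE tweets_orc (
  sender STRING,
  topic  STRING,
  body   STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");  -- codec choice is an assumption; ZLIB is the default

-- Rewrite the existing rows into the ORC table.
INSERT OVERWRITE TABLE tweets_orc
SELECT sender, topic, body
FROM tweets_text;
```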
3,USE VECTORIZATION
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
(
Vectorized query execution improves the performance of operations such as scans, aggregations, filters and joins by processing rows in batches of 1024 instead of one row at a time. Introduced in Hive 0.13, this feature significantly reduces query execution time and is enabled with the two parameter settings above.)
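One way to check whether a query is actually vectorized is to look at its plan. A sketch, assuming the `tweets` table used in the statistics step of these notes is stored as ORC and has a `topic` column. `EXPLAIN VECTORIZATION` is only available in newer Hive releases (2.3 and later, to my knowledge); on older versions a plain `EXPLAIN` shows "Execution mode: vectorized" for vectorized stages.

```sql
set hive.vectorized.execution.enabled = true;

-- On newer Hive versions this reports, per stage, whether vectorization was applied.
EXPLAIN VECTORIZATION
SELECT topic, count(*) AS cnt
FROM tweets
GROUP BY topic;
```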
4,COST BASED QUERY OPTIMIZATION
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
analyze table tweets compute statistics;
analyze table tweets compute statistics for columns sender, topic;
analyze table tweets compute statistics for columns;
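To confirm that the statistics were collected and to see how the optimizer uses them, something like the following can be run. This is only a sketch: it reuses the `tweets` table and `sender` column from the statements above, and the queries themselves are illustrative.

```sql
-- Show the column-level statistics (distinct count, nulls, etc.) stored for one column.
DESCRIBE FORMATTED tweets sender;

-- With hive.compute.query.using.stats=true, simple aggregates such as count(*)
-- can be answered from the stored statistics instead of scanning the table.
SELECT count(*) FROM tweets;

-- Inspect the plan the cost-based optimizer chose for a query.
EXPLAIN
SELECT sender, count(*) AS cnt
FROM tweets
GROUP BY sender;
```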
5,WRITE GOOD SQL
6,
https://streever.atlassian.net/wiki/display/HADOOP/Hive+Performance+Tips
7,
http://www.cnblogs.com/smartloli/p/4356660.html
8,JOIN A SMALL TABLE TO A LARGE TABLE
set hive.auto.convert.join=true;
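With this setting Hive can convert a join into a map-side join when one side is small enough to be loaded into memory. A sketch, assuming a hypothetical small lookup table `topics(topic, category)` joined to `tweets`; the threshold shown is the usual default for `hive.mapjoin.smalltable.filesize` and should be checked against your Hive version.

```sql
set hive.auto.convert.join=true;
-- Tables below this size (in bytes) are treated as "small"; 25000000 is the usual default.
set hive.mapjoin.smalltable.filesize=25000000;

-- topics is a hypothetical small dimension table; tweets is the large table.
SELECT t.sender, d.category, count(*) AS cnt
FROM tweets t
JOIN topics d ON t.topic = d.topic
GROUP BY t.sender, d.category;
```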
Hive performance optimization