Hive Optimization

Latest recommended article published on 2024-07-16 13:43:40

领悟大数据

Category: hive. Tags: hive, optimization

Article link: https://blog.csdn.net/weixin_42898914/article/details/85013885

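-》Compression
	(1) Enable compression of map-stage output
		Enable compression of the intermediate data Hive passes between MapReduce stages:
		set hive.exec.compress.intermediate=true;
		Enable compression of map output:
		set mapreduce.map.output.compress=true;
		Set the codec for map output:
		set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
	(2) Enable compression of reduce-stage output
		Enable compression of Hive's final query output:
		set hive.exec.compress.output=true;
		Enable compression of the MapReduce job's final output:
		set mapreduce.output.fileoutputformat.compress=true;
		Set the codec for the final output:
		set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
		Compress by block (this setting applies to SequenceFile output):
		set mapreduce.output.fileoutputformat.compress.type=BLOCK;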
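The output settings above can be tried end to end with a small session like the following. This is only a sketch: the table `src_log` and the directory `/tmp/src_log_snappy` are hypothetical names, and it assumes the Snappy native library is available on the cluster.

```sql
-- Turn on compression of the final query output with the Snappy codec
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Write the result of a query to an HDFS directory (hypothetical table and path)
insert overwrite directory '/tmp/src_log_snappy'
select * from src_log;

-- Typing a property name without a value prints its current setting
set hive.exec.compress.output;
```

If compression took effect, the files written to the directory should carry the codec's file extension rather than being plain text.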

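-》Storage
	Hive storage formats: TextFile / SequenceFile (these first two are row-oriented, which is fast when a query reads whole rows) / orc / Parquet (these last two are column-oriented, which reduces the amount of data a query has to read)
	Each orc stripe consists of: Index Data / Row Data / Stripe Footer
	
	Compression ratio:
	    orc > parquet > textFile
	Query speed:
	    orc is faster than textFile
	    (50s for orc vs. 54s for textFile)

	-- target table, stored as orc (the delimiter clause is not needed for orc)
	create table itstar_log(time string, host string)
	stored as orc;

	-- staging table: load data copies the file as-is and does not convert it,
	-- so the table that receives the raw log must be stored as textfile
	create table itstar(time string, host string)
	row format
	delimited fields
	terminated by '\t'
	stored as textfile;

	load data local inpath '/root/a.log' into table itstar;

	-- writing the rows through a query converts them to orc
	insert into itstar_log select * from itstar;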
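An orc table can also name its own compression codec through a table property, independent of the session-level output settings. A minimal sketch, assuming the `itstar_log` table from this section exists; the table name `itstar_log_snappy` is made up for illustration:

```sql
-- orc.compress accepts NONE, ZLIB or SNAPPY; ZLIB is the default
create table itstar_log_snappy(time string, host string)
stored as orc
tblproperties ("orc.compress"="SNAPPY");

-- Copy the rows so they are rewritten with the new codec
insert into table itstar_log_snappy select * from itstar_log;

-- Shows the storage format and table properties that were applied
describe formatted itstar_log_snappy;
```

Comparing the on-disk size of the two tables is a simple way to see how the codec choice affects the compression ratio for a given data set.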

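-》Group by optimization

	Grouping: in the MapReduce job, the map stage sends all rows with the same key to a single reducer, so when one key carries a very large number of rows, that reducer becomes the bottleneck.

	Solution: aggregate on the map side (combiner)
	set hive.map.aggr=true;

	Enable load balancing: the query runs as two jobs. The first distributes map output randomly across reducers for partial aggregation; the second groups the partial results by key for the final aggregation.
	set hive.groupby.skewindata=true;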
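The effect of the load-balancing setting can be pictured as a hand-written two-stage aggregation. The query below is an illustrative equivalent, not the plan Hive actually generates; it assumes the `itstar_log` table from the storage section, and the bucket count of 10 is an arbitrary choice.

```sql
-- Stage 2: combine the partial counts into one total per host
select host, sum(cnt) as cnt
from (
    -- Stage 1: count per (host, salt), so a hot host is spread over up to 10 groups
    select host, salt, count(*) as cnt
    from (
        -- Attach a random bucket number 0..9 to every row
        select host, cast(rand() * 10 as int) as salt
        from itstar_log
    ) salted
    group by host, salt
) partial
group by host;
```

The same pattern works for any aggregate that can be merged from partial results, such as count, sum, min and max.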

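-》Data skew
	(1) Avoid data skew up front
		Set a reasonable number of map tasks
		Merge small files before the map stage
		set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
		Set a reasonable number of reduce tasks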
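The three points in (1) map onto a handful of session settings. The values below are illustrative starting points, not recommendations from the original post, and should be tuned to the data volume and cluster.

```sql
-- Map count: cap the input split size in bytes (a smaller cap gives more map tasks)
set mapreduce.input.fileinputformat.split.maxsize=256000000;

-- Small files: merge small output files at the end of map-only and map-reduce jobs
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
-- Target size of a merged file, in bytes
set hive.merge.size.per.task=256000000;
-- Trigger a merge when the average output file is smaller than this, in bytes
set hive.merge.smallfiles.avgsize=16000000;

-- Reduce count: let Hive estimate it from the input size
set hive.exec.reducers.bytes.per.reducer=256000000;
set hive.exec.reducers.max=1009;
-- Alternatively, fix the reducer count for the job
-- set mapreduce.job.reduces=10;
```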

	(2) Fix data skew
		Aggregate on the map side (combiner)
		set hive.map.aggr=true;

		Enable load balancing
		set hive.groupby.skewindata=true;
	
	(3) JVM reuse (one JVM instance is reused for N tasks of the same job)
		mapred-site.xml
			mapreduce.job.jvm.numtasks
			10~20
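The same property can also be set for a single Hive session instead of cluster-wide in mapred-site.xml. A minimal sketch; whether the value has any effect depends on the Hadoop version and execution engine in use, so treat that as an assumption to verify on your cluster.

```sql
-- Reuse each task JVM for up to 10 tasks of the same job (session scope only)
set mapreduce.job.jvm.numtasks=10;

-- Print the value currently in effect
set mapreduce.job.jvm.numtasks;
```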
