Hive: testing storage formats and compression

青龙悟空

Category: hive

Article link: https://blog.csdn.net/qq_39674417/article/details/113756277

Testing storage and compression

Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC

Compression options for the ORC storage format:

Key	Default	Notes
orc.compress	ZLIB	high level compression (one of NONE, ZLIB, SNAPPY)
orc.compress.size	262,144	number of bytes in each compression chunk
orc.stripe.size	268,435,456	number of bytes in each stripe
orc.row.index.stride	10,000	number of rows between index entries (must be >= 1000)
orc.create.index	true	whether to create row indexes
orc.bloom.filter.columns	""	comma separated list of column names for which bloom filter should be created
orc.bloom.filter.fpp	0.05	false positive probability for bloom filter (must >0.0 and <1.0)

Note: all ORC file parameters are set in the TBLPROPERTIES clause of the HQL statement.
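As a rough illustration of how the keys in the table above end up in a TBLPROPERTIES clause, here is a minimal Python sketch that renders one from a dictionary. The helper name `render_tblproperties` and the 64 MB stripe size are assumptions made for this example only. They are not part of Hive and are not a recommended setting.

```python
def render_tblproperties(props):
    """Render a dict of table properties as a Hive TBLPROPERTIES clause.

    Hypothetical helper for illustration; Hive expects both keys and
    values as quoted strings, so numbers are converted to text.
    """
    pairs = ", ".join(f'"{key}"="{value}"' for key, value in props.items())
    return f"tblproperties({pairs})"


clause = render_tblproperties({
    "orc.compress": "SNAPPY",      # one of NONE, ZLIB, SNAPPY
    "orc.stripe.size": 67108864,   # 64 MB, an illustrative value only
})
print(clause)
```

The resulting string can be appended after `stored as orc` in a CREATE TABLE statement like the ones used in the tests that follow.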

1) Create an uncompressed ORC table

(1) Create the table

create table log_orc_none(
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as orc
tblproperties("orc.compress"="NONE");

(2) Insert the data

insert into log_orc_none select * from log_text;

(3) Check the size of the inserted data

hive (default)> dfs -du -h /user/hive/warehouse/log_orc_none/ ;

7.7 M /user/hive/warehouse/log_orc_none/000000_0

2) Create a SNAPPY-compressed ORC table

(1) Create the table

create table log_orc_snappy(
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as orc
tblproperties("orc.compress"="SNAPPY");

(2) Insert the data

insert into log_orc_snappy select * from log_text;

(3) Check the size of the inserted data

hive (default)> dfs -du -h /user/hive/warehouse/log_orc_snappy/ ;

3.8 M /user/hive/warehouse/log_orc_snappy/000000_0

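3) The ORC table created with default settings in the previous section had the following size after the data was loaded:

2.8 M /user/hive/warehouse/log_orc/000000_0

This is even smaller than the Snappy-compressed table. The reason is that ORC files use ZLIB compression by default, and ZLIB is based on the DEFLATE algorithm, which achieves a higher compression ratio than Snappy.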
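To put the three measurements side by side, the short Python sketch below expresses each size as a percentage of the uncompressed ORC file. The figures are the `dfs -du -h` values reported above. The variable names are only illustrative, and the result reflects this one test data set, not a general benchmark.

```python
# Sizes in MB as reported by `dfs -du -h` for the three ORC tables.
sizes_mb = {"NONE": 7.7, "SNAPPY": 3.8, "ZLIB": 2.8}

# Use the uncompressed ORC file as the baseline.
baseline = sizes_mb["NONE"]

# Size of each file as a percentage of the baseline, rounded to one decimal.
relative = {
    codec: round(size / baseline * 100, 1)
    for codec, size in sizes_mb.items()
}

for codec, pct in relative.items():
    print(f"{codec:<6} {sizes_mb[codec]:>4} MB  {pct:>5}% of uncompressed ORC")
```

On this data, Snappy leaves roughly half of the uncompressed size and ZLIB a little over a third.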

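4) Summary of storage formats and compression

In real projects, Hive table data is generally stored as ORC or Parquet, and the compression codec is generally Snappy or LZO.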
