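Compression ratio and compression speed are inversely related:
Compression ratio: bzip2 > gzip > lzo > snappy; compression speed: snappy > lzo > gzip > bzip2
Compression and decompression are CPU-intensive, so compression should be avoided when the machines are already under heavy load. If resources are insufficient, scaling out the cluster is a quick fix.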
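As a rough sketch of how this trade-off is usually applied in a Hive session, the settings below turn on compression for intermediate and final job output and pick Snappy, the fastest codec in the list above. This assumes Hive runs on the MapReduce engine and that the Snappy native libraries are installed on the cluster; neither is stated in these notes. These settings mainly affect text and sequence-file output, since ORC and Parquet tables choose their codec through table properties instead.

```sql
-- Assumption: MapReduce execution engine, Snappy codec available on all nodes.

-- Compress data passed between stages of a multi-stage query.
SET hive.exec.compress.intermediate=true;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the final query output written to the table or directory.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
```

On a CPU-bound cluster, leaving both `hive.exec.compress.*` flags at `false` avoids the extra compression cost described above.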
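CREATE TABLE statements in Hive (columnar storage + compression):
(1) ORC format
-- ZLIB is the default compression for ORC tables in Hive, so setting it explicitly is optional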
create table page_views_orc_zlib
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="ZLIB")
as select * from page_views;
create table page_views_orc_snappy
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
as select * from page_views;
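To see the effect of the two codecs, the table properties and the on-disk sizes of the two ORC tables can be compared. This is a minimal sketch: the paths assume the tables live in the `default` database under the default warehouse directory `/user/hive/warehouse`, which may differ on a given cluster.

```sql
-- Confirm which codec each table was created with.
SHOW TBLPROPERTIES page_views_orc_zlib("orc.compress");
SHOW TBLPROPERTIES page_views_orc_snappy("orc.compress");

-- Show storage format, location, and other table metadata.
DESCRIBE FORMATTED page_views_orc_snappy;

-- Compare on-disk sizes (paths are assumptions; take the real ones
-- from the Location field printed by DESCRIBE FORMATTED).
dfs -du -s -h /user/hive/warehouse/page_views_orc_zlib;
dfs -du -s -h /user/hive/warehouse/page_views_orc_snappy;
```

Given the ordering above, the ZLIB table would be expected to be smaller than the Snappy one, at the cost of slower writes.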
(2) Parquet format
create table pag