行式存储的优点:
同一行数据存放在同一个block块里面,select * from table_name;数据能直接获取出来;
INSERT/UPDATE比较方便
行式存储的缺点:
不同类型数据存放在同一个block块里面,压缩性能不好;
select id,name from table_name;这种类型的列查询,所有数据都要读取,而不能跳过。
列式存储的优点:
同类型数据存放在同一个block块里面,压缩性能好;
任何列都能作为索引。
列式存储的缺点:
select * from table_name;这类全表查询,需要数据重组;
INSERT/UPDATE比较麻烦。
create table page_views_orc_zlib
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="ZLIB")
as select * from page_views;
#默认是zlib,写不写都一样
create table page_views_orc_snappy
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
as select * from page_views;
create table page_views_parquet
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET
as select * from page_views;
set parquet.compression=gzip;
create table page_views_parquet_gzip
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET
as select * from page_views;
【来自@若泽大数据】