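Preface
This post measures how much HDFS storage the same data occupies when stored as textfile, as uncompressed Parquet, and as Parquet compressed with Snappy or gzip.
All tables are created directly in Impala as internal (managed) tables.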
Test
Storage format | Compression | File size | Table creation time |
---|---|---|---|
textfile | none | 3.0 G | 38.74s |
parquet | none | 1.5 G | 32.33s |
parquet | snappy | 709.3 M | 31.71s |
parquet | gzip | 471.5 M | 48.23s |
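
To make the table easier to compare, the short sketch below expresses each result as a fraction of the textfile baseline. The sizes and timings are copied from the table; the helper names `to_mb` and `summarize` are hypothetical, and the script assumes that "G" and "M" are binary units (1 G = 1024 M), which is how `hdfs dfs -du -h` typically prints sizes.

```python
# Sizes and creation times copied from the table above.
# Assumption: "G" and "M" are binary units (1 G = 1024 M).
RESULTS = [
    ("textfile", "none", "3.0 G", 38.74),
    ("parquet", "none", "1.5 G", 32.33),
    ("parquet", "snappy", "709.3 M", 31.71),
    ("parquet", "gzip", "471.5 M", 48.23),
]


def to_mb(size: str) -> float:
    """Convert a string such as '3.0 G' or '709.3 M' to megabytes."""
    value, unit = size.split()
    factor = {"M": 1.0, "G": 1024.0}[unit]
    return float(value) * factor


def summarize(results):
    """Return (format, codec, size_mb, fraction_of_baseline, seconds) rows.

    The first entry in `results` is used as the baseline.
    """
    baseline = to_mb(results[0][2])
    rows = []
    for fmt, codec, size, seconds in results:
        mb = to_mb(size)
        rows.append((fmt, codec, mb, mb / baseline, seconds))
    return rows


if __name__ == "__main__":
    for fmt, codec, mb, ratio, seconds in summarize(RESULTS):
        print(f"{fmt:9s}{codec:7s}{mb:8.1f} MB  {ratio:6.1%} of textfile  {seconds:6.2f}s")
```

Under that unit assumption, uncompressed Parquet is half the size of textfile, Parquet with Snappy is about 23.1% of it, and Parquet with gzip is about 15.3%. gzip produces the smallest files but has the longest creation time in this test, while Snappy is the fastest of the four.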
Snappy
The official description of Snappy:
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compre