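Compression essentially trades CPU time for I/O throughput and disk space. Unless there is a specific reason not to, compression should be enabled on each column family. There are three main algorithms: GZIP, LZO, and Snappy. The author recommends Snappy, because it has good encoding/decoding speed and an acceptable compression ratio.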
HBase comes with support for a number of compression algorithms that can be enabled at the column family level. Enabling compression is recommended unless you have a reason not to do so, for example, when storing already compressed content, such as JPEG images. For every other use case, compression usually yields better overall performance, because the CPU overhead of compressing and decompressing is smaller than the cost of reading more data from disk.
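The point about already compressed content can be illustrated with a small sketch using Python's standard gzip module. GZIP is one of the three codecs mentioned above; Snappy and LZO are not in the Python standard library, so they are not shown. The sample row format and the use of random bytes as a stand-in for JPEG data are assumptions made for illustration only; this is not how HBase itself invokes its codecs.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Return compressed size / original size using GZIP at its default level."""
    compressed = gzip.compress(data)
    return len(compressed) / len(data)


# Hypothetical text-like cell data with lots of repetition,
# similar in spirit to keys and values stored in a column family.
text_like = b"".join(
    b"row-%06d,colfam1:qual,value-%d\n" % (i, i % 10) for i in range(10_000)
)

# Stand-in for already compressed content such as JPEG images:
# random bytes have no redundancy left for GZIP to remove.
already_compressed = os.urandom(len(text_like))

print(f"text-like ratio:          {gzip_ratio(text_like):.2f}")
print(f"already-compressed ratio: {gzip_ratio(already_compressed):.2f}")
```

The text-like data shrinks to a small fraction of its original size, while the random bytes come out at roughly their original size (slightly larger, because of GZIP header and block overhead). In the second case the CPU spent on compression buys no I/O savings, which is why such column families are the usual exception.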
Available Codecs
You can choose from a fixed list of supported compression algorithms. They have different qualities when it comes to compression ratio, as well as CPU a