Choosing a Data Compression Method for HBase


Official documentation: http://hbase.apache.org/book.html#_which_compressor_or_data_block_encoder_to_use


The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications. 
In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at "Documenting Guidance on compression and codecs".

  • If you have long keys (compared to the values) or many columns, use a prefix encoder. FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.

  • If the values are large (and not precompressed, such as images), use a data block compressor.

  • Use GZIP for cold data, which is accessed infrequently. GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio.

  • Use Snappy or LZO for hot data, which is accessed frequently. Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high a compression ratio.

  • In most cases, enabling Snappy or LZO by default is a good choice, because they have a low performance overhead and provide space savings.

  • Before Google released Snappy in 2011, LZO was the default. Snappy has qualities similar to LZO but has been shown to perform better.
    Because Snappy outperforms LZO, it has since replaced LZO as the usual default choice.
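
The guidelines above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not part of HBase itself: the helper names `looks_precompressed` and `suggest_settings`, the 0.9 threshold, and the choice to skip compression entirely for precompressed values are all assumptions made for illustration. The returned keys and values mirror the column family attributes `DATA_BLOCK_ENCODING` and `COMPRESSION` (where `GZ` is the HBase name for GZIP).

```python
import zlib


def looks_precompressed(sample: bytes, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: treat a sample as precompressed if zlib
    shrinks it to no less than `threshold` of its original size."""
    if not sample:
        return False
    ratio = len(zlib.compress(sample)) / len(sample)
    return ratio >= threshold


def suggest_settings(long_keys_or_many_columns: bool,
                     precompressed: bool,
                     hot: bool) -> dict:
    """Map the guidelines to column family attribute values (illustrative)."""
    settings = {"DATA_BLOCK_ENCODING": "NONE", "COMPRESSION": "NONE"}

    # Long keys relative to values, or many columns: use a prefix encoder.
    if long_keys_or_many_columns:
        settings["DATA_BLOCK_ENCODING"] = "FAST_DIFF"

    # Assumption: precompressed values (e.g. images) gain nothing from a
    # block compressor, so compression is left off for them.
    if precompressed:
        return settings

    # Hot data favors low CPU cost; cold data favors a higher ratio.
    settings["COMPRESSION"] = "SNAPPY" if hot else "GZ"
    return settings
```

For example, `suggest_settings(True, looks_precompressed(sample), hot=True)` could be called with a sample of real cell values to get a starting point, which should still be validated against the actual workload.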


Configuring Snappy compression in HBase: http://blog.csdn.net/maomaosi2009/article/details/47019913
