Elasticsearch 2.2+: enabling compression with index.codec: best_compression

The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Note: in 2.1 and earlier this is an experimental feature! It is only stable from 2.2 onward!
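
As a minimal sketch of how the index.codec setting quoted above could be applied to a single index, the snippet below only builds the create-index request and does not send it. Because index.codec is a static index setting, it has to be supplied at index creation time or on a closed index. The host address and the index name logs-archive are hypothetical placeholders, not values from the original post.

```python
import json


def create_index_request(index_name, host="http://localhost:9200"):
    """Build (method, url, body) for creating an index that uses best_compression.

    The host and index name are illustrative assumptions; adjust them for
    your own cluster.
    """
    body = {"settings": {"index": {"codec": "best_compression"}}}
    return "PUT", f"{host}/{index_name}", json.dumps(body)


if __name__ == "__main__":
    method, url, body = create_index_request("logs-archive")
    print(method, url)
    print(body)
```

The printed method, URL, and body can be passed to any HTTP client (for example curl) to create the index.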

 

Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch

 

The data below is taken from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0

The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details. 

Structured data file. Original file size: 67644119

| Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |

Semi-structured data file. Original file size: 75037027

| Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |

With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters. 
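
The ratios quoted above can be reproduced from the raw sizes. The sketch below assumes that the expansion ratio is index size divided by original file size, and that "Impact of DEFLATE" is the relative change in index size when moving from LZ4 to DEFLATE; both readings are inferred from the published figures rather than stated in the source. The sizes are copied from the structured data file results, and the last digit may differ from the published values because of rounding.

```python
RAW_SIZE = 67644119  # original size of the structured data file

# (test, index size with LZ4, index size with DEFLATE)
ROWS = [
    ("1", 63047579, 53131592),
    ("2", 48271433, 38327106),
    ("3", 38920800, 29014796),
    ("3b", 65382872, 49532858),
    ("4", 43083702, 32063602),
]


def expansion_ratio(index_size, raw_size):
    """Indexed data size relative to raw data size."""
    return index_size / raw_size


def deflate_impact(lz4_size, deflate_size):
    """Relative change in index size when switching from LZ4 to DEFLATE."""
    return (deflate_size - lz4_size) / lz4_size


if __name__ == "__main__":
    for test, lz4, deflate in ROWS:
        print(
            test,
            round(expansion_ratio(lz4, RAW_SIZE), 3),
            round(expansion_ratio(deflate, RAW_SIZE), 3),
            round(deflate_impact(lz4, deflate), 3),
        )
```

Swapping in the raw size and index sizes measured on your own data gives the equivalent figures for your mapping.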

As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.

Conclusion

There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.

 

Reposted from: https://www.cnblogs.com/bonelee/p/6269582.html
