3. elasticsearch 汇总排序问题剖析

        es主要解决简单大量数据的搜索问题,不擅长进行数据的汇总聚合计算。因为es是分布式搜索,对于数据的汇总聚合并且排序存在误差的问题。我们进行问题的复现、问题的原因和解决的方法。

1.es 汇总排序问题剖析

1.1 创建erp信息索引

put os_erp_data

{

"settings": {

"number_of_shards": "2",

"number_of_replicas": "1"

},

"mappings": {

"os_erp_data_type": {

"properties": {

"erp_id": {

"type": "keyword"

},

"order_d7_sale_qtty": {

"type": "integer"

}

}

}

}

}

1.2 索引插入文档

分片1

os_erp_data/os_erp_data_type?routing=a

{"erp_id":"erp_14","order_d7_sale_qtty":22491}

{"erp_id":"erp_8","order_d7_sale_qtty":21632}

{"erp_id":"erp_4","order_d7_sale_qtty":21502}

{"erp_id":"erp_15","order_d7_sale_qtty":21234}

{"erp_id":"erp_26","order_d7_sale_qtty":20731}

{"erp_id":"erp_10","order_d7_sale_qtty":20306}

{"erp_id":"erp_17","order_d7_sale_qtty":19942}

{"erp_id":"erp_9","order_d7_sale_qtty":19418}

{"erp_id":"erp_25","order_d7_sale_qtty":19191}

{"erp_id":"erp_16","order_d7_sale_qtty":18797}

{"erp_id":"erp_6","order_d7_sale_qtty":18306}

{"erp_id":"erp_3","order_d7_sale_qtty":18166}

{"erp_id":"erp_22","order_d7_sale_qtty":17669}

{"erp_id":"erp_24","order_d7_sale_qtty":16971}

{"erp_id":"erp_27","order_d7_sale_qtty":16911}

{"erp_id":"erp_18","order_d7_sale_qtty":16758}

{"erp_id":"erp_23","order_d7_sale_qtty":16527}

{"erp_id":"erp_13","order_d7_sale_qtty":15705}

{"erp_id":"erp_7","order_d7_sale_qtty":15251}

{"erp_id":"erp_11","order_d7_sale_qtty":15019}

{"erp_id":"erp_12","order_d7_sale_qtty":14387}

{"erp_id":"erp_2","order_d7_sale_qtty":14329}

{"erp_id":"erp_30","order_d7_sale_qtty":14023}

{"erp_id":"erp_5","order_d7_sale_qtty":13421}

{"erp_id":"erp_29","order_d7_sale_qtty":13309}

{"erp_id":"erp_1","order_d7_sale_qtty":12574}

{"erp_id":"erp_28","order_d7_sale_qtty":12189}

{"erp_id":"erp_19","order_d7_sale_qtty":11673}

{"erp_id":"erp_21","order_d7_sale_qtty":11460}

{"erp_id":"erp_20","order_d7_sale_qtty":10576}

分片2

os_erp_data?routing=b

{"erp_id":"erp_19","order_d7_sale_qtty":168589}

{"erp_id":"erp_21","order_d7_sale_qtty":164705}

{"erp_id":"erp_16","order_d7_sale_qtty":162088}

{"erp_id":"erp_9","order_d7_sale_qtty":161579}

{"erp_id":"erp_8","order_d7_sale_qtty":160459}

{"erp_id":"erp_28","order_d7_sale_qtty":159775}

{"erp_id":"erp_15","order_d7_sale_qtty":158124}

{"erp_id":"erp_26","order_d7_sale_qtty":156609}

{"erp_id":"erp_24","order_d7_sale_qtty":156208}

{"erp_id":"erp_11","order_d7_sale_qtty":153976}

{"erp_id":"erp_4","order_d7_sale_qtty":153479}

{"erp_id":"erp_23","order_d7_sale_qtty":152833}

{"erp_id":"erp_12","order_d7_sale_qtty":152052}

{"erp_id":"erp_20","order_d7_sale_qtty":150718}

{"erp_id":"erp_29","order_d7_sale_qtty":150320}

{"erp_id":"erp_17","order_d7_sale_qtty":149352}

{"erp_id":"erp_10","order_d7_sale_qtty":148473}

{"erp_id":"erp_2","order_d7_sale_qtty":147812}

{"erp_id":"erp_5","order_d7_sale_qtty":147791}

{"erp_id":"erp_3","order_d7_sale_qtty":146158}

{"erp_id":"erp_6","order_d7_sale_qtty":145604}

{"erp_id":"erp_7","order_d7_sale_qtty":145439}

{"erp_id":"erp_18","order_d7_sale_qtty":144984}

{"erp_id":"erp_13","order_d7_sale_qtty":144784}

{"erp_id":"erp_14","order_d7_sale_qtty":144004}

{"erp_id":"erp_27","order_d7_sale_qtty":143564}

{"erp_id":"erp_30","order_d7_sale_qtty":140984}

{"erp_id":"erp_22","order_d7_sale_qtty":140309}

{"erp_id":"erp_25","order_d7_sale_qtty":133879}

{"erp_id":"erp_1","order_d7_sale_qtty":133233}

1.3 问题复现

os_erp_data/_search

{

"size":0,

"aggs":{

"erp_name":{

"terms":{

"field":"erp_id",

"size" :5,

"order":{

"order_d7_sale_qtty_sum":"desc"

}

}

,

"aggs":{

"order_d7_sale_qtty_sum":{

"sum":{

"field":"order_d7_sale_qtty"

}

}}

}

}

}

总结:

  1. 在聚合排序的操作中,实际上是每个分片自身先做汇总排序,然后将每个分片的前17名放在一起再次聚合,再排序,将排序后的前5条记录作为结果返回;
  2. 为什么用每个分片的前17名?这是用官方给出的算式得来的,地址是:Terms Aggregation | Elasticsearch Reference [6.1] | Elastic ,如下图:

shard_size =size*1.5+10。

1.4 调整 shard_size 参数,设置过大会有性能问题

os_erp_data/_search

{

"size":0,

"aggs":{

"erp_name":{

"terms":{

"field":"erp_id",

"size" :5,

"shard_size": 28,

"order":{

"order_d7_sale_qtty_sum":"desc"

}

}

,

"aggs":{

"order_d7_sale_qtty_sum":{

"sum":{

"field":"order_d7_sale_qtty"

}

}}

}

}

}

两种解决方式:调整 shard_size 大小 和 需要统计排序的数据放到一个分片中。

大量的数据实时聚合计算,建议更换为 clickhouse,更适合做这些事情。

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值