一次ES多字段分组分页的设计与开发
背景
- 需要根据erpGoodsId、batchNum、wmsProductCode三个字段进行分组,并完成分页查询。
- 数据量:上千万
- 我的ES版本6.2.4
思考与解决**
- 几千万的数据,数据库分组肯定受不了,哪怕是取10条,mysql也会走全表,一次分组要20多S。好在我们这个数据已经在ES上了,我决定使用ES进行分组。
- 这是一个分页查询功能,那么就要考虑深度分页的问题(比如我是几千万数据,访问最后一页的情况),ES分组是没有分页的,一般是使用Aggregations指定分组结果大小进行分组,然后手动分页,但是这种深度分页肯定就搞不定了,我们不可能一下子查几百万条的分组数据。
- 我看了下官方文档,composite-aggregation是最符合我的需求,比Aggregations更强,因为它可以对多字段进行分组且分组后的响应数据也非常友好,响应速度也快。而且他有一个after属性,专门用于分页,类似search
after查询,一直向后滚动查询。但是分组后的总数是没有返回的,所以经过沟通,这个分页查询每次最大只给用户分100页,因为对于用户来说100页的数据已经满足他们的需求了。于是我们的分组查询,每次查询1000条,然后对分组结果进行手动分页。
补充:
对响应速度更有追求的,可以使用after进行滚动查询,当然就是处理起来麻烦一点,每次请求都要带上上一页的最后一条数据,且只能向后滚动。我的解决方案是:前端每次点击下一页时,有一个数组下标,会存储这个下一页的最后一条数据,当然第一页为空,如果搜索条件或者每页数量变了,都会清空这个数组,从第一页重新查询。
ES语句如下:
{
"query": {
"bool": {
"must": [
{
"match": {
"supplierNo": "C2217"
}
},
{
"match": {
"companyCode": "10101"
}
}
]
}
},
//设置为0代表只需要聚合的结果
"size": 0,
"track_total_hits": false,
"aggs": {
"my_buckets": {
"composite": {
//存上一个位置游标,查找下一页,如果往前翻页,前端给after,每次翻页,前端存一个after数组,下标对应页数
// "after": {
// "erpGoodsId": "P100009",
// "batchNum": "2201055103"
// },
"size": 2,
"sources": [
{
"erpGoodsId": {
"terms": {
"field": "erpGoodsId",
"order": "asc"
}
}
},
{
"batchNum": {
"terms": {
"field": "batchNum",
"order": "asc"
}
}
},
{
"wmsProductCode": {
"terms": {
"field": "wmsProductCode",
//为空以null展示
"missing": "null",
"order": "asc"
}
}
}
]
},
"aggs": {
"filter_empty_dest": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count != null"
}
}
}
}
}
}
java代码语句如下:
里只是完成分组查询,没有进行手动分页
@Override
public List<PageSupplierSaleFlowVo> pageSupplierSaleFlowIndex(PageSupplierSaleFlowRequest request) {
List<PageSupplierSaleFlowVo> result = new ArrayList<>();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
//销售客户
if(StringUtils.isNotBlank(request.getSupplierNo())){
boolQueryBuilder.must(QueryBuilders.matchQuery(SupplierNewFlowIndex.SUPPLIER_NO,request.getSupplierNo()));
}
//站点
if(StringUtils.isNotBlank(request.getCompanyCode())){
boolQueryBuilder.must(QueryBuilders.matchQuery(SupplierNewFlowIndex.COMPANY_CODE,request.getCompanyCode()));
}
//商品名称
if(StringUtils.isNotBlank(request.getProductNm())){
boolQueryBuilder.must(QueryBuilders.wildcardQuery(SupplierNewFlowIndex.PRODUCT_NM,"*"+request.getProductNm()+"*"));
}
//生产厂家
if(StringUtils.isNotBlank(request.getFactory())){
boolQueryBuilder.must(QueryBuilders.matchQuery(SupplierNewFlowIndex.FACTORY,request.getFactory()));
}
//商品批号
if(StringUtils.isNotBlank(request.getBatchNum())){
boolQueryBuilder.must(QueryBuilders.wildcardQuery(SupplierNewFlowIndex.BATCH_NUM,"*"+request.getBatchNum()+"*"));
}
//路单ID
if(StringUtils.isNotBlank(request.getWmsWayBillId())){
boolQueryBuilder.must(QueryBuilders.matchQuery(SupplierNewFlowIndex.WMS_WAYBILL_ID,request.getWmsWayBillId()));
}
//记账时间
if(Objects.nonNull(request.getFlowStartDate())){
boolQueryBuilder.must(QueryBuilders.rangeQuery(SupplierNewFlowIndex.FLOW_DATE).gte(request.getFlowStartDate().getTime()));
}
if(Objects.nonNull(request.getFlowEndDate())){
boolQueryBuilder.must(QueryBuilders.rangeQuery(SupplierNewFlowIndex.FLOW_DATE).lte(request.getFlowEndDate().getTime()));
}
//销售类型,只看销售
boolQueryBuilder.must(QueryBuilders.matchQuery(SupplierNewFlowIndex.FLOW_TYPE, SupplierFlowTypeEnum.SALE.toValue()));
TermsValuesSourceBuilder erpAgg = new TermsValuesSourceBuilder(SupplierNewFlowIndex.ERP_GOODS_ID).field(SupplierNewFlowIndex.ERP_GOODS_ID).order(SortOrder.ASC);
TermsValuesSourceBuilder batchAgg = new TermsValuesSourceBuilder(SupplierNewFlowIndex.BATCH_NUM).field(SupplierNewFlowIndex.BATCH_NUM).order(SortOrder.ASC);
TermsValuesSourceBuilder wmsAgg = new TermsValuesSourceBuilder(SupplierNewFlowIndex.WMS_PRODUCT_CODE).field(SupplierNewFlowIndex.WMS_PRODUCT_CODE).missing("null").order(SortOrder.ASC);;
List<CompositeValuesSourceBuilder<?>> sources = new ArrayList<>();
sources.add(erpAgg);
sources.add(batchAgg);
sources.add(wmsAgg);
//每次分组最大1000条
CompositeAggregationBuilder compositeAgg = new CompositeAggregationBuilder("my_buckets", sources).size(1000);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(boolQueryBuilder);
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(compositeAgg);
SearchRequestBuilder searchRequestBuilder = elasticsearchTemplate.getClient().prepareSearch();
searchRequestBuilder.setIndices("supplier_flow_index_new");
searchRequestBuilder.setQuery(boolQueryBuilder);
searchRequestBuilder.setSize(0);
searchRequestBuilder.setSource(searchSourceBuilder);
//发起请求
SearchResponse response = searchRequestBuilder.execute().actionGet();
CompositeAggregation compositeAggResult = response.getAggregations().get("my_buckets");
for (CompositeAggregation.Bucket bucket : compositeAggResult.getBuckets()) {
String erpGoodsId = bucket.getKey().get(SupplierNewFlowIndex.ERP_GOODS_ID).toString();
String batchNum = bucket.getKey().get(SupplierNewFlowIndex.BATCH_NUM).toString();
String wmsProductCode = bucket.getKey().get(SupplierNewFlowIndex.WMS_PRODUCT_CODE).toString();
result.add(new PageSupplierSaleFlowVo(erpGoodsId,batchNum,wmsProductCode));
}
return result;
}