Elasticsearch使用msearch提高聚合效率(与search检索对比)

描述

数据量共约3000万+,在使用es进行term聚合的时候,发现执行耗费时间巨大,因此采用了msearch的检索方式

多搜索接口编辑 多搜索 API 从单个 API 请求执行多个搜索。 请求的格式类似于批量 API 格式,并使用 换行符分隔的 JSON (NDJSON) 格式。
结构类型于下

GET my-index-000001/_msearch
{ }
{"query" : {"match" : { "message": "this is a test"}}}
{"index": "my-index-000002"}
{"query" : {"match_all" : {}}}

kibana查询操作

使用kibana进行msearch操作

POST cqu_dev_journal_paper/_msearch
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"dfs_query_then_fetch","ccs_minimize_roundtrips":true}
{"from":10,"size":10,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id","sid","title","author","journalId","journalVolumeId","publisherName","doi","pageInfo"],"excludes":[]},"sort":[{"createTime":{"order":"desc"}}],"track_total_hits":2147483647}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"year":{"terms":{"field":"year","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"subject":{"terms":{"field":"subject","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"publisherName":{"terms":{"field":"publisherName.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"journalId":{"terms":{"field":"journalId","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"publishCountry":{"terms":{"field":"publishCountry.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"language":{"terms":{"field":"language.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}


语句解析

msearch查询的数据结构如下,

header\n
body\n
header\n
body\n

因此下面语句中第一个json就是header,第二个是请求的body,

{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"dfs_query_then_fetch","ccs_minimize_roundtrips":true}
{"from":10,"size":10,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id","sid","title","author","journalId","journalVolumeId","publisherName","doi","pageInfo"],"excludes":[]},"sort":[{"createTime":{"order":"desc"}}],"track_total_hits":2147483647}

官网对于请求的说明:
地址: es 多搜索接口

请求正文包含以换行符分隔的搜索列表和 搜索对象。
<header> (必填,对象) 用于限制或更改搜索的参数。
此对象对于每个搜索正文都是必需的,但可以为空 () 或空白 线。{} 

 <body>
(可选,对象) 包含搜索请求的参数:

对象的属性<body>
aggregations
(可选,聚合对象) 您希望在搜索期间运行的聚合。请参阅聚合。
query
(可选,查询 DSL 对象)您希望在 搜索。响应中将返回与此查询匹配的命中。
from
(可选,整数) 返回命中的起始偏移量。默认值为 。0
size
(可选,整数) 要返回的命中数。默认值为 。10

代码操作验证对比

使用SpringDataElaticsearch进行对es检索的操作,代码如下:

MultiSearch 检索

  //MultiSearch 查询
        List<Query> queryList = new ArrayList<>();

        NativeSearchQueryBuilder listNativeSearchQueryBuilder = new NativeSearchQueryBuilder();
        //返回数据总数量
        listNativeSearchQueryBuilder.withTrackTotalHits(true);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.must(QueryBuilders.existsQuery("title"));

        //设置检索词和聚合项的查询条件
        listNativeSearchQueryBuilder.withQuery(boolQueryBuilder);

        //构造分页排序
        Pageable pageable = PageRequest.of(1, 10, Sort.by(Sort.Direction.DESC, "createTime"));
        listNativeSearchQueryBuilder.withPageable(pageable);

        //设置返回字段
        listNativeSearchQueryBuilder.withFields("id", "sid", "title", "author", "journalId", "journalVolumeId", "publisherName", "doi", "pageInfo");

        //列表查询的query放入查询列表
        queryList.add(listNativeSearchQueryBuilder.build());

        HashMap<String, String> aggFieldMap = new HashMap<>();
        aggFieldMap.put("subject", "subject");
        aggFieldMap.put("year", "year");
        aggFieldMap.put("journalId", "journalId");
        aggFieldMap.put("publisherName", "publisherName.raw");
        aggFieldMap.put("language", "language.raw");
        aggFieldMap.put("publishCountry", "publishCountry.raw");

        NativeSearchQueryBuilder aggNativeSearchQueryBuilder;
        for (String name : aggFieldMap.keySet()) {
            aggNativeSearchQueryBuilder = new NativeSearchQueryBuilder();
            aggNativeSearchQueryBuilder.withQuery(boolQueryBuilder);
            //表示不返回列表数据
            aggNativeSearchQueryBuilder.withPageable(PageRequest.of(0, 1));
            aggNativeSearchQueryBuilder.withSearchType(SearchType.QUERY_THEN_FETCH);
            aggNativeSearchQueryBuilder.withFields("id");
            aggNativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms(name).field(aggFieldMap.get(name))
                    .collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST).size(50));
            queryList.add(aggNativeSearchQueryBuilder.build());
        }

        long  pageStart = System.currentTimeMillis();
        //multiSearch:允许在一次请求中执行多个查询操作,并将查询结果一起返回
        List<SearchHits<JournalPaperEsBean>> searchHitsList = elasticsearchOperations.multiSearch(queryList, JournalPaperEsBean.class);
        System.out.println("==========es agg search:" + (System.currentTimeMillis() - pageStart));

    }

执行耗费时间:3723ms

search检索

@Test
    void test() {
        //search 查询
        NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
        //返回数据总数量
        nativeSearchQueryBuilder.withTrackTotalHits(true);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.must(QueryBuilders.existsQuery("title"));

        //设置检索词和聚合项的查询条件
        nativeSearchQueryBuilder.withQuery(boolQueryBuilder);

        //构造分页排序
        Pageable pageable = PageRequest.of(1, 10, Sort.by(Sort.Direction.DESC, "createTime"));
        nativeSearchQueryBuilder.withPageable(pageable);

        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("subject").field("subject").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("year").field("year").size(50).order(BucketOrder.key(false)));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("journalId").field("journalId").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("publisherName").field("publisherName.raw").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("language").field("language.raw").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("publishCountry").field("publishCountry.raw").size(50));

        //设置返回字段
        nativeSearchQueryBuilder.withFields("id", "sid", "title", "author", "journalId", "journalVolumeId", "publisherName", "doi", "pageInfo");

        long start = System.currentTimeMillis();
        //执行检索操作
        SearchHits<JournalPaperEsBean> searchHits = elasticsearchOperations.search(nativeSearchQueryBuilder.build(), JournalPaperEsBean.class);

        System.out.println("=====agg:" + (System.currentTimeMillis() - start));
    }

耗时:15298ms

对比可以发现:使用msearch比search效率提高了近5倍!

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

徐州蔡徐坤

又要到饭了兄弟们

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值