While using an Elasticsearch aggregation query to group documents by a field, I ran into the following behavior:
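import java.util.Map;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// Build the filter (the SQL-style WHERE clause): create_date between the two timestamps, both ends inclusive
BoolQueryBuilder qbs = QueryBuilders.boolQuery();
QueryBuilder qb1 = QueryBuilders.rangeQuery("create_date").from("2018-07-01 00:00:00").to("2018-08-23 00:00:00").includeLower(true).includeUpper(true);
BoolQueryBuilder qbs1 = QueryBuilders.boolQuery().must(qb1);
qbs.must(qbs1);
// client is an already-connected Elasticsearch Client (setup not shown)
SearchRequestBuilder requestBuilder = client.prepareSearch("user_login_detail")
        .setTypes("login_detail");
requestBuilder.setQuery(qbs);
requestBuilder.setFrom(0);
// Ask for up to 1,000,000 matching documents to be returned as hits alongside the aggregation
requestBuilder.setSize(1000000);
// GroupBy is a helper class from the author's own code, not part of the Elasticsearch client;
// judging by its arguments, it adds a count aggregation named "count_name" grouped on user_name
GroupBy groupBy = new GroupBy(requestBuilder, "count_name", "user_name", true);
groupBy.addCountAgg("count_name", "user_name");
Map<String, Object> groupbyResponse = groupBy.getGroupbyResponse();
for (Map.Entry<String, Object> entry : groupbyResponse.entrySet()) {
    String bucketKey = entry.getKey(); // one group key, presumably a user_name value
}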
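The query above reads users' login records: it fetches one million documents and groups them, against an index holding more than 40 million documents. This grouped query took 38 s; after removing the setSize call, the same query took 1.8 s. The aggregation itself is computed over every document that matches the query either way, so the grouping does not change; the difference is that with size set, ES must also fetch and return up to one million matching documents as hits in the result set. Without setSize, ES falls back to its default size of 10 hits, so the response consists almost entirely of the aggregation results. Far less data is loaded, serialized and sent back, which is where the speedup comes from.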
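For comparison, here is a minimal sketch of the same grouping written as the JSON body of a _search request instead of going through the GroupBy helper. It uses plain string formatting so it runs without the Elasticsearch client on the classpath; the class name AggOnlyRequest and its body method are hypothetical. It assumes user_name is mapped as a keyword field (a terms aggregation on an analyzed text field would need a keyword subfield instead) and that the create_date mapping accepts the yyyy-MM-dd HH:mm:ss format. The bucket cap of 10000 is an arbitrary illustrative value, needed because a terms aggregation returns only the top 10 buckets by default.

```java
import java.util.Locale;

// Hypothetical helper: builds the JSON body of a _search request that filters on
// create_date and groups by user_name with a terms aggregation named "count_name".
public final class AggOnlyRequest {

    private AggOnlyRequest() {}

    // hitSize:    how many matching documents to return as hits (0 = aggregation results only)
    // bucketSize: maximum number of terms buckets (illustrative; the terms default is 10)
    public static String body(int hitSize, int bucketSize) {
        return String.format(Locale.ROOT,
                "{\"size\":%d,"
                // Inclusive range, equivalent to includeLower(true).includeUpper(true)
                + "\"query\":{\"bool\":{\"must\":[{\"range\":{\"create_date\":"
                + "{\"gte\":\"2018-07-01 00:00:00\",\"lte\":\"2018-08-23 00:00:00\"}}}]}},"
                // Assumes user_name is a keyword field
                + "\"aggs\":{\"count_name\":{\"terms\":{\"field\":\"user_name\",\"size\":%d}}}}",
                hitSize, bucketSize);
    }

    public static void main(String[] args) {
        // Pattern from the post: up to one million hits come back alongside the buckets
        System.out.println(body(1_000_000, 10_000));
        // Aggregation-only: the buckets are still computed over every matching document
        System.out.println(body(0, 10_000));
    }
}
```

Sending the second body to POST /user_login_detail/_search should return the count_name buckets with an empty hits array; with the Java client used above, the equivalent change is requestBuilder.setSize(0). Note also that Elasticsearch by default rejects requests where from + size exceeds index.max_result_window (10000), so the one-million-hit version only runs on an index where that setting has been raised.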