How to achieve the effect of SQL's HAVING clause with Elasticsearch aggregations

Copyright notice: this is an original post by the author and may not be reposted without permission. Contact: 351605040 https://blog.csdn.net/Arvinzr/article/details/79228229

In real-world development you will inevitably hit business scenarios where you need to aggregate documents into buckets, compute a per-bucket count, and then filter the buckets by that count, which is exactly what SQL's HAVING clause does.

1. Using minDocCount. The code below shows the idea; adapt it to your own business scenario.

// Working example
SearchRequestBuilder search = transportClient.prepareSearch("bigdata_idx_2").setTypes("captureCompare");
// Time-range filter aggregation (defined here; attach it as a sub-aggregation or via setQuery as needed)
FilterAggregationBuilder sub = AggregationBuilders.filter("channel_longitudeC")
        .filter(QueryBuilders.rangeQuery("fcmp_time").from(startTime).to(endTime));
// Group by fcmp_fobj_id; order buckets by the count sub-aggregation
TermsBuilder tb = AggregationBuilders.terms("fcmp_fobj_id").field("fcmp_fobj_id")
        .valueType(Terms.ValueType.STRING)
        .order(Terms.Order.compound(
                Terms.Order.aggregation("channel_longitudeC", false) // by count, descending
                // a second Terms.Order could be added here to break ties
        ));
// Count sub-aggregation (value_count needs a field to count on)
ValueCountBuilder sb = AggregationBuilders.count("channel_longitudeC").field("fcmp_fobj_id");
// minDocCount(400) keeps only buckets with at least 400 documents -- the HAVING effect
tb.subAggregation(sb).minDocCount(400);

// Attach the terms aggregation to the main request body
// search.setPostFilter()
search.addAggregation(tb);
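The `minDocCount` threshold on a terms aggregation is the direct analogue of `HAVING COUNT(*) >= N`. As a minimal, self-contained sketch of that semantics in plain Java (no Elasticsearch dependency; the keys and the threshold of 3 are made up for illustration), group the keys, count each bucket, and drop buckets below the threshold:

```java
import java.util.*;
import java.util.stream.*;

public class HavingSketch {
    public static void main(String[] args) {
        // Hypothetical stream of bucket keys (think fcmp_fobj_id values)
        List<String> events = Arrays.asList("a", "a", "a", "b", "b", "c");
        int minDocCount = 3; // analogous to minDocCount(400) above

        // Group by key, count each bucket, keep only buckets meeting the
        // threshold, then sort by count descending (as the Terms.Order does)
        Map<String, Long> buckets = events.stream()
                .collect(Collectors.groupingBy(k -> k, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= minDocCount)
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (x, y) -> x, LinkedHashMap::new));

        System.out.println(buckets); // only "a" (count 3) survives
    }
}
```

The difference in Elasticsearch is that this filtering happens on the data nodes, so buckets below the threshold are never shipped back to the client.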

2. A slightly more complex scenario: while aggregating, you also need the other fields of a representative document in each bucket. That is what the Top Hits aggregation is for.

Similar SQL: select *, count(*) from XXX group by a ...

SearchResponse response = null;
SearchRequestBuilder responsebuilder = transportClient.prepareSearch("syrk_bigdata_capturecmp_passer_idx")
        .setTypes("captureCompare").setFrom(0).setSize(100000);
AggregationBuilder aggregation = AggregationBuilders
        .terms("agg")
        .field("idNumb")
        .subAggregation(
                AggregationBuilders.topHits("top").setFrom(0)
                        .setSize(1)) // one representative hit per bucket
        .size(100000);
response = responsebuilder.setQuery(QueryBuilders.boolQuery()
        .must(QueryBuilders.rangeQuery("fcapTime").from(Long.valueOf(startTime)).to(Long.valueOf(endTime))))
        .addSort("idNumb", SortOrder.ASC)
        .addAggregation(aggregation) // .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setExplain(true).execute().actionGet();
SearchHits hits = response.getHits(); // do NOT read the final results from these outer hits
Terms agg = response.getAggregations().get("agg");
long end = System.currentTimeMillis(); // 'start' is assumed to have been recorded before the query
System.out.println("ES run time: " + (end - start) + "ms");
/** Clear the current day's data before inserting, to avoid duplicates **/
SyrkRegionFcapperPasserStatistics temp = new SyrkRegionFcapperPasserStatistics();
temp.setDate(Long.valueOf(startTime));
try {
    syrkRegionFcapperPasserStatisticsService.deletePasser(temp);
    for (Terms.Bucket entry : agg.getBuckets()) {
        String key = (String) entry.getKey(); // bucket key
        long docCount = entry.getDocCount(); // doc count for this bucket

        // Read the top_hits sub-aggregation of each bucket
        TopHits topHits = entry.getAggregations().get("top");
        for (SearchHit hit : topHits.getHits().getHits()) {
            compareUuid = (String) hit.getSource().get("idNumb");
        }
        /** Read the data and write it to MySQL **/
    }
    logger.info("All analysis data has been inserted; date is " + startTime);
} catch (Exception e) {
    logger.error("Analysis of result data failed; date is " + startTime, e);
}


Take the per-bucket total from docCount; read the other field values from the hits of the top_hits sub-aggregation.

Important: do not read results from the outermost hits. The outer hits and the aggregation's hits will differ in number, so iterating over the outer hits leads to inconsistent data.
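The top_hits pattern above boils down to keeping, for each bucket, the document count plus one representative document (because of `setSize(1)`), which are exactly the two values the loop reads from `entry.getDocCount()` and the `top` sub-aggregation. A minimal sketch of that semantics in plain Java (no Elasticsearch dependency; the `Doc` record and its fields are hypothetical):

```java
import java.util.*;

public class TopHitsSketch {
    // Hypothetical document: an id plus one extra field we want back per group
    record Doc(String idNumb, String name) {}

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("110", "Alice"), new Doc("110", "Alice"),
                new Doc("120", "Bob"));

        // Group by idNumb; per bucket keep (docCount, first hit) -- the same
        // pair read from entry.getDocCount() and the top_hits sub-aggregation
        Map<String, long[]> counts = new LinkedHashMap<>();
        Map<String, Doc> topHit = new LinkedHashMap<>();
        for (Doc d : docs) {
            counts.computeIfAbsent(d.idNumb(), k -> new long[1])[0]++;
            topHit.putIfAbsent(d.idNumb(), d); // setSize(1): keep one hit only
        }
        for (String key : counts.keySet()) {
            System.out.println(key + " count=" + counts.get(key)[0]
                    + " name=" + topHit.get(key).name());
        }
    }
}
```

This also makes the warning above concrete: the per-bucket counts live only in the aggregation result, while the outer hits are just the flat query matches, so the two cannot be zipped together reliably.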

