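1. However many times the same element appears in the array of a single doc, it is counted once. The array is not mapped as nested.
Example: in the data below, aggregating on files.id returns a count of 1 for id 22 even though that value appears twice, because the aggregation counts docs, not array elements.
Use case: if each doc is a hobby and its array holds people's names, this aggregation gives the number of hobbies each person has.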
"_source": {
  "id": 1,
  "files": [
    {
      "id": "22",
      "filename": "方法"
    },
    {
      "id": "22",
      "filename": "方哈"
    }
  ]
}
Corresponding mapping:
{
  "test1": {
    "mappings": {
      "test1": {
        "properties": {
          "files": {
            "properties": {
              "filename": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "id": {
                "type": "keyword"
              }
            }
          },
          "id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Code: this is just a plain terms aggregation. Examples are easy to find online, so it is omitted here.
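To make the counting rule of this section concrete, here is a minimal plain-Java sketch that imitates it without touching Elasticsearch. It assumes each document is reduced to the list of its files.id values; the class `FlattenedTermsSketch` and the method `countDocsPerValue` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not Elasticsearch code: imitates how a terms
// aggregation counts the values of a non-nested object array.
class FlattenedTermsSketch {

    // Each inner list holds the files.id values of one document.
    static Map<String, Long> countDocsPerValue(List<List<String>> docs) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> idsInDoc : docs) {
            // Duplicates inside one document collapse, so a document
            // adds at most 1 to each bucket.
            Set<String> distinct = new HashSet<>(idsInDoc);
            for (String id : distinct) {
                counts.merge(id, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One document whose array contains id "22" twice, as in the sample data.
        List<List<String>> docs = Arrays.asList(Arrays.asList("22", "22"));
        System.out.println(countDocsPerValue(docs)); // prints {22=1}
    }
}
```

Against a real index, the equivalent request would be a terms aggregation on files.id, for example one built with `AggregationBuilders.terms(...).field("files.id")`.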
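2. If the same element appears several times in the array of a single doc, every occurrence is counted. The array is mapped as nested; the same data as above is used.
Example: in the data below, aggregating on files.id returns a count of 2 for id 22, because it appears twice; the aggregation counts how many array elements carry each value.
Use case: if each doc is an application form and its array holds the formats of the materials submitted with it, this aggregation gives the number of submitted materials per format.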
"_source": {
  "id": 1,
  "files": [
    {
      "id": "22",
      "filename": "方法"
    },
    {
      "id": "22",
      "filename": "方哈"
    }
  ]
}
Corresponding mapping:
{
  "test1": {
    "mappings": {
      "test1": {
        "properties": {
          "files": {
            "type": "nested",
            "properties": {
              "filename": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "id": {
                "type": "keyword"
              }
            }
          },
          "id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
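Code:
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.nested.NestedAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.nested.ParsedNested;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

// Terms aggregation over a field inside a nested array.
// path is the nested array field (e.g. "files"); field is the full path of
// the field to aggregate on (e.g. "files.id"); clazz is the entity class of the index.
// elasticsearchRestTemplate is an ElasticsearchRestTemplate field of the enclosing class.
private <T> Map<String, Long> getAgg(String path, String field, Class<T> clazz) {
    String termsAggName = "key1";
    String nestedAggName = "key2";
    Map<String, Long> typeCount = new HashMap<>();
    // Wrap the terms aggregation in a nested aggregation so that each array element is counted.
    TermsAggregationBuilder termsAgg = AggregationBuilders.terms(termsAggName).field(field);
    NestedAggregationBuilder nestedAgg = AggregationBuilders.nested(nestedAggName, path).subAggregation(termsAgg);
    NativeSearchQueryBuilder nsq = new NativeSearchQueryBuilder();
    nsq.addAggregation(nestedAgg);
    AggregatedPage<T> res = elasticsearchRestTemplate.queryForPage(nsq.build(), clazz);
    Aggregations aggregations = res.getAggregations();
    ParsedNested nested = aggregations.get(nestedAggName);
    Terms terms = nested.getAggregations().get(termsAggName);
    // Each bucket's doc count is the number of nested objects holding that value.
    for (Terms.Bucket bucket : terms.getBuckets()) {
        typeCount.put(bucket.getKeyAsString(), bucket.getDocCount());
    }
    return typeCount;
}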
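To see why the nested mapping yields 2 for id 22 while the plain object mapping yields 1, the nested counting rule can be imitated in plain Java. This is only a sketch under the assumption that each document is reduced to the list of its files.id values; `NestedTermsSketch` and `countNestedObjectsPerValue` are hypothetical names, not part of Elasticsearch or Spring Data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not Elasticsearch code: imitates how a terms
// aggregation inside a nested aggregation counts array elements.
class NestedTermsSketch {

    // Each inner list holds the files.id values of one document.
    static Map<String, Long> countNestedObjectsPerValue(List<List<String>> docs) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> idsInDoc : docs) {
            // Every array element is counted on its own, so duplicates
            // inside one document are counted once per occurrence.
            for (String id : idsInDoc) {
                counts.merge(id, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One document whose array contains id "22" twice, as in the sample data.
        List<List<String>> docs = Arrays.asList(Arrays.asList("22", "22"));
        System.out.println(countNestedObjectsPerValue(docs)); // prints {22=2}
    }
}
```

The only difference from the document-level rule is that duplicates inside one document are not collapsed, which matches the fact that Elasticsearch indexes each element of a nested array as a separate hidden document.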