Elasticsearch aggregation example: group, take each group's maximum, then average the maximums

1. Requirements

A, B, and C are three users, and the second column is a score. Find each user's best score, then the average of those three best scores.

A 10
A 11
A 13
B 11
B 11
B 12
C 10
C 10
C 11
C 15

2. Approach

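Group with a terms aggregation, take the max inside each bucket, then add a pipeline aggregation that averages the per-bucket maximums. I first tried to solve this with bucket_script, but experiments showed it does not work here: bucket_script computes a value per bucket and cannot combine values across buckets, whereas a sibling pipeline aggregation such as avg_bucket can. bucket_script is still very useful for calculations on top of aggregation results.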
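
To illustrate what bucket_script is suited for, here is a minimal sketch that computes, for each user, the spread between the best and worst score. It is not part of the original experiment. The aggregation names by_user, min_score, and score_range are made up for this example, and it assumes the sport index from the test data section and a version in which scripts read their inputs through params (5.x or later).

```
GET sport/_search
{
  "size": 0,
  "aggs": {
    "by_user": {
      "terms": { "field": "user" },
      "aggs": {
        "max_score": { "max": { "field": "grade" } },
        "min_score": { "min": { "field": "grade" } },
        "score_range": {
          "bucket_script": {
            "buckets_path": { "hi": "max_score", "lo": "min_score" },
            "script": "params.hi - params.lo"
          }
        }
      }
    }
  }
}
```

Here score_range sits inside the terms aggregation and is evaluated once per bucket, which is why it cannot produce a single average across users.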

3. Test data

PUT sport 
{
  "mappings": {
    "grade": {
      "properties": {
        "user": {
          "type": "keyword"
        },
        "grade":{
          "type": "integer"
        }
      }
    }
  }
}

PUT sport/grade/1
{
  "user":"A",
  "grade":10
}

PUT sport/grade/2
{
  "user":"A",
  "grade":11
}

PUT sport/grade/3
{
  "user":"A",
  "grade":13
}

PUT sport/grade/4
{
  "user":"B",
  "grade":11
}
PUT sport/grade/5
{
  "user":"B",
  "grade":11
}

PUT sport/grade/6
{
  "user":"B",
  "grade":12
}


PUT sport/grade/7
{
  "user":"C",
  "grade":10
}

PUT sport/grade/8
{
  "user":"C",
  "grade":10
}

PUT sport/grade/9
{
  "user":"C",
  "grade":11
}

PUT sport/grade/10
{
  "user":"C",
  "grade":15
}
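
The ten documents could also be loaded in a single request. The sketch below assumes the same sport index and grade type as above, and a pre-7.x cluster where typed endpoints are still accepted. Each action line and each document must stay on its own line.

```
POST sport/grade/_bulk
{ "index": { "_id": "1" } }
{ "user": "A", "grade": 10 }
{ "index": { "_id": "2" } }
{ "user": "A", "grade": 11 }
{ "index": { "_id": "3" } }
{ "user": "A", "grade": 13 }
{ "index": { "_id": "4" } }
{ "user": "B", "grade": 11 }
{ "index": { "_id": "5" } }
{ "user": "B", "grade": 11 }
{ "index": { "_id": "6" } }
{ "user": "B", "grade": 12 }
{ "index": { "_id": "7" } }
{ "user": "C", "grade": 10 }
{ "index": { "_id": "8" } }
{ "user": "C", "grade": 10 }
{ "index": { "_id": "9" } }
{ "user": "C", "grade": 11 }
{ "index": { "_id": "10" } }
{ "user": "C", "grade": 15 }
```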

4. Aggregation

GET sport/_search
{
  "size": 0,
  "aggs": {
    "avg_score": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "max_score": {
          "max": {
            "field": "grade"
          }
        }
      }
    },
    "avg_max_score": {
      "avg_bucket": {
        "buckets_path": "avg_score>max_score"
      }
    }
  }
}

Result (the per-user maximums are 15, 13, and 12, so the average is 40 / 3 ≈ 13.33):

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avg_score": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 4,
          "max_score": {
            "value": 15
          }
        },
        {
          "key": "A",
          "doc_count": 3,
          "max_score": {
            "value": 13
          }
        },
        {
          "key": "B",
          "doc_count": 3,
          "max_score": {
            "value": 12
          }
        }
      ]
    },
    "avg_max_score": {
      "value": 13.333333333333334
    }
  }
}
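
If more than the average of the maximums is needed, the same buckets_path can be given to another sibling pipeline aggregation. The sketch below swaps avg_bucket for stats_bucket, which reports count, min, max, avg, and sum over the per-user maximums. The name max_score_stats is made up for this example, and this variant was not run in the original post. With the sample data, its avg should match avg_max_score above.

```
GET sport/_search
{
  "size": 0,
  "aggs": {
    "avg_score": {
      "terms": { "field": "user" },
      "aggs": {
        "max_score": { "max": { "field": "grade" } }
      }
    },
    "max_score_stats": {
      "stats_bucket": { "buckets_path": "avg_score>max_score" }
    }
  }
}
```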

The following Java code computes the per-group maximum in ES and prints it with two decimal places:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.max.Max;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ESMaxAggregation {
    public static void main(String[] args) throws IOException {
        // Create the ES client. RestHighLevelClient has no no-arg constructor;
        // the host, port, and scheme below are assumptions, adjust them to your cluster.
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Create the search request
            SearchRequest searchRequest = new SearchRequest("your_index_name");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

            // Group with a terms aggregation and take the max inside each bucket
            sourceBuilder.aggregation(AggregationBuilders.terms("group_by_field_name")
                    .field("field_name")
                    .subAggregation(AggregationBuilders.max("max_value").field("value_field_name")));
            // Sort order of the returned hits
            sourceBuilder.sort("sort_field_name", SortOrder.DESC);
            // Query condition
            sourceBuilder.query(QueryBuilders.matchAllQuery());
            // Timeout
            sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
            // Number of hits to return (the default is 10)
            sourceBuilder.size(100);
            searchRequest.source(sourceBuilder);

            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            Terms terms = searchResponse.getAggregations().get("group_by_field_name");
            for (Terms.Bucket bucket : terms.getBuckets()) {
                Max max = bucket.getAggregations().get("max_value");
                double maxValue = max.getValue();
                System.out.println(bucket.getKeyAsString() + " max: " + String.format("%.2f", maxValue));
            }
        } // try-with-resources closes the ES client
    }
}
```

The placeholders to replace are:

- `your_index_name`: the ES index to query.
- `group_by_field_name`: the name given to the terms aggregation (a label; the same name is used to read the result).
- `field_name`: the field to group by.
- `value_field_name`: the field whose maximum is computed.
- `sort_field_name`: the field used to sort the returned hits.

Note: the Elasticsearch Java API dependency (the high-level REST client) must be on the classpath. The `metrics.max.Max` import matches the 6.x client package layout.