Elasticsearch 5.x: Aggregation Test-Drive (Part 1)

东境物语

Article link: https://blog.csdn.net/wwd0501/article/details/78501842

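Aggregation Test-Drive

We could spend the next few pages defining the various aggregations and their syntax, but aggregations are best learned by example. Once you understand how to think about aggregations and how to nest them appropriately, the syntax matters much less.

The complete list of aggregation buckets and metrics can be found in the Elasticsearch Reference. This chapter covers many of them, but it is worth skimming the reference after finishing the chapter to get a sense of the full range of capabilities.

So let's start with an example. We will build some aggregations that would be useful to a car dealer. The data describes car transactions: the model, the manufacturer, the sale price, when the car was sold, and so on.

First we bulk-index some data: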

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
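
The bulk body above is newline-delimited JSON: an action line, then the document itself, with every line (including the last one) terminated by a newline. As a minimal sketch of how such a body could be assembled in plain Java, the following uses a hypothetical helper `toBulkBody`, which is not part of any Elasticsearch client, and assumes the documents are already serialized as single-line JSON strings:

```java
import java.util.Arrays;
import java.util.List;

class BulkBodySketch {
    // Hypothetical helper (not an Elasticsearch API): joins pre-serialized,
    // single-line JSON documents into a _bulk request body.
    static String toBulkBody(List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: index the next document, letting Elasticsearch pick the ID.
            body.append("{ \"index\": {}}").append('\n');
            // Source line: the document itself, also newline-terminated.
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "{ \"price\" : 10000, \"color\" : \"red\", \"make\" : \"honda\", \"sold\" : \"2014-10-28\" }",
            "{ \"price\" : 20000, \"color\" : \"red\", \"make\" : \"honda\", \"sold\" : \"2014-11-05\" }");
        System.out.print(toBulkBody(docs));
    }
}
```

The resulting string could then be sent as the body of `POST /cars/transactions/_bulk` with whatever HTTP client is in use.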

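Now that we have some data, let's build our first aggregation. A car dealer may want to know which car color sells best. An aggregation answers this easily, using a terms bucket: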

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}

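	Aggregations are placed under the top-level `aggs` parameter (the longer form `aggregations` also works if you prefer it).
	We then give the aggregation whatever name we want: `popular_colors` in this example.
	Finally, we define a single bucket of type `terms`.

Aggregations are executed in the context of search results, which means an aggregation is just another top-level parameter in a search request (for example, one sent to the /_search endpoint). Aggregations can be paired with queries, but we will tackle that later in Scoping Aggregations.

You may notice that we set size to 0. We do not care about the search hits themselves, so returning zero hits speeds up the query. Setting size: 0 is equivalent to using the count search type in Elasticsearch 1.x.

Next we give the aggregation a name. The choice of name is up to you; the response is labeled with the name you provide, so that your application can parse the results.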

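Then we define the aggregation itself. In this example we define a single terms bucket. A terms bucket dynamically creates a new bucket for every unique term it encounters. Because we tell it to use the color field, it creates one bucket per color. Note that on Elasticsearch 5.x with the default dynamic mapping, string values such as color are indexed as a text field with a keyword sub-field, so the aggregation has to target color.keyword (as the Java code later in this post does); aggregating on the text field itself fails with the "Fielddata is disabled on text fields by default" error.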

Let's run the aggregation and look at the results:

{
...
   "hits": {
      "hits": [] 
   },
   "aggregations": {
      "popular_colors": { 
         "buckets": [
            {
               "key": "red", 
               "doc_count": 4 
            },
            {
               "key": "blue",
               "doc_count": 2
            },
            {
               "key": "green",
               "doc_count": 2
            }
         ]
      }
   }
}

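	No search hits are returned because we set the `size` parameter to 0.
	The `popular_colors` aggregation is returned as part of the `aggregations` field.
	The `key` of each bucket corresponds to a unique term found in the `color` field. Each bucket also always includes a `doc_count` field, which tells us how many documents contain that term.
	The count of each bucket is the number of documents with that color.

The response contains several buckets, each corresponding to a unique color (for example, red or green). Each bucket also includes the number of documents that fell into it. For example, there are four red cars.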
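
To make the bucketing concrete, here is a small in-memory sketch in plain Java of what the terms bucket computes for this data set. It does not call Elasticsearch; `TermsBucketSketch`, `COLORS` and `bucketByTerm` are hypothetical names, and the list simply repeats the color values of the eight documents indexed earlier:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TermsBucketSketch {
    // The "color" values of the eight documents indexed earlier, in order.
    static final List<String> COLORS = Arrays.asList(
        "red", "red", "green", "blue", "green", "red", "red", "blue");

    // Hypothetical helper: one bucket per unique term, ordered by document
    // count (descending) and then by key, mirroring the response shown above.
    static Map<String, Long> bucketByTerm(List<String> terms) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String term : terms) {
            // The first sighting of a term creates its bucket; later ones increment it.
            counts.merge(term, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Long> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : entries) {
            ordered.put(entry.getKey(), entry.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        bucketByTerm(COLORS).forEach((key, docCount) ->
            System.out.println("key=" + key + ", doc_count=" + docCount));
    }
}
```

Running it prints red with a count of 4, followed by blue and green with 2 each, which are the same buckets as in the response.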

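The preceding example runs entirely in real time: as soon as a document is searchable, it can be aggregated. This means you can feed the aggregation results straight into a graphing library to build real-time dashboards. As soon as you sell a silver car, your graphs will update to include statistics about silver cars.

Voila! That is our first aggregation!

Java implementation:

① Instantiate the Elasticsearch client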

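// Imports for all Java snippets in this post (Elasticsearch 5.x transport client; JUnit 4 is assumed).
import java.net.InetAddress;
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.stats.Stats;
import org.elasticsearch.search.aggregations.metrics.stats.StatsAggregationBuilder;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Test;

private static TransportClient client;
static {
	try {
		String esClusterName = "shopmall-es";
		// The http:// scheme only lets URI parse host and port; port 9300 speaks the transport protocol, not HTTP.
		List<String> clusterNodes = Arrays.asList("http://172.16.32.69:9300","http://172.16.32.48:9300");
		Settings settings = Settings.builder().put("cluster.name", esClusterName).build();
		client = new PreBuiltTransportClient(settings);
		for (String node : clusterNodes) {
			URI host = URI.create(node);
			client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
		}
	} catch (Exception e) {
		log.error("elasticsearchUtils init exception", e); // log: the enclosing class's logger (assumed)
	}
}
public static void close() {
	if (client != null) client.close(); // client is null if initialization failed
}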

② Metrics aggregation implementation

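/**
      * Description: metrics aggregation query. COUNT(color), MIN, MAX, SUM and so on correspond to metrics.
      * 
      * @author wangweidong
      * CreateTime: 2017-11-09 10:43:18
      * 
      * Data format: { "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
      * 
      * 1. Note that size is set to 0. We do not care about the search hits themselves, so returning zero hits speeds up the query. Setting size: 0 is equivalent to using the count search type in Elasticsearch 1.x.
      * 
      * 2. Sorting on, aggregating on, or accessing values from a script on a text field fails with: Fielddata is disabled on text fields by default. Set fielddata=true on [color] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
      *    Fielddata is disabled on text fields by default because it can consume a lot of heap space, especially when loading high-cardinality text fields. Once fielddata has been loaded onto the heap, it stays there for the lifetime of the segment. Loading fielddata is also an expensive process that can cause users to experience latency hits.
      *    Use the my_field.keyword sub-field for aggregations, sorting, or scripts instead, or enable fielddata (not recommended).
     */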
    @Test
    public void metricsAggregation() {
    	try {
    		String index = "cars";
        	String type = "transactions";
            
        	SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
            // MaxAggregationBuilder: maximum value
//            MaxAggregationBuilder field = AggregationBuilders.max("max_price").field("price");
            // MinAggregationBuilder: minimum value
//            MinAggregationBuilder field = AggregationBuilders.min("min_price").field("price");
            // SumAggregationBuilder: sum
//            SumAggregationBuilder field = AggregationBuilders.sum("sum_price").field("price");
            // StatsAggregationBuilder: returns min, max, average, sum and count in a single aggregation
            StatsAggregationBuilder field = AggregationBuilders.stats("stats_price").field("price");
            searchRequestBuilder.addAggregation(field);
            searchRequestBuilder.setSize(0);
            SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
            
            System.out.println(searchResponse.toString());
            Aggregations agg = searchResponse.getAggregations();
            if(agg == null) {
            	return;
            }
            
//            Max max = agg.get("max_price");
//            System.out.println(max.getValue());
            
//            Min min = agg.get("min_price");
//            System.out.println(min.getValue());
            
//            Sum sum = agg.get("sum_price");
//            System.out.println(sum.getValue());
            
            Stats stats = agg.get("stats_price");
            System.out.println("Max: "+stats.getMax());
            System.out.println("Min: "+stats.getMin());
            System.out.println("Average: "+stats.getAvg());
            System.out.println("Sum: "+stats.getSum());
            System.out.println("Count: "+stats.getCount());
            
		} catch (Exception e) {
			e.printStackTrace();
		}
    }
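
As a way to sanity-check what the stats aggregation should report for this data set, the same five values can be computed in memory with the JDK alone. This is only a sketch that does not touch Elasticsearch; `StatsSketch`, `PRICES` and `summarize` are hypothetical names, and it assumes the index holds exactly the eight documents bulk-indexed earlier:

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

class StatsSketch {
    // The "price" values of the eight documents indexed earlier.
    static final long[] PRICES = {10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000};

    // Hypothetical helper: computes count, min, max, average and sum in one pass.
    static LongSummaryStatistics summarize(long[] values) {
        return LongStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        LongSummaryStatistics stats = summarize(PRICES);
        System.out.println("Max: " + stats.getMax());
        System.out.println("Min: " + stats.getMin());
        System.out.println("Average: " + stats.getAverage());
        System.out.println("Sum: " + stats.getSum());
        System.out.println("Count: " + stats.getCount());
    }
}
```

For these prices the sketch gives a maximum of 80000, a minimum of 10000, an average of 26500, a sum of 212000 and a count of 8. The Elasticsearch `Stats` object reports the first four as doubles, so its printed values may be formatted differently.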

③ Bucket aggregation implementation

/**
     * Description: bucket aggregation query. GROUP BY corresponds to buckets.
     * 
     * @author wangweidong
     * CreateTime: 2017-11-09 15:47:54
     *
    */
   @Test
   public void bucketsAggregation() {
	   String index = "cars";
	   String type = "transactions";
	   SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
   	
       TermsAggregationBuilder field = AggregationBuilders.terms("popular_colors").field("color.keyword");
       searchRequestBuilder.addAggregation(field);
       searchRequestBuilder.setSize(0);
       SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
       
       System.out.println(searchResponse.toString());
       
       Terms colorBuckets = searchResponse.getAggregations().get("popular_colors");
	    for (Terms.Bucket entry : colorBuckets.getBuckets()) {
	    	Object key = entry.getKey();      // Term
	        long count = entry.getDocCount(); // Doc count
	        
	        System.out.println(key);
	        System.out.println(count);
	    }
   }

Reference: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_aggregation_test_drive.html#_aggregation_test_drive