Learning Elasticsearch (18): Aggregation Queries

dicklong91

Reposted from: https://blog.csdn.net/chengyuqiang/column/info/18392 (Elasticsearch version 6.3.0)
Reposted from: https://blog.csdn.net/qq_23536449/article/details/92614132

To exercise a variety of bucket aggregations, recreate the index with the following documents:

DELETE my-index
PUT my-index
PUT my-index/persion/1
{
  "name":"张三",
  "age":27,
  "gender":"男",
  "salary":15000,
  "dep":"bigdata"
}
PUT my-index/persion/2
{
  "name":"李四",
  "age":26,
  "gender":"女",
  "salary":15000,
  "dep":"bigdata"
}
PUT my-index/persion/3
{
  "name":"王五",
  "age":26,
  "gender":"男",
  "salary":17000,
  "dep":"AI"
}
PUT my-index/persion/4
{
  "name":"刘六",
  "age":27,
  "gender":"女",
  "salary":18000,
  "dep":"AI"
}
PUT my-index/persion/5
{
  "name":"程裕强",
  "age":31,
  "gender":"男",
  "salary":20000,
  "dep":"bigdata"
}
PUT my-index/persion/6
{
  "name":"hadron",
  "age":30,
  "gender":"男",
  "salary":20000,
  "dep":"AI"
}

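Terms Aggregation: groups documents into buckets, one bucket per distinct field value
(1) Group by salary and count the number of people at each salary level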

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {"field": "salary"}
    }
  }
}

Response

{
  "took": 24,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 15000,
          "doc_count": 2
        },
        {
          "key": 20000,
          "doc_count": 2
        },
        {
          "key": 17000,
          "doc_count": 1
        },
        {
          "key": 18000,
          "doc_count": 1
        }
      ]
    }
  }
}

This is similar in spirit to the MySQL query SELECT salary, COUNT(1) AS doc_count FROM some_table GROUP BY salary, with salary playing the role of key.
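
To make the grouping concrete, here is a minimal Python sketch that reproduces the buckets above from the six sample salaries. The helper name terms_buckets is hypothetical, and the ordering rule (doc_count descending, then key ascending) is an assumption chosen to match the response shown; it is an illustration, not how Elasticsearch computes the result internally.

```python
from collections import Counter

# Salaries of the six sample documents indexed above (ids 1-6).
salaries = [15000, 15000, 17000, 18000, 20000, 20000]

def terms_buckets(values):
    """Mimic a terms aggregation: one bucket per distinct value,
    ordered by doc_count descending, ties broken by key ascending."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]

buckets = terms_buckets(salaries)
for bucket in buckets:
    print(bucket)
```

The printed buckets follow the same order as the buckets array in the response: 15000, 20000, 17000, 18000.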

(2) Compute the average age within each of the salary groups above

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "salary"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 15000,
          "doc_count": 2,
          "avg_age": {
            "value": 26.5
          }
        },
        {
          "key": 20000,
          "doc_count": 2,
          "avg_age": {
            "value": 30.5
          }
        },
        {
          "key": 17000,
          "doc_count": 1,
          "avg_age": {
            "value": 26
          }
        },
        {
          "key": 18000,
          "doc_count": 1,
          "avg_age": {
            "value": 27
          }
        }
      ]
    }
  }
}
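
The nested avg aggregation runs once per salary bucket. A rough Python equivalent, assuming the (salary, age) pairs of the six sample documents and a hypothetical helper avg_age_by_salary, could look like this:

```python
from collections import defaultdict

# (salary, age) pairs of the six sample documents (ids 1-6).
employees = [(15000, 27), (15000, 26), (17000, 26),
             (18000, 27), (20000, 31), (20000, 30)]

def avg_age_by_salary(rows):
    """Group ages by salary (the outer terms bucket), then average
    each group (the inner avg sub-aggregation)."""
    ages = defaultdict(list)
    for salary, age in rows:
        ages[salary].append(age)
    return {salary: sum(group) / len(group) for salary, group in ages.items()}

avg_age = avg_age_by_salary(employees)
print(avg_age)
```

The averages agree with the avg_age values in the response (26.5, 30.5, 26 and 27).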

(3) Count the number of people in each department

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "dep"
      }
    }
  }
}

Response

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my-index",
        "node": "MLCxR4WISROEMF55izFigQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

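The error message indicates that fielddata must be enabled before a text field such as dep can be aggregated; setting "fielddata": true on that field in the mapping is enough. Alternatively, following the documentation hint "Use the my_field.keyword field for aggregations, sorting, or in scripts", you can aggregate on the keyword sub-field, written as my_field.keyword: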

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "dep.keyword"
      }
    }
  }
}

Response

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "AI",
          "doc_count": 3
        },
        {
          "key": "bigdata",
          "doc_count": 3
        }
      ]
    }
  }
}
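
The two fixes are not interchangeable: a keyword sub-field stores the original string, whereas fielddata on a text field exposes the analyzed tokens. The sketch below illustrates the difference under two assumptions: dep uses the default dynamic mapping (text analyzed by the standard analyzer, plus a keyword sub-field), and the hypothetical analyze function is only a rough stand-in for that analyzer.

```python
import re
from collections import Counter

# dep values of the six sample documents (ids 1-6).
deps = ["bigdata", "bigdata", "AI", "AI", "bigdata", "AI"]

def analyze(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# dep.keyword: buckets are keyed by the exact original string.
keyword_counts = Counter(deps)

# dep with fielddata enabled: buckets are keyed by analyzed tokens.
token_counts = Counter(token for dep in deps for token in analyze(dep))

print(dict(keyword_counts))
print(dict(token_counts))
```

Under these assumptions the keyword variant keeps the key "AI", while the fielddata variant would bucket it as "ai".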

Filter Aggregation: a single-bucket aggregation that puts all documents matching the filter into one bucket
(1) Compute the average age of the men

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "filter": {
        "term": {
          "gender": "男"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count": 4,
      "avg_age": {
        "value": 28.5
      }
    }
  }
}
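
As a sanity check on the numbers, the filter-then-average logic can be sketched in Python. The (gender, age) pairs come from the six sample documents; the helper name filter_avg_age is hypothetical.

```python
# (gender, age) pairs of the six sample documents (ids 1-6).
people = [("男", 27), ("女", 26), ("男", 26), ("女", 27), ("男", 31), ("男", 30)]

def filter_avg_age(rows, gender):
    """Keep only rows matching the filter, then average age over that single bucket."""
    ages = [age for row_gender, age in rows if row_gender == gender]
    average = sum(ages) / len(ages) if ages else None
    return {"doc_count": len(ages), "avg_age": average}

result = filter_avg_age(people, "男")
print(result)
```

The four matching ages (27, 26, 31, 30) average to 28.5, as in the response.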

Filters Aggregation
(1) Count the documents whose body field contains "error" and the documents whose body field contains "warning"

PUT /logs/message/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning: page could not be rendered" }
{ "index" : { "_id" : 2 } }
{ "body" : "authentication error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning: connection timed out" }
GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors" :   { "match" : { "body" : "error"   }},
          "warnings" : { "match" : { "body" : "warning" }}
        }
      }
    }
  }
}

Response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "messages": {
      "buckets": {
        "errors": {
          "doc_count": 1
        },
        "warnings": {
          "doc_count": 2
        }
      }
    }
  }
}
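
Each named filter produces its own bucket, and the buckets are evaluated independently of one another. A minimal Python sketch of that behaviour, assuming a hypothetical tokens helper as a rough stand-in for how the match query sees the analyzed body text:

```python
import re

# body values of the three log documents indexed above.
logs = [
    "warning: page could not be rendered",
    "authentication error",
    "warning: connection timed out",
]

def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return {token.lower() for token in re.findall(r"\w+", text)}

def filters_buckets(bodies, named_terms):
    """One bucket per named filter; a document may land in several buckets."""
    return {
        name: {"doc_count": sum(1 for body in bodies if term in tokens(body))}
        for name, term in named_terms.items()
    }

buckets = filters_buckets(logs, {"errors": "error", "warnings": "warning"})
print(buckets)
```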

(2) Compute the average age of male and of female employees

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "filters": {
        "filters": [
          {"match":{"gender":"男"}},
          {"match":{"gender":"女"}}
        ]
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "doc_count": 4,
          "avg_age": {
            "value": 28.5
          }
        },
        {
          "doc_count": 2,
          "avg_age": {
            "value": 26.5
          }
        }
      ]
    }
  }
}

Range Aggregation
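Each from…to range is the half-open interval [from, to): it includes the from value and excludes the to value.
(1) Count the employees whose salary falls in each of three ranges: [0,10000) (written as an open-ended range with only "to"), [10000,20000), and [20000,+∞)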

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "range": {
        "field": "salary",
        "ranges": [
          {"to":10000},
          {
            "from": 10000,
            "to": 20000
          },
          {"from": 20000}
        ]
      }
    }
  }
}

Response

{
  "took": 38,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-10000.0",
          "to": 10000,
          "doc_count": 0
        },
        {
          "key": "10000.0-20000.0",
          "from": 10000,
          "to": 20000,
          "doc_count": 4
        },
        {
          "key": "20000.0-*",
          "from": 20000,
          "doc_count": 2
        }
      ]
    }
  }
}
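
The half-open [from, to) rule explains why the two employees earning exactly 20000 land in the last bucket rather than the middle one. A small Python sketch of the bucketing, with a hypothetical helper range_buckets and the six sample salaries:

```python
# Salaries of the six sample documents (ids 1-6).
salaries = [15000, 15000, 17000, 18000, 20000, 20000]
ranges = [{"to": 10000}, {"from": 10000, "to": 20000}, {"from": 20000}]

def range_buckets(values, range_specs):
    """Count values per range; 'from' is inclusive, 'to' is exclusive.
    A missing bound means the range is open on that side."""
    buckets = []
    for spec in range_specs:
        low = spec.get("from", float("-inf"))
        high = spec.get("to", float("inf"))
        count = sum(1 for value in values if low <= value < high)
        buckets.append({**spec, "doc_count": count})
    return buckets

buckets = range_buckets(salaries, ranges)
for bucket in buckets:
    print(bucket)
```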

(2) Count the documents published in three time intervals: before 2016-12-01, from 2016-12-01 up to 2017-01-01, and from 2017-01-01 onward

GET website/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "range": {
        "field": "postdate",
        "format":"yyyy-MM-dd",
        "ranges": [
            {"to": "2016-12-01"},
            {"from": "2016-12-01","to":"2017-01-01"},  
            {"from": "2017-01-01"}
        ]
      }
    }
  }
}

Response

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-2016-12-01",
          "to": 1480550400000,
          "to_as_string": "2016-12-01",
          "doc_count": 0
        },
        {
          "key": "2016-12-01-2017-01-01",
          "from": 1480550400000,
          "from_as_string": "2016-12-01",
          "to": 1483228800000,
          "to_as_string": "2017-01-01",
          "doc_count": 7
        },
        {
          "key": "2017-01-01-*",
          "from": 1483228800000,
          "from_as_string": "2017-01-01",
          "doc_count": 2
        }
      ]
    }
  }
}

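Date Range Aggregation:
A range aggregation dedicated to date values. The main difference from the normal range aggregation is that the from and to values can be expressed as date math expressions, and that you can specify the date format in which the from and to response fields are returned. Note that this aggregation includes the from value and excludes the to value of each range.

Count the blog posts published more than one year ago and the blog posts published within the last year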

GET website/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "date_range": {
        "field": "postdate",
        "format":"yyyy-MM-dd",
        "ranges": [
            {"to": "now-12M/M"},
            {"from": "now-12M/M"}
        ]
      }
    }
  }
}

Response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-2018-02-01",
          "to": 1517443200000,
          "to_as_string": "2018-02-01",
          "doc_count": 9
        },
        {
          "key": "2018-02-01-*",
          "from": 1517443200000,
          "from_as_string": "2018-02-01",
          "doc_count": 0
        }
      ]
    }
  }
}
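
The expression now-12M/M means "take the current time, subtract 12 months, then round down to the start of that month". The Python sketch below evaluates it for a hypothetical "now" in February 2019 (an assumption inferred from the 2018-02-01 boundary in the response); the function name is illustrative only.

```python
from datetime import datetime, timezone

def now_minus_12_months_floor_month(now):
    """Evaluate the date math expression now-12M/M for a given 'now' in UTC.
    Subtracting 12 months goes back one calendar year; rounding down to the
    month (/M) then resets the day and the time of day."""
    return datetime(now.year - 1, now.month, 1, tzinfo=timezone.utc)

now = datetime(2019, 2, 20, 12, 0, tzinfo=timezone.utc)  # hypothetical request time
boundary = now_minus_12_months_floor_month(now)
epoch_ms = int(boundary.timestamp() * 1000)

print(boundary.date().isoformat(), epoch_ms)
```

For this assumed request time the boundary matches the to and from values in the response, 2018-02-01 and 1517443200000.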

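Missing Aggregation:
A field-data-based single-bucket aggregation that creates a bucket of all documents in the current document set context that are missing a field value (effectively, documents that lack the field or have it set to NULL). This aggregator is often used together with other field-data bucket aggregators (such as range) to return information about all the documents that could not be placed in any other bucket because they have no value for the field.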

PUT my-index/persion/7
{
  "name":"test",
  "age":30,
  "gender":"男"
}
PUT my-index/persion/8
{
  "name":"abc",
  "age":28,
  "gender":"女"
}
PUT my-index/persion/9
{
  "name":"xyz",
  "age":32,
  "gender":"男",
  "salary":null,
  "dep":null
}

Count the documents that have no salary value

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "noSalary_count": {
      "missing": {
        "field": "salary"
      }
    }
  }
}

Response

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "noSalary_count": {
      "doc_count": 3
    }
  }
}
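
The count of 3 covers both documents that omit salary entirely (ids 7 and 8) and the one that sets it to null (id 9). A minimal Python sketch of that rule, with a hypothetical helper missing_count and only the salary field modelled:

```python
# Documents 1-6 carry a salary; 7 and 8 omit the field; 9 sets it to null.
docs = [{"salary": s} for s in (15000, 15000, 17000, 18000, 20000, 20000)]
docs += [{}, {}, {"salary": None}]

def missing_count(documents, field):
    """Count documents where the field is absent or explicitly null."""
    return sum(1 for doc in documents if doc.get(field) is None)

no_salary = missing_count(docs, "salary")
print(no_salary)
```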

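Children Aggregation: a special single-bucket aggregation that selects child documents of the specified type, as defined in a join field.
This aggregation has a single option: type, the child type that should be selected.
(1) Index definition
The mapping below uses a join field to define a single relation in which question is the parent of answer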

PUT join_index
{
  "mappings": {
    "doc":{
      "properties": {
        "my_join_field":{
          "type": "join",
          "relations":{
            "question":"answer"
          }
        }
      }
    }
  }
}

(2) Parent documents (question)

PUT join_index/doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}
 
PUT join_index/doc/2?refresh
{
  "text": "This is another question",
  "my_join_field": {
    "name": "question"
  }
}

(3) Child documents (answer)

PUT join_index/doc/3?routing=1&refresh 
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}
 
PUT join_index/doc/4?routing=1&refresh
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

(4) Count the child documents

POST join_index/_search
{
  "size": 0,
  "aggs": {
    "to-answers": {
      "children": {
        "type": "answer"
      }
    }
  }
}

Response

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "to-answers": {
      "doc_count": 2
    }
  }
}
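
Conceptually, the children aggregation moves from the parent documents in the current search context to their child documents of the given type. The sketch below models that with plain dictionaries; the field name join and the helper children_count are hypothetical simplifications of the my_join_field mapping, not Elasticsearch APIs.

```python
# The four documents indexed above, reduced to id and join information.
docs = [
    {"id": "1", "join": {"name": "question"}},
    {"id": "2", "join": {"name": "question"}},
    {"id": "3", "join": {"name": "answer", "parent": "1"}},
    {"id": "4", "join": {"name": "answer", "parent": "1"}},
]

def children_count(documents, parent_type, child_type):
    """Count child documents of child_type whose parent is a parent_type
    document present in the given document set."""
    parent_ids = {d["id"] for d in documents if d["join"]["name"] == parent_type}
    return sum(
        1
        for d in documents
        if d["join"]["name"] == child_type and d["join"].get("parent") in parent_ids
    )

to_answers = children_count(docs, "question", "answer")
print(to_answers)
```

Both answers point at question 1, so the bucket holds 2 documents even though question 2 has no children.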
