ElasticSearch之聚合aggs

写在前面

本文看下es的聚合相关内容。

1:什么是聚合

即,数据的统计分析。如sum,count,avg,min,max,分组等。

2:支持哪些聚合类型

2.1:bucket aggregation

对满足特定条件的文旦进行分组。如:
在这里插入图片描述

在这里插入图片描述

在这里插入图片描述
支持如下几种方式来对数据进行分桶:

1:terms 按照词项的值分桶
2:range 按照指定的数据范围分桶
3:histogram 按照直方图进行分桶,可以认为是range分桶的一种特殊情况
4:cardinality 按照terms分桶,然后计算桶的个数,这种准确来说是terms分区+metric的一种方式
  • range 分桶
    在这里插入图片描述
  • histogram 分桶
    在这里插入图片描述
  • cardinality
POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "天气的个数": {
      "cardinality": {
        "field": "OriginWeather"
      }
    }
  }
}

在这里插入图片描述
第一步:按照OriginWeather分成若干个桶,第二步计算桶的个数。相当于count(distinct field)

2.1:metric aggregation

数学运算。如:
在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

2.3:pipeline aggregation

类似于linux命令中的管道,即在上一个agg的基础上再进行一次聚合分析。

从类型上分为两类,sibling和parent,如下:
在这里插入图片描述

2.4:matrix aggregation

对多个字段操作并提供一个结果矩阵???嘛意思???

3:实战

3.1:bucket aggregation

  • 按照飞行目的地分组:
    测试数据参考这里 。但如果你是按照专栏顺序看的,我想你本地已经有这些数据了。
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "飞行目的地分组": {
      "terms": {
        "field": "DestCountry"
      }
    }
  }
}

在这里插入图片描述

  • range和histogram分组
    首先准备测试数据:
DELETE /employees
PUT /employees/
{
  "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "gender" : {
          "type" : "keyword"
        },
        "job" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 50
            }
          }
        },
        "name" : {
          "type" : "keyword"
        },
        "salary" : {
          "type" : "integer"
        }
      }
    }
}

PUT /employees/_bulk
{ "index" : {  "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : {  "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : {  "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : {  "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : {  "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : {  "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : {  "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : {  "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : {  "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : {  "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : {  "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : {  "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : {  "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : {  "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : {  "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : {  "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : {  "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : {  "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

range分组例子:

POST employees/_search
{
  "size": 0,
  "aggs": {
    "薪酬区间": {
      "range": {
        "field": "salary",
        "ranges": [
          {
            "to": 10000
          },
          {
            "from": 10000,
            "to": 20000
          },
          {
            "key": "->20000",
            "from": 20000
          }
        ]
      }
    }
  }
}

在这里插入图片描述

3.2:metric aggregation

按照飞行目的地分组后,对分组的数据取最大最小和平均值。

GET kibana_sample_data_flights/_search
{
    "size": 0,
    "aggs": {
        "飞行目的地分组": {
            "terms": {
                "field": "DestCountry"
            },
            "aggs": {
                "票价平均值": {
                    "avg": {
                        "field": "AvgTicketPrice"
                    }
                },
                "票价最大值": {
                    "max": {
                        "field": "AvgTicketPrice"
                    }
                },
                "票价最小值": {
                    "min": {
                        "field": "AvgTicketPrice"
                    }
                }
            }
        }
    }
}

在这里插入图片描述

按照飞行目的地分组后,通过stats取每个分组的统计信息(会列出min,max,avg等常用统计信息),并且对每个分组再按照天气做一次分组,并取前5条数据:

GET kibana_sample_data_flights/_search
{
    "size": 0,
    "aggs": {
        "飞行目的地分组": {
            "terms": {
                "field": "DestCountry"
            },
            "aggs": {
                "组内统计信息": {
                    "stats": {
                        "field": "AvgTicketPrice"
                    }
                },
                "weatherrr": {
                    "terms": {
                        "field": "DestWeather",
                        "size": 5
                    }
                }
            }
        }
    }
}

在这里插入图片描述
以上的例子是通过agg嵌套的方式先对数据分桶,然后再进行聚合,其实,也可以直接进行metric运行,此时查询到的数据集相当于都在一个桶中:

# 查询最远距离,最近距离DistanceKilometers
POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "最远距离": {
      "max": {
        "field": "DistanceKilometers"
      }
    },
    "最近距离": {
      "min": {
        "field": "DistanceKilometers"
      }
    },
    "平均距离": {
      "avg": {
        "field": "DistanceKilometers"
      }
    }
  }
}

3.3:pipeline aggregation

准备数据:

DELETE employees
PUT /employees/_bulk
{ "index" : {  "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : {  "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : {  "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : {  "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : {  "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : {  "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : {  "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : {  "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : {  "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : {  "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : {  "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : {  "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : {  "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : {  "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : {  "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : {  "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : {  "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : {  "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

3.3.1:sibling

与要基于其计算的聚合结果是平级的关系。

  • 找出平均工资最低的工作类型
    分为3步:1:按照工作类型进行分组 2:计算出每个分组的平均工资 3:找出平均工资最低的分组(使用pipeline aggr)
POST employees/_search
{
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword",
                "size": 10
            },
            "aggs": {
                "avg_salary": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        },
        "min_salary_by_job": {
          "min_bucket": {
            "buckets_path": "jobs.avg_salary"
          }
        }
    }
}

在这里插入图片描述
类似的还有。

  • 平均工资最高的工种
POST employees/_search
{
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword",
                "size": 10
            },
            "aggs": {
                "avg_salary": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        },
        "平均工资最高的工种": {
          "max_bucket": {
            "buckets_path": "jobs.avg_salary"
          }
        }
    }
}

结果:
在这里插入图片描述

  • 每个工种平均工资的平均工资
POST employees/_search
{
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword",
                "size": 10
            },
            "aggs": {
                "avg_salary": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        },
        "每个工种平均工资的平均工资": {
          "avg_bucket": {
            "buckets_path": "jobs.avg_salary"
          }
        }
    }
}

结果:
在这里插入图片描述

  • 多值输出
POST employees/_search
{
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword",
                "size": 10
            },
            "aggs": {
                "avg_salary": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        },
        "多值输出": {
          "stats_bucket": {
            "buckets_path": "jobs.avg_salary"
          }
        }
    }
}

结果:
在这里插入图片描述

3.3.2:parent

包含在要基于其计算的聚合结果中,即上个聚合结果作为其parent。

4:聚合的作用范围

4.1:什么是聚合的作用范围以及有哪些作用范围

基于哪些文档 进行聚合操作,叫做作用范围,在我们前面的例子中,我们没有加任何的作用范围,所以是针对的所有文档。

es提供了如下几种作用范围:

1:query
2:filter
3:post filter
4:global

4.2:各种作用范围实例

准备数据:

DELETE /employees
PUT /employees/
{
  "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "gender" : {
          "type" : "keyword"
        },
        "job" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 50
            }
          }
        },
        "name" : {
          "type" : "keyword"
        },
        "salary" : {
          "type" : "integer"
        }
      }
    }
}

PUT /employees/_bulk
{ "index" : {  "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : {  "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : {  "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : {  "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : {  "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : {  "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : {  "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : {  "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : {  "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : {  "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : {  "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : {  "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : {  "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : {  "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : {  "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : {  "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : {  "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : {  "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

4.2.1:query

基于es的query搜索功能先限定agg的作用范围。

POST employees/_search
{
    "size": 0,
    "query": {
        "range": {
            "age": {
                "gte": 40
            }
        }
    },
    "aggs": {
        "员工年龄大于40工作们": {
            "terms": {
                "field": "job.keyword"
            }
        }
    }
}

在这里插入图片描述
再比如查询:

POST employees/_search
{
    "size": 0,
    "query": {
        "range": {
            "age": {
                "gte": 40
            }
        }
    },
    "aggs": {
        "员工年龄大于40工作们": {
            "terms": {
                "field": "job.keyword"
            }
        },
        "员工年龄大于40员工最低工资": {
          "min": {
            "field": "salary"
          }
        }
    }
}

在这里插入图片描述

4.2.2:filter

通过filter可以对某个聚合操作设置单独的作用范围,如下查询年龄20到30的最低工资和年龄30到40的最高工资:

POST employees/_search
{
    "size": 0,
    "aggs": {
        "20到30员工": {
            "filter": {
                "range": {
                    "age": {
                        "from": 20,
                        "to": 30
                    }
                }
            },
            "aggs": {
                "最低工资": {
                    "min": {
                        "field": "salary"
                    }
                }
            }
        },
        "30到40员工": {
            "filter": {
                "range": {
                    "age": {
                        "from": 30,
                        "to": 40
                    }
                }
            },
            "aggs": {
                "最高工资": {
                    "max": {
                        "field": "salary"
                    }
                }
            }
        }
    }
}

在这里插入图片描述

4.2.3:post filter

用于对最终显式的hits结果做过滤,比如总共有10个文档被应用于聚合,但是并不希望展示这所有的10个文档,而只是在hits中返回满足特定条件的文档。post filter就是用来表达特定条件的。

也就是说,post filter和agg聚合本身没有毛线关系,只是用来影响hits结果。我猜因为常规的query过滤会影响aggs聚合,所以增加了post filter吧!

  • 按照工作分组并获取工资高于3万的员工信息
POST employees/_search
{
    "size": 100,
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword"
            }
        }
    },
    "post_filter": {
        "range": {
            "salary": {
                "gt": 30000
            }
        }
    }
}

在这里插入图片描述

4.2.4:global

默认的query方式的作用范围会影响所有的agg聚合,如果希望某个聚合不受其影响的话,则可以使用global,如:

POST employees/_search
{
    "size": 0,
    "query": {
        "range": {
            "age": {
                "gte": 40
            }
        }
    },
    "aggs": {
        "年龄大于40的jobs": {
            "terms": {
                "field": "job.keyword"
            }
        },
        "所有的": {
            "global": {},
            "aggs": {
                "所有的jobs": {
                    "terms": {
                        "field": "job.keyword"
                    }
                }
            }
        }
    }
}

在这里插入图片描述

写在后面

参考文章列表

ElasticSearch之search API

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值