Elasticsearch in Practice (15): Combining query and filter with aggs for Scoped and Global Aggregations

1. Prepare the data
POST /testcopy/_bulk
{"index":{"_id": 1}}
{"empId" : "111","name" : "员工1","age" : 20,"sex" : "男","mobile" : "19000001111","salary":1333,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"光谷大道","address":"湖北省武汉市洪山区光谷大厦","content" : "i like to write best elasticsearch article"}
{"index":{"_id": 2}}
{"empId" : "222","name" : "员工2","age" : 25,"sex" : "男","mobile" : "19000002222","salary":15963,"deptName" : "销售部","provice" : "湖北省","city":"武汉","area":"江汉区","address" : "湖北省武汉市江汉路","content" : "i think java is the best programming language"}
{"index":{"_id": 3}}
{ "empId" : "333","name" : "员工3","age" : 30,"sex" : "男","mobile" : "19000003333","salary":20000,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"经济技术开发区","address" : "湖北省武汉市经济开发区","content" : "i am only an elasticsearch beginner"}
{"index":{"_id": 4}}
{"empId" : "444","name" : "员工4","age" : 20,"sex" : "女","mobile" : "19000004444","salary":5600,"deptName" : "销售部","provice" : "湖北省","city":"武汉","area":"沌口开发区","address" : "湖北省武汉市沌口开发区","content" : "elasticsearch and hadoop are all very good solution, i am a beginner"}
{"index":{"_id": 5}}
{ "empId" : "555","name" : "员工5","age" : 20,"sex" : "男","mobile" : "19000005555","salary":9665,"deptName" : "测试部","provice" : "湖北省","city":"高新开发区","area":"武汉","address" : "湖北省武汉市东湖隧道","content" : "spark is best big data solution based on scala ,an programming language similar to java"}
{"index":{"_id": 6}}
{"empId" : "666","name" : "员工6","age" : 30,"sex" : "女","mobile" : "19000006666","salary":30000,"deptName" : "技术部","provice" : "武汉市","city":"湖北省","area":"江汉区","address" : "湖北省武汉市江汉路","content" : "i like java developer"}
{"index":{"_id": 7}}
{"empId" : "777","name" : "员工7","age" : 60,"sex" : "女","mobile" : "19000007777","salary":52130,"deptName" : "测试部","provice" : "湖北省","city":"黄冈市","area":"边城区","address" : "湖北省黄冈市边城区","content" : "i like elasticsearch developer"}
{"index":{"_id": 8}}
{"empId" : "888","name" : "员工8","age" : 19,"sex" : "女","mobile" : "19000008888","salary":60000,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"汉阳区","address" : "湖北省武汉市江汉大学","content" : "i like spark language"}
{"index":{"_id": 9}}
{"empId" : "999","name" : "员工9","age" : 40,"sex" : "男","mobile" : "19000009999","salary":23000,"deptName" : "销售部","provice" : "河南省","city":"郑州市","area":"二七区","address" : "河南省郑州市郑州大学","content" : "i like java developer"}
{"index":{"_id": 10}}
{"empId" : "101010","name" : "张湖北","age" : 35,"sex" : "男","mobile" : "19000001010","salary":18000,"deptName" : "测试部","provice" : "湖北省","city":"武汉","area":"高新开发区","address" : "湖北省武汉市东湖高新","content" : "i like java developer i also like  elasticsearch"}
{"index":{"_id": 11}}
{"empId" : "111111","name" : "王河南","age" : 61,"sex" : "男","mobile" : "19000001011","salary":10000,"deptName" : "销售部",,"provice" : "河南省","city":"开封市","area":"金明区","address" : "河南省开封市河南大学","content" : "i am not like  java "}
{"index":{"_id": 12}}
{"empId" : "121212","name" : "张大学","age" : 26,"sex" : "女","mobile" : "19000001012","salary":1321,"deptName" : "测试部",,"provice" : "河南省","city":"开封市","area":"金明区","address" : "河南省开封市河南大学","content" : "i am java developer  thing java is good"}
{"index":{"_id": 13}}
{"empId" : "131313","name" : "李江汉","age" : 36,"sex" : "男","mobile" : "19000001013","salary":1125,"deptName" : "销售部","provice" : "河南省","city":"郑州市","area":"二七区","address" : "河南省郑州市二七区","content" : "i like java and java is very best i like it do you like java "}
{"index":{"_id": 14}}
{"empId" : "141414","name" : "王技术","age" : 45,"sex" : "女","mobile" : "19000001014","salary":6222,"deptName" : "测试部",,"provice" : "河南省","city":"郑州市","area":"金水区","address" : "河南省郑州市金水区","content" : "i like c++"}
{"index":{"_id": 15}}
{"empId" : "151515","name" : "张测试","age" : 18,"sex" : "男","mobile" : "19000001015","salary":20000,"deptName" : "技术部",,"provice" : "河南省","city":"郑州市","area":"高新开发区","address" : "河南省郑州高新开发区","content" : "i think spark is good"}
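2. Combining query and filter with aggs
2.1 Aggregating over the query hits

All the aggregations in the previous articles ran without a query: the request went straight to aggs to compute avg, count and so on. How do we compute the average age of only the Technology department (技术部)?
Run the query for the department first, then aggregate over the query hits to get the department's maximum, minimum and average age.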

# Run the query first, then compute max, min and avg over the query hits.
# "aggs" sits at the same level as "query", so it aggregates only the matching documents.
GET /testcopy/_search
{
  "query":{
    "match_phrase": {
      "deptName.keyword": "技术部"
    }
  },
  "aggs":{
    "tech_avg_age":{
      "avg": {
        "field": "age"
      }
    },
    "max_age":{
      "max": {
        "field": "age"
      }
    },
    "min_age":{
      "min": {
        "field": "age"
      }
    }
  }
}

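The query matches the four indexed Technology department employees (_id 1, 3, 6 and 8), and the statistics are computed over those four hits only.
Technology department: max age 30, min age 19, avg age 24.75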
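
To make the scoping concrete, the following plain-Python sketch recomputes the same three metrics locally, without any Elasticsearch client. It assumes the index holds the 11 documents that the results in this article reflect (_id 1 to 10 and 13); the EMPLOYEES list and the function name dept_age_stats are illustrative.

```python
# (deptName, age) pairs for _id 1-10 and 13, copied from the bulk data above.
EMPLOYEES = [
    ("技术部", 20), ("销售部", 25), ("技术部", 30), ("销售部", 20),
    ("测试部", 20), ("技术部", 30), ("测试部", 60), ("技术部", 19),
    ("销售部", 40), ("测试部", 35), ("销售部", 36),
]


def dept_age_stats(rows, dept):
    """Mimic query + sibling aggs: keep one department, then aggregate."""
    ages = [age for d, age in rows if d == dept]
    return {
        "max_age": max(ages),
        "min_age": min(ages),
        "tech_avg_age": sum(ages) / len(ages),
    }


stats = dept_age_stats(EMPLOYEES, "技术部")
print(stats)
```

The values agree with the response reported above: 30, 19 and 24.75.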

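2.2 Aggregating over filtered documents

The previous section aggregated over query hits. Can aggs be combined with a filter in the same way? Yes.

Goal: keep only employees aged 25 to 40 (inclusive), then compute the average age of that subset.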

# Filter employees aged 25 to 40, then compute avg over the filtered documents.
GET /testcopy/_search
{
  "query":{
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 25,
              "lte": 40
            }
          }
        }
      ]
    }
  },
  "aggs":{
    "avg_age":{
      "avg": {
        "field": "age"
      }
    }
  }
}


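The filter keeps 6 employees (from several departments, not only the Technology department), and the aggregation over them gives an average age of about 32.67 (196 / 6).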
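
The range filter with gte and lte is inclusive at both ends. The following plain-Python sketch applies the same bounds locally, assuming the index holds the 11 documents that the results in this article reflect (_id 1 to 10 and 13); the AGES list and the function name avg_age_in_range are illustrative.

```python
# Ages of _id 1-10 and 13, copied from the bulk data above.
AGES = [20, 25, 30, 20, 20, 30, 60, 19, 40, 35, 36]


def avg_age_in_range(ages, gte, lte):
    """Mimic a bool/filter range (inclusive bounds) followed by an avg aggregation."""
    kept = [a for a in ages if gte <= a <= lte]
    return len(kept), sum(kept) / len(kept)


count, avg_age = avg_age_in_range(AGES, 25, 40)
print(count, round(avg_age, 2))
```

Six ages fall inside the range (25, 30, 30, 40, 35, 36), which sum to 196.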

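2.3 Aggregating over documents that pass both a query and a filter

A query and a filter can also be used together, and the aggregation then runs over the documents that satisfy both.

Goal: query for the Technology department (技术部), filter to ages 25 to 60 (inclusive), then compute the average age of that subset.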

# Query for the department in "must", filter ages 25 to 60 in "filter" (a sibling of "must"),
# then compute avg over the documents that pass both. "aggs" is a sibling of "query".
GET /testcopy/_search
{
  "query":{
    "bool": {
      "must": [
        {
          "match": {
            "deptName.keyword": "技术部"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 25,
              "lte": 60
            }
          }
        }
      ]
    }
  },
  "aggs":{
    "avg_age":{
      "avg": {
        "field": "age"
      }
    }
  }
}

The query and filter together keep 2 Technology department employees, and the aggregation over them gives an average age of 30.
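
When the department and age bounds come from application code, the same request body can be assembled as a dictionary and serialized to JSON. The sketch below only builds the body and does not call Elasticsearch; the function name build_dept_age_avg_request and its parameters are illustrative assumptions.

```python
import json


def build_dept_age_avg_request(dept, min_age, max_age):
    """Build the search body: must (department) + filter (age range) + avg aggregation."""
    return {
        "query": {
            "bool": {
                # Clauses in "must" select documents and contribute to the score.
                "must": [{"match": {"deptName.keyword": dept}}],
                # Clauses in "filter" only include or exclude documents.
                "filter": [{"range": {"age": {"gte": min_age, "lte": max_age}}}],
            }
        },
        # Sibling of "query": aggregates over whatever the query keeps.
        "aggs": {"avg_age": {"avg": {"field": "age"}}},
    }


body = build_dept_age_avg_request("技术部", 25, 60)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON has the same structure as the request above and can be pasted into the console after the GET /testcopy/_search line.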

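3 Global bucket aggregation
3.1 Scoped aggregation versus the global bucket

Suppose we want to compare one department's average age with the average age of the whole company. Do we have to search twice, once for the department and once for everyone, and then compare the two results? That is cumbersome.

  • It is not necessary: Elasticsearch provides the global aggregation, which defines a single global bucket.
  • The global bucket ignores the query and aggregates over every document in the searched index.

Scenario: compute the average age of one department and the average age of all documents in a single request.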
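# "global": {} goes inside a named aggregation; that aggregation ignores the query above and covers all documents.
# all_avg_age sits next to tech_avg inside "aggs"; its average is a sub-aggregation of the global bucket.
GET /testcopy/_search
{
  "size":0,
  "query":{
    "match": {
      "deptName.keyword": "技术部"
    }
  },
  "aggs":{
    "tech_avg":{
      "avg": {
        "field": "age"
      }
    },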
    "all_avg_age":{
      "global": {},
      "aggs": {
        "all_of_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

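In the response, the global bucket covers 11 documents with an average age of about 30.45. Only 11 of the 15 documents in the bulk request are counted, which is consistent with the source lines for _id 11, 12, 14 and 15 being rejected because of the doubled comma (,,) after deptName.
The Technology department alone has 4 documents with an average age of 24.75.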
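
The difference between the two scopes can be reproduced locally. The following plain-Python sketch computes the query-scoped average and the global average from the same rows, assuming the index holds the 11 documents counted above (_id 1 to 10 and 13); the EMPLOYEES list and the function name scoped_and_global_avg are illustrative.

```python
# (deptName, age) pairs for _id 1-10 and 13, copied from the bulk data above.
EMPLOYEES = [
    ("技术部", 20), ("销售部", 25), ("技术部", 30), ("销售部", 20),
    ("测试部", 20), ("技术部", 30), ("测试部", 60), ("技术部", 19),
    ("销售部", 40), ("测试部", 35), ("销售部", 36),
]


def scoped_and_global_avg(rows, dept):
    """Mimic tech_avg (query scope) next to all_avg_age (global bucket)."""
    scoped = [age for d, age in rows if d == dept]
    everyone = [age for _, age in rows]  # the global bucket ignores the query
    return {
        "tech_avg": sum(scoped) / len(scoped),
        "all_avg_age": {
            "doc_count": len(everyone),
            "all_of_age": sum(everyone) / len(everyone),
        },
    }


result = scoped_and_global_avg(EMPLOYEES, "技术部")
print(result["tech_avg"], result["all_avg_age"]["doc_count"],
      round(result["all_avg_age"]["all_of_age"], 2))
```

The ages of all 11 documents sum to 335, so the global average is 335 / 11, roughly 30.45, while the department average stays at 24.75.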


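We have now covered the basics of combining query and filter with aggs: aggregating over query hits, over filtered documents, over documents that pass both, and comparing a query-scoped aggregation with a global one. The next article covers Top N ranking.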
