Elasticsearch weighted metric aggregation: weighted_avg

Weighted average aggregation: weighted_avg

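Here is a summary of how averaging works when an aggregation uses weights:
1. First choose the value field and the weight field; both must be numeric.
2. The weight field:
   If all weights are equal, the result is the same as a regular average.
   If all weights are 0, the average is null.
   If the weights differ, the average is computed with the formula below.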

Enough talk, let's verify it with code:

1. First, create the index

// Create the index
PUT /demo_avg_test
{
  "mappings": {
    "properties": {
      "lesson":{
        "type": "text"
      },
      "scores": {
        "type": "double"
      },
      "weighttest": {
        "type": "integer"
      }
    }
  }
}

2. Documents can be imported in two ways: one at a time or in bulk

  1. Single-document import
// Index documents one at a time
PUT /demo_avg_test/_doc/1
{
  "id":1,
  "lesson":"语言",
  "scores":10,
  "weighttest":20
}
PUT /demo_avg_test/_doc/2
{
  "id":2,
  "lesson":"历史",
  "scores":20,
  "weighttest":40
}

PUT /demo_avg_test/_doc/3
{
  "id":3,
  "lesson":"数学",
  "scores":50,
  "weighttest":100
}

  2. Bulk import

// Bulk-index documents
PUT /_bulk
{"index":{"_index":"demo_avg_test","_id":1}}
{"lesson":"语言","scores":10,"weighttest":20}
{"index":{"_index":"demo_avg_test","_id":2}}
{"lesson":"历史","scores":20,"weighttest":40}
{"index":{"_index":"demo_avg_test","_id":3}}
{"lesson":"数学","scores":50,"weighttest":100}
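The `_bulk` body is newline-delimited JSON: one action line followed by one source line per document, and the body must end with a newline. A minimal Python sketch that produces the same body as above from a list of documents (the helper name `build_bulk_body` is hypothetical, and actually sending the request is left out):

```python
import json


def build_bulk_body(index: str, docs: dict) -> str:
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        # Compact separators reproduce the exact lines used in the console example.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}, separators=(",", ":")))
        # ensure_ascii=False keeps the Chinese lesson names readable.
        lines.append(json.dumps(source, ensure_ascii=False, separators=(",", ":")))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = {
    1: {"lesson": "语言", "scores": 10, "weighttest": 20},
    2: {"lesson": "历史", "scores": 20, "weighttest": 40},
    3: {"lesson": "数学", "scores": 50, "weighttest": 100},
}
body = build_bulk_body("demo_avg_test", docs)
print(body, end="")
```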

3. Query all documents to verify the data

// Match all documents
GET /demo_avg_test/_search
{
  "query": {"match_all": {}}
}

4. Compute the weighted average with an aggregation

Weighted average aggregation query:

# Weighted average aggregation query
GET /demo_avg_test/_search
{
  "size": 3,
  "aggs": {
    "weighted_type": {
      "weighted_avg": {
        "value": {
          "field": "scores"
        },
        "weight": {
          "field": "weighttest"
        }
      }
    }
  }
}

Notes:
- Field to average: scores
- Weight field: weighttest

Calculation rule (weighted average):

$$
\text{weighted average} = \frac{\sum_i v_i \, w_i}{\sum_i w_i}
$$

where $v_i$ is a document's scores value and $w_i$ is its weighttest value.
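The formula and the rules listed at the top (equal weights, all-zero weights) can be mirrored in a few lines of Python. This is a minimal sketch of the arithmetic only, assuming every document has both fields; `weighted_avg` is a hypothetical helper, not Elasticsearch code, and `None` stands in for the `null` that Elasticsearch returns.

```python
from typing import Optional, Sequence


def weighted_avg(values: Sequence[float], weights: Sequence[float]) -> Optional[float]:
    """Return sum(v * w) / sum(w), or None when the weights sum to 0."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # Nothing to divide by: mirror the null returned by Elasticsearch.
        return None
    return sum(v * w for v, w in zip(values, weights)) / total_weight


print(weighted_avg([3, 5], [1, 3]))  # (3*1 + 5*3) / 4 = 4.5
print(weighted_avg([3, 5], [2, 2]))  # equal weights: same as (3 + 5) / 2 = 4.0
print(weighted_avg([3, 5], [0, 0]))  # all weights 0: None
```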

1) Unequal weights
# Result 1
{
  "took" : 281,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "语言",
          "scores" : 10,
          "weighttest" : 20
        }
      },
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "历史",
          "scores" : 20,
          "weighttest" : 40
        }
      },
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "数学",
          "scores" : 50,
          "weighttest" : 100
        }
      }
    ]
  },
  "aggregations" : {
    "weighted_type" : {
      "value" : 37.5
    }
  }
}
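When calling the search API from code, the weighted average sits under `aggregations.<aggregation name>.value`. A minimal Python sketch that reads it from a trimmed copy of Result 1, assuming the aggregation name `weighted_type` from the query above (the variable names are illustrative); a JSON `null`, as in the all-zero case, arrives as `None`:

```python
import json

# Trimmed copy of Result 1: only the parts needed to read the aggregation.
response_text = """
{
  "hits": {"total": {"value": 3, "relation": "eq"}},
  "aggregations": {"weighted_type": {"value": 37.5}}
}
"""

response = json.loads(response_text)
# The key under "aggregations" is the name chosen in the request ("weighted_type").
weighted = response["aggregations"]["weighted_type"]["value"]

if weighted is None:
    print("no average: all weights are 0")
else:
    print(f"weighted average: {weighted}")
```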

Average score: 37.5
(10*20 + 20*40 + 50*100) / (20 + 40 + 100) = 6000 / 160 = 37.5
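The same arithmetic written out in Python with the three bulk-imported documents reproduces the 37.5 returned by Elasticsearch. This is only a check of the formula, not how Elasticsearch computes it internally:

```python
# (scores, weighttest) pairs from the bulk-imported documents
docs = [(10, 20), (20, 40), (50, 100)]

weighted_sum = sum(score * weight for score, weight in docs)  # 200 + 800 + 5000
total_weight = sum(weight for _, weight in docs)              # 20 + 40 + 100
print(weighted_sum, total_weight, weighted_sum / total_weight)  # 6000 160 37.5
```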

2) If all weights are 0, the average is null
# Result 2 (every weighttest value set to 0)

{
  "took" : 719,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "语言",
          "scores" : 10,
          "weighttest" : 0
        }
      },
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "历史",
          "scores" : 20,
          "weighttest" : 0
        }
      },
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "数学",
          "scores" : 50,
          "weighttest" : 0
        }
      }
    ]
  },
  "aggregations" : {
    "weighted_type" : {
      "value" : null
    }
  }
}

Average score: null

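3) All weights equal
# Result 3
# If all weights are set to the same value, they have no effect, as if every weight were 1
Below, every weight is set to 100, so the average is (10+20+50)/3 ≈ 26.67: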
{
  "took" : 296,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "语言",
          "scores" : 10,
          "weighttest" : 100
        }
      },
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "历史",
          "scores" : 20,
          "weighttest" : 100
        }
      },
      {
        "_index" : "demo_avg_test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "lesson" : "数学",
          "scores" : 50,
          "weighttest" : 100
        }
      }
    ]
  },
  "aggregations" : {
    "weighted_type" : {
      "value" : 26.666666666666668
    }
  }
}

Average: 26.666666666666668, i.e. (10+20+50)/3 ≈ 26.67
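A quick Python check that equal weights cancel out: with every weight set to 100, the weighted average equals the plain mean, and both come out as the same float that Elasticsearch returned.

```python
scores = [10, 20, 50]
weights = [100, 100, 100]  # every document gets the same weight

weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)  # 8000 / 300
plain_mean = sum(scores) / len(scores)                                 # 80 / 3
print(weighted, plain_mean)  # 26.666666666666668 26.666666666666668
```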

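That's all for now. Weighted averages in aggregations can also be computed with scripts, and there are several more cases, such as when parameters are omitted; I'll cover those later.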

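The official documentation actually offers many kinds of metric aggregations: max, min, avg, sum, collapse, and more.
I came across this one while reading the docs today, found it interesting, tried it myself, and wrote it down.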

Official documentation: 7.6.2.

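Keep going! Advance one small step a day and the progress never ends; no effort is wasted, and every stream finally reaches the sea!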
