5.1 Elasticsearch Aggregation Analysis: Metric

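1. Introduction
Aggregation analysis computes statistics over the data stored in Elasticsearch. There are three main types.
(1).Metric
Metric analysis falls into two categories. Single-value analysis outputs one result and mainly includes min, max, avg, sum, and cardinality (there is no aggregation named count; counting values is done with value_count). Multi-value analysis outputs several results, for example stats, extended_stats, percentiles, percentile_ranks, and top_hits.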

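(2).Bucket
Bucket analysis assigns documents to different buckets according to a rule so that they can be analyzed by category, similar to GROUP BY in SQL. It mainly includes terms, range, date_range, histogram, and date_histogram.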

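(3).Pipeline
Pipeline analysis performs further analysis on the results of a previous aggregation and supports chaining. Its results are written into the original response, and it divides into two categories depending on where the output appears. Sibling aggregations place their result at the same level as the existing aggregation; they mainly include max_bucket, min_bucket, avg_bucket, and sum_bucket, as well as stats_bucket, extended_stats_bucket, and percentiles_bucket. Parent aggregations embed their result inside the existing aggregation's result; they mainly include derivative, moving average, and cumulative_sum.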

(4).Document preparation
Open Kibana Dev Tools, then create the index and add the documents.

PUT /employee
{
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 1
	}
}
PUT /employee/_mapping
{
	"properties": {
	  "name": {
			"type": "text"
		},
		"age": {
			"type": "integer"
		},
		"birthday": {
			"type": "date",
			"format": "yyyy-MM-dd"
		},
		"job": {
			"type": "keyword"
		},
		"salary": {
			"type": "float"
		}
	}
}
POST /employee/_doc
{
	"name": "James Harden",
	"job": "Java engineer",
	"age": 31,
	"salary": 30000.00,
	"birthday": "1991-01-01"
}
POST /employee/_doc
{
	"name": "Stephen Curry",
	"job": "Java engineer",
	"age": 27,
	"salary": 20000.00,
	"birthday": "1995-08-06"
}
POST /employee/_doc
{
	"name": "LeBron James",
	"job": "Technical director",
	"age": 35,
	"salary": 50000.00,
	"birthday": "1987-12-25"
}
POST /employee/_doc
{
	"name": "Damian Lillard",
	"job": "Vue engineer",
	"age": 25,
	"salary": 18000.00,
	"birthday": "1996-10-01"
}
POST /employee/_doc
{
	"name": "Kevin Durant",
	"job": "Vue engineer",
	"age": 30,
	"salary": 28000.00,
	"birthday": "1992-05-01"
}
POST /employee/_doc
{
	"name": "Chirs Paul",
	"job": "Java engineer",
	"age": 33,
	"salary": 29000.00,
	"birthday": "1988-12-02"
}
POST /employee/_doc
{
	"name": "Jason Tatum",
	"job": "Java engineer",
	"age": 24,
	"salary": 15000.00,
	"birthday": "1997-08-02"
}
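
As an alternative to one request per document, the same records could be sent in a single call to the bulk API (POST /employee/_bulk), which expects newline-delimited JSON: an action line followed by a source line for each document. The sketch below only builds that request body from the seven documents listed above; the helper name to_bulk_body is hypothetical, and actually sending the body to a cluster is left out.

```python
import json

# The seven employee documents listed above.
employees = [
    {"name": "James Harden", "job": "Java engineer", "age": 31, "salary": 30000.00, "birthday": "1991-01-01"},
    {"name": "Stephen Curry", "job": "Java engineer", "age": 27, "salary": 20000.00, "birthday": "1995-08-06"},
    {"name": "LeBron James", "job": "Technical director", "age": 35, "salary": 50000.00, "birthday": "1987-12-25"},
    {"name": "Damian Lillard", "job": "Vue engineer", "age": 25, "salary": 18000.00, "birthday": "1996-10-01"},
    {"name": "Kevin Durant", "job": "Vue engineer", "age": 30, "salary": 28000.00, "birthday": "1992-05-01"},
    {"name": "Chirs Paul", "job": "Java engineer", "age": 33, "salary": 29000.00, "birthday": "1988-12-02"},
    {"name": "Jason Tatum", "job": "Java engineer", "age": 24, "salary": 15000.00, "birthday": "1997-08-02"},
]


def to_bulk_body(docs):
    """Build an NDJSON body for POST /employee/_bulk (hypothetical helper)."""
    lines = []
    for doc in docs:
        # Empty "index" action: the index comes from the URL, the id is auto-generated.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body(employees)
print(body)
```

The printed text can be pasted under a POST /employee/_bulk line in Dev Tools.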

2. Single-value queries
(1).min, max, avg, sum
min, max, avg, and sum behave like the SQL functions of the same names. For example, an avg query on the salary field:

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"salary_avg": {
			"avg": {
				"field": "salary"
			}
		}
	}
}
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_avg" : {
      "value" : 26250.0
    }
  }
}
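
To make the four single-value metrics concrete, here is a minimal local sketch of the same arithmetic in Python. One assumption is needed: the response reports 8 hits and an average of 26250.0 (a total of 210000), while the seven documents listed earlier sum to 190000, so an eighth salary of 20000.0 is inferred here. That eighth value is an assumption, not something the listing shows.

```python
# Seven listed salaries plus one ASSUMED eighth value (20000.0),
# inferred from the reported hit count (8) and average (26250.0).
salaries = [30000.0, 20000.0, 50000.0, 18000.0, 28000.0, 29000.0, 15000.0, 20000.0]

# Each entry mirrors one single-value metric aggregation.
single_value = {
    "min": min(salaries),
    "max": max(salaries),
    "sum": sum(salaries),
    "avg": sum(salaries) / len(salaries),
}

for name, value in single_value.items():
    print(name, value)
```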

(2).cardinality
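Counts the number of distinct values, similar to COUNT(DISTINCT ...) in SQL; the result is an approximation. For example, a cardinality query on the job field: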

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"job_distinct": {
			"cardinality": {
				"field": "job"
			}
		}
	}
}
{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "job_distinct" : {
      "value" : 3
    }
  }
}
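
On a data set this small the distinct count can be checked exactly with a set, as in the sketch below, which uses the job values of the seven documents listed earlier. Elasticsearch itself estimates cardinality with the HyperLogLog++ algorithm, so on large, high-cardinality fields its answer may differ slightly from an exact count like this one.

```python
# Job values of the seven documents listed earlier, in listing order.
jobs = [
    "Java engineer",
    "Java engineer",
    "Technical director",
    "Vue engineer",
    "Vue engineer",
    "Java engineer",
    "Java engineer",
]

# An exact distinct count; the cardinality aggregation approximates this.
distinct_jobs = set(jobs)
job_distinct = len(distinct_jobs)

print(sorted(distinct_jobs))
print(job_distinct)
```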

3. Multi-value queries
(1).stats
Returns a set of statistics for a numeric field: min, max, avg, sum, and count. For example, a stats query on the salary field:

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"salary_stats": {
			"stats": {
				"field": "salary"
			}
		}
	}
}
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_stats" : {
      "count" : 8,
      "min" : 15000.0,
      "max" : 50000.0,
      "avg" : 26250.0,
      "sum" : 210000.0
    }
  }
}

(2).extended_stats
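An extension of stats that returns additional statistics, such as the sum of squares (sum_of_squares), the variance (variance), the standard deviation (std_deviation), and the standard deviation bounds (std_deviation_bounds, by default the mean plus or minus two standard deviations). For example, an extended_stats query on the salary field: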

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"salary_extended_stats": {
			"extended_stats": {
				"field": "salary"
			}
		}
	}
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_extended_stats" : {
      "count" : 8,
      "min" : 15000.0,
      "max" : 50000.0,
      "avg" : 26250.0,
      "sum" : 210000.0,
      "sum_of_squares" : 6.374E9,
      "variance" : 1.076875E8,
      "std_deviation" : 10377.25879025863,
      "std_deviation_bounds" : {
        "upper" : 47004.51758051726,
        "lower" : 5495.4824194827415
      }
    }
  }
}
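
The extra fields in this response follow from a few formulas, sketched below in Python. The variance is the population variance (divide by the count), and the bounds are the mean plus or minus two standard deviations. As before, the eighth salary of 20000.0 is an assumption inferred from the reported count of 8 and sum of 210000.0; only seven documents are listed in this section.

```python
import math

# Seven listed salaries plus one ASSUMED eighth value (20000.0),
# inferred from the reported count (8) and sum (210000.0).
salaries = [30000.0, 20000.0, 50000.0, 18000.0, 28000.0, 29000.0, 15000.0, 20000.0]

count = len(salaries)
avg = sum(salaries) / count
sum_of_squares = sum(x * x for x in salaries)

# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg * avg
std_deviation = math.sqrt(variance)

# Default bounds use sigma = 2.
sigma = 2
upper = avg + sigma * std_deviation
lower = avg - sigma * std_deviation

print(sum_of_squares, variance, std_deviation, upper, lower)
```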

(3).percentiles
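Percentile statistics, typically used to describe how the data is distributed. When no percentiles are specified, the defaults 1, 5, 25, 50, 75, 95, and 99 are returned. For example, a percentiles query on the salary field: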

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"salary_percentiles": {
			"percentiles": {
				"field": "salary"
			}
		}
	}
}
{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_percentiles" : {
      "values" : {
        "1.0" : 15000.0,
        "5.0" : 15000.0,
        "25.0" : 19000.0,
        "50.0" : 24000.0,
        "75.0" : 29500.0,
        "95.0" : 50000.0,
        "99.0" : 50000.0
      }
    }
  }
}

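What a percentile means: university entrance exam scores are often reported as percentiles. Suppose a candidate's raw score in Chinese is 54. That number alone says little about how the candidate did relative to the other students who sat the same exam. But if a raw score of 54 corresponds to the 70th percentile, we know that roughly 70% of students scored lower and roughly 30% scored higher.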
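
The values in the percentiles response above can be checked by hand. The sketch below uses a simple midpoint-interpolation rule that happens to reproduce those numbers on this small data set. Elasticsearch computes percentiles with an approximate algorithm (TDigest by default), so this is an illustration of the idea and not its implementation. The function name percentile is hypothetical, and the eighth salary of 20000.0 is an assumption inferred from the reported count of 8 and sum of 210000.0.

```python
# Seven listed salaries plus one ASSUMED eighth value (20000.0).
salaries = [30000.0, 20000.0, 50000.0, 18000.0, 28000.0, 29000.0, 15000.0, 20000.0]


def percentile(values, q):
    """Midpoint-interpolated percentile (hypothetical helper, q in 0..100)."""
    xs = sorted(values)
    n = len(xs)
    # Treat each value as sitting at the midpoint of its 1/n slice.
    pos = q / 100 * n - 0.5
    if pos <= 0:
        return xs[0]
    if pos >= n - 1:
        return xs[-1]
    lo = int(pos)
    frac = pos - lo
    # Linear interpolation between the two neighbouring values.
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])


defaults = [1, 5, 25, 50, 75, 95, 99]
result = {q: percentile(salaries, q) for q in defaults}
print(result)
```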

(4).percentile_ranks
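The inverse of percentiles: returns the percentile at which a given value falls. For example, querying the percentile ranks of the values 28000 and 30000 on the salary field: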

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"salary_percentile_ranks": {
			"percentile_ranks": {
				"field": "salary",
				"values": [28000,30000]
			}
		}
	}
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_percentile_ranks" : {
      "values" : {
        "28000.0" : 61.111111111111114,
        "30000.0" : 75.59523809523809
      }
    }
  }
}

(5).top_hits
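Typically used after bucketing to fetch the top matching documents in each bucket, that is, the detailed records. For example, fetching the top 3 documents by salary: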

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"salary_top_hits": {
			"top_hits": {
				"size": 3,
				"sort": [{
					"salary": {
						"order": "desc"
					}
				}]
			}
		}
	}
}
{
  "took" : 363,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_top_hits" : {
      "hits" : {
        "total" : {
          "value" : 7,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "employee",
            "_type" : "_doc",
            "_id" : "w4l1hnsBEsHOdz1YM8pq",
            "_score" : null,
            "_source" : {
              "name" : "LeBron James",
              "job" : "Technical director",
              "age" : 35,
              "salary" : 50000.0,
              "birthday" : "1987-12-25"
            },
            "sort" : [
              50000.0
            ]
          },
          {
            "_index" : "employee",
            "_type" : "_doc",
            "_id" : "wYl0hnsBEsHOdz1Y4cqT",
            "_score" : null,
            "_source" : {
              "name" : "James Harden",
              "job" : "Java engineer",
              "age" : 31,
              "salary" : 30000.0,
              "birthday" : "1991-01-01"
            },
            "sort" : [
              30000.0
            ]
          },
          {
            "_index" : "employee",
            "_type" : "_doc",
            "_id" : "xol1hnsBEsHOdz1YXcqt",
            "_score" : null,
            "_source" : {
              "name" : "Chirs Paul",
              "job" : "Java engineer",
              "age" : 33,
              "salary" : 29000.0,
              "birthday" : "1988-12-02"
            },
            "sort" : [
              29000.0
            ]
          }
        ]
      }
    }
  }
}
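
Because top_hits is usually nested under a bucket aggregation, it may help to see both uses side by side. The Python sketch below first takes the overall top 3 by salary, matching the response above, and then groups by job and keeps the top document per group, which is roughly what a terms aggregation with a nested top_hits would return. The function names are hypothetical, and only the name, job, and salary fields of the seven listed documents are used.

```python
from collections import defaultdict

# name, job and salary of the seven documents listed earlier.
employees = [
    {"name": "James Harden", "job": "Java engineer", "salary": 30000.0},
    {"name": "Stephen Curry", "job": "Java engineer", "salary": 20000.0},
    {"name": "LeBron James", "job": "Technical director", "salary": 50000.0},
    {"name": "Damian Lillard", "job": "Vue engineer", "salary": 18000.0},
    {"name": "Kevin Durant", "job": "Vue engineer", "salary": 28000.0},
    {"name": "Chirs Paul", "job": "Java engineer", "salary": 29000.0},
    {"name": "Jason Tatum", "job": "Java engineer", "salary": 15000.0},
]


def top_hits(docs, size):
    """Return the `size` documents with the highest salary (hypothetical helper)."""
    return sorted(docs, key=lambda d: d["salary"], reverse=True)[:size]


def top_hits_per_job(docs, size):
    """Bucket documents by job, then keep the top hits inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["job"]].append(doc)
    return {job: top_hits(members, size) for job, members in buckets.items()}


overall = [d["name"] for d in top_hits(employees, 3)]
per_job = {job: [d["name"] for d in hits]
           for job, hits in top_hits_per_job(employees, 1).items()}

print(overall)
print(per_job)
```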