最近在工作中有使用到ES的Metrics 聚合统计功能,学习了一下,在此做记录。
数据为kibana中提供的index:kibana_sample_data_ecommerce
mapping:
GET kibana_sample_data_ecommerce
{
"kibana_sample_data_ecommerce" : {
"aliases" : { },
"mappings" : {
"properties" : {
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"currency" : {
"type" : "keyword"
},
"customer_birth_date" : {
"type" : "date"
},
"customer_first_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_full_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_gender" : {
"type" : "keyword"
},
"customer_id" : {
"type" : "keyword"
},
"customer_last_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_phone" : {
"type" : "keyword"
},
"day_of_week" : {
"type" : "keyword"
},
"day_of_week_i" : {
"type" : "integer"
},
"email" : {
"type" : "keyword"
},
"event" : {
"properties" : {
"dataset" : {
"type" : "keyword"
}
}
},
"geoip" : {
"properties" : {
"city_name" : {
"type" : "keyword"
},
"continent_name" : {
"type" : "keyword"
},
"country_iso_code" : {
"type" : "keyword"
},
"location" : {
"type" : "geo_point"
},
"region_name" : {
"type" : "keyword"
}
}
},
"manufacturer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"order_date" : {
"type" : "date"
},
"order_id" : {
"type" : "keyword"
},
"products" : {
"properties" : {
"_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"base_price" : {
"type" : "half_float"
},
"base_unit_price" : {
"type" : "half_float"
},
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"created_on" : {
"type" : "date"
},
"discount_amount" : {
"type" : "half_float"
},
"discount_percentage" : {
"type" : "half_float"
},
"manufacturer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"min_price" : {
"type" : "half_float"
},
"price" : {
"type" : "half_float"
},
"product_id" : {
"type" : "long"
},
"product_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
},
"analyzer" : "english"
},
"quantity" : {
"type" : "integer"
},
"sku" : {
"type" : "keyword"
},
"tax_amount" : {
"type" : "half_float"
},
"taxful_price" : {
"type" : "half_float"
},
"taxless_price" : {
"type" : "half_float"
},
"unit_discount_amount" : {
"type" : "half_float"
}
}
},
"sku" : {
"type" : "keyword"
},
"taxful_total_price" : {
"type" : "half_float"
},
"taxless_total_price" : {
"type" : "half_float"
},
"total_quantity" : {
"type" : "integer"
},
"total_unique_products" : {
"type" : "integer"
},
"type" : {
"type" : "keyword"
},
"user" : {
"type" : "keyword"
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"auto_expand_replicas" : "0-1",
"provided_name" : "kibana_sample_data_ecommerce",
"creation_date" : "1660034254891",
"number_of_replicas" : "1",
"uuid" : "w9HC4Ed2RXCrUeShpLRjUw",
"version" : {
"created" : "7100299"
}
}
}
}
}
Metrics aggregations
求和-sum
我们先做一个sum计算。基于taxful_total_price字段的所有数据的求和。
size写0的话,就不会出现明细数据,即hits里边返回的数据。
track_total_hits是获取查询结果的汇总。
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"sum_taxful_total_price": {
"sum": {
"field": "taxful_total_price"
}
}
}
}
平均值-avg and others
GET kibana_sample_data_ecommerce/_search
{
"query": {
"term": {
"day_of_week": {
"value": "Monday"
}
}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"sum_taxful_total_price": {
"sum": {
"field": "taxful_total_price"
}
},
"avg_taxful_total_price":{
"avg": {
"field": "taxful_total_price"
}
}
}
}
ES中单值统计有 sum
、min
、max
、avg
,都是同样的写法。这里就不再写了。
stats可以统计上述所有的。
GET kibana_sample_data_ecommerce/_search
{
"query": {
"term": {
"day_of_week": {
"value": "Monday"
}
}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"stats_taxful_total_price": {
"stats": {
"field": "taxful_total_price"
}
}
}
}
extended_stats可以统计包含stats的,以及一些其他的,用于某些专业领域的。
GET kibana_sample_data_ecommerce/_search
{
"query": {
"term": {
"day_of_week": {
"value": "Monday"
}
}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"extended_stats_taxful_total_price": {
"extended_stats": {
"field": "taxful_total_price"
}
}
}
}
百分比统计-percentiles
,可以统计taxful_total_price这个字段的占比。
27.984375这个价钱以下的数据占比大概在5%左右。 可以用range查询进行计算。
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"percentiles_taxful_total_price": {
"percentiles": {
"field": "taxful_total_price",
"percents": [
1,
5,
25,
50,
75,
95,
99
]
}
}
}
}
返回结果
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4675,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"percentiles_taxful_total_price" : {
"values" : {
"1.0" : 21.984375,
"5.0" : 27.984375,
"25.0" : 44.96875,
"50.0" : 63.96875,
"75.0" : 93.205078125,
"95.0" : 156.0,
"99.0" : 222.0
}
}
}
}
百分比占比-percentile_ranks
这个函数(percentile_ranks)可以查询到价钱100和200在所有数据中的占比。
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"percentile_ranks_taxful_total_price": {
"percentile_ranks": {
"field": "taxful_total_price",
"values": [
100,
200
]
}
}
}
}
基数-cardinality
相当于mysql中的distinct
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"cardinality_day_of_week": {
"cardinality": {
"field": "day_of_week"
}
}
}
}
字符长度统计-string_stats
Note:这里的统计都要基于keyword类型来统计,text类型会报错
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"string_stats_customer_full_name": {
"string_stats": {
"field": "customer_full_name.keyword"
}
}
}
}
官网资料:
ES官网聚合