Elasticsearch histogram field type: usage and caveats
Histogram
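Documentation first: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/histogram.html
Searching the web for "elasticsearch Histogram" turns up two different things:
- the histogram field type
- the histogram aggregation
There is plenty of material on the aggregation but very little on the field type, so this post records how to use the histogram field type and what to watch out for. (PS: a few points are still not fully understood and need more investigation, so this post will keep being updated.)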
Histogram field type
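A histogram field is defined by two paired arrays, values and counts.
Keep the following in mind:
- values holds doubles and must be in ascending order.
- counts holds integers, each of which must be positive or zero.
- The two arrays must have the same length, because their elements correspond one to one by position.
- Nested arrays are not supported, and neither is sorting on the field.
Histogram data is stored as binary doc values and is not indexed, which makes aggregation faster. Its size in bytes is at most 13 * the length of the arrays.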
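The rules listed above can be checked on the client before a document is sent. The following is a minimal Python sketch, not Elasticsearch's own validation code. The function name validate_histogram is hypothetical, and the sketch only rejects values that decrease; whether Elasticsearch accepts equal neighbouring values is not something it tries to settle.

```python
def validate_histogram(values, counts):
    """Raise ValueError if (values, counts) breaks the rules listed above."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    for v in values:
        # Nested arrays are not supported: every element must be a plain number.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError(f"values must be numbers, got {v!r}")
    for prev, cur in zip(values, values[1:]):
        # Assumption: only a decrease is treated as an error here.
        if cur < prev:
            raise ValueError(f"values must be in increasing order, got {cur} after {prev}")
    for c in counts:
        if isinstance(c, bool) or not isinstance(c, int) or c < 0:
            raise ValueError(f"counts must be integers >= 0, got {c!r}")


# A well-formed pair of arrays passes silently.
validate_histogram([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6])
```

Running a check like this first gives a clearer local error than waiting for a 400 response from the cluster.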
Quick start
Create the mapping
PUT histogram_test
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
Index some documents
PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  }
}
PUT histogram_test/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 1],
    "counts" : [3, 7, 23, 12, 6]
  }
}
Error example
Bad example: values that are not in increasing order
PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.1, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  }
}
***********result**************
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
    }
  },
  "status" : 400
}
Bad example: a count that is less than 0
PUT histogram_test/_doc/3
{
  "my_text" : "histogram_3",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 1],
    "counts" : [3, 7, 23, 12, -6]
  }
}
***********result**************
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
    }
  },
  "status" : 400
}
Aggregation
- min aggregation
- max aggregation
- sum aggregation
- value_count aggregation
- avg aggregation
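- percentiles aggregation (PS: not understood yet, to be investigated)
- percentile ranks aggregation (PS: not understood yet, to be investigated)
- boxplot aggregation (PS: not understood yet, to be investigated)
- histogram aggregation
- range aggregation (PS: not understood yet, to be investigated)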
min aggregation
Returns the smallest element of values across all matching documents. The counts array is ignored.
GET /histogram_test/_search
{
  "aggs": {
    "min_latency": {
      "min": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "min_latency" : {
    "value" : 0.1
  }
}
max aggregation
Returns the largest element of values across all matching documents. The counts array is ignored.
GET /histogram_test/_search
{
  "aggs": {
    "max_histogram": {
      "max": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "max_histogram" : {
    "value" : 1.0
  }
}
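The min and max results above can be reproduced by hand. The following Python sketch only mirrors the arithmetic on the two documents indexed in the quick start; it is not how Elasticsearch implements the aggregations, and the variable names are made up for illustration.

```python
# The two histograms indexed in the quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# min and max look only at the values arrays; counts play no part.
all_values = [v for d in docs for v in d["values"]]
hist_min = min(all_values)
hist_max = max(all_values)

print(hist_min, hist_max)  # 0.1 1.0
```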
sum aggregation
Multiplies each element of values by the count at the same position in counts, then adds up all the products across the matching documents.
GET /histogram_test/_search
{
  "aggs": {
    "sum_histogram": {
      "sum": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "sum_histogram" : {
    "value" : 35.8
  }
}
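To see where 35.8 comes from, the weighted sum can be worked out by hand: document 1 contributes 16.4 and document 2 contributes 19.4. The following Python sketch repeats that arithmetic on the quick-start data; it illustrates the calculation only and is not Elasticsearch code.

```python
# The two histograms indexed in the quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# Multiply each value by its count, then add everything up across documents.
weighted_sum = sum(
    v * c
    for d in docs
    for v, c in zip(d["values"], d["counts"])
)

# Rounded only to hide floating-point noise in the printed result.
print(round(weighted_sum, 10))  # 35.8
```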
value_count aggregation
Adds up all the elements of counts across the matching documents.
GET /histogram_test/_search
{
  "aggs": {
    "count_histogram": {
      "value_count": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "count_histogram" : {
    "value" : 102
  }
}
avg aggregation
Multiplies each number in values by its associated count in counts, then averages the result over all histograms. In other words, it is sum / value_count.
GET /histogram_test/_search
{
  "aggs": {
    "avg_histogram": {
      "avg": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "avg_histogram" : {
    "value" : 0.3509803921568627
  }
}
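The average is the weighted sum divided by the total count, here 35.8 / 102. The following Python sketch checks this against the quick-start data; it is a hand calculation for illustration, not the aggregation's actual implementation.

```python
# The two histograms indexed in the quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# What value_count reports: the sum of every element of counts.
total_count = sum(c for d in docs for c in d["counts"])

# What sum reports: each value weighted by its count.
weighted_sum = sum(
    v * c
    for d in docs
    for v, c in zip(d["values"], d["counts"])
)

# avg is the ratio of the two.
average = weighted_sum / total_count

print(total_count)  # 102
print(average)      # approximately 0.35098
```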
histogram aggregation
Groups the elements of values into buckets and reports the total count for each bucket.
interval is the width of each bucket. Note that doc_count here is the sum of the matching counts, not the number of documents.
GET /histogram_test/_search
{
  "aggs": {
    "histogram_histogram": {
      "histogram": {
        "field": "my_histogram",
        "interval": 0.5
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "histogram_histogram" : {
    "buckets" : [
      {
        "key" : 0.0,
        "doc_count" : 90
      },
      {
        "key" : 0.5,
        "doc_count" : 6
      },
      {
        "key" : 1.0,
        "doc_count" : 6
      }
    ]
  }
}
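The bucket totals above can be reproduced with the bucket key formula from the histogram aggregation documentation, floor(value / interval) * interval with the default offset of 0. The following Python sketch applies it to the quick-start data; the function name histogram_buckets is hypothetical and the code only illustrates the arithmetic.

```python
import math
from collections import defaultdict

# The two histograms indexed in the quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]


def histogram_buckets(docs, interval):
    """Sum counts per bucket, keyed by the lower bound of each bucket."""
    buckets = defaultdict(int)
    for d in docs:
        for v, c in zip(d["values"], d["counts"]):
            key = math.floor(v / interval) * interval
            buckets[key] += c  # add the count, not 1 per document
    return dict(sorted(buckets.items()))


print(histogram_buckets(docs, 0.5))  # {0.0: 90, 0.5: 6, 1.0: 6}
```

The 0.0 bucket collects the values 0.1 to 0.4 from both documents (45 each), while 0.5 and 1.0 each appear in only one document.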
Query
Only specific queries are available on a histogram field.
exists query
GET /histogram_test/_search
{
  "query": {
    "exists": {
      "field": "my_histogram"
    }
  }
}
END
The parts marked as "to be investigated" will be filled in over time. Comments and discussion are welcome.