Elasticsearch: range data types and range-based aggregations (new in the 7.4 release)


Elasticsearch provides a family of data types called range types. The following range types are currently supported:

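integer_range: A range of signed 32-bit integers, with a minimum value of -2^31 and a maximum value of 2^31-1.
float_range: A range of single-precision 32-bit IEEE 754 floating point values.
long_range: A range of signed 64-bit integers, with a minimum value of -2^63 and a maximum value of 2^63-1.
double_range: A range of double-precision 64-bit IEEE 754 floating point values.
date_range: A range of date values, represented as unsigned 64-bit integer milliseconds elapsed since the epoch.
ip_range: A range of IP values supporting IPv4 or IPv6 (or mixed) addresses.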
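
The limits of the two integer-based types are easy to check with a little arithmetic. The following Python sketch is illustrative only: the constant names and the fits helper are hypothetical and are not part of Elasticsearch.

```python
# Value limits of the two integer-based range types listed above.
INTEGER_RANGE_BOUNDS = (-2**31, 2**31 - 1)  # integer_range: signed 32-bit
LONG_RANGE_BOUNDS = (-2**63, 2**63 - 1)     # long_range: signed 64-bit


def fits(bounds, gte, lte):
    """Return True if the closed range [gte, lte] is well-formed and inside bounds.

    Hypothetical helper for illustration; Elasticsearch does its own validation.
    """
    low, high = bounds
    return low <= gte <= lte <= high


print(INTEGER_RANGE_BOUNDS)                   # (-2147483648, 2147483647)
print(fits(INTEGER_RANGE_BOUNDS, 10, 20))     # True
print(fits(INTEGER_RANGE_BOUNDS, 0, 2**31))   # False: upper bound needs 64 bits
print(fits(LONG_RANGE_BOUNDS, 0, 2**31))      # True
```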

Searching on range data types

Here is a simple example of this data type in use. First, we create an index called range_index and define its mapping at the same time:

PUT range_index
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range", 
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
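
In the format parameter of the mapping, || separates alternative date formats that are tried in order. The Python sketch below mimics that idea for the three formats used here. It is only an approximation under stated assumptions: the parse_date function and FORMATS tuple are hypothetical names, timestamps are assumed to be UTC, and Elasticsearch's own date parsing is more elaborate.

```python
from datetime import datetime, timezone

# Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date(value: str) -> datetime:
    """Try each pattern in turn, then fall back to epoch milliseconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    # Last alternative: epoch_millis. Raises ValueError if value is not numeric.
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)


print(parse_date("2015-10-31 12:00:00"))
print(parse_date("2015-11-01"))
print(parse_date("1446292800000"))
```

All three spellings are accepted by the same field, which is why the document indexed next can mix a full timestamp with a date-only value.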

Next, we index a document into it:

PUT range_index/_doc/1?refresh
{
  "expected_attendees" : { 
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : { 
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}

The document above contains two range values, which correspond to the integer_range and date_range fields defined earlier in the mapping.

We can now use a term query to search the integer_range field expected_attendees:

GET range_index/_search
{
  "query": {
    "term": {
      "expected_attendees": {
        "value": "10"
      }
    }
  }
}

The result:

    "hits" : [
      {
        "_index" : "range_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "expected_attendees" : {
            "gte" : 10,
            "lte" : 20
          },
          "time_frame" : {
            "gte" : "2015-10-31 12:00:00",
            "lte" : "2015-11-01"
          }
        }
      }
    ]

The document matches because 10 falls within the 10-20 range stored in it. To verify that the search really works, we can run another query:

GET range_index/_search
{
  "query": {
    "term": {
      "expected_attendees": {
        "value": "40"
      }
    }
  }
}

Because 40 is outside the 10-20 range, this search returns no hits.
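
The behaviour of the two term queries above boils down to a containment check: a value matches when it lies inside the stored range. A minimal Python sketch of that check follows; term_matches is a hypothetical helper for illustration, not an Elasticsearch API.

```python
def term_matches(stored_range, value):
    """Return True if value lies inside a range given with gt/gte/lt/lte bounds."""
    if "gte" in stored_range and value < stored_range["gte"]:
        return False
    if "gt" in stored_range and value <= stored_range["gt"]:
        return False
    if "lte" in stored_range and value > stored_range["lte"]:
        return False
    if "lt" in stored_range and value >= stored_range["lt"]:
        return False
    return True


expected_attendees = {"gte": 10, "lte": 20}

# The queries above pass the value as a string; here it is converted explicitly.
print(term_matches(expected_attendees, int("10")))  # True
print(term_matches(expected_attendees, int("40")))  # False
```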

Similarly, we can search against the time range:

GET range_index/_search
{
  "query" : {
    "range" : {
      "time_frame" : { 
        "gte" : "2015-10-31",
        "lte" : "2015-11-01",
        "relation" : "within" 
      }
    }
  }
}

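With "relation": "within", a document matches only if its stored range lies entirely inside the query range. The time_frame stored in our document does lie inside the query range above, so the result is: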

    "hits" : [
      {
        "_index" : "range_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "expected_attendees" : {
            "gte" : 10,
            "lte" : 20
          },
          "time_frame" : {
            "gte" : "2015-10-31 12:00:00",
            "lte" : "2015-11-01"
          }
        }
      }
    ]

Conversely, if we search with a range that lies outside the document's time frame:

GET range_index/_search
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2017-10-31",
        "lte": "2018-11-01"
      }
    }
  }
}

the result is empty:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
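
The two range queries above differ in their relation: the first asks for within, while the second omits the parameter and therefore uses the default, intersects. The Python sketch below models the three relations on closed intervals. It is a simplification under stated assumptions: relation_holds is a hypothetical helper, date-only strings are treated as midnight, and Elasticsearch's date rounding is not modelled.

```python
from datetime import datetime


def relation_holds(doc, query, relation="intersects"):
    """Compare a stored range with a query range, both given as (low, high)."""
    d_lo, d_hi = doc
    q_lo, q_hi = query
    if relation == "intersects":   # the ranges share at least one point
        return d_lo <= q_hi and q_lo <= d_hi
    if relation == "within":       # stored range lies entirely inside the query
        return q_lo <= d_lo and d_hi <= q_hi
    if relation == "contains":     # stored range entirely covers the query
        return d_lo <= q_lo and q_hi <= d_hi
    raise ValueError(f"unknown relation: {relation}")


time_frame = (datetime(2015, 10, 31, 12), datetime(2015, 11, 1))

first_query = (datetime(2015, 10, 31), datetime(2015, 11, 1))
second_query = (datetime(2017, 10, 31), datetime(2018, 11, 1))

print(relation_holds(time_frame, first_query, "within"))  # True
print(relation_holds(time_frame, second_query))           # False
```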

Aggregating on range data types

In this section, we demonstrate aggregations on range fields. This is a new feature in the Elasticsearch 7.4 release.

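Aggregating on range fields makes it easier to count the ranges that overlap a given bucket. For example, a date histogram aggregation on a range field lets users count the phone calls that took place during a particular minute, or the employees who were on vacation on a given date.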

Preparing the data

We reuse the sports data set from earlier examples. First, we create the index and its mapping:

PUT sports
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "birthdate": {
        "type": "date",
        "format": "date_optional_time"
      },
      "goals": {
        "type": "integer"
      },
      "location": {
        "type": "geo_point"
      },
      "name": {
        "type": "keyword"
      },
      "rating": {
        "type": "integer"
      },
      "role": {
        "type": "keyword"
      },
      "score_weight": {
        "type": "float"
      },
      "sport": {
        "type": "keyword"
      },
      "age_range": {
        "type": "integer_range"
      }
    }
  }
}

Note the age_range field above: its type is integer_range. We use the Bulk API provided by Elasticsearch to import the following data:

POST _bulk
{"index":{"_index":"sports"}}
{"name":"Michael", "birthdate":"1989-10-1", "sport":"Football", "rating": ["5", "4"],  "location":"46.22,-68.45","goals": "43","score_weight":"3","role":"midfielder","age": 30, "age_range": {"gte": 27, "lte": 30}  }
{"index":{"_index":"sports"}}
{"name":"Bob", "birthdate":"1989-11-2", "sport":"Football", "rating": ["3", "4"],  "location":"45.21,-68.35", "goals": "54","score_weight":"2", "role":"forward", "age": 30, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Jim", "birthdate":"1988-10-3", "sport":"Football", "rating": ["3", "2"],  "location":"45.16,-63.58", "goals": "73", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Joe", "birthdate":"1992-5-20", "sport":"Basketball", "rating": ["4", "3"],  "location":"45.22,-68.53", "goals": "848", "score_weight":"3", "role":"midfielder", "age": 27, "age_range": {"gte": 27, "lte": 30}  }
{"index":{"_index":"sports"}}
{"name":"Tim", "birthdate":"1992-2-28", "sport":"Basketball", "rating": ["3", "3"],  "location":"46.22,-68.85", "goals": "942", "score_weight":"2","role":"forward", "age": 27, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Alfred", "birthdate":"1990-9-9", "sport":"Football", "rating": ["2", "2"],  "location":"45.12,-68.35", "goals": "53", "score_weight":"4", "role":"defender", "age": 29, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Jeff", "birthdate":"1990-4-1", "sport":"Hockey", "rating": ["2", "3"], "location":"46.12,-68.55", "goals": "93","score_weight":"3","role":"midfielder", "age": 29, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Will", "birthdate":"1988-3-1", "sport":"Hockey", "rating": ["4", "4"], "location":"46.25,-84.25", "goals": "124", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Mick", "birthdate":"1989-10-1", "sport":"Football", "rating": ["3", "4"],  "location":"46.22,-68.45","goals": "56","score_weight":"3", "role":"midfielder", "age": 30, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Pong", "birthdate":"1989-11-2", "sport":"Basketball", "rating": ["1", "3"],  "location":"45.21,-68.35","goals": "1483","score_weight":"2", "role":"forward", "age": 30, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Ray", "birthdate":"1988-10-3", "sport":"Football", "rating": ["2", "2"],  "location":"45.16,-63.58","goals": "84", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Ping", "birthdate":"1992-5-20", "sport":"Basketball", "rating": ["4", "3"],  "location":"45.22,-68.53","goals": "1328", "score_weight":"2", "role":"forward", "age": 27, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Duke", "birthdate":"1992-2-28", "sport":"Hockey", "rating": ["5", "2"],  "location":"46.22,-68.85", "goals": "218", "score_weight":"2", "role":"forward", "age": 27, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Hal", "birthdate":"1990-9-9", "sport":"Hockey", "rating": ["4", "2"],  "location":"45.12,-68.35","goals": "148", "score_weight":"3", "role":"midfielder", "age": 29, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Charge", "birthdate":"1990-4-1", "sport":"Football", "rating": ["3", "2"], "location":"44.19,-82.55","goals": "34", "score_weight":"4", "role":"defender", "age": 29, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Barry", "birthdate":"1988-3-1", "sport":"Football", "rating": ["5", "2"], "location":"36.45,-79.15", "score_weight":"4", "role":"defender", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Bank", "birthdate":"1988-3-1", "sport":"Handball", "rating": ["6", "4"], "location":"46.25,-54.53", "goals": "150", "score_weight":"4", "role":"defender", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Bingo", "birthdate":"1988-3-1", "sport":"Handball", "rating": ["10", "7"], "location":"46.25,-68.55", "goals": "143", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"James", "birthdate":"1988-3-1", "sport":"Basketball", "rating": ["10", "8"], "location":"41.25,-69.55", "goals": "1284", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Wayne", "birthdate":"1988-3-1", "sport":"Hockey", "rating": ["10", "10"], "location":"46.21,-68.55", "goals": "113", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Brady", "birthdate":"1988-3-1", "sport":"Handball", "rating": ["10", "10"], "location":"63.24,-84.55", "goals": "443", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Lewis", "birthdate":"1988-3-1", "sport":"Football", "rating": ["10", "10"], "location":"56.25,-74.55", "goals": "49", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
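
The bulk payload above is newline-delimited JSON: each document takes two lines, an action line followed by the document source. As a rough illustration, such a body could be generated from Python dictionaries as sketched below. The build_bulk_body function is a hypothetical helper, not part of any Elasticsearch client, and the sketch only builds the string without sending it.

```python
import json


def build_bulk_body(index, documents):
    """Build a newline-delimited bulk body: one action line plus one source line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # A bulk body must end with a newline.
    return "\n".join(lines) + "\n"


athletes = [
    {"name": "Michael", "age": 30, "age_range": {"gte": 27, "lte": 30}},
    {"name": "Jim", "age": 31, "age_range": {"gte": 31, "lte": 32}},
]

print(build_bulk_body("sports", athletes), end="")
```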

Note that the data defines two age groups, 27-30 and 31-32, stored in the age_range field. Below, we refer to them as range1 and range2, respectively.

First, let's run a histogram aggregation:

GET sports/_search
{
  "size": 0,
  "aggs": {
    "age_distogram": {
      "histogram": {
        "field": "age",
        "interval": 1
      }
    }
  }
}

This builds a histogram on the age field, showing how ages are distributed. The result is:

  "aggregations" : {
    "age_distogram" : {
      "buckets" : [
        {
          "key" : 27.0,
          "doc_count" : 4
        },
        {
          "key" : 28.0,
          "doc_count" : 0
        },
        {
          "key" : 29.0,
          "doc_count" : 4
        },
        {
          "key" : 30.0,
          "doc_count" : 4
        },
        {
          "key" : 31.0,
          "doc_count" : 10
        }
      ]
    }
  }

The same distribution can also be charted in Kibana (the chart is not reproduced here); it shows how the document count is distributed across ages.
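
For a plain numeric field, each value lands in exactly one bucket, whose key is the value rounded down to a multiple of the interval; empty buckets between the smallest and largest key are reported with a count of 0. The Python sketch below reproduces the bucket counts shown above from the age values in the bulk data. The histogram function is a hypothetical re-implementation for illustration, not Elasticsearch code.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Bucket each value under floor(value / interval) * interval, filling gaps with 0."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    if not counts:
        return []
    buckets = []
    key = min(counts)
    while key <= max(counts):
        buckets.append((float(key), counts.get(key, 0)))
        key += interval
    return buckets


# The "age" values of the 22 documents, in the order they appear in the bulk data.
ages = [30, 30, 31, 27, 27, 29, 29, 31, 30, 30, 31, 27, 27, 29, 29] + [31] * 7

print(histogram(ages, 1))
# [(27.0, 4), (28.0, 0), (29.0, 4), (30.0, 4), (31.0, 10)]
```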

Let's take a closer look at one of our documents:

        "_source" : {
          "name" : "Michael",
          "birthdate" : "1989-10-1",
          "sport" : "Football",
          "rating" : [
            "5",
            "4"
          ],
          "location" : "46.22,-68.45",
          "goals" : "43",
          "score_weight" : "3",
          "role" : "midfielder",
          "age" : 30,
          "age_range" : {
            "gte" : 27,
            "lte" : 30
          }
        }

The document contains a field called age_range, which defines the age group the athlete belongs to. We can use this field to compute statistics over our data:

GET sports/_search
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age_range",
        "interval": 3
      }
    }
  }
}

Here, we aggregate on age_range. The result is:

  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 27.0,
          "doc_count" : 12
        },
        {
          "key" : 30.0,
          "doc_count" : 22
        }
      ]
    }
  }

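Two buckets are returned. With an interval of 3, the bucket with key 27 covers ages 27 up to (but not including) 30. Its doc_count is 12 because it overlaps every document in range1 (27-30), and there are 12 of those. The bucket with key 30 has a doc_count of 22, which is the total number of documents. Why is that?

The bucket with key 30 covers ages 30 up to (but not including) 33, so it overlaps both age groups: range1 (27-30) through its upper bound of 30, and range2 (31-32) entirely. A document is counted in every bucket that its range overlaps, so this bucket counts the 12 documents of range1 plus the 10 documents of range2, which is every document in the index.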

Now let's set the interval to 2 and look at the result again:

GET sports/_search
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age_range",
        "interval": 2
      }
    }
  }
}

The result is:

  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 26.0,
          "doc_count" : 12
        },
        {
          "key" : 28.0,
          "doc_count" : 12
        },
        {
          "key" : 30.0,
          "doc_count" : 22
        },
        {
          "key" : 32.0,
          "doc_count" : 10
        }
      ]
    }
  }

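The first bucket, with key 26, covers ages 26-27. Because 27 falls in range1 (27-30), which has 12 documents, its count is 12. Likewise, the bucket with key 28 covers 28-29, which lies entirely inside range1, so its count is also 12. The bucket with key 30 covers 30-31: 30 is in range1 and 31 is in range2 (31-32), so its count is the sum of the two groups, 22. The bucket with key 32 covers 32-33; 32 is in range2, which has only 10 documents, so this bucket's count is 10.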
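
The counting rule behind both range histograms above can be sketched in a few lines of Python: a stored range is counted once in every bucket it overlaps. The range_histogram function is a hypothetical re-implementation for illustration, assuming closed integer ranges [gte, lte] and no offset.

```python
import math
from collections import Counter


def range_histogram(ranges, interval):
    """Count, for each bucket, how many closed ranges (gte, lte) overlap it."""
    counts = Counter()
    for gte, lte in ranges:
        first = math.floor(gte / interval) * interval  # bucket holding the lower bound
        last = math.floor(lte / interval) * interval   # bucket holding the upper bound
        key = first
        while key <= last:
            counts[key] += 1
            key += interval
    return sorted(counts.items())


# 12 documents with age_range 27-30 and 10 documents with age_range 31-32.
ranges = [(27, 30)] * 12 + [(31, 32)] * 10

print(range_histogram(ranges, 3))  # [(27, 12), (30, 22)]
print(range_histogram(ranges, 2))  # [(26, 12), (28, 12), (30, 22), (32, 10)]
```

Because one document can fall into several buckets, the bucket counts can add up to more than the number of documents, as they do here.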

Another example

We want to use range types to model stock tick values defined by a low and a high price and by a trading time frame. To do so, follow these steps:

To store our stock data, we need an index that contains range fields. Let's use the following mapping:

PUT test-range
{
  "mappings": {
    "properties": {
      "price": {
        "type": "float_range"
      },
      "timeframe": {
        "type": "date_range"
      }
    }
  }
}

Now we can store some documents, as follows:

PUT test-range/_bulk
{"index":{"_index":"test-range","_id":"1"}}
{"price":{"gte":1.5,"lt":3.2},"timeframe":{"gte":"2022-01-01T12:00:00","lt":"2022-01-01T12:00:01"}}
{"index":{"_index":"test-range","_id":"2"}}
{"price":{"gte":1.7,"lt":3.7},"timeframe":{"gte":"2022-01-01T12:00:01","lt":"2022-01-01T12:00:02"}}
{"index":{"_index":"test-range","_id":"3"}}
{"price":{"gte":1.3,"lt":3.3},"timeframe":{"gte":"2022-01-01T12:00:02","lt":"2022-01-01T12:00:03"}}

Now we can run a query that filters on both the price and the time frame, to check that the data was indexed correctly:

GET test-range/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "price": {
              "value": 2.4
            }
          }
        },
        {
          "term": {
            "timeframe": {
              "value": "2022-01-01T12:00:02"
            }
          }
        }
      ]
    }
  }
}

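Only document 3 should match this query: all three price ranges contain 2.4, but it is the only document whose time frame contains 2022-01-01T12:00:02. For reference, the earlier _bulk request returns a response similar to the following, confirming that all three documents were created: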

{
  "took" : 45,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test-range",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test-range",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test-range",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
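
To double-check which documents the filter query on price and timeframe should match, the following Python sketch evaluates the same two conditions against the three stored ranges. It is an in-memory illustration only: in_range, ts and search are hypothetical helpers, and the ranges are half-open (gte inclusive, lt exclusive), as in the indexed documents.

```python
from datetime import datetime


def ts(text):
    """Parse an ISO 8601 timestamp such as 2022-01-01T12:00:00."""
    return datetime.fromisoformat(text)


def in_range(bounds, value):
    """True if gte <= value < lt."""
    return bounds["gte"] <= value < bounds["lt"]


docs = {
    "1": {"price": {"gte": 1.5, "lt": 3.2},
          "timeframe": {"gte": ts("2022-01-01T12:00:00"), "lt": ts("2022-01-01T12:00:01")}},
    "2": {"price": {"gte": 1.7, "lt": 3.7},
          "timeframe": {"gte": ts("2022-01-01T12:00:01"), "lt": ts("2022-01-01T12:00:02")}},
    "3": {"price": {"gte": 1.3, "lt": 3.3},
          "timeframe": {"gte": ts("2022-01-01T12:00:02"), "lt": ts("2022-01-01T12:00:03")}},
}


def search(price, moment):
    """Return the ids of documents whose price and timeframe both contain the given values."""
    return [doc_id for doc_id, doc in docs.items()
            if in_range(doc["price"], price) and in_range(doc["timeframe"], moment)]


print(search(2.4, ts("2022-01-01T12:00:02")))  # ['3']
```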