Elasticsearch脚本教程——计算日期生成分组

Elasticsearch脚本教程——计算日期生成分组

本文我们一起通过实例实战Elasticsearch的Painless脚本。

1. 概述

本文环境为Elasticsearch 7.x版本,主要使用Painless脚本。

需求说明:我们有日志数据,字段分别为name、start_date、close_date.现在需要计算两个日期之差并对结果进行分组。

日期差以分钟为单位:
* 0~30 分钟.
* 31~60 分钟.
* 大于1 小时.

日期差以小时为单位:
* 0~2 小时.
* 2~24 小时.
* 大于1 天.
*
日期差以天为单位:
* 0~3 天.
* 3~7 天.
* 大于1 周.

2. 准备数据

首先我们创建索引:

PUT logs
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "start_date": {
        "type": "date",
        "format": "date_optional_time"
      },
      "close_date": {
        "type": "date",
        "format": "date_optional_time"
      }
    }
  }
}

接着增加数据:

PUT logs/_bulk?refresh
{"index":{"_id":1}}
{"name":"Person AA","start_date":"2016-01-01T14:00:56","close_date":"2016-01-10T14:00:50"}
{"index":{"_id":2}}
{"name":"Person CC","start_date":"2016-01-09T15:01:52","close_date":"2016-01-19T15:45:41"}
{"index":{"_id":3}}
{"name":"Person BB","start_date":"2016-01-09T15:01:53","close_date":"2016-01-20T16:50:41"}
{"index":{"_id":4}}
{"name":"Person BB","start_date":"2016-01-09T15:01:53","close_date":"2016-01-19T16:00:41"}
{"index":{"_id":5}}
{"name":"Person CC","start_date":"2016-01-09T15:01:53","close_date":"2016-01-19T15:30:41"}
{"index":{"_id":6}}
{"name":"Person EE","start_date":"2016-01-09T15:01:54","close_date":"2016-01-09T17:00:41"}
{"index":{"_id":7}}
{"name":"Person EE","start_date":"2016-01-09T15:01:55","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":8}}
{"name":"Person CC","start_date":"2016-01-09T15:01:55","close_date":"2016-01-10T17:00:41"}
{"index":{"_id":9}}
{"name":"Person AA","start_date":"2016-01-09T15:01:56","close_date":"2016-01-19T15:30:41"}
{"index":{"_id":10}}
{"name":"Person BB","start_date":"2016-01-09T15:01:56","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":11}}
{"name":"Person BB","start_date":"2016-01-09T15:01:56","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":12}}
{"name":"Person BB","start_date":"2016-01-09T15:01:57","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":13}}
{"name":"Person AA","start_date":"2016-01-09T15:01:57","close_date":"2016-01-10T14:00:50"}
{"index":{"_id":14}}
{"name":"Person EE","start_date":"2016-01-09T15:01:58","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":15}}
{"name":"Person CC","start_date":"2016-01-09T15:01:58","close_date":"2016-01-10T14:00:50"}
{"index":{"_id":16}}
{"name":"Person BB","start_date":"2016-01-09T15:01:59","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":17}}
{"name":"Person AA","start_date":"2016-01-09T15:01:59","close_date":"2016-01-09T15:45:41"}
{"index":{"_id":18}}
{"name":"Person EE","start_date":"2016-01-09T15:02:00","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":19}}
{"name":"Person BB","start_date":"2016-01-09T15:02:00","close_date":"2016-01-10T14:00:50"}
{"index":{"_id":20}}
{"name":"Person CC","start_date":"2016-01-09T15:02:00","close_date":"2016-01-20T15:00:41"}
{"index":{"_id":21}}
{"name":"Person AA","start_date":"2016-01-09T15:02:01","close_date":"2016-01-19T15:30:41"}
{"index":{"_id":22}}
{"name":"Person BB","start_date":"2016-01-09T15:02:01","close_date":"2016-01-10T14:00:50"}
{"index":{"_id":23}}
{"name":"Person CC","start_date":"2016-01-09T15:02:02","close_date":"2016-02-01T15:00:41"}
{"index":{"_id":24}}
{"name":"Person AA","start_date":"2016-01-09T15:02:02","close_date":"2016-01-19T15:00:41"}
{"index":{"_id":25}}
{"name":"Person BB","start_date":"2016-01-09T15:02:03","close_date":"2016-01-09T20:00:41"}

准备好后,我们可以使用下列命令进行验证:

GET logs/_search

3. 生成范围分组

首先根据分钟单位进行分组:

GET logs/_search
{
  "size": 0,
  "aggs": {
    "groupby": {
      "range": {
        "script": {
          "source": "(doc['close_date'].value.toInstant().toEpochMilli() - doc['start_date'].value.toInstant().toEpochMilli()) / (3600000.0/60) "
        },
        "ranges": [
          {
            "from": 0.0,
            "to": 30.0,
            "key": "0~30分钟"
          },
          {
            "from": 30.0,
            "to": 60.0,
            "key": "30~60分钟"
          },
          {
            "from": 60.0,
            "key": "大于1小时"
          }
        ]
      }
    }
  }
}

以更可读方式看脚本:

( doc['close_date'].value.toInstant().toEpochMilli() - doc['start_date'].value.toInstant().toEpochMilli() ) / (3600000.0/60) 

doc[‘close_date’].value返回JodaCompatibleZonedDateTime类型,然后调用toInstant().toEpochMilli()方法获得毫秒数,为了转为分钟,需要除以(3600000.0/60)

后台对每个文档按照这种方式进行计算获取分钟数,接着对分钟数进行范围分组。最后需要说明的是范围分组包括from的值,不包括to的值,定义范围时需要考虑这点。

总结下思路:

  1. 计算日期差
  2. 根据需要格式化结果
  3. 对结果进行分组

接下来,按照小时进行分组:

GET logs/_search
{
  "size": 0,
  "aggs": {
    "groupbyHours": {
      "range": {
        "script": {
          "source": "(doc['close_date'].value.toInstant().toEpochMilli() - doc['start_date'].value.toInstant().toEpochMilli()) / (3600000.0) "
        },
        "ranges": [
          {
            "from": 0,
            "to": 2,
            "key": "0~2小时"
          },
          {
            "from": 2.0,
            "to": 24.0,
            "key": "2~24小时"
          },
          {
            "from": 24.0,
            "key": "大于1天"
          }
        ]
      }
    }
  }
}

最后按照天为单位进行分组:

GET logs/_search
{
  "size": 0,
  "aggs": {
    "groupbyHours": {
      "range": {
        "script": {
          "source": "(doc['close_date'].value.toInstant().toEpochMilli() - doc['start_date'].value.toInstant().toEpochMilli()) / (3600000.0*24) "
        },
        "ranges": [
          {
            "from": 0,
            "to": 3,
            "key": "0~3天"
          },
          {
            "from": 3.0,
            "to": 7.0,
            "key": "3~7天"
          },
          {
            "from": 7.0,
            "key": "大于1周"
          }
        ]
      }
    }
  }
}

我们也可以通过脚本实现条件查询。下面示例查询0~3天的数据:

GET logs/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": " ((doc['close_date'].value.toInstant().toEpochMilli() - doc['start_date'].value.toInstant().toEpochMilli()) / (3600000.0*24)) < 3.0 "
          }
        }
      }
    }
  }
}

这个示例与前面示例有两个差异:

  1. 这不是聚集查询,仅仅查询原始数据
  2. Painless脚本调整为返回boolean类型的表达式,即日期差小于3天。

4. 总结

本文通过示例介绍了Painless脚本实现复杂分组和查询。

  • 1
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 3
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值