Testing date_histogram aggregation on timestamp (long) fields in Elasticsearch with Python

Older versions of Elasticsearch did not support the date type, so our earlier storage design (on version 5.0) stored times as long timestamps.

 

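Once the newer ES version (6.0) supported the date_histogram aggregation, I found that its interval parameter is quite flexible and accepts a range of time units, as listed below: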

Here are the valid time specifications and their meanings:

milliseconds (ms)

Fixed length interval; supports multiples.

seconds (s)

1000 milliseconds; fixed length interval (except for the last second of a minute that contains a leap-second, which is 2000ms long); supports multiples.

minutes (m)

All minutes begin at 00 seconds.

  • One minute (1m) is the interval between 00 seconds of the first minute and 00 seconds of the following minute in the specified timezone, compensating for any intervening leap seconds, so that the number of minutes and seconds past the hour is the same at the start and end.
  • Multiple minutes (nm) are intervals of exactly 60x1000=60,000 milliseconds each.

hours (h)

All hours begin at 00 minutes and 00 seconds.

  • One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00 minutes of the following hour in the specified timezone, compensating for any intervening leap seconds, so that the number of minutes and seconds past the hour is the same at the start and end.
  • Multiple hours (nh) are intervals of exactly 60x60x1000=3,600,000 milliseconds each.

days (d)

All days begin at the earliest possible time, which is usually 00:00:00 (midnight).

  • One day (1d) is the interval between the start of the day and the start of the following day in the specified timezone, compensating for any intervening time changes.
  • Multiple days (nd) are intervals of exactly 24x60x60x1000=86,400,000 milliseconds each.

weeks (w)

  • One week (1w) is the interval between the start day_of_week:hour:minute:second and the same day of the week and time of the following week in the specified timezone.
  • Multiple weeks (nw) are not supported.

months (M)

  • One month (1M) is the interval between the start day of the month and time of day and the same day of the month and time of the following month in the specified timezone, so that the day of the month and time of day are the same at the start and end.
  • Multiple months (nM) are not supported.

quarters (q)

  • One quarter (1q) is the interval between the start day of the month and time of day and the same day of the month and time of day three months later, so that the day of the month and time of day are the same at the start and end.
  • Multiple quarters (nq) are not supported.

years (y)

  • One year (1y) is the interval between the start day of the month and time of day and the same day of the month and time of day the following year in the specified timezone, so that the date and time are the same at the start and end.
  • Multiple years (ny) are not supported.

 

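The open question was how to run date_histogram on the timestamp fields left over from the old version. Many posts online claim it cannot be used on them directly, and setting interval to a fixed number of seconds cannot express a week or a month.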
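To make the limitation of fixed intervals concrete, here is a minimal sketch using only the standard library. The helper name `month_length_ms` is made up for illustration and is not part of Elasticsearch. It shows that calendar months differ in length when measured in milliseconds, so no single fixed interval can line up with month boundaries.

```python
from datetime import datetime, timezone


def month_length_ms(year, month):
    """Return the length of a calendar month in milliseconds (UTC)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the following month, rolling over to January after December.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return int((end - start).total_seconds() * 1000)


if __name__ == "__main__":
    # February, April and July 2018 all have different lengths.
    for m in (2, 4, 7):
        print(2018, m, month_length_ms(2018, m))
```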

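Testing showed otherwise: ES treats the long values as epoch milliseconds and buckets them by calendar interval without any extra work. The test scripts follow: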

(1) Write to ES, storing the timestamp as a long


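import datetime
import time
from random import randint

from elasticsearch import Elasticsearch, helpers


def WriteES():
    '''Write test documents to ES, one per day counting back from today.'''
    es = Elasticsearch()

    base = datetime.datetime.today()
    numdays = 100

    actions = []
    # j runs from 0 to numdays inclusive, so 101 documents are written
    for j in range(numdays + 1):
        d1 = base - datetime.timedelta(days=j)
        # local time -> epoch milliseconds, stored as a long
        ts = int(time.mktime(d1.timetuple()) * 1000)
        action = {
            "_index": "tickets",
            "_type": "last",
            "_id": j,
            "_source": {
                "count": randint(0, 1000),
                "timestamp": ts
                }
            }
        actions.append(action)

    helpers.bulk(es, actions)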
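The timestamp conversion is the step most likely to go wrong, so here is a standalone sketch of it. The helper names `to_epoch_millis` and `from_epoch_millis` are illustrative only. Like the script above, it treats a naive `datetime` as local time and drops the sub-second part.

```python
import time
from datetime import datetime


def to_epoch_millis(dt):
    """Convert a naive local datetime to epoch milliseconds (whole seconds only)."""
    return int(time.mktime(dt.timetuple()) * 1000)


def from_epoch_millis(ms):
    """Convert epoch milliseconds back to a naive local datetime."""
    return datetime.fromtimestamp(ms // 1000)


if __name__ == "__main__":
    now = datetime.now().replace(microsecond=0)
    ms = to_epoch_millis(now)
    print(ms, from_epoch_millis(ms))
```

Because both directions go through local time, the round trip is stable on one machine, but the stored value depends on the timezone of the machine that wrote it.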

(2) Aggregation test:

import datetime

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search


def AggES():
    client = Elasticsearch()

    s = Search(using=client)
    # weekly buckets on the long timestamp field, summing count within each bucket
    s.aggs.bucket('per_tag', 'date_histogram', field='timestamp', interval='week') \
        .metric('clicks_per_day', 'sum', field='count')

    response = s.execute()

    print('Query results')
    for hit in response:
        st = datetime.datetime.fromtimestamp(hit.timestamp//1000).strftime('%Y-%m-%d %H:%M:%S')
        print(hit.meta.score, hit.count, st)

    print('Aggregation results')
    for tag in response.aggregations.per_tag.buckets:
        # bucket keys are epoch milliseconds
        st = datetime.datetime.fromtimestamp(tag.key//1000).strftime('%Y-%m-%d %H:%M:%S')
        print(st, tag.clicks_per_day.value)
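For reference, the aggregation above should correspond to roughly the following request body. This is a hand-built sketch, not output captured from the client. `build_histogram_body` is a hypothetical helper, and the `interval` key follows the 6.x usage in this post; newer Elasticsearch releases split it into `calendar_interval` and `fixed_interval`.

```python
import json


def build_histogram_body(field, interval, metric_field):
    """Build a date_histogram request body with a nested sum metric."""
    return {
        "aggs": {
            "per_tag": {
                "date_histogram": {"field": field, "interval": interval},
                # sub-aggregation evaluated inside every date bucket
                "aggs": {
                    "clicks_per_day": {"sum": {"field": metric_field}}
                },
            }
        }
    }


if __name__ == "__main__":
    body = build_histogram_body("timestamp", "week", "count")
    print(json.dumps(body, indent=2))
```

With the low-level client, a body like this could presumably be sent via `es.search(index="tickets", body=body)`.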

(3) Printed output. The weekly statistics come out directly:

Query results
1.0 720 2018-11-06 16:44:03
1.0 438 2018-10-23 16:44:03
1.0 403 2018-10-18 16:44:03
1.0 113 2018-10-15 16:44:03
1.0 503 2018-10-13 16:44:03
1.0 928 2018-10-12 16:44:03
1.0 89 2018-10-11 16:44:03
1.0 590 2018-10-08 16:44:03
1.0 854 2018-09-27 16:44:03
1.0 846 2018-09-26 16:44:03
Aggregation results
2018-07-23 08:00:00 618.0
2018-07-30 08:00:00 3657.0
2018-08-06 08:00:00 4519.0
2018-08-13 08:00:00 3609.0
2018-08-20 08:00:00 3204.0
2018-08-27 08:00:00 3378.0
2018-09-03 08:00:00 3365.0
2018-09-10 08:00:00 4609.0
2018-09-17 08:00:00 3594.0
2018-09-24 08:00:00 3918.0
2018-10-01 08:00:00 3098.0
2018-10-08 08:00:00 4251.0
2018-10-15 08:00:00 3235.0
2018-10-22 08:00:00 2689.0
2018-10-29 08:00:00 4493.0
2018-11-05 08:00:00 1254.0
work done!
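The `08:00:00` on every bucket is a display effect, not the real bucket boundary. Without a `time_zone` setting, date_histogram rounds bucket keys in UTC, and the script formats them in local time. The sketch below assumes the test machine ran in UTC+8, which is what the output suggests; `format_bucket_key` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone


def format_bucket_key(key_ms, tz):
    """Format an epoch-millisecond bucket key in the given timezone."""
    return datetime.fromtimestamp(key_ms // 1000, tz=tz).strftime('%Y-%m-%d %H:%M:%S')


if __name__ == "__main__":
    # Monday 2018-07-23 00:00:00 UTC, the first weekly bucket in the output above.
    key_ms = int(datetime(2018, 7, 23, tzinfo=timezone.utc).timestamp() * 1000)
    print(format_bucket_key(key_ms, timezone.utc))
    # Assumed local timezone of the test machine: UTC+8.
    print(format_bucket_key(key_ms, timezone(timedelta(hours=8))))
```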
 

(4) Monthly statistics: only the interval setting needs to change

 interval='month'

Aggregation results
2018-07-01 08:00:00 2162.0
2018-08-01 08:00:00 15719.0
2018-09-01 08:00:00 16590.0
2018-10-01 08:00:00 15752.0
2018-11-01 08:00:00 3268.0
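One way to cross-check results like these without touching ES is to bucket the same records locally. The following is a sketch under stated assumptions: records are dicts with `timestamp` (epoch milliseconds) and `count` as in the write script, buckets are rounded in UTC, and weeks start on Monday. `bucket_start` and `sum_by_bucket` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_start(ts_ms, interval):
    """Floor an epoch-millisecond timestamp to the start of its UTC week or month."""
    dt = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)
    day = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if interval == "week":
        start = day - timedelta(days=day.weekday())  # Monday is weekday 0
    elif interval == "month":
        start = day.replace(day=1)
    else:
        raise ValueError("unsupported interval: %s" % interval)
    return int(start.timestamp() * 1000)


def sum_by_bucket(records, interval):
    """Sum `count` per bucket, returning {bucket_start_ms: total} sorted by key."""
    totals = defaultdict(int)
    for rec in records:
        totals[bucket_start(rec["timestamp"], interval)] += rec["count"]
    return dict(sorted(totals.items()))
```

Feeding it the documents passed to `helpers.bulk` and comparing the totals against the bucket values from ES gives a quick sanity check on the aggregation.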
