es scroll time: ElasticSearch scroll pagination query

weixin_39978282

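Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39978282/article/details/111839758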

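from + size

from + size cannot exceed 10000 (the default value of the index.max_result_window setting), so this approach only suits small result sets. Once a query needs to page past 10000 results, it no longer works.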
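
As a small illustration of that limit, the sketch below computes the from/size pair for a given page and rejects any page that would cross the window. The helper name `page_params` and the constant `MAX_RESULT_WINDOW` are hypothetical names introduced here, not part of Elasticsearch or its Python client; the value 10000 is assumed to be the unchanged default.

```python
MAX_RESULT_WINDOW = 10000  # assumed: the default index.max_result_window


def page_params(page, page_size, max_window=MAX_RESULT_WINDOW):
    """Return the from/size pair for a 1-based page number.

    Raises ValueError when from + size would exceed the result window.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > max_window:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds {max_window}; use scroll instead"
        )
    return {"from": offset, "size": page_size}


print(page_params(1, 100))    # first page: from=0, size=100
print(page_params(100, 100))  # last page that still fits: from=9900, size=100
try:
    page_params(101, 100)     # from=10000, size=100 is over the limit
except ValueError as exc:
    print("rejected:", exc)
```

The returned dictionary can be merged into a search body; anything deeper than the window has to go through a cursor-based method such as scroll.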

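Pagination with scroll_id

A scroll query walks through the results with a cursor, so there is no upper limit on how many results can be retrieved. In practice it is a pagination mechanism.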

from elasticsearch import Elasticsearch


class MyElastic:
    def __init__(self):
        # Connection arguments follow the elasticsearch-py 7.x client style.
        self.es = Elasticsearch(['192.168.199.32'], http_auth=('elastic', 'passwd'), port=9200)

    def query_by_ScrollId(self, index, body):
        with open('es_query_answer.txt', 'w') as fw:
            # The first search opens a scroll context (kept alive for 5 minutes) and returns the first page.
            # doc_type is omitted: mapping types are deprecated since Elasticsearch 7.
            res = self.es.search(index=index, scroll='5m', timeout='1m', size=1000, body=body)
            total = res["hits"]["total"]['value']
            print(f'Total records matching the query: {total}; fetching them with scroll:')
            scroll_id = res["_scroll_id"]
            hits = res['hits']['hits']
            cur_length = 0
            # Follow the cursor until a page comes back empty, rather than
            # deriving the number of pages from the reported total.
            while hits:
                for x in hits:  # write to the file
                    fw.write(x['_source']['name'])
                    fw.write('\n')
                cur_length += len(hits)
                print('Current:', cur_length)
                res = self.es.scroll(scroll_id=scroll_id, scroll='5m')
                scroll_id = res["_scroll_id"]  # the id may change between pages
                hits = res['hits']['hits']
            # Release the scroll context instead of waiting for it to expire.
            self.es.clear_scroll(scroll_id=scroll_id)


es = MyElastic()
body = {  # match: find documents whose name contains xxx
    "_source": ["tld.subdomain", "tld.domain", 'name'],  # fields to return
    "query": {
        "match": {
            "name": '.xyz'
        }
    }
}
es.query_by_ScrollId('fdns_a_2020-05', body)
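
The scroll loop above can also be written as a generator, which separates fetching from writing to a file. This is only a sketch under assumptions: `scroll_hits` is a hypothetical helper name, and `client` is assumed to expose `search`, `scroll` and `clear_scroll` with the keyword arguments used by the elasticsearch-py 7.x client. `FakeClient` is a made-up in-memory stand-in so the example runs without a cluster.

```python
def scroll_hits(client, index, body, size=1000, keep_alive="5m"):
    """Yield every hit for a query by following the scroll cursor."""
    res = client.search(index=index, body=body, scroll=keep_alive, size=size)
    scroll_id = res.get("_scroll_id")
    try:
        # An empty page means the cursor is exhausted.
        while res["hits"]["hits"]:
            yield from res["hits"]["hits"]
            res = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res.get("_scroll_id", scroll_id)
    finally:
        # Free the scroll context even if the caller stops early.
        if scroll_id is not None:
            client.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical stand-in for a real client, for illustration only."""

    def __init__(self, docs):
        self.docs = docs
        self.pos = 0
        self.size = 0
        self.cleared = False

    def _page(self):
        hits = [{"_source": d} for d in self.docs[self.pos:self.pos + self.size]]
        self.pos += len(hits)
        return {"_scroll_id": "fake-cursor", "hits": {"hits": hits}}

    def search(self, index, body, scroll, size):
        self.pos, self.size = 0, size
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


client = FakeClient([{"name": f"host{i}.xyz"} for i in range(2500)])
query = {"query": {"match": {"name": ".xyz"}}}
names = [hit["_source"]["name"] for hit in scroll_hits(client, "fdns_a_2020-05", query)]
print(len(names), client.cleared)  # 2500 True
```

With a real cluster, the `Elasticsearch` instance from the article would take the place of `FakeClient`, and each yielded hit could be written to the output file as before.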
