Reading Large Amounts of Data with scroll in Python

小木树

Category: python  Tags: using scroll in ES, reading more than 10000 records from ES

Article link: https://blog.csdn.net/weixin_42000303/article/details/103396673

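This article shows how to use the Elasticsearch (ES) scroll API from Python to read more than 10000 documents and save them to a JSON file, so that large result sets can be retrieved and stored.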

Summary generated automatically by CSDN.

Using Python with ES: read a large amount of data and write it to a JSON file

# -*- coding: utf-8 -*-
import json
from elasticsearch import Elasticsearch

HOST_PORT  = [{"host": "xxx.xx.xx.xx", "port": 9200}]  # ES host IP and port
TIME_OUT   = 20  # request timeout in seconds
INDEX_NAME = "test_2019.11.28"  # name of the ES index to query
#INDEX_NAME = "test_*"  # index pattern: matches every index whose name starts with test_

es = Elasticsearch(HOST_PORT, timeout=TIME_OUT)
json_file_name = "test"  # name of the JSON file to generate

def func_name():
    # body is your ES query
    body = {
        "query": {
            ...
        },
        "aggs": {
            ...
        },
        "size": 10000  # fetch 10000 documents per batch
    }
    try:
        res = es.search(index=INDEX_NAME, body=body, scroll="2m")
        sid = res["_scroll_id"]
        scroll_size = len(res["hits"]["hits"])
        data_handle(res["hits"]["hits"])
        while scroll_size  # the source listing is cut off here
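
The listing above stops at the start of the scroll loop, and the body of data_handle is not shown. The following is a minimal sketch of how such a loop is commonly completed; it is not the author's original code. It assumes a 7.x-style elasticsearch-py client (search, scroll and clear_scroll called with keyword arguments, as in the listing). The function name scroll_to_json_lines, the out_path parameter and the one-JSON-object-per-line output format are hypothetical choices made for illustration.

```python
import json


def scroll_to_json_lines(es, index, body, out_path, scroll="2m"):
    """Page through every hit with the scroll API and write each hit's
    _source to out_path as one JSON object per line.

    `es` is assumed to be an elasticsearch-py client (7.x-style calls).
    Returns the number of documents written.
    """
    total = 0
    # The first request opens the scroll context and returns the first batch.
    res = es.search(index=index, body=body, scroll=scroll)
    sid = res["_scroll_id"]
    hits = res["hits"]["hits"]
    try:
        with open(out_path, "w", encoding="utf-8") as fh:
            # An empty batch means the scroll is exhausted.
            while hits:
                for hit in hits:
                    fh.write(json.dumps(hit["_source"], ensure_ascii=False) + "\n")
                total += len(hits)
                # Fetch the next batch; the scroll id may change between calls.
                res = es.scroll(scroll_id=sid, scroll=scroll)
                sid = res["_scroll_id"]
                hits = res["hits"]["hits"]
    finally:
        # Release the server-side scroll context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=sid)
    return total
```

With the names from the listing, a call might look like scroll_to_json_lines(es, INDEX_NAME, body, json_file_name + ".json"). Writing each batch as it arrives keeps memory use bounded by the batch size. If the request body contains "aggs", only the response to the initial search carries the aggregation results, so this sketch, which only reads the hits, ignores them.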
