Querying data beyond the first 10,000 hits in Elasticsearch

Latest recommended article published 2024-07-12 12:16:53

qq_32457341



Tags: elasticsearch

Article link: https://blog.csdn.net/qq_32457341/article/details/112967074


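This article explains how to query more data in Elasticsearch than the default limit allows. The first option is to raise the `max_result_window` setting, which is not recommended because it can consume excessive resources. The preferred option is to read the data in batches with a scroll cursor; a Python example shows how to use the scroll API to iterate over more than 10,000 documents, which is more efficient when handling large amounts of data.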

Summary generated by CSDN's AI.

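Elasticsearch is a distributed, RESTful search and analytics engine. Because a search is spread across shards, deep pagination is costly, so by default a query can only page through the first 10,000 hits (from + size may not exceed index.max_result_window, which defaults to 10000). To retrieve data beyond the first 10,000 hits, you can use either of the following two approaches:
(1) Raise the index's max_result_window setting, which controls the maximum result window;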

PUT request: http://127.0.0.1:9200/indexname/_settings
{
    "index": {
        "max_result_window": "100000"
    }
}

This raises the maximum number of hits an Elasticsearch query can page through; here it is set to 100,000.
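If you would rather send the same settings update from Python without extra dependencies, a minimal sketch using only the standard library might look like this. The function name `build_settings_request` is hypothetical, and the sketch assumes an unauthenticated node at http://127.0.0.1:9200 and the index name `indexname`, as in the request above. It only builds the request; the commented-out lines show how it could be sent.

```python
import json
import urllib.request


def build_settings_request(host: str, index: str, max_result_window: int) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request raising max_result_window."""
    body = {"index": {"max_result_window": max_result_window}}
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_settings_request("http://127.0.0.1:9200", "indexname", 100_000)
print(req.get_method(), req.full_url)

# To actually apply the setting against a running node (assumption: no auth):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))  # expect {'acknowledged': True} on success
```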
Note: raising max_result_window is not recommended. When it is very large, deep-paging requests can consume a lot of memory and CPU.
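The cost can be made concrete with a little arithmetic. This sketch assumes the usual query-then-fetch behaviour, where each shard returns its own top from + size hits and the coordinating node merges all of them to build a single page. The helper names and the 5-shard index are hypothetical, chosen only for illustration.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def exceeds_result_window(from_: int, size: int,
                          max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """True if a from/size page would be rejected because from + size > max_result_window."""
    return from_ + size > max_result_window


def coordinator_entries(from_: int, size: int, num_shards: int) -> int:
    """Entries the coordinating node must merge: every shard returns its top from + size hits."""
    return num_shards * (from_ + size)


# Last 100-hit page allowed by a 100,000 window, on a hypothetical 5-shard index
print(exceeds_result_window(99_900, 100))           # True: rejected under the default window
print(exceeds_result_window(99_900, 100, 100_000))  # False: allowed once the window is raised
print(coordinator_entries(99_900, 100, 5))          # 500000 entries merged to return 100 hits
```

Even though only 100 hits are returned, the coordinating node sorts half a million entries, which is where the memory and CPU cost of a large window comes from.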

(2) Use a scroll cursor (recommended)
The following Python code uses the elasticsearch client library:

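from elasticsearch import Elasticsearch

# The local service has no authentication configured
es = Elasticsearch("http://127.0.0.1:9200")
body = {
	"query": {
		"match_all": {}
	}
}

index = 'xxxxx'
# First request: returns the first 100 hits.
# scroll='5m' keeps the scroll context (the cursor) alive for 5 minutes between calls.
query = es.search(index=index, body=body, scroll='5m', size=100)

# Hits from the first request; later pages are appended to this list
hits = query['hits']['hits']
# Total number of matching documents
total = query['hits']['total']
if isinstance(total, dict):  # ES 7+ returns {"value": ..., "relation": ...}
	total = total['value']

# The scroll_id is needed to fetch the hits after the first 100
scroll_id = query['_scroll_id']

# Fetch the remaining hits 100 at a time until an empty page comes back
while True:
	resp = es.scroll(scroll_id=scroll_id, scroll='5m')
	page = resp['hits']['hits']
	if not page:
		break
	# Merge this page into the accumulated results
	hits.extend(page)
	# The scroll_id can change between calls, so always use the latest one
	scroll_id = resp['_scroll_id']

# Release the scroll context on the server once all data has been read
es.clear_scroll(scroll_id=scroll_id)
print('total hits:', total, 'fetched:', len(hits))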

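With this approach you can retrieve data beyond the first 10,000 hits, and the larger the data set, the bigger the benefit compared with raising max_result_window.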
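The scroll loop can also be wrapped in a reusable generator. The official Python client already ships `elasticsearch.helpers.scan` for this purpose; the sketch below is a hand-rolled illustration of the same idea, not that helper's implementation. `iter_scroll` and `FakeScrollClient` are hypothetical names, and the fake client only mimics the response fields the loop reads, so the example runs without a cluster. It keeps the article's `body=` argument; 8.x clients may prefer `query=` instead.

```python
from typing import Any, Dict, Iterator, List


def iter_scroll(client: Any, index: str, body: Dict[str, Any],
                size: int = 100, scroll: str = "5m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of a query, one scroll page at a time.

    `client` only needs search(), scroll() and clear_scroll() accepting the
    keyword arguments used below, as an elasticsearch.Elasticsearch instance does.
    """
    resp = client.search(index=index, body=body, scroll=scroll, size=size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = client.scroll(scroll_id=scroll_id, scroll=scroll)
            # Always continue with the most recent scroll_id
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the server-side scroll context, even if the caller stops early
        client.clear_scroll(scroll_id=scroll_id)


class FakeScrollClient:
    """Hypothetical in-memory stand-in returning scroll-shaped responses."""

    def __init__(self, docs: List[Dict[str, Any]]):
        self.docs = docs
        self.pos = 0
        self.size = 0
        self.cleared = False

    def _page(self) -> Dict[str, Any]:
        page = self.docs[self.pos:self.pos + self.size]
        self.pos += len(page)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": page}}

    def search(self, index, body, scroll, size):
        self.size = size
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


fake = FakeScrollClient([{"_id": str(i)} for i in range(250)])
ids = [hit["_id"] for hit in iter_scroll(fake, "xxxxx", {"query": {"match_all": {}}})]
print(len(ids), fake.cleared)  # 250 True
```

Putting `clear_scroll` in a `finally` block means the scroll context is released even when the consumer stops iterating early, instead of lingering until the 5-minute keep-alive expires.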