ELK分页解决方案

最新推荐文章于 2024-06-12 10:23:28 发布

散_步

最新推荐文章于 2024-06-12 10:23:28 发布

阅读量379

点赞数

分类专栏： ELK

本文链接：https://blog.csdn.net/zhumengguang/article/details/112726268

版权

ELK 专栏收录该内容

12 篇文章 2 订阅

订阅专栏

本文介绍了Elasticsearch中的两种分页方式：size+from浅分页和scroll深分页。浅分页适合少量数据，但随着from增大，查询效率下降，可能导致内存不足。而scroll分页则通过维护快照，提供更高效的大量数据分页解决方案，避免多次读取磁盘。在处理大数据量时，scroll分页更为适用。

摘要由CSDN通过智能技术生成

1、导入数据

DELETE us
POST /_bulk
{ "create": { "_index": "us", "_type": "tweet", "_id": "1" }}
{ "email" : "john@smith.com", "name" : "John Smith", "username" : "@john" }
{ "create": { "_index": "us", "_type": "tweet", "_id": "2" }}
{ "email" : "mary@jones.com", "name" : "Mary Jones", "username" : "@mary" }
{ "create": { "_index": "us", "_type": "tweet", "_id": "3" }}
{ "date" : "2014-09-13", "name" : "Mary Jones", "tweet" : "Elasticsearch means full text search has never been so easy", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "4" }}
{ "date" : "2014-09-14", "name" : "John Smith", "tweet" : "@mary it is not just text, it does everything", "user_id" : 1 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "5" }}
{ "date" : "2014-09-15", "name" : "Mary Jones", "tweet" : "However did I manage before Elasticsearch?", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "6" }}
{ "date" : "2014-09-16", "name" : "John Smith", "tweet" : "The Elasticsearch API is really easy to use", "user_id" : 1 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "7" }}
{ "date" : "2014-09-17", "name" : "Mary Jones", "tweet" : "The Query DSL is really powerful and flexible", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "8" }}
{ "date" : "2014-09-18", "name" : "John Smith", "user_id" : 1 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "9" }}
{ "date" : "2014-09-19", "name" : "Mary Jones", "tweet" : "Geo-location aggregations are really cool", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "10" }}
{ "date" : "2014-09-20", "name" : "John Smith", "tweet" : "Elasticsearch surely is one of the hottest new NoSQL products", "user_id" : 1 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "11" }}
{ "date" : "2014-09-21", "name" : "Mary Jones", "tweet" : "Elasticsearch is built for the cloud, easy to scale", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "12" }}
{ "date" : "2014-09-22", "name" : "John Smith", "tweet" : "Elasticsearch and I have left the honeymoon stage, and I still love her.", "user_id" : 1 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "13" }}
{ "date" : "2014-09-23", "name" : "Mary Jones", "tweet" : "So yes, I am an Elasticsearch fanboy", "user_id" : 2 }
{ "create": { "_index": "us", "_type": "tweet", "_id": "14" }}
{ "date" : "2014-09-24", "name" : "John Smith", "tweet" : "How many more cheesy tweets do I have to write?", "user_id" : 1 }

2、size+from浅分页

按照一般的查询流程来说，如果我想查询前10条数据：

1 客户端请求发给某个节点
2 节点转发给个个分片，查询每个分片上的前10条
3 结果返回给节点，整合数据，提取前10条
4 返回给请求客户端

from定义了目标数据的偏移值，size定义当前返回的事件数目

GET /us/_search?pretty
{
"from" : 0 , "size" : 5
}

GET /us/_search?pretty
{
"from" : 5 , "size" : 5
}

这种浅分页只适合少量数据，因为随from增大，查询的时间就会越大，而且数据量越大，查询的效率指数下降

优点：from+size在数据量不大的情况下，效率比较高

缺点：在数据量非常大的情况下，from+size分页会把全部记录加载到内存中，这样做不但运行速递特别慢，而且容易让es出现内存不足而挂掉

scroll深分页

对于上面介绍的浅分页，当Elasticsearch响应请求时，它必须确定docs的顺序，排列响应结果。

如果请求的页数较少（假设每页20个docs）, Elasticsearch不会有什么问题，但是如果页数较大时，比如请求第20页，Elasticsearch不得不取出第1页到第20页的所有docs，再去除第1页到第19页的docs，得到第20页的docs。

解决的方式就是使用scroll，scroll就是维护了当前索引段的一份快照信息--缓存（这个快照信息是你执行这个scroll查询时的快照）。

可以把 scroll 分为初始化和遍历两步： 1、初始化时将所有符合搜索条件的搜索结果缓存起来，可以想象成快照； 2、遍历时，从这个快照里取数据；

初始化

GET us/_search?scroll=3m

{

"query": {"match_all": {}},

"size": 3

}

初始化的时候就像是普通的search一样

其中的scroll=3m代表当前查询的数据缓存3分钟

Size：3 代表当前查询3条数据

遍历

在遍历时候，拿到上一次遍历中的scrollid，然后带scroll参数，重复上一次的遍历步骤，知道返回的数据为空，就表示遍历完成

GET /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAPXFk0xN1BmSnlVUldhYThEdWVzZ19xbkEAAAAAAAAAIxZuQWVJU0VSZ1JzcVZtMGVYZ3RDaFlBAAAAAAAAA9oWTVZOdHJ2cXBSOU9wN3c1dk5vcWd4QQAAAAAAAAPYFk0xN1BmSnlVUldhYThEdWVzZ19xbkEAAAAAAAAAIhZuQWVJU0VSZ1JzcVZtMGVYZ3RDaFlB"
}

【注意】：每次都要传参数scroll，刷新搜索结果的缓存时间，另外不需要指定index和type（不要把缓存的时时间设置太长，占用内存）

对比

浅分页，每次查询都会去索引库（本地文件夹）中查询pageNum*page条数据，然后截取掉前面的数据，留下最后的数据。这样的操作在每个分片上都会执行，最后会将多个分片的数据合并到一起，再次排序，截取需要的。

深分页，可以一次性将所有满足查询条件的数据，都放到内存中。分页的时候，在内存中查询。相对浅分页，就可以避免多次读取磁盘。

散_步

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
ELK分页解决方案

1、导入数据DELETE usPOST /_bulk{ "create": { "_index": "us", "_type": "tweet", "_id": "1" }}{ "email" : "john@smith.com", "name" : "John Smith", "username" : "@john" }{ "create": { "_index": "us", "_type": "tweet", "_id": "2" }}{ "email" : "mary@jones.com",.
复制链接

扫一扫

专栏目录