This article is based on Elasticsearch 7.x.
Before reading this post, first take a look at "Elasticsearch Full-Text Search: Basic Syntax API".
REST API
Elasticsearch provides three pagination APIs:
- from/size
- search after
- scroll
search after and scroll exist to solve the performance problems of deep pagination.
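The deep pagination problem in distributed systems
Elasticsearch is a distributed system: data is stored in shards spread across different machines, and a paginated query sorts results by relevance score (descending) by default. For a paginated query with from=990, size=10:
- Each shard first collects its own top 1000 documents (from + size)
- The coordinating node gathers the results from all shards
- It then sorts the merged results to find the global top 1000 documents, discards the first 990, and returns the remaining 10
The deeper the page, the more memory is used. To limit the memory cost of deep pagination, Elasticsearch by default caps from + size at 10000 (index.max_result_window).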
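To make the cost concrete, here is a minimal Python sketch that simulates the three steps above in memory. It does not talk to Elasticsearch; the shard count, document count, random scores, and the function names `shard_top_hits` and `coordinate` are all assumptions made up for illustration.

```python
import heapq
import random

def shard_top_hits(shard_docs, from_, size):
    """Each shard returns its own top (from + size) hits, best score first."""
    return heapq.nlargest(from_ + size, shard_docs, key=lambda d: d["score"])

def coordinate(shards, from_, size):
    """Merge per-shard hits, sort them globally, and cut out the requested page."""
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_top_hits(shard_docs, from_, size))
    candidates.sort(key=lambda d: d["score"], reverse=True)
    page = candidates[from_:from_ + size]
    # The second value is how many hits the coordinating node had to buffer.
    return page, len(candidates)

random.seed(0)
# Hypothetical cluster: 5 shards holding 2000 documents each.
shards = [
    [{"id": f"s{s}-d{i}", "score": random.random()} for i in range(2000)]
    for s in range(5)
]

page, buffered = coordinate(shards, from_=990, size=10)
print(len(page), buffered)  # 10 5000
```

In this simulation the coordinating node buffers shards x (from + size) hits just to return 10 documents, which is why the buffer grows with page depth.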
Adding sample data for the searches
POST /blogs/_bulk
{"index": {}}
{"post_date": "2020-01-01", "title": "Quick brown rabbits", "content": "Brown rabbits are commonly seen.", "author_id": 11401}
{"index": {}}
{"post_date": "2020-01-02", "title": "Keeping pets healthy", "content": "My quick brown fox eats rabbits on a regular basis.", "author_id": 11402}
{"index": {}}
{"post_date": "2020-01-03", "title": "My dog barks", "content": "I see a lot of barking dogs on the road.", "author_id": 11403}
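If you send the same data from a script instead of the Kibana console, the bulk body has to be newline-delimited JSON: one action line, one document line, and a trailing newline at the end. A minimal sketch of building that body, where `build_bulk_body` is a hypothetical helper name and the HTTP call itself is left out:

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON bulk body: an {"index": {}} action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"post_date": "2020-01-01", "title": "Quick brown rabbits",
     "content": "Brown rabbits are commonly seen.", "author_id": 11401},
    {"post_date": "2020-01-02", "title": "Keeping pets healthy",
     "content": "My quick brown fox eats rabbits on a regular basis.", "author_id": 11402},
    {"post_date": "2020-01-03", "title": "My dog barks",
     "content": "I see a lot of barking dogs on the road.", "author_id": 11403},
]

body = build_bulk_body(docs)
print(body, end="")
```

The resulting string would be posted to /blogs/_bulk with the Content-Type header set to application/x-ndjson.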
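from/size
from + size must not exceed 10000 (the default index.max_result_window); a request that goes beyond this limit is rejected.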
GET /blogs/_search
{
  "query": {
    "match": {
      "content": "rabbits"
    }
  },
  "sort": [
    {
      "author_id": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 2
}
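Applications usually work with page numbers rather than raw offsets. The sketch below turns a 1-based page number into a from/size request body and fails early when the window would exceed the limit. `page_request` and `MAX_RESULT_WINDOW` are hypothetical names, and the value 10000 assumes the index still uses the default index.max_result_window.

```python
MAX_RESULT_WINDOW = 10000  # assumed: the default index.max_result_window

def page_request(query, page, page_size, sort=None, max_window=MAX_RESULT_WINDOW):
    """Build a from/size search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    from_ = (page - 1) * page_size
    if from_ + page_size > max_window:
        # Elasticsearch would reject this request, so fail before sending it.
        raise ValueError(
            f"from + size = {from_ + page_size} exceeds the result window {max_window}"
        )
    body = {"query": query, "from": from_, "size": page_size}
    if sort:
        body["sort"] = sort
    return body

query = {"match": {"content": "rabbits"}}
sort = [{"author_id": {"order": "desc"}}]

first_page = page_request(query, page=1, page_size=2, sort=sort)
print(first_page["from"], first_page["size"])  # 0 2
```

With a page size of 10, page 1000 is the last page this helper allows (from=9990); page 1001 raises ValueError.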
search after
search after fetches the next page of documents in real time. It does not support jumping to a given page (from), so you can only page forward.
(1) Syntax
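The forward-only behaviour can be pictured as a cursor: each request passes the sort values of the last hit on the previous page, and only documents that sort strictly after those values are returned. Below is a minimal in-memory sketch of that idea, not the Elasticsearch implementation. The `doc_id` field, the sample documents, and the function name are assumptions; the second sort key acts as a unique tiebreaker so that documents with the same author_id are neither skipped nor repeated.

```python
def search_after(docs, size, after=None):
    """Return the next `size` docs ordered by (author_id desc, doc_id asc)."""
    def sort_key(doc):
        return (-doc["author_id"], doc["doc_id"])

    ordered = sorted(docs, key=sort_key)
    if after is not None:
        after_key = (-after[0], after[1])
        # Keep only documents that sort strictly after the cursor.
        ordered = [d for d in ordered if sort_key(d) > after_key]
    hits = ordered[:size]
    # The sort values of the last hit become the cursor for the next request.
    next_after = [hits[-1]["author_id"], hits[-1]["doc_id"]] if hits else None
    return hits, next_after

# Hypothetical documents; two of them share the same author_id.
docs = [
    {"doc_id": "a", "author_id": 11401},
    {"doc_id": "b", "author_id": 11402},
    {"doc_id": "c", "author_id": 11403},
    {"doc_id": "d", "author_id": 11402},
]

seen = []
cursor = None
while True:
    hits, cursor = search_after(docs, size=2, after=cursor)
    if not hits:
        break
    seen.extend(d["doc_id"] for d in hits)

print(seen)  # ['c', 'b', 'd', 'a']
```

Because the cursor only encodes where the previous page ended, there is no way to jump straight to an arbitrary page; you have to walk there page by page.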