Deep pagination
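Motivation: in SQL, a paged query such as LIMIT 10000, 10 makes the database read 10010 rows and then discard the first 10000. Fixes: use a streaming query (fetch size), or order by id ascending and have each query select only rows whose id is greater than the largest id returned by the previous query.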
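The "id greater than the previous maximum" approach can be sketched without a database. The following is a minimal illustration, assuming an in-memory list sorted by id stands in for the primary-key index; the class and method names (KeysetPaginationDemo, rowsReadByOffset, nextPage) are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetPaginationDemo {

    /** Rows the engine has to read to serve "LIMIT offset, pageSize". */
    public static long rowsReadByOffset(long offset, long pageSize) {
        return offset + pageSize;
    }

    /**
     * Emulates "WHERE id > lastId ORDER BY id ASC LIMIT pageSize" over a list
     * that is already sorted by id (a stand-in for the primary-key index).
     */
    public static List<Long> nextPage(List<Long> sortedIds, long lastId, int pageSize) {
        // Binary search for the first id strictly greater than lastId (the index seek)
        int lo = 0;
        int hi = sortedIds.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedIds.get(mid) <= lastId) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Read at most pageSize rows starting from that position
        List<Long> page = new ArrayList<>();
        for (int i = lo; i < sortedIds.size() && page.size() < pageSize; i++) {
            page.add(sortedIds.get(i));
        }
        return page;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 20000; id++) {
            ids.add(id);
        }
        // Offset paging: 10000 skipped rows plus 10 returned rows
        System.out.println(rowsReadByOffset(10000, 10)); // prints 10010
        // Keyset paging: seek past the previous maximum id, then read 10 rows
        List<Long> page = nextPage(ids, 10000L, 10);
        System.out.println(page.get(0) + " .. " + page.get(page.size() - 1));
    }
}
```

The cost of the keyset variant depends on the page size rather than on how deep the page is, which is why it avoids the problem that LIMIT 10000, 10 runs into.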
Elasticsearch has the same problem, known as deep pagination. Fixes: use a scroll query or a search_after query.
DSL request:
POST http://xxx:9200/enterprise_wechat_test.alias/_search?scroll=1m
{
"query": { "match_all": {}},
"size": 10
}
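The scroll parameter specifies how long the search results are kept.
The response contains a _scroll_id, which is used to fetch the next batch of data.
1m means the scroll context is kept alive for one minute; within that minute the matching documents can still be read through the scroll id. An Elasticsearch scroll reads the results as a stream.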
POST http://xxx:9200/_search/scroll
{
"scroll": "1m",
"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAABflI0EWdWZjRzZsdmVSZjZ4V2hDbV95QmVsUQ=="
}
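Here scroll_id is the _scroll_id returned by the previous scroll request, or by the initial search request. The scroll parameter is required here as well.
Note that scroll must be passed on every request; it renews the time the search results are kept. There is no need to specify the index or type.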
Scroll query, Java code:
public List<MsgInfoVo> queryScroll(MsgQueryInfo queryParam) throws Exception {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest(indexName).types(typeName);
    // Build the query conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQuery = new BoolQueryBuilder();
    addTermFilter(boolQuery, "city", queryParam.getCity());
    sourceBuilder.query(boolQuery);
    // Sort
    SortBuilder id4Sort = SortBuilders.fieldSort("id4Sort").order(SortOrder.ASC);
    sourceBuilder.sort(id4Sort);
    sourceBuilder.size(3000);
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    searchRequest.source(sourceBuilder);
    searchRequest.scroll(TimeValue.timeValueMinutes(1L));
    // Result list
    List<MsgInfoVo> list = new ArrayList<>();
    SearchResponse searchResponse;
    try {
        searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        log.error("highLevelClient IOException.", e);
        // Without an initial response there is nothing to scroll over
        return list;
    }
    String scrollId = searchResponse.getScrollId();
    // With fewer than 3000 hits the loop body runs once; otherwise each round trip fetches 3000 hits
    while (searchResponse.getHits().getHits().length != 0) {
        for (SearchHit hit : searchResponse.getHits()) {
            MsgInfoVo miv = new MsgInfoVo();
            Map<String, Object> map = hit.getSourceAsMap();
            String msgId = formatString(map.get("id4Sort"));
            miv.setMsgId(msgId);
            list.add(miv);
        }
        // Fetch the next batch with the latest scroll_id and keep the context alive for one more minute
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
        // On older 6.x clients the same call is named searchScroll
        searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
        scrollId = searchResponse.getScrollId();
    }
    // Release the scroll context instead of waiting for it to expire
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    return list;
}
search_after query, Java code:
public MsgInfoHitsVo querySearchAfter(MsgQueryInfo queryParam) throws Exception {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest(indexName).types(typeName);
    // Build the query conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQuery = new BoolQueryBuilder();
    addTermFilter(boolQuery, "city", queryParam.getCity());
    // Filter by message id: use id4Sort, not msgId
    addTermFilter(boolQuery, "id4Sort", queryParam.getMsgId());
    sourceBuilder.query(boolQuery);
    // Sort; search_after positions its cursor using the values of this sort
    SortBuilder id4Sort = SortBuilders.fieldSort("id4Sort").order(SortOrder.ASC);
    sourceBuilder.sort(id4Sort);
    int pageSize = queryParam.getPageSize();
    // search_after replaces the page offset, so from stays 0 and no offset is derived from the page number
    sourceBuilder.from(0);
    sourceBuilder.size(pageSize);
    // Sort values of the last hit of the previous page (hard-coded here for the demo)
    sourceBuilder.searchAfter(new Object[]{"18059"});
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    if (queryParam.isDebug()) {
        log.info("sourceBuilder: " + sourceBuilder);
    }
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse;
    try {
        searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        log.error("highLevelClient IOException.", e);
        return null;
    }
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    long totalHits = hits.getTotalHits();
    log.info("total hits: {}", totalHits);
    if (searchHits.length > 0) {
        // Sort values of the last hit: pass them to searchAfter(...) to request the next page
        Object[] lastSortValues = searchHits[searchHits.length - 1].getSortValues();
        log.info("sort values of the last hit: {}", Arrays.toString(lastSortValues));
    }
    for (SearchHit hit : searchHits) {
        MsgInfoVo miv = new MsgInfoVo();
        Map<String, Object> map = hit.getSourceAsMap();
        miv.setMsgId(formatString(map.get("id4Sort")));
        log.info("{}", miv);
    }
    // This demo only logs the hits; it does not populate a MsgInfoHitsVo
    return null;
}
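The method above fetches a single page with a fixed cursor. To walk every page, the sort value of the last hit on each page has to be fed into the next request. The following is a minimal sketch of that loop, assuming the sort field (id4Sort in the code above) is unique per document; it uses an in-memory list in place of Elasticsearch so that it runs on its own, and every name in it (SearchAfterLoopDemo, Hit, fetchAll, fakeSearch) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class SearchAfterLoopDemo {

    /** Minimal stand-in for a search hit: only the sort value matters here. */
    public static final class Hit {
        final long sortValue;

        Hit(long sortValue) {
            this.sortValue = sortValue;
        }
    }

    /**
     * Walks all pages. fetchPage receives the sort value of the last hit seen so far
     * (null for the first page) plus the page size, and returns the next hits in
     * ascending sort order. The loop ends when a page comes back empty.
     */
    public static List<Long> fetchAll(BiFunction<Long, Integer, List<Hit>> fetchPage, int pageSize) {
        List<Long> seen = new ArrayList<>();
        Long cursor = null;
        while (true) {
            List<Hit> page = fetchPage.apply(cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            for (Hit hit : page) {
                seen.add(hit.sortValue);
            }
            // The last hit's sort value becomes the search_after cursor of the next request
            cursor = page.get(page.size() - 1).sortValue;
        }
        return seen;
    }

    /** Hypothetical in-memory "index": returns up to size values strictly greater than after. */
    public static List<Hit> fakeSearch(List<Long> sortedValues, Long after, int size) {
        List<Hit> page = new ArrayList<>();
        for (long value : sortedValues) {
            if (after != null && value <= after) {
                continue;
            }
            if (page.size() == size) {
                break;
            }
            page.add(new Hit(value));
        }
        return page;
    }

    public static void main(String[] args) {
        List<Long> index = new ArrayList<>();
        for (long v = 1; v <= 25; v++) {
            index.add(v);
        }
        List<Long> all = fetchAll((after, size) -> fakeSearch(index, after, size), 10);
        System.out.println(all.size()); // prints 25
    }
}
```

In a real client, the fetchPage function would build a SearchSourceBuilder, call searchAfter(...) with the cursor when it is not null, and return the hits of the response. If the sort field is not unique, documents that share a sort value at a page boundary can be skipped, so a tiebreaker field is normally added to the sort.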