ElasticSearch Scroll API 大量数据查询

ElasticSearch 要查询大量数据的时候,比如全量数据遍历的时候,最好使用scroll search api。

1. 初始化Scroll Context

SearchRequest searchRequest = new SearchRequest("posts");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQuery("title", "Elasticsearch"));
searchSourceBuilder.size(size); 
searchRequest.source(searchSourceBuilder);
searchRequest.scroll(TimeValue.timeValueMinutes(1L)); 
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId(); 
SearchHits hits = searchResponse.getHits();  

这里需要设置size的大小(每一页多少数据),同时也需要设置scroll 告诉每个scroll保留多长时间,这里的时间是处理完每页所用的时间。

2. 检索相关数据

SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId); 
scrollRequest.scroll(TimeValue.timeValueSeconds(30));
SearchResponse searchScrollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
scrollId = searchScrollResponse.getScrollId();  
hits = searchScrollResponse.getHits(); 
assertEquals(3, hits.getTotalHits());
assertEquals(1, hits.getHits().length);
assertNotNull(scrollId);

使用新获得的scrollId传递给SearchScrollRequest,遍历所有关联的数据。

3 清理Scroll context

ClearScrollRequest clearScrollRequest = new ClearScrollRequest(); 
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();

这里需要说明中间过程执行的scrollId不需要清理,api会自动更新有效时间等,只需要清理最后一个scrollId对应的context。

教训:我曾经就每次scrollId都清理,跑一会儿就自动停止了不再继续遍历了。

遍历过程代码汇总:

final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("posts");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQuery("title", "Elasticsearch"));
searchSourceBuilder.size(1000);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); 
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();

while (searchHits != null && searchHits.length > 0) { 
    
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId); 
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}

ClearScrollRequest clearScrollRequest = new ClearScrollRequest(); 
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();

参考:

Search Scroll API | Java REST Client [6.8] | Elastic

https://www.cnblogs.com/tonggc1668/p/16319029.html 

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值