Problem
- The business requirement is deep pagination that returns both content and highlighted results; all three conditions (deep pagination, content, highlighting) must be met. But individual indexed documents are large, so returning the full _source plus highlighting for every paged query slows it down and blows up memory.
- The idea was to paginate first, and only add _source retrieval and highlighting on the final (current) page, but that did not go smoothly.
- The most basic pagination, from + size, turns pages one at a time and throws an error once past 10,000 documents.
- scroll (cursor-based) pagination handles deep paging, can fetch just the last page's results, and its efficiency is acceptable.
- With scroll, however, returning _source and field highlighting means adding them to the initial query, which makes the query very slow.
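For context, the 10,000-hit cap on from + size comes from the index.max_result_window setting (the index name below is illustrative, not from the original article). It can be raised per index, at the cost of more memory per deep query, which is exactly the pressure this approach is trying to avoid:

```json
PUT /my_index/_settings
{
  "index": {
    "max_result_window": 20000
  }
}
```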
scroll pagination
- Without fetching _source, a scroll query is still fast: get the current page's hits first, then loop over them and fetch each document's values individually. Highlighting then has to be implemented by hand.
- The key points: query with fetchSource(false), then iterate over the final hits yourself, fetching the data and applying highlighting document by document.
```java
@Override
public Pagination scrollSearchForPage(EsQueryVO queryVO) {
    Long t1 = System.currentTimeMillis();
    Long pageIndex = Long.valueOf(queryVO.getPageIndex());
    int pageSize = queryVO.getPageSize();
    SearchRequestBuilder searchRequestBuilder = esQueryBuild(queryVO);
    // Skip _source entirely; documents are re-fetched one by one below.
    searchRequestBuilder.setFetchSource(false);
    searchRequestBuilder.setSize(pageSize);
    String sortField = queryVO.getSortField();
    if (StringUtils.isEmpty(sortField)) {
        sortField = "createDate";
    }
    if (queryVO.getOrderType() != null && queryVO.getOrderType() == 1) {
        if ("desc".equals(queryVO.getSortOrder())) {
            searchRequestBuilder.addSort(sortField, SortOrder.DESC);
        } else {
            searchRequestBuilder.addSort(sortField, SortOrder.ASC);
        }
    }
    searchRequestBuilder.setScroll(TimeValue.timeValueMinutes(1));
    logger.info("scroll query count: {}", pageIndex + 1);
    SearchResponse scrollResp = searchRequestBuilder.get();
    // Advance the scroll cursor until the requested page is reached.
    if (pageIndex > 0) {
        do {
            scrollResp = elasticsearchTemplate.getClient()
                    .prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(TimeValue.timeValueMinutes(1))
                    .execute().actionGet();
            pageIndex--;
        } while (scrollResp.getHits().getHits().length != 0 && pageIndex > 0);
    }
    Long total = scrollResp.getHits().getTotalHits();
    logger.info("total hits: {}", total);
    SearchHits resultHits = scrollResp.getHits();
    // Release the scroll context as soon as the page is in hand.
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollResp.getScrollId());
    elasticsearchTemplate.getClient().clearScroll(clearScrollRequest).actionGet();
    Long t2 = System.currentTimeMillis();
    logger.info("scroll query took {} ms", t2 - t1);
    Pagination pagination = new Pagination();
    List<Object> data = new ArrayList<>();
    for (SearchHit hit : resultHits) {
        // Re-fetch each document individually, then highlight by hand.
        SolrIndexVO indexVO = searchVoById(new EsQueryVO(hit.getIndex(), hit.getType(), hit.getId()));
        if (queryVO.getIsHighlight() == 1) {
            if (queryVO.getHightFields().contains("title")) {
                indexVO.setTitle(highLightByHand(indexVO.getTitle(), queryVO.getKeywords()));
            }
            if (queryVO.getHightFields().contains("content")) {
                indexVO.setContent(highLightByHand(indexVO.getContent(), queryVO.getKeywords()));
            }
        }
        data.add(JSON.toJSONStringWithDateFormat(indexVO, "yyyy-MM-dd HH:mm:ss"));
    }
    pagination.setData(data);
    // pageIndex was decremented by the loop above; report the page that was requested.
    pagination.setPageIndex(Long.valueOf(queryVO.getPageIndex()));
    pagination.setPageSize(pageSize);
    pagination.setTotal(total);
    // Round up; total/pageSize + 1 would overcount when total is an exact multiple.
    pagination.setPageCount((total + pageSize - 1) / pageSize);
    return pagination;
}
```
```java
private static String highLightByHand(String value, String keyword) {
    if (StringUtils.isEmpty(value) || StringUtils.isEmpty(keyword)) {
        return "";
    }
    if (value.length() > 200) {
        value = value.substring(0, 200);
    }
    // Collect the distinct characters of the keyword, then wrap matches in a
    // single left-to-right pass. (Calling replaceAll() per character would
    // treat regex metacharacters in the keyword as patterns, and could
    // re-match characters inside already-inserted <span> tags.)
    Set<Character> chars = new HashSet<>();
    for (char c : keyword.toCharArray()) {
        chars.add(c);
    }
    StringBuilder sb = new StringBuilder();
    for (char c : value.toCharArray()) {
        if (chars.contains(c)) {
            sb.append("<span style=\"color:red\">").append(c).append("</span>");
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}
```
Solution
- The third pagination mode, search_after, solves this.
- search_after requires a sort, and the sort values must be unique, because it takes the last hit's sortValues and continues the query from that point.
- Relevance ordering sorts by score, and scores can tie, so score alone is not a unique sort value; an extra tiebreaker sort must be added before search_after can be used.
- Walk the query forward to the last page first, then add _source retrieval and highlighting there; the original query is unaffected. Solved.
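The raw DSL that the Java code below assembles looks roughly like this (the index name and the values are illustrative): the sort array supplies each hit's sortValues, and search_after echoes the last hit's values back to continue from there:

```json
GET /my_index/_search
{
  "size": 20,
  "sort": [
    { "createDateLong": "desc" },
    { "id.keyword": "desc" }
  ],
  "search_after": [1577808000000, "doc-123"]
}
```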
First version
```java
public Pagination searchAfterForPage(EsQueryVO queryVO) {
    Long t1 = System.currentTimeMillis();
    Long pageIndex = Long.valueOf(queryVO.getPageIndex());
    int pageSize = queryVO.getPageSize();
    SearchRequestBuilder searchRequestBuilder = esQueryBuild(queryVO);
    Map<String, Object> queryResultMap;
    if (queryVO.getOrderType() != null && queryVO.getOrderType() == 1) {
        queryResultMap = searchAfter(searchRequestBuilder, queryVO);
    } else {
        // Relevance ordering: sort by score first; the helper adds the tiebreaker.
        searchRequestBuilder.addSort(SortBuilders.scoreSort());
        queryResultMap = searchAfter(searchRequestBuilder, queryVO);
    }
    Long t2 = System.currentTimeMillis();
    logger.info("search_after query took {} ms", t2 - t1);
    Pagination pagination = new Pagination();
    List<Object> data = new ArrayList<>();
    long total = (long) queryResultMap.get("total");
    SearchHit[] resultHits = (SearchHit[]) queryResultMap.get("data");
    for (SearchHit hit : resultHits) {
        Map<String, Object> source = hit.getSource();
        Map<String, Object> resultMap = new HashedMap();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String key = entry.getKey();
            String value = String.valueOf(entry.getValue());
            // Replace the raw field value with the highlighted fragment when present.
            if (queryVO.getIsHighlight() == 1) {
                if (queryVO.getHightFields().contains("title") && key.equals("title")
                        && hit.getHighlightFields() != null
                        && hit.getHighlightFields().get("title") != null) {
                    Text[] text = hit.getHighlightFields().get("title").getFragments();
                    for (Text str : text) {
                        value = str.string();
                    }
                }
                if (queryVO.getHightFields().contains("content") && key.equals("content")
                        && hit.getHighlightFields() != null
                        && hit.getHighlightFields().get("content") != null) {
                    Text[] text = hit.getHighlightFields().get("content").getFragments();
                    for (Text str : text) {
                        value = str.string();
                    }
                }
            }
            resultMap.put(key, value);
        }
        data.add(JSON.toJSONStringWithDateFormat(resultMap, "yyyy-MM-dd HH:mm:ss"));
    }
    pagination.setData(data);
    pagination.setPageIndex(pageIndex);
    pagination.setPageSize(pageSize);
    pagination.setTotal(total);
    // Round up; total/pageSize + 1 would overcount when total is an exact multiple.
    pagination.setPageCount((total + pageSize - 1) / pageSize);
    return pagination;
}
```
```java
// First-version helper: walks forward one page-sized search at a time,
// enabling _source and highlighting only on the final (requested) page.
private Map<String, Object> searchAfterX(SearchRequestBuilder searchRequestBuilder, EsQueryVO queryVO) {
    Map<String, Object> result = new HashMap<>();
    Long pageIndex = Long.valueOf(queryVO.getPageIndex());
    int pageSize = queryVO.getPageSize();
    String sortField = queryVO.getSortField();
    if (StringUtils.isEmpty(sortField)) {
        sortField = "createDateLong";
    }
    if ("desc".equals(queryVO.getSortOrder())) {
        searchRequestBuilder.addSort(sortField, SortOrder.DESC);
    } else {
        searchRequestBuilder.addSort(sortField, SortOrder.ASC);
    }
    searchRequestBuilder.setSize(pageSize);
    searchRequestBuilder.setFetchSource(false);
    SearchResponse searchResponse = null;
    do {
        if (searchResponse != null) {
            // Seed the next query with the previous page's last sort values.
            SearchHit[] hits = searchResponse.getHits().getHits();
            Object[] sortValues = hits[hits.length - 1].getSortValues();
            if (sortValues != null && sortValues.length > 0) {
                searchRequestBuilder = searchRequestBuilder.searchAfter(sortValues);
            }
        }
        if (pageIndex == 0) {
            // Last page: now fetch _source and, if requested, highlighting.
            searchRequestBuilder.setFetchSource(true);
            if (!StringUtils.isEmpty(queryVO.getKeywords())
                    && (queryVO.getFuzzySearch() || queryVO.getIsHighlight() == 1)) {
                setHighLight(searchRequestBuilder, queryVO);
            }
        }
        searchResponse = searchRequestBuilder.get();
        pageIndex--;
    } while (searchResponse.getHits().getHits().length != 0 && pageIndex > -1);
    result.put("total", searchResponse.getHits().getTotalHits());
    result.put("data", searchResponse.getHits().getHits());
    return result;
}
```
search_after, second version
- Skips pages in bulk rather than querying page by page.
- Adds searchRequestBuilder.addSort("id.keyword", SortOrder.DESC); to guarantee the unique sort that search_after requires.
- In testing, performance is very good: jumping straight to page 500+ still completes within 300 ms.
```java
// Second-version helper: bulk-skips to the requested page with skipPages(),
// then runs a single full query for that page.
private Map<String, Object> searchAfter(SearchRequestBuilder searchRequestBuilder, EsQueryVO queryVO) {
    Map<String, Object> result = new HashMap<>();
    Long pageIndex = Long.valueOf(queryVO.getPageIndex());
    int pageSize = queryVO.getPageSize();
    String sortField = queryVO.getSortField();
    if (StringUtils.isEmpty(sortField)) {
        sortField = "createDateLong";
    }
    if ("desc".equals(queryVO.getSortOrder())) {
        searchRequestBuilder.addSort(sortField, SortOrder.DESC);
    } else {
        searchRequestBuilder.addSort(sortField, SortOrder.ASC);
    }
    // Tiebreaker: document ids are unique, which search_after requires.
    searchRequestBuilder.addSort("id.keyword", SortOrder.DESC);
    searchRequestBuilder.setFetchSource(false);
    if (pageIndex > 0) {
        long skipStart = System.currentTimeMillis();
        Object[] sortValues = skipPages(pageSize, pageIndex, searchRequestBuilder);
        long skipEnd = System.currentTimeMillis();
        logger.info("skip took {} ms", skipEnd - skipStart);
        // Guard against running past the end of the result set while skipping.
        if (sortValues != null) {
            searchRequestBuilder = searchRequestBuilder.searchAfter(sortValues);
        }
    }
    long search1 = System.currentTimeMillis();
    // Only the final page fetches _source and highlighting.
    searchRequestBuilder.setFetchSource(true);
    searchRequestBuilder.setSize(pageSize);
    if (!StringUtils.isEmpty(queryVO.getKeywords())
            && (queryVO.getFuzzySearch() || queryVO.getIsHighlight() == 1)) {
        setHighLight(searchRequestBuilder, queryVO);
    }
    logger.info("query:\n{}", searchRequestBuilder);
    SearchResponse searchResponse = searchRequestBuilder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).get();
    long search2 = System.currentTimeMillis();
    logger.info("search took {} ms", search2 - search1);
    long total = searchResponse.getHits().getTotalHits();
    result.put("total", total);
    result.put("data", searchResponse.getHits().getHits());
    logger.info("query finished, total = {}", total);
    return result;
}
```
```java
// Fast-forwards past pageSize * pageIndex hits in batches of up to 10000
// (the max_result_window ceiling), returning the sort values of the last
// hit skipped, ready to seed the real query's search_after.
private static Object[] skipPages(int pageSize, Long pageIndex, SearchRequestBuilder searchRequestBuilder) {
    long t1 = System.currentTimeMillis();
    Object[] sortValues = null;
    int skip = pageSize * pageIndex.intValue();
    // No _source or highlighting while skipping: only sort values are needed.
    searchRequestBuilder.highlighter(null).setFetchSource(false);
    int size = 10000;
    do {
        if (sortValues != null) {
            searchRequestBuilder.searchAfter(sortValues);
        }
        if (skip >= size) {
            skip -= size;
            searchRequestBuilder.setSize(size);
        } else {
            searchRequestBuilder.setSize(skip);
            skip = 0;
        }
        logger.info("skipSearch: {}", searchRequestBuilder);
        SearchResponse searchResponse = searchRequestBuilder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).get();
        SearchHit[] hits = searchResponse.getHits().getHits();
        if (hits != null && hits.length > 0) {
            sortValues = hits[hits.length - 1].getSortValues();
        } else {
            // Ran out of results before covering the skip distance.
            sortValues = null;
        }
    } while (skip > 0 && sortValues != null);
    long t2 = System.currentTimeMillis();
    logger.info("skipPages took {} ms", t2 - t1);
    return sortValues;
}
```
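The batch arithmetic in skipPages() is easy to get wrong, so here it is isolated as a pure function (the class name is mine, for illustration only); it lists the setSize() values the skip loop would issue for a given page:

```java
import java.util.ArrayList;
import java.util.List;

public class SkipBatchPlan {

    // Mirrors skipPages(): the skip distance pageSize * pageIndex is consumed
    // in chunks of at most maxBatch hits (10000 above, matching the default
    // max_result_window); each chunk's last hit seeds the next search_after.
    public static List<Integer> batchSizes(int pageSize, int pageIndex, int maxBatch) {
        List<Integer> batches = new ArrayList<>();
        int skip = pageSize * pageIndex;
        while (skip > 0) {
            int batch = Math.min(skip, maxBatch);
            batches.add(batch);
            skip -= batch;
        }
        return batches;
    }
}
```

For page 501 at 20 hits per page (10,020 hits to skip), this yields one 10,000-hit batch plus one 20-hit batch: just two cheap, source-free queries before the real one, which is consistent with the sub-300 ms timings above.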