Elasticsearch deep pagination with highlighting: search_after

Problem

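  • The business requirement is deep pagination that returns both the document content and highlighted results. Deep paging, content and highlighting must all be supported. Each indexed document is large, so returning _source and highlights on every paging request slows the query down and can exhaust memory.
  • My idea was to page through without them and add _source and highlighting only for the last (current) page, but this did not go smoothly.
  • The most common way to paginate is from+size, but it cannot reach deep pages: a request fails once from + size exceeds 10,000 (the default index.max_result_window).
  • Scroll pagination supports deep paging, can keep just the results of the last page, and is reasonably efficient.
  • With scroll, however, returning _source and field highlights means adding them to the initial query, so every scroll request carries them and the query becomes slow.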
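To make the from+size limit concrete, here is a minimal Java sketch that checks whether a 0-based page can be served by from+size. It assumes the default index.max_result_window of 10000, which can be changed per index. FromSizeWindow and its methods are hypothetical helpers and are not part of any Elasticsearch client.

```java
final class FromSizeWindow {
    // Default value of the index.max_result_window index setting (assumed unchanged)
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Offset ("from") of a 0-based page. */
    static long from(long pageIndex, int pageSize) {
        return pageIndex * pageSize;
    }

    /** True if from + size for this page is larger than the result window, i.e. the request is rejected. */
    static boolean exceedsWindow(long pageIndex, int pageSize, int maxResultWindow) {
        return from(pageIndex, pageSize) + pageSize > maxResultWindow;
    }

    public static void main(String[] args) {
        // from = 9990, from + size = 10000: still allowed
        System.out.println(exceedsWindow(999, 10, DEFAULT_MAX_RESULT_WINDOW));  // false
        // from = 10000, from + size = 10010: rejected
        System.out.println(exceedsWindow(1000, 10, DEFAULT_MAX_RESULT_WINDOW)); // true
    }
}
```

With 10 hits per page, page index 999 (the 1000th page) is the deepest page that from+size can return under the default setting.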

Scroll pagination

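  • Without _source, scroll pagination is still fast. Once the current page is reached, iterate over its hits and query each document's values individually. Highlighting then has to be implemented by hand.
  • The key points are to disable _source in the search (setFetchSource(false)) and to handle only the final page's hits: iterate over them yourself, fetch each document, and highlight it.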
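The two-phase idea can be sketched in memory as follows: page through ids only, then load documents for the final page alone. Each element of the iterator stands in for one scroll response returned with setFetchSource(false). The Map stands in for the per-id lookup that searchVoById performs. Both stand-ins, and the ScrollTwoPhase name, are assumptions for illustration and are not the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class ScrollTwoPhase {
    /** Splits ordered ids into fixed-size batches, like successive scroll responses. */
    static List<List<String>> batches(List<String> ids, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return out;
    }

    /**
     * Phase 1: advance the cursor to the 0-based target page, seeing ids only.
     * Phase 2: load full documents for that page only.
     */
    static List<String> loadPage(Iterator<List<String>> idBatches,
                                 Map<String, String> store, int pageIndex) {
        List<String> page = List.of();
        for (int i = 0; i <= pageIndex; i++) {
            if (!idBatches.hasNext()) {
                return new ArrayList<>(); // results exhausted before the target page
            }
            page = idBatches.next(); // one round trip, ids only
        }
        List<String> docs = new ArrayList<>();
        for (String id : page) {
            docs.add(store.get(id)); // per-hit fetch, in the spirit of searchVoById
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c", "d", "e");
        Map<String, String> store = Map.of("a", "A", "b", "B", "c", "C", "d", "D", "e", "E");
        System.out.println(loadPage(batches(ids, 2).iterator(), store, 1)); // [C, D]
    }
}
```

Reaching page N still costs N + 1 round trips. The saving comes from keeping _source out of the first N of them.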
    @Override
    public Pagination scrollSearchForPage(EsQueryVO queryVO) {
        long t1 = System.currentTimeMillis();
        // 0-based page index requested by the caller (kept unchanged for the response)
        Long pageIndex = Long.valueOf(queryVO.getPageIndex());
        int pageSize = queryVO.getPageSize();
        // Build the ES query from the EsQueryVO conditions
        SearchRequestBuilder searchRequestBuilder = esQueryBuild(queryVO);
        // Do not fetch _source while scrolling; documents are loaded per hit afterwards
        searchRequestBuilder.setFetchSource(false);
        searchRequestBuilder.setSize(pageSize);
        // Relevance order by default; when orderType == 1, sort by a field
        String sortField = queryVO.getSortField();
        if (StringUtils.isEmpty(sortField)) {
            sortField = "createDate";
        }
        // Elasticsearch sorts by relevance by default, so only field sorting needs an explicit sort clause
        if (queryVO.getOrderType() != null && queryVO.getOrderType() == 1) {
            if ("desc".equals(queryVO.getSortOrder())) {
                searchRequestBuilder.addSort(sortField, SortOrder.DESC);
            } else {
                searchRequestBuilder.addSort(sortField, SortOrder.ASC);
            }
        }

        // Keep the search context alive for 1 minute between scroll requests
        searchRequestBuilder.setScroll(TimeValue.timeValueMinutes(1));
        logger.info("Number of scroll requests: {}", pageIndex + 1);
        // The initial search returns page 0 and the first scrollId
        SearchResponse scrollResp = searchRequestBuilder.get();
        // Scroll forward pageIndex times, stopping early if the hits run out
        long remaining = pageIndex;
        while (remaining > 0 && scrollResp.getHits().getHits().length != 0) {
            // Pass the latest scrollId on to fetch the next page
            scrollResp = elasticsearchTemplate.getClient()
                    .prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(TimeValue.timeValueMinutes(1))
                    .execute().actionGet();
            remaining--;
        }

        // Total number of hits
        Long total = scrollResp.getHits().getTotalHits();
        logger.info("Total hits: {}", total);
        // Result of the last scroll request, i.e. the target page
        SearchHits resultHits = scrollResp.getHits();
        // Release the scroll context
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollResp.getScrollId());
        elasticsearchTemplate.getClient().clearScroll(clearScrollRequest).actionGet();
        long t2 = System.currentTimeMillis();
        logger.info("Scroll query took {} ms", t2 - t1);
        // Convert the hits into the Pagination object
        Pagination pagination = new Pagination();
        List<Object> data = new ArrayList<>();
        for (SearchHit hit : resultHits) {
            // Load the full document for this hit
            SolrIndexVO indexVO = searchVoById(new EsQueryVO(hit.getIndex(), hit.getType(), hit.getId()));
            // Manual highlighting
            if (queryVO.getIsHighlight() == 1) {
                if (queryVO.getHightFields().contains("title")) {
                    indexVO.setTitle(highLightByHand(indexVO.getTitle(), queryVO.getKeywords()));
                }
                if (queryVO.getHightFields().contains("content")) {
                    indexVO.setContent(highLightByHand(indexVO.getContent(), queryVO.getKeywords()));
                }
            }
            data.add(JSON.toJSONStringWithDateFormat(indexVO, "yyyy-MM-dd HH:mm:ss"));
        }
        pagination.setData(data);
        pagination.setPageIndex(pageIndex);
        pagination.setPageSize(pageSize);
        pagination.setTotal(total);
        // Round up so an exact multiple of pageSize does not produce an empty extra page
        pagination.setPageCount((total + pageSize - 1) / pageSize);
        return pagination;
    }
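    /**
     * Manual highlighting: wraps each character of the keyword in a red span.
     * Needs java.util.LinkedHashSet and java.util.regex.Pattern.
     * @param value   field text; only the first 200 characters are kept
     * @param keyword the search keywords
     */
    private static String highLightByHand(String value, String keyword) {
        if (StringUtils.isEmpty(value)) {
            return "";
        }
        if (value.length() > 200) {
            value = value.substring(0, 200);
        }
        if (StringUtils.isEmpty(keyword)) {
            // Nothing to highlight; keep the (truncated) text instead of blanking it
            return value;
        }
        // Distinct keyword characters, quoted so regex metacharacters such as "." or "(" match literally
        Set<String> alternatives = new LinkedHashSet<>();
        for (char c : keyword.toCharArray()) {
            alternatives.add(Pattern.quote(String.valueOf(c)));
        }
        // One pass over the text, so a later character cannot match inside an inserted <span> tag
        Pattern pattern = Pattern.compile(String.join("|", alternatives));
        return pattern.matcher(value).replaceAll("<span style=\"color:red\">$0</span>");
    }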

Solution

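  • The third pagination method, search_after, solves the problem.
  • search_after requires a sort, and the sort must be unique. It takes the sort values (sortValues) of the last hit and continues the query from there.
  • Relevance ("smart") sorting orders hits by score. Scores can tie, so score alone is not a unique sort key, and another sort field must be added before search_after can be used.
  • Walk through the pages first, then add _source and highlighting only for the last (target) page. This does not affect the original query, and it solves the problem.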
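The following in-memory sketch shows why the tiebreaker matters. Hits are ordered by score descending, then by id descending, mirroring a score sort plus id.keyword DESC. The after argument plays the role of the last hit's sortValues, and the next page is the first size hits that sort strictly after it. The Hit record and the method names are hypothetical and are not Elasticsearch types.

```java
import java.util.ArrayList;
import java.util.List;

final class SearchAfterSketch {
    /** A hit with a possibly tied score and a unique id (hypothetical model, not an ES type). */
    record Hit(double score, String id) {}

    /** Sort order: score descending, then (optionally) id descending as the tiebreaker. */
    static int compare(Hit x, Hit y, boolean useTiebreaker) {
        int byScore = Double.compare(y.score(), x.score());
        if (byScore != 0 || !useTiebreaker) {
            return byScore;
        }
        return y.id().compareTo(x.id());
    }

    /** First size hits that sort strictly after the cursor; after == null returns the first page. */
    static List<Hit> searchAfter(List<Hit> index, Hit after, int size, boolean useTiebreaker) {
        List<Hit> sorted = new ArrayList<>(index);
        sorted.sort((x, y) -> compare(x, y, useTiebreaker)); // List.sort is stable
        List<Hit> page = new ArrayList<>();
        for (Hit hit : sorted) {
            if (page.size() == size) {
                break;
            }
            if (after == null || compare(hit, after, useTiebreaker) > 0) {
                page.add(hit);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> index = List.of(new Hit(1.0, "a"), new Hit(1.0, "b"), new Hit(1.0, "c"), new Hit(0.5, "d"));
        List<Hit> page1 = searchAfter(index, null, 2, true);                         // c, b
        List<Hit> page2 = searchAfter(index, page1.get(page1.size() - 1), 2, true);  // a, d
        System.out.println(page1 + " " + page2);
    }
}
```

With the tiebreaker, page 2 continues with a and d. With useTiebreaker set to false, page 1 is a and b, and page 2 is only d. The cursor's score of 1.0 excludes every remaining hit that also scored 1.0, so c is silently skipped.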

Version 1

  • Written the same way as the scroll version
  • A counter-example; don't do this
    public Pagination searchAfterForPage(EsQueryVO queryVO) {
        long t1 = System.currentTimeMillis();
        Long pageIndex = Long.valueOf(queryVO.getPageIndex());
        int pageSize = queryVO.getPageSize();
        // Build the ES query from the EsQueryVO conditions
        SearchRequestBuilder searchRequestBuilder = esQueryBuild(queryVO);
        // Relevance order by default. When orderType == 1 (sort by date or another field), search_after can be
        // used directly; otherwise SortBuilders.scoreSort() must be added first
        if (queryVO.getOrderType() == null || queryVO.getOrderType() != 1) {
            searchRequestBuilder.addSort(SortBuilders.scoreSort());
        }
        Map<String, Object> queryResultMap = searchAfter(searchRequestBuilder, queryVO);
        long t2 = System.currentTimeMillis();
        logger.info("search_after query took {} ms", t2 - t1);
        // Convert the hits into the Pagination object
        Pagination pagination = new Pagination();
        List<Object> data = new ArrayList<>();
        long total = (long) queryResultMap.get("total");
        SearchHit[] resultHits = (SearchHit[]) queryResultMap.get("data");
        for (SearchHit hit : resultHits) {
            Map<String, Object> source = hit.getSource();
            Map<String, Object> resultMap = new HashMap<>();
            for (Map.Entry<String, Object> entry : source.entrySet()) {
                String key = entry.getKey();
                String value = String.valueOf(entry.getValue());
                // Highlighting: for title/content, replace the raw value with the highlighted fragments
                if (queryVO.getIsHighlight() == 1 && ("title".equals(key) || "content".equals(key))
                        && queryVO.getHightFields().contains(key)
                        && hit.getHighlightFields() != null && hit.getHighlightFields().get(key) != null) {
                    Text[] fragments = hit.getHighlightFields().get(key).getFragments();
                    if (fragments != null && fragments.length > 0) {
                        // Join all fragments instead of keeping only the last one
                        StringBuilder highlighted = new StringBuilder();
                        for (Text fragment : fragments) {
                            if (highlighted.length() > 0) {
                                highlighted.append("...");
                            }
                            highlighted.append(fragment.string());
                        }
                        value = highlighted.toString();
                    }
                }
                resultMap.put(key, value);
            }
            data.add(JSON.toJSONStringWithDateFormat(resultMap, "yyyy-MM-dd HH:mm:ss"));
        }
        pagination.setData(data);
        pagination.setPageIndex(pageIndex);
        pagination.setPageSize(pageSize);
        pagination.setTotal(total);
        // Round up so an exact multiple of pageSize does not produce an empty extra page
        pagination.setPageCount((total + pageSize - 1) / pageSize);
        return pagination;
    }

    /**
     * Counter-example: drawing on my earlier scroll experience, I used search_after the same way as scroll,
     * sending one request per page. This is inefficient, and queries get slow as the page number grows.
     * Query with search_after
     * @param searchRequestBuilder
     * @param queryVO
     * @return
     */
    private Map<String, Object> searchAfterX(SearchRequestBuilder searchRequestBuilder, EsQueryVO queryVO) {
        Map<String, Object> result = new HashMap<>();
        // pageIndex starts at 0
        Long pageIndex = Long.valueOf(queryVO.getPageIndex());
        int pageSize = queryVO.getPageSize();
        // Relevance order by default; when orderType == 1, sort by a field
        String sortField = queryVO.getSortField();
        if (StringUtils.isEmpty(sortField)) {
            sortField = "createDateLong";
        }
        // Elasticsearch sorts by relevance by default; only field sorting needs an explicit sort clause
        if ("desc".equals(queryVO.getSortOrder())) {
            searchRequestBuilder.addSort(sortField, SortOrder.DESC);
        } else {
            searchRequestBuilder.addSort(sortField, SortOrder.ASC);
        }
        searchRequestBuilder.setSize(pageSize);
        searchRequestBuilder.setFetchSource(false);
        SearchResponse searchResponse = null;
        do {
            if (searchResponse != null) {
                // Continue after the sort values of the previous page's last hit
                SearchHit[] hits = searchResponse.getHits().getHits();
                Object[] sortValues = hits[hits.length - 1].getSortValues();
                if (sortValues != null && sortValues.length > 0) {
                    searchRequestBuilder = searchRequestBuilder.searchAfter(sortValues);
                }
            }
            // Target page reached: enable _source and highlighting for this last request
            if (pageIndex == 0) {
                searchRequestBuilder.setFetchSource(true);
                if (!StringUtils.isEmpty(queryVO.getKeywords()) && (queryVO.getFuzzySearch() || queryVO.getIsHighlight() == 1)) {
                    setHighLight(searchRequestBuilder, queryVO);
                }
            }
            searchResponse = searchRequestBuilder.get();
            pageIndex--;
            // Stop when the hits array is empty (all data read) or the target page has been fetched
        } while (searchResponse.getHits().getHits().length != 0 && pageIndex > -1);
        result.put("total", searchResponse.getHits().getTotalHits());
        result.put("data", searchResponse.getHits().getHits());
        return result;
    }
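One way to see why this version slows down is to count requests. It sends one request per page up to the target. The second version below instead skips ahead in batches of up to 10000 hits and then sends one request for the page. The sketch only counts requests under those assumptions: it assumes enough hits exist to reach the page, it does not model per-request latency, and the page size of 10 in the example is illustrative. RoundTrips is a hypothetical helper.

```java
final class RoundTrips {
    // Largest batch per skip request (the default index.max_result_window)
    static final int MAX_BATCH = 10_000;

    /** Version 1: one request per page, from page 0 through the 0-based target page. */
    static long versionOne(long pageIndex) {
        return pageIndex + 1;
    }

    /** Version 2: ceil(skip / MAX_BATCH) skip requests, then one request for the page itself. */
    static long versionTwo(long pageIndex, int pageSize) {
        long skip = pageIndex * pageSize;
        return (skip + MAX_BATCH - 1) / MAX_BATCH + 1;
    }

    public static void main(String[] args) {
        System.out.println(versionOne(500));     // 501
        System.out.println(versionTwo(500, 10)); // skip 5000 -> 1 skip request + 1 page request = 2
    }
}
```

The request count for version 1 grows with the page number. For version 2 it grows only with the offset divided by 10000.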

search_after, version 2

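  • Skip pages instead of walking through them one at a time
  • Added searchRequestBuilder.addSort("id.keyword", SortOrder.DESC); so the sort is unique
  • In my tests performance was good: even jumping straight to a page beyond 500 stayed under 300 ms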
    /**
     * Query with search_after
     * @param searchRequestBuilder
     * @param queryVO
     * @return map with "total" (total hit count) and "data" (the SearchHit[] of the requested page)
     */
    private Map<String, Object> searchAfter(SearchRequestBuilder searchRequestBuilder, EsQueryVO queryVO) {
        Map<String, Object> result = new HashMap<>();
        // pageIndex starts at 0
        Long pageIndex = Long.valueOf(queryVO.getPageIndex());
        int pageSize = queryVO.getPageSize();
        // Relevance order by default; when orderType == 1, sort by a field
        String sortField = queryVO.getSortField();
        if (StringUtils.isEmpty(sortField)) {
            sortField = "createDateLong";
        }
        // Elasticsearch sorts by relevance by default; only field sorting needs an explicit sort clause
        if ("desc".equals(queryVO.getSortOrder())) {
            searchRequestBuilder.addSort(sortField, SortOrder.DESC);
        } else {
            searchRequestBuilder.addSort(sortField, SortOrder.ASC);
        }
        // Add the id as the last sort key so that the sort is unique
        searchRequestBuilder.addSort("id.keyword", SortOrder.DESC);
        searchRequestBuilder.setFetchSource(false);
        boolean pastLastHit = false;
        if (pageIndex > 0) {
            long skipStart = System.currentTimeMillis();
            Object[] sortValues = skipPages(pageSize, pageIndex, searchRequestBuilder);
            long skipEnd = System.currentTimeMillis();
            logger.info("skip took {} ms", skipEnd - skipStart);
            if (sortValues != null) {
                searchRequestBuilder = searchRequestBuilder.searchAfter(sortValues);
            } else {
                // Fewer hits than pageIndex * pageSize: the requested page is empty
                pastLastHit = true;
            }
        }
        long search1 = System.currentTimeMillis();
        // Only the target page returns _source and highlights
        searchRequestBuilder.setFetchSource(true);
        // size 0 still returns the total hit count, just no documents
        searchRequestBuilder.setSize(pastLastHit ? 0 : pageSize);
        if (!StringUtils.isEmpty(queryVO.getKeywords()) && (queryVO.getFuzzySearch() || queryVO.getIsHighlight() == 1)) {
            setHighLight(searchRequestBuilder, queryVO);
        }
        logger.info("Query:\n" + searchRequestBuilder);
        SearchResponse searchResponse = searchRequestBuilder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).get();
        long search2 = System.currentTimeMillis();
        logger.info("search took {} ms", search2 - search1);
        long total = searchResponse.getHits().getTotalHits();
        result.put("total", total);
        result.put("data", searchResponse.getHits().getHits());
        logger.info("Query finished, total = " + total);
        return result;
    }

    /**
     * Computes the sortValues to skip to, based on the paging parameters
     * @param pageSize number of hits per page
     * @param pageIndex page number, starting at 0
     * @param searchRequestBuilder
     * @return java.lang.Object[] sort values of the last skipped hit, or null if there are fewer hits than the offset
     **/
    private static final Object[] skipPages(int pageSize, Long pageIndex, SearchRequestBuilder searchRequestBuilder) {
        long t1 = System.currentTimeMillis();
        Object[] sortValues = null;
        // Number of hits to skip
        int skip = pageSize * (pageIndex.intValue());
        // Skipped hits need neither highlighting nor _source
        searchRequestBuilder.highlighter(null).setFetchSource(false);
        // Largest size per request: 10000 (the default index.max_result_window)
        int size = 10000;
        // Loop: an offset below 10000 is skipped in one request; larger offsets are skipped 10000 at a time
        // until less than 10000 remain
        do {
            if (sortValues != null) {
                searchRequestBuilder.searchAfter(sortValues);
            }
            if (skip >= size) {
                skip -= size;
                searchRequestBuilder.setSize(size);
            } else {
                // Otherwise jump straight to the target position
                searchRequestBuilder.setSize(skip);
                skip = 0;
            }
            logger.info("skipSearch:{}", searchRequestBuilder);
            SearchResponse searchResponse = searchRequestBuilder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).get();
            SearchHit[] hits = searchResponse.getHits().getHits();
            if (hits != null && hits.length > 0) {
                sortValues = hits[hits.length - 1].getSortValues();
            } else {
                sortValues = null;
            }
        } while (skip > 0 && sortValues != null);
        long t2 = System.currentTimeMillis();
        logger.info("skipPages took {} ms", t2 - t1);
        return sortValues;
    }
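To check the skipping logic, here is an in-memory model of skipPages followed by the final search_after request. Unique ascending long keys stand in for the (sortField, id.keyword) sort values. maxBatch stands in for the 10000 limit, so that small inputs exercise several batches. All names are hypothetical. For every page, the result should match a plain offset slice of the sorted keys.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipPagesSketch {
    /** Up to size keys strictly greater than after; after == null means start from the beginning. */
    static List<Long> searchAfter(List<Long> sortedKeys, Long after, int size) {
        List<Long> out = new ArrayList<>();
        for (Long key : sortedKeys) {
            if (out.size() == size) {
                break;
            }
            if (after == null || key > after) {
                out.add(key);
            }
        }
        return out;
    }

    /** Mirrors skipPages: consume skip hits in batches of at most maxBatch; null if the hits run out. */
    static Long skipTo(List<Long> sortedKeys, long skip, int maxBatch) {
        Long cursor = null;
        while (skip > 0) {
            int size = (int) Math.min(skip, maxBatch);
            List<Long> batch = searchAfter(sortedKeys, cursor, size);
            if (batch.isEmpty()) {
                return null; // fewer hits than the offset
            }
            cursor = batch.get(batch.size() - 1); // "sortValues" of the last hit in this batch
            skip -= size;
        }
        return cursor;
    }

    /** Returns the 0-based page: skip pageIndex * pageSize keys, then one search_after for the page itself. */
    static List<Long> page(List<Long> sortedKeys, int pageIndex, int pageSize, int maxBatch) {
        Long cursor = skipTo(sortedKeys, (long) pageIndex * pageSize, maxBatch);
        if (pageIndex > 0 && cursor == null) {
            return new ArrayList<>(); // requested page lies past the last hit
        }
        return searchAfter(sortedKeys, cursor, pageSize);
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long k = 0; k < 20; k++) {
            keys.add(k);
        }
        System.out.println(page(keys, 2, 5, 3)); // [10, 11, 12, 13, 14]
    }
}
```

In the example, the offset of 10 is consumed as batches of 3, 3, 3 and 1. The page query then starts right after key 9.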