Elasticsearch 7.9.3 Study Notes (Part 3): Hands-on Java Operations on ES (rest-high-level)

Operating Elasticsearch from Java

1. Project setup

1.1 Dependency (usage is essentially the same across 7.x)
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.9.3</version>
</dependency>
1.2 Configuring the connection

Without authentication

@Configuration
public class MyEsConfig {
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }
    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        // single node: new HttpHost("localhost", 9200, "http")
                        // for a cluster, pass several HttpHost arguments here
                        new HttpHost("localhost", 9200, "http")));
        return client;
    }
}

With authentication

@Configuration
public class MyEsConfig {
    public static final RequestOptions COMMON_OPTIONS;
    /**
     * application.yml configuration:
     * elasticsearch:
     *   urls: 192.168.183.130:9200,192.168.183.131:9200
     *   account: elastic
     *   cypher: 123456
     */
    @Value("${elasticsearch.urls}")
    private String urls;
    @Value("${elasticsearch.account}")
    private String account;
    @Value("${elasticsearch.cypher}")
    private String cypher;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }
    @Bean
    public RestHighLevelClient esRestClient() {
        List<HttpHost> hosts = new ArrayList<>();
        if (!StringUtils.isBlank(urls)) {
            for (String url : urls.split(",")) {
                String[] parts = url.split(":");
                String ipAddr = parts[0];
                // default to 9200 when no port is given
                int port = parts.length > 1 ? Integer.parseInt(parts[1]) : 9200;
                hosts.add(new HttpHost(ipAddr, port, "http"));
            }
        }
        // NOTE: the original loop rebuilt the RestClientBuilder on every iteration,
        // so only the last host survived; pass all hosts at once instead
        RestClientBuilder builder = RestClient.builder(hosts.toArray(new HttpHost[0]));
        CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(account, cypher));
        builder.setHttpClientConfigCallback(cb -> cb.setDefaultCredentialsProvider(credentialsProvider));
        return new RestHighLevelClient(builder);
    }
}
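The comma-separated host string in the config above is easy to mis-parse (the original loop kept only the last host). The parsing can be pulled into a plain, independently testable helper; a minimal sketch, where `HostParser` is an illustrative name and not part of the Elasticsearch client:

```java
import java.util.ArrayList;
import java.util.List;

public class HostParser {

    /** A parsed host/port pair, mirroring what each HttpHost constructor needs. */
    public static final class Host {
        public final String ip;
        public final int port;

        Host(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }
    }

    /** Splits "ip1:port1,ip2:port2" into pairs; a missing port defaults to 9200. */
    public static List<Host> parse(String urls) {
        List<Host> hosts = new ArrayList<>();
        if (urls == null || urls.trim().isEmpty()) {
            return hosts;
        }
        for (String url : urls.split(",")) {
            String[] parts = url.trim().split(":");
            int port = parts.length > 1 ? Integer.parseInt(parts[1]) : 9200;
            hosts.add(new Host(parts[0], port));
        }
        return hosts;
    }
}
```

Each parsed pair then becomes one `new HttpHost(host.ip, host.port, "http")`, and the whole array is handed to `RestClient.builder(...)` so the client can round-robin across all nodes.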

2. Basic usage

1. Inserting data
1.1 Single-document insert
        Map<String, Object> map = new HashMap<>();
        map.put("recordDate", "data1");
        map.put("recordDateMillis", "data2");
        IndexRequest request = new IndexRequest("eslog");
        request.source(map);
        try {
            IndexResponse response = restHighLevelClient.index(request, MyEsConfig.COMMON_OPTIONS);
            // getResult() is CREATED for a new document, UPDATED for an overwrite
            return DocWriteResponse.Result.CREATED == response.getResult();
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
1.2 Bulk insert
        String index = "eslog";
        BulkRequest request = new BulkRequest();
        list.forEach(item -> request.add(new IndexRequest(index).source(item)));

        try {
            BulkResponse bulk = restHighLevelClient.bulk(request, MyEsConfig.COMMON_OPTIONS);
            // check per-item failures as well as the overall status
            if (bulk.status().getStatus() == 200 && !bulk.hasFailures()) {
                return R.ok("bulk insert succeeded");
            }
            return R.error("bulk insert failed: " + bulk.buildFailureMessage());
        } catch (IOException e) {
            e.printStackTrace();
            return R.error("bulk insert failed!");
        }

A few BulkProcessor parameters worth knowing (a common interview question):

  • bulkActions: flush once this many requests have been buffered
  • bulkSize: flush once the buffered requests reach this byte size
  • flushInterval: flush on this time interval, regardless of volume
  • concurrentRequests: how many bulk requests may run concurrently
// `listener` is a BulkProcessor.Listener providing before/after-bulk callbacks (not shown here)
BulkProcessor.Builder builder = BulkProcessor.builder(((bulkRequest, bulkResponseActionListener) -> {
            restHighLevelClient.bulkAsync(bulkRequest, RequestOptions.DEFAULT, bulkResponseActionListener);
        }), listener);
        // flush after 10,000 buffered actions
        builder.setBulkActions(10000);
        // flush once the buffer reaches 8 MB
        builder.setBulkSize(new ByteSizeValue(8L, ByteSizeUnit.MB));
        // flush every 10 seconds
        builder.setFlushInterval(TimeValue.timeValueSeconds(10));
        // allow up to 8 concurrent bulk requests
        builder.setConcurrentRequests(8);
        // retry policy: constant 1s backoff, at most 3 retries
        builder.setBackoffPolicy(BackoffPolicy.constantBackoff(TimeValue.timeValueSeconds(1), 3));
        return builder.build();
2. Delete (delete-by-query)
		//1. create the DeleteByQueryRequest
        DeleteByQueryRequest request = new DeleteByQueryRequest();
        request.indices("eslog");
		//2. the condition: height at or below 170
        request.setQuery(QueryBuilders.rangeQuery("height").lte(170));
        try {
			// execute the delete
            BulkByScrollResponse response =
                    restHighLevelClient.deleteByQuery(request, ElasticsearchConfig.COMMON_OPTIONS);
            log.info("delete succeeded, response: " + response);
        } catch (IOException e) {
            e.printStackTrace();
        }
3. Queries
3.1 Combined conditional query
public R queryByKeyword(Map map) {
        String keyWord = map.get("keyWord") == null ? null : (String) map.get("keyWord");
        // `start` is a 1-based page number; defaulting the page size to 10 keeps
        // the totalPage division below from dividing by zero
        Integer start = map.get("start") == null ? 1 : Integer.parseInt((String) map.get("start"));
        Integer limit = map.get("limit") == null ? 10 : Integer.parseInt((String) map.get("limit"));
        String startTime = map.get("startTime") == null ? null : (String) map.get("startTime");
        String endTime = map.get("endTime") == null ? null : (String) map.get("endTime");
        Long startMillis = 0L;
        Long endMillis = 0L;

        if (!StringUtils.isBlank(startTime)) {
            try {
                Date startTimeDate = sdf.parse(startTime);
                startMillis = startTimeDate.getTime();
            } catch (ParseException e) {
                e.printStackTrace();
                startMillis = 0L;
            }
        }
        if (!StringUtils.isBlank(endTime)) {
            try {
                Date endTimeDate = sdf.parse(endTime);
                endMillis = endTimeDate.getTime();
            } catch (ParseException e) {
                e.printStackTrace();
                endMillis = 0L;
            }
        }

        //1. create the request
        SearchRequest request = new SearchRequest();
        request.indices("logsrecords");
        //2. build the request body
        SearchSourceBuilder ssb = new SearchSourceBuilder();
        //3. paging and sorting; the document offset for page `start` is (start - 1) * limit
        ssb
                .from((start - 1) * limit).size(limit)
                .sort("recordDateMillis", SortOrder.DESC)
                // timeout
                .timeout(new TimeValue(60, TimeUnit.SECONDS))
                // return the exact total hit count
                .trackTotalHits(true);
        //4. boolean compound query
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        // fuzzy keyword match, roughly `like %keyword%`
        if (!StringUtils.isBlank(keyWord)) {
            // phrase query against either field
            boolQueryBuilder
                .should(QueryBuilders.matchPhraseQuery("name", keyWord))
                .should(QueryBuilders.matchPhraseQuery("job", keyWord));
        }
        if (startMillis != null && startMillis != 0L) {
            // range query
            boolQueryBuilder.filter(QueryBuilders.rangeQuery("recordDateMillis").gt(startMillis).lte(endMillis));
            // exact match
            boolQueryBuilder.filter(QueryBuilders.termQuery("address", "北京"));
        }
        ssb.query(boolQueryBuilder);
        request.source(ssb);
        SearchResponse response = null;
        RestStatus status = null;
        Map<String, Object> data = new HashMap<>();
        List<Map<String, Object>> list = new ArrayList<>();
        try {
            response = restHighLevelClient.search(request, MyEsConfig.COMMON_OPTIONS);
            status = response.status();
            data.put("status", status);
            long totalHits = response.getHits().getTotalHits().value;
            Integer totalPage = (int) Math.ceil((double) totalHits / limit);
            data.put("pageSize", limit);
            data.put("totalPage", totalPage);
            data.put("totalCount", totalHits);
            data.put("currentPage", start);

            for (SearchHit hit : response.getHits().getHits()) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                list.add(sourceAsMap);
            }
            data.put("data", list);
        } catch (IOException e) {
            e.printStackTrace();
            return R.error("query failed!");
        }
        return R.ok("query succeeded!").put("result", data);
    }
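The paging arithmetic in this method (a document offset derived from a 1-based page number, and a total-page count derived from the hit count) is where off-by-one bugs creep in, so it is worth isolating; a minimal standalone sketch, where `Paging` is an illustrative name:

```java
public class Paging {

    /** 0-based document offset for a 1-based page number. */
    public static int offset(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    /** Total pages needed to cover totalHits documents. */
    public static int totalPages(long totalHits, int pageSize) {
        return (int) Math.ceil((double) totalHits / pageSize);
    }
}
```

With these formulas, page 1 starts at offset 0 and a partial last page still counts as a full page.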
3.2 Query by id / by ids

Querying by a single id, similar to MySQL `where id = id1`:

 		GetRequest request = new GetRequest("eslog","oI9GVHQBH0SEUrtlhvX7");
        try {
            log.info("======" + request);
            GetResponse response = restHighLevelClient.get(request, ElasticsearchConfig.COMMON_OPTIONS);
            Map<String, Object> sourceAsMap = response.getSourceAsMap();
            return R.ok("query succeeded").put("sourceAsMap", sourceAsMap);
        } catch (IOException e) {
            e.printStackTrace();
        }

Querying by multiple ids, similar to MySQL `where id in (id1, id2, id3)`:

        SearchRequest request = new SearchRequest();
        request.indices("eslog");
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
		// the core: an ids query, like `where id in (...)`
        boolQueryBuilder.filter(QueryBuilders.idsQuery().addIds("oI9GVHQBH0SEUrtlhvX7", "oY9HVHQBH0SEUrtlaPUO", "3Fz9aHQBxI7zG-AK_rLc"));
        builder.query(boolQueryBuilder);
        request.source(builder);
        List<Map<String, Object>> list = new ArrayList<>();
        Map<String, Object> map = new HashMap<>();
        try {
            SearchResponse response = restHighLevelClient.search(request, ElasticsearchConfig.COMMON_OPTIONS);
            SearchHit[] searchHits = response.getHits().getHits();
            for (SearchHit hit : searchHits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                list.add(sourceAsMap);
            }
            map.put("data", list);
            return R.ok("query succeeded!").put("result", map);
        } catch (IOException e) {
            e.printStackTrace();
        }
3.3 Prefix query

A prefix query matches documents whose given field value starts with the supplied prefix:

		SearchRequest request = new SearchRequest();
        request.indices("eslog");
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        boolQueryBuilder.filter(QueryBuilders.prefixQuery("name.keyword","王"));
        builder.query(boolQueryBuilder);

        request.source(builder);
        List<Map<String, Object>> list = new ArrayList<>();
        Map<String, Object> map = new HashMap<>();
        try {
            SearchResponse response = restHighLevelClient.search(request, ElasticsearchConfig.COMMON_OPTIONS);
            SearchHit[] searchHits = response.getHits().getHits();
            for (SearchHit hit : searchHits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                list.add(sourceAsMap);
            }
            map.put("data", list);
            return R.ok("query succeeded!").put("result", map);
        } catch (IOException e) {
            e.printStackTrace();
        }
3.4 Wildcard query

A wildcard query behaves like MySQL `LIKE`: in the pattern,
`*` matches any run of characters and `?` matches exactly one character.

        SearchRequest request = new SearchRequest();
        request.indices("eslog");
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
		// prefix form: boolQueryBuilder.filter(QueryBuilders.wildcardQuery("name.keyword", "王*"));
        boolQueryBuilder.filter(QueryBuilders.wildcardQuery("name.keyword", "王??"));
        builder.query(boolQueryBuilder);
        request.source(builder);
        log.info("=======" + builder);
        List<Map<String, Object>> list = new ArrayList<>();
        Map<String, Object> map = new HashMap<>();
        try {
            SearchResponse response = restHighLevelClient.search(request, ElasticsearchConfig.COMMON_OPTIONS);
            SearchHit[] searchHits = response.getHits().getHits();
            for (SearchHit hit : searchHits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                list.add(sourceAsMap);
            }
            map.put("data", list);
            return R.ok("query succeeded!").put("result", map);
        } catch (IOException e) {
            e.printStackTrace();
        }
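To reason about what `*` and `?` will match before sending the query, the wildcard semantics can be mirrored locally with a regex translation; a rough sketch for illustration only (this is not how Elasticsearch evaluates wildcards internally):

```java
import java.util.regex.Pattern;

public class WildcardDemo {

    /** Translates an ES-style wildcard pattern (* = any run, ? = one char) to a Java regex and tests it. */
    public static boolean matches(String pattern, String value) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append(".");
            } else {
                // quote everything else so regex metacharacters are taken literally
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), value);
    }
}
```

Note that patterns with a leading `*` force ES to scan every term in the field, so they are best avoided on large indices.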
3.5 Highlight query

Highlighting wraps the matched keyword in a special style in the results, so the user can see why a document was retrieved.
The highlighted data is itself a field of the document; it is returned separately, in a `highlight` section.
ES exposes a `highlight` property at the same level as `query`, with these options:
1. fragment_size: how many characters of highlighted text to return
2. pre_tags: the tag inserted before each match
3. post_tags: the tag inserted after each match

4. fields: which fields are returned in highlighted form

// the query method
 public Map<String, Object> getKeyWordHighLightQuery(String keyWord) {
        Integer start = 0;
        Integer limit = 10;
        SearchRequest request = new SearchRequest();
        request.indices("eslog");
        SearchSourceBuilder ssb = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        if (!StringUtils.isBlank(keyWord)) {
            /**
             * one keyword matched against many fields via chained should clauses;
             * QueryBuilders.multiMatchQuery(keyWord, field1, field2, ...) expresses
             * the same intent more compactly
             */
              boolQueryBuilder
                    .should(QueryBuilders.matchPhraseQuery("companyCode", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("loginType", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("userId", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("appid", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("loginIp", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("unitCode", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("id", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("userType", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("isInWhitlelist", keyWord))
                    .should(QueryBuilders.matchPhraseQuery("env", keyWord));
        }
        ssb.query(boolQueryBuilder);
        ssb.trackTotalHits(true);
        /**
        * attach the highlight settings
        **/
        ssb.highlighter(getHighlightBuilder());
        request.source(ssb);
        SearchResponse response = null;
        List<Map<String, Object>> list = new ArrayList<>();
        Map<String, Object> map = new HashMap<>();
        
        try {
            response = restHighLevelClient.search(request, EsConfig.COMMON_OPTIONS);
            long total = response.getHits().getTotalHits().value;
            Integer totalPage = (int) Math.ceil((double) total / limit);
            map.put("currPage", start + 1);
            map.put("pageSize", limit);
            map.put("totalPage", totalPage);
            map.put("totalCount", total);
            SearchHit[] searchHits = response.getHits().getHits();
            for (SearchHit hit : searchHits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                Map<String, HighlightField> highlightFieldsMap = hit.getHighlightFields();
                /**
                * overwrite the plain _source values with their highlighted counterparts
                **/
                sourceAsMap = getHighLightMap(sourceAsMap, highlightFieldsMap);

                list.add(sourceAsMap);
            }
            map.put("data", list);
            return map;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }


Building the highlight settings

  /**
     * Highlight configuration.
     *
     * @return the configured HighlightBuilder
     */
    private HighlightBuilder getHighlightBuilder() {
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.preTags("<font color='#e75213'>");
        highlightBuilder.postTags("</font>");
        /**
         * highlighterType options: unified, plain and fvh
         *
         * unified: uses Lucene's unified highlighter. It breaks the text into
         * sentences and scores them with BM25 as if they were documents in the
         * corpus. It supports exact phrase and multi-term (fuzzy, prefix, regex)
         * highlighting.
         *
         * plain: best for highlighting simple query matches on a single field.
         * To reflect the query logic accurately it builds a tiny in-memory index
         * and re-runs the original query through Lucene's query planner to get
         * low-level match information for the current document; this is repeated
         * for every field and every document to highlight. For many fields over
         * many documents with complex queries, prefer the unified highlighter
         * with postings or term_vector fields.
         *
         * fvh: uses Lucene's Fast Vector Highlighter. It requires the field to be
         * mapped with term_vector set to with_positions_offsets.
         */
        highlightBuilder.highlighterType("unified");
        /**
         * the fields to highlight; kept identical to the queried fields here
         */
        highlightBuilder.field("companyCode")
                .field("loginType").field("userId").field("appid").field("loginIp").field("unitCode").field("id").field("userType");
		// must be false when several fields are highlighted
        highlightBuilder.requireFieldMatch(false);
        /**
         * fragmentSize: length of each returned fragment (default 100)
         * numOfFragments: how many highlighted fragments to return (default 5)
         * noMatchSize: number of leading characters returned even when the field has no match
         */
        //highlightBuilder.fragmentSize(size).numOfFragments(3).noMatchSize(100);

        return highlightBuilder;
    }

Overwriting the returned values with their highlighted counterparts

/**
* Highlight post-processing: we want partial highlighting within a field value.
* For example, for "我是中国人", if only "是中" matched, then only those two
* characters are highlighted and the rest stays plain.
* Normal hits and highlight fragments are returned separately, so the highlight
* data does not affect the plain data; here the plain values are overwritten with
* the highlighted ones, so the front end renders every match highlighted.
**/
 private Map<String, Object> getHighLightMap(Map<String, Object> map, Map<String, HighlightField> highlightFields) {
        // overwrite each field's plain value with its first highlight fragment, when present
        String[] fields = {"companyCode", "loginType", "userId", "appid", "loginIp", "unitCode", "id", "userType"};
        for (String field : fields) {
            HighlightField highlighted = highlightFields.get(field);
            if (highlighted != null) {
                map.put(field, highlighted.fragments()[0].string());
            }
        }
		/**
		* From here on, only the matched part inside each value is highlighted.
		* If that fine-grained behavior is not needed, drop the if-block below
		* and the helper it calls.
		* NOTE: `keyWord` must be in scope here (e.g. a class field, or passed in
		* as an extra parameter); it is the same keyword used in the search.
		**/
 		if (map.size() > 0) {
            String replacementValue;
            String mapValue;
            for (String key : map.keySet()) {
                mapValue = map.get(key) + "";
                if (!StringUtils.isBlank(mapValue)) {
                    replacementValue = subGetManyStr(mapValue, keyWord);
                    map.put(key, replacementValue);
                }
            }
        }
        return map;
    }
 /**
     * @Author liaochao
     * @Param input the original string
     * @Param regex the keyword to match
     * Matches the keyword case-insensitively and wraps every occurrence in a
     * style tag (i.e. highlights each hit) while keeping the original casing.
     * Example:
     *   input:   "I like Java,jAva is very easy and jaVa is so popular.";
     *   keyword: "java";
     *   styled:
     *   I like <font color='#e75213'>Java</font>,<font color='#e75213'>jAva</font> is very easy
     *   and <font color='#e75213'>jaVa</font> is so popular.
     */
    public static String subGetManyStr(String input, String regex) {

        int length = regex.length();
        int indexNum;
        List<String> originList = new ArrayList<>();
        List<String> cssList = new ArrayList<>();
        while (true) {
            // case-insensitive search for the keyword
            indexNum = input.toUpperCase().indexOf(regex.toUpperCase());
            if (indexNum != -1) {
                String matched = input.substring(indexNum, indexNum + length);
                cssList.add("<font color='#e75213'>" + matched + "</font>");
                // the plain text before this match
                originList.add(input.substring(0, indexNum));
            } else {
                // the trailing segment after the last match
                originList.add(input);
                break;
            }
            input = input.substring(indexNum + length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < originList.size(); i++) {
            sb.append(originList.get(i));
            if (i != originList.size() - 1) {
                sb.append(cssList.get(i));
            }
        }
        return sb.toString();
    }
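The manual substring loop above can also be expressed with `java.util.regex`, which handles case-insensitivity and metacharacter quoting in one place; a hedged alternative sketch producing the same output format (the `KeywordHighlighter` class and `highlight` method are illustrative names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordHighlighter {

    /** Wraps every case-insensitive occurrence of keyword in the style tag, keeping original casing. */
    public static String highlight(String input, String keyword) {
        // quote the keyword so it is matched literally, not as a regex
        Pattern p = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        // $0 re-inserts the matched text with its original casing
        return m.replaceAll("<font color='#e75213'>$0</font>");
    }
}
```

The regex version avoids the list bookkeeping entirely and cannot loop forever on an empty keyword match boundary case.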

3.6 Aggregation queries

1. When aggregating over huge data sets, Elasticsearch sacrifices some accuracy to stay real-time.
Why results can be inaccurate:
the data is spread across multiple shards, and the aggregation takes the top X from each shard, so the merged result may be off.

2. How can aggregation accuracy be improved?
Option 1: use a single primary shard.
Note that 7.x already defaults to one primary shard. Suitable for small clusters with little data.

Option 2: raise shard_size. Set shard_size to a fairly large value; the official recommendation is size * 1.5 + 10. The larger shard_size, the closer the result is to the exact aggregation. The show_term_doc_count_error parameter additionally reports the worst-case error, which helps when choosing shard_size. Here size is the number of buckets returned to the client (if the top 3 are wanted, size is 3), and shard_size is the number of buckets collected per shard; shard_size should in principle be at least size. Suitable for large clusters with many shards and lots of data.

Option 3: set size large enough to cover every term, trading resources for precision.

3. Aggregation performance tuning:
  • enable eager global ordinals for high-cardinality aggregations
  • pre-sort the index at ingest time (index sorting)
  • use the node query cache
  • use the shard request cache
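Option 2's rule of thumb can be written out as a tiny helper; a minimal sketch (rounding up with `Math.ceil` so the value never falls below the recommendation; `ShardSizeCalc` is an illustrative name):

```java
public class ShardSizeCalc {

    /** Official rule of thumb for terms aggregations: shard_size = size * 1.5 + 10. */
    public static int recommendedShardSize(int size) {
        return (int) Math.ceil(size * 1.5 + 10);
    }
}
```

So asking for the top 3 buckets suggests collecting about 15 buckets per shard before the coordinating node merges them.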

SearchRequest request = new SearchRequest();
        request.indices("eslog");
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // terms aggregation, grouping by name
        builder.aggregation(AggregationBuilders.terms("agg").field("name.keyword").size(10));
        // average-height aggregation
        builder.aggregation(AggregationBuilders.avg("heightAvg").field("height"));
        request.source(builder);
        List<Map<String, Object>> list = new ArrayList<>();
        Map<String, Object> map = new HashMap<>();
        Map<String, Object> mapAgg = new HashMap<>();
        try {
            SearchResponse response = restHighLevelClient.search(request, ElasticsearchConfig.COMMON_OPTIONS);
            SearchHit[] searchHits = response.getHits().getHits();
            for (SearchHit hit : searchHits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                list.add(sourceAsMap);
            }
            /**
             * read the aggregation results
             */
            Aggregations aggregations = response.getAggregations();
            Terms terms = aggregations.get("agg");
            for (Terms.Bucket bucket : terms.getBuckets()) {
                String keyAsString = bucket.getKeyAsString();
                long docCount = bucket.getDocCount();
                mapAgg.put(keyAsString, docCount);
            }
            // read the average value
            Avg avg = aggregations.get("heightAvg");
            map.put("mapAgg", mapAgg);
            map.put("avg", avg.getValue());
            map.put("result", list);
            return R.ok("query succeeded!").put("result", map);
        } catch (IOException e) {
            e.printStackTrace();
        }
3.7 Querying documents with over-long fields

Use case: data cleaning. For example, find every document where certain fields are longer than 20 characters; cleaning jobs often need to delete documents whose fields exceed some length. How?

 SearchRequest request = new SearchRequest();
        request.indices("eslog");
        SearchSourceBuilder ssb = new SearchSourceBuilder();

        // Painless script: true when the keyword value is longer than 20 characters
        Script script1 = new Script("doc['address.keyword'].value.length()>" + 20);
        Script script2 = new Script("doc['job.keyword'].value.length()>" + 20);
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
		/**
		* should means OR: either condition qualifies the document
		**/
        boolQueryBuilder.should(QueryBuilders.scriptQuery(script1));
        boolQueryBuilder.should(QueryBuilders.scriptQuery(script2));

        ssb.query(boolQueryBuilder);
        request.source(ssb);

        SearchResponse response = null;
        List<Map<String, Object>> list = new ArrayList<>();
        try {
             response = restHighLevelClient.search(request, ElasticsearchConfig.COMMON_OPTIONS);
             for(SearchHit hit : response.getHits().getHits()){
                 Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                 list.add(sourceAsMap);
             }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return R.ok(list);
    }
3.8 Querying documents with an empty field

Data cleaning often requires deleting documents whose field is empty, which covers both documents that **lack the field entirely** and documents that **have the field but no value**. How to find them? With mustNot(existsQuery(...)): it matches both the missing-field case and the empty-value case.

Note: this works best on its own; mixed with other conditions it may stop matching as expected (I never tracked down why).

  /**
         * 1. create the request and choose the index
         */
        SearchRequest request = new SearchRequest();
        request.indices("eslog");
        /**
         * 2. build the request body
         */
        SearchSourceBuilder ssb = new SearchSourceBuilder();
        ssb.from(0).size(100);
        /**
         * 3. assemble the condition: matches documents that lack the field
         *    entirely, as well as documents where it exists but holds no value
         */
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        boolQueryBuilder.should(QueryBuilders.boolQuery().mustNot(QueryBuilders.existsQuery("age")));
        boolQueryBuilder.should(QueryBuilders.boolQuery().mustNot(QueryBuilders.existsQuery("salary")));
        /**
         * 4. run the search
         */
        ssb.query(boolQueryBuilder);
        request.source(ssb);
        SearchResponse response = null;
        List<Map<String, Object>> list = new ArrayList<>();
        try {
             response = restHighLevelClient.search(request, ElasticsearchConfig.COMMON_OPTIONS);
             for(SearchHit hit : response.getHits().getHits()){
                 Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                 list.add(sourceAsMap);
             }
        } catch (IOException e) {
            e.printStackTrace();
        }
3.9 Handling duplicate data

Duplicates that have already made it into the ES index are hard to clean up; the approach here removes duplicates at the source.
ES automatically generates a unique `_id` for every document. If the business data carries its own unique id (like a MySQL primary key), use it as the `_id` when indexing. By default, indexing with an existing `_id` overwrites (updates) the stored document instead of creating a second copy, so the index never accumulates duplicates; to reject duplicates outright rather than overwrite, the request can be sent with the `create` op type (e.g. `request.create(true)`).
To set the `_id` from the business data when indexing:

 request.id(id);
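Keying documents by a business id guarantees at most one document per id; the semantics can be illustrated with a plain map, which has the same "one key, one document" behavior; a sketch for illustration only (ES itself also bumps `_version` on each overwrite):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupDemo {

    /** Simulates indexing with request.id(businessId): the same id never yields two documents. */
    public static Map<String, String> indexAll(String[][] records) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String[] record : records) {
            String id = record[0];
            String body = record[1];
            // a duplicate id overwrites the stored document instead of duplicating it
            index.put(id, body);
        }
        return index;
    }
}
```

With auto-generated `_id`s, every insert of the same payload would instead create a brand-new document, which is exactly how duplicates accumulate.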