SpringCloud-ES Study Notes: Day 7

Data Aggregation

  • Types of aggregations
  • Implementing aggregations in DSL
  • Implementing aggregations with the RestAPI

Classification of Aggregations

Aggregations let you run statistics, analysis, and calculations over document data. There are three common categories:

  • Bucket aggregations: group documents
    • TermAggregation: group by a document field's value
    • Date Histogram: group by date, for example one bucket per week or per month
  • Metric aggregations: compute values such as the maximum, minimum, or average
    • Avg: average value
    • Max: maximum value
    • Min: minimum value
    • Stats: max, min, avg, sum, and count in one go
  • Pipeline aggregations: aggregate over the results of other aggregations

Summary:

What is an aggregation?

  • An aggregation performs statistics, analysis, and calculations on document data

What are the common kinds of aggregation?

  • Bucket: groups documents and counts each group
  • Metric: computes values over documents, for example avg
  • Pipeline: aggregates over the results of other aggregations

Fields participating in an aggregation must be one of:

  • keyword
  • numeric
  • date
  • boolean

Implementing Bucket Aggregations in DSL

Now we want to count how many hotel brands appear in the data, so we aggregate on the hotel brand name. The aggregation type is terms. DSL example:

GET /hotel/_search
{
  "size": 0, # size 0: the response contains only aggregation results, no documents
  "aggs": { # define the aggregation
    "brandAgg": { # give the aggregation a name
      "terms": { # aggregation type; we group by brand value, so terms
        "field": "brand", # the field to aggregate on
        "size": 20 # the number of buckets to return
      }
    }
  }
}

Sample result:
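The shape of the response looks roughly like this (bucket keys and counts depend on the data in your index; the numbers here are illustrative):

```json
{
  "took": 2,
  "hits": {
    "total": { "value": 201, "relation": "eq" },
    "hits": []
  },
  "aggregations": {
    "brandAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "7天酒店", "doc_count": 30 },
        { "key": "如家", "doc_count": 30 }
      ]
    }
  }
}
```

Because size was 0, the hits array is empty and only the aggregation buckets carry data.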

Bucket Aggregation: Custom Sort Order

Add an order clause:

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAggs": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": {
          "_count": "asc" #根据count字段升序排序
        }
      }
    }
  }
}

Bucket Aggregation: Limiting the Aggregation Scope

By default, a bucket aggregation runs over all documents in the index. To limit the set of documents being aggregated, simply add a query condition:

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 200
      }
    }
  },
  "size": 0, 
  "aggs": {
    "brandAggs": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

Summary:

aggs stands for aggregation and sits at the same level as query. What is query's role here?

  • It limits the scope of documents the aggregation runs over

The three required elements of an aggregation:

  • the aggregation name
  • the aggregation type
  • the aggregation field

Configurable aggregation properties:

  • size: the number of aggregation results to return
  • order: how the aggregation results are sorted
  • field: the field to aggregate on

Side note:

Compound multi-condition queries:

Combine multiple conditions with bool.

# multi-condition query
GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "price": {
              "gte": 10,
              "lte": 200
            }
          }
        },{
          "term": {
            "brand":{
              "value": "7天酒店"
            }
          }
        }
      ]
    }
  }
}

Implementing Metric Aggregations in DSL

Group hotels by brand, then compute the max, min, avg, and sum of the score field within each brand:

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAggs": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "scoreAggs.avg": "desc"
        }
      },
      "aggs": {
        "scoreAggs": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}

Example: aggregation with filter conditions

Building the filter conditions:

    private SearchRequest extracted(RequestParams requestParams, SearchRequest searchRequest) {
        // 2.1 build the query
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        if (StringUtils.hasText(requestParams.getKey())){
            boolQuery.must(QueryBuilders.matchQuery("all", requestParams.getKey()));
        } else {
            boolQuery.must(QueryBuilders.matchAllQuery());
        }
        // keyword term filter: brand
        if (StringUtils.hasText(requestParams.getBrand())){
            boolQuery.filter(QueryBuilders.termQuery("brand", requestParams.getBrand()));
        }
        // keyword term filter: city
        if (StringUtils.hasText(requestParams.getCity())){
            boolQuery.filter(QueryBuilders.termQuery("city", requestParams.getCity()));
        }
        // keyword term filter: star rating
        if (StringUtils.hasText(requestParams.getStarName())) {
            boolQuery.filter(QueryBuilders.termQuery("starName", requestParams.getStarName()));
        }
        // range filter on price; gte is >=, lte is <=
        if (requestParams.getMinPrice() != null && requestParams.getMaxPrice() != null) {
            boolQuery.filter(QueryBuilders.rangeQuery("price").gte(requestParams.getMinPrice()).lte(requestParams.getMaxPrice()));
        }

        // scoring control: boost advertised hotels
        FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(
                // the original query
                boolQuery,
                // the function score array
                new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
                        // a single filter + score-function pair
                        new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                                // filter condition
                                QueryBuilders.termQuery("isAD", true),
                                // score function
                                ScoreFunctionBuilders.weightFactorFunction(100)
                        )
                });

        // 2.2 pagination
        final int page = requestParams.getPage();
        final int size = requestParams.getSize();
        searchRequest.source().query(functionScoreQueryBuilder).from((page - 1) * size).size(size);

        // 2.3 sorting by distance, if a location was supplied
        if (StringUtils.hasText(requestParams.getLocation())){
            final String location = requestParams.getLocation();
            searchRequest.source().sort(SortBuilders.
                    geoDistanceSort("location", new GeoPoint(location))
                    .order(SortOrder.ASC)
                    .unit(DistanceUnit.KILOMETERS)
            );
        }

        return searchRequest;
    }

Running the aggregation and parsing the result:

    @Override
    public Map<String, List<String>> filters(RequestParams params) {
        try {
            // prepare the request
            SearchRequest searchRequest = new SearchRequest("hotel");
            // prepare the DSL: the query limits the aggregation scope,
            // then the aggregations are added to the request
            extracted(params, searchRequest);
            buildRequest(searchRequest);
            SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
            List<String> brands = getListByName(search, "brandAgg");
            List<String> cities = getListByName(search, "cityAgg");
            List<String> starNames = getListByName(search, "starAgg");
            return Map.of(
                    "brand", brands,
                    "city", cities,
                    "starName", starNames
            );
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private List<String> getListByName(SearchResponse search, String key) {
        Aggregations aggregations = search.getAggregations();
        Terms terms = aggregations.get(key);
        List<? extends Terms.Bucket> buckets = terms.getBuckets();
        return buckets.stream().map(MultiBucketsAggregation.Bucket::getKeyAsString).collect(Collectors.toList());
    }
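The buildRequest helper called in filters is not shown in these notes; a plausible sketch of what it does, given that the parsing step expects terms aggregations named brandAgg, cityAgg, and starAgg (the bucket sizes are illustrative):

```java
    private void buildRequest(SearchRequest searchRequest) {
        // aggregation results only, no documents
        searchRequest.source().size(0);
        searchRequest.source().aggregation(
                AggregationBuilders.terms("brandAgg").field("brand").size(100));
        searchRequest.source().aggregation(
                AggregationBuilders.terms("cityAgg").field("city").size(100));
        searchRequest.source().aggregation(
                AggregationBuilders.terms("starAgg").field("starName").size(100));
    }
```

The aggregation names chosen here must match the keys passed to getListByName, since the response is looked up by name.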

Autocomplete

  • Pinyin tokenizer
  • Custom analyzers
  • Completion (autocomplete) queries
  • Implementing autocomplete for the hotel search box

Installing the Pinyin Tokenizer

Download link (Baidu Netdisk, course materials for the SpringCloud+RabbitMQ+Docker+Redis full-stack microservices course): https://pan.baidu.com/s/169SFtYEvel44hRJhmFTRTQ#list/path=%2Fsharelink3232509500-496165211763170%2F1%E3%80%81%E5%BE%AE%E6%9C%8D%E5%8A%A1%E5%BC%80%E5%8F%91%E6%A1%86%E6%9E%B6SpringCloud%2BRabbitMQ%2BDocker%2BRedis%2B%E6%90%9C%E7%B4%A2%2B%E5%88%86%E5%B8%83%E5%BC%8F%E5%BE%AE%E6%9C%8D%E5%8A%A1%E5%85%A8%E6%8A%80%E6%9C%AF%E6%A0%88%E8%AF%BE%E7%A8%8B%2F%E5%AE%9E%E7%94%A8%E7%AF%87%2F%E5%AD%A6%E4%B9%A0%E8%B5%84%E6%96%99%2Fday07-Elasticsearch03%2F%E8%B5%84%E6%96%99&parentPath=%2Fsharelink3232509500-496165211763170

After unzipping, copy the plugin into Elasticsearch's mounted plugins directory, then restart the es container.

To find the container's mounted directories:

docker inspect <container-id> | grep Mounts -A 20

Custom Analyzers

An elasticsearch analyzer consists of three parts:

  • character filters: process the text before the tokenizer, for example deleting or replacing characters
  • tokenizer: splits the text into terms according to certain rules. For example, keyword emits the whole text as a single term without splitting it; ik_smart is another tokenizer.
  • token filters: further process the terms the tokenizer outputs, for example case conversion, synonym handling, or pinyin conversion

Use the pinyin analyzer when creating the index, i.e. at indexing time. At search time, be careful: if the query is also analyzed with the pinyin analyzer, homophones will match each other incorrectly, so switch to a different analyzer such as ik_smart for searching.

# custom pinyin analyzer
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { 
        "my_analyzer": { 
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": { 
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
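You can inspect what my_analyzer emits with the _analyze API. With the filter settings above (keep_original, keep_joined_full_pinyin, first-letter form, no per-character pinyin), a term like 私自 should produce the original token plus its joined pinyin and first letters:

```json
GET /test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "私自"
}
```

This is a quick way to verify the analyzer configuration before indexing any documents.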


POST /test/_doc/1
{
  "id":1,
  "name":"私自"
}

POST /test/_doc/2
{
  "id":2,
  "name":"四字"
}

GET /test/_search
{
  "query": {
    "match": {
      "name": "调入私自"
    }
  }
}

The query result is wrong: both documents match, because the query text is also analyzed with the pinyin analyzer, so the homophones 私自 and 四字 hit each other.

At search time you should use ik_smart instead.

When defining the index:

# custom pinyin analyzer, with a separate search analyzer
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { 
        "my_analyzer": { 
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": { 
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

# drop the old index first, then re-run the PUT above
DELETE /test

POST /test/_doc/1
{
  "id":1,
  "name":"私自"
}

POST /test/_doc/2
{
  "id":2,
  "name":"四字"
}

GET /test/_search
{
  "query": {
    "match": {
      "name": "调入私自"
    }
  }
}

Query result: this time only document 1 (私自) matches, because the query text is analyzed with ik_smart rather than the pinyin analyzer, so the homophone 四字 is no longer hit.

Autocomplete Queries

Completion Suggester queries

elasticsearch provides the Completion Suggester query to implement autocomplete. It matches terms that start with the user's input and returns them. To keep completion queries efficient, there are constraints on the fields involved: the field must be of type completion, and its content is usually an array of the terms to be suggested.

Create a new index and add some documents:

PUT comtest
{
  "mappings": {
    "properties": {
      "title":{
        "type": "completion"
      }
    }
  }
}


POST comtest/_doc
{
  "title":["Sony","WH-1000XM3"]
}
POST comtest/_doc
{
  "title":["SK-II","PITERA"]
}
POST comtest/_doc
{
  "title":["Nintendo","switch"]
}

Query the index. Note that the keys must be all lowercase: the Dev Tools auto-complete template inserts the placeholder FIELD in uppercase, which must be changed to field:

#  autocomplete query
GET comtest/_search
{
  "suggest": {
    "titleSuggest": {
      "text": "s", # the prefix typed by the user
      "completion": {
        "field": "title", # the field to complete from
        "skip_duplicates": true, # skip duplicate suggestions
        "size": 10 # return at most 10 results
      }
    }
  }
}

Example: Hotel Data Autocomplete

Implementation plan:

1. Modify the hotel index structure and define custom pinyin analyzers

2. Make the name and all fields use the custom analyzer

3. Add a new suggestion field of type completion that uses the completion analyzer

4. Add a suggestion field to the HotelDoc class, containing brand and business

5. Re-import the data into the hotel index

Steps 1 to 3: redefine the hotel index with the custom analyzers and the new suggestion field:

PUT /hotel
{
  "settings": {
    "analysis": { #自定义分词器
      "analyzer": { 
        "text_anlyzer":{
        "tokenizer": "ik_max_word",
        "filter": "py"
      },
      "completion_analyzer":{
        "tokenizer": "ik_max_word",
        "filter":"py"
      }
    }, 
     "filter":{
        "py":{ #拼英分词器过滤器
          "type":"pinyin",
          "keep_full_pinyin":false,
          "keep_joined_full_pinyin":true,
          "keep_original":true,
          "limit_first_letter_length":16,
          "remove_duplicated_term":true,
          "none_chinese_pinyin_tokenize":false
        }
      }  
    }
  },
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address":{
        "type": "keyword",
        "index": false
      },
      "price":{
        "type": "integer"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },
      "city":{
        "type": "keyword"
      },
      "starName":{
        "type": "keyword"
      },
      "business":{
        "type": "keyword",
        "copy_to": "all"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword",
        "index": false
      },
      "all":{
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion":{# 搜索补全
          "type": "completion",
          "analyzer": "completion_analyzer"
      }
    }
  }
  
}

Steps 4 and 5: add the suggestion field to the HotelDoc class, containing brand and business, then re-import the data into the hotel index:

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private String isAD;

    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        if (this.business.contains("/")){
            String[] split = this.business.split("/");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion,split);
        }else {
            this.suggestion = Arrays.asList(this.brand,this.business);

        }
    }
}
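The suggestion-building logic in the constructor above can be exercised in isolation; a minimal self-contained sketch (class and method names here are mine, not from the project):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SuggestionDemo {
    // Mirrors the HotelDoc constructor: if business contains "/",
    // split it and prepend the brand; otherwise suggestion is just [brand, business].
    public static List<String> buildSuggestion(String brand, String business) {
        if (business.contains("/")) {
            List<String> suggestion = new ArrayList<>();
            suggestion.add(brand);
            Collections.addAll(suggestion, business.split("/"));
            return suggestion;
        }
        return Arrays.asList(brand, business);
    }

    public static void main(String[] args) {
        System.out.println(buildSuggestion("7天酒店", "天安门/王府井")); // prints [7天酒店, 天安门, 王府井]
        System.out.println(buildSuggestion("如家", "西湖"));           // prints [如家, 西湖]
    }
}
```

Each entry in the resulting list becomes one completion candidate in the suggestion field.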

RestAPI Implementation of Autocomplete

First, the request-building API: create a SearchRequest, build the suggest DSL on its source, and send it with the client.

    /**
     * Autocomplete (suggest) query
     */
    @Test
    void testSuggest() throws IOException {
        SearchRequest searchRequest = new SearchRequest("hotel");
        searchRequest.source().suggest(new SuggestBuilder().addSuggestion(
                "mysuggestion",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix("h")
                        .skipDuplicates(true)
                        .size(10)
        ));
        SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(search);
    }

Result parsing:

Full RestAPI implementation, including parsing the suggested terms:

    /**
     * Autocomplete (suggest) query with result parsing
     */
    @Test
    void testSuggest() throws IOException {
        SearchRequest searchRequest = new SearchRequest("hotel");
        searchRequest.source().suggest(new SuggestBuilder().addSuggestion(
                "mysuggestion",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix("h")
                        .skipDuplicates(true)
                        .size(10)
        ));
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

        // 4. process the results
        Suggest suggest = response.getSuggest();
        // 4.1 get the completion results by suggestion name
        CompletionSuggestion suggestion = suggest.getSuggestion("mysuggestion");
        // 4.2 get the options and iterate over them
        for (Suggest.Suggestion.Entry.Option completionSuggestion : suggestion.getOptions()){
            // 4.3 the text of each option is a completed term
            System.out.println(completionSuggestion.getText().toString());
        }
    }

 

Implementing Autocomplete for the Hotel Search Box

Looking at the front-end page, each keystroke in the search box fires an ajax request.

On the server side, write an endpoint that accepts this request and returns the completion candidates as a List<String>.
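A minimal sketch of such an endpoint, assuming the request parameter is named key and the service exposes a getSuggestions method (both names are illustrative, not from these notes):

```java
@RestController
@RequestMapping("/hotel")
public class HotelController {

    @Autowired
    private IHotelService hotelService;

    // Returns completion candidates for the typed prefix.
    @GetMapping("/suggestion")
    public List<String> getSuggestions(@RequestParam("key") String prefix) {
        return hotelService.getSuggestions(prefix);
    }
}
```

The service method would run the completion suggester query shown above, using the prefix in place of the hard-coded "h", and collect the option texts into the returned list.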

Data Synchronization

Analyzing the data synchronization problem

The hotel data in elasticsearch comes from a mysql database, so whenever the mysql data changes, elasticsearch must change with it. This is data synchronization between elasticsearch and mysql.

Option 1: synchronous calls

Option 2: asynchronous notification

Option 3: listening to the binlog

Option 1: synchronous calls

  • Pros: simple and straightforward to implement
  • Cons: high coupling between services

Option 2: asynchronous notification

  • Pros: low coupling, moderate implementation difficulty
  • Cons: depends on the reliability of the MQ

Option 3: listening to the binlog

  • Pros: completely decouples the services
  • Cons: enabling the binlog puts extra load on the database, and the implementation is complex

Case study: using MQ to synchronize data between mysql and elasticsearch

Use the hotel-admin project provided with the course materials as the hotel-management microservice. Whenever hotel data is inserted, updated, or deleted, the same operation must be applied to the data in elasticsearch.

Steps:

  • Import the hotel-admin project from the course materials, start it, and test the hotel CRUD operations
  • Declare the exchange, queues, and RoutingKeys
  • Send a message in the insert, update, and delete business logic of hotel-admin
  • Listen for the messages in hotel-demo and update the elasticsearch data accordingly
  • Start everything and test the data synchronization
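The declaration step can be sketched with Spring AMQP; the exchange, queue, and routing key names below are illustrative, not taken from these notes:

```java
public class MqConstants {
    public static final String HOTEL_EXCHANGE = "hotel.topic";
    public static final String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
    public static final String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
    public static final String HOTEL_INSERT_KEY = "hotel.insert";
    public static final String HOTEL_DELETE_KEY = "hotel.delete";
}

@Configuration
public class MqConfig {
    @Bean
    public TopicExchange topicExchange() {
        // durable, non-auto-delete exchange
        return new TopicExchange(MqConstants.HOTEL_EXCHANGE, true, false);
    }

    @Bean
    public Queue insertQueue() {
        return new Queue(MqConstants.HOTEL_INSERT_QUEUE, true);
    }

    @Bean
    public Queue deleteQueue() {
        return new Queue(MqConstants.HOTEL_DELETE_QUEUE, true);
    }

    @Bean
    public Binding insertBinding() {
        return BindingBuilder.bind(insertQueue()).to(topicExchange()).with(MqConstants.HOTEL_INSERT_KEY);
    }

    @Bean
    public Binding deleteBinding() {
        return BindingBuilder.bind(deleteQueue()).to(topicExchange()).with(MqConstants.HOTEL_DELETE_KEY);
    }
}
```

hotel-admin would then publish the hotel id to the matching routing key after each insert, update, or delete, and a listener in hotel-demo would fetch the hotel by id and write it to (or remove it from) elasticsearch.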

 
