Elasticsearch: One-Stop Learning, from Structure to Cluster

阿伟在自律

Last modified: 2023-04-01 04:28:53


First published: 2023-04-01 03:57:03

Article link: https://blog.csdn.net/weixin_54232686/article/details/129891956


elasticsearch

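Elasticsearch, together with Kibana, Logstash, and Beats, forms the Elastic Stack (ELK). It is widely used for log data analysis, real-time monitoring, and similar scenarios.

What is Elasticsearch?

An open-source distributed search engine that can be used for search, log statistics, system monitoring and analysis, and more.

What is the Elastic Stack (ELK)?

A technology stack built around Elasticsearch, comprising Beats, Logstash, Kibana, and Elasticsearch.

What is Lucene?

Apache's open-source search engine library, which provides the core search engine APIs.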

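Forward index vs. inverted index

What are documents and terms?

Each record is a document.
Tokenizing the content of a document yields words, and each of those words is a term.

What is a forward index?

An index built on document IDs. To look up a term, you must first fetch each document and then check whether it contains the term.

What is an inverted index?

Tokenize the document content, build an index on the terms, and record which documents each term appears in. A query first looks up the term to get the document IDs, and then fetches the documents by ID.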
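
The lookup order described above (term first, document second) can be sketched in a few lines of Java. This is a minimal in-memory illustration, not how Lucene actually stores its index: the class name InvertedIndexDemo, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexDemo {
    // term -> sorted IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // document ID -> original text (the "fetch the document by ID" step)
    private final Map<Integer, String> documents = new HashMap<>();

    // Whitespace tokenization stands in for a real analyzer such as IK.
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    public void add(int id, String text) {
        documents.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // Look up the term to get document IDs, then fetch the documents.
    public List<String> search(String term) {
        List<String> result = new ArrayList<>();
        SortedSet<Integer> ids = postings.getOrDefault(term.toLowerCase(), new TreeSet<Integer>());
        for (Integer id : ids) {
            result.add(documents.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone");
        index.add(3, "Xiaomi band");
        System.out.println(index.search("phone"));
    }
}
```

A forward index would have to scan all three documents to answer the same question; here the term lookup is a single map access.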

ES vs. MySQL

Analyzers

IK analyzer

POST /_analyze
{
  "text": ["马化腾是一个人啊,奥力给啊额！"],
  "analyzer": "ik_max_word"
}

pinyin analyzer

Configuration reference: https://github.com/medcl/elasticsearch-analysis-pinyin

POST /_analyze
{
  "text": ["马化腾是一个人啊,奥力给啊额！"],
  "analyzer": "pinyin"
}

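Custom analyzers

An analyzer in ES is made up of three parts:

character filters: process the text before the tokenizer, e.g. deleting or replacing characters
tokenizer: splits the text into terms according to some rule, e.g. keyword (no splitting at all) or ik_smart
token filters: further process the terms emitted by the tokenizer, e.g. case conversion, synonym handling, pinyin conversion

A custom analyzer is configured through settings when the index is created.

settings: index-level configuration

Under settings, three parts can be specified:

character filter: handles special characters
tokenizer: splits the text into terms
filter: token filters, here the pinyin filter

None of them is mandatory; use whichever the situation calls for.

The mapping should specify both the index-time analyzer and the search-time analyzer:

"analyzer": "myAnalyzer",
"search_analyzer": "ik_max_word"

Why specify them separately?

Because the pinyin filter should only run at index time. Take 狮子 (lion) and 柿子 (persimmon) as an example. At index time each word is indexed as the original word plus shizi and sz, so both documents end up under the terms shizi and sz. If the search side also ran the pinyin filter, a search for 狮子 would be converted to shizi as well, and looking up shizi in the index would return both 狮子 and 柿子. The search analyzer therefore must not include the pinyin filter and should use the IK analyzer instead: Chinese input then matches only the Chinese terms, while a user who types shizi still matches the pinyin terms produced at index time.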

# Custom analyzer
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myAnalyzer":{
          "tokenizer":"ik_max_word",
          "filter": "py" // name of the pinyin token filter defined below
        }
      },
      // definition of the pinyin token filter
      "filter": { 
        "py":{
          "type": "pinyin", // filter type
          "keep_full_pinyin": false, // when enabled, e.g. 刘德华 > [liu, de, hua]; default: true
          "keep_joined_full_pinyin": true, // when enabled, e.g. 刘德华 > [liudehua]; default: false
          "keep_original": true, // when enabled, the original input is kept as well; default: false
          "limit_first_letter_length": 16, // maximum length of the first_letter result; default: 16
          "remove_duplicated_term": true, // when enabled, duplicated terms are removed to save index space, e.g. de的 > de; default: false. Note: position-related queries may be affected
          "none_chinese_pinyin_tokenize" :false // break non-Chinese letters into separate pinyin terms if they are pinyin; default: true, e.g. liu, de, hua, a, li, ba, ba, 13, zhuang, han. Note: keep_none_chinese and keep_none_chinese_together should be enabled
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "myAnalyzer",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}

Indices

Mapping properties

# Create the index
PUT /firsttable
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "age": {
        "type": "integer"
      },
      "Weight": {
        "type": "double"
      },
      "isMarried": {
        "type": "boolean"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "score": {
        "type": "double"
      },
      "name": {
        "type": "object",
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

# Get the index
GET /firsttable

# Update the index mapping: existing fields cannot be changed, only new fields can be added
PUT /firsttable/_mapping 
{
  "properties":{
    "age2":{
      "type": "double"
    }
  }
}

# Delete the index
DELETE /firsttable

Documents

# Create a document
POST /firsttable/_doc/1
{
  "info": "未婚男性",
  "age": "20",
  "Weight": "21.3",
  "isMarried": false,
  "email": "213@qq.com",
  "score": "21.2",
  "name": {
    "firstName": "张",
    "lastName": "三"
  }
}

# Get a document
GET /firsttable/_doc/1

# Delete a document
DELETE /firsttable/_doc/1

# Update a document
# 1. Full replacement: deletes the old document and indexes the new one
PUT /firsttable/_doc/1
{
  "info": "未婚男性222",
  "age": "20",
  "Weight": "21.3",
  "isMarried": false,
  "email": "213@qq.com",
  "score": "21.2",
  "name": {
    "firstName": "张",
    "lastName": "三"
  }
}

# 2. Partial update
POST /firsttable/_update/1
{
  "doc": {
    "info": "未婚男性333"
  }
}

RestClient operations

DSL statement

#hotel
PUT /hotel
{
  "mappings":{
    "properties":{
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "address":{
        "type": "keyword",
        "index": false,
        "copy_to": "all"
      },
      "price":{
        "type": "double"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },    
      "city":{
        "type": "keyword",
        "copy_to": "all"
      },
      "starName":{
        "type": "keyword"
      },      
      "business":{
        "type": "keyword"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword"
      },
      "all":{
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Add the dependency

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.12.1</elasticsearch.version>
    </properties>
<!--        Elasticsearch Java RestHighLevelClient dependency-->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
        </dependency>

Initialize the RestHighLevelClient

@SpringBootTest
class HotelDemoApplicationTests {
    private RestHighLevelClient client;
    @Test
    void contextLoads() {
        System.out.println(client);
    }
    @BeforeEach
    void setUp(){
        this.client = new RestHighLevelClient(
                RestClient.builder(
                        HttpHost.create("http://192.168.163.129:9200")));
    }
    @AfterEach
    void clear() throws IOException {
        this.client.close();
    }

}

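Index operations

Create an index

    @Test
    void contextLoads() throws IOException {
//        1. Create the request object
        CreateIndexRequest request = new CreateIndexRequest("hotel");
//        2. Prepare the DSL; MAPPING_HOTEL is a String constant holding the DSL that creates the hotel index
        request.source(MAPPING_HOTEL,XContentType.JSON);
//        3. Send the request; indices() returns the client that exposes the index-level operations (create, delete, exists, get)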
        client.indices().create(request,RequestOptions.DEFAULT);
    }

Delete an index

@Test
public void testDel() throws IOException {
    DeleteIndexRequest hotel = new DeleteIndexRequest("hotel");
    client.indices().delete(hotel,RequestOptions.DEFAULT);
}

Check whether an index exists

@Test
public void testExists() throws IOException {
    GetIndexRequest hotel = new GetIndexRequest("hotel");
    System.out.println(client.indices().exists(hotel, RequestOptions.DEFAULT));
}

Document operations

Create a document

    @Test
    public void testAddData() throws IOException {
//        Load the record from the database
        Hotel hotel = hotelService.getById(61083L);
//        Convert it to the index document structure
        HotelDoc hotelDoc = new HotelDoc(hotel);
//        Build the request: index a document by index name and ID
        IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
//        Document body as JSON
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        client.index(request, RequestOptions.DEFAULT);
    }

Get a document

@Test
public void testGet() throws IOException {
    GetRequest request = new GetRequest("hotel").id("61083");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String jsonStr = response.getSourceAsString();
    HotelDoc hotelDoc = JSON.parseObject(jsonStr, HotelDoc.class);
    System.out.println(hotelDoc);
}

Delete a document

@Test
public void testDel() throws IOException {
    DeleteRequest request = new DeleteRequest("hotel").id("61083");
    DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}

Update a document

@Test
public void testUpdate() throws IOException {
    UpdateRequest request = new UpdateRequest("hotel","61083");
    request.doc(
            "score", "18",
                "city", "东莞"
    );
    UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}

Bulk-index documents

    @Test
    public void testBulk() throws IOException {
        QueryWrapper<Hotel> wrapper = new QueryWrapper<>();
//        wrapper.last("limit 5");
        List<Hotel> list = hotelService.list(wrapper);
        BulkRequest request = new BulkRequest("hotel");
        for (Hotel item: list){
            HotelDoc hotelDoc = new HotelDoc(item);
            request.add(
                    new IndexRequest("hotel")
                            .id(item.getId().toString())
                            .source(JSON.toJSONString(hotelDoc),XContentType.JSON));
        }
        client.bulk(request,RequestOptions.DEFAULT);
    }

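DSL queries

Match all: returns every document, mostly used for testing. For example: match_all

Full-text queries: analyze the user's input with an analyzer, then match the resulting terms against the inverted index. For example:

match_query

multi_match_query

Exact (term-level) queries: look up data by an exact term value, usually on keyword, numeric, date, or boolean fields. For example:

ids

range: query by a range of values

term: query by an exact term value

Geo queries: query by latitude and longitude. For example:

geo_distance

geo_bounding_box

Compound queries: combine the query conditions above into a single query. For example:

bool

function_score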

Match all

GET /hotel/_search
{
  "explain":true,# also shows which shard each hit is on
  "query": {
    "query_type": {
       "field": "value"
    }
  }
}
// Match all
GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
}

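Full-text search

# match query
GET /hotel/_search
{
  "query": {
    "match": {
      "all": "上海如家"
    }
  }
}
# multi_match query: differs slightly from match. match runs against a single field, while multi_match runs the value against every listed field. If the all field used by match is built from exactly the fields listed in multi_match, the two queries are equivalent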
GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "上海如家",
      "fields": ["brand", "name", "address"]
    }
  }
}

Exact queries

# term query: match by an exact term value
GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "上海"
      }
    }
  }
}
# range query: match by a range of values
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 200
      }
    }
  }
}

Geo queries

# geo_distance query: match by distance from a coordinate
GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "3km",
      "location": "31.21, 121.5"
    }
  }
}

# geo_bounding_box query: match within the rectangle defined by the given coordinates
GET /hotel/_search
{
  "query": {
    "geo_bounding_box": {
      "location":{
        "top_left": {
          "lat": 31.3,
          "lon": 121.5
        },
        "bottom_right": {
          "lat": 30.3,
          "lon": 121.7
        }
      }
    }
  }
}

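Compound queries

# function_score: query city=上海 and give hotels with brand=如家 a weight of 10. The score of every matching 如家 hotel is multiplied by 10 while other hotels keep their score; since results are ordered by score, 如家 hotels rank first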
GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "city": "上海"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {"brand": "如家"}
          },
          "weight":10 
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

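Logical relations in a bool query:

must: conditions that must match; think of it as AND
should: conditions that may optionally match; think of it as OR
must_not: conditions that must not match; does not take part in scoring; think of it as NOT
filter: conditions that must match; does not take part in scoring

# bool query: hotels whose name matches 如家, with a price not above 400, within 10km of (31.21, 121.5)
# Clauses under filter and must_not do not take part in scoring; only clauses under must (or should) do, and scoring costs performance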
GET /hotel/_search
{
  "query": {
   "bool": {
     "must": [
       {
         "match": {
           "name": "如家"
         }
       }
     ],
     "must_not": [
       {
         "range": {
           "price": {
             "gt":400 
           }
         }
       }
     ],
     "filter": [
       {
         "geo_distance": {
           "distance": "10km",
           "location": {
             "lat": 31.21,
             "lon": 121.5
           }
         }
       }
     ]
   }
  }
}

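Sorting

Once a custom sort is specified, ES no longer computes relevance scores.

# sort: query brand=如家, order by the score field descending, then by price ascending on ties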
GET /hotel/_search
{
  "query": {
    "match": {
      "brand": "如家"
    }
  },
  "sort": [
    {
      "score": {
        "order": "desc"
      },
      "price": {
        "order": "asc"
      }
    }
  ]
}

# sort: order hotels by distance from the given coordinates ascending, with the distance reported in km
GET /hotel/_search
{
  "query": {
    "match": {
      "brand": "如家"
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 31.240417 ,
          "lon": 121.503134
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

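Pagination

By default ES returns only the top 10 hits; to fetch more, adjust the paging parameters.

ES uses the from and size parameters to control which page of results is returned.

Because of how the inverted index works, every paged query has to retrieve all results up to the requested page and then cut out the slice. For example, to return results 990-1000, ES must fetch the top 1000 and keep the last 10.

ES is distributed, and large data sets are spread across shards, each holding its own portion of the data. So what happens when you ask for results 990-1000? Every shard has to return its own top 1000, but the last 10 from each shard are not the answer: with 10 shards that would already be 100 candidates. What ES actually does is merge the results from all ten shards, 10,000 records in total, re-sort them, and take positions 990-1000 from the merged list.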
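
The merge step described above can be sketched in plain Java. This is a simplified model, assuming each shard is just a list of integer scores and that a higher score ranks first; the class and method names (ShardPagingDemo, shardTop, page) are made up for the illustration and are not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ShardPagingDemo {
    // Each shard returns its own top (from + size) scores, sorted descending.
    public static List<Integer> shardTop(List<Integer> shardScores, int from, int size) {
        List<Integer> sorted = new ArrayList<>(shardScores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(from + size, sorted.size())));
    }

    // The coordinating node merges every shard's candidates, re-sorts, and slices the page.
    public static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardTop(shard, from, size));
        }
        merged.sort(Comparator.reverseOrder());
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
                Arrays.asList(9, 4, 1),
                Arrays.asList(8, 7, 2),
                Arrays.asList(6, 5, 3));
        // Third and fourth best results across all shards.
        System.out.println(page(shards, 2, 2));
    }
}
```

The cost grows with from + size on every shard, which is why deep pages become expensive.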

# Paged query
# with sort
GET /hotel/_search
{
  "query": {
    "match": {
      "brand": "如家"
    }
  },
  "sort": [
    {
      "score": {
        "order": "desc"
      },
      "price": {
        "order": "asc"
      }
    }
  ],
  "from": 0,
  "size": 2
}

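The deep paging problem

Solutions for deep paging

search after is the recommended approach:

Pros: no upper limit on how deep you can page (the size of a single request must still not exceed 10,000)
Cons: you can only page forward one page at a time; random page jumps are not supported
Use case: searches without random page jumps, e.g. scroll-down paging on a phone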
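
The idea behind search after, continuing from the last sort value seen instead of counting from the start, can be modelled in a few lines. This is only an in-memory analogy under the assumption that sort values are unique integers; SearchAfterDemo and nextPage are hypothetical names, and the real API takes the sort values of the last hit in the previous page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class SearchAfterDemo {
    // Returns the next `size` values strictly greater than `after`
    // (or the first page when `after` is null).
    public static List<Integer> nextPage(NavigableSet<Integer> sortedValues, Integer after, int size) {
        NavigableSet<Integer> rest = (after == null) ? sortedValues : sortedValues.tailSet(after, false);
        List<Integer> page = new ArrayList<>();
        for (Integer value : rest) {
            if (page.size() == size) {
                break;
            }
            page.add(value);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> prices = new TreeSet<>(Arrays.asList(10, 20, 30, 40, 50));
        List<Integer> first = nextPage(prices, null, 2);
        // The last value of the previous page is the cursor for the next one.
        List<Integer> second = nextPage(prices, first.get(first.size() - 1), 2);
        System.out.println(first + " " + second);
    }
}
```

Because each page only needs the cursor from the previous one, the work per page stays constant, but jumping straight to an arbitrary page is not possible.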

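Highlighting

The query here runs against all, which is populated via copy_to, while the field highlighted under fields is name. By default ES only highlights a field when it is also the field being queried, so require_field_match is set to false to change that.

# Highlight query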
GET /hotel/_search
{
  "query": {
    "match": {
      "all": "上海如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": "false"
      }
    }
  }
}

RestClient query operations

Match all (matchAll)

    @Test
    public void testMatchAll() throws IOException {
        SearchRequest request = new SearchRequest("hotel");
        request.source().query(QueryBuilders.matchAllQuery());
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
//        Parse the response and get the hits
        SearchHits searchHits = response.getHits();
//        Total number of matching documents
        long value = searchHits.getTotalHits().value;
        System.err.println("<===== " + value + " hits in total ====>");
//        Get the array of documents inside hits
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits) {
            String jsonStr = hit.getSourceAsString();
            HotelDoc hotelDoc = JSONObject.parseObject(jsonStr, HotelDoc.class);
            System.err.println("hotelDoc---> " + hotelDoc);
        }
    }

    // Match all
    GET /hotel/_search
    {
      "query": {
        "match_all": {}
      }
    }

Full-text search

    /**
     * Full-text search with match
     * @throws IOException
     */
    @Test
    public void testMatch() throws IOException {
        SearchRequest request = new SearchRequest("hotel");
        request.source().query(QueryBuilders.matchQuery("all","上海如家"));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    }

   GET /hotel/_search
   {
     "query": {
       "match": {
         "all": "上海如家"
       }
     }
   }

/**
 * Full-text search with multiMatch
 * @throws IOException
 */
@Test
public void testMultiMatch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.multiMatchQuery("上海如家","brand","name","address"));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}

GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "上海如家",
      "fields": ["brand", "name", "address"]
    }
  }
}

Exact queries

/**
 * Exact query with term
 * @throws IOException
 */
@Test
public void testTerm() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.termQuery("city","上海"));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}

GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "上海"
      }
    }
  }
}

/**
 * Range query with range
 * @throws IOException
 */
@Test
public void testRange() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.rangeQuery("price").gte(100).lte(200));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 200
      }
    }
  }
}

Geo queries

/**
 * Geo query with geo_distance
 * @throws IOException
 */
@Test
public void testDistance() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.geoDistanceQuery("location").distance("3km").point(31.21,121.5));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
# geo_distance query: match by distance from a coordinate
GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "3km",
      "location": "31.21, 121.5"
    }
  }
}

Compound queries

    /**
     * Compound query with bool
     * @throws IOException
     */
    @Test
    public void testBool() throws IOException {
        SearchRequest request = new SearchRequest("hotel");
//        Prepare the DSL
//        Create the BoolQueryBuilder
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
//        Add must
        boolQuery.must(QueryBuilders.matchQuery("name","如家"));
//        Add mustNot
        boolQuery.mustNot(QueryBuilders.rangeQuery("price").gt(400));
//        Add filter
        boolQuery.filter(QueryBuilders.geoDistanceQuery("location").distance("10km").point( 31.21,121.5));

        request.source().query(boolQuery);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    }

GET /hotel/_search
{
  "query": {
   "bool": {
     "must": [
       {
         "match": {
           "name": "如家"
         }
       }
     ],
     "must_not": [
       {
         "range": {
           "price": {
             "gt":400 
           }
         }
       }
     ],
     "filter": [
       {
         "geo_distance": {
           "distance": "10km",
           "location": {
             "lat": 31.21,
             "lon": 121.5
           }
         }
       }
     ]
   }
  }
}

    /**
     * Compound query with function_score
     * @throws IOException
     */
    @Test
    public void testFunctionScore() throws IOException {
        SearchRequest request = new SearchRequest("hotel");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//        Build the match query
        QueryBuilder queryBuilder = QueryBuilders.matchQuery("city", "上海");
//        Build the function clause
        FunctionScoreQueryBuilder.FilterFunctionBuilder[] filterFunctionBuilders = {
                new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                        QueryBuilders.termQuery("brand", "如家"),
                        new WeightBuilder().setWeight(10)
                )
        };
//        Combine the functions and the query into one functionScoreQuery
        FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(queryBuilder, filterFunctionBuilders);
        searchSourceBuilder.query(functionScoreQueryBuilder);
        request.source(searchSourceBuilder);

        SearchResponse response = client.search(request,RequestOptions.DEFAULT);
    }

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "city": "上海"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {"brand": "如家"}
          },
          "weight":10 
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

Sorting

    /**
     * Sorting and pagination
     * @throws IOException
     */
    @Test
    public void testSort() throws IOException {
        SearchRequest request = new SearchRequest("hotel");
        MatchQueryBuilder query = QueryBuilders.matchQuery("brand", "如家");
//        Two sort clauses
        FieldSortBuilder score = SortBuilders.fieldSort("score").order(SortOrder.DESC);
        FieldSortBuilder price = SortBuilders.fieldSort("price").order(SortOrder.ASC);
//        Put both sort clauses into one list
        List<SortBuilder<?>> builders = new ArrayList<>();
        builders.add(score);
        builders.add(price);
        request.source().sort(builders);
        request.source().query(query);
        request.source().from(0);
        request.source().size(2);
        SearchResponse response = client.search(request,RequestOptions.DEFAULT);
        SearchHits searchHits = response.getHits();
        System.out.println(searchHits.getTotalHits());
    }

GET /hotel/_search
{
  "query": {
    "match": {
      "brand": "如家"
    }
  },
  "sort": [
    {
      "score": {
        "order": "desc"
      },
      "price": {
        "order": "asc"
      }
    }
  ],
  "from": 0,
  "size": 2
}

Highlighting

/**
 * Highlighting
 * @throws IOException
 */
@Test
public void testHighLight() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    MatchQueryBuilder query = QueryBuilders.matchQuery("all", "如家");
    HighlightBuilder highlightBuilder = new HighlightBuilder().field("name").requireFieldMatch(false);
    request.source().highlighter(highlightBuilder);
    request.source().query(query);
    SearchResponse response = client.search(request,RequestOptions.DEFAULT);
    SearchHits searchHits = response.getHits();
    System.out.println(searchHits.getTotalHits());
    SearchHit[] hits = searchHits.getHits();
    for (SearchHit hit : hits) {
        String jsonStr = hit.getSourceAsString();
        HotelDoc hotelDoc = JSONObject.parseObject(jsonStr, HotelDoc.class);
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        HighlightField highlightField = highlightFields.get("name");
        if (highlightField!=null){
            String name = highlightField.getFragments()[0].string();
            hotelDoc.setName(name);
        }
        System.out.println(hotelDoc);
    }
}

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "上海如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": "false"
      }
    }
  }
}

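Data aggregation

Aggregations compute statistics, analysis, and calculations over document data. Common kinds:

Bucket aggregations: group documents
Terms aggregation: groups by the value of a document field
Date Histogram: groups by date intervals, e.g. one bucket per week or per month

Metric aggregations: compute values
Avg: average
Max: maximum
Min: minimum
Stats: computes max, min, avg, sum, and so on in one go

Pipeline aggregations: aggregate over the results of other aggregations

Fields used in aggregations must be of a type that is not analyzed: keyword, numeric, date, boolean

Bucket aggregations

By default, a bucket aggregation counts the documents in each bucket, referred to as _count, and sorts the buckets by _count in descending order.

By default, a bucket aggregation runs over all documents in the index; to restrict the documents being aggregated, just add a query.

The three elements of an aggregation:

Aggregation name
Aggregation type
Aggregation field

Aggregation options:

size: number of buckets to return
order: how to sort the buckets
field: the field to aggregate on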
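
What a terms bucket aggregation computes by default (count per distinct value, sorted by count descending, truncated to size) can be mimicked with Java streams. This is an in-memory analogy only; TermsAggDemo, termsAgg, and the brand names are assumptions for the example, and ties are broken by key here simply to keep the output deterministic.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TermsAggDemo {
    // Groups values into buckets, counts each bucket, sorts by count descending,
    // and keeps the first `size` buckets.
    public static LinkedHashMap<String, Long> termsAgg(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> brands = Arrays.asList(
                "HomeInn", "7Days", "HomeInn", "Hanting", "7Days", "HomeInn");
        System.out.println(termsAgg(brands, 2));
    }
}
```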

# Bucket aggregation
GET /hotel/_search
{
# Restrict the set of documents being aggregated
  "query": {
    "range": {
      "price": {
        "gte": 200,
        "lte": 1000
      }
    }
  }, 
  "size": 1,
  "aggs": {
    "demo": {
      "terms": {
        "field": "brand",
        # Change the sort order
        "order": {
          "_count": "asc"
        }, 
        "size": 20
      }
    }
  }
}

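Metrics aggregations

# Metrics aggregation
GET /hotel/_search
{
  "size": 0,
  "aggs": {
  # Parent aggregation: name demo, type terms, field brand; buckets are sorted by the avg of the sub-aggregation metricsAgg in descending order; returns 20 buckets
    "demo": {
      "terms": {
        "field": "brand",
        "order": {
          "metricsAgg.avg": "desc"
        },
        "size": 20
      },
      # Sub-aggregation: aggregates further on top of the buckets above; name metricsAgg, type stats, on the score field
      # Computes the score statistics for each brand: min/max/avg/sum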
      "aggs": {
        "metricsAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}

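Autocomplete

Pinyin analysis

Elasticsearch provides the Completion Suggester query for autocomplete. It matches and returns terms that start with the user's input. To make completion queries efficient, there are constraints on the field type:

The field used for completion queries must be of type completion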
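
The "terms that start with the user's input" behaviour is a prefix lookup. The sketch below models it with a sorted set, which is only an analogy: Elasticsearch uses its own in-memory structure for completion fields, and PrefixSuggestDemo with its add and suggest methods is a hypothetical helper, not a client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixSuggestDemo {
    // Sorted set of lower-cased terms; all terms sharing a prefix are adjacent.
    private final TreeSet<String> terms = new TreeSet<>();

    public void add(String term) {
        terms.add(term.toLowerCase());
    }

    // Returns up to `size` terms that start with the given prefix.
    public List<String> suggest(String prefix, int size) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(p, true)) {
            if (!term.startsWith(p) || out.size() >= size) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixSuggestDemo demo = new PrefixSuggestDemo();
        for (String t : new String[] {"Sony", "WH1000", "SKny", "PH1000", "Nony", "sH1000"}) {
            demo.add(t);
        }
        System.out.println(demo.suggest("so", 10));
    }
}
```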

# Create the index
PUT /test2
{
  "mappings":{
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}
POST /test2/_doc/1
{
  "title":["Sony","WH1000"],
  "id":1
}
POST /test2/_doc/2
{
  "title":["SKny","PH1000"],
  "id":1
}
POST /test2/_doc/3
{
  "title":["Nony","sH1000"],
  "id":1
}
# Autocomplete query
GET /test2/_search
{
  "suggest": {
    "mySuggest": {
      "text": "so",
      "completion": {
        "field": "title",
        "skip_duplicates": true,
        "size":10
      }
    }
  }
}

#hotel
PUT /hotel
{
  "mappings":{
    "properties":{
      "id":{
        "type": "keyword"
      },
      "address":{
        "type": "keyword",
        "copy_to": "all"
      },
      "price":{
        "type": "double"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },    
      "city":{
        "type": "keyword",
        "copy_to": "all"
      },
      "starName":{
        "type": "keyword"
      },      
      "business":{
        "type": "keyword"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "text_analyzere",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
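      # all is the search field. Documents are indexed with text_analyzere (ik_max_word tokenization plus pinyin); searches use ik_max_word only, which splits the user's input into terms without converting it to pinyin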
      "all":{
        "type": "text",
        "analyzer": "text_analyzere",
        "search_analyzer": "ik_max_word"
      },
      # An extra field dedicated to autocomplete, of type completion. When a document is indexed, the values loaded from the database have already been put into suggestion as an array
      "suggestion":{
        "type": "completion",
        "analyzer": "completion_analyzere"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "text_analyzere":{
          "tokenizer":"ik_max_word",
          "filter":"py"
        },
        "completion_analyzere":{
          "tokenizer":"keyword",
          "filter":"py"
        }
      },
      "filter": {
        "py":{
          "type": "pinyin", 
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize" :false 
        }
      }
    }
  }
}

RestClient operations

/**
 * Autocomplete query
 */
@Test
public void testSuggestion() {
    try {
        SearchRequest request = new SearchRequest("hotel");
        request.source().suggest(new SuggestBuilder()
                .addSuggestion("mySuggestion",
                        SuggestBuilders
                                .completionSuggestion("suggestion")
                                .prefix("s")
                                .skipDuplicates(true)
                                .size(10)));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        CompletionSuggestion mySuggestion = response.getSuggest().getSuggestion("mySuggestion");
        List<CompletionSuggestion.Entry.Option> list = mySuggestion.getOptions();
        for (CompletionSuggestion.Entry.Option option : list) {
            System.err.println(option.getText().string());
        }
    } catch (IOException e) {
        System.out.println(e);
    }
}

@Data
@NoArgsConstructor
public class HotelDoc {
	// ... omitted

    private Object distance;

    private Boolean isAD;
    private List<String> suggestion;
    public HotelDoc(Hotel hotel) {
	// ... omitted
        if (this.business.contains("、")){
            String[] arr = this.business.split("、");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion,arr);
        }else if (this.business.contains("/")){
            String[] arr = this.business.split("/");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion,arr);
        }else {
            this.suggestion = Arrays.asList(this.brand,this.business);
        }
    }
}

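Data synchronization

The hotel data in Elasticsearch comes from a MySQL database, so whenever the MySQL data changes, Elasticsearch must change with it. This is data synchronization between Elasticsearch and MySQL.

Asynchronous notification

Listening to the binlog

Synchronous calls:

Pros: simple and straightforward to implement
Cons: tight coupling between services

Asynchronous notification:

Pros: loose coupling, moderate implementation difficulty
Cons: depends on the reliability of the MQ

Listening to the binlog:

Pros: fully decouples the services
Cons: enabling the binlog adds load on the database, and the implementation is complex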

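ES cluster

ES cluster split-brain

What does a master-eligible node do?

Takes part in master election
The elected master manages the cluster state and shard information, and handles requests to create and delete indices

What does a data node do?

CRUD on the data

What does a coordinator node do?

Routes requests to other nodes
Merges the query results and returns them to the user

Distributed storage in an ES cluster

Distributed queries in an ES cluster

How is the shard chosen for a distributed write?

The coordinating node hashes the document ID and takes the result modulo the number of shards; the remainder is the target shard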
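
The routing rule above is a one-liner. The sketch below uses String.hashCode() as a stand-in hash, which is an assumption made for readability; Elasticsearch uses its own hash function on the routing value (the document ID by default), so the concrete shard numbers here will not match a real cluster. ShardRoutingDemo and shardFor are hypothetical names.

```java
public class ShardRoutingDemo {
    // shard = hash(routing) mod number_of_shards
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    public static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1", "2", "3", "61083"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, 3));
        }
    }
}
```

Since the result depends on the number of shards, the same ID would land on a different shard if that number changed, which is why the number of primary shards is fixed when an index is created.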

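Distributed queries

Scatter phase: the coordinating node distributes the query to the different shards
Gather phase: the results are collected on the coordinating node, which assembles them and returns them to the user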