Distributed Search Engine Elasticsearch (Part 1)

Author: 坚持学习的博客

Last modified: 2023-10-06 22:41:18

Tags: search engine, distributed, elasticsearch

First published: 2023-10-03 19:02:41

Article link: https://blog.csdn.net/weixin_59752672/article/details/133498063

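I. Background

1. What is Elasticsearch?

Elasticsearch is a powerful search engine that helps us find the content we need in massive amounts of data.

Combined with Kibana, Logstash, and Beats, Elasticsearch forms the Elastic Stack (ELK), which is widely used for log analysis, real-time monitoring, and similar scenarios.

Elasticsearch is the core of the Elastic Stack and is responsible for storing, searching, and analyzing data.

The main advantages of Elasticsearch are:

1) It is distributed and can scale horizontally.

2) It exposes a RESTful interface, so it can be called from any language.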

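2. How Elasticsearch looks up data

1) Forward index:

Traditional databases such as MySQL use a forward index: rows are located by matching the query conditions against them. When we search a text column for a keyword, we usually fall back to LIKE '%keyword%', and that pattern applies to one column at a time. For example, if "上海" (Shanghai) is stored in a city column and "如家酒店" (Home Inn) in a name column, a fuzzy query normally matches against only one of the two columns; it cannot treat several columns as one combined text. In addition, a LIKE pattern with a leading wildcard usually prevents the database from using its index, which lowers query performance.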

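2) Inverted index:

An inverted index is built on two concepts:

Document: each piece of data is a document.

Term: a word obtained by splitting a document's content according to its meaning.

Example: searching for "小米" (Xiaomi). The inverted index is derived from the forward data: every document whose content contains "小米" is collected under that term. Here "小米" is a term, and the ids stored for it are the ids of the documents that contain it, for example 1, 3, 4.

An inverted index consists of two parts:

Term dictionary: records all terms and the relationship between each term and its posting list; the terms themselves are indexed to speed up insertion and lookup.

Posting list: records the ids of the documents that contain the term, the term frequency, the positions of the term in each document, and similar information.

Document id: used to fetch the document quickly.

Term frequency: the number of times the term appears in the document, used for scoring.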
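
The term dictionary and posting list described above can be sketched in a few lines of Java. This is only a toy model for illustration, not how Elasticsearch (Lucene) stores its index: the class name InvertedIndexSketch, the pre-tokenized input, and the sample documents are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> posting list (document id, term frequency, positions).
// Documents are assumed to be tokenized already; a real analyzer does that step.
class InvertedIndexSketch {

    // One posting: which document contains the term and at which positions.
    static final class Posting {
        final int docId;
        final List<Integer> positions = new ArrayList<>();
        Posting(int docId) { this.docId = docId; }
        int frequency() { return positions.size(); }
    }

    // Term dictionary: sorted map from each term to its posting list, keyed by document id.
    private final Map<String, Map<Integer, Posting>> dictionary = new TreeMap<>();

    void add(int docId, List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            dictionary
                .computeIfAbsent(tokens.get(pos), t -> new LinkedHashMap<>())
                .computeIfAbsent(docId, Posting::new)
                .positions.add(pos);
        }
    }

    // Ids of the documents that contain the term, in insertion order.
    List<Integer> docIds(String term) {
        return new ArrayList<>(dictionary.getOrDefault(term, Map.of()).keySet());
    }

    // How many times the term occurs in the given document (0 if it does not occur).
    int frequency(String term, int docId) {
        Posting p = dictionary.getOrDefault(term, Map.of()).get(docId);
        return p == null ? 0 : p.frequency();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, List.of("小米", "手机"));
        index.add(2, List.of("华为", "手机"));
        index.add(3, List.of("小米", "充电器", "小米"));
        index.add(4, List.of("小米", "手环"));
        System.out.println(index.docIds("小米"));       // [1, 3, 4]
        System.out.println(index.frequency("小米", 3)); // 2
    }
}
```

Looking up "小米" is a single dictionary access that returns the document ids directly, instead of scanning every document for the word.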

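3. Summary

1) What are documents and terms?

A document is analogous to a row in a MySQL table.

Each piece of data is a document. Splitting a document's content into words yields terms.

2) What is a forward index?

An index built on the document id. To search for a term, you must first fetch each document and then check whether it contains the term.

3) What is an inverted index?

The content of each document is split into terms, an index is built on the terms, and information about the documents containing each term is recorded. A query first uses the term to find the matching documents and then presents them ordered by relevance score (based on term frequency).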

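II. Building blocks of Elasticsearch

1. Document

Elasticsearch is document-oriented. A document can be thought of as one record in a database, for example one order. Documents are serialized to JSON before being stored in Elasticsearch.

A document corresponds to one row in a MySQL table.

2. Index

Index: a collection of documents of the same type.

Mapping: the constraints on the fields of the documents in an index, similar to a table schema.

In other words, it is similar to the set of fields that make up a class.

An index corresponds to a table in MySQL.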

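3. Installing Elasticsearch with Docker

1. Look for the Elasticsearch image, version 7.12.1

The Docker command for searching Elasticsearch images: docker search elasticsearch

2. Pull the image to the local machine

docker pull elasticsearch:7.12.1

3. Elasticsearch is used together with Kibana, its visualization interface, so the two containers need a shared Docker network:

docker network create es-net

Run the following docker command to deploy a single-node Elasticsearch. It creates a container from the image and starts it: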

docker run -d \
--name es \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "discovery.type=single-node" \
-v es-data:/usr/share/elasticsearch/data \
-v es-plugins:/usr/share/elasticsearch/plugins \
--privileged \
--network es-net \
-p 9200:9200 \
-p 9300:9300 \
elasticsearch:7.12.1

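-e "cluster.name=es-docker-cluster": sets the cluster name (optional; not part of the command above)

-e "http.host=0.0.0.0": the address to listen on, which allows access from other hosts (optional; not part of the command above)

-e "ES_JAVA_OPTS=-Xms512m -Xmx512m": JVM heap size

-e "discovery.type=single-node": single-node (non-cluster) mode

-v es-data:/usr/share/elasticsearch/data: mounts a named volume for the Elasticsearch data directory

-v es-logs:/usr/share/elasticsearch/logs: mounts a named volume for the Elasticsearch log directory (optional; not part of the command above)

-v es-plugins:/usr/share/elasticsearch/plugins: mounts a named volume for the Elasticsearch plugin directory

--privileged: grants the container extended privileges for accessing the volumes

--network es-net: joins the network named es-net

-p 9200:9200: port mapping for the HTTP interface; the command above also maps 9300, the port used for communication between nodes

Open http://192.168.150.101:9200 in a browser (replace the IP with the address of your own Docker host) to see the Elasticsearch response.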

4. Pull the Kibana image, version 7.12.1 (Kibana is the visualization interface for Elasticsearch)

docker pull kibana:7.12.1

5. Create a container from the Kibana image and start it

docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601 \
kibana:7.12.1

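--network es-net: joins the network named es-net, the same network as Elasticsearch

-e ELASTICSEARCH_HOSTS=http://es:9200: sets the address of Elasticsearch. Because Kibana is on the same network as Elasticsearch, it can reach Elasticsearch directly by its container name (es)

-p 5601:5601: port mapping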

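6. Starting the containers

docker run -d already starts each container. If a container has been stopped, start it again by the name given with --name:

docker start es (starts Elasticsearch)

docker start kibana (starts Kibana)

Open http://192.168.150.101:5601 in a browser to see the Kibana interface.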

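III. Elasticsearch syntax

1. Index syntax

An Elasticsearch index corresponds to a table in MySQL: it must be created first, as the place where the data (documents) is stored.

Create an index named lxhhotel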

PUT /lxhhotel
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "weight": {
        "type": "float"
      },
      "isMarried": {
        "type": "boolean"
      },
      "info": {
        "type": "text"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "score": {
        "type": "object"
      },
      "name": {
        "properties": {
          "firstName": {
            "type": "text"
          },
          "lastName": {
            "type": "text"
          }
        }
      }
    }
  }
}

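View the index

GET /lxhhotel

Delete the index

DELETE /lxhhotel

Note: once an index is created, the mappings of its existing fields cannot be modified; to change them you have to delete the index and create it again. Adding new fields to the mapping, however, is allowed.

Adding a field: add the field lxh to the lxhhotel index

PUT /lxhhotel/_mapping
{
  "properties": {
    "lxh": {
      "type": "text"
    }
  }
}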

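2. Document syntax

Documents are the content stored in an index, corresponding to the rows of a MySQL table.

Add a document: POST /<index name>/_doc/<document id>

Delete a document: DELETE /<index name>/_doc/<document id>; query a document: GET /<index name>/_doc/<document id>

Update a document: PUT /<index name>/_doc/<document id> replaces the whole document; POST /<index name>/_update/<document id> with a "doc" body changes only the given fields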
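
As a rough illustration of the document operations, the sketch below builds (but does not send) the corresponding HTTP requests with the JDK's java.net.http API, using the Elasticsearch 7.x REST paths /<index>/_doc/<id> and /<index>/_update/<id>. The class name DocumentRequestSketch, the base address http://localhost:9200, and the sample field are assumptions for this example only.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Builds, without sending, the REST requests for document CRUD in Elasticsearch 7.x.
class DocumentRequestSketch {

    // Assumed base address; replace it with your own Elasticsearch host.
    static final String BASE = "http://localhost:9200";

    private static HttpRequest.Builder json(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json");
    }

    // Add the document with the given id (replaces it if it already exists).
    static HttpRequest index(String index, String id, String jsonBody) {
        return json("/" + index + "/_doc/" + id).POST(BodyPublishers.ofString(jsonBody)).build();
    }

    // Fetch the document by id.
    static HttpRequest get(String index, String id) {
        return json("/" + index + "/_doc/" + id).GET().build();
    }

    // Delete the document by id.
    static HttpRequest delete(String index, String id) {
        return json("/" + index + "/_doc/" + id).DELETE().build();
    }

    // Partial update: only the fields inside "doc" are changed.
    static HttpRequest update(String index, String id, String partialJson) {
        String body = "{\"doc\":" + partialJson + "}";
        return json("/" + index + "/_update/" + id).POST(BodyPublishers.ofString(body)).build();
    }

    public static void main(String[] args) {
        HttpRequest request = update("lxhhotel", "1", "{\"age\":30}");
        // Prints: POST http://localhost:9200/lxhhotel/_update/1
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

The same requests can be typed directly into the Kibana Dev Tools console; the Java form is shown only because the rest of the article uses Java.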

3. Index operations in code

Step 1: add the dependency to pom.xml

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.12.1</version>
</dependency>

Step 2: create the connection to Elasticsearch

package com.hihonor.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EleasticConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        return new RestHighLevelClient(RestClient.builder(HttpHost.create("http://47.93.32.133:9200")));
    }
}

Step 3: index operations

Creating an index

package com.hihonor.controller;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES")
public class ElasticSearchAController {

    // Index definition (mappings JSON) sent as the body of the create-index request.
    // The ik_max_word analyzer requires the IK plugin described in section 5.
    public static final String CREATE_MAPPING_HOTEL = "{\n" +
            "  \"mappings\":{\n" +
            "    \"properties\":{\n" +
            "      \"name\":{\n" +
            "        \"type\":\"text\",\n" +
            "        \"analyzer\":\"ik_max_word\",\n" +
            "        \"copy_to\":\"all\"\n" +
            "      },\n" +
            "      \"adress\":{\n" +
            "        \"type\":\"keyword\",\n" +
            "        \"index\":false\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\":\"integer\"\n" +
            "      },\n" +
            "      \"source\":{\n" +
            "        \"type\":\"integer\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\":\"keyword\",\n" +
            "        \"copy_to\":\"all\"\n" +
            "      },\n" +
            "      \"city\":{\n" +
            "        \"type\":\"keyword\",\n" +
            "        \"copy_to\":\"all\"\n" +
            "      },\n" +
            "      \"star_name\":{\n" +
            "        \"type\":\"keyword\"\n" +
            "      },\n" +
            "      \"business\":{\n" +
            "        \"type\":\"keyword\"\n" +
            "      },\n" +
            "      \"all\":{\n" +
            "        \"type\":\"text\",\n" +
            "        \"analyzer\":\"ik_max_word\"\n" +
            "      },\n" +
            "      \"location\":{\n" +
            "        \"type\":\"geo_point\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @PostMapping("/createES")
    public String createES() throws IOException {
        // Create the index "myhotel" (index names must be lowercase)
        CreateIndexRequest request = new CreateIndexRequest("myhotel");
        request.source(CREATE_MAPPING_HOTEL, XContentType.JSON);
        restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
        return "Create-index request executed successfully";
    }
}

Querying an index

package com.hihonor.controller;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES")
public class ElasticSearchAController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @PostMapping("/getES")
    public boolean getES() throws IOException {
        // Check whether the index "myhotel" exists
        GetIndexRequest request = new GetIndexRequest("myhotel");
        return restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
    }
}

Deleting an index

package com.hihonor.controller;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES")
public class ElasticSearchAController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @PostMapping("/deleteES")
    public void deleteES() throws IOException {
        // Delete the index "myhotel"
        DeleteIndexRequest request = new DeleteIndexRequest("myhotel");
        restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
    }
}

4. Document operations in code

Inserting a document

package com.hihonor.controller;

import com.alibaba.fastjson.JSONObject;
import com.hihonor.bean.Hotel;
import com.hihonor.mapper.HotelMapper;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES/doc")
public class ElasticSearchRowController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Autowired
    private HotelMapper hotelMapper;

    // Insert (index) a single document
    @PostMapping("/insertDoc")
    public String insertDoc() throws IOException {
        // Load the hotel from MySQL
        Hotel hotel = hotelMapper.getHotels(36934L);
        // A geo_point field accepts a "lat,lon" string
        hotel.setLocation(hotel.getLatitude() + "," + hotel.getLongitude());
        // Serialize the hotel to JSON
        String hotelString = JSONObject.toJSONString(hotel);
        IndexRequest request = new IndexRequest("myhotel").id(hotel.getId().toString());
        request.source(hotelString, XContentType.JSON);
        restHighLevelClient.index(request, RequestOptions.DEFAULT);
        return "Document inserted successfully";
    }
}

Deleting a document

package com.hihonor.controller;

import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES/doc")
public class ElasticSearchRowController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Delete a document by id
    @PostMapping("/deleteDoc")
    public String deleteDoc() throws IOException {
        DeleteRequest request = new DeleteRequest("myhotel", "36934");
        restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        return "Delete-document request executed successfully";
    }
}

Querying a document

package com.hihonor.controller;

import com.alibaba.fastjson.JSONObject;
import com.hihonor.bean.Hotel;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES/doc")
public class ElasticSearchRowController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Fetch a document by id
    @PostMapping("/getDoc")
    public Hotel getDoc() throws IOException {
        GetRequest request = new GetRequest("myhotel", "36934");
        GetResponse response = restHighLevelClient.get(request, RequestOptions.DEFAULT);
        // No such document: there is no source to parse
        if (!response.isExists()) {
            return null;
        }
        // Convert the JSON source back into a Hotel object
        String sourceAsString = response.getSourceAsString();
        return JSONObject.parseObject(sourceAsString, Hotel.class);
    }
}

Updating a document

package com.hihonor.controller;

import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
@RequestMapping("/myES/doc")
public class ElasticSearchRowController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Partially update a document: only the listed fields are changed
    @PostMapping("/updateDoc")
    public String updateDoc() throws IOException {
        UpdateRequest request = new UpdateRequest("myhotel", "36934");
        // Field name / value pairs
        request.doc(
                "price", 20,
                "name", "updated name"
        );
        restHighLevelClient.update(request, RequestOptions.DEFAULT);
        return "Update-document request executed successfully";
    }
}

Adding documents in bulk

package com.hihonor.controller;

import com.alibaba.fastjson.JSONObject;
import com.hihonor.bean.Hotel;
import com.hihonor.mapper.HotelMapper;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;

@RestController
@RequestMapping("/myES/doc")
public class ElasticSearchRowController {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Autowired
    private HotelMapper hotelMapper;

    // Insert several documents with one bulk request
    @PostMapping("/BulkDoc")
    public String BulkDoc() throws IOException {
        // Container for the batched operations
        BulkRequest bulkRequest = new BulkRequest();

        // Load several rows from MySQL
        List<Hotel> hotels = hotelMapper.getHotelRowTen();
        // Add one index request per hotel to the bulk request
        hotels.forEach(hotel -> {
            hotel.setLocation(hotel.getLatitude() + "," + hotel.getLongitude());
            String hotelStr = JSONObject.toJSONString(hotel);
            IndexRequest request = new IndexRequest("myhotel")
                    .id(hotel.getId().toString())
                    .source(hotelStr, XContentType.JSON);
            bulkRequest.add(request);
        });
        BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        // Individual items can fail even when the bulk call itself succeeds
        if (response.hasFailures()) {
            return response.buildFailureMessage();
        }
        return "Bulk insert executed successfully";
    }
}
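
For reference, a bulk request corresponds to the _bulk REST endpoint, whose body is newline-delimited JSON: one action line followed by one source line per document. The high-level client builds this body itself; the sketch below only illustrates the format, assuming index names and ids that need no JSON escaping. The class name BulkBodySketch and the sample documents are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the newline-delimited JSON body expected by the _bulk REST endpoint.
class BulkBodySketch {

    // docsById maps a document id to its already serialized, single-line JSON source.
    static String bulkIndexBody(String index, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : docsById.entrySet()) {
            // Action line: the operation, the target index, and the document id.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // Source line: the document itself.
            body.append(entry.getValue()).append('\n');
        }
        // Every line, including the last one, ends with a newline.
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"hotel A\",\"price\":200}");
        docs.put("2", "{\"name\":\"hotel B\",\"price\":350}");
        System.out.print(bulkIndexBody("myhotel", docs));
    }
}
```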

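5. Analyzers: the IK analyzer

1) Concept

The analyzer is a core concept in Elasticsearch. When text is indexed or searched, it is first split into individual terms (tokenization), and the lookup is then performed on those terms.

2) Installing the analyzer

# Enter the container (it was named es in the docker run command above)
docker exec -it es /bin/bash

# Download and install online (the plugin version must match the Elasticsearch version)
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip

# Exit
exit

# Restart the container
docker restart es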

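The IK analyzer has two modes:

ik_smart: coarse-grained, splits the text into the fewest terms

ik_max_word: fine-grained, splits the text into as many terms as possible

Most importantly, the analyzer's dictionaries let us specify which words are recognized as terms and which words are ignored (stop words).

Extension dictionary

As the internet evolves, new words are coined more and more frequently. Many of them, such as "奥力给" and "传智播客", do not exist in the original vocabulary.

The vocabulary therefore needs to be updated continuously, and the IK analyzer provides an extension dictionary for this purpose.

Steps:

1) Open the config directory of the IK analyzer.

2) Add the following to the IKAnalyzer.cfg.xml configuration file.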

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary -->
    <entry key="ext_dict">ext.dic</entry>
    <!-- extension stop-word dictionary -->
    <entry key="ext_stopwords">stopword.dic</entry>
</properties>

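3) Create a file named ext.dic in the config directory (you can copy an existing dictionary file from that directory and edit it), with one word per line: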

传智播客
奥力给

Note: the file must be encoded as UTF-8. Do not edit it with Windows Notepad.

4) Restart Elasticsearch

docker restart es

# View the logs
docker logs -f es
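
To see why adding a word to the extension dictionary changes the result, here is a toy dictionary-based segmenter using forward maximum matching. It is not the IK analyzer's actual algorithm, only a simplified stand-in; the class name DictionarySegmenterSketch, the sample dictionary, and the maximum word length of 4 are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy forward-maximum-matching segmenter; NOT the IK analyzer's real algorithm.
class DictionarySegmenterSketch {

    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> terms = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxWordLength);
            // Try the longest candidate first and shrink it until a dictionary word matches.
            while (end > start + 1 && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            // If nothing matched, this is a single character.
            terms.add(text.substring(start, end));
            start = end;
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Set.of("学习", "真的"));
        String text = "学习真的奥力给";
        System.out.println(segment(text, dictionary, 4)); // [学习, 真的, 奥, 力, 给]
        // Conceptually what adding a line to ext.dic does: the new word becomes a known term.
        dictionary.add("奥力给");
        System.out.println(segment(text, dictionary, 4)); // [学习, 真的, 奥力给]
    }
}
```

Before the word is added, "奥力给" falls apart into single characters; afterwards it is kept as one term, which is the effect the extension dictionary is meant to achieve.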
