Distributed Search with ES: Part 1

稚白

Column: SpringCloud. Tags: distributed, elasticsearch, big data

Article link: https://blog.csdn.net/qq_51787352/article/details/132085920

Getting Started with ES

About ES

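What is Elasticsearch?

An open-source distributed search engine that can be used for search, log statistics, analytics, system monitoring, and similar tasks.

What is the Elastic Stack (ELK)?

A technology stack built around Elasticsearch, consisting of Beats, Logstash, Kibana, and Elasticsearch.

What is Lucene?

An open-source search engine library from Apache that provides the core APIs of a search engine. Elasticsearch is built on top of it.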

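Comparison with MySQL

MySQL	Elasticsearch	Description
Table	Index	An index is a collection of documents, similar to a table in a database.
Row	Document	A document is a single piece of data, similar to a row in a database. Documents are stored as JSON.
Column	Field	A field is a field of a JSON document, similar to a column in a database.
Schema	Mapping	A mapping defines constraints on the documents in an index, such as field types, similar to a table schema in a database.
SQL	DSL	DSL is the JSON-style request language provided by Elasticsearch; it is used to operate Elasticsearch and implement CRUD.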

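About search

What are documents and terms?

Each piece of data is a document.

Splitting the content of a document into words (tokenization) yields terms.

What is a forward index?

An index built on document ids. A term query has to go through the documents one by one and check whether each of them contains the term.

What is an inverted index?

The document content is tokenized, an index is built on the resulting terms, and for each term the ids of the documents containing it are recorded. A query first looks up the document ids by term and then fetches the documents.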
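To make the difference concrete, here is a minimal in-memory sketch of both structures. It is a toy, not how Lucene stores data: the class name `SimpleIndex` is hypothetical, and the "analyzer" just lowercases and splits on whitespace (real ES would use an analyzer such as IK). The forward search visits every document, while the inverted search is a single map lookup from term to document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SimpleIndex {
    // Forward index: document id -> document content
    private final Map<Integer, String> forward = new LinkedHashMap<>();
    // Inverted index: term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> inverted = new HashMap<>();

    // Toy "analyzer": lowercase the text and split it on whitespace
    private static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    void addDocument(int id, String content) {
        forward.put(id, content);
        for (String term : analyze(content)) {
            inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // Forward index: visit every document and check whether it contains the term
    List<Integer> searchForward(String term) {
        String t = term.toLowerCase();
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : forward.entrySet()) {
            if (analyze(doc.getValue()).contains(t)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    // Inverted index: one lookup from the term straight to the document ids
    List<Integer> searchInverted(String term) {
        return new ArrayList<>(inverted.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }

    public static void main(String[] args) {
        SimpleIndex index = new SimpleIndex();
        index.addDocument(1, "Xiaomi phone");
        index.addDocument(2, "Huawei phone");
        index.addDocument(3, "Huawei charger");
        System.out.println(index.searchForward("phone"));   // [1, 2]
        System.out.println(index.searchInverted("huawei")); // [2, 3]
    }
}
```

Both searches return the same ids; the difference is the cost, which for the forward index grows with the number of documents.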

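Installing ES

Because we also need to deploy a Kibana container, the ES and Kibana containers must be able to reach each other, which requires putting them on the same Docker network. First create a network:

docker network create es-net

Next, upload es.tar and kibana.tar from the course materials to the virtual machine and load them into Docker as images (for example with `docker load -i es.tar`). After that, ES and Kibana can be run.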

Run ES:

docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
elasticsearch:7.12.1

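Options explained:

- `-e "ES_JAVA_OPTS=-Xms512m -Xmx512m"`: JVM heap size
- `-e "discovery.type=single-node"`: single-node mode (no cluster)
- `-v es-data:/usr/share/elasticsearch/data`: mount a named volume to the ES data directory
- `-v es-plugins:/usr/share/elasticsearch/plugins`: mount a named volume to the ES plugins directory
- `--privileged`: grant the container extended privileges (here, so it can access the mounted volumes)
- `--network es-net`: join the network named es-net
- `-p 9200:9200`: map the HTTP port
- `-p 9300:9300`: map the transport port used for communication between nodes

Optional settings that are not used in the command above:

- `-e "cluster.name=es-docker-cluster"`: set the cluster name
- `-e "http.host=0.0.0.0"`: the listen address, allowing external access
- `-v es-logs:/usr/share/elasticsearch/logs`: mount a named volume to the ES log directory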

Now open http://192.168.153.100:9200 in a browser (replace the address with your VM's IP) to see Elasticsearch's response.

Run Kibana:

docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

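Options explained:

- `--network es-net`: join the network named es-net, the same network as Elasticsearch
- `-e ELASTICSEARCH_HOSTS=http://es:9200`: set the Elasticsearch address; because Kibana is on the same network as Elasticsearch, it can reach Elasticsearch by its container name
- `-p 5601:5601`: port mapping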

Once it is running, open http://192.168.153.100:5601 to see Kibana.

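IK analyzer

The IK analyzer has two modes:

* `ik_smart`: coarsest-grained segmentation (fewest splits)

* `ik_max_word`: finest-grained segmentation (most splits)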
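The difference between the two modes can be illustrated with a toy dictionary-based segmenter. This is not IK's actual algorithm: the class name `ToySegmenter`, the dictionary, and the greedy longest-match rule used for the "smart" style are assumptions, and ASCII letters stand in for Chinese characters so the example compiles regardless of source-file encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToySegmenter {
    private final Set<String> dictionary;
    private final int maxLen; // length of the longest dictionary word

    ToySegmenter(Collection<String> words) {
        this.dictionary = new HashSet<>(words);
        int m = 1;
        for (String w : words) m = Math.max(m, w.length());
        this.maxLen = m;
    }

    // "max_word" style: emit every dictionary word found at every position (overlaps allowed)
    List<String> maxWord(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + maxLen); j++) {
                String w = text.substring(i, j);
                if (dictionary.contains(w)) out.add(w);
            }
        }
        return out;
    }

    // "smart" style: greedy longest match from left to right, no overlaps
    List<String> smart(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character if nothing matches
            for (int j = Math.min(text.length(), i + maxLen); j > i; j--) {
                if (dictionary.contains(text.substring(i, j))) { end = j; break; }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        ToySegmenter seg = new ToySegmenter(Arrays.asList("black", "board", "blackboard", "lack", "pen"));
        System.out.println(seg.maxWord("blackboardpen")); // [black, blackboard, lack, board, pen]
        System.out.println(seg.smart("blackboardpen"));   // [blackboard, pen]
    }
}
```

The finest-grained output yields more terms and therefore more index entries and recall; the coarse output keeps the index smaller.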

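Extension and stopword dictionaries

To customize the dictionaries, open the config directory of the IK analyzer, find the IKAnalyzer.cfg.xml configuration file, and add the following to it: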

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
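        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here: add the extension dictionary -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- Configure your own extension stopword dictionary here: add the stopword dictionary -->
        <entry key="ext_stopwords">stopword.dic</entry>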
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>

Then create ext.dic and stopword.dic in the same directory and add the words you want to extend or disable, one word per line.

After adding them, restart the ES container:

docker restart es

Test again: the extension and stopword dictionaries now take effect.
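If you maintain these dictionary files often, a small helper can merge new words into them without creating duplicates. This is a hypothetical sketch, not part of IK: it assumes the files are plain UTF-8 text with one word per line, and that you pass the IK config directory yourself (with the `es-plugins` volume above, Docker typically places it under `/var/lib/docker/volumes/es-plugins/_data/`, and the plugin folder name depends on how you installed IK). The container still has to be restarted afterwards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IkDictWriter {
    // Merges words into a dictionary file (one word per line, UTF-8),
    // keeping existing entries first and skipping blanks and duplicates
    static List<String> appendWords(Path dictFile, Collection<String> words) throws IOException {
        Set<String> all = new LinkedHashSet<>();
        if (Files.exists(dictFile)) {
            for (String line : Files.readAllLines(dictFile, StandardCharsets.UTF_8)) {
                addIfNotBlank(all, line);
            }
        }
        for (String word : words) {
            addIfNotBlank(all, word);
        }
        List<String> lines = new ArrayList<>(all);
        Files.write(dictFile, lines, StandardCharsets.UTF_8);
        return lines;
    }

    private static void addIfNotBlank(Set<String> target, String word) {
        String trimmed = word.trim();
        if (!trimmed.isEmpty()) {
            target.add(trimmed);
        }
    }

    // Hypothetical usage: java IkDictWriter <ik-config-dir> <ext.dic|stopword.dic> <word>...
    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: IkDictWriter <ik-config-dir> <dictionary-file> <word>...");
            return;
        }
        Path dictFile = Paths.get(args[0]).resolve(args[1]);
        List<String> lines = appendWords(dictFile, Arrays.asList(args).subList(2, args.length));
        System.out.println(dictFile + " now contains " + lines.size() + " entries");
    }
}
```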

Index operations

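Mapping attributes

Common mapping attributes, all of which appear in the example below, include:

- type: the field's data type, such as text (tokenized text), keyword (exact value, not tokenized), integer, or geo_point
- index: whether to build an index on the field (defaults to true)
- analyzer: which analyzer to use for a text field
- properties: the subfields of an object field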

Creating an index

Create an index using the following format:

PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field_name2":{
        "type": "keyword",
        "index": false
      },
      "field_name3":{
        "properties": {
          "subfield": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}

CRUD operations

View an index

GET /index_name

Delete an index

DELETE /index_name

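Modify an index

Once an index and its mapping have been created, the existing fields cannot be modified, but new fields can be added. The syntax is:

PUT /index_name/_mapping
{
  "properties": {
    "new_field_name":{
      "type": "integer"
    }
  }
}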

Document operations

Add a document

POST /index_name/_doc/document_id
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "subfield1": "value3",
        "subfield2": "value4"
    },
    // ...
}

Get a document

GET /index_name/_doc/document_id

Delete a document

DELETE /index_name/_doc/document_id

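Update a document

Option 1: full update. The old document is deleted and the new one is added, so any field that is not specified is lost. If no document with that id exists, the request simply creates it.

PUT /index_name/_doc/document_id
{
    "field1": "value1",
    "field2": "value2",
    // ... omitted
}

Option 2: partial update. Only the specified fields are modified.

POST /index_name/_update/document_id
{
    "doc": {
         "field_name": "new value"
    }
}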
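The semantic difference between the two options can be modeled with plain Java maps. This is only an illustration of the outcome, not how ES applies updates internally; the class name `UpdateSemantics` is hypothetical, and the toy merges top-level fields only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class UpdateSemantics {
    // Full update (PUT /index_name/_doc/id): the new body replaces the stored document,
    // so the stored fields are deliberately ignored and omitted fields disappear
    static Map<String, Object> fullUpdate(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // Partial update (POST /index_name/_update/id with {"doc": ...}): copy the stored
    // document and overwrite only the top-level fields listed in "doc"
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "Hotel A");
        stored.put("price", 500);
        Map<String, Object> change = Collections.<String, Object>singletonMap("price", 633);
        System.out.println(fullUpdate(stored, change));    // {price=633}
        System.out.println(partialUpdate(stored, change)); // {name=Hotel A, price=633}
    }
}
```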

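Operating on indices with RestClient

Creating an index

1. Import the database data (the hotel data used in the examples below).

2. Add the dependency: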

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>      
</dependency>

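Spring Boot's dependency management defaults to Elasticsearch 7.6.2 for the Spring Boot version used here, so override it to match the 7.12.1 server: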

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.12.1</elasticsearch.version>
    </properties>

3. Initialize the RestHighLevelClient

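    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.153.100:9200")  // replace with your ES address
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();  // release the client's HTTP connections after each test
    }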

4. Create the index

    @Test
    void createHotelIndex() throws IOException {
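        // 1. Create the Request object (use org.elasticsearch.client.indices.CreateIndexRequest)
        CreateIndexRequest request = new CreateIndexRequest("hotel");
        // 2. Prepare the request parameters: the DSL statement
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // 3. Send the request
        client.indices().create(request, RequestOptions.DEFAULT);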
    }

MAPPING_TEMPLATE is a static constant declared in the class constants.HotelConstants and statically imported into the test class:

public class HotelConstants {
    public static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"address\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"score\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"city\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"starName\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"business\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"location\":{\n" +
            "        \"type\": \"geo_point\"\n" +
            "      },\n" +
            "      \"pic\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"all\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
}

Then run the test method.

Querying the index in Kibana at http://192.168.153.100:5601 confirms that it was created.

Deleting an index

    @Test
    void testDeleteHotelIndex() throws IOException {
        // 1. Create the Request object
        DeleteIndexRequest request = new DeleteIndexRequest("hotel");
        // 2. Send the request
        client.indices().delete(request, RequestOptions.DEFAULT);
    }

Checking whether an index exists

    @Test
    void testExistsHotelIndex() throws IOException {
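        // 1. Create the Request object (use org.elasticsearch.client.indices.GetIndexRequest)
        GetIndexRequest request = new GetIndexRequest("hotel");
        // 2. Send the request
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        // 3. Print the result
        System.out.println(exists ? "Index already exists" : "Index does not exist");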
    }

Operating on documents with RestClient

1. Add a document

    @Autowired
    private IHotelService hotelService;

    private RestHighLevelClient client;

    @Test
    void testAddDocument() throws IOException {
        // Query the hotel data by id
        Hotel hotel = hotelService.getById(36934L);
        // Convert it to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);

        // 1. Create the Request object
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // 2. Prepare the JSON document
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 3. Send the request
        client.index(request, RequestOptions.DEFAULT);
    }
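The conversion from `Hotel` to `HotelDoc` exists because the index mapping does not match the database row one to one: the mapping has a single geo_point field `location`. The sketch below assumes, hypothetically, that the database stores latitude and longitude as separate columns; the field names and the trimmed-down classes are assumptions, and the project's real classes carry more fields plus whatever constructors and accessors fastjson needs. It relies on ES accepting a geo_point as a `"lat,lon"` string.

```java
// Hypothetical, trimmed-down versions of the project's classes; field names are assumptions
class HotelDoc {
    private final Long id;
    private final String name;
    private final Integer price;
    // Matches the geo_point field "location" in MAPPING_TEMPLATE ("lat,lon" string form)
    private final String location;

    HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.price = hotel.getPrice();
        // Combine the two database columns into the single geo_point value
        this.location = hotel.getLatitude() + "," + hotel.getLongitude();
    }

    public Long getId() { return id; }
    public String getName() { return name; }
    public Integer getPrice() { return price; }
    public String getLocation() { return location; }

    public static void main(String[] args) {
        Hotel hotel = new Hotel(1L, "Sample Hotel", 300, "30.5", "120.5"); // sample values
        System.out.println(new HotelDoc(hotel).getLocation()); // 30.5,120.5
    }
}

class Hotel {
    private final Long id;
    private final String name;
    private final Integer price;
    private final String latitude;  // assumed: separate latitude/longitude columns in MySQL
    private final String longitude;

    Hotel(Long id, String name, Integer price, String latitude, String longitude) {
        this.id = id;
        this.name = name;
        this.price = price;
        this.latitude = latitude;
        this.longitude = longitude;
    }

    public Long getId() { return id; }
    public String getName() { return name; }
    public Integer getPrice() { return price; }
    public String getLatitude() { return latitude; }
    public String getLongitude() { return longitude; }
}
```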

2. Get a document

    @Test
    void testGetDocumentById() throws IOException {
        // 1. Create the Request object
        GetRequest request = new GetRequest("hotel", "36934");
        // 2. Send the request and get the response
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        // 3. Parse the response: _source holds the document as JSON
        String source = response.getSourceAsString();

        HotelDoc hotelDoc = JSON.parseObject(source, HotelDoc.class);
        System.out.println(hotelDoc);
    }

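3. Update a document

There are two ways to modify document data:

Option 1: full update. Writing a document with the same id again deletes the old document and adds the new one (the code is the same as for adding a document).

Option 2: partial update. Only some fields are updated. Option 2 is demonstrated here.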

    @Test
    void testUpdateDocument() throws IOException {
        // 1. Create the Request object
        UpdateRequest request = new UpdateRequest("hotel", "36934");
        // 2. Prepare the request parameters: field/value pairs to overwrite
        request.doc(
                "price", "633",
                "starName", "五钻"
        );
        // 3. Send the request
        client.update(request, RequestOptions.DEFAULT);
    }

4. Delete a document

    @Test
    void testDeleteDocument() throws IOException {
        // 1. Prepare the Request object
        DeleteRequest request = new DeleteRequest("hotel", "36934");
        // 2. Send the request
        client.delete(request, RequestOptions.DEFAULT);
    }

5. Add documents in bulk

    @Test
    void testBulkRequest() throws IOException {
        // Query all hotel data from the database
        List<Hotel> hotels = hotelService.list();

        // 1. Create the Request object
        BulkRequest request = new BulkRequest();
        // 2. Prepare the parameters: add one IndexRequest per document
        for (Hotel hotel : hotels) {
            // Convert to the document type HotelDoc
            HotelDoc hotelDoc = new HotelDoc(hotel);
            // Create the request that adds this document
            request.add(new IndexRequest("hotel")
                    .id(hotelDoc.getId().toString())
                    .source(JSON.toJSONString(hotelDoc), XContentType.JSON)
            );
        }
        // 3. Send the request
        client.bulk(request, RequestOptions.DEFAULT);
    }
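When the table is large, sending everything in a single BulkRequest can produce a very large HTTP request. A common approach is to split the list into batches first; the helper below is a hypothetical sketch of that splitting step, and the batch size is yours to choose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Batches {
    // Splits a list into consecutive sublists of at most batchSize elements
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(partition(ids, 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

In `testBulkRequest`, the loop could then run once per batch, for example `for (List<Hotel> batch : Batches.partition(hotels, 500))` (500 is an arbitrary assumption), building a new `BulkRequest` and calling `client.bulk` for each batch. Note that `client.bulk` returns a `BulkResponse`, and a failure of an individual item does not throw an exception, so checking `response.hasFailures()` is worthwhile.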