Elasticsearch, a Distributed Storage Engine

学习没烦脑

Tags: distributed systems, elasticsearch, big data

Article link: https://blog.csdn.net/weixin_47435200/article/details/131821960

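This article introduces the core features of Elasticsearch as an open-source search engine, such as the inverted index, and compares it with MySQL. It covers the use of analyzers and how to operate on indices and documents through the RestClient, including create, query, delete, and update. It also covers how to customize analysis rules and how to exchange data with the JavaRestClient.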

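I. Introduction

Elasticsearch is a powerful open-source search engine that helps us quickly find what we need in massive amounts of data. Combined with Kibana, Logstash, and Beats, it forms the Elastic Stack (ELK), which is widely used for log analysis, real-time monitoring, and similar fields.

The Elastic Stack is a technology stack built around Elasticsearch; it comprises Beats, Logstash, Kibana, and Elasticsearch.

Lucene is Apache's open-source search engine library, which provides the core search engine APIs. Elasticsearch is built on top of it.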

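II. Forward Index and Inverted Index

Traditional databases use a forward index, usually built on the primary key id. A search on field content that cannot use that index scans the rows one by one: rows that match the condition go into the result set, and rows that do not are discarded.

Elasticsearch uses an inverted index, which is built on two concepts:

Document: each record is a document.

Term: the words obtained by tokenizing the content of a document.

A search first looks up the term to get the ids of the documents that contain it, then fetches those documents by id and adds them to the result set.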
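The two-step lookup (term to document ids, then ids to documents) can be sketched with a toy in-memory structure in Java. This is only an illustration and not how Elasticsearch or Lucene store their index; the class name InvertedIndexSketch, the whitespace tokenizer, and the sample documents are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not Elasticsearch's actual data structure.
public class InvertedIndexSketch {
    // Document store: document id -> original document text
    private final Map<Integer, String> documents = new HashMap<>();
    // Inverted index: term -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Assumption: lower-case and split on whitespace; real analyzers are far richer.
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    public void add(int id, String text) {
        documents.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // Step 1: term -> document ids. Step 2: document ids -> documents.
    public List<String> search(String term) {
        Set<Integer> ids = postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
        List<String> results = new ArrayList<>();
        for (int id : ids) {
            results.add(documents.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone");
        index.add(3, "Xiaomi band");
        System.out.println(index.search("xiaomi")); // [Xiaomi phone, Xiaomi band]
        System.out.println(index.search("phone"));  // [Xiaomi phone, Huawei phone]
    }
}
```

No document is scanned during the search: the cost depends on the number of matching ids, not on the total number of documents.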

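III. MySQL vs. Elasticsearch

MySQL	ES	Description
Table	Index	An index is a collection of documents, similar to a table in a database
Row	Document	A document is a single record, similar to a row in a database; documents are in JSON format
Column	Field	A field is a field of the JSON document, similar to a column in a database
Schema	Mapping	A mapping is the set of constraints on the documents in an index, such as field types; similar to a table schema in a database
SQL	DSL	DSL is the JSON-style request language provided by Elasticsearch; it is used to operate Elasticsearch and implement CRUD

MySQL is better at transactional operations and can guarantee data safety and consistency.

ES is better at searching, analyzing, and computing over massive amounts of data.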

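IV. Analyzers

Kibana must be deployed before trying out analyzers. ES tokenizes documents when it builds the inverted index, and it tokenizes the user's input when searching. The default analysis rules, however, handle Chinese poorly. This can be tested in Kibana's Dev Tools with a request made of the following parts:

POST: the request method

/_analyze: the request path; the server address is omitted here because Kibana fills it in for us

Request parameters, in JSON style:

• analyzer: the analyzer type, here the default standard analyzer
• text: the content to analyze

The IK analyzer has two modes:

ik_smart: fewest splits, coarse-grained

ik_max_word: finest splits, fine-grained

In practice, some words are missing from the dictionary, or certain sensitive words need to be blocked. In these cases the IK dictionary has to be extended, which only requires editing the IKAnalyzer.cfg.xml file in the config directory under the IK analyzer directory: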

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
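   <comment>IK Analyzer extension configuration</comment>
   <!-- extension dictionary: words to add -->
   <entry key="ext_dict">ext.dic</entry>
   <!-- extension stop-word dictionary: words to block -->
   <entry key="ext_stopwords">stopword.dic</entry>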
</properties>

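The extension words are then added to the dictionary files at the location shown in the figure: [figure: location of the extension dictionary]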

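V. Index Operations

1. Create an index

In ES, indices and documents are operated on through RESTful requests, and the request body is written in DSL. The DSL syntax for creating an index together with its mapping is as follows:

PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field_name_2": {
        "type": "keyword",
        "index": false
      },
      "field_name_3": {
        "properties": {
          "sub_field": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}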

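2. Query and delete an index

Query syntax:

GET /index_name

Delete syntax:

DELETE /index_name

3. Modify an index

Once an index and its mapping have been created, existing fields cannot be modified, but new fields can be added with the following syntax:

PUT /index_name/_mapping
{
  "properties": {
    "new_field_name": {
      "type": "integer"
    }
  }
}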

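VI. Document Operations

1. Add a document

The DSL syntax for adding a document is as follows:

POST /index_name/_doc/document_id
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "sub_property1": "value3",
        "sub_property2": "value4"
    },
    // ...
}

Example:

POST /example/_doc/1
{
    "info": "学习Java有前途",
    "email": "zs@itcast.cn",
    "name": {
        "firstName": "三",
        "lastName": "张"
    }
}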

2. Query and delete a document

Syntax for viewing a document:

GET /index_name/_doc/document_id

Syntax for deleting a document:

DELETE /index_name/_doc/document_id

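3. Modify a document

Option 1: full replacement. If the id exists, the original document is deleted and the new one is added in its place; if the id does not exist, a new document is simply added. Syntax:

PUT /index_name/_doc/document_id
{
    "field1": "value1",
    "field2": "value2",
    // ... omitted
}

Option 2: partial update, which changes only the values of the specified fields. Syntax:

POST /index_name/_update/document_id
{
    "doc": {
        "field_name": "new value"
    }
}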

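When a new document contains a field that is not in the mapping, ES sets up a mapping for that field automatically, using the following rules:

JSON type	Elasticsearch type
String	• Date-formatted string: mapped as date • Plain string: mapped as text, with a keyword sub-field added
Boolean	boolean
Floating-point number	float
Integer	long
Nested object	object, with properties added
Array	determined by the first non-null element in the array
Null	ignored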
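The rules in the table can be read as a small decision procedure. The sketch below applies them to plain Java values standing in for parsed JSON. It is not Elasticsearch code: the class DynamicMappingSketch and its methods are hypothetical, and date detection is deliberately simplified to the yyyy-MM-dd format, whereas the real detection accepts more formats.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics the table's rules on plain Java values; not Elasticsearch code.
public class DynamicMappingSketch {

    // Returns the type the table would assign, or null when the value is ignored.
    public static String inferType(Object value) {
        if (value == null) {
            return null;                       // null: ignored, no field is added
        }
        if (value instanceof String) {
            return looksLikeDate((String) value) ? "date" : "text (+ keyword sub-field)";
        }
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof Float || value instanceof Double) {
            return "float";
        }
        if (value instanceof Number) {
            return "long";                     // remaining numbers are treated as integers here
        }
        if (value instanceof Map) {
            return "object";                   // nested fields go under "properties"
        }
        if (value instanceof List) {
            for (Object element : (List<?>) value) {
                if (element != null) {
                    return inferType(element); // the first non-null element decides
                }
            }
            return null;                       // empty or all-null array: nothing to infer
        }
        throw new IllegalArgumentException("Not a JSON value: " + value.getClass());
    }

    // Assumption: only yyyy-MM-dd counts as a date in this sketch.
    private static boolean looksLikeDate(String s) {
        try {
            LocalDate.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inferType("2023-07-19"));                                // date
        System.out.println(inferType("hello"));                                     // text (+ keyword sub-field)
        System.out.println(inferType(Arrays.asList(null, 3)));                      // long
        System.out.println(inferType(Collections.singletonMap("firstName", "三"))); // object
    }
}
```

Because these guesses are not always what we want (for example, a field that should be keyword becomes text), it is usually safer to declare the mapping explicitly, as in section V.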

VII. Index Operations with RestClient

1. Analyze the data structure

The database table is defined as follows:

CREATE TABLE `tb_hotel` (
  `id` bigint(20) NOT NULL COMMENT 'hotel id',
  `name` varchar(255) NOT NULL COMMENT 'hotel name; e.g. 7 Days Inn',
  `address` varchar(255) NOT NULL COMMENT 'hotel address; e.g. Hangtou Road',
  `price` int(10) NOT NULL COMMENT 'hotel price; e.g. 329',
  `score` int(2) NOT NULL COMMENT 'hotel score; e.g. 45 means 4.5 points',
  `brand` varchar(32) NOT NULL COMMENT 'hotel brand; e.g. Home Inn',
  `city` varchar(32) NOT NULL COMMENT 'city; e.g. Shanghai',
  `star_name` varchar(16) DEFAULT NULL COMMENT 'hotel star rating, from low to high: 1 to 5 stars, 1 to 5 diamonds',
  `business` varchar(255) DEFAULT NULL COMMENT 'business district; e.g. Hongqiao',
  `latitude` varchar(32) NOT NULL COMMENT 'latitude; e.g. 31.2497',
  `longitude` varchar(32) NOT NULL COMMENT 'longitude; e.g. 120.3925',
  `pic` varchar(255) DEFAULT NULL COMMENT 'hotel picture; e.g. /img/1.jpg',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Based on this analysis and on how the data will actually be used, create the following index:

PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address": {
        "type": "text",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "text",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

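Note: location is a geo_point field that takes the place of the separate latitude and longitude columns, and all is a combined field that receives the values of name, brand, and the business district through copy_to, so that one field can be searched instead of three.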

2. Initialize the JavaRestClient

Add the RestHighLevelClient dependency for ES:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

The ES version managed by Spring Boot defaults to 7.6.2 here, so we need to override it with the version we actually use:

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

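Initialize the RestHighLevelClient:

import java.io.IOException;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class HotelClientTest {

    private RestHighLevelClient client;

    @Test
    void test() {
        // Prints the client to confirm that it was created
        System.out.println(client);
    }

    @BeforeEach
    void setUp() {
        // Create the client before each test; the address is that of the ES server
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.226.100:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        // Release the client's resources after each test
        this.client.close();
    }
}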

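3. Create the index

Create a class that defines the index mapping as the constant MAPPING_TEMPLATE:

public class hoteSimple {
    // Mapping DSL for the hotel index, as a JSON string
    public static final String MAPPING_TEMPLATE = "{\n" +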
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"address\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"score\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"city\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"starName\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"business\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"location\":{\n" +
            "        \"type\": \"geo_point\"\n" +
            "      },\n" +
            "      \"pic\":{\n" +
            "         \"type\": \"text\",\n" +
            "         \"index\": false\n" +
            "      },\n" +
            "      \"all\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
}

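    // Code that creates the index (MAPPING_TEMPLATE is statically imported from the class above;
    // CreateIndexRequest is the one in org.elasticsearch.client.indices)
    @Test
    void testCreateIndex() throws IOException {
        // Create the request object
        CreateIndexRequest request = new CreateIndexRequest("hotel");
        // Request body: the DSL statement
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // Send the request
        client.indices().create(request, RequestOptions.DEFAULT);
    }

After a successful run, the result looks like this: [screenshot of the run result]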

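4. Delete the index

The code is as follows:

    @Test
    void testDeleteIndex() throws IOException {
        // Create the request object
        DeleteIndexRequest request = new DeleteIndexRequest("hotel");
        // Send the request
        client.indices().delete(request, RequestOptions.DEFAULT);
    }

5. Check whether the index exists

The code is as follows:

    @Test
    void testExistsIndex() throws IOException {
        // Create the request object (GetIndexRequest from org.elasticsearch.client.indices)
        GetIndexRequest request = new GetIndexRequest("hotel");
        // Send the request
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists ? "index exists" : "index does not exist");
    }

6. Summary

Basic steps of an index operation:

• Initialize the RestHighLevelClient.
• Create an XxxIndexRequest, where Xxx is Create, Get, or Delete.
• Prepare the DSL (only needed for Create).
• Send the request by calling RestHighLevelClient#indices().xxx(), where xxx is create, exists, or delete.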

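VIII. Document Operations with RestClient

Use the JavaRestClient to implement CRUD for documents. The steps are as follows:

1. Initialize the JavaRestClient

Create a test class for the document operations and initialize the JavaRestClient in it: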

public class DocumentTest {
    private RestHighLevelClient client;
    @BeforeEach
    void setUp(){
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.226.100:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }
}

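2. Add hotel data to the index

First query the hotel from the database, then index it into ES, which builds the inverted index for it:

    // iHotelService is assumed to be an injected service bean; JSON is assumed to be fastjson's JSON class
    @Test
    void testAddDocument() throws IOException {
        // Query the hotel by id
        Hotel hotel = iHotelService.getById(39106L);
        // Convert it to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // Create the request object
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // Prepare the JSON document
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // Send the request
        client.index(request, RequestOptions.DEFAULT);
    }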
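The HotelDoc type used above is not shown in this post. A minimal sketch of what such a conversion might look like follows. Both classes, their field names, and the "lat,lon" string used for location are assumptions inferred from the tb_hotel table and the hotel mapping, not the author's actual code; getters and setters are left out for brevity.

```java
// Hypothetical entity mirroring the tb_hotel columns; the real Hotel class is not shown in this post.
class Hotel {
    Long id;
    String name, address, brand, city, starName, business, pic;
    Integer price, score;
    String latitude, longitude;
}

// Hypothetical document type whose fields follow the hotel index mapping.
public class HotelDoc {
    public Long id;
    public String name, address, brand, city, starName, business, pic;
    public Integer price, score;
    public String location; // geo_point written as a "lat,lon" string

    public HotelDoc(Hotel hotel) {
        this.id = hotel.id;
        this.name = hotel.name;
        this.address = hotel.address;
        this.price = hotel.price;
        this.score = hotel.score;
        this.brand = hotel.brand;
        this.city = hotel.city;
        this.starName = hotel.starName;
        this.business = hotel.business;
        this.pic = hotel.pic;
        // Assumption: merge the two database columns into one geo_point value
        this.location = hotel.latitude + "," + hotel.longitude;
    }

    public static void main(String[] args) {
        Hotel hotel = new Hotel();
        hotel.id = 39106L;
        hotel.latitude = "31.2497";
        hotel.longitude = "120.3925";
        System.out.println(new HotelDoc(hotel).location); // prints 31.2497,120.3925
    }
}
```

Keeping the conversion in the constructor means the index mapping and the database schema can differ (two columns versus one geo_point field) without touching the code that sends the request.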

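3. Query hotel data by id

The document returned by an id lookup is JSON and has to be deserialized into a Java object:

    @Test
    void testGetDocument() throws IOException {
        // Create the request object
        GetRequest request = new GetRequest("hotel", "39106");
        // Send the request and get the response
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        // Extract the document source as a JSON string
        String json = response.getSourceAsString();
        // Deserialize it into a Java object
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        // Print the result
        System.out.println(hotelDoc);
    }

Result: [screenshot of the run output]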

4. Update hotel data by id

A partial update is used here, which changes only the values of the specified fields:

    @Test
    void testUpdateDocument() throws IOException {
        // Create the request object
        UpdateRequest request = new UpdateRequest("hotel", "39106");
        // Prepare the parameters: alternating field names and new values
        request.doc(
                "price", 380,
                "starName", "三钻"
        );
        // Send the request
        client.update(request, RequestOptions.DEFAULT);
    }

5. Delete hotel data by id

The code is as follows:

    @Test
    void testDeleteDocument() throws IOException {
        // Create the request object
        DeleteRequest request = new DeleteRequest("hotel", "39106");
        // Send the request
        client.delete(request, RequestOptions.DEFAULT);
    }

6. Bulk import

The implementation is as follows:

    @Test
    void testAddDocuments() throws IOException {
        // Query the list of hotels
        List<Hotel> hotels = iHotelService.list();
        // Create the request object
        BulkRequest request = new BulkRequest();
        // Convert each Hotel to a HotelDoc and add one IndexRequest per hotel
        for (Hotel hotel : hotels) {
            HotelDoc hotelDoc = new HotelDoc(hotel);
            request.add(new IndexRequest("hotel")
                    .id(hotel.getId().toString())
                    .source(JSON.toJSONString(hotelDoc), XContentType.JSON));
        }
        // Send all of them in a single request
        client.bulk(request, RequestOptions.DEFAULT);
    }

7. Summary

Basic steps of a document operation:

• Initialize the RestHighLevelClient.
• Create an XxxRequest, where Xxx is Index, Get, Update, or Delete.
• Prepare the parameters (needed for Index and Update).
• Send the request by calling RestHighLevelClient#xxx(), where xxx is index, get, update, or delete.
• Parse the result (needed for Get).