Elasticsearch Basics: Study Notes

Latest recommended article published on 2024-06-17 16:09:14

幼稚范2r°

Tags: elasticsearch, study notes

Article link: https://blog.csdn.net/xiaoaofeng_/article/details/132250621

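1. Getting to know Elasticsearch

1.1 What is Elasticsearch

Elasticsearch is a powerful open-source search engine that helps us quickly find what we need in massive amounts of data.

Together with Kibana, Logstash, and Beats, Elasticsearch forms the Elastic Stack (also known as ELK), which is widely used for log analysis, real-time monitoring, and similar scenarios.

Elasticsearch is the core of the Elastic Stack and is responsible for storing, searching, and analyzing data.

1.2 Why learn Elasticsearch?

Ranking of search engine technologies:

Elasticsearch: open-source distributed search engine
Splunk: commercial product
Solr: Apache's open-source search engine

1.3 Forward index and inverted index

Traditional databases (such as MySQL) use a forward index, for example an index created on the id column of the table below (tb_goods):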

id	title	price
1	Xiaomi phone	3499
2	Huawei phone	4999
3	Huawei Xiaomi charger	49
4	Xiaomi wristband	49
...	...	...

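Elasticsearch uses an inverted index:

Document: each record is a document
Term: a word produced by splitting a document's text according to its meaning

An inverted index consists of two parts:

Term Dictionary: records all terms and the relationship between each term and its posting list. The terms themselves are indexed to speed up lookups and inserts
Posting List: records the ids of the documents containing the term, the term frequency, the positions of the term in each document, and similar information
- Document id: used to fetch the document quickly
- Term frequency (TF): the number of times the term occurs in the document, used for scoring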
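
The two parts described above can be sketched in a few lines of plain Java. This is only a toy illustration, not how Elasticsearch or Lucene store their index: the class name `InvertedIndexSketch` is hypothetical, terms are produced by splitting on whitespace rather than by a real analyzer, and the sample titles are English renderings of the tb_goods rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> (document id -> term frequency). */
class InvertedIndexSketch {

    // Term dictionary; a sorted map stands in for the "index on the terms".
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    /** Splits the text on whitespace and records each term for the document. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new LinkedHashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Returns the ids of the documents containing the term, in insertion order. */
    List<Integer> search(String term) {
        Map<Integer, Integer> list = postings.get(term.toLowerCase());
        return list == null ? new ArrayList<>() : new ArrayList<>(list.keySet());
    }

    /** Term frequency of the term in one document (0 if absent). */
    int termFrequency(String term, int docId) {
        Map<Integer, Integer> list = postings.get(term.toLowerCase());
        return list == null ? 0 : list.getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone");
        index.add(3, "Huawei Xiaomi charger");
        index.add(4, "Xiaomi wristband");
        System.out.println(index.search("xiaomi")); // [1, 3, 4]
        System.out.println(index.search("phone"));  // [1, 2]
    }
}
```

A search first looks the term up in the dictionary, then reads the document ids from its posting list, which is why lookup by word is fast even when the table is large.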

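1.4 Documents

Elasticsearch is document-oriented. A document can be a product record from a database or the details of an order. Document data is serialized to JSON before being stored in Elasticsearch.

1.5 Index

Index: a collection of documents of the same type
Mapping: the field constraints for the documents in an index, similar to a table's schema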

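Concept comparison:

MySQL	Elasticsearch	Description
Table	Index	An index is a collection of documents, similar to a database table
Row	Document	A document is one record, similar to a row; documents are in JSON format
Column	Field	A field is a key in the JSON document, similar to a column
Schema	Mapping	A mapping describes the constraints on the documents in an index, such as field types, similar to a table schema
SQL	DSL	DSL is the JSON-style request language Elasticsearch provides for operating on data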

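1.6 Architecture

MySQL: good at transactional operations; it can guarantee data safety and consistency

Elasticsearch: good at searching, analyzing, and computing over massive amounts of data

MySQL and Elasticsearch do not replace each other; they complement each other.

1.7 Analyzers

The ik analyzer has two modes:

ik_smart: fewest splits, coarse-grained
ik_max_word: finest splits, fine-grained

Elasticsearch has to tokenize documents when it builds the inverted index, and it has to tokenize the user's input at search time. The default tokenization rules, however, do not handle Chinese well.

We can test this in Kibana's DevTools:

Syntax:

POST: the request method
/_analyze: the request path. The http://192.168.150.101:9200 prefix is omitted because Kibana fills it in for us
Request parameters, in JSON style:
- analyzer: the analyzer type, here the default standard analyzer
- text: the content to tokenize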

POST /_analyze
{
  "analyzer": "standard",
  "text": "认真学Java！"
}

ik analyzer: extending the dictionary

To extend the ik analyzer's dictionary, edit the IkAnalyzer.cfg.xml file in the config directory under the ik analyzer's directory:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here *** add the extension dictionary -->
        <entry key="ext_dict">ext.dic</entry>
</properties>

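Then add the words you want to include to a file named ext.dic, one word per line.

ik analyzer: stop-word dictionary

To block certain sensitive terms, edit the same IkAnalyzer.cfg.xml file in the ik analyzer's config directory: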

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
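        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- Configure your own extension stop-word dictionary here *** add the stop-word dictionary -->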
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>

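Then add the words you want to block to a file named stopword.dic, one word per line.

2. Installing Elasticsearch and Kibana

2.1 Deploying a single-node es

2.1.1 Create a network

We also need to deploy a Kibana container, so the es and Kibana containers must be able to reach each other. Create a network first:

docker network create es-net

2.1.2 Load the image

Load the image with:

# Load the image from the tar archive
docker load -i es.tar

Kibana's tar archive needs to be loaded the same way.

2.1.3 Run

Run the following docker command to deploy a single-node es: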

docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
    elasticsearch:7.12.1

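Command explanation:

-e "cluster.name=es-docker-cluster": sets the cluster name (optional; not included in the command above)
-e "http.host=0.0.0.0": the address to listen on, which allows access from outside (optional; not included in the command above)
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m": JVM heap size
-e "discovery.type=single-node": single-node (non-cluster) mode
-v es-data:/usr/share/elasticsearch/data: mounts a volume for the es data directory
-v es-logs:/usr/share/elasticsearch/logs: mounts a volume for the es log directory (optional; not included in the command above)
-v es-plugins:/usr/share/elasticsearch/plugins: mounts a volume for the es plugin directory
--privileged: grants the container access to the volumes
--network es-net: joins the network named es-net
-p 9200:9200: port mapping for the HTTP port
-p 9300:9300: port mapping for the transport port used for communication between nodes

Then open http://192.168.150.101:9200 in a browser (the host address used throughout these notes); if es is running, it responds with a JSON description of the node.

2.2 Deploying Kibana

Kibana gives us a visual interface for Elasticsearch, which is handy while learning.

2.2.1 Deploy

Run the following docker command to deploy Kibana: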

docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

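--network es-net: joins the network named es-net, the same network as Elasticsearch
-e ELASTICSEARCH_HOSTS=http://es:9200: sets the Elasticsearch address. Because Kibana is on the same network as Elasticsearch, it can reach Elasticsearch directly by container name
-p 5601:5601: port mapping

Kibana usually starts slowly, so wait a little. You can follow its log with:

docker logs -f kibana

Startup has succeeded once the log reports that the Kibana server is running.

Then open http://192.168.150.101:5601 in a browser to see the result.

2.2.2 DevTools

Kibana provides a DevTools console in which you can write DSL to operate on Elasticsearch. It also auto-completes DSL statements.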

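3. Index operations

3.1 Mapping properties

A mapping constrains the documents in an index. Common mapping properties include:

type: the field's data type. Common simple types are:
- String: text (analyzed text), keyword (exact values such as brand, country, or IP address)
- Numeric: long, integer, short, byte, double, float
- Boolean: boolean
- Date: date
- Object: object
index: whether to index the field; defaults to true
analyzer: which analyzer to use
properties: the sub-fields of this field

3.2 Creating an index

Elasticsearch exposes index and document operations as RESTful requests whose bodies are written in DSL. The DSL for creating an index with its mapping looks like this: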

PUT /<index name>
{
  "mappings": {
    "properties": {
      "<field name>":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "<field name 2>":{
        "type": "keyword",
        "index": false
      },
      "<field name 3>":{
        "properties": {
          "<sub-field>": {
            "type": "keyword"
          }
        }
      },
      // ... more fields omitted
    }
  }
}

For example:

# Create the index
PUT /xiaofan
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": false
      },
      "name":{
        "type": "object",
        "properties": {
          "firstName":{
            "type":"keyword"
          },
          "lastName":{
            "type":"keyword"
          }
        }
      }
    }
  }
}

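3.3 Viewing and deleting an index

Syntax for viewing an index:

GET /<index name>

Example:

# Query the index
GET /xiaofan

Syntax for deleting an index:

DELETE /<index name>

Example:

# Delete the index
DELETE /xiaofan

3.4 Modifying an index

Once an index and its mapping have been created, existing fields cannot be changed, but new fields can be added. The syntax is:

PUT /<index name>/_mapping
{
  "properties": {
    "<new field name>":{
      "type": "integer"
    }
  }
}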

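4. Document operations

4.1 Adding a document

The DSL for adding a document is:

POST /<index name>/_doc/<document id>
{
    "<field 1>": "<value 1>",
    "<field 2>": "<value 2>",
    "<field 3>": {
        "<sub-property 1>": "<value 3>",
        "<sub-property 2>": "<value 4>"
    },
    // ...
}

For example:

# Insert a document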
POST /xiaofan/_doc/1
{
  "info" : "小范努力学Java",
  "email": "123321@qq.com",
  "name":{
    "firstName":"范",
    "lastName":"小"
  }
}

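4.2 Viewing and deleting a document

Syntax for viewing a document:

GET /<index name>/_doc/<document id>

Example:

# Query the document
GET /xiaofan/_doc/1

Syntax for deleting a document:

DELETE /<index name>/_doc/<document id>

Example:

# Delete the document
DELETE /xiaofan/_doc/1

4.3 Modifying a document

Method 1: full replacement. The old document is deleted and the new one is added

PUT /<index name>/_doc/<document id>
{
    "<field 1>": "<value 1>",
    "<field 2>": "<value 2>",
    // ... omitted
}

Method 2: partial update. Only the specified field values are changed

POST /<index name>/_update/<document id>
{
    "doc": {
         "<field name>": "<new value>"
    }
}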

POST /xiaofan/_update/1
{
  "doc": {
    "email":"567876@xiaofan.cim"
  }
}

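When we insert a document into Elasticsearch and one of its fields has no corresponding mapping, Elasticsearch sets up a mapping for that field automatically (dynamic mapping). The rules are:

JSON value	Elasticsearch type
String	date if the string matches a date format; otherwise text with a keyword sub-field
Boolean	boolean
Floating-point number	float
Integer	long
Nested object	object, with properties added for its fields
Array	determined by the type of the first non-null element
Null	ignored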

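5. Index operations with RestClient

5.1 What is RestClient

Elastic officially provides clients for many languages for working with Elasticsearch. In essence, these clients assemble DSL statements and send them to Elasticsearch over HTTP. Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html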
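
To make "assemble DSL and send it over HTTP" concrete, here is a minimal sketch that builds the same kind of request by hand with the JDK's own HTTP client. It is an illustration only, not what the official client does internally: the class `RawHttpSketch` and its method are hypothetical, `java.net.http` needs Java 11 or later (the project in these notes targets 1.8), and the host is the address used elsewhere in these notes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class RawHttpSketch {

    /** Builds the HTTP equivalent of: POST /{index}/_doc/{id} with a JSON body. */
    static HttpRequest indexDocument(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed host; replace with the address of your own es node.
        HttpRequest request = indexDocument(
                "http://192.168.150.101:9200", "xiaofan", "1",
                "{\"info\": \"studying Java\", \"email\": \"123321@qq.com\"}");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The official clients add request and response types, connection handling, and serialization on top of this, which is why the notes use them instead of raw HTTP.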

5.2 Initializing the JavaRestClient

1. Add the dependency for es's RestHighLevelClient

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

2. The Spring Boot version used here manages Elasticsearch 7.6.2 by default, so we need to override the default es version:

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.12.1</elasticsearch.version> 
</properties>

3. Initialize the RestHighLevelClient:

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));

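5.3 Creating an index

private RestHighLevelClient client;

@Test
void createHotelIndex() throws IOException {
    // 1. Create the request object
    // Use org.elasticsearch.client.indices.CreateIndexRequest here,
    // not the class of the same name under org.elasticsearch.action.admin.indices.create
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. Prepare the request body: the DSL statement
    // MAPPING_TEMPLATE is a user-defined String constant holding the JSON DSL
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}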
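
The notes do not show MAPPING_TEMPLATE itself. As a rough sketch, it could be a plain String constant like the one below. Everything here is an assumption made for illustration: the holder class `HotelIndexConstants` is hypothetical, and the three fields and their types are not the original hotel mapping.

```java
/** Hypothetical holder class for index-related constants. */
class HotelIndexConstants {

    // Assumed fields for illustration only; the real hotel mapping is not given in these notes.
    static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\":    { \"type\": \"keyword\" },\n" +
            "      \"name\":  { \"type\": \"text\", \"analyzer\": \"ik_max_word\" },\n" +
            "      \"price\": { \"type\": \"integer\" }\n" +
            "    }\n" +
            "  }\n" +
            "}";

    public static void main(String[] args) {
        // Prints the JSON body that would be sent with PUT /hotel
        System.out.println(MAPPING_TEMPLATE);
    }
}
```

The string is the same JSON body you would type after `PUT /hotel` in DevTools, so a mapping can be tried out there first and then pasted into the constant.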

5.4 Deleting an index and checking whether it exists

Code for deleting an index:

@Test
void testDeleteHotelIndex() throws IOException {
    // 1. Create the request object
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // 2. Send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}

Checking whether an index exists:

@Test
void testExistsHotelIndex() throws IOException {
    // 1. Create the request object
    GetIndexRequest request = new GetIndexRequest("hotel");
    // 2. Send the request
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(exists);
}

6. Document operations with RestClient

Basic steps for document operations:

Initialize the RestHighLevelClient
Create an XxxRequest, where Xxx is Index, Get, Update, or Delete
Prepare the parameters (needed for Index and Update)
Send the request by calling RestHighLevelClient#xxx(), where xxx is index, get, update, or delete
Parse the result (needed for Get)

Step 1: Initialize the JavaRestClient

Create a test class for the document operations and initialize the JavaRestClient in it:

public class ElasticsearchDocumentTest {
    // The client
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        client.close();
    }
}

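Step 2: Add hotel data to the index

Query the hotel data first, then index it so that es builds the inverted index for the record. The snippet shows the general shape with a literal JSON document:

@Test
void testIndexDocument() throws IOException {
    // 1. Create the request object
    IndexRequest request = new IndexRequest("indexName").id("1");
    // 2. Prepare the JSON document
    request.source("{\"name\": \"Jack\", \"age\": 21}", XContentType.JSON);
    // 3. Send the request
    client.index(request, RequestOptions.DEFAULT);
}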

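Step 3: Query hotel data by id

The document returned by an id lookup is JSON, which then has to be deserialized into a Java object:

@Test
void testGetDocumentById() throws IOException {
    // 1. Create the request object
    GetRequest request = new GetRequest("indexName", "1");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the result (the raw JSON source; deserialize it as needed)
    String json = response.getSourceAsString();

    System.out.println(json);
}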

Step 4: Modify hotel data by id

Partial update: only some of the fields are updated

@Test
void testUpdateDocumentById() throws IOException {
    // 1. Create the request object
    UpdateRequest request = new UpdateRequest("indexName", "1");
    // 2. Prepare the parameters; every two arguments form one key-value pair
    request.doc(
            "age", 18,
            "name", "Rose"
    );
    // 3. Update the document
    client.update(request, RequestOptions.DEFAULT);
}

Step 5: Delete a document by id

Code for deleting a document:

@Test
void testDeleteDocumentById() throws IOException {
    // 1. Create the request object
    DeleteRequest request = new DeleteRequest("indexName", "1");
    // 2. Delete the document
    client.delete(request, RequestOptions.DEFAULT);
}

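Case study: bulk-importing hotel data into es with the JavaRestClient

Requirement: query the hotel data in bulk, then import it into the index in bulk

Approach:

Query the hotel data with mybatis-plus
Convert the queried hotel data (Hotel) into the document type (HotelDoc)
Use the JavaRestClient's Bulk request to add the documents in a batch. Sample code: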

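import com.alibaba.fastjson.JSON; // assumed: the JSON helper used below is fastjson
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

// Hotel, HotelDoc and IHotelService are project classes; import them from your own packages.

@SpringBootTest
class HotelDocumentTest {

    private RestHighLevelClient client;

    @Autowired
    private IHotelService hotelService;

    @Test
    void testAddDocument() throws IOException {
        // 1. Query the hotel record from the database
        Hotel hotel = hotelService.getById(61083L);
        // 2. Convert it to a HotelDoc
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 3. Serialize it to JSON
        String json = JSON.toJSONString(hotelDoc);

        // 4. Prepare the request
        IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
        // 5. Prepare the request body, which is simply the document's JSON string
        request.source(json, XContentType.JSON);
        // 6. Send the request
        client.index(request, RequestOptions.DEFAULT);
    }

    @Test
    void testGetDocumentById() throws IOException {
        // 1. Prepare the request      // GET /hotel/_doc/{id}
        GetRequest request = new GetRequest("hotel", "61083");
        // 2. Send the request
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        // 3. Parse the response
        String json = response.getSourceAsString();

        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        System.out.println("hotelDoc = " + hotelDoc);
    }

    @Test
    void testDeleteDocumentById() throws IOException {
        // 1. Prepare the request      // DELETE /hotel/_doc/{id}
        DeleteRequest request = new DeleteRequest("hotel", "61083");
        // 2. Send the request
        client.delete(request, RequestOptions.DEFAULT);
    }

    @Test
    void testUpdateById() throws IOException {
        // 1. Prepare the request
        UpdateRequest request = new UpdateRequest("hotel", "61083");
        // 2. Prepare the parameters
        request.doc(
                "price", "870"
        );
        // 3. Send the request
        client.update(request, RequestOptions.DEFAULT);
    }

    @Test
    void testBulkRequest() throws IOException {
        // Query all hotel records
        List<Hotel> list = hotelService.list();

        // 1. Prepare the request
        BulkRequest request = new BulkRequest();
        // 2. Prepare the parameters
        for (Hotel hotel : list) {
            // 2.1. Convert to a HotelDoc
            HotelDoc hotelDoc = new HotelDoc(hotel);
            // 2.2. Serialize to JSON
            String json = JSON.toJSONString(hotelDoc);
            // 2.3. Add one index request per document
            request.add(new IndexRequest("hotel").id(hotel.getId().toString()).source(json, XContentType.JSON));
        }

        // 3. Send the request
        client.bulk(request, RequestOptions.DEFAULT);
    }

    @BeforeEach
    void setUp() {
        client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        client.close();
    }
}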
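
On the wire, a bulk request goes to the `_bulk` endpoint as newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. The sketch below builds such a body with plain Java to show roughly what the client assembles. It is a simplified illustration under stated assumptions: the class `BulkBodySketch` is hypothetical, the sample documents are made up, and ids are assumed to need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BulkBodySketch {

    /** Builds an NDJSON body: one "index" action line plus one source line per document. */
    static String buildBulkBody(String index, Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : jsonById.entrySet()) {
            // Action line: which operation, which index, which id
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // Source line: the document itself, which must stay on a single line
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up sample documents, keyed by document id
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"Hotel A\",\"price\":300}");
        docs.put("2", "{\"name\":\"Hotel B\",\"price\":450}");
        System.out.print(buildBulkBody("hotel", docs));
    }
}
```

Sending many documents in one request like this avoids one HTTP round trip per document, which is the reason the case study uses BulkRequest rather than calling client.index in a loop.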