Elasticsearch from Beginner to Mastery: These Three Articles Are All You Need [Complete, Part 1]

Author: 西敏寺的乐章

Last modified: 2024-07-02 11:25:50


Column: elasticsearch | Tags: elasticsearch, big data, search engine

First published: 2024-04-08 12:31:19

Article link: https://blog.csdn.net/weixin_45404884/article/details/137402463


「Series Overview」

【Elasticsearch Part 1 https://blog.csdn.net/weixin_45404884/article/details/137402463】
【Elasticsearch Part 2 https://blog.csdn.net/weixin_45404884/article/details/137505489】
【Elasticsearch Part 3 https://blog.csdn.net/weixin_45404884/article/details/137548120】

I. Getting to Know Elasticsearch

1. What Is Elasticsearch

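Elasticsearch is a powerful open-source search engine that helps us quickly find what we need in massive amounts of data. Together with Kibana, Logstash, and Beats it forms the Elastic Stack (ELK), which is widely used for log analysis, real-time monitoring, and similar tasks.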

2. History

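In 2004, Shay Banon built Compass on top of Lucene.
In 2010, Shay Banon rewrote Compass and named it Elasticsearch.
Official website: https://www.elastic.co/cn/
Compared with Lucene, Elasticsearch has the following advantages:

It is distributed and scales horizontally
It provides a RESTful interface that can be called from any language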

3. Forward Index and Inverted Index

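Elasticsearch uses an inverted index, which is built from two concepts:

Document: each piece of data is a document; documents are serialized to JSON and stored in Elasticsearch
Term: a word obtained by splitting a document's text according to its meaning
What is an inverted index?
The document content is split into terms, an index is built on those terms, and for each term the IDs of the documents that contain it are recorded. A query first looks up the term to get the matching document IDs, and then fetches the documents by ID.
What is a forward index?
The index is built on the document ID. To search for a term, every document has to be retrieved first and then checked for whether it contains the term.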
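Traditional databases such as MySQL use forward indexes, for example an index on the id column of a goods table such as tb_goods. Such an index finds a row by id quickly, but a fuzzy search on a text column (for example title LIKE '%phone%') still has to check the rows one by one.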
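To make the difference concrete, here is a minimal Java sketch. The sample documents, the whitespace-splitting "analyzer", and all class and method names are hypothetical illustrations, not Elasticsearch internals: scanSearch behaves like a forward index (it visits every document), while indexSearch answers from a term-to-document-ids map in a single lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexDemo {
    // Hypothetical sample documents keyed by id; terms are pre-separated by spaces
    static final Map<Integer, String> DOCS = new TreeMap<>(Map.of(
            1, "xiaomi phone",
            2, "huawei phone",
            3, "huawei xiaomi charger",
            4, "xiaomi band"));

    // Toy analyzer: split on whitespace (real analyzers do far more)
    static List<String> analyze(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    // Forward-index style search: visit every document and check whether it contains the term
    static List<Integer> scanSearch(Map<Integer, String> docs, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            if (analyze(e.getValue()).contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Build the inverted index once: term -> sorted ids of the documents containing it
    static Map<String, SortedSet<Integer>> buildInvertedIndex(Map<Integer, String> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            for (String term : analyze(e.getValue())) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    // Inverted-index search: one lookup yields the ids; documents are then fetched by id
    static List<Integer> indexSearch(Map<String, SortedSet<Integer>> index, String term) {
        return new ArrayList<>(index.getOrDefault(term, new TreeSet<>()));
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index = buildInvertedIndex(DOCS);
        System.out.println(scanSearch(DOCS, "xiaomi"));   // [1, 3, 4]
        System.out.println(indexSearch(index, "xiaomi")); // [1, 3, 4]
    }
}
```

Both searches return the same ids; the inverted index simply does the per-document work once, at indexing time, instead of on every query.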


4. Index

Index: a collection of documents of the same type
Mapping: the field constraints of the documents in an index, similar to a table's schema

5. Concept Comparison

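Compared with MySQL, the concepts correspond roughly as follows:
Table ↔ Index: a collection of documents
Row ↔ Document: one piece of data, stored as JSON
Column ↔ Field: a key in the JSON document
Schema ↔ Mapping: constraints on the documents in an index
SQL ↔ DSL: the JSON-style request language Elasticsearch provides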

6. Analyzers

(1) Standard analyzer

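When building the inverted index, ES has to analyze (split into terms) the document text; at search time, it also has to analyze the user's input.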

POST /_analyze
{
  "text": "你好分词器",
  "analyzer": "standard"
}

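Syntax notes:

POST: the request method
/_analyze: the request path; the address http://192.168.150.101:9200 is omitted because Kibana fills it in for us
Request body, in JSON:
analyzer: the analyzer type; here the default standard analyzer
text: the text to analyze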

Analysis result:

{
  "tokens": [
    {
      "token": "你",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "好",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "分",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "词",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "器",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}
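The per-character output above is how the standard analyzer treats CJK text: each ideograph becomes its own token. The toy Java analyzer below (class and record names are hypothetical, and it only approximates Lucene's standard tokenizer) reproduces that behaviour with the same offsets and positions, and lower-cases runs of Latin letters or digits as the standard analyzer does. It assumes characters in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ToyStandardAnalyzer {
    // One token with the same fields as an entry in the _analyze response
    record Token(String token, int startOffset, int endOffset, int position) {}

    static boolean isHan(char c) {
        return Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
    }

    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isHan(c)) {
                // Each CJK ideograph becomes a token of its own
                tokens.add(new Token(String.valueOf(c), i, i + 1, position++));
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                // A run of non-CJK letters/digits becomes one lower-cased token
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i)) && !isHan(text.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i, position++));
            } else {
                // Whitespace and punctuation only separate tokens
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("你好分词器")) {
            System.out.println(t.token() + " [" + t.startOffset() + "," + t.endOffset() + ") pos=" + t.position());
        }
    }
}
```

Five single-character tokens are exactly what makes the standard analyzer a poor fit for Chinese search, which motivates the IK analyzer.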

(2) IK analyzer

For Chinese text, the IK analyzer plugin is generally used.
The IK analyzer has two modes:

ik_smart: coarse-grained, splits the text into as few tokens as possible

POST /_analyze
{
  "text": "你好分词器",
  "analyzer": "ik_smart"
}

Analysis result:

{
  "tokens": [
    {
      "token": "你好",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "分词器",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
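One rough way to picture ik_smart's coarse-grained output is greedy forward maximum matching against a dictionary: at each position, take the longest dictionary word. The sketch below is only an illustration under that assumption; IK's real algorithm (with its own ambiguity handling and a much larger built-in dictionary) is more involved, and the three-word dictionary here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ForwardMaxMatch {
    // Greedy forward maximum matching: at each position take the longest dictionary word,
    // falling back to a single character when no dictionary word starts there
    static List<String> segment(String text, Set<String> dict) {
        int maxLen = 1;
        for (String w : dict) {
            maxLen = Math.max(maxLen, w.length());
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String match = text.substring(i, i + 1);
            for (int j = Math.min(text.length(), i + maxLen); j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            out.add(match);
            i += match.length();
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("你好", "分词", "分词器"); // hypothetical mini dictionary
        System.out.println(segment("你好分词器", dict)); // [你好, 分词器]
    }
}
```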

ik_max_word: fine-grained, splits the text into as many tokens as possible

POST /_analyze
{
  "text": "你好分词器",
  "analyzer": "ik_max_word"
}

Analysis result:

{
  "tokens": [
    {
      "token": "你好",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "分词器",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "分词",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "器",
      "start_offset": 4,
      "end_offset": 5,
      "type": "CN_CHAR",
      "position": 3
    }
  ]
}
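In contrast, ik_max_word emits every dictionary word it can find, including overlapping ones such as 分词 inside 分词器. The toy below (hypothetical names and dictionary, not IK's implementation) lists every dictionary word that occurs in the text, ordered by start offset and then by length. It does not reproduce IK's extra single-character token 器.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ExhaustiveMatch {
    // Emit every dictionary word occurring in the text, ordered by start offset, then by length
    static List<String> segmentAll(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= text.length(); j++) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("你好", "分词", "分词器"); // hypothetical mini dictionary
        System.out.println(segmentAll("你好分词器", dict)); // [你好, 分词, 分词器]
    }
}
```

The extra overlapping terms make more queries match, at the cost of a larger index.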

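Custom extension dictionaries are supported.
In the IK analyzer's installation directory, open config/IKAnalyzer.cfg.xml and configure the paths of your own extension dictionary and stop-word dictionary.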


For example, open the extension dictionary ext_dic, add the word 你好分词器, and restart ES.

Example:

POST /_analyze
{
  "text": "你好分词器",
  "analyzer": "ik_smart"
}

Analysis result:

{
  "tokens": [
    {
      "token": "你好分词器",
      "start_offset": 0,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 0
    }
  ]
}
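Adding a word to the extension dictionary changes the result because the longer word now wins the dictionary match, and stop words are simply dropped from the output. The sketch below models both under simplifying assumptions: loadWords stands in for reading a dictionary file with one word per line, the longest-match segmentation is a toy rather than IK's algorithm, and the stop word 的 used in the test is a hypothetical choice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExtDictDemo {
    // Hypothetical stand-in for reading a dictionary file: one word per line, blank lines ignored
    static Set<String> loadWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String w = line.trim();
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Toy longest-match segmentation, then drop any token that is a stop word
    static List<String> analyze(String text, Set<String> dict, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = 1; // fall back to a single character
            for (int j = text.length(); j > i + 1; j--) {
                if (dict.contains(text.substring(i, j))) {
                    len = j - i;
                    break;
                }
            }
            String token = text.substring(i, i + len);
            if (!stopWords.contains(token)) {
                out.add(token);
            }
            i += len;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Set.of("你好", "分词", "分词器"));
        System.out.println(analyze("你好分词器", dict, Set.of())); // [你好, 分词器]
        dict.addAll(loadWords(List.of("你好分词器")));             // "restart" with the extension word
        System.out.println(analyze("你好分词器", dict, Set.of())); // [你好分词器]
    }
}
```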

7. Index Operations

(1) Mapping attributes

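A mapping constrains the documents in an index. Common mapping attributes include:

type: the field's data type. Common simple types are:
- String: text (analyzable text) and keyword (exact values, e.g. brand, country, IP address)
- Numeric: long, integer, short, byte, double, float
- Boolean: boolean
- Date: date
- Object: object
index: whether to index the field (i.e. make it searchable); defaults to true
analyzer: which analyzer to use
properties: the field's sub-fields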
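The practical difference between text and keyword shows up at search time. This toy sketch uses hypothetical names, lets lower-casing plus whitespace splitting stand in for a real analyzer, and uses "any shared term" matching to mimic a default match query; it contrasts exact matching on a keyword value with term matching on an analyzed text value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class FieldTypeDemo {
    // keyword: the whole value is one term, so only an exact, case-sensitive match hits
    static boolean keywordMatches(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    // Toy analyzer: lower-case and split on whitespace
    static Set<String> analyze(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // text: both the value and the query are analyzed; sharing any term counts as a hit
    static boolean textMatches(String fieldValue, String query) {
        Set<String> terms = analyze(fieldValue);
        for (String t : analyze(query)) {
            if (terms.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(keywordMatches("Home Inn", "Inn"));      // false
        System.out.println(textMatches("Home Inn Express", "inn")); // true
    }
}
```

This is why brand or city is mapped as keyword (filtered by exact value) while a hotel name is text.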

(2) Creating an index

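In ES, indices and documents are manipulated through RESTful requests, and the request content is written in DSL.
The DSL for creating an index together with its mapping is: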

PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field_name2": {
        "type": "keyword",
        "index": false
      },
      "field_name3": {
        "properties": {
          "sub_field": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Example:

PUT /iam
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": false
      },
      "name":{
        "properties": {
          "firstName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "iam"
}

(3) Viewing an index

GET /iam

(4) Deleting an index

DELETE /iam

(5) Modifying an index

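Once an index and its mapping have been created, existing fields cannot be modified (for example, a field's type cannot be changed), but new fields can be added. Syntax for adding fields:
PUT /index_name/_mapping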

PUT /iam/_mapping
{
  "properties": {
    "age": {
      "type": "integer"
    }
  }
}
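The rule "new fields may be added, existing ones cannot be changed" can be modelled as a merge that rejects type changes. This is a toy, hypothetical class rather than Elasticsearch code, and its error message only loosely imitates the one Elasticsearch returns. Re-sending an existing field with the same type is accepted, just as an identical mapping update is in Elasticsearch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class MappingDemo {
    // Toy mapping: field name -> field type
    private final Map<String, String> properties = new LinkedHashMap<>();

    // Mimics PUT /index_name/_mapping: validate every field first, then apply all of them
    void putMapping(Map<String, String> newProperties) {
        for (Map.Entry<String, String> e : newProperties.entrySet()) {
            String existing = properties.get(e.getKey());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException("mapper [" + e.getKey() + "] cannot be changed from type ["
                        + existing + "] to [" + e.getValue() + "]");
            }
        }
        properties.putAll(newProperties);
    }

    Map<String, String> properties() {
        return Collections.unmodifiableMap(properties);
    }

    public static void main(String[] args) {
        MappingDemo mapping = new MappingDemo();
        mapping.putMapping(Map.of("info", "text", "email", "keyword"));
        mapping.putMapping(Map.of("age", "integer")); // adding a new field: accepted
        try {
            mapping.putMapping(Map.of("age", "long")); // changing an existing type: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```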

8. Document Operations

(1) Adding a document

POST /index_name/_doc/document_id
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
    "sub_property1": "value3",
    "sub_property2": "value4"
  }
}

Example:

POST /iam/_doc/2
{
  "info": "Java工程师",
  "email": "zy@itcast.cn",
  "age": 20,
  "name": {
    "firstName": "张",
    "fullName": "张三"
  }
}

Response:

{
  "_index": "iam",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 3,
  "_primary_term": 1
}

(2) Viewing a document

GET /index_name/_doc/document_id

GET /iam/_doc/2

Response:

{
  "_index": "iam",
  "_id": "2",
  "_version": 1,
  "_seq_no": 3,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "info": "Java工程师",
    "email": "zy@itcast.cn",
    "age": 20,
    "name": {
      "firstName": "张",
      "fullName": "张三"
    }
  }
}

(3) Deleting a document

DELETE /index_name/_doc/document_id

DELETE /iam/_doc/2

Response:

{
  "_index": "iam",
  "_id": "2",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 4,
  "_primary_term": 1
}

(4) Modifying a document

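Method 1: full update. The old document is deleted and the new one is added in its place (if no document with this id exists yet, it is simply created). Fields missing from the new body are not kept.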

PUT /iam/_doc/1
{
  "info": "Java攻城狮",
  "email": "zy@itcast.cn",
  "name": {
    "firstName": "云",
    "fullName": "赵云"
  }
}

Response:

{
  "_index": "iam",
  "_id": "1",
  "_version": 7,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 8,
  "_primary_term": 1
}

Method 2: partial update, which changes only the fields listed under "doc"

POST /iam/_update/1
{
  "doc": {
    "email": "ZhaoYun@itcast.cn"
  }
}
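The two update styles differ in what happens to the fields you do not send. The sketch below models a document's _source as a Java Map (class and method names are hypothetical): fullUpdate discards the old source, while partialUpdate merges the doc into it, merging nested objects recursively and replacing all other values, which mirrors how the update API treats a "doc".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UpdateDemo {
    // Full update (PUT /index_name/_doc/id): the new body becomes the whole _source
    static Map<String, Object> fullUpdate(Map<String, Object> newSource) {
        return new LinkedHashMap<>(newSource);
    }

    // Partial update (POST /index_name/_update/id): fields in doc overwrite or extend the
    // stored source, nested objects are merged recursively, untouched fields are kept
    @SuppressWarnings("unchecked")
    static Map<String, Object> partialUpdate(Map<String, Object> oldSource, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(oldSource);
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object oldValue = merged.get(e.getKey());
            if (oldValue instanceof Map && e.getValue() instanceof Map) {
                merged.put(e.getKey(), partialUpdate((Map<String, Object>) oldValue, (Map<String, Object>) e.getValue()));
            } else {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("email", "zy@itcast.cn");
        stored.put("age", 20);
        System.out.println(partialUpdate(stored, Map.<String, Object>of("email", "ZhaoYun@itcast.cn")));
        System.out.println(fullUpdate(Map.<String, Object>of("email", "zy@itcast.cn")));
    }
}
```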

9. Managing Indices with RestClient

(1) What is RestClient

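ES officially provides clients for many languages. In essence, these clients assemble DSL statements and send them to ES over HTTP. Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html
Below, the Java REST client is used to create and delete an index and to check whether an index exists.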
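Because a REST client is essentially a DSL-to-HTTP translator, the request it sends to create an index can be sketched with the JDK's own java.net.http API (Java 11+) instead of the Elasticsearch client, which internally uses a different HTTP library. The sketch only builds the request, so it runs without a cluster; the method name, the address, and the shortened mapping are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DslOverHttpDemo {
    // Builds (but does not send) the HTTP request for "PUT /{index}" with a mapping DSL body
    static HttpRequest createIndexRequest(String baseUrl, String index, String mappingJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingJson))
                .build();
    }

    public static void main(String[] args) {
        String dsl = "{\"mappings\":{\"properties\":{\"name\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\"}}}}";
        HttpRequest request = createIndexRequest("http://localhost:9200", "hotel", dsl);
        System.out.println(request.method() + " " + request.uri());
        // Sending it would be: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```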

(2) Analyzing the data structure

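Questions to consider when designing the mapping:
field names, data types, whether a field takes part in search, whether it is analyzed, and if so, which analyzer to use.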

create table tb_hotel (
    id bigint(20) not null comment 'hotel id',
    name varchar(255) NOT NULL comment 'hotel name, e.g. 7 Days Inn',
    address varchar(255) NOT NULL comment 'hotel address, e.g. Hangtou Road',
    price int(10) NOT NULL COMMENT 'hotel price, e.g. 329',
    score int(2) NOT NULL COMMENT 'hotel rating, e.g. 45 means 4.5',
    brand varchar(32) NOT NULL COMMENT 'hotel brand, e.g. Home Inn',
    city varchar(32) NOT NULL COMMENT 'city, e.g. Shanghai',
    star_name varchar(16) DEFAULT NULL COMMENT 'star rating, from low to high: 1 star to 5 stars, 1 diamond to 5 diamonds',
    business varchar(255) DEFAULT NULL COMMENT 'business district, e.g. Hongqiao',
    latitude varchar(32) NOT NULL COMMENT 'latitude, e.g. 31.2497',
    longitude varchar(32) NOT NULL COMMENT 'longitude, e.g. 120.3925',
    pic varchar(255) DEFAULT NULL COMMENT 'hotel picture, e.g. /img/1.jpg',
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

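Hotel index mapping (latitude and longitude are combined into a single geo_point field, location; address and pic have index set to false, so they are stored but not searchable):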

PUT /hotel
{
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "address":{
        "type": "keyword",
        "index": false
      },
      "price":{
        "type": "integer"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword"
      },
      "city":{
        "type": "keyword"
      },
      "star_name":{
        "type": "keyword"
      },
      "business":{
        "type": "keyword"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword",
        "index": false
      }
    }
  }
}

Tip 1:
ES supports two geo data types:

geo_point: a point defined by latitude and longitude, e.g. "32.32132,110.323213"
geo_shape: a complex geometry made up of multiple geo_points, e.g. a line LINESTRING(-77.3434 38.34324,-77.23112 38.3232)
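The table stores latitude and longitude in two columns, while the hotel mapping has a single geo_point field named location, so a document class has to combine them, for example into the "lat,lon" string form shown above (note that the array form of a geo_point is the reverse, [lon, lat]). The records below are a hypothetical minimal sketch containing only the fields this conversion needs.

```java
class HotelDocDemo {
    // Subset of a tb_hotel row: latitude and longitude are separate string columns
    record Hotel(long id, String latitude, String longitude) {}

    // Subset of a hotel-index document: location is a geo_point in "lat,lon" string form
    record HotelDoc(long id, String location) {
        static HotelDoc from(Hotel hotel) {
            return new HotelDoc(hotel.id(), hotel.latitude() + "," + hotel.longitude());
        }
    }

    public static void main(String[] args) {
        Hotel hotel = new Hotel(1L, "31.2497", "120.3925"); // sample values from the table comments
        System.out.println(HotelDoc.from(hotel).location()); // 31.2497,120.3925
    }
}
```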

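Tip 2:
The copy_to attribute copies the current field's value into a specified field, so that several fields can be searched through one. For example: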

      "all":{
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      }
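The idea behind copy_to is that a query on one combined field (all) can match values that came from several source fields, while _source itself stays unchanged. The toy model below (hypothetical names and data, with substring matching instead of real analysis) shows which values end up in the target field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CopyToDemo {
    // Collect the values of every field that declares copy_to into the target field's value list
    static List<String> buildCopyTarget(Map<String, String> source, List<String> fieldsWithCopyTo) {
        List<String> target = new ArrayList<>();
        for (String field : fieldsWithCopyTo) {
            String value = source.get(field);
            if (value != null) {
                target.add(value);
            }
        }
        return target;
    }

    // A query on the combined field hits if any copied value contains the keyword (toy matching)
    static boolean searchAll(List<String> allField, String keyword) {
        for (String value : allField) {
            if (value.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> source = Map.of("brand", "Home Inn", "business", "Hongqiao", "city", "Shanghai");
        List<String> all = buildCopyTarget(source, List.of("brand", "business"));
        System.out.println(all);                          // [Home Inn, Hongqiao]
        System.out.println(searchAll(all, "Shanghai"));   // false: city is not copied
    }
}
```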

(3) Initializing the Java REST client

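Add the dependency (no version is given here; it is assumed to be managed by the parent POM and should match the ES server version):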

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
        </dependency>

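Initialize the RestHighLevelClient, replacing the address with that of your own ES node (the tests below use http://localhost:9200). Note that RestHighLevelClient has been deprecated since Elasticsearch 7.15 in favor of the Elasticsearch Java API Client.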

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));

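(4) Creating an index

The test methods in this and the following subsections belong to one test class. Assuming a 7.x (pre-7.16) High Level REST Client and JUnit 5, the index-creation test needs these imports:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.Test;

import java.io.IOException;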

private static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      },\n" +
            "      \"address\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"score\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"city\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"star_name\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"business\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"location\":{\n" +
            "        \"type\": \"geo_point\"\n" +
            "      },\n" +
            "      \"pic\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
    @Test
    public void testCreateHotelIndex() throws IOException {
        // try-with-resources closes the client and its HTTP connections when done
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // 1. Create the request object
            CreateIndexRequest request = new CreateIndexRequest("hotel");
            // 2. Request body: MAPPING_TEMPLATE is a static constant holding the index-creation DSL
            request.source(MAPPING_TEMPLATE, XContentType.JSON);
            // 3. Send the request
            CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
            System.out.println(response.isAcknowledged());
        }
    }

(5) Deleting an index

    @Test
    public void testDeleteHotelIndex() throws IOException {
        // try-with-resources closes the client when done
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // 1. Create the request object
            DeleteIndexRequest request = new DeleteIndexRequest("hotel");
            // 2. Send the request
            AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
            System.out.println(response.isAcknowledged());
        }
    }

(6) Checking whether an index exists

    @Test
    public void testExistsHotelIndex() throws IOException {
        // try-with-resources closes the client when done
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // 1. Create the request object (org.elasticsearch.client.indices.GetIndexRequest)
            GetIndexRequest request = new GetIndexRequest("hotel");
            // 2. Send the request
            boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
            // 3. Print the result
            System.out.println(exists);
        }
    }

10. Managing Documents with RestClient

(1) Adding a document to the hotel index

    @Test
    public void testIndexDocument() throws IOException {
        // try-with-resources closes the client even if the request fails
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // Note: "email" is not in the hotel mapping, so ES maps it dynamically
            String json = "{\n" +
                    "  \"name\": \"张三\",\n" +
                    "  \"email\": \"zy@itcast.cn\"\n" +
                    "}";
            // 1. Create the request object for document id 1 in the hotel index
            IndexRequest request = new IndexRequest("hotel").id("1");
            // 2. Set the JSON document as the request body
            request.source(json, XContentType.JSON);
            // 3. Send the request
            IndexResponse response = client.index(request, RequestOptions.DEFAULT);
            System.out.println(response.getResult());
        }
    }

(2) Querying hotel data by id

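    @Test
    public void testGetDocumentById() throws IOException {
        // try-with-resources closes the client even if the request fails
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // 1. Create the request object: index "hotel", document id "1"
            GetRequest request = new GetRequest("hotel", "1");
            // 2. Send the request and get the response
            GetResponse response = client.get(request, RequestOptions.DEFAULT);
            // 3. Parse the result: print the _source as a JSON string if the document exists
            if (response.isExists()) {
                System.out.println(response.getSourceAsString());
            }
        }
    }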

(3) Updating hotel data by id

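There are two ways to modify a document:
Method 1: full update. Writing a document with the same id again deletes the old document and adds the new one
Method 2: partial update. Only some fields are updated; method 2 is demonstrated here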

    @Test
    public void testUpdateDocumentById() throws IOException {
        // try-with-resources closes the client even if the request fails
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // 1. Create the request object
            UpdateRequest request = new UpdateRequest("hotel", "1");
            // 2. Fields to update, given as alternating key/value arguments
            request.doc(
                    "email", "2312123@163.com"
            );
            // 3. Send the partial update
            client.update(request, RequestOptions.DEFAULT);
        }
    }

(4) Deleting a document by id

    @Test
    public void testDeleteDocumentById() throws IOException {
        // try-with-resources closes the client even if the request fails
        try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://localhost:9200")))) {
            // 1. Create the request object
            DeleteRequest request = new DeleteRequest("hotel", "1");
            // 2. Delete the document
            client.delete(request, RequestOptions.DEFAULT);
        }
    }

Next article:
https://blog.csdn.net/weixin_45404884/article/details/137505489
