Spring Boot Search: Integrating Elasticsearch 7.6.2

Author: 码渔

First published 2020-08-27 16:30:50; last modified 2024-09-28 11:26:46

Category: SpringBoot. Tags: database, elasticsearch, java, distributed, mysql

Original article: https://blog.csdn.net/weixin_41105242/article/details/107711634


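This article covers deploying and configuring Elasticsearch and its query features, including the inverted index, analyzers, custom dictionaries, and the REST-style API. It then uses Spring Boot to create indices and work with documents, and closes with a JD.com search example.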


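Preface:

Applications often need search features, or need to search and analyze large volumes of logs. Spring Boot integrates Spring Data Elasticsearch, which makes adding search support very convenient.

Elasticsearch is a distributed search service that exposes a RESTful API and is built on Lucene. It spreads data across multiple shards to keep it safe and provides automatic resharding. Large sites such as GitHub use Elasticsearch as their search service.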

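Elasticsearch - reference documentation

Elasticsearch Java REST Client - reference documentation

Spring Data Elasticsearch - reference documentation

For the best experience, read along with the project repository in a Web IDE.

Special thanks: 遇见狂神说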

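1. Overview

1.1 Comparison with relational databases

Elasticsearch is document-oriented and uses JSON as the serialization format for documents.

An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type holds multiple documents (rows), and each document has multiple fields (columns).

The correspondence with a relational database is:

Relational DB	Elasticsearch
database	index (indices)
table	type (deprecated, to be removed)
row	document
column	field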

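1.2 Physical design

Behind the scenes, Elasticsearch splits each index into multiple shards, and each shard can be moved between the servers of the cluster.

A running Elasticsearch instance is called a node. A cluster consists of one or more nodes that have the same cluster.name setting, and these nodes share the data and the load.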

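1.3 Logical design

An index type contains multiple documents, for example document 1 and document 2. Once a document has been indexed, it can be located by following this path:

index > type > document id

This combination identifies one specific document. (Note that the id does not have to be an integer; it is actually a string.)

Document

In Elasticsearch, the document is the smallest unit of data that is indexed and searched.

Documents have several important properties:

• Self-contained: a document holds both the fields and their values, that is, key:value pairs.
• Hierarchical: a document can contain sub-documents.
• Flexible structure: a document does not depend on a predefined schema.

Although fields can be added or left out freely, the type of each field matters a great deal.

Type

A type is a logical container for documents, just as a table is a container for rows in a relational database.

The definition of the fields in a type is called a mapping.

Index

An index is a container for mapping types; an Elasticsearch index is a very large collection of documents.

The index stores the fields of its mapping types and other settings, and all of this is stored on the individual shards.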

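1.4 How it works

A cluster has at least one node, and a node is one Elasticsearch process. A node can hold multiple indices. An index created with default settings gets 1 primary shard since Elasticsearch 7.0 (5 in earlier versions), and each primary shard gets one replica shard.

In a three-node cluster, for example, a primary shard and its replica are never allocated to the same node, so the data is not lost if a single node goes down.

In fact, a shard is a Lucene index: a directory of files that contains an inverted index. The inverted index structure lets Elasticsearch find the documents that contain a given keyword without scanning every document.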

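1.5 Inverted index

Elasticsearch uses a structure called an inverted index, provided by Lucene underneath.

This structure is well suited to fast full-text search. An index consists of the list of all unique terms that appear in the documents, and for each term there is a list of the documents that contain it.

For example, take two documents with the following content:

# Content of document 1
Study every day, good good up to forever

# Content of document 2
To forever, study every day, good good up

To build the inverted index, first split each document into individual words (also called terms or tokens), then create a sorted list of all unique terms, and finally record which documents each term appears in.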

term	doc_1	doc_2
Study	✓	✗
To	✗	✓
every	✓	✓
forever	✓	✓
day	✓	✓
study	✗	✓
good	✓	✓
to	✓	✗
up	✓	✓

To search for to forever, just look up the documents that contain each term.

term	doc_1	doc_2
to	✓	✗
forever	✓	✓
total	2	1

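Both documents match, but the first document is a better match than the second because it contains both terms.

With no other conditions, both documents that contain the keywords are returned.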
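The construction described in this section can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions, not how Lucene stores its index: the class name `InvertedIndexSketch`, the case-sensitive tokenizer, and the match counting are all illustrative choices made to mirror the tables above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term -> ids of the documents that contain it. */
final class InvertedIndexSketch {

    // Split on anything that is not a letter or digit. Case is kept on purpose,
    // so "Study" and "study" stay separate terms, as in the table above.
    static String[] tokenize(String text) {
        return text.trim().split("[^\\p{L}\\p{N}]+");
    }

    // The two sample documents from this section.
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc_1", "Study every day, good good up to forever");
        docs.put("doc_2", "To forever, study every day, good good up");
        return docs;
    }

    // Build the sorted list of unique terms with the documents each one appears in.
    static Map<String, Set<String>> build(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String term : tokenize(doc.getValue())) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    // For each document, count how many distinct query terms it contains.
    static Map<String, Integer> search(Map<String, Set<String>> index, String query) {
        Map<String, Integer> matches = new TreeMap<>();
        for (String term : new TreeSet<>(Arrays.asList(tokenize(query)))) {
            for (String docId : index.getOrDefault(term, Set.of())) {
                matches.merge(docId, 1, Integer::sum);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = build(sampleDocs());
        System.out.println(index);
        System.out.println(search(index, "to forever"));
    }
}
```

Searching for `to forever` counts two matching terms for doc_1 and one for doc_2, which is the "total" row of the table. A real analyzer would normally lowercase the terms first, in which case both documents would match both terms.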

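2. Deployment & testing

2.1 Deploying Elasticsearch

Pull the image (the official image does not publish a latest tag, so the version tag is required)
```
docker pull elasticsearch:7.6.2
```

Create the container

Port 9200 is the HTTP port and port 9300 is the TCP transport port.

docker run -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -d -p 9200:9200 -p 9300:9300 --name es elasticsearch:7.6.2

Startup error:

ERROR: [1] bootstrap checks failed
    [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least

Fix (run on the Docker host):

Check max_map_count:

cat /proc/sys/vm/max_map_count
65530

Set max_map_count:

sysctl -w vm.max_map_count=262144

Test

Visit http://Server-IP:9200; Elasticsearch should respond with its node and version information.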

2.2 Deploying the visualization tool Elasticsearch-head

Pull the image
```
docker pull mobz/elasticsearch-head:5
```

Create the container

docker run -d -p 9100:9100 --name head mobz/elasticsearch-head:5

Fixing cross-origin (CORS) requests

Enter the Elasticsearch container and edit the configuration file elasticsearch.yml.
Append the following settings at the end of the file:
```
http.cors.enabled: true
http.cors.allow-origin: "*"
```
Restart the service.

When viewing or modifying index data, the following error may also appear:

{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

Fix:

• Enter the head container

• Install vim

Configure a mirror in mainland China for the package sources:

mv /etc/apt/sources.list /etc/apt/sources.list.bak
echo "deb http://mirrors.163.com/debian/ jessie main non-free contrib" >> /etc/apt/sources.list
echo "deb http://mirrors.163.com/debian/ jessie-proposed-updates main non-free contrib" >>/etc/apt/sources.list
echo "deb-src http://mirrors.163.com/debian/ jessie main non-free contrib" >>/etc/apt/sources.list
echo "deb-src http://mirrors.163.com/debian/ jessie-proposed-updates main non-free contrib" >>/etc/apt/sources.list

Update the package index

apt-get update

Install vim

apt-get install vim

• Go to the _site directory and edit vendor.js

 ① Line 6886: contentType: "application/x-www-form-urlencoded"
 change to: contentType: "application/json;charset=UTF-8"

 ② Line 7573: var inspectData = s.contentType === "application/x-www-form-urlencoded" &&
 change to: var inspectData = s.contentType === "application/json;charset=UTF-8" &&

Test

Visit http://Server-IP:9100 (the port mapped for the head container); the Elasticsearch-head page should appear.

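2.3 Deploying the visualization tool Kibana

Pull the image
```
docker pull kibana:7.6.2
```

Create the container (Kibana 7.x reads the Elasticsearch address from ELASTICSEARCH_HOSTS; the older ELASTICSEARCH_URL variable is no longer used)

docker run -d -e ELASTICSEARCH_HOSTS=http://39.105.80.221:9200 -p 5601:5601 --name kibana kibana:7.6.2

Changing the Elasticsearch address & switching the UI to Chinese

Enter the container

Elasticsearch address: edit kibana.yml and set elasticsearch.hosts to the address of the Elasticsearch service

Chinese UI: edit kibana.yml and append i18n.locale: "zh-CN" at the end of the file

Test

Visit http://Server-IP:5601; the Kibana home page should appear.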

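2.4 Installing the IK analyzer

What is the IK analyzer?

Tokenization means splitting a piece of Chinese or English text into individual keywords. At search time the input is tokenized, the data in the database or index is tokenized as well, and the two are then matched. The default analyzer treats every Chinese character as a separate word, which does not meet practical needs, so the Chinese analyzer IK is installed to solve this.

IK provides two analyzers: ik_smart and ik_max_word. ik_smart produces the coarsest segmentation (fewest tokens), while ik_max_word produces the finest-grained segmentation.

Enter the elasticsearch container
Install wget
```
yum -y install wget
```
Create an ik directory under the plugins directory
```
mkdir ik
```

Go into the ik directory and download the matching version with wget

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip

Unpack the archive

unzip elasticsearch-analysis-ik-7.6.2.zip

Delete the archive

rm -rf elasticsearch-analysis-ik-7.6.2.zip

Verify

Restart the elasticsearch container, enter it again, and run the following command in the bin directory:
```
elasticsearch-plugin list
```
If ik is listed, the installation succeeded.

Test

Enter the following commands in the Kibana Dev Tools console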

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中国共产党"
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中国共产党"
}

Sending the two requests separately returns different responses (ik_smart first, then ik_max_word)

{
  "tokens" : [
    {
      "token" : "中国共产党",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

{
  "tokens" : [
    {
      "token" : "中国共产党",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中国",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "国共",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "共产党",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "共产",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "党",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 5
    }
  ]
}

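2.5 Adding a custom dictionary

Enter the elasticsearch container

Fixing garbled Chinese characters

Edit ~/.vimrc and append the following settings at the end of the file:

set fileencodings=utf-8,ucs-bom,gb18030,gbk,gb2312,cp936
set termencoding=utf-8
set encoding=utf-8

Save and exit

Go to the IK plugin installation directory
Go to the config directory
Create a dic file
```
touch caixukun.dic
```
Edit the dic file and add the custom entries
```
蔡徐坤
鸡你太美
```

Edit IKAnalyzer.cfg.xml

<entry key="ext_dict">caixukun.dic</entry>

Restart the elasticsearch container

Test

Enter the following command in the Kibana Dev Tools console: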

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "蔡徐坤鸡你太美"
}

Response with the default dictionary:

{
  "tokens" : [
    {
      "token" : "蔡",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "徐",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "坤",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "鸡",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "你",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "太美",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}

Response after adding the custom dictionary:

{
  "tokens" : [
    {
      "token" : "蔡徐坤",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "鸡你太美",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
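To see why a dictionary entry changes the tokens, here is a toy dictionary-based segmenter in Java. It uses greedy forward maximum matching, which is a simplification chosen for illustration and not IK's actual algorithm. The class name `DictionarySegmenterSketch` is hypothetical, and treating 太美 as a word of the built-in dictionary is an assumption inferred from the default response above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy segmenter: greedy forward maximum matching against a word dictionary. */
final class DictionarySegmenterSketch {

    // At each position take the longest dictionary word that starts there;
    // fall back to a single character when no dictionary word matches.
    static List<String> segment(String text, Set<String> dictionary) {
        int maxLen = 1;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxLen);
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumption: the built-in dictionary knows "太美" but not the two custom entries.
        Set<String> builtIn = new HashSet<>(Set.of("太美"));
        System.out.println(segment("蔡徐坤鸡你太美", builtIn));

        // Adding the entries from caixukun.dic changes the segmentation.
        Set<String> extended = new HashSet<>(builtIn);
        extended.add("蔡徐坤");
        extended.add("鸡你太美");
        System.out.println(segment("蔡徐坤鸡你太美", extended));
    }
}
```

With only the built-in word, the text falls apart into single characters plus 太美; with the two custom entries it is split into 蔡徐坤 and 鸡你太美, matching the two responses above.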

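3. REST style

REST is an architectural style for software rather than a standard. It only provides a set of design principles and constraints, and it is mainly used for software in which clients and servers interact.

Software designed in this style can be simpler, more clearly layered, and easier to extend with mechanisms such as caching.

Basic REST commands:

method	URL	description
PUT	localhost:9200/index-name/type-name/document-id	create a document (with the given id)
POST	localhost:9200/index-name/type-name	create a document
POST	localhost:9200/index-name/type-name/document-id/_update	update a document
DELETE	localhost:9200/index-name/type-name/document-id	delete a document
GET	localhost:9200/index-name/type-name/document-id	get a document by id
POST	localhost:9200/index-name/type-name/_search	query all data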
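The URLs in this table include a type name, which Elasticsearch 7.6.2 accepts but reports as deprecated (the deprecation warnings appear in the responses later in this article). As a small sketch, the helper below builds the typeless paths that those warnings recommend. The class and method names are hypothetical; only the path templates come from the warnings and from the `_search` endpoint used in this article.

```java
/** Builds the typeless document endpoints recommended by the 7.x deprecation warnings. */
final class TypelessEndpoints {

    // PUT or GET or DELETE /{index}/_doc/{id}: index, fetch, or delete one document.
    static String doc(String index, String id) {
        return "/" + index + "/_doc/" + id;
    }

    // PUT /{index}/_create/{id}: create only, fails if the id already exists.
    static String create(String index, String id) {
        return "/" + index + "/_create/" + id;
    }

    // POST /{index}/_update/{id}: partial update of an existing document.
    static String update(String index, String id) {
        return "/" + index + "/_update/" + id;
    }

    // GET or POST /{index}/_search: search the whole index.
    static String search(String index) {
        return "/" + index + "/_search";
    }

    public static void main(String[] args) {
        System.out.println("PUT  " + doc("test1", "1"));
        System.out.println("POST " + update("test1", "1"));
        System.out.println("GET  " + search("test1"));
    }
}
```

For example, `PUT /test1/type1/1` becomes `PUT /test1/_doc/1`, and `POST /test3/_doc/1/_update` becomes `POST /test3/_update/1`.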

3.1 Basic test

Enter the following command in the Kibana Dev Tools console:
```
PUT /test1/type1/1
{
  "name": "蔡徐坤",
  "age": 10
}
```
• Command explained:

PUT: create command
test1: index
type1: type
1: id
"name": "蔡徐坤": field
"age": 10: field

Send the request.

The response is:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test1",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Open head to check the index that was just created.

3.2 Creating an index with explicit mappings

Enter the following command in the Kibana Dev Tools console:

PUT /test2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

Send the request.

The response is:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test2"
}
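The same index creation can be issued from plain Java without Kibana. The sketch below uses the JDK's built-in `java.net.http` client (Java 11 or later) purely as an illustration; the class name `CreateIndexOverHttp`, the helper names, and the `http://localhost:9200` address are assumptions, and the mapping body is the one from the request above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sends "PUT /{index}" with an explicit mapping using the JDK HTTP client. */
final class CreateIndexOverHttp {

    // Same mapping as in the Dev Tools request: name (text), age (long), birthday (date).
    static final String MAPPING =
            "{\"mappings\":{\"properties\":{"
            + "\"name\":{\"type\":\"text\"},"
            + "\"age\":{\"type\":\"long\"},"
            + "\"birthday\":{\"type\":\"date\"}}}}";

    // Build the request only; nothing is sent here.
    static HttpRequest createIndexRequest(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(MAPPING))
                .build();
    }

    // Send the request to a running Elasticsearch node and return the raw JSON response.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) {
        // Assumption: a single-node Elasticsearch is reachable at this address.
        HttpRequest request = createIndexRequest("http://localhost:9200", "test2");
        System.out.println(request.method() + " " + request.uri());
        // Call send(request) to actually create the index.
    }
}
```

Building the request is kept separate from sending it so the request can be inspected without a running cluster.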

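Open head to check the index that was just created.

3.3 Viewing the default mapping

If no mapping is defined for a field, Elasticsearch infers a default type for it automatically.

Enter the following command in the Kibana Dev Tools console: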

PUT /test3/_doc/1
{
  "name": "蔡徐坤",
  "age": 10,
  "birthday": "1998-08-02"
}

Send the request.

The response is:

{
  "_index" : "test3",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Enter the following command in the console:
```
GET test3
```

Send the request.

The response is:

{
  "test3" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1596476421598",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "Rh3Z67EpSPSOUbz1lmgB7g",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test3"
      }
    }
  }
}

3.4 Updating

Updates are done with the POST command.

Enter the following command in the Kibana Dev Tools console:

POST /test3/_doc/1/_update
{
  "doc": {
    "name": "坤坤"
  }
}

Send the request.

The response is:

#! Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.
{
  "_index" : "test3",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2, // incremented on every update
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

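The version number has changed.

3.5 Deleting

Deletion is done with the DELETE command.

Enter the following command in the Kibana Dev Tools console (this deletes the entire test1 index):
```
DELETE test1
```
Send the request.

The response is: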
```
{
  "acknowledged" : true
}
```

3.6 Additional commands

The GET _cat commands return a wide range of information about the current Elasticsearch cluster.

Check cluster health

GET _cat/health

View index details

GET _cat/indices?v

4. Basic document operations

4.1 Adding data: PUT

Enter the following command in the Kibana Dev Tools console:

PUT /stars/user/1
{
  "name": "蔡徐坤",
  "age": "22",
  "desc": "鸡你太美",
  "tags": ["唱","跳","rap","篮球"]
}

Send the request.

The response is:

{
  "_index" : "stars",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Add user 2

PUT /stars/user/2
{
  "name": "吴亦凡",
  "age": "29",
  "desc": "大碗宽面",
  "tags": ["加拿大","电鳗","说唱","嘻哈"]
}

Add user 3

PUT /stars/user/3
{
  "name": "梁非凡",
  "age": "40",
  "desc": "也啦你",
  "tags": ["桌面清理大师","警察","啵嘴"]
}

Open head to check the index that was just created.

4.2 Querying data: GET

Simple query

GET stars/user/1

{
  "_index" : "stars",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "蔡徐坤",
    "age" : "22",
    "desc" : "鸡你太美",
    "tags" : [
      "唱",
      "跳",
      "rap",
      "篮球"
    ]
  }
}

Complex query

Keyword match

GET stars/user/_search?q=name:吴亦凡

{
  "took" : 64,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.313365,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 2.313365, // relevance score
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        }
      }
    ]
  }
}

4.3 Updating data: POST

Enter the following command in the Kibana Dev Tools console:

POST /stars/user/1/_update
{
  "doc": {
    "name": "坤坤"
  }
}

Send the request.

The response is:

{
  "_index" : "stars",
  "_type" : "user",
  "_id" : "1",
  "_version" : 2, // incremented on every update
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

4.4 Deleting data: DELETE

5. Advanced queries

5.1 Basic query

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "凡" // search keyword
    }
  }
}

Response

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.4471386,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        }
      }
    ]
  }
}

5.2 Returning only selected fields

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "凡"
    }
  },
  "_source": ["name", "desc"] // only return these fields
}

Response

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.4471386,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "吴亦凡",
          "desc" : "大碗宽面"
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "梁非凡",
          "desc" : "吔*啦你"
        }
      }
    ]
  }
}

5.3 Sorting results

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "凡"
    }
  },
  "sort": [
    {
      "age.keyword": {
        "order": "desc" // descending
      }
    }
  ]
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        },
        "sort" : [
          "40"
        ]
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        },
        "sort" : [
          "29"
        ]
      }
    ]
  }
}
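One caveat about this sort: the age values were indexed as strings, and sorting on `age.keyword` compares them as strings, not as numbers. The two happen to agree for 40 and 29, but they diverge as soon as the values have different lengths. The Java sketch below only illustrates that difference; the class name and the extra value "100" are hypothetical and not part of the article's data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Contrasts string (keyword-style) ordering with numeric ordering of age values. */
final class KeywordSortSketch {

    // Descending order by plain string comparison, as a keyword field is sorted.
    static List<String> sortAsKeywordDesc(List<String> ages) {
        List<String> sorted = new ArrayList<>(ages);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    // Descending order by numeric value, as a long field would be sorted.
    static List<String> sortAsNumberDesc(List<String> ages) {
        List<String> sorted = new ArrayList<>(ages);
        sorted.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ages = List.of("22", "29", "40", "100");
        System.out.println(sortAsKeywordDesc(ages));
        System.out.println(sortAsNumberDesc(ages));
    }
}
```

As strings, "100" sorts below "22" because the comparison goes character by character. Mapping age as a numeric type, as in the test2 index of section 3.2, avoids this.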

5.4 Paginating results

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "凡"
    }
  },
  "_source": ["name", "desc"],
  "from": 0, // offset of the first hit to return
  "size": 1 // number of hits to return
}

Response

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.4471386,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "吴亦凡",
          "desc" : "大碗宽面"
        }
      }
    ]
  }
}
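Since `from` is a zero-based offset and `size` is the number of hits per page, a page number has to be converted before it is put into the request. A minimal Java sketch, assuming 1-based page numbers (the class and method names are hypothetical):

```java
/** Converts a 1-based page number into the from/size pair used by the search request. */
final class PagingSketch {

    // Zero-based offset of the first hit on the given page.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // The paging part of a search body, for example: "from": 0, "size": 1
    static String pagingClause(int page, int size) {
        return "\"from\": " + from(page, size) + ", \"size\": " + size;
    }

    public static void main(String[] args) {
        System.out.println(pagingClause(1, 1));
        System.out.println(pagingClause(2, 1));
    }
}
```

With `size` 1, page 1 maps to `from` 0 (the request above) and page 2 maps to `from` 1, which would return the second hit.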

5.5 Multi-condition queries

must: equivalent to AND in a relational database

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "吴亦凡"
          }
        },
        {
          "match": {
            "age": "29"
          }
        }
      ]
    }
  }
}

Response

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.2941942,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 3.2941942,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        }
      }
    ]
  }
}

should: equivalent to OR in a relational database

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "吴亦凡"
          }
        },
        {
          "match": {
            "age": "29"
          }
        }
      ]
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 3.2941942,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 3.2941942,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        }
      }
    ]
  }
}

must_not: equivalent to NOT in a relational database

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "age": "29"
          }
        }
      ]
    }
  }
}

Response

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "name" : "坤坤",
          "age" : "22",
          "desc" : "鸡你太美",
          "tags" : [
            "唱",
            "跳",
            "rap",
            "篮球"
          ]
        }
      }
    ]
  }
}
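The AND / OR / NOT reading of must, should, and must_not can be checked against the three sample documents with a small in-memory model. This Java sketch is not how Elasticsearch evaluates queries and it ignores scoring. It assumes that a match on the name field hits whenever the name shares at least one character with the query, which mimics the per-character tokens seen in the responses above; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** In-memory model of bool query semantics over the three sample documents. */
final class BoolQuerySketch {

    static final class Star {
        final String id;
        final String name;
        final String age;

        Star(String id, String name, String age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Document 1 carries the name it has after the update in section 4.3.
    static final List<Star> SAMPLE = List.of(
            new Star("1", "坤坤", "22"),
            new Star("2", "吴亦凡", "29"),
            new Star("3", "梁非凡", "40"));

    // Toy stand-in for a match query on a text field: any shared character is a hit.
    static boolean sharesCharacter(String value, String query) {
        return query.chars().anyMatch(c -> value.indexOf(c) >= 0);
    }

    static Predicate<Star> nameMatches(String query) {
        return star -> sharesCharacter(star.name, query);
    }

    static Predicate<Star> ageMatches(String query) {
        return star -> star.age.equals(query);
    }

    // Ids of the sample documents accepted by the query, in document order.
    static List<String> ids(Predicate<Star> query) {
        List<String> result = new ArrayList<>();
        for (Star star : SAMPLE) {
            if (query.test(star)) {
                result.add(star.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("must:     " + ids(nameMatches("吴亦凡").and(ageMatches("29"))));
        System.out.println("should:   " + ids(nameMatches("吴亦凡").or(ageMatches("29"))));
        System.out.println("must_not: " + ids(ageMatches("29").negate()));
    }
}
```

The three calls select documents {2}, {2, 3}, and {1, 3}, the same sets of ids as in the three responses of this section.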

5.6 Querying with a filter

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "凡"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 10, // at least 10 years old
              "lte": 30 // at most 30 years old
            }
          }
        }
      ]
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.4471386,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        }
      }
    ]
  }
}

5.7 Matching multiple terms

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "tags": "唱 跳" // separate multiple terms with spaces
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.7137355,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.7137355,
        "_source" : {
          "name" : "坤坤",
          "age" : "22",
          "desc" : "鸡你太美",
          "tags" : [
            "唱",
            "跳",
            "rap",
            "篮球"
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        }
      }
    ]
  }
}

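5.8 Exact queries

About analysis:

term: looks up the given term directly in the inverted index without analyzing it (exact match)
match: analyzes the query text first, then searches with the resulting terms

Two field types:

text: analyzed by the analyzer
keyword: not analyzed; the whole value is indexed as a single term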
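The interaction between the two query kinds and the two field types is easy to get wrong, so here is a toy model in Java. It assumes an analyzer that emits one token per character, which mimics the per-character behaviour seen for Chinese text in the responses of this article. The class and method names are hypothetical, and the `match` helper is only meant for text fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of term vs. match against text and keyword fields. */
final class TermVsMatchSketch {

    // Assumed analyzer for illustration: one token per character.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // text field: the indexed terms are the analyzed tokens.
    static List<String> indexAsText(String value) {
        return analyze(value);
    }

    // keyword field: the whole value is indexed as one term.
    static List<String> indexAsKeyword(String value) {
        return List.of(value);
    }

    // term query: the query string is looked up as-is.
    static boolean term(List<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match query on a text field: the query is analyzed first, any shared token is a hit.
    static boolean match(List<String> indexedTerms, String query) {
        return analyze(query).stream().anyMatch(indexedTerms::contains);
    }

    public static void main(String[] args) {
        List<String> text = indexAsText("吴亦凡");
        List<String> keyword = indexAsKeyword("吴亦凡");
        System.out.println("term on text, full name:    " + term(text, "吴亦凡"));
        System.out.println("match on text, full name:   " + match(text, "吴亦凡"));
        System.out.println("term on keyword, full name: " + term(keyword, "吴亦凡"));
        System.out.println("term on keyword, one char:  " + term(keyword, "凡"));
    }
}
```

In this model a term query for the full name finds nothing in the text field, because only the single characters were indexed, while the same term query succeeds on the keyword field. This is why the Java search example later in the article queries `name.keyword`.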

5.9 Highlighting

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "吴亦凡"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Response

{
  "took" : 96,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.313365,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 2.313365,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        },
        "highlight" : {
          "name" : [
            "<em>吴</em><em>亦</em><em>凡</em>" // highlight tags
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        },
        "highlight" : {
          "name" : [
            "梁非<em>凡</em>"
          ]
        }
      }
    ]
  }
}

5.10 Custom highlight tags

Request

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "吴亦凡"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>", 
    "fields": {
      "name": {}
    }
  }
}

Response

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.313365,
    "hits" : [
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "2",
        "_score" : 2.313365,
        "_source" : {
          "name" : "吴亦凡",
          "age" : "29",
          "desc" : "大碗宽面",
          "tags" : [
            "加拿大",
            "电鳗",
            "说唱",
            "嘻哈"
          ]
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>吴</p><p class='key' style='color:red'>亦</p><p class='key' style='color:red'>凡</p>"
          ]
        }
      },
      {
        "_index" : "stars",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.4471386,
        "_source" : {
          "name" : "梁非凡",
          "age" : "40",
          "desc" : "吔*啦你",
          "tags" : [
            "桌面清理大师",
            "警察",
            "啵嘴"
          ]
        },
        "highlight" : {
          "name" : [
            "梁非<p class='key' style='color:red'>凡</p>"
          ]
        }
      }
    ]
  }
}
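What the highlighter does to the field value can be imitated with a few lines of Java. This sketch assumes, as the responses above suggest, that every character of the name is a separate token, and it wraps each character that also occurs in the query with the given tags. It is a simplified illustration rather than Elasticsearch's highlighter, and the names are hypothetical.

```java
/** Toy highlighter: wraps every character that also occurs in the query. */
final class HighlightSketch {

    static String highlight(String value, String query, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        value.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (query.contains(ch)) {
                out.append(preTag).append(ch).append(postTag);
            } else {
                out.append(ch);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Default tags, as in section 5.9.
        System.out.println(highlight("梁非凡", "吴亦凡", "<em>", "</em>"));
        // Custom tags, as in section 5.10.
        System.out.println(highlight("梁非凡", "吴亦凡",
                "<p class='key' style='color:red'>", "</p>"));
    }
}
```

Only the tags change between the two calls; which characters get wrapped depends on the query alone, exactly as in the two responses.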

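6. Integrating Elasticsearch with Spring Boot

6.1 Environment setup

Add the dependency

Note: the Elasticsearch client version must match the Elasticsearch server version.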

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

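Write the configuration class

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

@Configuration
public class RestClientConfig extends AbstractElasticsearchConfiguration {

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // Address of the Elasticsearch node (host:port of the HTTP endpoint)
        final ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("39.105.80.221:9200").build();
        return RestClients.create(clientConfiguration).rest();
    }

}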

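6.2 Index operations

Creating an index

import java.io.IOException;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class ElasticApplicationTests {

    @Autowired
    RestHighLevelClient elasticsearchClient;

    /**
     * Test creating an index
     */
    @Test
    void test01() throws IOException {
        // Build the request
        CreateIndexRequest request = new CreateIndexRequest("test_index");
        // Execute the request with the client
        CreateIndexResponse response = elasticsearchClient.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }

}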

Checking whether an index exists

// Additional import for this test:
import org.elasticsearch.client.indices.GetIndexRequest;

@SpringBootTest
class ElasticApplicationTests {

    @Autowired
    RestHighLevelClient elasticsearchClient;

    /**
     * Test checking whether an index exists
     */
    @Test
    void test02() throws IOException {
        // Build the request
        GetIndexRequest request = new GetIndexRequest("test_index");
        // Execute the request with the client
        boolean response = elasticsearchClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }

}

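Deleting an index

// Additional imports for this test:
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;

@SpringBootTest
class ElasticApplicationTests {

    @Autowired
    RestHighLevelClient elasticsearchClient;

    /**
     * Test deleting an index
     */
    @Test
    void test03() throws IOException {
        // Build the request
        DeleteIndexRequest request = new DeleteIndexRequest("test_index");
        // Execute the request with the client
        AcknowledgedResponse response = elasticsearchClient.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
    }

}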

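6.3 Document operations

The snippets in this section are methods of the same test class. They rely on a User class (with a name and an age) and on an objectMapper field, presumably a Jackson ObjectMapper, neither of which is shown in the original article.

Adding a document

@Test
void test04() throws IOException {
    // Create the object
    User user = new User("testUser", 18);
    // Build the request
    IndexRequest request = new IndexRequest("test_index");
    // Set the document id
    request.id("1");
    // Set the request timeout
    request.timeout(TimeValue.timeValueSeconds(1));
    // Serialize the object to JSON and use it as the request body
    request.source(objectMapper.writeValueAsString(user), XContentType.JSON);
    // Send the request with the client
    IndexResponse response = elasticsearchClient.index(request, RequestOptions.DEFAULT);
    System.out.println(response.toString());
    System.out.println(response.status());
}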

Checking whether a document exists

@Test
void test05() throws IOException {
    // Build the request
    GetRequest request = new GetRequest("test_index", "1");
    // Send the request with the client
    boolean response = elasticsearchClient.exists(request, RequestOptions.DEFAULT);
    System.out.println(response);
}

Getting a document

@Test
void test06() throws IOException {
    // Build the request
    GetRequest request = new GetRequest("test_index", "1");
    // Send the request with the client
    GetResponse response = elasticsearchClient.get(request, RequestOptions.DEFAULT);
    System.out.println(response.getSourceAsString());
}

Updating a document

@Test
void test07() throws IOException {
    // Create the object with the new values
    User user = new User("testUser", 28);
    // Build the request
    UpdateRequest request = new UpdateRequest("test_index", "1");
    request.doc(objectMapper.writeValueAsString(user), XContentType.JSON);
    // Send the request with the client
    UpdateResponse response = elasticsearchClient.update(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}

Deleting a document

@Test
void test08() throws IOException {
    // Build the request
    DeleteRequest request = new DeleteRequest("test_index", "1");
    // Set the request timeout
    request.timeout(TimeValue.timeValueSeconds(1));
    // Send the request with the client
    DeleteResponse response = elasticsearchClient.delete(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}

Bulk-inserting documents

@Test
void test09() throws IOException {
    // Build the request
    BulkRequest request = new BulkRequest();
    // Set the timeout
    request.timeout(TimeValue.timeValueSeconds(10));
    // Create the batch of data
    ArrayList<User> users = new ArrayList<>();
    users.add(new User("testUser02", 20));
    users.add(new User("testUser03", 21));
    users.add(new User("testUser04", 22));
    users.add(new User("testUser05", 23));
    users.add(new User("testUser06", 24));
    // Add one index request per user, using the list position as the document id
    for (int i = 0; i < users.size(); i++) {
        request.add(
                new IndexRequest("test_index")
                        .id("" + i)
                        .source(objectMapper.writeValueAsString(users.get(i)), XContentType.JSON)
        );
    }
    // Send the request with the client
    BulkResponse responses = elasticsearchClient.bulk(request, RequestOptions.DEFAULT);
    // false means every item succeeded
    System.out.println(responses.hasFailures());
}
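On the wire, a bulk request is a newline-delimited body sent to the `_bulk` endpoint: one action line followed by one source line per document, each terminated by a newline. The Java sketch below builds such a body by hand for the same kind of users, only to make the format visible. The `User` class here is a hypothetical stand-in for the one used in the tests, and its `toJson` method assumes names that need no JSON escaping.

```java
import java.util.List;

/** Builds the newline-delimited body of a bulk index request by hand. */
final class BulkBodySketch {

    // Hypothetical stand-in for the article's User class.
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Assumption: the name contains no characters that need JSON escaping.
        String toJson() {
            return "{\"name\":\"" + name + "\",\"age\":" + age + "}";
        }
    }

    // One action line and one source line per user; the list position is the document id.
    static String bulkBody(String index, List<User> users) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < users.size(); i++) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(i).append("\"}}\n");
            body.append(users.get(i).toJson()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.print(bulkBody("test_index", List.of(
                new User("testUser02", 20),
                new User("testUser03", 21))));
    }
}
```

The high-level client generates this body itself; writing it out is mainly useful when debugging bulk requests in Kibana Dev Tools or with curl.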

Searching documents

@Test
void test10() throws IOException {
    // Build the request
    SearchRequest request = new SearchRequest("test_index");
    // Build the search source that holds the query and its options
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Exact-match term query on the keyword sub-field of name
    sourceBuilder.query(QueryBuilders.termQuery("name.keyword", "testUser02"));
    // Set the timeout
    sourceBuilder.timeout(TimeValue.timeValueSeconds(60));
    request.source(sourceBuilder);
    // Send the request with the client
    SearchResponse response = elasticsearchClient.search(request, RequestOptions.DEFAULT);
    System.out.println(objectMapper.writeValueAsString(response.getHits()));
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println("----------");
        System.out.println(hit.getSourceAsMap());
    }
}

7. Hands-on: JD.com search

7.1 Environment setup

Add the dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

Configuration file

server:
  port: 8080
spring:
  thymeleaf:
    cache: false # disable the thymeleaf cache

Controller

@Controller
public class IndexController {

    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }

}

7.2 Processing the crawled data

For the best experience, read along with the project repository in a Web IDE.
