Gulimall (谷粒商城) - Distributed Advanced Part [ElasticSearch]

HGW689

Last modified: 2022-04-23 15:08:25

First published: 2022-04-04 16:35:35

Article link: https://blog.csdn.net/m0_49183244/article/details/123955156

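1、ElasticSearch

Official documentation

1.1、ElasticSearch Overview

1.1.1、Introduction to ElasticSearch

ElasticSearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface, and it can store, search, and analyze large volumes of data quickly.

Background

Lucene is an information-retrieval toolkit (a JAR) that contains a search engine.

It includes the index structures, tools for reading and writing indexes, sorting, search rules, and other utility classes.

How Lucene and ElasticSearch relate:

ElasticSearch wraps and extends Lucene.

ElasticSearch: a smart, distributed search engine.

It is one component of ELK and a mature product in its own right. ELK stands for ElasticSearch, Logstash, and Kibana:

E: ElasticSearch provides search and analysis.
L: Logstash collects data. It is a log collection system, similar to Flume and used in much the same way.
K: Kibana is the data visualization platform. It presents data as charts, on the principle that a table beats text and a chart beats a table.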

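1.1.2、Basic Concepts

Common Elasticsearch terms:

Index: where data is stored, comparable to a database in MySQL.
Type: defines the data structure. Types are deprecated in recent Elasticsearch versions. In older versions, one index could hold several types, each roughly a table definition describing the type of every field.
Document: the actual data. One document can be thought of as one record.
Field: comparable to a column in a relational database. A document consists of one or more fields.

What are clusters, nodes, indexes, types, documents, shards, and mappings?

ElasticSearch is document-oriented, and everything is JSON. A rough comparison with a relational database:

Relational DB	ElasticSearch
Database	Index
Table	Type
Row	Document
Column	Field

An ElasticSearch cluster can contain multiple indexes (databases). Each index can contain multiple types (tables, in versions that still support types), each type contains multiple documents (rows), and each document contains multiple fields (columns).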

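1.2、Installing ElasticSearch

This section installs:

ElasticSearch
Kibana

1.2.1、Installing ElasticSearch

Step 1: Pull the images

[root@hgwtencent ~]# docker pull elasticsearch:7.4.2
[root@hgwtencent ~]# docker pull kibana:7.4.2

Step 2: Create the local mount directories

Mount the ES configuration files onto directories on the host, so that editing the files on the host changes the configuration of ES inside the Docker container.

[root@hgwtencent ~]# mkdir -p /mydata/elasticsearch/config
[root@hgwtencent ~]# mkdir -p /mydata/elasticsearch/data

Write the setting http.host: 0.0.0.0, which lets any remote machine reach ES. Note that the space after "http.host:" is required.

[root@hgwtencent ~]# echo "http.host: 0.0.0.0">> /mydata/elasticsearch/config/elasticsearch.yml

Grant read, write, and execute permissions to everyone on the directory (mode 777):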

[root@hgwtencent elasticsearch]# chmod -R 777 /mydata/elasticsearch/
[root@hgwtencent elasticsearch]# ll
总用量 12
drwxrwxrwx 2 root root 4096 3月  24 19:50 config
drwxrwxrwx 2 root root 4096 3月  24 19:50 data
drwxrwxrwx 2 root root 4096 3月  24 19:57 plugins

Step 3: Run the elasticsearch container

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2

[root@hgwtencent ~]# docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms64m -Xmx512m" -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins -d elasticsearch:7.4.2

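--name gives the container the name elasticsearch, and -p publishes two ports, 9200 and 9300:
- 9200 receives HTTP requests (the REST API port);
- 9300 is the port ES nodes use to talk to each other in a distributed cluster.
\ continues the command on the next line.
-e "discovery.type=single-node" : runs ES as a single node.
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" : sets the initial heap to 64 MB and the maximum heap to 512 MB.
Without ES_JAVA_OPTS, ES starts with its default heap size (1 GB in 7.x), which can exhaust the memory of a small virtual machine and freeze it.
-v : mounts a host path into the container, here the configuration file, the data directory, and the plugins directory.
-d : runs the container in the background from the given image.

Open http://ip:9200/ to check that ES responds.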

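1.2.2、Installing Kibana

The Kibana image was pulled in the previous step, so it can be run directly.

Port 5601 serves the Kibana web UI, and Kibana in turn sends its requests to ES on port 9200.

docker run --name kibana -e ELASTICSEARCH_HOSTS=http://124.222.223.222:9200 -p 5601:5601 -d kibana:7.4.2

Switching the UI to Chinese

Append the following line to the end of kibana.yml, then restart Kibana:

i18n.locale: "zh-CN"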

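1.3、Basic ES Operations [CRUD]

1.3.1、Inspecting ES [_cat]

GET /_cat/nodes : list all nodes
GET /_cat/health : show cluster health
GET /_cat/master : show the master node
GET /_cat/indices : list all indexes, comparable to show databases; in MySQL

1.3.2、Create [POST/PUT]

Saving a document means choosing the index and type it goes under and the unique id that identifies it.

This is comparable to choosing the database and the table to store a row in.

Both PUT and POST work.

POST creates. Without an id, one is generated automatically. With an id, the existing document is modified and its version number increases.
- The id is optional. Without an id, the request always creates.
- With an id that does not exist yet, the request creates.
- With an id that already exists, the request updates and the version number increases. Only the _update endpoint (section 1.3.4) compares the content and leaves the version unchanged when nothing differs.
PUT can create or modify. PUT requires an id and returns an error without one, so it is normally used for modifications.
- The id is required.
- The version number always increases.

Difference between _seq_no and _version:

Each document's _version starts at 1 and increases by 1 after every successful operation on that document.
_seq_no belongs to the index (more precisely, to the shard). It is 0 for the first write into the index and increases by 1 after every successful write to any document in it. Each document records the sequence number of the operation that produced its current state.
See https://www.cnblogs.com/Taeso/p/13363136.html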

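1.3.2.1、Saving and Updating with POST

Create: no id, or an id that has no data yet
Update: an id that already has data

Format: POST /index/type[/id]

1.3.2.2、Updating with PUT

Create: an id that has no data yet
Update: an id that already has data

Note: the id is required.

Format: PUT /index/type/id

For example, PUT customer/external/1 saves document 1 under the external type of the customer index: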

# PUT http://124.222.223.222:9200/customer/external/1

{
    "name":"John Doe"
}

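Response body explained:

{
    // Fields starting with _ are metadata describing the document
    "_index": "customer",   // which index
    "_type": "external",    // which type
    "_id": "1",             // document id
    "_version": 1,          // version
    "result": "created",    // outcome
    "_shards": {            // shard information
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Sending the same request again changes the outcome to "result": "updated".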

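1.3.3、Read [GET]

Format: GET index/type/id

GET request: http://124.222.223.222:9200/customer/external/1

Response body:

{
    "_index": "customer",   // which index
    "_type": "external",    // which type
    "_id": "1",             // document id
    "_version": 2,          // version number
    "_seq_no": 1,           // concurrency-control field, incremented on every update; used for optimistic locking
    "_primary_term": 1,     // also used for concurrency control; changes when the primary shard is reassigned, for example after a restart
    "found": true,
    "_source": {
        "name": "John Doe"
    }
}

Optimistic locking: append if_seq_no and if_primary_term to the request, for example "if_seq_no=1&if_primary_term=1". The write is applied only if both values still match the stored document; otherwise it is rejected.

Suppose A and B both want to modify document 1. As soon as one of them changes it, the sequence number increases. Older versions used _version for this check; newer versions use _seq_no. A therefore adds a condition to its update:

Parameters carried by the update: ?if_seq_no=0&if_primary_term=1

A sends a PUT request:

http://124.222.223.222:9200/customer/external/1?if_seq_no=6&if_primary_term=1

with the request body: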

{
    "name":"hgw"
}

Response body:

{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 5,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 7,		// incremented by the update
    "_primary_term": 1
}

B then sends a PUT request with the same condition:

http://124.222.223.222:9200/customer/external/1?if_seq_no=6&if_primary_term=1

with the request body:

{
    "name":"2"
}

Response body (the write is rejected because the sequence number no longer matches):

{
    "error": {
        "root_cause": [
            {
                "type": "version_conflict_engine_exception",
                "reason": "[1]: version conflict, required seqNo [6], primary term [1]. current document has seqNo [7] and primary term [1]",
                "index_uuid": "E1DkbUZOT3mEVDgCUVqagQ",
                "shard": "0",
                "index": "customer"
            }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, required seqNo [6], primary term [1]. current document has seqNo [7] and primary term [1]",
        "index_uuid": "E1DkbUZOT3mEVDgCUVqagQ",
        "shard": "0",
        "index": "customer"
    },
    "status": 409
}
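
The A/B conflict above can be mimicked with a small in-memory model. This is only a sketch and not Elasticsearch code: `VersionedStore`, `Doc`, `index`, and `indexIf` are hypothetical names, and the model assumes a single shard whose primary term stays at 1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one shard; not the Elasticsearch API.
class VersionedStore {
    // A stored document plus the metadata Elasticsearch reports for it.
    static final class Doc {
        final String source;
        final long version; // per-document counter, starts at 1
        final long seqNo;   // shard-wide counter value of the last write to this document

        Doc(String source, long version, long seqNo) {
            this.source = source;
            this.version = version;
            this.seqNo = seqNo;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long nextSeqNo = 0;         // shared by every document in the shard
    private final long primaryTerm = 1; // assumed constant (no primary reassignment)

    // Unconditional write: always succeeds and bumps both counters.
    Doc index(String id, String source) {
        Doc old = docs.get(id);
        long version = (old == null) ? 1 : old.version + 1;
        Doc doc = new Doc(source, version, nextSeqNo++);
        docs.put(id, doc);
        return doc;
    }

    // Conditional write, modeled on ?if_seq_no=..&if_primary_term=..
    // Returns null on a conflict (Elasticsearch answers with HTTP 409).
    Doc indexIf(String id, String source, long ifSeqNo, long ifPrimaryTerm) {
        Doc old = docs.get(id);
        if (old == null || old.seqNo != ifSeqNo || primaryTerm != ifPrimaryTerm) {
            return null;
        }
        return index(id, source);
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        Doc created = store.index("1", "{\"name\":\"John Doe\"}");
        // A and B both read seqNo 0, then both try to write with that condition.
        Doc a = store.indexIf("1", "{\"name\":\"hgw\"}", created.seqNo, 1);
        Doc b = store.indexIf("1", "{\"name\":\"2\"}", created.seqNo, 1);
        System.out.println("A updated: " + (a != null));
        System.out.println("B updated: " + (b != null));
    }
}
```

In this model the first conditional write wins and moves the sequence number forward, so the second one, still carrying the old value, is rejected. To retry, B would have to re-read the document and send the new sequence number.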

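1.3.4、Update [POST/PUT]

PUT
PUT replaces the whole document.
```
PUT /index/type/id
{
  "key":"value"
}
```
A PUT update overwrites the previous document entirely, so the request body must contain every field.
POST
- Form 1: with _update
  
  POST with _update is a partial update; fields that are not mentioned stay untouched.
  
  The new values are first compared with the stored document. If nothing differs, the request does nothing, and neither _version nor _seq_no increases.
```
POST /index/type/id/_update
{
  "doc":{
    "key":"value"
  }
}
```
- Form 2: without _update
  This behaves like PUT: the previous document is overwritten entirely, so the request body must contain every field.
```
POST /index/type/id
{
  "key":"value"
}
```

Which one to use depends on the workload:

For heavy concurrent updates, omit _update.
For heavy concurrent reads with occasional updates, use _update: it compares before writing and skips the write when nothing changed.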

1.3.5、Deleting Documents [DELETE]

Delete a document
Format: DELETE index/type/id
Delete an index
Format: DELETE index

Note: ES provides no operation for deleting a type directly.

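1.3.6、The bulk API

1.3.6.1、The bulk API

Every two lines form one operation, and each operation is independent. index is a save operation. A failed operation does not affect the ones after it, unlike a MySQL transaction, where one failure rolls everything back.

Format:

POST /index/type/_bulk
{action:{metadata}}
{request body}
{action:{metadata}}
{request body}

Simple example: insert two documents under the external type of the customer index in one request.

Each pair of lines is one unit.
This body is neither a single JSON document nor plain text, so Postman's JSON and text body modes both fail; run it in Kibana Dev Tools instead.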

POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"tang"}
{"index":{"_id":"2"}}
{"name":"Jane Doe"}
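
When the bulk body is assembled by a program rather than typed into Dev Tools, the two-lines-per-operation layout has to be produced exactly, including the newline that ends the last line. The sketch below only builds the text body and sends nothing; `BulkBodyBuilder` and `indexActions` are hypothetical names, and it assumes the ids need no JSON escaping and the documents are already serialized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; builds the newline-delimited bulk body as a string.
class BulkBodyBuilder {
    // Emits one "index" action line followed by one document line per entry.
    // Keys are document ids (assumed to need no escaping),
    // values are already-serialized JSON documents.
    static String indexActions(Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : docsById.entrySet()) {
            body.append("{\"index\":{\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            body.append(entry.getValue()).append('\n'); // every line, including the last, ends with \n
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>(); // keeps insertion order
        docs.put("1", "{\"name\":\"tang\"}");
        docs.put("2", "{\"name\":\"Jane Doe\"}");
        System.out.print(indexActions(docs));
    }
}
```

Run as is, this prints the same four lines as the request body above.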

Response body:

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 186,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

Complex example: run a batch of different operations against a whole index

Request body:

POST /_bulk
{"delete": {"_index": "website","_type": "blog", "_id": "123"}}
{"create": {"_index": "website","_type": "blog", "_id": "123"}}
{"title": "My first blog post"}
{"index": {"_index": "website","_type": "blog"}}
{"title": "My second blog post"}
{"update": {"_index": "website","_type": "blog", "_id": "123"}}
{"doc": {"title": "My updated blog post"}}

Response body:

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 318,
  "errors" : false,
  "items" : [
    {
      "delete" : {	// delete
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found",	// no such document
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404	
      }
    },
    {
      "create" : {	// create
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {		// save
        "_index" : "website",
        "_type" : "blog",
        "_id" : "YaYpv38Bv4eqNRuQTKHT",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {		// update
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}

1.3.6.2、Sample Test Data

A set of fictional JSON documents describing customers' bank accounts is used as sample data. Every document has the following schema.

{
	"account_number": 1,
	"balance": 39225,
	"firstname": "Amber",
	"lastname": "Duke",
	"age": 32,
	"gender": "M",
	"address": "880 Holmes Lane",
	"employer": "Pyrami",
	"email": "amberduke@pyrami.com",
	"city": "Brogan",
	"state": "IL"
}

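The URL given in the video is no longer available, so the sample data collected by a commenter is used instead (linked in the original post as "sample test data"). Import it with:

POST /bank/account/_bulk
<contents of the linked file>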

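1.4、Advanced Search

1.4.1、Searching Documents with the Search API

ES supports two basic ways to search:

Send the search parameters in the REST request URI (URI + request parameters)
Send them in the REST request body (URI + request body)

Searching

Method 1: search with request parameters

GET bank/_search?q=*&sort=account_number:asc

Request parameters:

GET bank/_search : search everything under bank, including types and docs
q=* : match everything
sort : the field to sort by
asc : ascending order

Response fields:

took : how many milliseconds the search took
timed_out : whether the search timed out
_shards : how many shards were searched, and how many succeeded or failed
max_score : the highest relevance score among the matching documents
hits.total.value : how many matching documents were found
hits.sort : the sort key of each result (absent when sorting by score)
hits._score : the relevance score

The query matches 1000 documents, but only 10 are returned because the default size is 10.

Method 2: search with URI + request body

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "balance":"desc"}
  ]
}

Postman cannot send a body with GET. Switching to POST works the same way: post a JSON query body to _search.

Note that once the results are returned, ES is done with the request. It cannot and does not keep any server-side resources or a cursor over the results.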

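1.4.2、The Query DSL

ES is mainly used for search and analysis, so the DSL matters a great deal.

Query DSL: DSL stands for Domain Specific Language.

1.4.2.1、Standard Queries [query]

match is the standard query. Whether you need a full-text query or an exact query, you will usually reach for it.

1.4.2.1.1、Match Everything [match_all]

GET /index/type/_search
{
  "query": {
    "match_all": {}
  }
}

Example 1: return all documents in the bank index

GET /bank/_search
{
  "query": {
    "match_all": {}
  }
}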

1.4.2.1.2、Match Query [match]

Basic (non-string) types: exact match
Strings: full-text search
- FIELD: the text is analyzed into terms before matching
- FIELD.keyword: exact match; the whole value must match

GET /index/type/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}
## Exact match
GET /index/type/_search
{
  "query": {
    "match": {
      "FIELD.keyword": "TEXT"
    }
  }
}

Example 1: exact match, find documents in the bank index whose balance is 39225

GET /bank/_search
{
  "query": {
    "match": {
      "balance": "39225"
    }
  }
}

Example 2: full-text search, find documents in the bank index whose address contains kings

GET /bank/_search
{
  "query": {
    "match": {
      "address": "kings"
    }
  }
}

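1.4.2.1.3、Phrase Match [match_phrase]

The value is matched as one whole phrase rather than as independent words.

match_phrase: the phrase is not broken into separately matched words; the field only needs to contain the phrase
FIELD.keyword: the text is not split, and the entire field value must equal the query value (exact match)

GET /index/type/_search
{
  "query": {
    "match_phrase": {
      "FIELD": "PHRASE"
    }
  }
}

Example (a screenshot in the original): find documents in the bank index whose address contains mill lane.

With .keyword on a text field, the condition must equal the entire field value; it is an exact match.

match_phrase matches a phrase: a document matches as long as its text contains the phrase.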

1.4.2.1.4、Multi-Field Query [multi_match]

multi_match searches several fields at once. The query text is analyzed into terms during the search.

GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "",
      "fields": []
    }
  }
}

Example: find documents whose state or address contains mill. The query text is analyzed into terms during the search.

GET /bank/_search
{
  "query": {
    "multi_match": {  
      "query": "mill",
      "fields": [
        "state",
        "address"
      ]
    }
  }
}

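1.4.2.1.5、term

Like match, term matches the value of a field.

Use match for full-text fields,
and term for fields that are not text.

Do not use term to query text fields.

By default ES analyzes text values when storing them, so use match to search text values.

Querying with term: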

GET /index/type/_search
{
  "query": {
    "term": {
      "FIELD": "TEXT"
    }
  }
}
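
Why term misbehaves on text fields can be illustrated with a toy model of analysis. The sketch below is a deliberate simplification and not Elasticsearch code: `TermVsMatch`, `analyze`, `match`, and `term` are hypothetical names, and the stand-in analyzer only lowercases and splits on non-alphanumeric characters, which is an assumption about the default behaviour rather than a reimplementation of it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical simplification of text analysis; the real analyzers do more.
class TermVsMatch {
    // Rough stand-in for analysis: lowercase, then split on non-alphanumerics.
    static Set<String> analyze(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Set<String> result = new HashSet<>(Arrays.asList(tokens));
        result.remove(""); // split can yield a leading empty string
        return result;
    }

    // match: the query text is analyzed as well; any shared token counts as a hit.
    static boolean match(String fieldValue, String query) {
        Set<String> indexed = analyze(fieldValue);
        for (String token : analyze(query)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // term: the query value is looked up unchanged among the indexed tokens.
    static boolean term(String fieldValue, String value) {
        return analyze(fieldValue).contains(value);
    }

    public static void main(String[] args) {
        String address = "990 Mill Road";
        System.out.println(match(address, "Mill Road"));    // analyzed query shares tokens
        System.out.println(term(address, "990 Mill Road")); // whole value is not an indexed token
        System.out.println(term(address, "mill"));          // a single lowercase token is
    }
}
```

In this model the stored address only exists as the tokens 990, mill, and road, so a term lookup for the original string finds nothing, while match succeeds because its query text goes through the same analysis.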

1.4.2.2、Filtering Returned Fields [_source]

Use _source when you do not want every field in the output.

This is comparable to:

select name, desc from user

Format:

GET /index/type/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  },
  "_source": ["field1",...,"fieldn"]
}

Example: find documents in the bank index whose balance is 39225 and return only the balance and age fields
```
GET /bank/_search
{
  "query": {
    "match": {
      "balance": "39225"
    }
  },
  "_source": ["balance","age"]
}
```

1.4.2.3、Sorting [sort]

Comparable to order by

GET /index/type/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  },
  "sort": [
    {
      "FIELD": {
        "order": "desc"
      }
    }
  ]
}

Example (a screenshot in the original): return all documents in the bank index sorted by balance in descending order.

1.4.2.4、Pagination [from, size]

Format:

GET /index/type/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  },
  "sort": [
    {
      "FIELD": {
        "order": "desc"
      }
    }
  ],
  "from": <offset of the first hit to return>,
  "size": <number of hits to return>
}

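Example (a screenshot in the original): return all documents in the bank index sorted by balance in descending order, showing documents 1 to 10.

Offsets start at 0, as in most data structures.

/search/{current}/{pagesize}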
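
A URL of the form /search/{current}/{pagesize} carries a page number, while the query needs an offset. The conversion is small but easy to get off by one, so here is a sketch of it. `Paging`, `from`, and `fragment` are hypothetical names, and the sketch assumes that `current` is a 1-based page number.

```java
// Hypothetical helper for the /search/{current}/{pagesize} style of paging.
class Paging {
    // Converts a 1-based page number into the 0-based "from" offset.
    static int from(int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be >= 1");
        }
        return (currentPage - 1) * pageSize;
    }

    // Renders the from/size fragment of a search request body.
    static String fragment(int currentPage, int pageSize) {
        return "\"from\": " + from(currentPage, pageSize) + ", \"size\": " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fragment(1, 10)); // first page starts at offset 0
        System.out.println(fragment(3, 10)); // third page starts at offset 20
    }
}
```

Note that by default ES rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so this style of paging suits the first pages of a result, not deep scrolling.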

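1.4.2.5、Boolean Queries [bool]

Use bool to combine several query conditions.

bool merges the results of multiple conditions with boolean logic. It has the following operators:

must : like AND; every condition must match
should : like OR; at least one condition should match
must_not : like NOT; none of the conditions may match

Each operator accepts either a single condition or an array of conditions.

1.4.2.5.1、must

must is like AND: every condition has to match.

Example: find documents with gender=F and address=mill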

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "F"
          }
          
        },
        {
          "match":{
            "address": "mill"
          }
        }
      ]
    }
  }
}

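1.4.2.5.2、should

should is like OR: matching any one of the conditions is enough.

should lists conditions that a document ought to meet. A document that meets them gets a higher relevance score, but the set of results does not change. The exception is a query that contains only should clauses: then the should conditions act as the matching conditions and do change the results.

Example (a screenshot in the original): find documents with gender=m and address=mill, preferring those whose lastname is Wallace.

1.4.2.5.3、must_not

must_not is like NOT: it negates the condition.

Example (a screenshot in the original): find documents with gender=m and address=mill whose age is not 38.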

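1.4.2.6、Filters [filter]

"filter": {
  "range": {
    "FIELD": {
      "gte": num1,
      "lte": num2
    }
  }
}

gt : greater than
gte : greater than or equal to
lt : less than
lte : less than or equal to

Not every query needs to produce a score, especially conditions used only to filter documents. ES detects these cases and optimizes execution by skipping score calculation, which makes conditions that do not score faster.

must : contributes to the score
should : contributes to the score
must_not : does not contribute to the score
filter : does not contribute to the score

must and should affect the relevance score, while must_not acts purely as a filter and contributes nothing.
Changing must to filter stops the condition from contributing to the score.
With only filter conditions, every hit has a score of 0.
To match one field against several values, use terms.

Example (a screenshot in the original): find all documents matching address=mill, then keep only those with 10000<=balance<=20000.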
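
How the four bool clauses interact can be summarized in a small in-memory model. This is a sketch of the semantics only, not Elasticsearch code: `BoolQueryModel` and its fields are hypothetical names, conditions are plain Java predicates, and the score is simplified to a count of matching scoring clauses (real relevance scores are computed very differently).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of bool clause semantics; scores are simplified to clause counts.
class BoolQueryModel {
    final List<Predicate<Map<String, Object>>> must = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> should = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> mustNot = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> filter = new ArrayList<>();

    // Returns -1 if the document is excluded, otherwise a simplified score.
    int score(Map<String, Object> doc) {
        for (Predicate<Map<String, Object>> p : mustNot) {
            if (p.test(doc)) {
                return -1; // must_not excludes, never scores
            }
        }
        for (Predicate<Map<String, Object>> p : filter) {
            if (!p.test(doc)) {
                return -1; // filter excludes, never scores
            }
        }
        int score = 0;
        for (Predicate<Map<String, Object>> p : must) {
            if (!p.test(doc)) {
                return -1;
            }
            score++; // must clauses contribute to the score
        }
        int shouldHits = 0;
        for (Predicate<Map<String, Object>> p : should) {
            if (p.test(doc)) {
                shouldHits++;
            }
        }
        // With no must or filter clause, at least one should clause has to match.
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty() && shouldHits == 0) {
            return -1;
        }
        return score + shouldHits; // should clauses only raise the score
    }

    public static void main(String[] args) {
        BoolQueryModel query = new BoolQueryModel();
        query.must.add(d -> "M".equals(d.get("gender")));
        query.should.add(d -> "Wallace".equals(d.get("lastname")));
        query.mustNot.add(d -> Integer.valueOf(38).equals(d.get("age")));

        Map<String, Object> doc = new HashMap<>();
        doc.put("gender", "M");
        doc.put("lastname", "Wallace");
        doc.put("age", 28);
        System.out.println(query.score(doc)); // one must hit plus one should hit
    }
}
```

The model mirrors the rules listed above: must_not and filter only decide whether a document stays in the result, must decides and scores, and should normally only adds to the score.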

1.4.2.7、Highlighting [highlight]

(The original shows the highlight query only as a screenshot, which is not preserved.)

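1.4.2.8、Aggregations

Storage and search have been covered so far, but not analysis.

Aggregations group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions.

In Elasticsearch a search can return hits and, in the same response, aggregation results that are kept separate from the hits. This is powerful and efficient: you can run a query and several aggregations and get all of their results in one round trip through one concise API, avoiding extra network calls.

1.4.2.8.1、aggs: Running Aggregations

Aggregation syntax:

"aggs":{ # aggregations
    "aggs_name":{ # name of this aggregation, used to label it in the results
        "AGG_TYPE":{} # aggregation type (for example avg or terms)
     }
}

terms : shows the distribution of values; identical values of the field are merged into buckets with a count
avg : shows the average of the values

Example: for everyone whose address contains mill, show the age distribution and the average age, without returning the matching documents themselves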

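GET /bank/_search
{
  "query": {	# match documents whose address contains mill
    "match": {
      "address": "mill"
    }
  },
  "aggs": {		# aggregate over the query results
    "ageAgg": {	# aggregation name, chosen freely
      "terms": {	# distribution of values
        "field": "age",
        "size": 10
      }
    },
    "ageAvg":{
      "avg": {		# average of age
        "field": "age"
      }
    }
  },
  "size": 0			 # do not return the hits
}

Query result: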

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,		 // 4 hits
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {				// result of the first aggregation
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,			// 2 documents have age 38
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : {			// result of the second aggregation
      "value" : 34.0		// average age is 34
    }
  }
}
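
For readers who think in SQL or Java collections, the two aggregations above correspond to a group-by count and an average. The sketch below recomputes the same numbers in memory from the four ages in the response (38, 28, 38, 32). It is an illustration only, not Elasticsearch code, and `AgeAggregation`, `ageBuckets`, and `ageAvg` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical in-memory counterpart of the terms and avg aggregations.
class AgeAggregation {
    // Like a terms aggregation on "age": one bucket per distinct value with its count.
    static Map<Integer, Long> ageBuckets(List<Integer> ages) {
        return ages.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Like an avg aggregation on "age".
    static double ageAvg(List<Integer> ages) {
        return ages.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Ages of the four documents whose address matches "mill".
        List<Integer> ages = Arrays.asList(38, 28, 38, 32);
        System.out.println(ageBuckets(ages)); // {28=1, 32=1, 38=2}
        System.out.println(ageAvg(ages));     // 34.0
    }
}
```

One difference in presentation: the sketch sorts buckets by key because it uses a TreeMap, whereas the ES response above lists the bucket with the highest doc_count first.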

1.4.2.8.2、Sub-Aggregations

Aggregate by age, and compute the average balance of the people in each age bucket.

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "aggAgg": {
      "terms": {		# distribution by age
        "field": "age",
        "size": 10
      },
      "aggs": {			# sibling of terms: runs inside each age bucket
        "ageAvg": {		# average balance
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

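1.4.2.8.3、Nested Sub-Aggregations

Find the distribution of ages, and within each age bucket compute the average balance of M, the average balance of F, and the overall average balance of the bucket.

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {  # distribution by age
        "field": "age",
        "size": 100
      },
      "aggs": { # sub-aggregation
        "genderAgg": {
          "terms": { # distribution by gender
            "field": "gender.keyword" # text fields must be aggregated through .keyword
          },
          "aggs": { # sub-aggregation
            "balanceAvg": {
              "avg": { # average balance per gender
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": { # average balance of the whole age bucket
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}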

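1.5、Mapping

A mapping defines how a document and the fields it contains are stored and indexed. For example, a mapping defines:

which string fields are treated as full-text fields;
which fields contain numbers, dates, or geolocations;
whether all fields of the document can be indexed (the _all setting);
the format of dates;
custom rules that control the mapping of dynamically added fields.

1.5.1、Viewing a Mapping [GET /index/_mapping]

View the mapping of a given index: GET /index/_mapping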

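1.5.2、Creating a Mapping [PUT index]

Create an index and specify its mapping:

PUT index
{
  "mappings": {
    "properties": {
      "field_name_1":{
        "type": "field_type_1"
      },
      "field_name_2":{
        "type": "field_type_2"
      },
      ......
      "field_name_n":{
        "type": "field_type_n"
      }
    }
  }
}

Listing all indexes afterwards shows that the new index has been created.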

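1.5.3、Adding New Field Mappings [PUT /index/_mapping]

This request can only add mappings for new fields.

PUT /index/_mapping
{
  "properties": {
    "field_name_1":{
        "type": "field_type_1",
        "index": true/false # whether the field is searchable
      },
    ......
    "field_name_n":{
      "type": "field_type_n",
      "index": true/false # whether the field is searchable
    }
  }
}

1.5.4、Updating a Mapping

The mapping of an existing field cannot be changed. To change it, create a new index with the correct mapping and migrate the data.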

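1.5.5、Data Migration

Data migration takes two steps:

First create the new index with the correct mapping.

Then migrate the data as follows.

Syntax from 6.0 onwards, without types:

POST _reindex
{
  "source":{
      "index":"twitter"
   },
  "dest":{
      "index":"new_twitters"
   }
}

Syntax for older versions, with types:

POST _reindex
{
  "source":{
      "index":"twitter",
      "type":"tweet"
   },
  "dest":{
      "index":"new_twitters"
   }
}

Example: the bank index originally has an account type. Types are deprecated in newer versions, so the migration drops it.

Step 1: create the new index and specify its mapping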

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "keyword"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}

Step 2: migrate the data from bank to newbank

POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}

The documents can then be queried from newbank.

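1.6、The IK Analyzer

1.6.1、IK Analyzer Overview

What is the IK analyzer?

Tokenization splits a piece of Chinese (or other) text into keywords. During a search, the query text is tokenized, the data in the database or index is tokenized, and the two are then matched. The default analyzer treats every Chinese character as a separate word: "我爱臭宝儿" becomes "我", "爱", "臭", "宝", "儿". That is clearly not what we want, so the Chinese analyzer IK is installed to solve the problem.

For Chinese text, the IK analyzer is recommended.

IK provides two tokenization algorithms, ik_smart and ik_max_word:

ik_smart : coarsest split (fewest tokens)
ik_max_word : finest-grained split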

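1.6.2、Installing the `ik analyzer`

Step 1: check your ES version and download the IK release with the same version number.

Step 2: enter the plugins directory inside the ES container and download IK

docker exec -it <container id> /bin/bash
In the plugins directory, download the IK analyzer: wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
unzip the downloaded file
Restart the ES container: docker restart elasticsearch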

[root@hgwtencent plugins]# docker exec -it elasticsearch /bin/bash
[root@1d4f72514071 elasticsearch]# pwd
/usr/share/elasticsearch
[root@1d4f72514071 elasticsearch]# yum install wget
[root@1d4f72514071 elasticsearch]# cd plugins/
[root@1d4f72514071 plugins]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
[root@1d4f72514071 plugins]# unzip elasticsearch-analysis-ik-7.4.2.zip -d ik
[root@1d4f72514071 plugins]# chmod -R 777 ik
[root@1d4f72514071 plugins]# exit;
[root@hgwtencent plugins]# docker restart elasticsearch;
elasticsearch

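Confirm that the analyzer is installed (shown as a screenshot in the original).

1.6.3、Testing the IK Analyzer

ik_smart : coarsest split
ik_max_word : finest-grained split

1.6.4、Extending the IK Dictionary with Custom Words

Problem: the name 黄龚伟 is split apart.

Words like this that you need as single tokens have to be added to the analyzer's dictionary yourself.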

Step 1: install Nginx as described in the appendix (section 1.9). It will host the extension dictionary for IK.

[root@hgwtencent html]# mkdir /mydata/nginx/html/es
[root@hgwtencent html]# cd /mydata/nginx/html/es
[root@hgwtencent es]# vim fenci.txt
Enter: 黄龚伟

Test by opening http://192.168.56.10/es/fenci.txt

Step 2: edit IKAnalyzer.cfg.xml in /usr/share/elasticsearch/plugins/ik/config (mounted on the host at /mydata/elasticsearch/plugins/ik/config)

[root@hgwtencent config]# pwd
/mydata/elasticsearch/plugins/ik/config
[root@hgwtencent config]# vim IKAnalyzer.cfg.xml

Change it to the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- configure your own extension dictionary here -->
        <entry key="ext_dict"></entry>
         <!-- configure your own extension stopword dictionary here -->
        <entry key="ext_stopwords"></entry>
        <!-- configure a remote extension dictionary here -->
        <entry key="remote_ext_dict">http://124.222.223.222/es/fenci.txt</entry>
        <!-- configure a remote extension stopword dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

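After the change, restart the elasticsearch container, otherwise the change does not take effect: docker restart elasticsearch

After the update, ES applies the new dictionary only to newly indexed data. Existing data is not re-tokenized. To re-tokenize existing data, run:

POST my_index/_update_by_query?conflicts=proceed

From now on, extending the dictionary only requires editing fenci.txt under /mydata/nginx/html/es.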

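1.7、elasticsearch-Rest-Client

There are two ways for Java to talk to ES:

9300: TCP
- spring-data-elasticsearch:transport-api.jar;
  - the transport-api.jar differs between Spring Boot versions and may not fit the ES version
  - no longer recommended in 7.x and to be removed from 8 onwards
9200: HTTP
Several libraries are available:
- JestClient: unofficial and slow to update;
- RestTemplate: sends plain HTTP requests, so many ES operations have to be wrapped by hand, which is tedious;
- HttpClient: same as above;
- Elasticsearch-Rest-Client: the official REST client; it wraps the ES operations, has a clearly layered API, and is easy to pick up.

The final choice is Elasticsearch-Rest-Client (elasticsearch-rest-high-level-client).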

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html

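1.8、Integrating ES with Spring Boot

Create the project gulimall-search.

Select the web dependency, but do not select ES in the initializer, and downgrade versions where needed.

1.8.1、Basic Project Setup

Step 1: add the dependency

The version must match the installed ES version.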

<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-high-level-client</artifactId>
  <version>7.4.2</version>
</dependency>

spring-boot-dependencies pins the ES version to 6.8.4, which has to be overridden:

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>

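Step 2: write the configuration and register a RestHighLevelClient bean

Create an ES configuration class in the com.hgw.gulimall.search.config package.

RequestOptions holds per-request settings. For example, if ES is protected by access rules and every request needs a security header, the header can be set through RequestOptions.

The official documentation recommends creating RequestOptions as a singleton.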

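import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GulimallElasticSearchConfig {
    // Shared request options, reused by every request
    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        // Several ES hosts can be passed to RestClient.builder
        RestClientBuilder builder = RestClient.builder(new HttpHost("124.222.223.222", 9200, "http"));
        return new RestHighLevelClient(builder);
    }
}

Other ways to build the client exist as well.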

1.8.2、API Tests

The tests below assume a test class with an injected RestHighLevelClient field named client.

1.8.2.1、Creating an Index

// Test creating an index
@Test
public void testCreateIndex() throws IOException {
    // 1. Build the create-index request
    CreateIndexRequest request = new CreateIndexRequest("hgw_index");
    // 2. Execute it through IndicesClient and receive the response
    CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse);
}

1.8.2.2、Getting an Index (Checking Whether It Exists)

// Test whether the index exists
@Test
public void testGetIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("hgw_index");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}

1.8.2.3、Deleting an Index

// Test deleting an index
@Test
public void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("hgw_index");
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}

1.8.2.4、Document CRUD

1.8.2.4.1、Create

Create a document

@Test
public void testAddDocument() throws IOException {
    // Build the document to store
    User user = new User();
    user.setName("hgw");
    user.setGender("男");
    user.setAge(21);
    String UserJson = JSON.toJSONString(user);

    // Build the request
    IndexRequest request = new IndexRequest("hgw_index");

    // Equivalent to PUT /hgw_index/_doc/1
    request.id("1");
    request.timeout(TimeValue.timeValueSeconds(1));
    request.timeout("1s"); // same timeout, given as a string

    // Attach the JSON document and declare its content type
    request.source(UserJson, XContentType.JSON);

    // Send the request and receive the response
    IndexResponse indexResponse = client.index(request, GulimallElasticSearchConfig.COMMON_OPTIONS);

    System.out.println(indexResponse.toString());   
    System.out.println(indexResponse.status()); // matches the status of the REST call: CREATED

}

IndexResponse[index=hgw_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]

Bulk create

@Test
public void testBulkRequest() throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("10s");

    ArrayList<User> userList = new ArrayList<>();
    userList.add(new User("hwg1", "男", 18));
    userList.add(new User("hwg2", "男", 19));
    userList.add(new User("hwg3", "男", 20));
    userList.add(new User("hwg4", "男", 21));
    userList.add(new User("hly5", "女", 22));

    // Add one index request per user
    for (int i = 0; i < userList.size(); i++) {
        bulkRequest.add(new IndexRequest("hgw_index")
                .id(""+(i+1))
                .source(JSON.toJSONString(userList.get(i)), XContentType.JSON)
        );
    }
    BulkResponse bulk = client.bulk(bulkRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(bulk.hasFailures()); // false means every operation succeeded
}

1.8.2.4.2、Read

Get a document

// Get a document
@Test
public void testGetDocument() throws IOException {
    GetRequest getRequest = new GetRequest("hgw_index","1");
    GetResponse getResponse = client.get(getRequest,  GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(getResponse.getSourceAsString());    // print the document source
    System.out.println(getResponse);   // the full response, same as the REST call returns
}

Check whether a document exists

// Check whether a document exists: get /index/doc/1
@Test
public void testIsExists() throws IOException {
    GetRequest getRequest = new GetRequest("hgw_index", "1");
    // Do not fetch _source or stored fields; only existence matters
    getRequest.fetchSourceContext(new FetchSourceContext(false));
    getRequest.storedFields("_none_");

    boolean exists = client.exists(getRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(exists);
}

1.8.2.4.3、Update

Update a document

// Update a document
@Test
public void testUpdateDocument() throws IOException {
    UpdateRequest updateRequest = new UpdateRequest("hgw_index","1");
    updateRequest.timeout("1s");

    User user = new User("黄龚伟", "男",50);
    updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

    UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(updateResponse.status());
}

1.8.2.4.4、Delete

Delete a document

// Delete a document
@Test
public void testDeleteRequest() throws IOException {
    DeleteRequest deleteRequest = new DeleteRequest("hgw_index","1");
    deleteRequest.timeout("1s");

    DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(deleteResponse.status());
}

1.8.2.5、Search

1.8.2.5.1、Simple Search

/**
 * SearchRequest: the search request
 * SearchSourceBuilder: builds the search conditions
 * HighlightBuilder: builds highlighting
 * TermQueryBuilder: exact (term) query
 * MatchAllQueryBuilder: matches everything
 * XXXQueryBuilder: one builder per query type
 */
@Test
public void testSearch() throws IOException {
    // 1. Create the search request
    SearchRequest searchRequest = new SearchRequest("bank");

    // 2. Build the search source
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // 2.1) Build the query condition
    sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

    // 3. Attach the source to the request
    searchRequest.source(sourceBuilder);

    // 4. Execute the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSON.toJSON(searchResponse.getHits()));
    System.out.println("=====================");
    for (SearchHit hit : searchResponse.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }

}

1.8.2.5.2、Complex Search

For everyone whose address contains mill, compute the age distribution and the average balance

@Test
public void searchData() throws IOException {
    // 1. Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // Set the index
    searchRequest.indices("bank");
    // Build the DSL with the search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // 1.1) Build the query condition
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // Aggregate by the distribution of age values
    TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
    searchSourceBuilder.aggregation(ageAgg);
    // Aggregate the average balance
    AvgAggregationBuilder balanceAvg = AggregationBuilders.avg("balanceAvg").field("balance");
    searchSourceBuilder.aggregation(balanceAvg);
    
    searchRequest.source(searchSourceBuilder);

    // 2. Execute the search
    SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

    // 3. Analyze the result
    System.out.println(searchResponse.toString());

Converting hits to beans

 // 3.1) Get all hits
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit searchHit : searchHits) {
        String string = searchHit.getSourceAsString();
        Accout accout = JSON.parseObject(string, Accout.class);
        System.out.println("accout: " + accout);
    }
}

accout: GulimallSearchApplicationTests.Accout(account_number=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
accout: GulimallSearchApplicationTests.Accout(account_number=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
accout: GulimallSearchApplicationTests.Accout(account_number=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
accout: GulimallSearchApplicationTests.Accout(account_number=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)

Analyzing the buckets

    // 3.2) Get the aggregation results of this search
    Aggregations aggregations = searchResponse.getAggregations();
    Terms ageAgg1 = aggregations.get("ageAgg");
    for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
        System.out.println("age: "+ bucket.getKeyAsString());
    }

    Avg balanceAvg1 = aggregations.get("balanceAvg");
    System.out.println("average balance: " + balanceAvg1.value());

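1.9、Appendix: Installing Nginx

Start a throwaway nginx container, only to copy its configuration out:

[root@hgwtencent mydata]# docker run -p 80:80 --name nginx -d nginx:1.10

Copy the configuration files from the container to /mydata/nginx/conf/: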

[root@hgwtencent mydata]# mkdir -p /mydata/nginx/html
[root@hgwtencent mydata]# mkdir -p /mydata/nginx/logs
[root@hgwtencent mydata]# cd nginx
[root@hgwtencent mydata]# docker container cp nginx:/etc/nginx . 
[root@hgwtencent mydata]# mv nginx conf

Stop and remove the original container:

[root@hgwtencent nginx]# docker stop nginx
[root@hgwtencent nginx]# docker rm nginx
nginx

Create the new Nginx container with the following command:

docker run -p 80:80 --name nginx \
 -v /mydata/nginx/html:/usr/share/nginx/html \
 -v /mydata/nginx/logs:/var/log/nginx \
 -v /mydata/nginx/conf/:/etc/nginx \
 -d nginx:1.10

Make nginx start automatically with Docker:

[root@hgwtencent html]# docker update nginx --restart=always

Create the file "/mydata/nginx/html/index.html" and check that it can be accessed:
```
[root@hgwtencent html]# echo '<h1>Gulimall</h1>' >index.html
```
Open http://<IP of the nginx host>:80/index.html