ElasticSearch


希昂的学习日记


Column: ElasticSearch    Tags: elasticsearch restful

Article link: https://blog.csdn.net/guaituo0129/article/details/118199421


0. References

Official documentation: What is Elasticsearch? | Elasticsearch Guide [7.8] | Elastic

Bilibili video: [尚硅谷 (Atguigu)] Elasticsearch tutorial, beginner to expert (based on the ELK stack, Elasticsearch 7.8.x)

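1. Introduction

Elasticsearch is a distributed search and analytics engine built around JSON documents.

Its key traits: distributed, with RESTful search and analytics.

Official user guide: Elasticsearch Guide [7.8] | Elastic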

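2. Installation

Download the archive from the official website and simply unzip it locally.

Elasticsearch 7.8.0 | Elastic

After the download, go into the bin directory and run ./elasticsearch

Once it is running, open 127.0.0.1:9200 in a browser to check that ES started successfully.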

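3. Concepts

Analogy with a relational database:

index -> DB

doc -> record

(Types are deprecated in 7.x and removed in 8.x, so there is no longer a table-like layer between index and doc.)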

4. Hands-on

4.1 Index

Operation	Method	Example request	requestBody	response	Notes
Create	PUT	http://127.0.0.1:9200/shopping		{ "acknowledged": true, "shards_acknowledged": true, "index": "shopping" }
Define the index mapping	PUT	http://127.0.0.1:9200/shopping/_mapping	{ "properties":{ "name":{ "type":"text", "index":true }, "sex":{ "type":"keyword", "index":true }, "phone":{ "type":"text", "index":false } } }	{ "acknowledged": true }	1. type sets the field type (text is analyzed for full-text search, keyword is matched as an exact value) 2. index controls whether the field can be searched
Delete	DELETE	http://127.0.0.1:9200/shopping		{ "acknowledged": true }
View	GET	http://127.0.0.1:9200/shopping		{ "shopping": { "aliases": {}, "mappings": { "properties": { "color": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "price": { "type": "long" }, "size": { "type": "long" } } }, "settings": { "index": { "creation_date": "1623854266494", "number_of_shards": "1", "number_of_replicas": "1", "uuid": "UodmDyjPSaCoPTWMKIUiAA", "version": { "created": "7080099" }, "provided_name": "shopping" } } } }
List all	GET	http://127.0.0.1:9200/_cat/indices?v
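The GET response above shows dynamic mapping at work: string fields became text with a keyword sub-field (ignore_above 256), and integers became long. The Python sketch below imitates that inference for flat documents. infer_mapping is a hypothetical helper, not an Elasticsearch API, and it deliberately ignores date detection, nested objects and the other rules the real engine applies.

```python
def infer_mapping(doc: dict) -> dict:
    """Imitate ES dynamic mapping for a flat JSON document (hypothetical helper)."""
    properties = {}
    for field, value in doc.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, str):
            # strings get a full-text field plus an exact-match keyword sub-field
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
    return {"properties": properties}


doc = {"name": "apple", "price": 5299, "color": "blue", "size": 64}
print(infer_mapping(doc))
```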

4.2 Documents

Operation	Method	Example request	requestBody	response	Notes
Create (auto-generated id)	POST	http://127.0.0.1:9200/shopping/_doc	{ "name":"apple", "price":5299, "color":"blue", "size":64 }	{ "_index": "shopping", "_type": "_doc", "_id": "UKGDFXoB3JbWUSxLOAEF", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 1 }	1. If no mapping was defined for the index beforehand, one is created from this doc 2. Fields that were not seen before are also added to the index mapping
Create (with id)	POST	http://127.0.0.1:9200/shopping/_doc/1001	{ "name":"apple", "price":5299, "color":"blue", "size":64 }	{ "_index": "shopping", "_type": "_doc", "_id": "1001", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 2, "_primary_term": 1 }	This form also works as an update: it replaces the whole document
Create (with id)	PUT	http://127.0.0.1:9200/shopping/_create/1004	{ "name":"apple", "price":5299, "color":"blue", "size":64 }	{ "_index": "shopping", "_type": "_doc", "_id": "1004", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 8, "_primary_term": 1 }	Fails with a 409 conflict if a document with this id already exists
View	GET	http://127.0.0.1:9200/shopping/_doc/1004		{ "_index": "shopping", "_type": "_doc", "_id": "1004", "_version": 1, "_seq_no": 8, "_primary_term": 1, "found": true, "_source": { "name": "apple", "price": 5299, "color": "blue", "size": 64 } }
Delete	DELETE	http://127.0.0.1:9200/shopping/_doc/1004		{ "_index": "shopping", "_type": "_doc", "_id": "1004", "_version": 2, "result": "deleted", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 9, "_primary_term": 1 }
Query all	GET	http://127.0.0.1:9200/shopping/_doc/_search		{ "took": 28, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 4, "relation": "eq" }, "max_score": 1.0, "hits": [ { "_index": "shopping", "_type": "_doc", "_id": "1001", "_score": 1.0, "_source": { "name": "apple", "price": 5299, "color": "blue", "size": 64 } }, { "_index": "shopping", "_type": "_doc", "_id": "UKGDFXoB3JbWUSxLOAEF", "_score": 1.0, "_source": { "name": "apple", "price": 5299, "color": "blue", "size": 64 } }, { "_index": "shopping", "_type": "_doc", "_id": "1002", "_score": 1.0, "_source": { "name": "apple", "price": 5299, "color": "blue", "size": 64 } }, { "_index": "shopping", "_type": "_doc", "_id": "1003", "_score": 1.0, "_source": { "name": "apple", "price": 5299, "color": "blue", "size": 64 } } ] } }	In 7.x, http://127.0.0.1:9200/shopping/_search is the typeless equivalent
Update (partial)	POST	http://127.0.0.1:9200/shopping/_update/1001	{ "doc": { "name": "Myapple" } }	{ "_index": "shopping", "_type": "_doc", "_id": "1001", "_version": 11, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 15, "_primary_term": 1 }	Only the fields inside doc are changed; the other fields are kept
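The difference between indexing a document with an id (whole-document replacement) and the _update endpoint (field merge) can be modeled in memory. This is a minimal sketch under simplifying assumptions: ToyIndex is a hypothetical class that tracks only _source and _version, and it mirrors the fact that ES reports "noop" and skips the write when an _update changes nothing.

```python
class ToyIndex:
    """In-memory stand-in for one index; models only _source and _version."""

    def __init__(self):
        self.docs = {}  # doc id -> {"_version": int, "_source": dict}

    def index(self, doc_id, source):
        """Like POST /shopping/_doc/{id}: the whole document is replaced."""
        version = self.docs.get(doc_id, {"_version": 0})["_version"] + 1
        self.docs[doc_id] = {"_version": version, "_source": dict(source)}
        return "created" if version == 1 else "updated"

    def update(self, doc_id, partial):
        """Like POST /shopping/_update/{id} with {"doc": partial}: fields are merged."""
        current = self.docs[doc_id]  # a KeyError here plays the role of a 404
        merged = {**current["_source"], **partial}
        if merged == current["_source"]:
            return "noop"  # nothing changed, so no new version is written
        current["_source"] = merged
        current["_version"] += 1
        return "updated"


idx = ToyIndex()
idx.index("1001", {"name": "apple", "price": 5299, "color": "blue", "size": 64})
print(idx.update("1001", {"name": "Myapple"}), idx.docs["1001"])  # other fields kept
print(idx.index("1001", {"name": "pear"}), idx.docs["1001"])       # whole _source replaced
```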

4.3 Advanced document queries

Operation	Method	Example request	requestBody	response	Notes
Query	GET	http://127.0.0.1:9200/shopping/_doc/_search	{ "query": { "match":{ "name":"apple" } }, "from": 2, "size": 10, "_source": [ "name", "_id" ], "sort": { "_id": { "order": "asc" } } }	{ "took": 14, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 4, "relation": "eq" }, "max_score": null, "hits": [ { "_index": "shopping", "_type": "_doc", "_id": "1003", "_score": null, "_source": { "name": "apple" }, "sort": [ "1003" ] }, { "_index": "shopping", "_type": "_doc", "_id": "UKGDFXoB3JbWUSxLOAEF", "_score": null, "_source": { "name": "apple" }, "sort": [ "UKGDFXoB3JbWUSxLOAEF" ] } ] } }	1. query: the search criteria 2. from, size: pagination (from is the offset, size the page size) 3. _source: limits the fields returned 4. sort: the sort order
Query (multiple conditions & range)	GET	http://127.0.0.1:9200/shopping/_doc/_search	{ "query": { "bool": { "must": [ { "match": { "name": "apple" } }, { "match": { "name": "apple" } } ], "filter": { "range": { "price": { "gt": 5200 } } } } } }		must in query.bool means AND: every condition has to match. If only some conditions need to match, use should instead of must. filter: here a range condition on price; filter clauses do not affect the score
Full-text query & highlighting	GET	http://127.0.0.1:9200/shopping/_doc/_search	{ "query": { "match": { "name": "小华" } }, "highlight": { "fields": { "name": {} } } }		match is a full-text query: the query text is analyzed, and a doc matches as soon as any of its terms match. To require the terms as one adjacent phrase, use match_phrase instead. highlight lists the fields to highlight
Aggregation	GET	http://127.0.0.1:9200/shopping/_doc/_search	{ "query":{ "match":{ "name":"apple" } }, "aggs":{ "myAggr":{ "terms":{ "field":"price" } } } }		aggs: aggregations. terms groups docs by field value; metric aggregations such as avg and sum are also available. The top-level size controls how many of the original hits are returned and can be set to 0 to return only the aggregation results
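To see what each clause in this table does, the query features can be mimicked on a small in-memory list. This Python sketch is a simplified, hypothetical model: the helper names (match_query, range_gt, bool_query, paginate, terms_agg) are made up, the "analyzer" is only lowercasing plus whitespace splitting, and no relevance score is computed.

```python
from collections import Counter


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on whitespace."""
    return set(str(text).lower().split())


def match_query(field, query_text):
    """match: a doc matches if any analyzed query term occurs in the field."""
    return lambda doc: bool(analyze(query_text) & analyze(doc.get(field, "")))


def range_gt(field, bound):
    """range query with only "gt" set."""
    return lambda doc: field in doc and doc[field] > bound


def bool_query(docs, must=(), should=(), filters=()):
    """bool query without scoring.

    must and filters: every clause has to match (AND); in ES, filters do not score.
    should: with no must/filter clauses, at least one should clause has to match;
    otherwise should clauses are optional and in ES only raise the score.
    """
    def matches(doc):
        if not all(c(doc) for c in must) or not all(f(doc) for f in filters):
            return False
        if should and not must and not filters:
            return any(s(doc) for s in should)
        return True

    return [doc for doc in docs if matches(doc)]


def paginate(hits, from_=0, size=10):
    """from/size: skip `from_` hits, then return at most `size` of them."""
    return hits[from_:from_ + size]


def terms_agg(hits, field):
    """terms aggregation: doc count per distinct value, over ALL matching docs."""
    return dict(Counter(hit[field] for hit in hits if field in hit))


docs = [
    {"name": "apple", "price": 5299},
    {"name": "apple", "price": 4999},
    {"name": "Myapple", "price": 5299},
    {"name": "red apple", "price": 6999},
]
hits = bool_query(docs, must=[match_query("name", "apple")], filters=[range_gt("price", 5200)])
print(len(hits), terms_agg(hits, "price"), paginate(hits, from_=1, size=10))
```

Note that "Myapple" does not match "apple" here; the standard analyzer would likewise keep it as the single term myapple.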

5. Java API

Code on GitHub: https://github.com/user0819/esDemo.git

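6. Cluster deployment

1. Prepare several node instances (here, three copies of ES on one machine, each with its own ports).

2. Edit config/elasticsearch.yml for each node.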

Node 1 (node-1001, started first)

# Cluster name
cluster.name: my-application

# Node name and roles
node.name: node-1001
node.master: true
node.data: true

# Network and ports
network.host: localhost
http.port: 9001
transport.tcp.port: 9301

http.cors.enabled: true
http.cors.allow-origin: "*"

Node 2 (node-1002)

# Cluster name
cluster.name: my-application

# Node name and roles
node.name: node-1002
node.master: true
node.data: true

# Network and ports
network.host: localhost
http.port: 9002
transport.tcp.port: 9302

# Other cluster nodes to contact on startup (seed hosts)
discovery.seed_hosts: ["localhost:9301"]

http.cors.enabled: true
http.cors.allow-origin: "*"

Node 3 (node-1003)

# Cluster name
cluster.name: my-application

# Node name and roles
node.name: node-1003
node.master: true
node.data: true

# Network and ports
network.host: localhost
http.port: 9003
transport.tcp.port: 9303

# Other cluster nodes to contact on startup (seed hosts)
discovery.seed_hosts: ["localhost:9301","localhost:9302"]

http.cors.enabled: true
http.cors.allow-origin: "*"

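3. Start the nodes in order: in each node's bin directory, run

./elasticsearch

4. Once all nodes have started, check the cluster health (status green means all primary and replica shards are allocated):

http://127.0.0.1:9001/_cluster/health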

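7. Advanced topics

7.1 Core concepts

Index

A collection of documents that share broadly similar characteristics.

Type

Multiple types could be created within one index.

Overall database analogy:

index -> database

type -> table

doc -> record

However, the type concept has since been removed.

Document

Each concrete record, i.e., an actual piece of data, in an index.

Field

An attribute of the documents in an index.

Mapping

Definitions and constraints on how fields are handled (field type, whether a field is indexed, which analyzer to use, and so on).

Shards

An index's data is split into shards, which are spread across the nodes of the cluster; this is horizontal partitioning of the data.

Replicas

Backup copies of the shards; a replica is never placed on the same node as its primary.

Allocation

The process of assigning shards (primaries and replicas) to nodes, decided by the master node.

7.2 System architecture

A running Elasticsearch instance is called a node. A cluster consists of one or more connected nodes.

A cluster has one master node, which manages the cluster.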

7.3 Creating an index (shards & replicas)

Creating the index:

Operation	Method	Example request	requestBody	response	Notes
Create index	PUT	http://127.0.0.1:9001/my_index	{ "settings":{ "number_of_shards":3, "number_of_replicas":1 } }	{ "acknowledged": true, "shards_acknowledged": true, "index": "my_index" }

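After creation, check how the primary shards and replicas are distributed across the nodes (in the original screenshot, primary shards were shown in bold).

Changing the replica count

Once an index has been created, its number of primary shards can no longer be changed, because document routing depends on it; the number of replicas, however, can be changed at any time.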

Operation	Method	Example request	requestBody	response	Notes
Change the index's replica count	PUT	http://127.0.0.1:9001/my_index/_settings	{ "number_of_replicas":2 }	{ "acknowledged": true }

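Node failure

With replicas in place, the cluster keeps serving requests when any single node goes down, as long as every shard still has a copy on a surviving node.

Routing & shard control

When a doc is saved, how is the shard that stores it determined?

Routing formula: shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id.

Reads can be served by any copy of the shard (primary or replica). Older versions picked the copies in round-robin order; since 7.0 the default is adaptive replica selection, which favors the copy on the least busy node.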
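The routing formula above, and the reason the primary shard count is fixed, can be sketched in a few lines. Assumptions: zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses (the real 7.x computation also folds in index.number_of_routing_shards), and the round-robin part illustrates the simple rotation over shard copies, with made-up copy labels.

```python
import zlib
from itertools import cycle


def route(doc_id: str, num_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards, with _routing = doc id.

    zlib.crc32 is only a stand-in for the Murmur3 hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


ids = ["1001", "1002", "1003", "1004"]
print({doc_id: route(doc_id, 3) for doc_id in ids})

# The same id can land on a different shard once the shard count changes,
# which is why the number of primary shards is fixed after index creation.
print([doc_id for doc_id in ids if route(doc_id, 3) != route(doc_id, 4)])

# Reads: take turns over the copies of one shard (round-robin).
copies = cycle(["shard0 primary @ node-1001", "shard0 replica @ node-1002"])
print([next(copies) for _ in range(4)])
```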

Write flow

Read flow

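8. Advanced topics (continued)

8.1 Inverted index

Concept:

In a database we usually create an index on the records of a table and then look data up through that index, which speeds up queries. The structure looks like this:

index	name	age	tags
1	hejiong	47	host, teacher
2	wanghan	48	host, politician

But if we want to find records by the tag host, we can only scan the whole table, which is much slower.

An inverted index solves this problem: each record is tokenized, and a table mapping each term to the records that contain it is built.

That way we can go quickly from a keyword to the record indexes, and from those to the full records.

The structure looks like this:

keyword	index
host	1, 2
teacher	1
politician	2

The core of the inverted index: instead of using an index to find a record's content, we use a keyword to find the indexes.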
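The two tables above translate directly into code. This minimal Python sketch builds the keyword-to-index table from the example records; build_inverted_index is a hypothetical helper, and it treats each tag as one term instead of running a real analyzer.

```python
from collections import defaultdict

records = {
    1: {"name": "hejiong", "age": 47, "tags": ["host", "teacher"]},
    2: {"name": "wanghan", "age": 48, "tags": ["host", "politician"]},
}


def build_inverted_index(records, field):
    """Map each term (here: each tag) to the sorted ids of the records containing it."""
    postings = defaultdict(set)
    for rec_id, rec in records.items():
        for term in rec[field]:
            postings[term].add(rec_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(records, "tags")
print(index)  # {'host': [1, 2], 'teacher': [1], 'politician': [2]}

# keyword -> record ids -> full records, with no full-table scan
print([records[i]["name"] for i in index.get("host", [])])  # ['hejiong', 'wanghan']
```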

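Tokenization:

How is the inverted table built?

Not every word or character of a doc gets its own entry; each word or phrase that does is called a token (term).

Splitting a doc into tokens is the job of an analyzer, and different analyzers split text differently.

Term

All the tokens stored in ES are called terms.

Term dictionary

The collection of all terms in the index; a search first looks its query terms up here.

Posting list

The inverted table shown above, i.e., for each term the list of records that contain it, is the posting list.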

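8.2 Document search

Once an inverted index has been written, it is not changed in place.

New data is stored in new index segments, and deleted documents are only marked as deleted (logical deletion).

This lasts until the segments are periodically merged in bulk, at which point the deleted documents are physically removed.

(Open question, left for later.)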
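The immutable-segment behavior described above can be modeled with a toy class. Assumptions: SegmentedIndex is hypothetical, segments are plain tuples of (doc_id, text), and deletes are kept in one global set, whereas Lucene tracks deleted docs per segment.

```python
class SegmentedIndex:
    """Toy index: immutable segments plus a set of logically deleted doc ids."""

    def __init__(self):
        self.segments = []    # each segment is a tuple of (doc_id, text); never modified
        self.deleted = set()  # deletes only mark ids here

    def add_segment(self, docs):
        self.segments.append(tuple(docs))  # new data goes into a new segment

    def delete(self, doc_id):
        self.deleted.add(doc_id)  # logical delete: the segments stay untouched

    def search(self, word):
        # every segment is searched; marked docs are filtered out of the results
        return [doc_id
                for seg in self.segments
                for doc_id, text in seg
                if word in text.split() and doc_id not in self.deleted]


seg_index = SegmentedIndex()
seg_index.add_segment([("1", "red apple"), ("2", "green apple")])
seg_index.add_segment([("3", "apple pie")])
seg_index.delete("2")
print(seg_index.search("apple"))                # ['1', '3']
print(sum(len(s) for s in seg_index.segments))  # 3: doc 2 is still stored
```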

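8.3 Saving documents

Overview:

Saving a document involves these main steps:

1. The client connects to any node in the cluster (the coordinating node), which uses routing to work out the target primary shard: hash(id) % count

2. The data is written to the primary shard.

3. The primary shard replicates the data to its replica shards.

So the total write time is: primary shard latency + replica write latency.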

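Writing to the primary shard in detail:

1. Index data and segment information are first built in memory.

2. The in-memory data is then flushed to disk.

3. To avoid losing data that is still only in memory, a translog file is introduced: every operation is also appended to the translog, which is persisted on disk. Once the data has been safely committed to disk (a flush), the translog is cleared.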

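Making new data visible sooner:

If data had to reach disk before other clients could see it, there would be a long delay.

To close this gap, an OS cache (filesystem cache) layer is used: as soon as a segment has been written to it (a refresh, every 1 second by default), other clients can search it.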
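The buffer, refresh, flush and translog steps above fit together as follows. This is a toy model, not Elasticsearch's implementation: ToyShard and its method names are hypothetical, and it assumes the translog is fsynced on every request (the 7.x default for index.translog.durability), so translog entries survive a power loss.

```python
class ToyShard:
    """Toy write path of one shard: indexing buffer -> refresh -> flush, plus translog."""

    def __init__(self):
        self.buffer = []      # in-memory indexing buffer: not searchable yet
        self.searchable = []  # segments in the OS cache after a refresh
        self.committed = []   # segments fsynced to disk by a flush
        self.translog = []    # every operation is also recorded here

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """ES does this every 1 s by default: the buffer becomes a searchable segment."""
        if self.buffer:
            self.searchable.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        """Commit all segments to disk; the translog entries are then no longer needed."""
        self.refresh()
        self.committed = [list(seg) for seg in self.searchable]
        self.translog.clear()

    def search(self):
        return [doc for seg in self.searchable for doc in seg]

    def power_loss_and_recover(self):
        """Memory and the OS cache are lost; committed segments and the translog survive."""
        self.buffer = []
        self.searchable = [list(seg) for seg in self.committed]
        for doc in self.translog:  # replay operations that were not committed yet
            self.buffer.append(doc)
        self.refresh()


shard = ToyShard()
shard.index({"id": 1})
print(shard.search())  # [] until the next refresh
shard.refresh()
print(shard.search())  # [{'id': 1}]
shard.flush()
shard.index({"id": 2})
shard.refresh()
shard.power_loss_and_recover()
print(shard.search())  # [{'id': 1}, {'id': 2}]
```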

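Segment merging:

As mentioned above, once an index segment is created it is not modified.

So new data is stored as new segments. Creating a segment for every single new document would produce far too many small segments, so the segments are merged in the background.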
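A merge can be sketched as copying the live docs of several segments into one new segment. Assumptions: merge is a hypothetical helper that merges everything at once, segments are tuples of (doc_id, text), and deletes are a set of ids; real Lucene merges pick which segments to combine through a merge policy.

```python
def merge(segments, deleted):
    """Merge several immutable segments into one new segment.

    Logically deleted docs are simply not copied, which is when their space is
    reclaimed; afterwards no delete markers are pending.
    """
    merged = tuple(doc for seg in segments for doc in seg if doc[0] not in deleted)
    return [merged], set()


segments = [
    (("1", "red apple"),),
    (("2", "green apple"),),
    (("3", "apple pie"),),
]
merged, pending = merge(segments, deleted={"2"})
print(merged)   # [(('1', 'red apple'), ('3', 'apple pie'))]
print(pending)  # set()
```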

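8.4 Document analysis

Analyzer:

An analyzer is made of character filters, a tokenizer and token filters, applied in that order.

Using the default (standard) analyzer: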

Operation	Method	Example request	requestBody	response	Notes
Analyze text	GET	http://127.0.0.1:9200/_analyze	{ "text": "my name is god", "analyzer":"standard" }	{ "tokens": [ { "token": "my", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0 }, { "token": "name", "start_offset": 3, "end_offset": 7, "type": "<ALPHANUM>", "position": 1 }, { "token": "is", "start_offset": 8, "end_offset": 10, "type": "<ALPHANUM>", "position": 2 }, { "token": "god", "start_offset": 11, "end_offset": 14, "type": "<ALPHANUM>", "position": 3 } ] }
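The three analyzer stages and the offsets/positions in the response above can be reproduced with a short pipeline. This is a sketch under stated assumptions: analyze_text is a hypothetical helper, the regex \w+ only approximates the Unicode-based standard tokenizer, every token is labeled <ALPHANUM>, and offsets are not corrected when a char filter changes the text length.

```python
import re


def analyze_text(text, char_filters=(), token_filters=(str.lower,)):
    """Analyzer pipeline sketch: char filters -> tokenizer -> token filters.

    Defaults approximate the standard analyzer: no char filters, a word
    tokenizer and a lowercase token filter.
    """
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = []
    for position, m in enumerate(re.finditer(r"\w+", text)):
        term = m.group()
        for token_filter in token_filters:
            term = token_filter(term)
        tokens.append({
            "token": term,
            "start_offset": m.start(),
            "end_offset": m.end(),
            "type": "<ALPHANUM>",  # the real tokenizer also emits e.g. <NUM>
            "position": position,
        })
    return {"tokens": tokens}


result = analyze_text("my name is god")
print(result)
```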

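Using the IK analyzer:

Installation:

First download the IK analyzer plugin: https://github.com/medcl/elasticsearch-analysis-ik/releases

Pick the release that matches your ES version and put it (unzipped) into the plugins directory of ES.

Restart ES.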

Usage / check

Operation	Method	Example request	requestBody	response	Notes
Analyze text	GET	http://127.0.0.1:9200/_analyze	{ "text": "我是上帝", "analyzer":"ik_max_word" }	{ "tokens": [ { "token": "我", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 }, { "token": "是", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 }, { "token": "上帝", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 } ] }	The text means "I am God". ik_max_word gives the finest-grained split; use "ik_smart" instead for a coarser split

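Custom words

1. Go to the config directory of the IK plugin: /plugins/elasticsearch-analysis-ik-7.8.0/config

2. Create a custom dictionary file and add your words, one per line:

vi custom.dic

3. Register the custom extension dictionary in IKAnalyzer.cfg.xml:

vi IKAnalyzer.cfg.xml

4. Restart ES and check the result.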
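Why a dictionary entry changes the output can be illustrated with forward maximum matching, a classic dictionary-based segmentation method. It is swapped in here only for illustration: IK's ik_max_word/ik_smart logic is more involved, forward_max_match is a hypothetical helper, and the tiny dictionaries stand in for IK's main dictionary and a custom.dic entry.

```python
def forward_max_match(text, dictionary, max_len=4):
    """At each position take the longest dictionary word, else a single character."""
    tokens, i = [], 0
    while i < len(text):
        # try the longest candidate first, down to a single character
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append({
                    "token": word,
                    "start_offset": i,
                    "end_offset": i + size,
                    "type": "CN_WORD" if size > 1 else "CN_CHAR",
                    "position": len(tokens),
                })
                i += size
                break
    return tokens


main_dict = {"上帝"}
print(forward_max_match("我是上帝", main_dict))
# Adding a word, as custom.dic does, keeps it together as one token.
print(forward_max_match("尚硅谷", main_dict))
print(forward_max_match("尚硅谷", main_dict | {"尚硅谷"}))
```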

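8.5 Document control (handling conflicts)

Use a lock: pessimistic or optimistic.

Optimistic locking: compare-and-set (CAS), i.e., send the version you read along with the update (if_seq_no and if_primary_term).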

Operation	Method	Example request	requestBody	response	Notes
Update with a version check	POST	http://127.0.0.1:9200/shopping/_doc/1003?if_seq_no=5&if_primary_term=1	{ "name":"apple", "price":5299, "color":"blue", "size":64 }	{ "_index": "shopping", "_type": "_doc", "_id": "1003", "_version": 6, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 6, "_primary_term": 1 }	If the doc's current _seq_no or _primary_term differs from the values sent, the request fails with a 409 version conflict
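The compare-and-set flow with if_seq_no and if_primary_term can be simulated with two clients that read the same version. ToyDocStore and VersionConflict are hypothetical names; the model keeps one sequence counter for the whole shard, as ES does, but ignores replication and primary-term changes.

```python
class VersionConflict(Exception):
    """Plays the role of ES's 409 version_conflict_engine_exception."""


class ToyDocStore:
    """One shard's docs with sequence numbers, to illustrate if_seq_no/if_primary_term."""

    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self.seq_no = -1  # last sequence number handed out on this shard
        self.docs = {}    # doc id -> (source, seq_no, primary_term)

    def get(self, doc_id):
        source, seq_no, term = self.docs[doc_id]
        return {"_source": source, "_seq_no": seq_no, "_primary_term": term}

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None or if_primary_term is not None:
            current = self.docs.get(doc_id)
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(f"[{doc_id}]: version conflict")
        self.seq_no += 1  # every successful write gets a new sequence number
        self.docs[doc_id] = (dict(source), self.seq_no, self.primary_term)
        return {"_id": doc_id, "_seq_no": self.seq_no, "_primary_term": self.primary_term}


store = ToyDocStore()
store.index("1003", {"name": "apple", "price": 5299})
a = store.get("1003")  # client A reads
b = store.get("1003")  # client B reads the same version
store.index("1003", {"name": "apple", "price": 4999},
            if_seq_no=a["_seq_no"], if_primary_term=a["_primary_term"])
conflict = False
try:
    store.index("1003", {"name": "apple", "price": 5999},
                if_seq_no=b["_seq_no"], if_primary_term=b["_primary_term"])
except VersionConflict as err:
    conflict = True
    print("client B rejected:", err)  # B must re-read and retry
```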

9. Spring Data integration

GitHub - user0819/es-spring-data

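10. Optimization

10.1 Disk choice

ES storage and queries depend heavily on disk I/O, so choosing fast, large disks (e.g., SSDs) is one way to improve performance.

10.2 Creating a sensible number of shards

Principles:

The disk space used by each shard should not exceed the maximum JVM heap configured for ES (disk data is read into memory, so if memory is smaller than the data, ES keeps going back to the disk)
The number of shards should preferably not exceed the number of nodes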

10.3 Batch (bulk) submission

Send writes in batches through the _bulk API instead of one request per document.
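A _bulk request body is newline-delimited JSON: an action line, then the document source, for every document, with a trailing newline at the end. The Python sketch below only builds such a body; sending it is assumed to be a POST to http://127.0.0.1:9200/_bulk with the header Content-Type: application/x-ndjson. bulk_body is a hypothetical helper, and the ids 1005/1006 and the "pear" doc are illustrative.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk.

    Each document needs an action line followed by its source line, and the
    whole body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body("shopping", [
    ("1005", {"name": "apple", "price": 5299}),
    ("1006", {"name": "pear", "price": 3999}),
])
print(body)
```

One request now carries both documents; in practice, batch sizes are tuned by measuring throughput.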

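10.4 Memory settings

The default heap size is 1 GB, set in config/jvm.options:

cd config/
cat jvm.options

Memory sizing principles:

Do not give the heap more than half of the physical memory (the OS cache needs the rest to cache data and speed up storage and queries)
Keep each node's JVM heap below 32 GB (in practice around 31 GB or less, so the JVM can keep using compressed object pointers)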
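The two sizing rules above reduce to a small calculation. recommended_heap_gb is a hypothetical helper; the 31 GB cap is a common rule of thumb, since the exact compressed-oops threshold depends on the JVM. The printed -Xms/-Xmx pair follows the usual advice of setting the initial and maximum heap to the same value in jvm.options.

```python
def recommended_heap_gb(physical_ram_gb, cap_gb=31):
    """Half of physical RAM, capped just below 32 GB (compressed oops)."""
    return min(physical_ram_gb / 2, cap_gb)


for ram in (4, 16, 64, 128):
    heap = recommended_heap_gb(ram)
    # jvm.options should set the same value for the initial and maximum heap
    print(f"RAM {ram} GB -> -Xms{heap:g}g -Xmx{heap:g}g")
```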

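10.5 Important settings

cluster.name  cluster name
node.name  node name
node.master  whether the node can be elected master
index.number_of_shards  number of primary shards of an index (default 1 in 7.x)
index.number_of_replicas  number of replicas of an index (default 1)
(The two index.* settings are set per index or through index templates, not in elasticsearch.yml.)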

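11. Interview questions

11.1 Why use ES?

Its full-text search is much faster than a traditional fuzzy query (e.g., SQL LIKE '%keyword%')

11.2 How does ES elect a master?

Sort the candidate nodes by node ID and vote on the first one
If it gets more than half of the votes, it becomes master
Otherwise, vote on whether the next node can become master
(This describes the Zen discovery used before 7.0; 7.x replaces it with a new cluster coordination subsystem.)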
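The simplified election steps above can be traced with a toy function. elect_master and its votes input are hypothetical; real Zen discovery gathers votes through pings, and this sketch only checks the "more than half, in node-id order" rule.

```python
def elect_master(node_ids, votes):
    """Toy version of the steps above.

    node_ids: all master-eligible node ids.
    votes: candidate id -> set of node ids voting for it (hypothetical input).
    Candidates are tried in node-id order; the first with more than half wins.
    """
    for candidate in sorted(node_ids):
        if len(votes.get(candidate, set())) > len(node_ids) / 2:
            return candidate
    return None  # no majority: no master yet, the election is retried later


nodes = ["node-1003", "node-1001", "node-1002"]
votes = {
    "node-1001": {"node-1001"},                            # 1 of 3: not enough
    "node-1002": {"node-1001", "node-1002", "node-1003"},  # 3 of 3
}
print(elect_master(nodes, votes))  # node-1002
```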

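11.3 How is split brain in ES solved?

Split brain: for some reason the cluster wrongly decides that the master is down and elects another master, so several masters exist and the nodes cannot tell which one to follow.

Causes of split brain:

1. Network problems

2. The master is both a master and a data node, so while busy handling data it fails to respond in time and is wrongly judged to be offline

3. Garbage-collection pauses make it stop responding

Solutions:

1. Reduce false positives: increase discovery.zen.ping_timeout to give the master enough time to respond

2. Election threshold: a master election only takes place once discovery.zen.minimum_master_nodes master-eligible nodes are present. Raising this value keeps elections correct (recommended: more than half of the master-eligible nodes, i.e., n/2 + 1). In 7.x this setting is ignored and the voting quorum is managed automatically.

3. Role separation: dedicated master nodes act only as master, not as data nodes: node.master: true, node.data: false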
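Why "more than half" prevents split brain is plain arithmetic: if both sides of a split needed at least n // 2 + 1 nodes, together they would need more than n. The sketch below checks this for small clusters; quorum and can_elect are hypothetical helper names.

```python
def quorum(master_eligible: int) -> int:
    """Recommended minimum_master_nodes: a strict majority, n // 2 + 1."""
    return master_eligible // 2 + 1


def can_elect(partition_size: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it still reaches the quorum."""
    return partition_size >= quorum(master_eligible)


# Three master-eligible nodes split 2 | 1 by a network failure.
print(quorum(3), can_elect(2, 3), can_elect(1, 3))  # 2 True False

# With a majority quorum, no split of n nodes lets both sides elect a master.
print(all(not (can_elect(a, n) and can_elect(n - a, n))
          for n in range(1, 8) for a in range(n + 1)))  # True
```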

...
