[Boxuegu Study Notes] A Thorough Summary, Shared with Care | [Advanced Service Frameworks] Elasticsearch

Original article: https://blog.csdn.net/weixin_43643834/article/details/127477467

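Basics

Clusters and Nodes

A cluster is a collection of one or more ES nodes, and every cluster has a unique name.
Every node has its own name and joins a cluster by specifying that cluster's name (cluster.name). A node is an independent server that stores data and takes part in the cluster's indexing and search.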

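Concepts

Index: roughly equivalent to a database in SQL.
Type: roughly equivalent to a table in SQL. Starting with 6.0, an index can contain only one type, and the official guidance is to stop creating multiple types per index. Later versions remove types altogether; see the official documentation for details. With types gone, an index is better compared to a single SQL table.
Document: equivalent to a row (one record) in SQL.
Field: equivalent to a column in SQL.
Shard: a single machine cannot store a very large amount of data, so ES can split the data of one index into multiple shards stored on several servers. Shards let you scale out horizontally: you can store more data, and search and analytics operations run in parallel on multiple servers, which raises throughput and performance. Each shard is a Lucene index.
Replica shard: a replica takes over when a shard fails, so no data is lost. Multiple replicas also increase search throughput and performance.
Mapping: defines the type of each field in the index, along with other per-field indexing options. It can be defined in advance or inferred automatically from the first document stored. This is similar to declaring column types when creating a MySQL table.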
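
To see how a document ends up on a particular shard, it helps to know the routing rule. The classic formula documented for the _routing field is shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. Newer versions refine this formula to support the _split API. The Python sketch below assumes this simplified rule. It uses zlib.crc32 as a stand-in for the Murmur3 hash that Elasticsearch actually uses, so its shard numbers will not match a real cluster. The document IDs are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (the document _id by default).

    Stand-in hash: real Elasticsearch uses Murmur3, not CRC32, so the concrete
    shard numbers here will not match a real cluster; the modulo idea is the point.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document IDs

# For a fixed shard count, the same ID always maps to the same shard.
first = shard_for("doc-42", 5)
second = shard_for("doc-42", 5)

# Changing the number of primaries remaps most documents to other shards.
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would change shard going from 5 to 6 primaries")
```

This modulo dependency is why the primary shard count is fixed once an index exists. Changing it would send existing documents to the wrong shards.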

Specify the shard settings manually when creating the index:

PUT index
{
  "settings": {
    "number_of_shards": "3",
    "number_of_replicas": "1",
    "refresh_interval": "30s"
  }
}

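Before 7.0, an index defaults to 5 primary shards (7.0 changed the default to 1). The number of primary shards cannot be changed on an existing index: you have to reindex into a new index, or use the _shrink/_split APIs, which also produce a new index. Replica shards default to 1 per primary, and their number can be changed at any time.
The defaults are usually fine. With the pre-7.0 defaults that means 5 primary shards, each with one replica shard, so every index has 10 shards in total.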
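
The shard arithmetic in the two lines above can be sketched as a tiny Python helper. The function name is hypothetical and not part of any Elasticsearch client. Every primary shard counts once, plus one copy per replica.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies in an index: each primary plus its replicas."""
    if primaries < 1 or replicas_per_primary < 0:
        raise ValueError("need at least 1 primary and a non-negative replica count")
    return primaries * (1 + replicas_per_primary)


# Pre-7.0 defaults: 5 primaries with 1 replica each.
print(total_shards(5, 1))  # 10
# The index settings example above: 3 primaries with 1 replica each.
print(total_shards(3, 1))  # 6
```

Replica copies only get allocated when there are enough nodes. ES never places a replica on the same node as its primary, so on a single-node cluster the replicas stay unassigned.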

Changing the number of replica shards

PUT index/_settings
{
  "number_of_replicas": "2",
  "refresh_interval": "30s"
}
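
The same settings update can also be sent from code. The sketch below is a minimal example using only the Python standard library. It assumes a node reachable at http://localhost:9200 with security disabled and an index literally named index, as in the request above. It builds the request without sending it. Since 6.0, ES requires the Content-Type: application/json header on requests with a JSON body.

```python
import json
import urllib.request


def build_replica_update(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request changing the replica count."""
    body = {"number_of_replicas": replicas, "refresh_interval": "30s"}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_replica_update("http://localhost:9200", "index", 2)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/index/_settings
# To actually send it to a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```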

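Creating a mapping:

Once a mapping has been created, the type of an existing field cannot be changed; you can only add new fields. Changing a field's type means creating a new index and reindexing the data into it. Plan the mapping carefully up front, just as you would design a MySQL table schema before creating the table.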
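
The "add fields, never retype them" rule can be made concrete with a small pre-flight check. The helper below is hypothetical, not an Elasticsearch API. It assumes the existing and proposed mappings are plain dicts of top-level properties and only compares their type. Elasticsearch itself rejects a real update that changes a field's type with an error.

```python
from typing import Dict, List


def conflicting_fields(existing: Dict[str, dict], proposed: Dict[str, dict]) -> List[str]:
    """Return fields whose type would change; an empty list means the update only adds fields.

    Simplification: only top-level properties and their "type" are compared.
    """
    conflicts = []
    for name, spec in proposed.items():
        old = existing.get(name)
        if old is not None and old.get("type") != spec.get("type"):
            conflicts.append(name)
    return sorted(conflicts)


existing = {"title": {"type": "text"}, "price": {"type": "double"}}

print(conflicting_fields(existing, {"tags": {"type": "keyword"}}))   # []
print(conflicting_fields(existing, {"price": {"type": "keyword"}}))  # ['price']
```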

Production Configuration

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#Cluster name; defaults to elasticsearch
cluster.name: es_prod
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#Node name
node.name: docker_node1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#Directory where ES stores its data; defaults to es_home/data
path.data: /var/data/elasticsearch
#
# Path to log files:
#Directory where ES writes its logs; defaults to es_home/logs
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#Lock the process memory so the OS cannot swap Elasticsearch's memory out, i.e. keep ES off the swap partition
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#IP address ES binds to; the default is 127.0.0.1, so by default ES is reachable only via 127.0.0.1 or localhost. 0.0.0.0 binds to all interfaces
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#Custom HTTP port for ES; defaults to 9200
#If several ES nodes are started on the same server, each additional node takes the next port up by default: 9200, 9201, 9202...
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#Address that other nodes use to connect to this node; detected automatically if not set
network.publish_host: 172.16.16.179
#Nodes are discovered through this list of addresses to form the cluster
discovery.zen.ping.unicast.hosts: ["172.16.16.179:9300","172.16.16.179:9301","172.16.16.178:9302"]
#discovery.zen.ping_timeout: 60s
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
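#Number of master-eligible nodes that must be able to communicate during master election; set it to (master-eligible nodes / 2, rounded down) + 1 to prevent split brain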
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:

#gateway.recover_after_nodes: 3
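#Once this many nodes have joined the cluster, start data recovery immediately
gateway.expected_nodes: 3
#If the expected number of nodes has not joined, how long to wait before recovering anyway
gateway.recover_after_time: 1m
#Data recovery is allowed only after N nodes of the cluster have started; the default is 1
gateway.recover_after_nodes: 2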
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
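#Node roles: this node is master-eligible and also holds data
node.master: true
node.data: true
#Enable CORS so browser-based clients can call the HTTP API ("*" allows any origin)
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization,X-Requested-With,Content-Length,Content-Type
#Disable X-Pack features that are not used
xpack.ml.enabled: false
xpack.security.enabled: false
xpack.monitoring.enabled: false
xpack.graph.enabled: false
xpack.watcher.enabled: false
#Maximum number of nodes allowed to share the same data path on one machine
node.max_local_storage_nodes: 256
#Port for node-to-node (transport) communication, and compression of that traffic
transport.tcp.port: 9300
transport.tcp.compress: true
#Only the listed (X-Pack system) index patterns may be created automatically
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history*,.ml*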
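
The split-brain rule behind discovery.zen.minimum_master_nodes is a strict-majority quorum. The Python sketch below uses hypothetical function names. It shows why the three-node cluster configured here uses 2. Note that this setting belongs to the Zen discovery used up to 6.x. From 7.0 onward, the cluster manages its voting quorum itself and ignores this setting.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(reachable: int, master_eligible: int) -> bool:
    """True if a partition that sees `reachable` master-eligible nodes may elect a master."""
    return reachable >= minimum_master_nodes(master_eligible)


# The three-node cluster configured above:
print(minimum_master_nodes(3))  # 2
# A network split into 2 + 1 nodes: only the larger side can elect a master, so no split brain.
print(can_elect_master(2, 3), can_elect_master(1, 3))  # True False
```

The same arithmetic shows why two master-eligible nodes is an awkward setup. The quorum is then 2, so losing either node blocks master election.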

Make sure the log and data directories configured above (path.logs: /var/log/elasticsearch and path.data: /var/data/elasticsearch) exist and are owned by the user that runs ES.

Copy the following files from the config directory to a separate directory, /usr/local/esconfig/:

cp -r elasticsearch.yml  /usr/local/esconfig/
cp -r jvm.options  /usr/local/esconfig/
cp -r log4j2.properties /usr/local/esconfig/

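Start command (ES_PATH_CONF points ES at the copied configuration directory; -d runs it as a background daemon):

ES_PATH_CONF=/usr/local/esconfig/ ./bin/elasticsearch -d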

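Add the ES bin directory to the PATH environment variable. Append the existing PATH rather than replacing it, otherwise ordinary commands stop resolving:

export ES_HOME=/usr/local/elasticsearch-6.2.2
export PATH=$ES_HOME/bin:$PATH

After running source on the profile file to apply these settings, start ES with:

ES_PATH_CONF=/usr/local/esconfig/ elasticsearch -d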
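
For scripting the start-up, the Python sketch below reproduces what the shell steps do. It extends PATH (prepending ES's bin directory instead of overwriting PATH), sets ES_HOME and ES_PATH_CONF, and builds the command line. The function name is hypothetical. The paths are the ones from the shell commands. The launch itself is left commented out. ES refuses to run as root, so it must run as the ES user.

```python
import os
from typing import Dict, List, Mapping, Tuple


def es_launch_command(es_home: str, conf_dir: str,
                      base_env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) equivalent to the shell steps: PATH extended, ES_PATH_CONF set."""
    env = dict(base_env)
    bin_dir = os.path.join(es_home, "bin")
    # Prepend ES's bin directory; keep the existing PATH so other commands still resolve.
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    env["ES_HOME"] = es_home
    # Point ES at the copied configuration directory instead of $ES_HOME/config.
    env["ES_PATH_CONF"] = conf_dir
    # -d daemonizes ES, as in the shell command.
    argv = [os.path.join(bin_dir, "elasticsearch"), "-d"]
    return argv, env


argv, env = es_launch_command("/usr/local/elasticsearch-6.2.2", "/usr/local/esconfig/",
                              {"PATH": "/usr/bin:/bin"})
print(env["PATH"])  # on Linux: /usr/local/elasticsearch-6.2.2/bin:/usr/bin:/bin
# To really start it (as the non-root ES user):
# import subprocess; subprocess.run(argv, env=env, check=True)
```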

Cluster Restart Optimization

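Re-copying shards, moving them, deleting them and then moving them again consumes a lot of network bandwidth and disk I/O. In a cluster holding a large amount of data, every full restart could needlessly shuffle terabytes of data around, and the cluster could take a very long time to start.

For example, suppose the cluster has 10 nodes and, during a restart, 5 of them come up first:
1. The 5 nodes that are up rebuild locally the shard copies that lived on the 5 nodes still offline.
2. The other 5 nodes then come online.
3. Shards are copied over to the 5 newly arrived nodes, and the first 5 nodes delete their now-redundant copies.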

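Production-tuned configuration:

gateway.expected_nodes: 3
gateway.recover_after_time: 1m
gateway.recover_after_nodes: 2

With these settings, recovery waits until at least 2 nodes are online. It then starts shard recovery as soon as all 3 nodes are online, or after at most 1 minute, whichever comes first.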
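
The decision the gateway settings encode can be written as a small predicate. This is a simplified Python model, not Elasticsearch's actual GatewayService code. The parameter names are hypothetical and the defaults mirror the values above. It assumes the timer counts from the moment at least recover_after_nodes nodes are online.

```python
def should_start_recovery(nodes_online: int,
                          seconds_waited: float,
                          expected_nodes: int = 3,
                          recover_after_nodes: int = 2,
                          recover_after_time_s: float = 60.0) -> bool:
    """Simplified model of the gateway settings (defaults mirror the values above).

    seconds_waited: time elapsed since at least recover_after_nodes nodes were online.
    """
    if nodes_online < recover_after_nodes:
        return False  # gateway.recover_after_nodes: recovery stays blocked
    if nodes_online >= expected_nodes:
        return True   # gateway.expected_nodes reached: recover immediately
    return seconds_waited >= recover_after_time_s  # gateway.recover_after_time elapsed


print(should_start_recovery(1, 300))  # False: below the minimum of 2 nodes
print(should_start_recovery(2, 30))   # False: 2 nodes, still inside the 1m window
print(should_start_recovery(2, 60))   # True: 1m has passed with 2 nodes
print(should_start_recovery(3, 0))    # True: all 3 expected nodes are online
```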

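Shutting down ES

Find the Elasticsearch process ID with jps or ps, then send it SIGTERM so it shuts down gracefully (15516 below is just the PID from the author's machine):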

jps

ps -ef|grep Elasticsearch

kill -SIGTERM 15516
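
The graceful shutdown above amounts to sending SIGTERM to the process. The Python sketch below (POSIX only; the helper name is hypothetical) does the same as kill -SIGTERM <pid>. It demonstrates this on a throwaway child process rather than a real ES node.

```python
import os
import signal
import subprocess
import sys


def stop_gracefully(pid: int) -> None:
    """Send SIGTERM (what `kill -SIGTERM <pid>` does) so the process can shut down cleanly."""
    os.kill(pid, signal.SIGTERM)


# Demo on a throwaway child process instead of a real ES node:
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stop_gracefully(child.pid)
print(child.wait())  # -15 on Linux: the child was terminated by SIGTERM
```

Avoid kill -9 (SIGKILL). It cannot be caught, so ES gets no chance to shut down cleanly. Per the official docs, ES can also write its PID to a file at startup with the -p option (for example ./bin/elasticsearch -d -p pid), which saves looking it up with jps.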