Elasticsearch course series, part 7: Elasticsearch cluster concepts and how to create an ES cluster on Windows (version 7.12)

Article link: https://blog.csdn.net/BiandanLoveyou/article/details/115805818

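Why does Elasticsearch need a cluster?

We are in the era of big data. For a website with millions of users, clustering has to be considered as traffic grows sharply. Otherwise, once QPS (Queries Per Second) or TPS (Transactions Per Second, a measure of throughput) climbs, a single ES server node becomes a bottleneck (slow queries, insufficient memory, and so on), the user experience suffers, and the service may even go down. That is why an ES cluster is needed. ES splits a single index into shards stored across several physical machines, which provides high availability and fault tolerance (a primary shard and its replica are never stored on the same node).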

Elasticsearch terminology

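1. Cluster: a cluster consists of multiple nodes, one of which is the master node. The master is chosen by election, and the distinction between the master and the other nodes only matters inside the cluster. ES is often described as decentralized, meaning that from the outside there is no central node: the cluster is logically a single unit, and talking to any one node is equivalent to talking to the whole cluster.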

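2. Shards: index shards. ES can split a complete index into multiple shards, so that a large index is broken up and distributed across different nodes to form a distributed search. The number of primary shards is specified when the index is created and cannot be changed in place afterwards; otherwise documents would be looked up in the wrong shard and ES could not find them.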

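3. Replicas: index replicas. ES can keep several replicas of an index. Replicas serve two purposes. First, they improve fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica. Second, they improve query performance, because ES automatically load-balances search requests across the copies.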

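4. Recovery: data recovery, also called data redistribution. When a node joins or leaves, ES reallocates index shards according to the load on each machine. A failed node also goes through data recovery when it restarts.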
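To make the shard and replica settings above concrete, here is a minimal sketch of the request body that could be sent when creating an index. The setting names number_of_shards and number_of_replicas are real index settings; the helper names create_index_body and total_shard_copies, and the values 3 and 1, are only illustrative assumptions.

```python
import json


def create_index_body(primary_shards: int, replicas_per_primary: int) -> dict:
    """Build the JSON body for PUT /<index name> with explicit shard settings."""
    if primary_shards < 1 or replicas_per_primary < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "index": {
                # Fixed at creation time; cannot be changed in place later.
                "number_of_shards": primary_shards,
                # Can be adjusted later through the index settings API.
                "number_of_replicas": replicas_per_primary,
            }
        }
    }


def total_shard_copies(primary_shards: int, replicas_per_primary: int) -> int:
    """Every primary plus all of its replicas counts as one shard copy."""
    return primary_shards * (1 + replicas_per_primary)


body = create_index_body(3, 1)
print(json.dumps(body, indent=2))
print("shard copies to place in the cluster:", total_shard_copies(3, 1))
```

With 3 primaries and 1 replica each, the cluster has to place 6 shard copies in total, which is why adding replicas increases both redundancy and storage use.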

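How does Elasticsearch handle high concurrency?

ES is a distributed full-text search engine that hides the complex machinery behind it. Internally it relies on sharding, cluster discovery, shard load balancing, and request routing.

Shards: ES can split a complete index into multiple shards, so that a large index is broken up and distributed across different nodes to form a distributed search. The number of primary shards is specified when the index is created and cannot be changed afterwards. This follows from how ES routes documents: if number_of_primary_shards changed, the modulo computed at query time would give a different result and the document could no longer be found.

Replicas: ES can keep several replicas of an index. First, replicas improve fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica. Second, they improve query performance, because ES automatically load-balances search requests across the copies.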

In ES terminology, these two kinds of shard are called primary shards and replica shards.

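Core principles of an ES cluster

1. Each index is split into shards for storage. Since version 7.0 a new index gets 1 primary shard by default (earlier versions defaulted to 5). These shards are distributed across the nodes of the cluster and are called primary shards.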

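To view an index's shard settings, request http://<IP address>/<index name>/_settings. The demonstration for this step uses a single-node instance.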

Note: once defined, the number of primary shards of an index cannot be modified.

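2. For high availability, every primary shard has its own replica shard or shards. A replica cannot be stored on the same server as its own primary, but a primary can share a node with the replicas of other shards (this is where ES's fault tolerance comes from). Any node can hold both primary shards and replica shards.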
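The placement rule above can be illustrated with a toy simulation. This is only a sketch under simple assumptions: the round-robin strategy and the allocate function are hypothetical, and the real ES allocator also weighs disk usage, load, and other factors. The one property it shares with ES is that no node ever holds two copies of the same shard.

```python
from typing import Dict, List, Tuple


def allocate(nodes: List[str], primaries: int, replicas: int) -> Dict[str, List[Tuple[int, str]]]:
    """Place shard copies so that no node holds two copies of the same shard."""
    if replicas + 1 > len(nodes):
        # Simplification: this sketch refuses; it does not model unassigned replicas.
        raise ValueError("not enough nodes to keep every copy of a shard on a separate node")
    layout: Dict[str, List[Tuple[int, str]]] = {node: [] for node in nodes}
    for shard in range(primaries):
        # Copy 0 is the primary; copies 1..replicas are its replicas.
        for copy in range(replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append((shard, role))
    return layout


layout = allocate(["node-1", "node-2", "node-3"], primaries=3, replicas=1)
for node, shards in layout.items():
    print(node, shards)
```

In this layout each node holds one primary together with the replica of a different shard, so losing any single node still leaves a full copy of every shard in the cluster.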

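Elasticsearch document routing

When a client creates a document, ES has to decide which shard of the index the document goes to. This process is called document routing.

Routing algorithm (modulo): shard = hash(routing) % number_of_primary_shards, where number_of_primary_shards is the number of primary shards. By default the routing value is the document's unique ID: ES hashes it and takes the remainder modulo the number of primary shards to get the shard in which the document is stored.

When querying, the cluster uses the same routing algorithm to compute which shard holds the given ID and then goes to that shard. If the cluster has two machines and the user contacts machine A while the document lives on machine B, ES forwards the request to B automatically.

If number_of_primary_shards changed, the modulo at query time would give a different result and the document could not be retrieved. This is why the number of shards cannot be changed once it has been specified.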
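The effect of changing the shard count can be sketched in a few lines. Note the assumption: ES uses its own Murmur3-based hash internally, while this sketch substitutes zlib.crc32 purely to have a deterministic stand-in, so the shard numbers it prints will not match a real cluster. The function names shard_for and route are hypothetical.

```python
import zlib


def shard_for(hash_value: int, number_of_primary_shards: int) -> int:
    """The modulo step of the routing formula."""
    return hash_value % number_of_primary_shards


def route(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a shard number."""
    # Stand-in hash for illustration only; ES itself uses a Murmur3-based hash.
    hash_value = zlib.crc32(routing.encode("utf-8"))
    return shard_for(hash_value, number_of_primary_shards)


doc_ids = [f"doc-{i}" for i in range(1000)]

# Documents were indexed with 5 primaries; pretend the count later became 6.
moved = [d for d in doc_ids if route(d, 5) != route(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up in a different shard")
```

Any document in the moved list was written to one shard but would be searched for in another, which is exactly the "data cannot be found" situation described above.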

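Creating an Elasticsearch cluster on Windows

Setting up an ES cluster is easy. ES does the hard work for us: we do not need to write any code to pick a master node, because the master is elected internally by ES. To leave the earlier single-node files untouched, we make three copies of the extracted folder, named elasticsearch_1, elasticsearch_2, and elasticsearch_3.

For node 1, we only need to add the following settings (the complete configuration file is given afterwards):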

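# Make sure all three nodes use the same cluster name
cluster.name: my-application
# Each node needs a different name; the other two are called node-2 and node-3
node.name: node-1
# The server's actual IP address; on an internal network this would be something like 192.168.0.1
network.host: 127.0.0.1
# HTTP port exposed to clients; the other two nodes use 9202 and 9203
http.port: 9201
# The default transport (TCP) port is 9300; we change it here, and the other two nodes use 9302 and 9303
transport.tcp.port: 9301
# Full addresses (host:transport port) of all hosts that join the cluster.
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
# Names of the master-eligible nodes used to bootstrap the cluster; the master itself is elected internally by ES
cluster.initial_master_nodes: ["node-1","node-2","node-3"]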
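Since the three nodes differ only in node name and ports, the per-node values can be derived from the node number. The following is a small sketch of that pattern; the helper names node_settings and render are hypothetical, and the output is meant to be pasted into each elasticsearch.yml by hand rather than treated as a deployment tool.

```python
def node_settings(index: int, node_count: int = 3, host: str = "127.0.0.1") -> dict:
    """Return the cluster-related settings for node number `index` (1-based)."""
    if not 1 <= index <= node_count:
        raise ValueError("index must be between 1 and node_count")
    names = [f"node-{i}" for i in range(1, node_count + 1)]
    return {
        "cluster.name": "my-application",      # identical on every node
        "node.name": names[index - 1],         # unique per node
        "network.host": host,
        "http.port": 9200 + index,             # 9201, 9202, 9203
        "transport.tcp.port": 9300 + index,    # 9301, 9302, 9303
        "discovery.seed_hosts": [f"{host}:{9300 + i}" for i in range(1, node_count + 1)],
        "cluster.initial_master_nodes": names,
    }


def render(settings: dict) -> str:
    """Format the settings as flat `key: value` lines, as used in elasticsearch.yml."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            value = "[" + ",".join(f'"{v}"' for v in value) + "]"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


for i in (1, 2, 3):
    print(f"# elasticsearch_{i}/config/elasticsearch.yml")
    print(render(node_settings(i)))
    print()
```

Only node.name, http.port, and transport.tcp.port change between the three outputs; cluster.name, discovery.seed_hosts, and cluster.initial_master_nodes are the same everywhere.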

Complete contents of elasticsearch.yml in the config directory of node 1 (elasticsearch_1):

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Make sure all three nodes use the same cluster name
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Each node needs a different name; the other two are called node-2 and node-3
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
# The server's actual IP address; on an internal network this would be something like 192.168.0.1
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# HTTP port exposed to clients; the other two nodes use 9202 and 9203
http.port: 9201

# The default transport (TCP) port is 9300; we change it here, and the other two nodes use 9302 and 9303
transport.tcp.port: 9301
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Full addresses (host:transport port) of all hosts that join the cluster.
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
# Names of the master-eligible nodes used to bootstrap the cluster; the master itself is elected internally by ES
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Next, edit elasticsearch.yml in the config directory of node 2 (elasticsearch_2):

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Make sure all three nodes use the same cluster name
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Each node needs a different name; the other two are called node-1 and node-3
node.name: node-2
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
# The server's actual IP address; on an internal network this would be something like 192.168.0.1
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# HTTP port exposed to clients; the other two nodes use 9201 and 9203
http.port: 9202

# The default transport (TCP) port is 9300; we change it here, and the other two nodes use 9301 and 9303
transport.tcp.port: 9302
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Full addresses (host:transport port) of all hosts that join the cluster.
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
# Names of the master-eligible nodes used to bootstrap the cluster; the master itself is elected internally by ES
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Next, edit elasticsearch.yml in the config directory of node 3 (elasticsearch_3):

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Make sure all three nodes use the same cluster name
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Each node needs a different name; the other two are called node-1 and node-2
node.name: node-3
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
# The server's actual IP address; on an internal network this would be something like 192.168.0.1
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# HTTP port exposed to clients; the other two nodes use 9201 and 9202
http.port: 9203

# The default transport (TCP) port is 9300; we change it here, and the other two nodes use 9301 and 9302
transport.tcp.port: 9303
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Full addresses (host:transport port) of all hosts that join the cluster.
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
# Names of the master-eligible nodes used to bootstrap the cluster; the master itself is elected internally by ES
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Then start the service on nodes 1, 2, and 3 in turn by running elasticsearch.bat in each node's bin directory.

Open a browser to verify that the cluster is working:

Check node 1: http://127.0.0.1:9201/_cat/nodes?pretty (no other node information appears)
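The same check can be scripted instead of done in a browser. The sketch below queries the cluster health API, whose JSON response includes the status and number_of_nodes fields. The helper names health_url, cluster_formed, and check_all are hypothetical, and the ports assume the three-node configuration shown earlier. check_all is defined but not called here, because it needs the nodes to be running.

```python
import json
from urllib.request import urlopen


def health_url(host: str, port: int) -> str:
    """URL of the cluster health API on one node."""
    return f"http://{host}:{port}/_cluster/health"


def cluster_formed(health: dict, expected_nodes: int = 3) -> bool:
    """True if the node sees all expected nodes and the cluster is not red."""
    return (
        health.get("number_of_nodes") == expected_nodes
        and health.get("status") in ("green", "yellow")
    )


def check_all(host: str = "127.0.0.1", ports=(9201, 9202, 9203)) -> dict:
    """Ask every node for its view of the cluster; requires the nodes to be up."""
    results = {}
    for port in ports:
        with urlopen(health_url(host, port), timeout=5) as response:
            results[port] = cluster_formed(json.load(response))
    return results


print(health_url("127.0.0.1", 9201))
```

Once all three nodes have started, calling check_all() should return True for every port if each node reports the same three-node cluster. A node that only sees itself would report number_of_nodes as 1 and show up as False.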