ElasticSearch Series, Part 7 — ElasticSearch cluster concepts, and how to build an ES cluster on Windows (version 7.12)

 

Why cluster ElasticSearch?

We are in the era of big data. A site with millions of users, as its traffic grows explosively, has to consider clustering: once QPS (Queries Per Second, i.e. concurrency) or TPS (Transactions Per Second, a measure of throughput) climbs, a single-node ES deployment becomes the bottleneck (slow queries, memory exhaustion, and so on), degrading the user experience and potentially bringing the service down. Hence the ES cluster. ES splits a single index into shards stored across multiple distributed physical machines, which provides high availability and fault tolerance (a primary shard and its replica are never kept on the same node).

 

Key ElasticSearch terms

1. Cluster: a cluster contains multiple nodes, one of which is the master, chosen by election; the master/worker distinction only matters inside the cluster. One design principle of ES is decentralization, which, seen from outside the cluster, means there is no fixed central node: logically the cluster is a single whole, and communicating with any one node is equivalent to communicating with the whole cluster.

2. Shards: index shards. ES can split a complete index into multiple shards, so a large index can be broken up and distributed across different nodes, forming a distributed search. The number of primary shards can only be set when the index is created and cannot be changed afterwards; otherwise ES could no longer route queries to the right data.

3. Replicas: index replicas. ES can keep multiple replicas of an index. Replicas serve two purposes: they improve fault tolerance, since a damaged or lost shard can be recovered from its replica, and they improve query throughput, since ES automatically load-balances search requests across copies.

4. Recovery: data recovery, or redistribution. When nodes join or leave, ES reallocates index shards according to the load on each machine; recovery also runs when a failed node restarts.
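The shard and replica counts from points 2 and 3 are fixed in the index settings at creation time. A minimal sketch of the request body, assuming a hypothetical index name `my_index` (Python, standard library only):

```python
import json

# Build the body for `PUT /my_index` with explicit shard/replica counts.
# The index name and the counts are illustrative, not from the article.
def index_settings(primary_shards: int, replicas: int) -> dict:
    return {
        "settings": {
            "number_of_shards": primary_shards,   # fixed once the index exists
            "number_of_replicas": replicas,       # may be changed later
        }
    }

body = index_settings(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
```

Sending this body with `PUT http://127.0.0.1:9200/my_index` would create the index; `number_of_replicas` can later be updated through the `_settings` endpoint, but `number_of_shards` cannot.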

 

How does ElasticSearch handle high concurrency?

ES is a distributed full-text search framework that hides the complex machinery from you. Internally it relies on sharding, cluster discovery, shard load balancing, and request routing.

Shards: ES can split a complete index into multiple shards distributed across different nodes, forming a distributed search. The number of primary shards can only be set before the index is created and cannot be changed afterwards. This is dictated by ES's data routing: if number_of_primary_shards changed, the remainder computed at query time would change, and the data could no longer be found.

Replicas: ES can keep multiple replicas of an index. Replicas improve fault tolerance (a damaged or lost shard can be recovered from its replica) and query throughput (ES automatically load-balances search requests).

 

There are thus two kinds of shard: primary shards, which hold the data, and replica shards, which are their copies.

 

Core principles of the ES cluster

1. Each index is stored as multiple shards. (Before ES 7.0 an index got 5 primary shards by default; since 7.0, including the 7.12 used here, the default is 1.) The shards are distributed across different nodes; these are the primary shards.

View an index's shard information at http://&lt;IP&gt;/&lt;index-name&gt;/_settings. The demo below uses the single-node setup.

 

Note: once an index's number of primary shards is defined, it cannot be changed.

2. For high availability, every primary shard has a corresponding replica shard. A primary and its own replica must not be stored on the same server (this is what gives ES its fault tolerance); a primary may share a node with replicas of other shards. Any given node can therefore hold both primary and replica shards.
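The placement rule can be expressed as a small check: an allocation is fault tolerant only if no primary shares a node with its own replica. The shard layout below is invented for illustration:

```python
# Hypothetical shard allocation: shard id -> nodes holding primary/replica.
allocation = {
    0: {"primary": "node-1", "replica": "node-2"},
    1: {"primary": "node-2", "replica": "node-3"},
    2: {"primary": "node-3", "replica": "node-1"},
}

def is_fault_tolerant(alloc: dict) -> bool:
    # If a primary and its replica shared a node, losing that node
    # would lose both copies of the shard at once.
    return all(s["primary"] != s["replica"] for s in alloc.values())

print(is_fault_tolerant(allocation))  # True: every replica is on another node
```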

 

ElasticSearch document routing

When a client creates a document, ES must decide which shard of the index the document goes to. That decision is data routing.

Routing algorithm (modulo): shard = hash(routing) % number_of_primary_shards, where number_of_primary_shards is the number of primary shards. ES hashes the document's unique ID (the default routing value) and takes the remainder modulo the primary-shard count to get the shard that stores the document.

At query time the cluster applies the same modulo computation to the ID to locate the shard, then queries it. With two machines in the cluster, if a user hits machine A but the document lives on machine B, ES forwards the request to B automatically.

If number_of_primary_shards changed, the remainder computed at query time would change and the data could no longer be found. That is why the shard count cannot be modified once set.
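The routing formula can be simulated in a few lines. ES actually hashes the routing value with Murmur3; the sketch below substitutes `zlib.crc32` as a stand-in stable hash, just to show why a changed `number_of_primary_shards` breaks lookups:

```python
import zlib

def shard_for(routing: str, number_of_primary_shards: int) -> int:
    # Stand-in for ES's murmur3 hash of the routing value (the document
    # ID by default): any stable hash illustrates the modulo step.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

doc_id = "user-42"
written_to = shard_for(doc_id, 5)  # shard chosen at index time, 5 primaries
looked_up = shard_for(doc_id, 6)   # with 6 primaries the remainder differs...
print(written_to, looked_up)       # ...so a lookup may target the wrong shard
```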

 

Creating an ElasticSearch cluster on Windows

Building an ES cluster is easy; ES does the heavy lifting, so there is no code to write for choosing a master node, which ES elects internally. To leave the earlier single-node files untouched, we make three copies of the unpacked folder, as shown:

For node 1, we only need to add the following settings (the full config file is given afterwards):

# Keep the cluster name identical across all three nodes
cluster.name: my-application
# Give each node a unique name; the other two are node-2 and node-3
node.name: node-1
# The server's actual IP address; e.g. 192.168.0.1 on an internal network
network.host: 127.0.0.1
# HTTP port exposed to clients; the other two nodes use 9202 and 9203
http.port: 9201
# The default transport port is 9300; override it here. The other two nodes use 9302 and 9303
transport.tcp.port: 9301
# List the full transport addresses of every host in the cluster
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
# List the master-eligible nodes; ES elects the master internally
cluster.initial_master_nodes: ["node-1","node-2","node-3"]

 

Full content of elasticsearch.yml in the config directory of node 1 (elasticsearch_1):

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Keep the cluster name identical across all three nodes
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Give each node a unique name; the other two are node-2 and node-3
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
# The server's actual IP address; e.g. 192.168.0.1 on an internal network
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# HTTP port exposed to clients; the other two nodes use 9202 and 9203
http.port: 9201

# The default transport port is 9300; override it here. The other two nodes use 9302 and 9303
transport.tcp.port: 9301
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# List the full transport addresses of every host in the cluster
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
# List the master-eligible nodes; ES elects the master internally
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

 

Next, edit elasticsearch.yml in the config directory of node 2 (elasticsearch_2):

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Keep the cluster name identical across all three nodes
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Give each node a unique name; the other two are node-1 and node-3
node.name: node-2
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
# The server's actual IP address; e.g. 192.168.0.1 on an internal network
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# HTTP port exposed to clients; the other two nodes use 9201 and 9203
http.port: 9202

# The default transport port is 9300; override it here. The other two nodes use 9301 and 9303
transport.tcp.port: 9302
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# List the full transport addresses of every host in the cluster
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
# List the master-eligible nodes; ES elects the master internally
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

 

Next, edit elasticsearch.yml in the config directory of node 3 (elasticsearch_3):

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Keep the cluster name identical across all three nodes
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Give each node a unique name; the other two are node-1 and node-2
node.name: node-3
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
# The server's actual IP address; e.g. 192.168.0.1 on an internal network
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# HTTP port exposed to clients; the other two nodes use 9201 and 9202
http.port: 9203

# The default transport port is 9300; override it here. The other two nodes use 9301 and 9302
transport.tcp.port: 9303
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# List the full transport addresses of every host in the cluster
discovery.seed_hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
# List the master-eligible nodes; ES elects the master internally
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

Then start the three services by running elasticsearch.bat in each node's bin directory.

Open a browser to verify that the cluster formed:

Check node 1: http://127.0.0.1:9201/_cat/nodes?pretty (no other nodes listed)

 

Check node 2: http://127.0.0.1:9202/_cat/nodes?pretty (no other nodes listed)

 

Check node 3: http://127.0.0.1:9203/_cat/nodes?pretty (no other nodes listed)

 

The cluster did not form! Why not?

Because the folders were copied from the old single-node install, the files under each node's data directory conflict. The fix: delete everything inside the data directory on nodes 1, 2, and 3 (delete the contents, not the data directory itself!).
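Clearing the contents of each node's data directory while keeping the directory itself can be scripted. A sketch, assuming the three copied folders are named elasticsearch_1 through elasticsearch_3 as above (run it only while the services are stopped):

```python
import shutil
from pathlib import Path

def clear_data_dir(node_dir: str) -> None:
    # Remove everything *inside* <node_dir>/data, keeping data/ itself.
    data = Path(node_dir) / "data"
    if not data.is_dir():
        return
    for entry in data.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()

# Hypothetical folder names matching the copies made earlier.
for node in ("elasticsearch_1", "elasticsearch_2", "elasticsearch_3"):
    clear_data_dir(node)
```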

Then restart the three services and check again:

This time it works, and node 2 is the master.

The node marked with * is the master.

Now shut down node 2 and watch what happens: the consoles of nodes 1 and 3 both report that node 2 is unreachable.

 

ES then elects a new master:

Now restart node 2: its console shows that node 3 is the master.

Verify in the browser: http://127.0.0.1:9202/_cat/nodes?pretty

This shows the resilience of an ES cluster: one machine going down does not take the cluster with it; a new master is simply elected.

 

 
