1. Deployment Overview
1) Original ES cluster information

| IP | Cluster name | HTTP port | TCP port |
| --- | --- | --- | --- |
| 192.168.2.11 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.12 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.13 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.14 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.15 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.16 | es_es11223_cluster | 11223 | 11224 |
2) New ES cluster information

| IP | Cluster name | HTTP port | TCP port |
| --- | --- | --- | --- |
| 192.168.2.14 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.15 | es_es11223_cluster | 11223 | 11224 |
| 192.168.2.16 | es_es11223_cluster | 11223 | 11224 |
2. Check Cluster Status
Check the cluster nodes:
# curl http://192.168.2.14:11223/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.16 14 99 0 0.37 0.58 0.72 mdi - 192.168.2.16-11223
192.168.2.14 11 95 0 0.20 0.30 0.46 mdi - 192.168.2.14-11223
192.168.2.15 14 99 0 0.19 0.26 0.41 mdi - 192.168.2.15-11223
192.168.2.13 51 75 1 0.92 0.72 0.85 mdi - 192.168.2.13-11223
192.168.2.11 58 73 0 0.85 0.73 0.67 mdi - 192.168.2.11-11223
192.168.2.12 30 88 0 1.14 0.71 0.68 mdi * 192.168.2.12-11223
Check that the cluster status is green:
# curl http://192.168.2.14:11223/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1621527014 00:10:14 es_es11223_cluster green 6 6 240 120 0 0 0 0 - 100.0%
3. Dynamically Update minimum_master_nodes
# curl -H "Content-Type: application/json" -XPUT 'http://192.168.2.11:11223/_cluster/settings?pretty' -d '{
"persistent": {
"discovery.zen.minimum_master_nodes": "4"
}
}'
{
"acknowledged" : true,
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "4"
}
}
},
"transient" : {
}
}
Note: this setting ensures a node only participates in master election when it can see this many master-eligible nodes, which prevents split-brain. The default is 1; for multi-node clusters it should be raised to (master-eligible nodes / 2) + 1, which for this 6-node cluster is 4.
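As a quick sanity check, the quorum value can be computed directly (pure arithmetic, no cluster access needed):

```shell
# minimum_master_nodes quorum: (master-eligible nodes / 2) + 1
nodes=6
echo $(( nodes / 2 + 1 ))   # 6-node cluster -> 4
nodes=4
echo $(( nodes / 2 + 1 ))   # 4-node cluster -> 3 (used later in step 7)
```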
4. Confirm Automatic Shard Allocation Is Enabled
1) Enable cluster auto-rebalancing:
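The command for this step was not captured; a sketch following the PUT pattern used in the steps below (`cluster.routing.allocation.enable` is the standard setting, and `"all"` is also its default — the host choice is arbitrary, any node works):

```shell
# curl -H "Content-Type: application/json" -XPUT 'http://192.168.2.11:11223/_cluster/settings?pretty' -d '{
"transient": {
"cluster.routing.allocation.enable": "all"
}
}'
```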
2) Confirm auto-rebalancing is enabled:
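Likewise not captured; the GET from step 6) can serve as the check (the output should contain `"enable" : "all"`, as in the step 6 listing below):

```shell
# curl -XGET 'http://192.168.2.11:11223/_cluster/settings?pretty' | grep '"enable"'
```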
3) Tune concurrent initial primary recoveries:
# curl -H "Content-Type: application/json" -XPUT 'http://192.168.2.11:11223/_cluster/settings?pretty' -d '{
"transient": {
"cluster.routing.allocation.node_initial_primaries_recoveries": "16"
}
}'
Note: cluster.routing.allocation.node_initial_primaries_recoveries controls the number of concurrent initial primary recoveries per node; the default is 4.
4) Tune concurrent recoveries per node:
# curl -H "Content-Type: application/json" -XPUT 'http://192.168.2.11:11223/_cluster/settings?pretty' -d '{
"transient": {
"cluster.routing.allocation.node_concurrent_recoveries": "8"
}
}'
Note: cluster.routing.allocation.node_concurrent_recoveries controls the number of concurrent shard recoveries per node when nodes are added or removed, or during rebalancing; the default is 2.
5) Tune concurrent rebalance:
# curl -H "Content-Type: application/json" -XPUT 'http://192.168.2.11:11223/_cluster/settings?pretty' -d '{
"transient": {
"cluster.routing.allocation.cluster_concurrent_rebalance": "20"
}
}'
Note: cluster.routing.allocation.cluster_concurrent_rebalance sets the cluster-wide limit on concurrent shard rebalances; the default is 2.
6) Check the cluster rebalancing settings:
# curl -H "Content-Type: application/json" -XGET 'http://192.168.2.11:11223/_cluster/settings?pretty'
{
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "4"
}
}
},
"transient" : {
"cluster" : {
"routing" : {
"allocation" : {
"cluster_concurrent_rebalance" : "20",
"node_concurrent_recoveries" : "8",
"node_initial_primaries_recoveries" : "16",
"enable" : "all"
}
}
}
}
}
5. Shut Down Node 192.168.2.11:11223
5.1. Check cluster shard counts
Confirm whether shard relocation has completed:
# curl http://192.168.2.14:11223/_cat/allocation?v
```Before scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
40 80.2gb 93.8gb 5.3tb 5.4tb 1 192.168.2.13 192.168.2.13 192.168.2.13-11223
40 76.9gb 103.7gb 3.5tb 3.6tb 2 192.168.2.12 192.168.2.12 192.168.2.12-11223
40 81.9gb 482.7gb 3tb 3.4tb 13 192.168.2.16 192.168.2.16 192.168.2.16-11223
40 79.4gb 174.6gb 5.2tb 5.4tb 3 192.168.2.11 192.168.2.11 192.168.2.11-11223
40 79.2gb 461.3gb 3tb 3.4tb 12 192.168.2.14 192.168.2.14 192.168.2.14-11223
40 81.3gb 475.4gb 3tb 3.4tb 13 192.168.2.15 192.168.2.15 192.168.2.15-11223
```After scale-in```
5.2. Exclude node 192.168.2.11:11223 from the cluster
1) Check cluster node information
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.15 18 99 0 0.48 0.48 0.49 mdi - 192.168.2.15-11223
192.168.2.11 62 73 0 0.32 0.54 0.61 mdi - 192.168.2.11-11223
192.168.2.16 14 99 1 0.26 0.32 0.40 mdi - 192.168.2.16-11223
192.168.2.14 14 95 0 0.11 0.22 0.29 mdi - 192.168.2.14-11223
192.168.2.12 29 88 0 0.23 0.45 0.58 mdi * 192.168.2.12-11223
192.168.2.13 55 75 0 0.94 0.80 0.78 mdi - 192.168.2.13-11223
2) Exclude node 192.168.2.11:11223 from the cluster
# curl -H "Content-Type: application/json" -XPUT http://192.168.2.14:11223/_cluster/settings?pretty -d '{
"transient" : {
"cluster.routing.allocation.exclude._name" : "192.168.2.11-11223"
}}'
3) Confirm the node has been added to the exclude list
# curl -H "Content-Type: application/json" -XGET 'http://192.168.2.14:11223/_cluster/settings?pretty'
{
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "4"
}
}
},
"transient" : {
"cluster" : {
"routing" : {
"allocation" : {
"cluster_concurrent_rebalance" : "20",
"node_concurrent_recoveries" : "8",
"exclude" : {
"_name" : "192.168.2.11-11223"
},
"node_initial_primaries_recoveries" : "16",
"enable" : "all"
}
}
}
}
}
4) Check cluster node information
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.15 12 99 0 0.36 0.45 0.48 mdi - 192.168.2.15-11223
192.168.2.11 57 73 0 0.45 0.55 0.61 mdi - 192.168.2.11-11223
192.168.2.16 19 99 1 0.43 0.35 0.41 mdi - 192.168.2.16-11223
192.168.2.14 18 95 0 0.20 0.23 0.30 mdi - 192.168.2.14-11223
192.168.2.12 33 88 0 0.95 0.61 0.63 mdi * 192.168.2.12-11223
192.168.2.13 52 75 1 0.90 0.80 0.79 mdi - 192.168.2.13-11223
5.3. Monitor rebalancing progress
1) Confirm that node 192.168.2.11:11223 no longer holds any shards, then shut the instance down (kill)
idx-xxx000-justtest-1 5 r RELOCATING 3251629 2.8gb 192.168.2.11 192.168.2.11-11223 -> 192.168.2.12 IqCBPkvLQHyS_MwhJ1ylaw 192.168.2.12-11223
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.15 13 100 0 0.27 0.27 0.33 mdi - 192.168.2.15-11223
192.168.2.11 61 45 0 0.89 0.72 0.67 mdi - 192.168.2.11-11223
192.168.2.16 16 98 0 0.43 0.37 0.38 mdi - 192.168.2.16-11223
192.168.2.14 13 96 0 0.58 0.50 0.50 mdi - 192.168.2.14-11223
192.168.2.12 29 97 0 0.44 0.41 0.54 mdi * 192.168.2.12-11223
192.168.2.13 52 87 0 1.11 1.20 1.07 mdi - 192.168.2.13-11223
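Relocation progress can be polled with the loop that also appears in the ps output in section 5.5 (192.168.2.14 is simply the node queried there; any node works):

```shell
watch 'curl -s http://192.168.2.14:11223/_cat/shards | grep -c RELOCATING'
```

Once the count reaches 0 and the cluster is green, the node is empty.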
2) Once the cluster is green, with no unassigned, initializing, or relocating shards, proceed to the next step
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1621530304 01:05:04 es_es11223_cluster green 6 6 240 120 0 0 0 0 - 100.0%
5.4. Verify cluster shard counts
Confirm whether shard relocation has completed:
```Before scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
40 80.2gb 93.8gb 5.3tb 5.4tb 1 192.168.2.13 192.168.2.13 192.168.2.13-11223
40 76.9gb 103.7gb 3.5tb 3.6tb 2 192.168.2.12 192.168.2.12 192.168.2.12-11223
40 81.9gb 482.7gb 3tb 3.4tb 13 192.168.2.16 192.168.2.16 192.168.2.16-11223
40 79.4gb 174.6gb 5.2tb 5.4tb 3 192.168.2.11 192.168.2.11 192.168.2.11-11223
40 79.2gb 461.3gb 3tb 3.4tb 12 192.168.2.14 192.168.2.14 192.168.2.14-11223
40 81.3gb 475.4gb 3tb 3.4tb 13 192.168.2.15 192.168.2.15 192.168.2.15-11223
```After scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
48 96.9gb 493.3gb 3tb 3.4tb 13 192.168.2.15 192.168.2.15 192.168.2.15-11223
0 0b 85.7gb 5.3tb 5.4tb 1 192.168.2.11 192.168.2.11 192.168.2.11-11223
48 88.7gb 116.6gb 3.5tb 3.6tb 3 192.168.2.12 192.168.2.12 192.168.2.12-11223
48 98gb 501.6gb 2.9tb 3.4tb 14 192.168.2.16 192.168.2.16 192.168.2.16-11223
48 96.8gb 481.6gb 3tb 3.4tb 13 192.168.2.14 192.168.2.14 192.168.2.14-11223
48 95.6gb 111.2gb 5.3tb 5.4tb 1 192.168.2.13 192.168.2.13 192.168.2.13-11223
5.5. Kill the process
1) Kill the elasticsearch process:
cyread 10203 5838 0 00:57 pts/3 00:00:00 watch curl -X GET http://192.168.2.14:11223/_cat/shards | grep RELO | wc -l
es 14680 1 51 2020 ? 140-13:05:55 /usr/server/jdk8/bin/java -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -XX:+HeapDumpOnOutOfMemoryError -Des.allow_insecure_settings=true -Des.path.home=/data/PaaS/es11223 -Des.path.conf=/data/PaaS/es/11223/config -cp /data/PaaS/es/11223/lib/* org.elasticsearch.bootstrap.Elasticsearch -d
root 24237 18650 0 01:06 pts/1 00:00:00 grep --color=auto 11223
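The kill command itself is not captured above; a sketch based on the ps output (the pgrep pattern is an assumption, anchored on this instance's config path):

```shell
# Match this instance by its config path, then send SIGTERM for a clean shutdown
pid=$(pgrep -f 'es/11223/config.*org.elasticsearch.bootstrap.Elasticsearch')
kill "$pid"
```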
2) Check cluster status
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1621530446 01:07:26 es_es11223_cluster green 5 5 240 120 0 0 0 0 - 100.0%
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.15 18 100 0 0.24 0.25 0.31 mdi - 192.168.2.15-11223
192.168.2.14 19 96 0 0.43 0.43 0.47 mdi - 192.168.2.14-11223
192.168.2.16 11 98 1 0.22 0.35 0.38 mdi - 192.168.2.16-11223
192.168.2.12 24 97 0 0.24 0.32 0.48 mdi * 192.168.2.12-11223
192.168.2.13 56 87 0 0.56 0.92 0.98 mdi - 192.168.2.13-11223
6. Shut Down Node 192.168.2.12:11223
6.1. Check cluster shard counts
Confirm whether shard relocation has completed:
```Before scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
48 95.4gb 111gb 5.3tb 5.4tb 1 192.168.2.13 192.168.2.13 192.168.2.13-11223
48 88.6gb 116.1gb 3.5tb 3.6tb 3 192.168.2.12 192.168.2.12 192.168.2.12-11223
48 96.6gb 480.9gb 3tb 3.4tb 13 192.168.2.14 192.168.2.14 192.168.2.14-11223
48 98.1gb 501.1gb 3tb 3.4tb 14 192.168.2.16 192.168.2.16 192.168.2.16-11223
48 96.9gb 492.9gb 3tb 3.4tb 13 192.168.2.15 192.168.2.15 192.168.2.15-11223
```After scale-in```
6.2. Exclude node 192.168.2.12:11223 from the cluster
1) Check cluster node information
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1621530760 01:12:40 es_es11223_cluster green 5 5 240 120 0 0 0 0 - 100.0%
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.15 10 99 0 0.86 0.44 0.37 mdi - 192.168.2.15-11223
192.168.2.14 13 96 0 0.23 0.38 0.45 mdi - 192.168.2.14-11223
192.168.2.16 14 98 1 0.17 0.31 0.36 mdi - 192.168.2.16-11223
192.168.2.12 25 97 0 0.46 0.34 0.44 mdi * 192.168.2.12-11223
192.168.2.13 52 87 0 0.91 0.82 0.91 mdi - 192.168.2.13-11223
2) Exclude node 192.168.2.12:11223 from the cluster
# curl -H "Content-Type: application/json" -XPUT http://192.168.2.14:11223/_cluster/settings?pretty -d '{
"transient" : {
"cluster.routing.allocation.exclude._name" : "192.168.2.12-11223"
}}'
{
"acknowledged" : true,
"persistent" : { },
"transient" : {
"cluster" : {
"routing" : {
"allocation" : {
"exclude" : {
"_name" : "192.168.2.12-11223"
}
}
}
}
}
}
3) Confirm the node has been added to the exclude list
# curl -H "Content-Type: application/json" -XGET 'http://192.168.2.14:11223/_cluster/settings?pretty'
{
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "4"
}
}
},
"transient" : {
"cluster" : {
"routing" : {
"allocation" : {
"cluster_concurrent_rebalance" : "20",
"node_concurrent_recoveries" : "8",
"exclude" : {
"_name" : "192.168.2.12-11223"
},
"node_initial_primaries_recoveries" : "16",
"enable" : "all"
}
}
}
}
}
4) Check cluster node information
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.15 16 100 0 0.85 0.51 0.39 mdi - 192.168.2.15-11223
192.168.2.14 18 97 0 0.22 0.35 0.44 mdi - 192.168.2.14-11223
192.168.2.16 19 98 1 0.40 0.34 0.36 mdi - 192.168.2.16-11223
192.168.2.12 28 97 0 0.48 0.37 0.44 mdi * 192.168.2.12-11223
192.168.2.13 51 87 1 0.69 0.77 0.89 mdi - 192.168.2.13-11223
6.3. Monitor rebalancing progress
1) Confirm that node 192.168.2.12:11223 no longer holds any shards, then shut the instance down (kill)
2) Once the cluster is green, with no unassigned, initializing, or relocating shards, proceed to the next step
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1621532261 01:37:41 es_es11223_cluster green 5 5 240 120 0 0 0 0 - 100.0%
6.4. Verify cluster shard counts
Confirm whether shard relocation has completed:
```Before scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
48 95.4gb 111gb 5.3tb 5.4tb 1 192.168.2.13 192.168.2.13 192.168.2.13-11223
48 88.6gb 116.1gb 3.5tb 3.6tb 3 192.168.2.12 192.168.2.12 192.168.2.12-11223
48 96.6gb 480.9gb 3tb 3.4tb 13 192.168.2.14 192.168.2.14 192.168.2.14-11223
48 98.1gb 501.1gb 3tb 3.4tb 14 192.168.2.16 192.168.2.16 192.168.2.16-11223
48 96.9gb 492.9gb 3tb 3.4tb 13 192.168.2.15 192.168.2.15 192.168.2.15-11223
```After scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
60 117.7gb 504.4gb 2.9tb 3.4tb 14 192.168.2.14 192.168.2.14 192.168.2.14-11223
60 119.1gb 525.5gb 2.9tb 3.4tb 14 192.168.2.16 192.168.2.16 192.168.2.16-11223
60 119.4gb 138.9gb 5.3tb 5.4tb 2 192.168.2.13 192.168.2.13 192.168.2.13-11223
0 0b 17.2gb 3.6tb 3.6tb 0 192.168.2.12 192.168.2.12 192.168.2.12-11223
60 118.7gb 518gb 2.9tb 3.4tb 14 192.168.2.15 192.168.2.15 192.168.2.15-11223
6.5. Kill the process
1) Kill the elasticsearch process:
cyread 10203 5838 0 00:57 pts/3 00:00:00 watch curl -X GET http://192.168.2.14:11223/_cat/shards | grep RELO | wc -l
es 14680 1 51 2020 ? 140-13:05:55 /usr/server/jdk8/bin/java -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -XX:+HeapDumpOnOutOfMemoryError -Des.allow_insecure_settings=true -Des.path.home=/data/PaaS/es11223 -Des.path.conf=/data/PaaS/es/11223/config -cp /data/PaaS/es/11223/lib/* org.elasticsearch.bootstrap.Elasticsearch -d
root 24237 18650 0 01:06 pts/1 00:00:00 grep --color=auto 11223
2) Check cluster status
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1621532326 01:38:46 es_es11223_cluster green 4 4 240 120 0 0 0 0 - 100.0%
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.14 15 95 0 0.35 0.36 0.40 mdi - 192.168.2.14-11223
192.168.2.15 13 99 0 0.15 0.29 0.37 mdi * 192.168.2.15-11223
192.168.2.16 13 97 0 0.41 0.43 0.42 mdi - 192.168.2.16-11223
192.168.2.13 53 98 0 0.52 0.94 1.04 mdi - 192.168.2.13-11223
7. Shut Down Node 192.168.2.13:11223
7.1. Check cluster shard counts
Confirm whether shard relocation has completed:
```Before scale-in```
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
60 119.4gb 137.9gb 5.3tb 5.4tb 2 192.168.2.13 192.168.2.13 192.168.2.13-11223
60 118.7gb 518gb 2.9tb 3.4tb 14 192.168.2.15 192.168.2.15 192.168.2.15-11223
60 117.6gb 504.3gb 2.9tb 3.4tb 14 192.168.2.14 192.168.2.14 192.168.2.14-11223
60 119.1gb 525.4gb 2.9tb 3.4tb 14 192.168.2.16 192.168.2.16 192.168.2.16-11223
```After scale-in```
7.2. Exclude node 192.168.2.13:11223 from the cluster
1) Dynamically set minimum_master_nodes
# curl -H "Content-Type: application/json" -XPUT 'http://192.168.2.14:11223/_cluster/settings?pretty' -d '{
"persistent": {
"discovery.zen.minimum_master_nodes": "3"
}
}'
{
"acknowledged" : true,
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "3"
}
}
},
"transient" : { }
}
Note: same quorum rule as in step 3; with 4 master-eligible nodes remaining, (4 / 2) + 1 = 3.
2) Check cluster node information
# curl http://192.168.2.14:11223/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.14 14 95 0 0.20 0.30 0.37 mdi - 192.168.2.14-11223
192.168.2.15 12 99 0 0.25 0.23 0.32 mdi * 192.168.2.15-11223
192.168.2.16 11 97 1 0.35 0.37 0.39 mdi - 192.168.2.16-11223
192.168.2.13 53 98 0 0.51 0.78 0.96 mdi - 192.168.2.13-11223
3) Exclude node 192.168.2.13:11223 from the cluster
# curl -H "Content-Type: application/json" -XPUT http://192.168.2.14:11223/_cluster/settings?pretty -d '{
"transient" : {
"cluster.routing.allocation.exclude._name" : "192.168.2.13-11223"
}}'
4) Confirm the node has been added to the exclude list
# curl -H "Content-Type: application/json" -XGET 'http://192.168.2.14:11223/_cluster/settings?pretty'
{
"persistent"