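Problem: after a power outage in the machine room, the machines were rebooted and the Elasticsearch cluster could no longer be accessed.
Accessing the cluster gave the following results.
Port 9200 was reachable, so at first glance Elasticsearch appeared to be running normally.
However, querying the cluster status returned an error.
Error response: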
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
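A response like the one above can be recognized programmatically. The following is only a minimal sketch: the function name `classify_cluster_error` and the wording of the diagnosis strings are assumptions made for illustration, not part of any Elasticsearch client.

```python
import json


def classify_cluster_error(status_code, body):
    """Return a short diagnosis for an Elasticsearch HTTP response body.

    Hypothetical helper: the returned strings are illustrative only.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return "unparseable response"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "no error reported"
    error_type = error.get("type")
    if status_code == 503 and error_type == "master_not_discovered_exception":
        # The node answers on HTTP but has not joined a cluster with a master.
        return "no elected master: check discovery settings and node logs"
    return "other error: {}".format(error_type)


# The error body shown above, copied verbatim.
response_body = (
    '{"error":{"root_cause":[{"type":"master_not_discovered_exception",'
    '"reason":null}],"type":"master_not_discovered_exception",'
    '"reason":null},"status":503}'
)
print(classify_cluster_error(503, response_body))
```

The point of the sketch is that a reachable port 9200 only proves the HTTP layer of one node is up; the 503 with `master_not_discovered_exception` says that node has not found a master.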
The Elasticsearch log showed the following:
[2022-05-02T14:40:47,383][WARN ][o.e.d.z.ZenDiscovery ] [node-4] not enough master nodes discovered during pinging (found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2022-05-02T14:40:50,385][WARN ][o.e.d.z.ZenDiscovery ] [node-4] not enough master nodes discovered during pinging (found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2022-05-02T14:40:53,386][WARN ][o.e.d.z.ZenDiscovery ] [node-4] not enough master nodes discovered during pinging (found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2022-05-02T14:40:56,387][WARN ][o.e.d.z.ZenDiscovery ] [node-4] not enough master nodes discovered during pinging (found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2022-05-02T14:40:59,389][WARN ][o.e.d.z.ZenDiscovery ] [node-4] not enough master nodes discovered during pinging (found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2022-05-02T14:41:02,390][WARN ][o.e.d.z.ZenDiscovery ] [node-4] not enough master nodes discovered during pinging (found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2022-05-02T14:42:42,197][INFO ][o.e.d.z.ZenDiscovery ] [node-4] failed to send join request to master [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[node-3][172.18.0.1:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}] not master for join request]; ], tried [3] times
[2022-05-02T14:44:21,203][INFO ][o.e.d.z.ZenDiscovery ] [node-4] failed to send join request to master [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[node-3][172.18.0.1:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}] not master for join request]; ], tried [3] times
[2022-05-02T14:46:00,210][INFO ][o.e.d.z.ZenDiscovery ] [node-4] failed to send join request to master [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[node-3][172.18.0.1:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}] not master for join request]; ], tried [3] times
[2022-05-02T14:46:39,054][INFO ][o.e.n.Node ] [node-4] stopping ...
[2022-05-02T14:46:39,060][INFO ][o.e.d.z.ZenDiscovery ] [node-4] failed to send join request to master [{node-3}{WMPX0xFLSbGVy8K7X_ewxw}{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}{ml.machine_memory=33607585792, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [IllegalStateException[Future got interrupted]; nested: InterruptedException; ]
[2022-05-02T14:46:39,060][INFO ][o.e.x.w.WatcherService ] [node-4] stopping watch service, reason [shutdown initiated]
[2022-05-02T14:46:39,544][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [node-4] [controller/103] [Main.cc@148] Ml controller exiting
[2022-05-02T14:46:39,545][INFO ][o.e.x.m.p.NativeController] [node-4] Native controller process has stopped - no new native processes can be started
[2022-05-02T14:46:39,567][INFO ][o.e.n.Node ] [node-4] stopped
[2022-05-02T14:46:39,567][INFO ][o.e.n.Node ] [node-4] closing ...
[2022-05-02T14:46:39,584][INFO ][o.e.n.Node ] [node-4] closed
Note the key error messages in the log above:
# Key message 1
not enough master nodes discovered during pinging (found [[Candidate{node={node-3}...{172.18.0.1}{172.18.0.1:9300
# Key message 2
failed to send join request to master [{node-3}...{172.18.0.1}{172.18.0.1:9300}
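Spotting the unexpected address by eye works for a short log, but the same check can be sketched in code. This is an illustrative example only: the function name `foreign_addresses` is hypothetical, the subnet `10.0.0.0/24` is an assumed stand-in for the real ES network (the original post does not state it), and the sample line is an abbreviated copy of the log above.

```python
import ipaddress
import re

# Matches the {ip}{ip:port} pair that ZenDiscovery prints for each node.
ADDRESS_PATTERN = re.compile(r"\{(\d{1,3}(?:\.\d{1,3}){3})\}\{\1:(\d+)\}")


def foreign_addresses(log_lines, expected_subnet):
    """Return sorted node IPs found in the log that lie outside expected_subnet."""
    network = ipaddress.ip_network(expected_subnet)
    found = set()
    for line in log_lines:
        for ip_text, _port in ADDRESS_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(ip_text)
            except ValueError:
                continue  # skip anything that only looks like an IP
            if address not in network:
                found.add(ip_text)
    return sorted(found)


# Abbreviated from the log above; 10.0.0.0/24 is an assumed ES subnet.
sample = [
    "[node-4] not enough master nodes discovered during pinging "
    "(found [[Candidate{node={node-3}{WMPX0xFLSbGVy8K7X_ewxw}"
    "{Q6kiIOiwS0yPBr6rSMnTIg}{172.18.0.1}{172.18.0.1:9300}",
]
print(foreign_addresses(sample, "10.0.0.0/24"))
```

Any address reported here is one that a node published but that does not belong to the network the cluster is supposed to use.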
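Question: why is the node sending requests to 172.18.0.1:9300? That address is clearly not in the same subnet as our ES nodes. Checking the ES configuration showed that network.host was set to 0.0.0.0.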
Solution: change network.host to the ES machine's own IP address. With that change the problem was resolved and ES could be accessed normally again.
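To find other nodes that still carry the wildcard setting, the configuration file can be scanned before the next restart. The sketch below is an assumption-laden illustration: the function name `wildcard_network_host` is hypothetical, the sample configuration text (including `cluster.name: demo`) is invented, and the parser only handles simple single-line `network.host` entries written with either `:` or `=`.

```python
import ipaddress
import re

# Accepts "network.host: value" (YAML style) and "network.host=value".
SETTING = re.compile(r"^\s*network\.host\s*[:=]\s*(.+?)\s*$")


def wildcard_network_host(config_text):
    """Return the network.host value if it is a wildcard address, else None."""
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        match = SETTING.match(line)
        if not match:
            continue
        value = match.group(1).strip("\"'")
        try:
            if ipaddress.ip_address(value).is_unspecified:
                return value  # 0.0.0.0 or :: means "all interfaces"
        except ValueError:
            pass  # host names and special values are not checked here
    return None


# Invented sample configuration, for illustration only.
sample_config = "cluster.name: demo\nnetwork.host: 0.0.0.0\n"
print(wildcard_network_host(sample_config))
```

A non-`None` result marks a node whose published address depends on whatever interfaces happen to exist at startup, which is the situation this post ran into.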
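Why did the same configuration work before the reboot but fail after it? A possible explanation is that while the machine was up, someone experimented with Docker on it, which changed the local address that 0.0.0.0 resolves to. After the reboot the node published 172.18.0.1 instead of its real address, so ES could not join the cluster and was unusable. Once network.host was changed to the machine's own IP address, the ES cluster returned to normal.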