MHA Failover测试-下

最新推荐文章于 2023-11-27 19:11:54 发布

Fan_-_

最新推荐文章于 2023-11-27 19:11:54 发布

阅读量268

点赞数

分类专栏： MHA MySQL

本文链接：https://blog.csdn.net/ashic/article/details/113748655

版权

MySQL 同时被 2 个专栏收录

109 篇文章 5 订阅

订阅专栏

MHA

6 篇文章 1 订阅

订阅专栏

MHA Failover测试-下

[用例测试] master挂了, 且slave也有问题1(部分slave宕机)

master挂了, 在此之前slave-1宕机了

ping_type=CONNECT

启动manager后, 关闭slave-1

Sat Oct 10 10:28:35 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 10:28:37 2020 - [info] GTID failover mode = 1
Sat Oct 10 10:28:37 2020 - [info] Dead Servers:
Sat Oct 10 10:28:37 2020 - [info] Alive Servers:
Sat Oct 10 10:28:37 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:28:37 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 10:28:37 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 10:28:37 2020 - [info] Alive Slaves:
Sat Oct 10 10:28:37 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 10:28:37 2020 - [info]     GTID ON
Sat Oct 10 10:28:37 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:28:37 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 10:28:37 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 10:28:37 2020 - [info]     GTID ON
Sat Oct 10 10:28:37 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:28:37 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 10:28:37 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:28:37 2020 - [info] Checking slave configurations..
Sat Oct 10 10:28:37 2020 - [info] Checking replication filtering settings..
Sat Oct 10 10:28:37 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 10:28:37 2020 - [info]  Replication filtering check ok.
Sat Oct 10 10:28:37 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 10:28:37 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 10:28:37 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 10:28:37 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 10:28:37 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 10:28:37 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 10:28:37 2020 - [info]  OK.
Sat Oct 10 10:28:37 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 10:28:37 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 10:28:37 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 10:28:37 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 10:28:37 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

关闭slave-1

[root@centos-2 10:19:38 ~]
#mysql -uroot -p -S /data/mysql_3358/run/mysql.sock dbms_monitor       
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2118
Server version: 5.7.31-34-log Percona Server (GPL), Release 34, Revision 2e68637

Copyright (c) 2009-2020 Percona LLC and/or its affiliates
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

root@localhost 10:20:52 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.03 sec)

关闭后, mha manager仍然是正常的

关闭master

[root@centos-1 10:20:35 ~]
#mysql -uroot -p -S /data/mysql_3358/run/mysql.sock dbms_monitor                                              
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3413
Server version: 5.7.31-34-log Percona Server (GPL), Release 34, Revision 2e68637

Copyright (c) 2009-2020 Percona LLC and/or its affiliates
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

root@localhost 10:20:47 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.03 sec)

结论: 会触发failover, 但failover失败

Sat Oct 10 10:50:38 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 10:50:38 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Sat Oct 10 10:50:38 2020 - [info] Executing SSH check script: exit 0
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Sat Oct 10 10:50:38 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 10:50:38 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 10:50:41 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 10:50:41 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 10:50:44 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 10:50:44 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 10:50:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 10:50:47 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 10:50:47 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 10:50:47 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 10:50:47 2020 - [warning] SSH is reachable.
Sat Oct 10 10:50:47 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 10:50:47 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 10:50:47 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 10:50:47 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 10:50:48 2020 - [info] GTID failover mode = 1
Sat Oct 10 10:50:48 2020 - [info] Dead Servers:
Sat Oct 10 10:50:48 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:50:48 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 10:50:48 2020 - [info] Alive Servers:
Sat Oct 10 10:50:48 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 10:50:48 2020 - [info] Alive Slaves:
Sat Oct 10 10:50:48 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 10:50:48 2020 - [info]     GTID ON
Sat Oct 10 10:50:48 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:50:48 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 10:50:48 2020 - [info] Checking slave configurations..
Sat Oct 10 10:50:48 2020 - [info] Checking replication filtering settings..
Sat Oct 10 10:50:48 2020 - [info]  Replication filtering check ok.
Sat Oct 10 10:50:48 2020 - [info] Master is down!
Sat Oct 10 10:50:48 2020 - [info] Terminating monitoring script.
Sat Oct 10 10:50:48 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 10:50:48 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 10:50:48 2020 - [info] Starting master failover.
Sat Oct 10 10:50:48 2020 - [info] 
Sat Oct 10 10:50:48 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 10:50:48 2020 - [info] 
Sat Oct 10 10:50:49 2020 - [info] GTID failover mode = 1
Sat Oct 10 10:50:49 2020 - [info] Dead Servers:
Sat Oct 10 10:50:49 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:50:49 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 10:50:49 2020 - [info] Checking master reachability via MySQL(double check)...
Sat Oct 10 10:50:49 2020 - [info]  ok.
Sat Oct 10 10:50:49 2020 - [info] Alive Servers:
Sat Oct 10 10:50:49 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 10:50:49 2020 - [info] Alive Slaves:
Sat Oct 10 10:50:49 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 10:50:49 2020 - [info]     GTID ON
Sat Oct 10 10:50:49 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:50:49 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 10:50:49 2020 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln492]  Server 172.16.120.11(172.16.120.11:3358) is dead, but must be alive! Check server settings.
Sat Oct 10 10:50:49 2020 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/share/perl5/MHA/MasterFailover.pm line 269.

ping_type=INSERT

其实和connect应该是一样的, 不过还是走一遍流程

启动manager后, 关闭slave-1

Sat Oct 10 10:59:07 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 10:59:09 2020 - [info] GTID failover mode = 1
Sat Oct 10 10:59:09 2020 - [info] Dead Servers:
Sat Oct 10 10:59:09 2020 - [info] Alive Servers:
Sat Oct 10 10:59:09 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:59:09 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 10:59:09 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 10:59:09 2020 - [info] Alive Slaves:
Sat Oct 10 10:59:09 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 10:59:09 2020 - [info]     GTID ON
Sat Oct 10 10:59:09 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:59:09 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 10:59:09 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 10:59:09 2020 - [info]     GTID ON
Sat Oct 10 10:59:09 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:59:09 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 10:59:09 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 10:59:09 2020 - [info] Checking slave configurations..
Sat Oct 10 10:59:09 2020 - [info] Checking replication filtering settings..
Sat Oct 10 10:59:09 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 10:59:09 2020 - [info]  Replication filtering check ok.
Sat Oct 10 10:59:09 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 10:59:09 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 10:59:09 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 10:59:09 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 10:59:09 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 10:59:09 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 10:59:09 2020 - [info]  OK.
Sat Oct 10 10:59:09 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 10:59:09 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 10:59:09 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 10:59:09 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 10:59:09 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..

关闭slave-1

[root@centos-2 10:56:24 /usr/local/Percona-Server-5.7.29-32-Linux.x86_64.ssl101]
#2020-10-10T03:00:49.502943Z mysqld_safe mysqld from pid file /data/mysql_3358/run/mysql.pid ended

关闭master

[root@centos-1 11:07:08 ~]
#mysql -uroot -p -S /data/mysql_3358/run/mysql.sock dbms_monitor    
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 35
Server version: 5.7.29-32-log Percona Server (GPL), Release 32, Revision 56bce88

Copyright (c) 2009-2020 Percona LLC and/or its affiliates
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

root@localhost 11:07:09 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

结论: 会触发failover, 但failover失败

Sat Oct 10 11:07:15 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Sat Oct 10 11:07:15 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 11:07:15 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Sat Oct 10 11:07:16 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 11:07:16 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 11:07:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:07:18 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 11:07:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:07:21 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 11:07:24 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:07:24 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 11:07:24 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 11:07:24 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 11:07:24 2020 - [warning] SSH is reachable.
Sat Oct 10 11:07:24 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 11:07:24 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 11:07:24 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:07:24 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:07:25 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:07:25 2020 - [info] Dead Servers:
Sat Oct 10 11:07:25 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:07:25 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:07:25 2020 - [info] Alive Servers:
Sat Oct 10 11:07:25 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:07:25 2020 - [info] Alive Slaves:
Sat Oct 10 11:07:25 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:07:25 2020 - [info]     GTID ON
Sat Oct 10 11:07:25 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:07:25 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:07:25 2020 - [info] Checking slave configurations..
Sat Oct 10 11:07:25 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:07:25 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:07:25 2020 - [info] Master is down!
Sat Oct 10 11:07:25 2020 - [info] Terminating monitoring script.
Sat Oct 10 11:07:25 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 11:07:25 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 11:07:25 2020 - [info] Starting master failover.
Sat Oct 10 11:07:25 2020 - [info] 
Sat Oct 10 11:07:25 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 11:07:25 2020 - [info] 
Sat Oct 10 11:07:26 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:07:26 2020 - [info] Dead Servers:
Sat Oct 10 11:07:26 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:07:26 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:07:26 2020 - [info] Alive Servers:
Sat Oct 10 11:07:26 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:07:26 2020 - [info] Alive Slaves:
Sat Oct 10 11:07:26 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:07:26 2020 - [info]     GTID ON
Sat Oct 10 11:07:26 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:07:26 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:07:26 2020 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln492]  Server 172.16.120.11(172.16.120.11:3358) is dead, but must be alive! Check server settings.
Sat Oct 10 11:07:26 2020 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/share/perl5/MHA/MasterFailover.pm line 269.

总结: 如不希望slave-1宕机影响failover, 需要在配置文件中对slave-1设置ignore_fail=1

[server2]
hostname=172.16.120.11
port=3358
candidate_master=1
ignore_fail=1

[用例测试] master挂了, 且slave也有问题2(部分slave io_thread stop)

master挂了, 在此之前slave-1 io_thread stop了

ping_type=CONNECT

启动manager后, 关闭slave-1 io_thread

Sat Oct 10 11:13:58 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 11:13:59 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:13:59 2020 - [info] Dead Servers:
Sat Oct 10 11:13:59 2020 - [info] Alive Servers:
Sat Oct 10 11:13:59 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:13:59 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:13:59 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:13:59 2020 - [info] Alive Slaves:
Sat Oct 10 11:13:59 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:13:59 2020 - [info]     GTID ON
Sat Oct 10 11:13:59 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:13:59 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:13:59 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:13:59 2020 - [info]     GTID ON
Sat Oct 10 11:13:59 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:13:59 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:13:59 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:13:59 2020 - [info] Checking slave configurations..
Sat Oct 10 11:13:59 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:13:59 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 11:13:59 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:13:59 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 11:13:59 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 11:13:59 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 11:13:59 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 11:13:59 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 11:13:59 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 11:14:00 2020 - [info]  OK.
Sat Oct 10 11:14:00 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 11:14:00 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 11:14:00 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 11:14:00 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 11:14:00 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

关闭slave-1

root@localhost 11:17:29 [dbms_monitor]> stop slave io_thread;
Query OK, 0 rows affected (0.01 sec)

root@localhost 11:17:33 [dbms_monitor]> pager cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos'
PAGER set to 'cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos''
root@localhost 11:17:49 [dbms_monitor]> show slave status\G
              Master_Log_File: mysql-bin.000011
          Read_Master_Log_Pos: 194
               Relay_Log_File: mysql-relay-bin.000009
                Relay_Log_Pos: 407
        Relay_Master_Log_File: mysql-bin.000011
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
                   Last_Errno: 0
                   Last_Error: 
          Exec_Master_Log_Pos: 194
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

关闭io_thread后 manager仍然正常

关闭master

root@localhost 11:21:18 [dbms_monitor]> insert into monitor_delay values(1,now());
Query OK, 1 row affected (0.01 sec)

root@localhost 11:21:39 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
+----+---------------------+
1 row in set (0.00 sec)

root@localhost 11:21:54 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

slave-1

root@localhost 11:17:49 [dbms_monitor]> select * from monitor_delay;
Empty set (0.01 sec)

slave-2

root@localhost 10:20:54 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
+----+---------------------+
1 row in set (0.01 sec)

结论: 会failover且成功

Sat Oct 10 11:22:12 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:22:12 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Sat Oct 10 11:22:12 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 11:22:12 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 11:22:12 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 11:22:15 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:22:15 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 11:22:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:22:18 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 11:22:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:22:21 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 11:22:21 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 11:22:21 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 11:22:21 2020 - [warning] SSH is reachable.
Sat Oct 10 11:22:21 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 11:22:21 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 11:22:21 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:22:21 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:22:22 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:22:22 2020 - [info] Dead Servers:
Sat Oct 10 11:22:22 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:22 2020 - [info] Alive Servers:
Sat Oct 10 11:22:22 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:22:22 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:22:22 2020 - [info] Alive Slaves:
Sat Oct 10 11:22:22 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:22 2020 - [info]     GTID ON
Sat Oct 10 11:22:22 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:22 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:22 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:22 2020 - [info]     GTID ON
Sat Oct 10 11:22:22 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:22 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:22 2020 - [info] Checking slave configurations..
Sat Oct 10 11:22:22 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:22:22 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:22:22 2020 - [info] Master is down!
Sat Oct 10 11:22:22 2020 - [info] Terminating monitoring script.
Sat Oct 10 11:22:22 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 11:22:22 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 11:22:22 2020 - [info] Starting master failover.
Sat Oct 10 11:22:22 2020 - [info] 
Sat Oct 10 11:22:22 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 11:22:22 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:22:23 2020 - [info] Dead Servers:
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info] Checking master reachability via MySQL(double check)...
Sat Oct 10 11:22:23 2020 - [info]  ok.
Sat Oct 10 11:22:23 2020 - [info] Alive Servers:
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:22:23 2020 - [info] Alive Slaves:
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:23 2020 - [info]     GTID ON
Sat Oct 10 11:22:23 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:23 2020 - [info]     GTID ON
Sat Oct 10 11:22:23 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:23 2020 - [info] Starting GTID based failover.
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Oct 10 11:22:23 2020 - [info] Executing master IP deactivation script:
Sat Oct 10 11:22:23 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 172.16.120.10 
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1 
Sat Oct 10 11:22:23 2020 - [info]  done.
Sat Oct 10 11:22:23 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Oct 10 11:22:23 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] * Phase 3: Master Recovery Phase..
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000011:486
Sat Oct 10 11:22:23 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20042-20531
Sat Oct 10 11:22:23 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:23 2020 - [info]     GTID ON
Sat Oct 10 11:22:23 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:23 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000011:194
Sat Oct 10 11:22:23 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20143-20530
Sat Oct 10 11:22:23 2020 - [info] Oldest slaves:
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:23 2020 - [info]     GTID ON
Sat Oct 10 11:22:23 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] Searching new master from slaves..
Sat Oct 10 11:22:23 2020 - [info]  Candidate masters from the configuration file:
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:23 2020 - [info]     GTID ON
Sat Oct 10 11:22:23 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:23 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:22:23 2020 - [info]     GTID ON
Sat Oct 10 11:22:23 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:22:23 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:22:23 2020 - [info]  Non-candidate masters:
Sat Oct 10 11:22:23 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Oct 10 11:22:23 2020 - [info] New master is 172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:22:23 2020 - [info] Starting master failover..
Sat Oct 10 11:22:23 2020 - [info] 
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.12(172.16.120.12:3358) (new master)
 +--172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Oct 10 11:22:23 2020 - [info] 
Sat Oct 10 11:22:23 2020 - [info]  Waiting all logs to be applied.. 
Sat Oct 10 11:22:23 2020 - [info]   done.
Sat Oct 10 11:22:23 2020 - [info] Getting new master's binlog name and position..
Sat Oct 10 11:22:23 2020 - [info]  mysql-bin.000007:3182161
Sat Oct 10 11:22:23 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.12', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sat Oct 10 11:22:23 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000007, 3182161, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20531,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Sat Oct 10 11:22:23 2020 - [info] Executing master IP activate script:
Sat Oct 10 11:22:23 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.12 --new_master_ip=172.16.120.12 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.12 
Fake!!! 新主库 rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0 
Set read_only=0 on the new master.
Creating app user on the new master..
Sat Oct 10 11:22:24 2020 - [info]  OK.
Sat Oct 10 11:22:24 2020 - [info] ** Finished master recovery successfully.
Sat Oct 10 11:22:24 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sat Oct 10 11:22:24 2020 - [info] 
Sat Oct 10 11:22:24 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sat Oct 10 11:22:24 2020 - [info] 
Sat Oct 10 11:22:24 2020 - [info] 
Sat Oct 10 11:22:24 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Oct 10 11:22:24 2020 - [info] 
Sat Oct 10 11:22:24 2020 - [info] -- Slave recovery on host 172.16.120.11(172.16.120.11:3358) started, pid: 77208. Check tmp log /masterha/cls_new//172.16.120.11_3358_20201010112222.log if it takes time..
Sat Oct 10 11:22:25 2020 - [info] 
Sat Oct 10 11:22:25 2020 - [info] Log messages from 172.16.120.11 ...
Sat Oct 10 11:22:25 2020 - [info] 
Sat Oct 10 11:22:24 2020 - [info]  Resetting slave 172.16.120.11(172.16.120.11:3358) and starting replication from the new master 172.16.120.12(172.16.120.12:3358)..
Sat Oct 10 11:22:24 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 11:22:24 2020 - [info]  Slave started.
Sat Oct 10 11:22:24 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20531,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.11(172.16.120.11:3358). Executed 2 events.
Sat Oct 10 11:22:25 2020 - [info] End of log messages from 172.16.120.11.
Sat Oct 10 11:22:25 2020 - [info] -- Slave on host 172.16.120.11(172.16.120.11:3358) started.
Sat Oct 10 11:22:25 2020 - [info] All new slave servers recovered successfully.
Sat Oct 10 11:22:25 2020 - [info] 
Sat Oct 10 11:22:25 2020 - [info] * Phase 5: New master cleanup phase..
Sat Oct 10 11:22:25 2020 - [info] 
Sat Oct 10 11:22:25 2020 - [info] Resetting slave info on the new master..
Sat Oct 10 11:22:25 2020 - [info]  172.16.120.12: Resetting slave info succeeded.
Sat Oct 10 11:22:25 2020 - [info] Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 11:22:25 2020 - [info] 

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.12(172.16.120.12:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.12(172.16.120.12:3358) as a new master.
172.16.120.12(172.16.120.12:3358): OK: Applying all logs succeeded.
172.16.120.12(172.16.120.12:3358): OK: Activated master IP address.
172.16.120.11(172.16.120.11:3358): OK: Slave started, replicating from 172.16.120.12(172.16.120.12:3358)
172.16.120.12(172.16.120.12:3358): Resetting slave info succeeded.
Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 11:22:25 2020 - [info] Sending mail..

slave-1正常change到new master slave-2

root@localhost 11:21:44 [dbms_monitor]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.120.12
                  Master_User: repler
                  Master_Port: 3358
                Connect_Retry: 1
              Master_Log_File: mysql-bin.000007
          Read_Master_Log_Pos: 3182161
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 681
        Relay_Master_Log_File: mysql-bin.000007
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3182161
              Relay_Log_Space: 888
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 120123358
                  Master_UUID: 45e70f96-fcad-11ea-a2f0-0050563108d2
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20531
            Executed_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20531,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

root@localhost 11:24:39 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
+----+---------------------+
1 row in set (0.00 sec)

ping_type=INSERT

启动manager后, 关闭slave-1 io_thread

Sat Oct 10 11:29:28 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 11:29:30 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:29:30 2020 - [info] Dead Servers:
Sat Oct 10 11:29:30 2020 - [info] Alive Servers:
Sat Oct 10 11:29:30 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:29:30 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:29:30 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:29:30 2020 - [info] Alive Slaves:
Sat Oct 10 11:29:30 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:29:30 2020 - [info]     GTID ON
Sat Oct 10 11:29:30 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:29:30 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:29:30 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:29:30 2020 - [info]     GTID ON
Sat Oct 10 11:29:30 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:29:30 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:29:30 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:29:30 2020 - [info] Checking slave configurations..
Sat Oct 10 11:29:30 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:29:30 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 11:29:30 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:29:30 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 11:29:30 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 11:29:30 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 11:29:30 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 11:29:30 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 11:29:30 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 11:29:30 2020 - [info]  OK.
Sat Oct 10 11:29:30 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 11:29:30 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 11:29:30 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 11:29:30 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 11:29:30 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..

关闭slave-1 io_thread

root@localhost 11:30:30 [dbms_monitor]> stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)

root@localhost 11:30:43 [dbms_monitor]> pager cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos'
PAGER set to 'cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos''
root@localhost 11:30:45 [dbms_monitor]> show slave status\G
              Master_Log_File: mysql-bin.000012
          Read_Master_Log_Pos: 18307
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 18480
        Relay_Master_Log_File: mysql-bin.000012
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
                   Last_Errno: 0
                   Last_Error: 
          Exec_Master_Log_Pos: 18307
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

关闭io_thread后 manager仍然正常

关闭master

root@localhost 11:31:40 [dbms_monitor]> insert into monitor_delay values(2,now());
Query OK, 1 row affected (0.00 sec)

root@localhost 11:31:45 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
+----+---------------------+
2 rows in set (0.00 sec)

root@localhost 11:31:51 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

slave-1

root@localhost 11:30:45 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
+----+---------------------+
1 row in set (0.00 sec)

slave-2

root@localhost 11:27:07 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
+----+---------------------+
2 rows in set (0.00 sec)

结论: 会failover且成功

Sat Oct 10 11:33:09 2020 - [warning] SSH is reachable.
Sat Oct 10 11:33:09 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 11:33:09 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 11:33:09 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:33:09 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:33:10 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:33:10 2020 - [info] Dead Servers:
Sat Oct 10 11:33:10 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:10 2020 - [info] Alive Servers:
Sat Oct 10 11:33:10 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:33:10 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:33:10 2020 - [info] Alive Slaves:
Sat Oct 10 11:33:10 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:10 2020 - [info]     GTID ON
Sat Oct 10 11:33:10 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:10 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:10 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:10 2020 - [info]     GTID ON
Sat Oct 10 11:33:10 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:10 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:10 2020 - [info] Checking slave configurations..
Sat Oct 10 11:33:10 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:33:10 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:33:10 2020 - [info] Master is down!
Sat Oct 10 11:33:10 2020 - [info] Terminating monitoring script.
Sat Oct 10 11:33:10 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 11:33:10 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 11:33:10 2020 - [info] Starting master failover.
Sat Oct 10 11:33:10 2020 - [info] 
Sat Oct 10 11:33:10 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 11:33:10 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:33:11 2020 - [info] Dead Servers:
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info] Alive Servers:
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:33:11 2020 - [info] Alive Slaves:
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:11 2020 - [info]     GTID ON
Sat Oct 10 11:33:11 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:11 2020 - [info]     GTID ON
Sat Oct 10 11:33:11 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:11 2020 - [info] Starting GTID based failover.
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Oct 10 11:33:11 2020 - [info] Executing master IP deactivation script:
Sat Oct 10 11:33:11 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 172.16.120.10 
RTNETLINK answers: Cannot assign requested address
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1 
Sat Oct 10 11:33:11 2020 - [info]  done.
Sat Oct 10 11:33:11 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Oct 10 11:33:11 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 3: Master Recovery Phase..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000012:50590
Sat Oct 10 11:33:11 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20532-20745
Sat Oct 10 11:33:11 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:11 2020 - [info]     GTID ON
Sat Oct 10 11:33:11 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:11 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000012:18307
Sat Oct 10 11:33:11 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20532-20608
Sat Oct 10 11:33:11 2020 - [info] Oldest slaves:
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:11 2020 - [info]     GTID ON
Sat Oct 10 11:33:11 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] Searching new master from slaves..
Sat Oct 10 11:33:11 2020 - [info]  Candidate masters from the configuration file:
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:11 2020 - [info]     GTID ON
Sat Oct 10 11:33:11 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:11 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:33:11 2020 - [info]     GTID ON
Sat Oct 10 11:33:11 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:33:11 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:33:11 2020 - [info]  Non-candidate masters:
Sat Oct 10 11:33:11 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Oct 10 11:33:11 2020 - [info] New master is 172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:33:11 2020 - [info] Starting master failover..
Sat Oct 10 11:33:11 2020 - [info] 
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.12(172.16.120.12:3358) (new master)
 +--172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info]  Waiting all logs to be applied.. 
Sat Oct 10 11:33:11 2020 - [info]   done.
Sat Oct 10 11:33:11 2020 - [info] Getting new master's binlog name and position..
Sat Oct 10 11:33:11 2020 - [info]  mysql-bin.000007:3232182
Sat Oct 10 11:33:11 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.12', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sat Oct 10 11:33:11 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000007, 3232182, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20745,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Sat Oct 10 11:33:11 2020 - [info] Executing master IP activate script:
Sat Oct 10 11:33:11 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.12 --new_master_ip=172.16.120.12 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.12 
RTNETLINK answers: File exists
Fake!!! 新主库 rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0 
Set read_only=0 on the new master.
Creating app user on the new master..
Sat Oct 10 11:33:11 2020 - [info]  OK.
Sat Oct 10 11:33:11 2020 - [info] ** Finished master recovery successfully.
Sat Oct 10 11:33:11 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Oct 10 11:33:11 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info] -- Slave recovery on host 172.16.120.11(172.16.120.11:3358) started, pid: 78319. Check tmp log /masterha/cls_new//172.16.120.11_3358_20201010113310.log if it takes time..
Sat Oct 10 11:33:12 2020 - [info] 
Sat Oct 10 11:33:12 2020 - [info] Log messages from 172.16.120.11 ...
Sat Oct 10 11:33:12 2020 - [info] 
Sat Oct 10 11:33:11 2020 - [info]  Resetting slave 172.16.120.11(172.16.120.11:3358) and starting replication from the new master 172.16.120.12(172.16.120.12:3358)..
Sat Oct 10 11:33:11 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 11:33:11 2020 - [info]  Slave started.
Sat Oct 10 11:33:12 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20745,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.11(172.16.120.11:3358). Executed 2 events.
Sat Oct 10 11:33:12 2020 - [info] End of log messages from 172.16.120.11.
Sat Oct 10 11:33:12 2020 - [info] -- Slave on host 172.16.120.11(172.16.120.11:3358) started.
Sat Oct 10 11:33:12 2020 - [info] All new slave servers recovered successfully.
Sat Oct 10 11:33:12 2020 - [info] 
Sat Oct 10 11:33:12 2020 - [info] * Phase 5: New master cleanup phase..
Sat Oct 10 11:33:12 2020 - [info] 
Sat Oct 10 11:33:12 2020 - [info] Resetting slave info on the new master..
Sat Oct 10 11:33:13 2020 - [info]  172.16.120.12: Resetting slave info succeeded.
Sat Oct 10 11:33:13 2020 - [info] Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 11:33:13 2020 - [info] 

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.12(172.16.120.12:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.12(172.16.120.12:3358) as a new master.
172.16.120.12(172.16.120.12:3358): OK: Applying all logs succeeded.
172.16.120.12(172.16.120.12:3358): OK: Activated master IP address.
172.16.120.11(172.16.120.11:3358): OK: Slave started, replicating from 172.16.120.12(172.16.120.12:3358)
172.16.120.12(172.16.120.12:3358): Resetting slave info succeeded.
Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 11:33:13 2020 - [info] Sending mail..

slave-1已经change到了slave-2

root@localhost 11:32:04 [dbms_monitor]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.120.12
                  Master_User: repler
                  Master_Port: 3358
                Connect_Retry: 1
              Master_Log_File: mysql-bin.000007
          Read_Master_Log_Pos: 3232182
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 32447
        Relay_Master_Log_File: mysql-bin.000007
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3232182
              Relay_Log_Space: 32654
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 120123358
                  Master_UUID: 45e70f96-fcad-11ea-a2f0-0050563108d2
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20609-20745
            Executed_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20745,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

root@localhost 11:34:29 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
+----+---------------------+
2 rows in set (0.00 sec)

[用例测试] master挂了, 且slave也有问题3(部分slave io_thread error)

master挂了, 在此之前slave-1 io_thread error了

ping_type=CONNECT

启动manager后, 调整master防火墙, 禁止slave-1访问

Sat Oct 10 11:58:40 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 11:58:41 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:58:41 2020 - [info] Dead Servers:
Sat Oct 10 11:58:41 2020 - [info] Alive Servers:
Sat Oct 10 11:58:41 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:58:41 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:58:41 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:58:41 2020 - [info] Alive Slaves:
Sat Oct 10 11:58:41 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:58:41 2020 - [info]     GTID ON
Sat Oct 10 11:58:41 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:58:41 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:58:41 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:58:41 2020 - [info]     GTID ON
Sat Oct 10 11:58:41 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:58:41 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:58:41 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:58:41 2020 - [info] Checking slave configurations..
Sat Oct 10 11:58:41 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:58:41 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 11:58:41 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:58:41 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 11:58:41 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 11:58:41 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 11:58:41 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 11:58:41 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 11:58:41 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 11:58:41 2020 - [info]  OK.
Sat Oct 10 11:58:41 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 11:58:41 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 11:58:41 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 11:58:41 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 11:58:41 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

调整master防火墙, 禁止slave-1访问

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.13 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

kill slave-1 io_thread

root@localhost 12:04:35 [dbms_monitor]> show processlist;
+-----+----------+---------------------+--------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id  | User     | Host                | db           | Command          | Time | State                                                         | Info             | Rows_sent | Rows_examined |
+-----+----------+---------------------+--------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
|   2 | proxysql | 172.16.120.12:33384 | NULL         | Sleep            |    9 |                                                               | NULL             |         0 |             0 |
|   3 | root     | localhost           | dbms_monitor | Query            |    0 | starting                                                      | show processlist |         0 |             0 |
|   4 | proxysql | 172.16.120.10:34072 | NULL         | Sleep            |    4 |                                                               | NULL             |         0 |             0 |
|   6 | proxysql | 172.16.120.11:35090 | NULL         | Sleep            |    2 |                                                               | NULL             |         0 |             0 |
|   7 | repler   | 172.16.120.11:35092 | NULL         | Binlog Dump GTID |  485 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
|   8 | repler   | 172.16.120.12:33392 | NULL         | Binlog Dump GTID |  472 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 110 | proxysql | 172.16.120.10:34094 | NULL         | Sleep            |    4 |                                                               | NULL             |         1 |             0 |
+-----+----------+---------------------+--------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
7 rows in set (0.00 sec)

root@localhost 12:04:46 [dbms_monitor]> kill 7;
Query OK, 0 rows affected (0.00 sec)

root@localhost 12:05:20 [dbms_monitor]> insert into monitor_delay values(6,now());
Query OK, 1 row affected (0.00 sec)

slave-1

root@localhost 12:05:02 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
|  3 | 2020-10-10 11:51:01 |
|  4 | 2020-10-10 11:59:15 |
|  5 | 2020-10-10 12:04:35 |
+----+---------------------+
5 rows in set (0.00 sec)


root@localhost 12:07:32 [dbms_monitor]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 172.16.120.10
                  Master_User: repler
                  Master_Port: 3358
                Connect_Retry: 1
              Master_Log_File: mysql-bin.000014
          Read_Master_Log_Pos: 778
               Relay_Log_File: mysql-relay-bin.000003
                Relay_Log_Pos: 605
        Relay_Master_Log_File: mysql-bin.000014
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 778
              Relay_Log_Space: 1317
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'repler@172.16.120.10:3358' - retry-time: 1  retries: 2
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 120103358
                  Master_UUID: 44a4ea53-fcad-11ea-bd16-0050563b7b42
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 201010 12:06:57
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20747-20748
            Executed_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20748,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

slave-2

root@localhost 12:04:42 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
|  3 | 2020-10-10 11:51:01 |
|  4 | 2020-10-10 11:59:15 |
|  5 | 2020-10-10 12:04:35 |
|  6 | 2020-10-10 12:05:30 |
+----+---------------------+
6 rows in set (0.00 sec)

关闭master

root@localhost 12:21:24 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

结论: 会failover且成功

Sat Oct 10 12:11:18 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:11:18 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Sat Oct 10 12:11:18 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 12:11:19 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 12:11:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:11:21 2020 - [warning] Connection failed 2 time(s)..
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 12:11:24 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 12:11:24 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:11:24 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 12:11:27 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:11:27 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 12:11:27 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 12:11:27 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 12:11:27 2020 - [warning] SSH is reachable.
Sat Oct 10 12:11:27 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 12:11:27 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 12:11:27 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 12:11:27 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 12:11:28 2020 - [info] GTID failover mode = 1
Sat Oct 10 12:11:28 2020 - [info] Dead Servers:
Sat Oct 10 12:11:28 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:28 2020 - [info] Alive Servers:
Sat Oct 10 12:11:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:11:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:11:28 2020 - [info] Alive Slaves:
Sat Oct 10 12:11:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:28 2020 - [info]     GTID ON
Sat Oct 10 12:11:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:28 2020 - [info]     GTID ON
Sat Oct 10 12:11:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:28 2020 - [info] Checking slave configurations..
Sat Oct 10 12:11:28 2020 - [info] Checking replication filtering settings..
Sat Oct 10 12:11:28 2020 - [info]  Replication filtering check ok.
Sat Oct 10 12:11:28 2020 - [info] Master is down!
Sat Oct 10 12:11:28 2020 - [info] Terminating monitoring script.
Sat Oct 10 12:11:28 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 12:11:28 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 12:11:28 2020 - [info] Starting master failover.
Sat Oct 10 12:11:28 2020 - [info] 
Sat Oct 10 12:11:28 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 12:11:28 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] GTID failover mode = 1
Sat Oct 10 12:11:29 2020 - [info] Dead Servers:
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info] Checking master reachability via MySQL(double check)...
Sat Oct 10 12:11:29 2020 - [info]  ok.
Sat Oct 10 12:11:29 2020 - [info] Alive Servers:
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:11:29 2020 - [info] Alive Slaves:
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:29 2020 - [info]     GTID ON
Sat Oct 10 12:11:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:29 2020 - [info]     GTID ON
Sat Oct 10 12:11:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:29 2020 - [info] Starting GTID based failover.
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Oct 10 12:11:29 2020 - [info] Executing master IP deactivation script:
Sat Oct 10 12:11:29 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 172.16.120.10 
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1 
Sat Oct 10 12:11:29 2020 - [info]  done.
Sat Oct 10 12:11:29 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Oct 10 12:11:29 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] * Phase 3: Master Recovery Phase..
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000014:1070
Sat Oct 10 12:11:29 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20747-20749
Sat Oct 10 12:11:29 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:29 2020 - [info]     GTID ON
Sat Oct 10 12:11:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:29 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000014:778
Sat Oct 10 12:11:29 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20747-20748
Sat Oct 10 12:11:29 2020 - [info] Oldest slaves:
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:29 2020 - [info]     GTID ON
Sat Oct 10 12:11:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] Searching new master from slaves..
Sat Oct 10 12:11:29 2020 - [info]  Candidate masters from the configuration file:
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:29 2020 - [info]     GTID ON
Sat Oct 10 12:11:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:11:29 2020 - [info]     GTID ON
Sat Oct 10 12:11:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:11:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:11:29 2020 - [info]  Non-candidate masters:
Sat Oct 10 12:11:29 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Oct 10 12:11:29 2020 - [info] New master is 172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:11:29 2020 - [info] Starting master failover..
Sat Oct 10 12:11:29 2020 - [info] 
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.12(172.16.120.12:3358) (new master)
 +--172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Oct 10 12:11:29 2020 - [info] 
Sat Oct 10 12:11:29 2020 - [info]  Waiting all logs to be applied.. 
Sat Oct 10 12:11:29 2020 - [info]   done.
Sat Oct 10 12:11:29 2020 - [info] Getting new master's binlog name and position..
Sat Oct 10 12:11:29 2020 - [info]  mysql-bin.000007:3233250
Sat Oct 10 12:11:29 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.12', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sat Oct 10 12:11:29 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000007, 3233250, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20749,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Sat Oct 10 12:11:29 2020 - [info] Executing master IP activate script:
Sat Oct 10 12:11:29 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.12 --new_master_ip=172.16.120.12 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.12 
RTNETLINK answers: File exists
Fake!!! 新主库 rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0 
Set read_only=0 on the new master.
Creating app user on the new master..
Sat Oct 10 12:11:30 2020 - [info]  OK.
Sat Oct 10 12:11:30 2020 - [info] ** Finished master recovery successfully.
Sat Oct 10 12:11:30 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sat Oct 10 12:11:30 2020 - [info] 
Sat Oct 10 12:11:30 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sat Oct 10 12:11:30 2020 - [info] 
Sat Oct 10 12:11:30 2020 - [info] 
Sat Oct 10 12:11:30 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Oct 10 12:11:30 2020 - [info] 
Sat Oct 10 12:11:30 2020 - [info] -- Slave recovery on host 172.16.120.11(172.16.120.11:3358) started, pid: 81557. Check tmp log /masterha/cls_new//172.16.120.11_3358_20201010121128.log if it takes time..
Sat Oct 10 12:11:31 2020 - [info] 
Sat Oct 10 12:11:31 2020 - [info] Log messages from 172.16.120.11 ...
Sat Oct 10 12:11:31 2020 - [info] 
Sat Oct 10 12:11:30 2020 - [info]  Resetting slave 172.16.120.11(172.16.120.11:3358) and starting replication from the new master 172.16.120.12(172.16.120.12:3358)..
Sat Oct 10 12:11:30 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 12:11:30 2020 - [info]  Slave started.
Sat Oct 10 12:11:30 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20749,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.11(172.16.120.11:3358). Executed 2 events.
Sat Oct 10 12:11:31 2020 - [info] End of log messages from 172.16.120.11.
Sat Oct 10 12:11:31 2020 - [info] -- Slave on host 172.16.120.11(172.16.120.11:3358) started.
Sat Oct 10 12:11:31 2020 - [info] All new slave servers recovered successfully.
Sat Oct 10 12:11:31 2020 - [info] 
Sat Oct 10 12:11:31 2020 - [info] * Phase 5: New master cleanup phase..
Sat Oct 10 12:11:31 2020 - [info] 
Sat Oct 10 12:11:31 2020 - [info] Resetting slave info on the new master..
Sat Oct 10 12:11:31 2020 - [info]  172.16.120.12: Resetting slave info succeeded.
Sat Oct 10 12:11:31 2020 - [info] Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 12:11:31 2020 - [info] 

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.12(172.16.120.12:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.12(172.16.120.12:3358) as a new master.
172.16.120.12(172.16.120.12:3358): OK: Applying all logs succeeded.
172.16.120.12(172.16.120.12:3358): OK: Activated master IP address.
172.16.120.11(172.16.120.11:3358): OK: Slave started, replicating from 172.16.120.12(172.16.120.12:3358)
172.16.120.12(172.16.120.12:3358): Resetting slave info succeeded.
Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 12:11:31 2020 - [info] Sending mail..

ping_type=INSERT

启动manager后, 调整master防火墙, 禁止slave-1访问

Sat Oct 10 12:14:59 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 12:15:00 2020 - [info] GTID failover mode = 1
Sat Oct 10 12:15:00 2020 - [info] Dead Servers:
Sat Oct 10 12:15:00 2020 - [info] Alive Servers:
Sat Oct 10 12:15:00 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:15:00 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:15:00 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:15:00 2020 - [info] Alive Slaves:
Sat Oct 10 12:15:00 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:15:00 2020 - [info]     GTID ON
Sat Oct 10 12:15:00 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:15:00 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:15:00 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:15:00 2020 - [info]     GTID ON
Sat Oct 10 12:15:00 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:15:00 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:15:00 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:15:00 2020 - [info] Checking slave configurations..
Sat Oct 10 12:15:00 2020 - [info] Checking replication filtering settings..
Sat Oct 10 12:15:00 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 12:15:00 2020 - [info]  Replication filtering check ok.
Sat Oct 10 12:15:00 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 12:15:00 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 12:15:00 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 12:15:00 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 12:15:00 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 12:15:00 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 12:15:00 2020 - [info]  OK.
Sat Oct 10 12:15:00 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 12:15:00 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 12:15:00 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 12:15:00 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 12:15:01 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..

调整master防火墙, 禁止slave-1访问

IPTABLES="/sbin/iptables"
$IPTABLES -F
$IPTABLES -A INPUT -p icmp --icmp-type any -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.10 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.13 -j ACCEPT
$IPTABLES -A INPUT -p tcp -s 172.16.120.12 -j ACCEPT
$IPTABLES -A INPUT -p tcp --syn -j DROP

在master kill slave-1 io_thread

root@localhost 12:13:11 [dbms_monitor]> show processlist;
+----+----------+---------------------+--------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
| Id | User     | Host                | db           | Command          | Time | State                                                         | Info             | Rows_sent | Rows_examined |
+----+----------+---------------------+--------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
|  2 | proxysql | 172.16.120.12:33486 | NULL         | Sleep            |    9 |                                                               | NULL             |         0 |             0 |
|  3 | root     | localhost           | dbms_monitor | Query            |    0 | starting                                                      | show processlist |         0 |             0 |
|  4 | proxysql | 172.16.120.10:34148 | NULL         | Sleep            |    4 |                                                               | NULL             |         0 |             0 |
|  5 | proxysql | 172.16.120.11:35190 | NULL         | Sleep            |    2 |                                                               | NULL             |         0 |             0 |
|  8 | repler   | 172.16.120.11:35200 | NULL         | Binlog Dump GTID |  428 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
|  9 | repler   | 172.16.120.12:33490 | NULL         | Binlog Dump GTID |  379 | Master has sent all binlog to slave; waiting for more updates | NULL             |         0 |             0 |
| 13 | mha      | 172.16.120.13:40526 | NULL         | Sleep            |    0 |                                                               | NULL             |         0 |             0 |
| 17 | proxysql | 172.16.120.10:34164 | NULL         | Sleep            |   14 |                                                               | NULL             |         1 |             0 |
+----+----------+---------------------+--------------+------------------+------+---------------------------------------------------------------+------------------+-----------+---------------+
8 rows in set (0.00 sec)

root@localhost 12:20:46 [dbms_monitor]> kill 8;
Query OK, 0 rows affected (0.00 sec)

root@localhost 12:20:59 [dbms_monitor]> truncate table monitor_delay;
Query OK, 0 rows affected (0.01 sec)

root@localhost 12:21:10 [dbms_monitor]> insert into monitor_delay values(88,now());
Query OK, 1 row affected (0.00 sec)

root@localhost 12:21:17 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
| 88 | 2020-10-10 12:21:17 |
+----+---------------------+
1 row in set (0.00 sec)

slave-1

root@localhost 12:21:29 [dbms_monitor]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 172.16.120.10
                  Master_User: repler
                  Master_Port: 3358
                Connect_Retry: 1
              Master_Log_File: mysql-bin.000015
          Read_Master_Log_Pos: 85472
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 85645
        Relay_Master_Log_File: mysql-bin.000015
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 85472
              Relay_Log_Space: 85852
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'repler@172.16.120.10:3358' - retry-time: 1  retries: 1
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 120103358
                  Master_UUID: 44a4ea53-fcad-11ea-bd16-0050563b7b42
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 201010 12:21:59
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20750-21111
            Executed_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-21111,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

root@localhost 12:22:03 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
|  3 | 2020-10-10 11:51:01 |
|  4 | 2020-10-10 11:59:15 |
|  5 | 2020-10-10 12:04:35 |
|  6 | 2020-10-10 12:05:30 |
+----+---------------------+
6 rows in set (0.00 sec)

slave-2

root@localhost 12:21:32 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
| 88 | 2020-10-10 12:21:17 |
+----+---------------------+
1 row in set (0.00 sec)

关闭master

root@localhost 12:21:24 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

结论: 会failover且成功

Sat Oct 10 12:22:43 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Sat Oct 10 12:22:43 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 12:22:43 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Sat Oct 10 12:22:43 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 12:22:46 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:22:46 2020 - [warning] Connection failed 2 time(s)..
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 12:22:48 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 12:22:49 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:22:49 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 12:22:52 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 12:22:52 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 12:22:52 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 12:22:52 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 12:22:52 2020 - [warning] SSH is reachable.
Sat Oct 10 12:22:52 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 12:22:52 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 12:22:52 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 12:22:52 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 12:22:53 2020 - [info] GTID failover mode = 1
Sat Oct 10 12:22:53 2020 - [info] Dead Servers:
Sat Oct 10 12:22:53 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:53 2020 - [info] Alive Servers:
Sat Oct 10 12:22:53 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:22:53 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:22:53 2020 - [info] Alive Slaves:
Sat Oct 10 12:22:53 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:53 2020 - [info]     GTID ON
Sat Oct 10 12:22:53 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:53 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:53 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:53 2020 - [info]     GTID ON
Sat Oct 10 12:22:53 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:53 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:53 2020 - [info] Checking slave configurations..
Sat Oct 10 12:22:53 2020 - [info] Checking replication filtering settings..
Sat Oct 10 12:22:53 2020 - [info]  Replication filtering check ok.
Sat Oct 10 12:22:53 2020 - [info] Master is down!
Sat Oct 10 12:22:53 2020 - [info] Terminating monitoring script.
Sat Oct 10 12:22:53 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 12:22:53 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 12:22:53 2020 - [info] Starting master failover.
Sat Oct 10 12:22:53 2020 - [info] 
Sat Oct 10 12:22:53 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 12:22:53 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] GTID failover mode = 1
Sat Oct 10 12:22:54 2020 - [info] Dead Servers:
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info] Alive Servers:
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:22:54 2020 - [info] Alive Slaves:
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:54 2020 - [info]     GTID ON
Sat Oct 10 12:22:54 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:54 2020 - [info]     GTID ON
Sat Oct 10 12:22:54 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:54 2020 - [info] Starting GTID based failover.
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Oct 10 12:22:54 2020 - [info] Executing master IP deactivation script:
Sat Oct 10 12:22:54 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 172.16.120.10 
RTNETLINK answers: Cannot assign requested address
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1 
Sat Oct 10 12:22:54 2020 - [info]  done.
Sat Oct 10 12:22:54 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Oct 10 12:22:54 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 3: Master Recovery Phase..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000015:110146
Sat Oct 10 12:22:54 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20750-21216
Sat Oct 10 12:22:54 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:54 2020 - [info]     GTID ON
Sat Oct 10 12:22:54 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:54 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000015:85472
Sat Oct 10 12:22:54 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20750-21111
Sat Oct 10 12:22:54 2020 - [info] Oldest slaves:
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:54 2020 - [info]     GTID ON
Sat Oct 10 12:22:54 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] Searching new master from slaves..
Sat Oct 10 12:22:54 2020 - [info]  Candidate masters from the configuration file:
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:54 2020 - [info]     GTID ON
Sat Oct 10 12:22:54 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:54 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 12:22:54 2020 - [info]     GTID ON
Sat Oct 10 12:22:54 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 12:22:54 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 12:22:54 2020 - [info]  Non-candidate masters:
Sat Oct 10 12:22:54 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Oct 10 12:22:54 2020 - [info] New master is 172.16.120.12(172.16.120.12:3358)
Sat Oct 10 12:22:54 2020 - [info] Starting master failover..
Sat Oct 10 12:22:54 2020 - [info] 
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.12(172.16.120.12:3358) (new master)
 +--172.16.120.11(172.16.120.11:3358)
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info]  Waiting all logs to be applied.. 
Sat Oct 10 12:22:54 2020 - [info]   done.
Sat Oct 10 12:22:54 2020 - [info] Getting new master's binlog name and position..
Sat Oct 10 12:22:54 2020 - [info]  mysql-bin.000007:3342407
Sat Oct 10 12:22:54 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.12', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sat Oct 10 12:22:54 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000007, 3342407, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-21216,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Sat Oct 10 12:22:54 2020 - [info] Executing master IP activate script:
Sat Oct 10 12:22:54 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.12 --new_master_ip=172.16.120.12 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.12 
RTNETLINK answers: File exists
Fake!!! 新主库 rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0 
Set read_only=0 on the new master.
Creating app user on the new master..
Sat Oct 10 12:22:54 2020 - [info]  OK.
Sat Oct 10 12:22:54 2020 - [info] ** Finished master recovery successfully.
Sat Oct 10 12:22:54 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Oct 10 12:22:54 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info] -- Slave recovery on host 172.16.120.11(172.16.120.11:3358) started, pid: 82756. Check tmp log /masterha/cls_new//172.16.120.11_3358_20201010122253.log if it takes time..
Sat Oct 10 12:22:55 2020 - [info] 
Sat Oct 10 12:22:55 2020 - [info] Log messages from 172.16.120.11 ...
Sat Oct 10 12:22:55 2020 - [info] 
Sat Oct 10 12:22:54 2020 - [info]  Resetting slave 172.16.120.11(172.16.120.11:3358) and starting replication from the new master 172.16.120.12(172.16.120.12:3358)..
Sat Oct 10 12:22:54 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 12:22:54 2020 - [info]  Slave started.
Sat Oct 10 12:22:55 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-21216,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.11(172.16.120.11:3358). Executed 2 events.
Sat Oct 10 12:22:55 2020 - [info] End of log messages from 172.16.120.11.
Sat Oct 10 12:22:55 2020 - [info] -- Slave on host 172.16.120.11(172.16.120.11:3358) started.
Sat Oct 10 12:22:55 2020 - [info] All new slave servers recovered successfully.
Sat Oct 10 12:22:55 2020 - [info] 
Sat Oct 10 12:22:55 2020 - [info] * Phase 5: New master cleanup phase..
Sat Oct 10 12:22:55 2020 - [info] 
Sat Oct 10 12:22:55 2020 - [info] Resetting slave info on the new master..
Sat Oct 10 12:22:55 2020 - [info]  172.16.120.12: Resetting slave info succeeded.
Sat Oct 10 12:22:55 2020 - [info] Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 12:22:55 2020 - [info] 

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.12(172.16.120.12:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.12(172.16.120.12:3358) as a new master.
172.16.120.12(172.16.120.12:3358): OK: Applying all logs succeeded.
172.16.120.12(172.16.120.12:3358): OK: Activated master IP address.
172.16.120.11(172.16.120.11:3358): OK: Slave started, replicating from 172.16.120.12(172.16.120.12:3358)
172.16.120.12(172.16.120.12:3358): Resetting slave info succeeded.
Master failover to 172.16.120.12(172.16.120.12:3358) completed successfully.
Sat Oct 10 12:22:55 2020 - [info] Sending mail..

[用例测试] master挂了, 且slave也有问题4(部分slave sql_thread stop)

master挂了, 在此之前slave-1 sql_thread stop了

ping_type=CONNECT

启动manager后, 关闭slave-1 sql_thread

Sat Oct 10 11:43:34 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 11:43:35 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:43:35 2020 - [info] Dead Servers:
Sat Oct 10 11:43:35 2020 - [info] Alive Servers:
Sat Oct 10 11:43:35 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:43:35 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:43:35 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:43:35 2020 - [info] Alive Slaves:
Sat Oct 10 11:43:35 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:43:35 2020 - [info]     GTID ON
Sat Oct 10 11:43:35 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:43:35 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:43:35 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:43:35 2020 - [info]     GTID ON
Sat Oct 10 11:43:35 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:43:35 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:43:35 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:43:35 2020 - [info] Checking slave configurations..
Sat Oct 10 11:43:35 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:43:35 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 11:43:35 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:43:35 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 11:43:35 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 11:43:35 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 11:43:35 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 11:43:35 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 11:43:35 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 11:43:35 2020 - [info]  OK.
Sat Oct 10 11:43:35 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 11:43:35 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 11:43:35 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 11:43:35 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 11:43:35 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

关闭slave-1 sql_thread

root@localhost 11:44:26 [dbms_monitor]> stop slave sql_thread;
Query OK, 0 rows affected (0.01 sec)

root@localhost 11:50:00 [dbms_monitor]> pager cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos'
PAGER set to 'cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos''
root@localhost 11:50:04 [dbms_monitor]> show slave status\G
              Master_Log_File: mysql-bin.000013
          Read_Master_Log_Pos: 194
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 367
        Relay_Master_Log_File: mysql-bin.000013
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 0
                   Last_Error: 
          Exec_Master_Log_Pos: 194
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
      Slave_SQL_Running_State: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

关闭sql_thread后 manager仍然正常

关闭master

root@localhost 11:40:00 [dbms_monitor]> insert into monitor_delay values(3,now());
Query OK, 1 row affected (0.00 sec)

root@localhost 11:51:01 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

slave-1

root@localhost 11:50:04 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
+----+---------------------+
2 rows in set (0.00 sec)

slave-2

root@localhost 11:36:17 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
|  3 | 2020-10-10 11:51:01 |
+----+---------------------+
3 rows in set (0.00 sec)

结论: 会failover且成功

Sat Oct 10 11:51:18 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:51:18 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Sat Oct 10 11:51:18 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 11:51:18 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 11:51:18 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 11:51:21 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:51:21 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 11:51:24 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:51:24 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 11:51:27 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 11:51:27 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 11:51:27 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 11:51:27 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 11:51:27 2020 - [warning] SSH is reachable.
Sat Oct 10 11:51:27 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 11:51:27 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 11:51:27 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:51:27 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 11:51:28 2020 - [warning] SQL Thread is stopped(no error) on 172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:51:28 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:51:28 2020 - [info] Dead Servers:
Sat Oct 10 11:51:28 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:28 2020 - [info] Alive Servers:
Sat Oct 10 11:51:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:51:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:51:28 2020 - [info] Alive Slaves:
Sat Oct 10 11:51:28 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:28 2020 - [info]     GTID ON
Sat Oct 10 11:51:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:28 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:28 2020 - [info]     GTID ON
Sat Oct 10 11:51:28 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:28 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:28 2020 - [info] Checking slave configurations..
Sat Oct 10 11:51:28 2020 - [info] Checking replication filtering settings..
Sat Oct 10 11:51:28 2020 - [info]  Replication filtering check ok.
Sat Oct 10 11:51:28 2020 - [info] Master is down!
Sat Oct 10 11:51:28 2020 - [info] Terminating monitoring script.
Sat Oct 10 11:51:28 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 11:51:28 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 11:51:28 2020 - [info] Starting master failover.
Sat Oct 10 11:51:28 2020 - [info] 
Sat Oct 10 11:51:28 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 11:51:28 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [warning] SQL Thread is stopped(no error) on 172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:51:29 2020 - [info] GTID failover mode = 1
Sat Oct 10 11:51:29 2020 - [info] Dead Servers:
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info] Checking master reachability via MySQL(double check)...
Sat Oct 10 11:51:29 2020 - [info]  ok.
Sat Oct 10 11:51:29 2020 - [info] Alive Servers:
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:51:29 2020 - [info] Alive Slaves:
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info]  Starting SQL thread on 172.16.120.11(172.16.120.11:3358) ..
Sat Oct 10 11:51:29 2020 - [info]   done.
Sat Oct 10 11:51:29 2020 - [info] Starting GTID based failover.
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Oct 10 11:51:29 2020 - [info] Executing master IP deactivation script:
Sat Oct 10 11:51:29 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 172.16.120.10 
RTNETLINK answers: Cannot assign requested address
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1 
Sat Oct 10 11:51:29 2020 - [info]  done.
Sat Oct 10 11:51:29 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Oct 10 11:51:29 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 3: Master Recovery Phase..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:486
Sat Oct 10 11:51:29 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20746
Sat Oct 10 11:51:29 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:486
Sat Oct 10 11:51:29 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:20746
Sat Oct 10 11:51:29 2020 - [info] Oldest slaves:
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] Searching new master from slaves..
Sat Oct 10 11:51:29 2020 - [info]  Candidate masters from the configuration file:
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 11:51:29 2020 - [info]     GTID ON
Sat Oct 10 11:51:29 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 11:51:29 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 11:51:29 2020 - [info]  Non-candidate masters:
Sat Oct 10 11:51:29 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Oct 10 11:51:29 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Sat Oct 10 11:51:29 2020 - [info] Starting master failover..
Sat Oct 10 11:51:29 2020 - [info] 
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info]  Waiting all logs to be applied.. 
Sat Oct 10 11:51:29 2020 - [info]   done.
Sat Oct 10 11:51:29 2020 - [info]  Replicating from the latest slave 172.16.120.12(172.16.120.12:3358) and waiting to apply..
Sat Oct 10 11:51:29 2020 - [info]  Waiting all logs to be applied on the latest slave.. 
Sat Oct 10 11:51:29 2020 - [info]  Resetting slave 172.16.120.11(172.16.120.11:3358) and starting replication from the new master 172.16.120.12(172.16.120.12:3358)..
Sat Oct 10 11:51:29 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 11:51:29 2020 - [info]  Slave started.
Sat Oct 10 11:51:29 2020 - [info]  Waiting to execute all relay logs on 172.16.120.11(172.16.120.11:3358)..
Sat Oct 10 11:51:29 2020 - [info]  master_pos_wait(mysql-bin.000007:3232449) completed on 172.16.120.11(172.16.120.11:3358). Executed 1 events.
Sat Oct 10 11:51:29 2020 - [info]   done.
Sat Oct 10 11:51:29 2020 - [info]   done.
Sat Oct 10 11:51:29 2020 - [info] Getting new master's binlog name and position..
Sat Oct 10 11:51:29 2020 - [info]  mysql-bin.000010:141523
Sat Oct 10 11:51:29 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sat Oct 10 11:51:29 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000010, 141523, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20746,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Sat Oct 10 11:51:29 2020 - [info] Executing master IP activate script:
Sat Oct 10 11:51:29 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11 
Fake!!! 新主库 rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0 
Set read_only=0 on the new master.
Creating app user on the new master..
Sat Oct 10 11:51:29 2020 - [info]  OK.
Sat Oct 10 11:51:29 2020 - [info] ** Finished master recovery successfully.
Sat Oct 10 11:51:29 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Oct 10 11:51:29 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 79937. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201010115128.log if it takes time..
Sat Oct 10 11:51:30 2020 - [info] 
Sat Oct 10 11:51:30 2020 - [info] Log messages from 172.16.120.12 ...
Sat Oct 10 11:51:30 2020 - [info] 
Sat Oct 10 11:51:29 2020 - [info]  Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Sat Oct 10 11:51:29 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 11:51:29 2020 - [info]  Slave started.
Sat Oct 10 11:51:29 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-20746,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Sat Oct 10 11:51:30 2020 - [info] End of log messages from 172.16.120.12.
Sat Oct 10 11:51:30 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Sat Oct 10 11:51:30 2020 - [info] All new slave servers recovered successfully.
Sat Oct 10 11:51:30 2020 - [info] 
Sat Oct 10 11:51:30 2020 - [info] * Phase 5: New master cleanup phase..
Sat Oct 10 11:51:30 2020 - [info] 
Sat Oct 10 11:51:30 2020 - [info] Resetting slave info on the new master..
Sat Oct 10 11:51:30 2020 - [info]  172.16.120.11: Resetting slave info succeeded.
Sat Oct 10 11:51:30 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Sat Oct 10 11:51:30 2020 - [info] 

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Sat Oct 10 11:51:30 2020 - [info] Sending mail..

slave-1成了new master

root@localhost 11:51:06 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
|  1 | 2020-10-10 11:21:39 |
|  2 | 2020-10-10 11:31:45 |
|  3 | 2020-10-10 11:51:01 |
+----+---------------------+
3 rows in set (0.00 sec)

root@localhost 11:54:53 [dbms_monitor]> show slave status;
Empty set (0.00 sec)

ping_type=INSERT

启动manager后, 关闭slave-1 sql_thread

Sat Oct 10 14:06:09 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 14:06:10 2020 - [info] GTID failover mode = 1
Sat Oct 10 14:06:10 2020 - [info] Dead Servers:
Sat Oct 10 14:06:10 2020 - [info] Alive Servers:
Sat Oct 10 14:06:10 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:06:10 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:06:10 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 14:06:10 2020 - [info] Alive Slaves:
Sat Oct 10 14:06:10 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:06:10 2020 - [info]     GTID ON
Sat Oct 10 14:06:10 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:06:10 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:06:10 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:06:10 2020 - [info]     GTID ON
Sat Oct 10 14:06:10 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:06:10 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:06:10 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:06:10 2020 - [info] Checking slave configurations..
Sat Oct 10 14:06:10 2020 - [info] Checking replication filtering settings..
Sat Oct 10 14:06:10 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 14:06:10 2020 - [info]  Replication filtering check ok.
Sat Oct 10 14:06:10 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 14:06:10 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 14:06:10 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 14:06:10 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 14:06:10 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 14:06:10 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 14:06:10 2020 - [info]  OK.
Sat Oct 10 14:06:10 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 14:06:10 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 14:06:10 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 14:06:10 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 14:06:10 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..

关闭slave-1 sql_thread

root@localhost 14:10:20 [dbms_monitor]> stop slave sql_thread;
Query OK, 0 rows affected (0.03 sec)

root@localhost 14:22:34 [dbms_monitor]> pager cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos'
PAGER set to 'cat - | grep -E 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Slave_SQL_Running_State|Last|Relay_Log_File|Relay_Log_Pos''
root@localhost 14:22:57 [dbms_monitor]> show slave status\G
              Master_Log_File: mysql-bin.000016
          Read_Master_Log_Pos: 238476
               Relay_Log_File: mysql-relay-bin.000004
                Relay_Log_Pos: 211835
        Relay_Master_Log_File: mysql-bin.000016
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 0
                   Last_Error: 
          Exec_Master_Log_Pos: 211756
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
      Slave_SQL_Running_State: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

root@localhost 14:22:57 [dbms_monitor]> pager
Default pager wasn't set, using stdout.

master

root@localhost 14:22:09 [dbms_monitor]> insert into monitor_delay values(90, now());
Query OK, 1 row affected (0.00 sec)

root@localhost 14:22:29 [dbms_monitor]>  select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
| 88 | 2020-10-10 12:21:17 |
| 90 | 2020-10-10 14:22:29 |
+----+---------------------+
2 rows in set (0.01 sec)

slave-1

root@localhost 14:22:57 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
| 88 | 2020-10-10 12:21:17 |
+----+---------------------+
1 row in set (0.00 sec)

slave-2

root@localhost 14:22:36 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
| 88 | 2020-10-10 12:21:17 |
| 90 | 2020-10-10 14:22:29 |
+----+---------------------+
2 rows in set (0.00 sec)

关闭sql_thread后 manager仍然正常

关闭master

root@localhost 14:23:38 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

结论: 会failover且成功

Sat Oct 10 14:25:05 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Sat Oct 10 14:25:05 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Sat Oct 10 14:25:05 2020 - [info] Executing SSH check script: exit 0
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Sat Oct 10 14:25:06 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 14:25:06 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 14:25:08 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:25:08 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 14:25:11 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:25:11 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 14:25:14 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:25:14 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 14:25:14 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 14:25:14 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 14:25:14 2020 - [warning] SSH is reachable.
Sat Oct 10 14:25:14 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 14:25:14 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 14:25:14 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 14:25:14 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 14:25:15 2020 - [warning] SQL Thread is stopped(no error) on 172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:25:15 2020 - [info] GTID failover mode = 1
Sat Oct 10 14:25:15 2020 - [info] Dead Servers:
Sat Oct 10 14:25:15 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:15 2020 - [info] Alive Servers:
Sat Oct 10 14:25:15 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:25:15 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 14:25:15 2020 - [info] Alive Slaves:
Sat Oct 10 14:25:15 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:15 2020 - [info]     GTID ON
Sat Oct 10 14:25:15 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:15 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:15 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:15 2020 - [info]     GTID ON
Sat Oct 10 14:25:15 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:15 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:15 2020 - [info] Checking slave configurations..
Sat Oct 10 14:25:15 2020 - [info] Checking replication filtering settings..
Sat Oct 10 14:25:15 2020 - [info]  Replication filtering check ok.
Sat Oct 10 14:25:15 2020 - [info] Master is down!
Sat Oct 10 14:25:15 2020 - [info] Terminating monitoring script.
Sat Oct 10 14:25:15 2020 - [info] Got exit code 20 (Master dead).
Sat Oct 10 14:25:15 2020 - [info] MHA::MasterFailover version 0.58.
Sat Oct 10 14:25:15 2020 - [info] Starting master failover.
Sat Oct 10 14:25:15 2020 - [info] 
Sat Oct 10 14:25:15 2020 - [info] * Phase 1: Configuration Check Phase..
Sat Oct 10 14:25:15 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [warning] SQL Thread is stopped(no error) on 172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:25:16 2020 - [info] GTID failover mode = 1
Sat Oct 10 14:25:16 2020 - [info] Dead Servers:
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info] Alive Servers:
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 14:25:16 2020 - [info] Alive Slaves:
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info]  Starting SQL thread on 172.16.120.11(172.16.120.11:3358) ..
Sat Oct 10 14:25:16 2020 - [info]   done.
Sat Oct 10 14:25:16 2020 - [info] Starting GTID based failover.
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Oct 10 14:25:16 2020 - [info] Executing master IP deactivation script:
Sat Oct 10 14:25:16 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 172.16.120.10 
RTNETLINK answers: Cannot assign requested address
Fake!!! 原主库 rpl_semi_sync_master_enabled=0 rpl_semi_sync_slave_enabled=1 
Sat Oct 10 14:25:16 2020 - [info]  done.
Sat Oct 10 14:25:16 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Oct 10 14:25:16 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] * Phase 3: Master Recovery Phase..
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000016:268346
Sat Oct 10 14:25:16 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:21217-22354
Sat Oct 10 14:25:16 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000016:268346
Sat Oct 10 14:25:16 2020 - [info] Retrieved Gtid Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:21217-22354
Sat Oct 10 14:25:16 2020 - [info] Oldest slaves:
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] Searching new master from slaves..
Sat Oct 10 14:25:16 2020 - [info]  Candidate masters from the configuration file:
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:25:16 2020 - [info]     GTID ON
Sat Oct 10 14:25:16 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:25:16 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:25:16 2020 - [info]  Non-candidate masters:
Sat Oct 10 14:25:16 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Oct 10 14:25:16 2020 - [info] New master is 172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:25:16 2020 - [info] Starting master failover..
Sat Oct 10 14:25:16 2020 - [info] 
From:
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

To:
172.16.120.11(172.16.120.11:3358) (new master)
 +--172.16.120.12(172.16.120.12:3358)
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Oct 10 14:25:16 2020 - [info] 
Sat Oct 10 14:25:16 2020 - [info]  Waiting all logs to be applied.. 
Sat Oct 10 14:25:17 2020 - [info]   done.
Sat Oct 10 14:25:17 2020 - [info]  Replicating from the latest slave 172.16.120.12(172.16.120.12:3358) and waiting to apply..
Sat Oct 10 14:25:17 2020 - [info]  Waiting all logs to be applied on the latest slave.. 
Sat Oct 10 14:25:17 2020 - [info]  Resetting slave 172.16.120.11(172.16.120.11:3358) and starting replication from the new master 172.16.120.12(172.16.120.12:3358)..
Sat Oct 10 14:25:17 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 14:25:17 2020 - [info]  Slave started.
Sat Oct 10 14:25:17 2020 - [info]  Waiting to execute all relay logs on 172.16.120.11(172.16.120.11:3358)..
Sat Oct 10 14:25:17 2020 - [info]  master_pos_wait(mysql-bin.000007:3608644) completed on 172.16.120.11(172.16.120.11:3358). Executed 1 events.
Sat Oct 10 14:25:17 2020 - [info]   done.
Sat Oct 10 14:25:17 2020 - [info]   done.
Sat Oct 10 14:25:17 2020 - [info] Getting new master's binlog name and position..
Sat Oct 10 14:25:17 2020 - [info]  mysql-bin.000010:517718
Sat Oct 10 14:25:17 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.120.11', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sat Oct 10 14:25:17 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000010, 517718, 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-22354,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
Sat Oct 10 14:25:17 2020 - [info] Executing master IP activate script:
Sat Oct 10 14:25:17 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=start --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 --new_master_host=172.16.120.11 --new_master_ip=172.16.120.11 --new_master_port=3358 --new_master_user='mha'   --new_master_password=xxx
Enabling the VIP - 172.16.120.128 on the new master - 172.16.120.11 
Fake!!! 新主库 rpl_semi_sync_master_enabled=1 rpl_semi_sync_slave_enabled=0 
Set read_only=0 on the new master.
Creating app user on the new master..
Sat Oct 10 14:25:18 2020 - [info]  OK.
Sat Oct 10 14:25:18 2020 - [info] ** Finished master recovery successfully.
Sat Oct 10 14:25:18 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sat Oct 10 14:25:18 2020 - [info] 
Sat Oct 10 14:25:18 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sat Oct 10 14:25:18 2020 - [info] 
Sat Oct 10 14:25:18 2020 - [info] 
Sat Oct 10 14:25:18 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Oct 10 14:25:18 2020 - [info] 
Sat Oct 10 14:25:18 2020 - [info] -- Slave recovery on host 172.16.120.12(172.16.120.12:3358) started, pid: 89417. Check tmp log /masterha/cls_new//172.16.120.12_3358_20201010142515.log if it takes time..
Sat Oct 10 14:25:19 2020 - [info] 
Sat Oct 10 14:25:19 2020 - [info] Log messages from 172.16.120.12 ...
Sat Oct 10 14:25:19 2020 - [info] 
Sat Oct 10 14:25:18 2020 - [info]  Resetting slave 172.16.120.12(172.16.120.12:3358) and starting replication from the new master 172.16.120.11(172.16.120.11:3358)..
Sat Oct 10 14:25:18 2020 - [info]  Executed CHANGE MASTER.
Sat Oct 10 14:25:18 2020 - [info]  Slave started.
Sat Oct 10 14:25:18 2020 - [info]  gtid_wait(44a4ea53-fcad-11ea-bd16-0050563b7b42:1-22354,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27) completed on 172.16.120.12(172.16.120.12:3358). Executed 0 events.
Sat Oct 10 14:25:19 2020 - [info] End of log messages from 172.16.120.12.
Sat Oct 10 14:25:19 2020 - [info] -- Slave on host 172.16.120.12(172.16.120.12:3358) started.
Sat Oct 10 14:25:19 2020 - [info] All new slave servers recovered successfully.
Sat Oct 10 14:25:19 2020 - [info] 
Sat Oct 10 14:25:19 2020 - [info] * Phase 5: New master cleanup phase..
Sat Oct 10 14:25:19 2020 - [info] 
Sat Oct 10 14:25:19 2020 - [info] Resetting slave info on the new master..
Sat Oct 10 14:25:19 2020 - [info]  172.16.120.11: Resetting slave info succeeded.
Sat Oct 10 14:25:19 2020 - [info] Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Sat Oct 10 14:25:19 2020 - [info] 

----- Failover Report -----

cls_new: MySQL Master failover 172.16.120.10(172.16.120.10:3358) to 172.16.120.11(172.16.120.11:3358) succeeded

Master 172.16.120.10(172.16.120.10:3358) is down!

Check MHA Manager logs at centos-4:/masterha/cls_new/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.120.10(172.16.120.10:3358)
Selected 172.16.120.11(172.16.120.11:3358) as a new master.
172.16.120.11(172.16.120.11:3358): OK: Applying all logs succeeded.
172.16.120.11(172.16.120.11:3358): OK: Activated master IP address.
172.16.120.12(172.16.120.12:3358): OK: Slave started, replicating from 172.16.120.11(172.16.120.11:3358)
172.16.120.11(172.16.120.11:3358): Resetting slave info succeeded.
Master failover to 172.16.120.11(172.16.120.11:3358) completed successfully.
Sat Oct 10 14:25:19 2020 - [info] Sending mail..

slave-1


root@localhost 14:23:56 [dbms_monitor]> select * from monitor_delay;
+----+---------------------+
| id | ctime               |
+----+---------------------+
| 88 | 2020-10-10 12:21:17 |
| 90 | 2020-10-10 14:22:29 |
+----+---------------------+
2 rows in set (0.00 sec)

[用例测试] master挂了, 且slave也有问题5(部分slave sql_thread error)

master挂了, 在此之前slave-1 sql_thread error了

ping_type=CONNECT

启动manager

Sat Oct 10 14:34:42 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 14:34:43 2020 - [info] GTID failover mode = 1
Sat Oct 10 14:34:43 2020 - [info] Dead Servers:
Sat Oct 10 14:34:43 2020 - [info] Alive Servers:
Sat Oct 10 14:34:43 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:34:43 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 14:34:43 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 14:34:43 2020 - [info] Alive Slaves:
Sat Oct 10 14:34:43 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:34:43 2020 - [info]     GTID ON
Sat Oct 10 14:34:43 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:34:43 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:34:43 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.31-34-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 14:34:43 2020 - [info]     GTID ON
Sat Oct 10 14:34:43 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:34:43 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 14:34:43 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 14:34:43 2020 - [info] Checking slave configurations..
Sat Oct 10 14:34:43 2020 - [info] Checking replication filtering settings..
Sat Oct 10 14:34:43 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 14:34:43 2020 - [info]  Replication filtering check ok.
Sat Oct 10 14:34:43 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 14:34:43 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 14:34:43 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 14:34:43 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 14:34:43 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 14:34:43 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 14:34:43 2020 - [info]  OK.
Sat Oct 10 14:34:43 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 14:34:43 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 14:34:43 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 14:34:43 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 14:34:43 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..

制造slave-1 sql_thread error

在master创建表

root@localhost 14:35:55 [dbms_monitor]> create table make_error(id int not null auto_increment primary key);
Query OK, 0 rows affected (0.02 sec)

在slave-1删除make_error表

root@localhost 14:38:10 [dbms_monitor]> set global super_read_only=0;
Query OK, 0 rows affected (0.00 sec)

root@localhost 14:38:14 [dbms_monitor]> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

root@localhost 14:38:17 [dbms_monitor]> drop table make_error;
Query OK, 0 rows affected (0.01 sec)

root@localhost 14:38:20 [dbms_monitor]> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

在master删除make_error表(slave-1 sql_thread会报错)

root@localhost 14:36:28 [dbms_monitor]> drop table make_error;
Query OK, 0 rows affected (0.01 sec)

查看slave-1复制状态

root@localhost 14:38:26 [dbms_monitor]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.120.10
                  Master_User: repler
                  Master_Port: 3358
                Connect_Retry: 1
              Master_Log_File: mysql-bin.000017
          Read_Master_Log_Pos: 620
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 589
        Relay_Master_Log_File: mysql-bin.000017
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1051
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '44a4ea53-fcad-11ea-bd16-0050563b7b42:22356' at master log mysql-bin.000017, end_log_pos 620. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 416
              Relay_Log_Space: 1000
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1051
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '44a4ea53-fcad-11ea-bd16-0050563b7b42:22356' at master log mysql-bin.000017, end_log_pos 620. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 120103358
                  Master_UUID: 44a4ea53-fcad-11ea-bd16-0050563b7b42
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 201010 14:39:25
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:22355-22356
            Executed_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-22355,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

此时manager仍然正常运行

关闭master

root@localhost 14:39:25 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

结论: 会触发failover, 但failover失败

Sat Oct 10 14:40:59 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=172.16.120.10;port=3358;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '172.16.120.10' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:40:59 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=CONNECT
Sat Oct 10 14:40:59 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 14:40:59 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 14:40:59 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 14:41:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:41:02 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 14:41:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:41:05 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 14:41:08 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 14:41:08 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 14:41:08 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 14:41:08 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 14:41:08 2020 - [warning] SSH is reachable.
Sat Oct 10 14:41:08 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 14:41:08 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 14:41:08 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 14:41:08 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 14:41:09 2020 - [error][/usr/local/share/perl5/MHA/Server.pm, ln935] SQL Thread is stopped(error) on 172.16.120.11(172.16.120.11:3358)! Errno:1051, Error:Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '44a4ea53-fcad-11ea-bd16-0050563b7b42:22356' at master log mysql-bin.000017, end_log_pos 620. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Sat Oct 10 14:41:09 2020 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln703] Server 172.16.120.11(172.16.120.11:3358) is alive, but does not work as a slave!
Sat Oct 10 14:41:09 2020 - [warning] Got Error:  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 560.
Sat Oct 10 14:41:09 2020 - [info] Got exit code 1 (Not master dead).

ping_type=INSERT

启动manager

Sat Oct 10 15:54:20 2020 - [info] MHA::MasterMonitor version 0.58.
Sat Oct 10 15:54:21 2020 - [info] GTID failover mode = 1
Sat Oct 10 15:54:21 2020 - [info] Dead Servers:
Sat Oct 10 15:54:21 2020 - [info] Alive Servers:
Sat Oct 10 15:54:21 2020 - [info]   172.16.120.10(172.16.120.10:3358)
Sat Oct 10 15:54:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)
Sat Oct 10 15:54:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)
Sat Oct 10 15:54:21 2020 - [info] Alive Slaves:
Sat Oct 10 15:54:21 2020 - [info]   172.16.120.11(172.16.120.11:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 15:54:21 2020 - [info]     GTID ON
Sat Oct 10 15:54:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 15:54:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 15:54:21 2020 - [info]   172.16.120.12(172.16.120.12:3358)  Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Sat Oct 10 15:54:21 2020 - [info]     GTID ON
Sat Oct 10 15:54:21 2020 - [info]     Replicating from 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 15:54:21 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 10 15:54:21 2020 - [info] Current Alive Master: 172.16.120.10(172.16.120.10:3358)
Sat Oct 10 15:54:21 2020 - [info] Checking slave configurations..
Sat Oct 10 15:54:21 2020 - [info] Checking replication filtering settings..
Sat Oct 10 15:54:21 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Oct 10 15:54:21 2020 - [info]  Replication filtering check ok.
Sat Oct 10 15:54:21 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Oct 10 15:54:21 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 10 15:54:21 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Sat Oct 10 15:54:21 2020 - [info] 
172.16.120.10(172.16.120.10:3358) (current master)
 +--172.16.120.11(172.16.120.11:3358)
 +--172.16.120.12(172.16.120.12:3358)

Sat Oct 10 15:54:21 2020 - [info] Checking master_ip_failover_script status:
Sat Oct 10 15:54:21 2020 - [info]   /etc/masterha/scripts/master_ip_failover_vip --vip=172.16.120.128 --command=status --ssh_user=root --orig_master_host=172.16.120.10 --orig_master_ip=172.16.120.10 --orig_master_port=3358 
Sat Oct 10 15:54:21 2020 - [info]  OK.
Sat Oct 10 15:54:21 2020 - [warning] shutdown_script is not defined.
Sat Oct 10 15:54:21 2020 - [info] Set master ping interval 3 seconds.
Sat Oct 10 15:54:21 2020 - [info] Set secondary check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12
Sat Oct 10 15:54:21 2020 - [info] Starting ping health check on 172.16.120.10(172.16.120.10:3358)..
Sat Oct 10 15:54:21 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..

准备工作省略, slave-1 sql_thread报错

root@localhost 15:55:13 [dbms_monitor]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.120.10
                  Master_User: repler
                  Master_Port: 3358
                Connect_Retry: 1
              Master_Log_File: mysql-bin.000019
          Read_Master_Log_Pos: 16331
               Relay_Log_File: mysql-relay-bin.000007
                Relay_Log_Pos: 14926
        Relay_Master_Log_File: mysql-bin.000019
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1051
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '44a4ea53-fcad-11ea-bd16-0050563b7b42:22419' at master log mysql-bin.000019, end_log_pos 14917. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 14713
              Relay_Log_Space: 19342
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1051
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '44a4ea53-fcad-11ea-bd16-0050563b7b42:22419' at master log mysql-bin.000019, end_log_pos 14917. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 120103358
                  Master_UUID: 44a4ea53-fcad-11ea-bd16-0050563b7b42
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 201010 15:55:16
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:22356-22425
            Executed_Gtid_Set: 44a4ea53-fcad-11ea-bd16-0050563b7b42:1-22418,
45d1f02a-fcad-11ea-8a44-0050562f2198:1-27
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

关闭master

root@localhost 15:55:16 [dbms_monitor]> shutdown;
Query OK, 0 rows affected (0.00 sec)

结论: 会触发failover, 但failover失败

Sat Oct 10 15:56:03 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Sat Oct 10 15:56:03 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.120.11 -s 172.16.120.12  --user=root  --master_host=172.16.120.10  --master_ip=172.16.120.10  --master_port=3358 --master_user=mha --master_password=xxx --ping_type=INSERT
Sat Oct 10 15:56:03 2020 - [info] Executing SSH check script: exit 0
Sat Oct 10 15:56:04 2020 - [info] HealthCheck: SSH to 172.16.120.10 is reachable.
Monitoring server 172.16.120.11 is reachable, Master is not reachable from 172.16.120.11. OK.
Monitoring server 172.16.120.12 is reachable, Master is not reachable from 172.16.120.12. OK.
Sat Oct 10 15:56:04 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sat Oct 10 15:56:06 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 15:56:06 2020 - [warning] Connection failed 2 time(s)..
Sat Oct 10 15:56:09 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 15:56:09 2020 - [warning] Connection failed 3 time(s)..
Sat Oct 10 15:56:12 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.120.10' (111))
Sat Oct 10 15:56:12 2020 - [warning] Connection failed 4 time(s)..
Sat Oct 10 15:56:12 2020 - [warning] Master is not reachable from health checker!
Sat Oct 10 15:56:12 2020 - [warning] Master 172.16.120.10(172.16.120.10:3358) is not reachable!
Sat Oct 10 15:56:12 2020 - [warning] SSH is reachable.
Sat Oct 10 15:56:12 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_new.cnf again, and trying to connect to all servers to check server status..
Sat Oct 10 15:56:12 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Sat Oct 10 15:56:12 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 15:56:12 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_new.cnf..
Sat Oct 10 15:56:13 2020 - [error][/usr/local/share/perl5/MHA/Server.pm, ln935] SQL Thread is stopped(error) on 172.16.120.11(172.16.120.11:3358)! Errno:1051, Error:Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '44a4ea53-fcad-11ea-bd16-0050563b7b42:22419' at master log mysql-bin.000019, end_log_pos 14917. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Sat Oct 10 15:56:13 2020 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln703] Server 172.16.120.11(172.16.120.11:3358) is alive, but does not work as a slave!
Sat Oct 10 15:56:13 2020 - [warning] Got Error:  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 560.
Sat Oct 10 15:56:13 2020 - [info] Got exit code 1 (Not master dead).