一 MHA集群部署
参考之前写的文章
二 故障模拟
① 关闭主服务器master上的MySQL服务,看VIP是否漂移到从服务节点
[root@mha15 data]# ps -ef |grep mysql
root 12617 9903 0 Apr28 pts/0 00:00:00 /bin/sh ./mysqld_safe --defaults-file=/app/data/etc/my.cnf
mysql 14402 12617 0 Apr28 pts/0 00:13:34 /app/mysql/bin/mysqld --defaults-file=/app/data/etc/my.cnf --basedir=/app/mysql/ --datadir=/app/data/data --plugin-dir=/app/mysql//lib/plugin --user=mysql --log-erro
r=/app/data/log/error.log --open-files-limit=655350 --pid-file=/app/data/sock/mysql.pid --socket=/app/data/sock/mysql.sock --port=3306root 14475 9903 0 Apr28 pts/0 00:00:00 ./mysql -uroot -p -S /app/data/sock/mysql.sock
root 40176 17642 0 03:03 pts/1 00:00:00 grep --color=auto mysql
[root@mha15 data]# kill -9 12617 14402
[root@mha15 data]# ps -ef |grep mysql
root 14475 9903 0 Apr28 pts/0 00:00:00 ./mysql -uroot -p -S /app/data/sock/mysql.sock
root 40195 17642 0 03:03 pts/1 00:00:00 grep --color=auto mysql
VIP已漂移到mha16节点上
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:92:6d:fd brd ff:ff:ff:ff:ff:ff
inet 192.168.10.16/24 brd 192.168.10.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.10.100/24 brd 192.168.10.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
inet6 fe80::b6cb:27dd:5b53:f57f/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::32ad:80bf:c2e0:633f/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::6b4c:646e:e677:d4ac/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
② 查看日志,已完成新master的拉起
查看manager日志
tail -f /var/log/masterha/app1/manager.log
Mon Apr 29 04:20:20 2024 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Apr 29 04:20:20 2024 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.10.16 -s 192.168.10.17 --user=root --master_host=192.168.10.15 --master_ip=192.168.10
.15 --master_port=3306 --master_user=mha --master_password=123456 --ping_type=SELECTMon Apr 29 04:20:20 2024 - [info] Executing SSH check script: exit 0
Mon Apr 29 04:20:21 2024 - [info] HealthCheck: SSH to 192.168.10.15 is reachable.
Monitoring server 192.168.10.16 is reachable, Master is not reachable from 192.168.10.16. OK.
Monitoring server 192.168.10.17 is reachable, Master is not reachable from 192.168.10.17. OK.
Mon Apr 29 04:20:21 2024 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Apr 29 04:20:21 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.15' (111))
Mon Apr 29 04:20:21 2024 - [warning] Connection failed 2 time(s)..
Mon Apr 29 04:20:22 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.15' (111))
Mon Apr 29 04:20:22 2024 - [warning] Connection failed 3 time(s)..
Mon Apr 29 04:20:23 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.15' (111))
Mon Apr 29 04:20:23 2024 - [warning] Connection failed 4 time(s)..
Mon Apr 29 04:20:23 2024 - [warning] Master is not reachable from health checker!
Mon Apr 29 04:20:23 2024 - [warning] Master 192.168.10.15(192.168.10.15:3306) is not reachable!
Mon Apr 29 04:20:23 2024 - [warning] SSH is reachable.
Mon Apr 29 04:20:23 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server sta
tus..Mon Apr 29 04:20:23 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Apr 29 04:20:23 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Apr 29 04:20:23 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Apr 29 04:20:24 2024 - [info] GTID failover mode = 1
Mon Apr 29 04:20:24 2024 - [info] Dead Servers:
Mon Apr 29 04:20:24 2024 - [info] 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:24 2024 - [info] Alive Servers:
Mon Apr 29 04:20:24 2024 - [info] 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 04:20:24 2024 - [info] 192.168.10.17(192.168.10.17:3306)
Mon Apr 29 04:20:24 2024 - [info] Alive Slaves:
Mon Apr 29 04:20:24 2024 - [info] 192.168.10.16(192.168.10.16:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:24 2024 - [info] GTID ON
Mon Apr 29 04:20:24 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:24 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:24 2024 - [info] 192.168.10.17(192.168.10.17:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:24 2024 - [info] GTID ON
Mon Apr 29 04:20:24 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:24 2024 - [info] Checking slave configurations..
Mon Apr 29 04:20:24 2024 - [info] Checking replication filtering settings..
Mon Apr 29 04:20:24 2024 - [info] Replication filtering check ok.
Mon Apr 29 04:20:24 2024 - [info] Master is down!
Mon Apr 29 04:20:24 2024 - [info] Terminating monitoring script.
Mon Apr 29 04:20:24 2024 - [info] Got exit code 20 (Master dead).
Mon Apr 29 04:20:24 2024 - [info] MHA::MasterFailover version 0.58.
Mon Apr 29 04:20:24 2024 - [info] Starting master failover.
Mon Apr 29 04:20:24 2024 - [info]
Mon Apr 29 04:20:24 2024 - [info] * Phase 1: Configuration Check Phase..
Mon Apr 29 04:20:24 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] GTID failover mode = 1
Mon Apr 29 04:20:26 2024 - [info] Dead Servers:
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Checking master reachability via MySQL(double check)...
Mon Apr 29 04:20:26 2024 - [info] ok.
Mon Apr 29 04:20:26 2024 - [info] Alive Servers:
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.17(192.168.10.17:3306)
Mon Apr 29 04:20:26 2024 - [info] Alive Slaves:
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.16(192.168.10.16:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.17(192.168.10.17:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Starting GTID based failover.
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Apr 29 04:20:26 2024 - [info] Executing master IP deactivation script:
Mon Apr 29 04:20:26 2024 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.10.15 --orig_master_ip=192.168.10.15 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.10.100/24===
Disabling the VIP on old master: 192.168.10.15
Mon Apr 29 04:20:26 2024 - [info] done.
Mon Apr 29 04:20:26 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Apr 29 04:20:26 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 3: Master Recovery Phase..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:197
Mon Apr 29 04:20:26 2024 - [info] Retrieved Gtid Set: 7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Mon Apr 29 04:20:26 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.16(192.168.10.16:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.17(192.168.10.17:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:197
Mon Apr 29 04:20:26 2024 - [info] Retrieved Gtid Set: 7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Mon Apr 29 04:20:26 2024 - [info] Oldest slaves:
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.16(192.168.10.16:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.17(192.168.10.17:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 3.3: Determining New Master Phase..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] Searching new master from slaves..
Mon Apr 29 04:20:26 2024 - [info] Candidate masters from the configuration file:
Mon Apr 29 04:20:26 2024 - [info] 192.168.10.16(192.168.10.16:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info] GTID ON
Mon Apr 29 04:20:26 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info] Non-candidate masters:
Mon Apr 29 04:20:26 2024 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Apr 29 04:20:26 2024 - [info] New master is 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 04:20:26 2024 - [info] Starting master failover..
Mon Apr 29 04:20:26 2024 - [info]
From:
192.168.10.15(192.168.10.15:3306) (current master)
+--192.168.10.16(192.168.10.16:3306)
+--192.168.10.17(192.168.10.17:3306)
To:
192.168.10.16(192.168.10.16:3306) (new master)
+--192.168.10.17(192.168.10.17:3306)
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] Waiting all logs to be applied..
Mon Apr 29 04:20:26 2024 - [info] done.
Mon Apr 29 04:20:26 2024 - [info] Getting new master's binlog name and position..
Mon Apr 29 04:20:26 2024 - [info] mysql-bin.000001:7098
Mon Apr 29 04:20:26 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.16', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='mysla
ve', MASTER_PASSWORD='xxx';Mon Apr 29 04:20:26 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 7098, 6f4c3822-052d-11ef-b559-000c29926dfd:1-12,
7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Mon Apr 29 04:20:26 2024 - [info] Executing master IP activate script:
Mon Apr 29 04:20:26 2024 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.10.15 --orig_master_ip=192.168.10.15 --orig_master_port=3306 --new_master_host=192.168
.10.16 --new_master_ip=192.168.10.16 --new_master_port=3306 --new_master_user='mha' --new_master_password=xxx
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.10.100/24===
Enabling the VIP - 192.168.10.100/24 on the new master - 192.168.10.16
Mon Apr 29 04:20:26 2024 - [info] OK.
Mon Apr 29 04:20:26 2024 - [info] Setting read_only=0 on 192.168.10.16(192.168.10.16:3306)..
Mon Apr 29 04:20:26 2024 - [info] ok.
Mon Apr 29 04:20:26 2024 - [info] ** Finished master recovery successfully.
Mon Apr 29 04:20:26 2024 - [info] * Phase 3: Master Recovery Phase completed.
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 4: Slaves Recovery Phase..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Apr 29 04:20:26 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] -- Slave recovery on host 192.168.10.17(192.168.10.17:3306) started, pid: 33731. Check tmp log /var/log/masterha/app1/192.168.10.17_3306_20240429042024.log if it takes time..
Mon Apr 29 04:20:27 2024 - [info]
Mon Apr 29 04:20:27 2024 - [info] Log messages from 192.168.10.17 ...
Mon Apr 29 04:20:27 2024 - [info]
Mon Apr 29 04:20:26 2024 - [info] Resetting slave 192.168.10.17(192.168.10.17:3306) and starting replication from the new master 192.168.10.16(192.168.10.16:3306)..
Mon Apr 29 04:20:26 2024 - [info] Executed CHANGE MASTER.
Mon Apr 29 04:20:26 2024 - [info] Slave started.
Mon Apr 29 04:20:27 2024 - [info] gtid_wait(6f4c3822-052d-11ef-b559-000c29926dfd:1-12,
7cb7532a-052d-11ef-9f84-000c2962bd90:8-24) completed on 192.168.10.17(192.168.10.17:3306). Executed 0 events.
Mon Apr 29 04:20:27 2024 - [info] End of log messages from 192.168.10.17.
Mon Apr 29 04:20:27 2024 - [info] -- Slave on host 192.168.10.17(192.168.10.17:3306) started.
Mon Apr 29 04:20:27 2024 - [info] All new slave servers recovered successfully.
Mon Apr 29 04:20:27 2024 - [info]
Mon Apr 29 04:20:27 2024 - [info] * Phase 5: New master cleanup phase..
Mon Apr 29 04:20:27 2024 - [info]
Mon Apr 29 04:20:27 2024 - [info] Resetting slave info on the new master..
Mon Apr 29 04:20:27 2024 - [info] 192.168.10.16: Resetting slave info succeeded.
Mon Apr 29 04:20:27 2024 - [info] Master failover to 192.168.10.16(192.168.10.16:3306) completed successfully.
Mon Apr 29 04:20:27 2024 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Mon Apr 29 04:20:27 2024 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.10.15(192.168.10.15:3306) to 192.168.10.16(192.168.10.16:3306) succeeded
Master 192.168.10.15(192.168.10.15:3306) is down!
Check MHA Manager logs at mhamaster:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.15(192.168.10.15:3306)
Selected 192.168.10.16(192.168.10.16:3306) as a new master.
192.168.10.16(192.168.10.16:3306): OK: Applying all logs succeeded.
192.168.10.16(192.168.10.16:3306): OK: Activated master IP address.
192.168.10.17(192.168.10.17:3306): OK: Slave started, replicating from 192.168.10.16(192.168.10.16:3306)
192.168.10.16(192.168.10.16:3306): Resetting slave info succeeded.
Master failover to 192.168.10.16(192.168.10.16:3306) completed successfully.
③ 故障转移完成后,manager将会自动停止,此时使用masterha_check_status会报错
[root@mhamaster masterha]# masterha_check_status -conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[1]+ Done nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
[root@mhamaster masterha]#
④ 此时mha16节点为master,mha17为从节点
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.10.16
Master_User: myslave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 7098
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 420
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 7098
Relay_Log_Space: 624
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 203148916
Master_UUID: 6f4c3822-052d-11ef-b559-000c29926dfd
Master_Info_File: /app/data/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 62decfb0-052d-11ef-9a98-000c295ee570:1-8,
6f4c3822-052d-11ef-b559-000c29926dfd:1-12,
7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
⑤ 在新主节点更新数据,验证同步
在mha16上更新数据
mysql> select * from test123;
+------+
| id |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
mysql> insert into test123 values(2);
Query OK, 1 row affected (0.01 sec)
mysql> select * from test123;
+------+
| id |
+------+
| 1 |
| 2 |
+------+
2 rows in set (0.00 sec)
在mha17上验证数据
mysql> use test;
Database changed
mysql> select * from test123;
+------+
| id |
+------+
| 1 |
| 2 |
+------+
2 rows in set (0.01 sec)
三 故障恢复
原有 master 节点故障后,基于新master 节点的备份恢复数据后,将其配置为新的 master 的从节点。
① 数据库的数据恢复
原来mha16已经成为了新的主节点,对其进行完全备份
[root@mha16 app]# mkdir -p /app/backup
[root@mha16 backup]# mysqldump -S /app/data/sock/mysql.sock -p --source-data=2 --single-transaction --set-gtid-purged=OFF --all-databases > /app/backup/alltest.sql
Enter password:
[root@mha16 backup]#
[root@mha16 backup]# scp /app/backup/alltest.sql root@192.168.10.15:/app
alltest.sql 100% 1267KB 28.1MB/s 00:00
[root@mha16 backup]#
②在新节点上安装MySQL或启动已有的MySQL
[root@mha15 data]# mysqld_safe --defaults-file=/app/data/etc/my.cnf &
[root@mha15 app]# mysql -S /app/data/sock/mysql.sock -p <alltest.sql
Enter password:
[root@mha15 app]#
③ 查看备份文件中二进制日志和位置
--
-- Position to start replication or point-in-time recovery from
--
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=7376;
--
-- Current Database: `mysql`
④ 在新节点上进行同步配置
mysql> change master to master_host='192.168.10.16',master_user='myslave',master_port=3306,master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=7376;
Query OK, 0 rows affected, 9 warnings (0.05 sec)
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.10.16
Master_User: myslave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 7376
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 326
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
⑤ 将新恢复的节点设置为只读模式
set global read_only=1;
⑥ 再次进行节点复制检测
[root@mhamaster masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Mon Apr 29 05:34:16 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Apr 29 05:34:16 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Apr 29 05:34:16 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Apr 29 05:34:16 2024 - [info] MHA::MasterMonitor version 0.58.
Mon Apr 29 05:34:17 2024 - [info] GTID failover mode = 1
Mon Apr 29 05:34:17 2024 - [info] Dead Servers:
Mon Apr 29 05:34:17 2024 - [info] Alive Servers:
Mon Apr 29 05:34:17 2024 - [info] 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 05:34:17 2024 - [info] 192.168.10.17(192.168.10.17:3306)
Mon Apr 29 05:34:17 2024 - [info] Alive Slaves:
Mon Apr 29 05:34:17 2024 - [info] 192.168.10.17(192.168.10.17:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 05:34:17 2024 - [info] GTID ON
Mon Apr 29 05:34:17 2024 - [info] Replicating from 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 05:34:17 2024 - [info] Current Alive Master: 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 05:34:17 2024 - [info] Checking slave configurations..
Mon Apr 29 05:34:17 2024 - [info] Checking replication filtering settings..
Mon Apr 29 05:34:17 2024 - [info] binlog_do_db= , binlog_ignore_db=
Mon Apr 29 05:34:17 2024 - [info] Replication filtering check ok.
Mon Apr 29 05:34:17 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Apr 29 05:34:17 2024 - [info] Checking SSH publickey authentication settings on the current master..
Mon Apr 29 05:34:18 2024 - [info] HealthCheck: SSH to 192.168.10.16 is reachable.
Mon Apr 29 05:34:18 2024 - [info]
192.168.10.16(192.168.10.16:3306) (current master)
+--192.168.10.17(192.168.10.17:3306)
Mon Apr 29 05:34:18 2024 - [info] Checking replication health on 192.168.10.17..
Mon Apr 29 05:34:18 2024 - [info] ok.
Mon Apr 29 05:34:18 2024 - [info] Checking master_ip_failover_script status:
Mon Apr 29 05:34:18 2024 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.10.16 --orig_master_ip=192.168.10.16 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.10.100/24===
Checking the Status of the script.. OK
Mon Apr 29 05:34:18 2024 - [info] OK.
Mon Apr 29 05:34:18 2024 - [warning] shutdown_script is not defined.
Mon Apr 29 05:34:18 2024 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
⑦ 启动manager,并检测状态
[root@mhamaster masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 33786
[root@mhamaster masterha]# masterha_check_status -conf=/etc/masterha/app1.cnf
app1 (pid:33786) is running(0:PING_OK), master:192.168.10.16
四 故障转换及恢复中的注意事项
1、在生产环境中,主节点挂掉后,第一时间对从节点进行备份,用备份文件手动将原主节点提升为从节点,并指明从哪一个日志文件的位置开始复制。
2、每一次自动完成转换后,进行replication health检测。