MHA故障模拟与恢复

一 MHA集群部署

参考之前写的文章

MHA 高可用部署-CSDN博客

二 故障模拟

① 关闭主服务器master上的MySQL服务,看VIP是否漂移到从服务节点
[root@mha15 data]# ps -ef |grep mysql
root      12617   9903  0 Apr28 pts/0    00:00:00 /bin/sh ./mysqld_safe --defaults-file=/app/data/etc/my.cnf
mysql     14402  12617  0 Apr28 pts/0    00:13:34 /app/mysql/bin/mysqld --defaults-file=/app/data/etc/my.cnf --basedir=/app/mysql/ --datadir=/app/data/data --plugin-dir=/app/mysql//lib/plugin --user=mysql --log-erro
r=/app/data/log/error.log --open-files-limit=655350 --pid-file=/app/data/sock/mysql.pid --socket=/app/data/sock/mysql.sock --port=3306root      14475   9903  0 Apr28 pts/0    00:00:00 ./mysql -uroot -p -S /app/data/sock/mysql.sock
root      40176  17642  0 03:03 pts/1    00:00:00 grep --color=auto mysql
[root@mha15 data]# kill -9 12617 14402
[root@mha15 data]# ps -ef |grep mysql
root      14475   9903  0 Apr28 pts/0    00:00:00 ./mysql -uroot -p -S /app/data/sock/mysql.sock
root      40195  17642  0 03:03 pts/1    00:00:00 grep --color=auto mysql

VIP已漂移到mha16节点上

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:92:6d:fd brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.16/24 brd 192.168.10.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.10.100/24 brd 192.168.10.255 scope global secondary ens33:1
       valid_lft forever preferred_lft forever
    inet6 fe80::b6cb:27dd:5b53:f57f/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::32ad:80bf:c2e0:633f/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::6b4c:646e:e677:d4ac/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
② 查看日志,已完成新master的拉起
查看manager日志
tail -f /var/log/masterha/app1/manager.log

Mon Apr 29 04:20:20 2024 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Apr 29 04:20:20 2024 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.10.16 -s 192.168.10.17  --user=root  --master_host=192.168.10.15  --master_ip=192.168.10
.15  --master_port=3306 --master_user=mha --master_password=123456 --ping_type=SELECTMon Apr 29 04:20:20 2024 - [info] Executing SSH check script: exit 0
Mon Apr 29 04:20:21 2024 - [info] HealthCheck: SSH to 192.168.10.15 is reachable.
Monitoring server 192.168.10.16 is reachable, Master is not reachable from 192.168.10.16. OK.
Monitoring server 192.168.10.17 is reachable, Master is not reachable from 192.168.10.17. OK.
Mon Apr 29 04:20:21 2024 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Apr 29 04:20:21 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.15' (111))
Mon Apr 29 04:20:21 2024 - [warning] Connection failed 2 time(s)..
Mon Apr 29 04:20:22 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.15' (111))
Mon Apr 29 04:20:22 2024 - [warning] Connection failed 3 time(s)..
Mon Apr 29 04:20:23 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.15' (111))
Mon Apr 29 04:20:23 2024 - [warning] Connection failed 4 time(s)..
Mon Apr 29 04:20:23 2024 - [warning] Master is not reachable from health checker!
Mon Apr 29 04:20:23 2024 - [warning] Master 192.168.10.15(192.168.10.15:3306) is not reachable!
Mon Apr 29 04:20:23 2024 - [warning] SSH is reachable.
Mon Apr 29 04:20:23 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server sta
tus..Mon Apr 29 04:20:23 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Apr 29 04:20:23 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Apr 29 04:20:23 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Apr 29 04:20:24 2024 - [info] GTID failover mode = 1
Mon Apr 29 04:20:24 2024 - [info] Dead Servers:
Mon Apr 29 04:20:24 2024 - [info]   192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:24 2024 - [info] Alive Servers:
Mon Apr 29 04:20:24 2024 - [info]   192.168.10.16(192.168.10.16:3306)
Mon Apr 29 04:20:24 2024 - [info]   192.168.10.17(192.168.10.17:3306)
Mon Apr 29 04:20:24 2024 - [info] Alive Slaves:
Mon Apr 29 04:20:24 2024 - [info]   192.168.10.16(192.168.10.16:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:24 2024 - [info]     GTID ON
Mon Apr 29 04:20:24 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:24 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:24 2024 - [info]   192.168.10.17(192.168.10.17:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:24 2024 - [info]     GTID ON
Mon Apr 29 04:20:24 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:24 2024 - [info] Checking slave configurations..
Mon Apr 29 04:20:24 2024 - [info] Checking replication filtering settings..
Mon Apr 29 04:20:24 2024 - [info]  Replication filtering check ok.
Mon Apr 29 04:20:24 2024 - [info] Master is down!
Mon Apr 29 04:20:24 2024 - [info] Terminating monitoring script.
Mon Apr 29 04:20:24 2024 - [info] Got exit code 20 (Master dead).
Mon Apr 29 04:20:24 2024 - [info] MHA::MasterFailover version 0.58.
Mon Apr 29 04:20:24 2024 - [info] Starting master failover.
Mon Apr 29 04:20:24 2024 - [info] 
Mon Apr 29 04:20:24 2024 - [info] * Phase 1: Configuration Check Phase..
Mon Apr 29 04:20:24 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] GTID failover mode = 1
Mon Apr 29 04:20:26 2024 - [info] Dead Servers:
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Checking master reachability via MySQL(double check)...
Mon Apr 29 04:20:26 2024 - [info]  ok.
Mon Apr 29 04:20:26 2024 - [info] Alive Servers:
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.16(192.168.10.16:3306)
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.17(192.168.10.17:3306)
Mon Apr 29 04:20:26 2024 - [info] Alive Slaves:
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.16(192.168.10.16:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.17(192.168.10.17:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] Starting GTID based failover.
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Apr 29 04:20:26 2024 - [info] Executing master IP deactivation script:
Mon Apr 29 04:20:26 2024 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.10.15 --orig_master_ip=192.168.10.15 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig ens33:1  down==/sbin/ifconfig ens33:1 192.168.10.100/24===

Disabling the VIP on old master: 192.168.10.15 
Mon Apr 29 04:20:26 2024 - [info]  done.
Mon Apr 29 04:20:26 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Apr 29 04:20:26 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 3: Master Recovery Phase..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:197
Mon Apr 29 04:20:26 2024 - [info] Retrieved Gtid Set: 7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Mon Apr 29 04:20:26 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.16(192.168.10.16:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.17(192.168.10.17:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:197
Mon Apr 29 04:20:26 2024 - [info] Retrieved Gtid Set: 7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Mon Apr 29 04:20:26 2024 - [info] Oldest slaves:
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.16(192.168.10.16:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.17(192.168.10.17:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 3.3: Determining New Master Phase..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] Searching new master from slaves..
Mon Apr 29 04:20:26 2024 - [info]  Candidate masters from the configuration file:
Mon Apr 29 04:20:26 2024 - [info]   192.168.10.16(192.168.10.16:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 04:20:26 2024 - [info]     GTID ON
Mon Apr 29 04:20:26 2024 - [info]     Replicating from 192.168.10.15(192.168.10.15:3306)
Mon Apr 29 04:20:26 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Apr 29 04:20:26 2024 - [info]  Non-candidate masters:
Mon Apr 29 04:20:26 2024 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Apr 29 04:20:26 2024 - [info] New master is 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 04:20:26 2024 - [info] Starting master failover..
Mon Apr 29 04:20:26 2024 - [info] 
From:
192.168.10.15(192.168.10.15:3306) (current master)
 +--192.168.10.16(192.168.10.16:3306)
 +--192.168.10.17(192.168.10.17:3306)

To:
192.168.10.16(192.168.10.16:3306) (new master)
 +--192.168.10.17(192.168.10.17:3306)
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info]  Waiting all logs to be applied.. 
Mon Apr 29 04:20:26 2024 - [info]   done.
Mon Apr 29 04:20:26 2024 - [info] Getting new master's binlog name and position..
Mon Apr 29 04:20:26 2024 - [info]  mysql-bin.000001:7098
Mon Apr 29 04:20:26 2024 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.16', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='mysla
ve', MASTER_PASSWORD='xxx';Mon Apr 29 04:20:26 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 7098, 6f4c3822-052d-11ef-b559-000c29926dfd:1-12,
7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
Mon Apr 29 04:20:26 2024 - [info] Executing master IP activate script:
Mon Apr 29 04:20:26 2024 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.10.15 --orig_master_ip=192.168.10.15 --orig_master_port=3306 --new_master_host=192.168
.10.16 --new_master_ip=192.168.10.16 --new_master_port=3306 --new_master_user='mha'   --new_master_password=xxx

IN SCRIPT TEST====/sbin/ifconfig ens33:1  down==/sbin/ifconfig ens33:1 192.168.10.100/24===

Enabling the VIP - 192.168.10.100/24 on the new master - 192.168.10.16 
Mon Apr 29 04:20:26 2024 - [info]  OK.
Mon Apr 29 04:20:26 2024 - [info] Setting read_only=0 on 192.168.10.16(192.168.10.16:3306)..
Mon Apr 29 04:20:26 2024 - [info]  ok.
Mon Apr 29 04:20:26 2024 - [info] ** Finished master recovery successfully.
Mon Apr 29 04:20:26 2024 - [info] * Phase 3: Master Recovery Phase completed.
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 4: Slaves Recovery Phase..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Apr 29 04:20:26 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info] -- Slave recovery on host 192.168.10.17(192.168.10.17:3306) started, pid: 33731. Check tmp log /var/log/masterha/app1/192.168.10.17_3306_20240429042024.log if it takes time..
Mon Apr 29 04:20:27 2024 - [info] 
Mon Apr 29 04:20:27 2024 - [info] Log messages from 192.168.10.17 ...
Mon Apr 29 04:20:27 2024 - [info] 
Mon Apr 29 04:20:26 2024 - [info]  Resetting slave 192.168.10.17(192.168.10.17:3306) and starting replication from the new master 192.168.10.16(192.168.10.16:3306)..
Mon Apr 29 04:20:26 2024 - [info]  Executed CHANGE MASTER.
Mon Apr 29 04:20:26 2024 - [info]  Slave started.
Mon Apr 29 04:20:27 2024 - [info]  gtid_wait(6f4c3822-052d-11ef-b559-000c29926dfd:1-12,
7cb7532a-052d-11ef-9f84-000c2962bd90:8-24) completed on 192.168.10.17(192.168.10.17:3306). Executed 0 events.
Mon Apr 29 04:20:27 2024 - [info] End of log messages from 192.168.10.17.
Mon Apr 29 04:20:27 2024 - [info] -- Slave on host 192.168.10.17(192.168.10.17:3306) started.
Mon Apr 29 04:20:27 2024 - [info] All new slave servers recovered successfully.
Mon Apr 29 04:20:27 2024 - [info] 
Mon Apr 29 04:20:27 2024 - [info] * Phase 5: New master cleanup phase..
Mon Apr 29 04:20:27 2024 - [info] 
Mon Apr 29 04:20:27 2024 - [info] Resetting slave info on the new master..
Mon Apr 29 04:20:27 2024 - [info]  192.168.10.16: Resetting slave info succeeded.
Mon Apr 29 04:20:27 2024 - [info] Master failover to 192.168.10.16(192.168.10.16:3306) completed successfully.
Mon Apr 29 04:20:27 2024 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Mon Apr 29 04:20:27 2024 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.10.15(192.168.10.15:3306) to 192.168.10.16(192.168.10.16:3306) succeeded

Master 192.168.10.15(192.168.10.15:3306) is down!

Check MHA Manager logs at mhamaster:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.15(192.168.10.15:3306)
Selected 192.168.10.16(192.168.10.16:3306) as a new master.
192.168.10.16(192.168.10.16:3306): OK: Applying all logs succeeded.
192.168.10.16(192.168.10.16:3306): OK: Activated master IP address.
192.168.10.17(192.168.10.17:3306): OK: Slave started, replicating from 192.168.10.16(192.168.10.16:3306)
192.168.10.16(192.168.10.16:3306): Resetting slave info succeeded.
Master failover to 192.168.10.16(192.168.10.16:3306) completed successfully.
③ 故障转移完成后,manager将会自动停止,此时使用masterha_check_status会报错
[root@mhamaster masterha]# masterha_check_status -conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[1]+  Done                    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
[root@mhamaster masterha]# 
④ 此时mha16节点为master,mha17为从节点
mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.10.16
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 7098
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 420
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 7098
              Relay_Log_Space: 624
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 203148916
                  Master_UUID: 6f4c3822-052d-11ef-b559-000c29926dfd
             Master_Info_File: /app/data/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 62decfb0-052d-11ef-9a98-000c295ee570:1-8,
6f4c3822-052d-11ef-b559-000c29926dfd:1-12,
7cb7532a-052d-11ef-9f84-000c2962bd90:8-24
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)
⑤ 在新主节点更新数据,验证同步

在mha16上更新数据

mysql> select * from test123;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

mysql> insert into test123 values(2);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test123;
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

在mha17上验证数据

mysql> use test;
Database changed
mysql> select * from test123;
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.01 sec)

三 故障恢复

原有 master 节点故障后,基于新master 节点的备份恢复数据后,将其配置为新的 master 的从节点。

① 数据库的数据恢复

原来mha16已经成为了新的主节点,对其进行完全备份

[root@mha16 app]# mkdir -p /app/backup
[root@mha16 backup]# mysqldump -S /app/data/sock/mysql.sock -p --source-data=2 --single-transaction --set-gtid-purged=OFF --all-databases > /app/backup/alltest.sql
Enter password: 
[root@mha16 backup]# 
[root@mha16 backup]# scp /app/backup/alltest.sql root@192.168.10.15:/app
alltest.sql                                                                                                                                                                          100% 1267KB  28.1MB/s   00:00    
[root@mha16 backup]#
②在新节点上安装MySQL或启动已有的MySQL
[root@mha15 data]# mysqld_safe --defaults-file=/app/data/etc/my.cnf &
[root@mha15 app]# mysql -S /app/data/sock/mysql.sock -p <alltest.sql
Enter password: 
[root@mha15 app]#
③ 查看备份文件中二进制日志和位置
--
-- Position to start replication or point-in-time recovery from
--

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=7376;

--
-- Current Database: `mysql`

④ 在新节点上进行同步配置
mysql> change master to master_host='192.168.10.16',master_user='myslave',master_port=3306,master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=7376;
Query OK, 0 rows affected, 9 warnings (0.05 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.10.16
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 7376
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 326
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
⑤ 将新恢复的节点设置为只读模式
set global read_only=1;

⑥ 再次进行节点复制检测

[root@mhamaster masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Mon Apr 29 05:34:16 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Apr 29 05:34:16 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Apr 29 05:34:16 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Apr 29 05:34:16 2024 - [info] MHA::MasterMonitor version 0.58.
Mon Apr 29 05:34:17 2024 - [info] GTID failover mode = 1
Mon Apr 29 05:34:17 2024 - [info] Dead Servers:
Mon Apr 29 05:34:17 2024 - [info] Alive Servers:
Mon Apr 29 05:34:17 2024 - [info]   192.168.10.16(192.168.10.16:3306)
Mon Apr 29 05:34:17 2024 - [info]   192.168.10.17(192.168.10.17:3306)
Mon Apr 29 05:34:17 2024 - [info] Alive Slaves:
Mon Apr 29 05:34:17 2024 - [info]   192.168.10.17(192.168.10.17:3306)  Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Mon Apr 29 05:34:17 2024 - [info]     GTID ON
Mon Apr 29 05:34:17 2024 - [info]     Replicating from 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 05:34:17 2024 - [info] Current Alive Master: 192.168.10.16(192.168.10.16:3306)
Mon Apr 29 05:34:17 2024 - [info] Checking slave configurations..
Mon Apr 29 05:34:17 2024 - [info] Checking replication filtering settings..
Mon Apr 29 05:34:17 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Apr 29 05:34:17 2024 - [info]  Replication filtering check ok.
Mon Apr 29 05:34:17 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Apr 29 05:34:17 2024 - [info] Checking SSH publickey authentication settings on the current master..
Mon Apr 29 05:34:18 2024 - [info] HealthCheck: SSH to 192.168.10.16 is reachable.
Mon Apr 29 05:34:18 2024 - [info] 
192.168.10.16(192.168.10.16:3306) (current master)
 +--192.168.10.17(192.168.10.17:3306)

Mon Apr 29 05:34:18 2024 - [info] Checking replication health on 192.168.10.17..
Mon Apr 29 05:34:18 2024 - [info]  ok.
Mon Apr 29 05:34:18 2024 - [info] Checking master_ip_failover_script status:
Mon Apr 29 05:34:18 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.10.16 --orig_master_ip=192.168.10.16 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens33:1  down==/sbin/ifconfig ens33:1 192.168.10.100/24===

Checking the Status of the script.. OK 
Mon Apr 29 05:34:18 2024 - [info]  OK.
Mon Apr 29 05:34:18 2024 - [warning] shutdown_script is not defined.
Mon Apr 29 05:34:18 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
⑦ 启动manager,并检测状态
[root@mhamaster masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 33786
[root@mhamaster masterha]# masterha_check_status -conf=/etc/masterha/app1.cnf
app1 (pid:33786) is running(0:PING_OK), master:192.168.10.16

四 故障转换及恢复中的注意事项

 

1、在生产环境中,主节点挂掉后,第一时间对从节点进行备份,用备份文件手动将原主节点提升为从节点,并指明从哪一个日志文件的位置开始复制。

2、每一次自动完成转换后,进行replication health检测。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值