MySQL: MHA High Availability and Read/Write Splitting

I. What Is High Availability

1. Enterprise HA standard: annual fault-free time

| Uptime (per year) | Downtime (per year) | Downtime in minutes | Typical solution |
| 99.9%    | 0.1%    | 525.6 min  | keepalived + dual master (failover requires manual intervention) |
| 99.99%   | 0.01%   | 52.56 min  | MHA (semi-automated) |
| 99.999%  | 0.001%  | 5.256 min  | PXC, MGR, MGC (automated) |
| 99.9999% | 0.0001% | 0.5256 min | automated, cloud-based, platform-based |
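These downtime figures follow from a year having 365 × 24 × 60 = 525,600 minutes; multiplying by the allowed failure rate gives the downtime budget. A quick sanity check in shell:

```shell
# Annual downtime in minutes = 525600 * (1 - availability)
for a in 0.999 0.9999 0.99999; do
  awk -v a="$a" 'BEGIN { printf "%s%% available -> %.2f min downtime/year\n", a*100, 525600*(1-a) }'
done
```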

II. Introduction to MHA

1. How MHA works

1. Monitoring

          masterha_master_monitor probes the master's heartbeat every ping_interval seconds. If the heartbeat cannot be detected, the master gets 4 chances in total before failover begins.

2. Master election

Who takes over when the master goes down?
1. If all slave nodes have identical logs, a new master is by default chosen in the order the servers appear in the configuration file.
2. If the slaves' logs differ, the slave whose logs are closest to the master's is chosen automatically.
3. If a node has been given priority (candidate_master=1), that node is chosen preferentially.
However, if that node's logs lag the master by more than 100 MB, it will still not be chosen. This can be combined with check_repl_delay=0 to turn off the log-lag check and force selection of the candidate node.

(1) ping_interval=1
# Interval, in seconds, between ping probes sent to the master; failover is triggered automatically after three probes get no response
(2) candidate_master=1
# Marks this node as a candidate master: once this is set, the slave will be promoted to master during a failover even if it is not the most up-to-date slave in the cluster
(3) check_repl_delay=0
# By default, MHA will not choose a slave as the new master if it lags the master by more than 100 MB of relay logs, because recovering that slave would take a long time. Setting check_repl_delay=0 makes MHA ignore replication delay when choosing the new master. This parameter is very useful for hosts with candidate_master=1, because it guarantees that the candidate becomes the new master during the switchover
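Putting the three parameters above into context: in the MHA configuration file, ping_interval belongs under [server default], while candidate_master and check_repl_delay go in the per-server section of the node you want promoted. A sketch (section names match the app1.cnf used later; the placement is the point, not the values):

```
[server default]
ping_interval=1

[server2]
hostname=192.168.20.120
port=3306
candidate_master=1
check_repl_delay=0
```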

3. Data and log compensation

        1) If SSH is reachable

                Each slave immediately saves the missing part of the binlog to /var/tmp/xxxx via save_binary_logs.

        2) If SSH is unreachable

                The slaves call apply_diff_relay_logs to compute the relay-log difference between nodes and apply it.

4. Failover

        Tear down the original master/slave relationships and build the new ones.

5. The failed node is automatically removed from the configuration file.
6. MHA exits itself (the manager is one-shot: it terminates after completing a failover).
7. Application transparency: VIP.
8. Supplementary data-compensation scheme: binlog-server.
9. Failure alerting: e-mail, DingTalk.
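Step 7 (VIP transparency) is normally handled by the master_ip_failover script referenced from the MHA configuration. MHA ships a Perl sample; the shell sketch below only parses the arguments MHA passes and prints the ip commands it would run. The VIP 192.168.20.200/24 and device eth0 are illustrative assumptions; this is a dry run, not a drop-in implementation:

```shell
#!/usr/bin/env bash
# Dry-run sketch of a master_ip_failover-style script: it parses the arguments
# MHA passes and PRINTS the VIP commands instead of executing them.
VIP="192.168.20.200/24"   # illustrative VIP (an assumption, not from this setup)
DEV="eth0"                # illustrative network device

failover_vip() {
  local CMD="" ORIG="" NEW="" arg
  for arg in "$@"; do
    case "$arg" in
      --command=*)          CMD="${arg#*=}" ;;
      --orig_master_host=*) ORIG="${arg#*=}" ;;
      --new_master_host=*)  NEW="${arg#*=}" ;;
    esac
  done
  case "$CMD" in
    stop|stopssh) echo "ssh root@$ORIG ip addr del $VIP dev $DEV" ;;
    start)        echo "ssh root@$NEW ip addr add $VIP dev $DEV" ;;
    status)       echo "ssh root@$ORIG ip addr show $DEV" ;;
    *)            echo "usage: $0 --command=start|stop|stopssh|status" ;;
  esac
}

failover_vip "$@"
```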

2. MHA architecture

One master and two slaves (master: db01; slaves: db02 and db03).
The MHA high-availability solution consists of two software packages:
Manager package: installed on one chosen slave node
Node package: installed on every node

3. MHA software components

The Manager package mainly includes the following tools:
masterha_manager            start MHA
masterha_check_ssh          check MHA's SSH configuration
masterha_check_repl         check MySQL replication status
masterha_master_monitor     detect whether the master is down
masterha_check_status       check the current MHA running status
masterha_master_switch      control failover (automatic or manual)
masterha_conf_host          add or remove configured server entries

The Node package mainly includes the following tools
(normally triggered by MHA Manager scripts, with no manual intervention required):
save_binary_logs            save and copy the master's binary logs
apply_diff_relay_logs       identify differing relay-log events and apply the differences to other slaves
purge_relay_logs            purge relay logs (does not block the SQL thread)

4. MHA cluster

III. Setting Up the MHA Environment

1. Prepare a one-master/two-slave environment

2. Create symlinks on all nodes

[root@localhost bin]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@localhost bin]# ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
[root@localhost bin]# ls -l /usr/bin/mysql*
lrwxrwxrwx 1 root root 26 May 30 15:27 /usr/bin/mysql -> /usr/local/mysql/bin/mysql
lrwxrwxrwx 1 root root 32 May 30 15:27 /usr/bin/mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog

3. Set up mutual trust between nodes (passwordless SSH login between all nodes)

db01:
rm -rf /root/.ssh 
ssh-keygen
cd /root/.ssh 
mv id_rsa.pub authorized_keys
scp  -r  /root/.ssh  192.168.20.120:/root 
scp  -r  /root/.ssh  192.168.20.231:/root 
Verify from each node:
db01:
ssh 192.168.20.132 date
ssh 192.168.20.120 date
ssh 192.168.20.231 date
db02:
ssh 192.168.20.132 date
ssh 192.168.20.120 date
ssh 192.168.20.231 date
db03:
ssh 192.168.20.132 date
ssh 192.168.20.120 date
ssh 192.168.20.231 date
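The nine manual checks above can be collapsed into one loop, run once on each node (IPs are the ones used in this setup; BatchMode makes ssh fail fast instead of prompting when trust is missing):

```shell
# Run on every node: each host must be able to reach every other one
# (and itself) over SSH without a password.
HOSTS="192.168.20.132 192.168.20.120 192.168.20.231"

check_ssh_all() {
  local h
  for h in $HOSTS; do
    # BatchMode=yes makes ssh fail immediately instead of prompting
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" date >/dev/null 2>&1; then
      echo "$h ok"
    else
      echo "$h FAILED"
    fi
  done
}

check_ssh_all
```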

4. Download the MHA software

MHA project page: https://code.google.com/archive/p/mysql-master-ha/
GitHub downloads (suitable for MySQL 5.7 and below): https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
For MySQL 8.0: https://pan.baidu.com/s/1-yo1KjZZUvbHxrcI9Yl7-Q  (extraction code: fr50)

5. Install the Node package (and its dependencies) on all nodes

[root@localhost opt]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Preparing...                          ################################# [100%]
Upgrading/installing...
   1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
[root@localhost opt]# 

6. Create the users MHA needs: mha and replmha

mysql> create user mha@'%' identified with mysql_native_password by'ok';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON *.* TO 'mha'@'%' WITH GRANT OPTION;
Query OK, 0 rows affected (0.01 sec)

mysql> select user,host,authentication_string,plugin from mysql.user;
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| user             | host      | authentication_string                                                  | plugin                |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| mha              | %         | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| replmha          | %         | $A$005$)fcB"IXZmo{qr%)71hR72t5sUOD3H27kNo8uGnWX8/mkwadbMlpTdSyw9B | caching_sha2_password |
| mysql.infoschema | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.session    | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.sys        | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| repl             | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| root             | localhost |                                                                        | caching_sha2_password |
| yizuo            | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
8 rows in set (0.00 sec)

 A user with replication privileges (replmha) must be created once on every node, and the same goes for the user with administrative privileges. Before doing anything on a slave, remember to run set sql_log_bin=0 to disable binlog logging for the session.
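A sketch of the statements to run on each node, wrapped in a function that only prints the SQL so it can be reviewed or piped into mysql. The user name and password come from this article's setup; note this sketch grants only REPLICATION SLAVE, the minimum the MHA check demands, whereas the article itself used GRANT ALL:

```shell
# Print the SQL that must be run on EACH node: sql_log_bin is disabled for the
# session so the user creation is not written to the binlog, then re-enabled.
gen_repl_user_sql() {
  cat <<'SQL'
SET sql_log_bin=0;
CREATE USER replmha@'%' IDENTIFIED WITH mysql_native_password BY 'ok';
GRANT REPLICATION SLAVE ON *.* TO 'replmha'@'%';
SET sql_log_bin=1;
SQL
}

# Illustrative usage: gen_repl_user_sql | mysql -uroot -p
gen_repl_user_sql
```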

7. Install the Manager package (db03)

# Install dependency packages on the management node
yum install -y perl-Config-Tiny perl-Log-Dispatch  perl-Parallel-ForkManager
# Install mha4mysql-manager on the management node
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

8. Prepare the configuration file (db03)

Create the configuration directory
 mkdir -p /etc/mha
Create the log directory
 mkdir -p /var/log/mha/app1
Edit the MHA configuration file
vim /etc/mha/app1.cnf

[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/3306/binlog
user=mha
password=ok
ping_interval=2
repl_password=ok
repl_user=replmha
ssh_user=root
[server1]                                   
hostname=192.168.20.132
port=3306                                  
[server2]            
hostname=192.168.20.120
port=3306
[server3]
hostname=192.168.20.231
port=3306

MHA main configuration parameters explained

         manager_workdir=/var/log/masterha/app1: the manager's working directory

         manager_log=/var/log/masterha/app1/manager.log: the manager's log file; this is the main log, the first place to look when something goes wrong

         master_binlog_dir=/data/mysql: where the master stores its binlogs, so MHA can find them (the master's binary-log directory)

         master_ip_failover_script=/usr/local/bin/master_ip_failover: the switch script used during automatic failover

         master_ip_online_change_script=/usr/local/bin/master_ip_online_change: the switch script used during manual switchover

         user=root: the MySQL user used for monitoring

         password=dayi123: the password of the monitoring user, which must be granted remote login from the manager node

         ping_interval=1: interval, in seconds, between ping probes sent to the master (default 3); failover is triggered automatically after three probes get no response

         remote_workdir=/tmp: where the remote MySQL hosts save binlogs during a switchover

         repl_user=repl: the MySQL user used for replication

         repl_password=replication: the password of the replication user

         report_script=/usr/local/send_report: alert script executed after a switchover

         shutdown_script="": script for shutting down the failed host after a failure (its main purpose is to power off the host to prevent split-brain; not used here)

         ssh_user=root: the SSH login user

         candidate_master=1: set in a per-server section; marks the node as a candidate master

         check_repl_delay=0: set in a per-server section; by default MHA will not choose a slave as the new master if it lags the master by more than 100 MB of relay logs. This option is very useful for hosts that have candidate_master=1

Source: https://blog.csdn.net/dayi_123/article/details/83690608

9. SSH trust check (db03)

[root@localhost opt]# masterha_check_ssh  --conf=/etc/mha/app1.cnf 
Thu May 30 16:25:13 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 16:25:13 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 16:25:13 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 16:25:13 2024 - [info] Starting SSH connection tests..
Thu May 30 16:25:14 2024 - [debug] 
Thu May 30 16:25:13 2024 - [debug]  Connecting via SSH from root@192.168.20.132(192.168.20.132:22) to root@192.168.20.120(192.168.20.120:22)..
Thu May 30 16:25:14 2024 - [debug]   ok.
Thu May 30 16:25:14 2024 - [debug]  Connecting via SSH from root@192.168.20.132(192.168.20.132:22) to root@192.168.20.231(192.168.20.231:22)..
Thu May 30 16:25:14 2024 - [debug]   ok.
Thu May 30 16:25:15 2024 - [debug] 
Thu May 30 16:25:14 2024 - [debug]  Connecting via SSH from root@192.168.20.120(192.168.20.120:22) to root@192.168.20.132(192.168.20.132:22)..
Thu May 30 16:25:14 2024 - [debug]   ok.
Thu May 30 16:25:14 2024 - [debug]  Connecting via SSH from root@192.168.20.120(192.168.20.120:22) to root@192.168.20.231(192.168.20.231:22)..
Thu May 30 16:25:15 2024 - [debug]   ok.
Thu May 30 16:25:16 2024 - [debug] 
Thu May 30 16:25:14 2024 - [debug]  Connecting via SSH from root@192.168.20.231(192.168.20.231:22) to root@192.168.20.132(192.168.20.132:22)..
Thu May 30 16:25:15 2024 - [debug]   ok.
Thu May 30 16:25:15 2024 - [debug]  Connecting via SSH from root@192.168.20.231(192.168.20.231:22) to root@192.168.20.120(192.168.20.120:22)..
Thu May 30 16:25:15 2024 - [debug]   ok.
Thu May 30 16:25:16 2024 - [info] All SSH connection tests passed successfully.

10. Replication check (db03)

masterha_check_repl  --conf=/etc/mha/app1.cnf 
[root@localhost app1]# masterha_check_repl  --conf=/etc/mha/app1.cnf
Thu May 30 19:09:30 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 19:09:30 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 19:09:30 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 19:09:30 2024 - [info] MHA::MasterMonitor version 0.58.
Thu May 30 19:09:31 2024 - [info] GTID failover mode = 1
Thu May 30 19:09:31 2024 - [info] Dead Servers:
Thu May 30 19:09:31 2024 - [info] Alive Servers:
Thu May 30 19:09:31 2024 - [info]   192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info]   192.168.20.120(192.168.20.120:3306)
Thu May 30 19:09:31 2024 - [info]   192.168.20.231(192.168.20.231:3306)
Thu May 30 19:09:31 2024 - [info] Alive Slaves:
Thu May 30 19:09:31 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Thu May 30 19:09:31 2024 - [info]     GTID ON
Thu May 30 19:09:31 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info]   192.168.20.231(192.168.20.231:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Thu May 30 19:09:31 2024 - [info]     GTID ON
Thu May 30 19:09:31 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info] Current Alive Master: 192.168.20.132(192.168.20.132:3306)
Thu May 30 19:09:31 2024 - [info] Checking slave configurations..
Thu May 30 19:09:31 2024 - [info]  read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Thu May 30 19:09:31 2024 - [info]  read_only=1 is not set on slave 192.168.20.231(192.168.20.231:3306).
Thu May 30 19:09:31 2024 - [info] Checking replication filtering settings..
Thu May 30 19:09:31 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu May 30 19:09:31 2024 - [info]  Replication filtering check ok.
Thu May 30 19:09:31 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu May 30 19:09:31 2024 - [info] Checking SSH publickey authentication settings on the current master..
Thu May 30 19:09:32 2024 - [info] HealthCheck: SSH to 192.168.20.132 is reachable.
Thu May 30 19:09:32 2024 - [info] 
192.168.20.132(192.168.20.132:3306) (current master)
 +--192.168.20.120(192.168.20.120:3306)
 +--192.168.20.231(192.168.20.231:3306)

Thu May 30 19:09:32 2024 - [info] Checking replication health on 192.168.20.120..
Thu May 30 19:09:32 2024 - [info]  ok.
Thu May 30 19:09:32 2024 - [info] Checking replication health on 192.168.20.231..
Thu May 30 19:09:32 2024 - [info]  ok.
Thu May 30 19:09:32 2024 - [warning] master_ip_failover_script is not defined.
Thu May 30 19:09:32 2024 - [warning] shutdown_script is not defined.
Thu May 30 19:09:32 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

1) Pitfall during the replication check: Attempt to reload DBD/mysql.pm aborted
[root@localhost data]# masterha_check_repl  --conf=/etc/mha/app1.cnf
Thu May 30 17:43:25 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 17:43:25 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 17:43:25 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 17:43:25 2024 - [info] MHA::MasterMonitor version 0.58.
Thu May 30 17:43:25 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 37) line 3.

 at /usr/share/perl5/vendor_perl/MHA/DBHelper.pm line 208.
 at /usr/share/perl5/vendor_perl/MHA/Server.pm line 166.
Thu May 30 17:43:25 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 37) line 3.

 at /usr/share/perl5/vendor_perl/MHA/DBHelper.pm line 208.
 at /usr/share/perl5/vendor_perl/MHA/Server.pm line 166.
Thu May 30 17:43:25 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 37) line 3.

 at /usr/share/perl5/vendor_perl/MHA/DBHelper.pm line 208.
 at /usr/share/perl5/vendor_perl/MHA/Server.pm line 166.
Thu May 30 17:43:26 2024 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu May 30 17:43:26 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Thu May 30 17:43:26 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu May 30 17:43:26 2024 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

        Fix: follow the CSDN post "install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted 问题解决方案".

        After following those steps, a new problem appeared:

chmod 755 blib/arch/auto/DBD/mysql/mysql.so
Manifying 2 pod documents
  DVEEDEN/DBD-mysql-5.005.tar.gz
  /usr/bin/make -- OK
Running make test
"/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- mysql.bs blib/arch/auto/DBD/mysql/mysql.bs 644
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00base.t .............................. 1/6 # Driver version is 5.005
t/00base.t .............................. ok   
t/01caching_sha2_prime.t ................ ok   
t/05dbcreate.t .......................... 1/2 # Database 'test' accessible
t/05dbcreate.t .......................... ok   
t/10connect.t ........................... 1/? # mysql_clientinfo is: 8.0.20
# mysql_clientversion is: 80020
# mysql_serverversion is: 80020
# mysql_hostinfo is: Localhost via UNIX socket
# mysql_serverinfo is: 8.0.20
# mysql_stat is: Uptime: 64536  Threads: 4  Questions: 1664  Slow queries: 200  Opens: 667  Flush tables: 4  Open tables: 408  Queries per second avg: 0.025
# mysql_protoinfo is: 10
# SQL_DBMS_VER is 8.0.20
# Default storage engine is: InnoDB
t/10connect.t ........................... ok    
t/15reconnect.t ......................... ok     
t/16dbi-get_info.t ...................... ok   
t/17quote.t ............................. ok     
t/20createdrop.t ........................ ok   
t/25lockunlock.t ........................ ok     
t/29warnings.t .......................... ok     
t/30insertfetch.t ....................... ok    
t/31insertid.t .......................... ok     
t/32insert_error.t ...................... ok   
t/35limit.t ............................. ok       
t/35prepare.t ........................... ok     
t/40bindparam.t ......................... ok     
t/40bindparam2.t ........................ ok     
t/40bit.t ............................... ok     
t/40blobs.t ............................. ok     
t/40catalog.t ........................... ok     
t/40keyinfo.t ........................... ok   
t/40listfields.t ........................ ok     
t/40nulls.t ............................. ok     
t/40nulls_prepare.t ..................... ok    
t/40numrows.t ........................... ok     
t/40server_prepare.t .................... ok     
t/40server_prepare_crash.t .............. ok     
t/40server_prepare_error.t .............. ok   
t/40types.t ............................. ok     
t/41bindparam.t ......................... ok     
t/41blobs_prepare.t ..................... ok     
t/41int_min_max.t ....................... ok       
t/42bindparam.t ......................... ok     
t/43count_params.t ...................... ok     
t/50chopblanks.t ........................ ok     
t/50commit.t ............................ ok     
t/51bind_type_guessing.t ................ 1/98 DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Data truncated for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Incorrect integer value: '+' for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Incorrect integer value: '.' for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
DBD::mysql::st execute failed: Incorrect integer value: 'e5' for column 'nn' at row 1 at t/51bind_type_guessing.t line 114.
DBD::mysql::st execute failed: Data truncated for column 'dd' at row 1 at t/51bind_type_guessing.t line 136.
t/51bind_type_guessing.t ................ ok     
t/52comment.t ........................... ok     
t/53comment.t ........................... ok    
t/55utf8.t .............................. ok     
t/55utf8mb4.t ........................... ok   
t/56connattr.t .......................... ok   
t/57trackgtid.t ......................... skipped: GTID tracking not enabled
t/60leaks.t ............................. skipped: Skip $ENV{EXTENDED_TESTING} is not set
t/65segfault.t .......................... ok   
t/65types.t ............................. ok     
t/70takeimp.t ........................... ok     
t/71impdata.t ........................... ok     
t/75supported_sql.t ..................... ok     
t/76multi_statement.t ................... ok     
t/80procs.t ............................. ok     
t/81procs.t ............................. ok     
t/85init_command.t ...................... ok   
t/86_bug_36972.t ........................ ok     
t/87async.t ............................. ok     
t/88async-multi-stmts.t ................. ok   
t/89async-method-check.t ................ ok       
t/91errcheck.t .......................... ok   
t/92ssl_backronym_vulnerability.t ....... skipped: Server supports SSL connections, cannot test false-positive enforcement
t/92ssl_optional.t ...................... skipped: Server supports SSL connections, cannot test fallback to plain text
t/92ssl_riddle_vulnerability.t .......... skipped: Server supports SSL connections, cannot test false-positive enforcement
t/99_bug_server_prepare_blob_null.t ..... ok     
t/99compression.t ....................... ok    
t/gh352.t ............................... 1/2 DBD::mysql::db prepare failed: Statement not active at t/gh352.t line 27.
t/gh352.t ............................... ok   
t/gh360.t ............................... ok   
t/manifest.t ............................ skipped: these tests are for release testing
t/pod.t ................................. 1/3 
#   Failed test 'POD test for blib/lib/DBD/mysql.pm'
#   at /usr/share/perl5/vendor_perl/Test/Pod.pm line 186.
# blib/lib/DBD/mysql.pm (1350): You forgot a '=back' before '=head1'
# Looks like you failed 1 test of 3.
t/pod.t ................................. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests 
t/rt110983-valid-mysqlfd.t .............. ok   
t/rt118977-zerofill.t ................... ok   
t/rt25389-bin-case.t .................... ok   
t/rt50304-column_info_parentheses.t ..... ok    
t/rt61849-bind-param-buffer-overflow.t .. ok   
t/rt75353-innodb-lock-timeout.t ......... ok   
t/rt83494-quotes-comments.t ............. ok   
t/rt85919-fetch-lost-connection.t ....... ok   
t/rt86153-reconnect-fail-memory.t ....... skipped: $ENV{EXTENDED_TESTING} is not set
t/rt88006-bit-prepare.t ................. ok    
t/rt91715.t ............................. ok   
t/version.t ............................. 1/? # mysql_get_client_version: 80020
t/version.t ............................. ok   

Test Summary Report
-------------------
t/pod.t                               (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=79, Tests=2449, 41 wallclock secs ( 0.36 usr  0.12 sys +  4.32 cusr  0.98 csys =  5.78 CPU)
Result: FAIL
Failed 1/79 test programs. 1/2449 subtests failed.
make: *** [test_dynamic] Error 255
  DVEEDEN/DBD-mysql-5.005.tar.gz
  /usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports DVEEDEN/DBD-mysql-5.005.tar.gz
Running make install
  make test had returned bad status, won't install without force

        To solve this, I followed: "install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted. 异常解决" (CSDN blog).

[root@localhost bin]# ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so
	linux-vdso.so.1 =>  (0x00007fffa75b4000)
	libmysqlclient.so.18 => not found
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1a492b5000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f1a4909f000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f1a48d9d000)
	libssl.so.10 => /lib64/libssl.so.10 (0x00007f1a48b2b000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f1a486ca000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f1a484c6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f1a480f9000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1a496ed000)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f1a47eac000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f1a47bc4000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f1a479c0000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f1a4778d000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f1a4757f000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f1a4737b000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f1a47162000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f1a46f3b000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f1a46cd9000)

        ldd also reported libmysqlclient.so.18 => not found, so that had to be solved next.

        Look for this file on another server that has MySQL installed:

[root@localhost opt]# find / -name libmysqlclient.so.18
find: ‘/run/user/1000/gvfs’: Permission denied
/usr/lib64/mysql/libmysqlclient.so.18
[root@localhost opt]# sz /usr/lib64/mysql/libmysqlclient.so.18

        Transfer the file to the problem server, cp it into the library directories, and run ldd again; the library is no longer "not found":

[root@localhost opt]# cp libmysqlclient.so.18 /usr/lib/
[root@localhost opt]# 
[root@localhost opt]# cp libmysqlclient.so.18 /usr/lib64/
[root@localhost opt]# ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql
ldd: /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql: No such file or directory
[root@localhost opt]# ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so
	linux-vdso.so.1 =>  (0x00007fff47db4000)
	libmysqlclient.so.18 => /lib64/libmysqlclient.so.18 (0x00007f3697cce000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f3697ab2000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f369789c000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f369759a000)
	libssl.so.10 => /lib64/libssl.so.10 (0x00007f3697328000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f3696ec7000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f3696cc3000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f36968f6000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f36965ee000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f36963d8000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f369844e000)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f369618b000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f3695ea3000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f3695c9f000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f3695a6c000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f369585e000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f369565a000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f3695441000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f369521a000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f3694fb8000)

        Run the replication check once more: good, that problem is gone, but the next one appears.

[root@localhost opt]# masterha_check_repl  --conf=/etc/mha/app1.cnf
Thu May 30 17:56:26 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 30 17:56:26 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu May 30 17:56:26 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu May 30 17:56:26 2024 - [info] MHA::MasterMonitor version 0.58.
Thu May 30 17:56:27 2024 - [info] GTID failover mode = 1
Thu May 30 17:56:27 2024 - [info] Dead Servers:
Thu May 30 17:56:27 2024 - [info]   192.168.20.231(192.168.20.231:3306)
Thu May 30 17:56:27 2024 - [info] Alive Servers:
Thu May 30 17:56:27 2024 - [info]   192.168.20.132(192.168.20.132:3306)
Thu May 30 17:56:27 2024 - [info]   192.168.20.120(192.168.20.120:3306)
Thu May 30 17:56:27 2024 - [info] Alive Slaves:
Thu May 30 17:56:27 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Thu May 30 17:56:27 2024 - [info]     GTID ON
Thu May 30 17:56:27 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Thu May 30 17:56:27 2024 - [info] Current Alive Master: 192.168.20.132(192.168.20.132:3306)
Thu May 30 17:56:27 2024 - [info] Checking slave configurations..
Thu May 30 17:56:27 2024 - [info]  read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Thu May 30 17:56:27 2024 - [info] Checking replication filtering settings..
Thu May 30 17:56:27 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu May 30 17:56:27 2024 - [info]  Replication filtering check ok.
Thu May 30 17:56:27 2024 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln398] 192.168.20.120(192.168.20.120:3306): User replmha does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Thu May 30 17:56:27 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 1403.
Thu May 30 17:56:27 2024 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu May 30 17:56:27 2024 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
2) Pitfall during the replication check: User replmha does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.

     Fix: a user with replication privileges (replmha) must be created once on every node, and the same goes for the user with administrative privileges; before doing anything on a slave, remember to run set sql_log_bin=0 to disable binlog logging for the session.

3) Pitfall during the replication check: Server 192.168.20.231(192.168.20.231:3306) is dead, but must be alive! Check server settings.

        The port in the manager machine's my.cnf had been set wrong; changing 3307 back to 3306 fixed it.

Other problems you may run into: https://www.cnblogs.com/xuliuzai/p/11980273.html

Reference: "MYSQL 高可用集群搭建 --- MHA" (CSDN blog)

4) Pitfall during the replication check: Fri May 31 18:25:52 2024 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 192.168.20.231(192.168.20.231:3306) :1130:Host '192.168.20.231' is not allowed to connect to this MySQL server, but this is not a MySQL crash. Check MySQL server settings.

This error means the client host (IP 192.168.20.231) tried to connect to the MySQL server, but the server's privilege settings do not allow connections from that host. It is usually caused by an incorrect MySQL user privilege configuration.

Investigation showed that because the whole data directory had been restored from a backup, the replmha user carried over in the backed-up user table could not actually be used. Dropping it on the slave and re-creating the replmha user with fresh grants solved the problem.

Before doing anything on a slave, remember to run set sql_log_bin=0 to disable binlog logging for the session.

mysql> drop user replmha@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> select user,host,authentication_string,plugin from mysql.user;
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| user             | host      | authentication_string                                                  | plugin                |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| mha              | %         | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| yiyi             | %         | $A$005$^oaN;+gtM.v?}dzN9ur30WU8M8ZKEMmqPx00qANDdp3WuzcAvu4DbDz6 | caching_sha2_password |
| mysql.infoschema | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.session    | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.sys        | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| repl             | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| root             | localhost |                                                                        | caching_sha2_password |
| yizuo            | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
8 rows in set (0.00 sec)

mysql> create user replmha@'%' identified with mysql_native_password by'ok';
Query OK, 0 rows affected (0.00 sec)

mysql> select user,host,authentication_string,plugin from mysql.user;
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| user             | host      | authentication_string                                                  | plugin                |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
| mha              | %         | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| replmha          | %         | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| yiyi             | %         | $A$005$^oaN;+gtM.v?}dzN9ur30WU8M8ZKEMmqPx00qANDdp3WuzcAvu4DbDz6 | caching_sha2_password |
| mysql.infoschema | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.session    | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| mysql.sys        | localhost | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED | caching_sha2_password |
| repl             | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
| root             | localhost |                                                                        | caching_sha2_password |
| yizuo            | localhost | *31330A9B24799CC9566A39CBD78CEF60E26C906F                              | mysql_native_password |
+------------------+-----------+------------------------------------------------------------------------+-----------------------+
9 rows in set (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON *.* TO 'replmha'@'%' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> show master status;
+---------------+----------+--------------+------------------+-------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
+---------------+----------+--------------+------------------+-------------------------------------------+
| binlog.000002 |      156 |              |                  | 93909ace-1b58-11ef-81d8-000c2912a662:1-31 |
+---------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show master status;
+---------------+----------+--------------+------------------+-------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
+---------------+----------+--------------+------------------+-------------------------------------------+
| binlog.000002 |      156 |              |                  | 93909ace-1b58-11ef-81d8-000c2912a662:1-31 |
+---------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)

11. Start MHA (db03)

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[root@localhost /]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[1] 88762    // no error output here means the manager started successfully
[root@localhost /]# 

12. Check MHA status (db03)

[root@localhost /]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:88762) is running(0:PING_OK), master:192.168.20.132

13. After the servers are shut down and rebooted

        1) Check that the MySQL service started normally on every node

        2) Check the one-master-two-slaves replication state

        3) Check the MHA configuration file

        4) Check SSH mutual trust and the replication status

        5) Start MHA
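The restart checklist above can be sketched as a small script to run on db03 (a sketch only: the config and log paths follow this article's setup, and step 2, verifying the one-master-two-slaves state, is left as a manual inspection):

```shell
#!/bin/bash
# Post-reboot recovery checklist for this MHA cluster (run on db03).
CONF=/etc/mha/app1.cnf
LOG=/var/log/mha/app1/manager.log

check_mysql_up() {                      # 1) is the local MySQL service alive?
  mysqladmin ping >/dev/null 2>&1
}

check_mha_conf() {                      # 3) is the MHA config file present and non-empty?
  test -s "$CONF"
}

check_ssh_and_repl() {                  # 4) SSH mutual trust + replication health
  masterha_check_ssh  --conf="$CONF" &&
  masterha_check_repl --conf="$CONF"
}

start_mha() {                           # 5) start the manager in the background
  nohup masterha_manager --conf="$CONF" --remove_dead_master_conf \
        --ignore_last_failover </dev/null >"$LOG" 2>&1 &
}

# Run the sequence only when explicitly requested, e.g. RUN_CHECKS=1 bash recover.sh
if [ -n "$RUN_CHECKS" ]; then
  check_mysql_up && check_mha_conf && check_ssh_and_repl && start_mha
fi
```

Each step maps to the list above; if any check fails, stop and fix that layer before starting the manager.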

四、MHA Failure Simulation and Recovery

1. Check the MHA working state

[root@localhost app1]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:88762) is running(0:PING_OK), master:192.168.20.132

2. Bring down the master

mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
// check the MHA working state on db03
[root@localhost app1]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 master is down and failover is running(50:FAILOVER_RUNNING). master:192.168.20.132
Check /var/log/mha/app1/manager for details.

3. View the log

        cat /var/log/mha/app1/manager

----- Failover Report -----

app1: MySQL Master failover 192.168.20.132(192.168.20.132:3306) to 192.168.20.120(192.168.20.120:3306)

Master 192.168.20.132(192.168.20.132:3306) is down!

Check MHA Manager logs at localhost.localdomain:/var/log/mha/app1/manager for details.

Started automated(non-interactive) failover.
Selected 192.168.20.120(192.168.20.120:3306) as a new master.
192.168.20.120(192.168.20.120:3306): OK: Applying all logs succeeded.
192.168.20.231(192.168.20.231:3306): ERROR: Starting slave failed.
Master failover to 192.168.20.120(192.168.20.120:3306) done, but recovery on slave partially failed.
Fri May 31 20:04:25 2024 - [info] MHA::MasterMonitor version 0.58.
Fri May 31 20:04:26 2024 - [info] GTID failover mode = 1
Fri May 31 20:04:26 2024 - [info] Dead Servers:
Fri May 31 20:04:26 2024 - [info] Alive Servers:
Fri May 31 20:04:26 2024 - [info]   192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info]   192.168.20.120(192.168.20.120:3306)
Fri May 31 20:04:26 2024 - [info]   192.168.20.231(192.168.20.231:3306)
Fri May 31 20:04:26 2024 - [info] Alive Slaves:
Fri May 31 20:04:26 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:04:26 2024 - [info]     GTID ON
Fri May 31 20:04:26 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info]   192.168.20.231(192.168.20.231:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:04:26 2024 - [info]     GTID ON
Fri May 31 20:04:26 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info] Current Alive Master: 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:04:26 2024 - [info] Checking slave configurations..
Fri May 31 20:04:26 2024 - [info]  read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Fri May 31 20:04:26 2024 - [info]  read_only=1 is not set on slave 192.168.20.231(192.168.20.231:3306).
Fri May 31 20:04:26 2024 - [info] Checking replication filtering settings..
Fri May 31 20:04:26 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri May 31 20:04:26 2024 - [info]  Replication filtering check ok.
Fri May 31 20:04:26 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri May 31 20:04:26 2024 - [info] Checking SSH publickey authentication settings on the current master..
Fri May 31 20:04:26 2024 - [info] HealthCheck: SSH to 192.168.20.132 is reachable.
Fri May 31 20:04:26 2024 - [info] 
192.168.20.132(192.168.20.132:3306) (current master)
 +--192.168.20.120(192.168.20.120:3306)
 +--192.168.20.231(192.168.20.231:3306)

Fri May 31 20:04:26 2024 - [warning] master_ip_failover_script is not defined.
Fri May 31 20:04:26 2024 - [warning] shutdown_script is not defined.
Fri May 31 20:04:26 2024 - [info] Set master ping interval 2 seconds.
Fri May 31 20:04:26 2024 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri May 31 20:04:26 2024 - [info] Starting ping health check on 192.168.20.132(192.168.20.132:3306)..
Fri May 31 20:04:26 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri May 31 20:05:04 2024 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
Fri May 31 20:05:04 2024 - [info] Executing SSH check script: exit 0
Fri May 31 20:05:04 2024 - [info] HealthCheck: SSH to 192.168.20.132 is reachable.
Fri May 31 20:05:06 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.132' (111))
Fri May 31 20:05:06 2024 - [warning] Connection failed 2 time(s)..
Fri May 31 20:05:08 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.132' (111))
Fri May 31 20:05:08 2024 - [warning] Connection failed 3 time(s)..
Fri May 31 20:05:10 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.132' (111))
Fri May 31 20:05:10 2024 - [warning] Connection failed 4 time(s)..
Fri May 31 20:05:10 2024 - [warning] Master is not reachable from health checker!
Fri May 31 20:05:10 2024 - [warning] Master 192.168.20.132(192.168.20.132:3306) is not reachable!
Fri May 31 20:05:10 2024 - [warning] SSH is reachable.
Fri May 31 20:05:10 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Fri May 31 20:05:10 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 31 20:05:10 2024 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri May 31 20:05:10 2024 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri May 31 20:05:11 2024 - [info] GTID failover mode = 1
Fri May 31 20:05:11 2024 - [info] Dead Servers:
Fri May 31 20:05:11 2024 - [info]   192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:11 2024 - [info] Alive Servers:
Fri May 31 20:05:11 2024 - [info]   192.168.20.120(192.168.20.120:3306)
Fri May 31 20:05:11 2024 - [info]   192.168.20.231(192.168.20.231:3306)
Fri May 31 20:05:11 2024 - [info] Alive Slaves:
Fri May 31 20:05:11 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:11 2024 - [info]     GTID ON
Fri May 31 20:05:11 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:11 2024 - [info]   192.168.20.231(192.168.20.231:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:11 2024 - [info]     GTID ON
Fri May 31 20:05:11 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:11 2024 - [info] Checking slave configurations..
Fri May 31 20:05:11 2024 - [info]  read_only=1 is not set on slave 192.168.20.120(192.168.20.120:3306).
Fri May 31 20:05:11 2024 - [info]  read_only=1 is not set on slave 192.168.20.231(192.168.20.231:3306).
Fri May 31 20:05:11 2024 - [info] Checking replication filtering settings..
Fri May 31 20:05:11 2024 - [info]  Replication filtering check ok.
Fri May 31 20:05:11 2024 - [info] Master is down!
Fri May 31 20:05:11 2024 - [info] Terminating monitoring script.
Fri May 31 20:05:11 2024 - [info] Got exit code 20 (Master dead).
Fri May 31 20:05:11 2024 - [info] MHA::MasterFailover version 0.58.
Fri May 31 20:05:11 2024 - [info] Starting master failover.
Fri May 31 20:05:11 2024 - [info] 
Fri May 31 20:05:11 2024 - [info] * Phase 1: Configuration Check Phase..
Fri May 31 20:05:11 2024 - [info] 
Fri May 31 20:05:12 2024 - [info] GTID failover mode = 1
Fri May 31 20:05:12 2024 - [info] Dead Servers:
Fri May 31 20:05:12 2024 - [info]   192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:12 2024 - [info] Checking master reachability via MySQL(double check)...
Fri May 31 20:05:12 2024 - [info]  ok.
Fri May 31 20:05:12 2024 - [info] Alive Servers:
Fri May 31 20:05:12 2024 - [info]   192.168.20.120(192.168.20.120:3306)
Fri May 31 20:05:12 2024 - [info]   192.168.20.231(192.168.20.231:3306)
Fri May 31 20:05:12 2024 - [info] Alive Slaves:
Fri May 31 20:05:12 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:12 2024 - [info]     GTID ON
Fri May 31 20:05:12 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:12 2024 - [info]   192.168.20.231(192.168.20.231:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:12 2024 - [info]     GTID ON
Fri May 31 20:05:12 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:12 2024 - [info] Starting GTID based failover.
Fri May 31 20:05:12 2024 - [info] 
Fri May 31 20:05:12 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Fri May 31 20:05:12 2024 - [info] 
Fri May 31 20:05:12 2024 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri May 31 20:05:12 2024 - [info] 
Fri May 31 20:05:12 2024 - [info] Forcing shutdown so that applications never connect to the current master..
Fri May 31 20:05:12 2024 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Fri May 31 20:05:12 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri May 31 20:05:13 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] * Phase 3: Master Recovery Phase..
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] The latest binary log file/position on all slaves is binlog.000011:196
Fri May 31 20:05:13 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri May 31 20:05:13 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info]     GTID ON
Fri May 31 20:05:13 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info]   192.168.20.231(192.168.20.231:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info]     GTID ON
Fri May 31 20:05:13 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info] The oldest binary log file/position on all slaves is binlog.000011:196
Fri May 31 20:05:13 2024 - [info] Oldest slaves:
Fri May 31 20:05:13 2024 - [info]   192.168.20.120(192.168.20.120:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info]     GTID ON
Fri May 31 20:05:13 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info]   192.168.20.231(192.168.20.231:3306)  Version=8.0.20 (oldest major version between slaves) log-bin:enabled
Fri May 31 20:05:13 2024 - [info]     GTID ON
Fri May 31 20:05:13 2024 - [info]     Replicating from 192.168.20.132(192.168.20.132:3306)
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] * Phase 3.3: Determining New Master Phase..
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] Searching new master from slaves..
Fri May 31 20:05:13 2024 - [info]  Candidate masters from the configuration file:
Fri May 31 20:05:13 2024 - [info]  Non-candidate masters:
Fri May 31 20:05:13 2024 - [info] New master is 192.168.20.120(192.168.20.120:3306)
Fri May 31 20:05:13 2024 - [info] Starting master failover..
Fri May 31 20:05:13 2024 - [info] 
From:
192.168.20.132(192.168.20.132:3306) (current master)
 +--192.168.20.120(192.168.20.120:3306)
 +--192.168.20.231(192.168.20.231:3306)

To:
192.168.20.120(192.168.20.120:3306) (new master)
 +--192.168.20.231(192.168.20.231:3306)
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] * Phase 3.3: New Master Recovery Phase..
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info]  Waiting all logs to be applied.. 
Fri May 31 20:05:13 2024 - [info]   done.
Fri May 31 20:05:13 2024 - [info] Getting new master's binlog name and position..
Fri May 31 20:05:13 2024 - [info]  binlog.000003:196
Fri May 31 20:05:13 2024 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.20.120', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='replmha', MASTER_PASSWORD='xxx';
Fri May 31 20:05:13 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: binlog.000003, 196, 93909ace-1b58-11ef-81d8-000c2912a662:1-7
Fri May 31 20:05:13 2024 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Fri May 31 20:05:13 2024 - [info] ** Finished master recovery successfully.
Fri May 31 20:05:13 2024 - [info] * Phase 3: Master Recovery Phase completed.
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] * Phase 4: Slaves Recovery Phase..
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri May 31 20:05:13 2024 - [info] 
Fri May 31 20:05:13 2024 - [info] -- Slave recovery on host 192.168.20.231(192.168.20.231:3306) started, pid: 38527. Check tmp log /var/log/mha/app1/192.168.20.231_3306_20240531200511.log if it takes time..
Fri May 31 20:05:15 2024 - [info] 
Fri May 31 20:05:15 2024 - [info] Log messages from 192.168.20.231 ...
Fri May 31 20:05:15 2024 - [info] 
Fri May 31 20:05:13 2024 - [info]  Resetting slave 192.168.20.231(192.168.20.231:3306) and starting replication from the new master 192.168.20.120(192.168.20.120:3306)..
Fri May 31 20:05:13 2024 - [info]  Executed CHANGE MASTER.
Fri May 31 20:05:14 2024 - [info]  Slave started.
Fri May 31 20:05:14 2024 - [info]  gtid_wait(93909ace-1b58-11ef-81d8-000c2912a662:1-7) completed on 192.168.20.231(192.168.20.231:3306). Executed 0 events.
Fri May 31 20:05:15 2024 - [info] End of log messages from 192.168.20.231.
Fri May 31 20:05:15 2024 - [info] -- Slave on host 192.168.20.231(192.168.20.231:3306) started.
Fri May 31 20:05:15 2024 - [info] All new slave servers recovered successfully.
Fri May 31 20:05:15 2024 - [info] 
Fri May 31 20:05:15 2024 - [info] * Phase 5: New master cleanup phase..
Fri May 31 20:05:15 2024 - [info] 
Fri May 31 20:05:15 2024 - [info] Resetting slave info on the new master..
Fri May 31 20:05:15 2024 - [info]  192.168.20.120: Resetting slave info succeeded.
Fri May 31 20:05:15 2024 - [info] Master failover to 192.168.20.120(192.168.20.120:3306) completed successfully.
Fri May 31 20:05:15 2024 - [info] Deleted server1 entry from /etc/mha/app1.cnf .
Fri May 31 20:05:15 2024 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.20.132(192.168.20.132:3306) to 192.168.20.120(192.168.20.120:3306) succeeded

Master 192.168.20.132(192.168.20.132:3306) is down!

Check MHA Manager logs at localhost.localdomain:/var/log/mha/app1/manager for details.

Started automated(non-interactive) failover.
Selected 192.168.20.120(192.168.20.120:3306) as a new master.
192.168.20.120(192.168.20.120:3306): OK: Applying all logs succeeded.
192.168.20.231(192.168.20.231:3306): OK: Slave started, replicating from 192.168.20.120(192.168.20.120:3306)
192.168.20.120(192.168.20.120:3306): Resetting slave info succeeded.
Master failover to 192.168.20.120(192.168.20.120:3306) completed successfully.

4. Repair the old master

        Simply restart the MySQL service on the old master.

        What about real production? Assess whether the instance is recoverable; if it is not, re-initialize and rebuild it.

5. Restore replication

        At this point db02 is the master and db03 is its slave (db03 also runs the MHA Manager); db01 has dropped out of the replication topology.

        db01 must be reconfigured as a slave of db02.
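On db01, once its MySQL service is back up, re-pointing it at the new master can look like the following (a sketch: the host and credentials are the ones used earlier in this article, and MASTER_AUTO_POSITION=1 relies on GTID replication being enabled, as shown in the failover log):

```shell
#!/bin/bash
# Re-attach the recovered db01 as a slave of the new master (db02, 192.168.20.120).
rejoin_as_slave() {
  mysql -e "CHANGE MASTER TO
              MASTER_HOST='192.168.20.120',
              MASTER_PORT=3306,
              MASTER_USER='replmha',
              MASTER_PASSWORD='ok',
              MASTER_AUTO_POSITION=1;
            START SLAVE;"
  # Verify both replication threads report Yes:
  mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running:'
}
```

This is the same CHANGE MASTER statement the failover log printed for the other slave, applied here to db01.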

6. Edit the configuration file

        vim /etc/mha/app1.cnf
        Opening the configuration file shows that db01's entry has already been removed by MHA automatically; just add it back.
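For example, the entry MHA deleted (server1, per the "Deleted server1 entry" line in the failover log) can be added back in the same style as the remaining server sections (exact keys should match your original file):

```
[server1]
hostname=192.168.20.132
port=3306
```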

        

7. Restart MHA

        nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &

8. Check MHA status

[root@localhost /]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:100728) is running(0:PING_OK), master:192.168.20.120

五、Application Transparency: VIP

1. Prepare the VIP failover script
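The heart of any VIP failover script is just moving the virtual IP and refreshing ARP; below is a minimal sketch of the helpers such a script would call. The VIP 192.168.20.200 and the interface name ens33 are assumptions for this article's 192.168.20.0/24 network:

```shell
#!/bin/bash
# Minimal VIP helpers of the kind a master_ip_failover script invokes.
VIP=192.168.20.200/24   # assumed virtual IP for this cluster
DEV=ens33               # assumed network interface name

start_vip() {           # bind the VIP on the newly promoted master
  ip addr add "$VIP" dev "$DEV"
  # Gratuitous ARP so clients update their ARP cache to the new master's MAC:
  arping -c 3 -I "$DEV" -U "${VIP%/*}" || true
}

stop_vip() {            # release the VIP on the old/dead master
  ip addr del "$VIP" dev "$DEV"
}
```

MHA invokes logic like this through the master_ip_failover_script setting in app1.cnf; the failover log above warned that it was not yet defined.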
