Introduction to the MHA Architecture
MHA (Master High Availability) is a relatively mature MySQL high-availability solution. Its core is a set of scripts written in Perl that provide failover and master promotion for MySQL replication environments. During a failover, MHA can complete the switchover automatically within 0-30 seconds while preserving data consistency as far as possible, achieving high availability in the true sense.
Unlike MMM, an MHA-based architecture does not require master-master replication; a basic master-slave topology is enough, because when the master goes down, MHA promotes one of the slaves to be the new master. All nodes in an MHA cluster must be able to reach each other over passwordless ssh, which MHA uses for remote control and data management.
What MHA provides:
- Monitors whether the Master node is available
- When the Master is unavailable, elects a new Master from among the Slaves
- Performs master-slave switchover and failover; MHA tries to save the binlog from the crashed Master to minimize transaction loss, but if the Master's server is unreachable or has failed at the hardware level, the binlog cannot be saved
- Can be combined with semi-synchronous replication to avoid data inconsistencies between slaves
- Supports both GTID-based and log-position-based replication
MHA failover process:
- Try to ssh into the crashed Master and save its binary log events (binlog events);
- Identify the Slave with the most recent updates among the Slaves and select it as the candidate Master;
- Apply the differential relay logs from that Slave to the other Slaves;
- Apply the binary log events saved from the original Master;
- Promote the candidate Slave to be the new Master;
- Point the other Slaves at the new Master and resume replication;
- Bring up the VIP on the new Master so that client requests reach it.
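The slave-selection step above (picking the slave with the latest updates) can be sketched as follows. This is a simplified illustration, not MHA's actual Perl implementation; the host names and positions are made up, and real MHA also weighs candidate_master flags and compares relay log contents:

```python
# Simplified sketch of candidate selection: among slaves allowed to become
# master (no_master not set), pick the one whose I/O thread has read the
# most of the dead master's binlog.

def pick_candidate_master(slaves):
    """slaves maps name -> {'file': Master_Log_File,
    'pos': Read_Master_Log_Pos, 'no_master': bool}."""
    eligible = {n: s for n, s in slaves.items() if not s.get("no_master")}
    # Binlog file names sort lexicographically (mysql_bin.000002 < .000003),
    # so (file, pos) tuples order slaves by how far they have read.
    return max(eligible, key=lambda n: (eligible[n]["file"], eligible[n]["pos"]))

slaves = {
    "192.168.52.102": {"file": "mysql_bin.000003", "pos": 1395, "no_master": False},
    "192.168.52.103": {"file": "mysql_bin.000003", "pos": 1500, "no_master": True},
}
print(pick_candidate_master(slaves))  # 192.168.52.102 (103 excluded by no_master)
```

Note that 192.168.52.103 is skipped even though it has read further, mirroring the no_master=1 setting used later in the MHA configuration.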
Pros and cons of the MHA architecture
Pros:
- Written in Perl and fully open source, so developers can extend it for their own needs
- Supports both GTID-based and log-position-based replication
- Less prone to data loss during failover
- A single monitoring node can watch multiple replication clusters
Cons:
- MHA does not provide virtual IP functionality by default; you must script it yourself or use a third-party tool
- After startup MHA only monitors the Master, not the Slaves, and it cannot monitor the health of the replication links
- The cluster requires passwordless ssh between nodes, which is a potential security risk
- MHA provides no read load balancing across Slaves; that requires a third-party tool
Preface
Environment and versions:
1. OS: CentOS 7
2. MySQL: 5.7
3. MHA: 0.58
IP | hostname | role
---|---|---
192.168.52.101 | mha1 (node01) | master + MHA node
192.168.52.102 | mha2 (node02) | slave01 + MHA node
192.168.52.103 | mha3 (node03) | slave02 + MHA manager + node
To keep things simple, MySQL is assumed to be installed in advance (see the separate MySQL installation article).
I. Configure passwordless ssh between the three machines
Set up every host in the cluster so it can log in to every other host over ssh without a password; MHA relies on ssh for remote control and data management, for example saving the original Master's binary logs during failover and configuring the virtual IP.
- Generate an ssh key pair on each node
[root@node01 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:B6zmIF3u8D1BsImJpXgyDYtvOiB/PJ89WGvEcV/fkoo root@node01
The key's randomart image is:
+---[RSA 2048]----+
|. . . |
|.= + o = |
|* = o + + |
| = . o o... . |
|o + + +.So.. . o.|
|o+ o B .+o . o o|
|o . + ++o. . . . |
| . . o.o+.E . |
| o... |
+----[SHA256]-----+
[root@node01 .ssh]# scp id_rsa.pub root@192.168.52.103:/root/.ssh/id_rsa.pub_51
The authenticity of host '192.168.52.103 (192.168.52.103)' can't be established.
ECDSA key fingerprint is SHA256:CtO37URFc/RUpJ5yCIaFAPqXQJC4gblbPbgSWBaR7qU.
ECDSA key fingerprint is MD5:12:26:2c:a6:ec:9f:36:88:df:52:8d:79:89:3f:38:4e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.52.103' (ECDSA) to the list of known hosts.
root@192.168.52.103's password:
id_rsa.pub 100% 393 334.3KB/s 00:00
- Copy the keys to the other servers
# Generate the key pair (press Enter to accept the defaults); run on every node
ssh-keygen -t rsa
# Collect all nodes' public keys on one machine and merge them into authorized_keys
# On 192.168.52.101, copy the public key to 192.168.52.103
scp id_rsa.pub root@192.168.52.103:/root/.ssh/id_rsa.pub_51
# On 192.168.52.102, copy the public key to 192.168.52.103
scp id_rsa.pub root@192.168.52.103:/root/.ssh/id_rsa.pub_52
# On 192.168.52.103, append the keys from 101 and 102 to the authentication file
cat id_rsa.pub id_rsa.pub_51 id_rsa.pub_52 >> authorized_keys
# On 192.168.52.103, distribute the merged authorized_keys file to the other nodes
scp authorized_keys root@192.168.52.101:/root/.ssh/
scp authorized_keys root@192.168.52.102:/root/.ssh/
# On every node, verify passwordless ssh login; the first connection may ask you to type yes, after that no password is required
ssh <ip>
[root@node03 .ssh]# cat id_rsa.pub id_rsa.pub_51 id_rsa.pub_52 >> authorized_keys
[root@node03 .ssh]#
[root@node03 .ssh]# scp authorized_keys root@192.168.52.101:/root/.ssh/
root@192.168.52.101's password:
authorized_keys 100% 1965 2.1MB/s 00:00
[root@node03 .ssh]# scp authorized_keys root@192.168.52.102:/root/.ssh/
root@192.168.52.102's password:
authorized_keys 100% 1965 2.3MB/s 00:00
- Verify that it works
[root@node03]# ssh root@192.168.52.102 # log in
Last login: Mon Dec 20 10:36:38 2021 from 192.168.52.101
[root@centos01 ~]# logout # log out
Connection to 192.168.52.102 closed.
[root@101]#
II. Modify the database configuration
Configure the master and slave nodes
- On all nodes, create the MySQL user used for replication with the statements below. Every slave may be elected master, so each node needs the replication user:
mysql> create user 'repl'@'%' identified with mysql_native_password by 'xxx';
Query OK, 0 rows affected (0.41 sec)
mysql> grant replication slave on *.* to 'repl'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.08 sec)
- Then edit the MySQL configuration file on the master node:
[mysqld]
# ID of this node
server_id=101
# Enable the binlog and name its files
log_bin=mysql_bin
# Enable the relay log and name its files
relay_log=relay_bin
# Also write events applied from the relay log into this node's binlog
log_slave_updates=on
# Enable GTID-based replication
gtid_mode=ON
enforce_gtid_consistency=1
- Add the same configuration to slave-01, changing only server_id:
# ID of this node
server_id=102
# Enable the binlog and name its files
log_bin=mysql_bin
# Enable the relay log and name its files
relay_log=relay_bin
# Also write events applied from the relay log into this node's binlog
log_slave_updates=on
# Enable GTID-based replication
gtid_mode=ON
enforce_gtid_consistency=1
- Then configure slave-02:
# ID of this node
server_id=103
# Enable the binlog and name its files
log_bin=mysql_bin
# Enable the relay log and name its files
relay_log=relay_bin
# Also write events applied from the relay log into this node's binlog
log_slave_updates=on
# Enable GTID-based replication
gtid_mode=ON
enforce_gtid_consistency=1
After modifying the configuration files, restart the MySQL service on all three nodes:
systemctl restart mysqld   # or: systemctl start mysqld.service if it is not running
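Duplicate server_id values (like duplicate server UUIDs, covered below) break replication, so it is worth sanity-checking the three fragments before restarting. A small sketch using Python's configparser; the fragment strings mirror the configs above, and in practice you would read the real my.cnf files:

```python
import configparser

# Inline copies of the three [mysqld] fragments (assumed paths omitted)
fragments = {
    "master":  "[mysqld]\nserver_id=101\nlog_bin=mysql_bin\n",
    "slave01": "[mysqld]\nserver_id=102\nlog_bin=mysql_bin\n",
    "slave02": "[mysqld]\nserver_id=103\nlog_bin=mysql_bin\n",
}

ids = {}
for node, text in fragments.items():
    cp = configparser.ConfigParser()
    cp.read_string(text)
    ids[node] = cp["mysqld"]["server_id"]

# Every node must have a distinct server_id
assert len(set(ids.values())) == len(ids), f"duplicate server_id: {ids}"
print(ids)
```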
Configure slave-01's replication from the master
# Stop replication
stop slave;
# Configure the master node's connection info
change master to master_host='192.168.52.101', master_port=3306, master_user='repl', master_password='Qwe@1234', master_auto_position=1;
# Start replication
start slave;
# Check server_id from the configuration file
show variables like 'server_id';
# Check the replication status
show slave status\G
Configure slave-02's replication from the master
# Stop replication
stop slave;
# Configure the master node's connection info
change master to master_host='192.168.52.101', master_port=3306, master_user='repl', master_password='Qwe@1234', master_auto_position=1;
# Start replication
start slave;
# Check server_id from the configuration file
show variables like 'server_id';
# Check the replication status
show slave status\G
After configuring the replication link, check the status with show slave status\G; replication is healthy only if both Slave_IO_Running and Slave_SQL_Running are Yes:
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.52.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql_bin.000002
Read_Master_Log_Pos: 154
Relay_Log_File: relay_bin.000003
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql_bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
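This manual check of Slave_IO_Running/Slave_SQL_Running is easy to automate by parsing the `\G` output. A sketch; the sample text is abbreviated from the output above:

```python
def parse_slave_status(text):
    """Parse `show slave status\\G` output into a key -> value dict."""
    status = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            status[key.strip()] = value.strip()
    return status

sample = """\
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.52.101
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
"""
s = parse_slave_status(sample)
healthy = s["Slave_IO_Running"] == "Yes" and s["Slave_SQL_Running"] == "Yes"
print(healthy)  # True
```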
During this step you may hit: Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
Fix
- Check whether the uuid values in the nodes' auto.cnf files are identical
- Check whether the my.cnf changes have taken effect
The first problem occurs when the MySQL data directory was copied directly from the master node.
Fix: mv /var/lib/mysql/auto.cnf /var/lib/mysql/auto.cnf.bak # then restart MySQL; a new auto.cnf with a fresh UUID is generated
III. Install the MHA packages
1. Install the mha4mysql-node package
1. First install the mha4mysql-node package on all nodes; the package to download is:
mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum -y install epel-release
yum -y install perl-DBD-MySQL perl-DBI ncftp
2. With the dependencies in place, install mha4mysql-node:
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@node01 mha]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Preparing... ################################# [100%]
Updating / installing...
1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
Tip: install the package on the other two Slave nodes and the monitoring node the same way; the steps are not repeated here.
2. Install the mha4mysql-manager package
1. Install the dependencies
yum -y install epel-release
yum -y install perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-DBD-MySQL ncftp
If the monitoring node is a dedicated server, install all of the dependencies above.
If it is co-located with a node (where the node dependencies are already installed), the following is enough:
yum -y install perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch
Either way, make sure epel-release, perl-DBD-MySQL and ncftp are installed.
The output below is truncated; seeing Complete! at the end is what matters.
[root@node03 mha]# yum -y install perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.nju.edu.cn
* epel: ftp.yz.yamagata-u.ac.jp
* extras: mirrors.nju.edu.cn
* updates: mirrors.nju.edu.cn
Package 4:perl-Time-HiRes-1.9725-3.el7.x86_64 already installed and latest version
Dependency Installed:
perl-Class-Load.noarch 0:0.20-3.el7 perl-Data-OptList.noarch 0:0.107-9.el7 perl-Email-Date-Format.noarch 0:1.002-15.el7
perl-IO-Socket-IP.noarch 0:0.21-5.el7 perl-IO-Socket-SSL.noarch 0:1.94-7.el7 perl-List-MoreUtils.x86_64 0:0.33-9.el7
perl-MIME-Lite.noarch 0:3.030-1.el7 perl-MIME-Types.noarch 0:1.38-2.el7 perl-Mail-Sender.noarch 0:0.8.23-1.el7
perl-Mail-Sendmail.noarch 0:0.79-21.el7 perl-MailTools.noarch 0:2.12-2.el7 perl-Module-Implementation.noarch 0:0.06-6.el7
perl-Module-Runtime.noarch 0:0.013-4.el7 perl-Mozilla-CA.noarch 0:20130114-5.el7 perl-Net-LibIDN.x86_64 0:0.12-15.el7
perl-Net-SMTP-SSL.noarch 0:1.01-13.el7 perl-Net-SSLeay.x86_64 0:1.55-6.el7 perl-Package-DeprecationManager.noarch 0:0.13-7.el7
perl-Package-Stash.noarch 0:0.34-2.el7 perl-Package-Stash-XS.x86_64 0:0.26-3.el7 perl-Params-Util.x86_64 0:1.07-6.el7
perl-Params-Validate.x86_64 0:1.08-4.el7 perl-Sub-Install.noarch 0:0.926-6.el7 perl-Sys-Syslog.x86_64 0:0.33-3.el7
perl-TimeDate.noarch 1:2.30-2.el7 perl-Try-Tiny.noarch 0:0.12-2.el7
Complete!
[root@centos01 mha]#
2. Then install the mha4mysql-manager package:
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@node03 mha]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
Preparing... ################################# [100%]
Updating / installing...
1:mha4mysql-manager-0.58-0.el7.cent################################# [100%]
3. Configure the MHA manager node
1. Create MHA's configuration directory and working directory:
[root@node03]# mkdir /etc/mha
[root@node03]# mkdir /home/mysql_mha
2. Create the MHA configuration file with the following content:
[root@node03]# vim /etc/mha/mysql_mha.cnf
[server default]
# Account and password MHA uses to access the databases
user=mha
password=Qwe@1234
# MHA's working directory
manager_workdir=/home/mysql_mha
# Path of the MHA log file
manager_log=/home/mysql_mha/manager.log
# MHA's working directory on the remote nodes
remote_workdir=/home/mysql_mha
# User that can log in over ssh
ssh_user=root
# MySQL user and password used for replication
repl_user=repl
repl_password=Qwe@1234
# Health-check interval in seconds
ping_interval=1
# Directory where the master stores its binlog files
master_binlog_dir=/var/lib/mysql
# Script that floats the virtual IP over to the new Master after a switchover
master_ip_failover_script=/usr/bin/master_ip_failover
# Script used to double-check node status
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.52.101 -s 192.168.52.102 -s 192.168.52.103
# Cluster node definitions
[server1]
hostname=192.168.52.101
# This node may be elected Master
candidate_master=1
[server2]
hostname=192.168.52.102
candidate_master=1
[server3]
hostname=192.168.52.103
# This node must never be elected Master
no_master=1
Avoid stray spaces in this file and double-check the punctuation.
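Typos in this file only surface later when masterha_check_repl fails, so a small pre-flight check can save time. A sketch using Python's configparser (the required-key list is an assumption for illustration, not an exhaustive MHA schema; the inline conf string is abbreviated from the file above):

```python
import configparser

# Keys we choose to require in [server default] -- an illustrative subset
REQUIRED_DEFAULTS = {"user", "password", "ssh_user", "repl_user",
                     "repl_password", "manager_workdir", "master_binlog_dir"}

def check_mha_conf(text):
    cp = configparser.ConfigParser()
    cp.read_string(text)
    missing = REQUIRED_DEFAULTS - set(cp["server default"])
    servers = [s for s in cp.sections()
               if s.startswith("server") and s != "server default"]
    return missing, servers

conf = """\
[server default]
user=mha
password=Qwe@1234
ssh_user=root
repl_user=repl
repl_password=Qwe@1234
manager_workdir=/home/mysql_mha
master_binlog_dir=/var/lib/mysql

[server1]
hostname=192.168.52.101
"""
missing, servers = check_mha_conf(conf)
print(missing, servers)  # set() ['server1']
```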
3. Write the master_ip_failover script referenced in the configuration. It is adapted from MHA's official sample; MHA does not ship it by default. A few places must be adjusted for your environment and are marked with comments:
[root@node03]# vim /usr/bin/master_ip_failover
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
# Virtual IP - adjust for your environment
my $vip = '192.168.52.109/24';
my $key = '1';
# Note the network interface name
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Give the script execute permission, otherwise MHA cannot invoke it:
[root@node03]# chmod a+x /usr/bin/master_ip_failover
4. As required by the remote_workdir setting, create MHA's remote working directory on the other nodes:
[root@node01]# mkdir /home/mysql_mha
[root@node02]# mkdir /home/mysql_mha
[root@node03]# mkdir /home/mysql_mha
5. The configuration tells the manager to access the database nodes as the mha user, so create that user on the master node (the statement replicates to the slaves). The password must match the password= setting in mysql_mha.cnf:
create user 'mha'@'%' identified with mysql_native_password by 'Qwe@1234';
grant all privileges on *.* to 'mha'@'%';
flush privileges;
6. After all of the steps above, run masterha_check_ssh and masterha_check_repl on the manager node to validate the configuration: masterha_check_ssh checks that the ssh logins work, and masterha_check_repl checks that the replication links between master and slaves are healthy:
masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
The results look like this:
[root@node03 .ssh]# masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
Wed Dec 22 17:31:51 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 22 17:31:51 2021 - [info] Reading application default configuration from /etc/mha/mysql_mha.cnf..
Wed Dec 22 17:31:51 2021 - [info] Reading server configuration from /etc/mha/mysql_mha.cnf..
Wed Dec 22 17:31:51 2021 - [info] Starting SSH connection tests..
Wed Dec 22 17:31:53 2021 - [debug]
Wed Dec 22 17:31:51 2021 - [debug] Connecting via SSH from root@192.168.52.101(192.168.52.101:22) to root@192.168.52.102(192.168.52.102:22)..
Warning: Permanently added '192.168.52.102' (ECDSA) to the list of known hosts.
Wed Dec 22 17:31:52 2021 - [debug] ok.
Wed Dec 22 17:31:52 2021 - [debug] Connecting via SSH from root@192.168.52.101(192.168.52.101:22) to root@192.168.52.103(192.168.52.103:22)..
Wed Dec 22 17:31:53 2021 - [debug] ok.
Wed Dec 22 17:31:54 2021 - [debug]
Wed Dec 22 17:31:51 2021 - [debug] Connecting via SSH from root@192.168.52.102(192.168.52.102:22) to root@192.168.52.101(192.168.52.101:22)..
Wed Dec 22 17:31:52 2021 - [debug] ok.
Wed Dec 22 17:31:52 2021 - [debug] Connecting via SSH from root@192.168.52.102(192.168.52.102:22) to root@192.168.52.103(192.168.52.103:22)..
Wed Dec 22 17:31:54 2021 - [debug] ok.
Wed Dec 22 17:31:55 2021 - [debug]
Wed Dec 22 17:31:52 2021 - [debug] Connecting via SSH from root@192.168.52.103(192.168.52.103:22) to root@192.168.52.101(192.168.52.101:22)..
Wed Dec 22 17:31:54 2021 - [debug] ok.
Wed Dec 22 17:31:54 2021 - [debug] Connecting via SSH from root@192.168.52.103(192.168.52.103:22) to root@192.168.52.102(192.168.52.102:22)..
Wed Dec 22 17:31:55 2021 - [debug] ok.
Wed Dec 22 17:31:55 2021 - [info] All SSH connection tests passed successfully.
[root@node03 .ssh]# masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
Wed Dec 22 17:32:18 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 22 17:32:18 2021 - [info] Reading application default configuration from /etc/mha/mysql_mha.cnf..
Wed Dec 22 17:32:18 2021 - [info] Reading server configuration from /etc/mha/mysql_mha.cnf..
Wed Dec 22 17:32:18 2021 - [info] MHA::MasterMonitor version 0.58.
Wed Dec 22 17:32:19 2021 - [info] GTID failover mode = 1
Wed Dec 22 17:32:19 2021 - [info] Dead Servers:
Wed Dec 22 17:32:19 2021 - [info] Alive Servers:
Wed Dec 22 17:32:19 2021 - [info] 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:32:19 2021 - [info] 192.168.52.102(192.168.52.102:3306)
Wed Dec 22 17:32:19 2021 - [info] 192.168.52.103(192.168.52.103:3306)
Wed Dec 22 17:32:19 2021 - [info] Alive Slaves:
Wed Dec 22 17:32:19 2021 - [info] 192.168.52.102(192.168.52.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Wed Dec 22 17:32:19 2021 - [info] GTID ON
Wed Dec 22 17:32:19 2021 - [info] Replicating from 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:32:19 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 22 17:32:19 2021 - [info] 192.168.52.103(192.168.52.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Wed Dec 22 17:32:19 2021 - [info] GTID ON
Wed Dec 22 17:32:19 2021 - [info] Replicating from 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:32:19 2021 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 22 17:32:19 2021 - [info] Current Alive Master: 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:32:19 2021 - [info] Checking slave configurations..
Wed Dec 22 17:32:19 2021 - [info] read_only=1 is not set on slave 192.168.52.102(192.168.52.102:3306).
Wed Dec 22 17:32:19 2021 - [info] read_only=1 is not set on slave 192.168.52.103(192.168.52.103:3306).
Wed Dec 22 17:32:19 2021 - [info] Checking replication filtering settings..
Wed Dec 22 17:32:19 2021 - [info] binlog_do_db= , binlog_ignore_db=
Wed Dec 22 17:32:19 2021 - [info] Replication filtering check ok.
Wed Dec 22 17:32:19 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Dec 22 17:32:19 2021 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec 22 17:32:20 2021 - [info] HealthCheck: SSH to 192.168.52.101 is reachable.
Wed Dec 22 17:32:20 2021 - [info]
192.168.52.101(192.168.52.101:3306) (current master)
+--192.168.52.102(192.168.52.102:3306)
+--192.168.52.103(192.168.52.103:3306)
Wed Dec 22 17:32:20 2021 - [info] Checking replication health on 192.168.52.102..
Wed Dec 22 17:32:20 2021 - [info] ok.
Wed Dec 22 17:32:20 2021 - [info] Checking replication health on 192.168.52.103..
Wed Dec 22 17:32:20 2021 - [info] ok.
Wed Dec 22 17:32:20 2021 - [info] Checking master_ip_failover_script status:
Wed Dec 22 17:32:20 2021 - [info] /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.52.101 --orig_master_ip=192.168.52.101 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.52.109/24===
Checking the Status of the script.. OK
Wed Dec 22 17:32:20 2021 - [info] OK.
Wed Dec 22 17:32:20 2021 - [warning] shutdown_script is not defined.
Wed Dec 22 17:32:20 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
7. Manually switch the master
masterha_master_switch --master_state=alive --conf=/etc/mha/mysql_mha.cnf --new_master_host=192.168.52.102 --orig_master_is_new_slave
[root@node03 .ssh]# masterha_master_switch --master_state=alive --conf=/etc/mha/mysql_mha.cnf --new_master_host=192.168.52.102 --orig_master_is_new_slave
Wed Dec 22 17:34:52 2021 - [info] MHA::MasterRotate version 0.58.
Wed Dec 22 17:34:52 2021 - [info] Starting online master switch..
Wed Dec 22 17:34:52 2021 - [info]
Wed Dec 22 17:34:52 2021 - [info] * Phase 1: Configuration Check Phase..
Wed Dec 22 17:34:52 2021 - [info]
Wed Dec 22 17:34:52 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 22 17:34:52 2021 - [info] Reading application default configuration from /etc/mha/mysql_mha.cnf..
Wed Dec 22 17:34:52 2021 - [info] Reading server configuration from /etc/mha/mysql_mha.cnf..
Wed Dec 22 17:34:53 2021 - [info] GTID failover mode = 1
Wed Dec 22 17:34:53 2021 - [info] Current Alive Master: 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:34:53 2021 - [info] Alive Slaves:
Wed Dec 22 17:34:53 2021 - [info] 192.168.52.102(192.168.52.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Wed Dec 22 17:34:53 2021 - [info] GTID ON
Wed Dec 22 17:34:53 2021 - [info] Replicating from 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:34:53 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 22 17:34:53 2021 - [info] 192.168.52.103(192.168.52.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Wed Dec 22 17:34:53 2021 - [info] GTID ON
Wed Dec 22 17:34:53 2021 - [info] Replicating from 192.168.52.101(192.168.52.101:3306)
Wed Dec 22 17:34:53 2021 - [info] Not candidate for the new Master (no_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.52.101(192.168.52.101:3306)? (YES/no): yes
Wed Dec 22 17:34:56 2021 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Dec 22 17:34:57 2021 - [info] ok.
Wed Dec 22 17:34:57 2021 - [info] Checking MHA is not monitoring or doing failover..
Wed Dec 22 17:34:57 2021 - [info] Checking replication health on 192.168.52.102..
Wed Dec 22 17:34:57 2021 - [info] ok.
Wed Dec 22 17:34:57 2021 - [info] Checking replication health on 192.168.52.103..
Wed Dec 22 17:34:57 2021 - [info] ok.
Wed Dec 22 17:34:57 2021 - [info] 192.168.52.102 can be new master.
Wed Dec 22 17:34:57 2021 - [info]
From:
192.168.52.101(192.168.52.101:3306) (current master)
+--192.168.52.102(192.168.52.102:3306)
+--192.168.52.103(192.168.52.103:3306)
To:
192.168.52.102(192.168.52.102:3306) (new master)
+--192.168.52.103(192.168.52.103:3306)
+--192.168.52.101(192.168.52.101:3306)
Starting master switch from 192.168.52.101(192.168.52.101:3306) to 192.168.52.102(192.168.52.102:3306)? (yes/NO): yes
Wed Dec 22 17:34:59 2021 - [info] Checking whether 192.168.52.102(192.168.52.102:3306) is ok for the new master..
Wed Dec 22 17:34:59 2021 - [info] ok.
Wed Dec 22 17:34:59 2021 - [info] 192.168.52.101(192.168.52.101:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Wed Dec 22 17:34:59 2021 - [info] 192.168.52.101(192.168.52.101:3306): Resetting slave pointing to the dummy host.
Wed Dec 22 17:34:59 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Dec 22 17:34:59 2021 - [info]
Wed Dec 22 17:34:59 2021 - [info] * Phase 2: Rejecting updates Phase..
Wed Dec 22 17:34:59 2021 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Wed Dec 22 17:35:01 2021 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed Dec 22 17:35:01 2021 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Dec 22 17:35:01 2021 - [info] ok.
Wed Dec 22 17:35:01 2021 - [info] Orig master binlog:pos is mysql_bin.000003:1395.
Wed Dec 22 17:35:01 2021 - [info] Waiting to execute all relay logs on 192.168.52.102(192.168.52.102:3306)..
Wed Dec 22 17:35:01 2021 - [info] master_pos_wait(mysql_bin.000003:1395) completed on 192.168.52.102(192.168.52.102:3306). Executed 0 events.
Wed Dec 22 17:35:01 2021 - [info] done.
Wed Dec 22 17:35:01 2021 - [info] Getting new master's binlog name and position..
Wed Dec 22 17:35:01 2021 - [info] mysql_bin.000001:1855
Wed Dec 22 17:35:01 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.52.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Dec 22 17:35:01 2021 - [info]
Wed Dec 22 17:35:01 2021 - [info] * Switching slaves in parallel..
Wed Dec 22 17:35:01 2021 - [info]
Wed Dec 22 17:35:01 2021 - [info] -- Slave switch on host 192.168.52.103(192.168.52.103:3306) started, pid: 51685
Wed Dec 22 17:35:01 2021 - [info]
Wed Dec 22 17:35:03 2021 - [info] Log messages from 192.168.52.103 ...
Wed Dec 22 17:35:03 2021 - [info]
Wed Dec 22 17:35:01 2021 - [info] Waiting to execute all relay logs on 192.168.52.103(192.168.52.103:3306)..
Wed Dec 22 17:35:01 2021 - [info] master_pos_wait(mysql_bin.000003:1395) completed on 192.168.52.103(192.168.52.103:3306). Executed 0 events.
Wed Dec 22 17:35:01 2021 - [info] done.
Wed Dec 22 17:35:01 2021 - [info] Resetting slave 192.168.52.103(192.168.52.103:3306) and starting replication from the new master 192.168.52.102(192.168.52.102:3306)..
Wed Dec 22 17:35:01 2021 - [info] Executed CHANGE MASTER.
Wed Dec 22 17:35:02 2021 - [info] Slave started.
Wed Dec 22 17:35:03 2021 - [info] End of log messages from 192.168.52.103 ...
Wed Dec 22 17:35:03 2021 - [info]
Wed Dec 22 17:35:03 2021 - [info] -- Slave switch on host 192.168.52.103(192.168.52.103:3306) succeeded.
Wed Dec 22 17:35:03 2021 - [info] Unlocking all tables on the orig master:
Wed Dec 22 17:35:03 2021 - [info] Executing UNLOCK TABLES..
Wed Dec 22 17:35:03 2021 - [info] ok.
Wed Dec 22 17:35:03 2021 - [info] Starting orig master as a new slave..
Wed Dec 22 17:35:03 2021 - [info] Resetting slave 192.168.52.101(192.168.52.101:3306) and starting replication from the new master 192.168.52.102(192.168.52.102:3306)..
Wed Dec 22 17:35:03 2021 - [info] Executed CHANGE MASTER.
Wed Dec 22 17:35:04 2021 - [info] Slave started.
Wed Dec 22 17:35:04 2021 - [info] All new slave servers switched successfully.
Wed Dec 22 17:35:04 2021 - [info]
Wed Dec 22 17:35:04 2021 - [info] * Phase 5: New master cleanup phase..
Wed Dec 22 17:35:04 2021 - [info]
Wed Dec 22 17:35:04 2021 - [info] 192.168.52.102: Resetting slave info succeeded.
Wed Dec 22 17:35:04 2021 - [info] Switching master to 192.168.52.102(192.168.52.102:3306) completed successfully.
Then switch back to node01 the same way, so that node01 is the master again.
8. Once all checks pass, start the MHA service:
[root@node03 ]# nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf &
After it starts, use ps to confirm the masterha_manager process exists; if it does, the startup succeeded:
[root@node03]# ps aux |grep masterha_manager
root 57933 2.2 1.1 228260 21976 pts/0 S 09:14 0:00 perl /usr/bin/masterha_manager --conf=/etc/mha/mysql_mha.cnf
root 57986 0.0 0.0 112660 976 pts/0 S+ 09:14 0:00 grep --color=auto masterha_manager
9. Finally, configure the virtual IP manually on the master node. MHA only floats the VIP to the new Master during a switchover; it does not set the Master's VIP on first startup, so we do it by hand:
[root@node01]# ifconfig ens33:1 192.168.52.109/24
Be sure to use your own interface name here (ens33 in this environment).
Once set, the ip addr command shows the virtual IP bound to the interface:
[root@node01 .ssh]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:13:92:1c brd ff:ff:ff:ff:ff:ff
inet 192.168.52.130/24 brd 192.168.52.255 scope global dynamic ens33
valid_lft 1496sec preferred_lft 1496sec
inet 192.168.52.101/24 brd 192.168.52.255 scope global secondary ens33
valid_lft forever preferred_lft forever
inet 192.168.52.109/24 brd 192.168.52.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
inet6 fe80::303f:752e:b9c:6547/64 scope link
valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
link/ether 52:54:00:27:ec:b8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
link/ether 52:54:00:27:ec:b8 brd ff:ff:ff:ff:ff:ff
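Whether the VIP is currently bound on a node can also be checked programmatically by parsing `ip addr` output, which is handy when scripting monitoring around MHA. A sketch; the sample text is abbreviated from the output above:

```python
def vip_bound(ip_addr_output, vip="192.168.52.109"):
    """Return True if the given VIP appears as an inet address."""
    return any(
        line.strip().startswith("inet ")
        and line.split()[1].split("/")[0] == vip
        for line in ip_addr_output.splitlines()
    )

sample = """\
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.52.101/24 brd 192.168.52.255 scope global secondary ens33
    inet 192.168.52.109/24 brd 192.168.52.255 scope global secondary ens33:1
"""
print(vip_bound(sample))  # True
```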
4. Test the MHA service
The MHA high-availability setup is now complete, so let's run some simple tests. First check that the virtual IP answers ping: applications connect to the database through the VIP, so it must be reachable:
Pinging 192.168.52.109 with 32 bytes of data:
Reply from 192.168.52.109: bytes=32 time<1ms TTL=64
Reply from 192.168.52.109: bytes=32 time<1ms TTL=64
Reply from 192.168.52.109: bytes=32 time<1ms TTL=64
Reply from 192.168.52.109: bytes=32 time<1ms TTL=64
Once it pings, use a remote client such as Navicat to confirm you can connect to the database through the virtual IP.
With the VIP confirmed reachable, test whether MHA performs the master-slave switchover correctly: stop the MySQL service on the master node to simulate a Master crash:
systemctl stop mysqld.service
If everything works, the master node's interface no longer holds the virtual IP:
[root@node01 .ssh]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:13:92:1c brd ff:ff:ff:ff:ff:ff
inet 192.168.52.130/24 brd 192.168.52.255 scope global dynamic ens33
valid_lft 1023sec preferred_lft 1023sec
inet 192.168.52.101/24 brd 192.168.52.255 scope global secondary ens33
valid_lft forever preferred_lft forever
inet6 fe80::303f:752e:b9c:6547/64 scope link
valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
link/ether 52:54:00:27:ec:b8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
link/ether 52:54:00:27:ec:b8 brd ff:ff:ff:ff:ff:ff
Instead, MHA floats it to the interface of the slave-01 node, which is now the new Master:
[root@node02 .ssh]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:59:70:65 brd ff:ff:ff:ff:ff:ff
inet 192.168.52.128/24 brd 192.168.52.255 scope global dynamic ens33
valid_lft 1279sec preferred_lft 1279sec
inet 192.168.52.102/24 brd 192.168.52.255 scope global secondary ens33
valid_lft forever preferred_lft forever
inet 192.168.52.109/24 brd 192.168.52.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
inet6 fe80::5e15:e497:8b2b:2b13/64 scope link
valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
link/ether 52:54:00:d0:12:07 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
link/ether 52:54:00:d0:12:07 brd ff:ff:ff:ff:ff:ff
Next, open a MySQL shell on the slave-02 node and confirm that it is replicating from the new Master. slave-02 was originally replicating from master; after master was stopped, MHA switched its Master_Host to slave-01's IP:
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.52.102
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql_bin.000001
Read_Master_Log_Pos: 2014
Relay_Log_File: relay_bin.000003
Relay_Log_Pos: 454
Relay_Master_Log_File: mysql_bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
IV. Recovery
1. Check node status
After a failover completes, the manager stops automatically. It cannot be restarted while the configuration still contains the failed node; you must either repair the failed node or remove it from the configuration file first. Check the manager status with masterha_check_status:
masterha_check_status --conf=/etc/mha/mysql_mha.cnf
[root@node03 mha]# masterha_check_status --conf=/etc/mha/mysql_mha.cnf
mysql_mha (pid:70300) is running(0:PING_OK), master:192.168.52.101
2. Recover node01 and switch it back to master
node01 must first be reconfigured as a slave of the new master:
stop slave; -- stop replication
change master to master_host='192.168.52.102', master_port=3306, master_user='repl', master_password='Qwe@1234', master_auto_position=1;
start slave; -- start replication
After the replication link is set up, verify it with show slave status\G, then manually switch the master back to node01 (masterha_master_switch, as in step 7); node01 is then the master again.
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.52.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql_bin.000004
Read_Master_Log_Pos: 393
Relay_Log_File: relay_bin.000002
Relay_Log_Pos: 414
Relay_Master_Log_File: mysql_bin.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
3. Restart the MHA service
nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf > /etc/mhamanager/mha.log 2>&1 &
[root@node03 mha]# nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf > /etc/mhamanager/mha.log 2>&1 &
[1] 68674
[root@node03 mha]#
[root@node03 mha]#
[root@node03 mha]# masterha_check_status --conf=/etc/mha/mysql_mha.cnf
mysql_mha master maybe down(20:PING_FAILING). master:192.168.52.101
Check /home/mysql_mha/manager.log for details.
[root@node03 mha]# ll
total 116
-rw-r--r--. 1 root root 81024 Dec 22 15:53 mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
-rw-r--r--. 1 root root 36328 Dec 22 15:53 mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@node03 mha]#
[root@node03 mha]# cat /etc/mhamanager/mha.log
nohup: ignoring input
Thu Dec 23 13:38:20 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 23 13:38:20 2021 - [info] Reading application default configuration from /etc/mha/mysql_mha.cnf..
Thu Dec 23 13:38:20 2021 - [info] Reading server configuration from /etc/mha/mysql_mha.cnf..
[root@node03 mha]#
V. Problems
1. Failover did not happen
After simulating a master crash for the second time, the MHA log shows that no failover was performed:
[root@node03 mha]# cat nohup.out
Thu Dec 23 14:02:00 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 23 14:02:00 2021 - [info] Reading application default configuration from /etc/mha/mysql_mha.cnf..
Thu Dec 23 14:02:00 2021 - [info] Reading server configuration from /etc/mha/mysql_mha.cnf..
Thu Dec 23 14:04:20 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 23 14:04:20 2021 - [info] Reading application default configuration from /etc/mha/mysql_mha.cnf..
Thu Dec 23 14:04:20 2021 - [info] Reading server configuration from /etc/mha/mysql_mha.cnf..
Checking the processes shows that the manager is gone:
[root@node03 mha]# ps aux |grep masterha_manager
root 70702 0.0 0.0 112660 976 pts/0 S+ 14:11 0:00 grep --color=auto masterha_manager
Remote connections to the database fail as well.
2. Cause
By default, MHA refuses to fail over again if the previous failover completed within the last 8 hours; it detects this via the marker file the manager writes into manager_workdir after a failover.
3. Fix
Switch over manually:
masterha_master_switch --master_state=alive --conf=/etc/mha/mysql_mha.cnf --new_master_host=192.168.52.101 --orig_master_is_new_slave
Alternatively, delete the marker file (here /home/mysql_mha/mysql_mha.failover.complete) or start masterha_manager with the --ignore_last_failover option.
Command summary

Command | Purpose
---|---
masterha_check_ssh | Check MHA's ssh configuration
masterha_check_repl | Check the health of the MySQL replication links
masterha_manager | Start the MHA manager
masterha_check_status | Check the current MHA status
masterha_master_monitor | Monitor whether the master is down
masterha_master_switch | Control failover/switchover
masterha_stop | Stop the manager
masterha_conf_host | Add or remove configured server entries