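1 Functional Description
MHA (Master High Availability) is a high-availability solution built around the MySQL master: it monitors master-slave replication and performs failover when the master fails. A typical MHA deployment uses one master and two slaves, ideally with semi-synchronous replication to reduce the risk of data loss. It removes the master as a single point of failure: when the master crashes, MHA can usually complete an automatic failover within about 0-30 seconds, keeping the service available.
2 How It Works
① With semi-synchronous replication, the master returns the commit to the client as soon as at least one slave has received the transaction and written it to its relay log;
② If the master crashes, MHA identifies the slave with the most recent log events, applies the missing (differential) events to the other slaves, promotes one slave to be the new master, and points the remaining slaves at the new master.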
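Step ② can be illustrated with a small sketch. It assumes each slave reports the master binlog file and position it has read so far (Master_Log_File and Read_Master_Log_Pos in the slave status), and it simply picks the slave that is furthest ahead. The function name pick_latest_slave and the three-column input format are hypothetical, chosen for illustration; MHA's real selection logic also takes relay logs, GTIDs, and candidate_master into account.

```shell
# pick_latest_slave (hypothetical helper): read lines of
#   "<host> <master_binlog_file> <read_position>"
# on stdin and print the host that has read furthest into the master's binlog.
# Binlog files are compared by numeric suffix first, then by position.
pick_latest_slave() {
    awk '
    NF >= 3 {
        seq = $2
        sub(/^.*\./, "", seq)   # mysql-bin.000002 -> 000002
        seq += 0                # force numeric comparison
        pos = $3 + 0
        if (best_host == "" || seq > best_seq || (seq == best_seq && pos > best_pos)) {
            best_seq = seq; best_pos = pos; best_host = $1
        }
    }
    END { if (best_host != "") print best_host }'
}
```

For example, feeding it one line per slave (host, file, position) prints the address of the slave that should receive the least catch-up work.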
3 MHA Environment
Role | Hostname | IP address |
master | mha15 | 192.168.10.15 |
slave1 | mha16 | 192.168.10.16 |
slave2 | mha17 | 192.168.10.17 |
MHA manager | mhamaster18 | 192.168.10.18 |
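4 Deployment Steps
4.1 Deploy MySQL 8.0 on Master, Slave1, and Slave2
Configure them as one master with two slaves (omitted here; see any MySQL replication setup guide).
Create the following users on all database instances:
# Replication user for the slaves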
create user 'myslave'@'%' identified by '123456';
grant replication slave,replication client on *.* to 'myslave'@'%';
# User for the MHA manager
create user 'mha'@'%' identified by '123456';
grant all on *.* to 'mha'@'%';
4.2 Check the binlog position on the master
mysql> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 | 1837 | | | 7cb7532a-052d-11ef-9f84-000c2962bd90:1-7 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.01 sec)
4.3 Run the replication commands on the slave nodes
change master to master_host='192.168.10.15',master_user='myslave',master_port=3306,master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=1837;
start slave;
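The file name and position in the statement above are copied by hand from the output of step 4.2. As a sketch, the same statement can be generated from that output. It assumes MySQL 8.0.23 or later, where CHANGE REPLICATION SOURCE TO is the preferred spelling of CHANGE MASTER TO (the older form still works on 8.0 but raises a deprecation warning). The helper name build_change_source is hypothetical, and port 3306 is taken from this setup.

```shell
# build_change_source (hypothetical helper): read the first line of
# tab-separated "SHOW MASTER STATUS" output (file, position, ...) on stdin and
# print the matching CHANGE REPLICATION SOURCE TO statement.
# Arguments: <source_host> <repl_user> <repl_password>
# Caveat: awk -v interprets backslash escapes, so passwords containing
# backslashes would need extra handling.
build_change_source() {
    awk -v h="$1" -v u="$2" -v p="$3" '
    NR == 1 {
        printf "CHANGE REPLICATION SOURCE TO SOURCE_HOST=\047%s\047, SOURCE_PORT=3306, SOURCE_USER=\047%s\047, SOURCE_PASSWORD=\047%s\047, SOURCE_LOG_FILE=\047%s\047, SOURCE_LOG_POS=%d;\n", h, u, p, $1, $2
    }'
}
```

One possible use is to pipe `mysql -NB -h 192.168.10.15 -u mha -p -e 'SHOW MASTER STATUS'` into `build_change_source 192.168.10.15 myslave 123456` and run the printed statement on each slave, followed by START REPLICA (or start slave).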
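4.4 Check the replication status on the slave nodes
Note: the "ERROR: No query specified" message at the end of the output is harmless; it is caused by the ";" typed after "\G".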
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.05 sec)
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.10.15
Master_User: myslave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 1837
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 326
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1837
Relay_Log_Space: 530
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 203148915
Master_UUID: 7cb7532a-052d-11ef-9f84-000c2962bd90
Master_Info_File: /app/data/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 6f4c3822-052d-11ef-b559-000c29926dfd:1-7
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
ERROR:
No query specified
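Only a few fields of this long output decide whether replication is healthy: Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master. The following sketch checks just those three. The function name repl_health and the default lag threshold of 30 seconds are assumptions made for illustration.

```shell
# repl_health (hypothetical helper): read "show slave status\G" output on
# stdin. Prints "OK" and returns 0 when both replication threads are running
# and the lag is at most <max_lag> seconds (default 30, an assumed threshold);
# otherwise prints a reason and returns 1.
repl_health() {
    awk -v max_lag="${1:-30}" '
    {
        key = $1
        sub(/:$/, "", key)      # "Slave_IO_Running:" -> "Slave_IO_Running"
        val = $2
    }
    key == "Slave_IO_Running"      { io = val }
    key == "Slave_SQL_Running"     { sql = val }
    key == "Seconds_Behind_Master" { lag = val }
    END {
        if (io != "Yes")  { print "FAIL: IO thread not running";  exit 1 }
        if (sql != "Yes") { print "FAIL: SQL thread not running"; exit 1 }
        if (lag == "" || lag == "NULL" || lag + 0 > max_lag + 0) {
            print "FAIL: lag=" lag
            exit 1
        }
        print "OK"
    }'
}
```

It could be fed with `mysql -e 'show slave status\G' | repl_health 10` on each slave.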
4.5 Set the slave nodes to read-only mode (not persistent across restarts unless read_only is also set in my.cnf)
set global read_only=1;
4.6 Test data replication
# Run on the master
create database test;
use test;
create table test123(id int);
insert into test123 values(1);
# Verify the result on a slave
select * from test.test123;
4.7 Install the MHA software
① On all 4 servers, install the packages MHA depends on; install the EPEL repository first
yum install epel-release --nogpgcheck -y
yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN
② On all 4 servers, install the MHA node component
Download the release tarballs (version 0.58 is used here) from:
https://github.com/yoshinorim/mha4mysql-node
https://github.com/yoshinorim/mha4mysql-manager
mv mha4mysql-* /app
cd /app
tar -zxf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
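③ Install the MHA manager component (strictly required only on the manager node; in this walkthrough it was installed on all 4 servers)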
cd /app
tar -zxf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install
Files installed after completion:
[root@mha15 soft]# cd /usr/local/bin/
[root@mha15 bin]# ll
total 88
-r-xr-xr-x. 1 root root 17639 Apr 28 04:10 apply_diff_relay_logs
-r-xr-xr-x. 1 root root 4807 Apr 28 04:10 filter_mysqlbinlog
-r-xr-xr-x. 1 root root 1995 Apr 28 04:14 masterha_check_repl
-r-xr-xr-x. 1 root root 1779 Apr 28 04:14 masterha_check_ssh
-r-xr-xr-x. 1 root root 1865 Apr 28 04:14 masterha_check_status
-r-xr-xr-x. 1 root root 3201 Apr 28 04:14 masterha_conf_host
-r-xr-xr-x. 1 root root 2517 Apr 28 04:14 masterha_manager
-r-xr-xr-x. 1 root root 2165 Apr 28 04:14 masterha_master_monitor
-r-xr-xr-x. 1 root root 2373 Apr 28 04:14 masterha_master_switch
-r-xr-xr-x. 1 root root 5172 Apr 28 04:14 masterha_secondary_check
-r-xr-xr-x. 1 root root 1739 Apr 28 04:14 masterha_stop
-r-xr-xr-x. 1 root root 8337 Apr 28 04:10 purge_relay_logs
-r-xr-xr-x. 1 root root 7525 Apr 28 04:10 save_binary_logs
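4.8 Configure passwordless SSH authentication on all servers
① On the manager node, set up passwordless SSH to all database nodes
ssh-keygen -t rsa # press Enter at every prompt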
ssh-copy-id 192.168.10.15
ssh-copy-id 192.168.10.16
ssh-copy-id 192.168.10.17
② On the master node, set up passwordless SSH to both slave nodes
ssh-keygen -t rsa # press Enter at every prompt
ssh-copy-id 192.168.10.16
ssh-copy-id 192.168.10.17
③ On the slave1 node, set up passwordless SSH to the master and slave2
ssh-keygen -t rsa # press Enter at every prompt
ssh-copy-id 192.168.10.15
ssh-copy-id 192.168.10.17
④ On the slave2 node, set up passwordless SSH to the master and slave1
ssh-keygen -t rsa # press Enter at every prompt
ssh-copy-id 192.168.10.15
ssh-copy-id 192.168.10.16
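The four steps above follow one rule: the manager needs key-based access to every database node, and each database node needs access to the other two. A minimal sketch of that rule, assuming the addresses from section 3, is shown below; the variable DB_NODES and the function ssh_targets are hypothetical names.

```shell
# Database nodes from the environment table (master, slave1, slave2).
DB_NODES="192.168.10.15 192.168.10.16 192.168.10.17"

# ssh_targets (hypothetical helper): print, one per line, the database nodes
# that host <self> must reach over passwordless SSH. For the manager
# (which is not in DB_NODES) this is every database node; for a database
# node it is every database node except itself.
ssh_targets() {
    self=$1
    for node in $DB_NODES; do
        [ "$node" = "$self" ] || printf '%s\n' "$node"
    done
}
```

On slave1, for example, `for h in $(ssh_targets 192.168.10.16); do ssh-copy-id "root@$h"; done` would copy the key to the master and slave2 after ssh-keygen has been run.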
4.9 Configure MHA on the manager node
① On the manager node, copy the sample scripts to /usr/local/bin
cp -rp /app/mha4mysql-manager-0.58/samples/scripts/ /usr/local/bin
② On the manager node, copy the scripts that manage the VIP during failover into /usr/local/bin
cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin/
cp /usr/local/bin/scripts/master_ip_online_change /usr/local/bin/
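③ Edit the script as follows (delete the original contents, paste the script below, and adjust the VIP-related parameters)
vim /usr/local/bin/master_ip_failover
# VIP failover script; the file itself starts at the #! line below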
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
    $command,          $ssh_user,         $orig_master_host,
    $orig_master_ip,   $orig_master_port, $new_master_host,
    $new_master_ip,    $new_master_port,  $new_master_user,
    $new_master_password
);
############## Adjust this section to your environment
my $vip = '192.168.10.100/24';
my $ifdev = 'ens33';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig $ifdev:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig $ifdev:$key down";
##############
GetOptions(
    'command=s'             => \$command,
    'ssh_user=s'            => \$ssh_user,
    'orig_master_host=s'    => \$orig_master_host,
    'orig_master_ip=s'      => \$orig_master_ip,
    'orig_master_port=i'    => \$orig_master_port,
    'new_master_host=s'     => \$new_master_host,
    'new_master_ip=s'       => \$new_master_ip,
    'new_master_port=i'     => \$new_master_port,
    'new_master_user=s'     => \$new_master_user,
    'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            # updating global catalog, etc
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            # If you want to continue failover, exit 10.
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        # do nothing
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
    print
        "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
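The script above adds and removes the VIP with ifconfig, which comes from the net-tools package and is often not installed by default on recent distributions. As an alternative sketch, the same two operations can be expressed with the ip command from iproute2. The values below mirror the script's $vip, $ifdev, and $key; the function names vip_start_cmd and vip_stop_cmd are hypothetical, and the functions only print the commands rather than running them.

```shell
# Same values as $vip, $ifdev and $key in master_ip_failover.
VIP="192.168.10.100/24"
IFDEV="ens33"
KEY="1"

# vip_start_cmd (hypothetical helper): print the iproute2 command that adds
# the VIP as a labelled secondary address (shown as ens33:1 by ifconfig).
vip_start_cmd() {
    printf 'ip addr add %s dev %s label %s:%s\n' "$VIP" "$IFDEV" "$IFDEV" "$KEY"
}

# vip_stop_cmd (hypothetical helper): print the iproute2 command that removes
# the VIP from the interface again.
vip_stop_cmd() {
    printf 'ip addr del %s dev %s\n' "$VIP" "$IFDEV"
}
```

If this variant is preferred, the printed strings could replace the values of $ssh_start_vip and $ssh_stop_vip in the Perl script; that substitution is an assumption of this sketch and was not tested in the original setup.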
④ Create the MHA configuration directory and copy the configuration file; app1.cnf is used here to manage the MySQL node servers
mkdir /etc/masterha
cp /app/mha4mysql-manager-0.58/samples/conf/app1.cnf /etc/masterha
# Delete the original contents, paste the following, and adjust the node IP addresses
vim /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/app/data/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
ssh_user=root
user=mha
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.10.16 -s 192.168.10.17
# Secondary check of the master, performed via the slaves
shutdown_script=""
[server1]
hostname=192.168.10.15
# Master server
port=3306
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.10.16
# Candidate master
port=3306
[server3]
hostname=192.168.10.17
# Slave 2
port=3306
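A mistyped hostname or a missing candidate_master flag in this file only shows up later, during the checks in 4.11 and 4.12. As a quick sanity check, the following sketch lists each [serverN] section with its hostname and candidate flag. The function name list_mha_servers is hypothetical, and the parser assumes the simple key=value layout shown above (no spaces around "=").

```shell
# list_mha_servers (hypothetical helper): print "<section> <hostname> <candidate>"
# for every [serverN] section of the MHA config file given as $1.
# <candidate> is 1 when candidate_master=1 is set in that section, else 0.
list_mha_servers() {
    awk -F= '
    function emit() { if (name != "") print name, host, cand }
    /^\[server[0-9]+\]/ {
        emit()
        name = substr($0, 2, index($0, "]") - 2)   # "[server1]" -> "server1"
        host = ""; cand = 0
        next
    }
    /^\[/ { emit(); name = ""; next }              # e.g. [server default]
    name != "" && $1 == "hostname"         { host = $2 }
    name != "" && $1 == "candidate_master" { cand = $2 + 0 }
    END { emit() }
    ' "$1"
}
```

Running `list_mha_servers /etc/masterha/app1.cnf` should print one line per database node, which can be compared against the table in section 3.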
4.10 The first time, enable the virtual IP manually on the master node
Install the ifconfig tool
yum -y install net-tools
Add the virtual IP manually
/sbin/ifconfig ens33:1 192.168.10.100/24
4.11 Test passwordless SSH authentication from the manager node
masterha_check_ssh -conf=/etc/masterha/app1.cnf
4.12 Check MySQL master-slave replication from the manager node
[root@mhamaster masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Apr 28 21:48:24 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Apr 28 21:48:24 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Apr 28 21:48:24 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Apr 28 21:48:24 2024 - [info] MHA::MasterMonitor version 0.58.
Sun Apr 28 21:48:25 2024 - [info] GTID failover mode = 1
Sun Apr 28 21:48:25 2024 - [info] Dead Servers:
Sun Apr 28 21:48:25 2024 - [info] Alive Servers:
Sun Apr 28 21:48:25 2024 - [info] 192.168.10.15(192.168.10.15:3306)
Sun Apr 28 21:48:25 2024 - [info] 192.168.10.16(192.168.10.16:3306)
Sun Apr 28 21:48:25 2024 - [info] 192.168.10.17(192.168.10.17:3306)
Sun Apr 28 21:48:25 2024 - [info] Alive Slaves:
Sun Apr 28 21:48:25 2024 - [info] 192.168.10.16(192.168.10.16:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Sun Apr 28 21:48:25 2024 - [info] GTID ON
Sun Apr 28 21:48:25 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Sun Apr 28 21:48:25 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Apr 28 21:48:25 2024 - [info] 192.168.10.17(192.168.10.17:3306) Version=8.0.34 (oldest major version between slaves) log-bin:enabled
Sun Apr 28 21:48:25 2024 - [info] GTID ON
Sun Apr 28 21:48:25 2024 - [info] Replicating from 192.168.10.15(192.168.10.15:3306)
Sun Apr 28 21:48:25 2024 - [info] Current Alive Master: 192.168.10.15(192.168.10.15:3306)
Sun Apr 28 21:48:25 2024 - [info] Checking slave configurations..
Sun Apr 28 21:48:25 2024 - [info] Checking replication filtering settings..
Sun Apr 28 21:48:25 2024 - [info] binlog_do_db= , binlog_ignore_db=
Sun Apr 28 21:48:25 2024 - [info] Replication filtering check ok.
Sun Apr 28 21:48:25 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Apr 28 21:48:25 2024 - [info] Checking SSH publickey authentication settings on the current master..
Sun Apr 28 21:48:26 2024 - [info] HealthCheck: SSH to 192.168.10.15 is reachable.
Sun Apr 28 21:48:26 2024 - [info]
192.168.10.15(192.168.10.15:3306) (current master)
+--192.168.10.16(192.168.10.16:3306)
+--192.168.10.17(192.168.10.17:3306)
Sun Apr 28 21:48:26 2024 - [info] Checking replication health on 192.168.10.16..
Sun Apr 28 21:48:26 2024 - [info] ok.
Sun Apr 28 21:48:26 2024 - [info] Checking replication health on 192.168.10.17..
Sun Apr 28 21:48:26 2024 - [info] ok.
Sun Apr 28 21:48:26 2024 - [info] Checking master_ip_failover_script status:
Sun Apr 28 21:48:26 2024 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.10.15 --orig_master_ip=192.168.10.15 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.10.100===
Checking the Status of the script.. OK
Sun Apr 28 21:48:26 2024 - [info] OK.
Sun Apr 28 21:48:26 2024 - [warning] shutdown_script is not defined.
Sun Apr 28 21:48:26 2024 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@mhamaster masterha]#
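4.13 Start MHA on the manager node
Make sure the log directory /var/log/masterha/app1 exists before running the command, because the shell redirection below writes to it.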
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[root@mhamaster masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 13812
[root@mhamaster masterha]#
4.14 Check the MHA status on the manager node
[root@mhamaster masterha]# masterha_check_status -conf=/etc/masterha/app1.cnf
app1 (pid:13812) is running(0:PING_OK), master:192.168.10.15
[root@mhamaster masterha]#
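For monitoring, the status line above can be reduced to the two values that matter: the state (PING_OK here) and the current master. The sketch below extracts them from a line in the format shown. The function name mha_state is hypothetical, and the pattern is based only on the "is running" output above; when the line does not match that format, nothing is printed.

```shell
# mha_state (hypothetical helper): read masterha_check_status output on stdin
# and print "<state> <master>" for a line of the form
#   app1 (pid:13812) is running(0:PING_OK), master:192.168.10.15
# Lines in any other format produce no output.
mha_state() {
    sed -n 's/^.*is running([0-9]*:\([A-Z_]*\)), master:\(.*\)$/\1 \2/p'
}
```

A cron job could, for instance, run `masterha_check_status --conf=/etc/masterha/app1.cnf | mha_state` and raise an alert when the output is empty or the state is not PING_OK.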
4.15 Check the MHA log on the manager node
cat /var/log/masterha/app1/manager.log | grep "current master"
[root@mhamaster masterha]# cat /var/log/masterha/app1/manager.log | grep "current master"
Sun Apr 28 23:03:08 2024 - [info] Checking SSH publickey authentication settings on the current master..
192.168.10.15(192.168.10.15:3306) (current master)
[root@mhamaster masterha]#
4.16 Check the VIP address on the master node
[root@mha15 data]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.10.15 netmask 255.255.255.0 broadcast 192.168.10.255
inet6 fe80::6b4c:646e:e677:d4ac prefixlen 64 scopeid 0x20<link>
inet6 fe80::32ad:80bf:c2e0:633f prefixlen 64 scopeid 0x20<link>
inet6 fe80::b6cb:27dd:5b53:f57f prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:62:bd:90 txqueuelen 1000 (Ethernet)
RX packets 740602 bytes 1019317019 (972.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 75618 bytes 8632318 (8.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.10.100 netmask 255.255.255.0 broadcast 192.168.10.255
ether 00:0c:29:62:bd:90 txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 737 bytes 73994 (72.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 737 bytes 73994 (72.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
4.17 Stop the MHA service on the manager node
masterha_stop --conf=/etc/masterha/app1.cnf
Alternatively, stop it by killing the manager process by its process ID.
A follow-up article will cover MHA failure simulation and recovery.