MHA (Master High Availability) monitors the master node and performs automatic failover to one of the slaves: it promotes a slave to be the new master, building on ordinary master/slave replication, and requires client-side cooperation (for example, a VIP) to redirect traffic. MHA currently supports only one-master/multi-slave architectures. A minimal MHA cluster needs at least three database servers: one master, one candidate (backup) master, and one more slave. To cut machine costs, Taobao adapted it; their TMHA fork supports one master with a single slave.
How MHA performs a failover:
1. MHA judges the master's health with the statement SELECT 1 As Value (see the sketch after this list). Once the master crashes, MHA saves the binary log events (binlog events) from the crashed master.
2. It identifies the slave holding the most recent updates.
3. It applies the differential relay log (relay log) events to the other slaves.
4. It applies the binary log events saved from the master.
5. It promotes one slave to be the new master.
6. It points the other slaves at the new master to resume replication.
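The health check is just a trivial query fired at the master every ping interval. A minimal shell sketch of the same idea (not MHA's actual code), assuming the mhauser account and ping_interval=1 used later in this walkthrough:

#!/bin/bash
# ping the master once per second with MHA's health-check query
MASTER=192.168.114.10
while true; do
    if ! mysql -h"$MASTER" -umhauser -p'Admin@123' -Nse 'SELECT 1 As Value' >/dev/null 2>&1; then
        echo "$(date) master $MASTER failed SELECT 1 As Value" >&2
    fi
    sleep 1   # matches ping_interval=1 in app1.cnf
done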
The MHA software consists of two parts:
the Manager tool package
the Node tool package
The Manager tool package mainly includes the following tools:
masterha_check_ssh        check MHA's SSH configuration
masterha_check_repl       check MySQL replication status
masterha_manager          start MHA
masterha_check_status     check the current MHA run state
masterha_master_monitor   detect whether the master is down
masterha_master_switch    perform failover (automatic or manual)
masterha_conf_host        add or remove configured server entries
masterha_stop             stop MHA (e.g. masterha_stop --conf=app1.cnf)
masterha_secondary_check  check the MySQL master's availability over two or more network routes
The Node tool package (these tools are normally triggered by MHA Manager scripts and need no manual use) mainly includes:
save_binary_logs        # save and copy the master's binary logs
apply_diff_relay_logs   # identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog      # strip unnecessary ROLLBACK events (no longer used by MHA)
purge_relay_logs        # purge relay logs (without blocking the SQL thread)
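Since the slaves in this walkthrough set relay_log_purge=0, relay logs pile up and should be purged on a schedule with purge_relay_logs. A hedged cron sketch for each slave, assuming the tool's documented --user/--password/--disable_relay_log_purge/--workdir options and this walkthrough's root password:

# /etc/cron.d/purge_relay_logs (sketch): purge hourly without re-enabling relay_log_purge
0 * * * * root /usr/bin/purge_relay_logs --user=root --password='abc123' --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1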
MHA configuration files:
global configuration: supplies defaults for all applications; default path /etc/masterha_default.cnf
application configuration: one file per replication cluster
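For reference, a minimal global defaults file could look like this (a sketch; this walkthrough keeps everything in the per-application file instead):

# /etc/masterha_default.cnf (optional)
[server default]
ssh_user=root
ping_interval=1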
Implementing MHA
Environment:
Node1: master, 192.168.114.10
Node2: slave 1, 192.168.114.20
Node3: slave 2, 192.168.114.30
Node4: MHA server, 192.168.114.40
One master, two slaves! On every node, stop the firewall and set SELinux to permissive:
systemctl stop firewalld
setenforce 0
This lab builds on a completed MySQL installation; installation is not repeated here, only master/slave replication.
Set up key-based authentication for passwordless SSH: on Node4 (the MHA server), generate a key pair, copy the public key to itself, then sync the .ssh directory to the remote servers.
[root@Node4 ~]#:ssh-keygen
press Enter three times
......
# copy the key to itself
[root@Node4 ~]#:ssh-copy-id 127.0.0.1
# sync to the remote servers
[root@Node4 ~]#:rsync -a .ssh 192.168.114.10:/root/
type: yes
password:
[root@Node4 ~]#:rsync -a .ssh 192.168.114.20:/root/
type: yes
password:
[root@Node4 ~]#:rsync -a .ssh 192.168.114.30:/root/
type: yes
password:
Passwordless login now works:
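A quick way to confirm it from Node4 (each command should print the remote hostname without asking for a password):

[root@Node4 ~]#:for h in 192.168.114.10 192.168.114.20 192.168.114.30; do ssh root@$h hostname; done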
Install the MHA manager and node tools (the epel repository is required).
The MHA server needs both packages, while the master and the two slaves only need the node (client) package.
Prepare the two packages, manager and node, under /opt/ on the MHA server:
[root@Node4 opt]#:ls
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Copy the node package to the master and both slave servers.
All four servers install the node package: mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@Node1 opt]#:ls
mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install on Node1:
[root@Node1 opt]#:yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install on Node2:
[root@Node2 opt]#:yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install on Node3:
[root@Node3 opt]#:yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install on Node4:
[root@Node4 opt]#:yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
......
Installed:
mha4mysql-node.noarch 0:0.58-0.el7.centos
Installed as dependencies:
perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7 perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7
perl-DBD-MySQL.x86_64 0:4.023-6.el7 perl-DBI.x86_64 0:1.627-4.el7
perl-IO-Compress.noarch 0:2.061-2.el7 perl-Net-Daemon.noarch 0:0.48-5.el7
perl-PlRPC.noarch 0:0.2020-14.el7
Complete!
# Only Node4, acting as the MHA server, installs the manager package
[root@Node4 opt]#:yum install -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
......
Complete!
On the MHA server, create the MHA directory and configuration file:
[root@Node4 opt]#:mkdir /etc/mastermha
[root@Node4 opt]#:vim /etc/mastermha/app1.cnf
[server default]
user=mhauser
password=Admin@123
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=test
repl_password=Admin@123
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
check_repl_delay=0
master_binlog_dir=/data/mysql/
[server1]
hostname=192.168.114.10
candidate_master=1
[server2]
hostname=192.168.114.20
candidate_master=1
[server3]
hostname=192.168.114.30
user=mhauser: the MySQL account MHA uses to connect to the database servers
manager_workdir: the manager's working directory
manager_log: the log file
remote_workdir: the working directory on the remote (node) hosts
ssh_user: the OS account used for SSH logins to the hosts
repl_user: the MySQL replication account; the test user connects the slaves to the master and must be granted replication privileges
[server1] designates the master server
[server2]: if the master goes down, the host under server2 (candidate_master=1) takes over as master
Prepare the VIP failover script:
[root@Node4 opt]#:vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.114.188/24';
my $gateway = '192.168.114.2';
my $interface = 'ens33';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
        "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Note: adjust the $vip and $gateway settings near the top of the script to match your network!
Add execute permission:
[root@Node4 opt]#:chmod +x /usr/local/bin/master_ip_failover
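The script can be exercised by hand before MHA uses it. Going by its own usage(), a status check might look like the following (note the status branch will also try to bring the VIP up on the given master over SSH):

[root@Node4 opt]#:/usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.114.10 --orig_master_ip=192.168.114.10 --orig_master_port=3306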
Master/slave replication:
Node1 is the master, Node2 is slave 1, Node3 is slave 2.
MySQL is already installed (the installation steps were covered earlier), so we start directly with replication:
Node1: master
[root@Node1 opt]#:vim /etc/my.cnf
......
# append these four lines at the end
server_id=10                    # unique server id
log-bin=/data/mysql/mysql-bin   # enable the binary log
skip_name_resolve=1             # disable reverse name resolution
general_log                     # enable the general query log
[root@Node1 opt]#:mkdir /data/mysql -p
[root@Node1 opt]#:chown -R mysql.mysql /data
[root@Node1 opt]#:systemctl restart mysqld #restart
# check the status
[root@Node1 opt]#:systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2024-07-11 15:44:13 CST; 47s ago
Docs: man:mysqld(8)
......
Node2: slave 1
[root@Node2 opt]#:vim /etc/my.cnf
......
server_id=20
log-bin=/data/mysql/mysql-bin
read_only
relay_log_purge=0
skip_name_resolve=1
general_log
[root@Node2 opt]#:mkdir /data/mysql -p
[root@Node2 opt]#:chown -R mysql.mysql /data
[root@Node2 opt]#:systemctl restart mysqld
[root@Node2 opt]#:systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2024-07-11 15:46:49 CST; 6s ago
Docs: man:mysqld(8)
......
Node3: slave 2
[root@Node3 opt]#:vim /etc/my.cnf
......
server_id=30
log-bin=/data/mysql/mysql-bin
read_only
relay_log_purge=0
skip_name_resolve=1
general_log
[root@Node3 opt]#:mkdir /data/mysql -p
[root@Node3 opt]#:chown -R mysql.mysql /data
[root@Node3 opt]#:systemctl restart mysqld
[root@Node3 opt]#:systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2024-07-11 15:49:09 CST; 4s ago
Docs: man:mysqld(8)
......
At this point binary logging is enabled on the master and both slaves. On the master, check the binlog position, then create the test user and grant it replication privileges:
[root@Node1 opt]#:mysql -uroot -pabc123
......
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 154 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
mysql> grant replication slave on *.* to test@'192.168.114.%' identified by 'Admin@123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
# the replication user test exists; now connect from the slaves
Slave 1 and slave 2:
#Node2
[root@Node2 opt]#:mysql -uroot -pabc123
......
mysql> CHANGE MASTER TO
-> MASTER_HOST='192.168.114.10',
-> MASTER_USER='test',
-> MASTER_PASSWORD='Admin@123',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='mysql-bin.000001',
-> MASTER_LOG_POS=154;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
# start replication
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
# check the status
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.114.10
Master_User: test
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 449
Relay_Log_File: Node2-relay-bin.000002
Relay_Log_Pos: 615
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
# replication is up!
#Node3
[root@Node3 opt]#:mysql -uroot -pabc123
......
mysql> CHANGE MASTER TO
-> MASTER_HOST='192.168.114.10',
-> MASTER_USER='test',
-> MASTER_PASSWORD='Admin@123',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='mysql-bin.000001',
-> MASTER_LOG_POS=154;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.114.10
Master_User: test
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 449
Relay_Log_File: Node3-relay-bin.000002
Relay_Log_Pos: 615
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
# the I/O thread and the SQL thread both report Yes
Replication now works: the test user created on the master is visible on both slaves (slave 1 on Node2 and slave 2 on Node3), as the check below shows.
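A quick check on each slave, using the root credentials from this walkthrough; both should list test@192.168.114.%:

[root@Node2 opt]#:mysql -uroot -pabc123 -e "select user,host from mysql.user where user='test';"
[root@Node3 opt]#:mysql -uroot -pabc123 -e "select user,host from mysql.user where user='test';"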
With replication in place, create a mhauser account on the master for the MHA server to connect with.
[root@Node1 opt]#:mysql -uroot -pabc123
......
mysql> grant all on *.* to mhauser@'192.168.%.%' identified by 'Admin@123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges; # reload the grant tables
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from mysql.user;
+---------------+---------------+
| user | host |
+---------------+---------------+
| mhauser | 192.168.%.% |
| test | 192.168.114.% |
| mysql.session | localhost |
| mysql.sys | localhost |
| root | localhost |
+---------------+---------------+
5 rows in set (0.00 sec)
Configure the virtual IP on the master:
[root@Node1 ~]#:ifconfig ens33:1 192.168.114.188/24
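To confirm the alias is up (the output should show 192.168.114.188 on ens33:1):

[root@Node1 ~]#:ifconfig ens33:1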
On the MHA server, verify that the environment is ready:
1. Check passwordless SSH: every pair should report ok, ending with "All SSH connection tests passed successfully".
[root@Node4 opt]#:masterha_check_ssh --conf=/etc/mastermha/app1.cnf
......
Thu Jul 11 16:11:20 2024 - [debug] Connecting via SSH from root@192.168.114.30(192.168.114.30:22) to root@192.168.114.20(192.168.114.20:22)..
Thu Jul 11 16:11:21 2024 - [debug] ok.
Thu Jul 11 16:11:21 2024 - [debug]
Thu Jul 11 16:11:19 2024 - [debug] Connecting via SSH from root@192.168.114.20(192.168.114.20:22) to root@192.168.114.10(192.168.114.10:22)..
Thu Jul 11 16:11:20 2024 - [debug] ok.
Thu Jul 11 16:11:20 2024 - [debug] Connecting via SSH from root@192.168.114.20(192.168.114.20:22) to root@192.168.114.30(192.168.114.30:22)..
Thu Jul 11 16:11:20 2024 - [debug] ok.
Thu Jul 11 16:11:21 2024 - [info] All SSH connection tests passed successfully.
2. Check that replication is healthy:
[root@Node4 opt]#:masterha_check_repl --conf=/etc/mastermha/app1.cnf
......
Checking the Status of the script.. OK
Thu Jul 11 16:14:22 2024 - [info] OK.
Thu Jul 11 16:14:22 2024 - [warning] shutdown_script is not defined.
Thu Jul 11 16:14:22 2024 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
The line "MySQL Replication Health is OK." confirms the check passed.
Query the current status: when MHA is not running it reports NOT_RUNNING.
Start MHA. It runs in the foreground by default; in production it usually runs in the background:
# background:
nohup masterha_manager --conf=/etc/mastermha/app1.cnf &> /dev/null &
# foreground: we are in a test environment, so run it in the foreground; open another terminal for other work
masterha_manager --conf=/etc/mastermha/app1.cnf
Foreground mode ties up the current terminal, so open a new terminal and start the manager there.
The command blocks and only prints more output once the master fails. From another terminal, check the status:
It reports is running, with Node1 as the master.
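The status check is masterha_check_status from the Manager package; a hedged example (the pid is illustrative and the exact wording can vary by version):

[root@Node4 ~]#:masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 (pid:12345) is running(0:PING_OK), master:192.168.114.10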
You can watch the MHA manager's log alongside the master's log; the blocked foreground manager only prints new output once the master stops responding.
The MHA manager log:
when the MySQL master stops responding, new entries appear in this log.
On the master Node1, watch the general query log (the file is named after the hostname).
MHA pings once per second (ping_interval=1) by sending SELECT 1 As Value to the master; if the master cannot execute it, MHA declares it dead.
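A hedged way to watch those pings on Node1, resolving the log location from the server itself since @@general_log_file may be a path relative to the data directory:

[root@Node1 ~]#:cd "$(mysql -uroot -pabc123 -Nse 'select @@datadir')" && tail -f "$(mysql -uroot -pabc123 -Nse 'select @@general_log_file')"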
Simulate a failure: stop MySQL on the master and check whether the master role, along with the virtual IP, moves to Node2:
[root@Node1 opt]#:systemctl stop mysqld
With the master down, the MHA manager log reports the switchover: the master role has moved to Node2!
Node2's slave configuration has been cleared;
on Node3, verify that the master is now Node2:
Master_Host: 192.168.114.20 shows Node2 has become the master.
The virtual IP is inherited by the new master, Node2.
High availability achieved!
If Node1 is brought back up, the master role does not move back, and the manager will not simply restart: a failover is one-shot.
You must delete /data/mastermha/app1/app1.failover.complete (an empty marker file),
then edit the configuration file
[root@Node4 ~]#:vim /etc/mastermha/app1.cnf
so that server1 points at 192.168.114.20
and server2 at 192.168.114.10. One possible recovery flow is sketched below.
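A hedged sketch of the whole recovery, assuming you want to reintroduce Node1 as a slave of the new master before restarting the manager (the binlog file and position below are placeholders; read the real values from show master status on Node2 first):

# on Node4: remove the one-shot failover marker
[root@Node4 ~]#:rm -f /data/mastermha/app1/app1.failover.complete
# on Node1: restart MySQL and point it at the new master Node2
[root@Node1 ~]#:systemctl start mysqld
[root@Node1 ~]#:mysql -uroot -pabc123 -e "CHANGE MASTER TO MASTER_HOST='192.168.114.20', MASTER_USER='test', MASTER_PASSWORD='Admin@123', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=154; START SLAVE;"
# on Node4: after swapping server1/server2 in app1.cnf, re-run the checks and restart the manager
[root@Node4 ~]#:masterha_check_repl --conf=/etc/mastermha/app1.cnf
[root@Node4 ~]#:masterha_manager --conf=/etc/mastermha/app1.cnf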
I have not studied this part in depth; explore it further if you are interested!
---end---