目录
1.部署规划
角色 | IP | 主机名 | server-id | 功能 |
MHA-manager | 192.168.109.179 | mha-manager | 无 | 管理节点 |
MHA-node(master) | 192.168.109.177 | mha-maste | 1 | 数据节点(写) |
MHA-node(slave1) | 192.168.109.176 | mha-slave1 | 2 | 数据节点(读) |
MHA-node(slave2) | 192.168.109.178 | mha-slave2 | 3 | 数据节点(读) |
2.系统软件和版本
系统版本 | mysql版本 | MHA版本 |
CentOS 7 | MySQL-5.7.34 |
3.系统环境的部署(4台机器同步操作)
step1:绑定好静态ip
#本地ip解析
localIP=$(ip add|egrep '192\.[0-9]+\.[0-9]+\.[0-9]'|awk -F '[/ ]+' '{print $3}')
localGATEWAY=$(echo $localIP|awk -F '.' 'OFS="." {print $1,$2,$3}')
#配置静态ip
cat >/etc/sysconfig/network-scripts/ifcfg-ens33<<EOF
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=$localIP
NETMASK=255.255.255.0
GATEWAY=$localGATEWAY.2
DNS1=114.114.114.114
EOF
step2:在/etc/hosts文件中写入4台机器的IP地址与主机名称
192.168.109.176 mha-slave1
192.168.109.177 mha-master
192.168.109.178 mha-slave2
192.168.109.179 mha-manager
step3:关闭防火墙与selinux
#关闭firewalld防火墙服务,并且设置开机不要启动
service firewalld stop
systemctl disable firewalld
#临时关闭selinux
setenforce 0
#永久关闭selinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
step4:时间同步
yum -y install chrony
systemctl enable chronyd
systemctl start chronyd
step5:ssh免密通道
在4台机器上:
ssh-keygen -t rsa
#之后一直回车
在每台机器上给其他三台机器发送ssh密钥以建立免密通道:
#以mha-manager机器上举例
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 22 root@mha-master
#之后输入yes
#然后输入root密码,后续同理
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 22 root@mha-slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 22 root@mha-slave
4.部署MySQL一主两从复制(GTID)环境
参考
基于GITD的拥有备库和延迟备库的MySQL双主热备高可用集群_wowchx的博客-CSDN博客
或者随便找个简单的GTID主从复制文章或视频
在mha-master上创建一个mha账号,方便后期mha监控主从同步状态
create user 'mha'@'192.168.109.179' identified by '123';
grant all privileges on *.* to 'mha'@'192.168.109.179';
flush privileges;
5.MHA安装与配置
step1:下载mysql MHA CentOS7相关依赖包
yum install -y perl-DBD-mysql perl-Config-Tiny perl-Time-HiRes perl-Mail-Sender perl-Mail-Sendmail perl-MIME-Base32 perl-MIME-Charset perl-MIME-EncWords perl-Params-Classify perl-Params-Validate.x86_64 perl-Log-Dispatch perl-Parallel-ForkManager perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN net-tools
step2:在不同节点安装mha-node
说明:在所有节点安装mha-node软件包,在mha-manager再安装mha-manager软件包。我的是tar压缩包,自行在网上寻找下载。
#在每台机器上执行
tar zxvf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make && make install
step3:在mha-manager上安装mha-manager
tar zxvf mha4mysql-manager-0.57.tar.gz
cd mha4mysql-manager-0.57
perl Makefile.PL
make && make install
step4:配置mha-manager的ssh免密登录
先在每台机器上创建一个账户用于mha的免密登录
useradd mha
echo 'mha'|passwd --stdin mha
然后配置mha到其他主机
su - mha
ssh-keygen -P '' -f ~/.ssh/id_rsa
cd .ssh/
mv id_rsa.pub authorized_keys
for i in 176 177 178 179;do scp -r ../.ssh/ 192.168.109.$i:~/;done
#注意上面176 177 178 179 192.168.109是4台机器对应的的ip
#之后按照提示输入yes和mha账号的密码
step5:配置mha用户的权限
mha-manager上创建/etc/sudoers.d/mha并写入:
vim /etc/sudoers.d/mha
#User_Alias 表示具有sudo权限的用户列表;Host_Alias表示主机的列表
User_Alias MYSQL_USERS = mha
#Runas_Alias 表示用户以什么身份登录
Runas_Alias MYSQL_RUNAS = root
#Cmnd_Alias 表示允许执行命令的列表(命令需要使用完整路径)
Cmnd_Alias MYSQL_CMNDS = /sbin/ifconfig,/sbin/arping
MYSQL_USERS ALL = (MYSQL_RUNAS) NOPASSWD: MYSQL_CMNDS
将这个权限分发给mha-slave1和mha-slave2
scp /etc/sudoers.d/mha mha-slave1:/etc/sudoers.d/
scp /etc/sudoers.d/mha mha-slave2:/etc/sudoers.d/
step7:创建mha相关配置文件
在mha-manager上创建工作目录
mkdir /etc/mha/
mkdir -p /data/mha/masterha/app1
chown -R mha. /data/mha
在mha-manager上创建局部配置文件:
vim /etc/mha/app1.conf
[server default]
#设置监控用户和密码(mysql-master上的用户)
user=mha
password=123
#设置复制环境中的复制用户和密码
repl_user=mha_slave
repl_password=123
#设置ssh的登录用户名
ssh_user=mha
#设置监控主库,发送ping包的时间间隔,默认是3s,尝试三次没回应自动进行failover
ping_interval=3
#设置mgr的工作目录
manager_workdir=/data/mha/masterha/app1
#设置MySQL master保存binlog的目录(注意自己保存的位置,我的是/mysql/data),以便MHA可以找到master的二进制日志
master_binlog_dir=/mysql/data
#设置master的pid文件
master_pid_file=/data/mysql/mha-master.pid
#设置mysql master在发生切换时保存binlog的目录(自己创建)
remote_workdir=/data/mysql/mha
#设置mgr日志文件
manager_log=/data/mha/masterha/app1/app1-3306.log
#MHA到master的监控之间出现问题,mha-manager将会尝试从slave1和slave2登录到master上
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.109.176 -s 192.168.109.178 --user=mha --port=22 --master_host=192.168.109.177 --master_port=3306
#设置自动failover时候的切换脚本
master_ip_failover_script=/usr/local/bin/master_ip_failover
#设置手动切换的时候的切换脚本
#master_ip_online_change_script="/etc/mha/master_ip_online_change.sh 192.168.109.177"
#设置故障发生后关闭故障主机脚本
#shutdown_script='/etc/mha/power_manager'
[server1]
hostname=192.168.109.177
port=3306
candidate_master=1
[server2]
hostname=192.168.109.176
port=3306
candidate_master=1
[server3]
hostname=192.168.109.178
port=3306
candidate_master=1
step8:上传并配置对应脚本
cp -rp mha4mysql-manager-0.57/samples/scripts /usr/local/bin
cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin
> /usr/local/bin/master_ip_failover
配置master_ip_failover文件
vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#############################添加内容部分#########################################
my $vip = '192.168.109.200'; #指定vip的地址
my $brdc = '192.168.109.255'; #指定vip的广播地址
my $ifdev = 'ens33'; #指定vip绑定的网卡
my $key = '1'; #指定vip绑定的虚拟网卡序列号
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip"; #代表此变量值为ifconfig ens33:1 192.168.109.200
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down"; #代表此变量值为ifconfig ens33:1 192.168.109.200 down
my $exit_code = 0; #指定退出状态码为0
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
6.检查ssh互信和MySQL集群状态
检查ssh互信:
su - mha
masterha_check_ssh --conf=/etc/mha/app1.conf
#结果如下
[mha@mha-manager ~]$ masterha_check_ssh --conf=/etc/mha/app1.conf
Mon Jul 24 02:23:16 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 24 02:23:16 2023 - [info] Reading application default configuration from /etc/mha/app1.conf..
Mon Jul 24 02:23:16 2023 - [info] Reading server configuration from /etc/mha/app1.conf..
Mon Jul 24 02:23:16 2023 - [info] Starting SSH connection tests..
Mon Jul 24 02:23:17 2023 - [debug]
Mon Jul 24 02:23:16 2023 - [debug] Connecting via SSH from mha@192.168.109.177(192.168.109.177:22) to mha@192.168.109.176(192.168.109.176:22)..
Mon Jul 24 02:23:16 2023 - [debug] ok.
Mon Jul 24 02:23:16 2023 - [debug] Connecting via SSH from mha@192.168.109.177(192.168.109.177:22) to mha@192.168.109.178(192.168.109.178:22)..
Mon Jul 24 02:23:17 2023 - [debug] ok.
Mon Jul 24 02:23:18 2023 - [debug]
Mon Jul 24 02:23:17 2023 - [debug] Connecting via SSH from mha@192.168.109.176(192.168.109.176:22) to mha@192.168.109.177(192.168.109.177:22)..
Mon Jul 24 02:23:17 2023 - [debug] ok.
Mon Jul 24 02:23:17 2023 - [debug] Connecting via SSH from mha@192.168.109.176(192.168.109.176:22) to mha@192.168.109.178(192.168.109.178:22)..
Mon Jul 24 02:23:17 2023 - [debug] ok.
Mon Jul 24 02:23:19 2023 - [debug]
Mon Jul 24 02:23:17 2023 - [debug] Connecting via SSH from mha@192.168.109.178(192.168.109.178:22) to mha@192.168.109.177(192.168.109.177:22)..
Mon Jul 24 02:23:17 2023 - [debug] ok.
Mon Jul 24 02:23:17 2023 - [debug] Connecting via SSH from mha@192.168.109.178(192.168.109.178:22) to mha@192.168.109.176(192.168.109.176:22)..
Mon Jul 24 02:23:18 2023 - [debug] ok.
Mon Jul 24 02:23:19 2023 - [info] All SSH connection tests passed successfully.
检查集群状态:
masterha_check_repl --conf=/etc/mha/app1.conf
#结果如下
[mha@mha-manager ~]$ masterha_check_repl --conf=/etc/mha/app1.conf
Mon Jul 24 02:26:41 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 24 02:26:41 2023 - [info] Reading application default configuration from /etc/mha/app1.conf..
Mon Jul 24 02:26:41 2023 - [info] Reading server configuration from /etc/mha/app1.conf..
Mon Jul 24 02:26:41 2023 - [info] MHA::MasterMonitor version 0.57.
Mon Jul 24 02:26:44 2023 - [info] GTID failover mode = 1
Mon Jul 24 02:26:44 2023 - [info] Dead Servers:
Mon Jul 24 02:26:44 2023 - [info] Alive Servers:
Mon Jul 24 02:26:44 2023 - [info] 192.168.109.177(192.168.109.177:3306)
Mon Jul 24 02:26:44 2023 - [info] 192.168.109.176(192.168.109.176:3306)
Mon Jul 24 02:26:44 2023 - [info] 192.168.109.178(192.168.109.178:3306)
Mon Jul 24 02:26:44 2023 - [info] Alive Slaves:
Mon Jul 24 02:26:44 2023 - [info] 192.168.109.176(192.168.109.176:3306) Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 02:26:44 2023 - [info] GTID ON
Mon Jul 24 02:26:44 2023 - [info] Replicating from 192.168.109.177(192.168.109.177:3306)
Mon Jul 24 02:26:44 2023 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 02:26:44 2023 - [info] 192.168.109.178(192.168.109.178:3306) Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Jul 24 02:26:44 2023 - [info] GTID ON
Mon Jul 24 02:26:44 2023 - [info] Replicating from 192.168.109.177(192.168.109.177:3306)
Mon Jul 24 02:26:44 2023 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 24 02:26:44 2023 - [info] Current Alive Master: 192.168.109.177(192.168.109.177:3306)
Mon Jul 24 02:26:44 2023 - [info] Checking slave configurations..
Mon Jul 24 02:26:44 2023 - [info] read_only=1 is not set on slave 192.168.109.176(192.168.109.176:3306).
Mon Jul 24 02:26:44 2023 - [info] read_only=1 is not set on slave 192.168.109.178(192.168.109.178:3306).
Mon Jul 24 02:26:44 2023 - [info] Checking replication filtering settings..
Mon Jul 24 02:26:44 2023 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jul 24 02:26:44 2023 - [info] Replication filtering check ok.
Mon Jul 24 02:26:44 2023 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jul 24 02:26:44 2023 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jul 24 02:26:45 2023 - [info] HealthCheck: SSH to 192.168.109.177 is reachable.
Mon Jul 24 02:26:45 2023 - [info]
192.168.109.177(192.168.109.177:3306) (current master)
+--192.168.109.176(192.168.109.176:3306)
+--192.168.109.178(192.168.109.178:3306)
Mon Jul 24 02:26:45 2023 - [info] Checking replication health on 192.168.109.176..
Mon Jul 24 02:26:45 2023 - [info] ok.
Mon Jul 24 02:26:45 2023 - [info] Checking replication health on 192.168.109.178..
Mon Jul 24 02:26:45 2023 - [info] ok.
Mon Jul 24 02:26:45 2023 - [info] Checking master_ip_failover_script status:
Mon Jul 24 02:26:45 2023 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=mha --orig_master_host=192.168.109.177 --orig_master_ip=192.168.109.177 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.109.200===
Checking the Status of the script.. OK
Mon Jul 24 02:26:45 2023 - [info] OK.
Mon Jul 24 02:26:45 2023 - [warning] shutdown_script is not defined.
Mon Jul 24 02:26:45 2023 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
7.检查MHA状态,然后运行MHA(监控开始)
开启mha-manager监控:
nohup masterha_manager --conf=/etc/mha/app1.conf --remove_dead_master_conf --ignore_last_failover
masterha_check_status --conf=/etc/mha/app1.conf
#####成功效果如下(别复制下面的)
[mha@mha-manager ~]$ masterha_check_status --conf=/etc/mha/app1.conf
app1 (pid:6604) is running(0:PING_OK), master:192.168.109.177
8.自动failover测试
在master服务器上安装sysbench测试工具
yum install -y sysbench
创建测试账户与测试库
create database test;
create user 'mha_test'@'localhost' identified by '123';
grant all on *.* to 'mha_test'@'localhost';
flush privileges;
开始测试
sysbench /usr/share/sysbench/oltp_read_only.lua --mysql-host=192.168.109.177 --mysql-user=mha_test --mysql-password=123 --mysql-socket=/data/mysql/mysql.sock --mysql-db=test --db-driver=mysql --tables=1 --table-size=100000 --report-interval=10 --threads=128 --time=120 prepare
模拟master服务器故障(先把自己的机器拍好快照,防止出事)
用systemctl stop mysqld把master直接宕掉模拟master出现故障
#mha-manager上监控日志
tail -f /data/mha/masterha/app1/app1-3306.log
#####等待一会观察日志发现如下结果
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.109.200===
Enabling the VIP - 192.168.109.200 on the new master - 192.168.109.176
----- Failover Report -----
app1: MySQL Master failover 192.168.109.177(192.168.109.177:3306) to 192.168.109.176(192.168.109.176:3306) succeeded
Master 192.168.109.177(192.168.109.177:3306) is down!
Check MHA Manager logs at mha-manager:/data/mha/masterha/app1/app1-3306.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.109.177(192.168.109.177:3306)
Selected 192.168.109.176(192.168.109.176:3306) as a new master.
192.168.109.176(192.168.109.176:3306): OK: Applying all logs succeeded.
192.168.109.176(192.168.109.176:3306): OK: Activated master IP address.
192.168.109.178(192.168.109.178:3306): OK: Slave started, replicating from 192.168.109.176(192.168.109.176:3306)
192.168.109.176(192.168.109.176:3306): Resetting slave info succeeded.
Master failover to 192.168.109.176(192.168.109.176:3306) completed successfully.
在剩下的另一个从库中show slave status\G;得到以下结果
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.109.176
Master_User: mha_slave
Master_Port: 3306
Connect_Retry: 60
#此时master库以及切换为slave1库了
说明mha-manager检测到原来的主服务器挂掉了,并自动将一个从库提升为主库,证明MHA高性能架构基本搭建成功!