MHA的原理及其安装部署
一、MHA架构
1、MHA简介
MHA是由日本人yoshinorim(原就职于DeNA现就职于FaceBook)开发的比较成熟的MySQL高可用方案。MHA能够在30秒内实现故障切换,并能在故障切换中,最大可能的保证数据一致性。目前淘宝也正在开发相似产品TMHA,目前已支持一主一从。
2、MHA组成
MHA由MHA Manager和MHA Node组成,如下图:
1.1 MHA Manager(管理节点)
运行一些工具,比如masterha_manager工具实现自动监控MySQL Master和实现master故障切换,其它工具实现手动实现master故障切换、在线mater转移、连接检查等等。一个Manager可以管理多 个master-slave集群。
1.2 MHA Node(数据节点)
部署在所有运行MySQL的服务器上,无论是master还是slave。主要作用有三个:
1、保存二进制日志
如果能够访问故障master,会拷贝master的二进制日志。
2、应用差异中继日志
从拥有最新数据的slave上生成差异中继日志,然后应用差异日志。
3、清除中继日志
在不停止SQL线程的情况下删除中继日志。
3、MHA工作原理
1、当master出现故障时,通过对比slave之间I/O线程读取masterbinlog的位置,选取最接近的slave做为latestslave。 其它slave通过与latest slave对比生成差异中继日志。在latest slave上应用从master保存的binlog,同时将latest slave提升为master。最后在其它slave上应用相应的差异中继日志并开始从新的master开始复制。
2、在MHA实现Master故障切换过程中,MHA Node会试图访问故障的master(通过SSH),如果可以访问(不是硬件故障,比如InnoDB数据文件损坏等),会保存二进制文件,以最大程度保 证数据不丢失。MHA和半同步复制一起使用会大大降低数据丢失的危险。
二、MHA高可用架构部署
1、实验环境:需要四台Centos7服务器
1、一台作为mha服务器(mha_manager)来监控管理下面的MySQL服务器
2、三台:一主两从MySQL服务器
Host | IP |
---|---|
mha_manager | 192.168.220.107 |
master | 192.168.220.110 |
slave-1 | 192.168.220.100 |
slave-2 | 192.168.220.200 |
实验步骤:
- 安装MySQL数据库
- 配置MySQl一主两从
- 安装MHA软件
- 配置无密码认证
- 配置MySQL MHA 高可用
- 模拟 master 故障切换
2、实验部署
2.1 master、slave-1、slave-2安装mysql服务
安装包下载地址:https://downloads.mysql.com/archives/community/
[root@master Mysql]# ls
mysql-5.7.40-linux-glibc2.12-x86_64 mysql-5.7.40-linux-glibc2.12-x86_64.tar.gz mysql_install.sh
[root@master Mysql]# cat mysql_install.sh
#!/bin/bash
# 解决软件的依赖关系
yum install cmake ncurses-devel gcc gcc-c++ lsof bzip2 openssl-devel ncurses-compat-libs -y
# 解压mysql二进制安装包
tar xf mysql-5.7.40-linux-glibc2.12-x86_64.tar.gz
# 移动mysql解压后的文件到/usr/local下改名叫mysql
mv mysql-5.7.40-linux-glibc2.12-x86_64 /usr/local/mysql
# 新建组和用户mysql
groupadd mysql
# mysql这个用户的shell 是/bin/false 属于mysql组
useradd -r -g mysql -s /bin/false mysql
# 关闭firewalld防火墙服务,并且设置开机不要启动
systemctl stop firewalld
systemctl disable firewalld
# 临时关闭selinux
setenforce 0
# 永久关闭selinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
# 新建存放数据的目录
mkdir -p /data/mysql
# 修改/data/mysql目录的权限归mysql组所有,这样mysql用户可以对这个文件夹进行读写了
chown mysql:mysql /data/mysql
# 只是允许mysql这个用户和mysql组可以访问,其他人不能访问
chmod 750 /data/mysql
# 进入/usr/local/mysql/bin目录
cd /usr/local/mysql/bin
# 初始化mysql
./mysqld --initialize --user=mysql --basedir=/usr/local/mysql/ --datadir=/data/mysql &>passwd.txt
# 让mysql支持ssl方式登录的设置
./mysql_ssl_rsa_setup --datadir=/data/mysql/
# 获得临时密码
tem_passwd=$(cat passwd.txt | grep "temporary" | awk '{print $NF}')
# $NF表示最后一个字段
# abc=$(命令) 优先执行命令,然后将结果赋值给abc
# 修改PATH变量,加入mysql bin目录的路径
# 临时修改PATH变量的值
export PATH=/usr/local/mysql/bin/:$PATH
# 重新启动linux系统后也生效,永久修改
echo 'PATH=/usr/local/mysql/bin:$PATH' >>/root/.bashrc
# 复制support-files里的mysql.server文件到/etc/init.d目录下叫mysqld
cp ../support-files/mysql.server /etc/init.d/mysqld
# 修改/etc/init.d
sed -i '70c datadir=/data/mysql' /etc/init.d/mysqld
# 生成/etc/my.cnf配置文件
cat >/etc/my.cnf <<EOF
[mysqld_safe]
[client]
socket=/data/mysql/mysql.sock
[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8
[mysql]
auto-rehash
prompt=\\u@\\d \\R:\\m mysql>
EOF
# 修改内核的openfile的数量
ulimit -n 1000000
# 设置开机启动的时候配置也生效
echo "ulimit -n 1000000" >>/etc/rc.local
chmod +x /etc/rc.d/rc.local
# 启动mysqld进程
service mysqld start
# 将mysqld添加到linux系统里服务管理名单里
/sbin/chkconfig --add mysqld
# 设置mysql服务开机启动
/sbin/chkconfig mysqld on
# 初次修改密码需要使用 --connet-expired-password 选项
# -e 后面接的表示是在mysql里需要执行命令 execute 执行
# set password='Sanchuang123#'; 修改root用户的密码为Sanchaung123#
mysql -uroot -p$tem_passwd --connect-expired-password -e "set password='zjx3203766770.';"
# 检验上一步修改密码是否成功,如果有输出能看到mysql里的数据库,说明成功。
mysql -uroot -p'zjx3203766770.' -e "show databases;"
# 执行mysql_install.sh安装脚本
[root@master Mysql]# source mysql_install.sh
登录查看是否安装成功
[root@slave-1 ~]# mysql -uroot -p'zjx3203766770.'
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 5.7.40-log MySQL Community Server (GPL)
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
root@(none) 22:07 mysql>
# 登录成功
2.2 master、slave-1、slave-2上配置my.cnf
修改master的my.cnf文件
[root@master ~]# cat /etc/my.cnf
[mysqld_safe]
[client]
socket=/data/mysql/mysql.sock
[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8
log_bin=1
server_id=1
# 半同步
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=3000 # 3 second
expire_logs_days=7
# 开启GTID
gtid-mode=on
enforce-gtid-consistency=on
relay-log_purge=0
log_slave_updates=1
[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>
修改slave-1的my.cnf文件
[root@slave-1 ~]# cat /etc/my.cnf
[mysqld_safe]
[client]
socket=/data/mysql/mysql.sock
[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8
# binlog
log_bin=1
server_id=2
expire-logs-days=7
# 半同步
rpl_semi_sync_slave_enabled=1
# 开启GTID
gtid_mode=on
enforce-gtid-consistency=on
log_slave_updates=on
relay_log_purge=0
[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>
修改slave-2的my.cnf文件
[root@slave-2 ~]# cat /etc/my.cnf
[mysqld_safe]
[client]
socket=/data/mysql/mysql.sock
[mysqld]
socket=/data/mysql/mysql.sock
port=3306
open_files_limit=8192
innodb_buffer_pool_size=512M
character-set-server=utf8
# log
log_bin=1
server_id=3
expire-logs-days=7
# 开启GTID
gtid_mode=on
enforce-gtid-consistency=on
# 半同步
rpl_semi_sync_slave_enabled=1
relay_log_purge=0
log_slave_updates=1
[mysql]
auto-rehash
prompt=\u@\d \R:\m mysql>
2.3 在master、slave-1、slave-2上分别做两个软链接
# 因为下载路径都是一样的,所以slave-1、slave-2做和master一样的操作
[root@master ~]# ln -s /usr/local/mysql/bin/mysql /usr/sbin
[root@master ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin
[root@master ~]# ll /usr/sbin/mysql
lrwxrwxrwx 1 root root 26 May 20 10:33 /usr/sbin/mysql -> /usr/local/mysql/bin/mysql
[root@master ~]# ll /usr/sbin/mysqlbinlog
lrwxrwxrwx 1 root root 32 May 20 10:33 /usr/sbin/mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog
2.4 重启3台MySQL服务,查看是否启动成功
# slave-1、slave-2根master做同样操作
[root@master ~]# systemctl restart mysqld
[root@slave-1 ~]# netstat -anplut | grep mysqld
tcp6 0 0 :::3306 :::* LISTEN 1295/mysqld
2.5 通过GTID配置主从复制
在所有的slave机器上导入基础数据,使得基础数据一致
# master导出备份文件
[root@master mysql]# mysqldump --all-databases --set-gtid-purged=OFF -uroot -p'zjx3203766770.' >all_db.sql
mysqldump: [Warning] Using a password on the command line interface can be insecure.
# 正常警告,来自于密码暴露不安全。
# master将备份文件复制到slave-1、2中的/data/mysql文件夹下去
[root@master mysql]# scp /data/mysql/all_db.sql root@192.168.220.100:/data/mysql
root@192.168.220.100’s password:
all_db.sql 100% 865KB 89.7MB/s 00:00
# slave上传备份文件,使数据与master一致
[root@slave-1 mysql]# mysql -uroot -p'zjx3203766770.' <all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@slave-2 mysql]# mysql -uroot -p'zjx3203766770.' <all_db.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
在master上创建授权用户
root@(none) 01:23 mysql>grant replication slave on *.* to 'zhang'@'192.168.220.%' identified by 'zjx3203766770.'
Query OK, 0 rows affected, 1 warning (0.02 sec)
# relication slave 授予slave复制的权限
开启gtid功能,启动主从复制服务(在slave服务器上都要操作)
# 首先停止slave服务器上MySQL服务的slave服务
stop slave;
# 在所有slave中执行此命令
CHANGE MASTER TO MASTER_HOST='192.168.220.110' ,
MASTER_USER='zhang',
MASTER_PASSWORD='zjx3203766770.',
MASTER_PORT=3306,
master_auto_position=1;
# 重新启动slave服务器上MySQL服务的slave服务
start slave;
查看主从复制是否成功
# 只查看了一个slave-2的状态
root@(none) 10:49 mysql>show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.220.110
Master_User: zhang
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: 1.000008
Read_Master_Log_Pos: 533
Relay_Log_File: slave-2-relay-bin.000010
Relay_Log_Pos: 690
Relay_Master_Log_File: 1.000008
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 533
Relay_Log_Space: 5007
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 313ad416-d849-11ed-ad7c-000c29397dd0
Master_Info_File: /data/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 313ad416-d849-11ed-ad7c-000c29397dd0:1-4
Executed_Gtid_Set: 0e1ad29a-da16-11ed-8935-000c29617fb6:1,
313ad416-d849-11ed-ad7c-000c29397dd0:1-4,
841dad98-ce79-11ed-a03b-000c29aeef9f:1-8
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.03 sec)
最后一步在master创建好mha用户,并通过主从复制同步给其他的slave服务器
root@(none) 22:07 mysql> grant all privileges on *.* to mha@'192.168.220.%' identified by 'mha_chaoge';
Query OK, 0 rows affected (0.00 sec)
2.6 给mysql集群配置以及MHA Manager免密通道
MHA Manager对整个mysql集群建立免密通道
[root@mha_manager ~]# ssh-keygen
[root@mha_manager ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.110
[root@mha_manager ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.100
[root@mha_manager ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.200
master对slave-1、slave-2建立免密通道
[root@master ~]# ssh-keygen
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.100
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.200
slave-1对master、slave-2建立免密通道
[root@slave-1 ~]# ssh-keygen
[root@slave-1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.110
[root@slave-1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.200
slave-2对master,slave-1建立免密通道
[root@slave-2 ~]# ssh-keygen
[root@slave-2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.110
[root@slave-2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.220.100
2.7 在manager主机上配置mha
安装MHA依赖的环境及安装epel源(每台节点都需要安装这依赖环境,因为每台都要安装node)
yum install epel-release --nogpgcheck -y
yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN
安装node组件(四台节点都需要安装,方法一样)(下载地址:https://github.com/yoshinorim/mha4mysql-node)
[root@master mha]# ls
mha4mysql-node-0.58-0.el7.centos.noarch.rpm
# 编译安装mha4mysql-node组件
[root@master mha]# rpm -iav mha4mysql-node-0.58-0.el7.centos.noarch.rpm
# 检查rpm安装出的命令
[root@master mha]# ls -l /usr/bin/*_*log*
-rwxr-xr-x 1 root root 17639 Mar 23 2018 /usr/bin/apply_diff_relay_logs
-rwxr-xr-x. 1 root root 15704 Aug 9 2019 /usr/bin/db_log_verify
-rwxr-xr-x. 1 root root 33032 Aug 9 2019 /usr/bin/db_printlog
-rwxr-xr-x 1 root root 4807 Mar 23 2018 /usr/bin/filter_mysqlbinlog
-rwxr-xr-x 1 root root 8337 Mar 23 2018 /usr/bin/purge_relay_logs
-rwxr-xr-x 1 root root 7525 Mar 23 2018 /usr/bin/save_binary_logs
-rwxr-xr-x. 1 root root 7910 Apr 1 2020 /usr/bin/scsi_logging_level
-rwxr-xr-x. 1 root root 94696 Apr 1 2020 /usr/bin/sg_logs
四台节点在node安装后都会在/usr/bin 下面会生成一下几个脚本(这些工具通常由MHA manager的脚本触发,无需人为操作)主要如下:
save_binary_logs # 保存和复制master的二进制日志
apply_diff_relay_logs # 识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog # 去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs # 清除中继日志(不会阻塞SQL线程)
安装manager组件(只在mha_manager服务器上安装)(下载地址:https://github.com/yoshinorim/mha4mysql-manager)
[root@mha_manager mha]# ls
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm mha4mysql-node-0.58-0.el7.centos.noarch.rpm
# 编译安装mha4mysql-manager组件
[root@mha_manager mha]# rpm -iav mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
# 检查rpm安装出的命令
[root@mha_manager mha]# ls -l /usr/bin/masterha_*
-rwxr-xr-x 1 root root 1995 Mar 23 2018 /usr/bin/masterha_check_repl
-rwxr-xr-x 1 root root 1779 May 19 18:57 /usr/bin/masterha_check_ssh
-rwxr-xr-x 1 root root 1865 Mar 23 2018 /usr/bin/masterha_check_status
-rwxr-xr-x 1 root root 3201 Mar 23 2018 /usr/bin/masterha_conf_host
-rwxr-xr-x 1 root root 2517 Mar 23 2018 /usr/bin/masterha_manager
-rwxr-xr-x 1 root root 2165 Mar 23 2018 /usr/bin/masterha_master_monitor
-rwxr-xr-x 1 root root 2373 Mar 23 2018 /usr/bin/masterha_master_switch
-rwxr-xr-x 1 root root 5172 Mar 23 2018 /usr/bin/masterha_secondary_check
-rwxr-xr-x 1 root root 1739 Mar 23 2018 /usr/bin/masterha_stop
manager 安装后在/usr/bin 下面会生成几个工具,主要包括以下几个:
masterha_check_ssh ## 检查MHA的SSH的配置状况
masterha_check_repl ## 检查MySQL复制状况
masterha_manager ## 启动manager脚本
masterha_check_status ## 检查当前MHA运行状态
masterha_master_monitor ## 检测master是否宕机
masterha_master_switch ## 控制故障转移(自动或者手动)
masterha_conf_host ## 添加或删除配置的server信息
masterha_stop ## 关闭manager
2.8 在manager节点上配置MHA脚本
[root@mha_manager ~]# vim /usr/local/bin/master_ip_failover
# 给与执行权限
[root@mha_manager ~]# chmod +x /usr/local/bin/master_ip_failover
[root@mha_manager ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.220.99/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
2.9 在manager节点上创建MHA配置文件
[root@mha_manager ~]# mkdir -p /etc/mha # 在/etc下创建mha目录。
[root@mha_manager ~]# mkdir -p /var/log/mha/app1 # 创建/var/log/mha/app1目录,用来放manager日志。
[root@mha_manager ~]# vim /etc/mha/app1.cnf # 编辑mha配置文件,增加配置内容。
[root@mha_manager ~]# cat /etc/mha/app1.cnf
# 配置文件如下
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/data/mysql
# 自愈脚本
master_ip_failover_script=/usr/local/bin/master_ip_failover
user=mha
password=mha_chaoge
ping_interval=2
repl_user=zhang
repl_password=zjx3203766770.
ssh_user=root
shutdown_script=""
[server1]
candidate_master=1
check_repl_delay=0
hostname=192.168.220.110
port=3306
[server2]
hostname=192.168.220.100
port=3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=192.168.220.200
port=3306
3、实验测试
3.1 在manager端测试无密码认证
[root@mha_manager ~]# masterha_check_ssh --conf=etcmhaapp1.conf
Fri May 19 155110 2023 - [warning] Global configuration file etcmasterha_default.cnf not found. Skipping.
Fri May 19 155110 2023 - [info] Reading application default configuration from etcmhaapp1.conf..
Fri May 19 155110 2023 - [info] Reading server configuration from etcmhaapp1.conf..
Fri May 19 155110 2023 - [info] Starting SSH connection tests..
Fri May 19 155112 2023 - [debug]
Fri May 19 155110 2023 - [debug] Connecting via SSH from root@192.168.220.100(192.168.220.10022) to root@192.168.220.110(192.168.220.11022)..
Fri May 19 155111 2023 - [debug] ok.
Fri May 19 155111 2023 - [debug] Connecting via SSH from root@192.168.220.100(192.168.220.10022) to root@192.168.220.200(192.168.220.20022)..
Fri May 19 155112 2023 - [debug] ok.
Fri May 19 155112 2023 - [debug]
Fri May 19 155111 2023 - [debug] Connecting via SSH from root@192.168.220.200(192.168.220.20022) to root@192.168.220.110(192.168.220.11022)..
Fri May 19 155111 2023 - [debug] ok.
Fri May 19 155111 2023 - [debug] Connecting via SSH from root@192.168.220.200(192.168.220.20022) to root@192.168.220.100(192.168.220.10022)..
Fri May 19 155112 2023 - [debug] ok.
Fri May 19 155112 2023 - [debug]
Fri May 19 155110 2023 - [debug] Connecting via SSH from root@192.168.220.110(192.168.220.11022) to root@192.168.220.100(192.168.220.10022)..
Fri May 19 155111 2023 - [debug] ok.
Fri May 19 155111 2023 - [debug] Connecting via SSH from root@192.168.220.110(192.168.220.11022) to root@192.168.220.200(192.168.220.20022)..
Fri May 19 155111 2023 - [debug] ok.
Fri May 19 155112 2023 - [info] All SSH connection tests passed successfully.
3.2 在manager端测试主从复制
[root@mha_manager ~]# masterha_check_repl --conf=etcmhaapp1.conf
Fri May 19 155121 2023 - [warning] Global configuration file etcmasterha_default.cnf not found. Skipping.
Fri May 19 155121 2023 - [info] Reading application default configuration from etcmhaapp1.conf..
Fri May 19 155121 2023 - [info] Reading server configuration from etcmhaapp1.conf..
Fri May 19 155121 2023 - [info] MHAMasterMonitor version 0.58.
Fri May 19 155122 2023 - [info] GTID failover mode = 1
Fri May 19 155122 2023 - [info] Dead Servers
Fri May 19 155122 2023 - [info] Alive Servers
Fri May 19 155122 2023 - [info] 192.168.220.110(192.168.220.1103306)
Fri May 19 155122 2023 - [info] 192.168.220.100(192.168.220.1003306)
Fri May 19 155122 2023 - [info] 192.168.220.200(192.168.220.2003306)
Fri May 19 155122 2023 - [info] Alive Slaves
Fri May 19 155122 2023 - [info] 192.168.220.100(192.168.220.1003306) Version=5.7.40-log (oldest major version between slaves) log-binenabled
Fri May 19 155122 2023 - [info] GTID ON
Fri May 19 155122 2023 - [info] Replicating from 192.168.220.110(192.168.220.1103306)
Fri May 19 155122 2023 - [info] Primary candidate for the new Master (candidate_master is set)
Fri May 19 155122 2023 - [info] 192.168.220.200(192.168.220.2003306) Version=5.7.40-log (oldest major version between slaves) log-binenabled
Fri May 19 155122 2023 - [info] GTID ON
Fri May 19 155122 2023 - [info] Replicating from 192.168.220.110(192.168.220.1103306)
Fri May 19 155122 2023 - [info] Current Alive Master 192.168.220.110(192.168.220.1103306)
Fri May 19 155122 2023 - [info] Checking slave configurations..
Fri May 19 155122 2023 - [info] read_only=1 is not set on slave 192.168.220.100(192.168.220.1003306).
Fri May 19 155122 2023 - [info] read_only=1 is not set on slave 192.168.220.200(192.168.220.2003306).
Fri May 19 155122 2023 - [info] Checking replication filtering settings..
Fri May 19 155122 2023 - [info] binlog_do_db= , binlog_ignore_db=
Fri May 19 155122 2023 - [info] Replication filtering check ok.
Fri May 19 155122 2023 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri May 19 155122 2023 - [info] Checking SSH publickey authentication settings on the current master..
Fri May 19 155122 2023 - [info] HealthCheck SSH to 192.168.220.110 is reachable.
Fri May 19 155122 2023 - [info]
192.168.220.110(192.168.220.1103306) (current master)
+--192.168.220.100(192.168.220.1003306)
+--192.168.220.200(192.168.220.2003306)
Fri May 19 155122 2023 - [info] Checking replication health on 192.168.220.100..
Fri May 19 155122 2023 - [info] ok.
Fri May 19 155122 2023 - [info] Checking replication health on 192.168.220.200..
Fri May 19 155122 2023 - [info] ok.
Fri May 19 155122 2023 - [warning] master_ip_failover_script is not defined.
Fri May 19 155122 2023 - [warning] shutdown_script is not defined.
Fri May 19 155122 2023 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
3.3 启动MHA
在master节点上配置ens33:1的虚拟子接口,给与VIP:192.168.220.99/24
# 创建
[root@mha_manager ~]# ifconfig ens33:1 192.168.220.99/24
后台运行MHA
[root@mha_manager ~]# nohup masterha_manager --conf=/etc/mha/app1.conf --ignore_last_failover /var/log/mha/app1/manager.log 2>&1 &
[1] 15912
[root@mha_manager ~]# nohup: ignoring input and appending output to ‘nohup.out’
命令参数:
--remove_dead_master_conf 该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。
--manger_log 日志存放位置
--ignore_last_failover 在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面设置的manager_workdir目录中产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover。
# 查询当前的master角色
[root@mha_manager ~]# masterha_check_status --conf=/etc/mha/app1.conf
app1 (pid:15912) is running(0:PING_OK), master:192.168.220.110 # 当前master
3.4 查看MHA日志
[root@mha_manager ~]# tail -f /var/log/mha/app1/manager.log
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.220.99/24===
Checking the Status of the script.. OK
Fri May 19 15:51:22 2023 - [info] OK.
Fri May 19 15:51:22 2023 - [warning] shutdown_script is not defined.
Fri May 19 15:51:22 2023 - [info] Set master ping interval 1 seconds.
Fri May 19 15:51:22 2023 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.220.100 -s 192.168.220.200
Fri May 19 15:51:22 2023 - [info] Starting ping health check on 192.168.220.110(192.168.220.110:3306)..
Fri May 19 15:51:22 2023 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
3.5 模拟master宕机,slave-1是否能顶上
1、vip在master上
2、关闭master上的MySQL服务
[root@master ~]# systemctl stop mysqld
# 查看发现虚拟VIP没有了
[root@master ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.220.110 netmask 255.255.255.0 broadcast 192.168.220.255
ether 00:0c:29:ae:ef:9f txqueuelen 1000 (Ethernet)
RX packets 220136 bytes 136131834 (129.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31626 bytes 3742784 (3.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 170 bytes 14516 (14.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 170 bytes 14516 (14.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
3、vip漂移到slave-1上
[root@slave-1 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.220.100 netmask 255.255.255.0 broadcast 192.168.220.255
inet6 fe80::8d93:427a:e6e1:1c9c prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:39:7d:d0 txqueuelen 1000 (Ethernet)
RX packets 112950 bytes 7964159 (7.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14713 bytes 1831490 (1.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.220.99 netmask 255.255.255.0 broadcast 192.168.220.255
ether 00:0c:29:39:7d:d0 txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 38 bytes 3000 (2.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 38 bytes 3000 (2.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
4、进入slave-2的数据库中查看状态,查看slave-2的主是否从master变成slave-1,即slave信息中的Master_Host由192.168.220.110变成192.168.220.100
root@(none) 16:22 mysql>show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.220.100
Master_User: zhang
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: 1.000008
Read_Master_Log_Pos: 234
Relay_Log_File: slave-2-relay-bin.000010
Relay_Log_Pos: 391
Relay_Master_Log_File: 1.000008
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 234
Relay_Log_Space: 4708
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 313ad416-d849-11ed-ad7c-000c29397dd0
Master_Info_File: /data/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 313ad416-d849-11ed-ad7c-000c29397dd0:1-2
Executed_Gtid_Set: 0e1ad29a-da16-11ed-8935-000c29617fb6:1,
313ad416-d849-11ed-ad7c-000c29397dd0:1-2,
841dad98-ce79-11ed-a03b-000c29aeef9f:1-8
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
实验成功~ 项目结束~