MySQL-MHA搭建、故障测试

一、架构说明

MHA(Master High Availability)是一个用于 MySQL 主从复制管理和自动故障转移的开源工具集。MHA 的主要目的是提供 MySQL 环境的高可用性和自动故障转移功能,确保在主库发生故障时能够快速切换到备库,降低业务中断时间。

MHA 主要由以下几个组件组成:

1.Manager:管理节点,负责监控主库的状态,并处理自动故障转移。Manager 会不断检查数据库的主从同步状态,当主库发生故障时,会自动将备库提升为新的主库,并重新配置其他从库的复制关系。
如果主从延迟,则等待所有relay log全部应用后,将VIP漂移到新主库上,整个故障转移过程对应用程序完全透明。

2.Node:节点,即 MySQL 实例。MHA 通过在每个 MySQL 实例上部署 MHA 脚本来对节点进行监控和操作。

3.SSH:MHA 使用 SSH 协议来进行节点之间的通信和操作。确保在使用 MHA 之前,通过 SSH 配置了节点之间的互信。

MHA 的工作流程通常如下:

1.Manager 通过 SSH 登录到节点上监控 MySQL 主库的状态,包括主从复制延迟、节点健康状况等。

2.当发现主库发生故障或不可用时,Manager 会触发自动故障转移流程,将备库晋升为新的主库。

3.Manager 会更新其他从库的复制关系,确保它们能够正确复制新的主库。
在这里插入图片描述

主从复制(一主一从) + 增强半同步 + GTID + MHA(ping_type=INSERT)

架构操作系统IPVIP主机名服务版本端口磁盘空间内存CPU
MHAredhat7.9192.168.111.34192.168.111.36mha-master01mysql-8.0.35、manager-0.58、node-0.58330720G4G8C
192.168.111.35192.168.111.36mha-slave01mysql-8.0.35、manager-0.58、node-0.58330720G4G8C

注:该方案仅支持一主一从
缺点:
1.manager管理节点存在单点故障
2. io_threads异常或者延迟时,有丢失数据的风险(主要由网络故障引发,所以主从节点必须在同一个网段)

二、环境准备

2.1 环境准备、搭建主从在此链接跳转

删除依赖冲突
yum remove mariadb*

注意:搭建MHA需要的依赖包如下(两个节点都安装)

perl依赖

yum -y install perl-Module-Install perl-Module-Build

MySQL驱动

yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
yum install perl-Class-Load perl-MIME-Lite perl-Mail-Send perl-Mail-Sender perl-Mail-Sendmail perl-Params-Validate perl-Sys-Syslog perl-Email-Date-Format perl-MIME-Types
以上依赖包如果yum源没有则手动下载安装,依赖包可在以下网址搜索
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-Config-Tiny-2.14-7.el7.noarch
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-MIME-Lite-3.030-1.el7.noarch
perl-Mail-Sender-0.8.23-1.el7.noarch
perl-Mail-Sendmail-0.79-21.el7.noarch
perl-Email-Date-Format-1.002-15.el7.noarch
perl-MIME-Types-1.38-2.el7.noarch

若出现libmysqlclient.so.18报错问题以下是解决方式。

https://rpm.pbone.net/
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/x86_64/Packages/p/perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/ppc64/Packages/p/perl-MIME-Lite-3.030-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2019-05-29/ppc64le/Packages/p/perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/ppc64le/Packages/p/perl-Email-Date-Format-1.002-15.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-04-20/x86_64/Packages/p/perl-MIME-Types-1.38-2.el7.noarch.rpm

安装mysql驱动
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-common-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-5.6/MySQL-shared-5.6.51-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm

rpm -ivh mysql-community-common-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
rpm -ivh MySQL-shared-5.6.51-1.el7.x86_64.rpm <---> 与mysql-community-libs 互补

2.2 数据库用户权限管理

2.2.1 权限管理

在主从环境按照以下建议赋予权限,禁止赋予all privileges权限
普通用户DDL权限控制

管理员账号,账号名统一为admin_root,由项目负责人管理,用于数据初始化等临时操作

create user 'admin_root'@'%' identified by 'myttrepl@2222#TO'; 
grant alter,Alter routine,Create,Create routine,Create view,Delete,Drop,Select,Insert,Show view,Update on *.* to 'admin_root'@'%';
grant Event,Execute,Index,Lock tables,Process,References,Reload,Show databases,Trigger on *.* to 'admin_root'@'%';

应用账号权限
#无特殊需求是,默认赋予以下权限

create user 'app_du'@'%' identified by 'myttrepl@2222#TO'; 
grant select,update,delete,insert,Show view,Trigger,Execute,Alter routine on db.* to 'app_du'@'%';
grant event on *.* to 'app_du'@'%';

如果主库宕机后,能够快速上线,要避免业务写从库,这样主库可以以slave身份重新加入,并保持数据同步。
处理方式:不给业务分配超级权限账号,就可以避免从库写入(因为从库始终处于read_only模式)

2.3 配置互信

方法1:使用ssh-copy命令配置(分别在2个节点执行)

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.34
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.35

方法2:手工配置
执行ssh-keygen -t rsa生成一对密钥
然后将公钥(id_rsa.pub)分别写入34、35两台机器的/root/.ssh/authorized_keys中

测试是否成功

ssh 192.168.111.34 date  && ssh 192.168.111.35 date

三、安装MHA

3.1 创建目录

mkdir -p /data/mha/{scripts,conf,manager}

3.2 配置在线切换、故障切换检测脚本

vi /data/mha/scripts/master_ip_failover_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
 
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
 
my $vip = '192.168.111.36/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";
 
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
 
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
 
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
 
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
vi /data/mha/scripts/master_ip_online_change_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";
#!/usr/bin/env perl
use strict;  
use warnings FATAL =>'all';  
use Getopt::Long;   
my $vip = '192.168.111.36/24';  # Virtual IP  
my $key = "1";  
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";  
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";  
my $exit_code = 0;  
my (  
  $command,              $orig_master_is_new_slave, $orig_master_host,  
  $orig_master_ip,       $orig_master_port,         $orig_master_user,  
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,  
  $new_master_ip,        $new_master_port,          $new_master_user,  
  $new_master_password,  $new_master_ssh_user,  
);  
GetOptions(  
  'command=s'                => \$command,  
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,  
  'orig_master_host=s'       => \$orig_master_host,  
  'orig_master_ip=s'         => \$orig_master_ip,  
  'orig_master_port=i'       => \$orig_master_port,  
  'orig_master_user=s'       => \$orig_master_user,  
  'orig_master_password=s'   => \$orig_master_password,  
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,  
  'new_master_host=s'        => \$new_master_host,  
  'new_master_ip=s'          => \$new_master_ip,  
  'new_master_port=i'        => \$new_master_port,  
  'new_master_user=s'        => \$new_master_user,  
  'new_master_password=s'    => \$new_master_password,  
  'new_master_ssh_user=s'    => \$new_master_ssh_user,  
);  
exit &main();  
sub main {  
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";  
  
if ( $command eq "stop" || $command eq "stopssh" ) {  
  
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  
        # If you manage master ip address at global catalog database,  
        # invalidate orig_master_ip here.  
        my $exit_code = 1;  
        eval {  
            print "\n\n\n***************************************************************\n";  
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";  
            print "***************************************************************\n\n\n\n";  
&stop_vip();  
            $exit_code = 0;  
        };  
        if ($@) {  
            warn "Got Error: $@\n";  
            exit $exit_code;  
        }  
        exit $exit_code;  
}  
elsif ( $command eq "start" ) {  
  
        # all arguments are passed.  
        # If you manage master ip address at global catalog database,  
        # activate new_master_ip here.  
        # You can also grant write access (create user, set read_only=0, etc) here.  
my $exit_code = 10;  
        eval {  
            print "\n\n\n***************************************************************\n";  
            print "Enabling the VIP - $vip on new master: $new_master_host \n";  
            print "***************************************************************\n\n\n\n";  
&start_vip();  
            $exit_code = 0;  
        };  
        if ($@) {  
            warn $@;  
            exit $exit_code;  
        }  
        exit $exit_code;  
}  
elsif ( $command eq "status" ) {  
        print "Checking the Status of the script.. OK \n";  
        `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;  
        exit 0;  
}  
else {  
&usage();  
        exit 1;  
}  
}  
# A simple system call that enable the VIP on the new master  
sub start_vip() {  
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;  
}  
# A simple system call that disable the VIP on the old_master  
sub stop_vip() {  
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;  
}  
sub usage {  
print  
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";  
}
chmod +x /data/mha/scripts/master_ip_failover_vip
chmod +x /data/mha/scripts/master_ip_online_change_vip

3.3 安装MHA软件

mha4mysql-node-0.58.tar.gz —> 两节点都安装
mha4mysql-manager-0.58.tar.gz —> 两节点都安装

cd /data/mha
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
tar -zxvf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-node-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/

tar -zxvf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-manager-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/

3.4 在manager节点配置masterha.cnf、app.cnf(两个节点都配置)

创建mha管理账号(仅在节点1创建)

create user 'i_mha'@'192.168.%.%' identified by "myttrepl@2222#TO";
GRANT ALL PRIVILEGES ON *.* TO 'i_mha'@'192.168.%.%' with grant option;
mkdir -p /data/mha/masterha/
vi /data/mha/conf/masterha.cnf
[server default]
user=i_mha		#设置监控用户mha
password=myttrepl@2222#TO		#设置mysql中mha用户的密码,这个密码是前文中创建监控用户的那个密码

repl_user=repl		#设置复制用户的用户repl
repl_password=myttrepl@2222#TO		#设置复制用户repl的密码
ssh_user=root		#设置ssh的登录用户名
ping_interval=10		#设置监控主库,发送ping包的时间间隔10秒,默认是3秒,尝试三次没有回应的时候自动进行failover
ping_type=CONNECT		
remote_workdir=/data/mha/masterha/		#manager工作目录
vi /data/mha/conf/app.cnf
manager_workdir=/data/mha/manager/		#manager工作目录
manager_log=/data/mha/manager/manager.log	#manager日志
#workdir on the node for mysql server
master_binlog_dir=/data/mysql8.0.35/3307/binlog/		#master保存binlog的位置,这里的路径要与master里配置的binlog的路径一致,以便MHA能找到
#自动故障切换master脚本
master_ip_failover_script=/data/mha/scripts/master_ip_failover_vip
#手动切换时运行的脚本
master_ip_online_change_script= /data/mha/scripts/master_ip_online_change_vip
#指定检查的从服务器IP地址
secondary_check_script = /usr/local/bin/masterha_secondary_check -s 192.168.111.34 -s 192.168.111.35	

[server1]
hostname=192.168.111.34
ssh_port=22
port=3307
#设置为候选master,设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中最新的slave
candidate_master=1
#默认情况下如果一个slave落后master 超过100M的relay logs的话,MHA将不会选择该slave作为一个新的master, 因为对于这个slave的恢复需要花费很长时间;通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master
check_repl_delay=0 #忽略relay logs日志的复制延迟

[server2]
hostname=192.168.111.35
ssh_port=22
port=3307
candidate_master=1
check_repl_delay=0

3.5 检测互信

[root@mhaserver01 conf]# masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Thu Feb 22 16:34:13 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Feb 22 16:34:13 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Starting SSH connection tests..
Thu Feb 22 16:34:13 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug]  Connecting via SSH from root@192.168.111.34(192.168.111.34:22) to root@192.168.111.35(192.168.111.35:22)..
Thu Feb 22 16:34:13 2024 - [debug]   ok.
Thu Feb 22 16:34:14 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug]  Connecting via SSH from root@192.168.111.35(192.168.111.35:22) to root@192.168.111.34(192.168.111.34:22)..
Thu Feb 22 16:34:14 2024 - [debug]   ok.
Thu Feb 22 16:34:14 2024 - [info] All SSH connection tests passed successfully.

3.6 检测主从状态

[root@mhaserver01 conf]# masterha_check_repl --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Fri Feb 23 10:50:09 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 23 10:50:09 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 23 10:50:10 2024 - [info] GTID failover mode = 1
Fri Feb 23 10:50:10 2024 - [info] Dead Servers:
Fri Feb 23 10:50:10 2024 - [info] Alive Servers:
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.35(192.168.111.35:3307)
Fri Feb 23 10:50:10 2024 - [info] Alive Slaves:
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.35(192.168.111.35:3307)  Version=8.0.35 (oldest major version between slaves) log-bin:enabled
Fri Feb 23 10:50:10 2024 - [info]     GTID ON
Fri Feb 23 10:50:10 2024 - [info]     Replicating from 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Feb 23 10:50:10 2024 - [info] Current Alive Master: 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info] Checking slave configurations..
Fri Feb 23 10:50:10 2024 - [info]  read_only=1 is not set on slave 192.168.111.35(192.168.111.35:3307).
Fri Feb 23 10:50:10 2024 - [info] Checking replication filtering settings..
Fri Feb 23 10:50:10 2024 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Feb 23 10:50:10 2024 - [info]  Replication filtering check ok.
Fri Feb 23 10:50:10 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 23 10:50:10 2024 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 23 10:50:11 2024 - [info] HealthCheck: SSH to 192.168.111.34 is reachable.
Fri Feb 23 10:50:11 2024 - [info]
192.168.111.34(192.168.111.34:3307) (current master)
 +--192.168.111.35(192.168.111.35:3307)

Fri Feb 23 10:50:11 2024 - [info] Checking replication health on 192.168.111.35..
Fri Feb 23 10:50:11 2024 - [info]  ok.
Fri Feb 23 10:50:11 2024 - [info] Checking master_ip_failover_script status:
Fri Feb 23 10:50:11 2024 - [info]   /data/mha/scripts/master_ip_failover_vip --command=status --ssh_user=root --orig_master_host=192.168.111.34 --orig_master_ip=192.168.111.34 --orig_master_port=3307

IN SCRIPT TEST====/usr/sbin/ifconfig ens32:1 down==/usr/sbin/ifconfig ens32:1 192.168.111.36/24===

Checking the Status of the script.. OK
Fri Feb 23 10:50:11 2024 - [info]  OK.
Fri Feb 23 10:50:11 2024 - [warning] shutdown_script is not defined.
Fri Feb 23 10:50:11 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

3.7 开启MHA-manager监控进程(mhamaster02启用manager管理)

注意:第一次启动,主库上的VIP 不会自动绑定,需要手动去绑定,主库发生故障切换会进行vip的漂移

#起服务
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
#停服务
masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

3.8 检测MHA状态

masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:28984) is running(0:PING_OK), master:192.168.111.34

##查看MHA日志,看到当前matser是192.168.111.34
cat /data/mha/manager/manager.log |grep "current master"
Mon Feb 26 16:43:12 2024 - [info] Checking SSH publickey authentication settings on the current master..
192.168.111.34(192.168.111.34:3307) (current master)

/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.34 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| server_id     | 343307 |
+---------------+--------+
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.35 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| server_id     | 353307 |
+---------------+--------+

3.9 MHA-VIP 配置

mha-master01节点手动执行vip命令

#挂载vip
/sbin/ifconfig ens32:1 192.168.111.36/24

#卸载vip
/usr/sbin/ip addr del 192.168.111.36/24 dev ens32:1

3.10 编写脚本

3.10.1 登录数据库脚本

vi /data/mha/scripts/mha_login_mysql
#!/bin/bash
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p'y-yxoqupe33Z'

ln -s /data/mha/scripts/mha_login_mysql /usr/local/bin/mha_login_mysql

3.10.2 在线切换主从脚本

vi /data/mha/scripts/mha_switchover
#!/bin/bash
masterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60

ln -s /data/mha/scripts/mha_switchover /usr/local/bin/mha_switchover
chmod +x /usr/local/bin/mha_login_mysql
chmod +x /usr/local/bin/mha_switchover

四、故障模拟测试

4.1 在线手工切换(维护切换,需要把MHA监控进程关掉):

masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

masterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60

–orig_master_is_new_slave:把旧的master配置为从库
–running_updates_limit=60:如果主从库同步延迟在60s内都允许切换,但是但是切换的时间长短是由recover时relay 日志的大小决定

切换成功需要看到类似下面的提示:

[info] Switching master to 192.168.111.35(192.168.111.35:3307) completed successfully.

同时要查看VIP是否已经漂移到了新的主库上面,新的主库是否可读写,最后在新的主库上起manager服务。

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &

masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:4996) is running(0:PING_OK), master:192.168.111.35

附:MHA在线切换的原理

1. 检查当前的配置信息及主从服务器的信息
包括读取MHA的配置文件/data/mha/conf/app.cnf及检查当前slave的健康状态

2. 阻止对当前master的更新
主要通过如下步骤:
   2.1.等待1.5s($time_until_kill_threads*100ms),等待当前连接断开。
   2.2 执行 read_only=1,阻止新的DML操作
   2.3 等待0.5s,等待当前DML操作完成。
   2.4 kill掉所有连接。
   2.5 FLUSH NO_WRITE_TO_BINLOG TABLES
   2.6 FLUSH TABLES WITH READ LOCK
   
3. 等待新master执行完所有的relay log
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..

4. 将新master的read_only设置为off,并添加VIP

5. slave切换到新master上。
   5.1 等待slave(192.168.244.30)应用完原主从复制产生的relay log,然后执行change master操作切换到新master上。
   5.2 释放原master上加的锁。
   5.3 因masterha_master_switch命令行中带有--orig_master_is_new_slave参数,故原master也切换为新master的从。

6. 清理新master的相关信息。
主要是执行了reset slave all操作,清除之前的复制信息。

4.2 故障手工切换(MHA进程没启动或者挂了的同时主库也挂了):

4.2.1 模拟场景

#主库35停manager监控服务
masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

#主库35停数据库服务
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqladmin --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p shutdown

4.2.2 故障手工切换命令

masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --dead_master_host=192.168.111.35 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.111.34  --new_master_port=3307 --ignore_last_failover

切换成功需要看到类似如下提示:

Started manual(interactive) failover.
Invalidated master IP address on 192.168.111.35(192.168.111.35:3307)
Selected 192.168.111.34(192.168.111.34:3307) as a new master.
192.168.111.34(192.168.111.34:3307): OK: Applying all logs succeeded.
192.168.111.34(192.168.111.34:3307): OK: Activated master IP address.
192.168.111.34(192.168.111.34:3307): Resetting slave info succeeded.
Master failover to 192.168.111.34(192.168.111.34:3307) completed successfully.

表示成功切换,切换成功后,查看VIP是否漂移到了从库上(切换成功后,MHA进程会自动停止),同时查看数据库是否可读写。

故障主库起来后,需要确认数据是否跟新的主库一样,如果一样,那么就把故障主库作为新的从库加入新主库下。然后在主库上启动MHA进程。

4.2.3 将故障主库35数据库服务拉起,然后加入集群成为从库。

nohup /data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqld_safe --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf &

#数据库服务拉起来后,配置从库
change master to
master_host='192.168.111.34',
master_port=3307,
master_user='repl',
master_password='myttrepl@2222#TO',
master_auto_position=1;

start slave;	//启动slave进程

 --查看slave状态
确认IO线程、SQL线程都以运行
mysql> show slave status\G;	//查看当前的从库状态
    Slave_IO_Running: Yes			//IO线程已运行
     Slave_SQL_Running: Yes	//SQL线程已运行
    mysql> show slave status\G;	//查看当前的从库状态

4.2.4 主库启动manager服务并查看MHA状态

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
	
masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:14837) is running(0:PING_OK), master:192.168.111.34

五、MHA 日常维护命令集:

1 查看ssh 登陆是否成功

masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

2 查看主从同步情况

masterha_check_repl  --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

3 检查启动的状态

masterha_check_status  --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

4 停止mha

masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

5 启动mha

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &

注意:当有slave 节点宕掉的情况是启动不了的,加上–ignore_fail_on_start 即使有节点宕掉也能启动mha,需要在配置文件中设置ignore_fail=1

6 failover 后下次重启

每次failover 切换后会在管理目录生成文件app.failover.complete ,下次在切换的时候会发现有这个文件导致切换不成功,需要手动清理掉。
rm -rf /masterha/app1/app1.failover.complete
也可以加上参数--ignore_last_failover

7 手工failover

手工failover 场景,master 死掉,但是masterha_manager 没有开启,可以通过手工failover:
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --dead_master_host=old_ip --dead_port=port --master_state=dead --new_master_host=new_ip --new_master_port=port --ignore_last_failover

8 masterha_manager 是一种监视和故障转移的程序。另一方面,masterha_master_switch 程序不监控主库。masterha_master_switch 可以用于主库故障转移,也可用于在线总开关。

9 手动在线切换(比如做维护切换时)

masterha_master_switch ---global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 --orig_master_is_new_slave
或者
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 -orig_master_is_new_slave --running_updates_limit=10000

--orig_master_is_new_slave 切换时加上此参数是将原master 变为slave 节点,如果不加此参数,原来的master 将不启动
--running_updates_limit=10000 切换时候选master 如果有延迟的话,mha 切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),但是切换的时间长短是由recover时relay 日志的大小决定

手动在线切换mha,切换时需要将在运行的mha 停掉后才能切换。
在备库先执行DDL,一般先stop slave,一般不记录mysql 日志,可以通过set SQL_LOG_BIN =0 实现。然后进行一次主备切换操作,再在原来的主库上执行			DDL。这种方法适用于增减索引,如果是增加字段就需要额外注意。
  • 19
    点赞
  • 23
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值