MySQL-MHA故障切换

MySQL MHA故障切换

资源列表

操作系统主机名IP地址软件
CentOS 7.9master192.168.93.101mysql-5.7.8.tar.gz
mha4mysql-node-0.57.tar.gz
CentOS 7.9slave1192.168.93.102mysql-5.7.8.tar.gz
mha4mysql-node-0.57.tar.gz
CentOS 7.9slave2192.168.93.103mysql-5.7.8.tar.gz
mha4mysql-node-0.57.tar.gz
CentOS 7.9manager192.168.93.104mha4mysql-node-0.57.tar.gz
mha4mysql-manager-0.57.tar.gz

基础环境

  • 关闭防火墙
[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl disable firewalld
  • 关闭内核安全机制
[root@localhost ~]# setenforce 0
[root@localhost ~]# sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
  • 修改主机名
[root@localhost ~]# hostnamectl set-hostname master
[root@localhost ~]# hostnamectl set-hostname slave1
[root@localhost ~]# hostnamectl set-hostname slave2
[root@localhost ~]# hostnamectl set-hostname manager
  • 添加主机映射文件
[root@master ~]# cat >> /etc/hosts << EOF
192.168.93.101 master
192.168.93.102 slave1
192.168.93.103 slave2
192.168.93.104 manager
EOF

一、安装MySQL数据库

  • 在三台mysql节点上分别安装数据库
1.1、安装相关依赖
  • 卸载CentOS 7 自带的Mariadb数据库
[root@mysql ~]# rpm -qa | grep mariadb
mariadb-libs-5.5.68-1.el7.x86_64
[root@mysql ~]# yum -y remove mariadb*
  • 如果采用CentOS 7 minni的操作系统,需要安装一些基础软件工具。
[root@mysql ~]# yum -y install vim wget net-tools lrzsz
  • 安装MySQL依赖的软件包
[root@mysql ~]# yum -y install libaio gcc gcc-c++
  • 创建运行MySQL程序的用户
[root@mysql ~]# useradd -M -s /sbin/nologin mysql
2.1、二进制安装
  • 二进制安装的版本采用跟上面编译安装的版本一样MySQL 5.7.28.首先需要下载软件包或者提前上传,然后再解压进行配置
  • MySQL官网地址:https://www.mysql.com/cn/
[root@mysql ~]# tar -zxvf mysql-5.7.28-linux-glibc2.12-x86_64.tar.gz 
[root@mysql ~]# mv mysql-5.7.28-linux-glibc2.12-x86_64 /usr/local/mysql
[root@mysql ~]# mkdir /usr/local/mysql/data
[root@mysql ~]# chown -R mysql:mysql /usr/local/mysql/data/
[root@mysql ~]# cd /usr/local/mysql/bin/

#执行下面命令配置和初始化数据库,回显的最后12字母就是临时root密码,要牢记,等下会用到
[root@mysql bin]# ./mysqld --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --initialize
########部分内容省略
2024-04-08T08:49:29.616458Z 1 [Note] A temporary password is generated for root@localhost: t;eQso<Ah1EZ
#########部分内容省略
3.1、设定配置文件
  • 二进制安装的MySQL配置文件跟源码编译安装的配置文件类似。
#添加以下内容
[root@mysql ~]# vim /etc/my.cnf
[client]
socket=/usr/local/mysql/data/mysql.sock
[mysqld]
socket=/usr/local/mysql/data/mysql.sock
# 绑定监听地址0.0.0.0
bind-address = 0.0.0.0
# 禁止域名解析,减少mysql对外部连接客户端DNS解析的时间
skip-name-resolve
# 设置端口为3306
port=3306
# 设置mysql的安装目录
basedir=/usr/local/mysql
# 设置mysql数据库的数据的存放目录
datadir=/usr/local/mysql/data
# 允许最大连接数
max_connections=2048
# 服务端使用的字符集默认为utf8
character-set-server=utf8
# 创建新表时将使用默认存储引擎
default-storage-engine=INNODB
# sql语句不区分大小写
lower_case_table_names=1
# 设置一次消息传输的最大值
max_allowed_packet=16M
  • 将MySQL的可执行文件写入环境变量中
[root@mysql ~]# echo "export PATH=$PATH:/usr/local/mysql/bin" >> /etc/profile
[root@mysql ~]# . /etc/profile  #加载文件内容
4.1、配置systemctl方式启动
[root@mysql ~]# cp /usr/local/mysql/support-files/mysql.server /etc/rc.d/init.d/mysqld
[root@mysql ~]# chmod +x /etc/rc.d/init.d/mysqld 
  • 编辑生成mysqld.service服务,通过systemctl方式来管理
[root@mysql ~]# vim /lib/systemd/system/mysqld.service
[Unit]
Description=mysqld
After=network.target

[Service]
Type=forking
ExecStart=/etc/rc.d/init.d/mysqld start
ExecReload=/etc/rc.d/init.d/mysqld restart
ExecStop=/etc/rc.d/init.d/mysqld stop
PrivateTmp=true

[Install]
WantedBy=multi-user.target  

[root@mysql ~]# systemctl daemon-reload 
[root@mysql ~]# systemctl enable mysqld
[root@mysql ~]# systemctl start mysqld
[root@mysql ~]# netstat -anpt | grep 3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      8938/mysqld   
5.1、访问MySQL数据库
  • 临时密码登录MySQl数据库,然后给root用户设置一个新密码
[root@mysql ~]# mysql -u root -p
Enter password:         #输入临时密码
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.  
mysql> set password=password('wzh.2005');   #设置新密码
Query OK, 0 rows affected, 1 warning (0.00 sec)

二、配置server-id

  • master、slave1、slave2添加server-id标识服务器,注意每个MySQL的server-id一定不能一样
#master
[root@master ~]# vim /etc/my.cnf 
[mysqld]
server-id=1
log-bin=mysql-bin
log-slave-updates=true
# server-id=1:设置MySQL服务器的唯一标识符,用于在主从复制中区分不同的服务器。
# log-bin=mysql-bin:启用二进制日志功能,将所有的数据更改操作记录到二进制日志文件中。
#log-slave-updates=true:在从服务器上也记录二进制日志,以便从服务器可以成为其他从服务器的主服务器。


# slave1
[root@slave1 ~]# vim /etc/my.cnf 
[mysqld]
server-id = 2 
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index
log-bin = mysql-bin		#开启二进制日志,从节点有可能称为主节点,所以要开启
relay_log_purge=0 #是否自动删除中继日志,默认值为1。0为手动清除
[root@slave1 ~]# systemctl restart mysqld


# slave2
[root@slave2 ~]# vim /etc/my.cnf 
[mysqld]
server-id = 3 
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index
log-bin = mysql-bin		#开启二进制日志,从节点有可能称为主节点,所以要开启
relay_log_purge=0 #是否自动删除中继日志,默认值为1。0为手动清除
[root@slave2 ~]# systemctl restart mysqld

三、配置mysql一主两从

3.1、搭建ntp服务器

  • 三台mysql服务器都要做如下操作
yum -y install ntpdate
ntpdate ntp.aliyun.com
# 能联网的情况下,所有MySQL服务器全部同步aliyun

3.2、MySQL授权

  • 三台数据库节点上授权两个用户,一个是数据库同步用户,一个是mha监控用户
mysql> grant replication slave on *.* to 'myslave'@'192.168.93.%' identified by '123456';
mysql> grant all privileges on *.* to 'mha'@'192.168.93.%' identified by 'manager';

  • 下面三条授权按理论是不用添加的,但是在做实验案例时通过MHA检查MySQL主从有报错,报两个从数据库通过主机名连接不上主库,所以三个数据库都添加下面的授权
mysql> grant all privileges on *.* to 'mha'@'master' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'slave1' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'slave2' identified by 'manager';

3.3、查看二进制文件和同步点

  • master主机上查看二进制文件和同步点
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |     1890 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

3.4、执行同步操作

  • slave1slave2分别执行同步
mysql> change master to master_host='192.168.93.101',master_user='myslave',master_password='123456',  master_log_file='mysql-bin.000001',master_log_pos=1890;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

3.5、slave1slave2查看数据同步结果

  • 查看IO和SQL线程都是yes代表同步是否正常
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.93.101
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1890
               Relay_Log_File: relay-log-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

  • 必须设置两个从库为只读模式,在两个从库上分别执行
mysql> set global read_only=1;
# 1是只读,0是读写

3.6、测试数据同步

  • master主库插入条数据,测试是否同步
mysql> create database test_db;
Query OK, 1 row affected (0.00 sec)
mysql> use test_db;
Database changed
mysql> create table test(id int);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into test(id) values ('1');
Query OK, 1 row affected (0.01 sec)

  • 两个从库分别查询,出现如下所示执行结果则说明主从同步正常
mysql> select * from test_db.test;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

四、安装MHA软件

  • 四台服务器上都需要安装MHA依赖的环境,首先安epel源
[root@manager ~]# yum -y install epel-release
[root@manager ~]# yum makecache 
[root@manager ~]# yum install -y perl-DBD-MySQL perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-Tiny perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN

  • MHA软件包对于每个操作系统版本不一样,这里CentOS 7.9必须选择0.57版本。在四台服务器上安装node组件,最后在manager节点上安装manager组件,因为manager依赖node组件。下面是master上操作演示安装node组件
[root@master ~]# tar -zxvf mha4mysql-node-0.57.tar.gz 
[root@master ~]# cd mha4mysql-node-0.57/
[root@master mha4mysql-node-0.57]# perl Makefile.PL 
*** Module::AutoInstall version 1.06
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
[root@master mha4mysql-node-0.57]# make && make install
  • manager 上安装manager组件和node组件
[root@manager ~]# tar -zxvf mha4mysql-manager-0.57.tar.gz 
[root@manager ~]# cd mha4mysql-manager-0.57/
[root@manager mha4mysql-manager-0.57]# perl Makefile.PL 
*** Module::AutoInstall version 1.06
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.627)
- DBD::mysql            ...loaded. (4.023)
- Time::HiRes           ...loaded. (1.9725)
- Config::Tiny          ...loaded. (2.14)
- Log::Dispatch         ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.18)
- MHA::NodeConst        ...loaded. (0.57)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::manager
[root@manager mha4mysql-manager-0.57]# make && make install
  • manager安装后在/usr/local/bin下面会生成几个工具,主要包括
# masterha_check_ssh:检查MHA的SSH配置状况
# masterha_check_repl:检查MySQL主从复制状况
# masterha_manager:启动MHA
# masterha_check_status:检查当前MHA运行状态
# masterha_master_monitor:检测master是否宕机
# masterha_master_switch:控制故障转移(自动或者手动)
# masterha_conf_host:添加或删除配置的server信息
  • node安装后也会在/usr/local/bin下面会生成几个脚本(这些工具通常由MHA manager的脚本触发,无需人为操作)
# save_binary_logs:保存和复制master的二进制日志
# apply_diff_relay_logs:识别差异的中继日志事件并将其差异的事件应用于其他的slave
# filter_mysqlbinlog:去除不必要的ORLLBACK事件(MHA已不再使用这个工具)
# purge_relay_logs:清除中继日志(不会阻塞SQL线程)

五、配置无密码登录

作用:实现远程控制和数据管理

  • manager上配置到所有节点的无密码认证
[root@manager ~]# ssh-keygen -t rsa		#一路回车键
[root@manager ~]# ssh-copy-id 192.168.93.101	#master
[root@manager ~]# ssh-copy-id 192.168.93.102	#slave1
[root@manager ~]# ssh-copy-id 192.168.93.103	#slave2
  • master上配置到数据库节点的无密码认证
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id 192.168.93.102		#slave1
[root@master ~]# ssh-copy-id 192.168.93.103		#slave2
  • slave1上配置到数据库节点的无密码认证
[root@slave1 ~]# ssh-keygen -t rsa
[root@slave1 ~]# ssh-copy-id 192.168.93.101		#master
[root@slave1 ~]# ssh-copy-id 192.168.93.103		#slave2
  • slave2上配置到数据库节点的无密码认证
[root@slave2 ~]# ssh-keygen -t rsa
[root@slave2 ~]# ssh-copy-id 192.168.93.101		#master
[root@slave2 ~]# ssh-copy-id 192.168.93.102		#slave1

六、配置MHA

  • manager节点上复制相关脚本到/usr/local/bin目录
[root@manager ~]# ll /root/mha4mysql-manager-0.57/samples/scripts/
总用量 32
-rwxr-xr-x 1 1001 1001  3648 531 2015 master_ip_failover
-rwxr-xr-x 1 1001 1001  9870 531 2015 master_ip_online_change
-rwxr-xr-x 1 1001 1001 11867 531 2015 power_manager
-rwxr-xr-x 1 1001 1001  1360 531 2015 send_report

# 脚本具体作用如下所示
		# master_ip_failover:自动切换VIP管理的脚本
		# master_ip_online_change:在线切换vip的管理的脚本
		# power_manager:故障发生后关闭主机的脚本
		# send_report:因故障切换后发送报警的脚本
[root@manager ~]# cp /root/mha4mysql-manager-0.57/samples/scripts/* /usr/local/bin/
[root@manager ~]# chmodc    +x /usr/local/bin/master_ip_failover 


# MHA引入VIP有两种方式,一种是使用keepalived,一种是使用MHA自带的脚本

# 复制“master_ip_failover”脚本到/usr/local/bin目录,这里使用脚本管理VIP,也是推荐的一种方式,生产环境下不建议使用keepalived
  • 使用下面内容完整替换master_ip_failover文件的内容,IP部分更改为同网段的虚拟IP地址
[root@manager ~]# mv /usr/local/bin/master_ip_failover /opt/master_ip_failover.bak
[root@manager ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
   $command,        $ssh_user,         $orig_master_host,
   $orig_master_ip, $orig_master_port, $new_master_host,
   $new_master_ip,  $new_master_port,  $new_master_user,
   $new_master_password
);

## 添加四行内容,在实际工作中适当修改 
my $vip = '192.168.93.200/24';            
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip up";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip \
--orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
  • 创建MHA软件目录并拷贝配置文件
[root@manager ~]# mkdir /etc/masterha
[root@manager ~]# cp /root/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha/
[root@manager ~]# mkdir -p /var/log/masterha/app1
[root@manager ~]# vim /etc/masterha/app1.cnf 
[server default]
#manager日志
manager_log=/var/log/masterha/app1/manager.log        
#manager工作目录
manager_workdir=/var/log/masterha/app1
#master保存binlog的位置,这里的路径要与master里配置的binlog的路径一致,以便MHA能找到
master_binlog_dir=/usr/local/mysql/data
#设置自动故障切换的脚本
master_ip_failover_script=/usr/local/bin/master_ip_failover
#设置手动故障切换的脚本
master_ip_online_change_script=/usr/local/bin/master_ip_online_change

#登录数据库监控用户
user=mha
#登录数据库监控用户的密码
password=manager
#设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行故障转移
ping_interval=1
#设置远端mysql在发生切换时binlog的保存位置
remote_workdir=/tmp
#设置数据库主从复制的用户
repl_user=myslave
#设置数据库主从复制的用户密码
repl_password=123456
#检查从服务器
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.93.102 -s 192.168.93.103
#设置故障发生后关闭master脚本,防止发生脑裂,此处没有使用
shutdown_script=""
#设置ssh免密登录用户名
ssh_user=root

[server1]
hostname=192.168.93.101
port=3306

[server2]
hostname=192.168.93.102
port=3306
#设置为候选master,发生主从切换以后,强制将此库提升为主库,即使这个数据库不是集群中最新的slave。
candidate_master=1
#默认情况下如果一个slave落后master超过100M的relay logs的话,MHA将不会选择该slave作为一个新的master。该参数设置为0,则切换时会忽略该限制。
check_repl_delay=0

[server3]
hostname=192.168.93.103
port=3306




# candidate_master与check_repl_delay的主要作用如下所示
	# candidate_master=1:设置为候选master,如果设置该参数以后,发生主从切换以后会将此从库提升为主库,即使这和主库不是集群中的最新slave
	# check_repl_delay=0:默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master。因为对于这个slave的恢复需要花费很长事件,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延迟时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master
  • 测试ssh无密码认证,如果正常最后会输出successfully,如下所示
[root@manager ~]# masterha_check_ssh -conf=/etc/masterha/app1.cnf
Wed Apr 17 21:28:19 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Apr 17 21:28:19 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Apr 17 21:28:19 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Apr 17 21:28:19 2024 - [info] Starting SSH connection tests..
Wed Apr 17 21:28:20 2024 - [debug] 
Wed Apr 17 21:28:19 2024 - [debug]  Connecting via SSH from root@192.168.93.101(192.168.93.101:22) to root@192.168.93.102(192.168.93.102:22)..
Wed Apr 17 21:28:19 2024 - [debug]   ok.
Wed Apr 17 21:28:19 2024 - [debug]  Connecting via SSH from root@192.168.93.101(192.168.93.101:22) to root@192.168.93.103(192.168.93.103:22)..
Wed Apr 17 21:28:20 2024 - [debug]   ok.
Wed Apr 17 21:28:20 2024 - [debug] 
Wed Apr 17 21:28:20 2024 - [debug]  Connecting via SSH from root@192.168.93.102(192.168.93.102:22) to root@192.168.93.101(192.168.93.101:22)..
Wed Apr 17 21:28:20 2024 - [debug]   ok.
Wed Apr 17 21:28:20 2024 - [debug]  Connecting via SSH from root@192.168.93.102(192.168.93.102:22) to root@192.168.93.103(192.168.93.103:22)..
Wed Apr 17 21:28:20 2024 - [debug]   ok.
Wed Apr 17 21:28:21 2024 - [debug] 
Wed Apr 17 21:28:20 2024 - [debug]  Connecting via SSH from root@192.168.93.103(192.168.93.103:22) to root@192.168.93.101(192.168.93.101:22)..
Wed Apr 17 21:28:20 2024 - [debug]   ok.
Wed Apr 17 21:28:20 2024 - [debug]  Connecting via SSH from root@192.168.93.103(192.168.93.103:22) to root@192.168.93.102(192.168.93.102:22)..
Wed Apr 17 21:28:21 2024 - [debug]   ok.
Wed Apr 17 21:28:21 2024 - [info] All SSH connection tests passed successfully.
# 成功
  • 测试mysql主从连接情况,最后出现MySQL Replication Health is OK字样说明正常。如下所示。
[root@manager ~]# masterha_check_repl -conf=/etc/masterha/app1.cnf
Wed Apr 17 21:37:23 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Apr 17 21:37:23 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Apr 17 21:37:23 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Apr 17 21:37:23 2024 - [info] MHA::MasterMonitor version 0.57.
Wed Apr 17 21:37:24 2024 - [info] GTID failover mode = 0
Wed Apr 17 21:37:24 2024 - [info] Dead Servers:
Wed Apr 17 21:37:24 2024 - [info] Alive Servers:
Wed Apr 17 21:37:24 2024 - [info]   192.168.93.101(192.168.93.101:3306)
Wed Apr 17 21:37:24 2024 - [info]   192.168.93.102(192.168.93.102:3306)
Wed Apr 17 21:37:24 2024 - [info]   192.168.93.103(192.168.93.103:3306)
Wed Apr 17 21:37:24 2024 - [info] Alive Slaves:
Wed Apr 17 21:37:24 2024 - [info]   192.168.93.102(192.168.93.102:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Apr 17 21:37:24 2024 - [info]     Replicating from 192.168.93.101(192.168.93.101:3306)
Wed Apr 17 21:37:24 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Apr 17 21:37:24 2024 - [info]   192.168.93.103(192.168.93.103:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Apr 17 21:37:24 2024 - [info]     Replicating from 192.168.93.101(192.168.93.101:3306)
Wed Apr 17 21:37:24 2024 - [info] Current Alive Master: 192.168.93.101(192.168.93.101:3306)
Wed Apr 17 21:37:24 2024 - [info] Checking slave configurations..
Wed Apr 17 21:37:24 2024 - [info] Checking replication filtering settings..
Wed Apr 17 21:37:24 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Apr 17 21:37:24 2024 - [info]  Replication filtering check ok.
Wed Apr 17 21:37:24 2024 - [info] GTID (with auto-pos) is not supported
Wed Apr 17 21:37:24 2024 - [info] Starting SSH connection tests..
Wed Apr 17 21:37:26 2024 - [info] All SSH connection tests passed successfully.
Wed Apr 17 21:37:26 2024 - [info] Checking MHA Node version..
Wed Apr 17 21:37:26 2024 - [info]  Version check ok.
Wed Apr 17 21:37:26 2024 - [info] Checking SSH publickey authentication settings on the current master..
Wed Apr 17 21:37:26 2024 - [info] HealthCheck: SSH to 192.168.93.101 is reachable.
Wed Apr 17 21:37:26 2024 - [info] Master MHA Node version is 0.57.
Wed Apr 17 21:37:26 2024 - [info] Checking recovery script configurations on 192.168.93.101(192.168.93.101:3306)..
Wed Apr 17 21:37:26 2024 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000001 
Wed Apr 17 21:37:26 2024 - [info]   Connecting to root@192.168.93.101(192.168.93.101:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/data, up to mysql-bin.000001
Wed Apr 17 21:37:27 2024 - [info] Binlog setting check done.
Wed Apr 17 21:37:27 2024 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Apr 17 21:37:27 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.93.102 --slave_ip=192.168.93.102 --slave_port=3306 --workdir=/tmp --target_version=5.7.28-log --manager_version=0.57 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Wed Apr 17 21:37:27 2024 - [info]   Connecting to root@192.168.93.102(192.168.93.102:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000002
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000002
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Apr 17 21:37:27 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.93.103 --slave_ip=192.168.93.103 --slave_port=3306 --workdir=/tmp --target_version=5.7.28-log --manager_version=0.57 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Wed Apr 17 21:37:27 2024 - [info]   Connecting to root@192.168.93.103(192.168.93.103:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000002
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000002
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Apr 17 21:37:27 2024 - [info] Slaves settings check done.
Wed Apr 17 21:37:27 2024 - [info] 
192.168.93.101(192.168.93.101:3306) (current master)
 +--192.168.93.102(192.168.93.102:3306)
 +--192.168.93.103(192.168.93.103:3306)

Wed Apr 17 21:37:27 2024 - [info] Checking replication health on 192.168.93.102..
Wed Apr 17 21:37:27 2024 - [info]  ok.
Wed Apr 17 21:37:27 2024 - [info] Checking replication health on 192.168.93.103..
Wed Apr 17 21:37:27 2024 - [info]  ok.
Wed Apr 17 21:37:27 2024 - [info] Checking master_ip_failover_script status:
Wed Apr 17 21:37:27 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.93.101 --orig_master_ip=192.168.93.101 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.93.200/24 up===

Checking the Status of the script.. OK 
Wed Apr 17 21:37:27 2024 - [info]  OK.
Wed Apr 17 21:37:27 2024 - [warning] shutdown_script is not defined.
Wed Apr 17 21:37:27 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
  • 如果在执行的过程中,如果有如下报错信息;
Can't exec "mysqlbinlog": 没有那个文件或目录 at /usr/local/share/perl5/MHA/BinlogManager.pm line 106.

# 解决方案:
采用设置软连接的方式解决,三台MySQL上都需要执行
[root@master ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@master ~]# ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql

  • 如果在执行过程中,如果有如下报错信息;
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line
100.

# 解决方案:
在 manager 机器上面使用如下解决方法。
[root@manager ~]# vim /usr/local/bin/master_ip_failover
## Update master ip on the catalog database, etc
#FIXME_xxx; //将此行注释 99行左右
  • 首次配置MHA的VIP地址需要手动进行配置,在master上执行如下命令
[root@master ~]# ifconfig ens33:1 192.168.93.200
[root@master ~]# ifconfig ens33:1
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.93.200  netmask 255.255.255.0  broadcast 192.168.93.255
        ether 00:0c:29:cd:7a:7a  txqueuelen 1000  (Ethernet)
  • manager启动MHA
[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover  </dev/null>/var/log/masterha/app1/manager.log 2>&1 &

配置参数具体作用如下。
 --remove_dead_master_conf 该参数代表当发生主从切换后,老的主库的 IP 地址将会
从配置文件中移除。
--ignore_last_failover 在缺省情况下,如果 MHA 检测到连续发生宕机,且两次宕机间
隔不足 8 小时的话,则不会进行故障转移,之所以这样限制是为了避免 ping-pong 效
应。该参数代表忽略上次 MHA 触发切换产生的文件,默认情况下,MHA 发生切换后
会在日志中记录,下次再切换的时候如果发现该目录下存在该文件将不允许触发切换,
除非
在第一次切换后收到删除该文件。为了方便,这里设置为–ignore_last_failover。


# 若要关闭manager服务,可以使用如下命令。
[root@manager ~]# masterha_stop --conf=/etc/masterha/app1.cnf
# 或者直接kill进程ID的方式关闭

  • 查看MHA状态,可以看到当前的master节点
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:12766) is running(0:PING_OK), master:192.168.93.101

# 也可以通过日志进行查看
[root@manager ~]# cat /var/log/masterha/app1/manager.log

七、模拟master故障转移

7.1、自动切换

7.1.1、关闭当前的master服务
# 以下三种关闭mysql的方法任选其一
[root@master ~]# systemctl stop mysqld
[root@master ~]# kill -9 mysql
[root@master ~]# mysqladmin -u root -p wzh.2005 shutdown

7.2.2、manager观察MHA日志
# 大概日志文件中出现以下两个日志可以断定自动切换成功
[root@manager ~]# cat /var/log/masterha/app1/manager.log | grep "Enabling"
Enabling the VIP - 192.168.93.200/24 on the new master - 192.168.93.102 
[root@manager ~]# cat /var/log/masterha/app1/manager.log | grep "Selected"
Selected 192.168.93.102(192.168.93.102:3306) as a new master.

  • 注意:VIP地址不会因为MHA服务停止而消失
7.2.3、查看slave1是否接管VIP地址
  • 正常自动切换一次后,MHA进程会退出。MHA会自动修改app1.cnf文件内容将宕机的mysql节点删除。查看slave1是否接管VIP地址
  • 如果主库恢复正常后,使其重新加入群集,并将master作为新的从库
[root@slave1 ~]# ifconfig ens33:1
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.93.200  netmask 255.255.255.0  broadcast 192.168.93.255
        ether 00:0c:29:c2:6c:57  txqueuelen 1000  (Ethernet)

7.2.4、启动原来的master主库
  • 先查看新的master(slave1)上面的二进制日志和pos位置
[root@slave1 ~]# mysql -u root -pwzh.2005
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |     1593 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

  • master(新从)授权
[root@master ~]# systemctl start mysqld
[root@master ~]# mysql -u root -pwzh.2005
mysql> change master to master_host='192.168.93.102',master_user='myslave',master_password='123456',  master_log_file='mysql-bin.000001',master_log_pos=1593;
Query OK, 0 rows affected, 2 warnings (0.01 sec)


mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.93.102
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1593
               Relay_Log_File: master-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
mysql> set global read_only=1;
7.2.5、停掉slave1(新库)的同步进行
  • 停掉当前主库(slave1)的同步进程,不然下次作为从库同步会报错
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

# 这两个命令用于停止并重置MySQL的复制进程。stop slave命令用于停止正在运行的复制进程,而reset slave命令用于清除复制进程的所有配置信息,包括主从服务器的连接信息、复制位置等,相当于将复制进程重置为初始状态。

7.2、手动切换

7.2.1、修改配置文件
  • 手动修改/etc/masterha/app1.cnf文件,可以手动把宕机的master作为主库继续提供服务,注意手动切换VIP不会漂移。重新检查数据库主从状态是否正常,修改后的app1.cnf文件内容如下:
[root@manager ~]# vim /etc/masterha/app1.cnf 
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.93.102 -s 192.168.93.103
shutdown_script=""
ssh_user=root
user=mha
###############################
[server1]
candidate_master=1
check_repl_delay=0
hostname=192.168.93.101
port=3306
###############################
[server2]
#candidate_master=1
#check_repl_delay=0
hostname=192.168.93.102
port=3306

[server3]
hostname=192.168.93.103
port=3306

7.2.3、启动MHA检查是否正常
  • 在启动MHA之前检查一次无密码认证和MySQL主从状态是否正常,请参考之前的步骤,最后启动MHA检查是否正常
[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover  </dev/null>/var/log/masterha/app1/manager.log 2>&1 &

7.2.4、查看当前主库是slave1
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:13337) is running(0:PING_OK), master:192.168.93.102

7.2.5、停止MHA
  • 注意:手动切换master,必须先关闭mha,然后再启动
[root@manager ~]# masterha_stop --conf=/etc/masterha/app1.cnf

7.2.6、手动设置主库
  • 手动设置当前的主库slave1为dead,最后报错没有影响
[root@manager ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=dead --dead_master_host=192.168.93.102
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.93.102.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Wed Apr 17 22:12:36 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Apr 17 22:12:36 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Apr 17 22:12:36 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Apr 17 22:12:36 2024 - [info] MHA::MasterFailover version 0.57.
Wed Apr 17 22:12:36 2024 - [info] Starting master failover.
Wed Apr 17 22:12:36 2024 - [info] 
Wed Apr 17 22:12:36 2024 - [info] * Phase 1: Configuration Check Phase..
Wed Apr 17 22:12:36 2024 - [info] 
Wed Apr 17 22:12:37 2024 - [info] GTID failover mode = 0
Wed Apr 17 22:12:37 2024 - [info] Dead Servers:
Wed Apr 17 22:12:37 2024 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.
Wed Apr 17 22:12:37 2024 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53.
# 最后报错没有影响
7.2.7、查看主库状态
  • 再次检查主库的状态,说明slave1已经停止
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

7.2.8、设置新的master
  • 设置新的master,在输入命令后会有交互需要输入yes。另外需要注释手动切换脚本/usr/local/bin/master_ip_online_change里面的152行,不然会报错
  • 解决方案如下:
[root@manager ~]# vim /usr/local/bin/master_ip_online_change 
#FIXME_xxx_drop_app_user($orig_master_handler);		#默认152行

7.2.9、重新设置新的master为alive命令
  • 重新执行设置新的master为alive命令,出现completed successf字样,表示手动切换成功
[root@manager ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.93.101 --orig_master_is_new_slave
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.93.102(192.168.93.102:3306)? (YES/no): yes
Starting master switch from 192.168.93.102(192.168.93.102:3306) to 192.168.93.101(192.168.93.101:3306)? (yes/NO): yes
Wed Apr 17 22:20:05 2024 - [info] Switching master to 192.168.93.101(192.168.93.101:3306) completed successfully.
# 部分内容省略
  • 注意:手动切换master后,记得重新恢复主从复制群集;注意手动切换主库,VIP不会漂移
7.2.10、登录新的master
  • 登录到新的master,停掉当前主库的同步进行
[root@master ~]# mysql -u root -pwzh.2005
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

7.2.11、手动解除VIP地址,并绑定到新的master上
[root@slave1 ~]# ifconfig ens33:1 down		#93.102关闭VIP地址
[root@master ~]# ifconfig ens33:1 192.168.93.200 #93.101绑定VIP地址
  • 在manager上开启MHA,查看当前master
[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover  </dev/null>/var/log/masterha/app1/manager.log 2>&1 &

[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:13605) is running(0:PING_OK), master:192.168.93.101

  • 至此,MySQL MHA可高用环境搭建及故障切换完成。
  • 4
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值