MySQL高可用自动故障恢复笔记(MHA)

2 篇文章 0 订阅
1 篇文章 0 订阅

一、MHA架构

MHA-Manager

  1. MHA 可以在几秒钟内完成故障转移(9-12 秒检测 master 故障,可选 7-10 秒关闭 master 机器以避免裂脑,几秒钟将差分中继日志应用到新 master,因此总停机时间为一般10-30秒),只要slave不严重延迟复制。在恢复新的 master 后,MHA 并行恢复其余的 slave。即使你有几十个slave,也不影响master恢复时间,你可以很快恢复slave。
  2. 在 150 多个 {master, slaves} 环境中使用 MHA。当其中一个主节点崩溃时,MHA 在 4 秒内完成了故障转移。
a. Master crash 不会导致数据不一致

当前master崩溃时,MHA会自动识别slave之间的差异中继日志事件,并应用于每个slave。所以最后所有从服务器都可以同步,只要所有从服务器都活着。通过与半同步复制一起使用,也可以保证(几乎)没有数据丢失。

b. 没有性能损失

MHA 与常规异步或半同步 MySQL 复制一起使用。在监控主服务器时,MHA 只是每 N 秒(默认为 3)向主服务器发送 ping 数据包,并且不会发送大量查询。您可以期待与常规 MySQL 复制一样快的性能。

c. 适用于任何存储引擎

MHA 适用于任何存储引擎,只要 MySQL 复制有效,不限于 InnoDB(崩溃安全的事务存储引擎)。即使您使用不易迁移的旧版 MyISAM 环境,您也可以使用 MHA。

二、MHA服务角色

  1. MHA Manager:监控MySQL master、控制master故障切换等管理程序
  2. MHA Node:故障转移辅助脚本,例如解析 MySQL 二进制(binary log)/中继日志(relay log)、识别应将中继日志应用到其他从站的中继日志位置、将事件应用到目标从站等。 MHA 节点在每个 MySQL 服务器上运行

当 MHA 管理器进行故障转移时,MHA 管理器通过 SSH 连接 MHA 节点并在需要时执行 MHA 节点命令。

a. Manager节点脚本

masterha_check_ssh:MHA 依赖的 ssh 环境监测工具

masterha_check_repl:MYSQL 复制环境检测工具

masterga_manager:MHA 服务主程序

masterha_check_status:MHA 运行状态探测工具

masterha_master_monitor:MYSQL master 节点可用性监测工具

masterha_master_swith:master:节点切换工具

masterha_conf_host:添加或删除配置的节点

masterha_stop:关闭 MHA 服务的工具

b. Node节点脚本

(通常由MHA Manager的脚本触发,无需人为操作)

save_binary_logs:保存和复制 master 的二进制日志

apply_diff_relay_logs:识别差异的中继日志事件并应用于其他 slave

purge_relay_logs:清除中继日志(不会阻塞 SQL 线程)

c. 自定义扩展脚本

secondary_check_script:用于检查来自多个网络路由的 master 可用性

master_ip_failover_script: 用于更新应用程序使用的 master ip 地址

shutdown_script:强制关闭master

report_script:用于发送报告

init_conf_load_script:用于加载初始配置参数

master_ip_online_change_script:用于更新主IP地址。这不用于主故障转移,而是用于在线主交换机

三、安装MySQL

1. 节点规划
资源版本
Manager(MHA管理节点)4c8g0.58
Master4c8g(根据mysql实际使用情况)5.7.34
Slave14c8g(根据mysql实际使用情况)5.7.34
Slave24c8g(根据mysql实际使用情况)5.7.34
2. 安装mysql(此文mysql版本为mysqk5.7.34)
#添加源
vim /etc/yum.repo.d/mariadb.repo
[mariadb]
name = MariaDB 
baseurl = https://mirrors.ustc.edu.cn/mariadb/yum/10.4/centos7-amd64 gpgkey=https://mirrors.ustc.edu.cn/mariadb/yum/RPM-GPG-KEY-MariaDB 
gpgcheck=1
# 安装
yum -y install MariaDB-server MariaDB-client
# 初始化
mysql_secure_installation
3. master节点配置my.cnf
#配置my.cnf
[mysqld]
server-id = 1               //复制集群中的各节点的id均必须唯一
log-bin = master-log        //开启二进制日志
relay-log = relay-log       //开启中继日志
skip_name_resolve           //关闭名称解析(非必须)
4. 所有slave节点配置my.cnf
[root@slave1 ~]# vim /etc/my.cnf
    [mysqld]
    server-id = 2               //复制集群中的各节点的id均必须唯一;
    relay-log = relay-log       //开启中继日志
    log-bin = master-log        //开启二进制日志
    read_only = ON              //启用只读属性
    relay_log_purge = 0         //是否自动清空不再需要中继日志
    skip_name_resolve           //关闭名称解析(非必须)
    log_slave_updates = 1       //使得更新的数据写进二进制日志中
[root@slave1 ~]# systemctl restart mariadb
[root@slave2 ~]# vim /etc/my.cnf
    [mysqld]
    server-id = 3               //复制集群中的各节点的id均必须唯一;
    relay-log = relay-log       //开启中继日志
    log-bin = master-log        //开启二进制日志
    read_only = ON              //启用只读属性
    relay_log_purge = 0         //是否自动清空不再需要中继日志
    skip_name_resolve           //关闭名称解析(非必须)
    log_slave_updates = 1       //使得更新的数据写进二进制日志中
[root@slave2 ~]# systemctl restart mariadb
5. 配置一主多从mysql
# master节点
# 授权同步数据账号
[root@master ~]# mysql -uroot -p'keer'
MariaDB [(none)]>grant replication slave,replication client on *.* to 'slave'@'192.168.%.%' identified by 'keer';
# 备份数据导出到从库
[root@master ~]# mysqldump -uroot -p'keer' --all-databases > `date +%F`-mysql-all.sql
[root@master ~]# scp *mysql-all.sql  192.168.20.100:/root
[root@master ~]# mysql -uroot -p'keer'
MariaDB [(none)]> show master status;
# slave节点
#导入数据
[root@slave1 ~]# mysql -uroot -p'keer' < *mysql-all.sql
[root@slave1 ~]# mysql -uroot -p'keer'
MariaDB [(none)]> change master to master_host='192.168.37.122', 
    -> master_user='slave', 
    -> master_password='keer',
    -> master_log_file='mysql-bin.000001',
    -> master_log_pos=415;
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status\G;

四、安装MHA-manager(manager管理节点)、MHA-node(所有mysql节点)

MHA下载链接:https://github.com/yoshinorim/mha4mysql-manager/releases/tag/v0.58

1. 操作系统依赖

yum install epel-release -y

yum -y install libmysqlclient.so.18

yum install perl-DBD-MySQL -y

yum install perl-Config-Tiny -y

yum install perl-Log-Dispatch -y

yum install perl-Parallel-ForkManager -y

rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

所有节点对向免密

2. 配置mha.cnf
[server default]
user=mhaadmin
password=MHAP@ssw0rd_2021
manager_workdir=/var/lib/mha-manager/app
manager_log=/etc/mha_master/manager.log
remote_workdir=/var/lib/mha
ssh_user=root
repl_user=slave
repl_password=P@ssw0rd_2021
ping_interval=3
master_ip_failover_script=/etc/mha_master/master_ip_failover.pl
[server1]
hostname=111.16.100.253
ssh_port=22
candidate_master=1
[server2]
hostname=111.16.100.224
ssh_port=22
candidate_master=1
[server3]
hostname=111.16.101.73
ssh_port=22
candidate_master=1

配置说明参考地址:https://github.com/yoshinorim/mha4mysql-manager/wiki/Configuration

3. master故障ip切换脚本

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
# 这里的vip和网卡需要自己根据服务器网络配置
my $vip = '111.16.100.189/22';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}


sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
   `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

五、检查MHA

检查主机之间是否完成对向免密配置

masterha_check_ssh

$ > masterha_check_ssh --conf=/etc/mha_master/mha.cnf
Mon Sep 27 16:32:07 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 27 16:32:07 2021 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Mon Sep 27 16:32:07 2021 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Mon Sep 27 16:32:07 2021 - [info] Starting SSH connection tests..
Mon Sep 27 16:32:08 2021 - [debug]
Mon Sep 27 16:32:07 2021 - [debug]  Connecting via SSH from root@111.16.100.253(111.16.100.253:22) to root@111.16.100.224(111.16.100.224:22)..
Mon Sep 27 16:32:07 2021 - [debug]   ok.
Mon Sep 27 16:32:07 2021 - [debug]  Connecting via SSH from root@111.16.100.253(111.16.100.253:22) to root@111.16.101.73(111.16.101.73:22)..
Mon Sep 27 16:32:08 2021 - [debug]   ok.
Mon Sep 27 16:32:08 2021 - [debug]
Mon Sep 27 16:32:07 2021 - [debug]  Connecting via SSH from root@111.16.100.224(111.16.100.224:22) to root@111.16.100.253(111.16.100.253:22)..
Mon Sep 27 16:32:08 2021 - [debug]   ok.
Mon Sep 27 16:32:08 2021 - [debug]  Connecting via SSH from root@111.16.100.224(111.16.100.224:22) to root@111.16.101.73(111.16.101.73:22)..
Mon Sep 27 16:32:08 2021 - [debug]   ok.
Mon Sep 27 16:32:09 2021 - [debug]
Mon Sep 27 16:32:08 2021 - [debug]  Connecting via SSH from root@111.16.101.73(111.16.101.73:22) to root@111.16.100.253(111.16.100.253:22)..
Mon Sep 27 16:32:08 2021 - [debug]   ok.
Mon Sep 27 16:32:08 2021 - [debug]  Connecting via SSH from root@111.16.101.73(111.16.101.73:22) to root@111.16.100.224(111.16.100.224:22)..
Mon Sep 27 16:32:08 2021 - [debug]   ok.
Mon Sep 27 16:32:09 2021 - [info] All SSH connection tests passed successfully.

检查replication是否正常

$ > masterha_check_repl --conf=/etc/mha_master/mha.cnf
Mon Sep 27 16:32:46 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 27 16:32:46 2021 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Mon Sep 27 16:32:46 2021 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Mon Sep 27 16:32:46 2021 - [info] MHA::MasterMonitor version 0.58.
Mon Sep 27 16:32:47 2021 - [info] GTID failover mode = 0
Mon Sep 27 16:32:47 2021 - [info] Dead Servers:
Mon Sep 27 16:32:47 2021 - [info] Alive Servers:
Mon Sep 27 16:32:47 2021 - [info]   111.16.100.253(111.16.100.253:3306)
Mon Sep 27 16:32:47 2021 - [info]   111.16.100.224(111.16.100.224:3306)
Mon Sep 27 16:32:47 2021 - [info]   111.16.101.73(111.16.101.73:3306)
Mon Sep 27 16:32:47 2021 - [info] Alive Slaves:
Mon Sep 27 16:32:47 2021 - [info]   111.16.100.253(111.16.100.253:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 16:32:47 2021 - [info]     Replicating from 111.16.100.224(111.16.100.224:3306)
Mon Sep 27 16:32:47 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 16:32:47 2021 - [info]   111.16.101.73(111.16.101.73:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 16:32:47 2021 - [info]     Replicating from 111.16.100.224(111.16.100.224:3306)
Mon Sep 27 16:32:47 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 16:32:47 2021 - [info] Current Alive Master: 111.16.100.224(111.16.100.224:3306)
Mon Sep 27 16:32:47 2021 - [info] Checking slave configurations..
Mon Sep 27 16:32:47 2021 - [warning]  relay_log_purge=0 is not set on slave 111.16.100.253(111.16.100.253:3306).
Mon Sep 27 16:32:47 2021 - [info] Checking replication filtering settings..
Mon Sep 27 16:32:47 2021 - [info]  binlog_do_db= , binlog_ignore_db=
Mon Sep 27 16:32:47 2021 - [info]  Replication filtering check ok.
Mon Sep 27 16:32:47 2021 - [info] GTID (with auto-pos) is not supported
Mon Sep 27 16:32:47 2021 - [info] Starting SSH connection tests..
Mon Sep 27 16:32:50 2021 - [info] All SSH connection tests passed successfully.
Mon Sep 27 16:32:50 2021 - [info] Checking MHA Node version..
Mon Sep 27 16:32:51 2021 - [info]  Version check ok.
Mon Sep 27 16:32:51 2021 - [info] Checking SSH publickey authentication settings on the current master..
Mon Sep 27 16:32:51 2021 - [info] HealthCheck: SSH to 111.16.100.224 is reachable.
Mon Sep 27 16:32:51 2021 - [info] Master MHA Node version is 0.58.
Mon Sep 27 16:32:51 2021 - [info] Checking recovery script configurations on 111.16.100.224(111.16.100.224:3306)..
Mon Sep 27 16:32:51 2021 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/lib/mha/save_binary_logs_test --manager_version=0.58 --start_file=master-log.000003
Mon Sep 27 16:32:51 2021 - [info]   Connecting to root@111.16.100.224(111.16.100.224:22)..
  Creating /var/lib/mha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to master-log.000003
Mon Sep 27 16:32:51 2021 - [info] Binlog setting check done.
Mon Sep 27 16:32:51 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Sep 27 16:32:51 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhaadmin' --slave_host=111.16.100.253 --slave_ip=111.16.100.253 --slave_port=3306 --workdir=/var/lib/mha --target_version=5.7.34-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Mon Sep 27 16:32:51 2021 - [info]   Connecting to root@111.16.100.253(111.16.100.253:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000002
    Temporary relay log file is /var/lib/mysql/relay-log.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Sep 27 16:32:52 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhaadmin' --slave_host=111.16.101.73 --slave_ip=111.16.101.73 --slave_port=3306 --workdir=/var/lib/mha --target_version=5.7.34-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Mon Sep 27 16:32:52 2021 - [info]   Connecting to root@111.16.101.73(111.16.101.73:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000002
    Temporary relay log file is /var/lib/mysql/relay-log.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Sep 27 16:32:52 2021 - [info] Slaves settings check done.
Mon Sep 27 16:32:52 2021 - [info]
111.16.100.224(111.16.100.224:3306) (current master)
 +--111.16.100.253(111.16.100.253:3306)
 +--111.16.101.73(111.16.101.73:3306)

Mon Sep 27 16:32:52 2021 - [info] Checking replication health on 111.16.100.253..
Mon Sep 27 16:32:52 2021 - [info]  ok.
Mon Sep 27 16:32:52 2021 - [info] Checking replication health on 111.16.101.73..
Mon Sep 27 16:32:52 2021 - [info]  ok.
Mon Sep 27 16:32:52 2021 - [info] Checking master_ip_failover_script status:
Mon Sep 27 16:32:52 2021 - [info]   /etc/mha_master/master_ip_failover.pl --command=status --ssh_user=root --orig_master_host=111.16.100.224 --orig_master_ip=111.16.100.224 --orig_master_port=3306
Checking the Status of the script.. OK
Mon Sep 27 16:32:53 2021 - [info]  OK.
Mon Sep 27 16:32:53 2021 - [warning] shutdown_script is not defined.
Mon Sep 27 16:32:53 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

启动MHA

nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /var/log/mha-manager/manager.log &

六、测试mysql-master故障恢复

1. 停止master mysql节点,查看mha-manager日志
Mon Sep 27 14:50:08 2021 - [info] Set master ping interval 3 seconds.
Mon Sep 27 14:50:08 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Sep 27 14:50:08 2021 - [info] Starting ping health check on 111.16.100.253(111.16.100.253:3306)..
Mon Sep 27 14:50:08 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '111.16.100.253' (111))
Mon Sep 27 14:50:08 2021 - [warning] Connection failed 1 time(s)..
Mon Sep 27 14:50:08 2021 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/lib/mha/save_binary_logs_test --manager_version=0.58 --binlog_prefix=master-log
Mon Sep 27 14:50:09 2021 - [info] HealthCheck: SSH to 111.16.100.253 is reachable.
Mon Sep 27 14:50:11 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '111.16.100.253' (111))
Mon Sep 27 14:50:11 2021 - [warning] Connection failed 2 time(s)..
Mon Sep 27 14:50:14 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '111.16.100.253' (111))
Mon Sep 27 14:50:14 2021 - [warning] Connection failed 3 time(s)..
Mon Sep 27 14:50:17 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '111.16.100.253' (111))
Mon Sep 27 14:50:17 2021 - [warning] Connection failed 4 time(s)..
Mon Sep 27 14:50:17 2021 - [warning] Master is not reachable from health checker!
Mon Sep 27 14:50:17 2021 - [warning] Master 111.16.100.253(111.16.100.253:3306) is not reachable!

Mon Sep 27 14:50:17 2021 - [warning] SSH is reachable.
Mon Sep 27 14:50:17 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha_master/mha.cnf again, and trying to connect to all servers to check server status..
Mon Sep 27 14:50:17 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 27 14:50:17 2021 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Mon Sep 27 14:50:17 2021 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Mon Sep 27 14:50:18 2021 - [info] GTID failover mode = 0
Mon Sep 27 14:50:18 2021 - [info] Dead Servers:
Mon Sep 27 14:50:18 2021 - [info]   111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:18 2021 - [info] Alive Servers:
Mon Sep 27 14:50:18 2021 - [info]   111.16.100.224(111.16.100.224:3306)
Mon Sep 27 14:50:18 2021 - [info]   111.16.101.73(111.16.101.73:3306)
Mon Sep 27 14:50:18 2021 - [info] Alive Slaves:
Mon Sep 27 14:50:18 2021 - [info]   111.16.100.224(111.16.100.224:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:18 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:18 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:18 2021 - [info]   111.16.101.73(111.16.101.73:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:18 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:18 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:18 2021 - [info] Checking slave configurations..
Mon Sep 27 14:50:18 2021 - [info]  read_only=1 is not set on slave 111.16.100.224(111.16.100.224:3306).
Mon Sep 27 14:50:18 2021 - [info] Checking replication filtering settings..
Mon Sep 27 14:50:18 2021 - [info]  Replication filtering check ok.
Mon Sep 27 14:50:18 2021 - [info] Master is down!
Mon Sep 27 14:50:18 2021 - [info] Terminating monitoring script.
Mon Sep 27 14:50:18 2021 - [info] Got exit code 20 (Master dead).
Mon Sep 27 14:50:18 2021 - [info] MHA::MasterFailover version 0.58.
Mon Sep 27 14:50:18 2021 - [info] Starting master failover.
Mon Sep 27 14:50:18 2021 - [info]
Mon Sep 27 14:50:18 2021 - [info] * Phase 1: Configuration Check Phase..
Mon Sep 27 14:50:18 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] GTID failover mode = 0
Mon Sep 27 14:50:20 2021 - [info] Dead Servers:
Mon Sep 27 14:50:20 2021 - [info]   111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info] Checking master reachability via MySQL(double check)...
Mon Sep 27 14:50:20 2021 - [info]  ok.
Mon Sep 27 14:50:20 2021 - [info] Alive Servers:
Mon Sep 27 14:50:20 2021 - [info]   111.16.100.224(111.16.100.224:3306)
Mon Sep 27 14:50:20 2021 - [info]   111.16.101.73(111.16.101.73:3306)
Mon Sep 27 14:50:20 2021 - [info] Alive Slaves:
Mon Sep 27 14:50:20 2021 - [info]   111.16.100.224(111.16.100.224:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:20 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:20 2021 - [info]   111.16.101.73(111.16.101.73:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:20 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:20 2021 - [info] Starting Non-GTID based failover.
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Sep 27 14:50:20 2021 - [info] Executing master IP deactivation script:
Mon Sep 27 14:50:20 2021 - [info]   /etc/mha_master/master_ip_failover.pl --orig_master_host=111.16.100.253 --orig_master_ip=111.16.100.253 --orig_master_port=3306 --command=stopssh --ssh_user=root
Mon Sep 27 14:50:20 2021 - [info]  done.
Mon Sep 27 14:50:20 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Sep 27 14:50:20 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] * Phase 3: Master Recovery Phase..
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] The latest binary log file/position on all slaves is master-log.000007:154
Mon Sep 27 14:50:20 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Sep 27 14:50:20 2021 - [info]   111.16.100.224(111.16.100.224:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:20 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:20 2021 - [info]   111.16.101.73(111.16.101.73:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:20 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:20 2021 - [info] The oldest binary log file/position on all slaves is master-log.000007:154
Mon Sep 27 14:50:20 2021 - [info] Oldest slaves:
Mon Sep 27 14:50:20 2021 - [info]   111.16.100.224(111.16.100.224:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:20 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:20 2021 - [info]   111.16.101.73(111.16.101.73:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:20 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:20 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Sep 27 14:50:20 2021 - [info]
Mon Sep 27 14:50:20 2021 - [info] Fetching dead master's binary logs..
Mon Sep 27 14:50:20 2021 - [info] Executing command on the dead master 111.16.100.253(111.16.100.253:3306): save_binary_logs --command=save --start_file=master-log.000007  --start_pos=154 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
  Creating /var/lib/mha if not exists..    ok.
 Concat binary/relay logs from master-log.000007 pos 154 to master-log.000007 EOF into /var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  Dumping effective binlog data from /var/lib/mysql/master-log.000007 position 154 to tail(177).. ok.
 Binlog Checksum enabled
 Concat succeeded.
Mon Sep 27 14:50:21 2021 - [info] scp from root@111.16.100.253:/var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog to local:/var/lib/mha-manager/app/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog succeeded.
Mon Sep 27 14:50:21 2021 - [info] HealthCheck: SSH to 111.16.100.224 is reachable.
Mon Sep 27 14:50:21 2021 - [info] HealthCheck: SSH to 111.16.101.73 is reachable.
Mon Sep 27 14:50:22 2021 - [info]
Mon Sep 27 14:50:22 2021 - [info] * Phase 3.3: Determining New Master Phase..
Mon Sep 27 14:50:22 2021 - [info]
Mon Sep 27 14:50:22 2021 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Mon Sep 27 14:50:22 2021 - [info] All slaves received relay logs to the same position. No need to resync each other.
Mon Sep 27 14:50:22 2021 - [info] Searching new master from slaves..
Mon Sep 27 14:50:22 2021 - [info]  Candidate masters from the configuration file:
Mon Sep 27 14:50:22 2021 - [info]   111.16.100.224(111.16.100.224:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:22 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:22 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:22 2021 - [info]   111.16.101.73(111.16.101.73:3306)  Version=5.7.34-log (oldest major version between slaves) log-bin:enabled
Mon Sep 27 14:50:22 2021 - [info]     Replicating from 111.16.100.253(111.16.100.253:3306)
Mon Sep 27 14:50:22 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Sep 27 14:50:22 2021 - [info]  Non-candidate masters:
Mon Sep 27 14:50:22 2021 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Sep 27 14:50:22 2021 - [info] New master is 111.16.100.224(111.16.100.224:3306)
Mon Sep 27 14:50:22 2021 - [info] Starting master failover..
Mon Sep 27 14:50:22 2021 - [info]
From:
111.16.100.253(111.16.100.253:3306) (current master)
 +--111.16.100.224(111.16.100.224:3306)
 +--111.16.101.73(111.16.101.73:3306)

To:
111.16.100.224(111.16.100.224:3306) (new master)
 +--111.16.101.73(111.16.101.73:3306)
Mon Sep 27 14:50:22 2021 - [info]
Mon Sep 27 14:50:22 2021 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Mon Sep 27 14:50:22 2021 - [info]
Mon Sep 27 14:50:22 2021 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Sep 27 14:50:22 2021 - [info] Sending binlog..
Mon Sep 27 14:50:22 2021 - [info] scp from local:/var/lib/mha-manager/app/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog to root@111.16.100.224:/var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog succeeded.
Mon Sep 27 14:50:22 2021 - [info]
Mon Sep 27 14:50:22 2021 - [info] * Phase 3.5: Master Log Apply Phase..
Mon Sep 27 14:50:22 2021 - [info]
Mon Sep 27 14:50:22 2021 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Sep 27 14:50:22 2021 - [info] Starting recovery on 111.16.100.224(111.16.100.224:3306)..
Mon Sep 27 14:50:22 2021 - [info]  Generating diffs succeeded.
Mon Sep 27 14:50:22 2021 - [info] Waiting until all relay logs are applied.
Mon Sep 27 14:50:22 2021 - [info]  done.
Mon Sep 27 14:50:22 2021 - [info] Getting slave status..
Mon Sep 27 14:50:22 2021 - [info] This slave(111.16.100.224)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(master-log.000007:154). No need to recover from Exec_Master_Log_Pos.
Mon Sep 27 14:50:22 2021 - [info] Connecting to the target slave host 111.16.100.224, running recover script..
Mon Sep 27 14:50:22 2021 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mhaadmin' --slave_host=111.16.100.224 --slave_ip=111.16.100.224  --slave_port=3306 --apply_files=/var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog --workdir=/var/lib/mha --target_version=5.7.34-log --timestamp=20210927145018 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx
Mon Sep 27 14:50:22 2021 - [info]
MySQL client version is 5.7.34. Using --binary-mode.
Applying differential binary/relay log files /var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog on 111.16.100.224:3306. This may take long time...
Applying log files succeeded.
Mon Sep 27 14:50:22 2021 - [info]  All relay logs were successfully applied.
Mon Sep 27 14:50:22 2021 - [info] Getting new master's binlog name and position..
Mon Sep 27 14:50:22 2021 - [info]  master-log.000003:154
Mon Sep 27 14:50:22 2021 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='111.16.100.224', MASTER_PORT=3306, MASTER_LOG_FILE='master-log.000003', MASTER_LOG_POS=154, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Mon Sep 27 14:50:22 2021 - [info] Executing master IP activate script:
Mon Sep 27 14:50:22 2021 - [info]   /etc/mha_master/master_ip_failover.pl --command=start --ssh_user=root --orig_master_host=111.16.100.253 --orig_master_ip=111.16.100.253 --orig_master_port=3306 --new_master_host=111.16.100.224 --new_master_ip=111.16.100.224 --new_master_port=3306 --new_master_user='mhaadmin'   --new_master_password=xxx
Enabling the VIP - 111.16.100.189/24 on the new master - 111.16.100.224
Mon Sep 27 14:50:23 2021 - [info]  OK.
Mon Sep 27 14:50:23 2021 - [info] ** Finished master recovery successfully.
Mon Sep 27 14:50:23 2021 - [info] * Phase 3: Master Recovery Phase completed.
Mon Sep 27 14:50:23 2021 - [info]
Mon Sep 27 14:50:23 2021 - [info] * Phase 4: Slaves Recovery Phase..
Mon Sep 27 14:50:23 2021 - [info]
Mon Sep 27 14:50:23 2021 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Sep 27 14:50:23 2021 - [info]
Mon Sep 27 14:50:23 2021 - [info] -- Slave diff file generation on host 111.16.101.73(111.16.101.73:3306) started, pid: 26625. Check tmp log /var/lib/mha-manager/app/111.16.101.73_3306_20210927145018.log if it takes time..
Mon Sep 27 14:50:24 2021 - [info]
Mon Sep 27 14:50:24 2021 - [info] Log messages from 111.16.101.73 ...
Mon Sep 27 14:50:24 2021 - [info]
Mon Sep 27 14:50:23 2021 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Sep 27 14:50:24 2021 - [info] End of log messages from 111.16.101.73.
Mon Sep 27 14:50:24 2021 - [info] -- 111.16.101.73(111.16.101.73:3306) has the latest relay log events.
Mon Sep 27 14:50:24 2021 - [info] Generating relay diff files from the latest slave succeeded.
Mon Sep 27 14:50:24 2021 - [info]
Mon Sep 27 14:50:24 2021 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Sep 27 14:50:24 2021 - [info]
Mon Sep 27 14:50:24 2021 - [info] -- Slave recovery on host 111.16.101.73(111.16.101.73:3306) started, pid: 26627. Check tmp log /var/lib/mha-manager/app/111.16.101.73_3306_20210927145018.log if it takes time..
Mon Sep 27 14:50:25 2021 - [info]
Mon Sep 27 14:50:25 2021 - [info] Log messages from 111.16.101.73 ...
Mon Sep 27 14:50:25 2021 - [info]
Mon Sep 27 14:50:24 2021 - [info] Sending binlog..
Mon Sep 27 14:50:24 2021 - [info] scp from local:/var/lib/mha-manager/app/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog to root@111.16.101.73:/var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog succeeded.
Mon Sep 27 14:50:24 2021 - [info] Starting recovery on 111.16.101.73(111.16.101.73:3306)..
Mon Sep 27 14:50:24 2021 - [info]  Generating diffs succeeded.
Mon Sep 27 14:50:24 2021 - [info] Waiting until all relay logs are applied.
Mon Sep 27 14:50:24 2021 - [info]  done.
Mon Sep 27 14:50:24 2021 - [info] Getting slave status..
Mon Sep 27 14:50:24 2021 - [info] This slave(111.16.101.73)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(master-log.000007:154). No need to recover from Exec_Master_Log_Pos.
Mon Sep 27 14:50:24 2021 - [info] Connecting to the target slave host 111.16.101.73, running recover script..
Mon Sep 27 14:50:24 2021 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mhaadmin' --slave_host=111.16.101.73 --slave_ip=111.16.101.73  --slave_port=3306 --apply_files=/var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog --workdir=/var/lib/mha --target_version=5.7.34-log --timestamp=20210927145018 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx
Mon Sep 27 14:50:25 2021 - [info]
MySQL client version is 5.7.34. Using --binary-mode.
Applying differential binary/relay log files /var/lib/mha/saved_master_binlog_from_111.16.100.253_3306_20210927145018.binlog on 111.16.101.73:3306. This may take long time...
Applying log files succeeded.
Mon Sep 27 14:50:25 2021 - [info]  All relay logs were successfully applied.
Mon Sep 27 14:50:25 2021 - [info]  Resetting slave 111.16.101.73(111.16.101.73:3306) and starting replication from the new master 111.16.100.224(111.16.100.224:3306)..
Mon Sep 27 14:50:25 2021 - [info]  Executed CHANGE MASTER.
Mon Sep 27 14:50:25 2021 - [info]  Slave started.
Mon Sep 27 14:50:25 2021 - [info] End of log messages from 111.16.101.73.
Mon Sep 27 14:50:25 2021 - [info] -- Slave recovery on host 111.16.101.73(111.16.101.73:3306) succeeded.
Mon Sep 27 14:50:25 2021 - [info] All new slave servers recovered successfully.
Mon Sep 27 14:50:25 2021 - [info]
Mon Sep 27 14:50:25 2021 - [info] * Phase 5: New master cleanup phase..
Mon Sep 27 14:50:25 2021 - [info]
Mon Sep 27 14:50:25 2021 - [info] Resetting slave info on the new master..
Mon Sep 27 14:50:25 2021 - [info]  111.16.100.224: Resetting slave info succeeded.
Mon Sep 27 14:50:25 2021 - [info] Master failover to 111.16.100.224(111.16.100.224:3306) completed successfully.
Mon Sep 27 14:50:25 2021 - [info]

----- Failover Report -----

mha: MySQL Master failover 111.16.100.253(111.16.100.253:3306) to 111.16.100.224(111.16.100.224:3306) succeeded

Master 111.16.100.253(111.16.100.253:3306) is down!

Check MHA Manager logs at mysql-mha-manager:/etc/mha_master/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 111.16.100.253(111.16.100.253:3306)
The latest slave 111.16.100.224(111.16.100.224:3306) has all relay logs for recovery.
Selected 111.16.100.224(111.16.100.224:3306) as a new master.
111.16.100.224(111.16.100.224:3306): OK: Applying all logs succeeded.
111.16.100.224(111.16.100.224:3306): OK: Activated master IP address.
111.16.101.73(111.16.101.73:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
111.16.101.73(111.16.101.73:3306): OK: Applying all logs succeeded. Slave started, replicating from 111.16.100.224(111.16.100.224:3306)
111.16.100.224(111.16.100.224:3306): Resetting slave info succeeded.
Master failover to 111.16.100.224(111.16.100.224:3306) completed successfully
2. 切换master节点完成,检测VIP以及master状态
$ > masterha_check_status --conf=/etc/mha_master/mha.cnf
mha (pid:26678) is running(0:PING_OK), master:111.16.100.224

登陆新master节点以及模拟故障master节点,查看是否已经完成VIP切换

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-0sgs9YQH-1632732235168)(/Users/xiaohu/Library/Application Support/typora-user-images/image-20210927164007176.png)]

参考:

https://github.com/yoshinorim/mha4mysql-manager/wiki

https://zhuanlan.zhihu.com/p/132508138

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值