MySQL Master高可用---MHA的安装与配置说明

一、背景

今天讲下工作中使用较多的MHA

首先需要确认一点的是,MHA实现的是Master的高可用。

贴一张MHA作者在 MySQL Conference and Expo 2011 上分享的图片

 

二、复制结构与基础配置

1、MySQL复制关系

Manager Node 192.168.237.11(管理节点)

Data Node_1 192.168.237.12(数据节点,MySQL主库)

Data Node_2 192.168.237.13(数据节点,MySQL备库,候选主库)

Data Node_3 192.168.237.14(数据节点,MySQL备库)

2、节点基础配置

(1)配置数据节点的iptables

配置数据节点的iptables,确保管理节点能访问到数据节点的MySQL端口

(2)打通SSH互信

正确配置管理节点和数据节点的域名解析,很重要!

管理节点及数据节点间打通SSH互信:

# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.11
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.12
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.13
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.14

(3)MySQL基础配置

所有数据节点配置相同的复制过滤规则(MHA在启动的时候会检查过滤规则,不同则启动失败)

开启候选主库的log-bin。管理节点会检测候选主库是否有配置log-bin,若没有该配置项,MHA将不会进行failover

 

三、MHA安装

1、安装包下载

RHEL7:

https://github.com/yoshinorim/mha4mysql-manager/releases

https://github.com/yoshinorim/mha4mysql-node/releases

RHEL6:

https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

2、配置CPAN源及EPEL源(可选)

(1)配置CPAN源(阿里源为例)

cpan[1]> o conf urllist push http://mirrors.aliyun.com/CPAN/
Please use 'o conf commit' to make the config permanent!
cpan[2]> o conf commit
commit: wrote '/usr/share/perl5/CPAN/Config.pm'
cpan[3]> o conf urllist
    urllist           
    0 [http://mirrors.aliyun.com/CPAN/]
Type 'o conf' to view all configuration items

(2)配置EPEL源

RHEL6:

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm

RHEL7:

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# yum clean all

# yum makecache


3、数据节点部署

(1)更新libmysqlclient.so

# rpm -qa |grep mysql-libs
mysql-libs-5.1.73-8.el6_8.x86_64
# rpm -e --nodeps mysql-libs-5.1.73-8.el6_8.x86_64

找一台MySQL拷贝libmysqlclient.so运行库文件
# cp /usr/local/mysql3306/lib/libmysqlclient.so.20 /usr/lib64/
# cp /usr/local/mysql3306/lib/libmysqlclient.so /usr/lib64/

(2)更新DBD-MySQL

# rpm -qa |grep perl-DBD-MySQL
perl-DBD-MySQL-4.013-3.el6.x86_64
# rpm -e --nodeps perl-DBD-MySQL-4.013-3.el6.x86_64

# yum install perl-CPAN
# ln -s /usr/local/mysql3306/bin/mysql_config /usr/local/bin/
cpan[1]> install DBD-MySQL

cpan[2]>  install DBI

cpan[2]>  install ExtUtils MakeMaker

(3)安装mha4mysql-node

# tar zxvf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
# make && make install

Tips:常见安装报错(一般是由于yum方式安装perl模块较旧导致)

Tip1、编译时出现Can't locate ExtUtils/MakeMaker.pm in @INC

https://github.com/Perl-Toolchain-Gang/ExtUtils-MakeMaker/releases
# tar zxvf ExtUtils-MakeMaker-7.18.tar.gz
# cd ExtUtils-MakeMaker-7.18
# perl Makefile.PL

# make && make install

Tip2、编译时出现Can't locate CPAN.pm in @INC

https://cpan.metacpan.org/authors/id/A/AN/ANDK/CPAN-2.26.tar.gz
# tar zxvf CPAN-2.26.tar.gz
# cd CPAN-2.26
# perl Makefile.PL

# make && make install

 

(2)设置relay log清除方式
mysql > set global relay_log_purge=0;
my.cnf中设置 relay_log_purge=0
关闭该参数的原因是因为在默认情况下,从库上的relay log在SQL线程执行完后会自动被删掉。但是在failover过程中从库需要利用候选主库上的中继日志来补数据,所以MySQL默认打开的自动清除中继日志的功能需要关闭。

但是为了不撑爆磁盘就需要定期清除旧的relay log。清除relay log需要考虑复制延时的问题。在ext3文件系统下,删除大文件建议采用硬链接的方法。
关于relay log的清除,可参考以下文章:

http://daisywei.blog.51cto.com/7837970/1881154

 

4、管理节点部署

(1)同数据节点一样,也需要安装 mha4mysql-node

(2)安装依赖包
# yum install gcc

cpan[1]> install Config::Tiny
cpan[2]> install Log::Dispatch
cpan[3]> install Parallel::ForkManager
cpan[4]> install Time::HiRes

(3)安装mha4mysql-manager
# tar zxvf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
# make && make install
(4)修改配置文件

建立mha配置文件及脚本目录

# mkdir -p /etc/masterha/

复制源码包内的配置文件、脚本(app1.cnf、masterha_default.cnf 可以合并为一个文件)

# cp ./mha4mysql-manager-0.56/samples/conf/masterha_default.cnf /etc/

# cp ./mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/materha/

# cp ./mha4mysql-manager-0.56/samples/scripts/* /etc/masterha/

# vim /etc/masterha_default.cnf

[server default]
#数据库账户(需要select、insert、super、reload权限,以及对mysql.user的更改权限)
user=mha
password=123456
#SSH账户
ssh_user=root
#复制账户
repl_user=replication_user
repl_password=123456
ping_interval=1
# master_ip_failover_script= /etc/masterha/master_ip_failover
# master_ip_online_change_script= /etc/masterha/master_ip_online_change
# shutdown_script= /etc/masterha/power_manager
# report_script= /etc/masterha/send_report

# vim /etc/masterha/app1.cnf

[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
remote_workdir=/var/log/

[server1]
hostname=237_12
master_binlog_dir=/data/mysql3306
candidate_master=1
check_repl_delay=0 #防止master故障时,切换时slave有延迟,卡在那里切不过去

[server2]
hostname=237_13
master_binlog_dir=/data/mysql3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=237_14
master_binlog_dir=/data/mysql3306
no_master=1

(5)环境检测
step1

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

提示:All SSH connection tests passed successfully 表示节点间SSH连同

step2

# masterha_check_repl --conf=/etc/masterha/app1.cnf

提示:MySQL Replication Health is OK.

(6)启动MHA Manager

# nohup masterha_manager --conf=/etc/masterha/app1.cnf &

观察日志情况

# tail -f /tmp/mha_manager.log
Thu Aug 10 21:49:09 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Aug 10 21:49:09 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Aug 10 21:49:09 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

查看MHA状态:

# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1716) is running(0:PING_OK), master:237_12

 

四、故障演练

1、failover故障切换

例:模拟主库(192.168.237.12)MySQL实例挂掉

观察备库237.14的主从复制关系:主库已经切换为237.13

观察此时新的主库的binlog位置点:

查看管理节点的日志记录(位置由配置文件中参数:manager_log决定)

# cat /var/log/masterha/app1/manager.log

From:
237_12(192.168.237.12:3306) (current master)
 +--237_13(192.168.237.13:3306)
 +--237_14(192.168.237.14:3306)

To:
237_13(192.168.237.13:3306) (new master)
 +--237_14(192.168.237.14:3306)

[root@237_11 script]# masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 is stopped(2:NOT_RUNNING).

需要注意的是,无论宕机导致的master切换、还是没有特别设置地手动切换master(不含orig_master_is_new_slave选项),原来的master都不在MHA架构内了,即使重新启动也不会加入,必须手动加入。配置文件中仍然是以挂掉的master为主库,并没有提升candidated master为主库,此时MHA将不再具备高可用的特性,需尽快更新配置文件,并在管理节点运行masterha_check_ssh确认无误后重新启用MHA manager。

 

此时,若假设原主库(237.12)上的MySQL已成功修复故障,正常启动后,可以以备库的身份加入MHA架构内:

(1)复制采用GTID方式

直接change master ... MASTER_AUTO_POSITION=1;

(2)binlog+position方式

首先,在MHA管理节点的日志中找到,复制需要的主库binlog位置点:

建立到新主库237.13的复制关系

Tips:该步骤很有可能会报1236的错误,并且Slave_IO_Running状态为 No

解决办法如下(从库切换到新的binlog,并重新指向之前卡在的主库binlog位置点):

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
mysql> 
mysql> flush logs;
Query OK, 0 rows affected (0.01 sec)
mysql> 
mysql> show master status \G
*************************** 1. row ***************************
            File: mysql-bin.000035
        Position: 107
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

mysql> change master to master_log_file='mysql-bin.000018',master_log_pos=107;
Query OK, 0 rows affected (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.237.13
                  Master_User: replication_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000018
          Read_Master_Log_Pos: 107
               Relay_Log_File: 237_12-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000018
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 410
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 23713
1 row in set (0.00 sec)

当建立新主从关系后,可以开始修改MHA配置文件(app1.cnf)中的server1、server2的hostname,重新启用新的高可用。启动MHA前需做如下检查:

step1、删除前一次failover标记文件:

# rm -rf app1.failover.complete

step2、检查配置文件正确性:

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

step3、再次启动MHA:
# nohup masterha_manager --conf=/etc/masterha/app1.cnf &

确认mha状态:
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1977) is running(0:PING_OK), master:node2

 

2、在线手动切换

(online master switch理论写操作的阻塞时间在0.5秒-2秒)

功能:用于调整无法在线变更的参数(例如innodb_buffer_pool_size等)。通过在从库配置文件中调整参数并重启生效后,执行online master switch将从库提升为主库,原主库同样修改参数后重启,并以从库的身份重新加入MHA架构,从而曲线实现了在线变更参数的需求而无需中断业务。

切换前需要先停止MHA manger,否则会有如下报错。

 

以下为成功切换过程:

[root@237_11 app1]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf < /dev/null > /tmp/mha_manager.log 2>&1
[root@237_11 app1]# 
[root@237_11 app1]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=237_12 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sun Aug 13 21:46:10 2017 - [info] MHA::MasterRotate version 0.56.
Sun Aug 13 21:46:10 2017 - [info] Starting online master switch..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] * Phase 1: Configuration Check Phase..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 13 21:46:10 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Aug 13 21:46:10 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Aug 13 21:46:10 2017 - [info] GTID failover mode = 0
Sun Aug 13 21:46:10 2017 - [info] Current Alive Master: 237_13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Alive Slaves:
Sun Aug 13 21:46:10 2017 - [info]   237_12(192.168.237.12:3306)  Version=5.5.54-log (oldest major version between slaves) log-bin:enabled
Sun Aug 13 21:46:10 2017 - [info]     Replicating from 192.168.237.13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Aug 13 21:46:10 2017 - [info]   237_14(192.168.237.14:3306)  Version=5.5.54-log (oldest major version between slaves) log-bin:enabled
Sun Aug 13 21:46:10 2017 - [info]     Replicating from 192.168.237.13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info]     Not candidate for the new Master (no_master is set)
Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] Checking MHA is not monitoring or doing failover..
Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_12..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_14..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] 237_12 can be new master.
Sun Aug 13 21:46:10 2017 - [info] 
From:
237_13(192.168.237.13:3306) (current master)
 +--237_12(192.168.237.12:3306)
 +--237_14(192.168.237.14:3306)

To:
237_12(192.168.237.12:3306) (new master)
 +--237_14(192.168.237.14:3306)
 +--237_13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Checking whether 237_12(192.168.237.12:3306) is ok for the new master..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): Resetting slave pointing to the dummy host.
Sun Aug 13 21:46:10 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] * Phase 2: Rejecting updates Phase..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sun Aug 13 21:46:10 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] Orig master binlog:pos is mysql-bin.000018:107.
Sun Aug 13 21:46:10 2017 - [info]  Waiting to execute all relay logs on 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:10 2017 - [info]  master_pos_wait(mysql-bin.000018:107) completed on 237_12(192.168.237.12:3306). Executed 0 events.
Sun Aug 13 21:46:10 2017 - [info]   done.
Sun Aug 13 21:46:10 2017 - [info] Getting new master's binlog name and position..
Sun Aug 13 21:46:10 2017 - [info]  mysql-bin.000035:107
Sun Aug 13 21:46:10 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='237_12 or 192.168.237.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000035', MASTER_LOG_POS=107, MASTER_USER='replication_user', MASTER_PASSWORD='xxx';
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] * Switching slaves in parallel..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) started, pid: 8397
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info] Log messages from 237_14 ...
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info]  Waiting to execute all relay logs on 237_14(192.168.237.14:3306)..
Sun Aug 13 21:46:10 2017 - [info]  master_pos_wait(mysql-bin.000018:107) completed on 237_14(192.168.237.14:3306). Executed 0 events.
Sun Aug 13 21:46:10 2017 - [info]   done.
Sun Aug 13 21:46:10 2017 - [info]  Resetting slave 237_14(192.168.237.14:3306) and starting replication from the new master 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:11 2017 - [info]  Executed CHANGE MASTER.
Sun Aug 13 21:46:11 2017 - [info]  Slave started.
Sun Aug 13 21:46:11 2017 - [info] End of log messages from 237_14 ...
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) succeeded.
Sun Aug 13 21:46:11 2017 - [info] Unlocking all tables on the orig master:
Sun Aug 13 21:46:11 2017 - [info] Executing UNLOCK TABLES..
Sun Aug 13 21:46:11 2017 - [info]  ok.
Sun Aug 13 21:46:11 2017 - [info] Starting orig master as a new slave..
Sun Aug 13 21:46:11 2017 - [info]  Resetting slave 237_13(192.168.237.13:3306) and starting replication from the new master 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:11 2017 - [info]  Executed CHANGE MASTER.
Sun Aug 13 21:46:11 2017 - [info]  Slave started.
Sun Aug 13 21:46:11 2017 - [info] All new slave servers switched successfully.
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info] * Phase 5: New master cleanup phase..
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info]  237_12: Resetting slave info succeeded.
Sun Aug 13 21:46:11 2017 - [info] Switching master to 237_12(192.168.237.12:3306) completed successfully.

 

五、数据库无感访问

1、VIP方式

通过修改脚本:master_ip_failover实现,VIP:192.168.237.50

前提要求:

(1)相应设备不做mac地址绑定;

(2)首次bond vip,需要手工在主库上执行(后续漂移利用master_ip_failover实现)

init_vip.sh脚本如下

#!/bin/bash
key="1"
vip="192.168.47.50/24"
/sbin/ifconfig eth0:$key $vip

(3)candidate_master获取到VIP后, 需要如下执行:

# arping -I eth0 -c 3 -s $VIP $GATEWAY >/dev/null

作用是刷新ARP映射表,避免了VIP实际已漂移到从库的情况下,由于缓存的问题导致应用仍然无法访问的情况(已合并到下述脚本中执行)

 

缺陷:适用于单机单实例

 

master_ip_failover脚本如下:

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

#自定义该组机器的vip
my $vip = "192.168.237.50";
my $gateway = "192.168.237.2";
my $key = "1";

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    # 此处也可以添加dns、zookeeper、consul等剔除dead master信息
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      print "Creating app user on the new master..\n";
      #FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

#ssh -o ConnectTimeout=15 -o ConnectionAttempts=3,设置SSH等待超时时间15S,重试3次。防止SSH连不上,MHA超长等待。
#arping -I eth0 -c 3 -s $VIP $GATEWAY > /dev/null,刷新arp映射表
sub start_vip() {
    `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" arping -I eth0 -c 3 -s $vip $gateway > /dev/null \"`;
}
sub stop_vip() {
    `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

 

2、Consul

利用服务发现替代,适用于单机多实例

 

3、ProxySQL

利用ProxySQL进行路由规则的切换,详见我的另外一篇博文的第九节:

ProxySQL初体验

 

六、MHA存活监控

manager本身是一个单点,存在风险。

1、利用masterha_master_monitor,不再赘述

send_report   #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成

2、masterha_check_status --conf=/etc/masterha/app1.cnf

 

参考文档

https://severalnines.com/blog/mysql-replication-failover-maxscale-vs-mha-part-1

MHA参数详解:http://wubx.net/mha-parameters/

MHA在线切换过程:https://www.bbsmax.com/A/amd0ml0Xzg/

MHA 清理relay log(purge_relay_logs):https://blog.csdn.net/leshami/article/details/45688503

安装报错排查:https://blog.51cto.com/arthur376/1812640

虚拟IP(VIP)在高可用上的原理:https://gitos.org/2019/08/19/vip-ha.html

 

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值