Mysql MHA工作原理及搭建

最新推荐文章于 2024-03-29 16:53:44 发布

ciqiyuan9775

最新推荐文章于 2024-03-29 16:53:44 发布

阅读量258

点赞数

一.MHA工作原理
1.MHA

MHA 是当 master 出现故障，挑选一个 slave 作为新的 master 并构建成新的主从架构的管理工具。从 master 出现故障到构建成新的主从架构时间是 10-30 秒。在 master 出现故障时可能会出现 slave 同步的数据不一致的现象，此工具可以自动应用差异的中继日志到其他 slave 上保证数据的一致性。

参考：MySQL最牛高可用MHA+KEEPALIVE安装维护完美手册.pdf

2.优点
(1).可以快速的进行故障切换
(2).自动选择数据同步最全的slave，并把差异日志应用到其他slave上
(3).一个mha manage server可以监控很多master，更改升级配置，不影响线上正在运行的数据库
(4).对原先的体系结构的性能影响很小
(5).适合任何支持主从复制的存储引擎

3.主要命令
manager节点的主要脚本
masterha_check_ssh 检查MHA的SSH配置状况
masterha_check_repl 检查MySQL复制配置
masterha_manger 启动MHA
masterha_check_status 检测当前MHA运?行?状态
masterha_master_monitor 检测master是否宕机
masterha_master_switch 控制故障转移(?自动或者?手动)
masterha_conf_host 添加或删除配置的server信息

node节点的主要脚本
save_binary_logs 保存和复制master的?二进制?日志
apply_diff_relay_logs failover时候对?比差异的relaylog?日志
purge_relay_logs 清除slave的中继?日志(不?会阻塞SQL线程)

二.MHA环境搭建
1.环境信息
OS: redhat 6.9 64bit
192.168.1.203 mysql5.7(master)
192.168.1.204 mysql5.7-2(slave异步)
192.168.1.205 mysql5.7-3(slave异步)
192.168.1.206 mha-manager(mha manager)

软件信息
MySQL 5.7
MHA 0.57

2.配置hosts环境
vi /etc/hosts
192.168.1.203 mysql5.7
192.168.1.204 mysql5.7-2
192.168.1.205 mysql5.7-3
192.168.1.206 mha-manager

3.配置主从复制
(1).每个节点都要执行
grant replication client,replication slave on *.* to 'REPL_USER'@'192.168.1.%' IDENTIFIED BY 'repl1234';
grant all on *.* to ‘root'@'%' identified by ‘root' with grant option;
(2).重建软链接
ln -s /usr/local/mysql/bin/* /usr/bin/
(3).最好先不开启半同步复制，可以先开启并行复制
(4).为集群所有节点设置read_only=1
(5).relay_log_purge=0
有时候，我们希望将 MySQL 的 relay log 多保留一段时间，比如用于高可用切换后的relay log补偿，于是就会设置 relay_log_purge=0，禁止 SQL 线程在执行完一个relay log 后自动将其删除。

4.配置ssh互信
每个节点都进行如下操作
(1).生成key
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
5c:ad:b3:91:28:7b:94:0e:d2:85:0f:0d:10:9a:e5:de root@mysql5.7
The key's randomart image is:
+--[ RSA 2048]----+
| +o. |
| = + . |
| o . o o . . |
| . o = + o |
| o E S = |
| . * + |
| . o . |
| . |
| |
+-----------------+
(2).将key传输至其他server
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.203
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.204
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.205
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.206

5.安装MHA
--本地安装
(1).在三个节点依次安装MHA node
rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm

(2).在manager(mysql4) 上安装MHA manage
-安装所需包
--mha4mysql-manager-0.55-0.el6.noarch.rpm
--perl-Config-Tiny-2.14-7.el7.noarch.rpm
--perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
--perl-Class-Load-0.20-3.el7.noarch.rpm
--perl-Data-OptList-0.107-9.el7.noarch.rpm
--perl-Params-Util-1.07-6.el7.x86_64.rpm
--perl-Sub-Install-0.926-6.el7.noarch.rpm
--perl-Module-Implementation-0.06-6.el7.noarch.rpm
--perl-Module-Runtime-0.013-4.el7.noarch.rpm
--perl-Try-Tiny-0.12-2.el7.noarch.rpm
--perl-Package-Stash-0.34-2.el7.noarch.rpm
--perl-Package-DeprecationManager-0.13-7.el7.noarch.rpm
--perl-List-MoreUtils-0.33-9.el7.x86_64.rpm
--perl-Package-Stash-XS-0.26-3.el7.x86_64.rpm
--perl-MIME-Lite-3.030-1.el7.noarch.rpm
--perl-Email-Date-Format-1.002-15.el7.noarch.rpm
--perl-MIME-Types-1.38-2.el7.noarch.rpm
--perl-Mail-Sender-0.8.21-2.el7.noarch.rpm
--perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
--perl-Params-Validate-1.08-4.el7.x86_64.rpm
--perl-Sys-Syslog-0.33-3.el7.x86_64.rpm
--perl-MailTools-2.12-2.el7.noarch.rpm
--perl-Net-SMTP-SSL-1.01-13.el7.noarch.rpm
--perl-Parallel-ForkManager-1.05-1.el7.noarch.rpm
--perl-Time-HiRes-1.9725-3.el7.x86_64

--网络安装
(1).在主从复制结构安装mha node
yum -y install perl-DBD-MySQL
yum -y install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
yum -y install perl-devel perl-CPAN

tar -xvzf /tmp/mha4mysql-node-0.57.tar.gz
mkdir -p /etc/mha/mhanode
mv mha4mysql-node-0.57/* /etc/mha/mhanode
cd /etc/mha/mhanode
perl Makefile.PL
make
make install

(2).在mha-manager 安装mha-node和mha-manage包
yum -y install perl-DBD-MySQL
yum -y install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
yum -y install perl-devel perl-CPAN
yum install -y perl-ExtUtils-Embed
yum install -y perl-Time-HiRes

yum install --skip-broken perl*
yum localinstall -y perl-Params-Validate-0.92-3.el6.x86_64.rpm
yum localinstall -y perl-Config-Tiny-2.12-1.el6.rfx.noarch.rpm
yum localinstall -y perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm
yum localinstall -y perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm

安装node package
tar -xvzf /tmp/mha4mysql-node-0.57.tar.gz
mkdir -p /etc/mha/mhanode
mv mha4mysql-node-0.57/* /etc/mha/mhanode
cd /etc/mha/mhanode
perl Makefile.PL
make
make install

安装manage package
tar -zxvf /tmp/mha4mysql-manager-0.57.tar.gz
mv /tmp/mha4mysql-manager-0.57/* /etc/mha/mhamanager
cd /etc/mha/mhamanager
perl Makefile.PL
make
make install

-验证
# masterha_check_ssh
--conf= must be set.

6.配置MHA
参数说明:
https://blog.csdn.net/wutuobang321/article/details/39347185
(1).修改配置文件
vim /etc/mha/mhamanager/app1/conf/app1.cnf
[server default]
manager_workdir=/etc/mha/app1
manager_log=/etc/mha/app1/manager.log
#mysql root用户及密码
user=root
password=rootpass
#ssh 用户
ssh_user=root
#复制环境中用户名及密码
repl_user=REPL_USER
repl_password=repl1234
#检查管理机到集群的连通性
secondary_check_script= masterha_secondary_check -s 192.168.1.203 -s 192.168.1.204 -s 192.168.1.205 --user=REPL_USER --master_host=mysql5.7 --master_ip= 192.168.1.203 --master_port=3306
#向主库发送ping包，尝试3次没有回应时自动进行failover
ping_interval=3
#当failover时，slave提升为master时，调用此脚本，因此可将vip信息写到此配置文件
#master_ip_failover_script= /etc/mha/script/master_ip_failover
#设置故障发生后，关闭故障主机脚本
# shutdown_script= /script/masterha/power_manager
#当新主服务器切换完成后通过此脚本发送邮件报告
# report_script= /script/masterha/send_report
#当手动切换时调用的切换脚本，使用masterha_master_switch命令手动切换MYSQL主服务器时后会调用此脚本
# master_ip_online_change_script= /script/masterha/master_ip_online_change

[server1]
#主机名
hostname= 192.168.1.203
port=3306
ssh_port=22
#该主机优先被选为master，如果配置多个该参数，则根据[serverX]顺序确定优先级
candidate_master=1
#指定每个节点bin-log日志路径
master_binlog_dir=/mysql/mysql3306/mysql3306

[server2]
hostname= 192.168.1.204
port=3306
ssh_port=22
#该节点永远不会成为新的master节点
#no_master=1
master_binlog_dir=/mysql/mysql3306/mysql3306

[server3]
hostname= 192.168.1.205
port=3306
ssh_port=22
candidate_master=1
#默认当一个slave落后master 100M的relay logs的话MHA将不会选择该slave作为一个新的master。该参数设置为0，则切换时会忽略该限制。
check_repl_delay=0
master_binlog_dir=/mysql/mysql3306/mysql3306

(2).清空全局配置文件
>masterha_default.cnf

(3).配置master_ip_failover

点击(此处)折叠或打开

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.1.207/24';
my $key = '0';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip
--orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

(4).配置master_ip_online_change

点击(此处)折叠或打开

#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';
use Getopt::Long;
my $vip = '192.168.1.207/24'; # Virtual IP
my $key = "0";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;
my (
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $orig_master_ssh_user, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password, $new_master_ssh_user,
);
GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'orig_master_ssh_user=s' => \$orig_master_ssh_user,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
'new_master_ssh_user=s' => \$new_master_ssh_user,
);
exit &main();
sub main {
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "\n\n\n***************************************************************\n";
print "Disabling the VIP - $vip on old master: $orig_master_host\n";
print "***************************************************************\n\n\n\n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "\n\n\n***************************************************************\n";
print "Enabling the VIP - $vip on new master: $new_master_host \n";
print "***************************************************************\n\n\n\n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover -command=start|stop|stopssh|status -orig_master_host=host -orig_master_ip=ip -
orig_master_port=po
rt -new_master_host=host -new_master_ip=ip -new_master_port=port\n";
}

(5).测试SSH
# masterha_check_ssh --conf=/etc/mha/mhamanager/app1/conf/app1.cnf
Thu May 3 21:17:37 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 3 21:17:37 2018 - [info] Reading application default configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Thu May 3 21:17:37 2018 - [info] Reading server configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Thu May 3 21:17:37 2018 - [info] Starting SSH connection tests..
Thu May 3 21:17:38 2018 - [debug]
Thu May 3 21:17:37 2018 - [debug] Connecting via SSH from root@192.168.1.203(192.168.1.203:22) to root@192.168.1.204(192.168.1.204:22)..
Thu May 3 21:17:37 2018 - [debug] ok.
Thu May 3 21:17:37 2018 - [debug] Connecting via SSH from root@192.168.1.203(192.168.1.203:22) to root@192.168.1.205(192.168.1.205:22)..
Thu May 3 21:17:38 2018 - [debug] ok.
Thu May 3 21:17:39 2018 - [debug]
Thu May 3 21:17:37 2018 - [debug] Connecting via SSH from root@192.168.1.204(192.168.1.204:22) to root@192.168.1.203(192.168.1.203:22)..
Thu May 3 21:17:38 2018 - [debug] ok.
Thu May 3 21:17:38 2018 - [debug] Connecting via SSH from root@192.168.1.204(192.168.1.204:22) to root@192.168.1.205(192.168.1.205:22)..
Thu May 3 21:17:39 2018 - [debug] ok.
Thu May 3 21:17:39 2018 - [debug]
Thu May 3 21:17:38 2018 - [debug] Connecting via SSH from root@192.168.1.205(192.168.1.205:22) to root@192.168.1.203(192.168.1.203:22)..
Thu May 3 21:17:38 2018 - [debug] ok.
Thu May 3 21:17:38 2018 - [debug] Connecting via SSH from root@192.168.1.205(192.168.1.205:22) to root@192.168.1.204(192.168.1.204:22)..
Thu May 3 21:17:39 2018 - [debug] ok.
Thu May 3 21:17:39 2018 - [info] All SSH connection tests passed successfully.

(6).测试主从复制
# masterha_check_repl --conf=/etc/mha/mhamanager/app1/conf/app1.cnf
Thu May 3 22:28:08 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 3 22:28:08 2018 - [info] Reading application default configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Thu May 3 22:28:08 2018 - [info] Reading server configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Thu May 3 22:28:08 2018 - [info] MHA::MasterMonitor version 0.57.
Thu May 3 22:28:08 2018 - [info] GTID failover mode = 0
Thu May 3 22:28:08 2018 - [info] Dead Servers:
Thu May 3 22:28:08 2018 - [info] Alive Servers:
Thu May 3 22:28:08 2018 - [info] 192.168.1.203(192.168.1.203:3306)
Thu May 3 22:28:08 2018 - [info] 192.168.1.204(192.168.1.204:3306)
Thu May 3 22:28:08 2018 - [info] 192.168.1.205(192.168.1.205:3306)
Thu May 3 22:28:08 2018 - [info] Alive Slaves:
Thu May 3 22:28:08 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 22:28:08 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 22:28:08 2018 - [info] Not candidate for the new Master (no_master is set)
Thu May 3 22:28:08 2018 - [info] 192.168.1.205(192.168.1.205:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 22:28:08 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 22:28:08 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu May 3 22:28:08 2018 - [info] Current Alive Master: 192.168.1.203(192.168.1.203:3306)
Thu May 3 22:28:08 2018 - [info] Checking slave configurations..
Thu May 3 22:28:08 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.1.204(192.168.1.204:3306).
Thu May 3 22:28:08 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.1.205(192.168.1.205:3306).
Thu May 3 22:28:08 2018 - [info] Checking replication filtering settings..
Thu May 3 22:28:08 2018 - [info] binlog_do_db= , binlog_ignore_db=
Thu May 3 22:28:08 2018 - [info] Replication filtering check ok.
Thu May 3 22:28:08 2018 - [info] GTID (with auto-pos) is not supported
Thu May 3 22:28:08 2018 - [info] Starting SSH connection tests..
Thu May 3 22:28:20 2018 - [info] All SSH connection tests passed successfully.
Thu May 3 22:28:20 2018 - [info] Checking MHA Node version..
Thu May 3 22:28:21 2018 - [info] Version check ok.
Thu May 3 22:28:21 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu May 3 22:28:21 2018 - [info] HealthCheck: SSH to 192.168.1.203 is reachable.
Thu May 3 22:28:22 2018 - [info] Master MHA Node version is 0.57.
Thu May 3 22:28:22 2018 - [info] Checking recovery script configurations on 192.168.1.203(192.168.1.203:3306)..
Thu May 3 22:28:22 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/mysql3306/mysql3306 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000013
Thu May 3 22:28:22 2018 - [info] Connecting to root@192.168.1.203(192.168.1.203:22)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /home/mysql3306/mysql3306, up to mysql-bin.000013
Thu May 3 22:28:22 2018 - [info] Binlog setting check done.
Thu May 3 22:28:22 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu May 3 22:28:22 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.204 --slave_ip=192.168.1.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.7.21-log --manager_version=0.57 --relay_dir=/home/mysql3306/mysql3306 --current_relay_log=mysql-relay.000018 --slave_pass=xxx
Thu May 3 22:28:22 2018 - [info] Connecting to root@192.168.1.204(192.168.1.204:22)..
Checking slave recovery environment settings..
Relay log found at /home/mysql3306/mysql3306, up to mysql-relay.000019
Temporary relay log file is /home/mysql3306/mysql3306/mysql-relay.000019
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu May 3 22:28:22 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.205 --slave_ip=192.168.1.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.7.21-log --manager_version=0.57 --relay_dir=/home/mysql3306/mysql3306 --current_relay_log=mysql-relay.000015 --slave_pass=xxx
Thu May 3 22:28:22 2018 - [info] Connecting to root@192.168.1.205(192.168.1.205:22)..
Checking slave recovery environment settings..
Relay log found at /home/mysql3306/mysql3306, up to mysql-relay.000016
Temporary relay log file is /home/mysql3306/mysql3306/mysql-relay.000016
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu May 3 22:28:23 2018 - [info] Slaves settings check done.
Thu May 3 22:28:23 2018 - [info]
192.168.1.203(192.168.1.203:3306) (current master)
+--192.168.1.204(192.168.1.204:3306)
+--192.168.1.205(192.168.1.205:3306)

Thu May 3 22:28:23 2018 - [info] Checking replication health on 192.168.1.204..
Thu May 3 22:28:23 2018 - [info] ok.
Thu May 3 22:28:23 2018 - [info] Checking replication health on 192.168.1.205..
Thu May 3 22:28:23 2018 - [info] ok.
Thu May 3 22:28:23 2018 - [info] Checking master_ip_failover_script status:
Thu May 3 22:28:23 2018 - [info] /etc/mha/mhamanager/app1/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.203 --orig_master_ip=192.168.1.203 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.207/24===

Checking the Status of the script.. OK
Thu May 3 22:28:23 2018 - [info] OK.
Thu May 3 22:28:23 2018 - [warning] shutdown_script is not defined.
Thu May 3 22:28:23 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

7.启动MHA
(1).给master节点添加vip
ifconfig eth0:0 192.168.1.207

(2).启动MHA监控
nohup masterha_manager --conf= /etc/mha/mhamanager/app1/conf/app1.cnf --ignore_last_failover --remove_dead_master_conf > /tmp/mha_manager.log < /dev/null 2>&1 &

启动参数介绍:
--remove_dead_master_conf #该参数代表当发生主从切换后，老的主库的ip将会从配置文件app1.cnf中删除
--ignore_last_failover #默认情况下，如果MHA检测到连续发生宕机，且两次宕机间隔不足8小时，则不会进行failover。该参数代表忽略上次MHA触发切换产生的文件。

(3).检查MHA状态
# masterha_check_status --conf=/etc/mha/mhamanager/app1/conf/app1.cnf
app1 (pid:6718) is running(0:PING_OK), master:192.168.1.203

(4).停止MHA
masterha_stop --conf= /etc/mha/mhamanager/app1/conf/app1.cnf

三.MHA failover 测试
1.failover过程
1).配置文件检查阶段，这个阶段会检查整个集群配置文件配置
2).宕机的 master 处理，这个阶段包括虚拟 ip 摘除操作，主机关机操作
3).复制 dead maste 和最新 slave 相差的 relay log，并保存到 MHA Manger 具体的目录下
4).识别含有最新更新的 slave
5).应用从 master 保存的二进制日志事件(binlog events) 6).提升一个 slave 为新的 master 进行复制
7).使其他的 slave 连接新的 master 进行复制切换完成后，关注如下变化:
-vip 自动从原来的 master 切换到新的 master，同时，manager 节点的监控进程自动退出。
-在日志目录(/var/log/mha/app1)产生一个 app1.failover.complete 文件
-/etc/mha/app1.cnf 配置文件中原来老的 master 配置被删除。

将宕机节点重新加入到集群中
(1).重新启动mysql
(2).从manager.log中获取同步信息
(3).将原主库作为slave重新加入到集群中

2.自动failover
master 节点执行pkill mysql

点击(此处)折叠或打开

Thu May 3 23:29:49 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu May 3 23:29:49 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/mysql3306/mysql3306 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
Thu May 3 23:29:49 2018 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.1.203 -s 192.168.1.204 -s 192.168.1.205 --user=REPL_USER --master_host=mysql5.7 --master_ip=192.168.1.203 --master_port=3306 --user=root --master_host=192.168.1.203 --master_ip=192.168.1.203 --master_port=3306 --master_user=root --master_password=root --ping_type=SELECT
Thu May 3 23:29:52 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu May 3 23:29:52 2018 - [warning] Connection failed 2 time(s)..
Thu May 3 23:29:54 2018 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.1.203! at /usr/local/share/perl5/MHA/HealthCheck.pm line 342.
Monitoring server 192.168.1.203 is reachable, Master is not reachable from 192.168.1.203. OK.
Thu May 3 23:29:55 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu May 3 23:29:55 2018 - [warning] Connection failed 3 time(s)..
Monitoring server 192.168.1.204 is reachable, Master is not reachable from 192.168.1.204. OK.
Thu May 3 23:29:58 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu May 3 23:29:58 2018 - [warning] Connection failed 4 time(s)..
Monitoring server 192.168.1.205 is reachable, Master is not reachable from 192.168.1.205. OK.
Thu May 3 23:30:02 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu May 3 23:30:02 2018 - [warning] Master is not reachable from health checker!
Thu May 3 23:30:02 2018 - [warning] Master 192.168.1.203(192.168.1.203:3306) is not reachable!
Thu May 3 23:30:02 2018 - [warning] SSH is NOT reachable.
Thu May 3 23:30:02 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/mhamanager/app1/conf/app1.cnf again, and trying to connect to all servers to check server status..
Thu May 3 23:30:02 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May 3 23:30:02 2018 - [info] Reading application default configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Thu May 3 23:30:02 2018 - [info] Reading server configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Thu May 3 23:30:02 2018 - [info] GTID failover mode = 0
Thu May 3 23:30:02 2018 - [info] Dead Servers:
Thu May 3 23:30:02 2018 - [info] 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Alive Servers:
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306)
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306)
Thu May 3 23:30:02 2018 - [info] Alive Slaves:
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Not candidate for the new Master (no_master is set)
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu May 3 23:30:02 2018 - [info] Checking slave configurations..
Thu May 3 23:30:02 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.1.204(192.168.1.204:3306).
Thu May 3 23:30:02 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.1.205(192.168.1.205:3306).
Thu May 3 23:30:02 2018 - [info] Checking replication filtering settings..
Thu May 3 23:30:02 2018 - [info] Replication filtering check ok.
Thu May 3 23:30:02 2018 - [info] Master is down!
Thu May 3 23:30:02 2018 - [info] Terminating monitoring script.
Thu May 3 23:30:02 2018 - [info] Got exit code 20 (Master dead).
Thu May 3 23:30:02 2018 - [info] MHA::MasterFailover version 0.57.
Thu May 3 23:30:02 2018 - [info] Starting master failover.
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 1: Configuration Check Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] GTID failover mode = 0
Thu May 3 23:30:02 2018 - [info] Dead Servers:
Thu May 3 23:30:02 2018 - [info] 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Checking master reachability via MySQL(double check)...
Thu May 3 23:30:02 2018 - [info] ok.
Thu May 3 23:30:02 2018 - [info] Alive Servers:
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306)
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306)
Thu May 3 23:30:02 2018 - [info] Alive Slaves:
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Not candidate for the new Master (no_master is set)
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu May 3 23:30:02 2018 - [info] Starting Non-GTID based failover.
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Thu May 3 23:30:02 2018 - [info] Executing master IP deactivation script:
Thu May 3 23:30:02 2018 - [info] /etc/mha/mhamanager/app1/scripts/master_ip_failover --orig_master_host=192.168.1.203 --orig_master_ip=192.168.1.203 --orig_master_port=3306 --command=stop
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.207/24===
Disabling the VIP on old master: 192.168.1.203
Thu May 3 23:30:02 2018 - [info] done.
Thu May 3 23:30:02 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu May 3 23:30:02 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 3: Master Recovery Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:194
Thu May 3 23:30:02 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Not candidate for the new Master (no_master is set)
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu May 3 23:30:02 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:194
Thu May 3 23:30:02 2018 - [info] Oldest slaves:
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Not candidate for the new Master (no_master is set)
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 3.3: Determining New Master Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu May 3 23:30:02 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu May 3 23:30:02 2018 - [info] Searching new master from slaves..
Thu May 3 23:30:02 2018 - [info] Candidate masters from the configuration file:
Thu May 3 23:30:02 2018 - [info] 192.168.1.205(192.168.1.205:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu May 3 23:30:02 2018 - [info] Non-candidate masters:
Thu May 3 23:30:02 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu May 3 23:30:02 2018 - [info] Replicating from 192.168.1.203(192.168.1.203:3306)
Thu May 3 23:30:02 2018 - [info] Not candidate for the new Master (no_master is set)
Thu May 3 23:30:02 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu May 3 23:30:02 2018 - [info] New master is 192.168.1.205(192.168.1.205:3306)
Thu May 3 23:30:02 2018 - [info] Starting master failover..
Thu May 3 23:30:02 2018 - [info]
From:
192.168.1.203(192.168.1.203:3306) (current master)
+--192.168.1.204(192.168.1.204:3306)
+--192.168.1.205(192.168.1.205:3306)
To:
192.168.1.205(192.168.1.205:3306) (new master)
+--192.168.1.204(192.168.1.204:3306)
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] * Phase 3.4: Master Log Apply Phase..
Thu May 3 23:30:02 2018 - [info]
Thu May 3 23:30:02 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu May 3 23:30:02 2018 - [info] Starting recovery on 192.168.1.205(192.168.1.205:3306)..
Thu May 3 23:30:02 2018 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu May 3 23:30:02 2018 - [info] done.
Thu May 3 23:30:02 2018 - [info] All relay logs were successfully applied.
Thu May 3 23:30:02 2018 - [info] Getting new master's binlog name and position..
Thu May 3 23:30:02 2018 - [info] mysql-bin.000006:234
Thu May 3 23:30:02 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.205', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=234, MASTER_USER='REPL_USER', MASTER_PASSWORD='xxx';
Thu May 3 23:30:02 2018 - [info] Executing master IP activate script:
Thu May 3 23:30:02 2018 - [info] /etc/mha/mhamanager/app1/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.203 --orig_master_ip=192.168.1.203 --orig_master_port=3306 --new_master_host=192.168.1.205 --new_master_ip=192.168.1.205 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.207/24===
Enabling the VIP - 192.168.1.207/24 on the new master - 192.168.1.205
Thu May 3 23:30:10 2018 - [info] OK.
Thu May 3 23:30:10 2018 - [info] Setting read_only=0 on 192.168.1.205(192.168.1.205:3306)..
Thu May 3 23:30:10 2018 - [info] ok.
Thu May 3 23:30:10 2018 - [info] ** Finished master recovery successfully.
Thu May 3 23:30:10 2018 - [info] * Phase 3: Master Recovery Phase completed.
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] * Phase 4: Slaves Recovery Phase..
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] -- Slave diff file generation on host 192.168.1.204(192.168.1.204:3306) started, pid: 7311. Check tmp log /etc/mha/mhamanager/app1/192.168.1.204_3306_20180503233002.log if it takes time..
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] Log messages from 192.168.1.204 ...
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu May 3 23:30:10 2018 - [info] End of log messages from 192.168.1.204.
Thu May 3 23:30:10 2018 - [info] -- 192.168.1.204(192.168.1.204:3306) has the latest relay log events.
Thu May 3 23:30:10 2018 - [info] Generating relay diff files from the latest slave succeeded.
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu May 3 23:30:10 2018 - [info]
Thu May 3 23:30:10 2018 - [info] -- Slave recovery on host 192.168.1.204(192.168.1.204:3306) started, pid: 7313. Check tmp log /etc/mha/mhamanager/app1/192.168.1.204_3306_20180503233002.log if it takes time..
Thu May 3 23:30:11 2018 - [info]
Thu May 3 23:30:11 2018 - [info] Log messages from 192.168.1.204 ...
Thu May 3 23:30:11 2018 - [info]
Thu May 3 23:30:11 2018 - [info] Starting recovery on 192.168.1.204(192.168.1.204:3306)..
Thu May 3 23:30:11 2018 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu May 3 23:30:11 2018 - [info] done.
Thu May 3 23:30:11 2018 - [info] All relay logs were successfully applied.
Thu May 3 23:30:11 2018 - [info] Resetting slave 192.168.1.204(192.168.1.204:3306) and starting replication from the new master 192.168.1.205(192.168.1.205:3306)..
Thu May 3 23:30:11 2018 - [info] Executed CHANGE MASTER.
Thu May 3 23:30:11 2018 - [info] Slave started.
Thu May 3 23:30:11 2018 - [info] End of log messages from 192.168.1.204.
Thu May 3 23:30:11 2018 - [info] -- Slave recovery on host 192.168.1.204(192.168.1.204:3306) succeeded.
Thu May 3 23:30:11 2018 - [info] All new slave servers recovered successfully.
Thu May 3 23:30:11 2018 - [info]
Thu May 3 23:30:11 2018 - [info] * Phase 5: New master cleanup phase..
Thu May 3 23:30:11 2018 - [info]
Thu May 3 23:30:11 2018 - [info] Resetting slave info on the new master..
Thu May 3 23:30:11 2018 - [info] 192.168.1.205: Resetting slave info succeeded.
Thu May 3 23:30:11 2018 - [info] Master failover to 192.168.1.205(192.168.1.205:3306) completed successfully.
Thu May 3 23:30:11 2018 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.1.203(192.168.1.203:3306) to 192.168.1.205(192.168.1.205:3306) succeeded
Master 192.168.1.203(192.168.1.203:3306) is down!
Check MHA Manager logs at mha-manager:/etc/mha/mhamanager/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.203(192.168.1.203:3306)
The latest slave 192.168.1.204(192.168.1.204:3306) has all relay logs for recovery.
Selected 192.168.1.205(192.168.1.205:3306) as a new master.
192.168.1.205(192.168.1.205:3306): OK: Applying all logs succeeded.
192.168.1.205(192.168.1.205:3306): OK: Activated master IP address.
192.168.1.204(192.168.1.204:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.204(192.168.1.204:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.205(192.168.1.205:3306)
192.168.1.205(192.168.1.205:3306): Resetting slave info succeeded.
Master failover to 192.168.1.205(192.168.1.205:3306) completed successfully.

切换成功 192.168.1.205 成为master节点

在新主库中执行一些事务
create table wwj.t4 select * from wwj.t1;
insert into wwj.t1 values(8,'fwefwef','天津');

重新启动原主库
mysqld_safe --defaults-file=/etc/my5.7_3306.cnf --user=mysql &
设置同步信息
CHANGE MASTER TO MASTER_HOST='192.168.1.205', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=234, MASTER_USER='REPL_USER', MASTER_PASSWORD='repl1234';
start slave;

成功将原主机加入集群！

重新启动mha 监控
nohup masterha_manager --conf= /etc/mha/mhamanager/app1/conf/app1.cnf --ignore_last_failover --remove_dead_master_conf > /tmp/mha_manager.log < /dev/null 2>&1 &
Fri May 4 06:46:25 2018 - [info] Starting ping health check on 192.168.1.205(192.168.1.205:3306)..
Fri May 4 06:46:25 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

3.手动failover
手动 failover，这种场景意味着在业务上没有启用 MHA 自动切换功能，当主服务器故障时，人工手动调用 MHA 来进行故障切换操作，进行手动切换命令如下:

注意:如果，MHA manager 检测到没有 dead 的 server，将报错，并结束 failover:

(1).关闭MHA manager
masterha_stop --conf= /etc/mha/mhamanager/app1/conf/app1.cnf

(2).执行切换命令
masterha_master_switch --master_state=dead --conf=/etc/mha/mhamanager/app1/conf/app1.cnf
--dead_master_host=192.168.1.205 --dead_master_port=3306 --new_master_host=192.168.1.203 --new_master_port=3306 --ignore_last_failover

4.MHA在线切换

MHA 在线切换是 MHA 除了自动监控切换换提供的另外一种方式，多用于诸如硬件升级， MySQL 数据库迁移等等。该方式提供快速切换和优雅的阻塞写入，无关关闭原有服务器，整个切换过程在 0.5-2s 的时间左右，大大减少了停机时间。

MHA 在线切换基本步骤:

a、检测 MHA 配置置及确认当前 master
b、决定新的 master
c、阻塞写入到当前 master
d、等待所有从服务器与现有 master 完成同步
e、在新 master 授予写权限，以及并行切换从库
f、重置原 master 为新 master 的 slave

(1).执行切换(需要先停止MHA)
masterha_master_switch --conf=/etc/mha/mhamanager/app1/conf/app1.cnf --master_state=alive --new_master_host=192.168.1.203 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0

--orig_master_is_new_slave 切换时加上此参数是将原 master 变为 slave 节点，如果不加此参数，原来的 master 将不启动
--running_updates_limit=10000 切换时候选 master 如果有延迟的话，mha 切换不能成功，加上此参数表示延迟在此时间范围内都可切换(单位为 s)，但是切换的时间长短是由 recover 时 relay 日志的大小决定

点击(此处)折叠或打开

Fri May 4 07:48:44 2018 - [info] MHA::MasterRotate version 0.57.
Fri May 4 07:48:44 2018 - [info] Starting online master switch..
Fri May 4 07:48:44 2018 - [info]
Fri May 4 07:48:44 2018 - [info] * Phase 1: Configuration Check Phase..
Fri May 4 07:48:44 2018 - [info]
Fri May 4 07:48:44 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 4 07:48:44 2018 - [info] Reading application default configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Fri May 4 07:48:44 2018 - [info] Reading server configuration from /etc/mha/mhamanager/app1/conf/app1.cnf..
Fri May 4 07:48:45 2018 - [info] GTID failover mode = 0
Fri May 4 07:48:45 2018 - [info] Current Alive Master: 192.168.1.205(192.168.1.205:3306)
Fri May 4 07:48:45 2018 - [info] Alive Slaves:
Fri May 4 07:48:45 2018 - [info] 192.168.1.203(192.168.1.203:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri May 4 07:48:45 2018 - [info] Replicating from 192.168.1.205(192.168.1.205:3306)
Fri May 4 07:48:45 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Fri May 4 07:48:45 2018 - [info] 192.168.1.204(192.168.1.204:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Fri May 4 07:48:45 2018 - [info] Replicating from 192.168.1.205(192.168.1.205:3306)
Fri May 4 07:48:45 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] Checking MHA is not monitoring or doing failover..
Fri May 4 07:48:45 2018 - [info] Checking replication health on 192.168.1.203..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] Checking replication health on 192.168.1.204..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] 192.168.1.203 can be new master.
Fri May 4 07:48:45 2018 - [info]
From:
192.168.1.205(192.168.1.205:3306) (current master)
+--192.168.1.203(192.168.1.203:3306)
+--192.168.1.204(192.168.1.204:3306)
To:
192.168.1.203(192.168.1.203:3306) (new master)
+--192.168.1.204(192.168.1.204:3306)
+--192.168.1.205(192.168.1.205:3306)
Fri May 4 07:48:45 2018 - [info] Checking whether 192.168.1.203(192.168.1.203:3306) is ok for the new master..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] 192.168.1.205(192.168.1.205:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri May 4 07:48:45 2018 - [info] 192.168.1.205(192.168.1.205:3306): Resetting slave pointing to the dummy host.
Fri May 4 07:48:45 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] * Phase 2: Rejecting updates Phase..
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] Executing master ip online change script to disable write on the current master:
Fri May 4 07:48:45 2018 - [info] /etc/mha/mhamanager/app1/scripts/master_ip_online_change --command=stop --orig_master_host=192.168.1.205 --orig_master_ip=192.168.1.205 --orig_master_port=3306 --orig_master_user='root' --new_master_host=192.168.1.203 --new_master_ip=192.168.1.203 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
***************************************************************
Disabling the VIP - 192.168.1.207/24 on old master: 192.168.1.205
***************************************************************
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri May 4 07:48:45 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] Orig master binlog:pos is mysql-bin.000008:1180.
Fri May 4 07:48:45 2018 - [info] Waiting to execute all relay logs on 192.168.1.203(192.168.1.203:3306)..
Fri May 4 07:48:45 2018 - [info] master_pos_wait(mysql-bin.000008:1180) completed on 192.168.1.203(192.168.1.203:3306). Executed 0 events.
Fri May 4 07:48:45 2018 - [info] done.
Fri May 4 07:48:45 2018 - [info] Getting new master's binlog name and position..
Fri May 4 07:48:45 2018 - [info] mysql-bin.000016:194
Fri May 4 07:48:45 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.203', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000016', MASTER_LOG_POS=194, MASTER_USER='REPL_USER', MASTER_PASSWORD='xxx';
Fri May 4 07:48:45 2018 - [info] Executing master ip online change script to allow write on the new master:
Fri May 4 07:48:45 2018 - [info] /etc/mha/mhamanager/app1/scripts/master_ip_online_change --command=start --orig_master_host=192.168.1.205 --orig_master_ip=192.168.1.205 --orig_master_port=3306 --orig_master_user='root' --new_master_host=192.168.1.203 --new_master_ip=192.168.1.203 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
***************************************************************
Enabling the VIP - 192.168.1.207/24 on new master: 192.168.1.203
***************************************************************
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] Setting read_only=0 on 192.168.1.203(192.168.1.203:3306)..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] * Switching slaves in parallel..
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] -- Slave switch on host 192.168.1.204(192.168.1.204:3306) started, pid: 14205
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] Log messages from 192.168.1.204 ...
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] Waiting to execute all relay logs on 192.168.1.204(192.168.1.204:3306)..
Fri May 4 07:48:45 2018 - [info] master_pos_wait(mysql-bin.000008:1180) completed on 192.168.1.204(192.168.1.204:3306). Executed 0 events.
Fri May 4 07:48:45 2018 - [info] done.
Fri May 4 07:48:45 2018 - [info] Resetting slave 192.168.1.204(192.168.1.204:3306) and starting replication from the new master 192.168.1.203(192.168.1.203:3306)..
Fri May 4 07:48:45 2018 - [info] Executed CHANGE MASTER.
Fri May 4 07:48:45 2018 - [info] Slave started.
Fri May 4 07:48:45 2018 - [info] End of log messages from 192.168.1.204 ...
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] -- Slave switch on host 192.168.1.204(192.168.1.204:3306) succeeded.
Fri May 4 07:48:45 2018 - [info] Unlocking all tables on the orig master:
Fri May 4 07:48:45 2018 - [info] Executing UNLOCK TABLES..
Fri May 4 07:48:45 2018 - [info] ok.
Fri May 4 07:48:45 2018 - [info] Starting orig master as a new slave..
Fri May 4 07:48:45 2018 - [info] Resetting slave 192.168.1.205(192.168.1.205:3306) and starting replication from the new master 192.168.1.203(192.168.1.203:3306)..
Fri May 4 07:48:45 2018 - [info] Executed CHANGE MASTER.
Fri May 4 07:48:45 2018 - [info] Slave started.
Fri May 4 07:48:45 2018 - [info] All new slave servers switched successfully.
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] * Phase 5: New master cleanup phase..
Fri May 4 07:48:45 2018 - [info]
Fri May 4 07:48:45 2018 - [info] 192.168.1.203: Resetting slave info succeeded.
Fri May 4 07:48:45 2018 - [info] Switching master to 192.168.1.203(192.168.1.203:3306) completed successfully.