mysql mha keepalived_MySQL高可用方案 MHA之四 keepalived 半同步复制

主从架构(开启5.7的增强半同步模式)

master:

10.150.20.90      ed3jrdba90

slave:

10.150.20.97      ed3jrdba97

10.150.20.132    ed3jrdba132

manager:

10.150.20.95      ed3jrdba95

vip:10.150.20.200

MySQL:5.7

os:CentOS 7.3

网卡名:ens3

这里,ed3jrdba90、ed3jrdba97部署keepalived,vip绑定在ed3jrdba90,并且ed3jrdba90节点为MHA master主库,ed3jrdba97为candidate_master。

1:配置文件candidate_masterkeepalived.conf如下:

# cat /etc/keepalived/keepalived.conf

! Configuration File for keepalived

global_defs {

notification_email {

eric@gmail.cn

}

notification_email_from eric@gmail.cn

smtp_server 127.0.0.1

smtp_connect_timeout 30

router_id mysql_mha

}

vrrp_instance VI_1 {

state master

interface ens3

nopreempt

virtual_router_id 51

priority 100

advert_int 1

authentication {

auth_type PASS

auth_pass 1111

}

virtual_ipaddress {

10.150.20.200

}

}

virtual_server 10.150.20.200 33061 {

delay_loop 6

lb_algo wrr

lb_kind DR

nat_mask 255.255.255.0

persistence_timeout 50

protocol TCP

real_server 10.150.20.90 33061 {

weight 1

notify_down /etc/keepalived/shutdown_keepalived.sh

TCP_CHECK {

connect_timeout 10

nb_get_retry 3

connect_port 33061

}

}

}

# cat /etc/keepalived/shutdown_keepalived.sh

systemctl stop keepalived

检查vip绑定在ed3jrdba90的ens3:1

# ip a

1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

2: ens3: mtu 1500 qdisc pfifo_fast state UP qlen 1000

link/ether 54:52:00:01:61:2b brd ff:ff:ff:ff:ff:ff

inet 10.150.20.90/24 brd 10.150.20.255 scope global ens3

valid_lft forever preferred_lft forever

inet 10.150.20.200/32 scope global ens3

valid_lft forever preferred_lft forever

ed3jrdba95节点:

MHA manager配置文件

# vi /etc/mysql_mha/app1.cnf

[server default]

manager_workdir=/data/mysql_mha/app1

manager_log=/data/mysql_mha/app1-manager.log

master_binlog_dir=/data/mysql_33061/logs

user=mha_monitor

password=mha_monitor

ping_interval=5

remote_workdir=/data/mysql_mha/app1

master_ip_failover_script= /usr/local/bin/master_ip_failover

#master_ip_online_change_script= /usr/local/bin/master_ip_online_change

secondary_check_script=/usr/local/bin/masterha_secondary_check -s 10.150.20.97 -s 10.150.20.90

repl_user=replicator

repl_password=replicator

report_script=/usr/local/bin/send_report

shutdown_script=""

ssh_user=root

[server1]

hostname=10.150.20.90

port=33061

[server2]

hostname=10.150.20.97

port=33061

candidate_master=1

check_repl_delay=0

[server3]

hostname=10.150.20.132

port=33061

master_ip_failover脚本:

# cat /usr/local/bin/master_ip_failover

8f900a89c6347c561fdf2122f13be562.png

961ddebeb323a10fe0623af514929fc1.png

#!/usr/bin/env perluse strict;

use warnings FATAL=> 'all';

use Getopt::Long;

my (

$command,$ssh_user,$orig_master_host,$orig_master_ip,

$orig_master_port, $new_master_host, $new_master_ip,$new_master_port

);

my $vip= '10.150.20.200';

my $ssh_start_vip= "systemctl start keepalived";

my $ssh_stop_vip= "systemctl stop keepalived";

GetOptions('command=s' =>\$command,'ssh_user=s' =>\$ssh_user,'orig_master_host=s' =>\$orig_master_host,'orig_master_ip=s'=>\$orig_master_ip,'orig_master_port=i'=>\$orig_master_port,'new_master_host=s'=>\$new_master_host,'new_master_ip=s'=>\$new_master_ip,'new_master_port=i'=>\$new_master_port,

);

exit&main();

sub main {

print"\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";if ( $command eq "stop" || $command eq "stopssh") {

my $exit_code= 1;

eval {

print"Disabling the VIP on old master: $orig_master_host \n";&stop_vip();

$exit_code= 0;

};if($@) {

warn"Got Error: $@\n";

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq"start") {

my $exit_code= 10;

eval {

print"Enabling the VIP - $vip on the new master - $new_master_host \n";&start_vip();

$exit_code= 0;

};if($@) {

warn $@;

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq"status") {

print"Checking the Status of the script.. OK \n";

#`ssh $ssh_user\@cluster1 \"$ssh_start_vip \"`;

exit 0;

}else{&usage();

exit1;

}

}

# A simple system call that enable the VIP on the new master

sub start_vip() {

`ssh $ssh_user\@$new_master_host \"$ssh_start_vip \"`;

}

# A simple system call that disable the VIP on the old_master

sub stop_vip() {

return0unless ($ssh_user);

`ssh $ssh_user\@$orig_master_host \"$ssh_stop_vip \"`;

}

sub usage {

print"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

}

master_ip_failover

检测mha复制环境:

# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

8f900a89c6347c561fdf2122f13be562.png

961ddebeb323a10fe0623af514929fc1.png

Thu Dec 13 15:53:37 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..

Thu Dec13 15:53:37 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..

Thu Dec13 15:53:37 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..

Thu Dec13 15:53:37 2018 - [info] MHA::MasterMonitor version 0.58.

Thu Dec13 15:53:39 2018 - [info] GTID failover mode = 0Thu Dec13 15:53:39 2018 - [info] Dead Servers:

Thu Dec13 15:53:39 2018 - [info] Alive Servers:

Thu Dec13 15:53:39 2018 - [info] 10.150.20.90(10.150.20.90:33061)

Thu Dec13 15:53:39 2018 - [info] 10.150.20.97(10.150.20.97:33061)

Thu Dec13 15:53:39 2018 - [info] 10.150.20.132(10.150.20.132:33061)

Thu Dec13 15:53:39 2018 - [info] Alive Slaves:

Thu Dec13 15:53:39 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 15:53:39 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 15:53:39 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 15:53:39 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 15:53:39 2018 - [info] Current Alive Master: 10.150.20.90(10.150.20.90:33061)

Thu Dec13 15:53:39 2018 - [info] Checking slave configurations..

Thu Dec13 15:53:39 2018 - [info] read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).

Thu Dec13 15:53:39 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).

Thu Dec13 15:53:39 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).

Thu Dec13 15:53:39 2018 - [info] Checking replication filtering settings..

Thu Dec13 15:53:39 2018 - [info] binlog_do_db= , binlog_ignore_db=Thu Dec13 15:53:39 2018 - [info] Replication filtering check ok.

Thu Dec13 15:53:39 2018 - [info] GTID (with auto-pos) is not supported

Thu Dec13 15:53:39 2018 - [info] Starting SSH connection tests..

Thu Dec13 15:53:41 2018 - [info] All SSH connection tests passed successfully.

Thu Dec13 15:53:41 2018 - [info] Checking MHA Node version..

Thu Dec13 15:53:42 2018 - [info] Version check ok.

Thu Dec13 15:53:42 2018 - [info] Checking SSH publickey authentication settings on the current master..

Thu Dec13 15:53:42 2018 - [info] HealthCheck: SSH to 10.150.20.90is reachable.

Thu Dec13 15:53:42 2018 - [info] Master MHA Node version is 0.58.

Thu Dec13 15:53:42 2018 - [info] Checking recovery script configurations on 10.150.20.90(10.150.20.90:33061)..

Thu Dec13 15:53:42 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000009Thu Dec13 15:53:42 2018 - [info] Connecting to root@10.150.20.90(10.150.20.90:22)..

Creating/data/mysql_mha/app1 ifnot exists.. ok.

Checking output directory is accessible or not..

ok.

Binlog found at/data/mysql_33061/logs, up to mysql-bin.000009Thu Dec13 15:53:42 2018 - [info] Binlog setting check done.

Thu Dec13 15:53:42 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Thu Dec13 15:53:42 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx

Thu Dec13 15:53:42 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22)..

Checking slave recovery environment settings..

Opening/data/mysql_33061/logs/relay-log.info... ok.

Relay log found at/data/mysql_33061/logs, up to relaylog.000002Temporary relay logfile is /data/mysql_33061/logs/relaylog.000002Checkingifsuper_read_only is defined and turned on.. not present or turned off, ignoring.

Testing mysql connection and privileges..

mysql: [Warning] Using a password on the command line interface can be insecure.done.

Testing mysqlbinlog output..done.

Cleaning up testfile(s).. done.

Thu Dec13 15:53:43 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx

Thu Dec13 15:53:43 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22)..

Checking slave recovery environment settings..

Opening/data/mysql_33061/logs/relay-log.info... ok.

Relay log found at/data/mysql_33061/data, up to cgdb-relay-bin.000002Temporary relay logfile is /data/mysql_33061/data/cgdb-relay-bin.000002Checkingifsuper_read_only is defined and turned on.. not present or turned off, ignoring.

Testing mysql connection and privileges..

mysql: [Warning] Using a password on the command line interface can be insecure.done.

Testing mysqlbinlog output..done.

Cleaning up testfile(s).. done.

Thu Dec13 15:53:43 2018 - [info] Slaves settings check done.

Thu Dec13 15:53:43 2018 - [info]10.150.20.90(10.150.20.90:33061) (current master)+--10.150.20.97(10.150.20.97:33061)+--10.150.20.132(10.150.20.132:33061)

Thu Dec13 15:53:43 2018 - [info] Checking replication health on 10.150.20.97..

Thu Dec13 15:53:43 2018 - [info] ok.

Thu Dec13 15:53:43 2018 - [info] Checking replication health on 10.150.20.132..

Thu Dec13 15:53:43 2018 - [info] ok.

Thu Dec13 15:53:43 2018 - [info] Checking master_ip_failover_script status:

Thu Dec13 15:53:43 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Checking the Status of the script.. OK

Thu Dec13 15:53:43 2018 - [info] OK.

Thu Dec13 15:53:43 2018 -[warning] shutdown_script is not defined.

Thu Dec13 15:53:43 2018 - [info] Got exit code 0(Not master dead).

MySQL Replication Health is OK.

输出

检测MHA manager是否开启:

# masterha_check_status --conf=/etc/mysql_mha/app1.cnf

app1 (pid:19724) is running(0:PING_OK), master:10.150.20.90

开启日志监控

# tail -f /data/mysql_mha/app1-manager.log

故障切换测试:

节点ed3jrdba90:关闭主库MySQL:

# ps -ef |grep mysql

root 18908 1 0 12月12 ? 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql_33061/data --pid-file=/data/mysql_33061/run/mysql.pid

mysql 20401 18908 0 12月12 ? 00:03:02 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql_33061/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysql_33061/logs/mysqlerror.log --open-files-limit=65535 --pid-file=/data/mysql_33061/run/mysql.pid --socket=/data/mysql_33061/run/mysql.sock --port=33061

root 31104 29976 0 15:57 pts/0 00:00:00 grep --color=auto mysql

# kill -9 20401 18908

节点ed3jrdba97:主库MySQL宕掉之后,vip飘到节点ed3jrdba97

# ip a

1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

2: ens3: mtu 1500 qdisc pfifo_fast state UP qlen 1000

link/ether 54:52:00:49:48:92 brd ff:ff:ff:ff:ff:ff

inet 10.150.20.97/24 brd 10.150.20.255 scope global ens3

valid_lft forever preferred_lft forever

inet 10.150.20.200/32 scope global ens3

valid_lft forever preferred_lft forever

节点ed3jrdba95:在MHA manager节点的日志显示了MHA切换过程

8f900a89c6347c561fdf2122f13be562.png

961ddebeb323a10fe0623af514929fc1.png

Thu Dec 13 16:37:54 2018 - [warning] Got error on MySQL select ping: 2006(MySQL server has gone away)

Thu Dec13 16:37:54 2018 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 10.150.20.97 -s 10.150.20.90 --user=root --master_host=10.150.20.90 --master_ip=10.150.20.90 --master_port=33061 --master_user=mha_monitor --master_password=mha_monitor --ping_type=SELECT

Thu Dec13 16:37:54 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin

Thu Dec13 16:37:55 2018 - [info] HealthCheck: SSH to 10.150.20.90is reachable.

Monitoring server10.150.20.97 is reachable, Master is not reachable from 10.150.20.97. OK.

Monitoring server10.150.20.90 is reachable, Master is not reachable from 10.150.20.90. OK.

Thu Dec13 16:37:55 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.

Thu Dec13 16:37:59 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on'10.150.20.90'(111))

Thu Dec 13 16:37:59 2018 - [warning] Connection failed 2 time(s)..

Thu Dec13 16:38:04 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on'10.150.20.90'(111))

Thu Dec 13 16:38:04 2018 - [warning] Connection failed 3 time(s)..

Thu Dec13 16:38:09 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on'10.150.20.90'(111))

Thu Dec 13 16:38:09 2018 - [warning] Connection failed 4 time(s)..

Thu Dec13 16:38:09 2018 - [warning] Master is not reachable from health checker!Thu Dec13 16:38:09 2018 - [warning] Master 10.150.20.90(10.150.20.90:33061) is not reachable!Thu Dec13 16:38:09 2018 -[warning] SSH is reachable.

Thu Dec13 16:38:09 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mysql_mha/app1.cnf again, and trying to connect to all servers to check server status..

Thu Dec13 16:38:09 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..

Thu Dec13 16:38:09 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..

Thu Dec13 16:38:09 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..

Thu Dec13 16:38:10 2018 - [info] GTID failover mode = 0Thu Dec13 16:38:10 2018 - [info] Dead Servers:

Thu Dec13 16:38:10 2018 - [info] 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:10 2018 - [info] Alive Servers:

Thu Dec13 16:38:10 2018 - [info] 10.150.20.97(10.150.20.97:33061)

Thu Dec13 16:38:10 2018 - [info] 10.150.20.132(10.150.20.132:33061)

Thu Dec13 16:38:10 2018 - [info] Alive Slaves:

Thu Dec13 16:38:10 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:10 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:10 2018 - [info] Primary candidate forthe new Master (candidate_master is set)

Thu Dec13 16:38:10 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:10 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:10 2018 - [info] Checking slave configurations..

Thu Dec13 16:38:10 2018 - [info] read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).

Thu Dec13 16:38:10 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).

Thu Dec13 16:38:10 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).

Thu Dec13 16:38:10 2018 - [info] Checking replication filtering settings..

Thu Dec13 16:38:10 2018 - [info] Replication filtering check ok.

Thu Dec13 16:38:10 2018 - [info] Master is down!Thu Dec13 16:38:10 2018 - [info] Terminating monitoring script.

Thu Dec13 16:38:10 2018 - [info] Got exit code 20(Master dead).

Thu Dec13 16:38:10 2018 - [info] MHA::MasterFailover version 0.58.

Thu Dec13 16:38:10 2018 - [info] Starting master failover.

Thu Dec13 16:38:10 2018 - [info]

Thu Dec13 16:38:10 2018 - [info] * Phase 1: Configuration Check Phase..

Thu Dec13 16:38:10 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] GTID failover mode = 0Thu Dec13 16:38:12 2018 - [info] Dead Servers:

Thu Dec13 16:38:12 2018 - [info] 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info] Checking master reachability via MySQL(doublecheck)...

Thu Dec13 16:38:12 2018 - [info] ok.

Thu Dec13 16:38:12 2018 - [info] Alive Servers:

Thu Dec13 16:38:12 2018 - [info] 10.150.20.97(10.150.20.97:33061)

Thu Dec13 16:38:12 2018 - [info] 10.150.20.132(10.150.20.132:33061)

Thu Dec13 16:38:12 2018 - [info] Alive Slaves:

Thu Dec13 16:38:12 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:12 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info] Primary candidate forthe new Master (candidate_master is set)

Thu Dec13 16:38:12 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:12 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info] Starting Non-GTID based failover.

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] ** Phase 1: Configuration Check Phase completed.

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] * Phase 2: Dead Master Shutdown Phase..

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] Forcing shutdown so that applications never connect to the current master..

Thu Dec13 16:38:12 2018 - [info] Executing master IP deactivation script:

Thu Dec13 16:38:12 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --command=stopssh --ssh_user=root

IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Disabling the VIP on old master:10.150.20.90Thu Dec13 16:38:12 2018 - [info] done.

Thu Dec13 16:38:12 2018 -[warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.

Thu Dec13 16:38:12 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] * Phase 3: Master Recovery Phase..

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000011:154Thu Dec13 16:38:12 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):

Thu Dec13 16:38:12 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:12 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info] Primary candidate forthe new Master (candidate_master is set)

Thu Dec13 16:38:12 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:12 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000011:154Thu Dec13 16:38:12 2018 - [info] Oldest slaves:

Thu Dec13 16:38:12 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:12 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info] Primary candidate forthe new Master (candidate_master is set)

Thu Dec13 16:38:12 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:12 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..

Thu Dec 13 16:38:12 2018 - [info]

Thu Dec13 16:38:12 2018 - [info] Fetching dead master's binary logs..

Thu Dec 13 16:38:12 2018 - [info] Executing command on the dead master 10.150.20.90(10.150.20.90:33061): save_binary_logs --command=save --start_file=mysql-bin.000011 --start_pos=154 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58Creating/data/mysql_mha/app1 ifnot exists.. ok.

Concat binary/relay logs from mysql-bin.000011 pos 154 to mysql-bin.000011 EOF into /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position0 to 154.. ok.

Dumping effective binlog data from/data/mysql_33061/logs/mysql-bin.000011 position 154 to tail(177).. ok.

Binlog Checksum enabled

Concat succeeded.

Thu Dec13 16:38:12 2018 - [info] scp from root@10.150.20.90:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog to local:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog succeeded.

Thu Dec13 16:38:13 2018 - [info] HealthCheck: SSH to 10.150.20.97is reachable.

Thu Dec13 16:38:13 2018 - [info] HealthCheck: SSH to 10.150.20.132is reachable.

Thu Dec13 16:38:13 2018 - [info]

Thu Dec13 16:38:13 2018 - [info] * Phase 3.3: Determining New Master Phase..

Thu Dec13 16:38:13 2018 - [info]

Thu Dec13 16:38:13 2018 - [info] Finding the latest slave that has all relay logs forrecovering other slaves..

Thu Dec13 16:38:13 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.

Thu Dec13 16:38:13 2018 - [info] Searching new master from slaves..

Thu Dec13 16:38:13 2018 - [info] Candidate masters from the configuration file:

Thu Dec13 16:38:13 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:38:13 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)

Thu Dec13 16:38:13 2018 - [info] Primary candidate forthe new Master (candidate_master is set)

Thu Dec13 16:38:13 2018 - [info] Non-candidate masters:

Thu Dec13 16:38:13 2018 - [info] Searching from candidate_master slaves whichhave received the latest relay log events..

Thu Dec13 16:38:13 2018 - [info] New master is 10.150.20.97(10.150.20.97:33061)

Thu Dec13 16:38:13 2018 - [info] Starting master failover..

Thu Dec13 16:38:13 2018 - [info]

From:10.150.20.90(10.150.20.90:33061) (current master)+--10.150.20.97(10.150.20.97:33061)+--10.150.20.132(10.150.20.132:33061)

To:10.150.20.97(10.150.20.97:33061) (new master)+--10.150.20.132(10.150.20.132:33061)

Thu Dec13 16:38:13 2018 - [info]

Thu Dec13 16:38:13 2018 - [info] * Phase 3.4: New Master Diff Log Generation Phase..

Thu Dec13 16:38:13 2018 - [info]

Thu Dec13 16:38:13 2018 - [info] This server has all relay logs. No need to generate difffiles from the latest slave.

Thu Dec13 16:38:13 2018 - [info] Sending binlog..

Thu Dec13 16:38:14 2018 - [info] scp from local:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog to root@10.150.20.97:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog succeeded.

Thu Dec13 16:38:14 2018 - [info]

Thu Dec13 16:38:14 2018 - [info] * Phase 3.5: Master Log Apply Phase..

Thu Dec13 16:38:14 2018 - [info]

Thu Dec13 16:38:14 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.

Thu Dec13 16:38:14 2018 - [info] Starting recovery on 10.150.20.97(10.150.20.97:33061)..

Thu Dec13 16:38:14 2018 - [info] Generating diffs succeeded.

Thu Dec13 16:38:14 2018 - [info] Waiting untilall relay logs are applied.

Thu Dec13 16:38:14 2018 - [info] done.

Thu Dec13 16:38:14 2018 - [info] Getting slave status..

Thu Dec13 16:38:14 2018 - [info] This slave(10.150.20.97)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000011:154). No need to recover from Exec_Master_Log_Pos.

Thu Dec 13 16:38:14 2018 - [info] Connecting to the target slave host 10.150.20.97, running recover script..

Thu Dec13 16:38:14 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha_monitor' --slave_host=10.150.20.97 --slave_ip=10.150.20.97 --slave_port=33061 --apply_files=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --timestamp=20181213163810 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx

Thu Dec13 16:38:14 2018 - [info]

MySQL client version is5.7.21. Using --binary-mode.

Applying differential binary/relay log files /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog on 10.150.20.97:33061. This may take long time...

Applying log files succeeded.

Thu Dec13 16:38:14 2018 - [info] All relay logs were successfully applied.

Thu Dec13 16:38:14 2018 - [info] Getting new master's binlog name and position..

Thu Dec 13 16:38:14 2018 - [info] mysql-bin.000018:154Thu Dec13 16:38:14 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.150.20.97', MASTER_PORT=33061, MASTER_LOG_FILE='mysql-bin.000018', MASTER_LOG_POS=154, MASTER_USER='replicator', MASTER_PASSWORD='xxx';

Thu Dec13 16:38:14 2018 - [info] Executing master IP activate script:

Thu Dec13 16:38:14 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --new_master_host=10.150.20.97 --new_master_ip=10.150.20.97 --new_master_port=33061 --new_master_user='mha_monitor' --new_master_password=xxx

Unknown option: new_master_user

Unknown option: new_master_password

IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Enabling the VIP- 10.150.20.200 on the new master - 10.150.20.97Thu Dec13 16:38:14 2018 - [info] OK.

Thu Dec13 16:38:14 2018 - [info] **Finished master recovery successfully.

Thu Dec13 16:38:14 2018 - [info] * Phase 3: Master Recovery Phase completed.

Thu Dec13 16:38:14 2018 - [info]

Thu Dec13 16:38:14 2018 - [info] * Phase 4: Slaves Recovery Phase..

Thu Dec13 16:38:14 2018 - [info]

Thu Dec13 16:38:14 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..

Thu Dec13 16:38:14 2018 - [info]

Thu Dec13 16:38:14 2018 - [info] -- Slave diff file generation on host 10.150.20.132(10.150.20.132:33061) started, pid: 21368. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181213163810.log if it takes time..

Thu Dec13 16:38:15 2018 - [info]

Thu Dec13 16:38:15 2018 - [info] Log messages from 10.150.20.132...

Thu Dec13 16:38:15 2018 - [info]

Thu Dec13 16:38:14 2018 - [info] This server has all relay logs. No need to generate difffiles from the latest slave.

Thu Dec13 16:38:15 2018 - [info] End of log messages from 10.150.20.132.

Thu Dec13 16:38:15 2018 - [info] -- 10.150.20.132(10.150.20.132:33061) has the latest relay log events.

Thu Dec13 16:38:15 2018 - [info] Generating relay difffiles from the latest slave succeeded.

Thu Dec13 16:38:15 2018 - [info]

Thu Dec13 16:38:15 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..

Thu Dec13 16:38:15 2018 - [info]

Thu Dec13 16:38:15 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) started, pid: 21370. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181213163810.log if it takes time..

Thu Dec13 16:38:17 2018 - [info]

Thu Dec13 16:38:17 2018 - [info] Log messages from 10.150.20.132...

Thu Dec13 16:38:17 2018 - [info]

Thu Dec13 16:38:15 2018 - [info] Sending binlog..

Thu Dec13 16:38:16 2018 - [info] scp from local:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog to root@10.150.20.132:/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog succeeded.

Thu Dec13 16:38:16 2018 - [info] Starting recovery on 10.150.20.132(10.150.20.132:33061)..

Thu Dec13 16:38:16 2018 - [info] Generating diffs succeeded.

Thu Dec13 16:38:16 2018 - [info] Waiting untilall relay logs are applied.

Thu Dec13 16:38:16 2018 - [info] done.

Thu Dec13 16:38:16 2018 - [info] Getting slave status..

Thu Dec13 16:38:16 2018 - [info] This slave(10.150.20.132)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000011:154). No need to recover from Exec_Master_Log_Pos.

Thu Dec 13 16:38:16 2018 - [info] Connecting to the target slave host 10.150.20.132, running recover script..

Thu Dec13 16:38:16 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --apply_files=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --timestamp=20181213163810 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx

Thu Dec13 16:38:16 2018 - [info]

MySQL client version is5.7.21. Using --binary-mode.

Applying differential binary/relay log files /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181213163810.binlog on 10.150.20.132:33061. This may take long time...

Applying log files succeeded.

Thu Dec13 16:38:16 2018 - [info] All relay logs were successfully applied.

Thu Dec13 16:38:16 2018 - [info] Resetting slave 10.150.20.132(10.150.20.132:33061) and starting replication from the new master 10.150.20.97(10.150.20.97:33061)..

Thu Dec13 16:38:16 2018 - [info] Executed CHANGE MASTER.

Thu Dec13 16:38:16 2018 - [info] Slave started.

Thu Dec13 16:38:17 2018 - [info] End of log messages from 10.150.20.132.

Thu Dec13 16:38:17 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) succeeded.

Thu Dec13 16:38:17 2018 - [info] All new slave servers recovered successfully.

Thu Dec13 16:38:17 2018 - [info]

Thu Dec13 16:38:17 2018 - [info] * Phase 5: New master cleanup phase..

Thu Dec13 16:38:17 2018 - [info]

Thu Dec13 16:38:17 2018 - [info] Resetting slave infoon the new master..

Thu Dec13 16:38:17 2018 - [info] 10.150.20.97: Resetting slave infosucceeded.

Thu Dec13 16:38:17 2018 - [info] Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.

Thu Dec13 16:38:17 2018 - [info] Deleted server1 entry from /etc/mysql_mha/app1.cnf .

Thu Dec13 16:38:17 2018 - [info]----- Failover Report -----app1: MySQL Master failover10.150.20.90(10.150.20.90:33061) to 10.150.20.97(10.150.20.97:33061) succeeded

Master10.150.20.90(10.150.20.90:33061) is down!Check MHA Manager logs at ed3jrdba95:/data/mysql_mha/app1-manager.log fordetails.

Started automated(non-interactive) failover.

Invalidated master IP address on10.150.20.90(10.150.20.90:33061)

The latest slave10.150.20.97(10.150.20.97:33061) has all relay logs forrecovery.

Selected10.150.20.97(10.150.20.97:33061) as a new master.10.150.20.97(10.150.20.97:33061): OK: Applying all logs succeeded.10.150.20.97(10.150.20.97:33061): OK: Activated master IP address.10.150.20.132(10.150.20.132:33061): This host has the latest relay log events.

Generating relaydifffiles from the latest slave succeeded.10.150.20.132(10.150.20.132:33061): OK: Applying all logs succeeded. Slave started, replicating from 10.150.20.97(10.150.20.97:33061)10.150.20.97(10.150.20.97:33061): Resetting slave infosucceeded.

Master failover to10.150.20.97(10.150.20.97:33061) completed successfully.

failover过程

MHA manager节点的masterha_manager已经停掉

# masterha_check_status --conf=/etc/mysql_mha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

app1.cnf文件被修改为:

# cat /etc/mysql_mha/app1.cnf

[server default]

manager_log=/data/mysql_mha/app1-manager.log

manager_workdir=/data/mysql_mha/app1

master_binlog_dir=/data/mysql_33061/logs

master_ip_online_change_script=/usr/local/bin/master_ip_online_change

password=mha_monitor

ping_interval=5

remote_workdir=/data/mysql_mha/app1

repl_password=replicator

repl_user=replicator

shutdown_script=""

ssh_user=root

user=mha_monitor

[server2]

hostname=10.150.20.97

port=33061

[server3]

hostname=10.150.20.132

port=33061

此时的复制关系为:

新主:10.150.20.97

从库:10.150.20.132

# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

8f900a89c6347c561fdf2122f13be562.png

961ddebeb323a10fe0623af514929fc1.png

Thu Dec 13 16:03:35 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..

Thu Dec13 16:03:35 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..

Thu Dec13 16:03:35 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..

Thu Dec13 16:03:35 2018 - [info] MHA::MasterMonitor version 0.58.

Thu Dec13 16:03:36 2018 - [info] GTID failover mode = 0Thu Dec13 16:03:36 2018 - [info] Dead Servers:

Thu Dec13 16:03:36 2018 - [info] Alive Servers:

Thu Dec13 16:03:36 2018 - [info] 10.150.20.97(10.150.20.97:33061)

Thu Dec13 16:03:36 2018 - [info] 10.150.20.132(10.150.20.132:33061)

Thu Dec13 16:03:36 2018 - [info] Alive Slaves:

Thu Dec13 16:03:36 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled

Thu Dec13 16:03:36 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)

Thu Dec13 16:03:36 2018 - [info] Current Alive Master: 10.150.20.97(10.150.20.97:33061)

Thu Dec13 16:03:36 2018 - [info] Checking slave configurations..

Thu Dec13 16:03:36 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).

Thu Dec13 16:03:36 2018 - [info] Checking replication filtering settings..

Thu Dec13 16:03:36 2018 - [info] binlog_do_db= , binlog_ignore_db=Thu Dec13 16:03:36 2018 - [info] Replication filtering check ok.

Thu Dec13 16:03:36 2018 - [info] GTID (with auto-pos) is not supported

Thu Dec13 16:03:36 2018 - [info] Starting SSH connection tests..

Thu Dec13 16:03:37 2018 - [info] All SSH connection tests passed successfully.

Thu Dec13 16:03:37 2018 - [info] Checking MHA Node version..

Thu Dec13 16:03:38 2018 - [info] Version check ok.

Thu Dec13 16:03:38 2018 - [info] Checking SSH publickey authentication settings on the current master..

Thu Dec13 16:03:38 2018 - [info] HealthCheck: SSH to 10.150.20.97is reachable.

Thu Dec13 16:03:38 2018 - [info] Master MHA Node version is 0.58.

Thu Dec13 16:03:38 2018 - [info] Checking recovery script configurations on 10.150.20.97(10.150.20.97:33061)..

Thu Dec13 16:03:38 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000015Thu Dec13 16:03:38 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22)..

Creating/data/mysql_mha/app1 ifnot exists.. ok.

Checking output directory is accessible or not..

ok.

Binlog found at/data/mysql_33061/logs, up to mysql-bin.000015Thu Dec13 16:03:38 2018 - [info] Binlog setting check done.

Thu Dec13 16:03:38 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Thu Dec13 16:03:38 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx

Thu Dec13 16:03:38 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22)..

Checking slave recovery environment settings..

Opening/data/mysql_33061/logs/relay-log.info... ok.

Relay log found at/data/mysql_33061/data, up to cgdb-relay-bin.000002Temporary relay logfile is /data/mysql_33061/data/cgdb-relay-bin.000002Checkingifsuper_read_only is defined and turned on.. not present or turned off, ignoring.

Testing mysql connection and privileges..

mysql: [Warning] Using a password on the command line interface can be insecure.done.

Testing mysqlbinlog output..done.

Cleaning up testfile(s).. done.

Thu Dec13 16:03:39 2018 - [info] Slaves settings check done.

Thu Dec13 16:03:39 2018 - [info]10.150.20.97(10.150.20.97:33061) (current master)+--10.150.20.132(10.150.20.132:33061)

Thu Dec13 16:03:39 2018 - [info] Checking replication health on 10.150.20.132..

Thu Dec13 16:03:39 2018 - [info] ok.

Thu Dec13 16:03:39 2018 - [info] Checking master_ip_failover_script status:

Thu Dec13 16:03:39 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.97 --orig_master_ip=10.150.20.97 --orig_master_port=33061IN SCRIPT TEST====systemctl stop keepalived==systemctl start keepalived===Checking the Status of the script.. OK

Thu Dec13 16:03:39 2018 - [info] OK.

Thu Dec13 16:03:39 2018 -[warning] shutdown_script is not defined.

Thu Dec13 16:03:39 2018 - [info] Got exit code 0(Not master dead).

MySQL Replication Health is OK.

主从发生变化

实战:https://yq.aliyun.com/articles/58920

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值