MHA+VIP+MySQL 安装与配置

7 篇文章 0 订阅
4 篇文章 0 订阅

一、搭建环境:

OS:Linux.el6uek.x86_64

MySQL:5.7.26

MHA:0.56

 

二、模式:

1 主 2 从 1 Manager

192.168.65.2 test2 # master MHA Node

192.168.65.3 test3 # slave1 MHA Node

192.168.65.4 test4 # slave2 MHA Node

192.168.65.5 test5 # MHA Manger

 

三、配置主从复制

本文重点不在主从复制(GTID),故此步省略,有需要可以看我其他文章。

PS:从库一定要配置 set global read_only=1 和 set global relay_log_purge=0

 

四、配置 ssh 互信

ps:一下步骤凡是标注为 ALL 的,则表示每台服务器都需执行。

 

1.修改 hosts 文件(ALL)

# vi /etc/hosts 添加如下内容:

192.168.65.2 test2

192.168.65.3 test3

192.168.65.5 test5

192.168.65.4 test4

 

2.配置 ssh 互信(ALL)

输入 ssh-keygen -t rsa 命令,按三下回车,如下:

 

[root@test2 ~]# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

ea:55:f3:80:99:ae:4d:40:ee:62:11:fd:f1:df:38:68 root@test2

The key's randomart image is:

+--[ RSA 2048]----+

| |

| . |

| . o . |

| + . * |

| . o S = |

| o + . * o |

| o o + E = . |

| . o = . . |

| o . |

+-----------------+

 

然后

[root@test2 ~]# ssh-copy-id 192.168.65.2

[root@test2 ~]# ssh-copy-id 192.168.65.3

[root@test2 ~]# ssh-copy-id 192.168.65.4

[root@test2 ~]# ssh-copy-id 192.168.65.5

 

3.验证互信(ALL)

[root@test2 ~]# ssh 192.168.65.2

[root@test2 ~]# ssh 192.168.65.3

[root@test2 ~]# ssh 192.168.65.4

[root@test2 ~]# ssh 192.168.65.5

如果都显示时间并连过去,则表示成功。

 

五、安装 MHA Manager 和 MHA Node

1.安装依赖包(ALL)

yum install -y perl-DBD-MySQL perl-Params-Validate

 

2.安装 MHA Node(ALL)

[root@test5 mha_soft]# yum install -y mha4mysql-node-0.56-0.el6.noarch.rpm

 

3.安装 MHA Manger (管理节点:192.168.65.5)

a.安装依赖包

[root@test5 mha_soft]# yum install -y

compat-db43-4.3.29-15.el6.x86_64.rpm

perl-Config-Tiny-2.12-7.1.el6.noarch.rpm

perl-Email-Date-Format-1.002-5.el6.noarch.rpm

perl-Log-Dispatch-2.27-1.el6.noarch.rpm

perl-Mail-Sender-0.8.16-3.el6.noarch.rpm perl-Mail-Sendmail-0.79-12.el6.noarch.rpm

perl-MIME-Lite-3.027-2.el6.noarch.rpm

perl-MIME-Types-1.28-2.el6.noarch.rpm

perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm

 

b.安装 MHA Manager

[root@test5 mha_soft]# yum install -y mha4mysql-manager-0.56-0.el6.noarch.rpm

 

c.查看是否安装成功

[root@test5 mha_soft]# masterha_manager --version

 

六、配置 MHA Manager 参数文件

[root@test5 ]# mkdir /etc/masterha/app1/ -p

[root@test5 ]# cd

[root@test5 app1]# vi app1.cnf

[server default]

user=root

password=123

manager_workdir=/etc/masterha/app1

manager_log=/etc/masterha/app1/manager.log

remote_workdir = /etc/masterha/app1

ssh_user=root

repl_user=rpl

repl_password=123

ping_interval=1

master_ip_failover_script=/etc/masterha/app1/master_ip_failover #master failover时执行,不配置vip时不用配

#shutdown_script=/etc/masterha/power_manager

#report_script=/etc/masterha/app1/send_report  #master failover时执行,可选

master_ip_online_change_script=/etc/masterha/app1/master_ip_online_change #masterswitchover时执行,不配置vip时不用配

 

[server1]

hostname=192.168.65.2

port=3306

master_binlog_dir=/data/mysql/data

check_repl_delay=0

candidate_master=1

 

[server2]

hostname=192.168.65.3

port=3306

master_binlog_dir=/data/mysql/data

candidate_master=1  #如果候选master有延迟的话,relay日志超过100m,failover切换不能成功,加上此参数后会忽略延迟日志大小。

check_repl_delay=0

 

[server3]

hostname=192.168.65.4

port=3308

master_binlog_dir=/data/mysql/logs

check_repl_delay=0

ignore_fail=1 #如果这个节点挂了,mha将不可用,加上这个参数,slave挂了一样可以用

no_master=1 #从不将这台主机转换为master

 

PS:

1.这里备注是让大家明白参数的含义,但正式的配置文件中一定不能有中文,否则后面的检查不能通过。

2.标红色的2个参数,是可选参数,如果没有该脚本或不需要这个功能,可以从脚本中去掉或注释即可。

3.脚本请参考另一篇文章,如果不配置,请注释,否则下一步的复制检查将不能通过。

4.不行编辑配置文件的,可以通过命令添加:

masterha_conf_host --command=add --conf=/etc/conf/masterha/app1.cnf --hostname=db101 --block=server100 --params="no_master=1;ignore_fail=1"

--block :指定段落名,就是 [ ] 里面的显示内容,如:[Server1]

 

删除:masterha_conf_host --command=delete --conf=/etc/conf/masterha/app1.cnf --block=server100

参数解释:

--command=delete|add #从一个配置文件中添加/删除一个主机记录

--block #显式指定配置文件中,主机配置文件的段落标记

--conf #指定本地配置文件路径

--hostname #指定添加记录的主机名或者IP

--params #指定参数列表,使用分号作为参数间隔,所有参数使用引号引起来

 

七、状态检查

1.ssh 互信检查

[root@test5 app1]# masterha_check_ssh --conf=/etc/masterha/app1/app1.cnf

Sat Jul 20 09:44:47 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Jul 20 09:44:47 2019 - [info] Reading application default configuration from /etc/masterha/app1/app1.cnf..

Sat Jul 20 09:44:47 2019 - [info] Reading server configuration from /etc/masterha/app1/app1.cnf..

Sat Jul 20 09:44:47 2019 - [info] Starting SSH connection tests..

Sat Jul 20 09:44:48 2019 - [debug]

Sat Jul 20 09:44:47 2019 - [debug] Connecting via SSH from root@192.168.65.2(192.168.65.2:22) to

root@192.168.65.3(192.168.65.3:22)..

Sat Jul 20 09:44:47 2019 - [debug] ok.

Sat Jul 20 09:44:47 2019 - [debug] Connecting via SSH from root@192.168.65.2(192.168.65.2:22) to

root@192.168.65.4(192.168.65.4:22)..

Sat Jul 20 09:44:48 2019 - [debug] ok.

Sat Jul 20 09:44:48 2019 - [debug]

Sat Jul 20 09:44:47 2019 - [debug] Connecting via SSH from root@192.168.65.3(192.168.65.3:22) to

root@192.168.65.2(192.168.65.2:22)..

Sat Jul 20 09:44:48 2019 - [debug] ok.

Sat Jul 20 09:44:48 2019 - [debug] Connecting via SSH from root@192.168.65.3(192.168.65.3:22) to

root@192.168.65.4(192.168.65.4:22)..

Sat Jul 20 09:44:48 2019 - [debug] ok.

Sat Jul 20 09:44:49 2019 - [debug]

Sat Jul 20 09:44:48 2019 - [debug] Connecting via SSH from root@192.168.65.4(192.168.65.4:22) to

root@192.168.65.2(192.168.65.2:22)..

Sat Jul 20 09:44:48 2019 - [debug] ok.

Sat Jul 20 09:44:48 2019 - [debug] Connecting via SSH from root@192.168.65.4(192.168.65.4:22) to

root@192.168.65.3(192.168.65.3:22)..

Sat Jul 20 09:44:49 2019 - [debug] ok.

Sat Jul 20 09:44:49 2019 - [info] All SSH connection tests passed successfully.

出现 successfully 即可。

 

2.检查复制状态

[root@test5 app1]# masterha_check_repl --conf=/etc/masterha/app1/app1.cnf

Sat Jul 20 10:02:30 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Jul 20 10:02:30 2019 - [info] Reading application default configuration from /etc/masterha/app1/app1.cnf..

Sat Jul 20 10:02:30 2019 - [info] Reading server configuration from /etc/masterha/app1/app1.cnf..

Sat Jul 20 10:02:30 2019 - [info] MHA::MasterMonitor version 0.56.

Sat Jul 20 10:02:31 2019 - [info] GTID failover mode = 1

Sat Jul 20 10:02:31 2019 - [info] Dead Servers:

Sat Jul 20 10:02:31 2019 - [info] Alive Servers:

Sat Jul 20 10:02:31 2019 - [info] 192.168.65.2(192.168.65.2:3306)

Sat Jul 20 10:02:31 2019 - [info] 192.168.65.3(192.168.65.3:3306)

Sat Jul 20 10:02:31 2019 - [info] 192.168.65.4(192.168.65.4:3308)

Sat Jul 20 10:02:31 2019 - [info] Alive Slaves:

Sat Jul 20 10:02:31 2019 - [info] 192.168.65.3(192.168.65.3:3306) Version=5.7.26-log (oldest major version between

slaves) log-bin:enabled

Sat Jul 20 10:02:31 2019 - [info] GTID ON

Sat Jul 20 10:02:31 2019 - [info] Replicating from 192.168.65.2(192.168.65.2:3306)

Sat Jul 20 10:02:31 2019 - [info] Primary candidate for the new Master (candidate_master is set)

Sat Jul 20 10:02:31 2019 - [info] 192.168.65.4(192.168.65.4:3308) Version=5.7.26-log (oldest major version between

slaves) log-bin:enabled

Sat Jul 20 10:02:31 2019 - [info] GTID ON

Sat Jul 20 10:02:31 2019 - [info] Replicating from 192.168.65.2(192.168.65.2:3306)

Sat Jul 20 10:02:31 2019 - [info] Not candidate for the new Master (no_master is set)

Sat Jul 20 10:02:31 2019 - [info] Current Alive Master: 192.168.65.2(192.168.65.2:3306)

Sat Jul 20 10:02:31 2019 - [info] Checking slave configurations..

Sat Jul 20 10:02:31 2019 - [info] read_only=1 is not set on slave 192.168.65.3(192.168.65.3:3306).

Sat Jul 20 10:02:31 2019 - [info] read_only=1 is not set on slave 192.168.65.4(192.168.65.4:3308).

Sat Jul 20 10:02:31 2019 - [info] Checking replication filtering settings..

Sat Jul 20 10:02:31 2019 - [info] binlog_do_db= , binlog_ignore_db=

Sat Jul 20 10:02:31 2019 - [info] Replication filtering check ok.

Sat Jul 20 10:02:31 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.

Sat Jul 20 10:02:31 2019 - [info] Checking SSH publickey authentication settings on the current master..

Sat Jul 20 10:02:31 2019 - [info] HealthCheck: SSH to 192.168.65.2 is reachable.

Sat Jul 20 10:02:31 2019 - [info]

192.168.65.2(192.168.65.2:3306) (current master)

+--192.168.65.3(192.168.65.3:3306)

+--192.168.65.4(192.168.65.4:3308)

 

Sat Jul 20 10:02:31 2019 - [info] Checking replication health on 192.168.65.3..

Sat Jul 20 10:02:31 2019 - [info] ok.

Sat Jul 20 10:02:31 2019 - [info] Checking replication health on 192.168.65.4..

Sat Jul 20 10:02:31 2019 - [info] ok.

Sat Jul 20 10:02:31 2019 - [info] Checking master_ip_failover_script status:

Sat Jul 20 10:02:31 2019 - [info] /etc/masterha/app1/master_ip_failover --command=status --ssh_user=root --

orig_master_host=192.168.65.2 --orig_master_ip=192.168.65.2 --orig_master_port=3306

Checking the Status of the script.. OK

Sat Jul 20 10:02:31 2019 - [info] OK.

Sat Jul 20 10:02:31 2019 - [warning] shutdown_script is not defined.

Sat Jul 20 10:02:31 2019 - [info] Got exit code 0 (Not master dead).

 

MySQL Replication Health is OK.

 

八、启动 & 停止 & 状态

1.启动

[root@test5 app1]# nohup masterha_manager --conf=/etc/masterha/app1/app1.cnf > /etc/masterha/app1/mha_manager.log 2>&1 &

PS:启动后,你在master_ip_failover_script 和 master_ip_online_change_script 中配置的 VIP 就会在 master 上产生

[root@test2 mha]# ifconfig

eth0 Link encap:Ethernet HWaddr 00:0C:29:DB:FC:D1

inet addr:192.168.65.2 Bcast:192.168.65.255 Mask:255.255.255.0

inet6 addr: fe80::20c:29ff:fedb:fcd1/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:1882124 errors:0 dropped:0 overruns:0 frame:0

TX packets:1869349 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:320631172 (305.7 MiB) TX bytes:320699318 (305.8 MiB)

 

eth0:1 Link encap:Ethernet HWaddr 00:0C:29:DB:FC:D1

inet addr:192.168.65.21 Bcast:192.168.65.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

 

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:292173 errors:0 dropped:0 overruns:0 frame:0

TX packets:292173 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:64526655 (61.5 MiB) TX bytes:64526655 (61.5 MiB)

 

2.状态

[root@test5 app1]# masterha_check_status --conf=/etc/masterha/app1/app1.cnf

app1 (pid:14405) is running(0:PING_OK), master:192.168.65.2

 

3.停止

[root@test5 app1]# masterha_stop --conf=/etc/masterha/app1/app1.cnf

Stopped app1 successfully.

[1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1/app1.cnf > /etc/masterha/app1/mha_manager.log 2>&1

[root@test5 app1]# masterha_check_status --conf=/etc/masterha/app1/app1.cnf

app1 is stopped(2:NOT_RUNNING).

 

九、测试故障转移

1. master 宕机自动故障转移

a. kil 主库 mysql

[root@test2 mha]# pkill mysqld

[root@test2 mha]# ps -ef |grep mysql

root 13471 13086 0 10:19 pts/2 00:00:00 grep mysql

 

b. 查看 VIP

[root@test2 mha]# ifconfig

eth0 Link encap:Ethernet HWaddr 00:0C:29:DB:FC:D1

inet addr:192.168.65.2 Bcast:192.168.65.255 Mask:255.255.255.0

inet6 addr: fe80::20c:29ff:fedb:fcd1/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:1883143 errors:0 dropped:0 overruns:0 frame:0

TX packets:1869929 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:320716057 (305.8 MiB) TX bytes:320778073 (305.9 MiB)

 

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:292473 errors:0 dropped:0 overruns:0 frame:0

TX packets:292473 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:64558215 (61.5 MiB) TX bytes:64558215 (61.5 MiB)

 

[root@test3 mha]# ifconfig

eth0 Link encap:Ethernet HWaddr 00:0C:29:8F:0E:C5

inet addr:192.168.65.3 Bcast:192.168.65.255 Mask:255.255.255.0

inet6 addr: fe80::20c:29ff:fe8f:ec5/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:1821570 errors:0 dropped:0 overruns:0 frame:0

TX packets:1796331 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:301351935 (287.3 MiB) TX bytes:305148308 (291.0 MiB)

 

eth0:1 Link encap:Ethernet HWaddr 00:0C:29:8F:0E:C5

inet addr:192.168.65.21 Bcast:192.168.65.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

 

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:251432 errors:0 dropped:0 overruns:0 frame:0

TX packets:251432 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:56029214 (53.4 MiB) TX bytes:56029214 (53.4 MiB)

 

可以看到,这里 VIP 已经漂移到 192.168.65.3 上了(MHA 配置文件中定义了 192.168.65.3 是候选 master,所以会飘到 65.3上)

 

c. 查看 MHA Manager 日志

[root@test5 etc]# tail -f masterha/app1/manager.log

----- Failover Report -----

 

app1: MySQL Master failover 192.168.65.2(192.168.65.2:3306) to 192.168.65.3(192.168.65.3:3306) succeeded

 

Master 192.168.65.2(192.168.65.2:3306) is down!

 

Check MHA Manager logs at test5:/etc/masterha/app1/manager.log for details.

 

Started automated(non-interactive) failover.

Invalidated master IP address on 192.168.65.2(192.168.65.2:3306)

Selected 192.168.65.3(192.168.65.3:3306) as a new master.

192.168.65.3(192.168.65.3:3306): OK: Applying all logs succeeded.

192.168.65.3(192.168.65.3:3306): OK: Activated master IP address.

192.168.65.4(192.168.65.4:3308): OK: Slave started, replicating from 192.168.65.3(192.168.65.3:3306)

192.168.65.3(192.168.65.3:3306): Resetting slave info succeeded.

Master failover to 192.168.65.3(192.168.65.3:3306) completed successfully.

 

PS:这里日志截图只是一部分,最后显示 192.168.65.3 已经成功切换为新的 master了

 

d.查看实际数据库复制状态

mysql> select @@report_host;

+---------------+

| @@report_host |

+---------------+

| 192.168.65.4 |

+---------------+

1 row in set (0.00 sec)

 

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.65.3

Master_User: rpl

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: bin.000001

Read_Master_Log_Pos: 586

Relay_Log_File: relay.000002

Relay_Log_Pos: 787

Relay_Master_Log_File: bin.000001

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

 

 

可以看到 192.168.65.4 的复制关系已自动转移到新 master上。

 

2.故障机器恢复后重新加入集群

# 如果old master的Executed_Gtid_Set大于新主库中对应UUID哪一个的事物号,那么就表示,新主库丢了数据了,这个时候你

需要自行从旧主库上提取这部分数据差异出来,然后与应用研发核对这部分数据是否需要补回新主库中,这个比较蛋疼,补完数据

之后,还得把旧主库的数据干掉重新做成从库;如果旧主库的GTID与新主库对应的UUID那一个GTID相等,那么恭喜你 ,表示数据

并没有丢失。

PS:如果你是使用的传统复制,那么后续旧主库恢复之后,有没有多数据需要去切换日志里边找 latest binary log

file/position记录,这个就是主库挂掉之后,其他从库所拥有的最新的对应旧主库的binlog pos,然后,使用mysqlbinlog在旧

主库上提取这个位置之后的数据,再人工确认这些数据是否要补到新主库上。

 

a.在 MHA Manager 节点执行下面语句找到 change master 语句

[root@test5 app1]# grep 'CHANGE MASTER TO' /etc/masterha/app1/manager.log |tail -1

Sat Jul 20 10:16:45 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE

MASTER TO MASTER_HOST='192.168.65.3', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rpl',

MASTER_PASSWORD='xxx';

 

b.然后在 old master 上执行此语句(xxx换成复制用户的密码)

mysql> CHANGE

-> MASTER TO MASTER_HOST='192.168.65.3', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rpl',

-> MASTER_PASSWORD='123';

Query OK, 0 rows affected, 2 warnings (0.04 sec)

 

mysql> start slave;

Query OK, 0 rows affected (0.01 sec)

 

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.65.3

Master_User: rpl

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: bin.000001

Read_Master_Log_Pos: 586

Relay_Log_File: relay.000002

Relay_Log_Pos: 787

Relay_Master_Log_File: bin.000001

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

 

可以看到复制重新建立,并指向新的 master。

 

3. master 未宕机手工故障转移

a.检查 MHA 运行状态

[root@test5 app1]# masterha_check_status --conf=/etc/masterha/app1/app1.cnf

app1 is stopped(2:NOT_RUNNING).

 

b.删除 app1.failover.complete 文件

[root@test5 app1]# rm -rf app1.failover.complete

 

PS:手工切换前提,

1.MHA Manager 必须是停止状态;

2.配置文件中 manager_workdir 这个参数的路径下,不能有 app1.failover.complete 这个文件,此

文件代表故障转移完成,完成的同时 MHA Manager 会中关闭;

 

c.在线切换

[root@test5 app1]# masterha_master_switch --conf=/etc/masterha/app1/app1.cnf --master_state=alive --

new_master_host=192.168.65.2 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=100

Sat Jul 20 10:50:24 2019 - [info] ok.

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] * Switching slaves in parallel..

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] -- Slave switch on host 192.168.65.4(192.168.65.4:3308) started, pid: 14930

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] Log messages from 192.168.65.4 ...

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] Waiting to execute all relay logs on 192.168.65.4(192.168.65.4:3308)..

Sat Jul 20 10:50:24 2019 - [info] master_pos_wait(bin.000001:586) completed on 192.168.65.4(192.168.65.4:3308). Executed 0

events.

Sat Jul 20 10:50:24 2019 - [info] done.

Sat Jul 20 10:50:24 2019 - [info] Resetting slave 192.168.65.4(192.168.65.4:3308) and starting replication from the new

master 192.168.65.2(192.168.65.2:3306)..

Sat Jul 20 10:50:24 2019 - [info] Executed CHANGE MASTER.

Sat Jul 20 10:50:24 2019 - [info] Slave started.

Sat Jul 20 10:50:24 2019 - [info] End of log messages from 192.168.65.4 ...

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] -- Slave switch on host 192.168.65.4(192.168.65.4:3308) succeeded.

Sat Jul 20 10:50:24 2019 - [info] Unlocking all tables on the orig master:

Sat Jul 20 10:50:24 2019 - [info] Executing UNLOCK TABLES..

Sat Jul 20 10:50:24 2019 - [info] ok.

Sat Jul 20 10:50:24 2019 - [info] Starting orig master as a new slave..

Sat Jul 20 10:50:24 2019 - [info] Resetting slave 192.168.65.3(192.168.65.3:3306) and starting replication from the new

master 192.168.65.2(192.168.65.2:3306)..

Sat Jul 20 10:50:24 2019 - [info] Executed CHANGE MASTER.

Sat Jul 20 10:50:24 2019 - [info] Slave started.

Sat Jul 20 10:50:24 2019 - [info] All new slave servers switched successfully.

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] * Phase 5: New master cleanup phase..

Sat Jul 20 10:50:24 2019 - [info]

Sat Jul 20 10:50:24 2019 - [info] 192.168.65.2: Resetting slave info succeeded.

Sat Jul 20 10:50:24 2019 - [info] Switching master to 192.168.65.2(192.168.65.2:3306) completed successfully.

 

这里只是部分输入,期间会输入 2 个 yes,意思分别是:

1.这里询问你是否需要在切换之前在当前主库上执行FLUSH NO_WRITE_TO_BINLOG TABLES语句;

2.确认是否开始执行切换;

 

可以看到 master 又切回 192.68.65.2 上了,

[root@test2 mha]# ifconfig

eth0 Link encap:Ethernet HWaddr 00:0C:29:DB:FC:D1

inet addr:192.168.65.2 Bcast:192.168.65.255 Mask:255.255.255.0

inet6 addr: fe80::20c:29ff:fedb:fcd1/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:1883723 errors:0 dropped:0 overruns:0 frame:0

TX packets:1870450 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:320768727 (305.9 MiB) TX bytes:320892475 (306.0 MiB)

 

eth0:1 Link encap:Ethernet HWaddr 00:0C:29:DB:FC:D1

inet addr:192.168.65.21 Bcast:192.168.65.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

 

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:293046 errors:0 dropped:0 overruns:0 frame:0

TX packets:293046 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:64612191 (61.6 MiB) TX bytes:64612191 (61.6 MiB)

 

VIP 也回到了 192.168.65.2 上

 

masterha_master_switch 常用参数解释:

--orig_master_is_new_slave:切换时加上此参数是将原 master 变为 slave 节点,如果不加此参数,原来的 master 将不会作为从库

加入到新主库上,要使用这个参数,配置文件中必须配置repl_password参数,因为新主库并不知道旧主库的

复制帐号密码是什么 --running_updates_limit=100:故障切换时,候选master 如果有延迟的话, mha 切换不能成功,加上此参数表示延迟在此时间范围内

都可切换(单位为s),但是切换的时间长短是由recover 时relay 日志的大小决定、 --master_state=alive : 告诉mha,当前主库是活者的,对应的还有一个值是dead,即告诉mha当前主库死了 --new_master_host : 指定新的主库是谁 --new_master_port : 指定新主库的数据库端口

 

4.主库宕机在线切换

masterha_master_switch --master_state=dead --conf=/etc/masterha/app1/app1.conf --dead_master_host=192.168.65.2 --

dead_master_port=3306 --new_master_host=192.168.65.3 --new_master_port=3306 --ignore_last_failover --

remove_dead_master_conf

 

# 如果你是跟着文档流程做到这里,在切换过程中会有两个提示,输入yes回车即可

* Master 10.10.30.169(10.10.30.169:3306) is dead. Proceed? (yes/NO): yes #让你确认下主库是否真的死了

* Starting master switch from 10.10.30.169(10.10.30.169:3306) to 10.10.30.155(10.10.30.155:3306)? (yes/NO): yes #让你确认

是否开始

执行切换

 

# 切换成功之后,应该看到类似如下提示

Master failover to 10.10.30.155(10.10.30.155:3306) completed successfully.

 

# 常用参数解释

--remove_dead_master_conf :该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。默认情况下,主库切换之后并不会

修改配置文件

--manger_log :日志存放位置

--ignore_last_failover : 在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以

这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会

在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存

在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--

ignore_last_failover,另外,新版本0.56可以不使用这个参数,也不需要专门去删除app1.failover.complete文件

--dead_master_host 指定死掉的主库的IP

--dead_master_port 指定死掉的主库的数据库端口

 

# 查看app1.conf文件,会发现死掉的主库配置被删掉了

grep '10.10.30.169' /etc/masterha/app1/app1.conf

 

# 在线切换需要满足如下条件

* 所有从库的IO和SQL线程必须为运行状态

* 所有从库的Seconds_Behind_Master 的值必须小于--running_updates_limit 参数指定的数值

* 在主库上,show processlist出来的更新(个人觉得这里应该是指的对数据有变更操作的任何请求,而不仅仅是update)请求不能有超过-

-running_updates_limit 参数指定的数值的时间

 

其他参数:

--interactive=(0|1) 如果你想非交互式的执行切换,那么设置这个参数为0,默认是1,要使用该选项,你需要确定主库的状态是dead还是

online,或者其他程序调用masterha_master_switch切换命令时使用

 

--skip_change_master 0.56版本新增功能, 默认情况下,主库切换时,其他从库会change到新主库上,这个参数可以在从库数据恢复

之后,跳过change 到新主库以及跳过start slave,这样你就可以手动到从库上去再次确认数据有没有恢复成功

 

--skip_disable_read_only 0.56版本新增功能,默认情况下,主库切换到新主库时,新主库会执行set global read_only=0; 打开新主

库可读写,加上这个参数之后,跳过设置read_only=0语句

 

--wait_until_gtid_in_sync(0|1) 0.56版本新增的功能,默认为1,表示发生故障转移时,MHA会等待其他从库追赶上新主库的GTID位

置,如果设置为0,则不会等待

 

--ignore_binlog_server_error 0.56版本新增功能,表示在故障转移时,跳过任何binlog server的错误

 

--skip_lock_all_tables 默认情况下,在执行主库切换的时候,旧主库会执行flush table with read lock语句来确保主库停止写,但是执

行flush table with read lock是一个昂贵的操作,如果你能够人工判断并确认主库没有写操作,那么可以使用这个选项来跳过执行这个语

句。master_ip_online_change_script 这个参数并杀掉所有的客户端连接

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Ty_FFTQ

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值