Deploying MySQL 5.7.31 High Availability with MHA on CentOS 7
1. Environment
OS: CentOS 7.8
Manager: https://github.com/yoshinorim/mha4mysql-manager
Version: 0.58
Node: https://github.com/yoshinorim/mha4mysql-node
MySQL version: 5.7.31
MySQL installation and master-slave configuration are covered in the companion article on GTID-based master-slave replication.
MHA wiki: https://github.com/yoshinorim/mha4mysql-manager/wiki
2. Deployment Servers
Note: MySQL is installed from binary packages, and GTID-based master-slave replication is already configured.
① Server plan
As used throughout this article:
server1  192.168.6.101  MySQL 5.7.31 master, MHA Node
server2  192.168.6.102  MySQL 5.7.31 slave (candidate master), MHA Node
server3  192.168.6.103  MySQL 5.7.31 slave (read-only), MHA Node + MHA Manager
All MySQL instances listen on port 3301.
② Configure /etc/hosts on each server
[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.6.101 server1
192.168.6.102 server2
192.168.6.103 server3
③ Configure passwordless SSH login
Note: every host in the cluster must be able to log in to every other host over SSH without a password.
[root@server1 src]# ssh-keygen -t rsa
[root@server1 src]# for i in server1 server2 server3;do ssh-copy-id $i;done
Verify:
[root@server1 src]# ssh server3
Last login: Mon Jan 11 14:12:28 2021 from 192.168.7.119
[root@server3 ~]#
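A single interactive login only proves one direction, so it can help to test the whole mesh at once. A minimal sketch, assuming the three hostnames above and that keys have already been distributed from every node:
for src in server1 server2 server3; do
  for dst in server1 server2 server3; do
    # BatchMode=yes makes ssh fail instead of prompting for a password
    ssh -o BatchMode=yes "$src" "ssh -o BatchMode=yes $dst 'echo $src-to-$dst OK'"
  done
done
Any pair that errors instead of printing OK still needs ssh-copy-id run between those two hosts.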
④ Configure the EPEL repository and install the MHA dependencies
Run on all nodes:
[root@server1 src]# wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
[root@server1 src]# yum makecache
[root@server1 src]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
3. Semi-synchronous Master-Slave Replication
Note: unless stated otherwise, every step below must be executed on all database servers.
① Locate the semi-sync plugins
mysql> show variables like '%plugin_dir%'; # find the plugin directory (it may differ between DB servers)
+---------------+------------------------------+
| Variable_name | Value |
+---------------+------------------------------+
| plugin_dir | /usr/local/mysql/lib/plugin/ |
+---------------+------------------------------+
1 row in set (0.00 sec)
mysql> show variables like '%have_dynamic%'; # confirm that dynamic plugin loading is supported
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| have_dynamic_loading | YES |
+----------------------+-------+
1 row in set (0.00 sec)
mysql> exit
Bye
[root@server2 ~]# ls /usr/local/mysql/lib/plugin/ | egrep 'master|slave'
# confirm the plugin directory contains these two files (they ship with a default MySQL install)
semisync_master.so
semisync_slave.so
② Install the plugins
# semisync_master.so and semisync_slave.so are the plugin files found above
mysql> install plugin rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.02 sec)
mysql> install plugin rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.00 sec)
③ Verify that the semi-sync plugins are installed
mysql> show plugins;
# or
mysql> select * from information_schema.plugins;
If the output of show plugins lists rpl_semi_sync_master and rpl_semi_sync_slave with status ACTIVE, the plugins are installed correctly.
④ Check the semi-sync variables
mysql> show variables like '%rpl_semi_sync%';
+-------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled | OFF |
| rpl_semi_sync_master_timeout | 10000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_for_slave_count | 1 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_master_wait_point | AFTER_SYNC |
| rpl_semi_sync_slave_enabled | OFF |
| rpl_semi_sync_slave_trace_level | 32 |
+-------------------------------------------+------------+
8 rows in set (0.00 sec)
Both semi-sync plugins are still disabled (OFF), so they need to be enabled in my.cnf. The complete configuration:
[mysqld]
server_id=22 # must be unique on each server
max_connections=5000
port=3301
# this changes the default MySQL port 3306
user=mysql
character_set_server=utf8
innodb_file_per_table=1
expire_logs_days=15
explicit_defaults_for_timestamp=true
gtid_mode=on
enforce_gtid_consistency=on
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=1000
rpl_semi_sync_slave_enabled=1
log_error=/data/log/mysql3301_log.err
innodb_buffer_pool_size=6000M
innodb_flush_log_at_trx_commit=1
log_bin=/data/log/binlog/binlog
sync_binlog=1
binlog_format=row
[mysqld_safe]
basedir=/usr/local/mysql
datadir=/data/mysql
pid-file=/data/log/mysql3301.pid
Notes:
rpl_semi_sync_master_enabled=1: 1 enables the plugin, 0 disables it.
rpl_semi_sync_master_timeout=1000: in milliseconds; if the master waits 1 second for a slave's acknowledgment without receiving one, it falls back to asynchronous replication.
Restart the MySQL service for the new configuration to take effect.
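If a restart is inconvenient, the same settings can also be applied at runtime; keep them in my.cnf as well so they survive restarts. A minimal sketch, assuming a local root login (credentials are placeholders):
# on the master
mysql -uroot -p -e "SET GLOBAL rpl_semi_sync_master_enabled = 1; SET GLOBAL rpl_semi_sync_master_timeout = 1000;"
# on each slave; restart the IO thread so the change takes effect
mysql -uroot -p -e "SET GLOBAL rpl_semi_sync_slave_enabled = 1; STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"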
Check that semi-sync is now enabled:
mysql> show variables like '%rpl_semi_sync%';
+-------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled | ON |
| rpl_semi_sync_master_timeout | 1000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_for_slave_count | 1 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_master_wait_point | AFTER_SYNC |
| rpl_semi_sync_slave_enabled | ON |
| rpl_semi_sync_slave_trace_level | 32 |
+-------------------------------------------+------------+
mysql> show status like '%rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients | 1 |
| Rpl_semi_sync_master_net_avg_wait_time | 0 |
| Rpl_semi_sync_master_net_wait_time | 0 |
| Rpl_semi_sync_master_net_waits | 0 |
| Rpl_semi_sync_master_no_times | 0 |
| Rpl_semi_sync_master_no_tx | 0 |
| Rpl_semi_sync_master_status | ON |
| Rpl_semi_sync_master_timefunc_failures | 0 |
| Rpl_semi_sync_master_tx_avg_wait_time | 0 |
| Rpl_semi_sync_master_tx_wait_time | 0 |
| Rpl_semi_sync_master_tx_waits | 0 |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0 |
| Rpl_semi_sync_master_wait_sessions | 0 |
| Rpl_semi_sync_master_yes_tx | 0 |
| Rpl_semi_sync_slave_status | OFF |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)
Rpl_semi_sync_master_status: whether the master is running in semi-synchronous (ON) or asynchronous mode;
Rpl_semi_sync_master_clients: how many slaves are connected in semi-synchronous mode;
Rpl_semi_sync_master_yes_tx: number of commits successfully acknowledged by a slave;
Rpl_semi_sync_master_no_tx: number of commits that were not acknowledged by a slave;
Rpl_semi_sync_master_tx_avg_wait_time: average extra time a transaction waits because semi-sync is enabled;
Rpl_semi_sync_master_net_avg_wait_time: average network wait time after a transaction enters the wait queue.
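Because the master silently falls back to asynchronous mode after a timeout, Rpl_semi_sync_master_status is worth monitoring. A minimal sketch (the user and password are placeholders; adjust to your environment):
# warn when the master has degraded to asynchronous replication
status=$(mysql -uroot -pYOURPASS -N -e "SHOW STATUS LIKE 'Rpl_semi_sync_master_status'" | awk '{print $2}')
[ "$status" = "ON" ] || echo "WARNING: semi-sync has fallen back to async"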
4. Create the Required Users
Note: server1 is the master and server2 is a slave (candidate master), so the remote replication user must be created on both.
① server1 (master)
mysql> GRANT REPLICATION SLAVE ON *.* TO 'testrepl'@'192.168.6.%' IDENTIFIED BY '123456';
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql>
mysql> grant all on *.* to manager@'192.168.6.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show master status\G
*************************** 1. row ***************************
File: binlog.000004
Position: 783
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 267b27d0-48b5-11eb-92ec-000c290f457f:1-13
1 row in set (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
② server2 (slave, candidate master)
mysql> GRANT REPLICATION SLAVE ON *.* TO 'testrepl'@'192.168.6.%' IDENTIFIED BY '123456';
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql>
mysql> grant all on *.* to manager@'192.168.6.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
③ server3 (read-only slave)
mysql>
mysql> grant all on *.* to manager@'192.168.6.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
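Before wiring up replication, it can be worth confirming the grants actually exist; a quick check (run on server1 and server2; server3 only has the manager user):
mysql -uroot -p -e "SHOW GRANTS FOR 'testrepl'@'192.168.6.%'; SHOW GRANTS FOR 'manager'@'192.168.6.%';"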
④ Configure replication
On server2 and server3:
mysql> change master to master_host='192.168.6.101',master_port=3301,master_user='testrepl',master_password='123456',master_auto_position = 1;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
mysql> start slave;
⑤ Check the slave status
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.6.101
Master_User: testrepl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 935
Relay_Log_File: server2-relay-bin.000005
Relay_Log_Pos: 1142
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
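The two Running fields and the replication lag are what matter here; a one-liner to pull just those (assuming a local root login):
mysql -uroot -p -e "SHOW SLAVE STATUS\G" | egrep 'Slave_IO_Running:|Slave_SQL_Running:|Seconds_Behind_Master:'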
⑥ On the master, check the semi-sync status again. Rpl_semi_sync_master_clients should now be 2 (if it is not, check that both slaves are replicating correctly):
mysql> show status like '%rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 5     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 924   |
| Rpl_semi_sync_master_tx_wait_time          | 4623  |
| Rpl_semi_sync_master_tx_waits              | 5     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 5     |
| Rpl_semi_sync_slave_status                 | OFF   |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)
5. Install MHA Node
MHA Node must be installed on all nodes (including the manager host).
# download the package
[root@server1 src]# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
# install
[root@server1 src]# tar zxf mha4mysql-node-0.58.tar.gz
[root@server1 src]# cd mha4mysql-node-0.58/
[root@server1 mha4mysql-node-0.58]# perl Makefile.PL
[root@server1 mha4mysql-node-0.58]# make && make install
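After make install, the Node utilities should be on the PATH of every host; a quick sanity check:
# the command-line tools shipped by mha4mysql-node
for t in save_binary_logs apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs; do
  which "$t" || echo "missing: $t"
done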
6. Install MHA Manager
① Install on the manager node, server3 (192.168.6.103)
# download the package
[root@server3 src]# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
# install
[root@server3 src]# tar zxf mha4mysql-manager-0.58.tar.gz
[root@server3 src]# cd mha4mysql-manager-0.58/
[root@server3 mha4mysql-manager-0.58]# perl Makefile.PL
[root@server3 mha4mysql-manager-0.58]# make && make install
Installing MHA Manager takes a while, possibly ten minutes or so. Output like the following means the installation finished:
Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
Installing /usr/local/share/perl5/MHA/Config.pm
Installing /usr/local/share/perl5/MHA/HealthCheck.pm
Installing /usr/local/share/perl5/MHA/ServerManager.pm
Installing /usr/local/share/perl5/MHA/ManagerConst.pm
Installing /usr/local/share/perl5/MHA/FileStatus.pm
Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
Installing /usr/local/share/perl5/MHA/MasterFailover.pm
Installing /usr/local/share/perl5/MHA/MasterRotate.pm
Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
Installing /usr/local/share/perl5/MHA/Server.pm
Installing /usr/local/share/perl5/MHA/SSHCheck.pm
Installing /usr/local/share/perl5/MHA/DBHelper.pm
Installing /usr/local/share/man/man1/masterha_stop.1
Installing /usr/local/share/man/man1/masterha_conf_host.1
Installing /usr/local/share/man/man1/masterha_check_repl.1
Installing /usr/local/share/man/man1/masterha_check_status.1
Installing /usr/local/share/man/man1/masterha_master_monitor.1
Installing /usr/local/share/man/man1/masterha_check_ssh.1
Installing /usr/local/share/man/man1/masterha_master_switch.1
Installing /usr/local/share/man/man1/masterha_secondary_check.1
Installing /usr/local/share/man/man1/masterha_manager.1
Installing /usr/local/bin/masterha_stop
Installing /usr/local/bin/masterha_conf_host
Installing /usr/local/bin/masterha_check_repl
Installing /usr/local/bin/masterha_check_status
Installing /usr/local/bin/masterha_master_monitor
Installing /usr/local/bin/masterha_check_ssh
Installing /usr/local/bin/masterha_master_switch
Installing /usr/local/bin/masterha_secondary_check
Installing /usr/local/bin/masterha_manager
Appending installation info to /usr/lib64/perl5/perllocal.pod
② Create the working directories and copy the sample files
[root@server3 mha4mysql-manager-0.58]# mkdir /etc/masterha
[root@server3 mha4mysql-manager-0.58]# mkdir -p /masterha/app1
[root@server3 mha4mysql-manager-0.58]# mkdir /scripts
[root@server3 mha4mysql-manager-0.58]# pwd
/usr/local/src/mha4mysql-manager-0.58 # confirm the current directory
[root@server3 mha4mysql-manager-0.58]# cp samples/conf/* /etc/masterha/
[root@server3 mha4mysql-manager-0.58]# cp samples/scripts/* /scripts/
③ Edit the MHA Manager configuration files
The manager has two main configuration files: a global defaults file and a per-application file. Empty out the global defaults file:
# truncate the default config file
[root@server3 masterha]# > /etc/masterha/masterha_default.cnf
Then edit the application config:
[root@server3 mha4mysql-manager-0.58]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
user=manager
password=123456
ssh_user=root
repl_user=testrepl
repl_password=123456
ping_interval=1
[server1]
hostname=192.168.6.101
port=3301
master_binlog_dir=/data/log/binlog/
candidate_master=1
[server2]
hostname=192.168.6.102
port=3301
master_binlog_dir=/data/log/binlog/
candidate_master=1
[server3]
hostname=192.168.6.103
port=3301
master_binlog_dir=/data/log/binlog/
no_master=1
Configuration explained:
[root@server3 mha4mysql-manager-0.58]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1 # working directory
manager_log=/masterha/app1/manager.log # log file
user=manager # MySQL user the manager uses to administer the database nodes
password=123456 # password for the user above
ssh_user=root # system user configured for passwordless SSH
repl_user=testrepl # user used to replicate data
repl_password=123456 # password for the replication user
ping_interval=1 # interval in seconds between pings to the master (default 3); after three failed attempts MHA fails over automatically
[server1]
hostname=192.168.6.101
port=3301
master_binlog_dir=/data/log/binlog/ # where the master keeps its binary logs, so MHA can locate them
candidate_master=1 # candidate master: this host can be promoted to master during a failover
[server2]
hostname=192.168.6.102
port=3301
master_binlog_dir=/data/log/binlog/
candidate_master=1 # candidate master
[server3]
hostname=192.168.6.103
port=3301
master_binlog_dir=/data/log/binlog/
no_master=1 # never promote this host to master
7. Pre-start Tests
① Test passwordless SSH login
[root@server3 mha4mysql-manager-0.58]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Error 1:
[root@server3 mha4mysql-manager-0.58]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Can't locate Log/Dispatch.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
This error means the perl-Mail-Sender and perl-Log-Dispatch Perl modules are missing; install them:
[root@server3 ~]# yum install perl-Mail-Sender
[root@server3 ~]# yum install perl-Log-Dispatch
Error 2 (while installing the dependencies):
[root@server3 mha4mysql-manager-0.58]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.ustc.edu.cn
* extras: mirrors.163.com
* updates: mirrors.163.com
Package perl-DBD-MySQL-4.023-6.el7.x86_64 already installed and latest version
Package perl-Config-Tiny-2.14-7.el7.noarch already installed and latest version
No package perl-Log-Dispatch available.
No package perl-Parallel-ForkManager available.
Nothing to do
Workaround:
Download the missing dependency RPMs individually and install them by hand. For example, both of these sites host the packages:
http://www.rpmfind.net/
http://rpm.pbone.net/
A copy has also been saved to Baidu Netdisk:
Link: https://pan.baidu.com/s/1g7n1PrFJpd2avmrauKkuaA
Extraction code: hjde
② Verify that replication works across the cluster (all MySQL instances must be running), as follows:
[root@server3 src]# masterha_check_repl --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Error:
[root@server3 src]# masterha_check_repl --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Mon Jan 11 17:31:02 2021 - [info] Reading default configuration from /etc/masterha/masterha_default.cnf..
Mon Jan 11 17:31:02 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jan 11 17:31:02 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jan 11 17:31:02 2021 - [info] MHA::MasterMonitor version 0.58.
Mon Jan 11 17:31:03 2021 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 192.168.6.101(192.168.6.101:3301), replicating from 192.168.6.102(192.168.6.102:3301)
Master 192.168.6.102(192.168.6.102:3301), replicating from 192.168.6.101(192.168.6.101:3301)
Mon Jan 11 17:31:03 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 329.
Mon Jan 11 17:31:03 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Jan 11 17:31:03 2021 - [info] Got exit code 1 (Not master dead).
Fix:
On the slave servers, set global read_only=1:
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
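Note that SET GLOBAL does not survive a restart. To persist the setting, also add the following under the [mysqld] section of my.cnf on server2 and server3 (never on the master):
[mysqld]
read_only=1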
Run the check again:
[root@server3 src]# masterha_check_repl --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Mon Jan 11 17:34:00 2021 - [info] Reading default configuration from /etc/masterha/masterha_default.cnf..
Mon Jan 11 17:34:00 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jan 11 17:34:00 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jan 11 17:34:00 2021 - [info] MHA::MasterMonitor version 0.58.
Mon Jan 11 17:34:01 2021 - [info] Multi-master configuration is detected. Current primary(writable) master is 192.168.6.101(192.168.6.101:3301)
Mon Jan 11 17:34:01 2021 - [info] Master configurations are as below:
Master 192.168.6.101(192.168.6.101:3301), replicating from 192.168.6.102(192.168.6.102:3301)
Master 192.168.6.102(192.168.6.102:3301), replicating from 192.168.6.101(192.168.6.101:3301), read-only
Mon Jan 11 17:34:01 2021 - [info] GTID failover mode = 1
Mon Jan 11 17:34:01 2021 - [info] Dead Servers:
Mon Jan 11 17:34:01 2021 - [info] Alive Servers:
Mon Jan 11 17:34:01 2021 - [info] 192.168.6.101(192.168.6.101:3301)
Mon Jan 11 17:34:01 2021 - [info] 192.168.6.102(192.168.6.102:3301)
Mon Jan 11 17:34:01 2021 - [info] 192.168.6.103(192.168.6.103:3301)
Mon Jan 11 17:34:01 2021 - [info] Alive Slaves:
Mon Jan 11 17:34:01 2021 - [info] 192.168.6.102(192.168.6.102:3301) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Mon Jan 11 17:34:01 2021 - [info] GTID ON
Mon Jan 11 17:34:01 2021 - [info] Replicating from 192.168.6.101(192.168.6.101:3301)
Mon Jan 11 17:34:01 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jan 11 17:34:01 2021 - [info] 192.168.6.103(192.168.6.103:3301) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Mon Jan 11 17:34:01 2021 - [info] GTID ON
Mon Jan 11 17:34:01 2021 - [info] Replicating from 192.168.6.101(192.168.6.101:3301)
Mon Jan 11 17:34:01 2021 - [info] Not candidate for the new Master (no_master is set)
Mon Jan 11 17:34:01 2021 - [info] Current Alive Master: 192.168.6.101(192.168.6.101:3301)
Mon Jan 11 17:34:01 2021 - [info] Checking slave configurations..
Mon Jan 11 17:34:01 2021 - [info] Checking replication filtering settings..
Mon Jan 11 17:34:01 2021 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jan 11 17:34:01 2021 - [info] Replication filtering check ok.
Mon Jan 11 17:34:01 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jan 11 17:34:01 2021 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jan 11 17:34:01 2021 - [info] HealthCheck: SSH to 192.168.6.101 is reachable.
Mon Jan 11 17:34:01 2021 - [info]
192.168.6.101(192.168.6.101:3301) (current master)
+--192.168.6.102(192.168.6.102:3301)
+--192.168.6.103(192.168.6.103:3301)
Mon Jan 11 17:34:01 2021 - [info] Checking replication health on 192.168.6.102..
Mon Jan 11 17:34:01 2021 - [info] ok.
Mon Jan 11 17:34:01 2021 - [info] Checking replication health on 192.168.6.103..
Mon Jan 11 17:34:01 2021 - [info] ok.
Mon Jan 11 17:34:01 2021 - [warning] master_ip_failover_script is not defined.
Mon Jan 11 17:34:01 2021 - [warning] shutdown_script is not defined.
Mon Jan 11 17:34:01 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK
③ Start MHA Manager
[root@server3 src]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &> /var/log/mha_manager.log &
[2] 12413
Note: this command can be run several times with different app.cnf configuration files to monitor multiple MySQL clusters. There is only one cluster here, so a single invocation is enough.
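For example, a second cluster described by a hypothetical /etc/masterha/app2.cnf would get its own manager process and log:
[root@server3 src]# nohup masterha_manager --conf=/etc/masterha/app2.cnf &> /var/log/mha_manager_app2.log &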
④ Verify failover
First check which server is currently the master of the MySQL cluster. Here the master is server1 (192.168.6.101), as the slave status shows:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.6.101
Master_User: testrepl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 2213
Relay_Log_File: server3-relay-bin.000003
Relay_Log_Pos: 2380
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Now simulate a failure of the master and then check which host has become the new master.
Stop the MySQL service on server1:
[root@server1 mha4mysql-node-0.58]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 999/haproxy
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 950/sshd
tcp 0 0 0.0.0.0:35672 0.0.0.0:* LISTEN 999/haproxy
tcp 0 0 0.0.0.0:55672 0.0.0.0:* LISTEN 999/haproxy
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1405/master
tcp 0 0 0.0.0.0:514 0.0.0.0:* LISTEN 953/rsyslogd
tcp6 0 0 :::22 :::* LISTEN 950/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1405/master
tcp6 0 0 :::514 :::* LISTEN 953/rsyslogd
tcp6 0 0 :::3301 :::* LISTEN 5192/mysqld
[root@server1 mha4mysql-node-0.58]# kill 5192
[root@server1 mha4mysql-node-0.58]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 999/haproxy
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 950/sshd
tcp 0 0 0.0.0.0:35672 0.0.0.0:* LISTEN 999/haproxy
tcp 0 0 0.0.0.0:55672 0.0.0.0:* LISTEN 999/haproxy
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1405/master
tcp 0 0 0.0.0.0:514 0.0.0.0:* LISTEN 953/rsyslogd
tcp6 0 0 :::22 :::* LISTEN 950/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1405/master
tcp6 0 0 :::514 :::* LISTEN 953/rsyslogd
Check the slave status on server3:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.6.102
Master_User: testrepl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 1340
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 1547
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
On server2 (the candidate master), the replication info has been cleared:
mysql> show slave status\G
Empty set (0.00 sec)
mysql>
For details, check the MHA Manager log (excerpt):
[root@server3 app1]# cat /masterha/app1/manager.log
Started automated(non-interactive) failover.
Selected 192.168.6.102(192.168.6.102:3301) as a new master.
192.168.6.102(192.168.6.102:3301): OK: Applying all logs succeeded.
192.168.6.103(192.168.6.103:3301): OK: Slave started, replicating from 192.168.6.102(192.168.6.102:3301)
192.168.6.102(192.168.6.102:3301): Resetting slave info succeeded.
Master failover to 192.168.6.102(192.168.6.102:3301) completed successfully.
The log shows that after server1 failed, the master role was switched to server2 and server3 was re-pointed to replicate from server2. An app1.failover.complete file was generated in the MHA Manager working directory, indicating the failover completed.
8. Routine MHA Manager Maintenance
① Check for failover marker files
After a failover, the MHA Manager process stops automatically and generates an app1.failover.complete file under manager_workdir (/masterha/app1/). Before starting MHA again, make sure this file is gone; if app1.failover.complete or app1.failover.error is present, startup fails like this:
[root@server3 app1]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &> /var/log/mha_manager.log &
[2] 12616
[root@server3 app1]#
[2]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf &>/var/log/mha_manager.log
The log records:
Mon Jan 11 17:51:20 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln364] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf
Mon Jan 11 17:51:20 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_manager line 50.
Mon Jan 11 17:51:20 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Jan 11 17:51:20 2021 - [info] Got exit code 1 (Not master dead).
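Per the note above, remove the marker files before restarting the manager:
[root@server3 app1]# rm -f /masterha/app1/app1.failover.complete /masterha/app1/app1.failover.error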
Restart MySQL on server1 and configure it as a slave of the new master:
mysql> change master to master_host='192.168.6.102',master_port=3301,master_user='testrepl',master_password='123456',master_auto_position = 1;
Query OK, 0 rows affected, 2 warnings (0.80 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.6.102
Master_User: testrepl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 1619
Relay_Log_File: server1-relay-bin.000002
Relay_Log_Pos: 1826
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Start MHA Manager again on server3:
[root@server3 app1]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &> /var/log/mha_manager.log &
[2] 12631
[root@server3 app1]# jobs -l
[1]- 5576 Running /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/mysql.cnf & (wd: ~)
[2]+ 12631 Running nohup masterha_manager --conf=/etc/masterha/app1.cnf &>/var/log/mha_manager.log &
② Check MHA replication
[root@server3 app1]# masterha_check_repl --conf=/etc/masterha/app1.cnf
③ Stop MHA
[root@server3 app1]# masterha_stop --conf=/etc/masterha/app1.cnf
④ Start MHA
[root@server3 app1]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &>/tmp/mha_manager.log &
When a slave node is down, the manager service cannot start. Temporarily comment out the failed node's section in the configuration file, and restore it once the node is repaired.
[root@server3 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf # status when the manager is not running
app1 is stopped(2:NOT_RUNNING).
# start the manager service and check again
[root@server3 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &>/tmp/mha_manager.log &
[1] 12631
[root@server3 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:12631) is running(0:PING_OK), master:192.168.6.102
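The start, stop, and status commands above fold naturally into a small wrapper; a minimal sketch (the script name and log path are arbitrary):
#!/bin/bash
# mha_ctl.sh: convenience wrapper around the masterha_* commands used above
CONF=/etc/masterha/app1.cnf
case "$1" in
  start)  nohup masterha_manager --conf="$CONF" &>/tmp/mha_manager.log & ;;
  stop)   masterha_stop --conf="$CONF" ;;
  status) masterha_check_status --conf="$CONF" ;;
  *)      echo "usage: $0 {start|stop|status}"; exit 1 ;;
esac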
Reposted from: https://blog.51cto.com/14154700/2472806#h0