1.环境架构介绍
机器名称 | IP | 角色 | 权限 |
---|---|---|---|
Mysql_Master | 10.0.0.111 | 数据库Master数据库Master | 可读写、主库 |
Mysql_Slave1 | 10.0.0.112 | 数据库Master | 只读、从库 |
Mysql_Slave2 | 10.0.0.113 | 数据库Master | 只读、从库 |
Mysql_MHA | 10.0.0.114 | 数据库Master | 高可用监控 |
2.主从搭建
2.1 MySQL安装
2.1.1 下载
wget https://cdn.mysql.com/archives/mysql-5.7/mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
解压
tar xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
2.1.2 安装
移除mariadb-lib依赖
[root@localhost ~]# rpm -qa|grep mariadb
mariadb-libs-5.5.60-1.el7_5.x86_64
[root@localhost ~]# rpm -e mariadb-libs-5.5.60-1.el7_5.x86_64 --nodeps
安装依赖 libaio 为mysql-community-server-5.7.28-1.el7.x86_64.rpm需要的依赖
[root@localhost ~]# yum install -y libaio
缺失情况会报以下错误
[root@localhost ~]# rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
error: Failed dependencies:
/usr/bin/perl is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1()(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1(LIBAIO_0.1)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1(LIBAIO_0.4)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
net-tools is needed by mysql-community-server-5.7.28-1.el7.x86_64
perl(Getopt::Long) is needed by mysql-community-server-5.7.28-1.el7.x86_64
perl(strict) is needed by mysql-community-server-5.7.28-1.el7.x86_64
安装mysql
rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-devel-5.7.28-1.el7.x86_64.rpm
2.1.3 启动MySQL
初始化用户
mysqld --initialize --user=mysql
查看初始化密码
cat /var/log/mysqld.log | grep password
启动MySQL服务
systemctl start mysqld
配置为开机启动
systemctl enable mysqld
修改默认密码
mysql -uroot -p
xxxxxx 输入默认密码
SET PASSWORD = PASSWORD('123456');
2.2 设置防火墙
2.2.1 配置/etc/sysconfig/iptables增加防火墙规则(生产推荐)
2.2.2.直接关闭防火墙(测试环境)
记得关,血的教训。
systemctl stop firewalld
2.3 Mysql主从配置
2.3.1 Master节点配置
修改Master的my.cnf配置文件
vi /etc/my.cnf
#bin_log配置
log_bin=mysql-bin
server-id=1
sync-binlog=1
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
#relay_log配置
relay_log=mysql-relay-bin
log_slave_updates=1
relay_log_purge=0
重启服务
systemctl restart mysqld
主库给从库授权
登录主库后执行命令:
mysql -uroot -p123456
mysql> grant replication slave on *.* to root@'%' identified by '123456';
mysql> grant all privileges on *.* to root@'%' identified by '123456';
mysql> flush privileges;
//查看主库状态信息,例如master_log_file='mysql-bin.000001',master_log_pos=869
mysql> show master status;
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| mysql-bin.000001 | 869 | | information_schema,mysql,performance_schema,sys | |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
1 row in set (0.00 sec)
注意:这个File:mysql-bin.000001 和 Position:869 是下面用于从节点同步填入的参数信息。
change master to master_host='10.0.0.111',master_port=3306,master_user='root',master_password ='123456',master_log_file='mysql-bin.000001',master_log_pos=869;
2.3.2 Slave节点配置
修改Slave的my.cnf配置文件
vi /etc/my.cnf
10.0.0.112配置
#bin_log配置
log_bin=mysql-bin
server-id=2
sync-binlog=1
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
#relay_log配置
relay_log=mysql-relay-bin
log_slave_updates=1
relay_log_purge=0
read_only=1
10.0.0.113配置
#bin_log配置
log_bin=mysql-bin
server-id=3
sync-binlog=1
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
#relay_log配置
relay_log=mysql-relay-bin
log_slave_updates=1
relay_log_purge=0
read_only=1
重启服务
systemctl restart mysqld
开启同步
登录Slave节点MySQL,执行同步操作命令,该命令内部参数为上面 show master status操作显示的参数
change master to master_host='10.0.0.111',master_port=3306,master_user='root',master_password ='123456',master_log_file='mysql-bin.000001',master_log_pos=869;
start slave; // 开启同步
2.4 配置半同步复制
2.4.1 Master节点
登录主节点MySQL,安装插件
mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so';
Query OK, 0 rows affected (0.03 sec)
mysql> show variables like '%semi%';
+-------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled | OFF |
| rpl_semi_sync_master_timeout | 10000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_for_slave_count | 1 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_master_wait_point | AFTER_SYNC |
+-------------------------------------------+------------+
6 rows in set (0.00 sec)
mysql>
使用vi /etc/my.cnf,修改MySQL配置文件
# 自动开启半同步复制
rpl_semi_sync_master_enabled=ON
rpl_semi_sync_master_timeout=1000
重启MySQL服务
systemctl restart mysqld
2.4.2 Slave节点
登录Slave命令行执行命令安装插件
mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
Query OK, 0 rows affected (0.02 sec)
mysql> show variables like '%semi%';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| rpl_semi_sync_slave_enabled | OFF |
| rpl_semi_sync_slave_trace_level | 32 |
+---------------------------------+-------+
2 rows in set (0.00 sec)
mysql>
使用vi /etc/my.cnf,修改MySQL配置文件
# 自动开启半同步复制
rpl_semi_sync_slave_enabled=ON
重启MySQL服务
systemctl restart mysqld
测试半同步状态
通过MySQL命令检查参数是否开启半同步
show variables like '%semi%';
主节点信息:
mysql> show variables like '%semi%';
+-------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled | ON |
| rpl_semi_sync_master_timeout | 1000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_for_slave_count | 1 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_master_wait_point | AFTER_SYNC |
+-------------------------------------------+------------+
6 rows in set (0.00 sec)
mysql>
从节点信息:
mysql> show variables like '%semi%';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| rpl_semi_sync_slave_enabled | ON |
| rpl_semi_sync_slave_trace_level | 32 |
+---------------------------------+-------+
2 rows in set (0.00 sec)
mysql>
查看主节点日志
cat -f /var/log/mysqld.log
如果正确启动会出现如下日志信息
Start binlog_dump to master_thread_id(5) slave_server(2), pos(mysql-bin.000002, 154)
[Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(mysql-bin.000002, 154)
3.MHA高可用搭建
3.1 配置四台机器ssh互通
全部服务器执行如下命令,生成公钥和私钥
ssh-keygen -t rsa
在Master、Slave1、Slave2分别执行下面命令,将公钥拷贝到MHA服务器
ssh-copy-id 10.0.0.114
MHA服务器检查.ssh/authorized_key文件是否包含3个公钥
cat /root/.ssh/authorized_keys
将MHA的公钥追加到authorized_keys文件中(此时共计4个公钥)
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
从MHA服务器执行命令,向Master、Slave1、Slave2同步公钥信息。
scp /root/.ssh/authorized_keys root@10.0.0.111:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@10.0.0.112:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@10.0.0.113:/root/.ssh/authorized_keys
在MHA服务器通过SSH命令分别测试连接Master、Slave1、Slave2服务器。
ssh 10.0.0.111
exit
ssh 10.0.0.112
exit
ssh 10.0.0.113
exit
3.2 MHA下载安装
MHA下载
MySQL5.7对应的MHA版本是0.5.8
采用wget命令分别下载manager和node包
所有服务器都安装Node
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
MHA Manager服务器需要安装manager
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
MHA Node 安装
所有服务器上安装mha4mysql-node(反复检查确认全部安装成功,血的教训)
MHA的Node依赖per-DBD-MySQL,需要先安装per-DBD-MySQL。
yum install perl-DBD-MySQL -y
安装mha4mysql-node
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
MHA Manager安装
MHA Manager依赖 per-Config-Tiny、per-Log-Dispatch、per-Parallel-ForkManager。
安装EPEL源,不安装的话 yum install 无法找到perl-Log-Dispatch、perl-Parallel-ForkManager的包。
wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
更新后执行命令:
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y
安装manager
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
MHA 配置文件
MHA服务器需要为每个监控的Master/Slave集群提供一个专用的配置文件,所有的Master/Slave集群共享配置文件。
初始化配置目录
#目录说明
#/var/log (CentOS目录)
# /mha (MHA监控根目录)
# /app1 (MHA监控实例根目录)
# /manager.log (MHA监控实例日志文件)
mkdir -p /var/log/mha/app1
touch /var/log/mha/app1/manager.log
配置监控全局配置文件
在主数据库(10.0.0.111) mysql库中创建一个新用户
mysql> create user 'mha'@'%' identified by '123123';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to mha@'%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql>
MHA服务器(10.0.0.114)创建全局监控文件
vi /etc/masterha_default.cnf
[server default]
user=mha
password=123123
port=3306
#ssh登录账号
ssh_user=root
#从库复制账号和密码
repl_user=root
repl_password=123456
port=3306
#ping次数
ping_interval=1
#二次检查的主机
secondary_check_script=masterha_secondary_check -s 10.0.0.111 -s 10.0.0.112 -s 10.0.0.113
配置监控实例配置文件
# 先创建目录
mkdir -p /etc/mha
vi /etc/mha/app1.cnf
[server default]
#MHA监控实例根目录
manager_workdir=/var/log/mha/app1
#MHA监控实例日志文件
manager_log=/var/log/mha/app1/manager.log
#[serverx] 服务器编号
#hostname 主机名
#candidate_master 可以做主库
#master_binlog_dir binlog日志文件目录
[server1]
hostname=10.0.0.111
candidate_master=1
master_binlog_dir="/var/lib/mysql"
[server2]
hostname=10.0.0.112
candidate_master=1
master_binlog_dir="/var/lib/mysql"
[server3]
hostname=10.0.0.113
candidate_master=1
master_binlog_dir="/var/lib/mysql"
MHA配置检查
执行ssh通信检查
在MHA Manager服务器执行命令:
masterha_check_ssh --conf=/etc/mha/app1.cnf
正确响应结果
[root@localhost mha]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Jun 11 06:46:22 2021 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jun 11 06:46:22 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jun 11 06:46:22 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jun 11 06:46:22 2021 - [info] Starting SSH connection tests..
Fri Jun 11 06:46:24 2021 - [debug]
Fri Jun 11 06:46:22 2021 - [debug] Connecting via SSH from root@10.0.0.111(10.0.0.111:22) to root@10.0.0.112(10.0.0.112:22)..
Warning: Permanently added '10.0.0.112' (ECDSA) to the list of known hosts.
Fri Jun 11 06:46:22 2021 - [debug] ok.
Fri Jun 11 06:46:22 2021 - [debug] Connecting via SSH from root@10.0.0.111(10.0.0.111:22) to root@10.0.0.113(10.0.0.113:22)..
Fri Jun 11 06:46:23 2021 - [debug] ok.
Fri Jun 11 06:46:24 2021 - [debug]
Fri Jun 11 06:46:22 2021 - [debug] Connecting via SSH from root@10.0.0.112(10.0.0.112:22) to root@10.0.0.111(10.0.0.111:22)..
Warning: Permanently added '10.0.0.111' (ECDSA) to the list of known hosts.
Fri Jun 11 06:46:23 2021 - [debug] ok.
Fri Jun 11 06:46:23 2021 - [debug] Connecting via SSH from root@10.0.0.112(10.0.0.112:22) to root@10.0.0.113(10.0.0.113:22)..
Fri Jun 11 06:46:24 2021 - [debug] ok.
Fri Jun 11 06:46:25 2021 - [debug]
Fri Jun 11 06:46:23 2021 - [debug] Connecting via SSH from root@10.0.0.113(10.0.0.113:22) to root@10.0.0.111(10.0.0.111:22)..
Fri Jun 11 06:46:24 2021 - [debug] ok.
Fri Jun 11 06:46:24 2021 - [debug] Connecting via SSH from root@10.0.0.113(10.0.0.113:22) to root@10.0.0.112(10.0.0.112:22)..
Fri Jun 11 06:46:24 2021 - [debug] ok.
Fri Jun 11 06:46:25 2021 - [info] All SSH connection tests passed successfully.
检测MySQL主从复制
在MHA Manager服务器上执行:
masterha_check_repl --conf=/etc/mha/app1.cnf
出现如下信息,表示集群没有问题。
[root@localhost ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Jun 11 06:54:46 2021 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jun 11 06:54:46 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jun 11 06:54:46 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jun 11 06:54:46 2021 - [info] MHA::MasterMonitor version 0.58.
Fri Jun 11 06:54:47 2021 - [info] GTID failover mode = 0
Fri Jun 11 06:54:47 2021 - [info] Dead Servers:
Fri Jun 11 06:54:47 2021 - [info] Alive Servers:
Fri Jun 11 06:54:47 2021 - [info] 10.0.0.111(10.0.0.111:3306)
Fri Jun 11 06:54:47 2021 - [info] 10.0.0.112(10.0.0.112:3306)
Fri Jun 11 06:54:47 2021 - [info] 10.0.0.113(10.0.0.113:3306)
Fri Jun 11 06:54:47 2021 - [info] Alive Slaves:
Fri Jun 11 06:54:47 2021 - [info] 10.0.0.112(10.0.0.112:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Fri Jun 11 06:54:47 2021 - [info] Replicating from 10.0.0.111(10.0.0.111:3306)
Fri Jun 11 06:54:47 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Jun 11 06:54:47 2021 - [info] 10.0.0.113(10.0.0.113:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Fri Jun 11 06:54:47 2021 - [info] Replicating from 10.0.0.111(10.0.0.111:3306)
Fri Jun 11 06:54:47 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Jun 11 06:54:47 2021 - [info] Current Alive Master: 10.0.0.111(10.0.0.111:3306)
Fri Jun 11 06:54:47 2021 - [info] Checking slave configurations..
Fri Jun 11 06:54:47 2021 - [info] Checking replication filtering settings..
Fri Jun 11 06:54:47 2021 - [info] binlog_do_db= , binlog_ignore_db= information_schema,mysql,performance_schema,sys
Fri Jun 11 06:54:47 2021 - [info] Replication filtering check ok.
Fri Jun 11 06:54:47 2021 - [info] GTID (with auto-pos) is not supported
Fri Jun 11 06:54:47 2021 - [info] Starting SSH connection tests..
Fri Jun 11 06:54:49 2021 - [info] All SSH connection tests passed successfully.
Fri Jun 11 06:54:49 2021 - [info] Checking MHA Node version..
Fri Jun 11 06:54:50 2021 - [info] Version check ok.
Fri Jun 11 06:54:50 2021 - [info] Checking SSH publickey authentication settings on the current master..
Fri Jun 11 06:54:50 2021 - [info] HealthCheck: SSH to 10.0.0.111 is reachable.
Fri Jun 11 06:54:51 2021 - [info] Master MHA Node version is 0.58.
Fri Jun 11 06:54:51 2021 - [info] Checking recovery script configurations on 10.0.0.111(10.0.0.111:3306)..
Fri Jun 11 06:54:51 2021 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000003
Fri Jun 11 06:54:51 2021 - [info] Connecting to root@10.0.0.111(10.0.0.111:22)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000003
Fri Jun 11 06:54:51 2021 - [info] Binlog setting check done.
Fri Jun 11 06:54:51 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Jun 11 06:54:51 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.0.0.112 --slave_ip=10.0.0.112 --slave_port=3306 --workdir=/var/tmp --target_version=5.7.28-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Jun 11 06:54:51 2021 - [info] Connecting to root@10.0.0.112(10.0.0.112:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000005
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000005
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Jun 11 06:54:52 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.0.0.113 --slave_ip=10.0.0.113 --slave_port=3306 --workdir=/var/tmp --target_version=5.7.28-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Jun 11 06:54:52 2021 - [info] Connecting to root@10.0.0.113(10.0.0.113:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000005
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000005
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Jun 11 06:54:52 2021 - [info] Slaves settings check done.
Fri Jun 11 06:54:52 2021 - [info]
10.0.0.111(10.0.0.111:3306) (current master)
+--10.0.0.112(10.0.0.112:3306)
+--10.0.0.113(10.0.0.113:3306)
Fri Jun 11 06:54:52 2021 - [info] Checking replication health on 10.0.0.112..
Fri Jun 11 06:54:52 2021 - [info] ok.
Fri Jun 11 06:54:52 2021 - [info] Checking replication health on 10.0.0.113..
Fri Jun 11 06:54:52 2021 - [info] ok.
Fri Jun 11 06:54:52 2021 - [warning] master_ip_failover_script is not defined.
Fri Jun 11 06:54:52 2021 - [warning] shutdown_script is not defined.
Fri Jun 11 06:54:52 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
监测中发现的问题
-
从节点没有安装节点的错误信息
[root@localhost mha]# masterha_check_repl --conf=/etc/mha/app1.cnf Fri Jun 11 06:49:15 2021 - [info] Reading default configuration from /etc/masterha_default.cnf.. Fri Jun 11 06:49:15 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Jun 11 06:49:15 2021 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Jun 11 06:49:15 2021 - [info] MHA::MasterMonitor version 0.58. Fri Jun 11 06:49:16 2021 - [info] GTID failover mode = 0 Fri Jun 11 06:49:16 2021 - [info] Dead Servers: Fri Jun 11 06:49:16 2021 - [info] Alive Servers: Fri Jun 11 06:49:16 2021 - [info] 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:49:16 2021 - [info] 10.0.0.112(10.0.0.112:3306) Fri Jun 11 06:49:16 2021 - [info] 10.0.0.113(10.0.0.113:3306) Fri Jun 11 06:49:16 2021 - [info] Alive Slaves: Fri Jun 11 06:49:16 2021 - [info] 10.0.0.112(10.0.0.112:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabled Fri Jun 11 06:49:16 2021 - [info] Replicating from 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:49:16 2021 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 11 06:49:16 2021 - [info] 10.0.0.113(10.0.0.113:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabled Fri Jun 11 06:49:16 2021 - [info] Replicating from 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:49:16 2021 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 11 06:49:16 2021 - [info] Current Alive Master: 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:49:16 2021 - [info] Checking slave configurations.. Fri Jun 11 06:49:16 2021 - [info] Checking replication filtering settings.. Fri Jun 11 06:49:16 2021 - [info] binlog_do_db= , binlog_ignore_db= information_schema,mysql,performance_schema,sys Fri Jun 11 06:49:16 2021 - [info] Replication filtering check ok. Fri Jun 11 06:49:16 2021 - [info] GTID (with auto-pos) is not supported Fri Jun 11 06:49:16 2021 - [info] Starting SSH connection tests.. Fri Jun 11 06:49:18 2021 - [info] All SSH connection tests passed successfully. Fri Jun 11 06:49:18 2021 - [info] Checking MHA Node version.. Fri Jun 11 06:49:19 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln122] Got error when getting node version. Error: Fri Jun 11 06:49:19 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln123] bash: apply_diff_relay_logs: command not found Fri Jun 11 06:49:19 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln150] node version on 10.0.0.112 not found! Is MHA Node package installed ? at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 377. Fri Jun 11 06:49:19 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. node version on 10.0.0.112 not found! Is MHA Node package installed ? at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 377. ...propagated at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 151. Fri Jun 11 06:49:19 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers. Fri Jun 11 06:49:19 2021 - [info] Got exit code 1 (Not master dead). MySQL Replication Health is NOT OK!
-
主节点没有安装node的错误信息
[root@localhost mha]# masterha_check_repl --conf=/etc/mha/app1.cnf Fri Jun 11 06:51:14 2021 - [info] Reading default configuration from /etc/masterha_default.cnf.. Fri Jun 11 06:51:14 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Jun 11 06:51:14 2021 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Jun 11 06:51:14 2021 - [info] MHA::MasterMonitor version 0.58. Fri Jun 11 06:51:15 2021 - [info] GTID failover mode = 0 Fri Jun 11 06:51:15 2021 - [info] Dead Servers: Fri Jun 11 06:51:15 2021 - [info] Alive Servers: Fri Jun 11 06:51:15 2021 - [info] 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:51:15 2021 - [info] 10.0.0.112(10.0.0.112:3306) Fri Jun 11 06:51:15 2021 - [info] 10.0.0.113(10.0.0.113:3306) Fri Jun 11 06:51:15 2021 - [info] Alive Slaves: Fri Jun 11 06:51:15 2021 - [info] 10.0.0.112(10.0.0.112:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabled Fri Jun 11 06:51:15 2021 - [info] Replicating from 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:51:15 2021 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 11 06:51:15 2021 - [info] 10.0.0.113(10.0.0.113:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabled Fri Jun 11 06:51:15 2021 - [info] Replicating from 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:51:15 2021 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 11 06:51:15 2021 - [info] Current Alive Master: 10.0.0.111(10.0.0.111:3306) Fri Jun 11 06:51:15 2021 - [info] Checking slave configurations.. Fri Jun 11 06:51:15 2021 - [info] Checking replication filtering settings.. Fri Jun 11 06:51:15 2021 - [info] binlog_do_db= , binlog_ignore_db= information_schema,mysql,performance_schema,sys Fri Jun 11 06:51:15 2021 - [info] Replication filtering check ok. Fri Jun 11 06:51:16 2021 - [info] GTID (with auto-pos) is not supported Fri Jun 11 06:51:16 2021 - [info] Starting SSH connection tests.. Fri Jun 11 06:51:19 2021 - [info] All SSH connection tests passed successfully. Fri Jun 11 06:51:19 2021 - [info] Checking MHA Node version.. Fri Jun 11 06:51:20 2021 - [info] Version check ok. Fri Jun 11 06:51:20 2021 - [info] Checking SSH publickey authentication settings on the current master.. Fri Jun 11 06:51:20 2021 - [info] HealthCheck: SSH to 10.0.0.111 is reachable. Fri Jun 11 06:51:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln122] Got error when getting node version. Error: Fri Jun 11 06:51:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln123] bash: apply_diff_relay_logs: command not found Fri Jun 11 06:51:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln135] Failed to get MHA node version on the current master even though current master is reachable via SSH! Fri Jun 11 06:51:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48. Fri Jun 11 06:51:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers. Fri Jun 11 06:51:21 2021 - [info] Got exit code 1 (Not master dead). MySQL Replication Health is NOT OK!
启动MHA Manager
MHA Manager服务器执行命令:
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf -- ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
查看监控状态:
[root@localhost ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:5503) is running(0:PING_OK), master:10.0.0.111
[root@localhost ~]#
查看监控日志:
[root@localhost ~]# tail -f /var/log/mha/app1/manager.log
10.0.0.111(10.0.0.111:3306) (current master)
+--10.0.0.112(10.0.0.112:3306)
+--10.0.0.113(10.0.0.113:3306)
Fri Jun 11 07:00:20 2021 - [warning] master_ip_failover_script is not defined.
Fri Jun 11 07:00:20 2021 - [warning] shutdown_script is not defined.
Fri Jun 11 07:00:20 2021 - [info] Set master ping interval 1 seconds.
Fri Jun 11 07:00:20 2021 - [info] Set secondary check script: masterha_secondary_check -s 10.0.0.111 -s 10.0.0.112 -s 10.0.0.113
Fri Jun 11 07:00:20 2021 - [info] Starting ping health check on 10.0.0.111(10.0.0.111:3306)..
Fri Jun 11 07:00:20 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
测试MHA故障转移
在MHA Manager服务器监控日志信息:
tail -200f /var/log/mha/app1/manager.log
关闭主节点MySQL服务,模拟主节点宕机
systemctl stop mysqld
监控日志输出如下信息:
可以看到当前主节点转移到了112
Selected 10.0.0.112(10.0.0.112:3306) as a new master.
----- Failover Report -----
app1: MySQL Master failover 10.0.0.111(10.0.0.111:3306) to 10.0.0.112(10.0.0.112:3306) succeeded
Master 10.0.0.111(10.0.0.111:3306) is down!
Check MHA Manager logs at localhost.localdomain:/var/log/mha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.0.0.112(10.0.0.112:3306) has all relay logs for recovery.
Selected 10.0.0.112(10.0.0.112:3306) as a new master.
10.0.0.112(10.0.0.112:3306): OK: Applying all logs succeeded.
10.0.0.113(10.0.0.113:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.0.0.113(10.0.0.113:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.0.0.112(10.0.0.112:3306)
10.0.0.112(10.0.0.112:3306): Resetting slave info succeeded.
Master failover to 10.0.0.112(10.0.0.112:3306) completed successfully.
测试宕机服务恢复,检查数据一致性
在刚选出的主节点上执行测试脚本创建表 插入数据
CREATE DATABASE test;
USE test;
CREATE TABLE position ( id INT ( 20 ), NAME VARCHAR ( 50 ), salary VARCHAR ( 20 ), city VARCHAR ( 50 ) ) ENGINE = INNODB charset = utf8;
create TABLE position_detail ( id int(20), pid int(20), description text ) ENGINE=innodb charset=utf8;
insert into position values(1, 'Java', 13000, 'shanghai');
insert into position values(2, 'DBA', 20000, 'beijing');
insert into position_detail values(1, 1, 'Java Developer');
insert into position_detail values(2, 2, 'Database Administrator');
启动刚下线的节点服务
systemctl start mysqld
查询当前主节点的信息
mysql> show master status;
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| mysql-bin.000005 | 2908 | | information_schema,mysql,performance_schema,sys | |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
1 row in set (0.00 sec)
刚恢复的节点上执行如下SQL加入到集群中
stop slave;//关闭同步
change master to master_host='10.0.0.112',master_port=3306,master_user='root',master_password ='123456',master_log_file='mysql-bin.000005',master_log_pos=2908;
start slave; // 开启同步
三个节点启动成功后,再次尝试关掉主节点(112):
当关掉主节点112的时候 发现mha并没有立即切换其他从节点为主节点。
我们检查MHA Manager服务器执行命令以及监控状态和日志信息发现MHA并没有在执行。
检查监控实例,发现之前宕机的主节点(111)也不再监控文件中。
解决方案:
- 补充实例到 vi /etc/mha/app1.cnf文件中。
- 重新启动MHA Manager执行命令。