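I. **Introduction**
MHA (Master High Availability) is an open-source high-availability tool for MySQL. It provides automated master failover for MySQL master/slave replication setups. When MHA detects that the master has failed, it promotes the slave holding the most recent data to be the new master, and during this process it collects additional information from the other slaves to avoid consistency problems. MHA also supports online master switching, that is, swapping the master and a slave on demand.
MHA was developed by yoshinorim of Japan (formerly at DeNA, now at Facebook) and is a fairly mature MySQL high-availability solution. MHA can typically complete a failover within 30 seconds while preserving data consistency as far as possible. At the time of writing, Taobao is also developing a similar product, TMHA, which so far supports one master and one slave.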
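II. **MHA services**
MHA has two roles, MHA Manager (the management node) and MHA Node (the data node):
MHA Manager:
Usually deployed on a dedicated machine, where it manages one or more master/slave clusters (groups). Each master/slave cluster is called an application, and the Manager coordinates the whole cluster.
MHA Node:
Runs on every MySQL server (master/slave/manager). It provides scripts that parse and purge logs, which speeds up failover.
It is mainly an agent that receives instructions from the Manager, so it must run on every MySQL node. Put simply, the Node collects the binary logs found on the servers and checks whether the slave that is about to be promoted already has and has applied every event. If not, the missing events are sent to that slave and applied locally before it is promoted to master.
Every node inside a replication group and the Manager must be able to reach each other over passwordless SSH. Only then can the Manager log in when the master fails and carry out the master/slave switch.
III. **How MHA works**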
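The way MHA works can be summarized in the following steps:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that has the latest updates;
(3) Apply the differential relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.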
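The six steps above can be sketched in Python. This is a deliberately simplified model and not MHA's actual implementation: the `Slave` class, its fields, and `plan_failover` are hypothetical names, a replication position is reduced to a single integer, and the new master is simply assumed to be the latest slave.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical, simplified view of one slave (not an MHA data structure)."""
    host: str
    read_pos: int  # last master binlog position this slave received


def plan_failover(slaves, master_saved_pos):
    """Return (new_master, actions) for the six steps, in a simplified form.

    master_saved_pos is the last position recovered from the dead
    master's binary log (step 1).
    """
    # Step 2: the latest slave is the one that received the most events.
    latest = max(slaves, key=lambda s: s.read_pos)
    # Assumption: the latest slave is promoted.
    new_master = latest

    actions = []
    for s in slaves:
        # Step 3: relay log events this slave lacks compared with the latest slave.
        if s.read_pos < latest.read_pos:
            actions.append((s.host, "apply relay diff", s.read_pos, latest.read_pos))
        # Step 4: events saved from the dead master that no slave received.
        if latest.read_pos < master_saved_pos:
            actions.append((s.host, "apply saved binlog", latest.read_pos, master_saved_pos))
    # Step 5: promote the chosen slave.
    actions.append((new_master.host, "promote", None, None))
    # Step 6: repoint every other slave at the new master.
    for s in slaves:
        if s is not new_master:
            actions.append((s.host, "replicate from " + new_master.host, None, None))
    return new_master, actions
```

Calling `plan_failover([Slave("192.168.3.12", 316451), Slave("192.168.3.14", 316000)], 316500)` selects 192.168.3.12 and schedules a relay diff for 192.168.3.14 before both apply the events saved from the old master.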
To simplify the later steps, add the following entries to /etc/hosts on every node:
192.168.3.12 node1.keer.com node1
192.168.3.13 node2.keer.com node2
192.168.3.14 node3.keer.com node3
Configuration of the initial master node
vim /etc/my.cnf
[mysqld]
character_set_server = utf8
log-bin=mysql-bin
server-id=1
relay_log=mysql-relay
skip_name_resolve
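Configuration required on all slave nodes
slave1
vim /etc/my.cnf
[mysqld]
server-id = 2 # every node in the replication cluster must have a unique id
relay-log = relay-log # enable the relay log
log-bin = master-log # enable the binary log
read_only = ON # make the node read-only
relay_log_purge = 0 # do not automatically purge relay logs that are no longer needed
skip_name_resolve # disable name resolution (optional)
log_slave_updates = 1 # write replicated updates to this node's own binary log
systemctl restart mariadb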
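slave2
vim /etc/my.cnf
[mysqld]
server-id = 3 # every node in the replication cluster must have a unique id
relay-log = relay-log # enable the relay log
log-bin = master-log # enable the binary log
read_only = ON # make the node read-only
relay_log_purge = 0 # do not automatically purge relay logs that are no longer needed
skip_name_resolve # disable name resolution (optional)
log_slave_updates = 1 # write replicated updates to this node's own binary log
systemctl restart mariadb
This step is complete.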
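Because a duplicated server-id breaks replication, it can help to check the ids before restarting the servers. The sketch below is one possible way to do that with Python's standard `configparser`; `read_server_id` and `duplicate_server_ids` are hypothetical helper names, and the configuration texts are assumed to have been collected from each node beforehand.

```python
import configparser


def read_server_id(my_cnf_text):
    """Extract server-id from the [mysqld] section of a my.cnf-style text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,              # bare flags such as skip_name_resolve
        inline_comment_prefixes=("#",),   # trailing "# ..." comments
        interpolation=None,               # treat "%" literally
        strict=False,                     # tolerate repeated options
    )
    parser.read_string(my_cnf_text)
    section = parser["mysqld"]
    # Both spellings, server-id and server_id, are accepted by the server.
    value = section.get("server-id") or section.get("server_id")
    return int(value)


def duplicate_server_ids(configs):
    """configs maps a host label to its my.cnf text.

    Returns a dict of server-id -> list of hosts for ids used more than once.
    """
    seen = {}
    for host, text in configs.items():
        seen.setdefault(read_server_id(text), []).append(host)
    return {sid: hosts for sid, hosts in seen.items() if len(hosts) > 1}
```

An empty result means every node has its own id; any entry in the returned dict names the hosts that collide.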
Configure the one-master, multi-slave replication topology
Master node:
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000002 |      245 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
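The File and Position values above are what each slave needs in order to start replicating. The walkthrough does not show that statement, so here is a minimal sketch that builds it; `change_master_sql` is a hypothetical helper, and the host, user, and password passed to it are placeholders that must match your own environment.

```python
def change_master_sql(master_host, repl_user, repl_password, log_file, log_pos, port=3306):
    """Build the statements a slave runs to start replicating from the master."""

    def quote(value):
        # Minimal escaping for a single-quoted SQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(master_host)}, "
        f"MASTER_PORT={int(port)}, "
        f"MASTER_USER={quote(repl_user)}, "
        f"MASTER_PASSWORD={quote(repl_password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)};\n"
        "START SLAVE;"
    )


if __name__ == "__main__":
    # Placeholder credentials; the file name and position come from the output above.
    print(change_master_sql("192.168.3.13", "repl_user_here", "repl_password_here",
                            "mysql-bin.000002", 245))
```

The generated text is meant to be pasted into the MariaDB client on each slave; it is not executed by the script.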
Slave node 1:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.3.13
Master_User: wg
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 316451
Relay_Log_File: mysql-relay.000009
Relay_Log_Pos: 316735
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 316451
Relay_Log_Space: 320644
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Slave node 2:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.3.12
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000003
Read_Master_Log_Pos: 735
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 531
Relay_Master_Log_File: mariadb-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 735
Relay_Log_Space: 821
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
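The fields that matter in this output are Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master. As a rough sketch, the vertical output can be parsed and checked like this; `parse_slave_status` and `replication_healthy` are hypothetical helper names, and the zero-lag threshold is only an assumption.

```python
def parse_slave_status(text):
    """Turn the vertical output of SHOW SLAVE STATUS into a dict of strings."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # row separator or blank line
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_healthy(status, max_lag_seconds=0):
    """True when both replication threads run and the lag is within the limit."""
    lag = status.get("Seconds_Behind_Master", "NULL")
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and lag.isdigit()              # "NULL" means replication is not running
        and int(lag) <= max_lag_seconds
    )
```

Both slaves shown above would pass this check, since each reports Yes for both threads and a lag of 0.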
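Install and configure MHA
Grant privileges on the master
MHA needs a user with administrative privileges on every MySQL node, and that user must be able to connect remotely from the other hosts on the local network. Because replication is already running, it is enough (and only appropriate) to run SQL statements like the following on the master node; they are replicated to the slaves.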
grant all on *.* to 'mhaadmin'@'192.168.%.%' identified by 'mhapass';
The replication account must exist on every node:
grant replication slave,replication client on *.* to 'slave'@'192.168.%.%' identified by 'keer';
flush privileges;
This step is complete.
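Set up SSH mutual trust
All nodes in the MHA cluster need SSH mutual trust with one another so that remote control and data management work. For simplicity, generate a key pair on the Manager node, authorize it to connect to the local host, and then copy the private key and the authorized_keys file to all remaining nodes.
(Note: run this on all four nodes.)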
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@node1
Once all four machines have run the commands above, the following file is visible on the manager machine:
cd .ssh/
ls
authorized_keys
cat authorized_keys
The public keys of all four machines are now in authorized_keys. Sending this file to the other three machines gives all four passwordless SSH access to each other:
scp authorized_keys root@node2:~/.ssh/
scp authorized_keys root@node3:~/.ssh/
scp authorized_keys root@node4:~/.ssh/
Install the MHA packages
All four nodes need: mha4mysql-node-0.56-0.el6.noarch.rpm
The Manager node additionally needs: mha4mysql-manager-0.56-0.el6.noarch.rpm
Download link
Link: https://pan.baidu.com/s/1JTYNfe9-KNev8leKtLDJiA
Extraction code: fphq
This step is complete.
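Initialize and configure MHA
The Manager node needs a dedicated configuration file for each master/slave cluster it monitors, and all clusters can also share a global configuration. The global configuration file defaults to /etc/masterha_default.cnf and is optional. When only one master/slave cluster is monitored, the application's own configuration file can supply the per-server defaults directly. The path of each application's configuration file is up to you; this walkthrough uses /etc/mha_master/mha.cnf, the path passed to the check commands later on, with the following contents: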
[server default]
user=mhaadmin
password=mhapass
manager_workdir=/etc/mha_master/app1
manager_log=/etc/mha_master/manager.log
remote_workdir=/mydata/mha_master/app1
ssh_user=root
repl_user=slave
repl_password=keer
ping_interval=1
[server1]
hostname=192.168.3.12
ssh_port=22
candidate_master=1
[server2]
hostname=192.168.3.13
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.3.14
ssh_port=22
candidate_master=1
This step is complete.
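In this file, the [server default] block holds settings shared by every server, and each [serverN] block adds or overrides values for one host. The sketch below illustrates that layering with Python's `configparser`; it is not how MHA itself reads the file, and `load_mha_servers` and `missing_settings` are hypothetical helper names.

```python
import configparser

# Settings this walkthrough relies on; the list is an assumption, not an MHA rule.
REQUIRED = ("hostname", "user", "password", "ssh_user", "repl_user", "repl_password")


def load_mha_servers(conf_text):
    """Return one merged settings dict per [serverN] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    defaults = dict(parser["server default"])
    servers = []
    for name in parser.sections():
        if name == "server default":
            continue
        merged = dict(defaults)
        merged.update(parser[name])  # per-server values override the defaults
        merged["section"] = name
        servers.append(merged)
    return servers


def missing_settings(servers):
    """Map section name -> list of required settings that are absent."""
    problems = {}
    for server in servers:
        missing = [key for key in REQUIRED if key not in server]
        if missing:
            problems[server["section"]] = missing
    return problems
```

Running `missing_settings(load_mha_servers(text))` on the configuration above should return an empty dict, since every server section inherits the shared credentials and defines its own hostname.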
Check the four nodes
1) Check that SSH mutual trust between the nodes is configured correctly
Run the following command on the Manager machine:
masterha_check_ssh -conf=/etc/mha_master/mha.cnf
Tue Dec 22 18:30:14 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Dec 22 18:30:14 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Tue Dec 22 18:30:14 2020 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Tue Dec 22 18:30:14 2020 - [info] Starting SSH connection tests..
Tue Dec 22 18:30:15 2020 - [debug]
Tue Dec 22 18:30:14 2020 - [debug] Connecting via SSH from root@192.168.3.12(192.168.3.12:22) to root@192.168.3.13(192.168.3.13:22)..
Tue Dec 22 18:30:14 2020 - [debug] ok.
Tue Dec 22 18:30:14 2020 - [debug] Connecting via SSH from root@192.168.3.12(192.168.3.12:22) to root@192.168.3.14(192.168.3.14:22)..
Tue Dec 22 18:30:14 2020 - [debug] ok.
Tue Dec 22 18:30:15 2020 - [debug]
Tue Dec 22 18:30:14 2020 - [debug] Connecting via SSH from root@192.168.3.13(192.168.3.13:22) to root@192.168.3.12(192.168.3.12:22)..
Tue Dec 22 18:30:15 2020 - [debug] ok.
Tue Dec 22 18:30:15 2020 - [debug] Connecting via SSH from root@192.168.3.13(192.168.3.13:22) to root@192.168.3.14(192.168.3.14:22)..
Tue Dec 22 18:30:15 2020 - [debug] ok.
Tue Dec 22 18:30:16 2020 - [debug]
Tue Dec 22 18:30:15 2020 - [debug] Connecting via SSH from root@192.168.3.14(192.168.3.14:22) to root@192.168.3.12(192.168.3.12:22)..
Tue Dec 22 18:30:15 2020 - [debug] ok.
Tue Dec 22 18:30:15 2020 - [debug] Connecting via SSH from root@192.168.3.14(192.168.3.14:22) to root@192.168.3.13(192.168.3.13:22)..
Tue Dec 22 18:30:15 2020 - [debug] ok.
Tue Dec 22 18:30:16 2020 - [info] All SSH connection tests passed successfully.
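The log shows why every node needs to trust every other node: the check connects from each host to each of the others, six ordered pairs for three hosts. The sketch below only reproduces that enumeration and builds a probe command; it does not reflect MHA's internals, and `ssh_check_pairs` and `ssh_probe_command` are hypothetical helper names.

```python
from itertools import permutations


def ssh_check_pairs(hosts):
    """List every ordered (source, destination) pair among the given hosts."""
    return list(permutations(hosts, 2))


def ssh_probe_command(destination, user="root", port=22):
    """Build a non-interactive ssh probe to run on the source host.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so the probe only succeeds when key-based login works.
    """
    return ["ssh", "-o", "BatchMode=yes", "-p", str(port),
            f"{user}@{destination}", "true"]


if __name__ == "__main__":
    for source, destination in ssh_check_pairs(
            ["192.168.3.12", "192.168.3.13", "192.168.3.14"]):
        print(source, "->", " ".join(ssh_probe_command(destination)))
```

The printed pairs follow the same order as the connection tests in the log above.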
2) Check that the connection settings of the managed MySQL replication cluster are correct
masterha_check_repl -conf=/etc/mha_master/mha.cnf
Tue Dec 22 18:35:40 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Dec 22 18:35:40 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Tue Dec 22 18:35:40 2020 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Tue Dec 22 18:35:40 2020 - [info] MHA::MasterMonitor version 0.56.
Tue Dec 22 18:35:41 2020 - [info] GTID failover mode = 0
Tue Dec 22 18:35:41 2020 - [info] Dead Servers:
Tue Dec 22 18:35:41 2020 - [info] Alive Servers:
Tue Dec 22 18:35:41 2020 - [info] 192.168.3.12(192.168.3.12:3306)
Tue Dec 22 18:35:41 2020 - [info] 192.168.3.13(192.168.3.13:3306)
Tue Dec 22 18:35:41 2020 - [info] 192.168.3.14(192.168.3.14:3306)
Tue Dec 22 18:35:41 2020 - [info] Alive Slaves:
Tue Dec 22 18:35:41 2020 - [info] 192.168.3.12(192.168.3.12:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Dec 22 18:35:41 2020 - [info] Replicating from 192.168.3.13(192.168.3.13:3306)
Tue Dec 22 18:35:41 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Dec 22 18:35:41 2020 - [info] 192.168.3.14(192.168.3.14:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Dec 22 18:35:41 2020 - [info] Replicating from 192.168.3.13(192.168.3.13:3306)
Tue Dec 22 18:35:41 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Dec 22 18:35:41 2020 - [info] Current Alive Master: 192.168.3.13(192.168.3.13:3306)
Tue Dec 22 18:35:41 2020 - [info] Checking slave configurations..
Tue Dec 22 18:35:41 2020 - [info] Checking replication filtering settings..
Tue Dec 22 18:35:41 2020 - [info] binlog_do_db= , binlog_ignore_db=
Tue Dec 22 18:35:41 2020 - [info] Replication filtering check ok.
Tue Dec 22 18:35:41 2020 - [info] GTID (with auto-pos) is not supported
Tue Dec 22 18:35:41 2020 - [info] Starting SSH connection tests..
Tue Dec 22 18:35:44 2020 - [info] All SSH connection tests passed successfully.
Tue Dec 22 18:35:44 2020 - [info] Checking MHA Node version..
Tue Dec 22 18:35:44 2020 - [info] Version check ok.
Tue Dec 22 18:35:44 2020 - [info] Checking SSH publickey authentication settings on the current master..
Tue Dec 22 18:35:44 2020 - [info] HealthCheck: SSH to 192.168.3.13 is reachable.
Tue Dec 22 18:35:44 2020 - [info] Master MHA Node version is 0.56.
Tue Dec 22 18:35:44 2020 - [info] Checking recovery script configurations on 192.168.3.13(192.168.3.13:3306)..
Tue Dec 22 18:35:44 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/mydata/mha_master/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000004
Tue Dec 22 18:35:44 2020 - [info] Connecting to root@192.168.3.13(192.168.3.13:22)..
Creating /mydata/mha_master/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000004
Tue Dec 22 18:35:44 2020 - [info] Binlog setting check done.
Tue Dec 22 18:35:44 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Dec 22 18:35:44 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhaadmin' --slave_host=192.168.3.12 --slave_ip=192.168.3.12 --slave_port=3306 --workdir=/mydata/mha_master/app1 --target_version=5.5.68-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Dec 22 18:35:44 2020 - [info] Connecting to root@192.168.3.12(192.168.3.12:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay.000010
Temporary relay log file is /var/lib/mysql/mysql-relay.000010
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Dec 22 18:35:45 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhaadmin' --slave_host=192.168.3.14 --slave_ip=192.168.3.14 --slave_port=3306 --workdir=/mydata/mha_master/app1 --target_version=5.5.68-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Dec 22 18:35:45 2020 - [info] Connecting to root@192.168.3.14(192.168.3.14:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay.000010
Temporary relay log file is /var/lib/mysql/mysql-relay.000010
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Dec 22 18:35:45 2020 - [info] Slaves settings check done.
Tue Dec 22 18:35:45 2020 - [info]
192.168.3.13(192.168.3.13:3306) (current master)
+--192.168.3.12(192.168.3.12:3306)
+--192.168.3.14(192.168.3.14:3306)
Tue Dec 22 18:35:45 2020 - [info] Checking replication health on 192.168.3.12..
Tue Dec 22 18:35:45 2020 - [info] ok.
Tue Dec 22 18:35:45 2020 - [info] Checking replication health on 192.168.3.14..
Tue Dec 22 18:35:45 2020 - [info] ok.
Tue Dec 22 18:35:45 2020 - [warning] master_ip_failover_script is not defined.
Tue Dec 22 18:35:45 2020 - [warning] shutdown_script is not defined.
Tue Dec 22 18:35:45 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
The final line reports OK, which means the check passed.
Start MHA
Run the following command on the manager node to start MHA:
nohup masterha_manager -conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
Once it has started, check the state of the master node:
masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:33546) is running(0:PING_OK), master:192.168.3.13
Test MHA failover
Stop the master
systemctl stop mariadb
Check the log
tailf /etc/mha_master/manager.log
Started automated(non-interactive) failover.
The latest slave 192.168.3.12(192.168.3.12:3306) has all relay logs for recovery.
Selected 192.168.3.12(192.168.3.12:3306) as a new master.
192.168.3.12(192.168.3.12:3306): OK: Applying all logs succeeded.
192.168.3.14(192.168.3.14:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.3.14(192.168.3.14:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.3.12(192.168.3.12:3306)
192.168.3.12(192.168.3.12:3306): Resetting slave info succeeded.
Master failover to 192.168.3.12(192.168.3.12:3306) completed successfully.
The log confirms that the failover succeeded.
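To confirm the outcome from a script rather than by reading the log, the final summary line can be matched with a regular expression. This is a small sketch based only on the wording of the log shown above; `new_master_from_log` is a hypothetical helper name.

```python
import re

FAILOVER_DONE = re.compile(
    r"Master failover to (\S+?)\((\S+?):(\d+)\) completed successfully"
)


def new_master_from_log(log_text):
    """Return (ip, port) of the promoted master, or None if no success line exists."""
    match = FAILOVER_DONE.search(log_text)
    if match is None:
        return None
    return match.group(2), int(match.group(3))
```

Applied to the log above, the function returns the address 192.168.3.12 with port 3306.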