MHA (Master HA) is an open-source high-availability program for MySQL that adds automated master failover to a MySQL master/slave replication setup. When MHA detects that the master has failed, it promotes the slave holding the most recent data to be the new master; during this process it gathers extra information from the other slaves to avoid consistency problems. MHA also supports online master switchover, i.e. swapping the master/slave roles on demand.
Compared with other HA software, MHA focuses on keeping the master of a MySQL replication setup highly available. Its distinguishing feature is that it can reconcile the differing relay logs across slaves so that all slaves end up with consistent data, then pick one of them as the new master and point the remaining slaves at it.
How it works:
When the master fails, MHA compares the binlog positions that each slave's I/O thread has read from the master and selects the slave closest to the master as the candidate master. The other slaves generate differential relay logs by comparing themselves against this candidate. MHA applies the binlog events saved from the original master on the candidate and promotes it to master; finally, the remaining slaves apply their differential relay logs and start replicating from the new master.
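The position comparison above can be sketched as a tiny shell helper. This is an illustration only, not MHA's actual code: it assumes every slave reports the same Master_Log_File and simply picks the host whose Read_Master_Log_Pos is highest (real MHA also compares the binlog file name and much more):

```shell
#!/bin/bash
# Hypothetical sketch of candidate-master selection:
# pick the slave whose I/O thread has read the highest binlog position.
# Input: lines of "<host> <Read_Master_Log_Pos>", e.g. collected by running
# SHOW SLAVE STATUS on every slave.
pick_candidate() {
  sort -k2 -n | tail -n1 | awk '{print $1}'
}

printf '%s\n' \
  '10.5.100.183 573' \
  '10.5.100.146 245' | pick_candidate   # -> 10.5.100.183
```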
An MHA deployment has two roles: the MHA Manager (management node) and the MHA Node (data node).
MHA Manager: usually deployed on a separate machine, it can manage multiple master/slave clusters; each cluster is called an application.
MHA Node: runs on every MySQL server (master, slaves, and the manager host). It speeds up failover through scripts that can parse and clean up logs.
MHA components
Manager node tools:
masterha_check_ssh: tests the SSH environment MHA depends on (the manager must be able to copy the master's binary logs).
masterha_check_repl: tests the MySQL replication environment and whether the setup is ready for failover.
masterha_manager: the main MHA service program.
masterha_check_status: probes MHA's running status.
masterha_master_monitor: probes whether the MySQL master is available.
masterha_master_switch: performs a master switchover.
masterha_conf_host: adds or removes a configured host.
masterha_stop: stops the MHA service.
Node tools:
save_binary_logs: saves and copies the master's binary logs.
apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves.
filter_mysqlbinlog: strips unnecessary ROLLBACK events.
purge_relay_logs: purges relay logs (without blocking the SQL thread).
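Because the slaves below are configured with relay_log_purge=0, relay logs accumulate and have to be cleaned up out of band. The usual pattern is a cron job running purge_relay_logs on each slave; the account, workdir, and log path here are illustrative assumptions to adapt:

```shell
# crontab entry on each slave (illustrative values):
# purge relay logs every 4 hours without blocking the SQL thread
0 */4 * * * /usr/bin/purge_relay_logs --user=mhauser --password=mhapass --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1
```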
MHA configuration
Prepare four nodes: one manager node (node1), one master (node2), and two slaves (node3, node4).
| Node | IP address |
|---|---|
| node1 | 10.5.100.208 |
| node2 | 10.5.100.207 |
| node3 | 10.5.100.183 |
| node4 | 10.5.100.146 |
MHA is a high-availability layer on top of a database master/slave setup, so we first configure replication. The steps below are based on CentOS 7.
NODE2 node
1. Install the database and edit its configuration file:
vim /etc/my.cnf
relay-log=relay-log // enable relay logging
server-id=1
skip_name_resolve=ON
innodb_file_per_table=ON
log-bin=mysql-bin
2. Restart the database:
systemctl restart mariadb
3. Check the binary log file and position, which the slaves will replicate from:
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 | 245 | | |
+------------------+----------+--------------+------------------+
MariaDB [(none)]> grant replication slave,replication client on *.* to 'repluser'@'10.5.100.%' identified by 'replpass';
Query OK, 0 rows affected (0.00 sec)
// Create an account with replication privileges.
MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
NODE3 node
1. Install the database and edit its configuration file:
vim /etc/my.cnf
relay-log=relay-log // enable relay logging
server-id=2
skip_name_resolve=ON
innodb_file_per_table=ON
log-bin=mysql-bin
read_only=1 // mark this as a read-only slave
relay_log_purge = 0 // do not purge relay logs automatically
2. Set up replication:
MariaDB [(none)]> change master to
master_host='10.5.100.207',
master_user='repluser',
master_password='replpass',
master_log_file='mysql-bin.000003',
master_log_pos=245;
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> show slave status\G
// confirm Slave_IO_Running and Slave_SQL_Running are both Yes
NODE4 node
1. Install the database and edit its configuration file:
vim /etc/my.cnf
relay-log=relay-log // enable relay logging
server-id=3
skip_name_resolve=ON
innodb_file_per_table=ON
log-bin=mysql-bin
read_only=1 // mark this as a read-only slave
relay_log_purge = 0 // do not purge relay logs automatically
2. Set up replication:
MariaDB [(none)]> change master to
master_host='10.5.100.207',
master_user='repluser',
master_password='replpass',
master_log_file='mysql-bin.000003',
master_log_pos=245;
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> show slave status\G
// confirm Slave_IO_Running and Slave_SQL_Running are both Yes
With replication running, create the management account on NODE2: MHA uses this account to inspect and reconfigure the nodes.
NODE2
MariaDB [(none)]> grant all privileges on *.* to 'mhauser'@'10.5.100.%' identified by 'mhapass';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
The management account is created only after replication is up so that the GRANT statement itself replicates to the slaves; the Manager needs this account on every node.
NODE1 node:
The master/slave architecture is done; now we make the master highly available.
MHA requires installing the manager and node yum packages; both depend on the base and epel repositories.
https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
MHA package download page.
[root@node1 ~]# ls
anaconda-ks.cfg hellodb1.sql mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm
[root@node1 ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
[root@node1 ~]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
[root@node1 ~]# rpm -ql mha4mysql-manager // programs installed by the manager package
/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
[root@node1 ~]# rpm -ql mha4mysql-node
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
2. The manager constantly copies data from the master and reconciles data across the slaves, all over SSH, so passwordless key authentication must be set up between all nodes.
[root@node1 ~]# ssh-keygen -t rsa -P ''
[root@node1 ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys // append the public key to the authorized_keys file
[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys 10.5.100.207:/root/.ssh/ // distribute the key pair to every node so they can all authenticate to each other
[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys 10.5.100.183:/root/.ssh/
[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys 10.5.100.146:/root/.ssh/
[root@node1 ~]# scp mha4mysql-node-0.56-0.el6.noarch.rpm 10.5.100.207:/root
[root@node1 ~]# scp mha4mysql-node-0.56-0.el6.noarch.rpm 10.5.100.183:/root
[root@node1 ~]# scp mha4mysql-node-0.56-0.el6.noarch.rpm 10.5.100.146:/root
// The node package must be installed on the master and every slave; it does the differential-log reconciliation.
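The copy-and-install step for the three database hosts can be driven by one loop (same package and hosts as above; run from node1 after the SSH keys are in place):

```shell
for n in 10.5.100.207 10.5.100.183 10.5.100.146; do
  scp mha4mysql-node-0.56-0.el6.noarch.rpm "$n":/root/
  ssh "$n" 'rpm -ivh /root/mha4mysql-node-0.56-0.el6.noarch.rpm'
done
```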
3. Create the MHA configuration file.
[root@node1 ~]# mkdir /etc/masterha/
[root@node1 masterha]# vim app1.cnf
[server default]
user=mhauser // management account
password=mhapass
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser // replication account and password
repl_password=replpass
ping_interval=1 // interval (seconds) between master health checks
[server1] // first host
hostname=10.5.100.207
ssh_port=22 // state the port explicitly if it is not the default 22
[server2] // second host
hostname=10.5.100.183
ssh_port=22
[server3] // third host
hostname=10.5.100.146
ssh_port=22
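Two optional per-host settings are worth knowing: candidate_master=1 marks a host as preferred when MHA elects a new master, and no_master=1 prevents a host from ever being promoted. For example, to prefer the first slave as the failover target, the [server2] block could read:

```ini
[server2]
hostname=10.5.100.183
ssh_port=22
candidate_master=1
```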
With everything configured, MHA can now monitor and manage the cluster.
[root@node1 masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
// verify that SSH connectivity between all the nodes works
Sat Nov 30 01:59:49 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Nov 30 01:59:49 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Nov 30 01:59:49 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Nov 30 01:59:49 2019 - [info] Starting SSH connection tests..
Sat Nov 30 01:59:50 2019 - [debug]
Sat Nov 30 01:59:49 2019 - [debug] Connecting via SSH from root@10.5.100.207(10.5.100.207:22) to root@10.5.100.183(10.5.100.183:22)..
Sat Nov 30 01:59:50 2019 - [debug] ok.
Sat Nov 30 01:59:50 2019 - [debug] Connecting via SSH from root@10.5.100.207(10.5.100.207:22) to root@10.5.100.146(10.5.100.146:22)..
Sat Nov 30 01:59:50 2019 - [debug] ok.
Sat Nov 30 01:59:52 2019 - [debug]
Sat Nov 30 01:59:50 2019 - [debug] Connecting via SSH from root@10.5.100.183(10.5.100.183:22) to root@10.5.100.207(10.5.100.207:22)..
Sat Nov 30 01:59:50 2019 - [debug] ok.
Sat Nov 30 01:59:50 2019 - [debug] Connecting via SSH from root@10.5.100.183(10.5.100.183:22) to root@10.5.100.146(10.5.100.146:22)..
Sat Nov 30 01:59:51 2019 - [debug] ok.
Sat Nov 30 01:59:52 2019 - [debug]
Sat Nov 30 01:59:50 2019 - [debug] Connecting via SSH from root@10.5.100.146(10.5.100.146:22) to root@10.5.100.207(10.5.100.207:22)..
Sat Nov 30 01:59:51 2019 - [debug] ok.
Sat Nov 30 01:59:51 2019 - [debug] Connecting via SSH from root@10.5.100.146(10.5.100.146:22) to root@10.5.100.183(10.5.100.183:22)..
Sat Nov 30 01:59:51 2019 - [debug] ok.
Sat Nov 30 01:59:52 2019 - [info] All SSH connection tests passed successfully.
Check that replication is healthy:
[root@node1 masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Nov 30 02:09:18 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Nov 30 02:09:18 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Nov 30 02:09:18 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Nov 30 02:09:18 2019 - [info] MHA::MasterMonitor version 0.56.
Sat Nov 30 02:09:19 2019 - [info] GTID failover mode = 0
Sat Nov 30 02:09:19 2019 - [info] Dead Servers:
Sat Nov 30 02:09:19 2019 - [info] Alive Servers:
Sat Nov 30 02:09:19 2019 - [info] 10.5.100.207(10.5.100.207:3306)
Sat Nov 30 02:09:19 2019 - [info] 10.5.100.183(10.5.100.183:3306)
Sat Nov 30 02:09:19 2019 - [info] 10.5.100.146(10.5.100.146:3306)
Sat Nov 30 02:09:19 2019 - [info] Alive Slaves:
Sat Nov 30 02:09:19 2019 - [info] 10.5.100.207(10.5.100.207:3306) Version=5.5.41-MariaDB-log (oldest major version between slaves) log-bin:enabled
Sat Nov 30 02:09:19 2019 - [info] Replicating from 10.5.100.183(10.5.100.183:3306)
Sat Nov 30 02:09:19 2019 - [info] 10.5.100.146(10.5.100.146:3306) Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Sat Nov 30 02:09:19 2019 - [info] Replicating from 10.5.100.183(10.5.100.183:3306)
Sat Nov 30 02:09:19 2019 - [info] Current Alive Master: 10.5.100.183(10.5.100.183:3306)
Sat Nov 30 02:09:19 2019 - [info] Checking slave configurations..
Sat Nov 30 02:09:19 2019 - [warning] relay_log_purge=0 is not set on slave 10.5.100.207(10.5.100.207:3306).
Sat Nov 30 02:09:19 2019 - [warning] relay_log_purge=0 is not set on slave 10.5.100.146(10.5.100.146:3306).
Sat Nov 30 02:09:19 2019 - [info] Checking replication filtering settings..
Sat Nov 30 02:09:19 2019 - [info] binlog_do_db= , binlog_ignore_db=
Sat Nov 30 02:09:19 2019 - [info] Replication filtering check ok.
Sat Nov 30 02:09:19 2019 - [info] GTID (with auto-pos) is not supported
Sat Nov 30 02:09:19 2019 - [info] Starting SSH connection tests..
Sat Nov 30 02:09:22 2019 - [info] All SSH connection tests passed successfully.
Sat Nov 30 02:09:22 2019 - [info] Checking MHA Node version..
Sat Nov 30 02:09:22 2019 - [info] Version check ok.
Sat Nov 30 02:09:22 2019 - [info] Checking SSH publickey authentication settings on the current master..
Sat Nov 30 02:09:22 2019 - [info] HealthCheck: SSH to 10.5.100.183 is reachable.
Sat Nov 30 02:09:23 2019 - [info] Master MHA Node version is 0.56.
Sat Nov 30 02:09:23 2019 - [info] Checking recovery script configurations on 10.5.100.183(10.5.100.183:3306)..
Sat Nov 30 02:09:23 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000004
Sat Nov 30 02:09:23 2019 - [info] Connecting to root@10.5.100.183(10.5.100.183:22)..
Creating /data/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000004
Sat Nov 30 02:09:23 2019 - [info] Binlog setting check done.
Sat Nov 30 02:09:23 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Nov 30 02:09:23 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.5.100.207 --slave_ip=10.5.100.207 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.41-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Nov 30 02:09:23 2019 - [info] Connecting to root@10.5.100.207(10.5.100.207:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000002
Temporary relay log file is /var/lib/mysql/relay-log.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Nov 30 02:09:23 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.5.100.146 --slave_ip=10.5.100.146 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Nov 30 02:09:23 2019 - [info] Connecting to root@10.5.100.146(10.5.100.146:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000004
Temporary relay log file is /var/lib/mysql/relay-log.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Nov 30 02:09:24 2019 - [info] Slaves settings check done.
Sat Nov 30 02:09:24 2019 - [info]
10.5.100.207(10.5.100.207:3306) (current master)
+--10.5.100.183(10.5.100.183:3306)
+--10.5.100.146(10.5.100.146:3306)
// The tree above shows the replication topology: which node is the master and which are its slaves.
Sat Nov 30 02:09:24 2019 - [info] Checking replication health on 10.5.100.207..
Sat Nov 30 02:09:24 2019 - [info] ok.
Sat Nov 30 02:09:24 2019 - [info] Checking replication health on 10.5.100.146..
Sat Nov 30 02:09:24 2019 - [info] ok.
Sat Nov 30 02:09:24 2019 - [warning] master_ip_failover_script is not defined.
Sat Nov 30 02:09:24 2019 - [warning] shutdown_script is not defined.
Sat Nov 30 02:09:24 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
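The two warnings above mean no master_ip_failover_script is configured, so nothing repoints clients automatically after a failover (MHA ships sample scripts, written in Perl, for this). The sketch below is a hypothetical stand-in that only shows the call interface: MHA invokes the script with --command=stop/stopssh on the old master and --command=start on the new one, and such a script typically moves a virtual IP. The VIP, the device name, and the dry-run echoes are assumptions for illustration, not values from this setup:

```shell
#!/bin/bash
# Hypothetical master_ip_failover sketch (dry run: prints the commands it
# would execute instead of running them). VIP and DEV are made-up values.
VIP=10.5.100.250
DEV=eth0

failover_vip() {
  local command="" orig_master_host="" new_master_host="" arg
  for arg in "$@"; do
    case $arg in
      --command=*)          command=${arg#*=} ;;
      --orig_master_host=*) orig_master_host=${arg#*=} ;;
      --new_master_host=*)  new_master_host=${arg#*=} ;;
    esac
  done
  case $command in
    stop|stopssh)
      # take the VIP down on the (possibly dead) old master
      echo "ssh root@$orig_master_host ip addr del $VIP/32 dev $DEV"
      ;;
    start)
      # bring the VIP up on the newly promoted master
      echo "ssh root@$new_master_host ip addr add $VIP/32 dev $DEV"
      ;;
    status)
      echo "OK"
      ;;
  esac
}

# Dry run: the commands a failover from 207 to 183 would execute.
failover_vip --command=stop --orig_master_host=10.5.100.207
failover_vip --command=start --new_master_host=10.5.100.183
```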
Check MHA's running status:
[root@node1 masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING). // not started yet
[root@node1 masterha]#
Start MHA monitoring
[root@node1 masterha]# masterha_manager --conf=/etc/masterha/app1.cnf // runs in the foreground; no errors means it started successfully
Sat Nov 30 02:19:30 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Nov 30 02:19:30 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Nov 30 02:19:30 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
In another terminal, check the status:
[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:10738) is running(0:PING_OK), master:10.5.100.183
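Because masterha_manager occupies the foreground, in practice it is usually detached with nohup. Two flags are commonly added: --remove_dead_master_conf drops the failed master from app1.cnf after a failover, and --ignore_last_failover skips the guard that refuses a second failover within a short window:

```shell
nohup masterha_manager --conf=/etc/masterha/app1.cnf \
    --remove_dead_master_conf --ignore_last_failover \
    < /dev/null > /data/masterha/app1/manager.log 2>&1 &
```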
One limitation of MHA: masterha_manager keeps running only until the master fails; it then performs the failover, reports what happened in its log, and exits.
Now take the master offline and watch how MHA reacts.
[root@node2 ~]# systemctl stop mariadb // take the master's database offline
// The manager node errors out, reporting that a file could not be found.
[root@node1 masterha]# masterha_manager --conf=/etc/masterha/app1.cnf
Sat Nov 30 02:19:30 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Nov 30 02:19:30 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Nov 30 02:19:30 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Creating /data/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000004
Sat Nov 30 02:25:05 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Nov 30 02:25:05 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Nov 30 02:25:05 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
[root@node1 masterha]#
After the master goes down, MHA promotes the slave with the most recent data to be the new master and points the other slaves at it.
On node4, check whether replication now points at the new master; we also verify that the read-only flag has been turned off on the promoted node.
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.5.100.183 // the master used to be 207; 183 is now the new master
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 573
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 529
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 573
Relay_Log_Space: 817
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
The promoted node has gone from slave to master, and its read-only flag is now off:
MariaDB [(none)]> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.00 sec)
What if the old master comes back online?
A returning master can only rejoin as a slave, pointing at the new master; the repointing uses the same CHANGE MASTER TO procedure shown above, so it is not covered in more detail here.