Introduction:
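MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave promotion in a MySQL replication environment. When the master fails, MHA can complete the database failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it a genuine high-availability solution.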
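The software has two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of each cluster at regular intervals. When the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to it. The failover is intended to be transparent to applications.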
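During an automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it performs the failover anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it. As long as at least one slave has received the latest binary log events, MHA can apply them to all other slaves, which keeps every node consistent.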
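At present MHA mainly supports a one-master, multiple-slave topology. Building MHA requires at least three database servers in a replication cluster: one master, one candidate master, and one ordinary slave. Because three servers are the minimum, Taobao modified MHA to reduce hardware cost, and its TMHA variant supports one master with a single slave. Related posts: "MHA quick setup" and "Managing multiple master-slave replication groups with MHA Manager".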
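How MHA works:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that has the most recent updates;
(3) Apply the differential relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.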
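Step (2) can be illustrated with a small sketch. It assumes you have already collected, for each slave, the Master_Log_File and Read_Master_Log_Pos values from SHOW SLAVE STATUS, one slave per line. The function name and the input format are hypothetical; this is not MHA's own code, only an illustration of comparing binlog coordinates.

```shell
#!/usr/bin/env bash
# Reads lines of the form "<host> <master_log_file> <read_master_log_pos>"
# on stdin and prints the host that has received the most binlog data.
# Binlog file names are zero-padded (mysql-bin.000004), so a plain string
# sort on the file name orders them correctly; the position is compared
# numerically as the second sort key.
pick_latest_slave() {
  LC_ALL=C sort -b -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}
```

For example, feeding it "172.25.8.2 mysql-bin.000004 154" and "172.25.8.3 mysql-bin.000004 120" prints 172.25.8.2.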
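MHA ships as two packages, the Manager package and the Node package, described below.
The Manager package mainly contains the following tools:
masterha_check_ssh checks the SSH configuration used by MHA
masterha_check_repl checks the MySQL replication status
masterha_manager starts MHA
masterha_check_status reports the current MHA running state
masterha_master_monitor detects whether the master is down
masterha_master_switch controls failover (automatic or manual)
masterha_conf_host adds or removes server entries in the configuration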
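The Node package (these tools are normally invoked by the MHA Manager scripts and need no manual operation) mainly contains the following tools:
save_binary_logs saves and copies the master's binary logs
apply_diff_relay_logs identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog removes unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs purges relay logs (without blocking the SQL thread)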
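Note:
To minimize data loss when the master goes down because of a hardware failure, it is recommended to configure MySQL 5.5 semi-synchronous replication together with MHA.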
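As a rough sketch of what enabling semi-synchronous replication involves, the helper below prints the SQL for each role. The plugin and variable names are the standard ones shipped with MySQL 5.5 to 5.7; the function name semisync_sql is made up for this example, and nothing here was part of the original setup.

```shell
#!/usr/bin/env bash
# Prints the SQL that enables semi-synchronous replication for one role.
# Usage: semisync_sql master|slave
semisync_sql() {
  case "$1" in
    master)
      cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SQL
      ;;
    slave)
      cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
SQL
      ;;
    *)
      echo "usage: semisync_sql master|slave" >&2
      return 1
      ;;
  esac
}
```

The output can be piped into the client, for example `semisync_sql master | mysql -uroot -p` on server1. On a slave the I/O thread has to be restarted (STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;) before the setting takes effect. Since any slave may become the master after a failover, it is probably worth installing both plugins on every server.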
Test environment:
master: server1 (172.25.8.1)
candidate master (slave): server2 (172.25.8.2)
slave: server3 (172.25.8.3)
manager: server4 (172.25.8.4)
I. Setting up master-slave replication
1. Master setup (server1)
mysql> grant replication slave on *.* to repl@'172.25.8.%' identified by 'zmy_0808';
Query OK, 0 rows affected, 1 warning (0.06 sec)
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 | 843 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
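2. Slave setup (server2, server3)
(master_auto_position=1 requires GTID mode, that is gtid_mode=ON and enforce_gtid_consistency=ON, on the master and on every slave.)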
mysql> change master to master_host='172.25.8.1',master_user='repl',master_password='zmy_0808',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.68 sec)
mysql> start slave;
Query OK, 0 rows affected (0.03 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.8.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000004
Read_Master_Log_Pos: 154
Relay_Log_File: server2-relay-bin.000004
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
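The two fields that matter in this output are Slave_IO_Running and Slave_SQL_Running. A minimal sketch of checking them from a script, assuming the SHOW SLAVE STATUS\G output is piped in on stdin (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Reads "SHOW SLAVE STATUS\G" output on stdin and returns success only if
# both replication threads report "Yes".
slave_threads_ok() {
  awk '
    {
      line = $0
      sub(/^[ \t]+/, "", line)      # drop the left padding of \G output
      sub(/[ \t\r]+$/, "", line)    # drop trailing blanks or a CR
    }
    line == "Slave_IO_Running: Yes"  { io = 1 }
    line == "Slave_SQL_Running: Yes" { sql = 1 }
    END { exit !(io && sql) }
  '
}
```

A typical call would be `mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | slave_threads_ok && echo "replication ok"` on server2 and server3.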
II. Installing the MHA software
Download and install mha4mysql-node-0.54-0.el6.noarch.rpm
on the data nodes (server1, server2, server3):
yum install perl-devel perl-CPAN perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
Install the manager on the management node (server4)
Required packages:
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
yum install perl-devel perl-CPAN perl-DBD-MySQL -y
Dependency RPM packages to download:
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
Passwordless SSH setup
[root@server4 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
17:dc:04:de:f6:4e:1d:1c:77:f0:3d:47:cf:e9:d4:82 root@server4
The key's randomart image is:
+--[ RSA 2048]----+
| ... .o+|
| o + ..+O|
| + E .*B|
| o .oo+|
| S . o..|
| . o |
| . |
| |
| |
+-----------------+
[root@server4 ~]# cd /root/.ssh/
[root@server4 .ssh]# yum install -y rsync
[root@server4 .ssh]# rsync -p * server1:/root/.ssh/
root@server1's password:
[root@server4 .ssh]# rsync -p * server2:/root/.ssh/
root@server2's password:
[root@server4 .ssh]# rsync -p * server3:/root/.ssh/
root@server3's password:
Test that the connections need no password:
[root@server4 .ssh]# ssh server1
[root@server4 .ssh]# ssh server2
[root@server4 .ssh]# ssh server3
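Logging in by hand works, but a scripted check makes failures obvious. The sketch below tries a non-interactive login to each host. Two caveats: key-based login only succeeds when the public key is listed in ~/.ssh/authorized_keys on the target, which the transcript above does not show (ssh-copy-id is the usual way to do it); and SSH_BIN is a hypothetical knob added here so the function can be tried with a stub instead of the real ssh.

```shell
#!/usr/bin/env bash
# Tries a non-interactive login to every host given as an argument and
# reports which ones fail. Returns non-zero if any host failed.
check_ssh_hosts() {
  local ssh_bin="${SSH_BIN:-ssh}" failed=0 host
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

On the manager this would be called as `check_ssh_hosts server1 server2 server3`. MHA also needs the data nodes to reach each other over SSH, which masterha_check_ssh verifies later.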
III. MHA configuration
Manager node configuration
[root@server4 masterha]# pwd
/etc/masterha ## this directory was created with mkdir
[root@server4 masterha]# vim app.cnf
[server default]
manager_log=/etc/masterha/mha.log ## manager log file
manager_workdir=/etc/masterha/ ## working directory
master_binlog_dir=/var/lib/mysql
#master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=zmy_0808 ## password of the monitoring user
ping_interval=1 ## interval in seconds between pings to the master; the default is 3, and failover starts automatically after three attempts get no response
remote_workdir=/tmp
repl_password=zmy_0808 ## replication user password
repl_user=repl ## replication user name
ssh_user=root ## SSH user name
user=root
[server1]
hostname=172.25.8.1
port=3306
[server2]
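candidate_master=1 ## marks this host as the candidate master; with this set, this slave is promoted to master after a failover even if it is not the slave with the latest events in the cluster
check_repl_delay=0 ## by default MHA will not pick a slave that is more than 100 MB of relay logs behind the master, because recovering it takes a long time; with check_repl_delay=0 MHA ignores replication delay when choosing the new master, which is useful on a host with candidate_master=1 because that host must become the new master during a switch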
hostname=172.25.8.2
port=3306
[server3]
hostname=172.25.8.3
port=3306
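Slave node configuration
On server2 and server3, disable automatic relay log purging and make the slaves read-only. Set both at runtime rather than in the configuration file, because either slave may be promoted to master at any time after the master goes down.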
mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global read_only=on;
Query OK, 0 rows affected (0.00 sec)
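With relay_log_purge=0 the relay logs are no longer removed automatically, so they need to be purged periodically, for example with the purge_relay_logs tool from the Node package. The sketch below only prints a crontab entry for that. The function name, the schedule and the log path /var/log/purge_relay_logs.log are assumptions of this example; the purge_relay_logs options shown are the ones the tool documents.

```shell
#!/usr/bin/env bash
# Prints a crontab entry that runs purge_relay_logs once a day at the
# given hour. Nothing is installed or executed here.
# Usage: purge_relay_logs_cron <hour> <mysql_user> <mysql_password>
purge_relay_logs_cron() {
  local hour="$1" user="$2" password="$3"
  printf '0 %s * * * purge_relay_logs --user=%s --password=%s --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1\n' \
    "$hour" "$user" "$password"
}
```

For instance, `purge_relay_logs_cron 4 root zmy_0808` prints an entry for 04:00. Staggering the hour between server2 and server3 avoids purging on every slave at the same moment, and cron may need the full path to purge_relay_logs.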
Check the SSH configuration
[root@server4 masterha]# masterha_check_ssh --conf=/etc/masterha/app.cnf
Thu Aug 9 14:58:47 2018 - [info] All SSH connection tests passed successfully.
Check the replication environment
[root@server4 masterha]# masterha_check_repl --conf=/etc/masterha/app.cnf
MySQL Replication Health is NOT OK!
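Fix: on server1, grant the monitoring user (root) and the replication user access from the 172.25.8.% network
server1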
mysql> grant all on *.* to root@'172.25.8.%' identified by 'zmy_0808';
Query OK, 0 rows affected, 1 warning (0.14 sec)
mysql> grant all on *.* to repl@'172.25.8.%' identified by 'zmy_0808';
Query OK, 0 rows affected, 1 warning (0.08 sec)
Run the check again
[root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf
MySQL Replication Health is OK.
Testing:
Start monitoring on the manager
[root@server4 ~]# nohup masterha_manager --conf=/etc/masterha/app.cnf &
[1] 1383
[root@server4 ~]# nohup: ignoring input and appending output to `nohup.out'
Failover test
Bring MySQL down on the master (server1) by killing mysqld_safe and mysqld:
1598 pts/0 S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql
1843 pts/0 Sl 0:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/my
1905 pts/0 R+ 0:00 ps ax
[root@server1 ~]# kill -9 1598
[root@server1 ~]# kill -9 1843
The manager automatically writes its log and other files:
[root@server4 masterha]# cat mha.log
Selected 172.25.8.2(172.25.8.2:3306) as a new master.
172.25.8.2(172.25.8.2:3306): OK: Applying all logs succeeded.
172.25.8.3(172.25.8.3:3306): OK: Slave started, replicating from 172.25.8.2(172.25.8.2:3306)
172.25.8.2(172.25.8.2:3306): Resetting slave info succeeded.
Master failover to 172.25.8.2(172.25.8.2:3306) completed successfully.
The log shows that server2 has taken over as master.
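Instead of reading the log by eye, the final line can be extracted with a one-liner. This is a sketch that assumes the log contains a line of the form shown above ("Master failover to HOST(...) completed successfully."); the function name is made up for the example.

```shell
#!/usr/bin/env bash
# Prints the host of the new master from the last successful failover
# recorded in the given MHA manager log. Prints nothing if none is found.
# Usage: failover_new_master /etc/masterha/mha.log
failover_new_master() {
  sed -n 's/^.*Master failover to \([^ (]*\)(.*) completed successfully\..*$/\1/p' "$1" | tail -n 1
}
```

An empty result means the log holds no completed failover, which is worth treating as an error in any script that depends on it.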
Now check server2 and server3 in turn:
server2
mysql> show master status;
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| mysql-bin.000001 | 444 | | | 0d809020-9b85-11e8-ba58-525400917839:1-3,
ddc02999-9b85-11e8-8220-5254000cc710:1 |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
server3
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.8.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 444
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 657
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Start MySQL on server1 again and manually configure it as a slave of the new master:
[root@server1 ~]# /etc/init.d/mysqld start
Starting mysqld: [ OK ]
[root@server1 ~]# mysql -p
mysql> show slave status;
Empty set (0.00 sec)
mysql> change master to master_host='172.25.8.2',master_user='repl',master_password='zmy_0808',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.71 sec)
mysql> start slave;
Query OK, 0 rows affected (0.02 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.8.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 444
Relay_Log_File: server1-relay-bin.000002
Relay_Log_Pos: 657
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
IV. Online master switch
Comment out the candidate master settings in the configuration file on the manager:
[server default]
manager_log=/etc/masterha/mha.log
manager_workdir=/etc/masterha/
master_binlog_dir=/var/lib/mysql
#master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=zmy_0808
ping_interval=1
remote_workdir=/tmp
repl_password=zmy_0808
repl_user=repl
ssh_user=root
user=root
[server1]
hostname=172.25.8.1
port=3306
[server2]
#candidate_master=1
#check_repl_delay=0
hostname=172.25.8.2
port=3306
[server3]
hostname=172.25.8.3
port=3306
Manually switch the master from server2 back to server1:
[root@server4 masterha]# masterha_master_switch --conf=/etc/masterha/app.cnf --master_state=alive --new_master_host=172.25.8.1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Check server1, server2, and server3 in turn:
server1
mysql> show master status;
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| mysql-bin.000008 | 194 | | | 0d809020-9b85-11e8-ba58-525400917839:1-3,
ddc02999-9b85-11e8-8220-5254000cc710:1 |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
server2
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.8.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 194
Relay_Log_File: server2-relay-bin.000002
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000008
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
server3
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.8.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 194
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000008
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
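After a switch, each slave should report the new master in Master_Host. A small sketch for pulling that field out of SHOW SLAVE STATUS\G output so it can be compared with the expected address (the function name is hypothetical, and the parsing assumes an IPv4 address or host name without colons):

```shell
#!/usr/bin/env bash
# Reads "SHOW SLAVE STATUS\G" output on stdin and prints the value of
# the first Master_Host field.
slave_master_host() {
  awk -F': *' '
    {
      key = $1
      sub(/^[ \t]+/, "", key)        # drop the left padding of \G output
    }
    key == "Master_Host" && !seen {
      value = $2
      sub(/[ \t\r]+$/, "", value)    # drop trailing blanks or a CR
      print value
      seen = 1
    }
  '
}
```

On server2 and server3, something like `[ "$(mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | slave_master_host)" = "172.25.8.1" ]` then confirms that the slave follows server1.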
A manual switch is not recorded in the MHA log file.