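Introduction to MHA
Overview:
MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It was developed by Yoshinori Matsunobu at DeNA in Japan (now at Facebook) and handles failover and slave promotion in a MySQL replication environment. When the master fails, MHA can complete the failover automatically, typically within 0 to 30 seconds, and it preserves data consistency as far as possible while doing so.
MHA has two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master at a regular interval; when the master fails, it promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The failover is designed to be transparent to the application.
During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. The semi-synchronous replication available since MySQL 5.5 greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all other slaves and keep every node consistent.
MHA mainly supports a one-master, multiple-slave topology. A replication cluster needs at least three database servers: one master, one standby master, and one additional slave. Because three servers can be costly, Taobao built a modified version, TMHA, which supports one master and one slave. Readers who want a faster route can refer to the article "MHA quick setup".
A one-master, one-slave setup also works with stock MHA, with one limitation: if the master host itself goes down, MHA can neither fail over nor recover the missing binlog events. If only the mysqld process on the master crashes, failover and binlog recovery still succeed.
Official project page: https://code.google.com/p/mysql-master-ha/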
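How it works:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that has the most recent updates;
(3) Apply the differential relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.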
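Step (2) can be illustrated with a small sketch. It assumes position-based replication, where the most advanced slave is the one with the highest Master_Log_File / Read_Master_Log_Pos pair reported by show slave status. The function name pick_latest_slave and the three-column input format are made up for illustration and are not part of MHA.

```shell
#!/bin/sh
# Hypothetical helper, not part of MHA.
# Reads lines of the form "host master_log_file read_master_log_pos" on stdin
# and prints the host that has received the most binlog data.
pick_latest_slave() {
  # Sort by binlog file name (fixed-width numeric suffix, so byte order works),
  # then numerically by position; the last line is the most advanced slave.
  LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}
```

Fed the two lines "10.0.0.42 mysql-bin.000002 530" and "10.0.0.43 mysql-bin.000002 314", the function prints 10.0.0.42, because both slaves are in the same binlog file and 530 is the higher position.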
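Environment preparation
1. Prepare three servers
2. Change the hostnames (two ways)
Distinct hostnames make the servers easy to tell apart.
Method 1
hostnamectl set-hostname NEW_NAME # then reconnect the SSH session (for example in Xshell)
Method 2
vim /etc/hostname # put the new name in this file; note that the server must be rebooted for it to take effect
3. Perform the same steps on all three machines
1. Time synchronization
echo "*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1" >>/var/spool/cron/root
2. hosts resolution
vim /etc/hosts ## add the following entries
10.0.0.41 chen1
10.0.0.42 chen2
10.0.0.43 chen3
3. Disable the firewall and SELinux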
systemctl stop firewalld.service
systemctl disable firewalld.service
setenforce 0
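Note that setenforce 0 only switches SELinux to permissive mode until the next reboot. A minimal sketch for making the change permanent follows; it assumes the usual CentOS 7 configuration file /etc/selinux/config, and the function name persist_selinux_disabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the SELINUX= line of a config file to "disabled".
# The path defaults to /etc/selinux/config, the usual location on CentOS 7.
persist_selinux_disabled() {
  cfg="${1:-/etc/selinux/config}"
  tmp="$(mktemp)" || return 1
  # Only the line starting with "SELINUX=" is touched; SELINUXTYPE= is left alone.
  if sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$tmp"; then
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$cfg"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

Calling persist_selinux_disabled as root with no argument edits the default file; the new setting applies from the next reboot.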
4. Set up passwordless SSH login between all nodes. I did this with a script, which you can refer to:
https://blog.csdn.net/weixin_46164213/article/details/103901670
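For readers who cannot reach the linked script, here is a generic sketch of the same idea. It is not the script from the link: the function names run and setup_ssh_keys and the DRY_RUN switch are made up for illustration, and the host list simply reuses the three addresses from this walkthrough. It relies only on the standard ssh-keygen and ssh-copy-id tools and would need to be run as root on every node.

```shell
#!/bin/sh
# Hosts are taken from this walkthrough; adjust as needed.
HOSTS="10.0.0.41 10.0.0.42 10.0.0.43"

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create a key pair if needed and copy the public key
# to root on every host in HOSTS (ssh-copy-id asks for each password once).
setup_ssh_keys() {
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    # Generate an RSA key pair without a passphrase.
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  for host in $HOSTS; do
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$host"
  done
}
```

Running DRY_RUN=1 setup_ssh_keys first prints the commands without executing them, which is a cheap way to confirm the host list before any keys are copied.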
5. Install MySQL 5.6 or later
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum -y install mysql-server mysql # install
systemctl start mysqld # start the service
6. Change the MySQL root password
mysql> update mysql.user set password=password('123456') where user='root' and host='localhost';
mysql> flush privileges;
7. Edit the MySQL configuration file on the master and add the following under [mysqld]
vim /etc/my.cnf
server-id=1
log-bin=mysql-bin
# do not let MySQL purge relay logs automatically
relay_log_purge = 0
# GTID is available from MySQL 5.6 on; it must be enabled on the master and on every slave
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates = 1
Restart the database service after editing the configuration file
systemctl restart mysql
8. Create the replication user
mysql -uroot -p123456
mysql> grant replication slave on *.* to 'rep'@'10.0.0.%' identified by '123456';
mysql> flush privileges;
Check the master status on the master
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000002
Position: 530
Check the GTID status
mysql> show global variables like '%gtid%';
+---------------------------------+------------------------------------------+
| Variable_name | Value |
+---------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery | OFF |
| enforce_gtid_consistency | ON |
| gtid_executed | 15ff904c-53aa-11ea-89c0-000c29abe09d:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
| simplified_binlog_gtid_recovery | OFF |
+---------------------------------+------------------------------------------+
9. Now configure the slaves, starting with 10.0.0.42
Edit the server's configuration file
vim /etc/my.cnf
server-id=2
log-bin=mysql-bin
relay_log_purge = 0
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates = 1
Restart the MySQL service
systemctl restart mysql
Create the replication user
mysql -uroot -p123456
mysql> grant replication slave on *.* to 'rep'@'10.0.0.%' identified by '123456';
mysql> flush privileges;
Stop replication on the slave
mysql> stop slave;
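Point the slave at the master (10.0.0.41), using the File and Position values reported by show master status on the master
change master to
master_host='10.0.0.41',
master_user='rep',
master_password='123456',
master_log_file='mysql-bin.000002',
master_log_pos=530;
Start replication on the slave
mysql> start slave;
Check the slave status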
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.41
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 530
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 314
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
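Note: when checking the status, do not add ';' after \G, otherwise you get the following error
No query specified
The second slave, 10.0.0.43, is set up in almost the same way as 10.0.0.42.
The only difference is the server-id in the database configuration file.
The configuration on 10.0.0.43 is as follows
vim /etc/my.cnf
server-id=3 # this is the only difference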
log-bin=mysql-bin
relay_log_purge = 0
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates = 1
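Because gtid_mode is on for every node, a slave can alternatively be attached with MASTER_AUTO_POSITION=1 instead of explicit binlog file and position coordinates. The sketch below only prints such a statement; the function name gtid_change_master_sql is hypothetical, and the host, user, and password in the example comment are the values used in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper: print a GTID-based CHANGE MASTER TO statement.
# Arguments: master host, replication user, replication password.
gtid_change_master_sql() {
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=3306, MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_AUTO_POSITION=1;\n" "$1" "$2" "$3"
}

# Example (values from this walkthrough), run on a slave:
#   gtid_change_master_sql 10.0.0.41 rep 123456 | mysql -uroot -p
```

With auto-positioning the slave works out which transactions it is missing from its own GTID set, so the coordinates from show master status do not have to be copied by hand.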
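Installing MHA
Run these steps on all nodes
1. Install the dependencies (epel-release goes first, since several of the Perl packages come from EPEL)
yum -y install epel-release
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes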
2. Grant privileges to the MHA user
mysql> grant all privileges on *.* to mha@'10.0.0.%' identified by 'mha';
mysql> flush privileges;
3. Install the MHA Node package
Upload mha4mysql-node-0.58-0.el7.centos.noarch.rpm
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
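4. Install the MHA Manager
The manager goes on the third server, 10.0.0.43 (the node that will never be promoted to master)
Note: do not install the MHA Manager on the MySQL master or on a slave that may be promoted; otherwise the VIP will fail to float later on
Upload the mha4mysql-manager-0.58-0.el7.centos.noarch.rpm package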
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
5. Configure MHA
[root@ chen3 ~]# mkdir -p /etc/mha
[root@ chen3 ~]# mkdir -p /var/log/mha/app1
[root@ chen3 ~]# vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1
# binlog directory; if the MySQL environments differ and the binlog location varies, set master_binlog_dir inside each server's own [serverN] section
master_binlog_dir=/var/lib/mysql
user=mha
password=mha
ping_interval=2
repl_password=123456
repl_user=rep
ssh_user=root
[server1]
hostname=10.0.0.41
port=3306
[server2]
hostname=10.0.0.42
port=3306
[server3]
hostname=10.0.0.43
port=3306
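# by default MHA does not fail over if this node is down; with ignore_fail=1, failover still proceeds when this slave is down
ignore_fail=1
# never promote this host to master
no_master=1
# candidate_master=1 marks a host as the preferred candidate for the new master
#candidate_master=1
# by default a slave whose relay logs lag by more than 100 MB is not chosen as the new master; check_repl_delay=0 makes MHA ignore the delay so that the switch does not stall on a lagging candidate
#check_repl_delay=0
6. Start-up checks
SSH check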
[root@ chen3 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
I ran into the following problem during the check
[root@ chen3 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Feb 21 12:28:52 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 21 12:28:52 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Feb 21 12:28:52 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Feb 21 12:28:52 2020 - [info] Starting SSH connection tests..
Fri Feb 21 12:28:54 2020 - [debug]
Fri Feb 21 12:28:52 2020 - [debug] Connecting via SSH from root@10.0.0.41(10.0.0.41:22) to root@10.0.0.42(10.0.0.42:22)..
Fri Feb 21 12:28:53 2020 - [debug] ok.
Fri Feb 21 12:28:53 2020 - [debug] Connecting via SSH from root@10.0.0.41(10.0.0.41:22) to root@10.0.0.43(10.0.0.43:22)..
Fri Feb 21 12:28:54 2020 - [debug] ok.
Fri Feb 21 12:28:55 2020 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Feb 21 12:28:53 2020 - [debug] Connecting via SSH from root@10.0.0.43(10.0.0.43:22) to root@10.0.0.41(10.0.0.41:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Feb 21 12:28:54 2020 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@10.0.0.43(10.0.0.43:22) to root@10.0.0.41(10.0.0.41:22) failed!
Fri Feb 21 12:29:14 2020 - [debug]
Fri Feb 21 12:28:53 2020 - [debug] Connecting via SSH from root@10.0.0.42(10.0.0.42:22) to root@10.0.0.41(10.0.0.41:22)..
Fri Feb 21 12:29:03 2020 - [debug] ok.
Fri Feb 21 12:29:03 2020 - [debug] Connecting via SSH from root@10.0.0.42(10.0.0.42:22) to root@10.0.0.43(10.0.0.43:22)..
Fri Feb 21 12:29:14 2020 - [debug] ok.
SSH Configuration Check Failed!
at /usr/bin/masterha_check_ssh line 44.
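The passwordless login was probably not set up correctly (here from 10.0.0.43 to 10.0.0.41); set it up again
Replication check
Note that this first run fails; the last line reports NOT OK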
[root@ chen3 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Feb 21 14:29:34 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 21 14:29:34 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Feb 21 14:29:34 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Feb 21 14:29:34 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 21 14:29:43 2020 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
Fri Feb 21 14:29:43 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Fri Feb 21 14:29:43 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 21 14:29:43 2020 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
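Fix:
Add the following line to the database configuration file on every server
skip-name-resolve
Then restart the database
Check replication again. This time it succeeds; the last line reports OK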
[root@ chen3 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Feb 21 14:31:18 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 21 14:31:18 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Feb 21 14:31:18 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Feb 21 14:31:18 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 21 14:31:19 2020 - [info] GTID failover mode = 1
Fri Feb 21 14:31:19 2020 - [info] Dead Servers:
Fri Feb 21 14:31:19 2020 - [info] Alive Servers:
Fri Feb 21 14:31:19 2020 - [info] 10.0.0.41(10.0.0.41:3306)
Fri Feb 21 14:31:19 2020 - [info] 10.0.0.42(10.0.0.42:3306)
Fri Feb 21 14:31:19 2020 - [info] 10.0.0.43(10.0.0.43:3306)
Fri Feb 21 14:31:19 2020 - [info] Alive Slaves:
Fri Feb 21 14:31:19 2020 - [info] 10.0.0.42(10.0.0.42:3306) Version=5.6.47-log (oldest major version between slaves) log-bin:enabled
Fri Feb 21 14:31:19 2020 - [info] GTID ON
Fri Feb 21 14:31:19 2020 - [info] Replicating from 10.0.0.41(10.0.0.41:3306)
Fri Feb 21 14:31:19 2020 - [info] 10.0.0.43(10.0.0.43:3306) Version=5.6.47-log (oldest major version between slaves) log-bin:enabled
Fri Feb 21 14:31:19 2020 - [info] GTID ON
Fri Feb 21 14:31:19 2020 - [info] Replicating from 10.0.0.41(10.0.0.41:3306)
Fri Feb 21 14:31:19 2020 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 21 14:31:19 2020 - [info] Current Alive Master: 10.0.0.41(10.0.0.41:3306)
Fri Feb 21 14:31:19 2020 - [info] Checking slave configurations..
Fri Feb 21 14:31:19 2020 - [info] read_only=1 is not set on slave 10.0.0.42(10.0.0.42:3306).
Fri Feb 21 14:31:19 2020 - [info] read_only=1 is not set on slave 10.0.0.43(10.0.0.43:3306).
Fri Feb 21 14:31:19 2020 - [info] Checking replication filtering settings..
Fri Feb 21 14:31:19 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 21 14:31:19 2020 - [info] Replication filtering check ok.
Fri Feb 21 14:31:19 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 21 14:31:19 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 21 14:31:19 2020 - [info] HealthCheck: SSH to 10.0.0.41 is reachable.
Fri Feb 21 14:31:19 2020 - [info]
10.0.0.41(10.0.0.41:3306) (current master)
+--10.0.0.42(10.0.0.42:3306)
+--10.0.0.43(10.0.0.43:3306)
Fri Feb 21 14:31:19 2020 - [info] Checking replication health on 10.0.0.42..
Fri Feb 21 14:31:19 2020 - [info] ok.
Fri Feb 21 14:31:19 2020 - [info] Checking replication health on 10.0.0.43..
Fri Feb 21 14:31:19 2020 - [info] ok.
Fri Feb 21 14:31:19 2020 - [warning] master_ip_failover_script is not defined.
Fri Feb 21 14:31:19 2020 - [warning] shutdown_script is not defined.
Fri Feb 21 14:31:19 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
7. Start MHA
[root@ chen3 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
Check the MHA status
[root@ chen3 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:8889) is running(0:PING_OK), master:10.0.0.41
Stop MHA
[root@ chen3 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Find the statement for rejoining a slave to the new master
[root@ chen3 ~]# grep -i "CHANGE MASTER TO MASTER" /var/log/mha/app1/manager.log | tail -1
Testing MHA failover
Stop the MySQL service on the original master, 10.0.0.41
[root@ chen1 ~]# systemctl stop mysql
Now check the slave status on 10.0.0.43; Master_Host has changed to 10.0.0.42
[root@ chen3 ~]# mysql -u root -p123456 -e "show slave status \G"
Warning: Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.42
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 231
Relay_Log_File: mysqld-relay-bin.000005
Relay_Log_Pos: 401
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 231
Relay_Log_Space: 2800
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: eac6965d-53aa-11ea-89c6-000c29fa12e3
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: eac6965d-53aa-11ea-89c6-000c29fa12e3:1-6
Executed_Gtid_Set: 15ff904c-53aa-11ea-89c0-000c29abe09d:3-4,
1fb1498e-53aa-11ea-89c1-000c29dda5bf:1-4,
eac6965d-53aa-11ea-89c6-000c29fa12e3:1-6
Auto_Position: 1
Check the master status on 10.0.0.42
mysql> show master status;
+------------------+----------+--------------+------------------+------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------------------------------------------------+
| mysql-bin.000002 | 231 | | | 15ff904c-53aa-11ea-89c0-000c29abe09d:3-4,
eac6965d-53aa-11ea-89c6-000c29fa12e3:1-6 |
+------------------+----------+--------------+------------------+------------------------------------------------------------------------------------+
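Looking at the MHA configuration file on 10.0.0.43, the [server1] section is gone (the manager was started with --remove_dead_master_conf)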
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1
master_binlog_dir=/var/lib/mysql
password=mha
ping_interval=2
repl_password=123456
repl_user=rep
ssh_user=root
user=mha
[server2]
hostname=10.0.0.42
port=3306
[server3]
hostname=10.0.0.43
ignore_fail=1
no_master=1
port=3306
Recovering after failover
[root@ chen3 ~]# grep "CHANGE MASTER TO MASTER" /var/log/mha/app1/manager.log | tail -1
Fri Feb 21 14:46:41 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
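The logged statement masks the replication password as 'xxx', so the real password has to be substituted before the statement can be run. A small sketch of that substitution follows; the function name extract_change_master is hypothetical, and it assumes the log line has the format shown above and that the password contains no characters that are special to sed (/, & or \).

```shell
#!/bin/sh
# Hypothetical helper: print the last CHANGE MASTER statement that MHA logged,
# with the masked password 'xxx' replaced by the real replication password.
# Arguments: path to manager.log, replication password.
# Assumption: the password contains no characters special to sed (/ & \).
extract_change_master() {
  grep "CHANGE MASTER TO MASTER" "$1" | tail -n 1 \
    | sed -e 's/^.*Statement should be: //' \
          -e "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='$2'/"
}

# Example, run against the manager log from this walkthrough:
#   extract_change_master /var/log/mha/app1/manager.log 123456
```

The output is a single SQL statement that can be pasted into the mysql client on the node that is rejoining the cluster.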
Restart the database that went down
[root@ chen1 ~]# systemctl restart mysql
[root@ chen1 ~]# mysql -uroot -p123456 -e "CHANGE MASTER TO MASTER_HOST='10.0.0.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='123456';"
Start replication
[root@ chen1 ~]# mysql -uroot -p123456 -e "start slave;"
Check the slave status; Master_Host now points to the new master, 10.0.0.42
[root@ chen1 ~]# mysql -uroot -p123456 -e "show slave status\G"
Warning: Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.42
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 231
Relay_Log_File: mysqld-relay-bin.000003
Relay_Log_Pos: 401
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 231
Relay_Log_Space: 2202
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: eac6965d-53aa-11ea-89c6-000c29fa12e3
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: eac6965d-53aa-11ea-89c6-000c29fa12e3:1-6
Executed_Gtid_Set: 15ff904c-53aa-11ea-89c0-000c29abe09d:1-4,
eac6965d-53aa-11ea-89c6-000c29fa12e3:1-6
Auto_Position: 1