MHA (Master High Availability) is an open-source high-availability program for MySQL that provides automated master failover for MySQL master/slave replication. When MHA detects that the master node has failed, it promotes the slave holding the most recent data to be the new master. During this process, MHA collects additional information from the other nodes to avoid consistency problems. MHA also supports online master switchover; it can complete a failover within about 30 seconds while preserving data consistency as far as possible.
An MHA deployment has two roles: the MHA Manager (management node) and the MHA Node (data node).
MHA Manager: usually deployed on a dedicated machine, where it manages and coordinates one or more master/slave clusters; each master/slave cluster is called an application.
MHA Node: runs on every MySQL server. It ships scripts that can parse and purge logs to speed up failover, and acts as the agent that carries out the Manager's instructions; this agent must run on every MySQL node.
Put simply, the Manager monitors every node in the cluster and automatically identifies the current master. When the master goes down, the Manager saves the failed master's binary log, finds the slave with the most up-to-date data, promotes it to the new master, applies the saved binary log events to it, and points the remaining slaves at the new master.
To summarize, MHA works as follows:
1. Save the binary log events from the crashed master.
2. Identify the slave with the most recent data.
3. Apply the differential relay logs to the other slaves.
4. Apply the binary log events saved from the master.
5. Promote one slave to be the new master.
6. Point the other slaves at the new master and resume replication.
[Note] Because the manager monitors every node in the cluster, and the machines replicate data among themselves, passwordless SSH login must be set up between all of them.
[Lab] Building MySQL high availability with MHA
Lab environment:
Four hosts:
manager: 192.168.216.15
master: 192.168.216.13
slave1: 192.168.216.17
slave2: 192.168.216.16
1. Install the node package on all four hosts, either from source or via yum; yum is used here.
The manager host additionally needs the mha4mysql-manager package, which performs the monitoring.
manager:
[root@centos7 ~]# yum install mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm
master and slaves:
[root@centos7 ~]# yum install mha4mysql-node-0.56-0.el6.noarch.rpm
2. First, set up one-master/two-slave MySQL replication.
Master configuration:
[root@centos7 ~]# vim /etc/my.cnf
[mysqld]
server_id=1 #a unique server ID for the master
log_bin=master-bin #enable binary logging
relay_log=relay_log #relay log (used if this node ever acts as a slave)
skip_name_resolve=on #skip host name resolution
[root@centos7 ~]# systemctl start mariadb
[root@centos7 ~]# mysql -uroot
MariaDB [(none)]> select user,host,password from mysql.user;
+-------+----------------+-------------------------------------------+
| user | host | password |
+-------+----------------+-------------------------------------------+
| root | localhost | |
| root | 127.0.0.1 | |
| root | ::1 | |
+-------+----------------+-------------------------------------------+
3 rows in set (0.00 sec)
MariaDB [(none)]> grant replication slave,replication client on *.* to slave@'%' identified by 'centos';
#replication account so the slaves can replicate from this master
MariaDB [(none)]> grant all on *.* to admin@'%' identified by 'centos';
#admin account for the MHA manager; it must exist on both slaves as well. If it is created on the master after replication is running, the grant replicates to the slaves automatically.
MariaDB [(none)]> show master status;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000003 | 245 | | |
+-------------------+----------+--------------+------------------+
#note the master's File and Position; the slaves must start replication from this position
1 row in set (0.00 sec)
MariaDB [(none)]> show slave hosts;
Empty set (0.00 sec)
#before any slaves are configured, the master's slave host list is empty
3. Configure the two slave nodes:
[root@centos7 ~]# vim /etc/my.cnf
[mysqld]
server_id=2 #must be unique on every node in the replication group; slave2 uses server_id=3
log_bin=slave1-bin
relay_log=slave-relay-log
read_only=on #slaves should be read-only
skip_name_resolve=on
relay_log_purge=0 #do not automatically purge applied relay logs (0 = off); MHA needs them for recovery
[root@centos7 ~]# systemctl start mariadb
[root@centos7 ~]# mysql -uroot
MariaDB [(none)]> change master to master_host='192.168.216.13',master_user='slave',master_password='centos',master_log_file='master-bin.000003',master_log_pos=245;
Query OK, 0 rows affected (0.01 sec)
#point the slave at the master and replay events from its binary log, using the File/Position recorded on the master
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
#start the replication threads
MariaDB [(none)]> show slave status\G #check the slave's status
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.216.13
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 245
Relay_Log_File: slave-relay-log.000002
Relay_Log_Pos: 530
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: Yes #IO thread is running
Slave_SQL_Running: Yes #SQL thread is running
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 245
Relay_Log_Space: 824
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
4. Now check the master; the two slaves are clearly visible:
[root@centos7 ~]# mysql -uroot
MariaDB [(none)]> show slave hosts;
+-----------+------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+------+------+-----------+
| 3 | | 3306 | 1 |
| 2 | | 3306 | 1 |
+-----------+------+------+-----------+
2 rows in set (0.00 sec)
#the slaves' server_id values are listed
Also, when the master fails and a new master takes over, the new master needs a replication account so the other slaves can replicate from it, so create that account on each slave in advance:
MariaDB [(none)]> grant replication slave,replication client on *.* to slave@'%' identified by 'centos';
5. Next, configure the manager.
The mha4mysql-manager-0.56-0.el6.noarch.rpm package does not ship a configuration file, so write one by hand:
[root@centos7 ~]# vim /etc/mha_master/app1.cnf
[server default]
user=admin #the MySQL admin account the manager uses, created on the nodes earlier
password=centos #its password
manager_workdir=/etc/mha_master/app1 #manager working directory
manager_log=/etc/mha_master/manager.log #manager log file
remote_workdir=/mydata/mha_master/app1 #working directory on the remote hosts
ssh_user=root #SSH user
repl_user=slave #replication account
repl_password=centos
ping_interval=1 #health-check ping interval, in seconds
[server1]
hostname=192.168.216.17 #address of node 1
ssh_port=22 #SSH port of node 1
candidate_master=1 #whether this node may be promoted to master
[server2]
hostname=192.168.216.16
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.216.13
ssh_port=22
candidate_master=1
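The working directories named in this configuration are not created by the packages. A minimal sketch for preparing them up front (the paths mirror this lab's config; adjust them to your own — and note that MHA can also create remote_workdir on the nodes by itself during its checks):

```shell
# Prepare the directories referenced in app1.cnf.
# These paths match this lab's configuration; adjust to your own.
mkdir -p /etc/mha_master/app1       # manager_workdir (on the manager host)
mkdir -p /mydata/mha_master/app1    # remote_workdir (on each MySQL node)
ls -d /etc/mha_master/app1 /mydata/mha_master/app1
```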
6. As noted earlier, building MHA requires passwordless SSH login between the nodes. The commands below show one machine; repeat them on the others.
[root@centos7 ~]# ssh-keygen
[root@centos7 ~]# ssh-copy-id -i ~/.ssh/id_rsa root@192.168.216.13 #repeat for each of the other hosts, not all shown
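Since every node must be able to reach every other node, the key distribution can be scripted. A minimal sketch, shown as a dry run that only prints each command (remove the leading echo to execute); the host list is this lab's addresses:

```shell
# Print the ssh-copy-id command for every node in this lab's cluster.
# Remove the leading "echo" to actually copy the key; substitute your
# own host list as needed.
for h in 192.168.216.13 192.168.216.15 192.168.216.16 192.168.216.17; do
    echo ssh-copy-id -i ~/.ssh/id_rsa root@"$h"
done
```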
7. Check the manager environment and start it.
[root@web-server1 mha_master]# masterha_check_ssh -conf=/etc/mha_master/app1.cnf
#verify that SSH connectivity between the nodes works
Wed Nov 22 14:27:16 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Nov 22 14:27:16 2017 - [info] Reading application default configuration from /etc/mha_master/app1.cnf..
Wed Nov 22 14:27:16 2017 - [info] Reading server configuration from /etc/mha_master/app1.cnf..
Wed Nov 22 14:27:16 2017 - [info] Starting SSH connection tests..
Wed Nov 22 14:27:17 2017 - [debug]
Wed Nov 22 14:27:16 2017 - [debug] Connecting via SSH from root@192.168.216.17(192.168.216.17:22) to root@192.168.216.16(192.168.216.16:22)..
Wed Nov 22 14:27:17 2017 - [debug] ok.
Wed Nov 22 14:27:17 2017 - [debug] Connecting via SSH from root@192.168.216.17(192.168.216.17:22) to root@192.168.216.13(192.168.216.13:22)..
Wed Nov 22 14:27:17 2017 - [debug] ok.
Wed Nov 22 14:27:18 2017 - [debug]
Wed Nov 22 14:27:17 2017 - [debug] Connecting via SSH from root@192.168.216.16(192.168.216.16:22) to root@192.168.216.17(192.168.216.17:22)..
Wed Nov 22 14:27:17 2017 - [debug] ok.
Wed Nov 22 14:27:17 2017 - [debug] Connecting via SSH from root@192.168.216.16(192.168.216.16:22) to root@192.168.216.13(192.168.216.13:22)..
Wed Nov 22 14:27:18 2017 - [debug] ok.
Wed Nov 22 14:27:18 2017 - [debug]
Wed Nov 22 14:27:17 2017 - [debug] Connecting via SSH from root@192.168.216.13(192.168.216.13:22) to root@192.168.216.17(192.168.216.17:22)..
Wed Nov 22 14:27:18 2017 - [debug] ok.
Wed Nov 22 14:27:18 2017 - [debug] Connecting via SSH from root@192.168.216.13(192.168.216.13:22) to root@192.168.216.16(192.168.216.16:22)..
Wed Nov 22 14:27:18 2017 - [debug] ok.
Wed Nov 22 14:27:18 2017 - [info] All SSH connection tests passed successfully.
#"passed successfully" means the SSH check succeeded
[root@web-server1 mha_master]# masterha_check_repl -conf=/etc/mha_master/app1.cnf
#verify that the replication setup is healthy
Wed Nov 22 14:27:40 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Nov 22 14:27:40 2017 - [info] Reading application default configuration from /etc/mha_master/app1.cnf..
Wed Nov 22 14:27:40 2017 - [info] Reading server configuration from /etc/mha_master/app1.cnf..
Wed Nov 22 14:27:40 2017 - [info] MHA::MasterMonitor version 0.56.
Wed Nov 22 14:27:40 2017 - [info] GTID failover mode = 0
Wed Nov 22 14:27:40 2017 - [info] Dead Servers:
Wed Nov 22 14:27:40 2017 - [info] Alive Servers:
Wed Nov 22 14:27:40 2017 - [info] 192.168.216.17(192.168.216.17:3306)
Wed Nov 22 14:27:40 2017 - [info] 192.168.216.16(192.168.216.16:3306)
Wed Nov 22 14:27:40 2017 - [info] 192.168.216.13(192.168.216.13:3306)
Wed Nov 22 14:27:40 2017 - [info] Alive Slaves:
Wed Nov 22 14:27:40 2017 - [info] 192.168.216.17(192.168.216.17:3306) Version=5.5.52-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Nov 22 14:27:40 2017 - [info] Replicating from 192.168.216.13(192.168.216.13:3306)
Wed Nov 22 14:27:40 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Nov 22 14:27:40 2017 - [info] 192.168.216.16(192.168.216.16:3306) Version=5.5.52-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Nov 22 14:27:40 2017 - [info] Replicating from 192.168.216.13(192.168.216.13:3306)
Wed Nov 22 14:27:40 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Nov 22 14:27:40 2017 - [info] Current Alive Master: 192.168.216.13(192.168.216.13:3306)
Wed Nov 22 14:27:40 2017 - [info] Checking slave configurations..
Wed Nov 22 14:27:40 2017 - [warning] relay_log_purge=0 is not set on slave 192.168.216.17(192.168.216.17:3306).
Wed Nov 22 14:27:40 2017 - [warning] relay_log_purge=0 is not set on slave 192.168.216.16(192.168.216.16:3306).
Wed Nov 22 14:27:40 2017 - [info] Checking replication filtering settings..
Wed Nov 22 14:27:40 2017 - [info] binlog_do_db= , binlog_ignore_db=
Wed Nov 22 14:27:40 2017 - [info] Replication filtering check ok.
Wed Nov 22 14:27:40 2017 - [info] GTID (with auto-pos) is not supported
Wed Nov 22 14:27:40 2017 - [info] Starting SSH connection tests..
Wed Nov 22 14:27:42 2017 - [info] All SSH connection tests passed successfully.
Wed Nov 22 14:27:42 2017 - [info] Checking MHA Node version..
Wed Nov 22 14:27:42 2017 - [info] Version check ok.
Wed Nov 22 14:27:42 2017 - [info] Checking SSH publickey authentication settings on the current master..
Wed Nov 22 14:27:43 2017 - [info] HealthCheck: SSH to 192.168.216.13 is reachable.
Wed Nov 22 14:27:43 2017 - [info] Master MHA Node version is 0.56.
Wed Nov 22 14:27:43 2017 - [info] Checking recovery script configurations on 192.168.216.13(192.168.216.13:3306)..
Wed Nov 22 14:27:43 2017 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/mydata/mha_master/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000003
Wed Nov 22 14:27:43 2017 - [info] Connecting to root@192.168.216.13(192.168.216.13:22)..
Creating /mydata/mha_master/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to master-bin.000003
Wed Nov 22 14:27:43 2017 - [info] Binlog setting check done.
Wed Nov 22 14:27:43 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Nov 22 14:27:43 2017 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='admin' --slave_host=192.168.216.17 --slave_ip=192.168.216.17 --slave_port=3306 --workdir=/mydata/mha_master/app1 --target_version=5.5.52-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Nov 22 14:27:43 2017 - [info] Connecting to root@192.168.216.17(192.168.216.17:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to slave-relay-log.000002
Temporary relay log file is /var/lib/mysql/slave-relay-log.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Nov 22 14:27:44 2017 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='admin' --slave_host=192.168.216.16 --slave_ip=192.168.216.16 --slave_port=3306 --workdir=/mydata/mha_master/app1 --target_version=5.5.52-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Nov 22 14:27:44 2017 - [info] Connecting to root@192.168.216.16(192.168.216.16:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to slave-relay-log.000002
Temporary relay log file is /var/lib/mysql/slave-relay-log.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Nov 22 14:27:44 2017 - [info] Slaves settings check done.
Wed Nov 22 14:27:44 2017 - [info]
192.168.216.13(192.168.216.13:3306) (current master)
+--192.168.216.17(192.168.216.17:3306)
+--192.168.216.16(192.168.216.16:3306)
Wed Nov 22 14:27:44 2017 - [info] Checking replication health on 192.168.216.17..
Wed Nov 22 14:27:44 2017 - [info] ok.
Wed Nov 22 14:27:44 2017 - [info] Checking replication health on 192.168.216.16..
Wed Nov 22 14:27:44 2017 - [info] ok.
Wed Nov 22 14:27:44 2017 - [warning] master_ip_failover_script is not defined.
Wed Nov 22 14:27:44 2017 - [warning] shutdown_script is not defined.
Wed Nov 22 14:27:44 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
#"MySQL Replication Health is OK" means the check passed
[root@web-server1 mha_master]# nohup masterha_manager -conf=/etc/mha_master/app1.cnf &> /etc/mha_master/manager.log &
#start the manager and keep it running in the background
[1] 4330
[root@web-server1 mha_master]# jobs
[1]+ Running nohup masterha_manager -conf=/etc/mha_master/app1.cnf &>/etc/mha_master/manager.log &
[root@web-server1 mha_master]# masterha_check_status -conf=/etc/mha_master/app1.cnf
#check the manager's health status
app1 (pid:4330) is running(0:PING_OK), master:192.168.216.13
At this point the MHA high-availability setup is complete. Next, test it:
8. Stop the MySQL service on the master node to simulate a crash:
[root@centos7 ~]# systemctl stop mariadb
Now check the log on the manager:
[root@centos7 mha_master]# ls
app1 app1.cnf manager.log
[root@centos7 mha_master]# pwd
/etc/mha_master
[root@centos7 mha_master]# vim manager.log
The log clearly shows that the original master is down and that 192.168.216.17 has been promoted to the new master. (Note that masterha_manager exits after completing a failover; restart it to resume monitoring.) Verify:
[root@centos7 ~]# mysql -uroot
MariaDB [(none)]> show slave hosts;
+-----------+------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+------+------+-----------+
| 3 | | 3306 | 2 |
+-----------+------+------+-----------+
1 row in set (0.00 sec)
MariaDB [(none)]> show master status;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| slave1-bin.000001 | 245 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
The output shows that slave1 is now the master and that slave2 has been repointed at it as a slave; you can confirm this on slave2 as well.
The failover is also recorded in the manager's working directory.
Note that once the original master has gone down, it will not automatically become a slave of the new master even if its service is later restarted; to rejoin the cluster, point it at the new master manually:
[root@centos7 ~]# mysql -uroot
MariaDB [(none)]> change master to master_host='192.168.216.17',master_user='slave',master_password='centos',master_log_file='slave1-bin.000001',master_log_pos=245;
Query OK, 0 rows affected (0.02 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.01 sec)
After that, check the new master's (slave1's) slave hosts to confirm the setup succeeded:
[root@centos7 ~]# mysql -uroot
MariaDB [(none)]> show slave hosts;
+-----------+------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+------+------+-----------+
| 1 | | 3306 | 2 |
| 3 | | 3306 | 2 |
+-----------+------+------+-----------+
2 rows in set (0.00 sec)
The new MySQL master/slave replication topology is now in place.
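Besides crash failover, the online master switchover mentioned at the start is driven by the masterha_master_switch tool that ships with the manager. A hedged sketch of a planned switchover, printed as a dry run (remove the leading echo to execute); the new-master address below is this lab's slave2 and is only an example, and the running masterha_manager must be stopped before switching:

```shell
# Planned (online) switchover while the current master is still alive.
# Printed as a dry run; remove the leading "echo" to execute.
echo masterha_master_switch --conf=/etc/mha_master/app1.cnf \
    --master_state=alive \
    --new_master_host=192.168.216.16 \
    --orig_master_is_new_slave \
    --running_updates_limit=10000
```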