Environment overview
This article sets up a two-node MySQL Fabric HA group.
Environment:
Host: CentOS 6.5
MySQL: 5.7.18, multiple instances on one host
Role | IP | Port | Notes |
---|---|---|---|
Fabric | localhost (192.168.1.201) | 3306 | Management node (Fabric backing store) |
node1 | localhost (192.168.1.201) | 3307 | HA member |
node2 | localhost (192.168.1.201) | 3308 | HA member |
Installing Fabric
MySQL Fabric is distributed as part of MySQL Utilities, but mind the note on the official download page:
MySQL Fabric is included in MySQL Utilities versions prior to 1.6.2.
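In other words, only Utilities releases before 1.6.2 include Fabric. The latest GA release at the time of writing is 1.6.5, which does not, so download the previous GA release instead: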
https://cdn.mysql.com//Downloads/MySQLGUITools/mysql-utilities-1.5.6-1.el6.noarch.rpm
Install it with rpm (for example, rpm -ivh mysql-utilities-1.5.6-1.el6.noarch.rpm) and it is ready to use.
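If you script the download, the version rule from the official note (Fabric is bundled only before 1.6.2) fits in a few lines. This is a minimal Python sketch; the function names are illustrative and not part of MySQL Utilities.

```python
from typing import Tuple

# Cutoff from the official note: Fabric ships only in Utilities releases before 1.6.2.
FABRIC_REMOVED_IN = (1, 6, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.5.6' or an RPM-style '1.5.6-1.el6' into a comparable tuple like (1, 5, 6)."""
    core = text.split("-", 1)[0]  # drop the RPM release suffix such as '-1.el6'
    return tuple(int(part) for part in core.split("."))


def includes_fabric(version: str) -> bool:
    """True if this MySQL Utilities version is older than 1.6.2 and so still bundles Fabric."""
    return parse_version(version) < FABRIC_REMOVED_IN


if __name__ == "__main__":
    for v in ("1.5.6-1.el6", "1.6.5"):
        print(v, includes_fabric(v))
```

For the two versions mentioned here, it prints True for 1.5.6-1.el6 and False for 1.6.5.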
MySQL configuration
The four key parameters to set are:
log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates
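With mysqld_multi, each instance gets its own [mysqldN] group in my.cnf. The Python sketch below renders those groups with the four settings above. Two parts are assumptions on my part: the /data/mysql paths are hypothetical, and the block adds a per-instance server-id, which MySQL 5.7 requires whenever binary logging is enabled and which must differ between replicating instances.

```python
from typing import List

# The four settings from the list above; log_bin=mysql-bin matches the mysql-bin.* files seen later.
GTID_OPTIONS = [
    "log_bin=mysql-bin",
    "gtid-mode=ON",
    "enforce-gtid-consistency=ON",
    "log_slave_updates=ON",
]


def render_section(port: int, server_id: int, base: str = "/data/mysql") -> str:
    """Render one mysqld_multi group such as [mysqld3307]; the paths are hypothetical."""
    lines = [
        f"[mysqld{port}]",
        f"port={port}",
        f"datadir={base}/data{port}",
        f"socket={base}/data{port}/mysql.sock",
        # MySQL 5.7 refuses to start with binary logging enabled unless server-id is set,
        # and replication needs it to be unique per instance.
        f"server-id={server_id}",
    ]
    lines.extend(GTID_OPTIONS)
    return "\n".join(lines)


def render_all(ports: List[int]) -> str:
    """Give each instance a distinct server-id (1, 2, 3, ...) in port order."""
    return "\n\n".join(render_section(p, i) for i, p in enumerate(ports, start=1))


if __name__ == "__main__":
    print(render_all([3307, 3308]))
```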
Fabric configuration
Create an account
Create it on every instance (3306, 3307 and 3308):
create user 'fabric'@'%' identified by '123';
grant all on *.* to 'fabric'@'%' ;
flush privileges;
Edit the MySQL Fabric configuration file
vim /etc/mysql/fabric.cfg
[storage]
address = 192.168.1.201:3306
user = fabric
password = 123
database = fabric
auth_plugin = mysql_native_password
connection_timeout = 6
connection_attempts = 6
connection_delay = 1
[servers]
user = fabric
password = 123
unreachable_timeout = 5
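A common mistake is a [storage] or [servers] entry that does not match the account created above. The sketch below uses Python's configparser to flag missing sections, empty credentials, and a malformed host:port address before running setup. These checks are my own assumptions about what is worth verifying, not Fabric's own validation, and the function name is hypothetical.

```python
import configparser
from typing import List

# Keys this sketch expects in each section; extra keys (auth_plugin, timeouts, ...) are ignored.
REQUIRED = {
    "storage": ("address", "user", "password", "database"),
    "servers": ("user", "password"),
}


def check_fabric_cfg(text: str) -> List[str]:
    """Return a list of problems found in fabric.cfg text (an empty list means it looks sane)."""
    # interpolation=None so passwords containing '%' are read literally
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if not cfg.has_section(section):
            problems.append(f"missing [{section}] section")
            continue
        for key in keys:
            if not cfg.get(section, key, fallback="").strip():
                problems.append(f"[{section}] {key} is empty")
    if cfg.has_option("storage", "address"):
        host, _, port = cfg.get("storage", "address").rpartition(":")
        if not host or not port.isdigit():
            problems.append("[storage] address should look like host:port")
    return problems
```

For example, passing the contents of /etc/mysql/fabric.cfg above to check_fabric_cfg should return an empty list.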
HA initialization (set up the Fabric backing store)
[root@localhost ~]# mysqlfabric manage setup
[INFO] 1493595918.431773 - MainThread - Initializing persister: user (fabric), server (192.168.1.201:3306), database (fabric).
Finishing initial setup
=======================
Password for admin user is not yet set.
Password for admin/xmlrpc:
Repeat Password:
Password set.
Password set.
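Creating the HA group
The Fabric node must be running before any group commands are issued; start it with mysqlfabric manage start (in a separate terminal or in the background).
Create a group
[root@localhost ~]# mysqlfabric group create mysql_ha    # create a group named mysql_ha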
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
1f8d4b07-6d84-48be-9178-da695397af3c 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad310>.
4 2 1.4936e+09 Executing action (_create_group).
5 2 1.4936e+09 Executed action (_create_group).
Add members to the group
[root@localhost ~]# mysqlfabric group add mysql_ha 192.168.1.201:3307
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
87ea2237-9c81-4cdb-82d4-e88c2028046f 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad6d0>.
4 2 1.4936e+09 Executing action (_add_server).
5 2 1.4936e+09 Executed action (_add_server).
[root@localhost ~]# mysqlfabric group add mysql_ha 192.168.1.201:3308
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
91bff57f-bc1d-4749-89a1-95183f932900 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x10ad6d0>.
4 2 1.4936e+09 Executing action (_add_server).
5 2 1.4936e+09 Executed action (_add_server).
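Every mysqlfabric command above ends with the same summary table (uuid, finished, success, result). If you script these steps, a minimal Python sketch like the following can check that row instead of reading the output by eye. It assumes the plain-text layout shown in these transcripts; the helper names are hypothetical.

```python
from typing import Dict, Optional

HEADER = ["uuid", "finished", "success", "result"]


def parse_procedure_summary(output: str) -> Optional[Dict[str, str]]:
    """Extract the 'uuid finished success result' row from mysqlfabric output, if present."""
    lines = [ln.strip() for ln in output.splitlines()]
    for i, line in enumerate(lines):
        if line.split() == HEADER and i + 2 < len(lines):
            values = lines[i + 2].split()  # header, dashed rule, then the value row
            if len(values) >= 4:
                return dict(zip(HEADER, values))
    return None


def procedure_succeeded(output: str) -> bool:
    """True only if the procedure reports both finished=1 and success=1."""
    summary = parse_procedure_summary(output)
    return summary is not None and summary["finished"] == "1" and summary["success"] == "1"
```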
Promote a primary
There are two ways to promote a member to primary: let Fabric choose one, or specify it manually.
- Let Fabric choose the primary
[root@localhost ~]# mysqlfabric group promote mysql_ha
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
ae00bc31-e27f-42d9-a656-bad4c0cd2921 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0xec4690>.
4 2 1.4936e+09 Executing action (_define_ha_operation).
5 2 1.4936e+09 Executed action (_define_ha_operation).
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ae90>.
4 2 1.4936e+09 Executing action (_find_candidate_fail).
5 2 1.4936e+09 Executed action (_find_candidate_fail).
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ac10>.
4 2 1.4936e+09 Executing action (_check_candidate_fail).
5 2 1.4936e+09 Executed action (_check_candidate_fail).
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x101ab50>.
4 2 1.4936e+09 Executing action (_wait_slave_fail).
5 2 1.4936e+09 Executed action (_wait_slave_fail).
3 2 1.4936e+09 Triggered by <mysql.fabric.events.Event object at 0x1028110>.
4 2 1.4936e+09 Executing action (_change_to_candidate).
5 2 1.4936e+09 Executed action (_change_to_candidate).
- Specify the primary manually
mysqlfabric group promote <group_name> --slave_id='<node_uuid>'
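For the manual form, the value given to --slave_id is a node UUID such as the ones listed by group health or lookup_servers. The short Python sketch below (a hypothetical helper) builds the argument list and rejects a malformed UUID before anything is sent to Fabric; it only prints the command rather than running it.

```python
import shlex
import uuid
from typing import List, Optional


def promote_argv(group: str, slave_id: Optional[str] = None) -> List[str]:
    """Build argv for `mysqlfabric group promote`, optionally pinning the new primary."""
    argv = ["mysqlfabric", "group", "promote", group]
    if slave_id is not None:
        uuid.UUID(slave_id)  # raises ValueError early if the server UUID is malformed
        argv.append(f"--slave_id={slave_id}")
    return argv


if __name__ == "__main__":
    # Print a shell-safe command line; pass the list to subprocess.run() to execute it.
    print(shlex.join(promote_argv("mysql_ha", "fe6d0231-2a7c-11e7-b65e-000c29217b03")))
```

Passing a list to subprocess avoids the shell, so the quotes around the UUID in the command above are not needed there.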
HA health check
There are two ways to check. The first, group health, identifies members by server UUID:
[root@localhost data3307]# mysqlfabric group health mysql_ha
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid is_alive status is_not_running is_not_configured io_not_running sql_not_running io_error sql_error
------------------------------------ -------- --------- -------------- ----------------- -------------- --------------- -------- ---------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 1 PRIMARY 0 0 0 0 False False
fe6d0231-2a7c-11e7-b65e-000c29217b03 1 SECONDARY 0 0 0 0 False False
The second, group lookup_servers, also shows each member's address (IP:port):
[root@localhost data3307]# mysqlfabric group lookup_servers mysql_ha
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
server_uuid address status mode weight
------------------------------------ ------------------ --------- ---------- ------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 PRIMARY READ_WRITE 1.0
fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 SECONDARY READ_ONLY 1.0
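lookup_servers is the easier of the two outputs to script against, because it includes each member's address and status. Here is a minimal Python sketch that parses it and finds the current primary, assuming the five whitespace-separated columns shown above (server_uuid, address, status, mode, weight); the names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Server(NamedTuple):
    server_uuid: str
    address: str
    status: str
    mode: str
    weight: float


def parse_lookup_servers(output: str) -> List[Server]:
    """Collect the data rows of `mysqlfabric group lookup_servers` output."""
    servers = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows have 5 fields and a 36-character UUID (with 4 hyphens) first;
        # this skips the header row and the dashed rule.
        if len(parts) == 5 and len(parts[0]) == 36 and parts[0].count("-") == 4:
            servers.append(Server(parts[0], parts[1], parts[2], parts[3], float(parts[4])))
    return servers


def primary_address(output: str) -> Optional[str]:
    """Address of the member whose status is PRIMARY, or None if there is none."""
    for server in parse_lookup_servers(output):
        if server.status == "PRIMARY":
            return server.address
    return None
```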
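Automatic failover
Activate failure detection
The group is now fully built. Next, activate failure detection so that the group can discover problems on its own: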
[root@localhost data3307]# mysqlfabric group activate mysql_ha
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
39d80744-c811-4e24-a2c8-2b8a5a9005fa 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x101aa90>.
4 2 1.49365e+09 Executing action (_activate_group).
5 2 1.49365e+09 Executed action (_activate_group).
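Conceptually, activation means Fabric starts watching the group's primary and, once it decides the primary is down, promotes a secondary. The Python below is a deliberately simplified model of that idea, not Fabric's implementation: the is_alive probe and the max_failures threshold are hypothetical stand-ins for Fabric's own failure-detection logic and settings.

```python
from typing import Callable, Dict


def monitor_step(status: Dict[str, str], failures: Dict[str, int],
                 is_alive: Callable[[str], bool], max_failures: int = 3) -> Dict[str, str]:
    """One probe round over the group; returns a new status map and updates `failures`."""
    status = dict(status)  # do not mutate the caller's map
    primary = next((addr for addr, st in status.items() if st == "PRIMARY"), None)
    if primary is None:
        return status  # nothing to watch until a primary exists
    if is_alive(primary):
        failures[primary] = 0
        return status
    failures[primary] = failures.get(primary, 0) + 1
    if failures[primary] >= max_failures:
        status[primary] = "FAULTY"
        # Promote the first secondary that still answers probes.
        candidate = next((addr for addr, st in status.items()
                          if st == "SECONDARY" and is_alive(addr)), None)
        if candidate is not None:
            status[candidate] = "PRIMARY"
    return status
```

Feeding it a probe that always reports 3307 as down gives the same end state as the failure simulation that follows: after three rounds, 3307 is FAULTY and 3308 is PRIMARY.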
Simulating a failure
The current primary (master) is 3307 and the secondary (slave) is 3308. Stop 3307 manually to see how the HA group fails over:
[root@localhost data3307]# mysqld_multi stop 3307
[root@localhost data3307]# mysqld_multi report
Reporting MySQL servers
MySQL server from group: mysqld3306 is running
MySQL server from group: mysqld3307 is not running    # 3307 is stopped
MySQL server from group: mysqld3308 is running
MySQL server from group: mysqld3309 is running
[root@localhost data3307]# mysqlfabric group lookup_servers mysql_ha
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
server_uuid address status mode weight
------------------------------------ ------------------ ------- ---------- ------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 FAULTY READ_WRITE 1.0    # 3307 is stopped
fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 PRIMARY READ_WRITE 1.0
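As shown, the primary has moved to 3308. However, when 3307 is restarted it does not rejoin the group automatically; it has to be brought back by hand, first as spare and then as secondary: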
[root@localhost data3307]# mysqlfabric server set_status f0c7d668-2a7c-11e7-b3c3-000c29217b03 spare
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
3a2ac6b2-5a79-4ab8-9d3d-a197332ee711 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x10ada50>.
4 2 1.49365e+09 Executing action (_set_server_status).
5 2 1.49365e+09 Executed action (_set_server_status).
[root@localhost data3307]# mysqlfabric server set_status f0c7d668-2a7c-11e7-b3c3-000c29217b03 secondary
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
uuid finished success result
------------------------------------ -------- ------- ------
8fa3a763-aa48-489d-95a4-52f5323d0bbc 1 1 1
state success when description
----- ------- ------------- -------------------------------------------------------------
3 2 1.49365e+09 Triggered by <mysql.fabric.events.Event object at 0x10ada50>.
4 2 1.49365e+09 Executing action (_set_server_status).
5 2 1.49365e+09 Executed action (_set_server_status).
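The two set_status calls follow a fixed path: FAULTY to SPARE, then SPARE to SECONDARY. The Python sketch below encodes only those two observed moves as a transition table and derives the commands from it. Treating them as the only valid moves is an assumption for illustration, not a complete description of Fabric's status rules; the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Only the moves used in this walkthrough; an assumption, not Fabric's complete rule set.
TRANSITIONS: Dict[str, Set[str]] = {
    "FAULTY": {"SPARE"},
    "SPARE": {"SECONDARY"},
}


def status_path(current: str, target: str) -> List[str]:
    """Breadth-first search for the sequence of statuses leading from current to target."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path[1:]  # exclude the starting status itself
        for nxt in sorted(TRANSITIONS.get(path[-1], set())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    raise ValueError(f"no known path from {current} to {target}")


def rejoin_commands(server_uuid: str, current: str, target: str = "SECONDARY") -> List[List[str]]:
    """argv lists for the `mysqlfabric server set_status` calls, lowercased as typed above."""
    return [["mysqlfabric", "server", "set_status", server_uuid, status.lower()]
            for status in status_path(current.upper(), target.upper())]
```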
Then check the group status again:
[root@localhost data3307]# mysqlfabric group lookup_servers mysql_ha
Password for admin:
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1
server_uuid address status mode weight
------------------------------------ ------------------ --------- ---------- ------
f0c7d668-2a7c-11e7-b3c3-000c29217b03 192.168.1.201:3307 SECONDARY READ_ONLY 1.0
fe6d0231-2a7c-11e7-b65e-000c29217b03 192.168.1.201:3308 PRIMARY READ_WRITE 1.0
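3307 has now rejoined the HA group as a secondary (slave). Taking any other node offline and bringing it back works the same way, so it is not repeated here. This completes the demonstration of Fabric's automatic failover as nodes go down and come back online.
Errors encountered during configuration
Problem 1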
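Q: When promoting a primary, the member chosen as the new primary had previously purged some of its binary logs, and replication failed with:
Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
A: Point the slave at the master manually. Setting gtid_purged to the master's Executed_Gtid_Set tells the slave that it already contains all of those transactions, so they are never replayed; this is only safe if the slave's data already matches the master's (for example, after restoring it from a backup of the master).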
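The error means the master has purged GTIDs that the slave never executed, so auto-positioning cannot find them in any remaining binary log. Here is a minimal Python sketch of that subset check on GTID set strings; the helper names are hypothetical, and it expands intervals into Python sets, which is fine for illustration but not for very large sets.

```python
from typing import Dict, Set


def parse_gtid_set(text: str) -> Dict[str, Set[int]]:
    """Parse 'uuid:1-73,uuid2:1-16:20' into {uuid: {transaction numbers}}."""
    result: Dict[str, Set[int]] = {}
    for item in "".join(text.split()).split(","):  # drop spaces/newlines first
        if not item:
            continue
        sid, *ranges = item.split(":")
        numbers = result.setdefault(sid.lower(), set())
        for r in ranges:
            start, _, end = r.partition("-")  # '5' and '1-73' are both valid intervals
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_gtids(master_purged: str, slave_executed: str) -> Dict[str, Set[int]]:
    """GTIDs the master has purged but the slave never executed."""
    purged, executed = parse_gtid_set(master_purged), parse_gtid_set(slave_executed)
    gaps: Dict[str, Set[int]] = {}
    for sid, nums in purged.items():
        diff = nums - executed.get(sid, set())
        if diff:
            gaps[sid] = diff
    return gaps
```

A non-empty result from missing_gtids(master_gtid_purged, slave_gtid_executed) corresponds to the condition this error reports; the inputs would come from SELECT @@GLOBAL.gtid_purged on the master and SELECT @@GLOBAL.gtid_executed on the slave.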
master:
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000025
Position: 393
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: d54fe35a-2a7c-11e7-b24b-000c29217b03:1-73,
f0c7d668-2a7c-11e7-b3c3-000c29217b03:1-16
1 row in set (0.00 sec)
slave:
mysql> stop slave ;
Query OK, 0 rows affected (0.01 sec)
mysql> reset slave ;
Query OK, 0 rows affected (0.09 sec)
mysql> reset master;
Query OK, 0 rows affected (0.02 sec)
mysql> set global gtid_purged="d54fe35a-2a7c-11e7-b24b-000c29217b03:1-73,f0c7d668-2a7c-11e7-b3c3-000c29217b03:1-16";
Query OK, 0 rows affected (0.00 sec)
mysql> change master to master_host='localhost',master_port=3307,master_user='fabric',master_password='123',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)
mysql> start slave;
Query OK, 0 rows affected (0.02 sec)
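The gtid_purged value used above is simply the master's Executed_Gtid_Set with the line break removed. This Python sketch (hypothetical helper names) assembles the same statement sequence from the SHOW MASTER STATUS value; it does no SQL escaping, so it is only meant for trusted input.

```python
from typing import List


def normalize_gtid_set(text: str) -> str:
    """Collapse the multi-line Executed_Gtid_Set from SHOW MASTER STATUS into one string."""
    return "".join(text.split())


def resync_statements(master_executed: str, host: str, port: int,
                      user: str, password: str) -> List[str]:
    """Statements to run on the slave, mirroring the manual fix above (no escaping)."""
    return [
        "STOP SLAVE;",
        "RESET SLAVE;",
        # In MySQL 5.7, gtid_purged can only be set while gtid_executed is empty,
        # which is why the slave's own binary logs and GTID history are reset first.
        "RESET MASTER;",
        f"SET GLOBAL gtid_purged='{normalize_gtid_set(master_executed)}';",
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}', MASTER_AUTO_POSITION=1;",
        "START SLAVE;",
    ]
```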
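Problem 2
Q: While dealing with the previous problem, I tried to skip some transactions by setting GTID_NEXT. Afterwards every replication command failed with the following error: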
mysql> stop slave ;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11'.
mysql> start slave ;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11'.
mysql> reset slave ;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'f0c7d668-2a7c-11e7-b3c3-000c29217b03:11'.
A: Reset the session's GTID_NEXT back to AUTOMATIC:
mysql> set gtid_next='automatic';
Query OK, 0 rows affected (0.01 sec)
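For reference, the usual way to skip one transaction with GTIDs is to commit an empty transaction under that GTID and then switch GTID_NEXT back to AUTOMATIC; leaving out the last step produces exactly the ERROR 1837 above. A small Python sketch that generates the sequence (the helper name is hypothetical):

```python
from typing import List


def skip_transaction_statements(gtid: str) -> List[str]:
    """Mark one GTID as executed on the slave by committing an empty transaction in its name."""
    return [
        "STOP SLAVE;",
        f"SET GTID_NEXT='{gtid}';",
        "BEGIN;",
        "COMMIT;",  # the empty transaction consumes the GTID
        "SET GTID_NEXT='AUTOMATIC';",  # skipping this leaves the session stuck on ERROR 1837
        "START SLAVE;",
    ]
```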
Problem 3
Q: After 3307 went offline and came back, setting it to spare failed with:
Last_SQL_Error: Slave failed to initialize relay log info structure from the repository
A: Run the following on 3307:
reset slave all;