2016年12月mysql发布了5.7.17版本开始支持了group replication首先来看先传统复制和半同步复制和组复制的原理比较图
比较图
1.mysql传统复制是异步复制
2.mysql的半同步复制
3.mysql的组复制
部署组复制
看完原理我们就部署组复制,要安装和使用组复制必须要配置正确的mysqlserver实例,建议将配置存储在my.cnf中,以下是组中第一个节点的配置
server_id=1973306 #一般定义为你的机器ip的后几位加上数据库端口号
gtid_mode=on
enforce_gtid_consistenct=on
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_fomat=ROW
transaction_write_set_extraction = XXHASH64 #指示server必须为每个事务收集写集合,并使用XXHASH64哈希算法将其编码为散列
loose-group_replication_group_name = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' #告知插件正在加入或要创建的组要命名为'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa'
loose-group_replication_start_on_boot = off
#插件在server启动时不自动启动组复制
loose-group_replication_local_address = '192.168.1.197:33061'
#告诉插件使用地址为192.168.1.197,端口33061用于接受来自组中其他成员的传入连接
loose-group_replication_group_seeds ='192.168.1.197:33061,192.168.1.198:33062,192.168.1.199:33063'
#当下面这些server需要加入组时,应该连接到这些主机和端口上访问他们
loose-group_replication_allow_local_disjoint_gtids_join=on
loose-group_replication_bootstrap_group = off
#指示插件是否自动引导组
loose-group_replication_single_primary_mode=true
#设置为single-primary模式下,组内只有一个节点负责写入,读可以从任意一个节点读取,组内数据保持最终一致(另一种模式为multi-primary模式即为多写方案,即写操作会下发到组内所有节点,组内所有节点同时可读可写,该模式也是能够保证组内数据最终一致性)
loose-group_replication_enforce_update_everywhere_checks=false
#这个参数设置检查对RP有影响的参数在各个节点是否一致
创建用户
创建具有REPLICATION-SLAVE权限的用户,此操作应该记录到二进制日志中,以避免将更改传递到其他server实例,以下是创建的过程。
root@mysqldb 14:20: [(none)]> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:20: [(none)]> create user 'repl'@'%';
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:21: [(none)]> grant replication slave on *.* to 'repl'@'%' identified by 'repl';
Query OK, 0 rows affected, 1 warning (0.00 sec)
root@mysqldb 14:21: [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:22: [(none)]> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)
复制凭证
用户进行上述配置后,需要使用change master to语句将server配置为,在下次需要从其他成员恢复其状态时,使用group_replication_recovery复制通道的给定凭据。执行以下命令
[(none)]> change master to
-> master_user='repl',
-> master_password='repl'
-> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)
安装组复制插件
[(none)]> install PLUGIN group_replication SONAME 'group_replication.so';
Query OK, 0 rows affected (0.02 sec)
检查插件是否成功安装
请执行SHOW PLUGINS并检查输出在最后一行,如下显示
[(none)]> show plugins;
+----------------------------+----------+--------------------+----------------------+---------+
| Name | Status | Type | Library | License |
+----------------------------+----------+--------------------+----------------------+---------+
(...)
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
+----------------------------+----------+--------------------+----------------------+---------+
启动第一个节点的group replication
[(none)]> set global group_replication_bootstrap_group=on;
Query OK, 0 rows affected (0.00 sec)
[(none)]> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log
这里报错了 我们查看错误日志有如下的一段话
[Warning] Plugin group_replication reported: 'Group Replication requires slave-preserve-commit-order to be set to ON when using more than 1 applier threads.'
从这里可以看出 有一个参数值需要设置为ON
我们登录数据库操作
[(none)]> show global variables like '%slave_preserve%';
+-----------------------------+-------+
| Variable_name | Value |
+-----------------------------+-------+
| slave_preserve_commit_order | OFF |
+-----------------------------+-------+
[(none)]> set global slave_preserve_commit_order=on;
Query OK, 0 rows affected (0.00 sec)#这里改好之后我们再启动group replication试一下
[(none)]> start group_replication;
Query OK, 0 rows affected (1.02 sec)
[(none)]> set global group_replication_bootstrap_group=off;
Query OK, 0 rows affected (0.00 sec)
这个时候
组已经启动成功,检查该组是否已创建,并且其中有一个成员
[(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 211de504-c8be-11e7-8e2c-000c294732bd | node0 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
1 rows in set (0.00 sec)
此时组中已经有一个成员了通过添加配置来添加剩下的成员第二个实例的配置文件与第一个实例额配置文件类似
server_id=1983306 #对应该机器的ip的后三位加上mysql的端口号
gtid_mode=on
enforce_gtid_consistenct=on
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_fomat=ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = '192.168.1.198:33062' #本机ip和接收的端口
loose-group_replication_group_seeds ='192.168.1.197:33061,192.168.1.198:33062,192.168.1.199:33063'
loose-group_replication_bootstrap_group = off
loose-group_replication_single_primary_mode=true
loose-group_replication_enforce_update_everywhere_checks=false
创建用户和创建恢复凭证
root@mysqldb 14:40: [(none)]> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:40: [(none)]> create user 'repl'@'%';
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:41: [(none)]> grant replication slave on *.* to 'repl'@'%' identified by 'repl';
Query OK, 0 rows affected, 1 warning (0.00 sec)
root@mysqldb 14:41: [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:42: [(none)]> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:42: [(none)]> change master to
-> master_user='repl',
-> master_password='repl'
-> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)
安装组复制插件,并启动将server加入组的程序,与部署第一个节点相同的方式安装插件
[(none)]> install PLUGIN group_replication SONAME 'group_replication.so';
Query OK, 0 rows affected (0.02 sec)
[(none)]> start group_replication;
Query OK, 0 rows affected (1.02 sec)
与之前的步骤不同的是,这里与第一个节点中的步骤有一个区别就是不执行SET GLOBAL group_replication_bootstrap_group=ON的操作;在启动组复制之前,因为该组已由第一个节点创建和引导。此时,第二结点只需添加到已经存在的组中即可;再次检查performance_schema.replication_group_members表,可以看出组中现在有两个ONLINE的server。
[(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 211de504-c8be-11e7-8e2c-000c294732bd | node0 | 3306 | ONLINE |
| group_replication_applier | 8eb4648b-bf0b-11e7-b843-000c29f6d9d2 | node1 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
2 rows in set (0.00 sec)
至此两个节点已经成功加入现在我们来加入第三个节点,首先是配置文件
server_id=1993306 #本机ip的后三位加上mysql的端口号
gtid_mode=on
enforce_gtid_consistenct=on
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_fomat=ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = '192.168.1.199:33063' #本机ip加上接收的端口号
loose-group_replication_group_seeds ='192.168.1.197:33061,192.168.1.198:33062,192.168.1.199:33063'
loose-group_replication_bootstrap_group = off
loose-group_replication_single_primary_mode=true
loose-group_replication_enforce_update_everywhere_checks=false
然后与之前两个节点一样创建用户和恢复凭证
root@mysqldb 14:57: [(none)]> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:58: [(none)]> create user 'repl'@'%';
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:58: [(none)]> grant replication slave on *.* to 'repl'@'%' identified by 'repl';
Query OK, 0 rows affected, 1 warning (0.00 sec)
root@mysqldb 14:58: [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:58: [(none)]> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 14:58: [(none)]> change master to
-> master_user='repl',
-> master_password='repl'
-> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)
安装组复制插件并启动将server加入组的程序,与部署前两个节点相同的方式安装
[(none)]> install PLUGIN group_replication SONAME 'group_replication.so';
Query OK, 0 rows affected (0.02 sec)
[(none)]> start group_replication;
Query OK, 0 rows affected (1.02 sec)
此时,第三个节点正在运行,并且已经加入组且与组中的其他成员同步。访问performance_schema.replication_group_members表再次确认情况
[(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 211de504-c8be-11e7-8e2c-000c294732bd | node0 | 3306 | ONLINE |
| group_replication_applier | 8eb4648b-bf0b-11e7-b843-000c29f6d9d2 | node1 | 3306 | ONLINE |
| group_replication_applier | d31f9fbc-c8c7-11e7-9a73-000c2984d9aa | node2 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
3 rows in set (0.00 sec)
现在我们已经把group replication搭建好了,来简单说明下group replication有关的表
replication_group_members
该表用于监控在当前视图中的不同server实例的状态,换句话说,是该组的一部分,用于组员服务追踪
+--------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+----------+------+-----+---------+-------+
| CHANNEL_NAME | char(64) | NO | | NULL | |
| MEMBER_ID | char(36) | NO | | NULL | |
| MEMBER_HOST | char(60) | NO | | NULL | |
| MEMBER_PORT | int(11) | YES | | NULL | |
| MEMBER_STATE | char(64) | NO | | NULL | |
+--------------+----------+------+-----+---------+-------+
5 rows in set (0.00 sec)
replication_group_member_stats
该表提供与认证过程相关的信息,对于了解申请队列增长情况,出发了多少冲突,检查了多少事务,哪些事务已被所有成员提交等等非常有用。
+------------------------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------------------+---------------------+------+-----+---------+-------+
| CHANNEL_NAME | char(64) | NO | | NULL | |
| VIEW_ID | char(60) | NO | | NULL | |
| MEMBER_ID | char(36) | NO | | NULL | |
| COUNT_TRANSACTIONS_IN_QUEUE | bigint(20) unsigned | NO | | NULL | |
| COUNT_TRANSACTIONS_CHECKED | bigint(20) unsigned | NO | | NULL | |
| COUNT_CONFLICTS_DETECTED | bigint(20) unsigned | NO | | NULL | |
| COUNT_TRANSACTIONS_ROWS_VALIDATING | bigint(20) unsigned | NO | | NULL | |
| TRANSACTIONS_COMMITTED_ALL_MEMBERS | longtext | NO | | NULL | |
| LAST_CONFLICT_FREE_TRANSACTION | text | NO | | NULL | |
+------------------------------------+---------------------+------+-----+---------+-------+
9 rows in set (0.00 sec)
这些字段对于监控组中的成员的性能很重要。 例如,假设组的成员之一出现延迟,并且不能与该组的其他成员同步。 在这种情况下,您可能会在队列中看到大量的事务。 基于此信息,您可以决定从组中删除成员或延迟组中其他成员的事务处理,从而减少排队的事务的数量。 此信息还可以帮助您决定如何调整组复制插件的流控制
replication_connection_status
连接到组时,此表中的某些字段显示有关组复制的信息。例如,已从组中接收并在应用队列(中继日志)中排队的事务
+---------------------------+-------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+-------------------------------+------+-----+---------+-------+
| CHANNEL_NAME | char(64) | NO | | NULL | |
| GROUP_NAME | char(36) | NO | | NULL | |
| SOURCE_UUID | char(36) | NO | | NULL | |
| THREAD_ID | bigint(20) unsigned | YES | | NULL | |
| SERVICE_STATE | enum('ON','OFF','CONNECTING') | NO | | NULL | |
| COUNT_RECEIVED_HEARTBEATS | bigint(20) unsigned | NO | | 0 | |
| LAST_HEARTBEAT_TIMESTAMP | timestamp | NO | | NULL | |
| RECEIVED_TRANSACTION_SET | longtext | NO | | NULL | |
| LAST_ERROR_NUMBER | int(11) | NO | | NULL | |
| LAST_ERROR_MESSAGE | varchar(1024) | NO | | NULL | |
| LAST_ERROR_TIMESTAMP | timestamp | NO | | NULL | |
+---------------------------+-------------------------------+------+-----+---------+-------+
11 rows in set (0.00 sec)
replication_applier_status
可以使用常规replication_applier_status表观察组复制相关的通道和线程的状态。如果有许多不同的工作线程在应用事务,那么工作表也可以用来监控每个工作线程正在做什么。
+----------------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+---------------------+------+-----+---------+-------+
| CHANNEL_NAME | char(64) | NO | | NULL | |
| SERVICE_STATE | enum('ON','OFF') | NO | | NULL | |
| REMAINING_DELAY | int(10) unsigned | YES | | NULL | |
| COUNT_TRANSACTIONS_RETRIES | bigint(20) unsigned | NO | | NULL | |
+----------------------------+---------------------+------+-----+---------+-------+
4 rows in set (0.01 sec)
以上的知识点大多来源自京东的数据库技术部京东数据库技术部
以上步骤再进行到第一个节点启动的时候发生错误
ERROR 3096 (HY000): The START GROUP_REPLICATION command failed as there was an error when initializing the group communication layer.
是由于本人马虎在配置文件中的loose-group_replication_group_seeds这个值设置重复 改正以后就好了