Background
A shared test environment: a three-node, single-primary Group Replication (GR) cluster. One day, starting node 3 (node3) failed with: [ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.'
This post documents the troubleshooting process and the takeaways.
Meaning of the error
The node contains transactions that are not part of the group. Checking node3's data confirmed it differed from the other nodes; most likely the instance was used as a standalone database while the GR service was not running.
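The mismatch can be confirmed by comparing GTID sets. A minimal sketch (the `<node3_gtid_executed>` placeholder stands for whatever the first query returns on node3):

```sql
-- On node3: show which transactions this member has executed.
SELECT @@GLOBAL.gtid_executed;

-- On a healthy member: check whether node3's set is a subset of the group's.
-- If this returns 0, node3 has extra transactions and will be refused.
SELECT GTID_SUBSET('<node3_gtid_executed>', @@GLOBAL.gtid_executed) AS is_subset;
```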
Resolution
According to the documentation, this can be resolved by resetting the master info and rejoining the group.
- Check the current master status
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000021
Position: 246
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 14a7ed7d-7e42-11ea-add1-dc4a3e82bdf9:1-66725,
ba279ebc-9ac8-11ea-a379-a08cfdcc0735:1-250791:1000181-1000191
1 row in set (0.01 sec)
- Reset the master info
The official documentation states: RESET MASTER enables you to delete any binary log files and their related binary log index file, returning the master to its state before binary logging was started.
In other words, it deletes the binary logs and clears the master configuration.
RESET MASTER;
Checking the master status again shows the logs have been cleared.
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 150
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
- Rejoin the group
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='password'
FOR CHANNEL 'group_replication_recovery';
start group_replication;
- The error log shows GR starting up and beginning recovery
2020-06-15T23:09:28.805095Z 2 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2020-06-15T23:09:28.805232Z 2 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 10.91.57.189/26,127.0.0.1/8 to the whitelist'
2020-06-15T23:09:28.805563Z 2 [Note] Plugin group_replication reported: '[GCS] Translated 's3' to 127.0.0.1'
2020-06-15T23:09:28.805713Z 2 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2020-06-15T23:09:28.805741Z 2 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2020-06-15T23:09:28.805752Z 2 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "ba279ebc-9ac8-11ea-a379-a08cfdcc0735"; group_replication_local_address: "s3:33061"; group_replication_group_seeds: "s1:33061,s2:33061,s3:33061"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2020-06-15T23:09:28.805859Z 2 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2020-06-15T23:09:28.805880Z 2 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2020-06-15T23:09:28.805893Z 2 [Note] Plugin group_replication reported: 'Member configuration: member_id: 3; member_uuid: "14a7ed7d-7e42-11ea-add1-dc4a3e82bdf9"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; '
2020-06-15T23:09:28.806102Z 185 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-06-15T23:09:28.949470Z 188 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './s3-relay-bin-group_replication_applier.000001' position: 4
2020-06-15T23:09:28.949474Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2020-06-15T23:09:28.949674Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2020-06-15T23:09:28.949683Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 3'
2020-06-15T23:09:28.949934Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2020-06-15T23:09:28.949962Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 33061'
2020-06-15T23:09:30.817333Z 2 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address s2:3306.'
2020-06-15T23:09:30.817574Z 191 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2020-06-15T23:09:30.817687Z 0 [Note] Plugin group_replication reported: 'Group membership changed to s3:3306, s2:3306, s1:3306 on view 15901818388998341:33.'
2020-06-15T23:09:31.060257Z 191 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='s2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='s1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2020-06-15T23:09:31.235763Z 191 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 9a44a006-7b43-11ea-a3eb-a08cfdc63e31 at s1 port: 3306.'
2020-06-15T23:09:31.236009Z 193 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-06-15T23:09:31.237547Z 193 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@s1:3306',replication started in log 'FIRST' at position 4
Check the GR status:
mysql> SELECT * FROM performance_schema.replication_group_members;
All three nodes are visible, but node3 stays in RECOVERING and, after a while, fails and drops out of the group.
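When the node keeps failing out of RECOVERING, the last applier error on the recovery channel can be inspected directly. A sketch against MySQL 5.7's performance_schema:

```sql
-- Shows the most recent error hit by the recovery channel,
-- e.g. the 1050/1007 errors described below.
SELECT CHANNEL_NAME, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'group_replication_recovery';
```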
Problems encountered
- Error_code: 1050 Table 'repl0523' already exists
2020-06-15T19:16:28.282318Z 22 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Table 'repl0523' already exists' on query. Default database: 'test'. Query: 'create table repl0523 (id int,name varchar(30))', Error_code: 1050
The table already exists; it has to be dropped before recovery can continue. But the node is already a GR secondary, which is read-only by default, and the drop must not be replicated to the other nodes, where it would damage good data.
Disable binlog: SET SQL_LOG_BIN=0;
Allow writes: set global read_only=0;
Drop the table: drop table repl0523;
Re-enable binlog: SET SQL_LOG_BIN=1;
Reset the master info again
Restart GR: start group_replication;
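Put together, the fix for a conflicting local object looks like this (a sketch; repl0523 in the test database is the table from the error above):

```sql
SET SQL_LOG_BIN = 0;        -- keep the cleanup out of the binlog so it is not replicated
SET GLOBAL read_only = 0;   -- temporarily allow writes on the secondary
DROP TABLE test.repl0523;   -- remove the conflicting local table
SET SQL_LOG_BIN = 1;        -- restore binary logging for this session
RESET MASTER;               -- clear binlogs and GTID state again
START GROUP_REPLICATION;    -- rejoin and retry recovery
```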
- Multiple primary key defined
2020-06-15T20:14:40.801344Z 150 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Multiple primary key defined' on query. Default database: 'test'. Query: 'ALTER TABLE `test`.`test_record_table`
The replicated DDL adds a primary key, but the table on node3 already has one, and a table can have only one primary key. The binlog contents confirm the statement is re-creating the primary key. Drop the primary key on node3's copy of the table and continue.
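The conflicting key can be dropped locally with the same non-replicated pattern (a sketch; the table name is taken from the error message):

```sql
SET SQL_LOG_BIN = 0;                                  -- do not replicate this local fix
ALTER TABLE test.test_record_table DROP PRIMARY KEY;  -- remove the duplicate key so the DDL can replay
SET SQL_LOG_BIN = 1;
```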
- Error_code: 1007 database exists
2020-06-15T21:57:50.313727Z 165 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Can't create database 'test_db'; database exists' on query. Default database: 'test_db'. Query: '/* ApplicationName=DataGrip 2019.3.1 */ create schema TEST_DB', Error_code: 1007
The database already exists. Drop it and continue.
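Again the same pattern, this time dropping the pre-existing schema (a sketch):

```sql
SET SQL_LOG_BIN = 0;     -- keep the drop local to node3
DROP DATABASE test_db;   -- remove the conflicting schema so CREATE SCHEMA can replay
SET SQL_LOG_BIN = 1;
```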
Summary
- After RESET MASTER, the node rejoins GR as a brand-new member, and every transaction executed since the group was created is replayed. Any leftover local data must therefore be removed first.
- Joining does not touch the node's earlier local transactions; those local transactions are exactly what caused node3 to fall out of the group in the first place.
- An open question: online recovery is slow. How can the existing data be loaded onto a new node first, so that it only joins GR afterwards? To be investigated later.
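One common approach (not tested here; a sketch for MySQL 5.7) is to restore a dump from a donor together with its GTID state, so that recovery only has to fetch the delta:

```sql
-- Take a dump on a healthy member (shell):
--   mysqldump --all-databases --triggers --routines --events \
--             --set-gtid-purged=ON > donor.sql
-- Restore it on the fresh node. The dump itself executes
-- SET @@GLOBAL.gtid_purged = '<donor gtid_executed>', marking the donor's
-- history as already applied on this node. Then simply:
START GROUP_REPLICATION;  -- recovery only fetches transactions made after the dump
```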