检查mysql的replication_MySQL Group Replication学习记录

最新推荐文章于 2023-09-12 10:51:33 发布

weixin_39833290

最新推荐文章于 2023-09-12 10:51:33 发布

阅读量967

点赞数

文章标签：检查mysql的replication

本文链接：https://blog.csdn.net/weixin_39833290/article/details/112870994

版权

MySQL 常见的高可用方案有 MHA，Galera，Orchestrator，以及官方的 Group Replication(MGR)，本文是 MGR 多主集群的搭建及维护学习记录。

MGR特点

支持单主和多主模式

基于 Paxos 算法，实现数据复制的一致性

插件化设计，支持插件检测，新增节点小于集群当前节点主版本号，拒绝加入集群，大于则加入，但无法作为主节点

没有第三方组件依赖

支持全链路 SSL 通讯

支持 IP 白名单

不依赖网络多播

搭建

可以使用 vagrant 启动 3 台虚拟机。分配 IP 如下：

itop-mgr-1192.168.10.101

itop-mgr-2192.168.10.102

itop-mgr-3192.168.10.103mgr 节点IP

安装

MySQL 版本选择最新的 8.x，用以下命令安装：

yum install -y https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

yum install -y mysql-community-server mysql-shell mysql-router

yuminstall-yhttps://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

yuminstall-ymysql-community-servermysql-shellmysql-router

手动配置

可以手动配置一个 mgr 集群，也可以用 mysql-shell 来配置。本文将记录手动配置过程，然后用 mysql-shell 来接管。

my.cnf

my.cnf 配置示例。

[mysqldump]

max_allowed_packet = 64M

[mysqld]

server-id=$ID

innodb_buffer_pool_size = 768M

# Remove leading # to set options mainly useful for reporting servers.

# The server defaults are faster for transactions and fast SELECTs.

# Adjust sizes as needed, experiment to find the optimal values.

# join_buffer_size = 128M

# sort_buffer_size = 2M

# read_rnd_buffer_size = 2M

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

# Binary logging and Replication

log_bin = mysql-bin

binlog_format = ROW

binlog_checksum = NONE # or CRC32

master_verify_checksum = OFF # ON if binlog_checksum = CRC32

slave_sql_verify_checksum = OFF # ON if binlog_checksum = CRC32

binlog_cache_size = 1M

binlog_stmt_cache_size = 3M

max_binlog_size = 512M

sync_binlog = 1

expire_logs_days = 7

log_slave_updates = 1

relay_log = mysql-relay-bin

relay_log_purge = 1

# Group Replication parameter

gtid_mode = ON

enforce_gtid_consistency = ON

master_info_repository = TABLE

relay_log_info_repository = TABLE

slave_parallel_workers = 10

slave_preserve_commit_order = ON

slave_parallel_type = LOGICAL_CLOCK

#以便在server收集写集合的同时将其记录到二进制日志。写集合基于每行的主键，并且是行更改后的唯一标识此标识将用于检测冲突。

transaction_write_set_extraction = XXHASH64

#组的名字可以随便起,但不能用主机的GTID! 所有节点的这个组名必须保持一致！

loose-group_replication_group_name = "abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"

##为了避免每次启动自动引导具有相同名称的第二个组,所以设置为OFF。

loose-group_replication_start_on_boot = OFF

loose-group_replication_local_address = "$MYIP:33061"

loose-group_replication_group_seeds = "$NODE1:33061,$NODE2:33061,$NODE3:33061"

#为了避免每次启动自动引导具有相同名称的第二个组,所以设置为OFF。

loose-group_replication_bootstrap_group = OFF

##关闭单主模式的参数(本例测试时多主模式，所以关闭该项,开启多主模式的参数

loose-group_replication_single_primary_mode = OFF # = multi-primary

loose-group_replication_enforce_update_everywhere_checks=ON # = multi-primary

report_host=$MYIP

report_port=3306

# 允许加入组复制的客户机来源的ip白名单

loose-group_replication_ip_whitelist="192.168.10.0/24,127.0.0.1/8"

[mysqldump]

max_allowed_packet=64M

[mysqld]

server-id=$ID

innodb_buffer_pool_size=768M

# Remove leading # to set options mainly useful for reporting servers.

# The server defaults are faster for transactions and fast SELECTs.

# Adjust sizes as needed, experiment to find the optimal values.

# join_buffer_size = 128M

# sort_buffer_size = 2M

# read_rnd_buffer_size = 2M

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

# Binary logging and Replication

log_bin=mysql-bin

binlog_format=ROW

binlog_checksum=NONE# or CRC32

master_verify_checksum=OFF# ON if binlog_checksum = CRC32

slave_sql_verify_checksum=OFF# ON if binlog_checksum = CRC32

binlog_cache_size=1M

binlog_stmt_cache_size=3M

max_binlog_size=512M

sync_binlog=1

expire_logs_days=7

log_slave_updates=1

relay_log=mysql-relay-bin

relay_log_purge=1

# Group Replication parameter

gtid_mode=ON

enforce_gtid_consistency=ON

master_info_repository=TABLE

relay_log_info_repository=TABLE

slave_parallel_workers=10

slave_preserve_commit_order=ON

slave_parallel_type=LOGICAL_CLOCK

#以便在server收集写集合的同时将其记录到二进制日志。写集合基于每行的主键，并且是行更改后的唯一标识此标识将用于检测冲突。

transaction_write_set_extraction=XXHASH64

#组的名字可以随便起,但不能用主机的GTID! 所有节点的这个组名必须保持一致！

loose-group_replication_group_name="abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"

##为了避免每次启动自动引导具有相同名称的第二个组,所以设置为OFF。

loose-group_replication_start_on_boot=OFF

loose-group_replication_local_address="$MYIP:33061"

loose-group_replication_group_seeds="$NODE1:33061,$NODE2:33061,$NODE3:33061"

#为了避免每次启动自动引导具有相同名称的第二个组,所以设置为OFF。

loose-group_replication_bootstrap_group=OFF

##关闭单主模式的参数(本例测试时多主模式，所以关闭该项,开启多主模式的参数

loose-group_replication_single_primary_mode=OFF# = multi-primary

loose-group_replication_enforce_update_everywhere_checks=ON# = multi-primary

report_host=$MYIP

report_port=3306

# 允许加入组复制的客户机来源的ip白名单

loose-group_replication_ip_whitelist="192.168.10.0/24,127.0.0.1/8"

账号设置

为了方便在 vagrant 中用脚本操作，先重新用 insecure 方式初始化 MySQL。

rm -fr /var/lib/mysql

mysqld --initialize-insecure --user=mysql

systemctl start mysqld

# (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)，重启生成 sock 文件

systemctl restart mysqld

systemctl status mysqld

mysql -uroot -e "show databases;"

rm-fr/var/lib/mysql

mysqld--initialize-insecure--user=mysql

systemctlstartmysqld

# (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)，重启生成 sock 文件

systemctlrestartmysqld

systemctlstatusmysqld

mysql-uroot-e"show databases;"

然后修改 root 账号的密码。

cat > /tmp/init.sql <

ALTER USER 'root'@'localhost' IDENTIFIED BY '$MYSQL_ROOT' PASSWORD EXPIRE NEVER;

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '$MYSQL_ROOT';

create user root@'%' identified WITH mysql_native_password BY '$MYSQL_ROOT';

grant all privileges on *.* to root@'%' with grant option;

flush privileges;

EOF

mysql -uroot < /tmp/init.sql

cat>/tmp/init.sql<

ALTERUSER'root'@'localhost'IDENTIFIEDBY'$MYSQL_ROOT'PASSWORDEXPIRENEVER;

ALTERUSER'root'@'localhost'IDENTIFIEDWITHmysql_native_passwordBY'$MYSQL_ROOT';

createuserroot@'%'identifiedWITHmysql_native_passwordBY'$MYSQL_ROOT';

grantallprivilegeson *.*toroot@'%'withgrantoption;

flushprivileges;

EOF

mysql-uroot

设置复制账号。

cat > /tmp/rep.sql <

SET SQL_LOG_BIN=0;

CREATE USER IF NOT EXISTS repl@'%' IDENTIFIED WITH 'mysql_native_password' BY 'repl';

GRANT SUPER,REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl@'%';

FLUSH PRIVILEGES;

SET SQL_LOG_BIN=1;

CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';

EOF

if [ ! -f $LOCK ];then

mysql -uroot -p$MYSQL_ROOT < /tmp/rep.sql

touch $LOCK

cat>/tmp/rep.sql<

SETSQL_LOG_BIN=0;

CREATEUSERIFNOTEXISTSrepl@'%'IDENTIFIEDWITH'mysql_native_password'BY'repl';

GRANTSUPER,REPLICATIONCLIENT,REPLICATIONSLAVEON *.*TOrepl@'%';

FLUSHPRIVILEGES;

SETSQL_LOG_BIN=1;

CHANGEMASTERTOMASTER_USER='repl',MASTER_PASSWORD='repl'FORCHANNEL'group_replication_recovery';

EOF

if[!-f$LOCK];then

mysql-uroot-p$MYSQL_ROOT

touch$LOCK

安装MGR插件

# install mgr

cat > /tmp/mgr.sql <

INSTALL PLUGIN group_replication SONAME 'group_replication.so';

SHOW PLUGINS;

EOF

mysql -uroot -p$MYSQL_ROOT -e "SHOW PLUGINS" |grep -q "group_replication" || mysql -uroot -p$MYSQL_ROOT < /tmp/mgr.sql

# install mgr

cat>/tmp/mgr.sql<

INSTALLPLUGINgroup_replicationSONAME'group_replication.so';

SHOWPLUGINS;

EOF

mysql-uroot-p$MYSQL_ROOT-e"SHOW PLUGINS"|grep-q"group_replication"||mysql-uroot-p$MYSQL_ROOT

启动MGR

在第一个节点上启动 MGR，另外两个节点加入 MGR。

# start mgr

cat > /tmp/start.sql <

SET GLOBAL group_replication_bootstrap_group=ON;

START GROUP_REPLICATION;

SET GLOBAL group_replication_bootstrap_group=OFF;

SELECT * FROM performance_schema.replication_group_members;

EOF

# only run on first node.

[ "$ID"x == "10101"x ] && mysql -uroot -p$MYSQL_ROOT < /tmp/start.sql

# join

cat > /tmp/join.sql <

SELECT * FROM performance_schema.replication_group_members;

show global variables like '%seed%';

START GROUP_REPLICATION;

SELECT * FROM performance_schema.replication_group_members;

SHOW STATUS LIKE 'group_replication_primary_member';

show global variables like 'group_replication_single%';

EOF

# run on other two node

[ "$ID"x == "10101"x ] || mysql -uroot -p$MYSQL_ROOT < /tmp/join.sql

# start mgr

cat>/tmp/start.sql<

SETGLOBALgroup_replication_bootstrap_group=ON;

STARTGROUP_REPLICATION;

SETGLOBALgroup_replication_bootstrap_group=OFF;

SELECT *FROMperformance_schema.replication_group_members;

EOF

# only run on first node.

["$ID"x=="10101"x]&&mysql-uroot-p$MYSQL_ROOT

# join

cat>/tmp/join.sql<

SELECT *FROMperformance_schema.replication_group_members;

showglobalvariableslike'%seed%';

STARTGROUP_REPLICATION;

SELECT *FROMperformance_schema.replication_group_members;

SHOWSTATUSLIKE'group_replication_primary_member';

showglobalvariableslike'group_replication_single%';

EOF

# run on other two node

["$ID"x=="10101"x]||mysql-uroot-p$MYSQL_ROOT

group_replication_bootstrap_group 选项，为了避免每次启动自动引导具有相同名称的第二个组，所以需要设置为 OFF。因此执行 START GROUP_REPLICATION 之后需要关闭。

使用mysql-shell接管MGR

如果在已经配置好的组复制上创建 InnoDB Cluster，并且希望使用它来创建集群，可将 adoptFromGR 选项传递给 dba.createCluster() 函数。创建的InnoDB Cluster 会匹配复制组是以单主数据库还是多主数据库运行。

# 使用 mysqlsh 管理集群

# MGR集群接管：如果在已经配置好的组复制上创建InnoDB Cluster，并且希望使用它来创建集群，可将adoptFromGR选项传递给dba.createCluster()函数。创建的InnoDB Cluster会匹配复制组是以单主数据库还是多主数据库运行。要采用现有的组复制组，使用MySQL Shell连接到组成员。

# 在最后一个节点上操作，即等所有节点启动后在操作

JSDIR="/vagrant/js"

MYSQLSHLOG=/tmp/mysql.log

if [ "$ID"x == "10103"x ];then

echo "Run Get Status"

mysqlsh --js --file=$JSDIR/status.js > $MYSQLSHLOG 2>&1

grep "RuntimeError" $MYSQLSHLOG && r=0 || r=1

if [ $r -eq 0 ];then

grep "but GR is not active" $MYSQLSHLOG && r=0 || r=1

if [ $r -eq 0 ];then

echo "Cluster Need Reboot"

mysqlsh --js --file=$JSDIR/reboot.js

else

echo "Init Cluster"

mysqlsh --js --file=$JSDIR/init.js

else

cat $MYSQLSHLOG

# 使用 mysqlsh 管理集群

# 在最后一个节点上操作，即等所有节点启动后在操作

JSDIR="/vagrant/js"

MYSQLSHLOG=/tmp/mysql.log

if["$ID"x=="10103"x];then

echo"Run Get Status"

mysqlsh--js--file=$JSDIR/status.js>$MYSQLSHLOG2>&1

grep"RuntimeError"$MYSQLSHLOG&&r=0||r=1

if[$r-eq0];then

grep"but GR is not active"$MYSQLSHLOG&&r=0||r=1

if[$r-eq0];then

echo"Cluster Need Reboot"

mysqlsh--js--file=$JSDIR/reboot.js

else

echo"Init Cluster"

mysqlsh--js--file=$JSDIR/init.js

else

cat$MYSQLSHLOG

status.js

shell.connect('root@192.168.10.101', 'root');

var cluster = dba.getCluster('itopMgr');

cluster.status();

shell.connect('root@192.168.10.101','root');

varcluster=dba.getCluster('itopMgr');

cluster.status();

init.js

shell.connect('root@192.168.10.101', 'root');

var cluster = dba.createCluster('itopMgr', {adoptFromGR: true});

cluster.status();

shell.connect('root@192.168.10.101','root');

varcluster=dba.createCluster('itopMgr',{adoptFromGR:true});

cluster.status();

reboot.js

shell.connect('root@192.168.10.101', 'root');

dba.rebootClusterFromCompleteOutage('itopMgr')

var cluster = dba.getCluster('itopMgr');

cluster.status();

shell.connect('root@192.168.10.101','root');

dba.rebootClusterFromCompleteOutage('itopMgr')

varcluster=dba.getCluster('itopMgr');

cluster.status();

如果从一开始就使用 mysql-shell 创建并管理集群，可以参考以下命令。

##链接主节点

MySQL JS >\connect root@192.168.10.101:3306

##创建集群

MySQL 192.168.10.101:3306 ssl JS > dba.createCluster('mgrCluster')

MySQL 192.168.10.101:3306 ssl JS > var cluster = dba.getCluster()

##添加节点

MySQL 192.168.10.101:3306 ssl JS > cluster.addInstance('root@192.168.10.102:3306')

MySQL 192.168.10.101:3306 ssl JS > cluster.addInstance('root@192.168.10.103:3306')

##查看集群状态

MySQL 192.168.10.101:3306 ssl JS > cluster.status()

##解散InnoDB Cluster

MySQL 192.168.10.101:3306 ssl JS > cluster.dissolve()

##配置新主选举权重

MySQL 192.168.10.101:3306 ssl JS > var mycluster = dba.getCluster()

MySQL 192.168.10.101:3306 ssl JS > mycluster.addInstance('root@192.168.10.102:3306', {memberWeight:25})

MySQL 192.168.10.101:3306 ssl JS > mycluster.addInstance('root@192.168.10.103:3306', {memberWeight:50})

##指定一个新的主节点

MySQL 192.168.10.101:3306 ssl JS > cluster.setPrimaryInstance('192.168.10.102:3306')

##Cluster.switchToMultiPrimaryMode()切换到多主模式：

MySQL 192.168.10.101:3306 ssl JS > cluster.switchToMultiPrimaryMode()

##Cluster.switchToSinglePrimaryMode()切换到单主模式

MySQL 192.168.10.101:3306 ssl JS > cluster.switchToSinglePrimaryMode('172.16.1.125:3306')

##链接主节点

MySQLJS>\connectroot@192.168.10.101:3306

##创建集群

MySQL192.168.10.101:3306sslJS>dba.createCluster('mgrCluster')

MySQL192.168.10.101:3306sslJS>varcluster=dba.getCluster()

##添加节点

MySQL192.168.10.101:3306sslJS>cluster.addInstance('root@192.168.10.102:3306')

MySQL192.168.10.101:3306sslJS>cluster.addInstance('root@192.168.10.103:3306')

##查看集群状态

MySQL192.168.10.101:3306sslJS>cluster.status()

##解散InnoDB Cluster

MySQL192.168.10.101:3306sslJS>cluster.dissolve()

##配置新主选举权重

MySQL192.168.10.101:3306sslJS>varmycluster=dba.getCluster()

MySQL192.168.10.101:3306sslJS>mycluster.addInstance('root@192.168.10.102:3306',{memberWeight:25})

MySQL192.168.10.101:3306sslJS>mycluster.addInstance('root@192.168.10.103:3306',{memberWeight:50})

##指定一个新的主节点

MySQL192.168.10.101:3306sslJS>cluster.setPrimaryInstance('192.168.10.102:3306')

##Cluster.switchToMultiPrimaryMode()切换到多主模式：

MySQL192.168.10.101:3306sslJS>cluster.switchToMultiPrimaryMode()

##Cluster.switchToSinglePrimaryMode()切换到单主模式

MySQL192.168.10.101:3306sslJS>cluster.switchToSinglePrimaryMode('172.16.1.125:3306')

运维问题

节点状态

ONLINE - 节点状态正常。

OFFLINE - 实例在运行，但没有加入任何Cluster。

RECOVERING - 实例已加入Cluster，正在同步数据。

ERROR - 同步数据发生异常。

UNREACHABLE - 与其他节点通讯中断，可能是网络问题，可能是节点crash。

MISSING 节点已加入集群，但未启动group replication

集群状态

OK – 所有节点处于online状态，有冗余节点。

OK_PARTIAL – 有节点不可用，但仍有冗余节点。

OK_NO_TOLERANCE – 有足够的online节点，但没有冗余，例如：两个节点的Cluster，其中一个挂了，集群就不可用了。

NO_QUORUM – 有节点处于online状态，但达不到法定节点数，此状态下Cluster无法写入，只能读取。

UNKNOWN – 不是online或recovering状态，尝试连接其他实例查看状态。

UNAVAILABLE – 组内节点全是offline状态，但实例在运行，可能实例刚重启还没加入Cluster。

重新加入集群

如果节点故障后重启，状态为 MISSING 时，需要手动重新加入集群。

var cluster = dba.getCluster("itopMgr")

cluster.rejoinInstance('root@192.168.10.102:3306')

varcluster=dba.getCluster("itopMgr")

cluster.rejoinInstance('root@192.168.10.102:3306')

重启集群

当集群所有节点掉线，比如所有机器都宕机后启动，可以用 mysql-shell 重启集群。

shell.connect('root@192.168.10.101', 'root');

dba.rebootClusterFromCompleteOutage('itopMgr')

var cluster = dba.getCluster('itopMgr');

shell.connect('root@192.168.10.101','root');

dba.rebootClusterFromCompleteOutage('itopMgr')

varcluster=dba.getCluster('itopMgr');

当使用 vagrant reload 虚拟机的时候，1 号虚机最先关机，这时候 2，3 号如果还在写入，会比 1 号数据新，当 reload 完成之后，如果还默认以1 号为准，用 SQL 语句启动集群，会出现以下问题：

Rejoining instance to the cluster ...

ERROR: A GTID set check of the MySQL instance at '192.168.10.102:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.

192.168.10.102:3306 has the following errant GTIDs that do not exist in the cluster:

abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1002598-1002690

Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to rejoining the instance to the cluster.

Discarding these extra GTID events can either be done manually or by completely overwriting the state of 192.168.10.102:3306 with a physical snapshot from an existing cluster member. To achieve this remove the instance from the cluster and add it back using .addInstance() and setting the 'recoveryMethod' option to 'clone'.

Cluster.rejoinInstance: The instance '192.168.10.102:3306' contains errant transactions that did not originate from the cluster. (RuntimeError)

Rejoininginstancetothecluster...

ERROR:AGTIDsetcheckoftheMySQLinstanceat'192.168.10.102:3306'determinedthatitcontainstransactionsthatdonotoriginatefromthecluster,whichmustbediscardedbeforeitcanjointhecluster.

192.168.10.102:3306hasthefollowingerrantGTIDsthatdonotexistinthecluster:

abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1002598-1002690

HavingextraGTIDeventsisnotexpected,anditisrecommendedtoinvestigatethisfurtherandensurethatthedatacanberemovedpriortorejoiningtheinstancetothecluster.

DiscardingtheseextraGTIDeventscaneitherbedonemanuallyorbycompletelyoverwritingthestateof192.168.10.102:3306withaphysicalsnapshotfromanexistingclustermember.Toachievethisremovetheinstancefromtheclusterandadditbackusing.addInstance()andsettingthe'recoveryMethod'optionto'clone'.

Cluster.rejoinInstance:Theinstance'192.168.10.102:3306'containserranttransactionsthatdidnotoriginatefromthecluster.(RuntimeError)

因此这种情况用 mysql-shell 来重启比较好。

当已经出现这种情况的时候，用下面的命令查看所有 member。

[root@itop-mgr-1 ~]# ./run.sh "select * from performance_schema.replication_group_members;"

RUN on 192.168.10.101

mysql: [Warning] Using a password on the command line interface can be insecure.

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

RUN on 192.168.10.102

mysql: [Warning] Using a password on the command line interface can be insecure.

+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

RUN on 192.168.10.103

mysql: [Warning] Using a password on the command line interface can be insecure.

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

[root@itop-mgr-1~]# ./run.sh "select * from performance_schema.replication_group_members;"

RUNon192.168.10.101

mysql:[Warning]Usingapasswordonthecommandlineinterfacecanbeinsecure.

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

|group_replication_applier|||NULL|OFFLINE|||

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

RUNon192.168.10.102

mysql:[Warning]Usingapasswordonthecommandlineinterfacecanbeinsecure.

+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

|group_replication_applier|4dc877f0-becf-11ea-bf0b-5254004d77d3|192.168.10.102|3306|OFFLINE|||

+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+

RUNon192.168.10.103

mysql:[Warning]Usingapasswordonthecommandlineinterfacecanbeinsecure.

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

|group_replication_applier|||NULL|OFFLINE|||

+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+

然后以 MEMBER_ID 和 MEMBER_HOST 不为空的节点为基础，启动 group_replication。

SET GLOBAL group_replication_bootstrap_group=ON;

START GROUP_REPLICATION;

SET GLOBAL group_replication_bootstrap_group=OFF;

SETGLOBALgroup_replication_bootstrap_group=ON;

STARTGROUP_REPLICATION;

SETGLOBALgroup_replication_bootstrap_group=OFF;

然后启动 3 号节点

START GROUP_REPLICATION;

STARTGROUP_REPLICATION;

1 号节点无法直接加入，需要先移除在加入，并且使用 clone 方式恢复数据。

MySQL 192.168.10.102:33060+ ssl JS > c.addInstance('root@192.168.10.101:3306', {recoveryMethod:'clone'})

WARNING: A GTID set check of the MySQL instance at '192.168.10.101:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.

192.168.10.101:3306 has the following errant GTIDs that do not exist in the cluster:

abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4218-8557

WARNING: Discarding these extra GTID events can either be done manually or by completely overwriting the state of 192.168.10.101:3306 with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.

Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to choosing the clone recovery method.

Clone based recovery selected through the recoveryMethod option

NOTE: Group Replication will communicate with other members using '192.168.10.101:33061'. Use the localAddress option to override.

Validating instance configuration at 192.168.10.101:3306...

This instance reports its own address as 192.168.10.101:3306

Instance configuration is suitable.

A new instance will be added to the InnoDB cluster. Depending on the amount of

data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster...

ERROR: Unable to enable clone on the instance '192.168.10.103:3306': Recovery user 'repl' not created by InnoDB Cluster

Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.

Clone based state recovery is now in progress.

NOTE: A server restart is expected to happen as part of the clone process. If the

server does not support the RESTART command or does not come back after a

while, you may need to manually start it back.

* Waiting for clone to finish...

NOTE: 192.168.10.101:3306 is being cloned from 192.168.10.103:3306

** Stage DROP DATA: Completed

** Clone Transfer

FILE COPY ############################################################ 100% Completed

PAGE COPY ############################################################ 100% Completed

REDO COPY ############################################################ 100% Completed

** Stage RECOVERY: \

NOTE: 192.168.10.101:3306 is shutting down...

* Waiting for server restart... ready

* 192.168.10.101:3306 has restarted, waiting for clone to finish...

* Clone process has finished: 109.54 MB transferred in 4 sec (27.38 MB/s)

State recovery already finished for '192.168.10.101:3306'

The instance '192.168.10.101:3306' was successfully added to the cluster.

MySQL192.168.10.102:33060+sslJS>c.addInstance('root@192.168.10.101:3306',{recoveryMethod:'clone'})

WARNING:AGTIDsetcheckoftheMySQLinstanceat'192.168.10.101:3306'determinedthatitcontainstransactionsthatdonotoriginatefromthecluster,whichmustbediscardedbeforeitcanjointhecluster.

192.168.10.101:3306hasthefollowingerrantGTIDsthatdonotexistinthecluster:

abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4218-8557

WARNING:DiscardingtheseextraGTIDeventscaneitherbedonemanuallyorbycompletelyoverwritingthestateof192.168.10.101:3306withaphysicalsnapshotfromanexistingclustermember.Tousethismethodbydefault,setthe'recoveryMethod'optionto'clone'.

HavingextraGTIDeventsisnotexpected,anditisrecommendedtoinvestigatethisfurtherandensurethatthedatacanberemovedpriortochoosingtheclonerecoverymethod.

ClonebasedrecoveryselectedthroughtherecoveryMethodoption

NOTE:GroupReplicationwillcommunicatewithothermembersusing'192.168.10.101:33061'.UsethelocalAddressoptiontooverride.

Validatinginstanceconfigurationat192.168.10.101:3306...

Thisinstancereportsitsownaddressas192.168.10.101:3306

Instanceconfigurationissuitable.

AnewinstancewillbeaddedtotheInnoDBcluster.Dependingontheamountof

dataontheclusterthismighttakefromafewsecondstoseveralhours.

Addinginstancetothecluster...

ERROR:Unabletoenablecloneontheinstance'192.168.10.103:3306':Recoveryuser'repl'notcreatedbyInnoDBCluster

Monitoringrecoveryprocessofthenewclustermember.Press^Ctostopmonitoringandletitcontinueinbackground.

Clonebasedstaterecoveryisnowinprogress.

NOTE:Aserverrestartisexpectedtohappenaspartofthecloneprocess.Ifthe

serverdoesnotsupporttheRESTARTcommandordoesnotcomebackaftera

while,youmayneedtomanuallystartitback.

*Waitingforclonetofinish...

NOTE:192.168.10.101:3306isbeingclonedfrom192.168.10.103:3306

**StageDROPDATA:Completed

**CloneTransfer

FILECOPY############################################################ 100% Completed

PAGECOPY############################################################ 100% Completed

REDOCOPY############################################################ 100% Completed

**StageRECOVERY:\

NOTE:192.168.10.101:3306isshuttingdown...

*Waitingforserverrestart...ready

*192.168.10.101:3306hasrestarted,waitingforclonetofinish...

*Cloneprocesshasfinished:109.54MBtransferredin4sec(27.38MB/s)

Staterecoveryalreadyfinishedfor'192.168.10.101:3306'

Theinstance'192.168.10.101:3306'wassuccessfullyaddedtothecluster.

执行完成之后，三个节点上 SELECT @@GTID_EXECUTED 应返回相同的结果：

[root@itop-mgr-1 ~]# ./run.sh "SELECT @@GTID_EXECUTED;"

RUN on 192.168.10.101

mysql: [Warning] Using a password on the command line interface can be insecure.

+---------------------------------------------------------------------+

| @@GTID_EXECUTED |

+---------------------------------------------------------------------+

| abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062 |

+---------------------------------------------------------------------+

RUN on 192.168.10.102

mysql: [Warning] Using a password on the command line interface can be insecure.

+---------------------------------------------------------------------+

| @@GTID_EXECUTED |

+---------------------------------------------------------------------+

| abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062 |

+---------------------------------------------------------------------+

RUN on 192.168.10.103

mysql: [Warning] Using a password on the command line interface can be insecure.

+---------------------------------------------------------------------+

| @@GTID_EXECUTED |

+---------------------------------------------------------------------+

| abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062 |

+---------------------------------------------------------------------+

[root@itop-mgr-1~]# ./run.sh "SELECT @@GTID_EXECUTED;"

RUNon192.168.10.101

mysql:[Warning]Usingapasswordonthecommandlineinterfacecanbeinsecure.

+---------------------------------------------------------------------+

|@@GTID_EXECUTED|

+---------------------------------------------------------------------+

|abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062|

+---------------------------------------------------------------------+

RUNon192.168.10.102

mysql:[Warning]Usingapasswordonthecommandlineinterfacecanbeinsecure.

+---------------------------------------------------------------------+

|@@GTID_EXECUTED|

+---------------------------------------------------------------------+

|abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062|

+---------------------------------------------------------------------+

RUNon192.168.10.103

mysql:[Warning]Usingapasswordonthecommandlineinterfacecanbeinsecure.

+---------------------------------------------------------------------+

|@@GTID_EXECUTED|

+---------------------------------------------------------------------+

|abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062|

+---------------------------------------------------------------------+

mysql-shell非交互式

使用 'interactive': false 选项。

dba.configureLocalInstance('root@$MYIP:3306', {'password': 'root', 'interactive': false})

dba.configureLocalInstance('root@$MYIP:3306',{'password':'root','interactive':false})

执行 dba.rebootClusterFromCompleteOutage 时会要求用户确认是否加入集群 (y/N) ，脚本中需要强制 yes，可以用以下方法，注意列表里不要写当前操作的节点 IP。

dba.rebootClusterFromCompleteOutage('itopMgr',{rejoinInstances:["192.168.10.102:3306","192.168.10.103:3306"]})

执行 addInstance 时指定 recoveryMethod 选项，避免交互。

cluster.addInstance('root@192.168.10.203:3306', {'localAddress': '192.168.10.203', 'password': 'root','recoveryMethod':'clone'})

cluster.addInstance('root@192.168.10.203:3306',{'localAddress':'192.168.10.203','password':'root','recoveryMethod':'clone'})

常用命令

集群验证 SQL 语句

select * from performance_schema.replication_group_members; #All members should be online.

select instance_name, mysql_server_uuid, addresses from mysql_innodb_cluster_metadata.instances; # All instances should return same value for mysql_server_uuid

SELECT @@GTID_EXECUTED; #All nodes should return same value

select *fromperformance_schema.replication_group_members;#All members should be online.

selectinstance_name,mysql_server_uuid,addressesfrommysql_innodb_cluster_metadata.instances;# All instances should return same value for mysql_server_uuid

SELECT@@GTID_EXECUTED;#All nodes should return same value

常用语句

mysql> SET SQL_LOG_BIN = 0;

mysql> stop group_replication;

mysql> set global super_read_only=0;

mysql> drop database mysql_innodb_cluster_metadata;

mysql> RESET MASTER;

mysql> RESET SLAVE ALL;

JS > var cluster = dba.getCluster()

JS > var cluster = dba.getCluster("")

JS > var cluster = dba.createCluster('name')

JS > cluster.removeInstance('root@:',{force: true})

JS > cluster.addInstance('root@,:')

JS > dba.getCluster()

JS > dba.getCluster().status()

JS > dba.getCluster().checkInstanceState('root@:')

JS > dba.getCluster().rejoinInstance('root@:')

JS > dba.getCluster().describe()

mysql>SETSQL_LOG_BIN=0;

mysql>stopgroup_replication;

mysql>setglobalsuper_read_only=0;

mysql>dropdatabasemysql_innodb_cluster_metadata;

mysql>RESETMASTER;

mysql>RESETSLAVEALL;

JS>varcluster=dba.getCluster()

JS>varcluster=dba.getCluster("")

JS>varcluster=dba.createCluster('name')

JS>cluster.removeInstance('root@:',{force:true})

JS>cluster.addInstance('root@,:')

JS>dba.getCluster()

JS>dba.getCluster().status()

JS>dba.getCluster().checkInstanceState('root@:')

JS>dba.getCluster().rejoinInstance('root@:')

JS>dba.getCluster().describe()

查看集群状态

mysql-js> dba.getCluster().status()

mysql-js>dba.getCluster().status()

Note: If you get following message, then it's time to reboot cluster from complete outage

Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, but GR is not active) (RuntimeError)

reboot from complete outage 即以下命令。

dba.rebootClusterFromCompleteOutage('')

从仲裁丢失中恢复集群

集群处于 NO_QUORUM 状态。Re-establish quorum using the methodcluster.forceQuorumUsingPartitionOf()

JS > cluster.forceQuorumUsingPartitionOf("root@:")

JS>cluster.forceQuorumUsingPartitionOf("root@:")

调试方法

启动 mysql-shell 时指定 log level。

mysqlsh --log-level=DEBUG3

mysqlsh--log-level=DEBUG3

unmanaged replication group

MySQL JS > dba.getCluster()

Dba.getCluster: This function is not available through a session to an instance belonging to an unmanaged replication group (RuntimeError)

MySQLJS>dba.getCluster()

Dba.getCluster:Thisfunctionisnotavailablethroughasessiontoaninstancebelongingtoanunmanagedreplicationgroup(RuntimeError)

InnoDB cluster status is different for mysqlsh and performance_schema.replicaion_group_members. In performance_schema it is ONLINE but mysqlsh depicts MISSING.( Cluster.status() is out of sync with the Group Replication view replication_group_members)。

这种情况可能是集群尚未被 mysql-shell 接管，可以执行上文提到的接管集群的命令。或者

Restore auto.cnf form backup, restart mysqld. If require remove instance and add it back again.

About auto.cnf and server_uuid:auto.cnf file is similar to my.cnf. It contains server_uuid. Server_uuid is generated automatically. When starting MySQL it read and use server_uuid from auto.cnf. The value of the server_uuid used in reapplication. server_uuid is true UUID in an addition to user supplied server_id system variable.About auto.cnf and server_uuid

参考资料

1. https://jeremyxu2010.github.io/2019/05/mysql-innodb-cluster实战/

2. https://kubedb.com/docs/v0.13.0-rc.0/guides/mysql/clustering/overview/

3. http://blog.itpub.net/26736162/viewspace-2675139/

4. https://juejin.im/post/5df4d1dc6fb9a0164577d0a8

5. http://shrenikp.blogspot.com/2018/10/mysql-innodb-cluster-troubleshooting.html

6. https://www.cnblogs.com/xinysu/p/6674832.html

7. https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-working-with-cluster.html

8. https://dba.stackexchange.com/questions/224117/mysqlsh-rebootclusterfromcompleteoutage-force-prompt

9. https://dev.mysql.com/doc/dev/mysqlsh-api-javascript/8.0/classmysqlsh_1_1dba_1_1_dba.html

1.https://jeremyxu2010.github.io/2019/05/mysql-innodb-cluster实战/

2.https://kubedb.com/docs/v0.13.0-rc.0/guides/mysql/clustering/overview/