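After an emergency failover, if there is a risk that the transaction sets differ between parts of the ClusterSet, you must fence the cluster from either write traffic or all traffic.
If a network partition occurs, a split-brain situation is possible, in which instances lose synchronization and cannot communicate correctly to define the synchronization state. A split-brain can occur when a DBA decides to force a replica cluster to become the primary cluster, which leaves the ClusterSet with more than one primary cluster.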
In this situation, a DBA can choose to fence the original primary cluster from:
- Write traffic.
- All traffic.
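Three fencing operations are available:
- <Cluster>.fenceWrites(): Stops write traffic to the primary cluster of a ClusterSet. Replica clusters do not accept writes, so this operation has no effect on them. From 8.0.31, it can be used on INVALIDATED replica clusters. In addition, if it is run on a replica cluster that has super_read_only disabled, it enables super_read_only.
- <Cluster>.unfenceWrites(): Resumes write traffic. Run this operation on a cluster that was previously fenced from write traffic with <Cluster>.fenceWrites(). <Cluster>.unfenceWrites() cannot be used on a replica cluster.
- <Cluster>.fenceAllTraffic(): Fences a cluster from all traffic. If you fenced a cluster from all traffic with <Cluster>.fenceAllTraffic(), you must reboot the cluster with the MySQL Shell command dba.rebootClusterFromCompleteOutage(). For more information about dba.rebootClusterFromCompleteOutage(), see Section 7.8.3, "Rebooting a Cluster from a Major Outage".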
Running <Cluster>.fenceWrites() on a replica cluster returns an error:
    ERROR: Unable to fence Cluster from write traffic: operation not permitted on REPLICA Clusters
    Cluster.fenceWrites: The Cluster '<Cluster>' is a REPLICA Cluster of the ClusterSet '<ClusterSet>' (MYSQLSH 51616)
Although fencing is used mainly on clusters that belong to a ClusterSet, you can also fence a standalone cluster with <Cluster>.fenceAllTraffic().
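The pairing between each fencing operation and the operation that reverses it can be captured in a small lookup, for example in a runbook script. The following is a minimal sketch only: `RECOVERY` and `recoveryFor` are hypothetical names that are not part of AdminAPI, and the strings simply repeat the operation names given above.

```javascript
// Hypothetical lookup table, not part of AdminAPI. It maps each fencing
// operation named in this section to the operation that reverses it.
const RECOVERY = Object.freeze({
  fenceWrites: "<Cluster>.unfenceWrites()",
  fenceAllTraffic: "dba.rebootClusterFromCompleteOutage()",
});

// Returns the reversing operation for a fencing operation name,
// or throws if the name is not one of the two fencing operations.
function recoveryFor(fenceOperation) {
  if (!Object.prototype.hasOwnProperty.call(RECOVERY, fenceOperation)) {
    throw new Error(`Unknown fencing operation: ${fenceOperation}`);
  }
  return RECOVERY[fenceOperation];
}
```

Calling `recoveryFor("fenceAllTraffic")` returns the reboot command rather than an unfence call, which reflects that a cluster fenced from all traffic has to be rebooted.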
- To fence a primary cluster from write traffic, use the <Cluster>.fenceWrites() command as follows:
      <Cluster>.fenceWrites()
  After the command runs:
  - Automatic super_read_only management is disabled on the cluster.
  - super_read_only is enabled on all instances in the cluster.
  - All applications are blocked from performing writes on the cluster.
      cluster.fenceWrites()
      The Cluster 'primary' will be fenced from write traffic
      * Disabling automatic super_read_only management on the Cluster...
      * Enabling super_read_only on '127.0.0.1:3311'...
      * Enabling super_read_only on '127.0.0.1:3312'...
      * Enabling super_read_only on '127.0.0.1:3313'...
      NOTE: Applications will now be blocked from performing writes on Cluster 'primary'. Use <Cluster>.unfenceWrites() to resume writes if you are certain a split-brain is not in effect.
      Cluster successfully fenced from write traffic
- To check that a primary cluster is fenced from write traffic, use the status command as follows:
      <Cluster>.clusterset.status()
  The output is similar to the following:
      clusterset.status()
      {
          "clusters": {
              "primary": {
                  "clusterErrors": [
                      "WARNING: Cluster is fenced from Write traffic. Use cluster.unfenceWrites() to unfence the Cluster."
                  ],
                  "clusterRole": "PRIMARY",
                  "globalStatus": "OK_FENCED_WRITES",
                  "primary": null,
                  "status": "FENCED_WRITES",
                  "statusText": "Cluster is fenced from Write Traffic."
              },
              "replica": {
                  "clusterRole": "REPLICA",
                  "clusterSetReplicationStatus": "OK",
                  "globalStatus": "OK"
              }
          },
          "domainName": "primary",
          "globalPrimaryInstance": null,
          "primaryCluster": "primary",
          "status": "UNAVAILABLE",
          "statusText": "Primary Cluster is fenced from write traffic."
      }
- To unfence a cluster and resume write traffic to a primary cluster, use the <Cluster>.unfenceWrites() command as follows:
      <Cluster>.unfenceWrites()
  Automatic super_read_only management is re-enabled on the primary cluster, and super_read_only is disabled on the primary instance of the primary cluster.
      cluster.unfenceWrites()
      The Cluster 'primary' will be unfenced from write traffic
      * Enabling automatic super_read_only management on the Cluster...
      * Disabling super_read_only on the primary '127.0.0.1:3311'...
      Cluster successfully unfenced from write traffic
- To fence a cluster from all traffic, use the <Cluster>.fenceAllTraffic() command as follows:
      <Cluster>.fenceAllTraffic()
  super_read_only is enabled on the primary instance of the cluster before offline_mode is enabled on all instances in the cluster:
      cluster.fenceAllTraffic()
      The Cluster 'primary' will be fenced from all traffic
      * Enabling super_read_only on the primary '127.0.0.1:3311'...
      * Enabling offline_mode on the primary '127.0.0.1:3311'...
      * Enabling offline_mode on '127.0.0.1:3312'...
      * Stopping Group Replication on '127.0.0.1:3312'...
      * Enabling offline_mode on '127.0.0.1:3313'...
      * Stopping Group Replication on '127.0.0.1:3313'...
      * Stopping Group Replication on the primary '127.0.0.1:3311'...
      Cluster successfully fenced from all traffic
- To unfence a cluster from all traffic, use the MySQL Shell command dba.rebootClusterFromCompleteOutage(). When the cluster has been restored and you are asked whether to rejoin an instance to the cluster, answer Y to rejoin it:
      cluster = dba.rebootClusterFromCompleteOutage()
      Restoring the cluster 'primary' from complete outage...
      The instance '127.0.0.1:3312' was part of the cluster configuration.
      Would you like to rejoin it to the cluster? [y/N]: Y
      The instance '127.0.0.1:3313' was part of the cluster configuration.
      Would you like to rejoin it to the cluster? [y/N]: Y
      * Waiting for seed instance to become ONLINE...
      127.0.0.1:3311 was restored.
      Rejoining '127.0.0.1:3312' to the cluster.
      Rejoining instance '127.0.0.1:3312' to cluster 'primary'...
      The instance '127.0.0.1:3312' was successfully rejoined to the cluster.
      Rejoining '127.0.0.1:3313' to the cluster.
      Rejoining instance '127.0.0.1:3313' to cluster 'primary'...
      The instance '127.0.0.1:3313' was successfully rejoined to the cluster.
      The cluster was successfully rebooted.
      <Cluster:primary>
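The status check in the steps above can also be scripted rather than read by eye. The following is a minimal sketch under stated assumptions: `findWriteFencedClusters` and `sampleStatus` are hypothetical names that are not part of AdminAPI, and the helper assumes it receives a plain object with the same shape as the clusterset.status() output shown in this section. The sample data is an abridged copy of that output.

```javascript
// Hypothetical helper, not part of AdminAPI. It expects a plain object
// shaped like the clusterset.status() output shown in this section and
// returns the names of clusters whose status is "FENCED_WRITES".
function findWriteFencedClusters(clusterSetStatus) {
  const clusters = clusterSetStatus.clusters || {};
  return Object.keys(clusters).filter(
    (name) => clusters[name].status === "FENCED_WRITES"
  );
}

// Sample data: an abridged copy of the status output in this section.
const sampleStatus = {
  clusters: {
    primary: {
      clusterRole: "PRIMARY",
      globalStatus: "OK_FENCED_WRITES",
      primary: null,
      status: "FENCED_WRITES",
      statusText: "Cluster is fenced from Write Traffic.",
    },
    replica: {
      clusterRole: "REPLICA",
      clusterSetReplicationStatus: "OK",
      globalStatus: "OK",
    },
  },
  primaryCluster: "primary",
  status: "UNAVAILABLE",
};

// Names of the clusters in the sample that are fenced from writes.
const writeFenced = findWriteFencedClusters(sampleStatus);
```

With the sample data, `writeFenced` contains only `primary`. The replica entry has no `status` field in the output shown, so the helper skips it.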