[AlwaysOn] Failover and Failover Modes of AlwaysOn Availability Groups 6

5. Forced Failover (with Possible Data Loss)

Forcing failover of an availability group (with possible data loss) is a disaster recovery method that allows you to use a secondary replica as a warm standby server. Because forcing failover risks data loss, it should be used cautiously and sparingly. We recommend forcing failover only if you must restore service to your availability databases immediately and are willing to risk losing data. For more information about the prerequisites and recommendations for forcing failover, and for an example scenario that uses a forced failover to recover from a catastrophic failure, see Perform a Forced Manual Failover of an Availability Group (SQL Server).

Warning

Forcing failover requires that the WSFC cluster have quorum. For information about configuring quorum and forcing quorum, see Windows Server Failover Clustering (WSFC) with SQL Server.

5.1 How Forced Failover Works

Forcing failover initiates a transition of the primary role to a target replica whose role is in the SECONDARY or RESOLVING state. The failover target becomes the new primary replica and immediately serves its copies of the databases to clients. When the former primary replica becomes available, it transitions to the secondary role and its databases become secondary databases.
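As a minimal sketch of the transition described above, a forced failover can be issued with Transact-SQL on the server instance that hosts the target replica. The availability group name `MyAg` is a placeholder chosen for illustration; it does not come from this article.

```sql
-- Run on the server instance that hosts the failover target.
-- [MyAg] is a placeholder availability group name (an assumption).

-- Confirm that the local replica is currently SECONDARY or RESOLVING.
SELECT ag.name AS availability_group,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_groups AS ag
  ON ag.group_id = ars.group_id
WHERE ars.is_local = 1;

-- Force failover to this replica, accepting that data loss is possible.
ALTER AVAILABILITY GROUP [MyAg] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```

Running the role query again afterward should show the local replica in the PRIMARY role.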

All secondary databases (including the former primary databases, once they become available) are SUSPENDED. Depending on the previous data synchronization state of a suspended secondary database, it might be suitable for salvaging committed data that is missing from the corresponding new primary database. On a secondary replica that is configured for read-only access, you can query the secondary databases to manually discover missing data. You can then issue Transact-SQL statements on the new primary databases to make any necessary changes.

5.2 Risks of Forcing Failover

It is essential to understand that forcing failover can cause data loss. Data loss is possible because the target replica cannot communicate with the primary replica and therefore cannot guarantee that the databases are synchronized. Forcing failover starts a new recovery fork. Because the original primary databases and the secondary databases are on different recovery forks, each now contains data that the other does not: each original primary database contains whatever changes were not yet sent from its send queue to the former secondary database (the unsent log), and the former secondary databases contain whatever changes occur after failover was forced.

If failover is forced because the primary replica has failed, potential data loss depends on whether any transaction log had not yet been sent to the secondary replica before the failure. Under asynchronous-commit mode, accumulated unsent log is always a possibility. Under synchronous-commit mode, this is possible only until the secondary databases become synchronized.

The following table summarizes the possibility of data loss for a particular database on the replica to which you force failover.

| Availability mode of secondary replica | Is database synchronized? | Is data loss possible? |
|---|---|---|
| Synchronous-commit | Yes | No |
| Synchronous-commit | No | Yes |
| Asynchronous-commit | No | Yes |
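The inputs to the table above can be read from catalog and dynamic management views on the replica you are considering as the failover target. The following is a sketch only; how you interpret the result for your own environment is an assumption you need to validate.

```sql
-- Run on the secondary replica that you are considering as the failover target.
SELECT ar.replica_server_name,
       ar.availability_mode_desc,        -- SYNCHRONOUS_COMMIT or ASYNCHRONOUS_COMMIT
       drcs.database_name,
       drs.synchronization_state_desc,   -- for example SYNCHRONIZED or SYNCHRONIZING
       drcs.is_failover_ready            -- 1 = marked as synchronized in the cluster
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.dm_hadr_database_replica_states AS drs
  ON drs.replica_id = drcs.replica_id
 AND drs.group_database_id = drcs.group_database_id
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drcs.replica_id
WHERE drs.is_local = 1;
```

A database on a synchronous-commit replica that is reported as synchronized corresponds to the first row of the table; every other combination means data loss is possible. After quorum has been forced, the cluster-side value may be stale, as section 5.3 explains.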

Secondary databases track only two recovery forks, so if you perform multiple forced failovers, any secondary database that started data synchronization with the previous forced failover might not be able to resume. If this occurs, any secondary databases that cannot be resumed must be removed from the availability group, restored to the correct point in time, and rejoined to the availability group. Error 1408 with state 103 may be observed in this scenario (Error: 1408, Severity: 16, State: 103). A restore does not work across multiple recovery forks, so be sure to perform a log backup after performing more than one forced failover.
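To make the recovery-fork advice concrete, the sketch below inspects the fork identifiers of a database and then takes a log backup. The database name `MyDb` and the backup path are placeholders, not values from this article.

```sql
-- Run on the current primary replica.
-- [MyDb] and the backup path are placeholders (assumptions).

-- Inspect the recovery fork that the database is currently on.
SELECT DB_NAME(database_id) AS database_name,
       recovery_fork_guid,         -- fork on which the database is currently active
       first_recovery_fork_guid,   -- starting recovery fork
       fork_point_lsn              -- LSN of the current fork point, if any
FROM sys.database_recovery_status
WHERE database_id = DB_ID(N'MyDb');

-- Take a log backup after a forced failover.
BACKUP LOG [MyDb]
TO DISK = N'X:\Backup\MyDb_after_forced_failover.trn';
```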

5.3 Why Forced Failover Is Required After Forcing Quorum

After quorum is forced on the WSFC cluster (forced quorum), you need to perform a forced failover (with possible data loss) on each availability group. The forced failover is required because the real state values of the WSFC cluster might have been lost. Normal failovers must be prevented after a forced quorum because an unsynchronized secondary replica could appear to be synchronized on the reconfigured WSFC cluster.

For example, consider a WSFC cluster that hosts an availability group on three nodes: Node A hosts the primary replica, and Node B and Node C each host a secondary replica. Node C is disconnected from the WSFC cluster while its local secondary replica is SYNCHRONIZED. Node A and Node B retain a healthy quorum, and the availability group remains online. On Node A, the primary replica continues to accept updates, and on Node B, the secondary replica continues to synchronize with the primary replica. The secondary replica on Node C becomes unsynchronized and falls increasingly behind the primary replica. However, because Node C is disconnected, the replica remains, incorrectly, in the SYNCHRONIZED state.

If quorum is lost and is then forced on Node A, the synchronization state of the availability group on the WSFC cluster should be correct, with the secondary replica on Node C shown as UNSYNCHRONIZED. However, if quorum is forced on Node C, the synchronization state of the availability group will be incorrect. The synchronization state on the cluster will have reverted to what it was when Node C was disconnected, with the secondary replica on Node C incorrectly shown as SYNCHRONIZED. Because planned manual failovers guarantee the safety of the data, they are disallowed for bringing an availability group back online after quorum is forced.
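One way to check whether the cluster is currently running under normal or forced quorum is to query the cluster dynamic management views on a SQL Server instance hosted by the cluster. This is a sketch for orientation, not a complete health check.

```sql
-- Quorum type and state of the WSFC cluster as seen by this instance.
-- quorum_state_desc reports NORMAL_QUORUM or FORCED_QUORUM.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- State and quorum votes of each cluster member.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```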

5.4 Tracking Potential Data Loss

When the WSFC cluster has a healthy quorum, you can estimate the current potential for data loss on databases. For a given secondary replica, the current potential for data loss depends on how far the local secondary databases lag behind the corresponding primary databases. Because the amount of lag varies over time, we recommend that you periodically track potential data loss for your unsynchronized secondary databases. Tracking lag involves comparing the Last Commit LSN and Last Commit Time for each primary database and its secondary databases, as follows:

1. Connect to the primary replica.

2. Query the last_commit_lsn (LSN of the last committed transaction) and last_commit_time (time of the last commit) columns of the sys.dm_hadr_database_replica_states dynamic management view.

3. Compare the values returned for each primary database and each of its secondary databases. The difference between their Last Commit LSNs indicates the amount of lag.

4. You can trigger an alert when the amount of lag on a database or set of databases exceeds your desired maximum lag for a given period of time. For example, the query can be run by a job that executes every minute on each primary database. If the difference between the last_commit_time of a primary database and that of any of its secondary databases has exceeded the recovery point objective (RPO) (for example, 5 minutes) since the last time the job executed, the job can raise an alert.
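The steps above can be sketched as a single Transact-SQL batch that a scheduled job could run on the primary replica. The temporary table name `#commit_lag`, the variable `@rpo_seconds`, and the message text are illustrative assumptions; the 300-second threshold mirrors the 5-minute RPO used as an example above.

```sql
-- Run on the primary replica, where rows for all replicas are visible.
DECLARE @rpo_seconds int = 300;  -- example RPO of 5 minutes (assumption)

SELECT adc.database_name,
       ar.replica_server_name AS secondary_replica,
       p.last_commit_lsn      AS primary_last_commit_lsn,
       s.last_commit_lsn      AS secondary_last_commit_lsn,
       p.last_commit_time     AS primary_last_commit_time,
       s.last_commit_time     AS secondary_last_commit_time,
       DATEDIFF(SECOND, s.last_commit_time, p.last_commit_time) AS lag_seconds
INTO #commit_lag
FROM sys.dm_hadr_database_replica_states AS p
JOIN sys.dm_hadr_availability_replica_states AS prs
  ON prs.replica_id = p.replica_id
 AND prs.role_desc = N'PRIMARY'             -- p = rows of the primary replica
JOIN sys.dm_hadr_database_replica_states AS s
  ON s.group_database_id = p.group_database_id
 AND s.replica_id <> p.replica_id           -- s = rows of each secondary replica
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = s.replica_id
JOIN sys.availability_databases_cluster AS adc
  ON adc.group_database_id = p.group_database_id;

-- Step 3: compare the values for each primary/secondary pair.
SELECT * FROM #commit_lag ORDER BY lag_seconds DESC;

-- Step 4: raise an error that a job or alert can react to.
IF EXISTS (SELECT 1 FROM #commit_lag WHERE lag_seconds > @rpo_seconds)
    RAISERROR(N'Availability database commit lag exceeds the RPO.', 16, 1);

DROP TABLE #commit_lag;
```

Comparing commit times rather than raw LSN values keeps the threshold in the same unit as the RPO; the LSN columns are returned alongside for manual inspection.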

Important

When the WSFC cluster lacks quorum or quorum has been forced, last_commit_lsn and last_commit_time are NULL. For information about how you might be able to avoid data loss after you have forced quorum, see "Potential Ways to Avoid Data Loss After Quorum is Forced" in Perform a Forced Manual Failover of an Availability Group (SQL Server).

5.5 Managing the Potential Data Loss

After failover is forced, all secondary databases are suspended. This includes the former primary databases, after the former primary replica comes back online and discovers that it is now a secondary replica. You must manually resume each suspended database individually on each secondary replica.
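As a sketch of the manual resume step, the following lists the suspended databases on a secondary replica and resumes one of them. The database name `MyDb` is a placeholder.

```sql
-- Run on each secondary replica.
-- List local availability databases that are currently suspended.
SELECT DB_NAME(database_id) AS database_name,
       is_suspended,
       suspend_reason_desc,
       synchronization_state_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1
  AND is_suspended = 1;

-- Resume data movement for one database; repeat for each suspended database.
-- [MyDb] is a placeholder name (assumption).
ALTER DATABASE [MyDb] SET HADR RESUME;
```

Resuming a former primary database discards its unsent changes, so decide whether that loss is acceptable before you run the RESUME statement.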

Once the former primary replica is available, and assuming that its databases are undamaged, you can attempt to manage the potential data loss. The available approach depends on whether the original primary replica has connected to the new primary replica. If the original primary replica can access the new primary instance, reconnecting occurs automatically and transparently.

The Original Primary Replica Has Reconnected

Typically, after a failure, the original primary replica quickly reconnects to its partner when it restarts. On reconnecting, the original primary replica becomes a secondary replica. Its databases become secondary databases and enter the SUSPENDED state. The new secondary databases are not rolled back unless you resume them.

However, the suspended databases are inaccessible, so you cannot inspect them to evaluate what data would be lost if you were to resume a given database. Therefore, the decision on whether to resume or remove a secondary database depends on whether you are willing to accept any data loss, as follows:

·        If losing any data would be unacceptable, you should remove the databases from the availability group to salvage them.

The database administrator can now recover the former primary databases and attempt to recover the data that would have been lost. However, when a former primary database comes online, it is divergent from the current primary database, so the database administrator needs to make either the removed database or the current primary database inaccessible to clients to avoid further divergence of the databases and to prevent client-failover issues.

·        If losing data would be acceptable to your business goals, you can resume the secondary databases.

Resuming a new secondary database causes it to be rolled back as the first step in synchronizing the database. If any log records were waiting in the send queue at the time of failure, the corresponding transactions are lost, even if they were committed.
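For the first option above (removing a database to salvage it), a minimal sketch looks like the following. The database name `MyDb` is a placeholder, and whether to take any backups before recovering the database is a decision this sketch leaves to you.

```sql
-- Run on the server instance that hosts the former primary database,
-- which is now a suspended secondary database.
-- [MyDb] is a placeholder name (assumption).

-- Remove the local database from the availability group.
-- The database is left in the RESTORING state.
ALTER DATABASE [MyDb] SET HADR OFF;

-- Recover the removed database so that its data can be inspected.
-- Restrict client access first, because the database now diverges
-- from the current primary database.
RESTORE DATABASE [MyDb] WITH RECOVERY;
```

For the second option, resuming the database with `ALTER DATABASE ... SET HADR RESUME` is sufficient; no removal is needed.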

The Original Primary Replica Has Not Reconnected

If you can temporarily prevent the original primary replica from reconnecting over the network to the new primary replica, you can inspect the original primary databases to evaluate what data would be lost if they were resumed.

·        If the potential data loss is acceptable

Allow the original primary replica to reconnect to the new primary replica. Reconnecting causes the new secondary databases to be suspended. To start data synchronization on a database, simply resume it. The new secondary replica drops the original recovery fork for that database, losing any transactions that were never sent to or received by the former secondary replica.

·        If the data loss is unacceptable

If the original primary database contains critical data that would be lost if you resumed the suspended database, you can preserve the data on the original primary database by removing it from the availability group. This causes the database to enter the RESTORING state. At this point, we recommend that you attempt to back up the tail of the removed database's log. Then you can update the current primary database (the former secondary database) by exporting the data you want to salvage from the original primary database and importing it into the current primary database. We recommend taking a full database backup of the updated primary database as quickly as possible.

Then, on the server instance that hosts the new secondary replica, you can delete the suspended secondary database and create a new secondary database by restoring this backup (and at least one subsequent log backup) using RESTORE WITH NORECOVERY. We recommend delaying additional log backups of the current primary databases until the corresponding secondary databases are resumed.
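The backup-and-restore sequence described above can be sketched as follows. The names `MyDb` and `MyAg` and the backup share path are placeholders, and the sketch assumes that the file layout is identical on both instances; otherwise the restore would need MOVE clauses.

```sql
-- On the current primary replica: back up the updated primary database.
-- [MyDb], [MyAg], and the share path are placeholders (assumptions).
BACKUP DATABASE [MyDb]
TO DISK = N'\\FileShare\Backup\MyDb_full.bak' WITH INIT;

BACKUP LOG [MyDb]
TO DISK = N'\\FileShare\Backup\MyDb_log.trn' WITH INIT;

-- On the server instance that hosts the new secondary replica:
-- remove and delete the suspended secondary database.
ALTER DATABASE [MyDb] SET HADR OFF;
DROP DATABASE [MyDb];

-- Create the new secondary database from the backups.
RESTORE DATABASE [MyDb]
FROM DISK = N'\\FileShare\Backup\MyDb_full.bak' WITH NORECOVERY;

RESTORE LOG [MyDb]
FROM DISK = N'\\FileShare\Backup\MyDb_log.trn' WITH NORECOVERY;

-- Join the restored database to the availability group.
ALTER DATABASE [MyDb] SET HADR AVAILABILITY GROUP = [MyAg];
```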

Warning

Transaction log truncation is delayed on a primary database while any of its secondary databases is suspended. Also, the synchronization health of a synchronous-commit secondary replica cannot transition to HEALTHY as long as any local database remains suspended.
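To see whether the effects described in this warning currently apply, you could run checks along the following lines on the primary replica. This is a sketch; the interpretation of the results for your environment is an assumption you need to confirm.

```sql
-- Databases whose log truncation is currently held up by an availability replica.
SELECT name,
       log_reuse_wait_desc
FROM sys.databases
WHERE log_reuse_wait_desc = N'AVAILABILITY_REPLICA';

-- Synchronization health of each replica as seen from the primary replica.
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       ars.synchronization_health_desc   -- HEALTHY, PARTIALLY_HEALTHY, or NOT_HEALTHY
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = ars.replica_id;
```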


Source: "ITPUB Blog", link: http://blog.itpub.net/81227/viewspace-2655027/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
