
Original article: https://www.freecodecamp.org/news/viewstamped-replication-revisited-a-summary-144ac94bd16f/


by Shubheksha

Want to learn how Viewstamped Replication works? Read this summary.

This article will distill the contents of the academic paper Viewstamped Replication Revisited by Barbara Liskov and James Cowling. All quotations are taken from that paper.

It presents an updated explanation of Viewstamped Replication, a replication technique that handles failures in which nodes crash. It describes how client requests are handled, how the group reorganizes when a replica fails, and how a failed replica is able to rejoin the group.

Introduction

The Viewstamped Replication protocol, referred to as VR, is used for replicated services that run on many nodes known as replicas. VR uses state machine replication: it maintains state and makes it accessible to the clients consuming that service.

Some features of VR:

VR is primarily a replication protocol, but it provides consensus too.
VR doesn’t use any disk I/O; it uses replicated state for persistence.
VR deals only with crash failures: a node is either functioning or has stopped completely.
VR works in an asynchronous network, like the internet, where nothing can be concluded about a message that doesn’t arrive. A message may be lost, delivered out of order, or delivered many times.

Replica Groups

VR ensures reliability and availability when no more than a threshold of f replicas are faulty. It does this by using replica groups of size 2f + 1; this is the minimal number of replicas in an asynchronous network under the crash failure model.

We can give a simple argument for this: to keep functioning with f crashed nodes, the system still needs a majority of f + 1 live nodes that can agree with one another, which gives f + (f + 1) = 2f + 1 nodes in total.

A group of f+1 replicas is often known as a quorum. The protocol needs the quorum intersection property to be true to work correctly. This property states that:

The quorum of replicas that processes a particular step of the protocol must have a non-empty intersection with the group of replicas available to handle the next step, since this way we can ensure that at each next step at least one participant knows what happened in the previous step.
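
The sizes above can be checked with a little arithmetic: two quorums of f + 1 replicas hold 2f + 2 members between them, one more than the 2f + 1 replicas in the group, so they must share at least one replica. The following is a minimal sketch that confirms this by brute force for small f. The function names are illustrative and do not come from the paper.

```python
from itertools import combinations


def group_size(f: int) -> int:
    """Smallest replica group that tolerates f crash failures."""
    return 2 * f + 1


def quorum_size(f: int) -> int:
    """A quorum is any f + 1 replicas, i.e. a majority of 2f + 1."""
    return f + 1


def quorums_always_intersect(f: int) -> bool:
    """Brute-force check that every pair of quorums shares a replica."""
    replicas = range(group_size(f))
    quorums = [set(q) for q in combinations(replicas, quorum_size(f))]
    return all(a & b for a in quorums for b in quorums)
```

For example, with f = 2 the group has 5 replicas and every pair of 3-replica quorums overlaps.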

Architecture

The architecture of VR is as follows:

The user code is run on client machines on top of a VR proxy.
The proxy communicates with the replicas to carry out the operations requested by the client. It returns the computed results from the replicas back to the client.
The VR code on the replica side accepts client requests from the proxy, executes the protocol, and executes the request by making an up-call to the service code.
The service code returns the result to the VR code, which in turn sends a message to the client proxy that requested the operation.

Overview

The challenge for the replication protocol is to ensure that operations execute in the same order at all replicas in spite of concurrent requests from clients and in spite of failures.

For all replicas to end up in the same state, this condition must hold.

VR assigns each replica one of two roles:

Primary: decides the order in which the operations will be executed.

Secondary (also called a backup): carries out the operations in the order selected by the primary.

What if the primary fails?

VR allows different replicas to take over the role of primary over time, so the group can continue when the current primary fails.
The system moves through a series of views. In each view, one replica assumes the role of primary.
The other replicas watch the primary. If it appears to be faulty, they carry out a view change to select a new primary.

We consider the following three scenarios of the VR protocol:

Normal-case processing of user requests
View changes to select a new primary
Recovery of a failed replica so that it can rejoin the group

VR protocol

The original article shows the state maintained by each replica in a figure, which is not reproduced here. Some points to note:

The identity of the primary isn’t stored; it is computed from the view number and the configuration.
Replicas are numbered by IP address: the replica with the smallest IP is replica 1, and so on.
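
As a rough illustration of computing the primary rather than storing it, the sketch below sorts the configuration by numeric IP address and lets the view number select a slot round-robin. The modulo rule and both function names are assumptions made for this example; the summary only says that the primary is derived from the view number and the configuration.

```python
import ipaddress


def sorted_configuration(addresses: list[str]) -> list[str]:
    """Order replicas by numeric IP address, smallest first."""
    return sorted(addresses, key=ipaddress.ip_address)


def primary_for_view(view_number: int, configuration: list[str]) -> str:
    """Assumed round-robin rule: the view number selects a slot in the sorted configuration."""
    return configuration[view_number % len(configuration)]
```

Sorting with `ipaddress.ip_address` rather than as plain strings keeps `10.0.0.9` ahead of `10.0.0.10`.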

The client side proxy also maintains some state:

It records the configuration.
It records the current view number, which lets it track the primary.
It has a client id and an incrementing client request number.

Normal Operation

Replicas participate in processing client requests only when their status is normal.
Each message contains the sender’s view number. A replica processes only messages whose view number matches its own. If the sender is behind, the replica drops the message. If the sender is ahead, the replica performs a state transfer to catch up.
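
The view-number check can be sketched as a small decision function. It assumes the rule from the paper for normal-case messages: a message from an older view is dropped, and a message from a newer view means the receiver is behind and needs a state transfer. The `Action` enum and `classify_message` are hypothetical names, and the sketch does not cover view-change or recovery messages.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"
    DROP = "drop"
    STATE_TRANSFER = "state transfer"


def classify_message(own_view: int, own_status: str, message_view: int) -> Action:
    """Decide what a replica does with a normal-case protocol message."""
    if own_status != "normal":
        return Action.DROP            # not taking part in request processing
    if message_view < own_view:
        return Action.DROP            # the sender is behind
    if message_view > own_view:
        return Action.STATE_TRANSFER  # the receiver is behind and must catch up
    return Action.PROCESS
```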

The normal operation of VR can be broken down into the following steps:

The client sends a REQUEST message to the primary asking it to perform some operation, passing along its client-id and request number.
The primary checks the request against its client table. If the request number is smaller than the one recorded in the table, it discards the request. If the request is the most recently executed one from that client, it re-sends the response.
The primary increases the op-number, appends the request to its log, and updates the client table with the new request number. It sends a PREPARE message to the replicas with the current view-number, the operation-number, the client’s message, and the commit-number (the operation number of the most recently committed operation).
A replica won’t accept a PREPARE for an op-number until its log contains all the operations preceding it; it uses state transfer to catch up if required. It then adds the operation to its log, updates its client table, and sends a PREPAREOK message to the primary. This message indicates that the operation, including all the preceding ones, has been prepared successfully.
The primary waits for PREPAREOK responses from f replicas before committing the operation, then increments its commit-number. After making sure all operations preceding the current one have been executed, it makes an up-call to the service code to execute the current operation. It sends a REPLY message to the client containing the view-number, request-number, and the result of the up-call.
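
The steps above can be sketched from the primary’s point of view. This is a simplified, in-memory illustration rather than a faithful implementation: the class, its method names, and the tuple-shaped messages are all assumptions, request numbers are assumed to start at 1, and networking, timeouts, and view changes are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    f: int                                            # failure threshold; group size is 2f + 1
    view_number: int = 0
    op_number: int = 0
    commit_number: int = 0
    log: list = field(default_factory=list)           # (client_id, request_number, operation)
    client_table: dict = field(default_factory=dict)  # client_id -> (request_number, reply)
    acked: dict = field(default_factory=dict)         # backup_id -> highest op-number prepared

    def on_request(self, client_id, request_number, operation):
        last_request, last_reply = self.client_table.get(client_id, (0, None))
        if request_number < last_request:
            return None                               # stale request: discard
        if request_number == last_request:
            return last_reply                         # duplicate: stored reply, or None if pending
        self.op_number += 1
        self.log.append((client_id, request_number, operation))
        self.client_table[client_id] = (request_number, None)
        return ("PREPARE", self.view_number, self.op_number,
                (client_id, request_number, operation), self.commit_number)

    def on_prepare_ok(self, op_number, backup_id, execute):
        # A PREPAREOK for op n also covers every operation before n.
        self.acked[backup_id] = max(self.acked.get(backup_id, 0), op_number)
        replies = []
        while self.commit_number < self.op_number:
            next_op = self.commit_number + 1
            if sum(1 for n in self.acked.values() if n >= next_op) < self.f:
                break                                 # fewer than f backups have prepared it
            self.commit_number = next_op
            client_id, request_number, operation = self.log[next_op - 1]
            result = execute(operation)               # up-call to the service code
            reply = ("REPLY", self.view_number, request_number, result)
            self.client_table[client_id] = (request_number, reply)
            replies.append((client_id, reply))
        return replies
```

Tracking the highest prepared op-number per backup, instead of a separate acknowledgement set per operation, keeps commits in log order without extra bookkeeping.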

The primary usually informs the backups of committed operations through the commit-number in its next PREPARE message. When no PREPARE is pending, it can send a COMMIT message instead.

To execute a request, a backup has to make sure that the operation is present in its log and that all the previous operations have been executed. Then it executes the said operation, increments its commit-number, and updates the client’s entry in the client-table. But it doesn’t send a reply to the client, as the primary has already done that.
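
The backup’s side of a commit can be sketched in the same spirit. The `Backup` class and `on_commit_number` are hypothetical names; the sketch assumes the commit-number arrives in a PREPARE or COMMIT message and stops at the end of the log, where a real replica would fall back to state transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    commit_number: int = 0
    log: list = field(default_factory=list)           # (client_id, request_number, operation)
    client_table: dict = field(default_factory=dict)  # client_id -> (request_number, result)

    def on_commit_number(self, primary_commit_number, execute):
        """Execute logged operations in order, up to the primary's commit-number."""
        while (self.commit_number < primary_commit_number
               and self.commit_number < len(self.log)):
            client_id, request_number, operation = self.log[self.commit_number]
            result = execute(operation)               # up-call to the service code
            self.commit_number += 1
            self.client_table[client_id] = (request_number, result)
            # No REPLY is sent here: the primary has already answered the client.
        return self.commit_number
```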

If a client doesn’t receive a timely response to a request, it re-sends the request to all replicas. This way if the group has moved to a later view, its message will reach the new primary. Backups ignore client requests; only the primary processes them.

View change operation

Backups monitor the primary: they expect to hear from it regularly. Normally the primary is sending PREPARE messages, but if it is idle (due to no requests) it sends COMMIT messages instead. If a timeout expires without a communication from the primary, the replicas carry out a view change to switch to a new primary.

There is no leader election in this protocol. The primary is selected in round-robin fashion. Each member has a unique IP address. The next primary is the functioning backup replica with the smallest IP. Each member of the group already knows who is expected to be the next primary.

Every executed operation at the replicas must survive the view change in the order specified when it was executed. The up-call is carried out at the primary only after it receives f PREPAREOK messages. Thus the operation has been recorded in the logs of at least f+1 replicas (the old primary and f replicas).

Therefore the view change protocol obtains information from the logs of at least f + 1 replicas. This is sufficient to ensure that all committed operations will be known, since each must be recorded in at least one of these logs; here we are relying on the quorum intersection property. Operations that had not committed might also survive, but this is not a problem: it is beneficial to have as many operations survive as possible.

A replica that notices the need for a view change advances its view-number, sets its status to view-change, and sends a START-VIEW-CHANGE message to the other replicas. A replica identifies the need for a view change based on its own timer, or because it receives a START-VIEW-CHANGE or a DO-VIEW-CHANGE from others with a view-number higher than its own.
When a replica receives f START-VIEW-CHANGE messages for its view-number, it sends a DO-VIEW-CHANGE message to the node expected to be the new primary. This message contains the replica’s state: its log, its most recent op-number and commit-number, and the number of the last view in which its status was normal.
The new primary waits to receive f+1 DO-VIEW-CHANGE messages from the replicas (including itself). Then it updates its state to the most recent based on the info from replicas (see paper for all rules). It sets its view-number to the one in the messages and changes its status to normal. It informs all other replicas by sending a STARTVIEW message with the most recent state, including the new log, commit-number and op-number.
The new primary can now accept client requests. It executes any committed operations it has not yet executed and sends the replies to clients.
When the replicas receive a STARTVIEW message, they update their state based on the message. They send PREPAREOK messages for all uncommitted operations present in their log after the update. They then execute any committed operations they have not yet executed, to be in sync with the primary.
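
How the new primary picks its state from the DO-VIEW-CHANGE messages can be sketched as follows. The selection rule is the one given in the paper: take the log from the message with the largest last-normal view, break ties by the largest op-number, and use the largest commit-number received. The dataclass, the function name, and the assumption that all messages are for the same view and come from distinct replicas are choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class DoViewChange:
    view_number: int        # the new view being started
    log: list               # the sender's log
    last_normal_view: int   # last view in which the sender's status was normal
    op_number: int
    commit_number: int


def start_view_state(messages: list[DoViewChange], f: int):
    """Return (view_number, log, op_number, commit_number) for STARTVIEW, or None if too few messages."""
    if len(messages) < f + 1:
        return None
    # Prefer the most recent normal view, then the longest history within it.
    best = max(messages, key=lambda m: (m.last_normal_view, m.op_number))
    commit_number = max(m.commit_number for m in messages)
    return (best.view_number, list(best.log), best.op_number, commit_number)
```

A longer log from an older view does not win: in the tuple key, `last_normal_view` is compared before `op_number`.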

To make the view change operation more efficient, the paper describes the following approach:

The protocol described has a small number of steps, but big messages. We can make these messages smaller, but if we do, there is always a chance that more messages will be required. A reasonable way to get good behavior most of the time is for replicas to include a suffix of their log in their DO-VIEW-CHANGE messages. The amount sent can be small since the most likely case is that the new primary is up to date. Therefore sending the latest log entry, or perhaps the latest two entries, should be sufficient. Occasionally, this information won’t be enough; in this case the primary can ask for more information, and it might even need to first use application state to bring itself up to date.

Recovery

When a replica recovers after a crash it cannot participate in request processing and view changes until it has a state at least as recent as when it failed. If it could participate sooner than this, the system can fail.

The replica should not “forget” anything it has already done. One way to ensure this is to persist the state on disk — but this will slow down the whole system. This isn’t necessary in VR because the state is persisted at other replicas. It can be obtained by using a recovery protocol provided that the replicas are failure independent.

When a node comes back up after a crash it sets its status to recovering and carries out the recovery protocol. While a replica’s status is recovering it does not participate in either the request processing protocol or the view change protocol.

The recovery protocol is as follows:

The recovering replica sends a RECOVERY message containing a nonce to all other replicas.
A replica replies to the recovering replica with a RECOVERY-RESPONSE message only if its own status is normal. This message contains its view number and the nonce it received. If it’s the primary, it also sends its log, op-number, and commit-number.
When the replica has received f+1 RECOVERY-RESPONSE messages, including one from the primary, it updates its state and changes its status to normal.

The protocol uses the nonce to ensure that the recovering replica accepts only RECOVERY-RESPONSE messages that are for this recovery and not an earlier one.
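
The recovering replica’s bookkeeping can be sketched like this. It assumes, following the paper, that the f + 1 responses must come from different replicas and that the primary’s response must belong to the latest view seen among them. `RecoveryResponse`, `new_nonce`, and `try_finish_recovery` are hypothetical names.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryResponse:
    view_number: int
    nonce: str
    sender: int
    log: Optional[list] = None            # only the primary fills in these three fields
    op_number: Optional[int] = None
    commit_number: Optional[int] = None


def new_nonce() -> str:
    """A fresh random value that ties responses to this recovery attempt."""
    return secrets.token_hex(16)


def try_finish_recovery(responses, nonce, f):
    """Return (view_number, log, op_number, commit_number) once recovery can complete, else None."""
    # Keep one response per sender and ignore any that carry a different nonce.
    valid = {r.sender: r for r in responses if r.nonce == nonce}
    if len(valid) < f + 1:
        return None
    latest_view = max(r.view_number for r in valid.values())
    for r in valid.values():
        if r.view_number == latest_view and r.log is not None:
            return (r.view_number, list(r.log), r.op_number, r.commit_number)
    return None                           # still waiting for the primary of the latest view
```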

Reconfiguration

Reconfiguration deals with epochs. The epoch represents the group of replicas processing client requests. If the threshold for failures, f, is adjusted, the system can either add or remove replicas and transition to a new epoch. It keeps track of epochs through the epoch-number.

Another status, namely transitioning, is used to signify that a system is moving between epochs.

The approach to handling reconfiguration is as follows. A reconfiguration is triggered by a special client request. This request is run through the normal case protocol by the old group. When the request commits, the system moves to a new epoch, in which responsibility for processing client requests shifts to the new group. However, the new group cannot process client requests until its replicas are up to date: the new replicas must know all operations that committed in the previous epoch. To get up to date they transfer state from the old replicas, which do not shut down until the state transfer is complete.

The VR sub protocols need to be modified to deal with epochs. A replica doesn’t accept messages carrying an epoch-number older than the one it knows. Instead, it informs the sender about the new epoch.
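
The epoch check can be sketched as one more guard in front of message handling. The function name and the returned tuples are illustrative. The summary does not say what happens when a message carries a newer epoch, so the sketch only flags that case.

```python
def check_epoch(own_epoch: int, message_epoch: int):
    """Return a (decision, epoch) pair for an incoming message."""
    if message_epoch < own_epoch:
        # Reject the message and tell the sender which epoch is current.
        return ("REJECT_AND_NOTIFY", own_epoch)
    if message_epoch > own_epoch:
        # Assumption: the receiver is the one that is out of date and must catch up.
        return ("RECEIVER_BEHIND", message_epoch)
    return ("ACCEPT", own_epoch)
```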

During a view change, the new primary cannot accept client requests if the system is transitioning between epochs. It detects this by checking whether the topmost request in its log is a RECONFIGURATION request. A recovering replica in an older epoch is informed of the new epoch; it then either takes part, if it belongs to the new group, or shuts down.

The issue that comes to mind is that the client requests can’t be served while the system is moving to a new epoch.

The old group stops accepting client requests the moment the primary of the old group receives the RECONFIGURATION request; the new group can start processing client requests only when at least f + 1 new replicas have completed state transfer.

This can be dealt with by “warming up” the nodes before reconfiguration happens. The nodes can be brought up-to-date using state transfer while the old group continues to reply to client requests. This reduces the delay caused during reconfiguration.

This paper has presented an improved version of Viewstamped Replication, a protocol used to build replicated systems that are able to tolerate crash failures. The protocol does not require any disk writes as client requests are processed or even during view changes, yet it allows nodes to recover from failures and rejoin the group.

The paper also presents a protocol to allow for reconfigurations that change the members of the replica group, and even the failure threshold. A reconfiguration technique is necessary for the protocol to be deployed in practice since the systems of interest are typically long lived.
