ZooKeeper Recipes and Solutions: Two-phased Commit and Leader Election

Two-phased Commit

A two-phase commit protocol is an algorithm that lets all clients in a distributed system agree either to commit a transaction or to abort it.

In ZooKeeper, you can implement a two-phased commit by having a coordinator create a transaction node, say “/app/Tx”, and one child node per participating site, say “/app/Tx/s_i”. When the coordinator creates a child node, it leaves the content undefined. Once each site involved in the transaction receives the transaction from the coordinator, the site reads each child node and sets a watch on it. Each site then processes the query and votes “commit” or “abort” by writing to its own node. Once the write completes, the other sites are notified, and as soon as all sites have all votes, they can decide either “abort” or “commit”. Note that a site can decide “abort” earlier, as soon as any site votes “abort”.
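The decision rule each site applies once it has read the child nodes can be sketched as a small pure function. This is only an illustration of the rule described above, not ZooKeeper client code: the name `decide` and the representation of an undefined node as `None` are assumptions made for the example.

```python
from typing import Mapping, Optional

COMMIT = "commit"
ABORT = "abort"


def decide(votes: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return "commit", "abort", or None if no decision is possible yet.

    `votes` maps each child node name (e.g. "s_1") to its content:
    None while the site has not voted (content undefined),
    otherwise "commit" or "abort". This mapping is a hypothetical
    stand-in for reading the children of /app/Tx.
    """
    values = list(votes.values())
    if ABORT in values:
        return ABORT   # a single abort vote settles the outcome early
    if values and all(v == COMMIT for v in values):
        return COMMIT  # every site has voted commit
    return None        # still waiting for at least one vote


print(decide({"s_1": "commit", "s_2": None}))      # None
print(decide({"s_1": "abort", "s_2": None}))       # abort
print(decide({"s_1": "commit", "s_2": "commit"}))  # commit
```

Every site runs the same rule over the same votes, so all sites reach the same decision without further coordination.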

An interesting aspect of this implementation is that the only role of the coordinator is to decide upon the group of sites, to create the ZooKeeper nodes, and to propagate the transaction to the corresponding sites. In fact, even propagating the transaction can be done through ZooKeeper by writing it in the transaction node.
There are two important drawbacks of the approach described above. One is the message complexity, which is O(n²). The second is the impossibility of detecting failures of sites through ephemeral nodes. To detect the failure of a site using ephemeral nodes, it is necessary that the site itself create the node, whereas in this approach the coordinator creates the child nodes.

To solve the first problem, you can have only the coordinator notified of changes to the transaction nodes, and then have it notify the sites once it reaches a decision. Note that this approach is more scalable, but it is slower too, as it requires all communication to go through the coordinator.
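To make the difference in message complexity concrete, here is a rough count of watch notifications under both schemes. The counting model is an assumption made for illustration: it counts one notification per watcher per vote, plus one decision message per site in the coordinator scheme, and ignores the reads needed to fetch votes and re-set watches.

```python
def notifications_all_watch(n: int) -> int:
    """Every site watches every child node.

    Each of the n votes notifies the other n - 1 sites.
    """
    return n * (n - 1)


def notifications_via_coordinator(n: int) -> int:
    """Only the coordinator watches the child nodes.

    n vote notifications reach the coordinator, which then sends
    n decision messages, one per site.
    """
    return 2 * n


for n in (3, 10, 100):
    print(n, notifications_all_watch(n), notifications_via_coordinator(n))
```

Under this model the first scheme grows quadratically with the number of sites while the second grows linearly, at the cost of an extra hop through the coordinator before any site learns the outcome.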

To address the second problem, you can have the coordinator propagate the transaction to the sites, and have each site create its own ephemeral node.

Leader Election

A simple way of doing leader election with ZooKeeper is to use the SEQUENCE|EPHEMERAL flags when creating znodes that represent “proposals” of clients. The idea is to have a znode, say “/election”, such that each client creates a child znode “/election/n_” with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper automatically appends a sequence number that is greater than any one previously appended to a child of “/election”. The process that created the znode with the smallest appended sequence number is the leader.
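As a small illustration, the leader can be picked out of a list of child names by comparing the appended sequence numbers. ZooKeeper appends the counter as a zero-padded 10-digit decimal; the helper names `sequence_number` and `current_leader` are hypothetical, and the child list stands in for what getChildren on “/election” would return.

```python
from typing import List


def sequence_number(child: str) -> int:
    """Extract the appended counter, e.g. "n_0000000007" -> 7."""
    return int(child.rsplit("_", 1)[1])


def current_leader(children: List[str]) -> str:
    """Return the child znode with the smallest sequence number."""
    return min(children, key=sequence_number)


children = ["n_0000000012", "n_0000000003", "n_0000000007"]
print(current_leader(children))  # n_0000000003
```

Comparing parsed integers rather than raw strings keeps the ordering correct even if the children carry different prefixes before the final underscore.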

That’s not all, though. It is important to watch for failures of the leader, so that a new client arises as the new leader in case the current leader fails. A trivial solution is to have all application processes watch the current smallest znode and check whether they are the new leader when the smallest znode goes away (note that the smallest znode will go away if the leader fails, because the node is ephemeral). But this causes a herd effect: upon failure of the current leader, all other processes receive a notification and execute getChildren on “/election” to obtain the current list of children of “/election”. If the number of clients is large, this causes a spike in the number of operations that ZooKeeper servers have to process. To avoid the herd effect, it is sufficient for each client to watch only the next znode down in the sequence of znodes, that is, the one immediately preceding its own. If a client receives a notification that the znode it is watching is gone, then it becomes the new leader in the case that there is no smaller znode. Note that this avoids the herd effect by not having all clients watch the same znode.
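The difference between the two watching strategies can be sketched by listing which clients are notified when the leader's znode disappears. This is a pure simulation over child names, not ZooKeeper client code; the function names are hypothetical.

```python
from typing import List


def seq(child: str) -> int:
    """Extract the appended counter, e.g. "n_0000000007" -> 7."""
    return int(child.rsplit("_", 1)[1])


def notified_if_all_watch_smallest(children: List[str]) -> List[str]:
    """Znodes whose owners are notified when the leader's znode is
    deleted, if every client watches the smallest znode."""
    ordered = sorted(children, key=seq)
    return ordered[1:]   # everyone except the failed leader


def notified_if_watching_predecessor(children: List[str]) -> List[str]:
    """Same event, if each client watches only the znode immediately
    preceding its own."""
    ordered = sorted(children, key=seq)
    return ordered[1:2]  # only the leader's direct successor


children = [f"n_{i:010d}" for i in range(1, 6)]  # five clients
print(len(notified_if_all_watch_smallest(children)))    # 4
print(len(notified_if_watching_predecessor(children)))  # 1
```

With n clients, the first strategy triggers n - 1 notifications and as many follow-up getChildren calls per leader failure, while the second triggers exactly one.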

Here’s the pseudo code:
Let ELECTION be a path of choice of the application. To volunteer to be a leader:

  1. Create znode z with path “ELECTION/n_” with both SEQUENCE and EPHEMERAL flags;
  2. Let C be the children of “ELECTION”, and i be the sequence number of z;
  3. Watch for changes on “ELECTION/n_j”, where j is the largest sequence number such that j < i and n_j is a znode in C;

Upon receiving a notification of znode deletion:

  4. Let C be the new set of children of ELECTION;
  5. If z is the smallest node in C, then execute leader procedure;
  6. Otherwise, watch for changes on “ELECTION/n_j”, where j is the largest sequence number such that j < i and n_j is a znode in C;
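The choice made in the steps above (lead, or pick a znode to watch) can be sketched as a pure function over the list of children. To avoid the herd effect, each client watches its immediate predecessor, that is, the znode with the largest sequence number below its own. The function name `znode_to_watch` is hypothetical, and the lists stand in for the results of getChildren on ELECTION; no ZooKeeper client library is used.

```python
from typing import List, Optional


def seq(child: str) -> int:
    """Extract the appended counter, e.g. "n_0000000007" -> 7."""
    return int(child.rsplit("_", 1)[1])


def znode_to_watch(children: List[str], me: str) -> Optional[str]:
    """Return the child to watch, or None if `me` is the smallest
    znode and its creator should execute the leader procedure."""
    my_seq = seq(me)
    smaller = [c for c in children if seq(c) < my_seq]
    if not smaller:
        return None               # no predecessor: this client leads
    return max(smaller, key=seq)  # immediate predecessor only


me = "n_0000000003"
children = ["n_0000000001", "n_0000000002", "n_0000000003"]
print(znode_to_watch(children, me))  # n_0000000002

# The watched znode is deleted: recompute from the new children.
children = ["n_0000000001", "n_0000000003"]
print(znode_to_watch(children, me))  # n_0000000001

# The leader fails as well: this client has no predecessor left.
children = ["n_0000000003"]
print(znode_to_watch(children, me))  # None
```

Note that the deletion of a watched znode does not always mean its watcher becomes leader: in the second call above the client simply moves its watch to the next surviving predecessor.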

Note that the znode having no preceding znode on the list of children does not imply that the creator of this znode is aware that it is the current leader. Applications may consider creating a separate znode to acknowledge that the leader has executed the leader procedure.
