Cockroach Design Translation (13): Range Leases

1  Range Leases

As outlined in the Raft section, the replicas of a Range are organized as a Raft group and execute commands from their shared commit log. Going through Raft is an expensive operation, though, and there are tasks which should only be carried out by a single replica at a time (as opposed to all of them). In particular, it is desirable to serve authoritative reads from a single Replica (ideally from more than one, but that is far more difficult).

For these reasons, Cockroach introduces the concept of Range Leases. A Range Lease is held for a slice of (database, i.e. hybrid logical) time. It is established by committing a special log entry through Raft that contains the interval on which the lease is going to be active, along with the Node:RaftID combination that uniquely describes the requesting replica. Reads and writes must generally be addressed to the replica holding the lease. If no replica holds it, any replica may be addressed, and that replica then tries to obtain the lease synchronously. Requests received by a non-lease holder (for the HLC timestamp specified in the request's header) fail with an error pointing at the replica's last known lease holder. The gateway node transparently retries these requests with the updated lease, so the error never reaches the client.
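The following Python sketch shows how requests are routed around the lease. It is a toy model, not CockroachDB's implementation. Timestamps are plain integers standing in for HLC timestamps. `ReplicaID`, `Lease`, `NotLeaseHolderError`, `Replica` and `gateway_send` are hypothetical names. The synchronous lease acquisition that happens when no replica holds the lease is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaID:
    # The Node:RaftID combination that uniquely identifies a replica.
    node_id: int
    raft_id: int


@dataclass(frozen=True)
class Lease:
    # Active on the half-open (integer stand-in for HLC) interval [start, expiration).
    start: int
    expiration: int
    holder: ReplicaID

    def covers(self, ts: int) -> bool:
        return self.start <= ts < self.expiration


class NotLeaseHolderError(Exception):
    # Hypothetical error type carrying the last known lease holder.
    def __init__(self, lease_holder: ReplicaID):
        super().__init__(f"not lease holder; try {lease_holder}")
        self.lease_holder = lease_holder


class Replica:
    def __init__(self, rid: ReplicaID, lease: Lease):
        self.rid = rid
        self.lease = lease  # the last lease this replica saw committed through Raft

    def serve(self, ts: int, request: str) -> str:
        if self.lease.holder == self.rid and self.lease.covers(ts):
            return f"{request} served by {self.rid}"
        raise NotLeaseHolderError(self.lease.holder)


def gateway_send(replicas: dict, first_try: ReplicaID, ts: int, request: str) -> str:
    # The gateway follows the hint in NotLeaseHolderError and retries,
    # so the client only ever sees the final answer.
    target = first_try
    for _ in range(len(replicas)):
        try:
            return replicas[target].serve(ts, request)
        except NotLeaseHolderError as err:
            target = err.lease_holder
    raise RuntimeError("lease holder not found")
```

For example, suppose the gateway first addresses replica `ReplicaID(1, 7)` while the lease is held by `ReplicaID(2, 7)`. It receives the redirect and retries on the second replica, which returns `"Get(k) served by ReplicaID(node_id=2, raft_id=7)"`.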

 

The replica holding the lease is in charge of, or involved in, handling Range-specific maintenance tasks such as:

• gossiping the sentinel and/or first range information

• splitting, merging and rebalancing

and, very importantly, may satisfy reads locally, without incurring the overhead of going through Raft.

 

Since reads bypass Raft, a new lease holder must, among other things, ensure that its timestamp cache does not report timestamps smaller than the previous lease holder's. This keeps it compatible with reads that may have occurred on the former lease holder. The mechanism is a stasis period that every lease enters before it actually expires. The stasis period begins at the expiration minus the maximum clock offset and lasts until the expiration. As a result, all the next lease holder has to do is set the low water mark of its timestamp cache to its new lease's start time.
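Here is a minimal sketch of the stasis-period arithmetic and the low-water-mark rule, again using integer timestamps. It assumes the new lease starts no earlier than the previous lease's expiration. `LeaseStatus`, `lease_status`, `TimestampCache` and `on_new_lease` are illustrative names rather than CockroachDB APIs.

```python
from dataclasses import dataclass, field
from enum import Enum


class LeaseStatus(Enum):
    VALID = "valid"
    STASIS = "stasis"
    EXPIRED = "expired"


def lease_status(expiration: int, max_offset: int, ts: int) -> LeaseStatus:
    # The stasis period is the interval [expiration - max_offset, expiration).
    stasis_start = expiration - max_offset
    if ts < stasis_start:
        return LeaseStatus.VALID
    if ts < expiration:
        return LeaseStatus.STASIS  # neither reads nor writes are served
    return LeaseStatus.EXPIRED


@dataclass
class TimestampCache:
    low_water: int = 0
    reads: dict = field(default_factory=dict)  # key -> latest read timestamp

    def max_read(self, key: str) -> int:
        # Never report a timestamp below the low water mark.
        return max(self.low_water, self.reads.get(key, 0))


def on_new_lease(cache: TimestampCache, new_lease_start: int) -> None:
    # The previous holder served reads only below its stasis start, which lies
    # below new_lease_start (assumed >= the previous expiration), so raising the
    # low water mark to new_lease_start covers every read it may have served.
    cache.low_water = max(cache.low_water, new_lease_start)
```

With an expiration of 200 and a maximum clock offset of 10, timestamps below 190 are valid and [190, 200) is the stasis period. A new holder whose lease starts at 200 reports at least 200 for every key, even for keys it has never read itself.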

Once a lease enters its stasis period, no more reads or writes are served, which is undesirable. In practice, however, this only happens if a node becomes unavailable. In almost all practical situations no unavailability results, because leases are usually long-lived and/or eagerly extended, which avoids the stasis period. Alternatively, leases are proactively transferred away from the lease holder. A transfer also avoids the stasis period, because the old holder promises not to serve any further reads until the next lease goes into effect.
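The two ways of avoiding the stasis period can be sketched as follows. The renewal window `renew_window`, the helper `should_extend` and the class `OutgoingHolder` are hypothetical. The design only says that leases are extended eagerly and that a transferring holder stops serving reads. The assumption that the next lease may start at the transfer timestamp is also ours.

```python
def should_extend(now: int, expiration: int, max_offset: int, renew_window: int) -> bool:
    # Hypothetical eager-renewal policy: extend once `now` is within
    # `renew_window` of the stasis start, so a healthy holder never reaches stasis.
    return now >= expiration - max_offset - renew_window


class OutgoingHolder:
    """Lease holder that is transferring its lease away (illustrative only)."""

    def __init__(self, expiration: int, max_offset: int):
        self.expiration = expiration
        self.max_offset = max_offset
        self.transfer_ts = None

    def start_transfer(self, now: int) -> int:
        # Promise: no reads at or above `now` from this replica from here on.
        # Assumption: the next lease may therefore start at `now` instead of
        # waiting for this lease's stasis period and expiration.
        self.transfer_ts = now
        return now

    def can_serve(self, ts: int) -> bool:
        if self.transfer_ts is not None and ts >= self.transfer_ts:
            return False
        return ts < self.expiration - self.max_offset
```

In this model, a holder whose lease expires at 200 (maximum offset 10) and that transfers at time 120 keeps serving reads below 120. The next lease can take effect at 120 rather than at 200.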

1.1 Colocation with Raft leadership

The range lease is completely separate from Raft leadership, so without further effort, Raft leadership and the Range lease might not be held by the same Replica. Not colocating these two roles is expensive: the lease holder has to forward each proposal to the leader, adding costly RPC round-trips. For this reason, each lease renewal or transfer also attempts to colocate them. In practice, this means that a mismatch is rare and self-corrects quickly.
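The cost argument and the self-correction step can be made concrete with a small model. `RangeRoles` and its methods are hypothetical. In this sketch, leadership is simply moved to the lease holder during renewal or transfer. A real system would have to request a Raft leadership transfer, which can fail or be delayed.

```python
from dataclasses import dataclass


@dataclass
class RangeRoles:
    lease_holder: int  # replica ID holding the range lease
    raft_leader: int   # replica ID that is currently Raft leader

    def extra_round_trips_per_proposal(self) -> int:
        # A lease holder that is not the leader forwards every proposal
        # to the leader: one additional RPC round-trip each time.
        return 0 if self.lease_holder == self.raft_leader else 1

    def on_lease_renewal_or_transfer(self) -> None:
        # Simplified stand-in for "attempt to colocate": assume the
        # leadership transfer to the lease holder succeeds immediately.
        if self.lease_holder != self.raft_leader:
            self.raft_leader = self.lease_holder
```

A range whose lease holder is replica 1 and whose leader is replica 2 pays one extra round-trip per proposal until the next renewal colocates the roles.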

1.2 Command Execution Flow

This subsection describes in more detail how a lease holder replica processes a read/write command. Each command specifies (1) a key (or a range of keys) that the command accesses and (2) the ID of the range to which the key(s) belong. When a node receives a command, it looks up the range by the specified Range ID and checks whether the range is still responsible for the supplied keys. If any of the keys do not belong to the range, the node returns an error so that the client will retry and send the request to the correct range.
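The range lookup and key check can be sketched like this. Ranges are modeled as half-open key intervals [start_key, end_key), which is an assumption of the sketch. `RangeDescriptor`, `Node.lookup` and `RangeKeyMismatchError` are illustrative names.

```python
from dataclasses import dataclass


class RangeKeyMismatchError(Exception):
    """Hypothetical error telling the client to refresh its range info and retry."""


@dataclass(frozen=True)
class RangeDescriptor:
    range_id: int
    start_key: bytes  # inclusive
    end_key: bytes    # exclusive

    def contains(self, key: bytes) -> bool:
        return self.start_key <= key < self.end_key


class Node:
    def __init__(self, descriptors):
        self.ranges = {d.range_id: d for d in descriptors}

    def lookup(self, range_id: int, keys) -> RangeDescriptor:
        # Find the range by ID, then verify it still owns every supplied key
        # (it may have been split or merged since the client cached it).
        desc = self.ranges.get(range_id)
        if desc is None or not all(desc.contains(k) for k in keys):
            raise RangeKeyMismatchError(f"range {range_id} does not cover {keys!r}")
        return desc
```

For example, if range 1 covers [b"a", b"m"), a command naming range 1 with the keys b"apple" and b"melon" is rejected, because b"melon" now belongs to another range.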

When all the keys belong to the range, the node attempts to process the command. If the command is an inconsistent read-only command, it is processed immediately. If the command is a consistent read or a write, it is executed only when both of the following conditions hold:

• The range replica has a range lease.

• There are no other running commands whose keys overlap with the submitted command and cause a read/write conflict.
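Here is a minimal sketch of the two-condition admission check. Each command covers a half-open key span. Two commands conflict when their spans overlap and at least one of them is a write. `Command`, `point` and `can_execute` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    start: bytes   # first key touched (inclusive)
    end: bytes     # end of the span (exclusive)
    is_write: bool

    def overlaps(self, other: "Command") -> bool:
        return self.start < other.end and other.start < self.end

    def conflicts_with(self, other: "Command") -> bool:
        # Two reads never conflict; any overlap involving a write does.
        return self.overlaps(other) and (self.is_write or other.is_write)


def point(key: bytes, is_write: bool) -> Command:
    # [key, key + b"\x00") contains exactly one key.
    return Command(key, key + b"\x00", is_write)


def can_execute(cmd: Command, has_lease: bool, running: list) -> bool:
    # Condition 1: this replica holds the range lease.
    # Condition 2: no running command overlaps with it in a read/write conflict.
    return has_lease and not any(cmd.conflicts_with(r) for r in running)
```

In this model, a read of b"k" must wait while a write to b"k" is running, but a read of b"j" can proceed.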

When the first condition is not met, the replica attempts to acquire a lease or returns an error so that the client will redirect the command to the current lease holder. The second condition guarantees that consistent read/write commands for a given key are executed sequentially.

When the above two conditions are met, the lease holder replica processes the command. Consistent reads are processed on the lease holder immediately. Write commands are committed into the Raft log so that every replica will execute the same commands. All commands produce deterministic results, so the range replicas keep consistent states among them.
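The following toy model shows the processing paths and why determinism keeps replicas consistent. Writes are appended to a simulated Raft log (a Python list). Every replica applies the same entries in the same order with a pure state transition. The conflict check from the previous condition is omitted. `ReplicaStateMachine`, `replicate` and `handle` are hypothetical names.

```python
class ReplicaStateMachine:
    """One replica's copy of the range data, driven only by the Raft log."""

    def __init__(self):
        self.data = {}
        self.applied_index = 0

    def apply(self, index: int, entry: tuple) -> None:
        # Entries are applied strictly in log order, and the result depends only
        # on (current state, entry): this is what keeps replicas identical.
        if index != self.applied_index + 1:
            raise ValueError("log entries applied out of order")
        op, key, value = entry
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied_index = index


def replicate(log: list, replicas: list) -> None:
    # Simulates every replica applying the committed log from the beginning.
    for index, entry in enumerate(log, start=1):
        for replica in replicas:
            replica.apply(index, entry)


def handle(cmd: tuple, replica: ReplicaStateMachine, log: list, has_lease: bool):
    op = cmd[0]
    if op == "inconsistent_get":
        return replica.data.get(cmd[1])  # processed immediately
    if not has_lease:
        raise LookupError("no lease: acquire one or redirect to the lease holder")
    if op == "get":
        return replica.data.get(cmd[1])  # consistent read, served locally on the lease holder
    log.append(cmd)                      # write: committed through the (simulated) Raft log
    return None
```

After three writes are appended and `replicate` runs, all replicas hold the same dictionary. A consistent `get` on the lease holder then reads from it without touching the log.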

When a write command completes, all the replicas update their response caches to ensure idempotency. When a read command completes, the lease holder replica updates its timestamp cache to keep track of the latest read for a given key.
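Both caches can be sketched with plain dictionaries. Keying the response cache by a client-supplied command ID is an assumption of this sketch. `ResponseCache`, `execute_once`, `record_read` and `latest_read` are illustrative names.

```python
class ResponseCache:
    """Maps a command ID to the response that command already produced."""

    def __init__(self):
        self._responses = {}

    def execute_once(self, cmd_id, fn):
        # A retried write with the same ID gets the cached response
        # instead of being applied a second time (idempotency).
        if cmd_id not in self._responses:
            self._responses[cmd_id] = fn()
        return self._responses[cmd_id]


class TimestampCache:
    """Tracks, per key, the latest timestamp at which the key was read."""

    def __init__(self, low_water: int = 0):
        self.low_water = low_water
        self._latest = {}

    def record_read(self, key, ts: int) -> None:
        self._latest[key] = max(ts, self._latest.get(key, ts))

    def latest_read(self, key) -> int:
        return max(self.low_water, self._latest.get(key, self.low_water))
```

Retrying the same command ID twice runs the underlying write only once. Recording reads of a key at 10 and then at 7 leaves 10 as its latest read.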

A range lease may expire while a command is being executed. Before executing a command, each replica therefore checks whether the replica that proposed the command still holds the lease. If the lease has expired, the replica rejects the command.
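Below is a sketch of the apply-time check. The lease `sequence` number is a hypothetical way for a command to record which lease its proposer believed it held. `ProposedCommand` and `check_proposer_lease` are also illustrative names. In this model, every replica evaluates the check against the lease state at the same log position. As a result, all replicas reach the same accept/reject decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    sequence: int    # hypothetical: incremented whenever a new lease is installed
    holder: int      # replica ID of the lease holder
    expiration: int


@dataclass(frozen=True)
class ProposedCommand:
    proposer: int
    proposer_lease_seq: int  # the lease the proposer believed it held
    timestamp: int


def check_proposer_lease(cmd: ProposedCommand, current: Lease) -> bool:
    # Run by every replica just before executing a command from the log.
    # Reject if the lease changed hands or expired in the meantime.
    return (
        current.sequence == cmd.proposer_lease_seq
        and current.holder == cmd.proposer
        and cmd.timestamp < current.expiration
    )
```

A command proposed under lease 3 at timestamp 150 is accepted while lease 3 (expiring at 200) is current. It is rejected if its timestamp is past the expiration or if a newer lease has replaced lease 3.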
