Cockroach Design Translation (10): Self Repair and Rebalancing

Latest recommended article published 2023-04-12 16:56:28

水利万物而不争

Category: Distributed Databases. Tags: Cockroach, distributed OLTP, database

13 Self Repair

If a store has not been heard from (that is, it has not gossiped its descriptor) for some time, 5 minutes by default, the cluster considers the store dead. When this happens, every replica on that store is treated as unavailable and removed from its range. The affected ranges then up-replicate to other available stores until their desired replication factor is met again. If 50% or more of a range's replicas are unavailable at the same time, there is no quorum, and the whole range is considered unavailable until more than 50% of its replicas are available again.
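The dead-store rule and the quorum condition above can be sketched in a few lines. This is only an illustration under stated assumptions, not CockroachDB's implementation: the names `is_store_dead`, `has_quorum`, and `repair_range`, the plain dictionary of last-gossip timestamps, and the alphabetical choice of replacement stores are all hypothetical.

```python
# Hypothetical sketch; none of these names come from CockroachDB itself.
DEAD_STORE_TIMEOUT_S = 5 * 60  # the 5-minute default mentioned in the text


def is_store_dead(last_gossip_ts, now, timeout_s=DEAD_STORE_TIMEOUT_S):
    """A store is presumed dead once it has been silent longer than the timeout."""
    return now - last_gossip_ts > timeout_s


def has_quorum(total_replicas, available_replicas):
    """Quorum requires strictly more than half of the replicas."""
    return 2 * available_replicas > total_replicas


def repair_range(replica_stores, last_gossip, now, replication_factor):
    """Return (surviving replica stores, stores chosen for up-replication).

    Returns None when the range has lost quorum and cannot repair itself.
    """
    live = [s for s in replica_stores if not is_store_dead(last_gossip[s], now)]
    if not has_quorum(len(replica_stores), len(live)):
        return None
    missing = max(replication_factor - len(live), 0)
    # Assumption: any live store that does not already hold a replica is a
    # valid target; a real allocator would rank the candidates.
    candidates = sorted(
        s for s, ts in last_gossip.items()
        if s not in replica_stores and not is_store_dead(ts, now)
    )
    return live, candidates[:missing]
```

With three replicas and one silent store, the two survivors still form a quorum, so the range can add a replica elsewhere. With two of three replicas gone, `repair_range` returns `None`, matching the "no quorum" case in the text.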

2 Rebalancing

As more data are added to the system, some stores may grow faster than others. To counter this imbalance and to spread the overall load across the full cluster, replicas are moved between stores while maintaining the desired replication factor. The heuristics used to perform this rebalancing include:

• the number of replicas per store
• the total size of the data used per store
• the free space available per store
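To make the three heuristics concrete, here is a minimal sketch of how they could drive a single rebalancing decision. It is an assumption-laden illustration, not the allocator CockroachDB uses: the `Store` record, the `fullness` helper, the `pick_rebalance` function, the ordering (replica count first, disk fullness as tie-breaker), and the `min_replica_gap` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Store:
    """Hypothetical per-store statistics matching the three heuristics."""
    store_id: str
    replica_count: int
    used_bytes: int
    free_bytes: int


def fullness(store):
    """Fraction of the store's capacity in use (1.0 if capacity is unknown)."""
    total = store.used_bytes + store.free_bytes
    return store.used_bytes / total if total else 1.0


def pick_rebalance(stores, range_replica_stores, min_replica_gap=2):
    """Suggest (source_id, target_id) for moving one replica of a range.

    Moving a replica from one store to another leaves the replication factor
    unchanged. Returns None when the stores are already balanced enough.
    """
    holders = [s for s in stores if s.store_id in range_replica_stores]
    others = [s for s in stores if s.store_id not in range_replica_stores]
    if not holders or not others:
        return None
    # Assumed ranking: most replicas first, fullest disk breaks ties.
    source = max(holders, key=lambda s: (s.replica_count, fullness(s)))
    target = min(others, key=lambda s: (s.replica_count, fullness(s)))
    if source.replica_count - target.replica_count < min_replica_gap:
        return None  # not worth the cost of moving data
    return source.store_id, target.store_id
```

The threshold is there so that small differences between stores do not cause replicas to be shuffled back and forth; the value 2 is arbitrary.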

In the future, some other factors that might be considered include:

• CPU/network load per store

• ranges that are often used together in queries