13 Self Repair
If a store has not been heard from (that is, it has not gossiped its descriptor) for some time, 5 minutes by default, the cluster considers the store dead. When this happens, every replica on that store is deemed unavailable and is removed from its range. The affected ranges then up-replicate to other available stores until their desired replication factor is met again. If 50% or more of a range's replicas are unavailable at the same time, there is no quorum, and the whole range is considered unavailable until more than 50% of its replicas are available again.
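The rules in this section can be sketched in a few lines. This is only an illustration of the arithmetic, not the system's actual code: the function names, the constant name, and the choice of a strict "longer than the timeout" comparison are assumptions made for the example.

```python
# Illustrative sketch only; names and the strict comparison are assumptions.
STORE_DEAD_AFTER_SECONDS = 5 * 60  # default from the text: 5 minutes


def is_store_dead(last_gossip_at: float, now: float,
                  timeout: float = STORE_DEAD_AFTER_SECONDS) -> bool:
    """Treat a store as dead once it has been silent longer than the timeout."""
    return now - last_gossip_at > timeout


def has_quorum(available_replicas: int, total_replicas: int) -> bool:
    """Quorum requires strictly more than half of the replicas to be available."""
    return 2 * available_replicas > total_replicas


def replicas_to_add(live_replicas: int, replication_factor: int) -> int:
    """Number of new replicas a range needs to reach its replication factor again."""
    return max(0, replication_factor - live_replicas)
```

With a replication factor of 3, losing one store leaves 2 of 3 replicas, so the range keeps quorum and needs one new replica. With 4 replicas, losing 2 leaves exactly 50%, which is not a quorum.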
2 Rebalancing
As more data is added to the system, some stores may grow faster than others. To combat this and to spread the overall load across the full cluster, replicas will be moved between stores while maintaining the desired replication factor. The heuristics used to perform this rebalancing include:
- the number of replicas per store
- the total size of the data used per store
- free space available per store
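One way to picture how these three heuristics could be combined is a per-store load score. The sketch below is hypothetical: the `StoreStats` type, the equal weighting, and the "move from the most loaded to the least loaded store" policy are assumptions chosen for illustration, not the scoring the system actually uses.

```python
from dataclasses import dataclass


@dataclass
class StoreStats:
    """Hypothetical per-store statistics matching the heuristics listed above."""
    store_id: int
    replica_count: int
    used_bytes: int
    free_bytes: int


def load_score(s: StoreStats, mean_replicas: float, mean_used: float) -> float:
    """Higher means more loaded. Equal weights are an arbitrary illustrative choice."""
    capacity = s.used_bytes + s.free_bytes
    fraction_used = s.used_bytes / capacity if capacity else 1.0
    replicas_rel = s.replica_count / mean_replicas if mean_replicas else 0.0
    used_rel = s.used_bytes / mean_used if mean_used else 0.0
    return (replicas_rel + used_rel + fraction_used) / 3


def pick_rebalance(stores):
    """Return (source_id, target_id) for the most and least loaded stores, or None."""
    if len(stores) < 2:
        return None
    mean_replicas = sum(s.replica_count for s in stores) / len(stores)
    mean_used = sum(s.used_bytes for s in stores) / len(stores)
    ranked = sorted(stores, key=lambda s: load_score(s, mean_replicas, mean_used))
    return ranked[-1].store_id, ranked[0].store_id
```

A real rebalancer would also have to keep the replication factor intact, for example by never placing two replicas of the same range on one store; that check is left out here.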
In the future, some other factors that might be considered include:
- CPU/network load per store
- ranges that are often used together in queries
- number of active ranges per store
- number of range leases held per store