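Nodes in a Redis cluster exchange information through the gossip protocol, concretely through ping messages. The main logic lives in the clusterSendPing function.
Each ping message carries gossip entries for 1/10 of the cluster's nodes, with a minimum of three entries.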
wanted = floor(dictSize(server.cluster->nodes)/10);
if (wanted < 3) wanted = 3;
if (wanted > freshnodes) wanted = freshnodes;
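The three lines above can be tried in isolation with a minimal sketch. The function name gossip_entries_wanted and its parameters are hypothetical, not part of Redis: total_nodes stands in for dictSize(server.cluster->nodes), and freshnodes is assumed to be the number of nodes that are candidates for a gossip entry.

```c
#include <assert.h>

/* Hypothetical helper, not a Redis function: mirrors the three lines above.
 * total_nodes stands in for dictSize(server.cluster->nodes);
 * freshnodes is the number of nodes that can still be gossiped about. */
int gossip_entries_wanted(int total_nodes, int freshnodes) {
    int wanted;
    assert(total_nodes >= 0 && freshnodes >= 0);
    wanted = total_nodes / 10;                    /* integer division equals floor() for non-negative values */
    if (wanted < 3) wanted = 3;                   /* always at least 3 entries */
    if (wanted > freshnodes) wanted = freshnodes; /* but never more than the candidates available */
    return wanted;
}
```

Under these assumptions, a 100-node cluster gives 10 entries per ping, while a 20-node cluster falls back to the minimum of 3.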
Why 1/10? The Redis author's comment on these lines reads:
/* How many gossip sections we want to add? 1/10 of the number of nodes
* and anyway at least 3. Why 1/10?
*
* If we have N masters, with N/10 entries, and we consider that in
* node_timeout we exchange with each other node at least 4 packets
* (we ping in the worst case in node_timeout/2 time, and we also
* receive two pings from the host), we have a total of 8 packets
* in the node_timeout*2 falure reports validity time. So we have
* that, for a single PFAIL node, we can expect to receive the following
* number of failure reports (in the specified window of time):
*
* PROB * GOSSIP_ENTRIES_PER_PACKET * TOTAL_PACKETS:
*
* PROB = probability of being featured in a single gossip entry,
* which is 1 / NUM_OF_NODES.
* ENTRIES = 10.
* TOTAL_PACKETS = 2 * 4 * NUM_OF_MASTERS.
*
* If we assume we have just masters (so num of nodes and num of masters
* is the same), with 1/10 we always get over the majority, and specifically
* 80% of the number of nodes, to account for many masters failing at the
* same time.
*
* Since we have non-voting slaves that lower the probability of an entry
* to feature our node, we set the number of entries per packet as
* 10% of the total nodes we have. */
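A ping message carries information about 1/10 of the nodes so that, within the failure-report validity window (2 * node_timeout),
a node can receive failure reports from a majority of the cluster's nodes.
Suppose the cluster has N nodes. Within node_timeout, a node exchanges at least 4 heartbeat packets with every other node:
a node sends a PING to each other node at least once every node_timeout/2, and a node that receives a PING replies with a PONG. Within the failure-report validity window of node_timeout*2, this adds up to 8 heartbeat packets per peer.
The current node therefore handles about 8N heartbeat packets in total. Each packet carries N/10 gossip entries, and each entry names the failing node with probability 1/N, so the probability that a given packet includes the failing node is about 1/10.
The expected number of failure reports received is therefore 8N * (1/10) = 0.8N, that is, 80% of N, which means the node can expect failure reports from a majority of the nodes.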
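The estimate above can be checked numerically with a small sketch. The function expected_failure_reports is hypothetical and only reproduces the arithmetic from the quoted comment (PROB * ENTRIES * TOTAL_PACKETS). It assumes all nodes are masters, ignores the freshnodes cap, and applies the same floor and minimum-of-3 rule as the code shown earlier.

```c
#include <assert.h>

/* Hypothetical helper: expected number of failure reports about a single
 * PFAIL node inside the node_timeout*2 window, following the estimate in
 * the quoted comment. Assumes every node is a master. */
double expected_failure_reports(int n_nodes) {
    int entries;
    double total_packets;
    assert(n_nodes > 0);
    entries = n_nodes / 10;            /* gossip entries per packet: floor(N/10) */
    if (entries < 3) entries = 3;      /* minimum of 3 entries */
    total_packets = 8.0 * n_nodes;     /* 2 * 4 packets per peer, N peers */
    /* each entry names the failing node with probability 1/N */
    return total_packets * entries / n_nodes;
}
```

For N = 100 the sketch returns 80, which is the 80% of N from the text. For small clusters the minimum of 3 entries pushes the value above N: with N = 20 it returns 24.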