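The old replica became the new master, then the old master came back up and took the master role back
Following up on the discussion in the previous post (https://blog.csdn.net/damanchen/article/details/101075757)
1. Reviewing the earlier logs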
152440:S 19 Sep 14:37:25.166 # Failover election won: I'm the new master.
152440:S 19 Sep 14:37:25.166 # configEpoch set to 61 after successful failover
152440:M 19 Sep 14:37:25.166 # Setting secondary replication ID to fedaa5b39ab6d03bcc869b536a63df55f69ca4f2, valid up to offset: 337640614162. New replication ID is 4a53bcb6407669b5b376510f18d3cd4ff37bfc8d
152440:M 19 Sep 14:37:25.166 # Connection with master lost. // What does this mean?
152440:M 19 Sep 14:37:25.166 * Caching the disconnected master state.
152440:M 19 Sep 14:37:25.166 * Discarding previously cached master state.
152440:M 19 Sep 14:38:03.163 # Client id=12063607 addr=10.114.2.22:21516 fd=268 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=16384 oll=4090 omem=67108864 events=r cmd=hgetall scheduled to be closed ASAP for overcoming of output buffer limits.
152440:M 19 Sep 14:38:28.027 * Clear FAIL state for node 58c5ef7e9c0e53c6097f5f52c504f083a2b1e09b: master without slots is reachable again.
152440:M 19 Sep 14:38:28.027 # Failover auth denied to 58c5ef7e9c0e53c6097f5f52c504f083a2b1e09b: its master is up
152440:M 19 Sep 14:38:28.027 # Configuration change detected. Reconfiguring myself as a replica of 58c5ef7e9c0e53c6097f5f52c504f083a2b1e09b // What triggered "Configuration change detected"?
152440:S 19 Sep 14:38:28.027 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
152440:S 19 Sep 14:38:28.936 * Connecting to MASTER 10.136.171.48:33482
152440:S 19 Sep 14:38:28.936 * MASTER <-> SLAVE sync started
152440:S 19 Sep 14:38:28.937 * Non blocking connect for SYNC fired the event.
152440:S 19 Sep 14:38:28.937 * Master replied to PING, replication can continue...
152440:S 19 Sep 14:38:28.937 * Trying a partial resynchronization (request 4a53bcb6407669b5b376510f18d3cd4ff37bfc8d:337642047232).
152440:S 19 Sep 14:38:29.086 * Full resync from master: 6584841c3345e4cfba369550d3fe7b95d38daa70:337640711689
152440:S 19 Sep 14:38:29.086 * Discarding previously cached master state.
152440:S 19 Sep 14:39:13.597 * MASTER <-> SLAVE sync: receiving 782800027 bytes from master
152440:S 19 Sep 14:39:15.863 * MASTER <-> SLAVE sync: Flushing old data
152440:S 19 Sep 14:39:56.469 * MASTER <-> SLAVE sync: Loading DB in memory
152440:S 19 Sep 14:40:56.306 * MASTER <-> SLAVE sync: Finished with success
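This is not how we understood master/replica failover to work after a master fails.
Normally, when a master fails, one of its replicas is elected as the new master, with votes from the other masters in the cluster, and takes over.
After that, even if the old master recovers, it should only be able to rejoin as a replica.
But the logs here suggest that once the old master came back up, it took the master role back from the new master......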
These are the log entries showing the new master losing its master role:
152440:M 19 Sep 14:38:28.027 * Clear FAIL state for node 58c5ef7e9c0e53c6097f5f52c504f083a2b1e09b: master without slots is reachable again.
152440:M 19 Sep 14:38:28.027 # Failover auth denied to 58c5ef7e9c0e53c6097f5f52c504f083a2b1e09b: its master is up
152440:M 19 Sep 14:38:28.027 # Configuration change detected. Reconfiguring myself as a replica of 58c5ef7e9c0e53c6097f5f52c504f083a2b1e09b // What triggered "Configuration change detected"?
152440:S 19 Sep 14:38:28.027 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
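Does hitting "Configuration change detected." really mean the node loses its master role? Let's analyze the source code and find out.
2. Analyzing the source code
2.1 Finding where the log message comes from
Search cluster.c for "Configuration change detected. Reconfiguring myself…".
The message is emitted in clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoch, unsigned char *slots), in the following code: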
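/* If at least one slot of the current node (or of its master) has been
 * reassigned to sender, and sender's configEpoch is greater than the
 * configEpoch of that slot's current owner, then one of the following
 * has happened:
 * 1) The current node is a master that no longer serves any slot:
 *    it should become a replica of the new master.
 * 2) The current node is a replica whose master no longer serves any
 *    slot: it should become a replica of the new master. */
if (newmaster && curmaster->numslots == 0) {
    redisLog(REDIS_WARNING,
        "Configuration change detected. Reconfiguring myself "
        "as a replica of %.40s", sender->name);
    // Make sender the master of the current node
    clusterSetMaster(sender);
    clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                         CLUSTER_TODO_UPDATE_STATE|
                         CLUSTER_TODO_FSYNC_CONFIG);
}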
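This function compares the new slot configuration carried in the slots argument with the current node's configuration and updates the node's view of the slot layout.
If necessary, it also turns the current node into a replica of sender!!!
So every call to this function can potentially turn the current node into a replica of sender.
Which function calls it, then?
2.2 Finding the callers
It is called from clusterProcessPacket(clusterLink *link) as follows:
// If sender is a master and its slot layout has changed,
// check our view of sender's slots and update it if needed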
if (sender && nodeIsMaster(sender) && dirty_slots)
clusterUpdateSlotsConfigWith(sender,</