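Automatic failover by a replica falls into three cases.
1. A surviving master is the first to judge the failed master as objectively down (FAIL): another master collects enough PFAIL reports first and broadcasts the FAIL message, which reaches the replica. When the replica's clusterCron sees that its master is flagged FAIL and starts an election, the other masters vote as usual.
The replica's log: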
76803:S 09 Jun 12:21:09.240 * FAIL message received from bb7664a96fc83d3f31c2649ec37894a4944ed38b about 64cdc10096644b5bc3624f41ade916983806c47c
76803:S 09 Jun 12:21:09.257 # Start of election delayed for 579 milliseconds (rank #0, offset 30).
76803:S 09 Jun 12:21:09.859 # Starting a failover election for epoch 37.
76803:S 09 Jun 12:21:09.860 # Failover election won: I'm the new master.
76803:S 09 Jun 12:21:09.860 # configEpoch set to 37 after successful failover
76803:M 09 Jun 12:21:09.860 * Discarding previously cached master state.
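The "Start of election delayed for 579 milliseconds (rank #0, ...)" line in the log above reflects the wait a replica applies before it asks for votes. As a rough sketch, assuming the formula used in the Redis Cluster source (a fixed 500 ms, plus a random 0 to 499 ms, plus 1000 ms per rank position), the delay can be modelled as follows. The function name election_delay_ms and the constant names are hypothetical and only serve this illustration.

```python
import random

FIXED_DELAY_MS = 500   # fixed part: gives the FAIL state time to propagate
MAX_JITTER_MS = 500    # random part: 0..499 ms, so replicas do not start together
RANK_STEP_MS = 1000    # extra wait per rank position (rank 0 = most up-to-date replica)


def election_delay_ms(rank, rng=random):
    """Return a modelled delay in milliseconds before a replica starts its election."""
    if rank < 0:
        raise ValueError("rank must be >= 0")
    jitter = rng.randrange(MAX_JITTER_MS)  # integer in [0, MAX_JITTER_MS)
    return FIXED_DELAY_MS + jitter + rank * RANK_STEP_MS


if __name__ == "__main__":
    # Prints some value in [500, 1000) for a rank #0 replica.
    print(election_delay_ms(0))
```

Under this model a rank #0 replica always waits between 500 and 999 ms, which is consistent with the 579 ms seen in the log above.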
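2. The replica is the first to reach the objective-down verdict: the replica collects enough PFAIL reports first and promotes its failed master to the FAIL state locally. A replica does not broadcast the FAIL state, however, so the surviving masters are not notified of it.
The replica's log: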
83548:S 08 Jun 14:28:02.202 * Marking node 82f42d2e50fa857fcda87127c3b958f868328eaa as failing (quorum reached).
83548:S 08 Jun 14:28:02.207 # Start of election delayed for 856 milliseconds (rank #0, offset 28148706874).
83548:S 08 Jun 14:28:03.109 # Starting a failover election for epoch 6.
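The "quorum reached" message in the log above comes from the check that turns PFAIL into FAIL. The sketch below is a simplified model of that check, not the Redis implementation: it assumes the quorum is a majority of the masters, that only masters' reports are counted, and that a node counts itself only when it is a master. The rule that only a master broadcasts FAIL is taken from the description in this text and may differ between Redis versions. The function names are hypothetical.

```python
def needed_quorum(master_count):
    """Majority of the masters in the cluster."""
    return master_count // 2 + 1


def evaluate_pfail(master_count, failure_reports, i_am_master):
    """Decide what a node does about a peer it already flags as PFAIL.

    failure_reports: how many other masters have reported the peer as failing.
    Returns (mark_fail, broadcast_fail).
    """
    # A master also counts its own PFAIL view; a replica does not.
    votes = failure_reports + (1 if i_am_master else 0)
    mark_fail = votes >= needed_quorum(master_count)
    # Assumption from the text: a replica marks FAIL locally but stays silent.
    broadcast_fail = mark_fail and i_am_master
    return mark_fail, broadcast_fail


if __name__ == "__main__":
    # Replica in a 3-master cluster holding reports from the 2 surviving masters.
    print(evaluate_pfail(3, 2, i_am_master=False))
```

In a three-master cluster, a replica that already holds reports from both surviving masters reaches the quorum of 2 and marks FAIL without broadcasting, while each surviving master needs only one report from the other master plus its own view.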
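Case 2 therefore splits into two failover outcomes:
2.1 If, after the delay of 856 milliseconds, the other masters' failure report lists are still short of the quorum, i.e. the other masters have not yet moved the failed master to FAIL, they refuse to vote and the replica's failover fails;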
2.2 If, after the delay of 8