Sentinel state monitoring and failover: a code walkthrough

The Redis Sentinel code flow is shown in the figure below.
[Figure: Redis Sentinel code flow]

1. sentinelTimer: the periodic task

sentinelCheckTiltCondition();
sentinelHandleDictOfRedisInstances(sentinel.masters);
sentinelRunPendingScripts();
sentinelCollectTerminatedScripts();
sentinelKillTimedoutScripts();

2. sentinelHandleDictOfRedisInstances

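Runs the state checks and related tasks for every monitored master. The function also calls itself recursively on each master's replicas and Sentinels.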

void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);

        sentinelHandleRedisInstance(ri);
        if (ri->flags & SRI_MASTER) {
            sentinelHandleDictOfRedisInstances(ri->slaves);
            sentinelHandleDictOfRedisInstances(ri->sentinels);
            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                switch_to_promoted = ri;
            }
        }
    }
    if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
    dictReleaseIterator(di);
}

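Key points
1) sentinelHandleRedisInstance runs on every instance. That means each master, and through the recursive calls, its replicas and the other Sentinels monitoring it (see section 3).
2) If an instance is a master and its failover_state is SENTINEL_FAILOVER_STATE_UPDATE_CONFIG, a failover has been carried out and has completed. After the loop, sentinelFailoverSwitchToPromotedSlave is called for that master. It emits the +switch-master event and resets the Sentinel's records for the master (see section 16). switch_to_promoted is a single pointer, so at most one master is switched per call.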
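The following minimal C sketch has the same loop shape, to make the traversal order concrete. It uses hypothetical, simplified types: sk_instance, NULL-terminated arrays in place of Redis dicts, and an update_config field in place of failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG. It is not the Redis implementation. It shows two things: every master, replica and Sentinel is visited once per call, and the promoted-replica switch waits until the loop has finished.

```c
#include <stddef.h>

/* Hypothetical role flags standing in for SRI_MASTER / SRI_SLAVE / SRI_SENTINEL. */
#define SK_MASTER   1
#define SK_SLAVE    2
#define SK_SENTINEL 4

typedef struct sk_instance {
    int flags;
    int update_config;              /* stands in for failover_state == ..._UPDATE_CONFIG */
    struct sk_instance **slaves;    /* NULL-terminated array, or NULL */
    struct sk_instance **sentinels; /* NULL-terminated array, or NULL */
    int handled;                    /* counts calls to handle_instance() */
    int switched;                   /* set when the promoted replica is switched in */
} sk_instance;

/* Placeholder for sentinelHandleRedisInstance(): just count the visit. */
static void handle_instance(sk_instance *ri) { ri->handled++; }

/* Placeholder for sentinelFailoverSwitchToPromotedSlave(). */
static void switch_to_promoted(sk_instance *master) { master->switched = 1; }

/* Same shape as sentinelHandleDictOfRedisInstances, over an array. */
void handle_instances(sk_instance **instances) {
    sk_instance *to_switch = NULL;

    if (instances == NULL) return;
    for (size_t i = 0; instances[i] != NULL; i++) {
        sk_instance *ri = instances[i];

        handle_instance(ri);
        if (ri->flags & SK_MASTER) {
            handle_instances(ri->slaves);     /* recurse into replicas */
            handle_instances(ri->sentinels);  /* recurse into peer Sentinels */
            if (ri->update_config) to_switch = ri;  /* last match wins */
        }
    }
    /* Deferred until the loop has finished; at most one master per call. */
    if (to_switch) switch_to_promoted(to_switch);
}
```

Suppose two masters are both in the update-config state. Only the last one visited is switched on this call, and the other is picked up on a later tick.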

3. sentinelHandleRedisInstance: where the monitoring work starts

/* ======================== SENTINEL timer handler ==========================
 * This is the "main" our Sentinel, being sentinel completely non blocking
 * in design. The function is called every second.
 * -------------------------------------------------------------------------- */

/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Every kind of instance */
    sentinelReconnectInstance(ri);
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        sentinel.tilt = 0;
        sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}

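Key points
1) The source comment says this runs once per second. In practice sentinelTimer is called from serverCron on every server tick (server.hz times per second), and sentinelSendPeriodicCommands rate-limits the individual periodic commands.
2) sentinelSendPeriodicCommands sends the periodic commands (PING, INFO, hello messages) to the instance (see section 14).
3) Unless Sentinel is in TILT mode, the function then checks the instance's state and, for a master, may start a failover (see section 4).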
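The TILT gate at the top of the acting half can be modelled on its own. This sketch rests on several assumptions. sk_sentinel_state and acting_half_allowed are hypothetical names. Times are plain millisecond values passed in by the caller instead of mstime(). tilt_period stands in for SENTINEL_TILT_PERIOD.

```c
/* Hypothetical model of the TILT fields used by sentinelHandleRedisInstance. */
typedef struct {
    int tilt;                   /* 1 while Sentinel is in TILT mode */
    long long tilt_start_time;  /* ms, when TILT mode was entered */
} sk_sentinel_state;

/* Returns 1 if the acting half (SDOWN/ODOWN checks, failover) may run at
 * time `now`, 0 if it must be skipped. Like the original, it leaves TILT
 * mode once tilt_period has elapsed (the original also emits "-tilt"). */
int acting_half_allowed(sk_sentinel_state *s, long long now, long long tilt_period) {
    if (s->tilt) {
        if (now - s->tilt_start_time < tilt_period) return 0;
        s->tilt = 0;
    }
    return 1;
}
```

The monitoring half (reconnecting and sending periodic commands) runs before this gate. Only state judgements and failover actions pause during TILT.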

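4. sentinelCheckSubjectivelyDown: detecting SDOWN

Tracks the instance's heartbeat. If the instance has not replied validly to PING for longer than down-after-milliseconds (down_after_period), it is flagged SDOWN (subjectively down). For a master, the ODOWN check follows (see section 5).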
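Below is a simplified sketch of the SDOWN decision. It assumes a hypothetical sk_instance that records only two things: the time of the last valid PING reply and down_after_period. The real sentinelCheckSubjectivelyDown also accounts for pending pings and link reconnection, and for a master that keeps reporting itself as a replica. Those cases are left out here.

```c
#define SK_S_DOWN 1  /* hypothetical stand-in for SRI_S_DOWN */

typedef struct {
    int flags;
    long long last_avail_time;    /* ms, time of the last valid PING reply */
    long long down_after_period;  /* ms, down-after-milliseconds */
} sk_instance;

/* Returns +1 if SDOWN was just set ("+sdown"), -1 if it was just
 * cleared ("-sdown"), and 0 if the state did not change. */
int check_sdown(sk_instance *ri, long long now) {
    long long elapsed = now - ri->last_avail_time;

    if (elapsed > ri->down_after_period) {
        if (!(ri->flags & SK_S_DOWN)) {
            ri->flags |= SK_S_DOWN;
            return 1;
        }
    } else if (ri->flags & SK_S_DOWN) {
        ri->flags &= ~SK_S_DOWN;
        return -1;
    }
    return 0;
}
```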

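5. sentinelCheckObjectivelyDown: determining ODOWN

A master is ODOWN (objectively down) when two things hold. This Sentinel must already see it as SDOWN. And the number of Sentinels reporting it down, counting this one, must reach the configured quorum.

1) Once ODOWN is confirmed, the +odown event is published and two fields of the master are set:
master->flags |= SRI_O_DOWN;
master->o_down_since_time = mstime();
Sentinel then decides whether to start a failover (see section 6).

2) Otherwise, if the master was marked ODOWN, the -odown event is published and the flag is cleared.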
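The quorum rule can be sketched as follows. The types and flag names are hypothetical stand-ins. In Redis the corresponding peer flag is SRI_MASTER_DOWN on each entry of master->sentinels, set from replies to is-master-down-by-addr. The sketch covers only the counting: this Sentinel's own SDOWN view plus every peer that reported the master down must reach the quorum.

```c
#include <stddef.h>

#define SK_S_DOWN      1  /* this Sentinel sees the master as SDOWN */
#define SK_MASTER_DOWN 2  /* a peer Sentinel replied "master is down" */

typedef struct { int flags; } sk_peer;

typedef struct {
    int flags;
    unsigned int quorum;  /* configured quorum for this master */
    sk_peer *peers;       /* other Sentinels monitoring this master */
    size_t npeers;
} sk_master;

/* Returns 1 if the master should be considered ODOWN. */
int is_odown(const sk_master *m) {
    if (!(m->flags & SK_S_DOWN)) return 0;  /* must be SDOWN locally first */

    unsigned int votes = 1;                 /* count this Sentinel */
    for (size_t i = 0; i < m->npeers; i++)
        if (m->peers[i].flags & SK_MASTER_DOWN) votes++;
    return votes >= m->quorum;
}
```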

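6. sentinelStartFailoverIfNeeded: deciding whether to fail over

It checks three conditions:
1) The master is still ODOWN.
2) No failover is already in progress (SRI_FAILOVER_IN_PROGRESS is not set).
3) At least 2 × failover_timeout has passed since the last failover attempt started. The last attempt counts as too recent while
mstime() - master->failover_start_time < master->failover_timeout*2

If the last attempt is too recent, the function does not start a new one. It logs, once per attempt, the earliest time a new failover may start (failover_start_time + failover_timeout*2) and returns 0: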

if (mstime() - master->failover_start_time <
    master->failover_timeout*2)
{
    if (master->failover_delay_logged != master->failover_start_time) {
        time_t clock = (master->failover_start_time +
                        master->failover_timeout*2) / 1000;
        char ctimebuf[26];

        ctime_r(&clock,ctimebuf);
        ctimebuf[24] = '\0'; /* Remove newline. */
        master->failover_delay_logged = master->failover_start_time;
        serverLog(LL_WARNING,
            "Next failover delay: I will not start a failover before %s",
            ctimebuf);
    }
    return 0;
}

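If a failover is needed, sentinelStartFailover is called to update the master's failover state (see section 7), and the function returns 1. The caller then asks the other Sentinels for their view of the master with SENTINEL_ASK_FORCED.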
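The three checks and the back-off arithmetic fit in two small functions. This is a sketch with a hypothetical sk_master type and a caller-supplied time, not the Redis code.

```c
#define SK_O_DOWN               1  /* stands in for SRI_O_DOWN */
#define SK_FAILOVER_IN_PROGRESS 2  /* stands in for SRI_FAILOVER_IN_PROGRESS */

typedef struct {
    int flags;
    long long failover_start_time;  /* ms, start of the last attempt */
    long long failover_timeout;     /* ms */
} sk_master;

/* Returns 1 if a new failover may start at time `now` (ms). */
int should_start_failover(const sk_master *m, long long now) {
    if (!(m->flags & SK_O_DOWN)) return 0;              /* 1) still ODOWN? */
    if (m->flags & SK_FAILOVER_IN_PROGRESS) return 0;   /* 2) already running? */
    if (now - m->failover_start_time < m->failover_timeout * 2)
        return 0;                                       /* 3) too recent */
    return 1;
}

/* Earliest time a new attempt may start, as printed by the
 * "Next failover delay" log line. */
long long next_failover_allowed_at(const sk_master *m) {
    return m->failover_start_time + m->failover_timeout * 2;
}
```

For example, take failover_start_time = 100000 and failover_timeout = 180000, the default failover-timeout of 3 minutes. next_failover_allowed_at returns 460000, and should_start_failover returns 1 only from that moment on.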

7. sentinelStartFailover: marking the start of a failover

It sets failover_state, flags, failover_epoch and failover_start_time:

master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
master->flags |= SRI_FAILOVER_IN_PROGRESS;
master->failover_epoch = ++sentinel.current_epoch;
sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);
sentinelEvent(LL_WARNING,"+try-failover",master,"%@");
master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
master->failover_state_change_time = mstime();

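Key points
1) Setting failover_state = SENTINEL_FAILOVER_STATE_WAIT_START marks the start of the failover.
2) master->failover_start_time is set to the current time plus a random offset of up to SENTINEL_MAX_DESYNC milliseconds. This keeps Sentinels that detect the failure together from starting their attempts in lockstep.
3) sentinel.current_epoch is incremented and used as the new failover_epoch. It is announced with +new-epoch, followed by +try-failover.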
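The state changes made by sentinelStartFailover can be mirrored in a few lines. In this sketch sk_master, start_failover and SK_MAX_DESYNC are hypothetical. SK_MAX_DESYNC is assumed to be 1000 ms, and the caller passes the current time instead of calling mstime().

```c
#include <stdlib.h>

#define SK_FAILOVER_IN_PROGRESS 2
#define SK_MAX_DESYNC 1000  /* ms, assumed value standing in for SENTINEL_MAX_DESYNC */

enum { SK_STATE_NONE, SK_STATE_WAIT_START };

typedef struct {
    int flags;
    int failover_state;
    unsigned long long failover_epoch;
    long long failover_start_time;         /* ms */
    long long failover_state_change_time;  /* ms */
} sk_master;

void start_failover(sk_master *m, unsigned long long *current_epoch, long long now_ms) {
    m->failover_state = SK_STATE_WAIT_START;
    m->flags |= SK_FAILOVER_IN_PROGRESS;
    m->failover_epoch = ++*current_epoch;  /* new epoch ("+new-epoch") */
    /* Random offset in [0, SK_MAX_DESYNC) ms to desynchronize Sentinels. */
    m->failover_start_time = now_ms + rand() % SK_MAX_DESYNC;
    m->failover_state_change_time = now_ms;
}
```

Each call increments the shared epoch, so a second attempt on the same master gets a strictly larger failover_epoch.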

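8. sentinelFailoverStateMachine: driving the failover

The function first checks that SRI_FAILOVER_IN_PROGRESS is set and returns if it is not. It then dispatches on the current failover state. Right after sentinelStartFailover, that state is SENTINEL_FAILOVER_STATE_WAIT_START: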

switch(ri->failover_state) {
    case SENTINEL_FAILOVER_STATE_WAIT_START:
        sentinelFailoverWaitStart(ri);
        break;
    case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
        sentinelFailoverSelectSlave(ri);
        break;
    case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
        sentinelFailoverSendSlaveOfNoOne(ri);
        break;
    case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
        sentinelFailoverWaitPromotion(ri);
        break;
    case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
        sentinelFailoverReconfNextSlave(ri);
        break;
}

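Key points:
1) Each failover_state value maps to its own handler.
2) Finishing one state does not run the next one immediately. The handler only updates failover_state. The next state is processed on a later timer tick, when the flow runs again from step 1.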
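A toy state machine illustrates the one-state-per-tick behaviour. The enum and the sk_master type are hypothetical, and every handler here succeeds immediately. In Redis a handler may also stay in its state while it waits, for example until the promotion shows up in INFO. A handler may also abort the failover.

```c
enum sk_failover_state {
    SK_WAIT_START,
    SK_SELECT_SLAVE,
    SK_SEND_SLAVEOF_NOONE,
    SK_WAIT_PROMOTION,
    SK_RECONF_SLAVES,
    SK_UPDATE_CONFIG
};

typedef struct {
    int in_progress;  /* stands in for SRI_FAILOVER_IN_PROGRESS */
    enum sk_failover_state state;
} sk_master;

/* One timer tick: run only the handler for the current state. Each toy
 * handler moves straight to the next state; nothing chains within a tick. */
void failover_tick(sk_master *m) {
    if (!m->in_progress) return;
    switch (m->state) {
    case SK_WAIT_START:         m->state = SK_SELECT_SLAVE;       break;
    case SK_SELECT_SLAVE:       m->state = SK_SEND_SLAVEOF_NOONE; break;
    case SK_SEND_SLAVEOF_NOONE: m->state = SK_WAIT_PROMOTION;     break;
    case SK_WAIT_PROMOTION:     m->state = SK_RECONF_SLAVES;      break;
    case SK_RECONF_SLAVES:      m->state = SK_UPDATE_CONFIG;      break;
    case SK_UPDATE_CONFIG:      break;  /* handled outside, as in section 2 */
    }
}
```

Starting from SK_WAIT_START, five ticks are needed to reach SK_UPDATE_CONFIG, because each tick advances the failover by at most one step.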

9. sentinelFailoverWaitStart: electing the Sentinel leader

Current state: SENTINEL_FAILOVER_STATE_WAIT_START.

void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
    char *leader;
    int isleader;

    /* Check if we are the leader for the failover epoch. */
    leader = sentinelGetLeader(ri, ri->failover_epoch);
    isleader = leader && strcasecmp(leader,sentinel.myid) == 0;
    sdsfree(leader);

    /* If I'm not the leader, and it is not a forced failover via
     * SENTINEL FAILOVER, then I can't continue with the failover. */
    if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
        int election_timeout = SENTINEL_ELECTION_TIMEOUT;

        /* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
         * and the configured failover timeout. */
        if (election_timeout > ri->failover_timeout)
            election_timeout = ri->failover_timeout;
        /* Abort the failover if I'm not the leader after some time. */
        if (mstime() - ri->failover_start_time > election_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-not-elected",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }
    sentinelEvent(LL_WARNING,"+elected-leader",ri,"%@");
    if (sentinel.simfailure_flags & SENTINEL_SIMFAILURE_CRASH_AFTER_ELECTION)
        sentinelSimFailureCrash();
    ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
    ri->failover_state_change_time = mstime();
    sentinelEvent(LL_WARNING,"+failover-state-select-slave",ri,"%@");
}

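Key points
1) election_timeout defaults to 10 s (SENTINEL_ELECTION_TIMEOUT), and the smaller of this and failover_timeout is used. The failover is aborted with -failover-abort-not-elected if two things hold: this Sentinel has not been elected within that time, and the failover was not forced with SENTINEL FAILOVER.
2) Once elected, the state moves to SENTINEL_FAILOVER_STATE_SELECT_SLAVE.
3) The leader is chosen by sentinelGetLeader.
4) failover_state_change_time is set to the current time. failover_start_time is not modified here.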
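The decision made on each tick in the WAIT_START state can be written as a pure function. This is a sketch, and wait_start_decision, the result enum and SK_FORCE_FAILOVER are hypothetical names. SK_ELECTION_TIMEOUT mirrors the 10-second SENTINEL_ELECTION_TIMEOUT. Leadership is passed in rather than computed by sentinelGetLeader.

```c
#define SK_FORCE_FAILOVER   1      /* stands in for SRI_FORCE_FAILOVER */
#define SK_ELECTION_TIMEOUT 10000  /* ms, mirrors SENTINEL_ELECTION_TIMEOUT */

enum sk_wait_result { SK_KEEP_WAITING, SK_ABORT, SK_PROCEED };

enum sk_wait_result wait_start_decision(int is_leader, int flags, long long now,
                                        long long failover_start_time,
                                        long long failover_timeout) {
    /* The leader, or a forced failover, moves on to replica selection. */
    if (is_leader || (flags & SK_FORCE_FAILOVER)) return SK_PROCEED;

    /* Effective timeout = min(SK_ELECTION_TIMEOUT, failover_timeout). */
    long long election_timeout = SK_ELECTION_TIMEOUT;
    if (election_timeout > failover_timeout) election_timeout = failover_timeout;

    /* Not elected in time: give up ("-failover-abort-not-elected"). */
    if (now - failover_start_time > election_timeout) return SK_ABORT;
    return SK_KEEP_WAITING;
}
```

With the default failover_timeout of 180000 ms, the effective election timeout is 10000 ms. If failover_timeout were set to 5000 ms, a non-leader would give up after 5000 ms instead.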

10. sentinelFailoverSelectSlave

Once the state is SENTINEL_FAILOVER_STATE_SELECT_SLAVE, the new master is chosen from the replicas:

void sentinelFailoverSelectSlave(sentinelRedisInstance *ri) {
    /* Pick the best replica to promote. */
    sentinelRedisInstance *slave = sentinelSelectSlave(ri);

    /* We don't handle the timeout in this state as the function aborts
     * the failover or go forward in the next state. */
    if (slave == NULL) {
        sentinelEvent(LL_WARNING,"-failover-abort-no-good-slave",ri,"%@");
        sentinelAbortFailover(ri);
    } else {
        /* Mark the replica as promoted and move to
         * SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE. */
        sentinelEvent(LL_WARNING,"+selected-slave",slave,"%@");
        slave->flags |= SRI_PROMOTED;
        ri->promoted_slave = slave;
        ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;
        ri->failover_state_change_time = mstime();
        sentinelEvent(LL_NOTICE,"+failover-state-send-slaveof-noone",
            slave, "%@");
    }
}
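sentinelSelectSlave itself is not shown above. As a rough sketch of its behaviour, the function below filters out unusable replicas and then orders the rest by three keys: lower replica priority first, then the larger replication offset, then the lexicographically smaller run ID. The types, flags and function names are hypothetical, and the filter is simplified. The real code also skips replicas whose PING or INFO replies are stale, or whose link to the old master has been down too long.

```c
#include <stdlib.h>
#include <string.h>

#define SK_S_DOWN       1  /* stands in for SRI_S_DOWN */
#define SK_DISCONNECTED 2  /* hypothetical flag for a broken link */

typedef struct {
    const char *runid;               /* assumed non-NULL in this sketch */
    int flags;
    int slave_priority;              /* 0 means "never promote" */
    unsigned long long repl_offset;
} sk_replica;

/* qsort comparator over an array of sk_replica pointers. */
static int compare_for_promotion(const void *a, const void *b) {
    const sk_replica *x = *(const sk_replica * const *)a;
    const sk_replica *y = *(const sk_replica * const *)b;

    if (x->slave_priority != y->slave_priority)
        return x->slave_priority < y->slave_priority ? -1 : 1;
    if (x->repl_offset != y->repl_offset)
        return x->repl_offset > y->repl_offset ? -1 : 1;  /* more data first */
    return strcmp(x->runid, y->runid);
}

/* Returns the best candidate, or NULL (the caller then aborts with
 * "-failover-abort-no-good-slave"). */
sk_replica *select_replica(sk_replica *replicas, size_t n) {
    sk_replica **cand = malloc(n * sizeof *cand);
    size_t k = 0;

    if (cand == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        sk_replica *r = &replicas[i];
        if (r->flags & (SK_S_DOWN | SK_DISCONNECTED)) continue;
        if (r->slave_priority == 0) continue;
        cand[k++] = r;
    }

    sk_replica *best = NULL;
    if (k > 0) {
        qsort(cand, k, sizeof *cand, compare_for_promotion);
        best = cand[0];
    }
    free(cand);
    return best;
}
```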