1. Failover
In (32) we analyzed the code for the SELECT_SLAVE state of the failover and noted that sentinelFailoverSelectSlave changes failover_state to SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE. In this state, sentinelFailoverStateMachine calls sentinelFailoverSendSlaveOfNoOne, which reads as follows:
void sentinelFailoverSendSlaveOfNoOne(sentinelRedisInstance *ri) {
    int retval;

    /* We can't send the command to the promoted slave if it is now
     * disconnected. Retry again and again with this state until the timeout
     * is reached, then abort the failover. */
    if (ri->promoted_slave->link->disconnected) {
        if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }

    /* Send SLAVEOF NO ONE command to turn the slave into a master.
     * We actually register a generic callback for this command as we don't
     * really care about the reply. We check if it worked indirectly observing
     * if INFO returns a different role (master instead of slave). */
    retval = sentinelSendSlaveOf(ri->promoted_slave,NULL,0);
    if (retval != C_OK) return;
    sentinelEvent(LL_NOTICE, "+failover-state-wait-promotion",
        ri->promoted_slave,"%@");
    ri->failover_state = SENTINEL_FAILOVER_STATE_WAIT_PROMOTION;
    ri->failover_state_change_time = mstime();
}
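First comes the if statement at the top of the function, which checks whether the link to the promoted slave is still connected. If the link is disconnected, the function returns without sending anything, so the same state is retried on the next pass. Only when more than failover_timeout has elapsed since failover_state_change_time does it log -failover-abort-slave-timeout and call sentinelAbortFailover to abort the failover.
Next comes the call to sentinelSendSlaveOf, which sends the slaveof no one command to the promoted slave. If the send fails (retval != C_OK), the function returns immediately. The remaining code logs the +failover-state-wait-promotion event and performs some assignments. The key one sets failover_state to SENTINEL_FAILOVER_STATE_WAIT_PROMOTION, after which failover_state_change_time is reset to the current time.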
The sentinelSendSlaveOf method that actually sends the command is as follows:
int sentinelSendSlaveOf(sentinelRedisInstance *ri, char *host, int port) {
    char portstr[32];
    int retval;

    ll2string(portstr,sizeof(portstr),port);

    /* If host is NULL we send SLAVEOF NO ONE that will turn the instance
     * into a master. */
    if (host == NULL) {
        host = "NO";
        memcpy(portstr,"ONE",4);
    }

    /* In order to send SLAVEOF in a safe way, we send a transaction performing
     * the following tasks:
     * 1) Reconfigure the instance according to the specified host/port params.
     * 2) Rewrite the configuration.
     * 3) Disconnect all clients (but this one sending the commnad) in order
     *    to trigger the ask-master-on-reconnection protocol for connected
     *    clients.
     *
     * Note that we don't check the replies returned by commands, since we
     * will observe instead the effects in the next INFO output. */
    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"MULTI"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s %s %s",
        sentinelInstanceMapCommand(ri,"SLAVEOF"),
        host, portstr);
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s REWRITE",
        sentinelInstanceMapCommand(ri,"CONFIG"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    /* CLIENT KILL TYPE <type> is only supported starting from Redis 2.8.12,
     * however sending it to an instance not understanding this command is not
     * an issue because CLIENT is variadic command, so Redis will not
     * recognized as a syntax error, and the transaction will not fail (but
     * only the unsupported command will fail). */
    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s KILL TYPE normal",
        sentinelInstanceMapCommand(ri,"CLIENT"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"EXEC"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    return C_OK;
}
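First, the if (host == NULL) block checks the host argument: if host is NULL, the arguments of the slaveof command are set to NO ONE. Looking back at the call site in sentinelFailoverSendSlaveOfNoOne, sentinelSendSlaveOf(ri->promoted_slave,NULL,0), we can see that the host passed in is indeed NULL.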
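How the NULL host turns into the literal arguments NO ONE can be shown in isolation. This is a minimal sketch, not the Redis implementation: demo_format_slaveof is a hypothetical helper, and snprintf stands in for Redis's ll2string and for the formatting that redisAsyncCommand performs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the SLAVEOF command text into `out`.
 * Returns the value of snprintf (length of the formatted command). */
int demo_format_slaveof(char *out, size_t outlen, const char *host, int port) {
    char portstr[32];

    assert(out != NULL && outlen > 0);
    /* Stand-in for ll2string(): render the port as a decimal string. */
    snprintf(portstr, sizeof(portstr), "%d", port);

    /* A NULL host means "promote to master": the two arguments become NO ONE. */
    if (host == NULL) {
        host = "NO";
        memcpy(portstr, "ONE", 4);  /* three letters plus the terminating NUL */
    }
    return snprintf(out, outlen, "SLAVEOF %s %s", host, portstr);
}
```

Calling demo_format_slaveof(buf, sizeof(buf), NULL, 0) fills buf with SLAVEOF NO ONE, while a non-NULL host such as 10.0.0.5 with port 6379 gives SLAVEOF 10.0.0.5 6379. The memcpy length of 4 matters: it copies the NUL terminator, so whatever the port was formatted as is overwritten cleanly.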
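Then come the five redisAsyncCommand calls, which are very similar to one another. Their job is to send five commands to the slave: MULTI, SLAVEOF, CONFIG, CLIENT, and EXEC. After each successful send, link->pending_commands is incremented; if any send returns C_ERR, the function returns at once. All five commands are sent through redisAsyncCommand, and the reply handler registered for each of them is sentinelDiscardReplyCallback. Its content is as follows: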
/* Just discard the reply. We use this when we are not monitoring the return
 * value of the command but its effects directly. */
void sentinelDiscardReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
    instanceLink *link = c->data;
    UNUSED(reply);
    UNUSED(privdata);

    if (link) link->pending_commands--;
}
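As we can see, this method does not process the reply at all; it only decrements pending_commands on the link. The comment in sentinelSendSlaveOf explains why: the outcome of these commands is confirmed from the replies to the INFO command that the sentinel sends to the instance periodically.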
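Now for the five commands themselves. The first is MULTI, which marks the start of a transaction block. The commands inside the block are placed in a queue in order and are finally executed atomically by EXEC. Next is SLAVEOF, whose effect was analyzed in an earlier article. Then comes CONFIG; what is sent here is CONFIG REWRITE, which rewrites the configuration file. After that is CLIENT; what is sent here is CLIENT KILL TYPE normal, which disconnects client connections so that clients ask for the master again when they reconnect. The last is EXEC, which executes the commands in the transaction block.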
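The bookkeeping around the five commands (increment pending_commands per send, decrement it per discarded reply) can be modelled without a network connection. The sketch below rests on stated assumptions: demo_link, demo_async_send, demo_discard_reply, and demo_send_transaction are hypothetical names, and the mock send only counts commands where the real code calls redisAsyncCommand.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OK 0
#define DEMO_ERR -1

/* Hypothetical stand-in for the instance link. */
typedef struct {
    int connected;         /* 0 makes every send fail */
    int pending_commands;  /* commands sent but not yet answered */
    int sent;              /* total commands handed to the mock connection */
} demo_link;

/* Mock asynchronous send: records the command instead of writing to a socket. */
static int demo_async_send(demo_link *link, const char *cmd) {
    (void)cmd;
    if (!link->connected) return DEMO_ERR;
    link->sent++;
    return DEMO_OK;
}

/* Counterpart of the discard callback: ignore the reply, fix the counter. */
void demo_discard_reply(demo_link *link) {
    assert(link->pending_commands > 0);
    link->pending_commands--;
}

/* Queue the five-command transaction, stopping at the first send error. */
int demo_send_transaction(demo_link *link) {
    static const char *cmds[] = {
        "MULTI", "SLAVEOF NO ONE", "CONFIG REWRITE",
        "CLIENT KILL TYPE normal", "EXEC"
    };
    for (size_t i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++) {
        if (demo_async_send(link, cmds[i]) == DEMO_ERR) return DEMO_ERR;
        link->pending_commands++;  /* one outstanding reply per sent command */
    }
    return DEMO_OK;
}
```

After a successful demo_send_transaction the counter stands at 5, and five calls to demo_discard_reply bring it back to 0, which mirrors how the real counter returns to its previous value once every reply has arrived.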
That completes the analysis of the SEND_SLAVEOF_NOONE state. Next is the SENTINEL_FAILOVER_STATE_WAIT_PROMOTION state, in which sentinelFailoverStateMachine calls sentinelFailoverWaitPromotion:
void sentinelFailoverWaitPromotion(sentinelRedisInstance *ri) {
    /* Just handle the timeout. Switching to the next state is handled
     * by the function parsing the INFO command of the promoted slave. */
    if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
        sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
        sentinelAbortFailover(ri);
    }
}
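This function contains a single if statement that checks for a timeout: if more than failover_timeout has elapsed since the last state change, the failover is aborted. As the comment points out, the switch to the next state does not happen here; it is handled by the function that parses the INFO reply of the promoted slave.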