[Redis 6.0.8] Source Code Analysis of Redis Master-Replica Replication (Part 1)

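Contents

0. Reading and References
1. Source code analysis of the slave side
1.1 Replication-related members of the redisServer struct
1.2 Starting from the slaveof command
1.3 A look at replicaofCommand
1.3.1 Implementation of replicaofCommand
1.3.2 The key sub-step: replicationSetMaster
1.3.3 The range of server.repl_state values
1.4 Questions raised while reading
1.5 A look at replicationCron
1.5.1 serverCron calls replicationCron
1.5.2 Implementation of replicationCron
1.5.3 connectWithMaster: establishing the connection between master and replica
1.5.4 Implementation of syncWithMaster
1.5.5 The replica sends the PING command to the master
1.5.6 The replica receives and parses the master's reply to PING
1.5.7 Sending the authentication information
1.5.8 Receiving the master's reply to the authentication request
1.5.9 Sending the port number to the master
1.5.10 Receiving and handling the master's reply to the port number
1.5.11 Sending the IP to the master
1.5.12 Receiving and handling the master's reply to the IP
1.5.13 Sending CAPA (capabilities) to the master
1.5.14 Receiving and handling the master's reply to CAPA
1.5.15 Sending the PSYNC command to the master
2. Open questions
2.1 How to know whether the master supports partial resynchronization
2.2 Tracing the replication logic through the code
2.3 Tracing the partial resynchronization logic through the code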


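0. Reading and References

menwen: Detailed analysis of the Redis replication (replicate) source code

巨大泰迪: Redis master-replica replication

chenssy: Redis master-replica replication

"Redis Development and Operations" (《Redis开发与运维》), Chapter 6

"Redis 5 Design and Source Code Analysis" (《Redis5 设计与源码分析》), Chapter 21: Master-Replica Replication

"Redis Design and Implementation" (《Redis 设计与实现》), Chapter 15: Replication

"Redis Deep Adventure" (《Redis深度历险》), Principle 8: Master-Replica Synchronization

Huang Jianhong (黄健宏): annotated Redis 3.0 source code

Search this article for the keyword "TBD" (【待确定】 in the original notes) to find the points that are still open, for easier review.

A well-written article with very good diagrams

谈谈1974: article on the master side

谈谈1974: article on the slave side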

1. Source code analysis of the slave side

1.1 Replication-related members of the redisServer struct

struct redisServer {
...
    /* List of all replicas; the value of each list node is a client */
    /* List of all MONITOR clients; the value of each list node is also a client */
    list *slaves, *monitors;    /* List of slaves and MONITORs */ 
 
    /* Replication (master) */
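    /* Current replication ID (a replication ID, not the server run ID) */
    char replid[CONFIG_RUN_ID_SIZE+1];  /* My current replication ID.  */
    /* Secondary replication ID, inherited from the previous master */
    char replid2[CONFIG_RUN_ID_SIZE+1]; /* replid inherited from master */
    /* Current replication offset of this server */
    long long master_repl_offset;   /* My current replication offset */
    /* PSYNC requests that use replid2 are accepted only up to this offset */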
    long long second_replid_offset; /* Accept offsets up to this for replid2*/
    
    /* TBD */
    int slaveseldb;                 /* Last SELECTed DB in replication output */
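    /* 
      Heartbeat period. Master and replica exchange data over a long-lived
      TCP connection, so heartbeats must be sent periodically to check that
      the connection is still alive; the master PINGs all of its replicas
      with this period. Set with repl-ping-replica-period or
      repl-ping-slave-period; the default is 10 (seconds).
    */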
    int repl_ping_slave_period;     /* Master pings the slave every N seconds */
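    /* 
      Replication backlog: buffers the command stream the master has already
      executed so that it can be sent to replicas during a partial resync.
      Its size is held in repl_backlog_size, which is set with
      repl-backlog-size and defaults to 1MB.
    */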
    char *repl_backlog;             /* Replication backlog for partial syncs */
    /* Size of the replication backlog */
    long long repl_backlog_size;    /* Backlog circular buffer size */
    /* Length of the command data currently stored in the backlog */
    long long repl_backlog_histlen; /* Backlog actual data length */
    
    /* Current write index into the backlog: the next byte written to the
       backlog is stored at this position */
    long long repl_backlog_idx;     /* Backlog circular buffer current offset,
                                       that is the next byte will'll write to.*/
    /* Replication offset of the first byte stored in the backlog */
    long long repl_backlog_off;     /* Replication "master offset" of first
                                       byte in the replication backlog buffer.*/
    /* TBD */
    time_t repl_backlog_time_limit; /* Time without slaves after the backlog
                                       gets released. */
    /* Unix time since which this server has had no replicas */
    time_t repl_no_slaves_since;    /* We have no slaves since that time.
                                       Only valid if server.slaves len is 0. */
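    /* When the number of good replicas is below this value, the master refuses write commands */
    int repl_min_slaves_to_write;   /* Min number of slaves to write. */
    /* Lag threshold beyond which a replica is no longer counted as good */
    int repl_min_slaves_max_lag;    /* Max lag of <count> slaves to write. */
    /* 
        Number of replicas currently considered good.
        What makes a replica good? Master and replica exchange data over a
        long-lived TCP connection and use heartbeats to check that it is
        alive. The master records the time of each replica's last successful
        heartbeat in repl_ack_time and periodically checks whether the time
        elapsed since repl_ack_time exceeds a threshold; if it does, the
        replica is treated as failed. repl_min_slaves_max_lag holds that
        threshold. It is set with min-slaves-max-lag or min-replicas-max-lag
        and defaults to 10 seconds.
     */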
    int repl_good_slaves_count;     /* Number of slaves with lag <= max_lag. */
    /* TBD: to be studied */
    int repl_diskless_sync;      /* Master send RDB to slaves sockets directly. */
    /* TBD: to be studied */
    int repl_diskless_load;         /* Slave parse RDB directly from the socket.
                                     * see REPL_DISKLESS_LOAD_* enum */
    /* TBD: to be studied */
    int repl_diskless_sync_delay;   /* Delay to start a diskless repl BGSAVE. */
    /* Replication (slave) */
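    /* User name used to authenticate with the master */
    char *masteruser;         /* AUTH with this user and masterauth with master */
    /* Password used to authenticate with the master (for masteruser when one
       is set). When the master is configured with "requirepass <password>",
       a replica must authenticate before it can synchronize data, so the
       replica needs a matching "masterauth <master-password>" setting.
     */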
   
    char *masterauth;               /* AUTH with this password with master */
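    /* Host name or IP of the master */
    char *masterhost;               /* Hostname of master */
    /* Port of the master */
    int masterport;                 /* Port of master */
    /* TBD: to be studied */
    int repl_timeout;               /* Timeout after N seconds of master idle */

    /* Once the connection is established, the replica acts as a client of
       the master, and likewise the master is represented on the replica as
       a client. master is that client, of type client.
    */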
    client *master;     /* Client that is master for this slave */
    client *cached_master; /* Cached master to be reused for PSYNC. */
    int repl_syncio_timeout; /* Timeout for synchronous I/O calls */
    /* Progress of the replication flow (the replica-side state) */
    int repl_state;          /* Replication status if the instance is a slave */
    off_t repl_transfer_size; /* Size of RDB to read from master during sync. */
    off_t repl_transfer_read; /* Amount of RDB read from master during sync. */
    off_t repl_transfer_last_fsync_off; /* Offset when we fsync-ed last time. */
    connection *repl_transfer_s;     /* Slave -> Master SYNC connection */
    int repl_transfer_fd;    /* Slave -> Master SYNC temp file descriptor */
    char *repl_transfer_tmpfile; /* Slave-> master SYNC temp file name */
    time_t repl_transfer_lastio; /* Unix time of the latest read, for timeout */
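    /* Whether the replica keeps serving command requests while the link to
       the master is down. Set with slave-serve-stale-data or
       replica-serve-stale-data; the default is 1, i.e. it keeps serving.
    */
    int repl_serve_stale_data; /* Serve stale data when link is down? */
    /* Whether the replica is read-only (it rejects write commands unless
       they come from its master). Set with slave-read-only or
       replica-read-only; the default is 1, i.e. the replica does not accept
       write commands from ordinary clients.
    */
    int repl_slave_ro;          /* Slave is read only? */
    /* Whether the replica ignores maxmemory, i.e. does not evict keys on its own */
    int repl_slave_ignore_maxmemory;    /* If true slaves do not evict. */
    /* Unix time at which the link with the master went down */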
    time_t repl_down_since; /* Unix time at which link with master went down */
    int repl_disable_tcp_nodelay;   /* Disable TCP_NODELAY after SYNC? */
    int slave_priority;             /* Reported in INFO and used by Sentinel. */
    int slave_announce_port;        /* Give the master this listening port. */
    char *slave_announce_ip;        /* Give the master this ip address. */
    /* The following two fields is where we store master PSYNC replid/offset
     * while the PSYNC is in progress. At the end we'll copy the fields into
     * the server->master client structure. */
    char master_replid[CONFIG_RUN_ID_SIZE+1];  /* Master PSYNC runid. */
    long long master_initial_offset;           /* Master PSYNC offset. */
    int repl_slave_lazy_flush;          /* Lazy FLUSHALL before loading DB? */
    /* Replication script cache. */
    dict *repl_scriptcache_dict;        /* SHA1 all slaves are aware of. */
    list *repl_scriptcache_fifo;        /* First in, first out LRU eviction. */
    unsigned int repl_scriptcache_size; /* Max number of elements. */
    /* Synchronous replication. */
    list *clients_waiting_acks;         /* Clients waiting in WAIT command. */
    int get_ack_from_slaves;            /* If true we send REPLCONF GETACK. */
 
 
...
}
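
The four backlog fields (repl_backlog_size, repl_backlog_idx, repl_backlog_histlen, repl_backlog_off) are easier to follow with a small model. The sketch below is not the Redis code: the backlog struct, backlog_init and backlog_feed are hypothetical names, and the 8-byte size is chosen only so that wrap-around is easy to see. It assumes the bookkeeping works as the field comments describe: idx is the next write position, histlen is capped at the buffer size, and off is the replication offset of the oldest byte still stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BACKLOG_SIZE 8 /* tiny on purpose; the real default is 1MB */

/* Hypothetical stand-in for the repl_backlog_* fields of redisServer. */
typedef struct {
    char buf[BACKLOG_SIZE];
    long long size;               /* like repl_backlog_size */
    long long idx;                /* like repl_backlog_idx: next write position */
    long long histlen;            /* like repl_backlog_histlen */
    long long off;                /* like repl_backlog_off */
    long long master_repl_offset; /* like master_repl_offset */
} backlog;

void backlog_init(backlog *b) {
    assert(b != NULL);
    memset(b, 0, sizeof(*b));
    b->size = BACKLOG_SIZE;
    b->off = 1; /* the first byte ever written has offset 1 */
}

/* Append len bytes, wrapping around at the end of the buffer. */
void backlog_feed(backlog *b, const char *p, size_t len) {
    assert(b != NULL && (p != NULL || len == 0));
    b->master_repl_offset += (long long)len;
    while (len) {
        size_t thislen = (size_t)(b->size - b->idx);
        if (thislen > len) thislen = len;
        memcpy(b->buf + b->idx, p, thislen);
        b->idx += (long long)thislen;
        if (b->idx == b->size) b->idx = 0; /* wrap around */
        len -= thislen;
        p += thislen;
        b->histlen += (long long)thislen;
    }
    if (b->histlen > b->size) b->histlen = b->size;
    /* Offset of the oldest byte that is still in the buffer. */
    b->off = b->master_repl_offset - b->histlen + 1;
}
```

Feeding "abcdef" and then "ghij" into this 8-byte model leaves idx at 2, histlen at 8 and off at 3: the bytes with offsets 1 and 2 have been overwritten, so a replica asking for them could no longer be served from the backlog.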
 

The function refreshGoodSlavesCount implements this check of which replicas are still good.
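
As a rough illustration of that check, here is a minimal sketch under simplifying assumptions. replica_info and count_good_replicas are hypothetical names: the real function walks server.slaves and reads the fields from each client, while this version takes a plain array. The rule it applies is the one described in the struct comments: a replica counts as good when it is online and the time since its repl_ack_time does not exceed the max-lag threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for the per-replica fields of client. */
typedef struct {
    int online;           /* 1 once the replica is fully synchronized */
    time_t repl_ack_time; /* time of the last acknowledgement received */
} replica_info;

/* Count the replicas that are online and whose lag is within max_lag. */
int count_good_replicas(const replica_info *replicas, size_t n,
                        time_t now, time_t max_lag) {
    assert(replicas != NULL || n == 0);
    int good = 0;
    for (size_t i = 0; i < n; i++) {
        time_t lag = now - replicas[i].repl_ack_time;
        if (replicas[i].online && lag <= max_lag) good++;
    }
    return good;
}
```

The master can then compare the result with repl_min_slaves_to_write to decide whether to accept writes.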

1.2 Starting from the slaveof command

/* Surprisingly, the slaveof command is handled not by a slaveofCommand function but by replicaofCommand */
struct redisCommand redisCommandTable[] = {
...
   {"slaveof",replicaofCommand,3,
     "admin no-script ok-stale",
     0,NULL,0,0,0,0,0,0},
/* Note: both commands map to the same function */
    {"replicaof",replicaofCommand,3,
     "admin no-script ok-stale",
     0,NULL,0,0,0,0,0,0},
...
}
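
The two table entries above differ only in name. The following sketch shows the same idea in isolation: two command names bound to one handler. Everything in it is hypothetical (command_entry, lookup_command, replicaof_handler), and the linear scan is a simplification chosen for brevity, not a description of how Redis looks commands up.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

typedef int (*command_proc)(void);

/* Hypothetical handler standing in for replicaofCommand. */
static int replicaof_handler(void) { return 1; }

typedef struct {
    const char *name;
    command_proc proc;
    int arity;
} command_entry;

/* Two names, one handler, same arity. */
static const command_entry command_table[] = {
    {"slaveof", replicaof_handler, 3},
    {"replicaof", replicaof_handler, 3},
};

/* Case-insensitive lookup; returns NULL for an unknown command. */
const command_entry *lookup_command(const char *name) {
    assert(name != NULL);
    size_t n = sizeof(command_table) / sizeof(command_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(command_table[i].name, name) == 0)
            return &command_table[i];
    }
    return NULL;
}
```

Whichever name the client types, lookup_command returns an entry whose proc is the same function pointer, which is why the rest of this article only needs to read replicaofCommand.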

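1.3 A look at replicaofCommand

1.3.1 Implementation of replicaofCommand

replicaofCommand does the following:

  1. It checks whether the server is running in cluster mode. If so, the command cannot be executed: an error is returned and the function exits.
  2. If the command is SLAVEOF NO ONE, it ends the master-replica relationship and turns the current node into a master.
  3. It checks whether this server is already a replica of the master identified by the given host and port. If so, there is nothing to do: a notice is returned and the function exits.
  4. In every other case it calls replicationSetMaster, which makes the server that executes SLAVEOF a replica of the master identified by the given host and port.

/* Implementation of the SLAVEOF / REPLICAOF host port command */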
void replicaofCommand(client *c) {
    /* SLAVEOF is not allowed in cluster mode as replication is automatically
     * configured using the current address of the master node. */
    /* The command is not allowed when the server runs in cluster mode */
    if (server.cluster_enabled) {
        addReplyError(c,"REPLICAOF not allowed in cluster mode.");
        return;
    }

    /* The special host/port combination "NO" "ONE" turns the instance
     * into a master. Otherwise the new master address is set. */
    /* SLAVEOF NO ONE stops replication on this replica and turns it back
       into a master; the dataset already synchronized is not discarded */
    if (!strcasecmp(c->argv[1]->ptr,"no") &&
        !strcasecmp(c->argv[2]->ptr,"one")) {
        /* If this server currently has a master configured */
        if (server.masterhost) {
            /* Stop replicating and turn this server into a master */
            replicationUnsetMaster();
            /* Build an sds string describing the client and write it to the log */
            sds client = catClientInfoString(sdsempty(),c);
            serverLog(LL_NOTICE,"MASTER MODE enabled (user request from '%s')",
                client);
            /* Free the sds string */
            sdsfree(client);
        }
    } else {
        /* Temporary variable for the port */
        long port;
        /* If the client issuing the command is itself a replica connection */
        if (c->flags & CLIENT_SLAVE)
        {
            /* If a client is already a replica they cannot run this command,
             * because it involves flushing all replicas (including this
             * client) */
            /* Reply with an error: the command is not valid when the client is a replica */
            addReplyError(c, "Command is not valid when client is a replica.");
            return;
        }
        /* Parse the port number */
        if ((getLongFromObjectOrReply(c, c->argv[2], &port, NULL) != C_OK))
            return;

        /* Check if we are already attached to the specified slave */
        /*
          If this server already has a master and the host and port given in
          the command equal server.masterhost and server.masterport, reply
          "Already connected to specified master" and return
        */
        if (server.masterhost && !strcasecmp(server.masterhost,c->argv[1]->ptr)
            && server.masterport == port) {
            serverLog(LL_NOTICE,"REPLICAOF would result into synchronization "
                                "with the master we are already connected "
                                "with. No operation performed.");
            addReplySds(c,sdsnew("+OK Already connected to specified "
                                 "master\r\n"));
            return;
        }
        /* There was no previous master or the user specified a different one,
         * we can continue. */
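        /* Either this is the first time a master is configured, or the user
           asked to replicate from a different master; in both cases we can
           continue */
        /* Record the new master's host and port */
        replicationSetMaster(c->argv[1]->ptr, port);
        /* Build an sds string describing the client, log it, then free it */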
        sds client = catClientInfoString(sdsempty(),c);
        serverLog(LL_NOTICE,"REPLICAOF %s:%d enabled (user request from '%s')",
            server.masterhost, server.masterport, client);
        sdsfree(client);
    }
    /* Reply OK */
    addReply(c,shared.ok);
}
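
The branching in replicaofCommand can be condensed into a pure function that only decides what to do. This is a sketch for reading along, not Redis code: replicaof_action and decide_replicaof are hypothetical names, and the port is parsed with strtol, which is more lenient than the string2ll helper Redis uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical outcome codes, one per branch of replicaofCommand. */
typedef enum {
    REPLICAOF_ERR_CLUSTER,        /* rejected: cluster mode */
    REPLICAOF_UNSET_MASTER,       /* "no one": act as a master */
    REPLICAOF_ERR_CLIENT_REPLICA, /* rejected: caller is a replica client */
    REPLICAOF_ERR_BAD_PORT,       /* rejected: port is not a number */
    REPLICAOF_ALREADY_CONNECTED,  /* same master as before: no-op */
    REPLICAOF_SET_MASTER          /* call replicationSetMaster */
} replicaof_action;

replicaof_action decide_replicaof(int cluster_enabled, int client_is_replica,
                                  const char *cur_host, int cur_port,
                                  const char *arg1, const char *arg2) {
    assert(arg1 != NULL && arg2 != NULL);
    if (cluster_enabled) return REPLICAOF_ERR_CLUSTER;
    if (!strcasecmp(arg1, "no") && !strcasecmp(arg2, "one"))
        return REPLICAOF_UNSET_MASTER;
    if (client_is_replica) return REPLICAOF_ERR_CLIENT_REPLICA;

    /* Simplified port parsing; the whole string must be a number. */
    char *end = NULL;
    errno = 0;
    long port = strtol(arg2, &end, 10);
    if (errno != 0 || end == arg2 || *end != '\0')
        return REPLICAOF_ERR_BAD_PORT;

    if (cur_host && !strcasecmp(cur_host, arg1) && cur_port == port)
        return REPLICAOF_ALREADY_CONNECTED;
    return REPLICAOF_SET_MASTER;
}
```

Note the order of the checks: the cluster test comes first, and the "already connected" test is only reached after the port has been parsed successfully.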



/* Below is the implementation of getLongFromObjectOrReply, which is called to parse the port */
int getLongFromObjectOrReply(client *c, robj *o, long *target, const char *msg) {
    long long value;

    if (getLongLongFromObjectOrReply(c, o, &value, msg) != C_OK) return C_ERR;
    if (value < LONG_MIN || value > LONG_MAX) {
        if (msg != NULL) {
            addReplyError(c,(char*)msg);
        } else {
            addReplyError(c,"value is out of range");
        }
        return C_ERR;
    }
    *target = value;
    return C_OK;
}

int getLongLongFromObject(robj *o, long long *target) {
    long long value;

    if (o == NULL) {
        value = 0;
    } else {
        serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);
        if (sdsEncodedObject(o)) {
            if (string2ll(o->ptr,sdslen(o->ptr),&value) == 0) return C_ERR;
        } else if (o->encoding == OBJ_ENCODING_INT) {
            value = (long)o->ptr;
        } else {
            serverPanic("Unknown string encoding");
        }
    }
    if (target) *target = value;
    return C_OK;
}

#define serverAssertWithInfo(_c,_o,_e) ((_e)?(void)0 : (_serverAssertWithInfo(_c,_o,#_e,__FILE__,__LINE__),_exit(1)))


void _serverAssertWithInfo(const client *c, const robj *o, const char *estr, const char *file, int line) {
    if (c) _serverAssertPrintClientInfo(c);
    if (o) _serverAssertPrintObject(o);
    _serverAssert(estr,file,line);
}

1.3.2 The key sub-step: replicationSetMaster

1.3.2.1
/* Set replication to the specified master address and port. */
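/* Make the current server a replica of the master at the given ip and port */
void replicationSetMaster(char *ip, int port) {
    /* == binds tighter than = */
    /* was_master is true when server.masterhost was NULL, i.e. this server
       was a master before the call */
    int was_master = server.masterhost == NULL;
    /* Free the previous value of server.masterhost */
    sdsfree(server.masterhost);
    /* Set the new values of server.masterhost and server.masterport */
    server.masterhost = sdsnew(ip);
    server.masterport = port;
    /* If server.master is not NULL.
       One way to picture it: suppose the current node is B and its master is
       A, and B now wants to become a replica of C. B must release what it
       holds about A. While B replicates from A the two keep a connection
       open, and B represents A as a client stored in server.master.
    */
    if (server.master) {
        freeClient(server.master);/* Free the client that represents the old master */
    }
    /* 
      Disconnect all blocked clients. They blocked while this instance was a
      master; now that it is becoming a replica of another server, their
      blocking operations can no longer be served here, so the connections
      are dropped.
    */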
    disconnectAllBlockedClients(); /* Clients blocked in master, now slave. */
 
    /* Update oom_score_adj */
    /* Update the OOM score adjustment value for the new role */
    setOOMScoreAdj(-1);
 
    /* Force our slaves to resync with us as well. They may hopefully be able
     * to partially resync with us, but we can notify the replid change. */
    /* Close the connections of all our replicas, forcing them to resync */
    disconnectSlaves();
    /* Cancel any replication handshake that is in progress */
    cancelReplicationHandshake();
    /* Before destroying our master state, create a cached master using
     * our own parameters, to later PSYNC with the new master. */
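    /* If this server was a master before the call (server.masterhost was NULL) */
    if (was_master) {
        /* Discard any previously cached master state; see 1.3.2.3 */
        replicationDiscardCachedMaster();
        /* Synthesize a cached master from our own replication ID and offset,
           so that a later PSYNC with the new master may need only a partial
           transfer. The client ends up in server.cached_master; see 1.3.2.4
         */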
        replicationCacheMasterUsingMyself();
    }
 
    /* Fire the role change modules event. */
    /* Fire the module event for the server role change */
    moduleFireServerEvent(REDISMODULE_EVENT_REPLICATION_ROLE_CHANGED,
                          REDISMODULE_EVENT_REPLROLECHANGED_NOW_REPLICA,
                          NULL);
 
    /* Fire the master link modules event. */
    /* If server.repl_state is REPL_STATE_CONNECTED, fire the module event signalling that the master link is down */
    if (server.repl_state == REPL_STATE_CONNECTED)
        moduleFireServerEvent(REDISMODULE_EVENT_MASTER_LINK_CHANGE,
                              REDISMODULE_SUBEVENT_MASTER_LINK_DOWN,
                              NULL);
 
    server.repl_state = REPL_STATE_CONNECT;
}

1.3.2.2
/*
This function aborts a non-blocking replication attempt that is in progress.
If a handshake (or the initial bulk transfer) was in progress, it returns 1
and sets server.repl_state to REPL_STATE_CONNECT; otherwise it returns 0 and
does nothing.
*/
/* This function aborts a non blocking replication attempt if there is one
 * in progress, by canceling the non-blocking connect attempt or
 * the initial bulk transfer.
 *
 * If there was a replication handshake in progress 1 is returned and
 * the replication state (server.repl_state) set to REPL_STATE_CONNECT.
 *
 * Otherwise zero is returned and no operation is performed at all. */
int cancelReplicationHandshake(void) {
    if (server.repl_state == REPL_STATE_TRANSFER) {
        replicationAbortSyncTransfer();
        server.repl_state = REPL_STATE_CONNECT;
    } else if (server.repl_state == REPL_STATE_CONNECTING ||
               slaveIsInHandshakeState())
    {
        undoConnectWithMaster();
        server.repl_state = REPL_STATE_CONNECT;
    } else {
        return 0;
    }
    return 1;
}


1.3.2.3
/* Free the cached master; called when the conditions for a partial resync on reconnection no longer hold */
/* Free a cached master, called when there are no longer the conditions for
 * a partial resync on reconnection. */
void replicationDiscardCachedMaster(void) {
    if (server.cached_master == NULL) return;

    serverLog(LL_NOTICE,"Discarding previously cached master state.");
    server.cached_master->flags &= ~CLIENT_MASTER;
    freeClient(server.cached_master);
    server.cached_master = NULL;
}



1.3.2.4
/* This function is called when a master is turned into a slave, in order to
 * create from scratch a cached master for the new client, that will allow
 * to PSYNC with the slave that was promoted as the new master after a
 * failover.
 *
 * Assuming this instance was previously the master instance of the new master,
 * the new master will accept its replication ID, and potentially also the
 * current offset if no data was lost during the failover. So we use our
 * current replication ID and offset in order to synthesize a cached master. */
void replicationCacheMasterUsingMyself(void) {
    serverLog(LL_NOTICE,
        "Before turning into a replica, using my own master parameters "
        "to synthesize a cached master: I may be able to synchronize with "
        "the new master with just a partial transfer.");

    /* This will be used to populate the field server.master->reploff
     * by replicationCreateMasterClient(). We'll later set the created
     * master as server.cached_master, so the replica will use such
     * offset for PSYNC. */
    server.master_initial_offset = server.master_repl_offset;

    /* The master client we create can be set to any DBID, because
     * the new master will start its replication stream with SELECT. */
    replicationCreateMasterClient(NULL,-1);

    /* Use our own ID / offset. */
    memcpy(server.master->replid, server.replid, sizeof(server.replid));

    /* Set as cached master. */
    unlinkClient(server.master);
    server.cached_master = server.master;
    server.master = NULL;
}
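
What the cached master is for becomes clearer once you see how it feeds the PSYNC request. The sketch below assumes the request format "PSYNC <replid> <offset>", where the offset is the cached reploff plus one, and "PSYNC ? -1" when there is no cached master and a full resync is requested. cached_master_info and format_psync are hypothetical names; the real logic lives in the replica-side PSYNC code, which is outside this part of the article.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define REPLID_LEN 40 /* matches CONFIG_RUN_ID_SIZE in the struct above */

/* Hypothetical stand-in for the two cached_master fields used by PSYNC. */
typedef struct {
    char replid[REPLID_LEN + 1];
    long long reploff; /* last offset this instance has processed */
} cached_master_info;

/* Write the PSYNC request into buf and return snprintf's result.
 * With no cached master, ask for a full resync. */
int format_psync(char *buf, size_t buflen, const cached_master_info *cm) {
    assert(buf != NULL && buflen > 0);
    if (cm == NULL) return snprintf(buf, buflen, "PSYNC ? -1");
    assert(strlen(cm->replid) == REPLID_LEN);
    /* Ask for the first byte we do not have yet. */
    return snprintf(buf, buflen, "PSYNC %s %lld", cm->replid, cm->reploff + 1);
}
```

This is why replicationCacheMasterUsingMyself copies server.replid and server.master_repl_offset into the synthesized client: a former master can then present its own history to the new master and hope for a partial transfer.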

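1.3.3 The range of server.repl_state values

As shown in 1.1, the repl_state member holds the replication state of the server when it is acting as a replica.
The source in 1.3.2 tests server.repl_state in many places. Understanding how server.repl_state moves from one state to the next helps in understanding the replication flow, so let us look at the states a replica can be in: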

/* Slave replication state. Used in server.repl_state for slaves to remember
 * what to do next. */
/* Replica state: tells the replica what to do next */
#define REPL_STATE_NONE 0 /* No active replication */
#define REPL_STATE_CONNECT 1 /* Must connect to master */
#define REPL_STATE_CONNECTING 2 /* Connecting to master */
/* --- Handshake states, must be ordered --- */
#define REPL_STATE_RECEIVE_PONG 3 /* Wait for PING reply */
#define REPL_STATE_SEND_AUTH 4 /* Send AUTH to master */
#define REPL_STATE_RECEIVE_AUTH 5 /* Wait for AUTH reply */
#define REPL_STATE_SEND_PORT 6 /* Send REPLCONF listening-port */
#define REPL_STATE_RECEIVE_PORT 7 /* Wait for REPLCONF reply */
#define REPL_STATE_SEND_IP 8 /* Send REPLCONF ip-address */
#define REPL_STATE_RECEIVE_IP 9 /* Wait for REPLCONF reply */
#define REPL_STATE_SEND_CAPA 10 /* Send REPLCONF capa */
#define REPL_STATE_RECEIVE_CAPA 11 /* Wait for REPLCONF reply */
#define REPL_STATE_SEND_PSYNC 12 /* Send PSYNC */
#define REPL_STATE_RECEIVE_PSYNC 13 /* Wait for PSYNC reply */
/* --- End of handshake states --- */
#define REPL_STATE_TRANSFER 14 /* Receiving .rdb from master */
#define REPL_STATE_CONNECTED 15 /* Connected to master */


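The states mean the following.
❏ REPL_STATE_NONE: replication is not enabled; the server is an ordinary Redis instance;
❏ REPL_STATE_CONNECT: a socket connection to the master still has to be initiated;
❏ REPL_STATE_CONNECTING: the non-blocking connect to the master has been started and the replica is waiting for it to complete;
❏ REPL_STATE_RECEIVE_PONG: the PING request has been sent and the replica is waiting for the master's PONG reply;
❏ REPL_STATE_SEND_AUTH: the authentication request still has to be sent;
❏ REPL_STATE_RECEIVE_AUTH: the request "AUTH <password>" has been sent and the replica is waiting for the master's reply;
❏ REPL_STATE_SEND_PORT: the port number still has to be sent;
❏ REPL_STATE_RECEIVE_PORT: the port has been sent with "REPLCONF listening-port <port>" and the replica is waiting for the master's reply;
❏ REPL_STATE_SEND_IP: the IP address still has to be sent;
❏ REPL_STATE_RECEIVE_IP: the IP address has been sent with "REPLCONF ip-address <ip>" and the replica is waiting for the master's reply. The master records this address and port so it can report them (for example in its INFO replication output); it does not use them to connect back to the replica, because all data flows over the connection the replica opened;
❏ REPL_STATE_SEND_CAPA: replication has been improved over time, so different Redis versions support different capabilities. The replica therefore tells the master which replication capabilities it supports, using "REPLCONF capa <capability>";
❏ REPL_STATE_RECEIVE_CAPA: waiting for the master's reply to the capabilities;
❏ REPL_STATE_SEND_PSYNC: the PSYNC command still has to be sent;
❏ REPL_STATE_RECEIVE_PSYNC: waiting for the master's reply to PSYNC;
❏ REPL_STATE_TRANSFER: receiving the RDB file;
❏ REPL_STATE_CONNECTED: the RDB file has been received and loaded and the replication link is established; from here on the replica only has to wait for the data the master propagates.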
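
The comment "Handshake states, must be ordered" in the listing matters because code can then test for "somewhere in the handshake" with a simple range comparison. The sketch below shows that idea together with the cancel logic from 1.3.2.2. The state values are copied from the listing; is_in_handshake_state and cancel_handshake are hypothetical names that model slaveIsInHandshakeState and cancelReplicationHandshake without the actual connection teardown.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the state listing above. */
#define REPL_STATE_CONNECT 1
#define REPL_STATE_CONNECTING 2
#define REPL_STATE_RECEIVE_PONG 3
#define REPL_STATE_RECEIVE_PSYNC 13
#define REPL_STATE_TRANSFER 14
#define REPL_STATE_CONNECTED 15

/* True for every state between the first and last handshake state.
 * This only works because the handshake states are numbered in order. */
int is_in_handshake_state(int repl_state) {
    return repl_state >= REPL_STATE_RECEIVE_PONG &&
           repl_state <= REPL_STATE_RECEIVE_PSYNC;
}

/* Model of the cancel logic: any attempt in progress is rolled back to
 * REPL_STATE_CONNECT and 1 is returned; otherwise nothing changes. */
int cancel_handshake(int *repl_state) {
    assert(repl_state != NULL);
    if (*repl_state == REPL_STATE_TRANSFER ||
        *repl_state == REPL_STATE_CONNECTING ||
        is_in_handshake_state(*repl_state)) {
        *repl_state = REPL_STATE_CONNECT;
        return 1;
    }
    return 0;
}
```

Rolling back to REPL_STATE_CONNECT rather than REPL_STATE_NONE means the next cron tick will try to connect again.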

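1.4 Questions raised while reading

From reading replicaofCommand in section 1.3, it appears to do only three things: clear the information about the old master, close old connections where necessary, and set server.repl_state to REPL_STATE_CONNECT (a socket connection to the master still has to be initiated).
That leaves two questions:
(1) When is the connection to the specified master actually established?
(2) How do the transitions of server.repl_state happen, and which states does it pass through?

Section 1.5 looks at both questions in detail.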

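1.5 A look at replicationCron

1.5.1 serverCron calls replicationCron

replicaofCommand does not itself open a connection to the master, which suggests that the connection is made asynchronously, most likely from a time event. Searching the time event handler serverCron shows that the replication work runs once per second: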

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
...
  /* Replication cron function -- used to reconnect to master,
   * detect transfer failures, start background RDB transfers and so forth. */
 run_with_period(1000) replicationCron();
...

}
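
serverCron itself runs server.hz times per second, so run_with_period(1000) has to skip most iterations. The sketch below models how such a guard could work. It assumes the condition has the shape "run if the period is no longer than one tick, or if the loop counter is a multiple of period/tick"; should_run and count_runs are hypothetical names, and hz is passed in rather than read from the server struct.

```c
#include <assert.h>

/* Decide whether a task with the given period runs on this cron loop.
 * hz is the number of cron iterations per second. */
int should_run(int period_ms, int hz, long long cronloops) {
    assert(period_ms > 0 && hz > 0);
    int tick_ms = 1000 / hz;
    if (period_ms <= tick_ms) return 1; /* run on every tick */
    return (cronloops % (period_ms / tick_ms)) == 0;
}

/* Count how often the task runs during the first `loops` iterations. */
int count_runs(int period_ms, int hz, long long loops) {
    int runs = 0;
    for (long long i = 0; i < loops; i++) {
        if (should_run(period_ms, hz, i)) runs++;
    }
    return runs;
}
```

With hz set to 10, a 1000 ms period fires on every tenth loop, so 100 loops (ten seconds) give ten runs, which matches the "called 1 time per second" comment on replicationCron.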

1.5.2 Implementation of replicationCron

/* --------------------------- REPLICATION CRON  ---------------------------- */

/* Replication cron function, called 1 time per second. */
/* The replication timer function, called once per second */
void replicationCron(void) {
    
    /* */
    static long long replication_cron_loops = 0;
    
    /* */
    /* Non blocking connection timeout? */
    if (server.masterhost &&
        (server.repl_state == REPL_STATE_CONNECTING ||
         slaveIsInHandshakeState()) &&
         (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        serverLog(LL_WARNING,"Timeout connecting to the MASTER...");
        cancelReplicationHandshake();
    }
    /* */
    /* Bulk transfer I/O timeout? */
    if (server.masterhost && server.repl_state == REPL_STATE_TRANSFER &&
        (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        serverLog(LL_WARNING,"Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.");
        cancelReplicationHandshake();
    }

    /* */
    /* Timed out master when we are an already connected slave? */
    if (server.masterhost && server.repl_state == REPL_STATE_CONNECTED &&
        (time(NULL)-server.master->lastinteraction) > server.repl_timeout)
    {
        serverLog(LL_WARNING,"MASTER timeout: no data nor PING received...");
        freeClient(server.master);
    }
     
    /* Check whether we should try to connect to the master: when
       server.repl_state is REPL_STATE_CONNECT (a connection to the master
       must be initiated), start connecting
    */
    /* Check if we should connect to a MASTER */
    if (server.repl_state == REPL_STATE_CONNECT) {
        serverLog(LL_NOTICE,"Connecting to MASTER %s:%d",
            server.masterhost, server.masterport);
        /* Connect to the master in non-blocking mode */
        if (connectWithMaster() == C_OK) {
            serverLog(LL_NOTICE,"MASTER <-> REPLICA sync started");
        }
    }
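    /* When server.masterhost and server.master are both non-NULL and the
       master supports partial resynchronization, periodically send a
       REPLCONF ACK to the master to report the offset processed so far.
       One detail to note: how do we know whether the master supports
       partial resynchronization? This is explained later.
       from time to time -- periodically
       CLIENT_PRE_PSYNC  -- flag for an instance that does not understand
                            PSYNC (PSYNC is what provides partial resync);
                            defined as (1<<16)
    */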
    /* Send ACK to master from time to time.
     * Note that we do not send periodic acks to masters that don't
     * support PSYNC and replication offsets. */
    if (server.masterhost && server.master &&
        !(server.master->flags & CLIENT_PRE_PSYNC))
        replicationSendAck();

    /* */
    /* If we have attached slaves, PING them from time to time.
     * So slaves can implement an explicit timeout to masters, and will
     * be able to detect a link disconnection even if the TCP connection
     * will not actually go down. */
    listIter li;
    listNode *ln;
    robj *ping_argv[1];

    /* */
    /* First, send PING according to ping_slave_period. */
    if ((replication_cron_loops % server.repl_ping_slave_period) == 0 &&
        listLength(server.slaves))
    {
        /* Note that we don't send the PING if the clients are paused during
         * a Redis Cluster manual failover: the PING we send will otherwise
         * alter the replication offsets of master and slave, and will no longer
         * match the one stored into 'mf_master_offset' state. */
        int manual_failover_in_progress =
            server.cluster_enabled &&
            server.cluster->mf_end &&
            clientsArePaused();

        if (!manual_failover_in_progress) {
            ping_argv[0] = createStringObject("PING",4);
            replicationFeedSlaves(server.slaves, server.slaveseldb,
                ping_argv, 1);
            decrRefCount(ping_argv[0]);
        }
    }
     
    /* */
    /* Second, send a newline to all the slaves in pre-synchronization
     * stage, that is, slaves waiting for the master to create the RDB file.
     *
     * Also send the a newline to all the chained slaves we have, if we lost
     * connection from our master, to keep the slaves aware that their
     * master is online. This is needed since sub-slaves only receive proxied
     * data from top-level masters, so there is no explicit pinging in order
     * to avoid altering the replication offsets. This special out of band
     * pings (newlines) can be sent, they will have no effect in the offset.
     *
     * The newline will be ignored by the slave but will refresh the
     * last interaction timer preventing a timeout. In this case we ignore the
     * ping period and refresh the connection once per second since certain
     * timeouts are set at a few seconds (example: PSYNC response). */
    listRewind(server.slaves,&li);
    while((ln = listNext(&li))) {
        client *slave = ln->value;

        int is_presync =
            (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_START ||
            (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_END &&
             server.rdb_child_type != RDB_CHILD_TYPE_SOCKET));

        if (is_presync) {
            connWrite(slave->conn, "\n", 1);
        }
    }
    /* */
    /* Disconnect timedout slaves. */
    if (listLength(server.slaves)) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            client *slave = ln->value;

            if (slave->replstate != SLAVE_STATE_ONLINE) continue;
            if (slave->flags & CLIENT_PRE_PSYNC) continue;
            if ((server.unixtime - slave->repl_ack_time) > server.repl_timeout)
            {
                serverLog(LL_WARNING, "Disconnecting timedout replica: %s",
                    replicationGetSlaveName(slave));
                freeClient(slave);
            }
        }
    }
    /* */
    /* If this is a master without attached slaves and there is a replication
     * backlog active, in order to reclaim memory we can free it after some
     * (configured) time. Note that this cannot be done for slaves: slaves
     * without sub-slaves attached should still accumulate data into the
     * backlog, in order to reply to PSYNC queries if they are turned into
     * masters after a failover. */
    if (listLength(server.slaves) == 0 && server.repl_backlog_time_limit &&
        server.repl_backlog && server.masterhost == NULL)
    {
        time_t idle = server.unixtime - server.repl_no_slaves_since;

        if (idle > server.repl_backlog_time_limit) {
            /* When we free the backlog, we always use a new
             * replication ID and clear the ID2. This is needed
             * because when there is no backlog, the master_repl_offset
             * is not updated, but we would still retain our replication
             * ID, leading to the following problem:
             *
             * 1. We are a master instance.
             * 2. Our slave is promoted to master. It's repl-id-2 will
             *    be the same as our repl-id.
             * 3. We, yet as master, receive some updates, that will not
             *    increment the master_repl_offset.
             * 4. Later we are turned into a slave, connect to the new
             *    master that will accept our PSYNC request by second
             *    replication ID, but there will be data inconsistency
             *    because we received writes. */
            changeReplicationId();
            clearReplicationId2();
            freeReplicationBacklog();
            serverLog(LL_NOTICE,
                "Replication backlog freed after %d seconds "
                "without connected replicas.",
                (int) server.repl_backlog_time_limit);
        }
    }
     
    /* */
    /* If AOF is disabled and we no longer have attached slaves, we can
     * free our Replication Script Cache as there is no need to propagate
     * EVALSHA at all. */
    if (listLength(server.slaves) == 0 &&
        server.aof_state == AOF_OFF &&
        listLength(server.repl_scriptcache_fifo) != 0)
    {
        replicationScriptCacheFlush();
    }

    /* Start a BGSAVE good for replication if we have slaves in
     * WAIT_BGSAVE_START state.
     *
     * In case of diskless replication, we make sure to wait the specified
     * number of seconds (according to configuration) so that other slaves
     * have the time to arrive before we start streaming. */
    if (!hasActiveChildProcess()) {
        time_t idle, max_idle = 0;
        int slaves_waiting = 0;
        int mincapa = -1;
        listNode *ln;
        listIter li;

        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            client *slave = ln->value;
            if (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_START) {
                idle = server.unixtime - slave->lastinteraction;
                if (idle > max_idle) max_idle = idle;
                slaves_waiting++;
                mincapa = (mincapa == -1) ? slave->slave_capa :
                                            (mincapa & slave->slave_capa);
            }
        }

        if (slaves_waiting &&
            (!server.repl_diskless_sync ||
             max_idle > server.repl_diskless_sync_delay))
        {
            /* Start the BGSAVE. The called function may start a
             * BGSAVE with socket target or disk target depending on the
             * configuration and slaves capabilities. */
            startBgsaveForReplication(mincapa);
        }
    }
    /* */
    /* Remove the RDB file used for replication if Redis is not running
     * with any persistence. */
    removeRDBUsedToSyncReplicas();
    /* */
    /* Refresh the number of slaves with lag <= min-slaves-max-lag. */
    refreshGoodSlavesCount();
    replication_cron_loops++; /* Incremented with frequency 1 HZ. */
}
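
One small detail in the BGSAVE block of replicationCron is how mincapa is computed: it starts at -1 and is then AND-ed with the capabilities of every waiting replica, so the result is the set of capabilities that all of them share. The sketch below isolates that computation. min_capabilities is a hypothetical name, and the two capability bits are written here as assumptions about the flag values in server.h.

```c
#include <assert.h>
#include <stddef.h>

/* Capability bits, assumed to match the SLAVE_CAPA_* flags in server.h. */
#define SLAVE_CAPA_NONE 0
#define SLAVE_CAPA_EOF (1 << 0)    /* can parse the RDB EOF streaming format */
#define SLAVE_CAPA_PSYNC2 (1 << 1) /* supports the PSYNC2 protocol */

/* Intersection of the capabilities of all waiting replicas.
 * Returns -1 when there are no replicas, like the initial mincapa. */
int min_capabilities(const int *capas, size_t n) {
    assert(capas != NULL || n == 0);
    int mincapa = -1;
    for (size_t i = 0; i < n; i++) {
        mincapa = (mincapa == -1) ? capas[i] : (mincapa & capas[i]);
    }
    return mincapa;
}
```

A single replica without SLAVE_CAPA_EOF is enough to clear that bit in the result, which is how one older replica can influence the kind of BGSAVE started for the whole group.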

1.5.3 connectWithMaster: establishing the connection between master and replica


/* Establish the connection to the master in non-blocking mode */
int connectWithMaster(void) {
    /* connection *repl_transfer_s; -- a member of struct redisServer
       int tls_replication;         -- a member of struct redisServer (TLS configuration)
       Set server.repl_transfer_s: if TLS replication is configured, call
       connCreateTLS() to get an encrypted client connection, otherwise call
       connCreateSocket() to get an unencrypted one.
    */
    /* Allocate and initialize a client connection */
    server.repl_transfer_s = server.tls_replication ? connCreateTLS() : connCreateSocket();
    /* Create the socket, register it with the event loop, and set syncWithMaster as the connect handler */
    if (connConnect(server.repl_transfer_s, server.masterhost, server.masterport,
                NET_FIRST_BIND_ADDR, syncWithMaster) == C_ERR) {
        /* If any of those steps fails, log a warning, close the connection
           and reset the connection used for replication to NULL
        */
        serverLog(LL_WARNING,"Unable to connect to MASTER: %s",
                connGetLastError(server.repl_transfer_s));
        connClose(server.repl_transfer_s);
        server.repl_transfer_s = NULL;
        return C_ERR;
    }

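    /* repl_transfer_lastio is the time of the latest read from the master;
       the timeout checks later on rely on it. server.unixtime is the
       server's cached Unix time; rdbLoadProgressCallback is one of the
       places that refreshes it.
     */
    server.repl_transfer_lastio = server.unixtime;
    /* Set server.repl_state to "connecting to master" */
    server.repl_state = REPL_STATE_CONNECTING;
    /* Return C_OK */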
    return C_OK;
}

/* Create a plain (unencrypted) client connection. */
connection *connCreateSocket() {
    connection *conn = zcalloc(sizeof(connection));
    conn->type = &CT_Socket;
    conn->fd = -1;

    return conn;
}

/* Create a TLS (encrypted) client connection. */
connection *connCreateTLS(void) {
    tls_connection *conn = zcalloc(sizeof(tls_connection));
    conn->c.type = &CT_TLS;
    conn->c.fd = -1;
    conn->ssl = SSL_new(redis_tls_ctx);
    return (connection *) conn;
}

/* Dispatch to the connect function of the connection's type. */
static inline int connConnect(
connection *conn, 
const char *addr, 
int port, 
const char *src_addr,
ConnectionCallbackFunc connect_handler) 
{
    return conn->type->connect(conn, addr, port, src_addr, connect_handler);
}


/* How the connect callback is declared and initialized: */
typedef struct ConnectionType {
   ...
   int (*connect)(struct connection *conn, const char *addr, int port, const char *source_addr, ConnectionCallbackFunc connect_handler);
   ...
} ConnectionType;

ConnectionType CT_TLS = {
    ...
    .connect = connTLSConnect,
    ...
};

ConnectionType CT_Socket = {
    ...
    .connect = connSocketConnect,
    ...
};


/* For a non-TLS connection the dispatch above effectively resolves to
   connSocketConnect (illustrative expansion, not the actual source): */
static inline int connConnect(
connection *conn, 
const char *addr, 
int port, 
const char *src_addr,
ConnectionCallbackFunc connect_handler) 
{
    return connSocketConnect(conn, addr, port, src_addr, connect_handler);
}

/* The implementation of connSocketConnect */
static int connSocketConnect
(connection *conn, 
const char *addr, 
int port, 
const char *src_addr,
ConnectionCallbackFunc connect_handler) 
{
    /* Create the socket and start a non-blocking connect. */
    int fd = anetTcpNonBlockBestEffortBindConnect(NULL,addr,port,src_addr);
    if (fd == -1) {
        conn->state = CONN_STATE_ERROR;
        conn->last_errno = errno;
        return C_ERR;
    }

    conn->fd = fd; /* Store the fd of the newly created socket in the connection. */
    conn->state = CONN_STATE_CONNECTING; /* Mark the connection as CONN_STATE_CONNECTING. */

    conn->conn_handler = connect_handler; /* Install the connect handler. */
    aeCreateFileEvent(server.el, conn->fd, AE_WRITABLE,
            conn->type->ae_handler, conn); /* Register the writable event of this fd on the event loop. */

    return C_OK;
}


int anetTcpNonBlockBestEffortBindConnect(char *err, const char *addr, int port,
                                         const char *source_addr)
{
    return anetTcpGenericConnect(err,addr,port,source_addr,
            ANET_CONNECT_NONBLOCK|ANET_CONNECT_BE_BINDING);
}

static int anetTcpGenericConnect(char *err, const char *addr, int port,
                                 const char *source_addr, int flags)
{
  ...
  /* Performs the socket() and connect() system calls (the replica acts as
     the client, the master as the server). */
  ...
}

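What connectWithMaster() does can be summarized as:

  1. Depending on whether TLS replication is configured, call connCreateTLS() or connCreateSocket() to obtain a connection object and assign it to server.repl_transfer_s;
  2. Call connConnect() to create the socket, start the non-blocking connect, register the writable event, and install syncWithMaster as the connect handler (this step may fail, in which case the function returns C_ERR);
  3. Record the current time in server.repl_transfer_lastio (used by the later timeout checks);
  4. Set the replication state server.repl_state to REPL_STATE_CONNECTING (connect in progress, not yet confirmed);
  5. Return C_OK.

This step initiates the network connection between replica and master. Because the connect is non-blocking, its success is only confirmed later, when the writable event fires and syncWithMaster() runs.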
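
The way connConnect() reaches either connSocketConnect or connTLSConnect is ordinary function-pointer dispatch through the ConnectionType table. The sketch below reproduces only that pattern under simplified assumptions: the types conn and conn_type, the two backends, and the fake fd values 100 and 200 are all hypothetical stand-ins, and no real socket or TLS handshake is performed.

```c
struct conn;  /* forward declaration */

/* Simplified stand-in for ConnectionType: a table of function pointers. */
typedef struct conn_type {
    const char *name;
    int (*connect)(struct conn *c, const char *addr, int port);
} conn_type;

typedef struct conn {
    const conn_type *type;
    int fd;
} conn;

/* Two hypothetical backends. Real ones would open a socket or start TLS. */
static int plain_connect(conn *c, const char *addr, int port) {
    (void)addr; (void)port;
    c->fd = 100;   /* pretend fd for the plain backend */
    return 0;
}
static int tls_connect(conn *c, const char *addr, int port) {
    (void)addr; (void)port;
    c->fd = 200;   /* pretend fd for the TLS backend */
    return 0;
}

static const conn_type TYPE_PLAIN = { "plain", plain_connect };
static const conn_type TYPE_TLS   = { "tls",   tls_connect };

/* Mirrors "tls_replication ? connCreateTLS() : connCreateSocket()". */
conn conn_create(int use_tls) {
    conn c = { use_tls ? &TYPE_TLS : &TYPE_PLAIN, -1 };
    return c;
}

/* Mirrors connConnect(): the caller never names the backend. */
int conn_connect(conn *c, const char *addr, int port) {
    return c->type->connect(c, addr, port);
}
```

The benefit of this layout is that replication code such as connectWithMaster() stays identical for plain and TLS links; only the object returned by the create function differs.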

1.5.4 The implementation of syncWithMaster

/* This handler fires when the non blocking connect was able to
 * establish a connection with the master. */
void syncWithMaster(connection *conn) {
    char tmpfile[256], *err = NULL;
    int dfd = -1, maxtries = 5;
    int psync_result;

    /* If this event fired after the user turned the instance into a master
     * with SLAVEOF NO ONE we must just return ASAP. */
    if (server.repl_state == REPL_STATE_NONE) {
        connClose(conn);
        return;
    }

    /* Check for errors in the socket: after a non blocking connect() we
     * may find that the socket is in error state. */
    if (connGetState(conn) != CONN_STATE_CONNECTED) {
        serverLog(LL_WARNING,"Error condition on socket for SYNC: %s",
                connGetLastError(conn));
        goto error;
    }

    /* Send a PING to check the master is able to reply without errors. */
    if (server.repl_state == REPL_STATE_CONNECTING) {
        serverLog(LL_NOTICE,"Non blocking connect for SYNC fired the event.");
        /* Delete the writable event so that the readable event remains
         * registered and we can wait for the PONG reply. */
        connSetReadHandler(conn, syncWithMaster);
        connSetWriteHandler(conn, NULL);
        server.repl_state = REPL_STATE_RECEIVE_PONG;
        /* Send the PING, don't check for errors at all, we have the timeout
         * that will take care about this. */
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"PING",NULL);
        if (err) goto write_error;
        return;
    }

    /* Receive the PONG command. */
    if (server.repl_state == REPL_STATE_RECEIVE_PONG) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);

        /* We accept only two replies as valid, a positive +PONG reply
         * (we just check for "+") or an authentication error.
         * Note that older versions of Redis replied with "operation not
         * permitted" instead of using a proper error code, so we test
         * both. */
        if (err[0] != '+' &&
            strncmp(err,"-NOAUTH",7) != 0 &&
            strncmp(err,"-NOPERM",7) != 0 &&
            strncmp(err,"-ERR operation not permitted",28) != 0)
        {
            serverLog(LL_WARNING,"Error reply to PING from master: '%s'",err);
            sdsfree(err);
            goto error;
        } else {
            serverLog(LL_NOTICE,
                "Master replied to PING, replication can continue...");
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_AUTH;
    }

    /* AUTH with the master if required. */
    if (server.repl_state == REPL_STATE_SEND_AUTH) {
        if (server.masteruser && server.masterauth) {
            err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"AUTH",
                                         server.masteruser,server.masterauth,NULL);
            if (err) goto write_error;
            server.repl_state = REPL_STATE_RECEIVE_AUTH;
            return;
        } else if (server.masterauth) {
            err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"AUTH",server.masterauth,NULL);
            if (err) goto write_error;
            server.repl_state = REPL_STATE_RECEIVE_AUTH;
            return;
        } else {
            server.repl_state = REPL_STATE_SEND_PORT;
        }
    }

    /* Receive AUTH reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_AUTH) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        if (err[0] == '-') {
            serverLog(LL_WARNING,"Unable to AUTH to MASTER: %s",err);
            sdsfree(err);
            goto error;
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_PORT;
    }

    /* Set the slave port, so that Master's INFO command can list the
     * slave listening port correctly. */
    if (server.repl_state == REPL_STATE_SEND_PORT) {
        int port;
        if (server.slave_announce_port) port = server.slave_announce_port;
        else if (server.tls_replication && server.tls_port) port = server.tls_port;
        else port = server.port;
        sds portstr = sdsfromlonglong(port);
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"REPLCONF",
                "listening-port",portstr, NULL);
        sdsfree(portstr);
        if (err) goto write_error;
        sdsfree(err);
        server.repl_state = REPL_STATE_RECEIVE_PORT;
        return;
    }

    /* Receive REPLCONF listening-port reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_PORT) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF listening-port. */
        if (err[0] == '-') {
            serverLog(LL_NOTICE,"(Non critical) Master does not understand "
                                "REPLCONF listening-port: %s", err);
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_IP;
    }

    /* Skip REPLCONF ip-address if there is no slave-announce-ip option set. */
    if (server.repl_state == REPL_STATE_SEND_IP &&
        server.slave_announce_ip == NULL)
    {
            server.repl_state = REPL_STATE_SEND_CAPA;
    }

    /* Set the slave ip, so that Master's INFO command can list the
     * slave IP address port correctly in case of port forwarding or NAT. */
    if (server.repl_state == REPL_STATE_SEND_IP) {
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"REPLCONF",
                "ip-address",server.slave_announce_ip, NULL);
        if (err) goto write_error;
        sdsfree(err);
        server.repl_state = REPL_STATE_RECEIVE_IP;
        return;
    }

    /* Receive REPLCONF ip-address reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_IP) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF listening-port. */
        if (err[0] == '-') {
            serverLog(LL_NOTICE,"(Non critical) Master does not understand "
                                "REPLCONF ip-address: %s", err);
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_CAPA;
    }

    /* Inform the master of our (slave) capabilities.
     *
     * EOF: supports EOF-style RDB transfer for diskless replication.
     * PSYNC2: supports PSYNC v2, so understands +CONTINUE <new repl ID>.
     *
     * The master will ignore capabilities it does not understand. */
    if (server.repl_state == REPL_STATE_SEND_CAPA) {
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"REPLCONF",
                "capa","eof","capa","psync2",NULL);
        if (err) goto write_error;
        sdsfree(err);
        server.repl_state = REPL_STATE_RECEIVE_CAPA;
        return;
    }

    /* Receive CAPA reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_CAPA) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF capa. */
        if (err[0] == '-') {
            serverLog(LL_NOTICE,"(Non critical) Master does not understand "
                                  "REPLCONF capa: %s", err);
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_PSYNC;
    }

    /* Try a partial resynchonization. If we don't have a cached master
     * slaveTryPartialResynchronization() will at least try to use PSYNC
     * to start a full resynchronization so that we get the master run id
     * and the global offset, to try a partial resync at the next
     * reconnection attempt. */
    if (server.repl_state == REPL_STATE_SEND_PSYNC) {
        if (slaveTryPartialResynchronization(conn,0) == PSYNC_WRITE_ERROR) {
            err = sdsnew("Write error sending the PSYNC command.");
            goto write_error;
        }
        server.repl_state = REPL_STATE_RECEIVE_PSYNC;
        return;
    }

    /* If reached this point, we should be in REPL_STATE_RECEIVE_PSYNC. */
    if (server.repl_state != REPL_STATE_RECEIVE_PSYNC) {
        serverLog(LL_WARNING,"syncWithMaster(): state machine error, "
                             "state should be RECEIVE_PSYNC but is %d",
                             server.repl_state);
        goto error;
    }

    psync_result = slaveTryPartialResynchronization(conn,1);
    if (psync_result == PSYNC_WAIT_REPLY) return; /* Try again later... */

    /* If the master is in an transient error, we should try to PSYNC
     * from scratch later, so go to the error path. This happens when
     * the server is loading the dataset or is not connected with its
     * master and so forth. */
    if (psync_result == PSYNC_TRY_LATER) goto error;

    /* Note: if PSYNC does not return WAIT_REPLY, it will take care of
     * uninstalling the read handler from the file descriptor. */

    if (psync_result == PSYNC_CONTINUE) {
        serverLog(LL_NOTICE, "MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.");
        if (server.supervised_mode == SUPERVISED_SYSTEMD) {
            redisCommunicateSystemd("STATUS=MASTER <-> REPLICA sync: Partial Resynchronization accepted. Ready to accept connections.\n");
            redisCommunicateSystemd("READY=1\n");
        }
        return;
    }

    /* PSYNC failed or is not supported: we want our slaves to resync with us
     * as well, if we have any sub-slaves. The master may transfer us an
     * entirely different data set and we have no way to incrementally feed
     * our slaves after that. */
    disconnectSlaves(); /* Force our slaves to resync with us as well. */
    freeReplicationBacklog(); /* Don't allow our chained slaves to PSYNC. */

    /* Fall back to SYNC if needed. Otherwise psync_result == PSYNC_FULLRESYNC
     * and the server.master_replid and master_initial_offset are
     * already populated. */
    if (psync_result == PSYNC_NOT_SUPPORTED) {
        serverLog(LL_NOTICE,"Retrying with SYNC...");
        if (connSyncWrite(conn,"SYNC\r\n",6,server.repl_syncio_timeout*1000) == -1) {
            serverLog(LL_WARNING,"I/O error writing to MASTER: %s",
                strerror(errno));
            goto error;
        }
    }

    /* Prepare a suitable temp file for bulk transfer */
    if (!useDisklessLoad()) {
        while(maxtries--) {
            snprintf(tmpfile,256,
                "temp-%d.%ld.rdb",(int)server.unixtime,(long int)getpid());
            dfd = open(tmpfile,O_CREAT|O_WRONLY|O_EXCL,0644);
            if (dfd != -1) break;
            sleep(1);
        }
        if (dfd == -1) {
            serverLog(LL_WARNING,"Opening the temp file needed for MASTER <-> REPLICA synchronization: %s",strerror(errno));
            goto error;
        }
        server.repl_transfer_tmpfile = zstrdup(tmpfile);
        server.repl_transfer_fd = dfd;
    }

    /* Setup the non blocking download of the bulk file. */
    if (connSetReadHandler(conn, readSyncBulkPayload)
            == C_ERR)
    {
        char conninfo[CONN_INFO_LEN];
        serverLog(LL_WARNING,
            "Can't create readable event for SYNC: %s (%s)",
            strerror(errno), connGetInfo(conn, conninfo, sizeof(conninfo)));
        goto error;
    }

    server.repl_state = REPL_STATE_TRANSFER;
    server.repl_transfer_size = -1;
    server.repl_transfer_read = 0;
    server.repl_transfer_last_fsync_off = 0;
    server.repl_transfer_lastio = server.unixtime;
    return;

error:
    if (dfd != -1) close(dfd);
    connClose(conn);
    server.repl_transfer_s = NULL;
    if (server.repl_transfer_fd != -1)
        close(server.repl_transfer_fd);
    if (server.repl_transfer_tmpfile)
        zfree(server.repl_transfer_tmpfile);
    server.repl_transfer_tmpfile = NULL;
    server.repl_transfer_fd = -1;
    server.repl_state = REPL_STATE_CONNECT;
    return;

write_error: /* Handle sendSynchronousCommand(SYNC_CMD_WRITE) errors. */
    serverLog(LL_WARNING,"Sending command to master in replication handshake: %s", err);
    sdsfree(err);
    goto error;
}
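
The handler above is a re-entrant state machine: each SEND branch writes a command and returns, and the next readable event re-enters the function at the matching RECEIVE branch, which then falls through into the following SEND branch. The sketch below models only the order of the states up to PSYNC. The enum names follow REPL_STATE_* but their numeric values are arbitrary, and hs_config, hs_next and hs_steps are hypothetical names introduced for illustration.

```c
/* Simplified handshake states (values are arbitrary, not Redis's). */
typedef enum {
    HS_CONNECTING, HS_RECEIVE_PONG, HS_SEND_AUTH, HS_RECEIVE_AUTH,
    HS_SEND_PORT, HS_RECEIVE_PORT, HS_SEND_IP, HS_RECEIVE_IP,
    HS_SEND_CAPA, HS_RECEIVE_CAPA, HS_SEND_PSYNC, HS_RECEIVE_PSYNC
} hs_state;

/* Hypothetical switches for the two optional steps. */
typedef struct {
    int has_masterauth;   /* masterauth is configured */
    int has_announce_ip;  /* replica-announce-ip is configured */
} hs_config;

/* Returns the state reached after one successful step from 's'. */
hs_state hs_next(hs_state s, const hs_config *cfg) {
    switch (s) {
    case HS_CONNECTING:   return HS_RECEIVE_PONG;
    case HS_RECEIVE_PONG: return HS_SEND_AUTH;
    case HS_SEND_AUTH:    return cfg->has_masterauth ? HS_RECEIVE_AUTH : HS_SEND_PORT;
    case HS_RECEIVE_AUTH: return HS_SEND_PORT;
    case HS_SEND_PORT:    return HS_RECEIVE_PORT;
    case HS_RECEIVE_PORT: return HS_SEND_IP;
    case HS_SEND_IP:      return cfg->has_announce_ip ? HS_RECEIVE_IP : HS_SEND_CAPA;
    case HS_RECEIVE_IP:   return HS_SEND_CAPA;
    case HS_SEND_CAPA:    return HS_RECEIVE_CAPA;
    case HS_RECEIVE_CAPA: return HS_SEND_PSYNC;
    case HS_SEND_PSYNC:   return HS_RECEIVE_PSYNC;
    default:              return HS_RECEIVE_PSYNC;
    }
}

/* Counts the transitions from HS_CONNECTING to HS_RECEIVE_PSYNC. */
int hs_steps(const hs_config *cfg) {
    int n = 0;
    for (hs_state s = HS_CONNECTING; s != HS_RECEIVE_PSYNC; s = hs_next(s, cfg))
        n++;
    return n;
}
```

Any failure along the way is not modeled here; in the real function it jumps to the error label, which closes the connection and resets the state to REPL_STATE_CONNECT so that a later cron tick retries.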

1.5.5 The replica sends PING to the master

1. Locate the branch in syncWithMaster where the replica sends PING
void syncWithMaster(connection *conn) {
    char *err = NULL;    
    ...
    /* Send a PING to check the master is able to reply without errors. */

    /* If the replica's replication state is REPL_STATE_CONNECTING, send a PING
       to check that the master can reply with a PONG. */
    if (server.repl_state == REPL_STATE_CONNECTING) {
        serverLog(LL_NOTICE,"Non blocking connect for SYNC fired the event.");
        /* Delete the writable event so that the readable event remains
         * registered and we can wait for the PONG reply. */
        /* Install syncWithMaster as the read handler. */
        connSetReadHandler(conn, syncWithMaster);
        /* Set the write handler to NULL so that writable events are ignored for now. */
        connSetWriteHandler(conn, NULL);
        /* Set server.repl_state to "PING sent, waiting for the master's PONG". */
        server.repl_state = REPL_STATE_RECEIVE_PONG;
        /* Send the PING, don't check for errors at all, we have the timeout
         * that will take care about this. */
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"PING",NULL);
        if (err) goto write_error;
        return;
    }
    ...
}

2. Understanding SYNC_CMD_WRITE
#define SYNC_CMD_READ (1<<0)
#define SYNC_CMD_WRITE (1<<1)
#define SYNC_CMD_FULL (SYNC_CMD_READ|SYNC_CMD_WRITE)


3. In sendSynchronousCommand the write path calls connSyncWrite,
  which ultimately calls syncWrite
char *sendSynchronousCommand(int flags, connection *conn, ...) {

    /* Create the command to send to the master, we use redis binary
     * protocol to make sure correct arguments are sent. This function
     * is not safe for all binary data. */
    if (flags & SYNC_CMD_WRITE) {
        char *arg;
        va_list ap;
        sds cmd = sdsempty();
        sds cmdargs = sdsempty();
        size_t argslen = 0;
        va_start(ap,conn);

        while(1) {
            arg = va_arg(ap, char*);
            if (arg == NULL) break;

            cmdargs = sdscatprintf(cmdargs,"$%zu\r\n%s\r\n",strlen(arg),arg);
            argslen++;
        }

        va_end(ap);

        cmd = sdscatprintf(cmd,"*%zu\r\n",argslen);
        cmd = sdscatsds(cmd,cmdargs);
        sdsfree(cmdargs);

        /* Transfer command to the server. */
        if (connSyncWrite(conn,cmd,sdslen(cmd),server.repl_syncio_timeout*1000)
            == -1)
        {
            sdsfree(cmd);
            return sdscatprintf(sdsempty(),"-Writing to master: %s",
                    connGetLastError(conn));
        }
        sdsfree(cmd);
    }

    /* Read the reply from the server. */
    if (flags & SYNC_CMD_READ) {
        char buf[256];

        if (connSyncReadLine(conn,buf,sizeof(buf),server.repl_syncio_timeout*1000)
            == -1)
        {
            return sdscatprintf(sdsempty(),"-Reading from master: %s",
                    strerror(errno));
        }
        server.repl_transfer_lastio = server.unixtime;
        return sdsnew(buf);
    }
    return NULL;
}

static inline ssize_t connSyncWrite(connection *conn, char *ptr, ssize_t size, long long timeout) {
    return conn->type->sync_write(conn, ptr, size, timeout);
}

ConnectionType CT_Socket = {
...
.sync_write = connSocketSyncWrite,
...
}

static ssize_t connSocketSyncWrite(connection *conn, char *ptr, ssize_t size, long long timeout) {
    return syncWrite(conn->fd, ptr, size, timeout);
}

4. The implementation of syncWrite
/* Write the specified payload to 'fd'. If writing the whole payload will be
 * done within 'timeout' milliseconds the operation succeeds and 'size' is
 * returned. Otherwise the operation fails, -1 is returned, and an unspecified
 * partial write could be performed against the file descriptor. */
ssize_t syncWrite(int fd, char *ptr, ssize_t size, long long timeout) {
    ssize_t nwritten, ret = size;
    long long start = mstime();
    long long remaining = timeout;

    while(1) {
        long long wait = (remaining > SYNCIO__RESOLUTION) ?
                          remaining : SYNCIO__RESOLUTION;
        long long elapsed;

        /* Optimistically try to write before checking if the file descriptor
         * is actually writable. At worst we get EAGAIN. */
        nwritten = write(fd,ptr,size);
        if (nwritten == -1) {
            if (errno != EAGAIN) return -1;
        } else {
            ptr += nwritten;
            size -= nwritten;
        }
        if (size == 0) return ret;

        /* Wait */
        aeWait(fd,AE_WRITABLE,wait);
        elapsed = mstime() - start;
        if (elapsed >= timeout) {
            errno = ETIMEDOUT;
            return -1;
        }
        remaining = timeout - elapsed;
    }
}
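
To make the wire format concrete: the write branch of sendSynchronousCommand shown in step 3 encodes its arguments as a RESP array of bulk strings, so PING goes out as "*1\r\n$4\r\nPING\r\n". The sketch below reproduces that encoding with snprintf into a caller-supplied buffer instead of an sds string. The helper name resp_encode and its buffer-based signature are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encodes argv[0..argc-1] as a RESP array of bulk
 * strings into 'out'. Returns the encoded length, or -1 if 'cap' is too
 * small. Like sendSynchronousCommand it relies on strlen, so it is not
 * safe for arguments containing NUL bytes. */
int resp_encode(char *out, size_t cap, int argc, const char **argv) {
    size_t used = 0;
    int n = snprintf(out, cap, "*%d\r\n", argc);      /* array header */
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;
    for (int i = 0; i < argc; i++) {
        /* One bulk string per argument: $<len>\r\n<data>\r\n */
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling it with the three arguments REPLCONF, listening-port and 6380 yields the same bytes the replica later sends in the port step of the handshake.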

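Reading connSocketConnect in 1.5.3, we can see that once the non-blocking connect has been issued, the AE_WRITABLE event of the fd is registered on the event loop.
When the connect completes the socket becomes writable, so an AE_WRITABLE event fires and syncWithMaster() is called to handle it.
(Open question: the writable event is registered after connect() was called; can epoll still learn, from the fd alone, about something that happened earlier? Redis registers fds with epoll in level-triggered mode, which reports the current readiness of the fd rather than a one-off edge, so a socket that became writable before it was registered is still reported on the next poll.)
In the REPL_STATE_CONNECTING state the replica sends a PING command to the master. The PING serves two purposes:

  1. Check that the network between replica and master is usable;
  2. Check that the master is currently able to accept and process commands.

The main logic for sending PING is:

  1. Stop handling writable events on the fd of the master connection: the next thing to do is read the PONG reply from the master, so only the readable event matters from here on;
  2. Set the replica's replication state to REPL_STATE_RECEIVE_PONG, i.e. waiting for the master's PONG reply;
  3. Call sendSynchronousCommand() in write mode to send a PING command to the master.

The replica's replication state changes as follows:

  • REPL_STATE_CONNECTING--->REPL_STATE_RECEIVE_PONG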
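
On the open question in this section (whether an fd that became ready before it was registered is still reported): with level-triggered polling, readiness is a state that is sampled, not an event that can be missed. The sketch below demonstrates this with poll() on a pipe. Using a pipe and readability instead of a connecting socket and writability is a substitution made purely so that the example runs without a network peer; the function names are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if 'fd' is reported readable within timeout_ms, 0 otherwise. */
int wait_readable(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);
    return r == 1 && (p.revents & POLLIN) != 0;
}

/* Makes a pipe readable BEFORE anyone polls it, then polls.
 * Returns what wait_readable() reports, or -1 on setup failure. */
int ready_before_watching(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int seen = wait_readable(fds[0], 0);   /* start watching only now */
    close(fds[0]);
    close(fds[1]);
    return seen;
}
```

The same reasoning applies to the writable event of a socket whose non-blocking connect has already completed by the time it is added to the event loop.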

1.5.6 The replica receives and parses the master's reply to PING

    
1. Locate the branch in syncWithMaster where the replica receives the master's reply to PING
void syncWithMaster(connection *conn) {
    char tmpfile[256], *err = NULL;
    ... 
   /* Receive the PONG command. */
   /* The replication state is REPL_STATE_RECEIVE_PONG. */
    if (server.repl_state == REPL_STATE_RECEIVE_PONG) {
        /* Read the master's reply (SYNC_CMD_READ only reads one line; nothing is sent). */
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);

        /* We accept only two replies as valid, a positive +PONG reply
         * (we just check for "+") or an authentication error.
         * Note that older versions of Redis replied with "operation not
         * permitted" instead of using a proper error code, so we test
         * both. */
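        /* Only two kinds of reply are accepted as valid here:
           (1) "+PONG" when everything is fine (only the leading '+' is checked);
           (2) an authentication error: "-NOAUTH", "-NOPERM", or the legacy
               "-ERR operation not permitted".
           The four conditions are joined with &&, so the error branch is taken
           only when the reply matches none of the accepted prefixes, e.g.
           "-abc". A reply such as "-NOAUTH ..." deliberately does NOT take the
           error branch: it shows that the master is reachable and answering,
           and the AUTH step that follows deals with authentication.
           */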
        if (err[0] != '+' &&
            strncmp(err,"-NOAUTH",7) != 0 &&
            strncmp(err,"-NOPERM",7) != 0 &&
            strncmp(err,"-ERR operation not permitted",28) != 0)
        {
            serverLog(LL_WARNING,"Error reply to PING from master: '%s'",err);
            sdsfree(err);
            goto error;
        } else {
            serverLog(LL_NOTICE,
                "Master replied to PING, replication can continue...");
        } 
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_AUTH;
    }
    ...
}


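2. As in 1.5.5, sendSynchronousCommand(SYNC_CMD_READ,conn,NULL) calls connSyncReadLine, which for a
  plain socket ends up in syncReadLine; that function reads the reply byte by byte through syncRead, shown here: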
  
/* Read the specified amount of bytes from 'fd'. If all the bytes are read
 * within 'timeout' milliseconds the operation succeed and 'size' is returned.
 * Otherwise the operation fails, -1 is returned, and an unspecified amount of
 * data could be read from the file descriptor. */
ssize_t syncRead(int fd, char *ptr, ssize_t size, long long timeout) {
    ssize_t nread, totread = 0;
    long long start = mstime();
    long long remaining = timeout;

    if (size == 0) return 0;
    while(1) {
        long long wait = (remaining > SYNCIO__RESOLUTION) ?
                          remaining : SYNCIO__RESOLUTION;
        long long elapsed;

        /* Optimistically try to read before checking if the file descriptor
         * is actually readable. At worst we get EAGAIN. */
        nread = read(fd,ptr,size);
        if (nread == 0) return -1; /* short read. */
        if (nread == -1) {
            if (errno != EAGAIN) return -1;
        } else {
            ptr += nread;
            size -= nread;
            totread += nread;
        }
        if (size == 0) return totread;

        /* Wait */
        aeWait(fd,AE_READABLE,wait);
        elapsed = mstime() - start;
        if (elapsed >= timeout) {
            errno = ETIMEDOUT;
            return -1;
        }
        remaining = timeout - elapsed;
    }
}

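In this step, when the replication state is REPL_STATE_RECEIVE_PONG, sendSynchronousCommand() is called in read mode. If all goes well, the "+PONG\r\n" sent by the master is received and read, the replica's replication state is set to REPL_STATE_SEND_AUTH, and the handshake proceeds to the next step. The replica's replication state changes as follows: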

  • REPL_STATE_RECEIVE_PONG--->REPL_STATE_SEND_AUTH
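
The four-way condition in the PONG branch is easier to read once inverted with De Morgan's law: "reject if the reply is none of the accepted prefixes" is the same as "accept if it is any of them". The sketch below states the accepting form as a standalone predicate with the same prefixes and lengths; the function name ping_reply_ok is hypothetical.

```c
#include <string.h>

/* Returns 1 if the master's reply to PING lets the handshake continue.
 * Same prefixes and lengths as the check in syncWithMaster. */
int ping_reply_ok(const char *reply) {
    return reply[0] == '+' ||
           strncmp(reply, "-NOAUTH", 7) == 0 ||
           strncmp(reply, "-NOPERM", 7) == 0 ||
           strncmp(reply, "-ERR operation not permitted", 28) == 0;
}
```

An authentication error therefore counts as success at this stage, while any other error reply (for example one starting with "-LOADING") sends the handshake to the error path.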

1.5.7 Sending the authentication information

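/* 
  The redis.conf documentation for masterauth and masteruser is quoted below. When the
  master requires password authentication, a replica that wants to synchronize from it
  must set masterauth <master-password>; on Redis 6 and later it is better to also
  configure a dedicated user with masteruser.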
# If the master is password protected (using the "requirepass" configuration
# directive below) it is possible to tell the replica to authenticate before
# starting the replication synchronization process, otherwise the master will
# refuse the replica request.
#
# masterauth <master-password>
#
# However this is not enough if you are using Redis ACLs (for Redis version
# 6 or greater), and the default user is not capable of running the PSYNC
# command and/or other commands needed for replication. In this case it's
# better to configure a special user to use with replication, and specify the
# masteruser configuration as such:
#
# masteruser <username>
#
*/  
void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...    
   /* AUTH with the master if required. */
    if (server.repl_state == REPL_STATE_SEND_AUTH) {
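        /* 
           If both the replication user name and the password are configured
           (non-NULL), send "AUTH <masteruser> <masterauth>" to the master and
           move the replica's replication state to REPL_STATE_RECEIVE_AUTH.
        */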
        if (server.masteruser && server.masterauth) {
            err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"AUTH",
                                         server.masteruser,server.masterauth,NULL);
            if (err) goto write_error;
            server.repl_state = REPL_STATE_RECEIVE_AUTH;
            return;
        } 
        /* If only the password is configured, send "AUTH <masterauth>" to the
           master and move the replica's replication state to REPL_STATE_RECEIVE_AUTH.
        */
        else if (server.masterauth) 
        {
            err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"AUTH",server.masterauth,NULL);
            if (err) goto write_error;
            server.repl_state = REPL_STATE_RECEIVE_AUTH;
            return;
        } 
        else 
        /* If no authentication is configured, move straight to REPL_STATE_SEND_PORT.
        */
        {
            server.repl_state = REPL_STATE_SEND_PORT;
        }
    }
  ...
}

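In this step, when the replication state is REPL_STATE_SEND_AUTH, sendSynchronousCommand() is called in write mode with arguments chosen according to the configuration. After this step the replica's replication state changes to: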

  • REPL_STATE_SEND_AUTH--->REPL_STATE_RECEIVE_AUTH

or:

  • REPL_STATE_SEND_AUTH--->REPL_STATE_SEND_PORT
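
The three-way branch in this step decides which AUTH form, if any, goes on the wire. The sketch below isolates that decision. The helper name auth_args and its array-based interface are hypothetical; the two parameters stand for server.masteruser and server.masterauth.

```c
#include <stddef.h>

/* Hypothetical helper: fills argv with the AUTH command the replica would
 * send and returns the argument count, or 0 when no AUTH is sent.
 * argv must have room for 3 entries. */
int auth_args(const char *masteruser, const char *masterauth,
              const char *argv[3]) {
    if (masteruser && masterauth) {
        /* Redis 6 ACL form: AUTH <user> <password> */
        argv[0] = "AUTH"; argv[1] = masteruser; argv[2] = masterauth;
        return 3;
    } else if (masterauth) {
        /* Legacy form: AUTH <password> */
        argv[0] = "AUTH"; argv[1] = masterauth;
        return 2;
    }
    return 0;   /* nothing configured: skip to the port step */
}
```

One consequence visible in this form: a masteruser without a masterauth results in no AUTH being sent at all, because both branches require the password.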

1.5.8 Receiving the master's reply to the authentication request

void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...      
    /* Receive AUTH reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_AUTH) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        if (err[0] == '-') {
            serverLog(LL_WARNING,"Unable to AUTH to MASTER: %s",err);
            sdsfree(err);
            goto error;
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_PORT;
    }
  ...
}

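The master reads the AUTH command and handles it in authCommand(), checking the credentials sent by the replica (server.masterauth, plus server.masteruser if set) against its own configuration. In Redis 6 the check goes through the ACL subsystem; in the single-argument form the password is compared with that of the default user, which is what requirepass sets. If the credentials match, the master replies "+OK\r\n". If the reply starts with '-', the replica logs the error and takes the error path, which resets the state to REPL_STATE_CONNECT so that the handshake is retried later. After a successful reply the replica's replication state changes as follows: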

  • REPL_STATE_RECEIVE_AUTH--->REPL_STATE_SEND_PORT

1.5.9 Sending the port number to the master

/* The redis.conf documentation for the announced port and IP

# A Redis master is able to list the address and port of the attached
# replicas in different ways. For example the "INFO replication" section
# offers this information, which is used, among other tools, by
# Redis Sentinel in order to discover replica instances.
# Another place where this info is available is in the output of the
# "ROLE" command of a master.
#
# The listed IP and address normally reported by a replica is obtained
# in the following way:
#
#   IP: The address is auto detected by checking the peer address
#   of the socket used by the replica to connect with the master.
#
#   Port: The port is communicated by the replica during the replication
#   handshake, and is normally the port that the replica is using to
#   listen for connections.
#
# However when port forwarding or Network Address Translation (NAT) is
# used, the replica may be actually reachable via different IP and port
# pairs. The following two options can be used by a replica in order to
# report to its master a specific set of IP and port, so that both INFO
# and ROLE will report those values.
#
# There is no need to use both the options if you need to override just
# the port or the IP address.
#
# replica-announce-ip 5.5.5.5
# replica-announce-port 1234
*/

void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...      
     /* Set the slave port, so that Master's INFO command can list the
     * slave listening port correctly. */
     /* Report the replica's port to the master so that the master's INFO
        command can later list its replicas correctly.
     */
    if (server.repl_state == REPL_STATE_SEND_PORT) {
        int port;
        if (server.slave_announce_port) port = server.slave_announce_port;
        else if (server.tls_replication && server.tls_port) port = server.tls_port;
        else port = server.port;
        sds portstr = sdsfromlonglong(port);
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"REPLCONF",
                "listening-port",portstr, NULL);
        sdsfree(portstr);
        if (err) goto write_error;
        sdsfree(err);
        server.repl_state = REPL_STATE_RECEIVE_PORT;
        return;
    }
  ...
}

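The port number is sent to the master with the REPLCONF listening-port command, the replication state is set to REPL_STATE_RECEIVE_PORT, and the replica waits for the master's reply. The master reads the REPLCONF listening-port <port> command from the fd and handles it in replconfCommand(), which is defined in replication.c. REPLCONF accepts several different options; after parsing the port number, the master stores it in the client structure that represents this replica (c->slave_listening_port = port) and finally replies "+OK\r\n". When the master writes the reply to the fd, the replica's readable event fires and the replica calls syncWithMaster() again to process the reply. After this step the replica's replication state changes as follows: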

  • REPL_STATE_SEND_PORT--->REPL_STATE_RECEIVE_PORT 
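
The port that ends up in the REPLCONF listening-port command follows a fixed precedence: the announced port wins, then the TLS port when replication itself runs over TLS, then the regular port. A minimal sketch of that rule follows; the function name pick_listening_port is hypothetical and its parameters stand for the corresponding server fields.

```c
/* Hypothetical helper mirroring the precedence in syncWithMaster:
 * announced port, then TLS port (only when replicating over TLS),
 * then the regular port. A value of 0 means "not configured". */
int pick_listening_port(int announce_port, int tls_replication,
                        int tls_port, int port) {
    if (announce_port) return announce_port;
    if (tls_replication && tls_port) return tls_port;
    return port;
}
```

For instance, a replica listening on 6379 behind a NAT that forwards 1234 would set replica-announce-port to 1234 and report that value regardless of its TLS settings.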

1.5.10 Receiving the master's reply to the port number

void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...      
     /* Receive REPLCONF listening-port reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_PORT) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF listening-port. */
        if (err[0] == '-') {
            serverLog(LL_NOTICE,"(Non critical) Master does not understand "
                                "REPLCONF listening-port: %s", err);
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_IP;
    }
  ...
}

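After the previous rounds this one is easy to read. Note that even if the master's reply is negative we ignore it (it is only logged), because not every Redis version supports "REPLCONF listening-port". After this step the replica's replication state changes as follows: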

  • REPL_STATE_RECEIVE_PORT--->REPL_STATE_SEND_IP
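
Comparing this step with the AUTH step shows two different policies for an error reply: a '-' reply to AUTH aborts the handshake, while a '-' reply to any REPLCONF option is logged as non-critical and the handshake continues. The sketch below captures just that distinction; the enum hs_step and the function reply_is_fatal are hypothetical names.

```c
/* Hypothetical labels for the handshake steps that read a reply. */
typedef enum { STEP_AUTH, STEP_PORT, STEP_IP, STEP_CAPA } hs_step;

/* Returns 1 if an error reply at this step aborts the handshake. */
int reply_is_fatal(hs_step step, const char *reply) {
    if (reply[0] != '-') return 0;   /* not an error reply at all */
    return step == STEP_AUTH;        /* REPLCONF errors are only logged */
}
```

This tolerance is what lets a newer replica complete the handshake with an older master that does not know one of the REPLCONF options.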

1.5.11 Sending the IP to the master

void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...      
    /* Skip REPLCONF ip-address if there is no slave-announce-ip option set. */
    /* Skip this step if slave-announce-ip is not set. */
    if (server.repl_state == REPL_STATE_SEND_IP &&
        server.slave_announce_ip == NULL)
    {
            server.repl_state = REPL_STATE_SEND_CAPA;
    }
 
    /* Set the slave ip, so that Master's INFO command can list the
     * slave IP address port correctly in case of port forwarding or NAT. */
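    /*
       If the replica's IP is announced, the master's INFO command can list
       the address at which the replica is actually reachable; otherwise,
       behind port forwarding or NAT, the master may be unable to list the
       real address.
    */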
    if (server.repl_state == REPL_STATE_SEND_IP) {
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"REPLCONF",
                "ip-address",server.slave_announce_ip, NULL);
        if (err) goto write_error;
        sdsfree(err);
        server.repl_state = REPL_STATE_RECEIVE_IP; 
        return;
    }
   ...
}

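The replica sends REPLCONF ip-address <server.slave_announce_ip> to the master, writing its announced IP to the fd, sets its replication state to REPL_STATE_RECEIVE_IP, and returns immediately to wait for the fd's readable event. The master again handles the command in replconfCommand() (replication.c): it parses REPLCONF ip-address <ip>, stores the IP in c->slave_ip of the client that represents this replica, and writes "+OK\r\n" to the fd. The replica then sees the readable event on the fd and calls syncWithMaster() to check whether the master accepted the IP. If slave-announce-ip is not configured, the step is skipped and the state moves directly to REPL_STATE_SEND_CAPA. After this step the replica's replication state changes to one of: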

  • REPL_STATE_SEND_IP--->REPL_STATE_RECEIVE_IP
  • REPL_STATE_SEND_IP--->REPL_STATE_SEND_CAPA

1.5.12 Receiving the master's reply to the IP

void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...      
     /* Receive REPLCONF ip-address reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_IP) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF listening-port. */
        /* Note: the upstream comment above looks like a copy-paste slip; it
           should refer to REPLCONF ip-address. */
        if (err[0] == '-') {
            serverLog(LL_NOTICE,"(Non critical) Master does not understand "
                                "REPLCONF ip-address: %s", err);
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_CAPA;
    }
   ...
}

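The replica processes the master's reply about the IP. Note that even if the reply is negative we ignore it, because not every Redis version supports "REPLCONF ip-address". After this step the replica's replication state changes as follows:

  • REPL_STATE_RECEIVE_IP--->REPL_STATE_SEND_CAPA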

1.5.13 Sending CAPA (capabilities) to the master


void syncWithMaster(connection *conn) {
  char tmpfile[256], *err = NULL; 
  ...      
     /* Inform the master of our (slave) capabilities.
     *
     * EOF: supports EOF-style RDB transfer for diskless replication.
     * PSYNC2: supports PSYNC v2, so understands +CONTINUE <new repl ID>.
     *
     * The master will ignore capabilities it does not understand. */
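     /* Tell the master which capabilities this slave has (I think
        "synchronization methods" would be a better name):
        (1) EOF: supports EOF-style RDB transfer without going through disk
            (roughly, a binary stream imported directly through memory);
        (2) PSYNC2: supports PSYNC v2, so it understands the
            +CONTINUE <new repl ID> reply.
        The master ignores any capability it does not understand.
    */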
    
    if (server.repl_state == REPL_STATE_SEND_CAPA) {
        err = sendSynchronousCommand(SYNC_CMD_WRITE,conn,"REPLCONF",
                "capa","eof","capa","psync2",NULL);
        if (err) goto write_error;
        sdsfree(err);
        server.repl_state = REPL_STATE_RECEIVE_CAPA;
        return;
    }
   ...
}

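The slave sends REPLCONF capa eof capa psync2 to the master by writing it to the fd. The master again handles it in replconfCommand() in replication.c: it parses REPLCONF capa eof capa psync2, stores the result in c->slave_capa of the corresponding client, and writes "+OK\r\n" back to the fd. When the slave sees the readable event fire, syncWithMaster() is called again to check whether the master received the slave's capabilities correctly. After this step, the slave's replication state has changed as follows: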

  • REPL_STATE_SEND_CAPA--->REPL_STATE_RECEIVE_CAPA
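
As a rough picture of what "storing the capabilities" can look like, the sketch below folds the option/value pairs that follow REPLCONF into a bit mask. The macro names CAPA_EOF and CAPA_PSYNC2 and the function parse_capa are assumptions for this example, not the identifiers used in replconfCommand(), and the comparison is a plain case-sensitive strcmp to keep the sketch portable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical capability bits, one bit per capability. */
#define CAPA_NONE   0
#define CAPA_EOF    (1 << 0)  /* EOF-style RDB transfer (diskless replication) */
#define CAPA_PSYNC2 (1 << 1)  /* PSYNC v2, understands +CONTINUE <new repl ID> */

/* argv holds the alternating option/value pairs that follow REPLCONF,
 * e.g. {"capa", "eof", "capa", "psync2"}. Returns the resulting bit mask. */
int parse_capa(int argc, const char **argv) {
    assert(argv != NULL && argc % 2 == 0);
    int capa = CAPA_NONE;
    for (int i = 0; i + 1 < argc; i += 2) {
        if (strcmp(argv[i], "capa") != 0) continue;  /* some other option */
        if (strcmp(argv[i + 1], "eof") == 0) {
            capa |= CAPA_EOF;
        } else if (strcmp(argv[i + 1], "psync2") == 0) {
            capa |= CAPA_PSYNC2;
        }
        /* Unknown capabilities fall through and are silently ignored. */
    }
    return capa;
}
```

Ignoring unknown values is the design choice that lets a newer slave announce capabilities to an older master without breaking the handshake.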

1.5.14 Receiving and handling the master's reply to the CAPA information

void syncWithMaster(connection *conn) {
    char tmpfile[256], *err = NULL; 
    ...  
    /* Receive CAPA reply. */
    if (server.repl_state == REPL_STATE_RECEIVE_CAPA) {
        err = sendSynchronousCommand(SYNC_CMD_READ,conn,NULL);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF capa. */
        if (err[0] == '-') {
            serverLog(LL_NOTICE,"(Non critical) Master does not understand "
                                  "REPLCONF capa: %s", err);
        }
        sdsfree(err);
        server.repl_state = REPL_STATE_SEND_PSYNC;
    }
   ...
}

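Here the slave handles the master's reply about the CAPA information. Note that even if the reply is negative, it is ignored (only a notice is logged), because not every Redis version supports "REPLCONF capa". After this step, the slave's replication state has changed as follows: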

  • REPL_STATE_RECEIVE_CAPA--->REPL_STATE_SEND_PSYNC
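
The transitions covered in 1.5.11 to 1.5.14 can be summarised as one small transition function. This is only a sketch: the enum hs_state and the function next_state are hypothetical names for this example, the states are a subset of the real REPL_STATE_* values, and error paths such as write_error are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the handshake states discussed in this section. */
typedef enum {
    ST_SEND_IP,
    ST_RECEIVE_IP,
    ST_SEND_CAPA,
    ST_RECEIVE_CAPA,
    ST_SEND_PSYNC
} hs_state;

/* Returns the state that follows s. announce_ip plays the role of
 * server.slave_announce_ip: when it is NULL, the ip-address step is skipped. */
hs_state next_state(hs_state s, const char *announce_ip) {
    assert(s >= ST_SEND_IP && s <= ST_SEND_PSYNC);
    switch (s) {
    case ST_SEND_IP:      return announce_ip ? ST_RECEIVE_IP : ST_SEND_CAPA;
    case ST_RECEIVE_IP:   return ST_SEND_CAPA;    /* reply errors are tolerated */
    case ST_SEND_CAPA:    return ST_RECEIVE_CAPA;
    case ST_RECEIVE_CAPA: return ST_SEND_PSYNC;   /* reply errors are tolerated */
    default:              return s;               /* PSYNC is handled elsewhere */
    }
}
```

Each SEND state writes a command and returns to wait for a readable event, while each RECEIVE state reads the reply and falls through to the next SEND state in the same call.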

1.5.15 Sending the PSYNC command to the master

Because of length limits, the analysis of the remaining steps continues in the next article.

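2. Open questions

2.1 How does the slave know whether the master supports partial resynchronization?

2.2 Tracing the replication logic through the code

1) Connect the socket;
2) Send a PING request to confirm that the connection works;
3) Perform password authentication (if required);
4) Exchange information (listening port, IP, capabilities);
5) Send the PSYNC command;
6) Receive the RDB file and load it;
7) The connection is established; wait for the master to propagate commands.

2.3 Tracing the partial resynchronization logic through the code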

 

 

 

 

 

 

 

 

 
