Redis Cluster Analysis (37)

1. Hash slots

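Part (36) introduced cluster mode and how to configure it. During setup, a dedicated script assigns the hash slots to the servers. To assign hash slots to an individual node by hand, use the CLUSTER command. Its usage is shown below: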

[Figure: output of help @cluster in redis-cli]

As shown above, running help @cluster in the Redis client lists the usage of every CLUSTER subcommand. Here we take CLUSTER ADDSLOTS and CLUSTER DELSLOTS as examples to look at hash slots in cluster mode.

As the figure shows, CLUSTER ADDSLOTS assigns slots to the node that receives the command, and CLUSTER DELSLOTS marks slots as unassigned again on that node.

As with the commands analyzed earlier, the command's entry can be found in the command table in server.c:

[Figure: the cluster entry in the command table in server.c]

The clusterCommand function bound to cluster in that entry is implemented in cluster.c. The excerpt that handles the two subcommands mentioned above is:

void clusterCommand(client *c) {
    
    ...
    } else if ((!strcasecmp(c->argv[1]->ptr,"addslots") ||
               !strcasecmp(c->argv[1]->ptr,"delslots")) && c->argc >= 3)
    {
        /* CLUSTER ADDSLOTS <slot> [slot] ... */
        /* CLUSTER DELSLOTS <slot> [slot] ... */
        int j, slot;
        unsigned char *slots = zmalloc(CLUSTER_SLOTS);
        int del = !strcasecmp(c->argv[1]->ptr,"delslots");

        memset(slots,0,CLUSTER_SLOTS);
        /* Check that all the arguments are parseable and that all the
         * slots are not already busy. */
        for (j = 2; j < c->argc; j++) {
            if ((slot = getSlotOrReply(c,c->argv[j])) == -1) {
                zfree(slots);
                return;
            }
            if (del && server.cluster->slots[slot] == NULL) {
                addReplyErrorFormat(c,"Slot %d is already unassigned", slot);
                zfree(slots);
                return;
            } else if (!del && server.cluster->slots[slot]) {
                addReplyErrorFormat(c,"Slot %d is already busy", slot);
                zfree(slots);
                return;
            }
            if (slots[slot]++ == 1) {
                addReplyErrorFormat(c,"Slot %d specified multiple times",
                    (int)slot);
                zfree(slots);
                return;
            }
        }
        for (j = 0; j < CLUSTER_SLOTS; j++) {
            if (slots[j]) {
                int retval;

                /* If this slot was set as importing we can clear this
                 * state as now we are the real owner of the slot. */
                if (server.cluster->importing_slots_from[j])
                    server.cluster->importing_slots_from[j] = NULL;

                retval = del ? clusterDelSlot(j) :
                               clusterAddSlot(myself,j);
                serverAssertWithInfo(c,NULL,retval == C_OK);
            }
        }
        zfree(slots);
        clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG);
        addReply(c,shared.ok);
    ...

    }

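First, line 11 (int del = ...) checks whether the subcommand is delslots. Lines 16 to 36 are a for loop over the arguments passed to the command, that is, the slots to be processed.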

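Inside this loop, the if statement on lines 17 to 20 calls getSlotOrReply, which converts the argument to an int. If the argument is not an integer, or lies outside the range 0 to CLUSTER_SLOTS - 1 (16383), getSlotOrReply sends an error reply to the client and returns -1; the if body then frees the temporary buffer and returns.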
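The checks that getSlotOrReply performs can be sketched in plain C. This is a minimal sketch, not Redis's code: the name parse_slot is hypothetical, it uses the standard strtoll instead of Redis's own string-to-integer routine (so it is more lenient, for instance it accepts leading whitespace), and it only returns -1 where Redis would also send the error reply to the client.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical stand-in for getSlotOrReply(): parse a decimal slot number
 * and reject anything that is not an integer in [0, CLUSTER_SLOTS).
 * Returns the slot on success, -1 on failure (Redis would additionally
 * reply "Invalid or out of range slot" to the client). */
int parse_slot(const char *arg) {
    char *end;
    long long v;

    assert(arg != NULL);
    errno = 0;
    v = strtoll(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;                      /* not a clean integer */
    if (v < 0 || v >= CLUSTER_SLOTS)
        return -1;                      /* out of the slot range */
    return (int)v;
}
```

For example, parse_slot("16383") yields 16383, while "16384", "-1" and "12abc" are all rejected.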

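Next, the if / else if on lines 21 to 29 catch two cases: deleting a slot that has no owner, and adding a slot that already has one. In either case the command does not log anything; it replies to the client with an error ("Slot %d is already unassigned" or "Slot %d is already busy"), frees the buffer and returns.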

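Finally, the if statement on lines 30 to 35 detects slots that appear more than once among the arguments. The slots buffer serves as a per-slot counter: the first time a slot is seen, its counter goes from 0 to 1; if the same slot shows up again, the test slots[slot]++ == 1 is true and the command fails with "Slot %d specified multiple times".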
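The three validation checks (unassigned on delete, busy on add, repeated argument) can be modeled outside Redis. This is an illustrative sketch under stated assumptions: check_slots and the slot_check codes are hypothetical names, the owner array of ints stands in for server.cluster->slots (non-zero meaning "assigned"), and the slot numbers are assumed to have been parsed already.

```c
#include <assert.h>
#include <string.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical result codes for this sketch. */
enum slot_check {
    SLOTS_OK = 0,
    SLOT_ALREADY_UNASSIGNED,   /* DELSLOTS on a free slot */
    SLOT_ALREADY_BUSY,         /* ADDSLOTS on an owned slot */
    SLOT_SPECIFIED_TWICE       /* same slot listed more than once */
};

/* owner[s] != 0 means slot s is assigned (stand-in for
 * server.cluster->slots[s] != NULL). 'seen' is CLUSTER_SLOTS bytes of
 * scratch space, like the buffer clusterCommand allocates.
 * On failure *bad_slot receives the offending slot. */
enum slot_check check_slots(const int *owner, int del,
                            const int *args, int nargs,
                            unsigned char *seen, int *bad_slot) {
    memset(seen, 0, CLUSTER_SLOTS);
    for (int j = 0; j < nargs; j++) {
        int slot = args[j];
        assert(slot >= 0 && slot < CLUSTER_SLOTS);
        *bad_slot = slot;
        if (del && owner[slot] == 0) return SLOT_ALREADY_UNASSIGNED;
        if (!del && owner[slot] != 0) return SLOT_ALREADY_BUSY;
        /* Same trick as 'slots[slot]++ == 1': the second occurrence
         * finds the counter already at 1. */
        if (seen[slot]++ == 1) return SLOT_SPECIFIED_TWICE;
    }
    return SLOTS_OK;
}
```

Nothing is modified during this pass; it only decides whether the whole request is acceptable.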

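Then comes the for loop on lines 37 to 50, which is simple. It scans all CLUSTER_SLOTS entries of the buffer and, for every slot marked in it, clears any importing state left for that slot and then calls clusterDelSlot or clusterAddSlot, depending on whether the command is a delete or an add. Because every argument is validated in the first loop before anything is modified, the command is all-or-nothing: a single bad slot leaves every slot unchanged. After the loop, clusterDoBeforeSleep schedules a cluster state update and a config save, and the client receives OK.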
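The apply phase can be modeled in the same way. In this sketch, apply_slots, node and cluster_state are hypothetical stand-ins for the second loop of clusterCommand, clusterNode and clusterState; only the pointer updates are shown, not the bitmap update that clusterAddSlot also performs.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SLOTS 16384

/* Minimal, hypothetical model of the fields the apply loop touches. */
typedef struct node { const char *name; } node;

typedef struct cluster_state {
    node *slots[CLUSTER_SLOTS];                /* slot -> owner, NULL if free */
    node *importing_slots_from[CLUSTER_SLOTS]; /* slot -> import source */
} cluster_state;

/* Apply phase only: 'marked' is the per-slot buffer filled in by the
 * validation pass, so every marked slot is already known to be valid.
 * Like clusterCommand, it walks all CLUSTER_SLOTS entries. */
void apply_slots(cluster_state *cs, node *myself, int del,
                 const unsigned char *marked) {
    for (int j = 0; j < CLUSTER_SLOTS; j++) {
        if (!marked[j]) continue;
        /* Drop any leftover "importing" state for this slot. */
        cs->importing_slots_from[j] = NULL;
        if (del) {
            assert(cs->slots[j] != NULL);   /* guaranteed by validation */
            cs->slots[j] = NULL;
        } else {
            assert(cs->slots[j] == NULL);   /* guaranteed by validation */
            cs->slots[j] = myself;
        }
    }
}
```

Keeping validation and application as two separate passes is what lets the real command reject a request without leaving it half applied.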

Taking slot addition as the example, let us look at clusterAddSlot more closely:

/* Add the specified slot to the list of slots that node 'n' will
 * serve. Return C_OK if the operation ended with success.
 * If the slot is already assigned to another instance this is considered
 * an error and C_ERR is returned. */
int clusterAddSlot(clusterNode *n, int slot) {
    if (server.cluster->slots[slot]) return C_ERR;
    clusterNodeSetSlotBit(n,slot);
    server.cluster->slots[slot] = n;
    return C_OK;
}

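This function is short. Apart from line 6, which returns C_ERR if the slot is already assigned, the two important lines are 7 and 8. Line 7 calls clusterNodeSetSlotBit with two arguments, n and slot, both passed in by the caller. slot is the slot to assign; for n, the caller passes myself, a clusterNode. This structure is defined in cluster.h and represents one node of the cluster: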

typedef struct clusterNode {
    mstime_t ctime; /* Node object creation time. */
    char name[CLUSTER_NAMELEN]; /* Node name, hex string, sha1-size */
    int flags;      /* CLUSTER_NODE_... */
    uint64_t configEpoch; /* Last configEpoch observed for this node */
    unsigned char slots[CLUSTER_SLOTS/8]; /* slots handled by this node */
    int numslots;   /* Number of slots handled by this node */
    int numslaves;  /* Number of slave nodes, if this is a master */
    struct clusterNode **slaves; /* pointers to slave nodes */
    struct clusterNode *slaveof; /* pointer to the master node. Note that it
                                    may be NULL even if the node is a slave
                                    if we don't have the master node in our
                                    tables. */
    mstime_t ping_sent;      /* Unix time we sent latest ping */
    mstime_t pong_received;  /* Unix time we received the pong */
    mstime_t fail_time;      /* Unix time when FAIL flag was set */
    mstime_t voted_time;     /* Last time we voted for a slave of this master */
    mstime_t repl_offset_time;  /* Unix time we received offset for this node */
    mstime_t orphaned_time;     /* Starting time of orphaned master condition */
    long long repl_offset;      /* Last known repl offset for this node. */
    char ip[NET_IP_STR_LEN];  /* Latest known IP address of this node */
    int port;                   /* Latest known clients port of this node */
    int cport;                  /* Latest known cluster port of this node. */
    clusterLink *link;          /* TCP/IP link with this node */
    list *fail_reports;         /* List of nodes signaling this as failing */
} clusterNode;

Next, here is the clusterNodeSetSlotBit function called on line 7:

/* Set the slot bit and return the old value. */
int clusterNodeSetSlotBit(clusterNode *n, int slot) {
    int old = bitmapTestBit(n->slots,slot);
    bitmapSetBit(n->slots,slot);
    if (!old) {
        n->numslots++;
        /* When a master gets its first slot, even if it has no slaves,
         * it gets flagged with MIGRATE_TO, that is, the master is a valid
         * target for replicas migration, if and only if at least one of
         * the other masters has slaves right now.
         *
         * Normally masters are valid targerts of replica migration if:
         * 1. The used to have slaves (but no longer have).
         * 2. They are slaves failing over a master that used to have slaves.
         *
         * However new masters with slots assigned are considered valid
         * migration tagets if the rest of the cluster is not a slave-less.
         *
         * See https://github.com/antirez/redis/issues/3043 for more info. */
        if (n->numslots == 1 && clusterMastersHaveSlaves())
            n->flags |= CLUSTER_NODE_MIGRATE_TO;
    }
    return old;
}

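Line 3 calls bitmapTestBit with two arguments: n->slots and slot. The slots field appears in the clusterNode definition above (line 6). It is an unsigned char array (a byte array, not an array of strings) of length CLUSTER_SLOTS/8; CLUSTER_SLOTS is 16384, so the array is 2048 bytes long. It records the hash slots claimed by this node. bitmapTestBit returns the slot's current bit, which is saved in old and returned at the end; numslots is incremented only if that bit was previously 0, so assigning the same slot twice does not inflate the count.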

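A byte array is used because each byte holds 8 bits and the slot information needs only one bit per slot. cluster.c provides small bitmap helper functions that read or write a single bit of such an array (set it to 1 or clear it to 0), and recording that a node owns a slot simply means setting that slot's bit to 1. Compared with one byte per slot (16384 bytes), the bitmap needs only 16384 / 8 = 2048 bytes.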
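The bitmap logic can be tried in isolation. This sketch mirrors bitmapTestBit, bitmapSetBit and the counting part of clusterNodeSetSlotBit; slot_node, bitmap_test, bitmap_set and node_set_slot are hypothetical names, and the CLUSTER_NODE_MIGRATE_TO handling is deliberately left out.

```c
#include <assert.h>

#define CLUSTER_SLOTS 16384

/* Simplified, hypothetical node: only the slot bitmap and its counter. */
typedef struct slot_node {
    unsigned char slots[CLUSTER_SLOTS / 8];  /* 2048 bytes, one bit per slot */
    int numslots;
} slot_node;

/* Same logic as bitmapTestBit(): 1 if bit 'pos' is set, else 0. */
static int bitmap_test(const unsigned char *bitmap, int pos) {
    return (bitmap[pos / 8] & (1 << (pos & 7))) ? 1 : 0;
}

/* Same logic as bitmapSetBit(). */
static void bitmap_set(unsigned char *bitmap, int pos) {
    bitmap[pos / 8] |= (unsigned char)(1 << (pos & 7));
}

/* Like clusterNodeSetSlotBit() without the MIGRATE_TO flag handling:
 * numslots grows only on a 0 -> 1 transition. Returns the old bit. */
int node_set_slot(slot_node *n, int slot) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    int old = bitmap_test(n->slots, slot);
    bitmap_set(n->slots, slot);
    if (!old) n->numslots++;
    return old;
}
```

Calling node_set_slot twice with the same slot returns 0 and then 1, while numslots stays at 1.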

Looking back at clusterNodeSetSlotBit, only the two calls on lines 3 and 4 deal with the bitmap, and the one that actually sets the slot is bitmapSetBit on line 4:

/* Set the bit at position 'pos' in a bitmap. */
void bitmapSetBit(unsigned char *bitmap, int pos) {
    off_t byte = pos/8;
    int bit = pos&7;
    bitmap[byte] |= 1<<bit;
}

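This function is simple: pos/8 finds the byte that holds the slot, pos&7 (equal to pos%8 for a non-negative pos) finds the bit within that byte, and the OR sets that bit to 1. For example, slot 10 lives in byte 1, bit 2.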
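The byte/bit arithmetic can be checked on a few concrete slots. The helper names below are hypothetical; bit_set follows bitmapSetBit, and bit_clear follows the logic of the bit-clearing counterpart in cluster.c (bitmapClearBit), which clears the bit by ANDing with the inverted mask.

```c
#include <assert.h>

/* Locate bit 'pos': byte index pos/8 and a one-bit mask for pos&7
 * (pos&7 equals pos%8 for non-negative pos). */
static void slot_location(int pos, int *byte, unsigned char *mask) {
    assert(pos >= 0);
    *byte = pos / 8;
    *mask = (unsigned char)(1u << (pos & 7));
}

/* Set the bit for 'pos' to 1, as bitmapSetBit() does. */
void bit_set(unsigned char *bitmap, int pos) {
    int byte;
    unsigned char mask;
    slot_location(pos, &byte, &mask);
    bitmap[byte] |= mask;
}

/* Clear the bit for 'pos' back to 0, leaving the other 7 bits intact. */
void bit_clear(unsigned char *bitmap, int pos) {
    int byte;
    unsigned char mask;
    slot_location(pos, &byte, &mask);
    bitmap[byte] &= (unsigned char)~mask;
}
```

Slot 10 maps to byte 1 with mask 0x04, and the last slot, 16383, maps to byte 2047 with mask 0x80, the highest bit of the last byte.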

This completes clusterNodeSetSlotBit. Besides line 7 of clusterAddSlot, which calls it, the assignment on line 8 is just as important: it sets server.cluster->slots[slot] to n, the node described above. server.cluster->slots[slot] deserves a closer look.
First, server.cluster is assigned in clusterInit; the relevant snippet is:
[Figure: clusterInit code that allocates and assigns server.cluster]

The snippet shows that server.cluster is a clusterState structure, defined in cluster.h:

typedef struct clusterState {
    clusterNode *myself;  /* This node */
    uint64_t currentEpoch;
    int state;            /* CLUSTER_OK, CLUSTER_FAIL, ... */
    int size;             /* Num of master nodes with at least one slot */
    dict *nodes;          /* Hash table of name -> clusterNode structures */
    dict *nodes_black_list; /* Nodes we don't re-add for a few seconds. */
    clusterNode *migrating_slots_to[CLUSTER_SLOTS];
    clusterNode *importing_slots_from[CLUSTER_SLOTS];
    clusterNode *slots[CLUSTER_SLOTS];
    uint64_t slots_keys_count[CLUSTER_SLOTS];
    rax *slots_to_keys;
    /* The following fields are used to take the slave state on elections. */
    mstime_t failover_auth_time; /* Time of previous or next election. */
    int failover_auth_count;    /* Number of votes received so far. */
    int failover_auth_sent;     /* True if we already asked for votes. */
    int failover_auth_rank;     /* This slave rank for current auth request. */
    uint64_t failover_auth_epoch; /* Epoch of the current election. */
    int cant_failover_reason;   /* Why a slave is currently not able to
                                   failover. See the CANT_FAILOVER_* macros. */
    /* Manual failover state in common. */
    mstime_t mf_end;            /* Manual failover time limit (ms unixtime).
                                   It is zero if there is no MF in progress. */
    /* Manual failover state of master. */
    clusterNode *mf_slave;      /* Slave performing the manual failover. */
    /* Manual failover state of slave. */
    long long mf_master_offset; /* Master offset the slave needs to start MF
                                   or zero if stil not received. */
    int mf_can_start;           /* If non-zero signal that the manual failover
                                   can start requesting masters vote. */
    /* The followign fields are used by masters to take state on elections. */
    uint64_t lastVoteEpoch;     /* Epoch of the last vote granted. */
    int todo_before_sleep; /* Things to do in clusterBeforeSleep(). */
    /* Messages received and sent by type. */
    long long stats_bus_messages_sent[CLUSTERMSG_TYPE_COUNT];
    long long stats_bus_messages_received[CLUSTERMSG_TYPE_COUNT];
    long long stats_pfail_nodes;    /* Number of nodes in PFAIL status,
                                       excluding nodes without address. */
} clusterState;

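Line 10 declares the slots field of server.cluster, which line 8 of clusterAddSlot assigns. It is an array of CLUSTER_SLOTS pointers to clusterNode (clusterNode *slots[CLUSTER_SLOTS]): entry i points to the node that serves slot i, or is NULL if the slot is unassigned.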

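That completes the analysis of how Redis adds a slot. Two operations are performed: the slot's bit is set in the bitmap of the clusterNode that represents this node, and a pointer to that node is stored at the slot's index in the slots array of clusterState.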
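Putting the two operations together, clusterAddSlot can be modeled with trimmed-down structures. This is a sketch under assumptions: mini_node and mini_state are hypothetical reductions of clusterNode and clusterState, -1 and 0 stand in for C_ERR and C_OK, and the flag handling inside clusterNodeSetSlotBit is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical, trimmed-down versions of clusterNode / clusterState. */
typedef struct mini_node {
    unsigned char slots[CLUSTER_SLOTS / 8]; /* bitmap: slots this node serves */
    int numslots;
} mini_node;

typedef struct mini_state {
    mini_node *slots[CLUSTER_SLOTS];        /* slot -> owner, NULL if free */
} mini_state;

/* Same two steps as clusterAddSlot(): refuse an owned slot, otherwise
 * set the bit in the node's bitmap and record the node in the table. */
int mini_add_slot(mini_state *cs, mini_node *n, int slot) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    if (cs->slots[slot] != NULL) return -1;   /* like C_ERR */
    unsigned char mask = (unsigned char)(1u << (slot & 7));
    if (!(n->slots[slot / 8] & mask)) {       /* count only 0 -> 1 changes */
        n->slots[slot / 8] |= mask;
        n->numslots++;
    }
    cs->slots[slot] = n;                      /* cluster-wide view */
    return 0;                                 /* like C_OK */
}
```

Assigning slot 42 to one node succeeds; a second attempt for another node returns -1 and leaves both structures untouched.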

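The code shows that Redis keeps hash slot information in two places: the slots array of server.cluster (a clusterState), and the slots bitmap of each clusterNode. The two differ slightly. clusterState.slots covers the whole cluster, mapping every slot one-to-one to the node that serves it, so "which node serves slot i?" is a single array read. clusterNode.slots records only the slots served by that particular node, so whether a node serves slot i is a single bit test, and numslots gives its slot count without consulting the cluster-wide table.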
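The relation between the two views can be stated as an invariant: for every slot, the slot's bit is set in a node's bitmap exactly when the cluster-wide table points to that node. The check below illustrates that invariant over simplified, hypothetical versions of clusterNode and clusterState (mini_node, mini_state, slot_owner and views_agree are illustrative names); it is not a function Redis is claimed to have.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SLOTS 16384

typedef struct mini_node {
    unsigned char slots[CLUSTER_SLOTS / 8]; /* per-node bitmap */
    int numslots;
} mini_node;

typedef struct mini_state {
    mini_node *slots[CLUSTER_SLOTS];        /* slot -> owner, NULL if free */
} mini_state;

/* Cluster-wide view: slot -> node is a single array read. */
mini_node *slot_owner(const mini_state *cs, int slot) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    return cs->slots[slot];
}

/* Returns 1 if node n's bitmap (and numslots) matches exactly the
 * slots that the cluster-wide table assigns to n, else 0. */
int views_agree(const mini_state *cs, const mini_node *n) {
    int count = 0;
    for (int s = 0; s < CLUSTER_SLOTS; s++) {
        int in_bitmap = (n->slots[s / 8] >> (s & 7)) & 1;
        int in_table = (cs->slots[s] == n);
        if (in_bitmap != in_table) return 0;
        count += in_bitmap;
    }
    return count == n->numslots;
}
```

Setting only the table entry for slot 7 breaks the invariant; setting the matching bit (byte 0, mask 0x80) and numslots restores it.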
