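Redis Cluster supports removing nodes at runtime, although this is rarely done in practice; usually a node is taken offline on purpose only for a software or hardware upgrade. To remove a node, connect to a server with redis-cli and run cluster forget node-id. If the removed node is a replica (shown as slave in the output below), the cluster stays available. If it is a master that still owns slots, those slots are left unassigned and the cluster becomes unavailable.
Here is an example I ran on my own virtual machine: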
- 127.0.0.1:7000> cluster info
- cluster_state:ok
- cluster_slots_assigned:16384
- cluster_slots_ok:16384
- cluster_slots_pfail:0
- cluster_slots_fail:0
- cluster_known_nodes:6
- cluster_size:3
- cluster_current_epoch:8
- cluster_my_epoch:7
- cluster_stats_messages_sent:2058
- cluster_stats_messages_received:1596
-
- 127.0.0.1:7000> cluster nodes
- 930daea84150b5fabd32a95592781b27ceab1b71 192.168.39.153:7001 master - 0 1479044139420 2 connected 5461-10922
- 8a6707d5b9269b6260315b47f300c1ab599733b7 192.168.39.153:7005 slave bdb62bb6ffce71588961f513c74b0d5a1a7145ea 0 1479044141441 6 connected
- bdb62bb6ffce71588961f513c74b0d5a1a7145ea 192.168.39.153:7002 master - 0 1479044139925 3 connected 10923-16383
- 81c884ebfc919ad293f02d797aff1033025ac27e 192.168.39.153:7004 slave 930daea84150b5fabd32a95592781b27ceab1b71 0 1479044140937 2 connected
- 099cfc6fbb785449a8bf5369a53d21a9e127fa42 192.168.39.153:7000 myself,slave a8081e97862d9cf76c72d364f9a173187376f215 0 0 1 connected
- a8081e97862d9cf76c72d364f9a173187376f215 192.168.39.153:7003 master - 0 1479044140430 7 connected 0-5460
The output shows that the cluster has six nodes: 192.168.39.153:7000, 192.168.39.153:7001, 192.168.39.153:7002, 192.168.39.153:7003, 192.168.39.153:7004 and 192.168.39.153:7005. Their node IDs are, in the same order, 099cfc6fbb785449a8bf5369a53d21a9e127fa42, 930daea84150b5fabd32a95592781b27ceab1b71, bdb62bb6ffce71588961f513c74b0d5a1a7145ea, a8081e97862d9cf76c72d364f9a173187376f215, 81c884ebfc919ad293f02d797aff1033025ac27e and 8a6707d5b9269b6260315b47f300c1ab599733b7. Ports 7001, 7002 and 7003 are masters; 7000, 7004 and 7005 are their replicas.
Next, we remove the replica 192.168.39.153:7004:
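Reading node IDs off the cluster nodes output by hand is error-prone, so here is a minimal sketch of a parser for the line format shown above. The names `NodeInfo`, `parse_cluster_nodes` and `SAMPLE` are made up for this illustration and are not part of Redis or any client library; the sketch only assumes the eight leading fields seen in the output, followed by optional slot ranges.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NodeInfo:
    node_id: str
    addr: str
    flags: List[str]
    master_id: Optional[str]  # None for masters ("-" in the output)
    slots: List[Tuple[int, int]] = field(default_factory=list)  # inclusive ranges


def parse_cluster_nodes(text: str) -> List[NodeInfo]:
    """Parse `cluster nodes` output into NodeInfo records."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        node_id, addr, flags, master = parts[0], parts[1], parts[2], parts[3]
        slots = []
        for token in parts[8:]:
            if token.startswith("["):
                continue  # migrating/importing markers, not owned ranges
            start, _, end = token.partition("-")
            slots.append((int(start), int(end or start)))
        nodes.append(NodeInfo(node_id, addr, flags.split(","),
                              None if master == "-" else master, slots))
    return nodes


# Three lines copied from the session above.
SAMPLE = """\
930daea84150b5fabd32a95592781b27ceab1b71 192.168.39.153:7001 master - 0 1479044139420 2 connected 5461-10922
81c884ebfc919ad293f02d797aff1033025ac27e 192.168.39.153:7004 slave 930daea84150b5fabd32a95592781b27ceab1b71 0 1479044140937 2 connected
099cfc6fbb785449a8bf5369a53d21a9e127fa42 192.168.39.153:7000 myself,slave a8081e97862d9cf76c72d364f9a173187376f215 0 0 1 connected
"""

if __name__ == "__main__":
    for node in parse_cluster_nodes(SAMPLE):
        print(node.addr, node.flags, node.master_id, node.slots)
```

With the records in hand, looking up the node ID for a given address (or the master of a given replica) becomes a simple filter instead of a copy-and-paste step.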
- 127.0.0.1:7000> cluster forget 81c884ebfc919ad293f02d797aff1033025ac27e
- OK
- 127.0.0.1:7000> cluster info
- cluster_state:ok
- cluster_slots_assigned:16384
- cluster_slots_ok:16384
- cluster_slots_pfail:0
- cluster_slots_fail:0
- cluster_known_nodes:5
- cluster_size:3
- cluster_current_epoch:8
- cluster_my_epoch:7
- cluster_stats_messages_sent:2403
- cluster_stats_messages_received:1941
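After the node is removed, cluster_known_nodes shows 5. If we ran cluster nodes now, 192.168.39.153:7004 would no longer be listed, because the node we are connected to has dropped it from its node table. Note that cluster forget only affects the node that receives the command; the other nodes still know about 7004 until they are sent the same command. We also see cluster_state:ok, which means the cluster is still available.
Now let's try removing the master 192.168.39.153:7001.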
- 127.0.0.1:7000> cluster forget 930daea84150b5fabd32a95592781b27ceab1b71
- OK
- 127.0.0.1:7000> cluster info
- cluster_state:fail
- cluster_slots_assigned:10922
- cluster_slots_ok:10922
- cluster_slots_pfail:0
- cluster_slots_fail:0
- cluster_known_nodes:5
- cluster_size:2
- cluster_current_epoch:8
- cluster_my_epoch:7
- cluster_stats_messages_sent:2627
- cluster_stats_messages_received:2165
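After 192.168.39.153:7001 is removed, the state is cluster_state:fail, which means that from this node's point of view the cluster is unavailable. cluster_slots_assigned has dropped from 16384 to 10922, because the 5462 slots in the range 5461-10922 that 7001 owned are now unassigned. cluster_known_nodes still reads 5 rather than 4; presumably the previously forgotten 7004 was learned again from the other nodes in the meantime.
Let's look at the Redis source code to see how forget is implemented. In redis/cluster.c, the forget subcommand sent by the client is handled inside the clusterCommand function: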
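The slot arithmetic behind these numbers can be checked with a short sketch. It is a deliberately simplified model: it only looks at slot coverage, while a real node also takes other conditions into account when computing cluster_state. The names `OWNERS`, `assigned_slots` and `local_state` are invented for this example.

```python
TOTAL_SLOTS = 16384

# Slot ranges per master, copied from the `cluster nodes` output above.
OWNERS = {
    "192.168.39.153:7003": [(0, 5460)],
    "192.168.39.153:7001": [(5461, 10922)],
    "192.168.39.153:7002": [(10923, 16383)],
}


def assigned_slots(owners, forgotten=frozenset()):
    """Count the slots that still have an owner once `forgotten` masters are dropped."""
    return sum(end - start + 1
               for addr, ranges in owners.items() if addr not in forgotten
               for start, end in ranges)


def local_state(owners, forgotten=frozenset()):
    """Simplified state: 'ok' only when every slot has an owner."""
    return "ok" if assigned_slots(owners, forgotten) == TOTAL_SLOTS else "fail"


if __name__ == "__main__":
    print(assigned_slots(OWNERS), local_state(OWNERS))
    gone = {"192.168.39.153:7001"}
    print(assigned_slots(OWNERS, gone), local_state(OWNERS, gone))
```

Forgetting a replica does not change the `OWNERS` table at all, which is why the earlier removal of 7004 left the state at ok.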
- —————————————————————————————————
- } else if (!strcasecmp(c->argv[1]->ptr,"forget") && c->argc == 3) {
-     /* CLUSTER FORGET <node-id> */
-     clusterNode *n = clusterLookupNode(c->argv[2]->ptr);
-
-     /* Refuse unknown nodes, this node itself, and this node's own master. */
-     if (!n) {
-         addReplyErrorFormat(c,"Unknown node %s", (char*)c->argv[2]->ptr);
-         return;
-     } else if (n == myself) {
-         addReplyError(c,"I tried hard but I can't forget myself...");
-         return;
-     } else if (nodeIsSlave(myself) && myself->slaveof == n) {
-         addReplyError(c,"Can't forget my master!");
-         return;
-     }
-
-     /* Blacklist the node first so it is not re-added right away. */
-     clusterBlacklistAddNode(n);
-     /* Drop the node from the local node table. */
-     clusterDelNode(n);
-     /* Recompute the cluster state and save the config before sleeping. */
-     clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|
-                          CLUSTER_TODO_SAVE_CONFIG);
-     addReply(c,shared.ok);
- }
- —————————————————————————————————
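The three guard checks at the top of this branch are easy to model outside Redis. The sketch below is only an illustration of that decision logic; `check_forget` and its parameters are hypothetical names, and it returns just the message text passed to addReplyError, not the exact bytes sent on the wire.

```python
def check_forget(myself_id, my_master_id, known_ids, target_id):
    """Return the message a node would answer to CLUSTER FORGET <target_id>.

    myself_id    -- ID of the node receiving the command
    my_master_id -- ID of its master, or None if the receiver is a master
    known_ids    -- IDs in the receiver's node table
    """
    if target_id not in known_ids:
        return "Unknown node %s" % target_id
    if target_id == myself_id:
        return "I tried hard but I can't forget myself..."
    if my_master_id is not None and my_master_id == target_id:
        return "Can't forget my master!"
    return "OK"


# Node 7000 from the session above: a slave of 7003.
MYSELF = "099cfc6fbb785449a8bf5369a53d21a9e127fa42"
MY_MASTER = "a8081e97862d9cf76c72d364f9a173187376f215"
KNOWN = {
    MYSELF,
    MY_MASTER,
    "930daea84150b5fabd32a95592781b27ceab1b71",
    "bdb62bb6ffce71588961f513c74b0d5a1a7145ea",
    "81c884ebfc919ad293f02d797aff1033025ac27e",
    "8a6707d5b9269b6260315b47f300c1ab599733b7",
}

if __name__ == "__main__":
    print(check_forget(MYSELF, MY_MASTER, KNOWN, "81c884ebfc919ad293f02d797aff1033025ac27e"))
    print(check_forget(MYSELF, MY_MASTER, KNOWN, MY_MASTER))
```

In the session above the commands were sent to 7000, which is a slave of 7003, so forgetting 7004 or 7001 passes all three checks, while trying to forget 7003 from that same connection would be refused.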
Next, let's look at how the clusterBlacklistAddNode function adds the node to the blacklist:
-
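- void clusterBlacklistAddNode(clusterNode *node) {
-     dictEntry *de;
-     sds id = sdsnewlen(node->name,REDIS_CLUSTER_NAMELEN);
-
-     /* Expire stale blacklist entries first. */
-     clusterBlacklistCleanup();
-
-     /* If the key was added, the dict now owns `id`; duplicate it so the
-      * lookup and the sdsfree() below work on our own copy. */
-     if (dictAdd(server.cluster->nodes_black_list,id,NULL) == DICT_OK) {
-         id = sdsdup(id);
-     }
-
-     /* Set (or refresh) the expiry time of the entry. */
-     de = dictFind(server.cluster->nodes_black_list,id);
-     dictSetUnsignedIntegerVal(de,time(NULL)+REDIS_CLUSTER_BLACKLIST_TTL);
-     sdsfree(id);
- }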
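To make the expiry behaviour of the blacklist concrete, here is a small Python model of it. It is a sketch, not Redis code: `NodeBlacklist` and the injected `clock` are invented for the example, the TTL is taken to be the 60 seconds that REDIS_CLUSTER_BLACKLIST_TTL is defined as in the Redis source, and the cleanup rule (drop entries whose expiry is strictly in the past) is my reading of clusterBlacklistCleanup.

```python
BLACKLIST_TTL = 60  # seconds; assumed value of REDIS_CLUSTER_BLACKLIST_TTL


class NodeBlacklist:
    """Maps node ID -> expiry time, like server.cluster->nodes_black_list."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._expiry = {}

    def cleanup(self):
        """Drop entries whose expiry time has passed."""
        now = self._clock()
        for node_id in [k for k, exp in self._expiry.items() if exp < now]:
            del self._expiry[node_id]

    def add(self, node_id):
        """Add the node, or refresh its expiry if it is already listed."""
        self.cleanup()
        self._expiry[node_id] = self._clock() + BLACKLIST_TTL

    def contains(self, node_id):
        self.cleanup()
        return node_id in self._expiry


if __name__ == "__main__":
    now = [1000]                       # mutable fake clock
    bl = NodeBlacklist(lambda: now[0])
    bl.add("81c884ebfc919ad293f02d797aff1033025ac27e")
    print(bl.contains("81c884ebfc919ad293f02d797aff1033025ac27e"))
    now[0] += BLACKLIST_TTL + 1
    print(bl.contains("81c884ebfc919ad293f02d797aff1033025ac27e"))
```

Passing the clock in as a parameter is a choice made for the sketch so the expiry can be exercised without waiting a minute; Redis itself reads the system time.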
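Below is the key function that deletes the node. It first marks every slot owned by the node as unassigned (and clears any import or migration state that points at it), then removes the failure reports the node has filed against other nodes, then, if the node is a slave, removes it from its master's list of slaves, and finally frees the local record of the node.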
- void clusterDelNode(clusterNode *delnode) {
-     int j;
-     dictIterator *di;
-     dictEntry *de;
-
-     /* 1) Mark the node's slots as unassigned and clear any
-      *    import/migration state that points at it. */
-     for (j = 0; j < REDIS_CLUSTER_SLOTS; j++) {
-         if (server.cluster->importing_slots_from[j] == delnode)
-             server.cluster->importing_slots_from[j] = NULL;
-         if (server.cluster->migrating_slots_to[j] == delnode)
-             server.cluster->migrating_slots_to[j] = NULL;
-         if (server.cluster->slots[j] == delnode)
-             clusterDelSlot(j);
-     }
-
-     /* 2) Remove the failure reports filed by this node. */
-     di = dictGetSafeIterator(server.cluster->nodes);
-     while((de = dictNext(di)) != NULL) {
-         clusterNode *node = dictGetVal(de);
-
-         if (node == delnode) continue;
-         clusterNodeDelFailureReport(node,delnode);
-     }
-     dictReleaseIterator(di);
-
-     /* 3) If it is a slave, remove it from its master's slave list. */
-     if (nodeIsSlave(delnode) && delnode->slaveof)
-         clusterNodeRemoveSlave(delnode->slaveof,delnode);
-
-     /* 4) Free the node structure. */
-     freeClusterNode(delnode);
- }
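The same four steps can be written down as a toy model over plain dictionaries, which makes it easier to see what state is touched. Everything here (`del_node` and the layout of the `cluster` dict) is a hypothetical simplification for illustration; the real function works on the clusterNode structures shown above.

```python
def del_node(cluster, target):
    """Remove `target` (a node ID) from a toy cluster state, in place.

    cluster keys (all assumptions of this sketch):
      slots, importing_from, migrating_to -- dict: slot number -> node ID
      fail_reports -- dict: node ID -> set of IDs that reported it as failing
      master_of    -- dict: slave ID -> master ID
      slaves       -- dict: master ID -> set of slave IDs
      nodes        -- set of known node IDs
    """
    # 1) Unassign the node's slots and clear import/migration state.
    for table in ("importing_from", "migrating_to", "slots"):
        for slot, owner in list(cluster[table].items()):
            if owner == target:
                del cluster[table][slot]

    # 2) Remove failure reports filed by the node against other nodes.
    for node_id, reporters in cluster["fail_reports"].items():
        if node_id != target:
            reporters.discard(target)

    # 3) If it is a slave, detach it from its master.
    master = cluster["master_of"].get(target)
    if master is not None:
        cluster["slaves"].get(master, set()).discard(target)

    # 4) Drop the local record of the node.
    cluster["nodes"].discard(target)
    cluster["fail_reports"].pop(target, None)
    cluster["master_of"].pop(target, None)
    cluster["slaves"].pop(target, None)


if __name__ == "__main__":
    cluster = {
        "slots": {0: "m1", 1: "m2"},
        "importing_from": {1: "m1"},
        "migrating_to": {},
        "fail_reports": {"m2": {"m1", "s1"}},
        "master_of": {"s1": "m1"},
        "slaves": {"m1": {"s1"}},
        "nodes": {"m1", "m2", "s1"},
    }
    del_node(cluster, "m1")
    print(cluster["slots"], cluster["nodes"])
```

Step 1 is what makes forgetting a master different from forgetting a slave: only a master has entries in the slot table, so only its removal leaves slots without an owner.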
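At this point the node is deleted as far as the local server is concerned. The deletion itself is not propagated to the rest of the cluster: the other nodes still know the removed node and keep mentioning it in the information they periodically exchange. The blacklist entry is what stops the local server from adding the node back, but only for REDIS_CLUSTER_BLACKLIST_TTL seconds. To remove a node from the whole cluster, cluster forget therefore has to be sent to every other node within that window.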
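Since cluster forget only changes the node that receives it, taking a node out of the whole cluster means sending the command to each of the other nodes. The sketch below works out which nodes should receive the command and which would refuse it with "Can't forget my master!". `forget_plan` and the `NODES` table are hypothetical helpers built from the session output earlier in this post; actually sending the commands is left out.

```python
# (node_id, address, master_id or None), taken from the `cluster nodes` output.
NODES = [
    ("099cfc6fbb785449a8bf5369a53d21a9e127fa42", "192.168.39.153:7000",
     "a8081e97862d9cf76c72d364f9a173187376f215"),
    ("930daea84150b5fabd32a95592781b27ceab1b71", "192.168.39.153:7001", None),
    ("bdb62bb6ffce71588961f513c74b0d5a1a7145ea", "192.168.39.153:7002", None),
    ("a8081e97862d9cf76c72d364f9a173187376f215", "192.168.39.153:7003", None),
    ("81c884ebfc919ad293f02d797aff1033025ac27e", "192.168.39.153:7004",
     "930daea84150b5fabd32a95592781b27ceab1b71"),
    ("8a6707d5b9269b6260315b47f300c1ab599733b7", "192.168.39.153:7005",
     "bdb62bb6ffce71588961f513c74b0d5a1a7145ea"),
]


def forget_plan(nodes, target_id):
    """Split the other nodes into those to send CLUSTER FORGET to and
    those that would reject it because the target is their master."""
    send_to, would_reject = [], []
    for node_id, addr, master_id in nodes:
        if node_id == target_id:
            continue  # the target is never asked to forget itself
        if master_id == target_id:
            would_reject.append(addr)
        else:
            send_to.append(addr)
    return send_to, would_reject


if __name__ == "__main__":
    for target in ("81c884ebfc919ad293f02d797aff1033025ac27e",
                   "930daea84150b5fabd32a95592781b27ceab1b71"):
        send_to, would_reject = forget_plan(NODES, target)
        print("cluster forget", target)
        print("  send to:     ", send_to)
        print("  would reject:", would_reject)
```

For the slave 7004 every other node accepts the command. For the master 7001 its slave 7004 would refuse, so that slave has to be dealt with first, in addition to the slot problem shown earlier.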