Redis High Availability in Practice: Cluster

Redis High Availability Series

Concepts

The previous article, 《Redis高可用实战之Sentinel》 (Redis High Availability in Practice: Sentinel), covered how to achieve Redis high availability with Sentinel. Building on that, Redis Cluster shards data across multiple nodes, further improving Redis's concurrency and performance. The cluster can continue to serve commands even when some of its nodes fail or become unreachable.

How It Works

Redis Cluster computes a key's hash with the CRC16 algorithm, then takes the result modulo 16384 to obtain the key's hash slot. A cluster has 16384 hash slots in total, distributed across the nodes of the cluster.
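
To make the mapping concrete, here is a small Python sketch of the key-to-slot computation. The CRC16 variant (CCITT/XModem, polynomial 0x1021) is the one described in the Redis Cluster specification; the `crc16` and `hash_slot` names are just for illustration:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem variant), the checksum Redis Cluster uses.

    Sanity check from the Redis Cluster spec: crc16(b"123456789") == 0x31C3.
    """
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots.

    Note: real Redis also honors "hash tags" -- if the key contains a
    {...} section, only the part between the braces is hashed.
    """
    return crc16(key) % 16384

print(hash_slot(b"hello"))  # 866, matching the redirect seen later in this article
```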

When nodes are added to or removed from the cluster, some hash slots are migrated to the new node or moved off the departing one, which makes the cluster easy to scale.

In a Redis Cluster, each master (and thus the hash slots it serves) can have one or more replicas. When every master has a replica, the failure of a master causes one of its replicas to be promoted to the new master, so the cluster as a whole remains available.

Environment

| Node      | IP            | Port |
| --------- | ------------- | ---- |
| Master A  | 192.168.1.201 | 7000 |
| Replica A | 192.168.1.201 | 7001 |
| Master B  | 192.168.1.202 | 7002 |
| Replica B | 192.168.1.202 | 7003 |
| Master C  | 192.168.1.203 | 7004 |
| Replica C | 192.168.1.203 | 7005 |

Configuration Changes

Copy the configuration file, adding the port to the name to tell the instances apart:

cp redis.conf redis-7000.conf

Edit the copy to enable cluster mode and set a per-instance cluster config file path:

port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000

The configuration files for the other ports are analogous.
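
Rather than editing six files by hand, the per-port configs can be stamped out from a template. This is an illustrative sketch, not part of the original setup; the `write_configs` helper and the output directory are hypothetical:

```python
import pathlib
import tempfile

# Template matching the settings shown above; {port} is filled in per instance.
TEMPLATE = """\
port {port}
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout 5000
"""

def write_configs(dest, ports):
    """Write one redis-<port>.conf per port under dest; return the paths."""
    paths = []
    for port in ports:
        path = pathlib.Path(dest) / f"redis-{port}.conf"
        path.write_text(TEMPLATE.format(port=port))
        paths.append(path)
    return paths

# Generate configs for the six instances used in this article.
out_dir = tempfile.mkdtemp()
for path in write_configs(out_dir, range(7000, 7006)):
    print(path.name)  # redis-7000.conf ... redis-7005.conf
```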

Setting Up the Cluster

Starting the Instances

First start a Redis instance on each node:

redis-server redis-7000.conf

The instance starts in cluster mode; its ID within the cluster appears in the log:

20309:M 10 Aug 2022 18:45:13.340 * No cluster configuration found, I'm 6b6c5653321d7bfaad1266b395f7647eac6aa4e8

Creating the Cluster

Create the cluster with redis-cli:

redis-cli --cluster create \
        192.168.1.201:7000 192.168.1.201:7001 \
        192.168.1.202:7002 192.168.1.202:7003 \
        192.168.1.203:7004 192.168.1.203:7005 \
        --cluster-replicas 1

During creation the hash slots are allocated across the masters; the following message indicates the cluster was created successfully:

[OK] All 16384 slots covered.
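
The allocation performed by `--cluster create` is essentially an even, contiguous split of the 16384 slots across the masters. A simplified Python sketch (the real tool's boundary choices may differ slightly, so treat this as an approximation):

```python
def split_slots(num_masters: int, total_slots: int = 16384):
    """Divide slots into contiguous ranges, one range per master.

    Simplified sketch of what `redis-cli --cluster create` does;
    the real tool may pick slightly different boundaries.
    """
    ranges = []
    start = 0
    for i in range(num_masters):
        # Spread any remainder over the first few masters.
        size = total_slots // num_masters + (1 if i < total_slots % num_masters else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_slots(3))  # [(0, 5461), (5462, 10922), (10923, 16383)]
```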

Once created, inspect the cluster:

[root@node1 ~]# redis-cli --cluster info 192.168.1.201:7000
192.168.1.203:7005 (b43dbc87...) -> 0 keys | 16384 slots | 5 slaves.
[OK] 0 keys in 1 masters.
0.00 keys per slot on average.

If the following error appears instead, database 0 of that instance already contains data (or the node already knows about other cluster nodes):

[ERR] Node 192.168.1.201:7000 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

In that case, flush the database and create the cluster again:

redis-cli -h 192.168.1.201 -p 7000 -n 0 flushdb

If the following error appears, the slots are not fully assigned:

[ERR] Not all 16384 slots are covered by nodes.

Repair the slot distribution with:

redis-cli --cluster fix 192.168.1.201:7000

Resharding the Cluster

The slot distribution may not be what we expect, for example:

[root@node1 ~]# redis-cli --cluster check 192.168.1.201:7000
192.168.1.203:7005 (b43dbc87...) -> 0 keys | 16384 slots | 5 slaves.
[OK] 0 keys in 1 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.201:7000)
S: 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 192.168.1.201:7000
   slots: (0 slots) slave
   replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
S: d8133074b6df13eb7dca0c9c56fef14f3f349564 192.168.1.202:7003
   slots: (0 slots) slave
   replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
S: 477703acb78e9fd5b2e71e39e9de96e39b67ced5 192.168.1.201:7001
   slots: (0 slots) slave
   replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
S: c4c1ce621d4739888080075348bc3ae1c5ef8841 192.168.1.202:7002
   slots: (0 slots) slave
   replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
M: b43dbc87504861f09efcfbaa4df6394bd5eca1e5 192.168.1.203:7005
   slots:[0-16383] (16384 slots) master
   5 additional replica(s)
S: 5635d84f4ad0ca122d8ef657b6716258b11977bb 192.168.1.203:7004
   slots: (0 slots) slave
   replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The cluster currently has six nodes but only one master, whereas we expect three masters, each with one replica. The slot distribution therefore needs to be adjusted.

To promote 192.168.1.201:7000 to a master, first remove it from the cluster:

[root@node1 ~]# redis-cli --cluster del-node 192.168.1.201:7000 6b6c5653321d7bfaad1266b395f7647eac6aa4e8           
>>> Removing node 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 from cluster 192.168.1.201:7000
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

Then add it back to the cluster:

[root@node1 ~]# redis-cli --cluster add-node 192.168.1.201:7000 192.168.1.202:7002

Next, move 5000 slots onto the 192.168.1.201:7000 node:

redis-cli --cluster reshard 192.168.1.201:7000 \
          --cluster-from b43dbc87504861f09efcfbaa4df6394bd5eca1e5 \
          --cluster-to 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 \
          --cluster-slots 5000 \
          --cluster-yes

After resharding, verify that every master owns slots:

[root@node1 ~]# redis-cli --cluster check 192.168.1.201:7000
192.168.1.201:7000 (6b6c5653...) -> 0 keys | 5000 slots | 1 slaves.
192.168.1.202:7002 (c4c1ce62...) -> 0 keys | 5000 slots | 1 slaves.
192.168.1.203:7005 (b43dbc87...) -> 0 keys | 6384 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.201:7000)
M: 6b6c5653321d7bfaad1266b395f7647eac6aa4e8 192.168.1.201:7000
   slots:[0-4999] (5000 slots) master
   1 additional replica(s)
S: d8133074b6df13eb7dca0c9c56fef14f3f349564 192.168.1.202:7003
   slots: (0 slots) slave
   replicates b43dbc87504861f09efcfbaa4df6394bd5eca1e5
S: 477703acb78e9fd5b2e71e39e9de96e39b67ced5 192.168.1.201:7001
   slots: (0 slots) slave
   replicates 6b6c5653321d7bfaad1266b395f7647eac6aa4e8
M: c4c1ce621d4739888080075348bc3ae1c5ef8841 192.168.1.202:7002
   slots:[5000-9999] (5000 slots) master
   1 additional replica(s)
M: b43dbc87504861f09efcfbaa4df6394bd5eca1e5 192.168.1.203:7005
   slots:[10000-16383] (6384 slots) master
   1 additional replica(s)
S: 5635d84f4ad0ca122d8ef657b6716258b11977bb 192.168.1.203:7004
   slots: (0 slots) slave
   replicates c4c1ce621d4739888080075348bc3ae1c5ef8841
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Testing

Writing Data

[root@node1 ~]# redis-cli -c -h 192.168.1.201 -p 7000
192.168.1.201:7000> set hello world
OK

The key may map to a slot served by another node, in which case the client is redirected, for example:

192.168.1.201:7000> set test redis
-> Redirected to slot [6918] located at 192.168.1.202:7002
OK

Reading Data

192.168.1.202:7002> get hello
-> Redirected to slot [866] located at 192.168.1.201:7000
"world"

Verifying Availability

Shut down one of the cluster's masters. The cluster begins a failover: the failed master's replica is automatically promoted to the new master, and the cluster as a whole remains available.

The replica's log shows the process:

20493:S 10 Aug 2022 19:24:29.390 * MASTER <-> REPLICA sync started
20493:S 10 Aug 2022 19:24:29.390 # Error condition on socket for SYNC: Connection refused
20493:S 10 Aug 2022 19:24:29.618 * FAIL message received from 5635d84f4ad0ca122d8ef657b6716258b11977bb about 6b6c5653321d7bfaad1266b395f7647eac6aa4e8
20493:S 10 Aug 2022 19:24:29.618 # Cluster state changed: fail
20493:S 10 Aug 2022 19:24:29.696 # Start of election delayed for 981 milliseconds (rank #0, offset 754967).
20493:S 10 Aug 2022 19:24:30.414 * Connecting to MASTER 192.168.1.201:7000
20493:S 10 Aug 2022 19:24:30.414 * MASTER <-> REPLICA sync started
20493:S 10 Aug 2022 19:24:30.414 # Error condition on socket for SYNC: Connection refused
20493:S 10 Aug 2022 19:24:30.720 # Starting a failover election for epoch 9.
20493:S 10 Aug 2022 19:24:30.730 # Failover election won: I'm the new master.
20493:S 10 Aug 2022 19:24:30.730 # configEpoch set to 9 after successful failover
20493:M 10 Aug 2022 19:24:30.730 * Discarding previously cached master state.
20493:M 10 Aug 2022 19:24:30.730 # Setting secondary replication ID to b3c89f90f221747b4fe4e60073bc4c7b432fb5dd, valid up to offset: 754968. New replication ID is 58143b1e717a087933879fcd0c0d47d76903a761
20493:M 10 Aug 2022 19:24:30.733 # Cluster state changed: ok
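
The 981 ms delay in the log matches the election-delay formula from the Redis Cluster specification: a fixed 500 ms, plus a random 0-500 ms, plus 1000 ms per replica rank (the replica with the most up-to-date replication offset gets rank 0 and so tends to start its election first). A sketch:

```python
import random

def election_delay_ms(rank: int, rng: random.Random) -> int:
    """Failover election delay per the Redis Cluster spec:
    DELAY = 500 ms fixed + random 0-500 ms + rank * 1000 ms."""
    return 500 + rng.randrange(501) + rank * 1000

# The replica in the log above had rank #0, so its delay must fall in
# [500, 1000] ms -- consistent with the observed 981 ms.
print(500 <= election_delay_ms(0, random.Random()) <= 1000)  # True
```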

Once the failed node comes back, it rejoins the cluster as a replica of the new master:

192.168.1.201:7001> CLUSTER nodes
c4c1ce621d4739888080075348bc3ae1c5ef8841 192.168.1.202:7002@17002 master - 0 1660130971000 8 connected 5000-9999
6b6c5653321d7bfaad1266b395f7647eac6aa4e8 192.168.1.201:7000@17000 slave 477703acb78e9fd5b2e71e39e9de96e39b67ced5 0 1660130970585 9 connected
d8133074b6df13eb7dca0c9c56fef14f3f349564 192.168.1.202:7003@17003 slave b43dbc87504861f09efcfbaa4df6394bd5eca1e5 0 1660130971506 6 connected
477703acb78e9fd5b2e71e39e9de96e39b67ced5 192.168.1.201:7001@17001 myself,master - 0 1660130970000 9 connected 0-4999
b43dbc87504861f09efcfbaa4df6394bd5eca1e5 192.168.1.203:7005@17005 master - 0 1660130971000 6 connected 10000-16383
5635d84f4ad0ca122d8ef657b6716258b11977bb 192.168.1.203:7004@17004 slave c4c1ce621d4739888080075348bc3ae1c5ef8841 0 1660130971711 8 connected

References

Official documentation
