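A Simple Redis Cluster Setup
Background
A Redis Cluster needs at least 3 masters to form a highly available, robust distributed cluster, and each master should have at least one replica (slave).
In production, build the cluster on 6 machines if possible. Use no fewer than 3 machines, and make sure no master shares a machine with its own replica.
Only three virtual machines were available here, so this setup runs 6 Redis instances on 3 machines (two per machine).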
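Key Redis Cluster Configuration Options
cluster-enabled: whether to enable Redis Cluster mode.
cluster-config-file: the file in which an instance running in cluster mode saves the cluster state. This includes information about the other nodes, such as nodes going online or offline and failovers. Redis maintains this file itself; it is not meant to be edited by hand.
cluster-node-timeout: the node liveness timeout, in milliseconds. A node that stays unreachable for longer than this is considered down (a master going down triggers a failover to its replica; a replica going down simply stops serving).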
Writing the Configuration Files
- Create 3 directories to hold the cluster state, the Redis logs and the Redis data
mkdir -p /etc/redis-cluster
mkdir -p /var/log/redis
mkdir -p /var/redis/7001
- Edit the configuration file
port 7001
cluster-enabled yes
cluster-config-file /etc/redis-cluster/node-7001.conf
cluster-node-timeout 15000
daemonize yes
pidfile /var/run/redis_7001.pid
dir /var/redis/7001
logfile /var/log/redis/7001.log
bind 10.0.0.4
appendonly yes
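Configure all 6 Redis instances this way, using ports 7001, 7002, 7003, 7004, 7005 and 7006. In each file, change the port-specific values (port, cluster-config-file, pidfile, dir, logfile) and set bind to the IP of the machine the instance runs on. Place the finished files in /etc/redis as 7001.conf, 7002.conf, 7003.conf, 7004.conf, 7005.conf and 7006.conf.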
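- Modify the startup scripts
In /etc/init.d, set up 6 startup scripts: redis_7001, redis_7002, redis_7003, redis_7004, redis_7005 and redis_7006. In each script, only the port number needs to change.
- Remove any replicaof setting (it is not allowed in cluster mode), then start the 6 Redis instances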
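The six per-instance files differ only in port and bind address, so they can be generated instead of edited by hand. The following is a minimal sketch, not the method used above: `write_cluster_conf` is a hypothetical helper name, and the directory layout is assumed to match the 7001 example.

```shell
# Hypothetical helper: write one instance config, mirroring the 7001 example.
write_cluster_conf() {
  port="$1"      # instance port, e.g. 7001
  bind_ip="$2"   # IP of the machine this instance runs on
  out_dir="$3"   # directory for the generated file, e.g. /etc/redis
  cat > "$out_dir/$port.conf" <<EOF
port $port
cluster-enabled yes
cluster-config-file /etc/redis-cluster/node-$port.conf
cluster-node-timeout 15000
daemonize yes
pidfile /var/run/redis_$port.pid
dir /var/redis/$port
logfile /var/log/redis/$port.log
bind $bind_ip
appendonly yes
EOF
}

# Example for host 10.0.0.4, which runs ports 7001 and 7002:
# mkdir -p /etc/redis /var/redis/7001 /var/redis/7002
# for port in 7001 7002; do write_cluster_conf "$port" 10.0.0.4 /etc/redis; done
```

Each instance needs its own data directory, so the `dir` path for every port has to exist before the instance starts.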
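Creating the Cluster
Before Redis 5.0, clusters were created with the redis-trib.rb script, which required installing Ruby and was cumbersome. From Redis 5.0 onward, redis-cli can create the cluster directly: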
redis-cli --cluster create 10.0.0.4:7001 10.0.0.4:7002 10.0.0.5:7003 \
  10.0.0.5:7004 10.0.0.6:7005 10.0.0.6:7006 --cluster-replicas 1
Creating the cluster produces output like this:
[root@mq1 init.d]# redis-cli --cluster create 10.0.0.4:7001 10.0.0.4:7002 10.0.0.5:7003 10.0.0.5:7004 10.0.0.6:7005 10.0.0.6:7006 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.5:7004 to 10.0.0.4:7001
Adding replica 10.0.0.6:7006 to 10.0.0.5:7003
Adding replica 10.0.0.4:7002 to 10.0.0.6:7005
M: 0cc51aa5e4e6827ddeb4a9ad1ad006d5ca844aab 10.0.0.4:7001
slots:[0-5460] (5461 slots) master
S: b0724296eb302b52a50001248576fcb7ddacdcec 10.0.0.4:7002
replicates 0812d9d0024e653b1e1914d4cf4594fe9218570d
M: 26834402434481c56e854e097ab693d84e1c87ca 10.0.0.5:7003
slots:[5461-10922] (5462 slots) master
S: f6b0b5c44000126d451924a947233b851cd80fae 10.0.0.5:7004
replicates 0cc51aa5e4e6827ddeb4a9ad1ad006d5ca844aab
M: 0812d9d0024e653b1e1914d4cf4594fe9218570d 10.0.0.6:7005
slots:[10923-16383] (5461 slots) master
S: 1a7c9de899e0768727c5f95ec1bc2b83883bfa35 10.0.0.6:7006
replicates 26834402434481c56e854e097ab693d84e1c87ca
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.4:7001)
M: 0cc51aa5e4e6827ddeb4a9ad1ad006d5ca844aab 10.0.0.4:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 0812d9d0024e653b1e1914d4cf4594fe9218570d 10.0.0.6:7005
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 1a7c9de899e0768727c5f95ec1bc2b83883bfa35 10.0.0.6:7006
slots: (0 slots) slave
replicates 26834402434481c56e854e097ab693d84e1c87ca
M: 26834402434481c56e854e097ab693d84e1c87ca 10.0.0.5:7003
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: b0724296eb302b52a50001248576fcb7ddacdcec 10.0.0.4:7002
slots: (0 slots) slave
replicates 0812d9d0024e653b1e1914d4cf4594fe9218570d
S: f6b0b5c44000126d451924a947233b851cd80fae 10.0.0.5:7004
slots: (0 slots) slave
replicates 0cc51aa5e4e6827ddeb4a9ad1ad006d5ca844aab
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The output shows that the 3 masters were assigned slots 0 - 5460, 5461 - 10922 and 10923 - 16383:
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
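These ranges are what you get when the 16384 slots are divided as evenly as possible among the masters. A minimal sketch that reproduces them follows. `slot_ranges` is a hypothetical helper, and its rounding rule is an assumption chosen to match the output above, not code taken from redis-cli.

```shell
# Hypothetical helper: print an even split of the 16384 hash slots.
slot_ranges() {
  masters="$1"   # number of master nodes
  awk -v n="$masters" 'BEGIN {
    total = 16384
    per = total / n          # fractional slots per master
    first = 0
    cursor = 0.0
    for (i = 0; i < n; i++) {
      # round the running boundary to the nearest whole slot (assumed rule)
      last = int(cursor + per - 1 + 0.5)
      # the last master always takes everything up to slot 16383
      if (last > total - 1 || i == n - 1) last = total - 1
      printf "Master[%d] -> Slots %d - %d\n", i, first, last
      first = last + 1
      cursor += per
    }
  }'
}

slot_ranges 3
```

With 3 masters, this prints the same three lines as the redis-cli output above. The middle master gets 5462 slots and the other two get 5461 each.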
10.0.0.5:7004 is the replica of 10.0.0.4:7001, 10.0.0.6:7006 is the replica of 10.0.0.5:7003, and 10.0.0.4:7002 is the replica of 10.0.0.6:7005:
Adding replica 10.0.0.5:7004 to 10.0.0.4:7001
Adding replica 10.0.0.6:7006 to 10.0.0.5:7003
Adding replica 10.0.0.4:7002 to 10.0.0.6:7005
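Every replica here lives on a different machine from its master, which is the placement the background section asks for. A minimal sketch for checking this automatically follows. `check_replica_placement` is a hypothetical helper that reads saved "Adding replica" lines from standard input, and it assumes the line format shown above.

```shell
# Hypothetical helper: fail if any replica shares a host with its master.
# Reads "Adding replica <ip:port> to <ip:port>" lines on stdin.
check_replica_placement() {
  awk 'BEGIN { bad = 0 }
  /^Adding replica/ {
    split($3, r, ":")   # replica address -> r[1] is the IP
    split($5, m, ":")   # master address  -> m[1] is the IP
    if (r[1] == m[1]) {
      print "same host: " $3 " -> " $5
      bad = 1
    }
  }
  END { exit bad }'
}

# Usage, assuming the create output was saved to a file named create.log:
# check_replica_placement < create.log && echo "placement OK"
```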
Pitfalls
- Unrecoverable error: corrupted cluster config file
Remove the failing node from the cluster, delete its cluster-config-file, restart the node, and add it back to the cluster.
- [ERR] Node 10.0.0.4:7001 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
Delete the AOF and RDB files and their backups on that node, delete its cluster-config-file, then log in with redis-cli -h <IP address> -p <port> and run flushdb to empty the database.
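The file cleanup for a "not empty" node can be scripted. The following is a minimal sketch under two assumptions: the instance is stopped first, and the paths follow the configuration shown earlier. `clean_node_files` is a hypothetical helper name.

```shell
# Hypothetical helper: remove persisted data and cluster state
# for one stopped instance.
clean_node_files() {
  data_dir="$1"     # the instance's "dir", e.g. /var/redis/7001
  nodes_conf="$2"   # its cluster-config-file, e.g. /etc/redis-cluster/node-7001.conf
  # RDB snapshots and AOF files written directly into the data directory
  rm -f "$data_dir"/*.rdb "$data_dir"/*.aof
  # Redis 7 keeps its AOF files in a subdirectory (appendonlydir by default)
  rm -rf "$data_dir/appendonlydir"
  # Cluster state saved by the node itself
  rm -f "$nodes_conf"
}

# Example for the 7001 instance on 10.0.0.4:
# clean_node_files /var/redis/7001 /etc/redis-cluster/node-7001.conf
# After restarting the instance, empty the database:
# redis-cli -h 10.0.0.4 -p 7001 flushdb
```

Only files matching `*.rdb` and `*.aof` are removed, so backups kept under other names or in other directories still need to be deleted by hand.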