Redis Cluster

Early Redis distributed deployment approaches:

  1. Client-side partitioning: the client program decides which Redis node each key is written to, but the client itself has to handle write distribution, high availability, and failover.
  2. Proxy-based: a third-party Redis proxy sits between clients and Redis; clients connect to the proxy layer, which handles key distribution. This is fairly simple for the client, but adding or removing cluster nodes is relatively cumbersome, and the proxy itself is a single point of failure and a performance bottleneck.

The Sentinel mechanism solves Redis high availability: when the master fails, a slave is automatically promoted to master so the Redis service keeps running. It does not, however, solve the single-node write bottleneck, since the write performance of a single Redis instance is limited by its memory size, concurrency, NIC speed, and other factors. For this reason, Redis 3.0 introduced the decentralized Redis Cluster. In this leaderless cluster, every node stores its own data plus the state of the whole cluster, and every node is connected to every other node. Its characteristics are as follows:

1: All Redis nodes are interconnected and communicate using a PING mechanism.
2: A node is only considered truly failed when more than half of the nodes in the cluster detect the failure.
3: Clients connect to Redis directly without a proxy, but the application needs to be configured with the IPs of all Redis servers.
4: Redis Cluster maps all Redis nodes onto slots 0-16383; reads and writes must go to the specific Redis node that owns the slot, so however many Redis nodes there are is roughly how many times write capacity is scaled out.
5: Redis Cluster pre-allocates 16384 slots. When a key-value pair is written into the cluster, CRC16(key) mod 16384 determines which slot, and therefore which Redis node, the key is written to, which effectively removes the single-node bottleneck (see the example just after this list).
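A quick way to see this mapping in practice is the CLUSTER KEYSLOT command; a minimal sketch, assuming any cluster node is reachable at 172.222.2.107 with the password 123456:

# Ask any cluster node which of the 16384 slots a key hashes to;
# the slot number determines which master must hold the key.
redis-cli -a 123456 -h 172.222.2.107 cluster keyslot key2
(integer) 4998   # matches the MOVED 4998 redirection shown later in this article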

Redis Cluster basic architecture

假如三个主节点分别是:A, B, C 三个节点,采用哈希槽 (hash slot)的方式来分配16384个slot 的话,它们三个节点分别承担的slot 区间是

Node A covers 0-5460
Node B covers 5461-10922
Node C covers 10923-16383


Redis Cluster master/slave architecture

Although the Redis Cluster architecture solves the concurrency problem, it introduces a new one: how is high availability provided for each Redis master?


Deploying the cluster

Environment: for production, using six dedicated servers is recommended.

master:         172.222.2.107:6379/6380  172.222.2.117:6379/6380  172.222.2.127:6379/6380
slave:          172.222.2.10:6379/6380   172.222.2.11:6379/6380   172.222.2.12:6379/6380
reserved hosts: 172.222.2.13:6379/6380   172.222.2.14:6379/6380

Prerequisites for creating a Redis Cluster

1. Every Redis node uses the same hardware configuration, the same password, and the same Redis version.

2. Every node must enable the following parameters:
   cluster-enabled yes                   # must be enabled; once on, the redis process shows [cluster]
   cluster-config-file nodes-6380.conf   # created and maintained automatically by Redis Cluster; never edit it by hand

3. None of the Redis servers may contain any data.

4. Each node is first started as a standalone Redis instance with no keys at all (a quick check is sketched below).
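A minimal sanity check, assuming the shared password 123456, to run on each of the six nodes before creating the cluster:

# Must return (integer) 0 on every node, otherwise cluster creation will refuse to proceed
redis-cli -a 123456 dbsize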

Cluster configuration file on each server

bind 0.0.0.0
requirepass 123456
logfile "/apps/redis/log/redis.log"
dir "/apps/redis/data"
cluster-enabled yes                  #enable cluster mode
cluster-node-timeout 15000           #node timeout for cluster gossip
cluster-config-file nodes-6379.conf  #cluster state file

Verify the current Redis service status:

#ps -ef |grep redis
redis      3820      1  0 11:07 ?        00:00:00 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]

#ss -tnlp |grep redis
LISTEN     0      511          *:6379    #client port                  *:*                   users:(("redis-server",pid=3820,fd=6))
LISTEN     0      511          *:16379   #cluster bus port             *:*                   users:(("redis-server",pid=3820,fd=8))
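Port 16379 is the cluster bus port (client port + 10000), used only for node-to-node gossip; clients keep talking to 6379. A quick way to confirm that cluster mode is active in the running process, assuming the password 123456:

redis-cli -a 123456 info cluster
# Cluster
cluster_enabled:1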

Creating the cluster

Redis 3 and 4:
These versions need the cluster management tool redis-trib.rb. It is the official Redis cluster administration tool, shipped in the src directory of the Redis source tree, and it wraps the raw cluster commands into a simple, convenient utility. redis-trib.rb was written by the Redis author in Ruby, and the Ruby installed via yum on CentOS is too old for it, as shown below:

# yum install ruby rubygems -y
# find / -name redis-trib.rb
/usr/local/src/redis-4.0.14/src/redis-trib.rb
[root@s1 ~]# cp /usr/local/src/redis-4.0.14/src/redis-trib.rb /usr/bin/
[root@s1 src]# gem install redis
Fetching: redis-4.1.2.gem (100%)
ERROR: Error installing redis:
redis requires Ruby version >= 2.3.0.

Fixing the outdated Ruby version:

# Fix the outdated Ruby version:
# yum remove ruby rubygems -y
# wget https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.5.tar.gz
# tar xf ruby-2.5.5.tar.gz -C /apps/redis
# cd /apps/redis/ruby-2.5.5
# ./configure
# make -j 2
# make install
/apps/redis/ruby-2.5.5/bin]#./gem install redis
Fetching: redis-4.1.3.gem (100%)
Successfully installed redis-4.1.3
Parsing documentation for redis-4.1.3
Installing ri documentation for redis-4.1.3
Done installing documentation for redis after 1 seconds
1 gem installed

#If online installation is not possible, download the redis gem package and install it offline
#https://rubygems.org/gems/redis #download the redis gem package first
#gem install -l redis-3.3.0.gem #install the redis gem locally

If the gem install redis step fails with:

ERROR:  Loading command: install (LoadError)
    cannot load such file -- zlib
ERROR:  While executing gem ... (NoMethodError)
    undefined method `invoke_with_build_args' for nil:NilClass

Fix, step 1: install the missing libraries

yum -y install zlib-devel openssl-devel
---------------------------------------
cd /apps/redis/ruby-2.5.5/ext/zlib
-------------------------------------
ruby extconf.rb

Fix, step 2: edit the Makefile

vim Makefile   # the line to change is around line 290

... (omitted) ...
zlib.o: $(RUBY_EXTCONF_H)
zlib.o: $(arch_hdrdir)/ruby/config.h
zlib.o: $(hdrdir)/ruby/backward.h
zlib.o: $(hdrdir)/ruby/defines.h
zlib.o: $(hdrdir)/ruby/encoding.h
zlib.o: $(hdrdir)/ruby/intern.h
zlib.o: $(hdrdir)/ruby/io.h
zlib.o: $(hdrdir)/ruby/missing.h
zlib.o: $(hdrdir)/ruby/onigmo.h
zlib.o: $(hdrdir)/ruby/oniguruma.h
zlib.o: $(hdrdir)/ruby/ruby.h
zlib.o: $(hdrdir)/ruby/st.h
zlib.o: $(hdrdir)/ruby/subst.h
zlib.o: $(hdrdir)/ruby/thread.h
zlib.o: $(top_srcdir)/include/ruby.h   # change this line to: zlib.o: ../../include/ruby.h
zlib.o: zlib.c
... (omitted) ...

Recompile

#cd ../../
#make clean
#make && make install


# ./bin/gem install redis # reinstall the redis gem
Fetching: redis-4.1.3.gem (100%)
Successfully installed redis-4.1.3
Parsing documentation for redis-4.1.3
Installing ri documentation for redis-4.1.3
Done installing documentation for redis after 0 seconds
1 gem installed

Verify that the redis-trib.rb command can be executed:

The command lives in the Redis source tree at /root/redis-4.0.14/src/redis-trib.rb

ln -s /root/redis-4.0.14/src/redis-trib.rb /usr/bin/ # create a symlink

/apps/redis]#redis-trib.rb
Usage: redis-trib <command> <options> <arguments ...>
create host1:port1 ... hostN:portN #create a cluster
--replicas <arg> #number of replicas per master
check host:port #check cluster information
info host:port #show cluster host information
fix host:port #repair the cluster
--timeout <arg>
reshard host:port #online migration of slot data from a given host in the cluster
--from <arg>
--to <arg>
--slots <arg>
--yes
--timeout <arg>
--pipeline <arg>
rebalance host:port #balance the number of slots across the cluster's hosts
--weight <arg>
--auto-weights
--use-empty-masters
--timeout <arg>
--simulate
--pipeline <arg>
--threshold <arg>
add-node new_host:new_port existing_host:existing_port #add a host to the cluster
--slave
--master-id <arg>
del-node host:port node_id #remove a host from the cluster
set-timeout host:port milliseconds #set the node timeout
call host:port command arg arg .. arg #run a command on every node in the cluster
import host:port #import data from an external redis server into this cluster
--from <arg>
--copy
--replace
help (show this help)

Set the Redis login password used by redis-trib.rb

# vim /usr/local/lib/ruby/gems/2.5.0/gems/redis-4.1.3/lib/redis/client.rb
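What needs to change in that file is the default :password entry in the client defaults, so that redis-trib.rb can authenticate against nodes protected by requirepass; a sketch, assuming the password 123456 (the exact line differs between redis gem versions):

# in .../redis-4.1.3/lib/redis/client.rb, in the defaults hash:
#   before:  :password => nil,
#   after:   :password => "123456",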

With every cluster node running normally, create the Redis Cluster

#redis-trib.rb create --replicas 1 172.222.2.107:6379 172.222.2.117:6379 172.222.2.127:6379 172.222.2.10:6379 172.222.2.11:6379 172.222.2.12:6379

#Redis 3/4:
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
172.222.2.107:6379
172.222.2.117:6379
172.222.2.127:6379
Adding replica 172.222.2.11:6379 to 172.222.2.107:6379
Adding replica 172.222.2.12:6379 to 172.222.2.117:6379
Adding replica 172.222.2.10:6379 to 172.222.2.127:6379
M: f3c7690f8855d568ddb4800bf3ff9e3add81320f 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
M: 2db718e78013cae9d9742751d57a746c76981695 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
M: 37b3d5c303aba40acb819d452f155dd633b01377 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
S: 007f65ee64fe06543fa434ecd42b88a9d14e9247 172.222.2.10:6379
   replicates 37b3d5c303aba40acb819d452f155dd633b01377
S: 8fb2a00ad5cc8384f672cc29e5b99906cdd4afab 172.222.2.11:6379
   replicates f3c7690f8855d568ddb4800bf3ff9e3add81320f
S: 9ed92a3e95dd28039bad64a6a6a437b27028dbb2 172.222.2.12:6379
   replicates 2db718e78013cae9d9742751d57a746c76981695
Can I set the above configuration? (type 'yes' to accept): yes # confirm the proposed allocation
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join..
>>> Performing Cluster Check (using node 172.222.2.107:6379)
M: f3c7690f8855d568ddb4800bf3ff9e3add81320f 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 007f65ee64fe06543fa434ecd42b88a9d14e9247 172.222.2.10:6379
   slots: (0 slots) slave
   replicates 37b3d5c303aba40acb819d452f155dd633b01377
S: 8fb2a00ad5cc8384f672cc29e5b99906cdd4afab 172.222.2.138:6379
   slots: (0 slots) slave
   replicates f3c7690f8855d568ddb4800bf3ff9e3add81320f
M: 2db718e78013cae9d9742751d57a746c76981695 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 37b3d5c303aba40acb819d452f155dd633b01377 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 9ed92a3e95dd28039bad64a6a6a437b27028dbb2 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 2db718e78013cae9d9742751d57a746c76981695
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

If leftovers from earlier operations cause Redis cluster creation to fail, flush the data and reset the cluster:

# 127.0.0.1:6379> FLUSHALL
OK
# 127.0.0.1:6379> cluster reset
OK

# systemctl stop redis
# rm -rf /apps/redis/data/*
# systemctl start redis
# If that still does not work, stop redis on every host again and delete the node data files

Redis 5:

#redis-cli -a 123456 --cluster create 172.222.2.107:6379 172.222.2.117:6379 172.222.2.127:6379 172.222.2.10:6379 172.222.2.11:6379 172.222.2.12:6379 --cluster-replicas 1

Verify the Redis cluster state

Because masterauth was not set, master-slave replication has not been established even though the cluster itself is running. On each slave, set the masterauth password from the console with CONFIG SET, or put it in each redis configuration file; ideally set it on the console first and then persist it into the configuration file (a short sketch follows below).
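A minimal sketch of that fix, assuming the shared password 123456; CONFIG REWRITE persists the change into the node's redis.conf if the file is writable:

# run on every slave (or simply on every node, so future failovers also authenticate correctly)
redis-cli -a 123456 config set masterauth 123456
redis-cli -a 123456 config rewrite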

172.222.2.11:6379> info replication  # check on any slave node
# Replication
role:slave
master_host:172.222.2.107
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1581575827
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:7a80851b2c66738f3abba509b2598bcd0e3f07f4
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Verify master status

172.222.2.117:6379> info replication
# Replication
role:master
connected_slaves:0
master_replid:5ed4cbd89785d4635e35f8bcf632285677cf8246
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Verify cluster status

172.222.2.117:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:2
cluster_stats_messages_ping_sent:5495
cluster_stats_messages_pong_sent:4806
cluster_stats_messages_meet_sent:2
cluster_stats_messages_sent:10303
cluster_stats_messages_ping_received:4803
cluster_stats_messages_pong_received:5497
cluster_stats_messages_meet_received:3
cluster_stats_messages_received:10303

View the cluster's node relationships

172.222.2.107:6379> cluster  nodes
b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379@16379 master - 0 1581585431269 3 connected 10923-16383
34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379@16379 master - 0 1581585432000 2 connected 5461-10922
01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379@16379 slave b1ed14b586e20cde9e08da6bef39e0107c705ffa 0 1581585432275 4 connected
ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379@16379 myself,master - 0 1581585430000 1 connected 0-5460
c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379@16379 slave ad8963b59fa501819fd70b45776a0106ef6b2ec1 0 1581585431268 5 connected
41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379@16379 slave 34152ec02a70206be59f41133d23f2e6076dc056 0 1581585431570 6 connected


Make sure master-slave replication is correct on every host; an example of an error log:

# tail -f /apps/redis/logs/redis.log 
7625:S 15 Feb 00:02:11.807 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:02:12.809 * Connecting to MASTER 172.222.2.12:6379
7625:S 15 Feb 00:02:12.809 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:02:12.810 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:02:13.811 * Connecting to MASTER 172.222.2.12:6379
7625:S 15 Feb 00:02:13.811 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:02:13.812 # Error condition on socket for SYNC: Connection refused

Fix: temporarily add the master password at the session level

# /apps/redis/bin/redis-cli 
172.222.2.12:6379> auth 123456
OK
172.222.2.12:6379> CONFIG set masterauth 123456
OK

再次查看slave日志验证

# tail -f /apps/redis/logs/redis.log   # confirm data can sync normally
22536:S 15 Feb 00:01:54.368 * Connecting to MASTER 172.222.2.117:6379
22536:S 15 Feb 00:01:54.368 * MASTER <-> SLAVE sync started
22536:S 15 Feb 00:01:54.369 * Non blocking connect for SYNC fired the event.
22536:S 15 Feb 00:01:54.369 * Master replied to PING, replication can continue...
22536:S 15 Feb 00:01:54.369 * Partial resynchronization not possible (no cached master)
22536:S 15 Feb 00:01:54.369 * Full resync from master: dc8f1d6513d4247a2edb0f6cb62c0a2037a1536b:938
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: receiving 177 bytes from master
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: Flushing old data
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: Loading DB in memory
22536:S 15 Feb 00:01:54.460 * MASTER <-> SLAVE sync: Finished with success

Verify writing keys to the cluster

172.222.2.127:6379> set key2 value2 # the hash maps this key to a slot owned by a specific node
(error) MOVED 4998 172.222.2.107:6379  # the slot does not belong to this node, so the write is refused

172.222.2.117:6379> set key2 value2
(error) MOVED 4998 172.222.2.107:6379

172.222.2.107:6379> set key2 value2  # writing on the node that owns the slot succeeds
OK


172.222.2.107:6379> KEYS *
1) "key2"
172.222.2.117:6379> KEYS *
(empty list or set)
172.222.2.127:6379> KEYS *
(empty list or set)
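The MOVED replies above are normal for a plain client. Starting redis-cli with -c makes it follow redirections automatically, which is handy for ad-hoc testing; a sketch, assuming the same password:

# -c puts redis-cli in cluster mode so it follows MOVED/ASK redirects
redis-cli -c -a 123456 -h 172.222.2.117 set key2 value2
-> Redirected to slot [4998] located at 172.222.2.107:6379
OK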

Cluster state verification and monitoring

#redis-trib.rb check 172.222.2.107:6379
>>> Performing Cluster Check (using node 172.222.2.107:6379)
M: ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379
   slots: (0 slots) slave
   replicates b1ed14b586e20cde9e08da6bef39e0107c705ffa
S: c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379
   slots: (0 slots) slave
   replicates ad8963b59fa501819fd70b45776a0106ef6b2ec1
S: 41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 34152ec02a70206be59f41133d23f2e6076dc056
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
#redis-trib.rb info 172.222.2.107:6379
172.222.2.107:6379 (ad8963b5...) -> 0 keys | 5461 slots | 1 slaves.
172.222.2.127:6379 (b1ed14b5...) -> 0 keys | 5461 slots | 1 slaves.
172.222.2.117:6379 (34152ec0...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.

Redis 5:

redis-cli -a 123456 --cluster check 172.222.2.101:6379


Redis Cluster node maintenance

After a cluster has been running for a long time, hardware failures, network re-planning, or business growth will inevitably require adjusting it, for example adding Redis nodes, removing nodes, migrating nodes, or replacing servers.

Adding and removing nodes involves redistributing the existing slots and migrating data.

Cluster maintenance: dynamically adding nodes

To add master 172.222.2.13 and slave 172.222.2.14, they must run the same Redis version and configuration as the existing nodes. Then start both Redis nodes, without interrupting the running service or losing data.

add-node new_host:new_port existing_host:existing_port
The first argument is the IP and port of the new Redis node to add; the second is the IP:port of a master already in the cluster. A newly added node joins the cluster as a master by default, but it owns no slots, so the slots have to be redistributed.

Adding the master node

Redis 4
#redis-trib.rb add-node 172.222.2.13:6379 172.222.2.107:6379

Redis 5:

#redis-cli -a 123456 --cluster add-node 172.222.2.13:6379 172.222.2.107:6379
#redis-trib.rb add-node 172.222.2.13:6379 172.222.2.107:6379
>>> Adding node 172.222.2.13:6379 to cluster 172.222.2.107:6379
>>> Performing Cluster Check (using node 172.222.2.107:6379)
M: ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379
   slots: (0 slots) slave
   replicates b1ed14b586e20cde9e08da6bef39e0107c705ffa
S: c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379
   slots: (0 slots) slave
   replicates ad8963b59fa501819fd70b45776a0106ef6b2ec1
S: 41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 34152ec02a70206be59f41133d23f2e6076dc056
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.222.2.13:6379 to make it join the cluster.
[OK] New node added correctly.

Redistributing slots

After adding the host, the new master must be resharded into the cluster; without any slots assigned to it, no data can be written to it.

Verify the current state

#redis-trib.rb check 172.222.2.107:6379  # current state: the newly added master owns no slots
#redis-trib.rb reshard 172.222.2.107:6379 # reshard

Redis 5:
#redis-cli -a 123456 --cluster check 172.222.2.107:6379
# redis-cli -a 123456 --cluster reshard  172.222.2.107:6379


#Prompts seen when resharding slots to the newly added host:
How many slots do you want to move (from 1 to 16384)? 4096 # how many slots to move

What is the receiving node ID? 111b4b3d85558858e00966f1cd51af5d04523ddf # node ID of the server receiving the slots; enter the node ID of 172.222.2.13:6379 by hand

Source node #1: all # which source hosts give up slots for 172.222.2.13:6379; 'all' selects slots from every existing redis node automatically. When removing a host from the cluster, the same mechanism can move all of its slots to other redis hosts.


Do you want to proceed with the proposed reshard plan (yes/no)? yes # confirm the plan
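With Redis 5 the same reshard can also be scripted non-interactively; a sketch, assuming the node IDs shown above and the password 123456:

# move 4096 slots from all existing masters to the new node, answering the prompts up front
redis-cli -a 123456 --cluster reshard 172.222.2.107:6379 \
    --cluster-from all \
    --cluster-to 111b4b3d85558858e00966f1cd51af5d04523ddf \
    --cluster-slots 4096 --cluster-yes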

Confirming the slot migration

Moving slot 11493 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11494 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11495 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11496 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11497 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11498 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11499 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11500 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11501 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11502 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11503 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11504 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11505 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11506 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11507 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11508 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11509 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11510 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11511 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11512 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11513 from 172.222.2.127:6379 to 172.222.2.13:6379: 
Moving slot 11514 from 172.222.2.127:6379 to 172.222.2.13:6379: 

Verify the cluster state after resharding


Adding a slave for the new master

Another standalone Redis server needs to be added to the cluster to cover the risk of 172.222.2.13 going down as a single node.

Redis 3/4:
# redis-trib.rb add-node 172.222.2.14:6379 172.222.2.107:6379

Redis 5:
# redis-cli -a 123456 --cluster add-node 172.222.2.14:6379 172.222.2.107:6379


Changing the new node's role to slave

The new node must be manually assigned as the slave of a specific master, otherwise its default role is master.

# /apps/redis/bin/redis-cli  -h 172.222.2.14 -a 123456 # log in to the newly added node
172.222.2.14:6379> auth 123456
OK
172.222.2.14:6379> cluster nodes # list the current cluster nodes and find the ID of the target master
c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379@16379 slave ad8963b59fa501819fd70b45776a0106ef6b2ec1 0 1581589009656 1 connected
111b4b3d85558858e00966f1cd51af5d04523ddf 172.222.2.13:6379@16379 master - 0 1581589009154 7 connected 0-1364 5461-6826 10923-12287  #masterID 
ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379@16379 master - 0 1581589009556 1 connected 1365-5460
d160f06bc3ca9eea0c5e8ef7872b9a61b9983912 172.222.2.14:6379@16379 myself,master - 0 1581589009000 0 connected
b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379@16379 master - 0 1581589010160 3 connected 12288-16383
41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379@16379 slave 34152ec02a70206be59f41133d23f2e6076dc056 0 1581589009000 2 connected
01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379@16379 slave b1ed14b586e20cde9e08da6bef39e0107c705ffa 0 1581589009053 3 connected
34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379@16379 master - 0 1581589009556 2 connected 6827-10922
172.222.2.14:6379> CLUSTER REPLICATE 111b4b3d85558858e00966f1cd51af5d04523ddf  # make this node a slave of that master
OK
The command format is: CLUSTER REPLICATE <master-node-ID>

Verify the current cluster state and confirm the node has become a slave of the specified master

Confirm that every master has one slave


Cluster maintenance: dynamically removing nodes

When adding a node, the node is first joined to the cluster and then given slots. Removing a node works the other way around: first migrate the slots held by the Redis node being removed to the other Redis nodes in the cluster, and only then remove it. If a node's slots have not been fully migrated, attempting to delete it fails with a message that it still holds data.

Migrating the master's slots to other masters

The source Redis master being migrated away must be guaranteed to hold no data, otherwise the migration reports an error and is forcibly interrupted.

Redis 4:
# redis-trib.rb reshard 172.222.2.13:6379
# redis-trib.rb fix 172.222.2.14:6379 # if the migration fails, use this command to repair the cluster

Redis 5:
#redis-cli -a 123456 --cluster reshard 172.222.2.13:6379

Migrate the slots of 172.222.2.13 to 172.222.2.127

#redis-trib.rb reshard 172.222.2.13:6379
>>> Performing Cluster Check (using node 172.222.2.13:6379)
M: 111b4b3d85558858e00966f1cd51af5d04523ddf 172.222.2.13:6379
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: 01f29f0c1eb9d203c3affad66885a5e9b53f5cf6 172.222.2.10:6379
   slots: (0 slots) slave
   replicates b1ed14b586e20cde9e08da6bef39e0107c705ffa
S: d160f06bc3ca9eea0c5e8ef7872b9a61b9983912 172.222.2.14:6379
   slots: (0 slots) slave
   replicates 111b4b3d85558858e00966f1cd51af5d04523ddf
M: ad8963b59fa501819fd70b45776a0106ef6b2ec1 172.222.2.107:6379
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
M: b1ed14b586e20cde9e08da6bef39e0107c705ffa 172.222.2.127:6379
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: 34152ec02a70206be59f41133d23f2e6076dc056 172.222.2.117:6379
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 41ddaa8d7903bd8c9e18640d991486baac31e74b 172.222.2.12:6379
   slots: (0 slots) slave
   replicates 34152ec02a70206be59f41133d23f2e6076dc056
S: c43cdbc5afacb9ff21c1b48f03e1f9cbad9cfdb0 172.222.2.11:6379
   slots: (0 slots) slave
   replicates ad8963b59fa501819fd70b45776a0106ef6b2ec1
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096 # how many slots to move off the master
What is the receiving node ID? b1ed14b586e20cde9e08da6bef39e0107c705ffa # node ID of 172.222.2.127, the server receiving the slots
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:111b4b3d85558858e00966f1cd51af5d04523ddf # which server the 4096 slots are taken from: the node ID of 172.222.2.13
Source node #2:done # type done to indicate there are no other source masters
  Moving slot 12286 from 111b4b3d85558858e00966f1cd51af5d04523ddf
  Moving slot 12287 from 111b4b3d85558858e00966f1cd51af5d04523ddf
Do you want to proceed with the proposed reshard plan (yes/no)? yes # proceed


Migration complete.

Verify that the slot migration is complete


Removing the server from the cluster

Although the slots have been migrated, the server's IP information is still registered in the cluster, so it must also be removed from the cluster.

Deleting the master

Redis 3/4:
#redis-trib.rb del-node 172.222.2.13:6379 111b4b3d85558858e00966f1cd51af5d04523ddf
>>> Removing node 111b4b3d85558858e00966f1cd51af5d04523ddf from cluster 172.222.2.13:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

Redis 5:
# redis-cli -a 123456 --cluster del-node 172.222.2.13:6379 ID

Verify that the master node has been removed

#redis-trib.rb info 172.222.2.107:6379
172.222.2.107:6379 (ad8963b5...) -> 1 keys | 4096 slots | 1 slaves.
172.222.2.127:6379 (b1ed14b5...) -> 0 keys | 8192 slots | 2 slaves.
172.222.2.117:6379 (34152ec0...) -> 1 keys | 4096 slots | 1 slaves.
[OK] 2 keys in 3 masters.
0.00 keys per slot on average.

#After a master is deleted, its former slave automatically becomes the slave of another master in the cluster; that node can be removed as well if it is no longer needed (a sketch follows below).
#Confirm that every master in the Redis cluster has at least one slave; more are allowed, but at least one is required for data backup and service high availability.
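If the former slave 172.222.2.14 is also being retired, it can be removed the same way; a sketch using the node ID seen earlier in the cluster nodes output:

Redis 3/4:
# redis-trib.rb del-node 172.222.2.14:6379 d160f06bc3ca9eea0c5e8ef7872b9a61b9983912
Redis 5:
# redis-cli -a 123456 --cluster del-node 172.222.2.14:6379 d160f06bc3ca9eea0c5e8ef7872b9a61b9983912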

Verify the cluster's master/slave pairing:
A Redis slave must never sit on the same server as its master; replication has to be cross-host, so that a single host failure does not take out both the master and its backup. If a Redis slave and its Redis master do end up on the same Redis node, repeat the slave-assignment steps above until no pair backs itself up on the same host (a rough check is sketched below).
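A rough command-line check of the pairing; a sketch, assuming any cluster node at 172.222.2.107 and the password 123456, which simply prints each slave next to the address of its master so same-host pairs stand out:

redis-cli -a 123456 -h 172.222.2.107 cluster nodes | awk '
  $3 ~ /master/ {addr[$1]=$2}
  $3 ~ /slave/  {rep[$2]=$4}
  END {for (s in rep) print s, "replicates", addr[rep[s]]}'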


Cluster maintenance: simulating a master failure

The current architecture is three masters and three slaves in a cross-host master/slave layout; test whether a master failure is automatically switched over to its slave.

Test data writes

  • Write test data to a master and verify it on the corresponding slave
//Master 172.222.2.117
[root@centos7 ~]# redis-cli -h 172.222.2.117 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.117:6379> SET key1 value1
OK
172.222.2.117:6379> KEYS *
1) "key1"

Verify the data on the slave

//Slave 172.222.2.12
[root@centos7 ~]# redis-cli -h 172.222.2.12 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.12:6379> KEYS *
1) "key1"
172.222.2.12:6379> GET key1
(error) MOVED 9189 172.222.2.117:6379

Stop the master and verify failover

After the Redis master service is stopped, its slave is elected as the new master and continues to handle reads and writes.

//Master 172.222.2.117
systemctl stop redis.service

Check the slave log

tail -f /apps/redis/logs/redis_6379.log # failover needs a few seconds to complete

7625:S 15 Feb 00:17:14.624 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:17:14.624 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:17:15.325 # Start of election delayed for 530 milliseconds (rank #0, offset 1066).
7625:S 15 Feb 00:17:15.426 # Currently unable to failover: Waiting the delay before I can start a new failover.
7625:S 15 Feb 00:17:15.626 * Connecting to MASTER 172.222.0.117:6379
7625:S 15 Feb 00:17:15.626 * MASTER <-> SLAVE sync started
7625:S 15 Feb 00:17:15.626 # Error condition on socket for SYNC: Connection refused
7625:S 15 Feb 00:17:15.927 # Starting a failover election for epoch 11.
7625:S 15 Feb 00:17:15.935 # Currently unable to failover: Waiting for votes, but majority still not reached.
7625:S 15 Feb 00:17:15.940 # Failover election won: I'm the new master. # became the new master
7625:S 15 Feb 00:17:15.940 # configEpoch set to 11 after successful failover
7625:M 15 Feb 00:17:15.940 # Setting secondary replication ID to 7f64150decace43211d248dd093cadc7e9a1ec49, valid up to offset: 1067. New replication ID is c063d71c97da69434dc56472af7a7d389d5dd00e
7625:M 15 Feb 00:17:15.941 * Discarding previously cached master state.
7625:M 15 Feb 00:17:15.941 # Cluster state changed: ok  # failover succeeded

Note

A Redis cluster requires at least 3 masters and 3 slaves, and when a master goes down, the cluster is marked as failed during the switchover window (while the slave is promoted to master), so data cannot be read or written:

30659:S 14 Feb 19:18:22.242 * FAIL message received from  6a07dfc8eeecd8b1b378b4beec3f17536bba77d3 about 774217998cb20e0e2092a2db4279faef49bdacbc
30659:S 14 Feb 19:18:22.242 # Cluster state changed: fail # cluster marked as failed

Verify data reads and writes

Confirm that after slave 172.222.2.12:6379 is promoted to master it keeps serving reads and writes for the application and no data has been lost.

Note: after service is restored, re-verify replication on each master; when the original master node is restarted, it automatically rejoins the cluster with the role of slave.

//Master 172.222.2.12
[root@centos7 ~]# redis-cli -h 172.222.2.12 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.12:6379> KEYS *
1) "key1"
172.222.2.12:6379> GET key1 # in cluster mode a plain slave cannot serve GET, so this succeeding shows the promotion worked
"value1"

Cluster maintenance: importing existing Redis data

The Redis cluster must not already contain keys with the same names as the data being imported, otherwise the import fails or is interrupted.

Preparing the environment

Before importing, the password must be removed on every Redis server involved, both the cluster nodes and the source Redis server, to avoid authentication mismatches that would block the import; the --cluster-replace option can be added to force-overwrite keys that already exist in the Redis cluster.

Disable password authentication on each Redis server

# redis-cli -h master1 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.117:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h master2 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.127:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h master3 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.107:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h slave1 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.110:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h slave2 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.11:6379> CONFIG SET requirepass ""
OK
----------------------------------------------
# redis-cli -h slave3 -p 6379 -a 123456
Warning: Using a password with '-a' option on the command line interface may not be safe.
172.222.2.12:6379> CONFIG SET requirepass ""
OK
----------------------------------------------

Run the data import

Import the data from the source Redis server directly into the Redis cluster.

Redis 4:
#redis-trib.rb import --from <external-redis-IP>:6379 --replace <any-cluster-node-IP>:6379

Error 1:
# redis-trib.rb import --from 172.222.2.99:6379 --replace 172.222.2.117:6379
>>> Importing data from 172.222.2.99:6379 to cluster  # cannot connect to the local node
[ERR] Sorry, can't connect to node 172.222.2.117:6379

#Fix:
# vim /usr/local/lib/ruby/gems/2.5.0/gems/redis-4.1.3/lib/redis/client.rb
:password => nil, # set back to nil

Error 2:
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Connecting to the source Redis instance
*** Importing 4 keys from DB 0
Migrating mem3 to 172.17.0.3:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Migrating mem1 to 172.17.0.6:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Migrating mem2 to 172.17.0.3:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Migrating mem to 172.17.0.6:6379: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)

#This is caused by the version of the ruby redis gem. Redis 5.0 drops redis-trib.rb and manages the cluster directly with the redis-cli client; until then, the workaround described here can be used.

See https://blog.csdn.net/m0_37128231/article/details/80755478 for details

Redis 5:
#redis-cli --cluster import <cluster-node-IP>:<PORT> --cluster-from <external-redis-node-IP>:<PORT> --cluster-copy --cluster-replace
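A concrete, hedged example using the addresses from this environment (172.222.2.99 as the external source, 172.222.2.117 as a cluster node); --cluster-copy keeps the keys on the source, --cluster-replace overwrites keys that already exist in the cluster:

redis-cli --cluster import 172.222.2.117:6379 \
    --cluster-from 172.222.2.99:6379 \
    --cluster-copy --cluster-replace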

Other Redis cluster solutions

Besides the official Redis Cluster, there are some open-source cluster solutions worth considering.

Codis

Codis is a distributed Redis solution. For the application on top, connecting to the Codis Proxy is essentially no different from connecting to a native Redis server (apart from a list of unsupported commands); the application can use it just like standalone Redis. Underneath, Codis handles request forwarding, live data migration without downtime, and everything else, all transparently to the client, which can simply treat the backend as a Redis service with effectively unlimited memory.

codis-proxy behaves exactly like Redis: connecting to codis-proxy and connecting to Redis are indistinguishable. codis-proxy is stateless and does not itself record where data lives; that metadata is stored in ZooKeeper, so codis-proxy looks up the location of a key in ZooKeeper and forwards the request to a group for processing. A group consists of one master and one or more slaves. Codis uses 1024 slots by default (Redis Cluster uses 16384) and places different slots in different groups.

GitHub: https://github.com/CodisLabs/codis/


twemproxy

Twemproxy is a two-way proxy that shards on behalf of the client, i.e. it splits the data across the backend servers for reads and writes instead of the application doing so. It also supports memcached and lets the sharding algorithm be configured per proxy; its drawbacks are that twemproxy itself becomes a bottleneck and that it does not support data migration.

GitHub: https://github.com/twitter/twemproxy

