Redis (3): Sentinel and Cluster

Latest recommended article published on 2024-07-02 10:10:43

湫鹤椿水

Tags: redis, database, cache

Article link: https://blog.csdn.net/m0_71774042/article/details/133131362

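Recap: Redis replication workflow and drawbacks

(1) Workflow

The master starts first, then the replicas. Each replica sends a sync request (PSYNC in current Redis versions) to the master named in its configuration file, and the master accepts the connection.
Once connected, the master produces an RDB snapshot in the background and buffers the write commands that arrive in the meantime. The replica loads the RDB snapshot first (full synchronization) and then applies the buffered write commands (incremental synchronization). The incremental part is a stream of commands, not an AOF file.
To keep data flowing, the master must know whether each replica is still online. It sends a heartbeat at a fixed interval (repl-ping-replica-period) and waits for the replica to respond; a replica that responds is still connected.
Meanwhile, new writes keep reaching the replicas as an incremental command stream.
When a replica reconnects after losing the link, the master uses the offset tracked with its replication backlog (an in-memory buffer that records how far replication has progressed) to send only the data after that offset. The replica keeps its own offset as well, and the resync compares the two.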
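The reconnect case can be pictured with a small model. This is a simplified sketch, not Redis source code: the function name can_partial_resync is hypothetical, its parameters mirror the repl_backlog_first_byte_offset and repl_backlog_histlen fields of INFO replication, and the replication ID comparison that a real server also performs is left out.

```python
def can_partial_resync(replica_offset: int,
                       backlog_first_byte_offset: int,
                       backlog_histlen: int) -> bool:
    """Return True when the backlog still covers the next byte the replica needs.

    Simplified model: a real server also checks that the replication IDs match.
    """
    next_needed = replica_offset + 1
    # The backlog holds bytes [first, first + histlen - 1]; asking for
    # first + histlen simply means "nothing missed yet".
    backlog_end = backlog_first_byte_offset + backlog_histlen
    return backlog_first_byte_offset <= next_needed <= backlog_end


# Backlog values taken from the INFO replication output of the promoted master later in this post.
print(can_partial_resync(1320388, 286476, 1064793))  # offset still inside the backlog
print(can_partial_resync(100000, 286476, 1064793))   # offset already overwritten
```

When the function returns False, the only option left in this model is a full resync with a fresh RDB snapshot.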

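(2) Drawbacks

Replication from master to replicas depends heavily on network conditions. With many replicas, replication lag grows because the master has to push every write to each of them.
When the master goes down, no replica takes over automatically. The deployment can then serve reads but not writes, so availability hinges on the master staying up.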

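I. Redis Sentinel (when the master goes down, a replica is promoted automatically)

(1) What is Redis Sentinel?

A Sentinel is a Redis process running in a special mode. It does not serve ordinary reads and writes and has its own configuration file. Its job is to monitor the health of a master and its replicas. Once the master is judged to be down (objectively down, decided by a vote among the Sentinels), a replica chosen by the Sentinel that leads the failover is promoted to master so the service keeps running.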

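(2) Main responsibilities of Sentinel

Monitoring: check that the master and replicas are running normally.
Notification: report the result of a failover to clients.
Automatic failover: promote a replica when the master fails.
Configuration provider: clients can connect to a Sentinel to obtain the address of the current master.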

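(3) Why Sentinel relies on voting (quorum)

The quorum is the number of Sentinels that must agree the master is unreachable before it is treated as down. A single Sentinel's judgment is unreliable: Sentinels and Redis servers usually run on different machines, so network congestion or jitter can make one Sentinel wrongly conclude that the master is down and trigger a failover. With several Sentinels forming a group, the master is considered down only when the number of Sentinels reporting it down reaches the configured quorum, which greatly reduces false positives and improves availability.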
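The quorum rule boils down to a count. The sketch below is illustrative only: is_objectively_down and the Sentinel names are hypothetical, and it models just the "enough Sentinels agree" check, not the leader election that follows.

```python
from typing import Dict


def is_objectively_down(sdown_reports: Dict[str, bool], quorum: int) -> bool:
    """True when at least `quorum` Sentinels report the master as down."""
    return sum(1 for down in sdown_reports.values() if down) >= quorum


# Three Sentinels with quorum 2, as in the setup used in this post.
one_alarm = {"sentinel-26379": True, "sentinel-26380": False, "sentinel-26390": False}
two_alarms = {"sentinel-26379": True, "sentinel-26380": True, "sentinel-26390": False}

print(is_objectively_down(one_alarm, quorum=2))   # a single report is not enough
print(is_objectively_down(two_alarms, quorum=2))  # two reports reach the quorum
```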

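(4) Sentinel configuration directives

sentinel down-after-milliseconds <master-name> <milliseconds>	How many milliseconds a monitored instance such as the master may go without a valid reply before this Sentinel marks it as subjectively down (sdown)
sentinel parallel-syncs <master-name> <nums>	How many replicas may resynchronize with the new master at the same time after a failover
sentinel failover-timeout <master-name> <milliseconds>	Failover timeout; a failover that takes longer than this many milliseconds is considered failed
sentinel notification-script <master-name> <script-path>	Script to run when a notable event occurs
sentinel client-reconfig-script <master-name> <script-path>	Script to run after a failover so clients can be reconfigured with the new master address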

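(5) Configuring and starting Sentinel

A. Copy sentinel.conf into the myredis directory as mysentinel.conf, keeping the original as a backup

cp sentinel.conf myredis/mysentinel.conf

B. Change the following settings in the configuration file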

# set the port according to what is actually free on your machine
port 26379
protected-mode no
daemonize yes
pidfile /var/run/redis-sentinel26379.pid
logfile "/opt/redis-7.0.11/myredis/mysentinellog26379.log"
sentinel monitor mymaster 47.94.143.83 6379 2
# address and port of the master to monitor; the last number is the quorum
sentinel auth-pass mymaster chen13515....
# the rest of the password is omitted here

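C. Update the master and replica configuration

Configuration for the master on 6379

masterauth chen135...
# password used when connecting to a master

Why would the master itself need a password for connecting to a master? When it goes down, Sentinel promotes a new master, and when the old master comes back, Sentinel makes it a replica, so it needs masterauth to authenticate against the new master. The value must match the password of whichever node may become master, so every node should use the same password. Otherwise the old master cannot connect to the new master after rejoining (master down).

D. Set the same password (chen135...) on the 6380 and 6390 instances

E. Configure two more Sentinels the same way to form a Sentinel group

Two equivalent ways to start Sentinel

# Option 1
redis-server <path-to-sentinel-config> --sentinel
# Option 2
redis-sentinel <path-to-sentinel-config>

With the three Redis servers and three Sentinels running, ps -ef | grep redis lists all six processes.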

F. Details after Sentinel starts

After startup, each Sentinel appends new entries to its own configuration file, including its own ID:

latency-tracking-info-percentiles 50 99 99.9
user default on nopass ~* &* +@all
sentinel myid 1caba81c6d050632b8670f1916045c0678726b81
# this Sentinel's own ID within the Sentinel group
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel current-epoch 0

sentinel known-replica mymaster 47.94.143.83 6390

sentinel known-replica mymaster 47.94.143.83 6380
# the two lines above record the address and port of each replica of the monitored master
sentinel known-sentinel mymaster 172.25.51.63 26390 12687b9cc2cf427ac181b67b8dd84bb86270db67

sentinel known-sentinel mymaster 172.25.51.63 26380 271de4fd809d041133ccd2c0e2182421fca1e275
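# the lines above record the other Sentinels this one knows about

When a monitored master goes down and a replica is promoted, the configuration files of the affected Redis servers are rewritten as well.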

(6) Simulating a master failure and replica promotion

A. Take the master offline

[root@quihechunshui ~]# redis-cli -a chen***** -p 6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> shutdown
(0.51s)  # with Sentinel configured, shutdown took noticeably longer than before
not connected> quit

B. Check whether a replica has become master

# querying 6380 shows role:master, so the promotion succeeded
127.0.0.1:6380> info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:4087c620bca2c222d39ea79ba416a2c4b84e6ccd
master_replid2:1812c5b091725863fb876e0b6245dea75a5e9d82
master_repl_offset:1351268
second_repl_offset:1320388
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:286476
repl_backlog_histlen:1064793
127.0.0.1:6380>

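C. Restart the original master and check its role

127.0.0.1:6379> info replication # the first command after logging back in fails while the server is being reconfigured
Error: Server closed the connection
not connected> info replication # running it again shows the node is now a replica of the new master (6380)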
# Replication
role:slave
master_host:47.94.143.83
master_port:6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:1431661
slave_repl_offset:1431661
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:1
slave0:ip=47.94.143.83,port=6390,state=online,offset=1431383,lag=0
master_failover_state:no-failover
master_replid:4087c620bca2c222d39ea79ba416a2c4b84e6ccd
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1431661
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1429262
repl_backlog_histlen:2400

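D. Configuration file changes on the new and old master

The new master has its replicaof line removed, since it is now the master.
The old master gains the following lines:

# address of the master to replicate from
replicaof 47.94.143.83 6380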
latency-tracking-info-percentiles 50 99 99.9
user default on #b5d13ce8e67a49f69839d85f82e428a037999268a26e9783d171f46f1266f6ef ~* &* +@all

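E. Two errors a client may see after a failover, and how to handle them

Broken pipe: usually appears when the master goes offline while a client is in the middle of a command, so the client ends up writing to a connection the other side has already closed. Fix: run the command again.
Server closed the connection: after a replica is promoted or the old master rejoins, the replication topology changes and the existing connection is no longer valid, so the server closes it. Fix: run the command again.

F. Reading the Sentinel log: down detection and failover (using 26379 as the example)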

Sentinel ID is 1caba81c6d050632b8670f1916045c0678726b81
1884181:X 22 Sep 2023 11:22:08.122 # +monitor master mymaster 47.94.143.83 6379 quorum 2
# the master being monitored and the quorum
1884181:X 22 Sep 2023 11:22:38.181 # +sdown slave 47.94.143.83:6390 47.94.143.83 6390 @ mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:45.661 # +sdown master mymaster 47.94.143.83 6379
# this Sentinel marks the monitored master as subjectively down
1884181:X 22 Sep 2023 12:25:45.720 # +odown master mymaster 47.94.143.83 6379 #quorum 3/2
# enough Sentinels agree (3 reports against a quorum of 2), so the master is objectively down
1884181:X 22 Sep 2023 12:25:45.720 # +new-epoch 1
# a new epoch starts and the Sentinels begin electing a leader
1884181:X 22 Sep 2023 12:25:45.720 # +try-failover master mymaster 47.94.143.83 6379
# attempt the failover
1884181:X 22 Sep 2023 12:25:45.724 * Sentinel new configuration saved on disk
1884181:X 22 Sep 2023 12:25:45.724 # +vote-for-leader 1caba81c6d050632b8670f1916045c0678726b81 1
# this Sentinel votes for the Sentinel whose ID ends in 6b81 (itself) as leader
1884181:X 22 Sep 2023 12:25:45.733 # 271de4fd809d041133ccd2c0e2182421fca1e275 voted for 1caba81c6d050632b8670f1916045c0678726b81 1
# the Sentinel whose ID ends in e275 votes for the one ending in 6b81
1884181:X 22 Sep 2023 12:25:45.733 # 12687b9cc2cf427ac181b67b8dd84bb86270db67 voted for 1caba81c6d050632b8670f1916045c0678726b81 1
# the Sentinel whose ID ends in db67 votes for the one ending in 6b81
1884181:X 22 Sep 2023 12:25:45.801 # +elected-leader master mymaster 47.94.143.83 6379
# the elected Sentinel (26379) takes charge of promoting a replica
1884181:X 22 Sep 2023 12:25:45.801 # +failover-state-select-slave master mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:45.902 # +selected-slave slave 47.94.143.83:6380 47.94.143.83 6380 @ mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:45.902 * +failover-state-send-slaveof-noone slave 47.94.143.83:6380 47.94.143.83 6380 @ mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:45.964 * +failover-state-wait-promotion slave 47.94.143.83:6380 47.94.143.83 6380 @ mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:46.032 * Sentinel new configuration saved on disk
# the lines above show the leader selecting replica 6380 and promoting it, coordinated among the three Sentinels
1884181:X 22 Sep 2023 12:25:46.032 # +promoted-slave slave 47.94.143.83:6380 47.94.143.83 6380 @ mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:46.032 # +failover-state-reconf-slaves master mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:46.089 * +slave-reconf-sent slave 47.94.143.83:6390 47.94.143.83 6390 @ mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:46.089 # +failover-end master mymaster 47.94.143.83 6379
1884181:X 22 Sep 2023 12:25:46.089 # +switch-master mymaster 47.94.143.83 6379 47.94.143.83 6380
# switch to the new master
1884181:X 22 Sep 2023 12:25:46.089 * +slave slave 47.94.143.83:6390 47.94.143.83 6390 @ mymaster 47.94.143.83 6380
# attach the other replica to the new master
1884181:X 22 Sep 2023 12:25:46.089 * +slave slave 47.94.143.83:6379 47.94.143.83 6379 @ mymaster 47.94.143.83 6380
# the old master is registered as a replica of the new master, ready for when it comes back
1884181:X 22 Sep 2023 12:25:46.094 * Sentinel new configuration saved on disk
1884181:X 22 Sep 2023 12:26:16.146 # +sdown slave 47.94.143.83:6390 47.94.143.83 6390 @ mymaster 47.94.143.83 6380
1884181:X 22 Sep 2023 12:26:16.146 # +sdown slave 47.94.143.83:6379 47.94.143.83 6379 @ mymaster 47.94.143.83 6380
1884181:X 22 Sep 2023 12:34:24.204 # -sdown slave 47.94.143.83:6379 47.94.143.83 6379 @ mymaster 47.94.143.83 6380
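A log like this is easy to scan by script. The following is a rough sketch that assumes the line layout shown above (pid, role letter X, timestamp, a # or * marker, then an event name starting with + or -); the helper names parse_sentinel_events and find_switch are hypothetical.

```python
import re

# pid:X <day month year time> <#|*> <+event or -event> <details>
EVENT_RE = re.compile(
    r"^(?P<pid>\d+):X (?P<time>\d+ \w+ \d+ [\d:.]+) [#*] "
    r"(?P<event>[+-][a-z-]+) (?P<details>.*)$"
)


def parse_sentinel_events(lines):
    """Return (time, event, details) tuples for lines that carry a +/- event."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match.group("time"), match.group("event"), match.group("details")))
    return events


def find_switch(events):
    """Return (master name, old address, new address) from +switch-master, or None."""
    for _, event, details in events:
        if event == "+switch-master":
            name, old_ip, old_port, new_ip, new_port = details.split()
            return name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}"
    return None


sample = [
    "1884181:X 22 Sep 2023 12:25:45.661 # +sdown master mymaster 47.94.143.83 6379",
    "1884181:X 22 Sep 2023 12:25:45.724 * Sentinel new configuration saved on disk",
    "1884181:X 22 Sep 2023 12:25:46.089 # +switch-master mymaster 47.94.143.83 6379 47.94.143.83 6380",
]
events = parse_sentinel_events(sample)
print([event for _, event, _ in events])
print(find_switch(events))
```

Lines without a +/- event, such as the "configuration saved" notice, are skipped on purpose.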

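II. How Sentinel works, and its limitations

(1) Process

A. Each Sentinel sends heartbeat PINGs to the master it monitors. If it gets no valid reply within the configured window, that Sentinel marks the master as subjectively down (sdown).

The window can be changed in the configuration file with down-after-milliseconds.

B. Once enough Sentinels agree and the master is marked objectively down (odown), the Sentinels elect a leader to carry out the failover.

C. The Sentinels elect the leader with a Raft-style vote, and the leader Sentinel takes over the next steps.

Roughly speaking, the vote is first come, first served. The details deserve a closer look but are set aside here.

D. The leader sends SLAVEOF NO ONE to the chosen replica, turning it into an independent master.

The new master is chosen by replica priority, then replication offset, then run ID.
Priority is compared first (set with replica-priority; a smaller value means higher priority), and the replica with the highest priority wins. On a tie, replication offsets are compared (the offset records how much of the master's data the replica has received, so a larger offset means the replica is closer to the master), and the larger offset wins. If those are equal too, the replica with the lexicographically smaller run ID wins.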
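The three-step comparison maps naturally onto a sort key. This is a minimal sketch of the ordering described above, not Sentinel's actual code; the Replica class and pick_new_master are hypothetical names, and treating priority 0 as "never promote" follows the documented meaning of replica-priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    priority: int      # replica-priority: smaller wins, 0 means never promote
    repl_offset: int   # larger means closer to the old master's data
    run_id: str        # final tie-breaker, lexicographically smaller wins


def pick_new_master(replicas: List[Replica]) -> Optional[Replica]:
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Sort key: priority ascending, offset descending, run ID ascending.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("6380", priority=100, repl_offset=1351268, run_id="b2"),
    Replica("6390", priority=100, repl_offset=1351000, run_id="a1"),
]
print(pick_new_master(replicas).name)  # same priority, so the larger offset decides
```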

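E. The leader tells the remaining replicas to run SLAVEOF against the newly promoted master.

F. When the old master comes back, it is also attached to the new master as a replica.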

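(2) Recommendations

Use an odd number of Sentinels, which makes elections easier.
In production, run the Sentinels on servers with matching specifications.

(3) Limitations of Sentinel

Sentinel cannot guarantee data consistency. Promoting a replica takes time after the master goes down, and so does re-attaching the other replicas to the new master. Network jitter can stretch that window further, and writes made during the failover can still be lost. Redis Cluster is used to address this kind of problem.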

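III. Redis Cluster

(1) Overview

Redis Cluster is a set of nodes that share data among themselves.
It supports multiple masters. A master and its replicas form a shard, and the cluster spreads data across shards according to hash slots.
The cluster has 16384 hash slots, divided as evenly as possible among the masters. In theory that allows up to 16384 masters, but the official recommendation is at most about 1000 nodes.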
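To make "divided as evenly as possible" concrete, here is one way to compute contiguous ranges. It is only a sketch: split_slots is a hypothetical helper, and the rounding scheme is an assumption chosen because it reproduces the three-master ranges that appear later in this post, not a claim about how redis-cli is implemented.

```python
def split_slots(num_masters: int, total_slots: int = 16384):
    """Split slots 0..total_slots-1 into contiguous, near-equal ranges."""
    per_node = total_slots / num_masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(num_masters):
        last = round(cursor + per_node - 1)
        # The final master always takes whatever is left.
        if i == num_masters - 1 or last > total_slots - 1:
            last = total_slots - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


for first, last in split_slots(3):
    print(f"{first}-{last} ({last - first + 1} slots)")
```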

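(2) Hash slots and shards

Shard: the cluster stores data in different places. One master plus its replicas is a shard, and several shards make up a cluster.
How slots work

Failover is built in, so no Sentinel is needed: when a master goes down, one of its replicas is promoted automatically.
The cluster divides the slots evenly according to the number of masters. With three masters (three shards), the masters hold 0-5460, 5461-10922, and 10923-16383 respectively.
To store a key, the cluster computes HASH_SLOT = CRC16(key) mod 16384 to get the slot, then saves the data on the shard that owns that slot.
Reads locate the slot from the key in the same way.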
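The formula is short enough to try locally. The sketch below leans on Python's binascii.crc_hqx, which computes a CRC-CCITT checksum with polynomial 0x1021; starting it from 0 gives the XMODEM variant that Redis Cluster uses. The names key_slot, SHARDS, and shard_for are hypothetical, the shard table copies the three ranges above, and hash tags in braces are deliberately ignored here.

```python
import binascii

TOTAL_SLOTS = 16384

# Slot ranges of the three masters, as listed above.
SHARDS = {
    "6381": range(0, 5461),
    "6382": range(5461, 10923),
    "6383": range(10923, 16384),
}


def key_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384 (hash tags not handled in this sketch)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % TOTAL_SLOTS


def shard_for(slot: int) -> str:
    for name, slots in SHARDS.items():
        if slot in slots:
            return name
    raise ValueError(f"slot {slot} out of range")


slot = key_slot("foo")
print(slot, shard_for(slot))
```

For keys without braces, the result should agree with what CLUSTER KEYSLOT reports on a real node.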

[Figure: overview of hash slots]

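Advantages of hash slots:

Even distribution, so load is spread evenly across nodes.
Easy scaling in and out. To add a master, the existing masters each hand over some of their slots to the new one. The cluster does not have to stop, so service continues, and the slot formula does not change (the modulus, 16384, is fixed).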

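(3) Approaches to distributing data

A. Hash modulo partitioning

Suited to small deployments: the target node is simply hash(key) mod N, where N is the number of masters.
The drawback is poor support for scaling. Changing the number of masters changes the modulus, so most keys map to a different node; the data has to be redistributed, usually with downtime, and service continuity cannot be guaranteed.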
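The cost of changing the modulus is easy to measure. In this sketch zlib.crc32 stands in for the hash function (an arbitrary choice for illustration), and node_for and moved_fraction are hypothetical helpers. For a uniform hash, a key stays put when going from 3 to 4 nodes only if hash mod 3 equals hash mod 4, which holds for 3 of every 12 hash values, so roughly three quarters of the keys are expected to move.

```python
import zlib


def node_for(key: str, num_nodes: int) -> int:
    """Hash modulo partitioning: node index = hash(key) mod N."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose node changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"{moved_fraction(keys, 3, 4):.0%} of keys change node when going from 3 to 4 masters")
```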

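B. Consistent hashing

Hash values fall in the range [0, 2^32-1], which is linear. Consistent hashing joins the two ends of that range so it becomes a ring, and every key hashes to a point on the ring.
The operator hashes a unique ID or other unique attribute of each Redis master into the same 0 to 2^32-1 range, so every master also gets a point on the ring.
From its own point, each key walks clockwise, and the first master it meets is the one that stores it.
[Figure: a ring with four Redis masters and four keys]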
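The clockwise walk can be sketched with a sorted list and a binary search. Everything here is illustrative: the HashRing class, the node names, and the choice of MD5 truncated to 32 bits as the ring hash are assumptions for the example, not something Redis itself does.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point in [0, 2^32 - 1] (first 4 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes):
        self._points = sorted((ring_hash(node), node) for node in nodes)
        self._hashes = [point for point, _ in self._points]

    def node_for(self, key: str) -> str:
        # First node at or after the key's point; wrap to the start of the ring.
        index = bisect.bisect_left(self._hashes, ring_hash(key))
        if index == len(self._points):
            index = 0
        return self._points[index][1]


nodes = ["redis-6381", "redis-6382", "redis-6383", "redis-6384"]
ring = HashRing(nodes)
for key in ("one", "two", "three", "four"):
    print(key, "->", ring.node_for(key))
```

Removing a node from the list and rebuilding the ring only changes the owner of keys that used to sit on that node; every other key keeps its place.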

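Advantages of consistent hashing

Fault tolerance: a failed node does not block storage. Keys that belonged to it continue clockwise to the next healthy master.
Easy to scale out: a new node just takes its point on the ring, and the keys between it and the preceding node now reach it first when they walk clockwise.

Drawbacks of consistent hashing

Data skew: when the nodes sit unevenly on the ring, some are overloaded while others are nearly idle.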

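C. Hash slots (which also address the skew of the previous approach)

Redis Cluster maps each key to a slot with HASH_SLOT = CRC16(key) mod 16384, confining it to the range 0-16383.
16384 is 2^14, a much coarser space than 2^32, and the slots are assigned evenly to the masters, which prevents data skew.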

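(4) Why 16384 slots?

A. In production, a cluster is not expected to exceed about 1000 masters, so 16384 slots leave every master with plenty of slots.

B. Heartbeat packets carry a slot bitmap. 2^14 slots take 2 KB, while 2^16 slots would take 8 KB, making every heartbeat larger and more sensitive to network limits.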
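The 2 KB and 8 KB figures follow from one bit per slot. A quick check, using a hypothetical helper named slot_bitmap_bytes:

```python
def slot_bitmap_bytes(num_slots: int) -> int:
    """Size of a bitmap with one bit per slot, in bytes."""
    return num_slots // 8


print(slot_bitmap_bytes(2 ** 14))  # 16384 bits
print(slot_bitmap_bytes(2 ** 16))  # 65536 bits
```

The first value is 2048 bytes (2 KB) and the second 8192 bytes (8 KB), so quadrupling the slot count quadruples this part of every heartbeat.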

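C. A master records the slots it owns as a bitmap, which is compressed in transit. The fill rate is slots / N, where N is the number of masters. When the fill rate is high, the bitmap compresses poorly, so having more slots with few nodes would make compression even worse.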

(5) Building the cluster

A. Configuration file contents (six files in total, for ports 6381, 6382, 6383, 6384, 6385, and 6386)

 #bind 127.0.0.1 -::1
 dbfilename "cluster6381.rdb"
 port 6381
# adjust to the port of each instance
 protected-mode no
 daemonize yes
# run in the background and allow remote connections
 dir "/opt/redis-7.0.11/mycluster"
 pidfile "/opt/redis-7.0.11/mycluster/cluster6381.pid"
 logfile "/opt/redis-7.0.11/mycluster/cluster6381.log"
# working directory, pid file, and log file
 masterauth "chen13515216766"
 requirepass "chen13515216766"
# password for connecting to a master, and this node's own password
 appendonly yes
# enable AOF persistence
 appendfilename "cluster6381aof.aof"
 appenddirname "appendonlydir"
# AOF file name and the directory that holds it
 cluster-enabled yes
 cluster-config-file nodes-6381.conf
 cluster-node-timeout 15000
# cluster settings

B. Start each node with its own configuration file

redis-server /opt/redis-7.0.11/mycluster/cluster6381.conf

The process list shows that all six started successfully:

root     2055738       1  0 13:44 ?        00:00:01 redis-server *:6381 [cluster]
root     2056162       1  0 13:50 ?        00:00:00 redis-server *:6382 [cluster]
root     2057193       1  0 14:00 ?        00:00:00 redis-server *:6383 [cluster]
root     2057245       1  0 14:00 ?        00:00:00 redis-server *:6384 [cluster]
root     2057297       1  0 14:00 ?        00:00:00 redis-server *:6385 [cluster]
root     2057349       1  0 14:00 ?        00:00:00 redis-server *:6386 [cluster]
root     2057411 2053248  0 14:01 pts/0    00:00:00 grep --color=auto cluster

C. Create the cluster from the client, specifying the master-to-replica ratio

redis-cli -a chen**** --cluster create --cluster-replicas 1 47.94.143.83:6381 47.94.143.83:6382 47.94.143.83:6383 47.94.143.83:6384 47.94.143.83:6385 47.94.143.83:6386

After pressing Enter, the following is displayed:

Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 47.94.143.83:6385 to 47.94.143.83:6381
Adding replica 47.94.143.83:6386 to 47.94.143.83:6382
Adding replica 47.94.143.83:6384 to 47.94.143.83:6383
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
# this warning says some replicas share a host with their master. Well, obviously: I'm on a budget, so everything runs on one machine.
M: cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
   slots:[0-5460] (5461 slots) master
M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
   slots:[5461-10922] (5462 slots) master
M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
   slots:[10923-16383] (5461 slots) master
S: 58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384
   replicates cd079a91c83822bcad273346550e5d960a02d692
S: 675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385
   replicates e4ca9a39418f35043013be72d8570ae508a88888
S: cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386
   replicates d3ed83ca838d4786e749a80ec6c940040a0854f3
Can I set the above configuration? (type 'yes' to accept): yes
# type the full word yes to confirm; a bare y is not accepted and counts as a refusal

The output continues:

>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
..
>>> Performing Cluster Check (using node 47.94.143.83:6381)
M: cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385
   slots: (0 slots) slave
   replicates e4ca9a39418f35043013be72d8570ae508a88888
M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384
   slots: (0 slots) slave
   replicates cd079a91c83822bcad273346550e5d960a02d692
S: cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386
   slots: (0 slots) slave
   replicates d3ed83ca838d4786e749a80ec6c940040a0854f3
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
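# the cluster was created successfully

D. Log in to the cluster and verify the slot calculation on writes and reads (add -c for automatic redirection)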

redis-cli -a chen**** -p 6381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6381> set one "two"
(error) MOVED 9084 47.94.143.83:6382
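# the write is rejected: the key hashes to slot 9084, which belongs to 6382, but we are connected to 6381
# adding -c when logging in to any node enables automatic redirection, so quit and log in again with -c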

[root@quihechunshui mycluster]# redis-cli -a chen*** -p 6381 -c
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6381> set one "heihei"
-> Redirected to slot [9084] located at 47.94.143.83:6382
OK
47.94.143.83:6382>
# the command was redirected automatically, and note that the client is now connected to 6382
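What -c automates is reading the MOVED error and reconnecting to the address it names. A client that wanted to do this by hand could start from something like the sketch below; parse_moved is a hypothetical helper that assumes the error text has the shape shown above (MOVED, slot, host:port).

```python
from typing import Optional, Tuple


def parse_moved(error: str) -> Optional[Tuple[int, str, int]]:
    """Parse 'MOVED <slot> <host>:<port>' into (slot, host, port); None if it is not a MOVED error."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)


print(parse_moved("MOVED 9084 47.94.143.83:6382"))
```

The returned host and port tell the client where to resend the command.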

E. Cluster commands

Commands:

cluster info	Show cluster status information
cluster keyslot key	Show the hash slot for the given key
cluster nodes	Show information about the cluster nodes

Usage:

47.94.143.83:6382> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:2
cluster_stats_messages_ping_sent:1270
cluster_stats_messages_pong_sent:1141
cluster_stats_messages_meet_sent:1
cluster_stats_messages_sent:2412
cluster_stats_messages_ping_received:1141
cluster_stats_messages_pong_received:1271
cluster_stats_messages_received:2412
total_cluster_links_buffer_limit_exceeded:0
47.94.143.83:6382> cluster keyslot key
(integer) 12539
47.94.143.83:6382> cluster keyslot me
(integer) 373
47.94.143.83:6382> CLUSTER NODES
cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386@16386 slave d3ed83ca838d4786e749a80ec6c940040a0854f3 0 1695623903652 3 connected
e4ca9a39418f35043013be72d8570ae508a88888 172.25.51.63:6382@16382 myself,master - 0 1695623906000 2 connected 5461-10922
cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381@16381 master - 0 1695623907000 1 connected 0-5460
58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384@16384 slave cd079a91c83822bcad273346550e5d960a02d692 0 1695623907668 1 connected
675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385@16385 slave e4ca9a39418f35043013be72d8570ae508a88888 0 1695623906664 2 connected
d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383@16383 master - 0 1695623904655 3 connected 10923-16383
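Each CLUSTER NODES line is a space-separated record: node ID, address, flags, master ID (or -), ping sent, pong received, config epoch, link state, and then any slot ranges. A small parser makes the output easier to work with; parse_cluster_nodes is a hypothetical helper written against the output above.

```python
def parse_cluster_nodes(text: str):
    """Turn CLUSTER NODES output into a list of dicts, one per node."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a node record
        nodes.append({
            "id": fields[0],
            "addr": fields[1].split("@")[0],   # drop the cluster bus port
            "flags": fields[2].split(","),
            "master_id": None if fields[3] == "-" else fields[3],
            "config_epoch": int(fields[6]),
            "link_state": fields[7],
            "slots": fields[8:],
        })
    return nodes


output = """
cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386@16386 slave d3ed83ca838d4786e749a80ec6c940040a0854f3 0 1695623903652 3 connected
d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383@16383 master - 0 1695623904655 3 connected 10923-16383
"""
for node in parse_cluster_nodes(output):
    role = "master" if "master" in node["flags"] else "replica"
    print(node["addr"], role, node["slots"])
```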

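F. Take one master offline, check that its replica is promoted, then bring it back and check the roles

Start with the pair 6382 (master) and 6385 (its replica), and take the master offline.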

[root@quihechunshui mycluster]# redis-cli -a chen**** -p 6382 -c
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6382> shutdown
not connected> quit

Log in to another node and check the cluster state

127.0.0.1:6381> cluster nodes
e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382@16382 master,fail - 1695624905483 1695624901000 2 disconnected
# 6382 is marked as failed and disconnected
675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385@16385 master - 0 1695624992000 7 connected 5461-10922
# the former replica 6385 has been promoted to master
d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383@16383 master - 0 1695624993132 3 connected 10923-16383
58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384@16384 slave cd079a91c83822bcad273346550e5d960a02d692 0 1695624992125 1 connected
cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386@16386 slave d3ed83ca838d4786e749a80ec6c940040a0854f3 0 1695624992000 3 connected
cd079a91c83822bcad273346550e5d960a02d692 172.25.51.63:6381@16381 myself,master - 0 1695624989000 1 connected 0-5460

Restart 6382 and check its state

[root@quihechunshui mycluster]# redis-server cluster6382.conf
[root@quihechunshui mycluster]# redis-cli -a ****** -p 6382 -c
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6382> info replication
# Replication
role:slave
master_host:47.94.143.83
master_port:6385      # after coming back, 6382 is attached to its former replica 6385
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_read_repl_offset:3767
slave_repl_offset:3767
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:c4888d020425622eb5f4121c0e1d3aca7914ab48
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:3767
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3726
repl_backlog_histlen:42

To restore the original roles, run cluster failover on 6382:

127.0.0.1:6382> CLUSTER FAILOVER
OK
127.0.0.1:6382> cluster nodes
675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385@16385 slave e4ca9a39418f35043013be72d8570ae508a88888 0 1695625620567 8 connected
cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386@16386 slave d3ed83ca838d4786e749a80ec6c940040a0854f3 0 1695625620000 3 connected
cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381@16381 master - 0 1695625618000 1 connected 0-5460
58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384@16384 slave cd079a91c83822bcad273346550e5d960a02d692 0 1695625619000 1 connected
d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383@16383 master - 0 1695625621570 3 connected 10923-16383
e4ca9a39418f35043013be72d8570ae508a88888 172.25.51.63:6382@16382 myself,master - 0 1695625619000 8 connected 5461-10922

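(6) Scaling out the cluster

A. Start two more Redis instances (6387 and 6388)

Their configuration files follow the same template as above. Then start the servers.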

[root@quihechunshui mycluster]# redis-server cluster6387.conf
[root@quihechunshui mycluster]# redis-server cluster6388.conf

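B. Add the new master first: run the command below to join 6387 to the existing cluster

The first address is the node to add; the second is a node already in the cluster, which identifies the cluster to join.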


[root@quihechunshui mycluster]# redis-cli -a chen1***** --cluster add-node 47.94.143.83:6387 47.94.143.83:6381
# ... part of the output omitted
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Getting functions from cluster
>>> Send FUNCTION LIST to 47.94.143.83:6387 to verify there is no functions in it
>>> Send FUNCTION RESTORE to 47.94.143.83:6387
>>> Send CLUSTER MEET to node 47.94.143.83:6387 to make it join the cluster.
[OK] New node added correctly.
# the line above confirms success

C. A new command for checking nodes and slots (redis-cli -a <password> --cluster check <ip:port of any node in the cluster>)

[root@quihechunshui mycluster]# redis-cli -a che**** --cluster check 47.94.143.83:6381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
47.94.143.83:6381 (cd079a91...) -> 1 keys | 5461 slots | 1 slaves.
47.94.143.83:6382 (e4ca9a39...) -> 1 keys | 5462 slots | 1 slaves.
47.94.143.83:6383 (d3ed83ca...) -> 0 keys | 5461 slots | 1 slaves.
47.94.143.83:6387 (55d4917c...) -> 0 keys | 0 slots | 0 slaves.
# the newly added 6387 has no slots yet
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 47.94.143.83:6381)
M: cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385
   slots: (0 slots) slave
   replicates e4ca9a39418f35043013be72d8570ae508a88888
M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 55d4917c29e82afe57676d672ab59f1b52912b27 47.94.143.83:6387
   slots: (0 slots) master
S: 58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384
   slots: (0 slots) slave
   replicates cd079a91c83822bcad273346550e5d960a02d692
S: cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386
   slots: (0 slots) slave
   replicates d3ed83ca838d4786e749a80ec6c940040a0854f3
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The new node has no slots.

D. Assign slots to the new node (6387)

Run redis-cli -a <password> --cluster reshard <ip:port of any node in the cluster>

 redis-cli -a chen*** --cluster reshard 47.94.143.83:6381

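The command asks four questions: first, how many slots to move; second, the ID of the node that will receive them; third, which nodes to take them from (enter all to draw from every existing master); and fourth, whether to proceed with the proposed plan (answer yes).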

# first attempt, with reshard mistyped as rehard
redis-cli -a chen***** --cluster rehard 47.94.143.83:6381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Unknown --cluster subcommand
[root@quihechunshui mycluster]# redis-cli -a chen13515216766 --cluster reshard 47.94.143.83:6381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing Cluster Check (using node 47.94.143.83:6381)
M: cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385
   slots: (0 slots) slave
   replicates e4ca9a39418f35043013be72d8570ae508a88888
M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 55d4917c29e82afe57676d672ab59f1b52912b27 47.94.143.83:6387
   slots: (0 slots) master
S: 58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384
   slots: (0 slots) slave
   replicates cd079a91c83822bcad273346550e5d960a02d692
S: cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386
   slots: (0 slots) slave
   replicates d3ed83ca838d4786e749a80ec6c940040a0854f3
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4059   # question 1: number of slots to move
What is the receiving node ID? 55d4917c29e82afe57676d672ab59f1b52912b27  # question 2: the receiving node
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: all  # question 3: take slots from all existing masters
Ready to move 4059 slots.
  Source nodes:
    M: cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
  Destination node:
    M: 55d4917c29e82afe57676d672ab59f1b52912b27 47.94.143.83:6387
       slots: (0 slots) master
  Resharding plan:
    Moving slot 5461 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5462 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5463 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5464 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5465 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5466 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5467 from e4ca9a39418f35043013be72d8570ae508a88888
    Moving slot 5468 from e4ca9a39418f35043013be7
    # remaining slot moves omitted
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 5461 from 47.94.143.83:6382 to 47.94.143.83:6387: 
Moving slot 5462 from 47.94.143.83:6382 to 47.94.143.83:6387: 
Moving slot 5463 from 47.94.143.83:6382 to 47.94.143.83:6387: 
Moving slot 5464 from 47.94.143.83:6382 to 47.94.143.83:6387: 
Moving slot 5465 from 47.94.143.83:6382 to 47.94.143.83:6387: 
Moving slot 5466 from 47.94.143.83:6382 to 47.94.143.83:6387: 
Moving slot 5467 from 47.94.143.83:6382 to 47.94.143.83:6387:
# many more lines of output are omitted here; the move takes a while

Checking the slots again shows:

47.94.143.83:6381 (cd079a91...) -> 1 keys | 4109 slots | 1 slaves.
47.94.143.83:6382 (e4ca9a39...) -> 1 keys | 4108 slots | 1 slaves.
47.94.143.83:6383 (d3ed83ca...) -> 0 keys | 4109 slots | 1 slaves.
47.94.143.83:6387 (55d4917c...) -> 0 keys | 4058 slots | 0 slaves.

So how were these slots taken from the existing nodes?

cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
   slots:[1352-5460] (4109 slots) master
   1 additional replica(s)
M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
   slots:[6815-10922] (4108 slots) master
   1 additional replica(s)
S: 675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385
   slots: (0 slots) slave
   replicates e4ca9a39418f35043013be72d8570ae508a88888
M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
   slots:[12275-16383] (4109 slots) master

Each existing master gave up a share, and the new node received them:

55d4917c29e82afe57676d672ab59f1b52912b27 47.94.143.83:6387
   slots:[0-1351],[5461-6814],[10923-12274] (4058 slots) master

Its slots are not contiguous, because they were pieced together from the ranges the other masters gave up.
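The numbers above (4059 requested, 4058 actually moved, with 1354 taken from 6382 and 1352 from each of the others) fit a simple proportional split. The sketch below is a guess at the rule rather than a statement about redis-cli internals: it assumes each source gives up a share proportional to its slot count, rounded up for the largest source and down for the rest. reshard_plan is a hypothetical helper.

```python
import math


def reshard_plan(requested: int, source_slots: dict) -> dict:
    """Split `requested` slots across sources in proportion to what each owns."""
    total = sum(source_slots.values())
    # Largest source first; ties keep their original order.
    ordered = sorted(source_slots.items(), key=lambda item: item[1], reverse=True)
    plan = {}
    for index, (node, owned) in enumerate(ordered):
        share = requested / total * owned
        plan[node] = math.ceil(share) if index == 0 else math.floor(share)
    return plan


before = {"6381": 5461, "6382": 5462, "6383": 5461}
plan = reshard_plan(4059, before)
print(plan, "total moved:", sum(plan.values()))
print({node: before[node] - plan[node] for node in before})
```

Under that assumption the remaining counts come out as 4109, 4108, and 4109, matching the check output above.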

E. Add 6388 as a replica

redis-cli -a chen***** --cluster add-node 47.94.143.83:6388 47.94.143.83:6387 --cluster-slave --cluster-master-id 55d4917c29e82afe57676d672ab59f1b52912b27
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 47.94.143.83:6388 to cluster 47.94.143.83:6387
>>> Performing Cluster Check (using node 47.94.143.83:6387)
M: 55d4917c29e82afe57676d672ab59f1b52912b27 47.94.143.83:6387
   slots:[0-1351],[5461-6814],[10923-12274] (4058 slots) master
S: 675e84e05723ba63ab3fa699c4ff9dec6252d754 47.94.143.83:6385
   slots: (0 slots) slave
   replicates e4ca9a39418f35043013be72d8570ae508a88888
M: e4ca9a39418f35043013be72d8570ae508a88888 47.94.143.83:6382
   slots:[6815-10922] (4108 slots) master
   1 additional replica(s)
M: cd079a91c83822bcad273346550e5d960a02d692 47.94.143.83:6381
   slots:[1352-5460] (4109 slots) master
   1 additional replica(s)
M: d3ed83ca838d4786e749a80ec6c940040a0854f3 47.94.143.83:6383
   slots:[12275-16383] (4109 slots) master
   1 additional replica(s)
S: cb6f48fa7300f38345358415f70d35a90f280e32 47.94.143.83:6386
   slots: (0 slots) slave
   replicates d3ed83ca838d4786e749a80ec6c940040a0854f3
S: 58f17d6fbb22caf27b750f009c9a81d3c57f7cda 47.94.143.83:6384
   slots: (0 slots) slave
   replicates cd079a91c83822bcad273346550e5d960a02d692
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 47.94.143.83:6388 to make it join the cluster.
Waiting for the cluster to join

>>> Configure node as replica of 47.94.143.83:6387.
[OK] New node added correctly.

(7) Scaling in the cluster

A. Remove the replica 6388 (redis-cli -a <password> --cluster del-node <ip:port of 6388> <node ID of 6388>)

redis-cli -a chen13515216766 --cluster del-node 47.94.143.83:6388  907336253b0597f7d8c35e8eb82f8a85d426a504
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Removing node 907336253b0597f7d8c35e8eb82f8a85d426a504 from cluster 47.94.143.83:6388
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

B. Return the slots held by 6387 (only the key prompts are shown)

[root@quihechunshui mycluster]# redis-cli -a chen***** --cluster reshard 47.94.143.83:6381
How many slots do you want to move (from 1 to 16384)? 4059 # number of slots to move
What is the receiving node ID? cd079a91c83822bcad273346550e5d960a02d692 # the node that receives the slots; here 6381 takes all of 6387's slots
Source node #1: 55d4917c29e82afe57676d672ab59f1b52912b27    # the node to take slots from
Source node #2: done  # all sources specified; start the move

C. Remove node 6387

[root@quihechunshui mycluster]# redis-cli -a chen13515216766 --cluster del-node 47.94.143.83:6387 55d4917c29e82afe57676d672ab59f1b52912b27
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Removing node 55d4917c29e82afe57676d672ab59f1b52912b27 from cluster 47.94.143.83:6387
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

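(8) Other cluster details

A. One setting decides whether the cluster keeps serving when it is incomplete

Incomplete means that at least one shard has lost its master and all of its replicas, so some hash slots are no longer covered.
The setting is cluster-require-full-coverage, which defaults to yes. With yes, the cluster stops serving requests as soon as any slot is uncovered; with no, it keeps serving the slots that are still covered.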

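B. Handling the cross-slot error from multi-key commands such as MGET

Slots are computed from the key, so the keys in one command are very likely to live on different nodes.
Adding a hash tag in {} to the key names when setting them makes the keys map to the same slot, because only the text inside the braces is hashed.
Example: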

# without a hash tag the keys hash to different slots, so the command fails
127.0.0.1:6381> mset one "me" two "you" three "hh"
(error) CROSSSLOT Keys in request don't hash to the same slot
# with the shared hash tag {a}, only a is hashed for each key, so all of them land in the same slot
127.0.0.1:6381> mset one{a} "me" two{a} "you" three{a} "meh"
-> Redirected to slot [15495] located at 47.94.143.83:6383
OK
# use the same tagged key names when reading
47.94.143.83:6383> mget one{a} two{a} three{a}
1) "me"
2) "you"
3) "meh"
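The hash tag rule can be reproduced offline. This sketch applies the rule as the Redis Cluster specification describes it: if the key contains a { followed later by a } with at least one character in between, only that substring is hashed; otherwise the whole key is. The function name key_slot is hypothetical, and binascii.crc_hqx with an initial value of 0 stands in for Redis's CRC16.

```python
import binascii


def key_slot(key: str) -> int:
    """CRC16(key) mod 16384, hashing only the hash tag when one is present."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # An empty tag such as "{}" does not count; hash the whole key then.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


for key in ("one{a}", "two{a}", "three{a}", "one", "two"):
    print(key, key_slot(key))
```

The three tagged keys share one slot, which is why the MSET and MGET above succeed, while the untagged names are free to land anywhere.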
