Redis Sentinel: Principles and Configuration

cuilingqiao0657

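A previous article covered the Redis replication (master-slave) high-availability architecture.
Building on that, this article discusses Sentinel, the technology that provides automatic failover on top of a Redis replication setup.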

Main references (official documentation):
https://redis.io/topics/sentinel

http://redisdoc.com/topic/sentinel.html

Note: master-slave replication must be set up before configuring Sentinel.
To set up replication, see: http://blog.itpub.net/25583515/viewspace-2644438/

I. Installation and configuration
In an environment with one master and one slave, add a single Sentinel.
Go to the Redis source directory and copy the sentinel.conf file:
# cd /u01/packages/redis-3.0.6
# cp sentinel.conf /usr/local/redis/etc/
# vi /usr/local/redis/etc/sentinel.conf
daemonize yes
sentinel monitor mymaster 127.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000

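Parameter explanation:
sentinel monitor mymaster 127.0.0.1 6379 1
Tells Sentinel to monitor a master named mymaster (any name may be used).
The master's IP is 127.0.0.1 and its port is 6379.
At least 1 Sentinel must agree before this master is judged to have failed (if the number of agreeing Sentinels falls short of this quorum, automatic failover is not performed).
Note that regardless of how many Sentinels you require to agree that a server has failed, a Sentinel must still obtain the support of a majority of the Sentinels in the system before it can start an automatic failover.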
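
The difference between the quorum and the majority can be sketched in a few lines of Python. This is only an illustration of the two thresholds described above, not Sentinel's actual code; the function names are hypothetical.

```python
def can_mark_objectively_down(agreeing_sentinels: int, quorum: int) -> bool:
    """The quorum only decides whether the master is judged to have failed."""
    return agreeing_sentinels >= quorum


def can_start_failover(votes: int, total_sentinels: int, quorum: int) -> bool:
    """Starting a failover additionally needs votes from a majority of Sentinels.

    If the quorum is larger than the majority, the larger value applies.
    """
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


# With 3 Sentinels and quorum 1, a single Sentinel can flag the master as down,
# but it still needs 2 votes before it may start the failover.
print(can_mark_objectively_down(1, quorum=1))
print(can_start_failover(1, total_sentinels=3, quorum=1))
print(can_start_failover(2, total_sentinels=3, quorum=1))
```

In the single-Sentinel setup used in this article, the majority of one Sentinel is one, so the lone Sentinel can both detect the failure and start the failover by itself.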

sentinel down-after-milliseconds mymaster 30000
down-after-milliseconds specifies how many milliseconds the master must be unreachable before this Sentinel considers it down.

sentinel parallel-syncs mymaster 1
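parallel-syncs specifies the maximum number of slaves that may resynchronize with the new master at the same time during a failover. The smaller this number, the longer the failover takes to complete.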
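
To see why a smaller value lengthens the failover, here is a rough Python model. It assumes, purely for illustration, that slaves resynchronize in rounds of at most parallel-syncs slaves each; real timing depends on data size and network speed. The function name is hypothetical.

```python
import math


def resync_rounds(num_slaves: int, parallel_syncs: int) -> int:
    """Number of resync rounds if at most `parallel_syncs` slaves sync at once."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    return math.ceil(num_slaves / parallel_syncs)


# Four remaining slaves: one at a time takes four rounds, two at a time takes two.
print(resync_rounds(4, 1))
print(resync_rounds(4, 2))
```

The trade-off is that slaves which are busy resynchronizing may be unable to serve reads, so a larger value shortens the failover at the cost of more slaves being unavailable at the same moment.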

sentinel failover-timeout mymaster 180000
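failover-timeout specifies the number of milliseconds allowed for a failover; if the failover takes longer than this, it is considered to have failed. The default is 3 minutes (180000 ms).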

Start Sentinel
# redis-sentinel /usr/local/redis/etc/sentinel.conf

Check the status
# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=1,sentinels=1

At this point, the Sentinel configuration is complete.
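
For monitoring scripts it can be handy to turn the master0 line of the INFO sentinel output into structured data. The following is a minimal sketch that assumes the line has exactly the name=...,status=...,address=host:port,slaves=N,sentinels=N layout shown above; parse_master_line is a hypothetical helper, not part of any Redis client.

```python
def parse_master_line(line: str) -> dict:
    """Parse one 'masterN:key=value,...' line from INFO sentinel output."""
    key, _, rest = line.partition(":")            # "master0" and the field list
    fields = dict(item.split("=", 1) for item in rest.split(","))
    host, _, port = fields["address"].rpartition(":")
    return {
        "key": key,
        "name": fields["name"],
        "status": fields["status"],
        "host": host,
        "port": int(port),
        "slaves": int(fields["slaves"]),
        "sentinels": int(fields["sentinels"]),
    }


line = "master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=1,sentinels=1"
info = parse_master_line(line)
print(info["name"], info["status"], info["host"], info["port"])
```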

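Sentinel commands
• INFO: basic status information about the Sentinel
• PING: returns PONG
• SENTINEL masters: lists all monitored masters and their current state
• SENTINEL slaves <master name>: lists all slaves of the given master and their current state
• SENTINEL get-master-addr-by-name <master name>: returns the IP address and port of the master with the given name. If a failover is in progress for this master, or has already completed, the command returns the IP address and port of the new master.
• SENTINEL reset <pattern>: resets all masters whose names match the given pattern, which is a glob-style pattern. The reset clears all current state of the master, including any failover in progress, and removes all slaves and Sentinels already discovered and associated with the master.
• SENTINEL failover <master name>: forces a failover as if the master were unreachable, without asking the other Sentinels for agreement (the Sentinel that starts the failover still publishes a new configuration, and the other Sentinels update theirs accordingly).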
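
These commands travel over the ordinary Redis protocol (RESP), so a client can ask a Sentinel for the current master without any special library. The sketch below encodes a command as a RESP array of bulk strings and sends it over a plain socket; encode_command and send_raw are hypothetical helper names, and the host and port are the ones used in this article. In practice a client library with Sentinel support would normally be used instead.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def send_raw(host: str, port: int, *args: str, timeout: float = 2.0) -> bytes:
    """Send one command and return the raw, unparsed reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(*args))
        return sock.recv(4096)


# Requires a running Sentinel, so the call is left commented out:
# reply = send_raw("127.0.0.1", 26379, "SENTINEL", "get-master-addr-by-name", "mymaster")
print(encode_command("PING"))
```

Because SENTINEL get-master-addr-by-name returns the new master's address once a failover has happened, clients should query it again when reconnecting rather than caching the address indefinitely.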

II. Principles
A failover has to solve two main problems: electing a leader Sentinel, and choosing the new master.

1. Rules for electing the leader Sentinel

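Sentinel's automatic failover uses a Raft-style election to choose a leader Sentinel, which guarantees that at most one leader is produced in a given epoch.
In other words, two Sentinels can never both be elected leader in the same epoch, and each Sentinel votes for only one leader per epoch.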

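Note: the core idea of Raft is that agreement is reached by majority vote within a single term (epoch), which yields a leader.
The algorithm is not explained in detail here; for more, see: https://www.jianshu.com/p/8e4bbe7e276c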
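
The "one vote per epoch, majority wins" rule can be illustrated with a small Python model. This is a simplified sketch, not Sentinel's implementation: the class and function names are hypothetical, and networking, timeouts and the quorum check are left out.

```python
from typing import List, Optional


class SentinelVoter:
    """Remembers the single vote this Sentinel has cast in its latest epoch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_vote_epoch = 0
        self.voted_for: Optional[str] = None

    def request_vote(self, candidate: str, epoch: int) -> Optional[str]:
        # A newer epoch resets the vote; the first candidate to ask gets it.
        if epoch > self.last_vote_epoch:
            self.last_vote_epoch = epoch
            self.voted_for = candidate
        # In the current epoch, always answer with the vote already cast.
        return self.voted_for if epoch == self.last_vote_epoch else None


def elect(voters: List[SentinelVoter], candidate: str, epoch: int) -> bool:
    """The candidate becomes leader only with votes from a majority of Sentinels."""
    votes = sum(1 for v in voters if v.request_vote(candidate, epoch) == candidate)
    return votes >= len(voters) // 2 + 1


sentinels = [SentinelVoter(n) for n in ("s1", "s2", "s3")]
print(elect(sentinels, "s1", epoch=1))  # s1 asks first in epoch 1
print(elect(sentinels, "s2", epoch=1))  # the votes of epoch 1 are already taken
print(elect(sentinels, "s2", epoch=2))  # a new epoch allows a new election
```

Since every Sentinel hands out at most one vote per epoch and a leader needs more than half of them, two different candidates cannot both reach a majority in the same epoch.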

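2. Rules for choosing the new master
1> Among the slaves of the failed master, those flagged as subjectively down, those that are disconnected, and those whose last reply to a PING command is more than five seconds old are eliminated.
2> Among the slaves of the failed master, those that have been disconnected from the failed master for more than ten times the period specified by the down-after option are eliminated.
3> From the slaves that survive these two rounds, the one with the largest replication offset is chosen as the new master. If the replication offset is unavailable, or several slaves have the same offset, the slave with the smallest run ID becomes the new master.
Note that Sentinel also compares the configured slave-priority of the candidates before looking at the replication offset, and a slave with priority 0 is never promoted.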
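
The three rules above can be written down as a short filter-and-sort in Python. This sketch models only those three rules (replica priority is not modeled), and it treats an unavailable offset as lower than any known offset, which is an assumption made for the example. The Slave fields and the select_new_master name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    run_id: str
    subjectively_down: bool
    disconnected: bool
    ms_since_last_ping_reply: int
    ms_disconnected_from_master: int
    repl_offset: Optional[int]  # None when the offset is unavailable


def select_new_master(slaves: List[Slave], down_after_ms: int) -> Optional[Slave]:
    # Rules 1 and 2: drop unhealthy slaves and slaves cut off from the master too long.
    candidates = [
        s for s in slaves
        if not s.subjectively_down
        and not s.disconnected
        and s.ms_since_last_ping_reply <= 5000
        and s.ms_disconnected_from_master <= down_after_ms * 10
    ]
    if not candidates:
        return None
    # Rule 3: largest replication offset first, then smallest run ID as tie-breaker.
    def sort_key(s: Slave):
        offset = s.repl_offset if s.repl_offset is not None else -1
        return (-offset, s.run_id)
    return min(candidates, key=sort_key)


slaves = [
    Slave("bbb", False, False, 100, 0, 2000),
    Slave("aaa", False, False, 100, 0, 2000),
    Slave("ccc", False, False, 9000, 0, 5000),  # eliminated: stale PING reply
]
print(select_new_master(slaves, down_after_ms=30000).run_id)
```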

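Steps of a failover:
1> Detect that the master has entered the objectively down state.
2> Increment the current epoch and try to get elected in that epoch.
3> If the election fails, try again after twice the configured failover timeout. If it succeeds, carry out the following steps.
4> Select a slave to be promoted to master.
5> Send SLAVEOF NO ONE to the selected slave to turn it into a master.
6> Use Pub/Sub to propagate the updated configuration to all other Sentinels, which then update their own configuration.
7> Send SLAVEOF host port to the other slaves of the failed master so that they replicate from the new master.
8> Once all slaves have started replicating from the new master, the leader Sentinel ends the failover.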

That covers the configuration and principles of Redis Sentinel. Hopefully it was easy to follow.

From the "ITPUB Blog", link: http://blog.itpub.net/25583515/viewspace-2645084/. If you repost this article, please credit the source; otherwise legal action may be taken.
