Redis Sentinel: Mechanism, Principles, and Code Analysis

zheng6652

Article link: https://blog.csdn.net/zheng6652/article/details/82262443

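* Background

A Sentinel system made up of one or more Sentinel instances can monitor any number of masters, along with all the slaves of those masters. When a monitored master goes offline, Sentinel automatically promotes one of that master's slaves to be the new master, which then takes over command processing from the failed master.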

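* What Sentinel must do to monitor a distributed system

@ Server status: monitor masters as they go online and offline. How is a failure detected, and what is the difference between subjectively down and objectively down?

@ Sentinel election: a Raft-style election picks the leader Sentinel. How is the leader chosen?

@ Automatic failover: pick one of the slaves as the new master and make the other slaves replicate from it. How is the new master chosen?

@ Notification: when a monitored Redis server misbehaves, notify the administrator or other applications through an API.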

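* Implementation

@ How Sentinel discovers masters, slaves, and other Sentinels

1. Manual configuration in sentinel.conf: sentinel monitor <master-name> <ip> <port> <quorum> (quorum is the number of votes required) ---> this gives the master.

2. Every 10 seconds, Sentinel sends the INFO command to the master and reads the master's list of slaves from the reply ---> this gives the slaves.

3. Every 2 seconds, Sentinel publishes to the __sentinel__:hello channel on every master and slave: publish __sentinel__:hello "s_ip,s_port,s_runid,s_epoch,m_name,m_ip,m_port,m_epoch". The message carries the publishing Sentinel's own details and the master's details, but nothing about slaves. Any Sentinel subscribed to __sentinel__:hello that receives the message extracts the publishing Sentinel's details and creates or updates the matching entry in the sentinels field of that master's sentinelRedisInstance structure ---> this tells each Sentinel which other Sentinels are monitoring the same master.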
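To make step 3 concrete, here is a minimal Python sketch of parsing a hello message and recording the Sentinel that published it. It is an illustration only, not the Redis implementation: the names Hello, parse_hello and record_sentinel are hypothetical, and keying the table by run ID is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Hello:
    """The eight comma-separated fields of a __sentinel__:hello message."""
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    sentinel_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split the payload into its eight fields and convert the numeric ones."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("expected 8 fields, got %d" % len(parts))
    s_ip, s_port, s_runid, s_epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(s_ip, int(s_port), s_runid, int(s_epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


def record_sentinel(sentinels: Dict[str, Tuple[str, int]],
                    own_runid: str, hello: Hello) -> None:
    """Create or update the entry for the publishing Sentinel.

    Messages published by this Sentinel itself are ignored.
    Keying by run ID is an assumption of this sketch.
    """
    if hello.sentinel_runid == own_runid:
        return
    sentinels[hello.sentinel_runid] = (hello.sentinel_ip, hello.sentinel_port)
```

For example, after parse_hello("127.0.0.1,26379,abc,1,mymaster,127.0.0.1,6379,0"), calling record_sentinel on an empty dict leaves a single entry for run ID "abc".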

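@ How Sentinel communicates with masters, slaves, and other Sentinels

1. Command connection. Between Sentinels there is only a command connection, no subscription connection.

2. Subscription connection: the __sentinel__:hello channel.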

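@ How Sentinel decides that a master is subjectively down

Once per second, Sentinel sends PING to every instance it has a command connection with and uses the reply to judge whether the instance is online. The valid replies are +PONG, -LOADING and -MASTERDOWN. If a valid reply arrives within down-after-milliseconds, the instance is considered online. If only other (invalid) replies arrive within down-after-milliseconds, or no reply arrives at all, the instance is considered down, and Sentinel sets the SRI_S_DOWN flag in the flags field of its sentinelRedisInstance. The down-after-milliseconds value applies equally to every instance associated with that master.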
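The subjective-down rule above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Redis code: is_valid_reply and is_subjectively_down are hypothetical names, and time is passed in as plain millisecond integers.

```python
# The three reply prefixes the text treats as valid.
VALID_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_reply(reply: str) -> bool:
    """True when a PING reply counts as proof that the instance is alive."""
    return reply.startswith(VALID_REPLIES)


def is_subjectively_down(now_ms: int, last_valid_reply_ms: int,
                         down_after_ms: int) -> bool:
    """True when no valid reply has been seen for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms
```

With down_after_ms set to 30000, an instance whose last valid reply arrived 31 seconds ago is flagged as subjectively down, while one that answered 5 seconds ago is not.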

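@ How Sentinel decides that a master is objectively down

After marking a master subjectively down, Sentinel sends every other Sentinel monitoring that master the command sentinel is-master-down-by-addr <ip> <port> <current_epoch> <runid>, where ip and port identify the master. The runid argument is either * (the command only asks whether the master is down) or the sending Sentinel's run ID (the command also requests a vote in the leader election). The receiving Sentinel replies with three values: down_state, leader_runid and leader_epoch. A leader_runid of * means the reply only reports the master's down state; otherwise it is part of the leader election. leader_epoch is meaningful only when leader_runid is not *; when leader_runid is *, leader_epoch is always 0. Once the number of Sentinels that judge the master to be down reaches the quorum value, Sentinel marks the master objectively down. The flags of its sentinelRedisInstance then read SRI_MASTER|SRI_S_DOWN|SRI_O_DOWN.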
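A rough Python sketch of the quorum check follows. The flag names come from the text, but the bit values here are illustrative only, update_objective_down is a hypothetical helper, and counting the local Sentinel's own judgment toward the quorum is an assumption of this sketch.

```python
from typing import Iterable

# Flag names from the text; the bit positions are illustrative only.
SRI_MASTER = 1 << 0
SRI_S_DOWN = 1 << 3
SRI_O_DOWN = 1 << 4


def update_objective_down(flags: int, peer_down_states: Iterable[int],
                          quorum: int) -> int:
    """Return flags with SRI_O_DOWN set or cleared according to the quorum.

    peer_down_states holds the down_state value (1 = down, 0 = up) from each
    other Sentinel's reply to is-master-down-by-addr.
    """
    if not flags & SRI_S_DOWN:
        # Not even subjectively down locally, so it cannot be objectively down.
        return flags & ~SRI_O_DOWN
    # Assumption: the local subjective-down judgment counts as one vote.
    votes = 1 + sum(1 for state in peer_down_states if state == 1)
    if votes >= quorum:
        return flags | SRI_O_DOWN
    return flags & ~SRI_O_DOWN
```

With a quorum of 2, one agreeing peer is enough in this model: update_objective_down(SRI_MASTER | SRI_S_DOWN, [1, 0], 2) returns SRI_MASTER | SRI_S_DOWN | SRI_O_DOWN.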

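@ How the leader Sentinel that fails over an objectively down master is elected

Once the master is objectively down, the Sentinels elect one of themselves to carry out the failover:

# A Sentinel becomes the leader when more than half of the Sentinels have set it as their local leader. Within a given epoch there can be only one leader.

# The source Sentinel sends sentinel is-master-down-by-addr to a target Sentinel with runid set to its own run ID, asking the target to set it as the target's local leader. When the source Sentinel receives the reply, it checks the run ID and epoch in the reply to determine whether the target has set it as local leader.

# Local leaders are assigned on a first-come, first-served basis. Once a target Sentinel has set a local leader, it does not change it for that epoch; the run ID and configuration epoch in its reply to sentinel is-master-down-by-addr are those of its current local leader.

# If no Sentinel is elected leader within a given time limit, the Sentinels hold another election after a delay.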
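The voting rules above can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions: VoteState and is_leader are hypothetical names, the list of replies is assumed to include every vote the candidate has collected (its own included), and only the "more than half" condition from the text is checked.

```python
from typing import Iterable, Optional, Tuple


class VoteState:
    """Per-Sentinel record of which local leader was chosen in which epoch."""

    def __init__(self) -> None:
        self.leader_runid: Optional[str] = None
        self.leader_epoch: int = 0

    def handle_vote_request(self, req_epoch: int,
                            candidate_runid: str) -> Tuple[Optional[str], int]:
        """First come, first served: only a newer epoch can change the vote."""
        if req_epoch > self.leader_epoch:
            self.leader_runid = candidate_runid
            self.leader_epoch = req_epoch
        # The reply always names the current local leader and its epoch.
        return self.leader_runid, self.leader_epoch


def is_leader(my_runid: str, epoch: int,
              replies: Iterable[Tuple[Optional[str], int]],
              total_sentinels: int) -> bool:
    """True when more than half of all Sentinels voted for my_runid in epoch."""
    votes = sum(1 for runid, ep in replies if runid == my_runid and ep == epoch)
    return votes > total_sentinels // 2
```

For example, a VoteState that has already answered a request from "s1" in epoch 1 keeps returning ("s1", 1) to a later request from "s2" in the same epoch, and only switches when a request with epoch 2 arrives.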

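@ Failover by the leader Sentinel

1. Pick one slave of the failed master to be the new master. First discard every slave that is offline, that has not replied to the leader Sentinel's INFO command within the last 5 seconds, or that has been disconnected from the failed master for more than down-after-milliseconds * 10. From the remaining slaves choose the one with the highest priority; if priorities are equal, the one with the largest replication offset; if both priority and replication offset are equal, the one with the smallest run ID.

2. Make all other slaves of the failed master replicate from the new master.

3. Set the failed master to be a slave of the new master, so that when the old master comes back online it becomes a slave of the new master.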
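Step 1 of the failover can be sketched as a filter followed by a sort. This is a minimal Python illustration, not the Redis implementation: the Slave fields and select_slave are hypothetical names, and the sketch assumes, following the slave-priority convention in Redis, that a smaller priority number means a higher priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    runid: str
    online: bool
    last_info_reply_ms: int    # time of the last INFO reply to the leader
    master_link_down_ms: int   # how long the link to the old master was down
    priority: int              # assumption: smaller number = higher priority
    repl_offset: int


def select_slave(slaves: List[Slave], now_ms: int,
                 down_after_ms: int) -> Optional[Slave]:
    """Filter out unsuitable slaves, then pick the best remaining one."""
    candidates = [
        s for s in slaves
        if s.online
        and now_ms - s.last_info_reply_ms <= 5000
        and s.master_link_down_ms <= down_after_ms * 10
    ]
    if not candidates:
        return None
    # Order: best priority, then largest offset, then smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.runid))
```

If two surviving slaves share the same priority and replication offset, the tuple key falls through to the run ID, so the lexicographically smallest run ID wins.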

* Code walkthrough