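This article is based on Redis 5.0.3.
Update 2021-12-24: this tutorial also applies to Redis 6.2.1.
This article starts from an existing Redis cluster and uses the Redis Sentinel mechanism to make that cluster highly available. To learn how to install a Redis Cluster, see: Redis Cluster installation (manual setup && setup with the redis-cli tool)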
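1. What is Sentinel
As the name suggests, a sentinel watches over the health of a Redis system. In Redis this component is called Sentinel. It has two functions:
① Monitor whether the master and slave nodes are running normally
② When a master fails, automatically promote one of its slaves to master, performing the master-slave switchover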
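2. Architecture of a Redis Cluster with Sentinel
Redis Sentinel is the high availability (HA) solution officially recommended by Redis. A sentinel runs as a separate process. After adding a sentinel to the Redis cluster, the architecture looks like the following diagram: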
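The diagram above makes it clear that if the sentinel itself goes down, the Redis cluster is no longer highly available. This is the sentinel single point of failure: a single sentinel cannot give the Redis cluster high availability. So how do we get rid of this single point of failure?
We run several sentinels that share the monitoring task, which keeps the system stable. The sentinels then monitor not only the master and slaves but also each other; this arrangement is called a sentinel cluster. A sentinel cluster has to solve two problems: ① detecting failures in the Redis cluster, and ② agreeing on who decides which node becomes the new master.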
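With a sentinel cluster in place, the sentinels also monitor one another, and the Redis cluster architecture becomes:
Sentinels become linked to one another because they monitor the same master. A newly added sentinel uses pub/sub (publish/subscribe) to announce itself to the other sentinels monitoring the same master, so that the existing sentinels discover the newcomer. The new sentinel then establishes long-lived connections with the other sentinels, and together they keep the Redis cluster highly available.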
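To make the discovery step concrete, here is a minimal in-memory Python model of it. In real Redis, each sentinel periodically (about every two seconds) publishes a hello message on the `__sentinel__:hello` channel of every master and replica it monitors, and subscribes to that same channel to learn about its peers. The `FakeMaster` class standing in for Redis pub/sub and the three-field message are my own simplifications; the real hello payload carries more fields (epoch, master name and address, and so on).

```python
from collections import defaultdict

# Simplified model: the real channel name, but a fake in-process pub/sub.
HELLO_CHANNEL = "__sentinel__:hello"


class FakeMaster:
    """Hypothetical stand-in for the pub/sub facility of a monitored master."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


class Sentinel:
    def __init__(self, ip, port, run_id):
        self.ip, self.port, self.run_id = ip, port, run_id
        self.known_sentinels = {}  # run_id -> (ip, port)

    def watch(self, master):
        master.subscribe(HELLO_CHANNEL, self.on_hello)

    def say_hello(self, master):
        # Only ip, port and run id are modelled; real hellos carry more fields.
        master.publish(HELLO_CHANNEL, f"{self.ip},{self.port},{self.run_id}")

    def on_hello(self, message):
        ip, port, run_id = message.split(",")[:3]
        if run_id != self.run_id:  # ignore our own announcement
            self.known_sentinels[run_id] = (ip, int(port))


if __name__ == "__main__":
    master = FakeMaster("master001")
    s1 = Sentinel("192.168.204.201", 26379, "run-a")
    s2 = Sentinel("192.168.204.202", 26379, "run-b")
    s3 = Sentinel("192.168.204.203", 26379, "run-c")  # the newly added sentinel
    for s in (s1, s2, s3):
        s.watch(master)
    for s in (s1, s2, s3):  # one round of periodic hellos
        s.say_hello(master)
    print(sorted(s1.known_sentinels))  # ['run-b', 'run-c']
```

After one round of hellos every sentinel knows the other two, which is the point at which real sentinels open their connections to each other.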
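3. Detecting master failure in the Redis cluster
We make the Redis cluster highly available by adding sentinels, but how is a master failure actually detected? Each sentinel periodically sends a heartbeat to the master to check whether it is alive. If the master does not reply correctly within the configured time, the sentinel marks it as "subjectively down" and then asks all the other sentinels in the cluster to confirm. When the number of sentinels that agree reaches quorum (at least quorum; quorum is set in the configuration file), the master is considered "objectively down", and the election of a new master begins.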
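The two-stage decision can be sketched as two small predicates. This is a simplified model rather than Sentinel's source: subjectively down (SDOWN) means no acceptable PING reply for longer than `down-after-milliseconds`, and objectively down (ODOWN) means at least `quorum` sentinels, this one included, agree. The function names are hypothetical; the example values mirror the configuration used later in this article (10000 ms, quorum 2).

```python
def is_subjectively_down(last_ok_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """SDOWN: no acceptable PING reply for longer than down-after-milliseconds."""
    return now_ms - last_ok_reply_ms > down_after_ms


def is_objectively_down(agreeing_sentinels: int, quorum: int) -> bool:
    """ODOWN: at least `quorum` sentinels (this one included) report SDOWN."""
    return agreeing_sentinels >= quorum


if __name__ == "__main__":
    DOWN_AFTER_MS = 10_000  # as in "sentinel down-after-milliseconds master001 10000"
    QUORUM = 2              # as in "sentinel monitor master001 ... 2"
    sdown = is_subjectively_down(last_ok_reply_ms=0, now_ms=12_000, down_after_ms=DOWN_AFTER_MS)
    agreeing = int(sdown) + 1  # assume one peer sentinel also reports SDOWN
    print(sdown, is_objectively_down(agreeing, QUORUM))  # True True
```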
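However, if several sentinels in the sentinel cluster find at the same time that the master is objectively down, which sentinel decides which node becomes the new master?
At this point the sentinel cluster must elect one sentinel as leader to make that decision. The election is modeled on the Raft consensus algorithm. Like the ZAB protocol used by ZooKeeper (a close relative of Paxos), Raft is a distributed consensus algorithm, and it is vote-based: a sentinel that wins the votes of more than half of the sentinels becomes the leader and decides which node should become the new master.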
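Below is a heavily simplified Python model of the vote counting, a sketch rather than Sentinel's actual implementation (`elect_leader` and its arguments are hypothetical). Each sentinel grants at most one vote per epoch, to the first candidate that asks. As the comment in the sample sentinel.conf later in this article notes, a candidate needs votes from a majority of the known sentinels; the count must also reach `quorum` when quorum is larger than the majority. If nobody qualifies, the election is retried in a later epoch.

```python
from collections import Counter
from typing import Dict, List, Optional


def elect_leader(known_sentinels: List[str],
                 first_request_seen: Dict[str, str],
                 quorum: int) -> Optional[str]:
    """Simplified one-epoch leader election among sentinels.

    known_sentinels:    every sentinel monitoring this master, reachable or not.
    first_request_seen: voter -> candidate whose vote request reached it first
                        in this epoch (one vote per voter per epoch).
                        Sentinels that did not answer are simply absent.
    """
    votes = Counter(first_request_seen.values())
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    needed = max(len(known_sentinels) // 2 + 1, quorum)  # majority, and at least quorum
    return winner if count >= needed else None  # None: retry in a later epoch


if __name__ == "__main__":
    sentinels = ["s201", "s202", "s203"]
    # s201's request reached s202 first; s203 voted for itself.
    print(elect_leader(sentinels, {"s201": "s201", "s202": "s201", "s203": "s203"}, quorum=2))  # s201
```

Because a voter never splits its vote within an epoch, two candidates can never both reach a majority, which is what makes the winner unique.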
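4. Configuring Sentinel
Starting from a Redis 5.0.3 cluster with 3 masters and 3 slaves deployed on 6 machines, we now configure the sentinels. You can run as many sentinels as you like, but a single sentinel cannot make the Redis Cluster highly available. Two are not enough either: a failover must be authorized by a majority of the sentinels, so if one of two sentinels fails, the remaining one cannot form a majority. Three sentinels is therefore the practical minimum. This article configures 3 sentinels, giving a highly available Redis cluster with 3 masters, 3 slaves and 3 sentinels.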
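Since a failover needs a majority of the sentinels, a quick calculation shows how many sentinel failures each deployment size can survive. This sketch assumes `quorum` is not larger than the majority; the helper names are my own.

```python
def majority(n: int) -> int:
    """Votes needed to authorize a failover among n sentinels."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Sentinels that may be down while the rest can still reach a majority."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} sentinel(s): majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

With 1 or 2 sentinels no sentinel failure is tolerated, 3 sentinels tolerate one and 5 tolerate two, which is why odd counts starting at 3 are the usual choice.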
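After extracting the redis.tar.gz package, you will find a sentinel.conf file in the directory. This is the sentinel configuration file, as shown in the figure below:
The 3 sentinels run on 192.168.204.201, 192.168.204.202 and 192.168.204.203. (Sentinels do not have to run on the servers hosting the 3 masters and 3 slaves; you can also use separate servers.) Note: Redis Sentinel is a special mode of the Redis server and is started from the Redis binaries, so if you deploy sentinels on machines outside the Redis cluster, you must install Redis on those machines first.
Let's first look at the contents of sentinel.conf (feel free to skip it and go straight to the important part below):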
# Example sentinel.conf
# *** IMPORTANT ***
#
# By default Sentinel will not be reachable from interfaces different than
# localhost, either use the 'bind' directive to bind to a list of network
# interfaces, or disable protected mode with "protected-mode no" by
# adding it to this configuration file.
#
# Before doing that MAKE SURE the instance is protected from the outside
# world via firewalling or other means.
#
# For example you may use one of the following:
#
# bind 127.0.0.1 192.168.1.1
#
# protected-mode no
# port <sentinel-port>
# The port that this sentinel instance will run on
port 26379
# By default Redis Sentinel does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis-sentinel.pid when
# daemonized.
daemonize no
# When running daemonized, Redis Sentinel writes a pid file in
# /var/run/redis-sentinel.pid by default. You can specify a custom pid file
# location here.
pidfile /var/run/redis-sentinel.pid
# Specify the log file name. Also the empty string can be used to force
# Sentinel to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile ""
# sentinel announce-ip <ip>
# sentinel announce-port <port>
#
# The above two configuration directives are useful in environments where,
# because of NAT, Sentinel is reachable from outside via a non-local address.
#
# When announce-ip is provided, the Sentinel will claim the specified IP address
# in HELLO messages used to gossip its presence, instead of auto-detecting the
# local address as it usually does.
#
# Similarly when announce-port is provided and is valid and non-zero, Sentinel
# will announce the specified TCP port.
#
# The two options don't need to be used together, if only announce-ip is
# provided, the Sentinel will announce the specified IP and the server port
# as specified by the "port" option. If only announce-port is provided, the
# Sentinel will announce the auto-detected local IP and the specified port.
#
# Example:
#
# sentinel announce-ip 1.2.3.4
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
dir /tmp
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Replicas are auto-discovered, so you don't need to specify replicas in
# any way. Sentinel itself will rewrite this configuration file adding
# the replicas using additional configuration options.
# Also note that the configuration file is rewritten when a
# replica is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor mymaster 127.0.0.1 6379 2
# sentinel auth-pass <master-name> <password>
#
# Set the password to use to authenticate with the master and replicas.
# Useful if there is a password set in the Redis instances to monitor.
#
# Note that the master password is also used for replicas, so it is not
# possible to set a different password in masters and replicas instances
# if you want to be able to monitor these instances with Sentinel.
#
# However you can have Redis instances without the authentication enabled
# mixed with Redis instances requiring the authentication (as long as the
# password set is the same for all the instances requiring the password) as
# the AUTH command will have no effect in Redis instances with authentication
# switched off.
#
# Example:
#
# sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached replica or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
sentinel down-after-milliseconds mymaster 30000
# sentinel parallel-syncs <master-name> <numreplicas>
#
# How many replicas we can reconfigure to point to the new replica simultaneously
# during the failover. Use a low number if you use the replicas to serve query
# to avoid that all the replicas will be unreachable at about the same
# time while performing the synchronization with the master.
sentinel parallel-syncs mymaster 1
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a replica replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted replica).
#
# - The maximum time a failover in progress waits for all the replicas to be
# reconfigured as replicas of the new master. However even after this time
# the replicas will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
sentinel failover-timeout mymaster 180000
# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limit is
# reached the script is terminated with a SIGKILL and the execution retried.
# NOTIFICATION SCRIPT
#
# sentinel notification-script <master-name> <script-path>
#
# Call the specified notification script for any sentinel event that is
# generated in the WARNING level (for instance -sdown, -odown, and so forth).
# This script should notify the system administrator via email, SMS, or any
# other messaging system, that there is something wrong with the monitored
# Redis systems.
#
# The script is called with just two arguments: the first is the event type
# and the second the event description.
#
# The script must exist and be executable in order for sentinel to start if
# this option is provided.
#
# Example:
#
# sentinel notification-script mymaster /var/redis/notify.sh
# CLIENTS RECONFIGURATION SCRIPT
#
# sentinel client-reconfig-script <master-name> <script-path>
#
# When the master changed because of a failover a script can be called in
# order to perform application-specific tasks to notify the clients that the
# configuration has changed and the master is at a different address.
#
# The following arguments are passed to the script:
#
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
#
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
#
# The arguments from-ip, from-port, to-ip, to-port are used to communicate
# the old address of the master and the new address of the elected replica
# (now a master).
#
# This script should be resistant to multiple invocations.
#
# Example:
#
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
# SECURITY
#
# By default SENTINEL SET will not be able to change the notification-script
# and client-reconfig-script at runtime. This avoids a trivial security issue
# where clients can set the script to anything and trigger a failover in order
# to get the program executed.
sentinel deny-scripts-reconfig yes
# REDIS COMMANDS RENAMING
#
# Sometimes the Redis server has certain commands, that are needed for Sentinel
# to work correctly, renamed to unguessable strings. This is often the case
# of CONFIG and SLAVEOF in the context of providers that provide Redis as
# a service, and don't want the customers to reconfigure the instances outside
# of the administration console.
#
# In such case it is possible to tell Sentinel to use different command names
# instead of the normal ones. For example if the master "mymaster", and the
# associated replicas, have "CONFIG" all renamed to "GUESSME", I could use:
#
# SENTINEL rename-command mymaster CONFIG GUESSME
#
# After such configuration is set, every time Sentinel would use CONFIG it will
# use GUESSME instead. Note that there is no actual need to respect the command
# case, so writing "config guessme" is the same in the example above.
#
# SENTINEL SET can also be used in order to perform this configuration at runtime.
#
# In order to set a command back to its original name (undo the renaming), it
# is possible to just rename a command to itself:
#
# SENTINEL rename-command mymaster CONFIG CONFIG
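Now let's configure the sentinel cluster for real. Each sentinel must monitor every master in the cluster (slaves are discovered automatically, so they are not listed). On 192.168.204.201 we configure one sentinel; the main settings are as follows (you can use this configuration as-is):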
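#Port used by Sentinel
port 26379
#Disable protected mode
protected-mode no
#Run as a daemon (in the background)
daemonize yes
#PID file written when running as a daemon
pidfile "/var/run/redis-sentinel.pid"
#Log file name. The default is "", and an empty string makes Sentinel log to standard output. With a file set, you can follow the log with tail -f xxx.log
logfile "/usr/local/lib/redis-5.0.3/redis-sentinel.log"
#Every long-running process should have a well-defined working directory. For Redis Sentinel, changing to /tmp at startup is the simplest way to keep the process from interfering with administrative tasks such as unmounting file systems. (The default is already "/tmp"; just copy it.)
dir "/tmp"
#Here comes the important part
#sentinel monitor <master-name> <ip> <redis-port> <quorum>
#Tells the sentinel to monitor the master at ip:port. master-name can be any name you choose; quorum is the number
#of sentinels that must consider the master down before it is treated as really down. Note that master-ip must be
#a real IP address, not the loopback address (127.0.0.1).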
sentinel monitor master001 192.168.204.201 6379 2
sentinel monitor master002 192.168.204.202 6379 2
sentinel monitor master003 192.168.204.203 6379 2
#sentinel down-after-milliseconds <master-name> <milliseconds>
#How long a master may go without responding before this sentinel subjectively considers it down. Unit: milliseconds; the default is 30 seconds
sentinel down-after-milliseconds master001 10000
sentinel down-after-milliseconds master002 10000
sentinel down-after-milliseconds master003 10000
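#sentinel parallel-syncs <master-name> <numslaves>
#The maximum number of slaves that may resync with the new master at the same time during a failover. The smaller the number, the longer the failover takes; the larger the number, the more slaves are unavailable because of replication. Setting it to 1 (the default) ensures that only one slave at a time cannot serve requests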
sentinel parallel-syncs master001 1
sentinel parallel-syncs master002 1
sentinel parallel-syncs master003 1
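#sentinel failover-timeout <master-name> <milliseconds>
# Failover timeout in milliseconds, default 3 minutes. If a failover has started but has produced no configuration change within this time (for example, the promoted slave has not acknowledged SLAVEOF NO ONE), this sentinel considers the failover failed. A new failover against the same master may be retried after twice this time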
sentinel failover-timeout master001 180000
sentinel failover-timeout master002 180000
sentinel failover-timeout master003 180000
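#Whether to refuse changing notification-script and client-reconfig-script at runtime with SENTINEL SET. The default is yes (refuse).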
sentinel deny-scripts-reconfig yes
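Because the four per-master directives repeat for master001 to master003, a small generator helps keep them consistent when you add masters. The helper below is hypothetical, not part of Redis. It emits the `sentinel monitor` lines first, because Sentinel only accepts the other per-master directives for a master name that has already been declared. It also rejects names outside the charset allowed by sentinel.conf (letters, digits and `.-_`) and loopback addresses, as the comment above requires.

```python
import re

# Hypothetical helper (not part of Redis) for the per-master sentinel.conf lines.
MASTER_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")  # charset allowed for master names

MASTERS = {
    "master001": ("192.168.204.201", 6379),
    "master002": ("192.168.204.202", 6379),
    "master003": ("192.168.204.203", 6379),
}


def render_master_directives(masters, quorum=2, down_after_ms=10000,
                             parallel_syncs=1, failover_timeout_ms=180000):
    for name, (ip, _port) in masters.items():
        if not MASTER_NAME_RE.fullmatch(name):
            raise ValueError(f"invalid master name: {name!r}")
        if ip.startswith("127."):
            raise ValueError(f"{name}: use a real IP address, not loopback")
    lines = []
    # 'sentinel monitor' must come first: the other directives refer to the name.
    for name, (ip, port) in masters.items():
        lines.append(f"sentinel monitor {name} {ip} {port} {quorum}")
    for name in masters:
        lines.append(f"sentinel down-after-milliseconds {name} {down_after_ms}")
    for name in masters:
        lines.append(f"sentinel parallel-syncs {name} {parallel_syncs}")
    for name in masters:
        lines.append(f"sentinel failover-timeout {name} {failover_timeout_ms}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_master_directives(MASTERS), end="")
```

The output can be appended below the common settings (port, daemonize, logfile, dir) to produce the same file as above.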
Once the file is ready, copy it to 192.168.204.202 and 192.168.204.203. Then start the sentinel on each of the 3 servers with the following command:
src/redis-sentinel ./sentinel.conf
The sentinel cluster is now set up.
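To check that each sentinel is running and agrees on the current master, you can send `SENTINEL get-master-addr-by-name` to port 26379; with redis-cli that is `redis-cli -p 26379 sentinel get-master-addr-by-name master001`. The Python sketch below does the same over a raw socket using the RESP protocol, so it needs no client library. It assumes a RESP2 reply small enough to arrive in a single `recv`, and the function names are my own.

```python
import socket
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_string_array(reply: bytes) -> Optional[List[str]]:
    """Parse a RESP2 array of bulk strings; a null array (*-1) becomes None."""
    lines = reply.split(b"\r\n")
    header = lines[0]
    if header == b"*-1":
        return None  # e.g. the sentinel does not know this master name
    if not header.startswith(b"*"):
        raise ValueError(f"unexpected reply: {reply!r}")  # e.g. "-ERR ..."
    count = int(header[1:])
    # Each element takes two lines: "$<length>" followed by the payload.
    return [lines[2 + 2 * i].decode() for i in range(count)]


def get_master_addr(host: str, master_name: str, port: int = 26379,
                    timeout: float = 3.0) -> Optional[List[str]]:
    """Ask one sentinel which address it currently considers the master."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        reply = sock.recv(4096)  # tiny reply; one recv is enough for this sketch
    return parse_string_array(reply)
```

For example, `get_master_addr("192.168.204.201", "master001")` should return the IP and port of master001 as a two-element list; running it against all three sentinels should give the same answer.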
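Note on open ports:
If you deploy the three sentinels on three different servers, be sure to open the Sentinel port (26379 in this configuration) on each of them. Otherwise the sentinels still cannot monitor anything. For how to open a port, see: Opening a specific port on Linux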
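Once the sentinels are running, you can confirm from each server that port 26379 on the others is reachable. A minimal Python check built on `socket.create_connection`; the host list repeats the addresses used in this article, and the helper names are hypothetical:

```python
import socket
from typing import Dict, List

# Addresses used in this article; adjust to your own servers.
SENTINEL_HOSTS = ["192.168.204.201", "192.168.204.202", "192.168.204.203"]
SENTINEL_PORT = 26379


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, no route to host, ...
        return False


def check_all(hosts: List[str] = SENTINEL_HOSTS, port: int = SENTINEL_PORT) -> Dict[str, bool]:
    """Map each host to whether its sentinel port accepts connections."""
    return {host: port_reachable(host, port) for host in hosts}
```

Call `check_all()` on each of the three servers. A `False` usually means the firewall still blocks the port or the sentinel on that host is not running.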
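5. Redis cluster HA test
We use master node 201 and slave node 204 as the example.
If you configured the sentinels to run in the background, you can follow the sentinel log with tail -f xxx.log. All three sentinels log the same content, so looking at the sentinel log on server 201 is enough.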
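Now we manually shut down the master on 201. The sentinels automatically switch node 204 from the slave role to the master role, as shown in the figure below:
The sentinel appears unresponsive for a while because it is waiting to see whether the heartbeat times out, which is how it decides that the master is down. The length of this wait is set by down-after-milliseconds in sentinel.conf (10000 ms in our configuration).
Node 204 has now become the master. When we restart the original master on 201, it becomes a slave of the new master, 204.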
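The switchover observed above can be modelled in a few lines. This is a simplified Python sketch of what the leader sentinel does, with hypothetical names. It picks the replica to promote by `replica-priority` (lower wins, 0 means never promote), then by the larger replication offset, then by the lexicographically smaller run ID. It repoints the remaining replicas in batches of `parallel-syncs`, and when the old master comes back it turns it into a replica of the new master. In the 3-master/3-slave layout of this article each master has a single replica, so the batching step is trivial there.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    addr: str
    role: str = "replica"
    master: Optional[str] = None  # address this node replicates from
    priority: int = 100           # replica-priority; 0 means "never promote"
    offset: int = 0               # replication offset
    run_id: str = ""
    online: bool = True


def pick_replica(replicas: List[Node]) -> Optional[Node]:
    candidates = [r for r in replicas if r.online and r.priority > 0]
    if not candidates:
        return None
    # Lower priority first, then the larger offset, then the smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.run_id))


def failover(replicas: List[Node], parallel_syncs: int = 1) -> Tuple[Node, List[List[Node]]]:
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be >= 1")
    new_master = pick_replica(replicas)
    if new_master is None:
        raise RuntimeError("no replica can be promoted")
    new_master.role, new_master.master = "master", None  # like SLAVEOF NO ONE
    others = [r for r in replicas if r is not new_master]
    batches = [others[i:i + parallel_syncs] for i in range(0, len(others), parallel_syncs)]
    # A real sentinel waits for each batch to finish syncing before the next;
    # this sketch only records the batches and repoints the replicas.
    for batch in batches:
        for r in batch:
            r.master = new_master.addr
    return new_master, batches


def on_old_master_return(old_master: Node, new_master: Node) -> None:
    """A returning old master is reconfigured as a replica of the new one."""
    old_master.online = True
    old_master.role, old_master.master = "replica", new_master.addr
```

With `m201 = Node("192.168.204.201:6379", role="master", online=False)` and `r204 = Node("192.168.204.204:6379", master=m201.addr)`, `failover([r204])` promotes 204, and `on_old_master_return(m201, r204)` makes 201 its replica, matching the test above.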