Redis (5) - Redis Sentinel Mechanism

cleancp

Category: Redis

Article link: https://blog.csdn.net/cleancp/article/details/113795307

Preface
- Concepts
- Rationale
I. Sentinel Mechanism
II. Redis Sentinel
- 1. Installation and Deployment
- 2. Testing the Sentinel Mechanism

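Preface

This article builds on Redis persistence, master-replica setup and configuration, replication, and the various replication topologies. For those topics, see: https://blog.csdn.net/qq_41453285/article/details/103354554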

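Concepts

An instance's reply to the PING command falls into one of two categories:
- Valid reply: the instance returns one of +PONG, -LOADING, or -MASTERDOWN.
- Invalid reply: the instance returns anything other than +PONG, -LOADING, or -MASTERDOWN, or returns nothing within the configured time limit.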
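
The valid/invalid distinction above can be sketched in a few lines of Python. This is only an illustration of the rule, not Sentinel's actual code; the function name `is_valid_ping_reply` is hypothetical, and `None` is an assumed way to model "no reply within the time limit".

```python
from typing import Optional

# Replies that count as valid according to the list above.
VALID_PING_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_ping_reply(reply: Optional[str]) -> bool:
    """Return True for a valid reply; None models a timeout (no reply at all)."""
    if reply is None:
        return False
    # str.startswith accepts a tuple and checks each prefix in turn.
    return reply.strip().startswith(VALID_PING_REPLIES)


print(is_valid_ping_reply("+PONG"))
print(is_valid_ping_reply("-LOADING Redis is loading the dataset in memory"))
print(is_valid_ping_reply("-ERR unknown command"))
print(is_valid_ping_reply(None))
```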
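Subjectively Down (sdown): a single Sentinel marks a Redis instance as subjectively down when it receives no valid reply from that instance within down-after-milliseconds.
Sentinel leader election: the Sentinels elect a leader among themselves to carry out the failover. Each Sentinel votes for the first candidate that asks for its vote in a given epoch, so the Sentinel that first notices the failure and requests votes is the most likely winner.
Objectively Down (odown): after marking the master as sdown, a Sentinel asks the other Sentinels that monitor the same master for their own judgment. When the number of Sentinels that consider the master down reaches quorum, the master is marked objectively down.
Failover: once the objective-down condition is met, the master is judged to have failed and failover begins. The leader Sentinel sends SLAVEOF NO ONE to the selected replica to make it the new master, then points the remaining replicas at it with SLAVEOF <new master>. It keeps monitoring the old master; when that node comes back, it is reconfigured as a replica of the new master. Finally, the application is notified of the new master address.
INFO: info replication shows the replication state of the current Redis node: whether it is a master, which replicas are attached, and so on.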

# info output of 6381 after 6379 was killed and 6381 became the new master
[root@localhost redis]# ./redis-cli -h 192.168.42.119 -p 6381 -a 12345678 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.42.119,port=6380,state=online,offset=615344,lag=1
master_replid:81c2d503e26da087e4739af6b1d26597b7c7df4c
master_replid2:3f0eb9d8a3e432be3a5a925f91496426b84582eb
master_repl_offset:615344
second_repl_offset:240214
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:615344
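
If you want to inspect this output programmatically, the following minimal sketch turns `info replication` text into a dictionary. The function name `parse_info_replication` is hypothetical, and the sample is a shortened copy of the output above; it assumes the `key:value` layout shown there.

```python
def parse_info_replication(text: str) -> dict:
    """Parse `INFO replication` output into a dict; slaveN lines become nested dicts."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Replication" header
        key, _, value = line.partition(":")
        if key.startswith("slave") and "=" in value:
            # e.g. ip=192.168.42.119,port=6380,state=online,offset=615344,lag=1
            info[key] = dict(field.split("=", 1) for field in value.split(","))
        else:
            info[key] = value
    return info


SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.42.119,port=6380,state=online,offset=615344,lag=1
master_repl_offset:615344
"""

info = parse_info_replication(SAMPLE)
print(info["role"], info["connected_slaves"], info["slave0"]["port"])
```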

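Rationale

Every 10 seconds, each Sentinel sends INFO to all nodes
1. To obtain master and replica information; this is how a Sentinel learns about the replicas
2. To notice newly added nodes promptly
3. To update the topology promptly when a node becomes unavailable
Every 2 seconds, each Sentinel publishes to and subscribes on a channel of the master
The channel has a fixed name on the master (__sentinel__:hello). Each Sentinel publishes its own information and its view of the monitored master, and receives what the other Sentinels publish
1. To discover newly added Sentinels and connect to them
2. To learn the other Sentinels' judgment of the master
Every second, each Sentinel sends PING to the other Sentinels and to all nodes
1. To detect whether a node has failed
2. To detect whether a Sentinel has failed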
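
As a rough illustration of how the three intervals interleave, here is a simplified Python model. It is a sketch only: a real Sentinel keeps timers per monitored instance rather than a single global clock, and the names `TASK_INTERVALS` and `tasks_due` are hypothetical.

```python
# Intervals in seconds, taken from the three tasks described above.
TASK_INTERVALS = {
    "INFO to master and replicas": 10,
    "publish/subscribe hello message": 2,
    "PING to nodes and other Sentinels": 1,
}


def tasks_due(second: int) -> list:
    """Return the tasks that fire at the given whole second since start."""
    return [name for name, every in TASK_INTERVALS.items() if second % every == 0]


for t in range(1, 11):
    print(t, tasks_due(t))
```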

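sentinel.conf settings

# The Redis master to monitor. mymaster is an alias for the master; quorum is the number of Sentinels that must agree before the master is marked objectively down. With 2, two Sentinels must both consider it subjectively down.
# sentinel monitor mymaster masterip masterport quorum
sentinel monitor mymaster 127.0.0.1 6379 2

# sentinel auth-pass <master-name> <password>: password for accessing the Redis master
sentinel auth-pass mymaster 12345678

# Default is 30 seconds. If mymaster gives no valid reply for 30 seconds, this Sentinel marks it subjectively down
sentinel down-after-milliseconds mymaster 30000

# After objective down, the leader Sentinel performs the failover and selects a new master; the remaining replicas then start replicating from it. This limits the number of replicas that resync with the new master at the same time to 1
sentinel parallel-syncs mymaster 1
# Failover timeout: 3 minutes
sentinel failover-timeout mymaster 180000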
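
To check what a sentinel.conf actually configures, the directives can be collected per master with a small parser. This is a minimal sketch, assuming the `sentinel <directive> <master-name> <args>` layout used above; `parse_sentinel_conf` is a hypothetical helper, not part of Redis.

```python
def parse_sentinel_conf(text: str) -> dict:
    """Collect `sentinel <directive> <master-name> <args...>` lines per master name."""
    masters = {}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 4 or parts[0] != "sentinel":
            continue  # skips blank lines, comments, and non-sentinel directives
        directive, name, args = parts[1], parts[2], parts[3:]
        cfg = masters.setdefault(name, {})
        if directive == "monitor":
            if len(args) == 3:
                cfg["host"], cfg["port"], cfg["quorum"] = args[0], int(args[1]), int(args[2])
        elif directive in ("down-after-milliseconds", "parallel-syncs", "failover-timeout"):
            cfg[directive] = int(args[0])
        else:
            cfg[directive] = args[0]
    return masters


CONF = """\
# sentinel monitor mymaster masterip masterport quorum
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel auth-pass mymaster 12345678
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""

cfg = parse_sentinel_conf(CONF)["mymaster"]
print(cfg["quorum"], cfg["down-after-milliseconds"] / 1000, "seconds")
```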

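redis.conf settings

# Replica priority used during failover. A replica with a lower value is preferred for promotion; 0 means the replica is never promoted
# By default the priority is 100.
slave-priority 100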
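
The way slave-priority feeds into the choice of a new master can be sketched as a sort. This is a simplified model: it assumes the documented ordering (lower nonzero priority first, then larger replication offset, then smaller run ID) and leaves out the health checks Sentinel applies beforehand. The `Replica` class, `pick_new_master`, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    priority: int      # slave-priority from redis.conf; 0 means "never promote"
    repl_offset: int   # how much of the master's replication stream was processed
    run_id: str


def pick_new_master(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the promotion candidate, or None if no replica is eligible."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lower priority wins, then the higher offset, then the smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


# Hypothetical values for illustration only.
replicas = [
    Replica("192.168.42.119:6380", priority=100, repl_offset=240100, run_id="bbb"),
    Replica("192.168.42.119:6381", priority=100, repl_offset=240214, run_id="aaa"),
    Replica("192.168.42.119:6382", priority=0, repl_offset=999999, run_id="ccc"),
]
best = pick_new_master(replicas)
print(best.name if best else "no eligible replica")
```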

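I. Sentinel Mechanism

1. What is the Sentinel mechanism

In short: the Sentinel mechanism sits on top of Redis master-replica replication and monitors the running state of every node. If the master becomes unavailable, Sentinel picks the best replica as the new master (ranking candidates by replica priority, then replication offset, then run ID) so that a working master is always available and the service stays highly available.
Sentinels PING the other Sentinels and the Redis master and replica nodes once per second.
How it works: when the master fails, Redis Sentinel automatically detects the failure, performs the failover, and notifies the application, which provides high availability.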

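2. Purpose of the three periodic monitoring tasks

PING checks whether nodes are working normally; publish/subscribe lets Sentinels exchange their own information and their view of the master; INFO retrieves master and replica information.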

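3. Subjective down

Subjective down: a single Sentinel's own judgment that a node is down, made when the node gives no valid reply to PING within down-after-milliseconds.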

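4. Objective down

Objective down: determined by the configured quorum (the number of Sentinels that must agree). When at least quorum Sentinels consider the master down, it is marked objectively down.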
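
The quorum rule can be written as a one-line check. A minimal sketch, with the hypothetical function `is_objectively_down` and hypothetical Sentinel names as dictionary keys:

```python
def is_objectively_down(sdown_votes: dict, quorum: int) -> bool:
    """sdown_votes maps a Sentinel name to whether it considers the master down."""
    agreeing = sum(1 for down in sdown_votes.values() if down)
    return agreeing >= quorum


# Three Sentinels, quorum 2: two agreeing Sentinels are enough.
votes = {"sentinel-26379": True, "sentinel-26380": True, "sentinel-26381": False}
print(is_objectively_down(votes, quorum=2))
```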

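5. Leader Sentinel election

SENTINEL is-master-down-by-addr: the command Sentinels send one another to ask whether the master is down; the same command also carries vote requests during leader election.
In general, the Sentinel that detects the failure first becomes the leader and handles the failover, but the Sentinels decide this through their own voting mechanism.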
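
The voting can be modelled roughly as follows. This is a sketch under stated assumptions, not Sentinel's implementation: each Sentinel gives its vote for an epoch to the first candidate that asks, and a candidate wins only with votes from a majority of Sentinels and at least quorum votes. The function `elect_leader` and the short Sentinel names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def elect_leader(vote_requests: Iterable[Tuple[str, str]],
                 sentinels: List[str],
                 quorum: int) -> Optional[str]:
    """vote_requests: (candidate, voter) pairs in arrival order for one epoch."""
    votes = {}  # voter -> candidate it voted for
    for candidate, voter in vote_requests:
        if voter in sentinels:
            # The first request a voter sees in this epoch gets its vote.
            votes.setdefault(voter, candidate)
    tally = Counter(votes.values())
    needed = max(len(sentinels) // 2 + 1, quorum)
    for candidate, count in tally.most_common():
        if count >= needed:
            return candidate
    return None  # no winner; a new election would run in a later epoch


sentinels = ["26379", "26380", "26381"]
# 26380 votes for itself first, then collects the other two votes.
requests = [("26380", "26380"), ("26380", "26381"), ("26380", "26379")]
print(elect_leader(requests, sentinels, quorum=2))
```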

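6. Sentinel mechanism: failover flow A

[Figure: failover flow, step A]

7. Sentinel mechanism: failover flow B

[Figure: failover flow, step B]

8. Sentinel mechanism: failover flow C

[Figure: failover flow, step C]

9. Sentinel mechanism: topology after failover D

[Figure: topology after failover]

10. Sentinel mechanism: detailed failover flow

[Figure: detailed failover flow]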

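II. Redis Sentinel

1. Installation and Deployment

# Start the Redis master
# Set masterauth on it as well: when 6379 goes down and 6381 becomes the new master, Sentinel reconfigures the restarted 6379 as a replica, and 6379 then needs masterauth to replicate from 6381; without it the data cannot be synchronized (one of the inconveniences of using Sentinel)
./redis-server redis.conf &> ./logs/6379/redis.log &
# Start the Redis replicas
# Both are configured as replicas of 6379
./redis-server redis6380.conf &> ./logs/6380/redis.log &
./redis-server redis6381.conf &> ./logs/6381/redis.log &

# Edit sentinel.conf
# protected-mode keeps Redis from being reached from the public network. It takes effect when two conditions hold: no bind address is configured and no password is set. When it is in effect, Redis can only be reached through the loopback address (127.0.0.1); connections from other hosts receive an error
protected-mode no
# The master to monitor and the quorum (two Sentinels must mark it subjectively down before it is marked objectively down)
sentinel monitor mymaster 192.168.42.119 6381 2
# Password for the monitored Redis master
sentinel auth-pass mymaster 12345678
# Start the Sentinels
./redis-sentinel sentinel.conf &> ./logs/26379/sentinel.log &
./redis-sentinel sentinel26380.conf &> ./logs/26380/sentinel.log &
./redis-sentinel sentinel26381.conf &> ./logs/26381/sentinel.log &
# Or
./redis-server sentinel26381.conf --sentinel &> ./logs/26381/sentinel.log &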

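2. Testing the Sentinel Mechanism

Kill the master on 6379
The Sentinels elect a leader: 26380 (Sentinel ID c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 in the logs)

Log of master 6379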

1593:M 21 Feb 08:49:34.150 # Server initialized
1593:M 21 Feb 08:49:34.150 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1593:M 21 Feb 08:49:34.150 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1593:M 21 Feb 08:49:34.150 * DB loaded from append only file: 0.000 seconds
1593:M 21 Feb 08:49:34.150 * Ready to accept connections
1593:M 21 Feb 08:50:20.767 * Slave 192.168.42.119:6380 asks for synchronization
1593:M 21 Feb 08:50:20.767 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '9d949f851c961a8e875859e3e92dcec5e712aebb', my replication IDs are 'e5dbfa1d3ff1088ee599da661277523920b0eeba' and '0000000000000000000000000000000000000000')
1593:M 21 Feb 08:50:20.767 * Starting BGSAVE for SYNC with target: disk
1593:M 21 Feb 08:50:20.768 * Background saving started by pid 1603
1603:C 21 Feb 08:50:20.771 * DB saved on disk
1603:C 21 Feb 08:50:20.772 * RDB: 6 MB of memory used by copy-on-write
1593:M 21 Feb 08:50:20.790 * Background saving terminated with success
1593:M 21 Feb 08:50:20.790 * Synchronization with slave 192.168.42.119:6380 succeeded
1593:M 21 Feb 08:50:28.114 * Slave 192.168.42.119:6381 asks for synchronization
1593:M 21 Feb 08:50:28.114 * Partial resynchronization request from 192.168.42.119:6381 accepted. Sending 14 bytes of backlog starting from offset 1.

Log of Sentinel 26379

1608:X 21 Feb 08:51:17.819 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1608:X 21 Feb 08:51:17.819 # Redis version=4.0.6, bits=64, commit=00000000, modified=0, pid=1608, just started
1608:X 21 Feb 08:51:17.819 # Configuration loaded
1608:X 21 Feb 08:51:17.820 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1608:X 21 Feb 08:51:17.821 * Running mode=sentinel, port=26379.
1608:X 21 Feb 08:51:17.821 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1608:X 21 Feb 08:51:17.821 # Sentinel ID is d077d6a0f1c0677745a91a9db31a2469f12bb7e4
1608:X 21 Feb 08:51:17.821 # +monitor master mymaster 192.168.42.119 6379 quorum 2
1608:X 21 Feb 09:11:06.857 # +sdown master mymaster 192.168.42.119 6379
1608:X 21 Feb 09:11:06.908 # +new-epoch 1
1608:X 21 Feb 09:11:06.909 # +vote-for-leader c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 1
1608:X 21 Feb 09:11:06.924 # +odown master mymaster 192.168.42.119 6379 #quorum 3/2
1608:X 21 Feb 09:11:06.925 # Next failover delay: I will not start a failover before Sun Feb 21 09:17:07 2021
1608:X 21 Feb 09:11:07.386 # +config-update-from sentinel c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 192.168.42.119 26380 @ mymaster 192.168.42.119 6379
1608:X 21 Feb 09:11:07.386 # +switch-master mymaster 192.168.42.119 6379 192.168.42.119 6381
1608:X 21 Feb 09:11:07.386 * +slave slave 192.168.42.119:6380 192.168.42.119 6380 @ mymaster 192.168.42.119 6381
1608:X 21 Feb 09:11:07.386 * +slave slave 192.168.42.119:6379 192.168.42.119 6379 @ mymaster 192.168.42.119 6381
1608:X 21 Feb 09:11:37.468 # +sdown slave 192.168.42.119:6379 192.168.42.119 6379 @ mymaster 192.168.42.119 6381
# Replica 6382 added to master 6381
1612:X 21 Feb 10:25:36.561 * +slave slave 192.168.42.119:6382 192.168.42.119 6382 @ mymaster 192.168.42.119 6381

Log of Sentinel 26380

  26380 has been elected failover leader and carries out the subsequent failover steps

1612:X 21 Feb 08:51:27.826 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1612:X 21 Feb 08:51:27.826 # Redis version=4.0.6, bits=64, commit=00000000, modified=0, pid=1612, just started
1612:X 21 Feb 08:51:27.826 # Configuration loaded
1612:X 21 Feb 08:51:27.827 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1612:X 21 Feb 08:51:27.828 * Running mode=sentinel, port=26380.
1612:X 21 Feb 08:51:27.828 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1612:X 21 Feb 08:51:27.828 # Sentinel ID is c2f14aa1cc3553d52ddd7e4651629a6aa47ca937
1612:X 21 Feb 08:51:27.828 # +monitor master mymaster 192.168.42.119 6379 quorum 2
# Master marked subjectively down
1612:X 21 Feb 09:11:06.840 # +sdown master mymaster 192.168.42.119 6379
# Quorum reached: master marked objectively down
1612:X 21 Feb 09:11:06.903 # +odown master mymaster 192.168.42.119 6379 #quorum 2/2
1612:X 21 Feb 09:11:06.903 # +new-epoch 1
# Preparing to fail over
1612:X 21 Feb 09:11:06.903 # +try-failover master mymaster 192.168.42.119 6379
# Electing the failover leader
1612:X 21 Feb 09:11:06.904 # +vote-for-leader c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 1
1612:X 21 Feb 09:11:06.909 # c4b6779f832848544e74b7420e3671b70690b0ad voted for c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 1
1612:X 21 Feb 09:11:06.910 # d077d6a0f1c0677745a91a9db31a2469f12bb7e4 voted for c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 1
1612:X 21 Feb 09:11:06.980 # +elected-leader master mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:06.980 # +failover-state-select-slave master mymaster 192.168.42.119 6379
# Selecting the new master from the replicas
1612:X 21 Feb 09:11:07.082 # +selected-slave slave 192.168.42.119:6381 192.168.42.119 6381 @ mymaster 192.168.42.119 6379
# The selected replica 6381 is promoted to master
1612:X 21 Feb 09:11:07.082 * +failover-state-send-slaveof-noone slave 192.168.42.119:6381 192.168.42.119 6381 @ mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:07.135 * +failover-state-wait-promotion slave 192.168.42.119:6381 192.168.42.119 6381 @ mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:07.292 # +promoted-slave slave 192.168.42.119:6381 192.168.42.119 6381 @ mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:07.293 # +failover-state-reconf-slaves master mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:07.383 * +slave-reconf-sent slave 192.168.42.119:6380 192.168.42.119 6380 @ mymaster 192.168.42.119 6379
# The odown state of the old master 6379 is cleared (-odown)
1612:X 21 Feb 09:11:07.972 # -odown master mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:08.285 * +slave-reconf-inprog slave 192.168.42.119:6380 192.168.42.119 6380 @ mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:08.285 * +slave-reconf-done slave 192.168.42.119:6380 192.168.42.119 6380 @ mymaster 192.168.42.119 6379
# Failover finished
1612:X 21 Feb 09:11:08.369 # +failover-end master mymaster 192.168.42.119 6379
# The Sentinel switches its master configuration
1612:X 21 Feb 09:11:08.369 # +switch-master mymaster 192.168.42.119 6379 192.168.42.119 6381
1612:X 21 Feb 09:11:08.369 * +slave slave 192.168.42.119:6380 192.168.42.119 6380 @ mymaster 192.168.42.119 6381
1612:X 21 Feb 09:11:08.369 * +slave slave 192.168.42.119:6379 192.168.42.119 6379 @ mymaster 192.168.42.119 6381
# 6379, now tracked as a replica, is marked subjectively down
1612:X 21 Feb 09:11:38.373 # +sdown slave 192.168.42.119:6379 192.168.42.119 6379 @ mymaster 192.168.42.119 6381
# Replica 6382 added to master 6381
1612:X 21 Feb 10:25:36.561 * +slave slave 192.168.42.119:6382 192.168.42.119 6382 @ mymaster 192.168.42.119 6381

Log of Sentinel 26381

  26381 does the same as 26379: it takes part in judging whether the master is down

1616:X 21 Feb 08:51:37.461 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1616:X 21 Feb 08:51:37.461 # Redis version=4.0.6, bits=64, commit=00000000, modified=0, pid=1616, just started
1616:X 21 Feb 08:51:37.461 # Configuration loaded
1616:X 21 Feb 08:51:37.462 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1616:X 21 Feb 08:51:37.463 * Running mode=sentinel, port=26381.
1616:X 21 Feb 08:51:37.463 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1616:X 21 Feb 08:51:37.463 # Sentinel ID is c4b6779f832848544e74b7420e3671b70690b0ad
1616:X 21 Feb 08:51:37.463 # +monitor master mymaster 192.168.42.119 6379 quorum 2
1616:X 21 Feb 09:11:06.809 # +sdown master mymaster 192.168.42.119 6379
1616:X 21 Feb 09:11:06.907 # +new-epoch 1
1616:X 21 Feb 09:11:06.909 # +vote-for-leader c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 1
1616:X 21 Feb 09:11:07.385 # +config-update-from sentinel c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 192.168.42.119 26380 @ mymaster 192.168.42.119 6379
1616:X 21 Feb 09:11:07.385 # +switch-master mymaster 192.168.42.119 6379 192.168.42.119 6381
1616:X 21 Feb 09:11:07.385 * +slave slave 192.168.42.119:6380 192.168.42.119 6380 @ mymaster 192.168.42.119 6381
1616:X 21 Feb 09:11:07.385 * +slave slave 192.168.42.119:6379 192.168.42.119 6379 @ mymaster 192.168.42.119 6381
1616:X 21 Feb 09:11:37.417 # +sdown slave 192.168.42.119:6379 192.168.42.119 6379 @ mymaster 192.168.42.119 6381
# Replica 6382 added to master 6381
1612:X 21 Feb 10:25:36.561 * +slave slave 192.168.42.119:6382 192.168.42.119 6382 @ mymaster 192.168.42.119 6381
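
Sentinel logs like the ones above are easier to compare once the event lines are extracted. The sketch below pulls out timestamped `+event` / `-event` lines and measures the time from the first +sdown to +failover-end. It assumes the Redis 4.0 log layout shown here (`pid:role day month time level message`); `parse_events` and `LINE_RE` are hypothetical names, and the sample lines are copied from the 26380 log.

```python
import re
from datetime import datetime

# pid:role, timestamp, one log-level character, then an event such as +sdown.
LINE_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[XMSC]) "
    r"(?P<ts>\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"[#*\-.] (?P<event>[+-][\w-]+)(?: (?P<details>.*))?$"
)


def parse_events(log_text: str) -> list:
    """Return (timestamp, event, details) tuples; non-event lines are skipped."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            # The log has no year, so datetime falls back to its default year.
            ts = datetime.strptime(m["ts"], "%d %b %H:%M:%S.%f")
            events.append((ts, m["event"], m["details"] or ""))
    return events


SAMPLE = """\
1612:X 21 Feb 09:11:06.840 # +sdown master mymaster 192.168.42.119 6379
1612:X 21 Feb 09:11:06.903 # +odown master mymaster 192.168.42.119 6379 #quorum 2/2
1612:X 21 Feb 09:11:06.909 # c4b6779f832848544e74b7420e3671b70690b0ad voted for c2f14aa1cc3553d52ddd7e4651629a6aa47ca937 1
1612:X 21 Feb 09:11:08.369 # +failover-end master mymaster 192.168.42.119 6379
"""

events = parse_events(SAMPLE)
start = next(ts for ts, ev, _ in events if ev == "+sdown")
end = next(ts for ts, ev, _ in events if ev == "+failover-end")
print(f"failover took {(end - start).total_seconds():.3f} s")
```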

Log of 6381 being promoted to master

1604:S 21 Feb 09:11:06.007 * MASTER <-> SLAVE sync started
1604:S 21 Feb 09:11:06.007 # Error condition on socket for SYNC: Connection refused
1604:S 21 Feb 09:11:07.023 * Connecting to MASTER 192.168.42.119:6379
1604:S 21 Feb 09:11:07.023 * MASTER <-> SLAVE sync started
1604:S 21 Feb 09:11:07.023 # Error condition on socket for SYNC: Connection refused
1604:M 21 Feb 09:11:07.135 # Setting secondary replication ID to 3f0eb9d8a3e432be3a5a925f91496426b84582eb, valid up to offset: 240214. New replication ID is 81c2d503e26da087e4739af6b1d26597b7c7df4c
1604:M 21 Feb 09:11:07.135 * Discarding previously cached master state.
1604:M 21 Feb 09:11:07.135 * MASTER MODE enabled (user request from 'id=5 addr=192.168.42.119:46220 fd=11 name=sentinel-c2f14aa1-cmd age=1180 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
1604:M 21 Feb 09:11:07.137 # CONFIG REWRITE executed with success.
1604:M 21 Feb 09:11:07.782 * Slave 192.168.42.119:6380 asks for synchronization
1604:M 21 Feb 09:11:07.782 * Partial resynchronization request from 192.168.42.119:6380 accepted. Sending 166 bytes of backlog starting from offset 240214.

Log of 6380 syncing with the new master

1599:S 21 Feb 09:11:05.755 * MASTER <-> SLAVE sync started
1599:S 21 Feb 09:11:05.755 # Error condition on socket for SYNC: Connection refused
1599:S 21 Feb 09:11:06.769 * Connecting to MASTER 192.168.42.119:6379
1599:S 21 Feb 09:11:06.769 * MASTER <-> SLAVE sync started
1599:S 21 Feb 09:11:06.769 # Error condition on socket for SYNC: Connection refused
1599:S 21 Feb 09:11:07.383 * SLAVE OF 192.168.42.119:6381 enabled (user request from 'id=6 addr=192.168.42.119:57452 fd=11 name=sentinel-c2f14aa1-cmd age=1180 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=143 qbuf-free=32625 obl=36 oll=0 omem=0 events=r cmd=exec')
1599:S 21 Feb 09:11:07.384 # CONFIG REWRITE executed with success.
1599:S 21 Feb 09:11:07.781 * Connecting to MASTER 192.168.42.119:6381
1599:S 21 Feb 09:11:07.781 * MASTER <-> SLAVE sync started
1599:S 21 Feb 09:11:07.781 * Non blocking connect for SYNC fired the event.
1599:S 21 Feb 09:11:07.782 * Master replied to PING, replication can continue...
1599:S 21 Feb 09:11:07.782 * Trying a partial resynchronization (request 3f0eb9d8a3e432be3a5a925f91496426b84582eb:240214).
1599:S 21 Feb 09:11:07.783 * Successful partial resynchronization with master.
1599:S 21 Feb 09:11:07.783 # Master replication ID changed to 81c2d503e26da087e4739af6b1d26597b7c7df4c
1599:S 21 Feb 09:11:07.783 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
