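I. Concepts
1. Why use Sentinel mode
Plain Redis master-replica replication does not provide high availability by itself: when the master fails, a replica has to be promoted by hand. Redis provides the Sentinel mechanism to close this gap.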
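2. What Sentinel mode is
One or more Sentinel processes monitor any number of masters and all replicas (called slaves in Redis 3.0 configuration and logs) under each master. When a monitored master goes offline, Sentinel automatically promotes one of that master's replicas to be the new master, and the new master takes over from the failed master. Sentinels also monitor each other.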
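3. How Sentinel mode works
(1) Every Sentinel process sends a PING command once per second to the masters, the replicas, and the other Sentinel processes in the deployment.
(2) If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, the Sentinel marks that instance as subjectively down (SDOWN).
(3) If a master is marked SDOWN, every Sentinel monitoring that master checks once per second whether it has really entered the subjectively down state.
(4) When enough Sentinels (at least the number set in the configuration file) confirm within the specified time window that the master is SDOWN, the master is marked as objectively down (ODOWN).
(5) Under normal conditions, every Sentinel sends an INFO command to all masters and replicas once every 10 seconds.
(6) When a master is marked ODOWN, Sentinels send INFO to all replicas of that master once per second instead of once every 10 seconds.
(7) If not enough Sentinels agree that the master is down, its ODOWN state is cleared. If the master again returns a valid reply to a Sentinel's PING, its SDOWN state is cleared.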
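The SDOWN and ODOWN rules in the list above can be sketched in a few lines of Python. This is only an illustrative model, not Sentinel's actual implementation: the `SentinelView` class and both function names are hypothetical, and timestamps are plain millisecond integers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SentinelView:
    """One Sentinel's local view of a monitored master (hypothetical model)."""
    last_valid_reply_ms: int  # time of the last valid PING reply, in ms


def is_sdown(view: SentinelView, now_ms: int, down_after_ms: int) -> bool:
    # Subjectively down: no valid PING reply for longer than
    # down-after-milliseconds.
    return now_ms - view.last_valid_reply_ms > down_after_ms


def is_odown(views: list[SentinelView], now_ms: int,
             down_after_ms: int, quorum: int) -> bool:
    # Objectively down: at least `quorum` Sentinels see the master as SDOWN.
    votes = sum(is_sdown(v, now_ms, down_after_ms) for v in views)
    return votes >= quorum


# Three Sentinels; two of them have not heard from the master for > 30 s.
views = [SentinelView(1_000), SentinelView(25_000), SentinelView(2_000)]
print(is_odown(views, now_ms=40_000, down_after_ms=30_000, quorum=2))
```

With a quorum of 2 the master is considered ODOWN in this example; raising the quorum to 3 would keep it at SDOWN only, because one Sentinel still received a recent reply.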
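4. Advantages of Sentinel mode
(1) Sentinel mode builds on master-replica replication, so it keeps all of replication's advantages.
(2) Master and replica roles switch automatically, which makes the system more robust and more available.
5. Disadvantages of Sentinel mode
Redis is hard to scale out online in this mode: once the deployment reaches its capacity limit, expanding it without downtime becomes complicated.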
II. Building a Redis deployment with a single Sentinel
1. Nodes
master (192.168.xxx.21): initial master
slaves1 (192.168.xxx.22): initial replica
slaves2 (192.168.xxx.23): initial replica
2. Main settings in master's redis.conf
daemonize yes
port 6379
bind 192.168.xxx.21
requirepass "123456"
3. Main settings in master's sentinel.conf
port 26379
sentinel monitor mymaster 192.168.xxx.21 6379 1
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
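Notes:
sentinel monitor <master-group-name> <hostname> <port> <quorum>:
quorum is the minimum number of Sentinels that must agree the master is unreachable before it is marked objectively down (ODOWN) and a failover is attempted. The quorum only governs failure detection; actually carrying out the failover also requires authorization from a majority of the Sentinels.
down-after-milliseconds:
How many milliseconds a Redis instance may stay unreachable before the Sentinel considers it down.
parallel-syncs:
How many replicas are reconfigured at the same time to connect to the new master and resync after a failover. The lower the number, the longer the whole process takes. Suppose Redis runs 1 master and 4 replicas and the master goes down. One of the 4 replicas is promoted to master, and the remaining 3 replicas have to attach to the new master:
If parallel-syncs is 1, the 3 replicas attach to the new master one at a time; each one finishes syncing its data from the new master before the next one starts.
If parallel-syncs is 3, all replicas attach to the new master at once.
failover-timeout:
The timeout, in milliseconds, for performing a failover.
sentinel auth-pass:
The password Sentinel uses to authenticate with the master; it must match the master's requirepass.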
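To make the effect of parallel-syncs concrete, here is a minimal sketch that groups the remaining replicas into rounds of at most parallel-syncs members. The function name and replica names are hypothetical, and the fixed rounds are a simplification: a real Sentinel may start the next replica as soon as a slot frees up.

```python
def reconfig_batches(replicas, parallel_syncs):
    """Split the remaining replicas into groups that resync at the same time."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    return [
        replicas[i:i + parallel_syncs]
        for i in range(0, len(replicas), parallel_syncs)
    ]


# The 3 replicas left after one of the original 4 was promoted.
remaining = ["replica-a", "replica-b", "replica-c"]
for n in (1, 3):
    print(n, reconfig_batches(remaining, n))
```

With parallel-syncs set to 1 there are three rounds of one replica each; with 3 there is a single round, which is faster but leaves no replica serving reads while the resync runs.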
4. Main settings in slaves1's redis.conf
daemonize yes
port 6379
bind 192.168.xxx.22
requirepass 123456
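# Host and port of the master to replicate from (the hostname "master" must resolve to 192.168.xxx.21)
slaveof master 6379
# Password used to authenticate with the master
masterauth 123456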
5. Main settings in slaves2's redis.conf
daemonize yes
port 6379
bind 192.168.xxx.23
requirepass 123456
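# Host and port of the master to replicate from (the hostname "master" must resolve to 192.168.xxx.21)
slaveof master 6379
# Password used to authenticate with the master
masterauth 123456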
6. Start Redis and Sentinel on master
[root@master bin]# ./redis-server ./redis.conf
[root@master bin]# ./redis-cli -h 192.168.xxx.21 -a 123456
[root@master bin]# ./redis-sentinel /opt/softWare/redis3.0/redis-3.0.0/sentinel.conf
7325:X 27 Mar 15:24:00.102 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.0.0 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 7325
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
7325:X 27 Mar 15:24:00.103 # Sentinel runid is 64976baecdc6b9f5d39ee3de67ea124ba8441551
7325:X 27 Mar 15:24:00.103 # +monitor master mymaster 192.168.xxx.21 6379 quorum 1
7. Start the Redis replica service on slaves1 and slaves2
[root@slaves1 bin]# ./redis-server ./redis.conf
[root@slaves1 bin]# ./redis-cli -h 192.168.xxx.22 -a 123456
[root@slaves2 bin]# ./redis-server ./redis.conf
[root@slaves2 bin]# ./redis-cli -h 192.168.xxx.23 -a 123456
8. Verify the roles
192.168.xxx.21:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.xxx.22,port=6379,state=online,offset=9381,lag=1
slave1:ip=192.168.xxx.23,port=6379,state=online,offset=9381,lag=1
master_repl_offset:9524
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:9523
192.168.xxx.22:6379> info replication
# Replication
role:slave
master_host:192.168.xxx.21
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:10253
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
192.168.xxx.23:6379> info replication
# Replication
role:slave
master_host:192.168.xxx.21
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:10696
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
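The role check above can also be automated. The following sketch parses the text of an `INFO replication` reply and tests whether a node looks like a healthy replica of the expected master. The function names are hypothetical, and the sample text is a shortened copy of the output shown above.

```python
def parse_info(text):
    """Parse INFO output (key:value lines, '#' section headers) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        info[key] = value
    return info


def check_replica(info, expected_master):
    # A healthy replica reports role:slave, the expected master host,
    # and a replication link that is up.
    return (
        info.get("role") == "slave"
        and info.get("master_host") == expected_master
        and info.get("master_link_status") == "up"
    )


sample = """# Replication
role:slave
master_host:192.168.xxx.21
master_port:6379
master_link_status:up
"""
print(check_replica(parse_info(sample), "192.168.xxx.21"))
```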
9. Simulate a failure of the Redis service on slaves1
[root@slaves1 ~]# netstat -lnp | grep 6379
tcp 0 0 192.168.xxx.22:6379 0.0.0.0:* LISTEN 6658/./redis-server
[root@slaves1 ~]# kill -9 6658
[root@slaves1 ~]# netstat -lnp | grep 6379
Sentinel output:
7325:X 27 Mar 15:32:09.916 # +sdown slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
At this point:
192.168.xxx.21:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.xxx.23,port=6379,state=online,offset=32280,lag=0
master_repl_offset:32280
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:32279
192.168.xxx.23:6379> info replication
# Replication
role:slave
master_host:192.168.xxx.21
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:33881
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
10. Restore the Redis service on slaves1
[root@slaves1 bin]# ./redis-server ./redis.conf
[root@slaves1 bin]# ./redis-cli -h 192.168.xxx.22 -a 123456
Sentinel output:
7325:X 27 Mar 15:34:16.091 * +reboot slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:34:16.162 # -sdown slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
At this point:
192.168.xxx.21:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.xxx.23,port=6379,state=online,offset=40622,lag=0
slave1:ip=192.168.xxx.22,port=6379,state=online,offset=40622,lag=0
master_repl_offset:40765
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:40764
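The `+sdown` and `-sdown` events from steps 9 and 10 can be followed programmatically. This is a small sketch, assuming log lines in the Redis 3.0 format shown above; the regular expression and function name are hypothetical helpers, not part of Redis.

```python
import re

# Matches e.g. "+sdown slave <name> <ip> <port>" or "-sdown master <name> <ip> <port>".
EVENT_RE = re.compile(
    r"(?P<sign>[+-])sdown (?P<role>\S+) (?P<name>\S+) (?P<ip>\S+) (?P<port>\d+)"
)


def track_sdown(lines):
    """Return the set of (ip, port) instances still marked SDOWN."""
    down = set()
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not an sdown event
        instance = (m["ip"], int(m["port"]))
        if m["sign"] == "+":
            down.add(instance)
        else:
            down.discard(instance)
    return down


log = [
    "7325:X 27 Mar 15:32:09.916 # +sdown slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379",
    "7325:X 27 Mar 15:34:16.162 # -sdown slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379",
]
print(track_sdown(log[:1]))
print(track_sdown(log))
```

After the first line alone, slaves1 is in the SDOWN set; after both lines the set is empty again, which matches the recovery seen in step 10.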
11. Simulate a failure of the Redis master service on master
[root@master ~]# netstat -lnp | grep 6379
tcp 0 0 0.0.0.0:26379 0.0.0.0:* LISTEN 7325/./redis-sentin
tcp 0 0 192.168.xxx.21:6379 0.0.0.0:* LISTEN 7294/./redis-server
tcp6 0 0 :::26379 :::* LISTEN 7325/./redis-sentin
[root@master ~]# kill -9 7294
[root@master ~]# netstat -lnp | grep 6379
tcp 0 0 0.0.0.0:26379 0.0.0.0:* LISTEN 7325/./redis-sentin
tcp6 0 0 :::26379 :::* LISTEN 7325/./redis-sentin
Sentinel output:
7325:X 27 Mar 15:36:45.697 # +sdown master mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:45.697 # +odown master mymaster 192.168.xxx.21 6379 #quorum 1/1
7325:X 27 Mar 15:36:45.697 # +new-epoch 1
7325:X 27 Mar 15:36:45.697 # +try-failover master mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:45.698 # +vote-for-leader 64976baecdc6b9f5d39ee3de67ea124ba8441551 1
7325:X 27 Mar 15:36:45.698 # +elected-leader master mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:45.698 # +failover-state-select-slave master mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:45.789 # +selected-slave slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:45.789 * +failover-state-send-slaveof-noone slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:45.866 * +failover-state-wait-promotion slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:46.735 # +promoted-slave slave 192.168.xxx.22:6379 192.168.xxx.22 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:46.735 # +failover-state-reconf-slaves master mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:46.794 * +slave-reconf-sent slave 192.168.xxx.23:6379 192.168.xxx.23 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:47.785 * +slave-reconf-inprog slave 192.168.xxx.23:6379 192.168.xxx.23 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:47.785 * +slave-reconf-done slave 192.168.xxx.23:6379 192.168.xxx.23 6379 @ mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:47.884 # +failover-end master mymaster 192.168.xxx.21 6379
7325:X 27 Mar 15:36:47.884 # +switch-master mymaster 192.168.xxx.21 6379 192.168.xxx.22 6379
7325:X 27 Mar 15:36:47.885 * +slave slave 192.168.xxx.23:6379 192.168.xxx.23 6379 @ mymaster 192.168.xxx.22 6379
7325:X 27 Mar 15:36:47.888 * +slave slave 192.168.xxx.21:6379 192.168.xxx.21 6379 @ mymaster 192.168.xxx.22 6379
7325:X 27 Mar 15:37:17.925 # +sdown slave 192.168.xxx.21:6379 192.168.xxx.21 6379 @ mymaster 192.168.xxx.22 6379
The log shows that after the master went down, Sentinel promoted the replica at xxx.22 to be the new master.
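The key line in that log is `+switch-master`, which carries the old and the new master address. As a rough sketch, a monitoring script could extract both like this; the regular expression and function name are hypothetical, and the line format is assumed to be the one shown in the log above.

```python
import re

SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def parse_switch_master(line):
    """Return the master name plus old and new addresses, or None."""
    m = SWITCH_RE.search(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "old": (m["old_ip"], int(m["old_port"])),
        "new": (m["new_ip"], int(m["new_port"])),
    }


line = ("7325:X 27 Mar 15:36:47.884 # +switch-master mymaster "
        "192.168.xxx.21 6379 192.168.xxx.22 6379")
print(parse_switch_master(line))
```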
At this point:
192.168.xxx.22:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.xxx.23,port=6379,state=online,offset=2068,lag=1
master_repl_offset:2068
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:2067
192.168.xxx.23:6379> info replication
# Replication
role:slave
master_host:192.168.xxx.22
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:8929
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
12. Start the Redis service on master again
[root@master bin]# ./redis-server ./redis.conf
[root@master bin]# ./redis-cli -h 192.168.xxx.21 -a 123456
Sentinel output:
7325:X 27 Mar 15:40:01.285 * +convert-to-slave slave 192.168.xxx.21:6379 192.168.xxx.21 6379 @ mymaster 192.168.xxx.22 6379
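The Redis service on master has now become a replica. Its master_link_status is still down in the output below, and the new master lists only xxx.23 as a connected replica. If the link stays down, one likely cause is that master's redis.conf from step 2 sets no masterauth, which the node needs to authenticate with the new master now that it is a replica: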
192.168.xxx.21:6379> info replication
# Replication
role:slave
master_host:192.168.xxx.22
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1585294864
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
192.168.xxx.22:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.xxx.23,port=6379,state=online,offset=21765,lag=0
master_repl_offset:21908
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:21907
192.168.xxx.23:6379> info replication
# Replication
role:slave
master_host:192.168.xxx.22
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:23523
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
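Since the master address changes after a failover, clients usually ask Sentinel for the current master instead of hard-coding it. Sentinel answers the `SENTINEL get-master-addr-by-name` command for this purpose. The sketch below speaks the Redis wire protocol over a plain socket using only the standard library. The helper names are hypothetical, the reply parsing handles just this one reply shape, and it assumes the whole reply arrives in a single `recv`, so treat it as an illustration rather than a client.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def decode_addr_reply(reply):
    """Decode the two-element array reply (ip, port) into a tuple."""
    if reply.startswith(b"*-1"):
        return None  # null reply: Sentinel does not know this master name
    parts = reply.split(b"\r\n")
    # parts looks like [b"*2", b"$<len>", ip, b"$<len>", port, b""]
    return parts[2].decode(), int(parts[4])


def get_master_addr(sentinel_host, sentinel_port, master_name, timeout=2.0):
    # Assumption: the Sentinel itself requires no password, as in this setup.
    with socket.create_connection((sentinel_host, sentinel_port), timeout=timeout) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return decode_addr_reply(sock.recv(4096))


print(encode_command("SENTINEL", "get-master-addr-by-name", "mymaster"))
```

Against the Sentinel from this walkthrough, the call would be `get_master_addr("<sentinel ip>", 26379, "mymaster")`; after step 11 it should report the address of the promoted node rather than the original master.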