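Redis version: 5.0.5
Server: CentOS 7
Deployment: Redis runs inside a Docker container on the server
The deployment itself is omitted here; we connect to the server with Xshell and start from there.
We use a one-master, two-replica layout: one master, two replicas and three sentinels. When the master goes down, the sentinels vote and promote one of the replicas to master. Even if the original master is restarted later, it can only rejoin as a replica.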
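The role change described above can be modeled in a few lines. This is a hypothetical sketch that tracks roles only. It does not talk to Redis, and the `failover` helper is made up for illustration.

```python
# Hypothetical role model of the 1 master / 2 replicas layout described above.
# It only tracks roles; nothing here talks to a Redis server.

def failover(roles, failed_master, promoted):
    """Return a new role map after `promoted` replaces `failed_master`."""
    if roles[failed_master] != "master" or roles[promoted] != "replica":
        raise ValueError("can only promote a replica over the current master")
    new_roles = dict(roles)
    new_roles[promoted] = "master"
    # The old master stays in the map as a replica: if it is restarted,
    # it rejoins under the new master instead of reclaiming its old role.
    new_roles[failed_master] = "replica"
    return new_roles

roles = {6379: "master", 6380: "replica", 6381: "replica"}
after = failover(roles, failed_master=6379, promoted=6381)
print(after)  # {6379: 'replica', 6380: 'replica', 6381: 'master'}
```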
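Master: redis-6379.conf
#IP address to listen on
bind 127.0.0.1
#Port to listen on
port 6379
#Do not run as a daemon (stay in the foreground)
daemonize no
#Number of databases
databases 16
#Take a snapshot once 10 seconds have passed since the last save and at least 2 keys have changed
save 10 2
#Compress strings with LZF when dumping the .rdb file
rdbcompression yes
#Append a checksum to the RDB file and verify it on load
rdbchecksum yes
#Name of the RDB dump file (written inside dir)
dbfilename "dump-6379.rdb"
#Working directory
dir "/root/redis-5.0.5/data"
#Enable AOF (append-only file) persistence
appendonly yes
#Name of the append-only file
appendfilename "appendonly-6379.aof"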
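The `save 10 2` rule is easy to misread as "2 changes within 10 seconds". Below is a simplified sketch of how such a rule is evaluated. It is not Redis source code, and it leaves out the retry logic Redis applies after a failed save.

```python
# Simplified sketch of how "save <seconds> <changes>" rules such as
# "save 10 2" decide whether to take a snapshot. Not Redis source code.

def should_snapshot(rules, seconds_since_last_save, changes_since_last_save):
    """rules: list of (seconds, changes) pairs, one per 'save' line."""
    for seconds, changes in rules:
        # A rule fires when enough keys changed AND more than `seconds`
        # have passed since the last successful save.
        if (changes_since_last_save >= changes
                and seconds_since_last_save > seconds):
            return True
    return False

rules = [(10, 2)]  # from "save 10 2"
print(should_snapshot(rules, 11, 2))   # True: enough time and enough changes
print(should_snapshot(rules, 11, 1))   # False: only one change
print(should_snapshot(rules, 5, 100))  # False: not enough time has passed
```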
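Replicas
redis-6380.conf
port 6380
daemonize no
dir "/root/redis-5.0.5/data"
#Use a separate RDB file name, since both replicas share the same dir
dbfilename "dump-6380.rdb"
#IP and port of the master (Redis 5 also accepts replicaof)
slaveof 127.0.0.1 6379
redis-6381.conf
port 6381
daemonize no
dir "/root/redis-5.0.5/data"
dbfilename "dump-6381.rdb"
#IP and port of the master
slaveof 127.0.0.1 6379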
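Because the replica files differ only in their port, you can render them from one template so the port inside each file always matches the file name. The sketch below is a hypothetical helper. It gives each replica its own `dbfilename` so the two replicas, which share the same `dir`, do not both write `dump.rdb`.

```python
# Hypothetical generator for the replica config files. The port in each
# file always matches the port in its file name (redis-6380.conf -> 6380).

TEMPLATE = """port {port}
daemonize no
dir "/root/redis-5.0.5/data"
dbfilename "dump-{port}.rdb"
#IP and port of the master
slaveof 127.0.0.1 6379
"""

def render_replica_configs(ports):
    """Map each config file name to its rendered contents."""
    return {f"redis-{p}.conf": TEMPLATE.format(port=p) for p in ports}

configs = render_replica_configs([6380, 6381])
for name, text in configs.items():
    print(f"--- {name}\n{text}")
```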
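Three sentinels
sentinel-26379.conf
port 26379
dir "/root/redis-5.0.5/data"
#Monitor the master named mymaster at 127.0.0.1:6379; it is judged down once 2 sentinels agree it is down (quorum = 2)
sentinel monitor mymaster 127.0.0.1 6379 2
#Consider the master down if it does not reply for 30 seconds
sentinel down-after-milliseconds mymaster 30000
#During failover, reconfigure only one replica at a time to sync with the new master
sentinel parallel-syncs mymaster 1
#Failover timeout of 3 minutes: it bounds how long replicas have to be reconfigured, and a new failover of the same master is delayed by twice this value
sentinel failover-timeout mymaster 180000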
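The `down-after-milliseconds` and quorum settings feed two different states. SDOWN is one sentinel's own opinion. ODOWN is reached when at least `quorum` sentinels agree. The sketch below simplifies this and leaves out how the sentinels exchange their opinions.

```python
# Simplified model of the two "down" states a sentinel uses.
# SDOWN: this sentinel got no valid reply for longer than down-after-milliseconds.
# ODOWN: at least `quorum` sentinels (this one included) report SDOWN.

DOWN_AFTER_MS = 30000  # sentinel down-after-milliseconds mymaster 30000
QUORUM = 2             # last argument of "sentinel monitor mymaster ... 2"

def is_sdown(ms_since_last_valid_reply, down_after_ms=DOWN_AFTER_MS):
    return ms_since_last_valid_reply > down_after_ms

def is_odown(sdown_flags, quorum=QUORUM):
    """sdown_flags: one boolean per sentinel, True if it sees the master as SDOWN."""
    return sum(sdown_flags) >= quorum

print(is_sdown(10000))                  # False
print(is_sdown(31000))                  # True
print(is_odown([True, False, False]))   # False: only 1 of the 2 required
print(is_odown([True, True, False]))    # True
```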
sentinel-26380.conf
port 26380
dir "/root/redis-5.0.5/data"
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel-26381.conf
port 26381
dir "/root/redis-5.0.5/data"
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
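Startup order: first the master, then two sentinels, then the two replicas, and finally the last sentinel.
We open nine terminal windows here.
We start the processes in the order above.
1. Master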
[root@c5a48c86a09c src]# ./redis-server ../conf/redis-6379.conf
2. Sentinel 1
[root@c5a48c86a09c src]# ./redis-sentinel ../conf/sentinel-26379.conf
378:X 24 Dec 2019 15:34:57.343 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
378:X 24 Dec 2019 15:34:57.343 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=378, just started
378:X 24 Dec 2019 15:34:57.343 # Configuration loaded
(Redis ASCII-art startup banner: Redis 5.0.5, 64 bit, running in sentinel mode, port 26379, PID 378, http://redis.io)
378:X 24 Dec 2019 15:34:57.347 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
378:X 24 Dec 2019 15:34:57.351 # Sentinel ID is 37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b
378:X 24 Dec 2019 15:34:57.351 # +monitor master mymaster 127.0.0.1 6379 quorum 2
The last log line shows that the sentinel has detected our master.
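To check lines like `+monitor` with a script instead of reading them by eye, a small parser is enough. This is a hypothetical sketch. It assumes the `<pid>:<role> <day> <mon> <year> <time> <level> <message>` layout seen in the logs in this post.

```python
# Hypothetical parser for Redis log lines in the layout seen above:
# "<pid>:<role> <day> <mon> <year> <time> <level> <message>"

def parse_log_line(line):
    """Split one Redis log line into its fields."""
    pid, rest = line.split(":", 1)
    role, day, month, year, time, level, message = rest.split(" ", 6)
    return {
        "pid": int(pid),
        "role": role,     # "X" = sentinel, "M" = master, "S" = replica
        "timestamp": f"{day} {month} {year} {time}",
        "level": level,   # "#" = warning, "*" = notice
        "message": message,
    }

def parse_monitor(message):
    """Parse '+monitor master <name> <ip> <port> quorum <n>'."""
    event, kind, name, ip, port, _, quorum = message.split()
    if event != "+monitor" or kind != "master":
        raise ValueError(f"not a +monitor event: {message!r}")
    return name, ip, int(port), int(quorum)

line = ("378:X 24 Dec 2019 15:34:57.351 # "
        "+monitor master mymaster 127.0.0.1 6379 quorum 2")
entry = parse_log_line(line)
print(entry["pid"], entry["level"])     # 378 #
print(parse_monitor(entry["message"]))  # ('mymaster', '127.0.0.1', 6379, 2)
```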
3. Sentinel 2
[root@c5a48c86a09c src]# ./redis-sentinel ../conf/sentinel-26380.conf
390:X 24 Dec 2019 15:37:54.234 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
390:X 24 Dec 2019 15:37:54.234 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=390, just started
390:X 24 Dec 2019 15:37:54.234 # Configuration loaded
(Redis ASCII-art startup banner: Redis 5.0.5, 64 bit, running in sentinel mode, port 26380, PID 390, http://redis.io)
390:X 24 Dec 2019 15:37:54.235 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
390:X 24 Dec 2019 15:37:54.286 # Sentinel ID is 37384e489ad3e40481c708453ade40784ae3ddec
390:X 24 Dec 2019 15:37:54.286 # +monitor master mymaster 127.0.0.1 6379 quorum 2
390:X 24 Dec 2019 15:37:54.540 * +sentinel sentinel 37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
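Looking at the log: the sentinel first initializes itself and gets its own Sentinel ID, then detects the master, and then discovers sentinel 1. If we now check sentinel 1's log, we find that it has also discovered sentinel 2, so discovery works in both directions.
4. Replica 1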
[root@c5a48c86a09c src]# ./redis-server ../conf/redis-6380.conf
New log lines appear on the master, sentinel 1 and sentinel 2.
Master:
364:M 24 Dec 2019 15:46:46.158 * Synchronization with replica 127.0.0.1:6380 succeeded
The master now knows that a replica has connected to it.
Sentinel 1 and sentinel 2 each log:
378:X 24 Dec 2019 15:46:50.121 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
The sentinels have learned that a replica was added and is connected to the master.
5. Replica 2
[root@c5a48c86a09c src]# ./redis-server ../conf/redis-6381.conf
Again, the logs show that the processes have detected one another.
6. Sentinel 3
[root@c5a48c86a09c src]# ./redis-sentinel ../conf/sentinel-26381.conf
429:X 24 Dec 2019 15:55:15.091 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
429:X 24 Dec 2019 15:55:15.091 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=429, just started
429:X 24 Dec 2019 15:55:15.091 # Configuration loaded
(Redis ASCII-art startup banner: Redis 5.0.5, 64 bit, running in sentinel mode, port 26381, PID 429, http://redis.io)
429:X 24 Dec 2019 15:55:15.092 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
429:X 24 Dec 2019 15:55:15.142 # Sentinel ID is a53e4a6605a0488f3dc46692086df069cae21a84
429:X 24 Dec 2019 15:55:15.142 # +monitor master mymaster 127.0.0.1 6379 quorum 2
429:X 24 Dec 2019 15:55:15.143 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 15:55:15.145 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 15:55:15.531 * +sentinel sentinel 37384e489ad3e40481c708453ade40784ae3ddec 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 15:55:15.822 * +sentinel sentinel 37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
Likewise, all three sentinels now know about each other, and sentinel 3 has also detected the master-replica structure.
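Sentinel 3's startup log is enough to rebuild the whole topology it sees. The sketch below is hypothetical. It reads the `+monitor`, `+slave` and `+sentinel` lines pasted above and assumes their token positions stay as shown.

```python
# Hypothetical reconstruction of the topology from sentinel 3's log lines above.

LOG = """\
429:X 24 Dec 2019 15:55:15.142 # +monitor master mymaster 127.0.0.1 6379 quorum 2
429:X 24 Dec 2019 15:55:15.143 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 15:55:15.145 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 15:55:15.531 * +sentinel sentinel 37384e489ad3e40481c708453ade40784ae3ddec 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 15:55:15.822 * +sentinel sentinel 37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
"""

def topology_from_log(text):
    topo = {"master": None, "replicas": [], "sentinels": []}
    for line in text.splitlines():
        message = line.split(" ", 6)[6]   # drop "<pid>:<role> <date> <time> <level>"
        tokens = message.split()
        addr = f"{tokens[3]}:{tokens[4]}"  # ip and port sit at the same positions
        if tokens[0] == "+monitor":
            topo["master"] = addr
        elif tokens[0] == "+slave":
            topo["replicas"].append(addr)
        elif tokens[0] == "+sentinel":
            topo["sentinels"].append(addr)
    return topo

print(topology_from_log(LOG))
```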
Now we shut down the master.
[root@c5a48c86a09c ~]# ps -ef | grep -i redis
root 364 126 0 15:32 pts/2 00:00:05 ./redis-server 127.0.0.1:6379
root 378 180 0 15:34 pts/4 00:00:07 ./redis-sentinel *:26379 [sentinel]
root 390 144 0 15:37 pts/1 00:00:06 ./redis-sentinel *:26380 [sentinel]
root 402 162 0 15:46 pts/3 00:00:02 ./redis-server *:6380
root 417 198 0 15:52 pts/5 00:00:01 ./redis-server *:6381
root 429 216 0 15:55 pts/6 00:00:01 ./redis-sentinel *:26381 [sentinel]
root 434 234 0 15:59 pts/7 00:00:00 grep --color=auto -i redis
[root@c5a48c86a09c ~]# kill -9 364
[root@c5a48c86a09c ~]# ps -ef | grep -i redis
root 378 180 0 15:34 pts/4 00:00:07 ./redis-sentinel *:26379 [sentinel]
root 390 144 0 15:37 pts/1 00:00:07 ./redis-sentinel *:26380 [sentinel]
root 402 162 0 15:46 pts/3 00:00:02 ./redis-server *:6380
root 417 198 0 15:52 pts/5 00:00:01 ./redis-server *:6381
root 429 216 0 15:55 pts/6 00:00:02 ./redis-sentinel *:26381 [sentinel]
root 436 234 0 16:00 pts/7 00:00:00 grep --color=auto -i redis
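The second ps listing confirms that the master is gone, and the master's window reports that its process was killed.
The two replicas keep logging warnings (they can no longer reach the master) and then return to normal.
Here is sentinel 1's new log: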
378:X 24 Dec 2019 16:00:50.026 # +sdown master mymaster 127.0.0.1 6379
378:X 24 Dec 2019 16:00:50.200 # +new-epoch 1
378:X 24 Dec 2019 16:00:50.204 # +vote-for-leader a53e4a6605a0488f3dc46692086df069cae21a84 1
378:X 24 Dec 2019 16:00:51.132 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
378:X 24 Dec 2019 16:00:51.132 # Next failover delay: I will not start a failover before Tue Dec 24 16:06:51 2019
378:X 24 Dec 2019 16:00:51.381 # +config-update-from sentinel a53e4a6605a0488f3dc46692086df069cae21a84 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
378:X 24 Dec 2019 16:00:51.381 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
378:X 24 Dec 2019 16:00:51.381 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
378:X 24 Dec 2019 16:00:51.382 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
378:X 24 Dec 2019 16:01:21.467 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
It subjectively detects (+sdown) that the original master on 6379 is down. Next, sentinel 2's log:
390:X 24 Dec 2019 16:00:50.072 # +sdown master mymaster 127.0.0.1 6379
390:X 24 Dec 2019 16:00:50.144 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
390:X 24 Dec 2019 16:00:50.144 # +new-epoch 1
390:X 24 Dec 2019 16:00:50.144 # +try-failover master mymaster 127.0.0.1 6379
390:X 24 Dec 2019 16:00:50.195 # +vote-for-leader 37384e489ad3e40481c708453ade40784ae3ddec 1
390:X 24 Dec 2019 16:00:50.197 # a53e4a6605a0488f3dc46692086df069cae21a84 voted for a53e4a6605a0488f3dc46692086df069cae21a84 1
390:X 24 Dec 2019 16:00:50.204 # 37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b voted for a53e4a6605a0488f3dc46692086df069cae21a84 1
390:X 24 Dec 2019 16:00:51.383 # +config-update-from sentinel a53e4a6605a0488f3dc46692086df069cae21a84 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
390:X 24 Dec 2019 16:00:51.383 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
390:X 24 Dec 2019 16:00:51.383 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
390:X 24 Dec 2019 16:00:51.383 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
390:X 24 Dec 2019 16:01:21.417 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
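It also detects that the master is subjectively down. It then reaches the objective-down state (+odown), tries to start a failover (+try-failover) and votes for itself.
Now let's look at sentinel 3's log: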
429:X 24 Dec 2019 16:00:50.080 # +sdown master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:50.156 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
429:X 24 Dec 2019 16:00:50.156 # +new-epoch 1
429:X 24 Dec 2019 16:00:50.156 # +try-failover master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:50.195 # +vote-for-leader a53e4a6605a0488f3dc46692086df069cae21a84 1
429:X 24 Dec 2019 16:00:50.197 # 37384e489ad3e40481c708453ade40784ae3ddec voted for 37384e489ad3e40481c708453ade40784ae3ddec 1
429:X 24 Dec 2019 16:00:50.204 # 37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b voted for a53e4a6605a0488f3dc46692086df069cae21a84 1
429:X 24 Dec 2019 16:00:50.251 # +elected-leader master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:50.251 # +failover-state-select-slave master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:50.323 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:50.323 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:50.378 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:51.349 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:51.349 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:51.379 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.331 # -odown master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.432 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.432 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.498 # +failover-end master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.498 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
429:X 24 Dec 2019 16:00:52.499 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
429:X 24 Dec 2019 16:00:52.499 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
429:X 24 Dec 2019 16:01:22.503 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
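It also detects that the master is down.
At this point the three sentinels exchange information. Once at least the quorum of 2 sentinels consider the master subjectively down (+sdown), they conclude that it is objectively down (+odown). They then vote for a leader and tell each other their votes. Sentinel 2 and sentinel 3 each voted for themselves, while sentinel 1 voted for sentinel 3. Sentinel 3 therefore won a majority (2 of 3 votes) and became the leader (+elected-leader), and it then enters the failover phase. From here on we focus on sentinel 3's log. It first identifies the failed master, 6379. It then selects 6381 as the replacement (+selected-slave) and sends it SLAVEOF NO ONE to turn it into a master. Finally, it waits for and confirms 6381's promotion (+promoted-slave).
On replica 2 (6381), we can see it enable MASTER mode and rewrite its own configuration file (CONFIG REWRITE), becoming the master: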
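The vote can be tallied directly from the "voted for" and "+vote-for-leader" lines. The sketch below follows the rule as I understand it from the Redis Sentinel documentation and is simplified. The winner needs a majority of the known sentinels and also at least `quorum` votes.

```python
from collections import Counter

# Sentinel IDs taken from the logs above.
S1 = "37cd4c72d5ebf2b09a9b3d6e2da15d63e6e0fb2b"  # sentinel 1, port 26379
S2 = "37384e489ad3e40481c708453ade40784ae3ddec"  # sentinel 2, port 26380
S3 = "a53e4a6605a0488f3dc46692086df069cae21a84"  # sentinel 3, port 26381

# Epoch-1 votes from the logs: voter ID -> candidate ID.
votes = {S1: S3, S2: S2, S3: S3}

def elect_leader(votes, total_sentinels, quorum):
    """Return the winning candidate, or None if nobody has enough votes."""
    winner, count = Counter(votes.values()).most_common(1)[0]
    needed = max(total_sentinels // 2 + 1, quorum)  # majority, and at least quorum
    return winner if count >= needed else None

leader = elect_leader(votes, total_sentinels=3, quorum=2)
print(leader == S3)  # True: sentinel 3 holds 2 of the 3 votes
```

If every sentinel had voted for itself, no candidate would reach 2 votes, and the sentinels would retry the election later.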
417:M 24 Dec 2019 16:00:50.379 # Setting secondary replication ID to bf8d230ef4d260d36ba5173bab1ccd93e7857bde, valid up to offset: 127109. New replication ID is cffee72c703a90a1d28d82556b7d9b763c29c206
417:M 24 Dec 2019 16:00:50.379 * Discarding previously cached master state.
417:M 24 Dec 2019 16:00:50.379 * MASTER MODE enabled (user request from 'id=8 addr=127.0.0.1:52768 fd=13 name=sentinel-a53e4a66-cmd age=335 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
417:M 24 Dec 2019 16:00:50.380 # CONFIG REWRITE executed with success.
417:M 24 Dec 2019 16:00:51.757 * Replica 127.0.0.1:6380 asks for synchronization
417:M 24 Dec 2019 16:00:51.757 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 422 bytes of backlog starting from offset 127109.
Back to sentinel 3's log:
429:X 24 Dec 2019 16:00:51.349 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:51.379 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
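It enters the replica-reconfiguration phase and tells 6380 to replicate from the new master.
On replica 1 (6380), we can see it rewrite its configuration file and perform a partial (incremental) resynchronization with the new master: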
402:S 24 Dec 2019 16:00:51.380 * REPLICAOF 127.0.0.1:6381 enabled (user request from 'id=8 addr=127.0.0.1:36114 fd=13 name=sentinel-a53e4a66-cmd age=336 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=281 qbuf-free=32487 obl=36 oll=0 omem=0 events=r cmd=exec')
402:S 24 Dec 2019 16:00:51.380 # CONFIG REWRITE executed with success.
402:S 24 Dec 2019 16:00:51.756 * Connecting to MASTER 127.0.0.1:6381
402:S 24 Dec 2019 16:00:51.756 * MASTER <-> REPLICA sync started
402:S 24 Dec 2019 16:00:51.756 * Non blocking connect for SYNC fired the event.
402:S 24 Dec 2019 16:00:51.757 * Master replied to PING, replication can continue...
402:S 24 Dec 2019 16:00:51.757 * Trying a partial resynchronization (request bf8d230ef4d260d36ba5173bab1ccd93e7857bde:127109).
402:S 24 Dec 2019 16:00:51.758 * Successful partial resynchronization with master.
402:S 24 Dec 2019 16:00:51.758 # Master replication ID changed to cffee72c703a90a1d28d82556b7d9b763c29c206
402:S 24 Dec 2019 16:00:51.758 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
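The resync stays partial because the new master kept the old replication ID as its secondary ID, valid up to offset 127109. That is exactly where the replica asked to continue. The sketch below is a simplified model of that check. The IDs and 127109 come from the logs. The backlog numbers are hypothetical, chosen so that 422 bytes would be sent, which matches the master's log.

```python
from dataclasses import dataclass

@dataclass
class MasterReplState:
    replid: str                # replication ID chosen after promotion
    replid2: str               # the old master's ID, kept as "secondary"
    second_replid_offset: int  # replid2 is only accepted up to this offset
    backlog_off: int           # first offset still held in the backlog
    backlog_histlen: int       # bytes currently held in the backlog

def can_partial_resync(m, req_replid, req_offset):
    """Simplified check for PSYNC <req_replid> <req_offset>."""
    # The replica must share our history: our current ID, or the old
    # master's ID as long as it asks for an offset from before the switch.
    if req_replid != m.replid and (req_replid != m.replid2 or
                                   req_offset > m.second_replid_offset):
        return False
    # The requested offset must still be inside the backlog window.
    return m.backlog_off <= req_offset <= m.backlog_off + m.backlog_histlen

def backlog_bytes_to_send(m, req_offset):
    return m.backlog_off + m.backlog_histlen - req_offset

state = MasterReplState(
    replid="cffee72c703a90a1d28d82556b7d9b763c29c206",
    replid2="bf8d230ef4d260d36ba5173bab1ccd93e7857bde",
    second_replid_offset=127109,
    backlog_off=100000,     # hypothetical
    backlog_histlen=27531,  # hypothetical
)
req = ("bf8d230ef4d260d36ba5173bab1ccd93e7857bde", 127109)
print(can_partial_resync(state, *req), backlog_bytes_to_send(state, req[1]))  # True 422
```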
Meanwhile, sentinel 1 and sentinel 2 also update their own configuration:
378:X 24 Dec 2019 16:00:51.381 # +config-update-from sentinel a53e4a6605a0488f3dc46692086df069cae21a84 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
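Back on sentinel 3: the replica's resync is in progress (+slave-reconf-inprog), then it finishes (+slave-reconf-done), and finally the failover ends (+failover-end).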
429:X 24 Dec 2019 16:00:52.432 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.432 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
429:X 24 Dec 2019 16:00:52.498 # +failover-end master mymaster 127.0.0.1 6379
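The failover phases always follow a fixed order, and you can check that order against a log. The sketch below is hypothetical. It lists the event names from sentinel 3's log above and checks that the main phases appear in sequence.

```python
# Event names in the order they appear in sentinel 3's log above
# (first word of each message; the "voted for" lines are left out).
SENTINEL3_EVENTS = [
    "+sdown", "+odown", "+new-epoch", "+try-failover", "+vote-for-leader",
    "+elected-leader", "+failover-state-select-slave", "+selected-slave",
    "+failover-state-send-slaveof-noone", "+failover-state-wait-promotion",
    "+promoted-slave", "+failover-state-reconf-slaves", "+slave-reconf-sent",
    "-odown", "+slave-reconf-inprog", "+slave-reconf-done", "+failover-end",
    "+switch-master", "+slave", "+slave", "+sdown",
]

# Main failover phases, in the order this walkthrough describes them.
FAILOVER_PHASES = [
    "+elected-leader", "+selected-slave", "+failover-state-send-slaveof-noone",
    "+promoted-slave", "+failover-state-reconf-slaves", "+slave-reconf-done",
    "+failover-end", "+switch-master",
]

def follows_order(events, expected):
    """True if every name in `expected` occurs in `events`, in that order."""
    remaining = iter(events)
    # `name in remaining` advances the iterator past the match, so each
    # name can only be found after the previous one.
    return all(name in remaining for name in expected)

print(follows_order(SENTINEL3_EVENTS, FAILOVER_PHASES))        # True
print(follows_order(SENTINEL3_EVENTS[::-1], FAILOVER_PHASES))  # False
```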
It then announces the master switch:
429:X 24 Dec 2019 16:00:52.498 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
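Next, it registers the replicas under the new master. These are 6380 and the old master 6379, which is now listed as a replica of the new master: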
429:X 24 Dec 2019 16:00:52.499 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
429:X 24 Dec 2019 16:00:52.499 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
Finally, the old master, now tracked as a replica, is confirmed subjectively down (+sdown slave):
429:X 24 Dec 2019 16:01:22.503 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
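The timestamps show why this last line arrives about half a minute later. The gap between registering 6379 as a replica and flagging it +sdown is roughly the 30000 ms set by `down-after-milliseconds`. The sketch below only does the arithmetic on the two log timestamps.

```python
from datetime import datetime

FMT = "%d %b %Y %H:%M:%S.%f"

# "+slave slave 127.0.0.1:6379 ... @ mymaster 127.0.0.1 6381" in sentinel 3's log
registered = datetime.strptime("24 Dec 2019 16:00:52.499", FMT)
# "+sdown slave 127.0.0.1:6379 ..." in sentinel 3's log
flagged = datetime.strptime("24 Dec 2019 16:01:22.503", FMT)

delta_ms = (flagged - registered).total_seconds() * 1000
print(round(delta_ms))  # 30004, close to down-after-milliseconds = 30000
```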
The master-replica switchover is now complete. This is how a simple master-replica setup behaves under Sentinel.