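Redis clustering
The master-slave architecture from the previous step cannot switch the master and slave roles automatically. When the master becomes unavailable because the Redis service fails, the host loses power, a disk is damaged, or something similar, there is no automatic failover (promotion of a slave to master); the environment configuration must be changed by hand to switch over to a slave Redis server. Nor can that architecture scale the parallel write performance of the Redis service horizontally. Once a single Redis server can no longer meet the write load of the business, some mechanism is needed that solves these two core problems: 1. seamless switching of the master and slave roles, transparent to the business so that its use is not affected; 2. dynamic horizontal scaling of Redis servers, so that multiple servers accept writes in parallel and achieve higher concurrency.
Ways to implement a Redis cluster: client-side sharding, proxy-based sharding, and Redis Cluster.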
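Sentinel
The Sentinel process monitors the working state of the master server in a Redis deployment. When the master fails, it switches the master and slave roles and so keeps the system highly available. Sentinel has shipped with Redis since version 2.6, and Sentinel mode became stable as of version 2.8, so Redis 2.8 or later is generally recommended for production. Sentinel is a distributed system: several Sentinel processes can run in one architecture. They use gossip protocols to receive information about whether the master is offline, and agreement protocols (voting) to decide whether to perform an automatic failover and which slave to promote to the new master.
Each Sentinel process periodically sends messages to the other Sentinels, the master, and the slaves to confirm that they are alive. If a peer does not respond within a configurable time, the Sentinel provisionally considers it offline. This is "subjectively down" (Subjective Down, abbreviated SDOWN): subjective because it is the individual opinion of one Sentinel, which may or may not match the opinions of the others.
Where there is a subjective state there is also an objective one. When enough Sentinels (at least the configured quorum) have judged the master SDOWN and have exchanged that judgment with each other through the SENTINEL is-master-down-by-addr command, the resulting verdict that the master is offline is "objectively down" (Objectively Down, abbreviated ODOWN). A voting algorithm then selects one of the remaining slave nodes, promotes it to master, rewrites the relevant configuration automatically, and carries out the failover.
The Sentinel mechanism solves the problem of switching the master and slave roles.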
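The SDOWN and ODOWN rules described above can be illustrated with a small simulation. This is only a simplified model, not Sentinel's actual implementation: the `SentinelView` class and the function names are hypothetical, and the rule that a failover leader needs votes from a majority of Sentinels and from at least the quorum is stated here as an assumption about how the vote is counted.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """One Sentinel's local observation of the master (hypothetical model)."""
    sentinel_id: str
    ms_since_last_reply: int  # milliseconds since the master last answered


def is_sdown(view, down_after_ms):
    """A single Sentinel's subjective verdict: no reply for too long."""
    return view.ms_since_last_reply > down_after_ms


def is_odown(views, down_after_ms, quorum):
    """Objective verdict: at least `quorum` Sentinels report SDOWN."""
    sdown_count = sum(1 for view in views if is_sdown(view, down_after_ms))
    return sdown_count >= quorum


def failover_authorized(votes, total_sentinels, quorum):
    """Assumed rule: the leader needs a majority of all Sentinels and at least the quorum."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


views = [
    SentinelView("s1", 31000),
    SentinelView("s2", 45000),
    SentinelView("s3", 2000),  # this Sentinel still sees the master
]
print(is_odown(views, down_after_ms=30000, quorum=2))  # True
print(failover_authorized(votes=2, total_sentinels=3, quorum=2))  # True
```

With three Sentinels and a quorum of 2, one Sentinel that can still reach the master does not block the failover, while a single isolated Sentinel cannot trigger one alone.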
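Configuring master-slave replication
First designate one Redis server as the master by hand, then use commands to configure the other servers as slaves of that master. Sentinel presupposes that a working Redis master-slave environment has already been set up manually.
Implementing a Sentinel-based highly available Redis architecture with one master and two slaves
Configuring server 1 as a slave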
127.0.0.1:6379> SLAVEOF 172.222.2.117 6379
OK
127.0.0.1:6379> CONFIG SET masterauth hjq
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.222.2.117
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:10598
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:e836b86e581cce2e8fd22d74340b462a85979904
master_replid2:0000000000000000000000000000000000000000 #holds the previous master_replid; it stays all zeros until a failover has taken place
master_repl_offset:10598
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:10557
repl_backlog_histlen:42
Configuring server 2 as a slave
127.0.0.1:6379> SLAVEOF 172.222.2.117 6379
OK
127.0.0.1:6379> CONFIG SET masterauth hjq
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.222.2.117
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:10906
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:e836b86e581cce2e8fd22d74340b462a85979904
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:10906
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:29
repl_backlog_histlen:10878
Current master state
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.222.2.127,port=6379,state=online,offset=10976,lag=0
slave1:ip=172.222.2.107,port=6379,state=online,offset=10976,lag=0
master_replid:e836b86e581cce2e8fd22d74340b462a85979904
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:10990
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:10990
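When the replication state has to be checked on several servers, the `info replication` output can be parsed rather than read by eye. The following is a minimal sketch that works on the text shown above; the helper names `parse_info` and `parse_slave_entry` are made up for this example, and the sample is a shortened copy of the master output.

```python
def parse_info(text):
    """Parse INFO output (key:value lines, '#' section headers) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        info[key] = value
    return info


def parse_slave_entry(value):
    """Split a slaveN value such as 'ip=...,port=...,state=online' into a dict."""
    return dict(field.split("=", 1) for field in value.split(","))


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=172.222.2.127,port=6379,state=online,offset=10976,lag=0
slave1:ip=172.222.2.107,port=6379,state=online,offset=10976,lag=0
master_repl_offset:10990
"""

info = parse_info(sample)
# Keep only slave0, slave1, ... and ignore keys such as slave_priority.
slaves = [
    parse_slave_entry(value)
    for key, value in info.items()
    if key.startswith("slave") and key[5:].isdigit()
]
# Bytes each slave is behind the master's replication offset.
behind = {s["ip"]: int(info["master_repl_offset"]) - int(s["offset"]) for s in slaves}
print(info["role"], info["connected_slaves"], behind)
```

A small, non-zero offset difference such as the one in this sample is normal on a busy master, since slaves acknowledge their offset periodically.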
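Connecting application clients to Redis
Official Redis client list: https://redis.io/clients
Java applications commonly connect to Redis through Jedis. The code creates a Jedis connection pool (JedisPool), and the application then borrows connections from the pool to talk to Redis.
To stay highly available, Redis is usually deployed with Sentinel: when the master goes down, the Sentinels elect one of the slaves to take over as master. At that point an application using a plain Jedis connection pool still cannot reach the new master, because the pool keeps pointing at the failed one. To solve this, Jedis provides a Sentinel-aware implementation that is notified when Redis Sentinel performs a master-slave switch and reconnects the application to the new master.
Using Redis Sentinel from Jedis is straightforward: the pool (JedisSentinelPool) simply takes the Sentinel addresses and the master name as additional parameters. Under the hood, Jedis relies on Redis publish/subscribe to learn about master-slave switches: when a switch happens, Sentinel publishes a notification and Jedis switches its connections. Each time a connection is taken from the pool, JedisSentinelPool also checks it; if its connection parameters do not match the master currently reported by Sentinel, the connection is closed and a new Jedis connection is obtained.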
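The behaviour described above (follow the switch notification, then discard pooled connections that still point at the old master) can be sketched without any Redis server. This toy class is not Jedis and does not open sockets: `SentinelAwarePool` and its methods are hypothetical, connections are modelled as plain `(host, port)` tuples, and the payload layout `<master-name> <old-ip> <old-port> <new-ip> <new-port>` is the one Sentinel publishes on its `+switch-master` channel.

```python
class SentinelAwarePool:
    """Toy connection pool that follows +switch-master notifications (not Jedis)."""

    def __init__(self, master_name, host, port):
        self.master_name = master_name
        self.current_master = (host, port)
        self._idle = []  # idle "connections", modelled as (host, port) tuples

    def on_switch_master(self, payload):
        """Handle '<name> <old-ip> <old-port> <new-ip> <new-port>'."""
        name, _old_ip, _old_port, new_ip, new_port = payload.split()
        if name == self.master_name:
            self.current_master = (new_ip, int(new_port))

    def get_connection(self):
        """Return a connection to the current master, dropping stale idle ones."""
        while self._idle:
            conn = self._idle.pop()
            if conn == self.current_master:
                return conn
            # Stale connection: it still points at the old master, so discard it.
        return self.current_master  # stands in for opening a fresh connection

    def release(self, conn):
        """Give a connection back to the pool."""
        self._idle.append(conn)


pool = SentinelAwarePool("mymaster", "172.222.2.117", 6379)
conn = pool.get_connection()
pool.release(conn)
pool.on_switch_master("mymaster 172.222.2.117 6379 172.222.2.127 6379")
print(pool.get_connection())  # ('172.222.2.127', 6379)
```

Checking on every borrow costs a comparison per call, but it means a connection handed out after a failover never silently targets the demoted server.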
Editing the sentinel.conf configuration file
Server1 configuration
Sentinel does not have to be deployed on the same hosts as the Redis servers.
#cp /usr/local/src/redis-4.0.14/sentinel.conf /apps/redis/etc/
#ll /apps/redis/etc/
total 68
-rw-r--r-- 1 redis redis 58797 Feb 11 15:54 redis.conf
-rw-r--r-- 1 root root 7921 Feb 11 17:39 sentinel.conf
#mkdir /apps/redis/setinel
#tree
.
├── bin
│   ├── redis-benchmark
│   ├── redis-check-aof
│   ├── redis-check-rdb
│   ├── redis-cli
│   ├── redis-sentinel -> redis-server
│   └── redis-server
├── data
│   ├── \ 20200208_22:47:21.rdb
│   ├── \ 20200208.rdb
│   ├── appendonly.aof
│   └── dump.rdb
├── etc
│   ├── redis.conf
│   └── sentinel.conf
├── logs
│   └── redis.log
├── run
└── setinel
    ├── redis-sentinel.pid
    └── sentinel_26379.log
6 directories, 15 files
# grep "^[a-zA-Z]" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/setinel"
sentinel monitor mymaster 172.222.2.117 6379 2
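#quorum: the number of Sentinels (not slaves) that must agree the master is down before it is marked ODOWN and a failover can be started
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 30000
#time in milliseconds without a valid reply after which this Sentinel marks the master subjectively down (SDOWN)
sentinel parallel-syncs mymaster 1
#number of slaves that may resync with the new master at the same time during a failover; the smaller the number, the longer the total sync time
sentinel failover-timeout mymaster 180000
#timeout in milliseconds for the failover, including repointing all slaves to the new master
sentinel deny-scripts-reconfig yes #forbid changing the script settings at runtime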
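Because the three servers must agree on the monitored master and the quorum, it can help to read these values out of each sentinel.conf programmatically. The sketch below is an illustration only: `parse_sentinel_conf` is a hypothetical helper, it handles just the `monitor` directive and the three numeric per-master settings shown above, and it assumes `#` always starts a comment, which would not hold for a password containing `#`.

```python
PER_MASTER_INT = ("down-after-milliseconds", "parallel-syncs", "failover-timeout")


def parse_sentinel_conf(text):
    """Collect selected 'sentinel <directive> <master-name> ...' settings per master."""
    masters = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplifying assumption)
        parts = line.split()
        if len(parts) < 4 or parts[0] != "sentinel":
            continue  # not a per-master sentinel directive
        directive, name, args = parts[1], parts[2], parts[3:]
        if directive == "monitor" and len(args) == 3:
            masters.setdefault(name, {}).update(
                host=args[0], port=int(args[1]), quorum=int(args[2])
            )
        elif directive in PER_MASTER_INT:
            masters.setdefault(name, {})[directive] = int(args[0])
    return masters


conf = """bind 0.0.0.0
port 26379
sentinel monitor mymaster 172.222.2.117 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
"""

settings = parse_sentinel_conf(conf)["mymaster"]
print(settings["host"], settings["port"], settings["quorum"])
# A quorum larger than the number of deployed Sentinels (3 here) could never be reached.
print(settings["quorum"] <= 3)
```

Running the same parser against the file from each server and comparing the resulting dictionaries is a quick way to spot a mistyped master address or quorum.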
Server2 配置
#cat /apps/redis/etc/sentinel.conf
bind 172.222.2.127
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/setinel"
#sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 172.222.2.117 6379 2
sentinel auth-pass mymaster 123456
Server3 配置
#cat /apps/redis/etc/sentinel.conf
bind 172.222.2.107
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/setinel"
#sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 172.222.2.117 6379 2
sentinel auth-pass mymaster 123456
Starting Sentinel
Sentinel must be started on all three servers.
#redis-sentinel /apps/redis/etc/sentinel.conf
#redis-sentinel /apps/redis/etc/sentinel.conf
#redis-sentinel /apps/redis/etc/sentinel.conf
Viewing the Sentinel log
#tail /apps/redis/setinel/sentinel_26379.log
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
36766:X 11 Feb 18:01:36.697 # Sentinel ID is 2609150da3a3523e4b2ee46f1ae810c8e62b7397
36766:X 11 Feb 18:01:36.697 # +monitor master mymaster 172.222.2.117 6379 quorum 2
36766:X 11 Feb 18:01:36.699 * +slave slave 172.222.2.127:6379 172.222.2.127 6379 @ mymaster 172.222.2.117 6379
36766:X 11 Feb 18:01:36.700 * +slave slave 172.222.2.107:6379 172.222.2.107 6379 @ mymaster 172.222.2.117 6379
36766:X 11 Feb 18:01:37.550 * +sentinel sentinel 05cd41b5f40984f980a35fd42fec4dc067c6def0 172.222.2.117 26379 @ mymaster 172.222.2.117 6379
36766:X 11 Feb 18:01:38.130 * +sentinel sentinel 92433a41d71151323b8ace688d0ba452f289732d 172.222.2.127 26379 @ mymaster 172.222.2.117 6379
Checking the Redis master state
#redis-cli
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.222.2.127,port=6379,state=online,offset=63958,lag=0
slave1:ip=172.222.2.107,port=6379,state=online,offset=63817,lag=0
master_replid:e836b86e581cce2e8fd22d74340b462a85979904
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:64099
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:64099
127.0.0.1:6379> info sentinel
127.0.0.1:6379> exit
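Checking the Redis Sentinel state
In the Sentinel status output, pay particular attention to the last line: the master IP, the number of slaves, and the number of Sentinels must all match the actual number of servers deployed.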
#redis-cli -p 26379
127.0.0.1:26379> info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.222.2.117:6379,slaves=2,sentinels=3
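The check on that last line can be automated. The following sketch parses a `master0:` line as printed above and compares it with the expected topology of this walkthrough (one master, two slaves, three Sentinels); the function names `parse_sentinel_master` and `check_topology` are hypothetical.

```python
def parse_sentinel_master(line):
    """Parse 'master0:name=...,status=...,address=ip:port,slaves=N,sentinels=M'."""
    _, _, value = line.partition(":")  # drop the 'master0' prefix
    fields = dict(item.split("=", 1) for item in value.split(","))
    host, _, port = fields["address"].rpartition(":")
    return {
        "name": fields["name"],
        "status": fields["status"],
        "host": host,
        "port": int(port),
        "slaves": int(fields["slaves"]),
        "sentinels": int(fields["sentinels"]),
    }


def check_topology(master, expected_slaves, expected_sentinels):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if master["status"] != "ok":
        problems.append("status is " + master["status"])
    if master["slaves"] != expected_slaves:
        problems.append("expected %d slaves, saw %d" % (expected_slaves, master["slaves"]))
    if master["sentinels"] != expected_sentinels:
        problems.append(
            "expected %d sentinels, saw %d" % (expected_sentinels, master["sentinels"])
        )
    return problems


line = "master0:name=mymaster,status=ok,address=172.222.2.117:6379,slaves=2,sentinels=3"
master = parse_sentinel_master(line)
print(master["host"], master["port"])  # 172.222.2.117 6379
print(check_topology(master, expected_slaves=2, expected_sentinels=3))  # []
```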
Stopping the Redis master to test failover
#systemctl stop redis
Checking the master information of the new Redis cluster
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.222.2.107,port=6379,state=online,offset=925901,lag=1
master_replid:78c2b0b77c2e91cc270678639e16be2adb856f11
master_replid2:e836b86e581cce2e8fd22d74340b462a85979904
master_repl_offset:925915
second_repl_offset:911836
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:29
repl_backlog_histlen:925887
Viewing the Sentinel information
#redis-cli -h 172.222.2.127 -p 26379
172.222.2.127:26379> info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.222.2.127:6379,slaves=2,sentinels=3
Viewing the Sentinel log during failover
#tail -f /apps/redis/setinel/sentinel_26379.log
23123:X 11 Feb 19:12:03.157 # +new-epoch 1
23123:X 11 Feb 19:12:03.157 # +try-failover master mymaster 172.222.2.117 6379
23123:X 11 Feb 19:12:03.158 # +vote-for-leader 92433a41d71151323b8ace688d0ba452f289732d 1
23123:X 11 Feb 19:12:03.159 # 05cd41b5f40984f980a35fd42fec4dc067c6def0 voted for 05cd41b5f40984f980a35fd42fec4dc067c6def0 1
23123:X 11 Feb 19:12:03.161 # 2609150da3a3523e4b2ee46f1ae810c8e62b7397 voted for 05cd41b5f40984f980a35fd42fec4dc067c6def0 1
23123:X 11 Feb 19:12:03.862 # +config-update-from sentinel 05cd41b5f40984f980a35fd42fec4dc067c6def0 172.222.2.117 26379 @ mymaster 172.222.2.117 6379
23123:X 11 Feb 19:12:03.862 # +switch-master mymaster 172.222.2.117 6379 172.222.2.127 6379
23123:X 11 Feb 19:12:03.863 * +slave slave 172.222.2.107:6379 172.222.2.107 6379 @ mymaster 172.222.2.127 6379
23123:X 11 Feb 19:12:03.863 * +slave slave 172.222.2.117:6379 172.222.2.117 6379 @ mymaster 172.222.2.127 6379
23123:X 11 Feb 19:12:33.903 # +sdown slave 172.222.2.117:6379 172.222.2.117 6379 @ mymaster 172.222.2.127 6379
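The event names in this log (`+try-failover`, `+switch-master`, `+sdown` and so on) make it easy to extract a failover timeline. The sketch below assumes the line layout seen in this Redis 4.0 log (`pid:X day month time level event details`); the regular expression and the helper names `parse_events` and `new_master` are illustrative, and lines that do not start with a `+` or `-` event, such as the "voted for" lines, are skipped.

```python
import re

# pid:X <day> <month> <time> <level> <event> <details>
EVENT_RE = re.compile(
    r"^\d+:X (?P<when>\d+ \w+ [\d:.]+) [#*] (?P<event>[+-][\w-]+) (?P<detail>.*)$"
)


def parse_events(log_text):
    """Return (timestamp, event, detail) tuples for every event line in the log."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match["when"], match["event"], match["detail"]))
    return events


def new_master(events):
    """Return (ip, port) from the most recent +switch-master event, or None."""
    for _when, event, detail in reversed(events):
        if event == "+switch-master":
            _name, _old_ip, _old_port, new_ip, new_port = detail.split()
            return new_ip, int(new_port)
    return None


log = """23123:X 11 Feb 19:12:03.157 # +new-epoch 1
23123:X 11 Feb 19:12:03.157 # +try-failover master mymaster 172.222.2.117 6379
23123:X 11 Feb 19:12:03.159 # 05cd41b5f40984f980a35fd42fec4dc067c6def0 voted for 05cd41b5f40984f980a35fd42fec4dc067c6def0 1
23123:X 11 Feb 19:12:03.862 # +switch-master mymaster 172.222.2.117 6379 172.222.2.127 6379
23123:X 11 Feb 19:12:33.903 # +sdown slave 172.222.2.117:6379 172.222.2.117 6379 @ mymaster 172.222.2.127 6379
"""

events = parse_events(log)
print([event for _when, event, _detail in events])
print(new_master(events))  # ('172.222.2.127', 6379)
```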
Sentinel configuration file after failover
#grep "^[a-zA-Z]" /apps/redis/etc/sentinel.conf
bind 172.222.2.127
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/setinel"
sentinel myid 92433a41d71151323b8ace688d0ba452f289732d
sentinel deny-scripts-reconfig yes
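sentinel monitor mymaster 172.222.2.127 6379 2 #after failover, Sentinel rewrites the master IP on this sentinel monitor line in sentinel.conf
sentinel auth-pass mymaster 123456 #the slaveof line (replicaof in Redis 5.0 and later) in the slaves' redis.conf is also rewritten to the new master IP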
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel known-slave mymaster 172.222.2.117 6379
sentinel known-slave mymaster 172.222.2.107 6379
sentinel known-sentinel mymaster 172.222.2.107 26379 2609150da3a3523e4b2ee46f1ae810c8e62b7397
sentinel known-sentinel mymaster 172.222.2.117 26379 05cd41b5f40984f980a35fd42fec4dc067c6def0
sentinel current-epoch 1
Checking the Redis slave state
172.222.2.117:6379> info replication
# Replication
role:slave
master_host:172.222.2.127 #IP of the new master after failover
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:937547
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:78c2b0b77c2e91cc270678639e16be2adb856f11 #current master_replid after the failover
master_replid2:e836b86e581cce2e8fd22d74340b462a85979904 #previous master_replid from before the failover
master_repl_offset:937547
second_repl_offset:911836
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:10557
repl_backlog_histlen:926991
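Restarting the original master host to test whether it automatically returns to the master role
#systemctl start redis
Viewing the Sentinel log: the master role does not move back when the original master comes back online; it rejoins as a slave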
#tail -f /apps/redis/setinel/sentinel_26379.log
23123:X 11 Feb 19:12:03.157 # +new-epoch 1
23123:X 11 Feb 19:12:03.157 # +try-failover master mymaster 172.222.2.117 6379
23123:X 11 Feb 19:12:03.158 # +vote-for-leader 92433a41d71151323b8ace688d0ba452f289732d 1
23123:X 11 Feb 19:12:03.159 # 05cd41b5f40984f980a35fd42fec4dc067c6def0 voted for 05cd41b5f40984f980a35fd42fec4dc067c6def0 1
23123:X 11 Feb 19:12:03.161 # 2609150da3a3523e4b2ee46f1ae810c8e62b7397 voted for 05cd41b5f40984f980a35fd42fec4dc067c6def0 1
23123:X 11 Feb 19:12:03.862 # +config-update-from sentinel 05cd41b5f40984f980a35fd42fec4dc067c6def0 172.222.2.117 26379 @ mymaster 172.222.2.117 6379
23123:X 11 Feb 19:12:03.862 # +switch-master mymaster 172.222.2.117 6379 172.222.2.127 6379
23123:X 11 Feb 19:12:03.863 * +slave slave 172.222.2.107:6379 172.222.2.107 6379 @ mymaster 172.222.2.127 6379
23123:X 11 Feb 19:12:03.863 * +slave slave 172.222.2.117:6379 172.222.2.117 6379 @ mymaster 172.222.2.127 6379
23123:X 11 Feb 19:12:33.903 # +sdown slave 172.222.2.117:6379 172.222.2.117 6379 @ mymaster 172.222.2.127 6379
23123:X 11 Feb 19:12:38.903 * +convert-to-slave slave 172.222.2.117:6379 172.222.2.117 6379 @ mymaster 172.222.2.127 6379