Mastering Redis High Availability: Sentinel Mode

Author: 尘世间一名迷途小码农

Article link: https://blog.csdn.net/justyman/article/details/106582868

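I. Preface

My team uses Redis widely, in several deployment topologies: three masters with three replicas, one master with two replicas and three Sentinels, and so on. Which one to deploy depends on how critical the business system is. Whatever you choose, run an odd number of Sentinels, so that a majority can always be formed and split-brain is avoided.

Three masters with three replicas gives the strongest protection for the data, but it also needs the most servers. When both data safety and cost matter, one master, two replicas and three Sentinels is usually the best trade-off. Since summarizing is the best way to put knowledge to use, this post records how we set up Redis Sentinel mode in an earlier project and demonstrates how data is replicated and how failover happens automatically.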
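The advice about an odd number of Sentinels can be checked with a little arithmetic. The sketch below is illustrative only (the function names are made up for this post): it computes the strict majority for a given number of Sentinels and how many of them can be lost while a majority is still reachable.

```python
def majority(n_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return n_sentinels // 2 + 1


def tolerated_failures(n_sentinels: int) -> int:
    """How many Sentinels can be lost while a majority is still reachable."""
    return n_sentinels - majority(n_sentinels)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"sentinels={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Going from three Sentinels to four does not let you lose any more of them, which is why an even count buys nothing over the odd count below it.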

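II. What Is Sentinel Mode

Sentinel mode is one of the high-availability solutions for Redis. A Sentinel is a separate process that monitors Redis instances, sends notifications, and performs automatic failover:

Monitoring: Sentinel continuously checks whether the master and its replicas (slaves) are working as expected.

Notification: when a monitored Redis instance has a problem, Sentinel can notify an administrator or another application through an API.

Automatic failover: when a master stops working, Sentinel starts a failover. It promotes one of the failed master's replicas to be the new master and reconfigures the other replicas to replicate from it. Clients that ask Sentinel for the master's address are given the address of the new master, so the new master takes over from the failed one.

[Figure: topology diagram of the Sentinel deployment]

In practice, each Sentinel sends a PING once per second to every instance it knows about (all Redis instances and its fellow Sentinels) and decides from the replies whether each one is online.

If a Sentinel gets no valid reply from the master within the configured time (down-after-milliseconds), that Sentinel alone considers the master unavailable. This is called subjectively down (SDOWN), and on its own it does not trigger a failover. Sentinels exchange their views of the master (a gossip-style exchange), and once the number of Sentinels that consider the master down reaches the #number (the quorum) set in sentinel.conf by sentinel monitor mymaster {#ip} {#port} {#number}, the master is marked objectively down (ODOWN). The Sentinels then use a Raft-style election to choose a leader, and the leader carries out the automatic failover.

*** For the gossip protocol, see this Sohu article: https://www.sohu.com/a/302002994_467784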
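To make the SDOWN/ODOWN distinction concrete, here is a small model of the decision. It is a sketch of the rule described above, not Sentinel's actual implementation: the class and function names are invented for illustration, and the timestamps are made-up values in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """What one Sentinel knows about the master (hypothetical model)."""
    name: str
    last_valid_reply_ms: int  # time of the last valid PING reply from the master


def is_sdown(view: SentinelView, now_ms: int, down_after_ms: int) -> bool:
    """Subjectively down: this Sentinel alone has waited too long for a reply."""
    return now_ms - view.last_valid_reply_ms > down_after_ms


def is_odown(views, now_ms: int, down_after_ms: int, quorum: int) -> bool:
    """Objectively down: at least `quorum` Sentinels see the master as SDOWN."""
    votes = sum(is_sdown(v, now_ms, down_after_ms) for v in views)
    return votes >= quorum


if __name__ == "__main__":
    views = [
        SentinelView("sentinel-168", last_valid_reply_ms=60_000),
        SentinelView("sentinel-181", last_valid_reply_ms=65_000),
        SentinelView("sentinel-129", last_valid_reply_ms=99_000),
    ]
    # Two of three Sentinels have waited more than 30 s, so quorum 2 is met.
    print(is_odown(views, now_ms=100_000, down_after_ms=30_000, quorum=2))
```

With the same views and a quorum of 3 the function returns False, because the third Sentinel still received a recent reply.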

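III. Setting Up Redis

1. Install gcc

What is gcc? It is the multi-language compiler on Linux (originally it only supported C), which is why its full name is the GNU Compiler Collection. Software such as Redis is downloaded as source and has to be compiled before use. Without further ado, let's install it.

# First check whether gcc is already installed on your CentOS machine. Output like the following means it is.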

[root@centos7a local]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)

# If it is not installed, install it with yum:
yum install gcc

2. Download the Redis package

# 1. This post uses 5.0.8 as the example. Download the tarball first.

[root@centos7a local]# wget http://download.redis.io/releases/redis-5.0.8.tar.gz
--2020-06-05 20:48:59--  http://download.redis.io/releases/redis-5.0.8.tar.gz
Resolving download.redis.io (download.redis.io)... 109.74.203.151
Connecting to download.redis.io (download.redis.io)|109.74.203.151|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1985757 (1.9M) [application/x-gzip]
Saving to: ‘redis-5.0.8.tar.gz’
100%[===================================================================================================================================================>] 1,985,757   30.2KB/s   in 88s
2020-06-05 20:50:27 (22.1 KB/s) - ‘redis-5.0.8.tar.gz’ saved [1985757/1985757]

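# 2. Extract it

[root@centos7a local]# tar -zxvf redis-5.0.8.tar.gz

# Note

If your gcc is too old, building Redis 6.0.0 or later fails with an error like: make[1]: *** [server.o] Error 1. There are two common fixes: 1) upgrade gcc; 2) install an older Redis.

I did not want to upgrade gcc, so I chose the second option and installed 5.0.8, which built without any problems.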

3. Compile and Install

# 1. Enter the source directory
[root@centos7a local]# cd redis-5.0.8/
[root@centos7a redis-5.0.8]# pwd
/usr/local/redis-5.0.8

# 2. Build (the binaries end up in src/)
[root@centos7a redis-5.0.8]# make MALLOC=libc
cd src && make all
make[1]: Entering directory `/usr/local/redis-5.0.8/src'
CC Makefile.dep
.....
.....
INSTALL redis-check-aof
Hint: It's a good idea to run 'make test' ;)
make[1]: Leaving directory `/usr/local/redis-5.0.8/src'

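4. Edit the configuration (/usr/local/redis-5.0.8/redis.conf)

# 1. By default Redis does not run as a daemon. To run it as a daemon, change
daemonize no => daemonize yes

# 2. Change the default port 6379 to something else
port 6379 => port 9500

# 3. bind sets the network interfaces Redis listens on; the default (127.0.0.1) means only local clients can connect. To open Redis to other hosts, for example to build a master/replica setup, comment out bind and change protected-mode from yes to no
#bind 127.0.0.1
protected-mode yes => protected-mode no

# 4. Rename the PID file to match the new port
pidfile /var/run/redis_9500.pid

# 5. Set the log file path
logfile "/var/run/redis/redis.log"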
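If you have to make the same edits on several servers, a short script can apply them for you. The following is a minimal sketch, assuming redis.conf has already been read into a string; the helper name apply_overrides and the OVERRIDES table are made up for this post, and the values are the ones from the five steps above.

```python
# Settings from the five steps above (key -> new value).
OVERRIDES = {
    "daemonize": "yes",
    "port": "9500",
    "protected-mode": "no",
    "pidfile": "/var/run/redis_9500.pid",
    "logfile": '"/var/run/redis/redis.log"',
}


def apply_overrides(conf_text, overrides, comment_out=("bind",)):
    """Return conf_text with the given directives replaced.

    Directives listed in comment_out are commented out; overrides that do
    not appear in the file are appended at the end.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_directive = bool(stripped) and not stripped.startswith("#")
        key = stripped.split(None, 1)[0] if is_directive else None
        if key in comment_out:
            out.append("#" + line)
        elif key in overrides:
            out.append(f"{key} {overrides[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key} {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "bind 127.0.0.1\nprotected-mode yes\nport 6379\ndaemonize no\n"
    print(apply_overrides(sample, OVERRIDES), end="")
```

The script only rewrites text; writing the result back to /usr/local/redis-5.0.8/redis.conf (after taking a backup) is left to the reader.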

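5. Manage the service with systemd

# 1. Create /lib/systemd/system/redis.service with the following content and save it
# (--daemonize no on the command line overrides daemonize yes in redis.conf, so systemd supervises a foreground process)
[Unit]
Description=Redis
After=network.target

[Service]
ExecStart=/usr/local/redis-5.0.8/src/redis-server /usr/local/redis-5.0.8/redis.conf --daemonize no
ExecStop=/usr/local/redis-5.0.8/src/redis-cli -h 127.0.0.1 -p 9500 shutdown

[Install]
WantedBy=multi-user.target

# 2. Reload the systemd configuration
systemctl daemon-reload

# 3. Now systemctl can start, stop and restart Redis
systemctl start redis.service     # start
systemctl restart redis.service   # restart
systemctl stop redis.service      # stop
systemctl enable redis.service    # start at boot

Once the startup messages appear in the log file, Redis is running.

[Figure: screenshot of the Redis startup log]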

6. Test the Redis server

# 1. The output below shows that Redis is up and listening
[root@centos7a system]# netstat -tunlp|grep redis
tcp 0 0 0.0.0.0:9500 0.0.0.0:* LISTEN 10331/redis-server
tcp6 0 0 :::9500 :::* LISTEN 10331/redis-server

# 2. Run a few commands against Redis
[root@centos7a src]# ./redis-cli -h 127.0.0.1 -p 9500
127.0.0.1:9500> SET K1 LIWENJIE
OK
127.0.0.1:9500> KEYS *
1) "K1"
2) "K3"
3) "K2"
127.0.0.1:9500> DEL K1
(integer) 1
127.0.0.1:9500> EXPIRE K2 1000
(integer) 1
127.0.0.1:9500> TTL K2
(integer) 997
127.0.0.1:9500> PERSIST K2
(integer) 1
127.0.0.1:9500> TTL K2
(integer) -1
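The return values in that session follow a simple rule: EXPIRE and PERSIST return 1 when they change something, TTL returns the remaining seconds, -1 for a key with no expiry, and -2 for a key that does not exist. The toy model below reproduces the session with a fake clock. It is only an in-memory illustration (the class TinyStore is invented for this post) and does not talk to Redis.

```python
class TinyStore:
    """In-memory toy that mimics SET / EXPIRE / TTL / PERSIST return values."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._data = {}
        self._expire_at = {}

    def _purge(self, key):
        # Drop the key if its deadline has passed.
        deadline = self._expire_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expire_at.pop(key, None)

    def set(self, key, value):
        self._data[key] = value
        self._expire_at.pop(key, None)   # a plain SET clears any expiry
        return "OK"

    def expire(self, key, seconds):
        self._purge(key)
        if key not in self._data:
            return 0
        self._expire_at[key] = self._clock() + seconds
        return 1

    def ttl(self, key):
        self._purge(key)
        if key not in self._data:
            return -2                    # key does not exist
        if key not in self._expire_at:
            return -1                    # key exists but has no expiry
        return int(self._expire_at[key] - self._clock())

    def persist(self, key):
        self._purge(key)
        if key in self._data and key in self._expire_at:
            del self._expire_at[key]
            return 1
        return 0


if __name__ == "__main__":
    now = [0]                            # fake clock, in seconds
    store = TinyStore(lambda: now[0])
    store.set("K2", "value")
    print(store.expire("K2", 1000))      # 1
    now[0] = 3                           # three seconds pass
    print(store.ttl("K2"))               # 997
    print(store.persist("K2"))           # 1
    print(store.ttl("K2"))               # -1
```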

7. As the six steps above show, setting up a single Redis instance is easy. Next, repeat the same steps to set up the other two Redis instances.

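IV. Configuring Sentinel Mode

1. This walkthrough uses one master, two replicas and three Sentinels. The servers are listed below.

Instance	IP	Port	Notes
Redis (master)	192.168.31.168	9500
Redis (replica)	192.168.31.181	9501
Redis (replica)	192.168.31.129	9502
Sentinel (1)	192.168.31.168	26379	default port
Sentinel (2)	192.168.31.181	26379	default port
Sentinel (3)	192.168.31.129	26379	default port

2. First, edit redis.conf on each Redis server as follows.

# 1. Bind each instance to its server's internal IP (all three Redis instances).

# 2. 192.168.31.168 is the master here, so only the two replicas on 181 and 129 need the following setting.

# Note: before 5.0 the directive was slaveof; from 5.0 on it is replicaof. For backward compatibility slaveof still works in 5.0, but it is better to use the new name in case the old one is dropped one day.

replicaof 192.168.31.168 9500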
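The per-server differences are small enough to derive from the table. As a sketch, assuming the topology above is written down as a Python list (the TOPOLOGY structure and the function name are invented for illustration), the lines that differ in each redis.conf can be generated like this:

```python
# The three Redis instances from the table above.
TOPOLOGY = [
    {"role": "master", "ip": "192.168.31.168", "port": 9500},
    {"role": "replica", "ip": "192.168.31.181", "port": 9501},
    {"role": "replica", "ip": "192.168.31.129", "port": 9502},
]


def redis_conf_lines(node, topology):
    """Return the redis.conf lines that are specific to one node."""
    master = next(n for n in topology if n["role"] == "master")
    lines = [f"bind {node['ip']}", f"port {node['port']}"]
    if node["role"] == "replica":
        # Only replicas point at the master.
        lines.append(f"replicaof {master['ip']} {master['port']}")
    return lines


if __name__ == "__main__":
    for node in TOPOLOGY:
        print(f"# {node['role']} on {node['ip']}")
        print("\n".join(redis_conf_lines(node, TOPOLOGY)))
        print()
```

Only the two replicas get a replicaof line; the master's output contains just bind and port.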

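3. Edit sentinel.conf on each server as follows

# First grep the active settings to see what is there. The explanations are written on separate # lines because Redis config files do not accept trailing comments.
[root@centos7a redis-5.0.8]# grep -Ev "^$|#" sentinel.conf
port 26379
# whether to run in the background
daemonize no
# PID file written at runtime
pidfile /var/run/redis-sentinel.pid
# Sentinel log file
logfile ""
# working directory
dir /tmp
# IP and port of the master; the final 2 is the quorum, the number of Sentinels that must agree before the master is considered down (ODOWN, see section II)
sentinel monitor mymaster 127.0.0.1 6379 2
# if the master does not respond within 30000 ms (30 s), this Sentinel considers it down (SDOWN)
sentinel down-after-milliseconds mymaster 30000
# after a new master is elected, the maximum number of replicas that may sync from it at the same time
sentinel parallel-syncs mymaster 1
# failover timeout: if a failover has not completed within 180000 ms (180 s), another attempt is started
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes

# 1. Set the log file
logfile "/var/log/sentinel.log"

# 2. Set the master's IP and port
sentinel monitor mymaster 192.168.31.168 9500 2

# 3. Run in the background
daemonize yes

# 4. Leave everything else at its default unless you have a specific need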
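A common mistake is to leave the default sentinel monitor line untouched, so that every Sentinel watches 127.0.0.1. The sketch below parses a sentinel monitor directive and flags two problems that are easy to check: a quorum larger than the number of Sentinels, and a loopback master address in a multi-server deployment. The names Monitor, parse_monitor and check_monitor are invented for this post.

```python
from typing import NamedTuple


class Monitor(NamedTuple):
    name: str
    ip: str
    port: int
    quorum: int


def parse_monitor(line: str) -> Monitor:
    """Parse 'sentinel monitor <name> <ip> <port> <quorum>'."""
    parts = line.split()
    if len(parts) != 6 or parts[0] != "sentinel" or parts[1] != "monitor":
        raise ValueError(f"not a sentinel monitor directive: {line!r}")
    return Monitor(parts[2], parts[3], int(parts[4]), int(parts[5]))


def check_monitor(monitor: Monitor, n_sentinels: int) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if monitor.quorum < 1 or monitor.quorum > n_sentinels:
        problems.append(
            f"quorum {monitor.quorum} cannot be reached with {n_sentinels} Sentinels"
        )
    if n_sentinels > 1 and monitor.ip in ("127.0.0.1", "localhost"):
        problems.append("master address is loopback; other servers cannot reach it")
    return problems


if __name__ == "__main__":
    for line in (
        "sentinel monitor mymaster 127.0.0.1 6379 2",
        "sentinel monitor mymaster 192.168.31.168 9500 2",
    ):
        print(line, "->", check_monitor(parse_monitor(line), n_sentinels=3))
```

The default line is flagged for its loopback address, while the edited line from step 2 passes both checks.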

4. Run redis-sentinel under systemd

# 1. Create /lib/systemd/system/redis-sentinel.service with the following content and save it
[Unit]
Description=Redis Sentinel
After=network.target

[Service]
ExecStart=/usr/local/redis-5.0.8/src/redis-sentinel /usr/local/redis-5.0.8/sentinel.conf --sentinel
ExecStop=/usr/local/redis-5.0.8/src/redis-cli -p 26379 shutdown

[Install]
WantedBy=multi-user.target

# 2. Reload the systemd configuration
systemctl daemon-reload

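5. Start Redis and Sentinel with the commands below, in this order: Redis (master) -> Redis (replica) -> Redis (replica) -> Sentinel (1) -> Sentinel (2) -> Sentinel (3)

Redis: systemctl start redis.service

Sentinel: ./redis-sentinel /usr/local/redis-5.0.8/sentinel.conf

Once everything is running, the following commands confirm that the master/replica relationship is in place.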

[root@centos7a src]# ./redis-cli -h 192.168.31.168 -p 9500
192.168.31.168:9500> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.31.181,port=9501,state=online,offset=2235966,lag=1
slave1:ip=192.168.31.129,port=9502,state=online,offset=2235966,lag=1
master_replid:1df20ecc0df92307ef811e9bd81cc09e131725c7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:2236538
........

[root@mimy-centos7b src]# ./redis-cli -h 192.168.31.181 -p 9501
192.168.31.181:9501> info replication
# Replication
role:slave
master_host:192.168.31.168
master_port:9500
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:2317523
........

[root@localhost src]# ./redis-cli -h 192.168.31.129 -p 9502
192.168.31.129:9502> info replication
# Replication
role:slave
master_host:192.168.31.168
master_port:9500
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:2336225
........
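Reading this output by eye is fine for three servers, but the same check is easy to script. The following sketch parses the text of INFO replication (as copied from redis-cli) and lists the replicas that report state=online. The function names are made up for this post, and the sample text is taken from the master's output above.

```python
def parse_info(text: str) -> dict:
    """Turn 'key:value' lines from INFO output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers such as '# Replication', and filler.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def online_replicas(info: dict) -> list:
    """Return (ip, port) for every slaveN entry whose state is online."""
    replicas = []
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            if fields.get("state") == "online":
                replicas.append((fields["ip"], int(fields["port"])))
    return replicas


MASTER_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.31.181,port=9501,state=online,offset=2235966,lag=1
slave1:ip=192.168.31.129,port=9502,state=online,offset=2235966,lag=1
master_repl_offset:2236538
"""

if __name__ == "__main__":
    info = parse_info(MASTER_INFO)
    print(info["role"], info["connected_slaves"])
    print(online_replicas(info))
```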

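V. Verifying Replication and High Availability

1. Replication

# 1. At the start all three instances are in sync and empty
192.168.31.168:9500> KEYS *
(empty list or set)
192.168.31.181:9501> keys *
(empty list or set)
192.168.31.129:9502> keys *
(empty list or set)

# 2. Insert (KEY1, CHAOS MONKEY) on the master
192.168.31.168:9500> SET KEY1 'CHAOS MONKEY'
OK

# 3. Check that both replicas received it
192.168.31.181:9501> get KEY1
"CHAOS MONKEY"
192.168.31.129:9502> GET KEY1
"CHAOS MONKEY"

The log on 129 (/var/run/redis/redis.log) shows the data being synchronized from 168 to the replica on 129 (screenshot 1 in the original post). The same happens from 168 to the replica on 181 (screenshot 2).

Note also that the two replicas do not sync at exactly the same time (see the logs in the screenshots). During a failover this is controlled by the setting below, which limits how many replicas may resync with the new master at once.

# after a new master is elected, the maximum number of replicas that may sync from it at the same time
sentinel parallel-syncs mymaster 1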

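2. High-availability test

First, run ./redis-cli -h 192.168.31.168 -p 26379 info sentinel against any one of the machines to check the Sentinel state. At this point the master is 192.168.31.168.

Next, simulate a master crash: stop the master on 168, then watch sentinel.log to see the master change from 168 to 181.

First, the Sentinel on 181 notices that 168 is down; this is a subjective failure (SDOWN). Because sentinel.conf contains sentinel monitor mymaster 192.168.31.168 9500 2, where the final 2 means two Sentinels must agree before the master counts as really down (ODOWN), nothing more happens until a second Sentinel agrees.
Next, the Sentinel on 129 or 168 also finds 168 down. That makes two, so the master on 168 is marked objectively down and an election begins.
Finally, the election completes and 181 is promoted to master.

Log in to Redis on 181 and run INFO REPLICATION: 181 has indeed become the master, and 129 is now a replica of 181.

Now add another key. The results below show that 129 replicates from 181 (the new master) as expected.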

192.168.31.181:9501> keys *
1) "KEY1"
2) "KEY2"
3) "KEY3"
192.168.31.129:9502> keys *
1) "KEY3"
2) "KEY2"
3) "KEY1"
192.168.31.181:9501> set KEY4 BUTTERFLY
OK
192.168.31.181:9501> keys *
1) "KEY1"
2) "KEY4"
3) "KEY2"
4) "KEY3"
192.168.31.129:9502> keys *
1) "KEY3"
2) "KEY2"
3) "KEY4"
4) "KEY1"
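KEYS returns keys in no particular order, which is why the two listings above look different even though they contain the same keys. When comparing a master and a replica, compare the results as sets. A minimal sketch (the function name is made up for this post; the lists are copied from the output above):

```python
def diff_keys(master_keys, replica_keys):
    """Return (missing_on_replica, extra_on_replica) as sorted lists."""
    master, replica = set(master_keys), set(replica_keys)
    return sorted(master - replica), sorted(replica - master)


if __name__ == "__main__":
    keys_on_181 = ["KEY1", "KEY4", "KEY2", "KEY3"]   # new master
    keys_on_129 = ["KEY3", "KEY2", "KEY4", "KEY1"]   # replica
    missing, extra = diff_keys(keys_on_181, keys_on_129)
    print("missing on replica:", missing)
    print("extra on replica:", extra)
```

Both lists come back empty here, which confirms that 129 holds the same keys as the new master.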