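Deploying Redis Cluster on Kubernetes
Preface
Membership in a Redis Cluster does not depend on IP addresses. Each node is identified by a unique node ID inside the cluster. IPs are used only so the nodes can reach each other when the cluster is first created; they are not what identifies a member, the ID is. In other words, as long as a node keeps its ID, its container can restart with a different IP without breaking cluster membership.
So where is the ID stored? Once the cluster has been created, every master and replica keeps a metadata file describing the cluster's nodes. To make sure the role and membership configuration survives a pod restart in Kubernetes, this file must be persisted, which makes a StatefulSet a good fit for the deployment.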
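To make the role of the ID concrete, here is a small sketch that pulls a node's own ID out of its metadata file. It assumes the file lives at /var/lib/redis/nodes.conf, as configured later in this post; `myself_id` is a helper name made up for illustration.

```shell
# Print the ID of the local node, reading nodes.conf content from stdin.
# The third field holds comma-separated flags; the local node carries "myself".
myself_id() {
    awk '$3 ~ /(^|,)myself(,|$)/ { print $1 }'
}

# Example, inside a Redis pod (path as assumed above):
#   myself_id < /var/lib/redis/nodes.conf
```

The ID printed this way should stay the same across pod restarts as long as the file is persisted.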
Deployment
Here are the deployment YAML files:
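apiVersion: v1
data:
  redis.conf: |
    appendonly no
    save 900 1
    save 300 10
    save 60 300
    # Note: this equals the container memory limit (4096Mi) in the StatefulSet; leave headroom in production
    maxmemory 4GB
    maxmemory-policy allkeys-lru
    cluster-enabled yes
    # Node metadata file of the cluster (redis.conf comments must be on their own line)
    cluster-config-file /var/lib/redis/nodes.conf
    # Time in milliseconds a node may be unreachable before it is considered failing
    # cluster-node-timeout 5000
    # Temporarily set to 500 ms for the failover test below
    cluster-node-timeout 500
    dir /var/lib/redis
    port 6379
kind: ConfigMap
metadata:
  name: app-rds-cluster
  namespace: default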
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-rds-cluster
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: cephfs
---
# Service that exposes the Redis cluster to clients
apiVersion: v1
kind: Service
metadata:
  name: app-rds-cluster
  labels:
    app: redis
spec:
  ports:
    - name: redis-port
      port: 6379
  selector:
    app: app-rds-cluster
    appCluster: redis-cluster
---
# apps/v1beta1 was removed in Kubernetes 1.16; apps/v1 requires spec.selector
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app-rds-cluster
spec:
  serviceName: "app-rds-cluster"
  replicas: 6
  selector:
    matchLabels:
      app: app-rds-cluster
      appCluster: redis-cluster
  template:
    metadata:
      labels:
        app: app-rds-cluster
        appCluster: redis-cluster
    spec:
      terminationGracePeriodSeconds: 5
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - app-rds-cluster
                topologyKey: kubernetes.io/hostname
      containers:
        - name: redis
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
          image: redis:v4.0.14
          command:
            - "redis-server"  # Redis start command
          args:
            - "/etc/redis/redis.conf"  # configuration file
          # command: redis-server /etc/redis/redis.conf
          resources:
            requests:  # resources requested
              cpu: "100m"  # m means millicores; 100m is 0.1 CPU
              memory: "100Mi"  # 100 MiB of memory
            limits:
              cpu: "1"  # 1 means one core
              memory: "4096Mi"  # 4 GiB of memory
          ports:
            - name: redis
              containerPort: 6379
              protocol: "TCP"
            - name: cluster
              containerPort: 16379
              protocol: "TCP"
          volumeMounts:
            - name: "redis-conf"  # files generated from the ConfigMap
              mountPath: "/etc/redis"
            - name: pvc
              mountPath: "/var/lib/redis"
              subPathExpr: $(POD_NAME)/data/
      nodeSelector:
        RDSDB: ''
      volumes:
        - name: "redis-conf"
          configMap:
            name: app-rds-cluster
        - name: pvc
          persistentVolumeClaim:
            claimName: app-rds-cluster
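Deployment notes:
- The Redis configuration file is mounted from a ConfigMap.
- There are 6 nodes in total, 3 masters and 3 replicas, which is the minimum cluster size recommended by the official documentation.
- The cluster node metadata file is critical; it is covered in detail below. It must be persisted, for example with a PVC or hostPath.
- The cluster is created with the redis-trib tool. Once the cluster exists, redis-trib is no longer needed to maintain the membership.
- Many other articles deploy two Services: a headless Service used for discovery while creating the cluster, and a regular Service for client connections. The headless Service is not really necessary here. Collecting the pod IPs by hand once when creating the cluster is enough, so there is no reason to create an extra Service.
Assuming persistence is working, you should now have 6 healthy pods: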
[root@008019 redis-cluster]# kubectl get pods -o wide --all-namespaces | grep app-rds-cluster
default app-rds-cluster-0 1/1 Running 0 5m1s 172.36.4.43 008031 <none> <none>
default app-rds-cluster-1 1/1 Running 0 4m58s 172.36.1.30 008020 <none> <none>
default app-rds-cluster-2 1/1 Running 0 4m56s 172.36.6.21 020203 <none> <none>
default app-rds-cluster-3 1/1 Running 0 4m53s 172.36.5.25 008032 <none> <none>
default app-rds-cluster-4 1/1 Running 0 4m51s 172.36.0.198 008019 <none> <none>
default app-rds-cluster-5 1/1 Running 0 4m48s 172.36.3.134 020204 <none> <none>
# Collect the IPs of these pods; they are used below
[root@008019 redis-cluster]# echo `kubectl get pods -o wide --all-namespaces | grep app-rds-cluster | awk '{print $7":6379"}'`
172.36.4.43:6379 172.36.1.30:6379 172.36.6.21:6379 172.36.5.25:6379 172.36.0.198:6379 172.36.3.134:6379
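Picking the IP by column number works here, but the column position shifts with the flags used (for example, --all-namespaces adds a leading column). A slightly more robust sketch asks kubectl for just the pod IP and joins the results. It assumes the pods carry the label app=app-rds-cluster from the StatefulSet above; `join_endpoints` is a helper name made up for this example.

```shell
# Read one IP per line on stdin and print "ip:port ip:port ..." on one line.
# $1 is the port to append (defaults to 6379).
join_endpoints() {
    awk -v port="${1:-6379}" '
        NF  { printf "%s%s:%s", sep, $1, port; sep = " " }
        END { if (sep) printf "\n" }'
}

# Example (label and namespace are taken from the manifests in this post):
#   kubectl get pods -n default -l app=app-rds-cluster --no-headers \
#       -o custom-columns=IP:.status.podIP | join_endpoints 6379
```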
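Creating the cluster
Creating the cluster only has to be done once. After that, the Redis nodes maintain the membership on their own, based on the node metadata file. So we deploy a dedicated pod just for initializing the cluster, using the following manifest: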
# extensions/v1beta1 Deployments were removed in Kubernetes 1.16; use apps/v1
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-cluster-manager
  name: redis-cluster-manager
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-cluster-manager
  template:
    metadata:
      labels:
        app: redis-cluster-manager
    spec:
      containers:
        - command:
            - tail
          args:
            - -f
            - /dev/null
          env:
            - name: DB_NAME
              value: redis-cluster-manager
          image: centos:centos7
          imagePullPolicy: Always
          name: redis-cluster-manager
          resources:
            limits:
              cpu: 500m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@008019 redis-cluster]# kubectl get pods -o wide --all-namespaces | grep redis-cluster-manager
default redis-cluster-manager-5468b99f7f-lxpw7 1/1 Running 0 87m 172.36.4.42 008031 <none> <none>
[root@008019 redis-cluster]# kubectl exec -it redis-cluster-manager-5468b99f7f-lxpw7 bash
[root@redis-cluster-manager-5468b99f7f-lxpw7 /]#
Install the cluster management tool:
cat >> /etc/yum.repos.d/epel.repo<<'EOF'
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=https://mirrors.tuna.tsinghua.edu.cn/epel/7/$basearch
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
EOF
yum -y install redis-trib.noarch
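Initialize the cluster:
# Run the command with the pod IPs collected above and answer yes at the prompt; barring surprises, the cluster is initialized
# With these 6 nodes and --replicas 1, the first 3 become masters and the last 3 become their slaves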
redis-trib create --replicas 1 \
172.36.4.43:6379 172.36.1.30:6379 172.36.6.21:6379 172.36.5.25:6379 172.36.0.198:6379 172.36.3.134:6379
...
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
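Besides reading the redis-trib output, the result can be double-checked from any node with CLUSTER INFO. The sketch below reads that command's output on stdin and succeeds only if the cluster reports state ok with all 16384 slots assigned. `cluster_healthy` is a hypothetical helper name, and the IP in the usage comment is simply the first pod from the listing above.

```shell
# Exit 0 if CLUSTER INFO output on stdin reports a healthy cluster, 1 otherwise.
# redis-cli prints the reply with CRLF line endings, so strip the \r first.
cluster_healthy() {
    tr -d '\r' | awk -F: '
        $1 == "cluster_state"          { state = $2 }
        $1 == "cluster_slots_assigned" { slots = $2 }
        END { if (state == "ok" && slots == 16384) exit 0; exit 1 }'
}

# Example:
#   redis-cli -h 172.36.4.43 -p 6379 cluster info | cluster_healthy && echo healthy
```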
Connecting
Pick one of the pods:
[root@008019 redis-cluster]# kubectl exec -it redis-cluster-manager-5468b99f7f-lxpw7 bash
[root@redis-cluster-manager-5468b99f7f-lxpw7 /]#
[root@redis-cluster-manager-5468b99f7f-lxpw7 /]# exit
exit
[root@008019 redis-cluster]# kubectl exec -it app-rds-cluster-0 bash
# Remember to pass -c when connecting to a cluster with redis-cli
root@app-rds-cluster-0:/var/lib/redis# redis-cli -c
# Set a key
127.0.0.1:6379> get a
-> Redirected to slot [15495] located at 172.36.6.21:6379
(nil)
172.36.6.21:6379> set a 1
OK
172.36.6.21:6379> get a
"1"
# Check the role
172.36.6.21:6379> role
1) "master"
2) (integer) 1954
3) 1) 1) "172.36.3.134"
2) "6379"
3) "1954"
172.36.6.21:6379> quit
# Inspect the cluster node metadata file: the first column is the node ID, followed by address and role information; myself marks the local node
root@app-rds-cluster-0:/var/lib/redis# cat nodes.conf
bdcda6ce963add5bc9706e912b6fcf4b355e2add 172.36.3.134:6379@16379 slave d24da6b03aa3b614a1429847b3a47836f89dbc07 0 1578042761171 6 connected
f85ce96851823832a9bda4233cc6e3066e97c050 172.36.5.25:6379@16379 slave 561f70b2c94c6e46f7b9588705f2cb48861b1e89 0 1578042760571 4 connected
561f70b2c94c6e46f7b9588705f2cb48861b1e89 172.36.4.43:6379@16379 myself,master - 0 1578042760000 1 connected 0-5460
e5c64aa60dcc73746e196b6c1630019ec0c10ad5 172.36.0.198:6379@16379 slave 7e2a3fa95ef9402bfba7b2e08a27812272640179 0 1578042760000 5 connected
7e2a3fa95ef9402bfba7b2e08a27812272640179 172.36.1.30:6379@16379 master - 0 1578042760000 2 connected 5461-10922
d24da6b03aa3b614a1429847b3a47836f89dbc07 172.36.6.21:6379@16379 master - 0 1578042759569 3 connected 10923-16383
vars currentEpoch 6 lastVoteEpoch 0
root@app-rds-cluster-0:/var/lib/redis#
root@app-rds-cluster-0:/var/lib/redis#
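The redirect to slot 15495 seen above is not random: Redis Cluster maps a key to a slot as CRC16(key) mod 16384, using the XMODEM variant of CRC16. Below is a minimal sketch of that calculation in plain shell. It assumes keys without hash tags ({...}), which this sketch does not handle. `keyslot` is a name chosen for the example; on a live node, CLUSTER KEYSLOT gives the authoritative answer.

```shell
# Compute the Redis Cluster hash slot of a key (no hash-tag handling).
# CRC16/XMODEM: polynomial 0x1021, initial value 0, processed bit by bit.
keyslot() {
    crc=0
    # od prints each byte of the key as an unsigned decimal number
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        i=0
        while [ "$i" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            i=$(( i + 1 ))
        done
    done
    echo $(( crc % 16384 ))
}

keyslot a   # prints 15495, the slot shown in the redirect above
```

Slot 15495 falls in the range 10923-16383, which is why the request was redirected to the master at 172.36.6.21.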
Connecting through the Service's fixed IP:
[root@008019 ~]# kubectl get service --all-namespaces | grep app-rds-cluster
default app-rds-cluster ClusterIP 10.123.80.163 <none> 6379/TCP 48m
[root@008019 ~]# kubectl exec -it redis-cluster-manager-5468b99f7f-lxpw7 bash
[root@redis-cluster-manager-5468b99f7f-lxpw7 /]# redis-cli -h 10.123.80.163 -c
10.123.80.163:6379> get a
-> Redirected to slot [15495] located at 172.36.6.22:6379
"1"
Failover
Failover is a basic capability of a highly available cluster. Let's see what happens during one:
# Restart a master
[root@008019 ~]# kubectl delete pod app-rds-cluster-0
pod "app-rds-cluster-0" deleted
[root@008019 ~]#
[root@008019 ~]# kubectl get pods -o wide --all-namespaces | grep app-rds-cl
default app-rds-cluster-0 1/1 Running 0 20s 172.36.4.44 008031 <none> <none>
default app-rds-cluster-1 1/1 Running 0 31m 172.36.1.30 008020 <none> <none>
default app-rds-cluster-2 1/1 Running 0 31m 172.36.6.21 020203 <none> <none>
default app-rds-cluster-3 1/1 Running 0 31m 172.36.5.25 008032 <none> <none>
default app-rds-cluster-4 1/1 Running 0 31m 172.36.0.198 008019 <none> <none>
default app-rds-cluster-5 1/1 Running 0 31m 172.36.3.134 020204 <none> <none>
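# Compared with the earlier listing, the IP of app-rds-cluster-0 changed from 172.36.4.43 to 172.36.4.44. Now connect and check the data and the roles.
# The former master has become a slave. In nodes.conf its own (myself) entry still shows the old address 172.36.4.43; only the master/slave relationship was rewritten. No data was lost either.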
root@app-rds-cluster-0:/var/lib/redis# cat nodes.conf
7e2a3fa95ef9402bfba7b2e08a27812272640179 172.36.1.31:6379@16379 master - 1578044978612 1578044978607 2 connected 5461-10922
e5c64aa60dcc73746e196b6c1630019ec0c10ad5 172.36.0.199:6379@16379 slave 7e2a3fa95ef9402bfba7b2e08a27812272640179 0 1578044978614 5 connected
561f70b2c94c6e46f7b9588705f2cb48861b1e89 172.36.4.43:6379@16379 myself,slave f85ce96851823832a9bda4233cc6e3066e97c050 0 1578044978608 9 connected
bdcda6ce963add5bc9706e912b6fcf4b355e2add 172.36.3.135:6379@16379 slave d24da6b03aa3b614a1429847b3a47836f89dbc07 1578044978612 1578044978608 10 connected
f85ce96851823832a9bda4233cc6e3066e97c050 172.36.5.26:6379@16379 master - 0 1578044978614 11 connected 0-5460
d24da6b03aa3b614a1429847b3a47836f89dbc07 172.36.6.22:6379@16379 master - 1578044978612 1578044978608 10 connected 10923-16383
vars currentEpoch 11 lastVoteEpoch 10
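To see at a glance which node owns which slots in a dump like the one above, the master lines can be filtered out of nodes.conf. This is a small sketch, with `list_masters` as a made-up helper name; it assumes the usual field layout, in which slot ranges start at the ninth field.

```shell
# Print "<node-id> <address> <slot ranges...>" for every master,
# reading nodes.conf content from stdin.
list_masters() {
    awk '$3 ~ /(^|,)master(,|$)/ {
        line = $1 " " $2
        for (i = 9; i <= NF; i++) line = line " " $i
        print line
    }'
}

# Example, inside a Redis pod:
#   list_masters < /var/lib/redis/nodes.conf
```

Run against the dump above, the range 0-5460 shows up under node f85ce96851823832a9bda4233cc6e3066e97c050, which was the slave of app-rds-cluster-0 before the restart.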
root@app-rds-cluster-0:/var/lib/redis# redis-cli -c
127.0.0.1:6379> get a
-> Redirected to slot [15495] located at 172.36.6.22:6379
"1"
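If you are curious, try deleting several nodes at the same time. With 6 nodes and 3 masters, a failed master is replaced by its slave through an election that resembles Raft's leader election (Redis Cluster does not implement Raft itself), and that election needs votes from a majority of the masters. So the cluster rides out the loss of any single node, but it becomes unavailable if 2 of the 3 masters go down together, or if a master and its own slave are lost at the same time.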
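A slave is promoted only if a majority of the masters votes for it. That counting rule can be written down as simple arithmetic. This is only a sketch of the counting argument, not of Redis's actual election code; `votes_needed` and `election_possible` are names invented for the example.

```shell
# Votes a slave needs to be promoted: a strict majority of all masters.
# $1 is the total number of masters in the cluster.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed if the surviving masters can still form a majority.
# $1 is the total number of masters, $2 the number that are down.
election_possible() {
    [ $(( $1 - $2 )) -ge "$(votes_needed "$1")" ]
}

votes_needed 3   # prints 2
```

With 3 masters, `election_possible 3 1` succeeds and `election_possible 3 2` fails, which matches the limits described above.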