Operator: Redis Cluster Cannot Be Rebuilt After a VM Restart

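1. Background

Our business needs Redis, so we chose to deploy a Redis cluster in Kubernetes with Redis Operator.

The operations team reported that after the virtual machines were restarted, the Redis cluster could not be rebuilt. I wrote down the troubleshooting process, which became the article you are reading.

After the VMs were restarted, Redis-Operator could not rebuild the cluster. The log shows that the leaders and followers of the Redis cluster started successfully (Leaders.Ready and Followers.Ready are both 3), but the operator failed when it ran redis-cli --cluster add-node to attach redis-follower-0 (10.233.75.71) as a replica: redis-cli rejected the node with "[ERR] Node 10.233.75.71:6379 is not empty".

The error log is as follows: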

# kubectl logs -f --tail 100 -n redis-system redis-operator-75f946fd68-stzjx
I1213 03:17:50.751820       1 request.go:665] Waited for 1.042141823s due to client-side throttling, not priority and fairness, request: GET:https://10.233.0.1:443/apis/events.k8s.io/v1?timeout=32s
{"level":"info","ts":1670901471.1052506,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1670901471.1055748,"logger":"setup","msg":"starting manager"}
I1213 03:17:51.105988       1 leaderelection.go:248] attempting to acquire leader lease redis-system/6cab913b.redis.opstreelabs.in...
{"level":"info","ts":1670901471.1060054,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":1670901471.1060386,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I1213 03:18:08.969275       1 leaderelection.go:258] successfully acquired lease redis-system/6cab913b.redis.opstreelabs.in
{"level":"info","ts":1670901488.9695313,"logger":"controller.redis","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis","source":"kind source: *v1beta1.Redis"}
{"level":"info","ts":1670901488.9696147,"logger":"controller.redis","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis"}
{"level":"info","ts":1670901488.9695785,"logger":"controller.rediscluster","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster","source":"kind source: *v1beta1.RedisCluster"}
{"level":"info","ts":1670901488.9696522,"logger":"controller.rediscluster","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster"}
{"level":"info","ts":1670901489.0710678,"logger":"controller.rediscluster","msg":"Starting workers","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster","worker count":1}
{"level":"info","ts":1670901489.0711362,"logger":"controller.redis","msg":"Starting workers","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis","worker count":1}
{"level":"info","ts":1670901489.071197,"logger":"controllers.RedisCluster","msg":"Reconciling opstree redis Cluster controller","Request.Namespace":"redis-system","Request.Name":"redis"}
{"level":"info","ts":1670901489.0770147,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-leader"}
{"level":"info","ts":1670901489.0919821,"logger":"controller_redis","msg":"Reconciliation Complete, no Changes required.","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-leader"}
{"level":"info","ts":1670901489.0957606,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader-headless"}
{"level":"info","ts":1670901489.0983348,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader-headless"}
{"level":"info","ts":1670901489.10161,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader"}
{"level":"info","ts":1670901489.1034107,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader"}
{"level":"info","ts":1670901489.2027256,"logger":"controller_redis","msg":"Redis PodDisruptionBudget get action failed","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-leader"}
{"level":"info","ts":1670901489.202785,"logger":"controller_redis","msg":"Reconciliation Successful, no PodDisruptionBudget Found.","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-leader"}
{"level":"info","ts":1670901489.2072728,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-follower"}
{"level":"info","ts":1670901489.217569,"logger":"controller_redis","msg":"Reconciliation Complete, no Changes required.","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-follower"}
{"level":"info","ts":1670901489.2213707,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower-headless"}
{"level":"info","ts":1670901489.2232463,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower-headless"}
{"level":"info","ts":1670901489.2272012,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower"}
{"level":"info","ts":1670901489.2288983,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower"}
{"level":"info","ts":1670901489.2301779,"logger":"controller_redis","msg":"Redis PodDisruptionBudget get action failed","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-follower"}
{"level":"info","ts":1670901489.2302952,"logger":"controller_redis","msg":"Reconciliation Successful, no PodDisruptionBudget Found.","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-follower"}
{"level":"info","ts":1670901489.2347434,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-leader"}
{"level":"info","ts":1670901489.239003,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-follower"}
{"level":"info","ts":1670901489.2390306,"logger":"controllers.RedisCluster","msg":"Creating redis cluster by executing cluster creation commands","Request.Namespace":"redis-system","Request.Name":"redis","Leaders.Ready":"3","Followers.Ready":"3"}
{"level":"info","ts":1670901489.246679,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.75.78"}
{"level":"info","ts":1670901489.247344,"logger":"controller_redis","msg":"Redis cluster nodes are listed","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Output":"a23173bd200452117228a4d83388d8cf4c0bd977 10.233.74.87:6379@16379 master - 0 1670901487928 3 connected 10923-16383\n2fac4565c44c0dc15d700bbea68965bcc708cfea 10.233.97.134:6379@16379 master - 0 1670901488931 2 connected 5461-10922\n39fdd5b0cd3493fe86e6af5bb0753c310e655fa6 10.233.75.78:6379@16379 myself,master - 0 1670901484000 1 connected 0-5460\n"}
{"level":"info","ts":1670901489.247449,"logger":"controller_redis","msg":"Total number of redis nodes are","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Nodes":"3"}
{"level":"info","ts":1670901489.254622,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.75.78"}
{"level":"info","ts":1670901489.2553933,"logger":"controller_redis","msg":"Redis cluster nodes are listed","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Output":"a23173bd200452117228a4d83388d8cf4c0bd977 10.233.74.87:6379@16379 master - 0 1670901487928 3 connected 10923-16383\n2fac4565c44c0dc15d700bbea68965bcc708cfea 10.233.97.134:6379@16379 master - 0 1670901488931 2 connected 5461-10922\n39fdd5b0cd3493fe86e6af5bb0753c310e655fa6 10.233.75.78:6379@16379 myself,master - 0 1670901484000 1 connected 0-5460\n"}
{"level":"info","ts":1670901489.2554975,"logger":"controller_redis","msg":"Number of redis nodes are","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Nodes":"3","Type":"leader"}
{"level":"info","ts":1670901489.2555683,"logger":"controllers.RedisCluster","msg":"All leader are part of the cluster, adding follower/replicas","Request.Namespace":"redis-system","Request.Name":"redis","Leaders.Count":3,"Instance.Size":3,"Follower.Replicas":3}
{"level":"info","ts":1670901489.2618587,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.75.78"}
{"level":"info","ts":1670901489.2623925,"logger":"controller_redis","msg":"Redis cluster nodes are listed","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Output":"a23173bd200452117228a4d83388d8cf4c0bd977 10.233.74.87:6379@16379 master - 0 1670901487928 3 connected 10923-16383\n2fac4565c44c0dc15d700bbea68965bcc708cfea 10.233.97.134:6379@16379 master - 0 1670901488931 2 connected 5461-10922\n39fdd5b0cd3493fe86e6af5bb0753c310e655fa6 10.233.75.78:6379@16379 myself,master - 0 1670901484000 1 connected 0-5460\n"}
{"level":"info","ts":1670901489.2665036,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-follower-0","ip":"10.233.75.71"}
{"level":"info","ts":1670901489.2665575,"logger":"controller_redis","msg":"Checking if Node is in cluster","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Node":"10.233.75.71"}
{"level":"info","ts":1670901489.2665722,"logger":"controller_redis","msg":"Adding node to cluster.","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Node.IP":"10.233.75.71","Follower.Pod":{"PodName":"redis-follower-0","Namespace":"redis-system"}}
{"level":"info","ts":1670901489.2955825,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-follower-0","ip":"10.233.75.71"}
{"level":"info","ts":1670901489.2998388,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.75.78"}
{"level":"info","ts":1670901489.3070674,"logger":"controller_redis","msg":"Pod Counted successfully","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Count":0,"Container Name":"redis-leader"}
{"level":"error","ts":1670901489.4937437,"logger":"controller_redis","msg":"Could not execute command","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Command":["redis-cli","--cluster","add-node","10.233.75.71:6379","10.233.75.78:6379","--cluster-slave","-a","xxxxxxxx"],"Output":">>> Adding node 10.233.75.71:6379 to cluster 10.233.75.78:6379\n>>> Performing Cluster Check (using node 10.233.75.78:6379)\nM: 39fdd5b0cd3493fe86e6af5bb0753c310e655fa6 10.233.75.78:6379\n   slots:[0-5460] (5461 slots) master\nM: a23173bd200452117228a4d83388d8cf4c0bd977 10.233.74.87:6379\n   slots:[10923-16383] (5461 slots) master\nM: 2fac4565c44c0dc15d700bbea68965bcc708cfea 10.233.97.134:6379\n   slots:[5461-10922] (5462 slots) master\n[OK] All nodes agree about slots configuration.\n>>> Check for open slots...\n>>> Check slots coverage...\n[OK] All 16384 slots covered.\nAutomatically selected master 10.233.75.78:6379\n[ERR] Node 10.233.75.71:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.\n","Error":"Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.\n","error":"command terminated with exit code 1","stacktrace":"redis-operator/k8sutils.ExecuteRedisReplicationCommand\n\t/workspace/k8sutils/redis.go:180\nredis-operator/controllers.(*RedisClusterReconciler).Reconcile\n\t/workspace/controllers/rediscluster_controller.go:127\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"}
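
To make the state in the log easier to read, the CLUSTER NODES output that the operator printed can be parsed into a small table. The following is a minimal Python sketch, not part of the operator: the names ClusterNode and parse_cluster_nodes are hypothetical, and the sample text is copied from the "Redis cluster nodes are listed" log entry above. It shows that the cluster seen from redis-leader-0 contains three masters and no replicas, and that the follower 10.233.75.71 is not a member.

```python
from typing import List, NamedTuple, Tuple


class ClusterNode(NamedTuple):
    """One line of CLUSTER NODES output (hypothetical helper type)."""
    node_id: str
    ip: str
    flags: Tuple[str, ...]
    slots: Tuple[str, ...]


def parse_cluster_nodes(output: str) -> List[ClusterNode]:
    """Parse CLUSTER NODES text into ClusterNode records.

    Field layout per line:
    <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv>
    <config-epoch> <link-state> <slot> <slot> ...
    """
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        address = fields[1].split("@")[0]   # drop the cluster bus port
        ip = address.rsplit(":", 1)[0]      # drop the client port
        flags = tuple(fields[2].split(","))
        nodes.append(ClusterNode(fields[0], ip, flags, tuple(fields[8:])))
    return nodes


# Sample copied from the operator log above.
OUTPUT = (
    "a23173bd200452117228a4d83388d8cf4c0bd977 10.233.74.87:6379@16379 "
    "master - 0 1670901487928 3 connected 10923-16383\n"
    "2fac4565c44c0dc15d700bbea68965bcc708cfea 10.233.97.134:6379@16379 "
    "master - 0 1670901488931 2 connected 5461-10922\n"
    "39fdd5b0cd3493fe86e6af5bb0753c310e655fa6 10.233.75.78:6379@16379 "
    "myself,master - 0 1670901484000 1 connected 0-5460\n"
)

nodes = parse_cluster_nodes(OUTPUT)
masters = [n.ip for n in nodes if "master" in n.flags]
replicas = [n.ip for n in nodes if "slave" in n.flags]
print("masters:", masters)
print("replicas:", replicas)
print("follower-0 in cluster:", "10.233.75.71" in {n.ip for n in nodes})
```

Running the same parser against the CLUSTER NODES output of the follower itself would show whether it still remembers other nodes, which is one of the two causes named in the error message.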

The cluster information is as follows:

# kubectl get pods -n redis-system -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
redis-follower-0                  2/2     Running   0          15m     10.233.75.71    node6   <none>           <none>
redis-follower-1                  2/2     Running   0          7m9s    10.233.97.163   node5   <none>           <none>
redis-follower-2                  2/2     Running   0          5m28s   10.233.74.124   node4   <none>           <none>
redis-leader-0                    2/2     Running   0          13m     10.233.75.78    node6   <none>           <none>
redis-leader-1                    2/2     Running   0          6m33s   10.233.97.134   node5   <none>           <none>
redis-leader-2                    2/2     Running   0          5m17s   10.233.74.87    node4   <none>           <none>
redis-operator-75f946fd68-bsn6k   1/1     Running   3          23m     10.233.75.94    node6   <none>           <none>

# kubectl get svc -n redis-system
NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
redis-follower            ClusterIP   10.233.24.70   <none>        6379/TCP,9121/TCP   83m
redis-follower-headless   ClusterIP   None           <none>        6379/TCP            83m
redis-leader              ClusterIP   10.233.26.65   <none>        6379/TCP,9121/TCP   83m
redis-leader-headless     ClusterIP   None           <none>        6379/TCP            83m

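2. Solution

The error message indicates that the follower still holds state from before the restart: either stale cluster metadata (it already knows other nodes) or leftover keys in database 0. Modify the Operator so that it performs the following steps on such a node:

  • 1. Delete the AOF/RDB files
  • 2. Delete the nodes.conf file
  • 3. Run the flushdb command
  • 4. Run the cluster reset command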
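
The four steps above can be sketched as a list of commands to run inside the affected Redis container (for example through kubectl exec). This is only an illustration in Python, not the operator's actual Go code. The function name build_reset_commands is hypothetical, and the data directory /data and the file names dump.rdb, appendonly.aof, and nodes.conf are assumptions based on common Redis defaults; check the dir, dbfilename, appendfilename, and cluster-config-file settings of your deployment before reusing them. The -a option is the same one the operator passes to redis-cli in the log.

```python
import shlex
from typing import List, Optional


def build_reset_commands(
    data_dir: str = "/data",          # assumed data directory
    password: Optional[str] = None,   # passed to redis-cli with -a, as in the log
    hard: bool = False,               # True appends HARD to CLUSTER RESET
) -> List[List[str]]:
    """Return the argv lists for the four cleanup steps, in the article's order."""
    cli = ["redis-cli"]
    if password:
        cli += ["-a", password]
    reset = cli + ["CLUSTER", "RESET"] + (["HARD"] if hard else [])
    return [
        # 1. Delete the AOF/RDB files (file names are assumed defaults).
        ["rm", "-f", f"{data_dir}/dump.rdb", f"{data_dir}/appendonly.aof"],
        # 2. Delete the cluster state file.
        ["rm", "-f", f"{data_dir}/nodes.conf"],
        # 3. Remove the keys in the current database.
        cli + ["FLUSHDB"],
        # 4. Make the node forget the other cluster nodes.
        reset,
    ]


if __name__ == "__main__":
    for argv in build_reset_commands(password="xxxxxxxx"):
        print(shlex.join(argv))
```

Flushing before the reset matters for a node that is currently a master: the Redis documentation notes that CLUSTER RESET does not work on a master that still holds keys. Whether to use the HARD variant, which also generates a new node ID, is left as a parameter because the original steps do not specify it.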

3. Reference

Either the node already knows other nodes (check with CLUSTER NODES) error message on joining a cluster #3154

Error when adding a node to a Redis cluster: Either the node already knows other nodes (check with CLUSTER NODES) or contains some k

Redis cluster error: Node Is Not Empty, Either The Node Already Knows Other Nodes
