k8s 1.15.7: a broken node that would not rejoin the cluster

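In a test environment, one node showed an abnormal status, so I deleted it by hand (delete node). I then restarted the services to bring the node back into the cluster, but it never rejoined.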

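  1. First, the blunt approach: restart flanneld, kube-proxy, kubelet and docker. Their status all looked normal, but the node did not join the cluster.
  2. Reboot the machine. Afterwards the services showed no obvious errors, yet the node still could not join.
  3. Because this was a binary installation, I deleted all of the node's original configuration files and rewrote the parameters from scratch, then restarted the services. The node still did not join.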

At that point the only option left was to read the logs carefully.

1. systemctl status reported everything as healthy even though the node still would not join, so the next step was the messages log.

A quick analysis: if the node cannot join the cluster, kubelet is the most likely culprit.
The log contains:

Aug  6 16:33:22 node125 kubelet: E0806 16:33:22.591193   20451 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to get node info: node "node125" not found
Aug  6 16:33:22 node125 kubelet: I0806 16:33:22.595423   20451 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach

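For this error I checked the configuration files and the hosts resolution, and both were fine. I first suspected kubelet itself, but the log only showed kubelet restarting over and over with no other kubelet-specific error, so the problem was probably on the Docker side. Earlier entries in the log: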

Aug  6 16:28:32 node125 kubelet: E0806 16:28:32.073254   19188 node_container_manager_linux.go:50] Failed to create ["kubepods"] cgroup
Aug  6 16:28:32 node125 kubelet: F0806 16:28:32.073276   19188 kubelet.go:1372] Failed to start ContainerManager Cannot set property TasksAccounting, or unknown property.
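
When the messages file is busy, it can help to show only the kubelet lines at error or fatal severity. This is a minimal sketch, assuming the klog line format seen above (a severity letter followed by month and day) and the CentOS default path /var/log/messages. The function name kubelet_errors is made up for illustration.

```shell
# Print only kubelet log lines whose klog severity is E (error) or F (fatal).
# Reads syslog-formatted text on stdin.
kubelet_errors() {
  grep -E 'kubelet: [EF][0-9]{4} '
}

# Typical use on the node (log path is an assumption):
#   kubelet_errors < /var/log/messages | tail -n 20
```

A fatal (F) line is the one to look for first, because kubelet exits right after logging it and the service manager then restarts it.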

2. This is a Docker-side problem that prevents the k8s kubelet from starting properly. Here is my configuration.

Docker configuration (docker info):

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 28
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: e9c345b3f906d5dc5e8100b05ce37073a811c74a (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: 5b117de7f824f3d3825737cf09581645abbe35d4 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 4.4.54-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 4
Total Memory: 7.797 GiB
Name: node125
ID: V442:WMKW:IUXK:P57X:ZFRG:NKVS:6ROQ:U7FA:LAMQ:GAPS:CKJ5:B6PI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 192.168.100.119
 registry:5000
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)

Docker is using Cgroup Driver: systemd.
Now the kubelet configuration:

[root@node125 kubernetes]# cat kubelet.yaml 
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 192.168.60.125
port: 10250
readOnlyPort: 10255 
clusterDNS:
   - 10.254.10.20
clusterDomain: cluster.local
cgroupDriver: systemd
kubeletCgroups: /systemd/system.slice
failSwapOn: false 
authentication:
    webhook:
      enabled: false
      cacheTTL: "2m0s"
    anonymous:
      enabled: true

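Both sides use cgroupDriver: systemd, so they match and the driver setting itself is not the problem.
That means kubelet is reporting an error that comes from the Docker/cgroup side. Looking further, this setup really does have such a problem: it is a version issue, a bug that shows up with this driver type. The message "Cannot set property TasksAccounting, or unknown property" suggests that the installed systemd does not recognize the TasksAccounting property that the systemd cgroup driver tries to set.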
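
The driver comparison above can also be scripted instead of read by eye. This is a rough sketch: the two helper names are hypothetical, and the parsing assumes the "Cgroup Driver:" line printed by docker info and a top-level cgroupDriver key in the kubelet config file, as in the outputs shown.

```shell
# Extract the cgroup driver from `docker info` output read on stdin.
docker_cgroup_driver() {
  awk -F': *' '/^ *Cgroup Driver:/ {gsub(/[ \t\r]+$/, "", $2); print $2; exit}'
}

# Extract cgroupDriver from a KubeletConfiguration YAML file read on stdin.
kubelet_cgroup_driver() {
  awk -F': *' '/^cgroupDriver:/ {gsub(/[ \t\r]+$/, "", $2); print $2; exit}'
}

# Typical use on the node, run from the directory that holds kubelet.yaml:
#   d=$(docker info 2>/dev/null | docker_cgroup_driver)
#   k=$(kubelet_cgroup_driver < kubelet.yaml)
#   if [ "$d" = "$k" ]; then echo "match: $d"; else echo "MISMATCH: docker=$d kubelet=$k"; fi
```

A match only rules out a driver mismatch. As in this case, kubelet can still fail when both sides agree on systemd but systemd itself is too old.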

Fix:

yum update systemd
systemctl restart docker 

Then restart the kubelet service and check the log again:

Aug  6 16:33:22 node125 kubelet: E0806 16:33:22.591193   20451 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to get node info: node "node125" not found
Aug  6 16:33:22 node125 kubelet: I0806 16:33:22.595423   20451 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Aug  6 16:33:22 node125 kubelet: E0806 16:33:22.595487   20451 kubelet.go:2252] node "node125" not found
Aug  6 16:33:22 node125 kubelet: I0806 16:33:22.597140   20451 kubelet_node_status.go:72] Attempting to register node node125
Aug  6 16:33:22 node125 kubelet: I0806 16:33:22.599621   20451 plugin_manager.go:116] Starting Kubelet Plugin Manager
Aug  6 16:33:22 node125 kubelet: I0806 16:33:22.602856   20451 kubelet_node_status.go:75] Successfully registered node node125
Aug  6 16:33:22 node125 kubelet: I0806 16:33:22.695543   20451 reconciler.go:150] Reconciler: start to sync state

Running kubectl get node now shows the node again.
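
To confirm the registration from the API side without reading the whole table, the STATUS column for a single node can be pulled out of the kubectl get node output. This is a small sketch: node_status is a made-up helper name, and it assumes the default column layout with NAME first and STATUS second.

```shell
# Print the STATUS column for one node from `kubectl get node` output on stdin.
# $1 is the node name to look for; the header row is skipped.
node_status() {
  awk -v n="$1" 'NR > 1 && $1 == n {print $2}'
}

# Typical use:
#   kubectl get node | node_status node125
```

Empty output means the node is not registered at all, which is different from a node that is registered but not Ready.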

Here is what the kubelet help says about this option:

[root@node125 kubernetes]# kubelet --help| grep cg
      --cgroup-driver string                                                                                      Driver that the kubelet uses to manipulate cgroups on the host.  Possible values: 'cgroupfs', 'systemd' (default "cgroupfs") (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

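kubelet's default (cgroupfs) does not match what Docker uses here (systemd). I recommend changing Docker's cgroup driver to cgroupfs, with kubelet's cgroupDriver set to the same value. I run that combination in other environments and have not seen a similar problem there.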
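
One common way to switch Docker to cgroupfs is the exec-opts key in /etc/docker/daemon.json. The sketch below only prints the fragment and writes nothing. The helper name print_cgroupfs_daemon_json is hypothetical, and it assumes the driver is not already set through an --exec-opt native.cgroupdriver=... flag in the Docker systemd unit, because dockerd refuses to start when the same option is given both as a flag and in the file.

```shell
# Print a daemon.json fragment that selects the cgroupfs cgroup driver.
# Merge this key into an existing /etc/docker/daemon.json instead of overwriting it.
print_cgroupfs_daemon_json() {
  cat <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
}

# After editing daemon.json and setting cgroupDriver: cgroupfs in kubelet.yaml:
#   systemctl restart docker
#   systemctl restart kubelet
```

Check the result with docker info before restarting kubelet, so that both sides are known to agree.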
