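In a test environment, one node was showing an abnormal status, so I manually ran delete node on it. I then restarted the services to bring the node back into the cluster, but it never rejoined.
- First the blunt approach: restart flanneld, kube-proxy, kubelet and docker. systemctl status showed all of them as healthy, but the node did not join the cluster.
- Reboot the machine. The services showed no obvious errors afterwards, and the node still could not join the cluster.
- Since this was a binary installation, I deleted all of the node's original configuration files and rewrote the configuration parameters from scratch. After restarting the services, the node still did not join the cluster.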
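The restart-and-check step from the first bullet can be sketched roughly as below. The service names come from the list above; the function name and the SYSTEMCTL override are made up for this illustration, and the override exists only so the loop can be dry-run without touching real services.

```shell
# Services named in the first bullet; adjust to what the node actually runs.
services="flanneld docker kubelet kube-proxy"

# restart_and_check (hypothetical helper): restart each service, then ask
# systemd whether it considers the unit active.
# Set SYSTEMCTL=echo to print the commands instead of running them.
restart_and_check() {
    local ctl="${SYSTEMCTL:-systemctl}"
    local svc
    for svc in $services; do
        $ctl restart "$svc"
        $ctl is-active "$svc"
    done
}

# Usage on the node, as root:
#   restart_and_check
```

As the rest of this post shows, an "active" result here does not prove that kubelet is healthy: a unit that crashes and is restarted by systemd can still look fine at the moment it is checked.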
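With nothing else left to try, the only option was to read the logs carefully.
1. systemctl status reports the services as healthy even though something is clearly wrong, so the next place to look is messages.
A quick analysis: if the node cannot join the cluster, kubelet is the likely culprit.
The log contains: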
Aug 6 16:33:22 node125 kubelet: E0806 16:33:22.591193 20451 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to get node info: node "node125" not found
Aug 6 16:33:22 node125 kubelet: I0806 16:33:22.595423 20451 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
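Given this error, I checked the configuration file and the hosts resolution, and both were fine. Starting with kubelet, the log shows it restarting over and over with no other obvious errors, so the problem is probably on the docker side.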
Aug 6 16:28:32 node125 kubelet: E0806 16:28:32.073254 19188 node_container_manager_linux.go:50] Failed to create ["kubepods"] cgroup
Aug 6 16:28:32 node125 kubelet: F0806 16:28:32.073276 19188 kubelet.go:1372] Failed to start ContainerManager Cannot set property TasksAccounting, or unknown property.
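Because the kubelet keeps restarting, the interesting lines are easy to miss among the informational ones. A minimal sketch for pulling out only the error (E) and fatal (F) entries, assuming the syslog format shown in the excerpts above; the function name is hypothetical.

```shell
# kubelet_errors (hypothetical helper): read syslog-formatted text on stdin
# and keep only kubelet lines logged at error (E) or fatal (F) level.
# The pattern relies on the "kubelet: F0806 ..." prefix seen in this log.
kubelet_errors() {
    grep -E 'kubelet: [EF][0-9]{4} '
}

# Usage, assuming the log lives in /var/log/messages as on CentOS 7:
#   kubelet_errors < /var/log/messages | tail -n 20
```

The fatal line is usually the one that explains a restart loop, here the ContainerManager failure.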
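2. This is a docker problem, and it prevents the k8s kubelet from starting properly. Here is my configuration.
docker configuration (output of docker info):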
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 28
Server Version: 1.13.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: e9c345b3f906d5dc5e8100b05ce37073a811c74a (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: 5b117de7f824f3d3825737cf09581645abbe35d4 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
seccomp
WARNING: You're not using the default seccomp profile
Profile: /etc/docker/seccomp.json
Kernel Version: 4.4.54-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 4
Total Memory: 7.797 GiB
Name: node125
ID: V442:WMKW:IUXK:P57X:ZFRG:NKVS:6ROQ:U7FA:LAMQ:GAPS:CKJ5:B6PI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
192.168.100.119
registry:5000
127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)
docker is using: Cgroup Driver: systemd
kubelet configuration:
[root@node125 kubernetes]# cat kubelet.yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 192.168.60.125
port: 10250
readOnlyPort: 10255
clusterDNS:
- 10.254.10.20
clusterDomain: cluster.local
cgroupDriver: systemd
kubeletCgroups: /systemd/system.slice
failSwapOn: false
authentication:
  webhook:
    enabled: false
    cacheTTL: "2m0s"
  anonymous:
    enabled: true
Both sides use cgroupDriver: systemd, so they are consistent and that is not the problem.
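The comparison done by eye above can also be scripted. This is only a sketch: the three function names are invented for illustration, and the kubelet parser assumes cgroupDriver is a top-level, unquoted key as in the kubelet.yaml shown here.

```shell
# docker_cgroup_driver (hypothetical): extract the driver from `docker info` text on stdin.
docker_cgroup_driver() {
    sed -n 's/^ *Cgroup Driver: *//p'
}

# kubelet_cgroup_driver (hypothetical): extract cgroupDriver from a
# KubeletConfiguration file on stdin. Assumes an unquoted top-level key.
kubelet_cgroup_driver() {
    sed -n 's/^cgroupDriver: *//p'
}

# same_driver (hypothetical): succeed only if both values are non-empty and equal.
same_driver() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage, run from the directory that holds kubelet.yaml:
#   d=$(docker info 2>/dev/null | docker_cgroup_driver)
#   k=$(kubelet_cgroup_driver < kubelet.yaml)
#   same_driver "$d" "$k" && echo "match: $d" || echo "mismatch: docker=$d kubelet=$k"
```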
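So the error is reported by kubelet, but the cause lies on the docker side. After looking into it, docker really does have this problem: it is a version issue, a bug that affects this cgroup driver type. The message "Cannot set property TasksAccounting, or unknown property" suggests that the installed systemd does not recognize the TasksAccounting property, which fits the fix below.
Solution: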
yum update systemd
systemctl restart docker
Then restart the kubelet service and check the log:
Aug 6 16:33:22 node125 kubelet: E0806 16:33:22.591193 20451 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to get node info: node "node125" not found
Aug 6 16:33:22 node125 kubelet: I0806 16:33:22.595423 20451 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Aug 6 16:33:22 node125 kubelet: E0806 16:33:22.595487 20451 kubelet.go:2252] node "node125" not found
Aug 6 16:33:22 node125 kubelet: I0806 16:33:22.597140 20451 kubelet_node_status.go:72] Attempting to register node node125
Aug 6 16:33:22 node125 kubelet: I0806 16:33:22.599621 20451 plugin_manager.go:116] Starting Kubelet Plugin Manager
Aug 6 16:33:22 node125 kubelet: I0806 16:33:22.602856 20451 kubelet_node_status.go:75] Successfully registered node node125
Aug 6 16:33:22 node125 kubelet: I0806 16:33:22.695543 20451 reconciler.go:150] Reconciler: start to sync state
Running kubectl get node again now shows the node.
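To confirm the registration from the node itself, without scrolling through the log, something like the following could be used. It is a sketch that depends on the exact "Successfully registered node" wording in the log above; the function name is hypothetical.

```shell
# registered_nodes (hypothetical helper): read kubelet log text on stdin and
# print the node name from every "Successfully registered node <name>" line.
registered_nodes() {
    sed -n 's/.*Successfully registered node \([^ ]*\).*/\1/p'
}

# Usage on the node, then cross-check from a machine with cluster access:
#   registered_nodes < /var/log/messages | tail -n 1
#   kubectl get node node125
```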
Here is what the kubelet help says about it:
[root@node125 kubernetes]# kubelet --help| grep cg
--cgroup-driver string Driver that the kubelet uses to manipulate cgroups on the host. Possible values: 'cgroupfs', 'systemd' (default "cgroupfs") (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
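The kubelet default (cgroupfs) does not match the driver docker uses on this machine (systemd). I recommend changing docker's cgroup driver to cgroupfs; I use that setting in other environments and have not run into a similar problem there.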
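One way to make that change is through Docker's daemon configuration file, sketched below. The function name is hypothetical, and the sketch assumes there is no existing daemon.json worth preserving (the file is overwritten) and that the driver is not also set with --exec-opt in the docker service unit, since setting it in both places may keep the daemon from starting.

```shell
# write_docker_cgroupfs_config (hypothetical helper): write a daemon.json that
# selects the cgroupfs cgroup driver. $1 is the target path; Docker normally
# reads /etc/docker/daemon.json.
# Assumption: any existing file at that path can be overwritten.
write_docker_cgroupfs_config() {
    cat > "$1" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
}

# Usage, as root:
#   write_docker_cgroupfs_config /etc/docker/daemon.json
#   systemctl restart docker
#   docker info | grep 'Cgroup Driver'
```

The kubelet side has to agree: with docker on cgroupfs, cgroupDriver in kubelet.yaml must be set to cgroupfs as well, followed by a kubelet restart.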