Docker fails to start
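Background: while I was learning to run a k8s cluster, a node's status changed to NotReady for reasons I never worked out, so I rebooted the virtual machine. After the reboot, kubelet would no longer start.
Foreshadowing: in an earlier K8s article, while using Docker to set up an image registry, I had edited the /etc/docker/daemon.json file.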
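Troubleshooting with journalctl -xefu kubelet
Key point: the command journalctl -xefu kubelet shows kubelet's runtime logs (-u kubelet limits the output to the kubelet unit, -f keeps following new entries, -e jumps to the end of the journal, and -x adds explanatory text from the message catalog).
Running journalctl -xefu kubelet showed that the underlying cause was that Docker was not running: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?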
[root@k8s-master01 ~]# journalctl -xefu kubelet
-- Logs begin at 四 2021-04-01 21:31:05 CST. --
4月 01 21:31:16 k8s-master01 systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
4月 01 21:31:29 k8s-master01 kubelet[771]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
4月 01 21:31:29 k8s-master01 kubelet[771]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
4月 01 21:31:30 k8s-master01 kubelet[771]: I0401 21:31:30.188245 771 server.go:425] Version: v1.15.1
4月 01 21:31:30 k8s-master01 kubelet[771]: I0401 21:31:30.188669 771 plugins.go:103] No cloud provider specified.
4月 01 21:31:30 k8s-master01 kubelet[771]: I0401 21:31:30.188736 771 server.go:791] Client rotation is on, will bootstrap in background
4月 01 21:31:30 k8s-master01 kubelet[771]: I0401 21:31:30.252570 771 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.599474 771 server.go:661] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.612966 771 container_manager_linux.go:261] container manager verified user specified cgroup-root exists: []
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.613077 771 container_manager_linux.go:266] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.613268 771 container_manager_linux.go:286] Creating device plugin manager: true
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.613460 771 state_mem.go:36] [cpumanager] initializing new in-memory state store
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.625560 771 state_mem.go:84] [cpumanager] updated default cpuset: ""
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.625592 771 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.680045 771 kubelet.go:281] Adding pod path: /etc/kubernetes/manifests
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.680287 771 kubelet.go:306] Watching apiserver
4月 01 21:31:38 k8s-master01 kubelet[771]: E0401 21:31:38.700168 771 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.15.154:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master01&limit=500&resourceVersion=0: dial tcp 192.168.15.154:6443: connect: connection refused
4月 01 21:31:38 k8s-master01 kubelet[771]: E0401 21:31:38.700251 771 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168.15.154:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.15.154:6443: connect: connection refused
4月 01 21:31:38 k8s-master01 kubelet[771]: E0401 21:31:38.701128 771 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://192.168.15.154:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master01&limit=500&resourceVersion=0: dial tcp 192.168.15.154:6443: connect: connection refused
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.796071 771 client.go:75] Connecting to docker on unix:///var/run/docker.sock
4月 01 21:31:38 k8s-master01 kubelet[771]: I0401 21:31:38.796107 771 client.go:104] Start docker client with request timeout=2m0s
4月 01 21:31:38 k8s-master01 kubelet[771]: F0401 21:31:38.796522 771 server.go:273] failed to run Kubelet: failed to create kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
4月 01 21:31:38 k8s-master01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
4月 01 21:31:38 k8s-master01 systemd[1]: Unit kubelet.service entered failed state.
4月 01 21:31:38 k8s-master01 systemd[1]: kubelet.service failed.
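The fatal line (server.go:273) says kubelet could not reach Docker over unix:///var/run/docker.sock. A minimal Python sketch like the one below, assuming Python 3 on the node and the default socket path taken from that log line (docker_socket_reachable is a hypothetical helper name), checks whether anything is listening on that socket at all, which separates "dockerd is down" from "kubelet is misconfigured":

```python
import socket

# Socket path taken from the kubelet error above (assumed default location).
DOCKER_SOCK = "/var/run/docker.sock"


def docker_socket_reachable(path: str = DOCKER_SOCK, timeout: float = 2.0) -> bool:
    """Return True if some process accepts connections on the Unix socket at `path`.

    This only checks that a listener exists; it does not speak the Docker API.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # FileNotFoundError (no socket file), ConnectionRefusedError (stale socket
        # file with no listener), PermissionError and timeouts all end up here.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    ok = docker_socket_reachable()
    print(f"{DOCKER_SOCK}: {'reachable' if ok else 'NOT reachable (is dockerd running?)'}")
```

Run as root right after the reboot, this would be expected to report the socket as not reachable, pointing the investigation at dockerd rather than at kubelet. systemctl status docker gives the same answer more directly; the sketch only spells out what the kubelet error means.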
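So I ran systemctl start docker to restart Docker, and found that Docker would not start. Searching Baidu for the error "Job for docker.service failed because the control process exited with error" turned up advice to uninstall and reinstall Docker; but since Docker had been running fine before, I didn't think reinstalling was the real fix.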
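In the comment section of one of those articles, someone suggested changing the daemon file as follows:
It may also be that the kernel version is fine; if it still fails, then the container engine is failing. Edit /etc/docker/daemon.json to { "storage-driver": "devicemapper" } and /etc/sysconfig/docker-storage to DOCKER_STORAGE_OPTIONS="--selinux-enabled --log-driver=journald --signature-verification=false"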
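That made me think the problem might be my own daemon file. Checking it, I found that when I added the registry address last time, I had left out the comma after "insecure-registries": ["10.185.16.62"]. Without that comma the file is not valid JSON, so dockerd cannot load its configuration and exits right away. After adding the comma back and restarting Docker, it started successfully. Here is the file before the fix: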
[root@k8s-master01 ~]# cat /etc/docker/daemon.json
{
"insecure-registries": ["10.185.16.62"]
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {"max-size": "100m"},
"registry-mirrors": ["https://dca8zh55.mirror.aliyuncs.com"]
}
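A missing comma like this is easy to overlook by eye. As a hedged sketch, assuming Python 3 is available on the host (check_daemon_json is a hypothetical helper, not part of Docker), the standard json module can check the file before Docker is restarted and report where parsing fails:

```python
import json
import sys
from typing import Optional

DEFAULT_PATH = "/etc/docker/daemon.json"


def check_daemon_json(text: str) -> Optional[str]:
    """Return None if `text` parses as a JSON object, otherwise a short error
    message with the line and column where parsing stopped."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        # lineno/colno point at the token where the parser gave up. For a missing
        # comma that is the start of the *next* key, not the end of the bad line.
        return f"line {e.lineno}, column {e.colno}: {e.msg}"
    if not isinstance(data, dict):
        # daemon.json is expected to hold a single JSON object of settings.
        return "top level must be a JSON object"
    return None


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PATH
    with open(path, encoding="utf-8") as f:
        error = check_daemon_json(f.read())
    if error:
        print(f"{path}: {error}")
        sys.exit(1)
    print(f"{path}: OK")
```

Run against the file shown above, this should report an "Expecting ',' delimiter" error on line 3, the "exec-opts" line, which sits right after the spot where the comma is missing. Without writing any code, python3 -m json.tool /etc/docker/daemon.json performs the same parse and prints the same kind of error.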
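Below is the output of starting Docker before and after fixing the daemon file, followed by kubelet's status once Docker was back up.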
[root@k8s-master01 ~]# systemctl start docker
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@k8s-master01 ~]# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since 四 2021-04-01 21:39:05 CST; 1min 24s ago
Docs: https://docs.docker.com
Process: 9369 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Main PID: 9369 (code=exited, status=1/FAILURE)
4月 01 21:39:02 k8s-master01 systemd[1]: docker.service failed.
4月 01 21:39:05 k8s-master01 systemd[1]: docker.service holdoff time over, scheduling restart.
4月 01 21:39:05 k8s-master01 systemd[1]: Stopped Docker Application Container Engine.
4月 01 21:39:05 k8s-master01 systemd[1]: start request repeated too quickly for docker.service
4月 01 21:39:05 k8s-master01 systemd[1]: Failed to start Docker Application Container Engine.
4月 01 21:39:05 k8s-master01 systemd[1]: Unit docker.service entered failed state.
4月 01 21:39:05 k8s-master01 systemd[1]: docker.service failed.
4月 01 21:39:05 k8s-master01 systemd[1]: start request repeated too quickly for docker.service
4月 01 21:39:05 k8s-master01 systemd[1]: Failed to start Docker Application Container Engine.
4月 01 21:39:05 k8s-master01 systemd[1]: docker.service failed.
[root@k8s-master01 ~]# vim /etc/docker/daemon.json
[root@k8s-master01 ~]# systemctl start docker
[root@k8s-master01 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since 四 2021-04-01 21:43:40 CST; 3s ago
Docs: https://docs.docker.com
Main PID: 15124 (dockerd)
Tasks: 35
Memory: 68.9M
CGroup: /system.slice/docker.service
└─15124 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.053487466+08:00" level=info msg="Removing stale sandbox 9296344e9aae8302d0de3ccc0ea01bc208562b49db4f1...699c6d0034)"
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.054276607+08:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deletin...etrying...."
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.195547264+08:00" level=info msg="Removing stale sandbox a0b4b3f764ded753ea5e611f9eade76067f8c29c0ae43...71e52defa9)"
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.196283175+08:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deletin...etrying...."
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.226445462+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/1... IP address"
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.261766558+08:00" level=info msg="Loading containers: done."
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.584357782+08:00" level=info msg="Docker daemon" commit=4484c46d9d graphdriver(s)=overlay2 version=19.03.13
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.615556594+08:00" level=info msg="Daemon has completed initialization"
4月 01 21:43:40 k8s-master01 systemd[1]: Started Docker Application Container Engine.
4月 01 21:43:40 k8s-master01 dockerd[15124]: time="2021-04-01T21:43:40.648021852+08:00" level=info msg="API listen on /var/run/docker.sock"
Hint: Some lines were ellipsized, use -l to show in full.
[root@k8s-master01 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 四 2021-04-01 21:43:42 CST; 9min ago
Docs: https://kubernetes.io/docs/
Main PID: 15542 (kubelet)
Tasks: 21
Memory: 60.4M
CGroup: /system.slice/kubelet.service
└─15542 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgrou...
4月 01 21:44:23 k8s-master01 kubelet[15542]: E0401 21:44:23.635179 15542 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to ...
4月 01 21:44:23 k8s-master01 kubelet[15542]: E0401 21:44:23.635188 15542 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "coredns-5c98db65d4-kgrls_kube-system(1d66aa94-811c-4959-...
4月 01 21:44:23 k8s-master01 kubelet[15542]: E0401 21:44:23.635199 15542 kuberuntime_manager.go:692] createPodSandbox for pod "coredns-5c98db65d4-kgrls_kube-system(1d66aa94-811c-4959...
4月 01 21:44:23 k8s-master01 kubelet[15542]: E0401 21:44:23.635215 15542 pod_workers.go:190] Error syncing pod 1d66aa94-811c-4959-ac96-c00c7a69ef02 ("coredns-5c98db65d4-k...959-ac96-c00
4月 01 21:44:23 k8s-master01 kubelet[15542]: W0401 21:44:23.874880 15542 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook ...
4月 01 21:44:23 k8s-master01 kubelet[15542]: W0401 21:44:23.881986 15542 pod_container_deletor.go:75] Container "274bed3f47f840a4b1db40813cdb6997a6c815c167c44310e1a66f339...s containers
4月 01 21:44:23 k8s-master01 kubelet[15542]: W0401 21:44:23.883761 15542 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the ...f339752defc"
4月 01 21:44:23 k8s-master01 kubelet[15542]: W0401 21:44:23.886376 15542 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook ...
4月 01 21:44:23 k8s-master01 kubelet[15542]: W0401 21:44:23.899020 15542 pod_container_deletor.go:75] Container "ff30b90f449ddceaecbd4396e652937582b76138ceeed6bd36d871f76...s containers
4月 01 21:44:23 k8s-master01 kubelet[15542]: W0401 21:44:23.903519 15542 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the ...1f76014bdfb"
Hint: Some lines were ellipsized, use -l to show in full.
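Summary: the configuration files that Docker and k8s depend on must be strictly well-formed. daemon.json in particular must be valid JSON, and a single missing comma is enough to keep dockerd (and therefore kubelet) from starting. Be careful!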
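One way to rule out comma mistakes entirely is to stop hand-editing daemon.json and generate it with a JSON serializer instead. A minimal sketch, assuming Python 3 and reusing the settings from the daemon.json shown earlier (write_daemon_json is a hypothetical helper, and it overwrites the whole file with exactly these keys):

```python
import json
import os
import sys
import tempfile

# Same settings as the daemon.json shown earlier in this post, with the comma fixed.
SETTINGS = {
    "insecure-registries": ["10.185.16.62"],
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {"max-size": "100m"},
    "registry-mirrors": ["https://dca8zh55.mirror.aliyuncs.com"],
}


def write_daemon_json(settings: dict, path: str) -> str:
    """Serialize `settings` with json.dumps, so commas and quoting are always
    valid, then atomically replace `path`. Returns the text that was written."""
    text = json.dumps(settings, indent=2) + "\n"
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename it over the target,
    # so an interrupted run never leaves a half-written daemon.json behind.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".daemon.json.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.chmod(tmp, 0o644)  # mkstemp creates files as 0600; use normal config permissions
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return text


if __name__ == "__main__":
    if len(sys.argv) > 1:
        write_daemon_json(SETTINGS, sys.argv[1])  # e.g. /etc/docker/daemon.json (needs root)
    else:
        print(json.dumps(SETTINGS, indent=2))  # dry run: only show what would be written
```

Docker still has to be restarted (systemctl restart docker) to apply the new file. The temp-file-plus-os.replace step is a design choice that keeps the old file intact if the script is interrupted mid-write.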