CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container
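To solve this problem, I tried the following:
- Restarted Calico (no change)
- Restarted Docker and the kubelet (a new OOM message appeared)
- Raised the Pod's resource requests and limits (problem solved)
- Lowered the requests and limits again (problem came back)
So the problem in the title is tied to the resource limits being too small. However, neither the Pod's logs nor the output of network plugins such as Calico pointed directly at the cause.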
Pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: downward
spec:
  containers:
  - name: main
    image: busybox:1.28.4
    imagePullPolicy: IfNotPresent
    command: ["sleep", "9999999"]
    resources:
      requests:
        cpu: 15m
        memory: 100Ki
      limits:
        cpu: 100m
        memory: 4Mi
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          fieldPath: spec.serviceAccountName
    - name: CONTAINER_CPU_REQUEST_MILLICORES
      valueFrom:
        resourceFieldRef:
          resource: requests.cpu
          divisor: 1m
    - name: CONTAINER_MEMORY_LIMIT_KIBIBYTES
      valueFrom:
        resourceFieldRef:
          resource: limits.memory
          divisor: 1Ki
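The two resourceFieldRef variables expose the container's own CPU request and memory limit divided by divisor. As I understand it, the kubelet rounds the quotient up to an integer, so with the values above the container should see CONTAINER_CPU_REQUEST_MILLICORES=15 and CONTAINER_MEMORY_LIMIT_KIBIBYTES=4096. The Python sketch below reproduces that arithmetic; parse_quantity and downward_value are hypothetical helpers that understand only a small subset of Kubernetes quantity suffixes, not any real client library.

```python
import math
import re
from fractions import Fraction

# Hypothetical helper table: only a small subset of Kubernetes quantity suffixes.
_SUFFIXES = {
    "": Fraction(1),
    "m": Fraction(1, 1000),  # milli, typically used for CPU
    "k": Fraction(1000), "M": Fraction(10**6), "G": Fraction(10**9),
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30),
}


def parse_quantity(text: str) -> Fraction:
    """Parse '15m', '100Ki', '4Mi', ... into an exact rational number."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", text.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {text!r}")
    return Fraction(match.group(1)) * _SUFFIXES[match.group(2)]


def downward_value(resource_value: str, divisor: str) -> int:
    """Resource value divided by divisor, rounded up (assumed kubelet behaviour)."""
    return math.ceil(parse_quantity(resource_value) / parse_quantity(divisor))


if __name__ == "__main__":
    # Values taken from the Pod definition above.
    print("CONTAINER_CPU_REQUEST_MILLICORES =", downward_value("15m", "1m"))
    print("CONTAINER_MEMORY_LIMIT_KIBIBYTES =", downward_value("4Mi", "1Ki"))
```

Running it prints 15 and 4096. Exact fractions are used so that values like 15m do not pick up floating-point error before rounding.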
Pod events:
Warning FailedCreatePodSandBox 0s kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ba50e2ad6faf226a6680e2ba61179c626825141034b63611265463d71de57ae1" network for pod "downward": networkPlugin cni failed to set up pod "downward_kubernetes-dashboard" network: CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "ba50e2ad6faf226a6680e2ba61179c626825141034b63611265463d71de57ae1"
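Troubleshooting:
I searched extensively online without finding an answer; admittedly, I am not familiar with K8S or Calico.
On Stack Overflow I found a similar question, but the answer was not very clear. It read: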
I solve this problem. The cause was argo Cd prune setting
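I also found a related Red Hat Bugzilla report, but I did not read it carefully; it seemed to say this was not a real issue, or not the same issue?
On Stack Overflow I saw another post, which said: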
Ensure that /etc/cni/net.d and its /opt/cni/bin friend both exist and are correctly populated with the CNI configuration files and binaries on all Nodes. For flannel specifically, one might make use of the flannel cni repo
Don’t overlook the are correctly populated part. For that /var/lib/calico/nodename error specifically, be careful if you are using a DaemonSet to configure calico: k8s may try to schedule that frontend Pod on the Node before the DaemonSet has finished configuring it. In that specific circumstance, just deleting all the Pods from the Node after calico is successfully configured will cure that sandbox problem
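The checks suggested in that answer can be scripted and run on each node. This is a minimal sketch: check_cni_node is a hypothetical helper, the three paths come from the quoted answer, and the Calico nodename check only makes sense on Calico installs (a flannel node would not have that file).

```python
from pathlib import Path

# Paths taken from the quoted Stack Overflow answer; adjust for your setup.
CNI_CONF_DIR = Path("/etc/cni/net.d")
CNI_BIN_DIR = Path("/opt/cni/bin")
CALICO_NODENAME = Path("/var/lib/calico/nodename")


def check_cni_node(conf_dir: Path = CNI_CONF_DIR,
                   bin_dir: Path = CNI_BIN_DIR,
                   nodename: Path = CALICO_NODENAME) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for label, directory in (("CNI config dir", conf_dir), ("CNI binary dir", bin_dir)):
        if not directory.is_dir():
            problems.append(f"{label} missing: {directory}")
        elif not any(directory.iterdir()):
            # The directory exists but is not "correctly populated".
            problems.append(f"{label} is empty: {directory}")
    if not nodename.is_file() or not nodename.read_text().strip():
        problems.append(f"Calico nodename file missing or empty: {nodename}")
    return problems


if __name__ == "__main__":
    for problem in check_cni_node():
        print(problem)
```

A directory that merely exists is not enough, which is why empty directories are reported separately, matching the answer's "correctly populated" warning.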
Still unable to find the cause, I restarted the kubelet and Docker, after which these new errors appeared:
Warning FailedMount 77s (x5 over 88s) kubelet MountVolume.SetUp failed for volume "kube-api-access-rj9w6" : failed to fetch token: Post "https://10.0.12.10:6443/api/v1/namespaces/kubernetes-dashboard/serviceaccounts/default/token": dial tcp 10.0.12.10:6443: connect: connection refused
Normal SandboxChanged 52s (x10 over 68s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 52s (x9 over 67s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "downward": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: container init was OOM-killed (memory limit too low?): unknown
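The FailedMount event says the kubelet got "connection refused" when posting to the API server at 10.0.12.10:6443. Presumably the control plane was still coming back up after the restart, but that is an assumption the events above do not confirm. A plain TCP connect is a quick way to tell whether anything is listening on that port; the helper name api_server_reachable and the 2-second timeout are illustrative choices.

```python
import socket


def api_server_reachable(host: str, port: int = 6443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False
```

On the node, api_server_reachable("10.0.12.10") returning False would match the "connection refused" in the event. It only tests TCP reachability, not TLS or authentication.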
Could the resource limits simply be too small? I tried changing them to:
resources:
  requests:
    cpu: 15m
    memory: 100Mi
  limits:
    cpu: 100m
    memory: 400Mi
The Pod started normally!
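To put the change in numbers, this small Python sketch compares the memory values before and after. The memory_bytes helper is hypothetical and only handles the Ki/Mi/Gi suffixes used in this post.

```python
import re

# Binary suffixes only; enough for the memory values in this post.
_BINARY = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_bytes(quantity: str) -> int:
    """Convert e.g. '4Mi' to bytes (hypothetical helper, integer values only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    return int(match.group(1)) * _BINARY[match.group(2) or ""]


before = {"request": "100Ki", "limit": "4Mi"}
after = {"request": "100Mi", "limit": "400Mi"}

for key in ("request", "limit"):
    old, new = memory_bytes(before[key]), memory_bytes(after[key])
    print(f"{key}: {old} -> {new} bytes ({new // old}x)")
```

It prints a 1024x increase for the request (102400 to 104857600 bytes) and a 100x increase for the limit (4194304 to 419430400 bytes).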