I have a k3s cluster deployed on an openEuler RISC-V development board, but the coredns pod fails to start and pods cannot ping each other.
The cluster has two nodes; the default state of the coredns pod on the master is as follows:
local-path-provisioner-6d44f4f9d7-z5b9c 0/1 CrashLoopBackOff 1089 (3m37s ago) 35d 10.42.0.25 openeuler-riscv64 <none> <none>
metrics-server-7c55d89d5d-kpj5h 0/1 CrashLoopBackOff 1077 (2m50s ago) 35d 10.42.0.22 openeuler-riscv64 <none> <none>
helm-install-traefik-crd-hhfn4 0/1 CrashLoopBackOff 820 (119s ago) 35d 10.42.0.21 openeuler-riscv64 <none> <none>
helm-install-traefik-r6bm8 0/1 CrashLoopBackOff 819 (109s ago) 35d 10.42.0.24 openeuler-riscv64 <none> <none>
coredns-97b598894-7l5ff 0/1 CrashLoopBackOff 8 (54s ago) 17m 10.42.0.27 openeuler-riscv64 <none> <none>
kubectl get pods and kubectl describe show the following:
[root@openeuler-riscv64 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-97b598894-tqr5v 0/1 CrashLoopBackOff 9 (15h ago) 15h
metrics-server-7c55d89d5d-kpj5h 0/1 Running 1074 (119s ago) 35d
local-path-provisioner-6d44f4f9d7-z5b9c 0/1 CrashLoopBackOff 1086 (32s ago) 35d
helm-install-traefik-r6bm8 0/1 CrashLoopBackOff 815 (15s ago) 35d
helm-install-traefik-crd-hhfn4 0/1 CrashLoopBackOff 816 (15s ago) 35d
Name: coredns-97b598894-tqr5v
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: coredns
Node: k3s-air1/172.20.10.3
Start Time: Mon, 22 Jan 2024 17:09:59 +0800
Labels: k8s-app=kube-dns
pod-template-hash=97b598894
Annotations: <none>
Status: Terminating (lasts 2m4s)
Termination Grace Period: 30s
IP: 10.42.1.26
IPs:
IP: 10.42.1.26
Controlled By: ReplicaSet/coredns-97b598894
Containers:
coredns:
Container ID: docker://c5386186c0177f658a96df702607bca0f795185cc7438ae29b1065dca1051cbc
Image: carvicsforth/coredns:1.10.1
Image ID: docker-pullable://carvicsforth/coredns@sha256:6cd10cf78af68af9bfebc932c22724a64d4ce0e7ff94738aef6b92df7565f4b1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 22 Jan 2024 17:32:03 +0800
Finished: Mon, 22 Jan 2024 17:32:08 +0800
Ready: False
Restart Count: 9
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dbth2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
DisruptionTarget True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
custom-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns-custom
Optional: true
kube-api-access-dbth2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15h default-scheduler Successfully assigned kube-system/coredns-97b598894-tqr5v to k3s-air1
Normal Pulled 15h (x3 over 15h) kubelet Container image "carvicsforth/coredns:1.10.1" already present on machine
Normal Created 15h (x3 over 15h) kubelet Created container coredns
Normal Started 15h (x3 over 15h) kubelet Started container coredns
Warning Unhealthy 15h (x14 over 15h) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 15h (x95 over 15h) kubelet Back-off restarting failed container coredns in pod coredns-97b598894-tqr5v_kube-system(5f26f744-5697-47c4-a895-7a1dbff23b96)
The kubectl logs output is as follows:
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
Listen: listen tcp :53: bind: permission denied
Running ps -ef showed that coredns was not executing as root, so I edited coredns.yaml and added the following:
+affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: node-role.kubernetes.io/master
+ operator: Exists
nodeSelector:
kubernetes.io/os: linux
......
securityContext:
+ runAsUser: 0
+ runAsGroup: 0
allowPrivilegeEscalation: false
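For context on why runAsUser: 0 makes the bind succeed: on Linux, binding ports below the unprivileged-port threshold (normally 1024) requires UID 0 or the CAP_NET_BIND_SERVICE capability, which is exactly what the "listen tcp :53: bind: permission denied" error was complaining about. A quick sketch to check the threshold on the node (the sysctl path is standard Linux; the 1024 fallback is an assumption for kernels that do not expose it):

```shell
# Ports below net.ipv4.ip_unprivileged_port_start (normally 1024) need UID 0
# or CAP_NET_BIND_SERVICE — this is why a non-root coredns failed to bind :53.
start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)
echo "unprivileged ports start at: $start"
```

A narrower alternative to running as root would be adding only CAP_NET_BIND_SERVICE under the container's securityContext capabilities, which keeps the privilege surface smaller.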
After restarting the coredns pod, the status is as follows:
coredns-65dc9b694c-xx4pf 0/1 Running 0 63m 10.42.0.28 openeuler-riscv64 <none> <none>
helm-install-traefik-crd-hhfn4 0/1 CrashLoopBackOff 829 (3m56s ago) 35d 10.42.0.21 openeuler-riscv64 <none> <none>
helm-install-traefik-r6bm8 0/1 CrashLoopBackOff 828 (3m24s ago) 35d 10.42.0.24 openeuler-riscv64 <none> <none>
local-path-provisioner-6d44f4f9d7-z5b9c 0/1 CrashLoopBackOff 1102 (3m ago) 35d 10.42.0.25 openeuler-riscv64 <none> <none>
metrics-server-7c55d89d5d-kpj5h 0/1 CrashLoopBackOff 1090 (81s ago) 35d 10.42.0.22 openeuler-riscv64 <none> <none>
The status is now Running, but READY is still 0/1. The events shown by kubectl describe are:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m17s (x1850 over 64m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
The logs are as follows:
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.2/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1972775025]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.2/tools/cache/reflector.go:231 (23-Jan-2024 01:36:37.721) (total time: 30001ms):
Trace[1972775025]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout 30001ms (01:37:07.723)
Trace[1972775025]: [30.001897321s] [30.001897321s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.2/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
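The i/o timeout to 10.43.0.1:443 (the in-cluster kubernetes service VIP under the k3s default service CIDR) points at the pod network rather than coredns itself. A small probe helper for checking raw TCP reachability from the node is sketched below; the 10.43.0.1:443 target assumes k3s defaults, and the vxlan check is an assumption that flannel is running with its default VXLAN backend, which on RISC-V kernels may be missing:

```shell
# Minimal TCP reachability probe using bash's /dev/tcp redirection;
# prints "reachable" or "unreachable" for a given host and port.
probe() {
  timeout 3 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null \
    && echo "reachable" || echo "unreachable"
}
# On the node you would run, for example:
#   probe 10.43.0.1 443    # service VIP of the API server
#   lsmod | grep vxlan     # flannel's default VXLAN backend needs this module
```

If the probe fails from the node itself, the problem sits in kube-proxy/iptables or the CNI overlay, and fixing coredns's own configuration will not help.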
Pods still cannot ping each other. Has anyone run into a similar problem?