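Description: right after initializing the cluster and deploying calico, I found two pods stuck at 0/1 READY. Errors like this cling like a sticky plaster and are very annoying.
The problem was caused by the node having multiple NICs. I checked my node and it had seven or eight of them, so I pinned autodetection to eth1 and re-applied calico.yaml, after which the pods came up.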
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth1"
Below is a different error, but the fix is the same.
List the pods, then check the log of any one of the calico-node pods:
[root@1 ~]# kubectl get pod -n kube-system -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-6d4bfc7c57-87gnm 0/1 ContainerCreating 0 3m53s <none> 10-1-0-34-mesh-node <none> <none>
calico-node-bvhjc 0/1 CrashLoopBackOff 5 3m54s 10.1.0.34 10-1-0-34-mesh-node <none> <none>
calico-node-zp9tv 0/1 CrashLoopBackOff 5 3m54s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
coredns-9d85f5447-kcp7d 0/1 ContainerCreating 0 4m35s <none> 10-1-0-34-mesh-node <none> <none>
coredns-9d85f5447-qc2kh 0/1 ContainerCreating 0 4m35s <none> 10-1-0-34-mesh-node <none> <none>
etcd-10-1-0-42-mesh-master 1/1 Running 0 4m41s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-apiserver-10-1-0-42-mesh-master 1/1 Running 0 4m40s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-controller-manager-10-1-0-42-mesh-master 1/1 Running 0 4m40s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-proxy-blm42 1/1 Running 0 4m14s 10.1.0.34 10-1-0-34-mesh-node <none> <none>
kube-proxy-gcj7k 1/1 Running 0 4m35s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-scheduler-10-1-0-42-mesh-master 1/1 Running 0 4m41s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
[root@1 ~]#
[root@1 ~]# kubectl logs -f calico-node-zp9tv -n kube-system
2021-04-28 03:34:31.876 [INFO][9] startup/startup.go 379: Early log level set to info
2021-04-28 03:34:31.876 [INFO][9] startup/startup.go 395: Using NODENAME environment for node name
2021-04-28 03:34:31.876 [INFO][9] startup/startup.go 407: Determined node name: 10-1-0-42-mesh-master
2021-04-28 03:34:31.877 [INFO][9] startup/startup.go 439: Checking datastore connection
2021-04-28 03:34:31.883 [INFO][9] startup/startup.go 463: Datastore connection verified
2021-04-28 03:34:31.883 [INFO][9] startup/startup.go 112: Datastore is ready
2021-04-28 03:34:31.886 [INFO][9] startup/customresource.go 101: Error getting resource Key=GlobalFelixConfig(name=CalicoVersion) Name="calicoversion" Resource="GlobalFelixConfigs" error=the server could not find the requested resource (get GlobalFelixConfigs.crd.projectcalico.org calicoversion)
2021-04-28 03:34:31.904 [INFO][9] startup/startup.go 505: Initialize BGP data
2021-04-28 03:34:31.905 [WARNING][9] startup/startup.go 769: Unable to auto-detect an IPv4 address using interface regexes [ens.*]: no valid host interfaces found
2021-04-28 03:34:31.905 [WARNING][9] startup/startup.go 527: Couldn't autodetect an IPv4 address. If auto-detecting, choose a different autodetection method. Otherwise provide an explicit address.
2021-04-28 03:34:31.905 [INFO][9] startup/startup.go 343: Clearing out-of-date IPv4 address from this node IP=""
2021-04-28 03:34:31.909 [WARNING][9] startup/startup.go 1331: Terminating
Calico node failed to start
You can see this error: startup/startup.go 769: Unable to auto-detect an IPv4 address using interface regexes [ens.*]: no valid host interfaces found
The interface regex [ens.*] did not match any interface. Check which NIC this host actually uses:
[root@1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc mq state UP group default qlen 1000
link/ether fa:16:3e:a7:11:9a brd ff:ff:ff:ff:ff:ff
inet 10.1.0.42/24 brd 10.1.0.255 scope global noprefixroute dynamic eth0
valid_lft 65868sec preferred_lft 65868sec
inet6 fe80::d662:97b7:3976:db84/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:ab:79:73:d5 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
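To make the mismatch concrete, here is a minimal sketch that tests the interface names on a host against a pattern. It only approximates the check with grep -E; it is not Calico's own matching code, and the helper name iface_matches and the two sample patterns are assumptions chosen for illustration.

```shell
# Hypothetical helper, not part of Calico: succeeds if the
# interface name ($2) matches the extended regex ($1).
iface_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Walk the interfaces the kernel exposes under /sys/class/net
# and report which of the sample patterns each one matches.
for path in /sys/class/net/*; do
  [ -e "$path" ] || continue      # skip if the glob matched nothing
  name="${path##*/}"              # strip the directory part
  for pattern in 'ens.*' 'eth.*'; do
    if iface_matches "$pattern" "$name"; then
      echo "$name matches $pattern"
    fi
  done
done
```

On a host like the one above, whose interfaces are lo, eth0 and docker0, nothing matches ens.*, which is consistent with the "no valid host interfaces found" warning in the log.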
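Sure enough, this host uses eth0, so a pattern of ens.* can never match. Edit the calico configuration (the env section of calico-node in calico.yaml):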
env:
  - name: IP_AUTODETECTION_METHOD
    value: "interface=ens.*"
  # Use Kubernetes API as the backing datastore.
  - name: DATASTORE_TYPE
    value: "kubernetes"
  # Wait for the datastore.
  - name: WAIT_FOR_DATASTORE
    value: "true"
  # Set based on the k8s node name.
  - name: NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  # Choose the backend to use.
  - name: CALICO_NETWORKING_BACKEND
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: calico_backend
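Change value: "interface=ens.*" so that it names the NIC directly (on this host, "interface=eth0"). A pattern ending in .* (on this host, "interface=eth.*") also works.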
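If you would rather not edit calico.yaml by hand, the same variable can probably be set on the running DaemonSet with kubectl set env. This is a sketch, not what I did above: it assumes the DaemonSet is named calico-node in kube-system (which matches the pod names in the listing), and the wrapper function set_calico_iface is a made-up name.

```shell
# Hypothetical wrapper: set the autodetection interface on the
# calico-node DaemonSet, then wait for the pods to be replaced.
# $1 = interface name or pattern, e.g. eth0 or eth.*
# Assumes the DaemonSet is kube-system/calico-node.
set_calico_iface() {
  kubectl set env daemonset/calico-node -n kube-system \
    "IP_AUTODETECTION_METHOD=interface=$1" &&
  kubectl rollout status daemonset/calico-node -n kube-system
}
```

Calling set_calico_iface eth0 would then trigger a rolling restart of the calico-node pods. Note that a later kubectl apply -f calico.yaml puts back whatever value the file contains, so keep the file in sync.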
After updating calico, check the pods again:
[root@1 ~]# kubectl get pod -n kube-system -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-6d4bfc7c57-87gnm 1/1 Running 0 60m 192.168.22.65 10-1-0-34-mesh-node <none> <none>
calico-node-785cd 1/1 Running 0 27s 10.1.0.42 10-1-0-42-mesh-master <none> <none>
calico-node-vb5m7 1/1 Running 0 27s 10.1.0.34 10-1-0-34-mesh-node <none> <none>
coredns-9d85f5447-kcp7d 1/1 Running 0 60m 192.168.22.66 10-1-0-34-mesh-node <none> <none>
coredns-9d85f5447-qc2kh 1/1 Running 0 60m 192.168.22.67 10-1-0-34-mesh-node <none> <none>
etcd-10-1-0-42-mesh-master 1/1 Running 0 60m 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-apiserver-10-1-0-42-mesh-master 1/1 Running 0 60m 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-controller-manager-10-1-0-42-mesh-master 1/1 Running 0 60m 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-proxy-blm42 1/1 Running 0 60m 10.1.0.34 10-1-0-34-mesh-node <none> <none>
kube-proxy-gcj7k 1/1 Running 0 60m 10.1.0.42 10-1-0-42-mesh-master <none> <none>
kube-scheduler-10-1-0-42-mesh-master 1/1 Running 0 60m 10.1.0.42 10-1-0-42-mesh-master <none> <none>
Done.