Symptom
The calico-node pod on the worker node cannot start and restarts repeatedly. The log output:
kubectl logs -f calico-node-hv4sf -nkube-system
2020-12-02 13:20:13.067 [INFO][8] startup.go 259: Early log level set to info
2020-12-02 13:20:13.067 [INFO][8] startup.go 275: Using NODENAME environment for node name
2020-12-02 13:20:13.067 [INFO][8] startup.go 287: Determined node name: xxx-work-1
2020-12-02 13:20:13.068 [INFO][8] k8s.go 228: Using Calico IPAM
2020-12-02 13:20:13.069 [INFO][8] startup.go 319: Checking datastore connection
2020-12-02 13:20:16.075 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
2020-12-02 13:20:19.081 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
2020-12-02 13:20:23.087 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
2020-12-02 13:20:27.095 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
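The failing check is nothing more than a TCP/HTTPS connection from the pod to the cluster Service IP. The probe below is a minimal stand-in (a hypothetical helper, not Calico's actual startup code) that reproduces the same kind of datastore reachability test and surfaces the same OS-level error string:

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connect the way a readiness check would.
    Returns None on success, or the OS error string on failure
    (a missing route surfaces as 'No route to host', as in the log above)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as e:
        return str(e)

# Run from a worker node in this cluster, the call below would report
# the 'no route to host' failure for the apiserver Service IP:
# probe_tcp("10.96.0.1", 443)
```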
Diagnosis
The master nodes are attached to two networks, 172.31.0.0/16 (private) and 10.0.0.0/24 (management); the worker nodes only have the 172.31.0.0/16 network.

| Node | Private network | Management network |
| --- | --- | --- |
| master-1 | 172.31.0.26 | 10.0.0.77 |
| master-2 | 172.31.0.26 | 10.0.0.128 |
| master-3 | 172.31.0.26 | 10.0.0.154 |
| worker-1 | 172.31.0.23 | - |
| worker-2 | 172.31.0.8 | - |
From a worker node, curl 10.96.0.1 fails.
apiserver Service information:
kubectl get svc -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 29h <none>
The endpoints are on the 10.0.0.0/24 network, because kube-apiserver by default advertises the address of the interface that carries the default gateway.
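The address-selection behavior can be illustrated with a toy version of that logic (a simplified stand-in, not the real Kubernetes netutil code): when no advertise address is configured, pick the IP of the interface that owns the default route.

```python
def pick_advertise_ip(routes, addrs):
    """routes: list of (destination_cidr, interface) pairs;
    addrs: {interface: ip}. Mimics 'advertise the IP of the interface
    holding the default route' (deliberately simplified)."""
    for dest, iface in routes:
        if dest == "0.0.0.0/0":      # the default route
            return addrs[iface]
    return None

# Values below mirror master-1 in this cluster: the default gateway
# sits on the management network, so that IP gets advertised.
routes = [("172.31.0.0/16", "eth0"), ("0.0.0.0/0", "eth1"), ("10.0.0.0/24", "eth1")]
addrs = {"eth0": "172.31.0.26", "eth1": "10.0.0.77"}
print(pick_advertise_ip(routes, addrs))  # -> 10.0.0.77, the management IP
```

The interface names eth0/eth1 are assumptions for illustration; the point is only that the default-route interface wins.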
Cluster information:
kubectl cluster-info
Kubernetes master is running at https://k8s-cluster-ins-0029-master-vip.service.consul:6443
KubeDNS is running at https://k8s-cluster-ins-0029-master-vip.service.consul:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
kube-apiserver Service and endpoints:
# kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
# kubectl get ep kubernetes
NAME ENDPOINTS AGE
kubernetes 10.0.0.133:6443,10.0.0.32:6443,10.0.0.50:6443 25h
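The mismatch can be checked mechanically: assuming no extra routes (as on these workers), an endpoint is only reachable if it falls inside a network the node is attached to. A quick sketch with Python's ipaddress module, using the subnets from the table above:

```python
from ipaddress import ip_address, ip_network

worker_net = ip_network("172.31.0.0/16")   # the only network on the workers
# Endpoint IPs from 'kubectl get ep kubernetes'
endpoints = ["10.0.0.133", "10.0.0.32", "10.0.0.50"]

for ep in endpoints:
    reachable = ip_address(ep) in worker_net
    print(f"{ep}: {'reachable' if reachable else 'no route from worker'}")
# All three endpoints live in 10.0.0.0/24, so none are reachable
# from a node that only has an address in 172.31.0.0/16.
```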
Inspect the NAT table on the worker:
# iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
cali-PREROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* cali:6gwbT8clXdHdC1b1 */
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
cali-OUTPUT all -- 0.0.0.0/0 0.0.0.0/0 /* cali:tVnHkvAo15HuiPy0 */
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
cali-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* cali:O3lYWMrLQYEMJtB5 */
KUBE-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-MARK-DROP (0 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x8000
Chain KUBE-MARK-MASQ (15 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0 mark match ! 0x4000/0x4000
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xor 0x4000
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */
Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination
Chain KUBE-SEP-6AVEXVWMTAUJHVS6 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.131 0.0.0.0/0
DNAT udp -- 0.0.0.0/0 0.0.0.0/0 udp to:10.244.149.131:53
Chain KUBE-SEP-AJAH3OWF36MHDVF7 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.131 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.131:9153
Chain KUBE-SEP-F4NUFHPP6MV3U2FB (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.130 0.0.0.0/0
DNAT udp -- 0.0.0.0/0 0.0.0.0/0 udp to:10.244.149.130:53
Chain KUBE-SEP-ILEHVTEL5AKI6EAE (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.130 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.130:9153
Chain KUBE-SEP-N4P2JU5RW7IWUD2Z (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.229.131 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.229.131:3443
Chain KUBE-SEP-ORF6FH7KUHVWJER7 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.130 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.130:53
Chain KUBE-SEP-UMPZ2SD2APVNR4IN (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.131 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.131:53
Chain KUBE-SEP-6JBY7EOKHF37VPAE (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.50 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.50:6443
Chain KUBE-SEP-VW6RD437TCEB4BL4 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.133 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.133:6443
Chain KUBE-SEP-ZCIWMPUBNREXOPRW (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.32 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.32:6443
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-MARK-MASQ udp -- !10.244.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-SVC-JD5MR3NA4I4DYORP tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.107.150.63 /* orch-operator-system/orch-operator-webhook-service: cluster IP */ tcp dpt:3443
KUBE-SVC-6HOYT5WSPFV75AOP tcp -- 0.0.0.0/0 10.107.150.63 /* orch-operator-system/orch-operator-webhook-service: cluster IP */ tcp dpt:3443
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Chain KUBE-SVC-6HOYT5WSPFV75AOP (1 references)
target prot opt source destination
KUBE-SEP-N4P2JU5RW7IWUD2Z all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target prot opt source destination
KUBE-SEP-ORF6FH7KUHVWJER7 all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-UMPZ2SD2APVNR4IN all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-SVC-JD5MR3NA4I4DYORP (1 references)
target prot opt source destination
KUBE-SEP-ILEHVTEL5AKI6EAE all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-AJAH3OWF36MHDVF7 all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
target prot opt source destination
KUBE-SEP-VW6RD437TCEB4BL4 all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.33333333349
KUBE-SEP-ZCIWMPUBNREXOPRW all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-6JBY7EOKHF37VPAE all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
target prot opt source destination
KUBE-SEP-F4NUFHPP6MV3U2FB all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-6AVEXVWMTAUJHVS6 all -- 0.0.0.0/0 0.0.0.0/0
Chain cali-OUTPUT (1 references)
target prot opt source destination
cali-fip-dnat all -- 0.0.0.0/0 0.0.0.0/0 /* cali:GBTAv2p5CwevEyJm */
Chain cali-POSTROUTING (1 references)
target prot opt source destination
cali-fip-snat all -- 0.0.0.0/0 0.0.0.0/0 /* cali:Z-c7XtVd2Bq7s_hA */
cali-nat-outgoing all -- 0.0.0.0/0 0.0.0.0/0 /* cali:nYKhEzDlr11Jccal */
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* cali:SXWvdsbh4Mw7wOln */ ADDRTYPE match src-type !LOCAL limit-out ADDRTYPE match src-type LOCAL
Chain cali-PREROUTING (1 references)
target prot opt source destination
cali-fip-dnat all -- 0.0.0.0/0 0.0.0.0/0 /* cali:r6XmIziWUJsdOK6Z */
Chain cali-fip-dnat (2 references)
target prot opt source destination
Chain cali-fip-snat (1 references)
target prot opt source destination
Chain cali-nat-outgoing (1 references)
target prot opt source destination
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* cali:flqWnvo8yq4ULQLa */ match-set cali40masq-ipam-pools src ! match-set cali40all-ipam-pools dst
Analysis shows that traffic to 10.96.0.1 is DNATed to 10.0.0.50:6443, 10.0.0.133:6443, or 10.0.0.32:6443. The worker nodes are only attached to 172.31.0.0/16, so 10.96.0.1 is effectively unreachable from them. kubectl get ep kubernetes confirms the same thing: the apiserver Service forwards to endpoints on the 10.0.0.0/24 network. The relevant DNAT chains again:
Chain KUBE-SEP-6JBY7EOKHF37VPAE (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.50 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.50:6443
Chain KUBE-SEP-VW6RD437TCEB4BL4 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.133 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.133:6443
Chain KUBE-SEP-ZCIWMPUBNREXOPRW (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.32 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.32:6443
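As an aside, the three rules in KUBE-SVC-NPX46M4PTMTKRN6Y use statistic-mode random probabilities of roughly 1/3, then 1/2, then no condition at all; because the rules are evaluated in order, this cascade selects each backend uniformly. A small illustrative sketch of that mechanism (not kube-proxy code):

```python
import random

def pick_backend(backends, rng):
    """Emulate iptables 'statistic --mode random' cascading rules:
    rule i matches with probability 1/(n-i) over the remaining backends,
    which makes the overall selection uniform across all n backends."""
    n = len(backends)
    for i, b in enumerate(backends):
        if rng.random() < 1.0 / (n - i):
            return b
    return backends[-1]  # defensive; the last rule matches with probability 1

counts = {b: 0 for b in ["10.0.0.133", "10.0.0.32", "10.0.0.50"]}
rng = random.Random(0)
for _ in range(30000):
    counts[pick_backend(list(counts), rng)] += 1
print(counts)  # each backend is chosen roughly 10000 times
```

This explains the otherwise odd-looking probabilities 0.33333333349 and 0.50000000000 in the chain dump: 1/3 of all traffic takes rule one, 1/2 of the remaining 2/3 takes rule two, and the rest falls through to rule three.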
Solution
Specify the bind/advertise address when creating the cluster.
master-1 configuration:
cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.17.14
imageRepository: harbor.xxx.com/library/k8s.gcr.io
apiServer:
  timeoutForControlPlane: 4m0s
  certSANs:
  - k8s-cluster-ins-0029-master-1.service.consul
  - k8s-cluster-ins-0029-master-2.service.consul
  - k8s-cluster-ins-0029-master-3.service.consul
controlPlaneEndpoint: k8s-cluster-ins-0029-master-vip.service.consul:6443  # SLB VIP and port
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16  # pod IP range
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /data/etcd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.31.0.26  # the IP this master should bind and advertise
Join commands for master-2 and master-3:
join_cmd=$(kubeadm token create --print-join-command)
$join_cmd --control-plane --apiserver-advertise-address=${NODE_IP}  # the IP bound on this node
kubeadm token create --print-join-command
kubeadm join k8s-cluster-ins-0029-master-vip.service.consul:6443 --token g9stix.vvinbvdt83ndeyoc --discovery-token-ca-cert-hash sha256:e966b388406a6a04b78c04d1d2a62b4a6a50799c37c708e5fadf6fabb7481231
References
Generate a reference configuration:
kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-cluster-ins-0029-master-1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Get the cluster's actual kubeadm configuration:
kubectl get cm kubeadm-config -n kube-system -oyaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - k8s-cluster-ins-0029-master-1.service.consul
      - k8s-cluster-ins-0029-master-2.service.consul
      - k8s-cluster-ins-0029-master-3.service.consul
      extraArgs:
        advertise-address: 172.31.0.26
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: k8s-cluster-ins-0029-master-vip.service.consul:6443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /data/etcd
    imageRepository: harbor.xxx.com/library/k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.17.14
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      k8s-cluster-ins-0029-master-1:
        advertiseAddress: 10.0.0.77
        bindPort: 6443
      k8s-cluster-ins-0029-master-2:
        advertiseAddress: 10.0.0.128
        bindPort: 6443
      k8s-cluster-ins-0029-master-3:
        advertiseAddress: 10.0.0.154
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2020-12-03T09:29:09Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "665"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 78961de1-682a-49d0-8c8f-dc6d4e47ca04
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/
https://github.com/kubernetes/kubernetes/issues/33618
https://github.com/kubernetes/kubeadm/blob/master/docs/design/design_v1.9.md#optional-self-hosting
https://idig8.com/2019/08/08/zoujink8skubeadmdajian-kubernetes1-15-1jiqunhuanjing14/
https://feisky.gitbooks.io/kubernetes/content/troubleshooting/network.html
https://github.com/projectcalico/calico/issues/3092
https://github.com/projectcalico/calico/issues/2720