rancher: Troubleshooting a Calico network failure (calico/node is not ready: BIRD is not ready)

1. Run kubectl get pod -o wide -n kube-system. The calico-node pods are not in the READY state (0/1), as shown below:

[root@localhost ~]# kubectl get pod -n kube-system 
NAME                                      READY   STATUS      RESTARTS   AGE
calico-kube-controllers-78467476b-8nlm4   1/1     Running     1          42m
calico-node-7s722                         0/1     Running     0          5m17s
calico-node-s4c7v                         0/1     Running     0          5m17s
coredns-67c6c9cf5f-2cdfc                  1/1     Running     0          3m45s
coredns-67c6c9cf5f-tr7xx                  1/1     Running     0          3m45s
coredns-autoscaler-7fc7b45c69-rmpzl       1/1     Running     1          42m
metrics-server-bc9d8649-mgvnn             1/1     Running     1          42m
rke-coredns-addon-deploy-job-v7hbx        0/1     Completed   0          42m
rke-ingress-controller-deploy-job-hkm6j   0/1     Completed   0          42m
rke-metrics-addon-deploy-job-kvnl6        0/1     Completed   0          42m
rke-network-plugin-deploy-job-8rjhp       0/1     Completed   0          42m
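
With many pods in the namespace, the listing above can be filtered down to the ones that are running but not ready. This is a minimal sketch; not_ready_pods is a hypothetical helper name, and it assumes the default column order (NAME, READY, STATUS) of kubectl get pod.

```shell
# Hypothetical helper: read `kubectl get pod` output on stdin and print
# the names of pods whose STATUS is Running but whose READY count is
# below the total. Completed jobs are skipped.
not_ready_pods() {
  awk 'NR > 1 && $3 == "Running" {
         split($2, r, "/")
         if (r[1] != r[2]) print $1
       }'
}

# Usage (assumes kubectl is configured for the cluster):
# kubectl get pod -n kube-system | not_ready_pods
```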

2. Run kubectl describe pod/calico-node-jbcl5 -n kube-system. The output contains the following error:

calico/node is not ready: BIRD is not ready

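3. Run kubectl exec -ti calico-node-jbcl5 -n kube-system -- bash to enter the calico-node container and open the bird configuration file. The router id is 192.168.0.1, which is a bridge address; it should be the address of the ens192 NIC, 172.16.10.4, as shown below: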

[root@localhost /]# cat /etc/calico/confd/config/bird.cfg 
# Generated by confd
include "bird_aggr.cfg";
include "bird_ipam.cfg";

router id 192.168.0.1;

The IP above is a bridge IP, whereas Calico's BGP uses the physical device (NIC) as a virtual router to provide routing.

[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b9:58:2b brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.4/24 brd 172.16.10.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:feb9:582b/64 scope link 
       valid_lft forever preferred_lft forever
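
The comparison in step 3 can be scripted roughly as follows. This is only a sketch: bird_router_id and interface_for_ip are hypothetical helper names, and the second one assumes the one-line-per-address format printed by ip -o -4 addr show.

```shell
# Hypothetical helper: extract the router id from bird.cfg contents on stdin.
bird_router_id() {
  awk '$1 == "router" && $2 == "id" { sub(/;$/, "", $3); print $3; exit }'
}

# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the interface that owns the IPv4 address given as $1 (nothing if no match).
interface_for_ip() {
  awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Usage (pod name and paths as in the steps above):
# rid=$(kubectl exec -n kube-system calico-node-jbcl5 -- \
#         cat /etc/calico/confd/config/bird.cfg | bird_router_id)
# ip -o -4 addr show | interface_for_ip "$rid"
```

If the second command prints a bridge interface rather than the expected NIC, the router id was taken from the wrong device.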

4. On the master, install calicoctl (needed for calicoctl node status) and query the nodes:

[root@localhost ~]# wget https://github.com/projectcalico/calicoctl/releases/download/v3.5.4/calicoctl -O /usr/bin/calicoctl


[root@localhost ~]# chmod +x /usr/bin/calicoctl


[root@localhost ~]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config calicoctl get nodes
NAME           
172.16.10.4    
172.16.10.87


Alternatively, put the datastore settings in a calicoctl configuration file:
[root@localhost ~]# mkdir /etc/calico
[root@localhost ~]# cat  /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "kubernetes"
  kubeconfig: "/root/.kube/config"
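
The listing above shows the file contents but not how the file is created. One possible way, sketched with a hypothetical helper named write_calicoctl_cfg, is a here-document; the target directory defaults to /etc/calico and the kubeconfig path is the one shown above.

```shell
# Hypothetical helper: write a calicoctl config for the Kubernetes
# datastore into the directory given as $1 (default: /etc/calico).
write_calicoctl_cfg() {
  dir="${1:-/etc/calico}"
  mkdir -p "$dir"
  cat > "$dir/calicoctl.cfg" <<'EOF'
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "kubernetes"
  kubeconfig: "/root/.kube/config"
EOF
}

# Usage (as root):
# write_calicoctl_cfg
# calicoctl get nodes
```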

At this point it is fairly certain that the failure is caused by Calico's BGP picking the wrong network interface.

II. Resolution
1. The Calico documentation explains this behavior as follows:

Documentation link: https://docs.projectcalico.org/archive/v3.18/networking/ip-autodetection

By default, Calico uses the first-found method; the first valid IP address on the first interface (excluding local interfaces such as the docker bridge). However, you can change the default method to any of the following:

Address used by the node to reach a particular IP or domain (can-reach)
Regex to include matching interfaces (interface)
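Regex to exclude matching interfaces (skip-interface)
In short: by default Calico uses the first-found method, which picks the first valid IP address on the first interface (excluding local bridge interfaces). In the case above, however, Calico picked a bridge address (192.168.0.1 in bird.cfg), which is puzzling.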

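The documentation recommends using one of can-reach, interface, or skip-interface by setting IP_AUTODETECTION_METHOD, so that IP autodetection is pinned to a specific interface or IP. The command below uses a regular expression to restrict detection to NICs whose names start with ens: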

kubectl set env daemonset.apps/calico-node -n calico-system IP_AUTODETECTION_METHOD='interface=ens.*'
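
Before applying an interface regex, it can help to preview which interface names on the node it would select. The sketch below uses grep -E as a stand-in for Calico's own matching, which uses Go regular expressions; whether Calico anchors the pattern is not verified here, so treat the result as an approximation that is adequate for simple patterns such as ens.*. match_interfaces is a hypothetical helper name.

```shell
# Hypothetical helper: read interface names on stdin (one per line) and
# keep those matching the regex in $1. grep -E only approximates the
# Go regex matching that Calico performs.
match_interfaces() {
  grep -E -- "$1"
}

# Usage on a node (interface names are listed under /sys/class/net):
# ls /sys/class/net | match_interfaces 'ens.*'
```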

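In this environment, the change above had no effect.

Note: Calico in this k8s environment was installed through the operator's Installation resource, so the calico-node DaemonSet is presumably managed by the operator, and a manual environment change on it may not persist.

Following the Installation reference at https://docs.projectcalico.org/reference/installation/api#operator.tigera.io/v1.CalicoNetworkSpec, the following was added under calicoNetwork in the Installation spec: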

    nodeAddressAutodetectionV4:
      interface: ens.*

 

2. Re-run kubectl create -f calico-operator.yaml and kubectl create -f calico-custom-resources.yaml. The problem is resolved.
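
If the Installation resource already exists in the cluster, kubectl create reports that it already exists. A possible alternative is a merge patch on the live resource. This is a sketch under assumptions: the resource is assumed to be named default, as in the stock custom-resources manifest, and autodetect_patch is a hypothetical helper that only builds the JSON body.

```shell
# Hypothetical helper: build a merge-patch body that sets
# spec.calicoNetwork.nodeAddressAutodetectionV4.interface to $1.
autodetect_patch() {
  printf '{"spec":{"calicoNetwork":{"nodeAddressAutodetectionV4":{"interface":"%s"}}}}' "$1"
}

# Usage (assumes the Installation resource is named "default"):
# kubectl patch installation.operator.tigera.io default --type merge \
#   -p "$(autodetect_patch 'ens.*')"
```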

Run calicoctl node status again to confirm.
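
The status check can be reduced to a list of peers that are not yet established. This sketch assumes the table layout that calicoctl node status normally prints (PEER ADDRESS, PEER TYPE, STATE, SINCE, INFO columns separated by |); bgp_peers_not_established is a hypothetical helper name.

```shell
# Hypothetical helper: read `calicoctl node status` output on stdin and
# print "peer info" for every BGP peer whose INFO column is not
# "Established". Prints nothing when all peers are established.
bgp_peers_not_established() {
  awk -F'|' 'NF >= 6 && $2 !~ /PEER ADDRESS/ {
    gsub(/^ +| +$/, "", $2)
    gsub(/^ +| +$/, "", $6)
    if ($6 != "Established") print $2, $6
  }'
}

# Usage (on a node where calicoctl is installed):
# calicoctl node status | bgp_peers_not_established
```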
