Kubernetes Binary Deployment (easzlab / kubeasz project) - 06 Installing Network Components

06 - Installing Network Components

First, a quick review of the Kubernetes network design principles. Keep them in mind when configuring a cluster network plugin or deploying applications/services on Kubernetes:

  • 1. Every Pod has its own IP address, and all containers in a Pod share one network namespace
  • 2. All Pods in the cluster sit on a directly connected, flat network and can reach each other by IP
    - all containers can reach each other directly without NAT
    - all nodes and all containers can reach each other directly without NAT
    - the IP a container sees for itself is the same one other containers see
  • 3. A Service's cluster IP is reachable only inside the cluster; external requests must enter via NodePort, LoadBalancer, or Ingress

The Container Network Interface (CNI) is the network model currently promoted by the CNCF. It consists of two parts:

  • the CNI plugin, which configures the container's network and exposes two basic interfaces
    • configure the network: AddNetwork(net *NetworkConfig, rt *RuntimeConf) (types.Result, error)
    • tear down the network: DelNetwork(net *NetworkConfig, rt *RuntimeConf) error
  • the IPAM plugin, which allocates IP addresses to containers
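For concreteness, the *NetworkConfig a plugin receives is parsed from a JSON file under /etc/cni/net.d/. A minimal calico-style example is sketched below; the field values are illustrative placeholders, not this cluster's settings:

```json
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "type": "calico",
  "etcd_endpoints": "https://192.168.17.250:2379",
  "etcd_ca_cert_file": "/etc/kubernetes/ssl/ca.pem",
  "etcd_cert_file": "/etc/calico/ssl/calico.pem",
  "etcd_key_file": "/etc/calico/ssl/calico-key.pem",
  "ipam": { "type": "calico-ipam" }
}
```

The "type" field names the CNI plugin binary to execute, and "ipam.type" names the IPAM plugin, matching the two-part split described above.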

A Kubernetes Pod's network is created as follows:

    1. Besides the containers specified at creation time, every Pod has an infrastructure container specified when kubelet starts, i.e. the pause container
    2. kubelet creates the infrastructure container, which provides the network namespace
    3. kubelet calls the CNI driver, which invokes the configured CNI plugin
    4. the CNI plugin configures the infrastructure container's network
    5. the other containers in the Pod share the infrastructure container's network

This project configures the Kubernetes network through the CNI driver, which invokes the chosen network plugin. Common CNI plugins include flannel, calico, cilium, and others. Each has its strengths, and they keep borrowing ideas from one another. For example, when all nodes sit on the same layer-2 network, flannel's host-gw backend avoids the UDP encapsulation overhead of the vxlan backend and is probably the most efficient choice in that case; calico, for its part, offers an IPIP option for L3 fabrics, which encapsulates traffic in IP-in-IP tunnels. All of these plugins fit a wide range of real-world scenarios.

Network plugins currently bundled with the project: calico, cilium, flannel, kube-ovn, kube-router

Installing the calico network component

This walkthrough installs calico as the example.

calico is one of the most popular network plugins in the k8s community and the default plugin used by the k8s-conformance test. It is feature-rich, supports network policy, and is the default network plugin of the kubeasz project.

To install calico, set the variable CLUSTER_NETWORK="calico" in the clusters/xxxx/hosts file
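For example, the relevant fragment of clusters/xxxx/hosts might look like this (an excerpt sketch; CLUSTER_CIDR is shown matching the 10.200.0.0/16 pod addresses seen later in this article, the rest of the file is omitted):

```ini
# clusters/xxxx/hosts (excerpt)
CLUSTER_NETWORK="calico"        # choose from: calico, cilium, flannel, kube-ovn, kube-router
CLUSTER_CIDR="10.200.0.0/16"    # Pod network; rendered into CALICO_IPV4POOL_CIDR
```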

roles/calico/
├── tasks
│   └── main.yml
├── templates
│   ├── calico-csr.json.j2
│   ├── calicoctl.cfg.j2
│   ├── calico-v3.15.yaml.j2
│   ├── calico-v3.19.yaml.j2
│   └── calico-v3.8.yaml.j2
└── vars
    └── main.yml

Read roles/calico/tasks/main.yml alongside the notes below.

Creating the calico certificate signing request
{
  "CN": "calico",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "HangZhou",
      "L": "XS",
      "O": "k8s",
      "OU": "System"
    }
  ]
}

calico uses a client certificate, so the hosts field can be empty. As you will see, the calico certificate is used in four places:

  • the calico/node container uses it to access etcd
  • the CNI plugin, per the CNI config file, uses it to access etcd
  • calicoctl uses it to access etcd when operating on the cluster network
  • calico/kube-controllers uses it to access etcd when syncing cluster network policy
Creating the calico DaemonSet yaml and RBAC files

Read alongside the comments in roles/calico/templates/calico.yaml.j2 and note the following:

  • see the official calico documentation for detailed parameters
  • ETCD_ENDPOINTS, the CA, and the certificates are configured here; each {{ }} variable corresponds to a setting in the ansible hosts file
  • the Pod network is configured with CALICO_IPV4POOL_CIDR={{ CLUSTER_CIDR }}
  • FELIX_DEFAULTENDPOINTTOHOSTACTION=ACCEPT allows Pod-to-Node traffic by default
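Rendered, the relevant env entries of the calico-node container look roughly like this (an excerpt sketch, not the full manifest; the CIDR value is the rendered form of {{ CLUSTER_CIDR }}):

```yaml
# calico-node container env (excerpt, rendered)
- name: ETCD_ENDPOINTS
  valueFrom:
    configMapKeyRef:
      name: calico-config
      key: etcd_endpoints
- name: CALICO_IPV4POOL_CIDR
  value: "10.200.0.0/16"
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
  value: "ACCEPT"
```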
Installing the calico network
  • before installing, hostnames must not contain uppercase letters; only lowercase alphanumerics, '-' and '.' are allowed (name must consist of lower case alphanumeric characters, '-' or '.' (regex: a-z0-9?(.a-z0-9?)*)) (calico-node v3.0.6 and later no longer has the uppercase-hostname problem)
  • hostnames must be unique across nodes: the calico node name is derived from the hostname, so duplicated hostnames store only one config entry in etcd and the BGP neighbors will not be established
  • the kube_master and kube_node roles must already be deployed successfully before this step
  • the playbook polls until the calico network plugin is running, then deletes the default CNI config installed earlier by the kube_node role
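That cleanup step amounts to the following on each node (a sketch; the playbook only does this after calico-node reports Ready):

```shell
# remove the bootstrap CNI config installed by the kube_node role, so the
# calico config written to /etc/cni/net.d becomes the only one that
# kubelet/containerd load
rm -f /etc/cni/net.d/10-default.conf
```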
[Optional] configure the calicoctl tool, calicoctl.cfg.j2
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "etcdv3"
  etcdEndpoints: {{ ETCD_ENDPOINTS }}
  etcdKeyFile: /etc/calico/ssl/calico-key.pem
  etcdCertFile: /etc/calico/ssl/calico.pem
  etcdCACertFile: {{ ca_dir }}/ca.pem
Problem 1

The following error appeared during the installation:

[root@k8s-master-01 kubeasz]# ./ezctl setup k8s-cluster-01 06
...
TASK [准备 calicoctl配置文件] ******************************************************************************************************************
changed: [192.168.17.241]
changed: [192.168.17.240]
changed: [192.168.17.201]
changed: [192.168.17.200]
FAILED - RETRYING: 轮询等待calico-node 运行 (15 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (15 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (15 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (15 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (14 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (14 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (14 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (14 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (13 retries left).
FAILED - RETRYING: 轮询等待calico-node 运行 (13 retries left).

kubelet was logging errors:

[root@k8s-master-01 kubeasz]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2024-03-05 15:00:51 CST; 22h ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 24429 (kubelet)
    Tasks: 11
   Memory: 95.1M
   CGroup: /system.slice/kubelet.service
           └─24429 /usr/local/bin/kubelet --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --hostname-override=master-01 --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --root-dir=/var/lib/kubelet --v=2

Mar 06 13:30:22 k8s-master-01.xx.net kubelet[24429]: E0306 13:30:22.255763   24429 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
...

containerd was logging errors:

[root@k8s-master-01 kubeasz]# systemctl status containerd -l
● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2024-03-05 14:18:38 CST; 23h ago
     Docs: https://containerd.io
 Main PID: 18412 (containerd)
    Tasks: 10
   Memory: 44.6M
   CGroup: /system.slice/containerd.service
           └─18412 /usr/local/bin/containerd-bin/containerd

...
Mar 05 14:18:38 k8s-master-01.xx.net containerd[18412]: time="2024-03-05T14:18:38.457903693+08:00" level=info msg="containerd successfully booted in 0.035499s"
Mar 05 15:01:20 k8s-master-01.xx.net containerd[18412]: time="2024-03-05T15:01:20.031604884+08:00" level=info msg="CNI config is successfully loaded, skip generating cni config from template \"/etc/cni/net.d/10-default.conf\""
Mar 05 15:01:54 k8s-master-01.xx.net containerd[18412]: time="2024-03-05T15:01:54.444352626+08:00" level=error msg="ContainerStatus for \"830e84c74c0bf11e312139e0b190211fc15449540b2fd5defb21065d6b20ff4b\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"830e84c74c0bf11e312139e0b190211fc15449540b2fd5defb21065d6b20ff4b\": not found"
Mar 05 15:01:54 k8s-master-01.xx.net containerd[18412]: time="2024-03-05T15:01:54.507137817+08:00" level=error msg="ContainerStatus for \"94437b2f51971e98f34b24febb509c133c937837252fb7999c577b5dfb6c281b\" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container \"94437b2f51971e98f34b24febb509c133c937837252fb7999c577b5dfb6c281b\": not found"
Mar 06 13:28:30 k8s-master-01.xx.net containerd[18412]: time="2024-03-06T13:28:30.869047681+08:00" level=error msg="failed to reload cni configuration after receiving fs change event(\"/etc/cni/net.d/10-default.conf\": REMOVE)" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

msg="failed to reload cni configuration after receiving fs change event(\"/etc/cni/net.d/10-default.conf\": REMOVE)"
The file 10-default.conf was indeed missing on the node:

[root@k8s-node-01 ~]# ll /etc/cni/net.d/10-default.conf
ls: cannot access /etc/cni/net.d/10-default.conf: No such file or directory

Re-run the kube-node and network steps:

./ezctl setup k8s-cluster-01 05
./ezctl setup k8s-cluster-01 06

Checking again, the problem was still there:

[root@k8s-master-01 kubeasz]# kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS            AGE
calico-kube-controllers-86b55cf789-bmvht   1/1     Running   0                   4m28s
calico-node-bf9v5                          0/1     Running   3 (<invalid> ago)   4m27s
calico-node-hdtlk                          0/1     Running   3 (46s ago)         4m28s
calico-node-mg6px                          0/1     Running   2 (<invalid> ago)   4m28s
calico-node-x6jb2                          0/1     Running   2 (<invalid> ago)   4m28s
Problem 2
Warning  Unhealthy  45s   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

The author's VMs have two NICs. The IP_AUTODETECTION_METHOD option defaults to first-found, in which calico uses the first valid NIC it finds; although the docker network, localhost, and the like are excluded, this can still pick the wrong interface in a complex network environment. The suspicion here is that calico on master-01 chose the other NIC, ens33, whereas this deployment runs on the second NIC's 192.168.17.0/24 subnet (the internal network).

[root@k8s-master-01 kubeasz]# ip a
...
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:bc:d6:54 brd ff:ff:ff:ff:ff:ff
    inet 192.168.81.200/24 brd 192.168.81.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:febc:d654/64 scope link
       valid_lft forever preferred_lft forever
3: ens34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:bc:d6:5e brd ff:ff:ff:ff:ff:ff
    inet 192.168.17.200/24 brd 192.168.17.255 scope global noprefixroute ens34
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:febc:d65e/64 scope link
       valid_lft forever preferred_lft forever

From a post by another user (lightly corrected):

In Calico, the IP_AUTODETECTION_METHOD setting specifies how calico-node detects the node IP address it will announce for BGP peering and tunnel endpoints. (Note: it selects the node's IP, not Pod IPs; Pod addresses come from the IPAM plugin.)

1. The kubernetes-internal-ip mode

kubernetes-internal-ip is a mode specific to Kubernetes environments:

Behavior: calico-node takes the node's InternalIP from the Kubernetes Node object via the API server, instead of probing local interfaces.
When to use it: in Kubernetes clusters where the Node object's InternalIP is already the address you want Calico to use; no local interface detection is needed.

Using kubernetes-internal-ip has these advantages:

- Tight integration: the address comes straight from the Kubernetes API.
- Less configuration: no extra network configuration or manual intervention; Calico reuses the address Kubernetes already recorded.
- Fewer surprises in complex networks: because no interface guessing happens, multi-NIC hosts cannot trip up the detection.

When using this mode, make sure the cluster is running normally, calico-node can reach the API server, and each node's InternalIP is correct. Calico can then manage and route traffic based on those addresses.

2. The other modes

first-found: list all the IPs on the host's NICs and take the first one (IPs on docker0 and lo are ignored). Which IP comes first varies by OS. For example, on CentOS, if NIC ens33 has IP 192.168.92.101 and NIC ens37 has 192.168.90.101 (primary) and 192.168.90.10 (secondary), the selected IP is 192.168.90.101 (apparently ordered ascending by each NIC's primary IP).

can-reach=x.x.x.x: select whichever local IP can reach x.x.x.x. If one NIC holds several IPs that can all reach x.x.x.x, which of them wins is unverified. x.x.x.x may also be a domain name, e.g. can-reach=www.baidu.com.

interface=INTERFACE-REGEX: select the IP by matching interface names against a regular expression, e.g. interface=ens33 or interface=ens*.

skip-interface=INTERFACE-REGEX: exclude the matching interfaces first, then apply first-found.

cidr=CIDR: select the IP that falls inside the given CIDR. For example, with cidr=192.168.92.0/24 on a host that has ens33 (192.168.92.101) and ens37 (192.168.90.101), the chosen IP is 192.168.92.101.

Source: https://blog.csdn.net/summer_fish/article/details/135146119

The author changed IP_AUTODETECTION_METHOD to can-reach=www.baidu.com, only to hit a new problem.
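The setting lives in the calico-node DaemonSet as an environment variable; a sketch of the change (for this dual-NIC topology, interface=ens34 or cidr=192.168.17.0/24 would also pin the internal NIC directly):

```yaml
# calico-node container env (excerpt)
- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "can-reach=www.baidu.com"   # or: interface=ens34 / cidr=192.168.17.0/24
```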

Problem 3

The fixes for the previous two problems did not make the errors go away:

 Warning  Unhealthy  6m14s  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0307 16:23:28.110796      22 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  6m13s  kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0307 16:23:29.077775      51 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  6m8s  kubelet  Readiness probe failed: 2024-03-07 16:23:34.291 [INFO][163] confd/health.go 180: Number of node(s) with BGP peering established = 2
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.17.241
W0307 16:23:34.284060     163 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  5m58s  kubelet  Readiness probe failed: 2024-03-07 16:23:44.274 [INFO][196] confd/health.go 180: Number of node(s) with BGP peering established = 3
calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp: lookup localhost on 8.8.8.8:53: no such host

Eventually the author found a post describing the same symptom: the calico-node error log "listen tcp: lookup localhost on 8.8.4.4:53: no such host" (the container cannot run normally).

The loopback entry had to be added back to /etc/hosts; that solved the problem:
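A minimal sketch of the fix (on a real node the target is /etc/hosts; it is parameterized here so the snippet can be tried against any file):

```shell
# Restore the loopback name mappings that felix's readiness check
# (http://localhost:9099/readiness) relies on; without them the lookup
# falls through to the upstream DNS (8.8.8.8/8.8.4.4) and fails.
HOSTS="${HOSTS:-/etc/hosts}"
add_if_missing() {
  # $1 = address, $2 = hostname; append only when no non-comment line maps the name
  grep -qE "^[^#]*[[:space:]]$2([[:space:]]|\$)" "$HOSTS" \
    || printf '%s\t%s\n' "$1" "$2" >> "$HOSTS"
}
add_if_missing 127.0.0.1 localhost
add_if_missing ::1       localhost
```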

[root@k8s-master-01 kubeasz]# kubectl get pod -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE        NOMINATED NODE   READINESS GATES
calico-kube-controllers-86b55cf789-6zfl5   1/1     Running   0          11m     192.168.17.241   worker-02   <none>           <none>
calico-node-b9jbd                          1/1     Running   0          31s     192.168.17.240   worker-01   <none>           <none>
calico-node-k7vqt                          1/1     Running   0          7m30s   192.168.17.200   master-01   <none>           <none>
calico-node-r6xbz                          1/1     Running   0          37s     192.168.17.241   worker-02   <none>           <none>
calico-node-xmzpz                          1/1     Running   0          45s     192.168.17.201   master-02   <none>           <none>
Inspecting NICs and routes
  • first create a few test pods in the cluster (recent kubectl removed the --replicas flag from kubectl run, so start one pod per command):
kubectl run test --image=busybox  sleep 30000
kubectl run test1 --image=busybox  sleep 30000
kubectl run test2 --image=busybox  sleep 30000
[root@k8s-master-01 kubeasz]# kubectl get pod -n default -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP               NODE        NOMINATED NODE   READINESS GATES
test    1/1     Running   0          4m32s   10.200.238.193   worker-01   <none>           <none>
test1   1/1     Running   0          62s     10.200.35.194    worker-02   <none>           <none>
test2   1/1     Running   0          4m23s   10.200.238.194   worker-01   <none>           <none>
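If you prefer a single object for the three pods, a Deployment achieves the same (a sketch mirroring the image and command above; the pods get generated names with an app=test label rather than test/test1/test2):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "30000"]
```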
  • inspect the NICs on a node
[root@k8s-node-02 ~]# ip a
...
6: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.200.35.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
8: cali99c376db89a@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
- the cali… interfaces (like cali99c376db89a above) are veths calico created for the test pods
- the tunl0 interface is created by default; it is the tunnel used when the IPIP feature is enabled
  • inspect the routing table
[root@k8s-node-02 ~]# ip route
default via 192.168.81.1 dev ens33 proto static metric 100
default via 192.168.17.1 dev ens34 proto static metric 101
10.200.25.128/26 via 192.168.17.200 dev tunl0 proto bird onlink
10.200.31.0/26 via 192.168.17.201 dev tunl0 proto bird onlink
blackhole 10.200.35.192/26 proto bird
10.200.35.194 dev cali99c376db89a scope link
10.200.238.192/26 via 192.168.17.240 dev tunl0 proto bird onlink
192.168.17.0/24 dev ens34 proto kernel scope link src 192.168.17.241 metric 101
192.168.81.0/24 dev ens33 proto kernel scope link src 192.168.81.241 metric 100
Reading these routes: each remote node's /26 pod block is reached via tunl0 (the IPIP tunnel), the node's own block 10.200.35.192/26 is a blackhole catch-all, and each local pod (here 10.200.35.194) gets a host route out through its cali… veth.
  • check the status of all calico nodes
[root@k8s-master-01 kubeasz]# calicoctl node status
Calico process is running.

IPv4 BGP status
+----------------+-------------------+-------+----------+-------------+
|  PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+----------------+-------------------+-------+----------+-------------+
| 192.168.17.201 | node-to-node mesh | up    | 16:34:34 | Established |
| 192.168.17.240 | node-to-node mesh | up    | 16:34:48 | Established |
| 192.168.17.241 | node-to-node mesh | up    | 16:34:42 | Established |
+----------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
  • BGP establishes its neighbors over TCP connections, so netstat can verify the BGP peers
[root@k8s-master-01 kubeasz]# netstat -antlp|grep ESTABLISHED|grep 179
tcp        0      0 192.168.17.200:179      192.168.17.201:60874    ESTABLISHED 83885/bird
tcp        0      0 192.168.17.200:6443     192.168.17.201:51798    ESTABLISHED 986/kube-apiserver
tcp        0      0 192.168.17.200:179      192.168.17.241:58412    ESTABLISHED 83885/bird
tcp        0      0 192.168.17.200:179      192.168.17.240:52144    ESTABLISHED 83885/bird
  • inspect calico-related data in etcd
    since this calico deployment stores its data in etcd, the data can be queried on the etcd cluster
[root@k8s-etcd-01 ~]# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" get --prefix /calico
...
[root@k8s-etcd-01 ~]# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" get --prefix /calico/ipam/v2/host
/calico/ipam/v2/host/k8s-master-01.xx.net/ipv4/block/10.200.25.128-26
{"state":"confirmed","deleted":false}
/calico/ipam/v2/host/k8s-master-02.xx.net/ipv4/block/10.200.31.0-26
{"state":"confirmed","deleted":false}
/calico/ipam/v2/host/k8s-node-01.xx.net/ipv4/block/10.200.238.192-26
{"state":"confirmed","deleted":false}
/calico/ipam/v2/host/k8s-node-02.xx.net/ipv4/block/10.200.35.192-26
{"state":"confirmed","deleted":false}