Learning Kubernetes (3): Deploying a Calico-based k8s test environment in non-IPIP mode

The previous post, "Learning Kubernetes (2): Deploying a Calico-based k8s test environment with kubeasz", followed the default steps, which set up the Calico network in IPIP mode: cross-node traffic is encapsulated as IP-in-IP through the tunl0 interface, which is essentially quite similar to Flannel and does not expose the container IPs to the underlying network. Below is a record of deploying and verifying the cluster with IPIP turned off.

Environment preparation

  • See the previous post for the environment setup
  • Before running the Ansible playbooks at the end, be sure to change the IPIP value from Always to off (a sketch of how to do this follows the file excerpt below)
[root@k8s-master ansible]# cat /etc/ansible/roles/calico/defaults/main.yml | grep IPIP
# Setting CALICO_IPV4POOL_IPIP="off" improves network performance; see docs/setup/calico.md for the constraints
CALICO_IPV4POOL_IPIP: "off"
[root@k8s-master ansible]# 
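A minimal sketch of flipping the setting before the playbooks run, assuming the default kubeasz path shown above:

# change CALICO_IPV4POOL_IPIP from "Always" to "off" in place
sed -i 's/^CALICO_IPV4POOL_IPIP:.*/CALICO_IPV4POOL_IPIP: "off"/' \
    /etc/ansible/roles/calico/defaults/main.yml
# confirm the change
grep CALICO_IPV4POOL_IPIP /etc/ansible/roles/calico/defaults/main.yml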

Deployment

  • Run all the playbooks (a sketch of the command is shown after the node listing below)
  • After deployment completes, check the node status
[root@k8s-master ansible]# kubectl get nodes
NAME              STATUS                     ROLES    AGE     VERSION
192.168.122.135   Ready,SchedulingDisabled   master   4m49s   v1.17.2
192.168.122.143   Ready                      node     3m19s   v1.17.2
192.168.122.198   Ready                      node     3m19s   v1.17.2
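For reference, "run all the playbooks" with kubeasz is typically a single command such as the one below. This is a sketch and the playbook names depend on the kubeasz version; it also ships numbered step-by-step playbooks such as 01.prepare.yml through 07.cluster-addon.yml:

cd /etc/ansible
# run the whole deployment in one pass
ansible-playbook 90.setup.yml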
  • Fix the SchedulingDisabled state (kubeasz cordons the master node by default)
[root@k8s-master ansible]#  kubectl uncordon 192.168.122.135
node/192.168.122.135 uncordoned
[root@k8s-master ansible]# 
[root@k8s-master ansible]# kubectl get nodes                
NAME              STATUS   ROLES    AGE     VERSION
192.168.122.135   Ready    master   5m19s   v1.17.2
192.168.122.143   Ready    node     3m49s   v1.17.2
192.168.122.198   Ready    node     3m49s   v1.17.2
[root@k8s-master ansible]# 
  • Check the BGP peer status. This is a full mesh: each of the three nodes establishes a BGP session with both of the others
[root@k8s-node-1 ~]#  ln -s /opt/kube/bin/calicoctl /usr/local/sbin/calicoctl
[root@k8s-node-1 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.122.135 | node-to-node mesh | up    | 09:40:23 | Established |
| 192.168.122.198 | node-to-node mesh | up    | 09:40:24 | Established |
+-----------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

[root@k8s-node-1 ~]# 
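To double-check that IPIP is really disabled, the IP pool itself can be inspected with calicoctl. This is a sketch: the pool name default-ipv4-ippool is the usual default and may differ in a given setup, and the exact field (ipipMode vs. the older ipip.enabled) depends on the Calico version:

# overview of the pools, including the IPIP mode column
calicoctl get ippool -o wide
# full spec of the default pool; with IPIP off, ipipMode should be Never
calicoctl get ippool default-ipv4-ippool -o yaml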
  • Start four busybox pods to verify cross-node communication (note the deprecation warning in the output; a Deployment manifest alternative is sketched after the pod listing)
[root@k8s-master ansible]# kubectl run test --image=busybox --replicas=4 sleep 30000 
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/test created
[root@k8s-master ansible]# 
[root@k8s-master ansible]# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE              NOMINATED NODE   READINESS GATES
test-7b48b75784-2bz2g   1/1     Running   0          14s   172.20.140.66    192.168.122.198   <none>           <none>
test-7b48b75784-7cxls   1/1     Running   0          14s   172.20.235.193   192.168.122.135   <none>           <none>
test-7b48b75784-hdx2m   1/1     Running   0          14s   172.20.109.67    192.168.122.143   <none>           <none>
test-7b48b75784-qnc6m   1/1     Running   0          14s   172.20.235.192   192.168.122.135   <none>           <none>
[root@k8s-master ansible]# 
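Since the --replicas flag of kubectl run is deprecated (as the warning above shows), the same test pods can also be created from a plain Deployment manifest. A sketch, reusing the name test and the busybox image from the command above:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "30000"]
EOF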

Inspection

  • Note that the tunl0 interface is no longer present on the node
[root@k8s-node-1 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:28:0f:64 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.143/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
       valid_lft 2927sec preferred_lft 2927sec
    inet6 fe80::bd31:baa6:a345:4bf1/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::b2b6:fc3a:4364:85a9/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::e489:9d94:404b:2b9a/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:3f:d0:36:9b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b2:30:6c:38:be:b6 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 12:75:4a:0b:d9:4e brd ff:ff:ff:ff:ff:ff
    inet 10.68.0.1/32 brd 10.68.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.0.2/32 brd 10.68.0.2 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.173.81/32 brd 10.68.173.81 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.172.108/32 brd 10.68.172.108 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.145.230/32 brd 10.68.145.230 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.142.144/32 brd 10.68.142.144 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali3720ab0e5c8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
7: cali58c377b0851@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
8: cali9682b7d9b6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
9: calidd63611da70@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
[root@k8s-node-1 ~]# 
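A quick way to confirm there is no IPIP tunnel device carrying pod traffic (a sketch; note that a bare tunl0 can still show up if the kernel ipip module happens to be loaded, so the routing table below is the more reliable indicator):

# list IPIP tunnel devices; nothing should be returned here
ip link show type ipip
# and no route should point at tunl0
ip route | grep tunl0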
  • The busybox pod IPs appear directly in the node's routing table
  • In the routing table:
    • local pods have /32 host routes reachable through the caliXXX end of their veth pairs
    • pods on other nodes are reached with that node's IP as the next hop, the destination being the address block Calico allocated to that node
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" get --prefix /calico/ipam/v2/host
/calico/ipam/v2/host/k8s-master/ipv4/block/172.20.235.192-26
{"state":"confirmed"}
/calico/ipam/v2/host/k8s-node-1/ipv4/block/172.20.109.64-26
{"state":"confirmed"}
/calico/ipam/v2/host/k8s-node-2/ipv4/block/172.20.140.64-26
{"state":"confirmed"}
[root@k8s-master ~]# 
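The same per-node allocation blocks can also be read through calicoctl instead of querying etcd directly. A sketch; the --show-blocks flag is only available in newer calicoctl releases:

# summary of IP pool usage
calicoctl ipam show
# per-node /26 blocks, matching the etcd keys above (newer calicoctl only)
calicoctl ipam show --show-blocks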

[root@k8s-node-1 ~]# ip route
default via 192.168.122.1 dev eth0 proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
172.20.109.64 dev cali3720ab0e5c8 scope link 
blackhole 172.20.109.64/26 proto bird 
172.20.109.65 dev cali58c377b0851 scope link 
172.20.109.66 dev cali9682b7d9b6c scope link 
172.20.109.67 dev calidd63611da70 scope link 
172.20.140.64/26 via 192.168.122.198 dev eth0 proto bird 
172.20.235.192/26 via 192.168.122.135 dev eth0 proto bird 
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.143 metric 100 
[root@k8s-node-1 ~]# 
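Because the cross-node routes are tagged with proto bird, they can be listed on their own, which shows exactly what Calico's BGP daemon has programmed (a sketch):

# only the routes installed by bird (Calico's BGP daemon)
ip route show proto bird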

Verifying connectivity

  • Ping the busybox pod on node-2 from the busybox pod on the master, and capture packets on node-2's eth0
# ping from the busybox pod on the master
[root@k8s-master ~]# kubectl exec -it test-7b48b75784-7cxls ping 172.20.140.66
PING 172.20.140.66 (172.20.140.66): 56 data bytes
64 bytes from 172.20.140.66: seq=0 ttl=62 time=0.846 ms
64 bytes from 172.20.140.66: seq=1 ttl=62 time=0.496 ms
64 bytes from 172.20.140.66: seq=2 ttl=62 time=0.517 ms

# capture on node-2: plain ICMP packets, no encapsulation
[root@k8s-node-2 ~]# tcpdump -i eth0 -ennX host 172.20.235.193
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:45:07.927559 52:54:00:b9:d6:16 > 52:54:00:c2:6f:1e, ethertype IPv4 (0x0800), length 98: 172.20.235.193 > 172.20.140.66: ICMP echo request, id 2816, seq 0, length 64
        0x0000:  4500 0054 c3f5 4000 3f01 a786 ac14 ebc1  E..T..@.?.......
        0x0010:  ac14 8c42 0800 b452 0b00 0000 6de8 cac4  ...B...R....m...
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000                                ....
09:45:07.927765 52:54:00:c2:6f:1e > 52:54:00:b9:d6:16, ethertype IPv4 (0x0800), length 98: 172.20.140.66 > 172.20.235.193: ICMP echo reply, id 2816, seq 0, length 64
        0x0000:  4500 0054 80f6 0000 3f01 2a86 ac14 8c42  E..T....?.*....B
        0x0010:  ac14 ebc1 0000 bc52 0b00 0000 6de8 cac4  .......R....m...
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000      
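As a cross-check, capturing only IP-in-IP traffic (IP protocol number 4) on node-2 should show nothing at all, confirming that pod traffic is not being encapsulated (a sketch):

# protocol 4 = IPIP; with IPIP mode off this capture should stay silent
tcpdump -i eth0 -nn 'ip proto 4'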
  • Trace the path: container (master) -> eth0 (master) -> eth0 (node-2) -> container (node-2); the ttl=62 in the ping output above is consistent with these two routed hops
[root@k8s-master ~]# kubectl exec -it test-7b48b75784-7cxls traceroute 172.20.140.66
traceroute to 172.20.140.66 (172.20.140.66), 30 hops max, 46 byte packets
 1  192-168-122-135.kubernetes.default.svc.cluster.local (192.168.122.135)  0.018 ms  0.006 ms  0.004 ms
 2  192.168.122.198 (192.168.122.198)  0.302 ms  0.398 ms  0.229 ms
 3  172.20.140.66 (172.20.140.66)  0.274 ms  0.488 ms  0.339 ms
[root@k8s-master ~]# 

Summary

The following is copied directly from another article (source linked below):

When a container is created, Calico generates a veth pair for it: one end is placed in the container's network namespace as its NIC and given an IP address and mask, while the other end is exposed directly on the host. Routing rules are then installed so the container IP is reachable through the host's routing table. At the same time, Calico allocates a subnet to each host as the range of container IPs it may assign, so fairly stable per-host routing rules can be generated from each subnet's CIDR.

When a container needs to communicate across hosts, the traffic goes through these simple steps:
1) Container traffic reaches the host's network namespace through the veth pair.
2) Using the CIDR that contains the destination IP and the host's routing rules, the next-hop host IP is determined.
3) Once the traffic arrives at the next-hop host, that host's routing rules deliver it directly to the host-side end of the destination container's veth pair, and from there into the container.

As this shows, cross-host traffic involves no NAT and no UDP encapsulation, so the performance overhead is indeed low. But precisely because Calico's communication is purely layer 3, the mechanism also has some drawbacks, for example:
1) Calico currently supports only TCP, UDP, ICMP and ICMPv6; for other layer-4 protocols (e.g. NetBIOS), an overlay network such as Weave or a native overlay is recommended.
2) Because communication happens at layer 3 with no encryption or wrapping at layer 2, it should only be used on a trusted private network.

Source: https://blog.csdn.net/ccy19910925/article/details/82423452
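To make the quoted steps concrete, the routes Calico ends up programming on k8s-node-1 are roughly equivalent to the manual commands below. This is purely illustrative, reconstructed from the ip route output shown earlier; Calico/bird creates and maintains these routes automatically:

# local pod: a /32 host route through its cali* veth end
ip route add 172.20.109.67 dev calidd63611da70 scope link
# remote pods: the other nodes' /26 blocks via those nodes' IPs (learned over BGP)
ip route add 172.20.140.64/26 via 192.168.122.198 dev eth0
ip route add 172.20.235.192/26 via 192.168.122.135 dev eth0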

Open questions

  • One issue came up during deployment: the ASN value shows as unknown
[root@k8s-master ansible]# calicoctl get node -o wide
NAME         ASN         IPV4                 IPV6   
k8s-master   (unknown)   192.168.122.27/24           
k8s-node-1   (unknown)   192.168.122.212/24          
k8s-node-2   (unknown)   192.168.122.141/24          
  • Reportedly this is because the nodes use Calico's default AS number (the global default, 64512, unless configured otherwise), so no per-node ASN is displayed; it does not prevent the BGP sessions from being established
  • It also turns out there is no default bgpconfig
[root@k8s-master ansible]# calicoctl get bgpconfig default
Failed to get resources: resource does not exist: BGPConfiguration(default) with error: <nil>
[root@k8s-master ansible]# 
[root@k8s-master templates]# calicoctl get bgpPeer -o yaml
apiVersion: projectcalico.org/v3
items: []
kind: BGPPeerList
metadata:
  resourceVersion: "156403"
[root@k8s-master templates]# 
  • Following documentation found online, I created one manually
[root@k8s-master ansible]# cat << EOF | calicoctl create -f -
  apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    logSeverityScreen: Info
    nodeToNodeMeshEnabled: false
    asNumber: 63400
EOF 
Successfully created 1 'BGPConfiguration' resource(s)
[root@k8s-master ansible]# 
  • The result was a disaster: all BGP neighbors disappeared, routes were no longer synchronized, and cross-node pings between containers stopped working. In hindsight the cause is clear: the manifest sets nodeToNodeMeshEnabled: false while no explicit BGPPeer resources are defined (see the empty BGPPeerList above), so every session is torn down
[root@k8s-master ansible]#  calicoctl node status
Calico process is running.

IPv4 BGP status
No IPv4 peers found.

IPv6 BGP status
No IPv6 peers found.

[root@k8s-master ansible]# 
  • I did not find a good way to recover at the time and ended up reinstalling and redeploying from scratch; a plausible recovery is sketched below
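In hindsight a full reinstall was probably unnecessary: the sessions dropped because nodeToNodeMeshEnabled was set to false without any explicit BGPPeer resources to replace the mesh. A plausible recovery, untested here, would be to re-enable the mesh or simply delete the custom BGPConfiguration:

# option 1: re-enable the node-to-node mesh in the existing BGPConfiguration
cat << EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 63400
EOF

# option 2: remove the custom resource and fall back to Calico's defaults
calicoctl delete bgpconfig default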