CNI 网络分析 4.4 Calico IPIP


理清和观测网络流量

环境

在这里插入图片描述
在这里插入图片描述
可以看到,在宿主机上有到每个 pod IP 的路由指向 veth 设备
到对端节点网段的路由 指向 tunl0 下一跳 ens10 的 ip
有到本节点网段 第一个 ip 即 tunl0 的流量 指向 blackhole,丢弃

流量分析

Pod 间

  • 同 node 不同 pod 之间
    pod1 <-> pod2
  1. 在 pod1 eth0 抓包:
00:37:59.442570 32:21:45:c4:c5:d5 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 10.244.153.204 > 10.244.153.201: ICMP echo request, id 56616, seq 1, length 64
00:37:59.442707 ee:ee:ee:ee:ee:ee > 32:21:45:c4:c5:d5, ethertype IPv4 (0x0800), length 98: 10.244.153.201 > 10.244.153.204: ICMP echo reply, id 56616, seq 1, length 64

pod1 中下一跳都是 169.254.1.1,且目的 mac 是 ee:ee:ee:ee:ee:ee
2. 在 host 端 calice0906292e2 抓包不变
3. 匹配主机路由 10.244.153.201 dev cali118af4ccd16 scope link 和 neighbor 10.244.153.201 dev calibd2348b4f67 lladdr f2:d4:17:63:9d:3d REACHABLE, 在 cali118af4ccd16 抓包

00:41:07.879975 ee:ee:ee:ee:ee:ee > f2:d4:17:63:9d:3d, ethertype IPv4 (0x0800), length 98: 10.244.153.204 > 10.244.153.201: ICMP echo request, id 56616, seq 185, length 64
00:41:07.879998 f2:d4:17:63:9d:3d > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 10.244.153.201 > 10.244.153.204: ICMP echo reply, id 56616, seq 185, length 64
  1. 在 pod2 内 eth0 抓包:
00:43:59.911019 ee:ee:ee:ee:ee:ee > f2:d4:17:63:9d:3d, ethertype IPv4 (0x0800), length 98: 10.244.153.204 > 10.244.153.201: ICMP echo request, id 56616, seq 353, length 64
00:43:59.911056 f2:d4:17:63:9d:3d > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 10.244.153.201 > 10.244.153.204: ICMP echo reply, id 56616, seq 353, length 64
  • 不同 node 上 pod 之间
    pod1 访问 pod3
    在 veth host 端抓包
20:22:47.858674 32:21:45:c4:c5:d5 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 10.244.153.204 > 10.244.146.205: ICMP echo request, id 17555, seq 106, length 64
20:22:47.860043 ee:ee:ee:ee:ee:ee > 32:21:45:c4:c5:d5, ethertype IPv4 (0x0800), length 98: 10.244.146.205 > 10.244.153.204: ICMP echo reply, id 17555, seq 106, length 64

看 主机路由 10.244.146.192/26 via 192.168.100.112 dev tunl0 proto bird onlink 下一跳是 tunl0
在 tunl0 上抓包,没有 二层信息。

20:24:42.016898 ip: 10.244.153.204 > 10.244.146.205: ICMP echo request, id 17555, seq 220, length 64
20:24:42.022282 ip: 10.244.146.205 > 10.244.153.204: ICMP echo reply, id 17555, seq 220, length 64

在 业务网卡抓包,可以看到 mac 地址是业务网卡两个端点的 mac。外层 IP 是 业务网卡两个端点的 IP,内层是 icmp 报文。

20:25:41.151109 52:54:00:dc:c7:b4 > 52:54:00:d3:bf:21, ethertype IPv4 (0x0800), length 118: 192.168.100.111 > 192.168.100.112: 10.244.153.204 > 10.244.146.205: ICMP echo request, id 17555, seq 279, length 64
20:25:41.152198 52:54:00:d3:bf:21 > 52:54:00:dc:c7:b4, ethertype IPv4 (0x0800), length 118: 192.168.100.112 > 192.168.100.111: 10.244.146.205 > 10.244.153.204: ICMP echo reply, id 17555, seq 279, length 64

在 对面机器上的报文路径与之对称

Node 到 Pod

  • Node 到本 node 上的 pod
    在 node111 ping pod1
    在 veth host 端抓包,根据路由 10.244.153.204 dev calice0906292e2 scope link,生成报文时拿默认路由网卡的 ip 做源地址
20:15:23.174963 ee:ee:ee:ee:ee:ee > 32:21:45:c4:c5:d5, ethertype IPv4 (0x0800), length 98: 172.18.22.111 > 10.244.153.204: ICMP echo request, id 6, seq 9, length 64
20:15:23.175025 32:21:45:c4:c5:d5 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 10.244.153.204 > 172.18.22.111: ICMP echo reply, id 6, seq 9, length 64

在 veth 内 eth0 抓包

20:18:23.399015 ee:ee:ee:ee:ee:ee > 32:21:45:c4:c5:d5, ethertype IPv4 (0x0800), length 98: 172.18.22.111 > 10.244.153.204: ICMP echo request, id 6, seq 185, length 64
20:18:23.399049 32:21:45:c4:c5:d5 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 10.244.153.204 > 172.18.22.111: ICMP echo reply, id 6, seq 185, length 64
  • Node 到其他 node 上的 pod
    node111 到 pod3
    在 node111 tunl0 抓包,根据路由 10.244.146.192/26 via 192.168.100.112 dev tunl0 proto bird onlink,源 IP 为 tunl0 ip
21:10:02.099557 ip: 10.244.153.192 > 10.244.146.205: ICMP echo request, id 11, seq 46, length 64
21:10:02.100595 ip: 10.244.146.205 > 10.244.153.192: ICMP echo reply, id 11, seq 46, length 64

在 ens10 抓包

21:10:18.124555 52:54:00:dc:c7:b4 > 52:54:00:d3:bf:21, ethertype IPv4 (0x0800), length 118: 192.168.100.111 > 192.168.100.112: 10.244.153.192 > 10.244.146.205: ICMP echo request, id 11, seq 62, length 64
21:10:18.129910 52:54:00:d3:bf:21 > 52:54:00:dc:c7:b4, ethertype IPv4 (0x0800), length 118: 192.168.100.112 > 192.168.100.111: 10.244.146.205 > 10.244.153.192: ICMP echo reply, id 11, seq 62, length 64

Pod 到 service

# kubectl get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
nginx-service   ClusterIP   10.107.161.255   <none>        8080/TCP   2s
# kubectl get endpoints
NAME            ENDPOINTS                             AGE
nginx-service   10.244.146.205:80,10.244.153.201:80   5s
  • Pod 访问 service clusterIP
    在 pod1 veth 对抓包,目的地之为 svcIP
21:27:42.690221 32:21:45:c4:c5:d5 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: 10.244.153.204.38610 > 10.107.161.255.8080: Flags [S], seq 4133783852, win 64800, options [mss 1440,sackOK,TS val 3911662959 ecr 0,nop,wscale 7], length 0
21:27:42.690427 ee:ee:ee:ee:ee:ee > 32:21:45:c4:c5:d5, ethertype IPv4 (0x0800), length 74: 10.107.161.255.8080 > 10.244.153.204.38610: Flags [S.], seq 86025828, ack 4133783853, win 64260, options [mss 1440,sackOK,TS val 1534565294 ecr 3911662959,nop,wscale 7], length 0

在 pod2 veth 对抓包,源地址为 主机默认路由网卡 ip,目的地址为 pod2,目的端口为 80

21:27:42.690366 ee:ee:ee:ee:ee:ee > 2a:e7:de:3f:09:fb, ethertype IPv4 (0x0800), length 74: 10.244.153.204.38610 > 10.244.153.201.80: Flags [S], seq 4133783852, win 64800, options [mss 1440,sackOK,TS val 3911662959 ecr 0,nop,wscale 7], length 0
21:27:42.690404 2a:e7:de:3f:09:fb > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: 10.244.153.201.80 > 10.244.153.204.38610: Flags [S.], seq 86025828, ack 4133783853, win 64260, options [mss 1440,sackOK,TS val 1534565294 ecr 3911662959,nop,wscale 7], length 0

去 service dnat 成后端 IP 转到 pod2,pod2 回复 pod1,再 snat 成 svcIP。
后端为 跨节点的 pod3 和上面相同

Node 到 service

  • Node 访问 service clusterIP
    本节点 pod 时
    Dnat 成 pod2 ip,根据默认路由网卡 IP,构造报文
22:07:10.054483 ee:ee:ee:ee:ee:ee > 2a:e7:de:3f:09:fb, ethertype IPv4 (0x0800), length 74: 172.18.22.111.13579 > 10.244.153.201.80: Flags [S], seq 948451555, win 65495, options [mss 65495,sackOK,TS val 519032124 ecr 0,nop,wscale 7], length 0
22:07:10.054534 2a:e7:de:3f:09:fb > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: 10.244.153.201.80 > 172.18.22.111.13579: Flags [S.], seq 2350447822, ack 948451556, win 64260, options [mss 1440,sackOK,TS val 319012716 ecr 519032124,nop,wscale 7], length 0

跨节点 pod 时
Dnat 成 pod3 ip,根据路由用 node111 ippool 的 网关去请求

22:07:19.881187 ee:ee:ee:ee:ee:ee > 4e:11:e1:74:9d:6c, ethertype IPv4 (0x0800), length 74: 10.244.153.192.14543 > 10.244.146.205.http: Flags [S], seq 1990644142, win 65495, options [mss 65495,sackOK,TS val 519041957 ecr 0,nop,wscale 7], length 0
22:07:19.881227 4e:11:e1:74:9d:6c > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: 10.244.146.205.http > 10.244.153.192.14543: Flags [S.], seq 2030705031, ack 1990644143, win 64260, options [mss 1440,sackOK,TS val 1033275778 ecr 519041957,nop,wscale 7], length 0

外部到 svc
default nginx-service NodePort 10.107.161.255 8080:30080/TCP
在主机被访问 IP 网卡抓包

22:13:00.656471 ac:7e:8a:6c:41:c4 > 52:54:00:ba:dc:62, ethertype IPv4 (0x0800), length 149: 172.20.151.77.47334 > 172.18.22.111.30080: Flags [P.], seq 1:84, ack 1, win 229, options [nop,nop,TS val 430411787 ecr 1033616455], length 83
22:13:00.657729 52:54:00:ba:dc:62 > ac:7e:8a:6c:41:c4, ethertype IPv4 (0x0800), length 66: 172.18.22.111.30080 > 172.20.151.77.47334: Flags [.], ack 84, win 502, options [nop,nop,TS val 1033616544 ecr 430411787], length 
Chain KUBE-NODE-PORT (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  --  0.0.0.0/0            0.0.0.0/0            /* Kubernetes nodeport TCP port for masquerade purpose */ match-set KUBE-NODE-PORT-TCP dst

Masquerade 转为

22:15:32.507099 ee:ee:ee:ee:ee:ee > 2a:e7:de:3f:09:fb, ethertype IPv4 (0x0800), length 74: 172.18.22.111.10229 > 10.244.153.201.80: Flags [S], seq 450784444, win 29200, options [mss 1460,sackOK,TS val 430563700 ecr 0,nop,wscale 7], length 0
22:15:32.507198 2a:e7:de:3f:09:fb > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: 10.244.153.201.80 > 172.18.22.111.10229: Flags [S.], seq 3057963543, ack 450784445, win 64260, options [mss 1440,sackOK,TS val 319515169 ecr 430563700,nop,wscale 7], length 0

如果 后端不在本节点
Masquerade 转为

22:31:15.850370 ee:ee:ee:ee:ee:ee > 4e:11:e1:74:9d:6c, ethertype IPv4 (0x0800), length 74: 10.244.153.192.50499 > 10.244.146.205.http: Flags [S], seq 1374007914, win 29200, options [mss 1460,sackOK,TS val 431507052 ecr 0,nop,wscale 7], length 0
22:31:15.850422 4e:11:e1:74:9d:6c > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: 10.244.146.205.http > 10.244.153.192.50499: Flags [S.], seq 3861438831, ack 1374007915, win 64260, options [mss 1440,sackOK,TS val 1034711747 ecr 431507052,nop,wscale 7], length 0

NetworkPolicy

为 pod1 打上 role == pod1
为 pod2,pod3 打上 app == nginx

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-tcp-80
  namespace: default
spec:
  selector: app == 'nginx'
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: role == 'pod1'
      destination:
        ports:
          - 80

应用后查看 iptables 流程

# iptables -nL
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
cali-OUTPUT  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:tVnHkvAo15HuiPy0 */
// 下的是 ingress,则在宿主机上看到的是 Output chain,发给 pod 时的规则


Chain cali-OUTPUT (1 references)
target     prot opt source               destination         
cali-forward-endpoint-mark  all  --  0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:5Z67OUUpTOM7Xa1a */ mark match ! 0x0/0xfff00000


Chain cali-forward-endpoint-mark (1 references)
target     prot opt source               destination         
cali-to-wl-dispatch  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:aFl0WFKRxDqj8oA6 */


Chain cali-to-wl-dispatch (2 references)
target     prot opt source               destination         
cali-tw-calibd2348b4f67  all  --  0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:m9Fd7J2kx1zys3Gw */


Chain cali-tw-calibd2348b4f67 (1 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            /* cali:GqobtmvaGkGX_I6Q */ /* Start of policies */ MARK and 0xfffdffff
cali-pi-_w6c3i7lsXCdtfGqcxq5  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:Ew7qVfwras3_yV_L */ mark match 0x0/0x20000

Chain cali-pi-_w6c3i7lsXCdtfGqcxq5 (1 references)
target     prot opt source               destination         
MARK       tcp  --  0.0.0.0/0            0.0.0.0/0            /* cali:O4FzgAjAMQ8CsxAM */ /* Policy default/default.allow-tcp-80 ingress */ match-set cali40s:SPeglQlTBmfidv00S2cBaDC src multiport dports 80 MARK or 0x10000
// 匹配策略的打 mark 0x10000 accept
  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值