Flannel host-gw
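Tracing and observing network traffic
Environment
Traffic analysis
Between pods on the same node
pod1 <-> pod2
Capturing on vethe6089662, cni0, and veth78d63c78 in turn shows the same packets on all three: the source IP and MAC are pod1's, the destination IP and MAC are pod2's, and the five-tuple does not change.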
00:26:08.926441 4a:cd:ef:ab:d2:e4 > ea:f8:8d:38:8c:f4, ethertype IPv4 (0x0800), length 98: 10.244.0.5 > 10.244.0.6: ICMP echo request, id 58976, seq 289, length 64
00:26:08.926493 ea:f8:8d:38:8c:f4 > 4a:cd:ef:ab:d2:e4, ethertype IPv4 (0x0800), length 98: 10.244.0.6 > 10.244.0.5: ICMP echo reply, id 58976, seq 289, length 64
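Comparing captures from several interfaces by eye is error-prone, so the check above can be scripted. The following is a minimal sketch that assumes the link-level tcpdump output format shown in these captures; the names `Frame`, `parse_frame`, and `same_addresses` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Matches the link-level capture format shown above, e.g.
# "<time> <src mac> > <dst mac>, ethertype IPv4 (0x0800), length 98: <src> > <dst>: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\S+) "
    r"(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype IPv4 \(0x0800\), length \d+: "
    r"(?P<src>\S+) > (?P<dst>\S+): (?P<info>.*)$"
)


class Frame(NamedTuple):
    src_mac: str
    dst_mac: str
    src: str  # source IP, with ".port" appended for TCP/UDP lines
    dst: str  # destination IP, with ".port" appended for TCP/UDP lines


def parse_frame(line: str) -> Optional[Frame]:
    """Extract MAC and IP addresses from one capture line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Frame(m["src_mac"], m["dst_mac"], m["src"], m["dst"])


def same_addresses(lines: Iterable[str]) -> bool:
    """True if all parsable lines carry identical MAC and IP addresses."""
    frames = {parse_frame(line) for line in lines}
    frames.discard(None)
    return len(frames) == 1
```

Feeding it the echo request line captured on each of the three interfaces should return True for the same-node case, since neither the MAC addresses nor the IP addresses are rewritten along the way.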
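Between pods on different nodes
pod1 <-> pod3
Packets are captured on veth, cni0, and ens10 on node111, and then on ens10, cni0, and veth on node112.
On veth and cni0 the source MAC is pod1's and the destination MAC is cni0's, because cni0 holds the pods' gateway IP.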
00:57:09.100769 4a:cd:ef:ab:d2:e4 > 3e:c4:72:de:ef:fc, ethertype IPv4 (0x0800), length 98: 10.244.0.5 > 10.244.1.18: ICMP echo request, id 25006, seq 30, length 64
00:57:09.101403 3e:c4:72:de:ef:fc > 4a:cd:ef:ab:d2:e4, ethertype IPv4 (0x0800), length 98: 10.244.1.18 > 10.244.0.5: ICMP echo reply, id 25006, seq 30, length 64
On ens10 the source MAC is the local node's ens10 MAC and the destination MAC is the peer node's ens10 MAC.
00:57:19.124440 52:54:00:c2:e7:0e > 52:54:00:f1:ec:b8, ethertype IPv4 (0x0800), length 98: 10.244.0.5 > 10.244.1.18: ICMP echo request, id 25006, seq 40, length 64
00:57:19.132212 52:54:00:f1:ec:b8 > 52:54:00:c2:e7:0e, ethertype IPv4 (0x0800), length 98: 10.244.1.18 > 10.244.0.5: ICMP echo reply, id 25006, seq 40, length 64
When traffic reaches cni0 it matches the route 10.244.1.0/24 via 192.168.100.112 dev ens10, so traffic to 10.244.1.18 leaves through ens10 with 192.168.100.112 as the next hop.
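The route match described above is a longest-prefix lookup. The sketch below illustrates it with Python's standard `ipaddress` module. Only the 10.244.1.0/24 route is taken from the text; the other two entries are assumptions about what node111's routing table probably contains, and `Route`, `ROUTES`, and `lookup` are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    via: Optional[str]  # next-hop gateway; None for a directly connected subnet
    dev: str            # outgoing interface


ROUTES = [
    # Assumption: the local pod subnet is directly connected on cni0.
    Route(ipaddress.ip_network("10.244.0.0/24"), None, "cni0"),
    # From the text: the remote pod subnet is reached via the peer node.
    Route(ipaddress.ip_network("10.244.1.0/24"), "192.168.100.112", "ens10"),
    # Assumption: the node subnet is directly connected on ens10.
    Route(ipaddress.ip_network("192.168.100.0/24"), None, "ens10"),
]


def lookup(dst: str, routes=ROUTES) -> Optional[Route]:
    """Return the most specific route containing dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


route = lookup("10.244.1.18")
print(route.via, route.dev)
```

Because host-gw only installs a plain route to the peer node, the packet is forwarded without encapsulation, which is consistent with the ens10 capture showing the pod IPs unchanged and only the MAC addresses rewritten.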
From a node to a pod on another node
node111 <-> pod3
Packets are captured on ens10 on node111 and on ens10 on node112.
01:17:14.398436 52:54:00:c2:e7:0e > 52:54:00:f1:ec:b8, ethertype IPv4 (0x0800), length 98: 192.168.100.111 > 10.244.1.18: ICMP echo request, id 4893, seq 19, length 64
01:17:14.399235 52:54:00:f1:ec:b8 > 52:54:00:c2:e7:0e, ethertype IPv4 (0x0800), length 98: 10.244.1.18 > 192.168.100.111: ICMP echo reply, id 4893, seq 19, length 64
On the veth on node112:
01:17:25.590859 4a:67:29:6e:6a:e8 > e6:c5:20:8a:ad:a0, ethertype IPv4 (0x0800), length 98: 192.168.100.111 > 10.244.1.18: ICMP echo request, id 4893, seq 30, length 64
01:17:25.590947 e6:c5:20:8a:ad:a0 > 4a:67:29:6e:6a:e8, ethertype IPv4 (0x0800), length 98: 10.244.1.18 > 192.168.100.111: ICMP echo reply, id 4893, seq 30, length 64
The request from node111 to 10.244.1.18 matches the route 10.244.1.0/24 via 192.168.100.112 dev ens10. Its source IP is node111's ens10 IP, its next hop is 192.168.100.112, and it leaves through ens10.
# kubectl get svc
nginx-service NodePort 10.97.40.43 <none> 8080:30080/TCP 3d14h
# kubectl get endpoints
nginx-service 10.244.0.6:80,10.244.1.18:80 46h
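Pod accessing the service clusterIP
In this setup, a pod's access to the service fails when the selected backend is on the same node (same subnet) as the pod. The request to the service IP is DNATed to the endpoint IP, but the endpoint IP and the pod IP are on the same subnet, so the endpoint replies to the pod directly. The pod then receives a reply whose source is the endpoint IP rather than the service IP (no SNAT was applied), and the connection fails.
The usual fix is to set masqueradeAll to true in the kube-proxy configuration. Packets sent to the endpoint are then SNATed to the bridge IP (the gateway IP), so the endpoint replies to the gateway, which translates the source back to the service IP before returning the reply to the pod.
Capture inside the pod2 netns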
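The explanation above can be traced with a toy model. This is only a sketch of the reasoning in this article, not of how kube-proxy or conntrack is implemented. The IP addresses are taken from the captures and the `kubectl` output in this article, and the function names are hypothetical.

```python
SVC_IP = "10.97.40.43"     # clusterIP of nginx-service
POD1_IP = "10.244.0.5"     # client pod
POD2_IP = "10.244.0.6"     # endpoint on the same node as pod1
GATEWAY_IP = "10.244.0.1"  # cni0 bridge IP, as seen in the pod2 capture


def reply_source_seen_by_client(masquerade_all: bool) -> str:
    """Return the source IP of the reply as pod1 receives it (toy model)."""
    # The request leaves pod1 as (pod1 -> service IP).
    src, dst = POD1_IP, SVC_IP
    # DNAT rewrites the destination to the chosen endpoint.
    dst = POD2_IP
    # With masqueradeAll, the source is also rewritten to the bridge IP.
    if masquerade_all:
        src = GATEWAY_IP

    # The endpoint answers whatever source address it saw.
    reply_src, reply_dst = dst, src
    if reply_dst == GATEWAY_IP:
        # The reply returns to the gateway, which reverses both translations.
        reply_src, reply_dst = SVC_IP, POD1_IP
    # Otherwise the reply goes straight to pod1 on the same subnet, untranslated.
    return reply_src


def connection_succeeds(masquerade_all: bool) -> bool:
    # pod1 opened the connection to the service IP, so it only accepts
    # a reply whose source is the service IP.
    return reply_source_seen_by_client(masquerade_all) == SVC_IP


for flag in (False, True):
    print(flag, reply_source_seen_by_client(flag), connection_succeeds(flag))
```

In this model the untranslated case returns the endpoint IP to pod1, which does not match the connection pod1 opened, while the masqueraded case returns the service IP.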
00:59:50.272087 3e:c4:72:de:ef:fc > ea:f8:8d:38:8c:f4, ethertype IPv4 (0x0800), length 74: 10.244.0.1.56465 > 10.244.0.6.http: Flags [S], seq 2798923111, win 65495, options [mss 65495,sackOK,TS val 142498968 ecr 0,nop,wscale 7], length 0
00:59:50.272120 ea:f8:8d:38:8c:f4 > 3e:c4:72:de:ef:fc, ethertype IPv4 (0x0800), length 74: 10.244.0.6.http > 10.244.0.1.56465: Flags [S.], seq 877639235, ack 2798923112, win 65160, options [mss 1460,sackOK,TS val 1783025537 ecr 142498968,nop,wscale 7], length 0
Capture inside the pod1 netns
01:06:04.294475 4a:cd:ef:ab:d2:e4 > 3e:c4:72:de:ef:fc, ethertype IPv4 (0x0800), length 74: 10.244.0.5.60530 > 10.97.40.43.webcache: Flags [S], seq 2893464217, win 64240, options [mss 1460,sackOK,TS val 1338698913 ecr 0,nop,wscale 7], length 0
01:06:04.299262 3e:c4:72:de:ef:fc > 4a:cd:ef:ab:d2:e4, ethertype IPv4 (0x0800), length 74: 10.97.40.43.webcache > 10.244.0.5.60530: Flags [S.], seq 167067484, ack 2893464218, win 65160, options [mss 1460,sackOK,TS val 1676037596 ecr 1338698913,nop,wscale 7], length 0
If the request is DNATed to the pod3 endpoint IP
Capture on pod3
With masqueradeAll disabled
The source address received is pod1's IP
01:10:30.935315 4a:67:29:6e:6a:e8 > e6:c5:20:8a:ad:a0, ethertype IPv4 (0x0800), length 74: 10.244.0.5.46922 > 10.244.1.18.http: Flags [S], seq 1408226702, win 64240, options [mss 1460,sackOK,TS val 1338965560 ecr 0,nop,wscale 7], length 0
01:10:30.935444 e6:c5:20:8a:ad:a0 > 4a:67:29:6e:6a:e8, ethertype IPv4 (0x0800), length 74: 10.244.1.18.http > 10.244.0.5.46922: Flags [S.], seq 903084461, ack 1408226703, win 65160, options [mss 1460,sackOK,TS val 3416518735 ecr 1338965560,nop,wscale 7], length 0
With masqueradeAll enabled
The source address received is the IP of node111's data NIC
01:08:14.224106 4a:67:29:6e:6a:e8 > e6:c5:20:8a:ad:a0, ethertype IPv4 (0x0800), length 74: 192.168.100.111.27776 > 10.244.1.18.http: Flags [S], seq 1589463928, win 64240, options [mss 1460,sackOK,TS val 1338828846 ecr 0,nop,wscale 7], length 0
01:08:14.224181 e6:c5:20:8a:ad:a0 > 4a:67:29:6e:6a:e8, ethertype IPv4 (0x0800), length 74: 10.244.1.18.http > 192.168.100.111.27776: Flags [S.], seq 4010889874, ack 1589463929, win 65160, options [mss 1460,sackOK,TS val 1676167528 ecr 1338828846,nop,wscale 7], length 0
Node accessing the service clusterIP
curl the service directly from node111
Capture inside the pod2 netns
After DNAT by ipvs + ipset, routing sends the request out through cni0 with cni0's IP as the source.
16:04:27.364671 3e:c4:72:de:ef:fc > ea:f8:8d:38:8c:f4, ethertype IPv4 (0x0800), length 74: 10.244.0.1.25397 > 10.244.0.6.http: Flags [S], seq 2466363544, win 65495, options [mss 65495,sackOK,TS val 196776061 ecr 0,nop,wscale 7], length 0
16:04:27.364719 ea:f8:8d:38:8c:f4 > 3e:c4:72:de:ef:fc, ethertype IPv4 (0x0800), length 74: 10.244.0.6.http > 10.244.0.1.25397: Flags [S.], seq 957695108, ack 2466363545, win 65160, options [mss 1460,sackOK,TS val 1837302629 ecr 196776061,nop,wscale 7], length 0
Capture inside the pod3 netns
After DNAT by ipvs + ipset, the request matches the route 10.244.1.0/24 via 192.168.100.112 dev ens10 and leaves through ens10 with ens10's IP as the source.
16:06:00.470891 4a:67:29:6e:6a:e8 > e6:c5:20:8a:ad:a0, ethertype IPv4 (0x0800), length 74: 192.168.100.111.8805 > 10.244.1.18.http: Flags [S], seq 2461366502, win 65495, options [mss 65495,sackOK,TS val 196869175 ecr 0,nop,wscale 7], length 0
16:06:00.471047 e6:c5:20:8a:ad:a0 > 4a:67:29:6e:6a:e8, ethertype IPv4 (0x0800), length 74: 10.244.1.18.http > 192.168.100.111.8805: Flags [S.], seq 3668497900, ack 2461366503, win 65160, options [mss 1460,sackOK,TS val 1730033775 ecr 196869175,nop,wscale 7], length 0
External access to the service NodePort, with the backend pod on the current node
Access node111:30080
Capture on the node111 NIC
16:20:28.829876 ac:7e:8a:6c:41:c4 > 52:54:00:ba:dc:62, ethertype IPv4 (0x0800), length 74: 172.20.150.110.35662 > 172.18.22.111.30080: Flags [S], seq 3415939106, win 64240, options [mss 1460,sackOK,TS val 1380747195 ecr 0,nop,wscale 7], length 0
16:20:28.830114 52:54:00:ba:dc:62 > ac:7e:8a:6c:41:c4, ethertype IPv4 (0x0800), length 74: 172.18.22.111.30080 > 172.20.150.110.35662: Flags [S.], seq 3478177905, ack 3415939107, win 65160, options [mss 1460,sackOK,TS val 1838264095 ecr 1380747195,nop,wscale 7], length 0
Capture inside the pod2 netns
Once inside the node the traffic is masqueraded, the same as when the node accesses the service clusterIP.
16:18:42.881908 3e:c4:72:de:ef:fc > ea:f8:8d:38:8c:f4, ethertype IPv4 (0x0800), length 74: 10.244.0.1.43601 > 10.244.0.6.http: Flags [S], seq 2936641286, win 64240, options [mss 1460,sackOK,TS val 1380641225 ecr 0,nop,wscale 7], length 0
16:18:42.881988 ea:f8:8d:38:8c:f4 > 3e:c4:72:de:ef:fc, ethertype IPv4 (0x0800), length 74: 10.244.0.6.http > 10.244.0.1.43601: Flags [S.], seq 1236494118, ack 2936641287, win 65160, options [mss 1460,sackOK,TS val 1838158146 ecr 1380641225,nop,wscale 7], length 0
External access to the service NodePort, with the backend pod not on the current node
Access node111:30080
Capture inside the pod3 netns
16:19:48.265489 4a:67:29:6e:6a:e8 > e6:c5:20:8a:ad:a0, ethertype IPv4 (0x0800), length 74: 192.168.100.111.34095 > 10.244.1.18.http: Flags [S], seq 2140428405, win 64240, options [mss 1460,sackOK,TS val 1380706638 ecr 0,nop,wscale 7], length 0
16:19:48.265579 e6:c5:20:8a:ad:a0 > 4a:67:29:6e:6a:e8, ethertype IPv4 (0x0800), length 74: 10.244.1.18.http > 192.168.100.111.34095: Flags [S.], seq 1160199023, ack 2140428406, win 65160, options [mss 1460,sackOK,TS val 1730861570 ecr 1380706638,nop,wscale 7], length 0