This lab continues from the previous post, Antrea - Installation and Topology.
This post walks through how data flows in that topology.
Address table for this lab:
Name | Port | IP Addr | MAC | Note |
---|---|---|---|---|
worker-01 br-int(OVS) | tun0(1) | N/A | 06:c6:e4:f2:94:51 | |
worker-01 br-int(OVS) | gw0(2) | 10.211.1.1 | 36:ab:6b:31:0c:75 | |
worker-01 br-int(OVS) | veth(15) | N/A | 86:e0:33:27:a8:c0 | to pod frontend |
worker-01 br-int(OVS) | veth(16) | N/A | c6:97:94:e1:2e:a5 | to pod backend1 |
pod-frontend | eth0(15) | 10.211.1.14 | 2a:42:47:c6:9b:15 | |
pod-backend1 | eth0(16) | 10.211.1.15 | 86:5c:0a:d2:c8:c3 | |
worker-01 | ens192 | 192.168.110.66 | 00:50:56:b2:35:31 | |
worker-02 br-int(OVS) | tun0(1) | N/A | 0a:80:b4:dc:33:4c | |
worker-02 br-int(OVS) | gw0(2) | 10.211.2.1 | 12:c8:e9:86:7e:7b | |
worker-02 br-int(OVS) | veth(14) | N/A | e6:bf:f2:24:d4:f2 | to pod backend2 |
pod-backend2 | eth0(14) | 10.211.2.30 | 12:c2:26:2b:46:77 | |
worker-02 | ens192 | 192.168.110.67 | 00:50:56:b2:46:58 |
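For reference, the OVS port numbers and MAC addresses in this table come from ovs-ofctl show against br-int inside the antrea-ovs container, and the pod addresses from kubectl; a quick sketch (antrea-agent-bj42h is the agent pod on worker-01 in this lab, substitute the one on your node):
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl show br-int
[root@master-01 ~]# kubectl get pods -o wide
The first command prints each port's OpenFlow port number, name and MAC (addr:), which is where the tun0(1)/gw0(2)/veth(15) numbers above come from.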
OVS Pipeline
The OVS pipeline is the series of tables a packet is evaluated against after it enters OVS.
In this lab we pick one flow and analyze it end to end:
Frontend to Service backendsvc
Topology
[root@master-01 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
backendsvc ClusterIP 10.101.216.214 <none> 80/TCP 4h45m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4d21h
As shown above, the ClusterIP of service backendsvc is 10.101.216.214.
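A quick way to confirm which pods back this service is to look at its Endpoints object; the output should list the two backend pod IPs from the address table (10.211.1.15 and 10.211.2.30):
[root@master-01 ~]# kubectl get endpoints backendsvc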
Enter pod frontend and start a packet capture:
[root@master-01 ~]# kubectl exec -it frontend -- sh
/ # tcpdump -en
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
In another terminal (on master-01), run the following command to access the backendsvc service:
[root@master-01 ~]# kubectl exec -it frontend -c frontend -- curl backendsvc
Praqma Network MultiTool (with NGINX) - backend2 - 10.211.2.30
[root@master-01 ~]# kubectl exec -it frontend -c frontend -- curl backendsvc
Praqma Network MultiTool (with NGINX) - backend1 - 10.211.1.15
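The two responses above already show the service spreading requests across backend1 and backend2; repeating the request in a small loop (a throwaway sketch) makes the distribution easier to see:
[root@master-01 ~]# for i in 1 2 3 4; do kubectl exec frontend -c frontend -- curl -s backendsvc; done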
Watching the capture inside the pod, for a request that the service sent to backend1 we get:
12:27:02.979220 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 96: 10.211.1.14.38861 > 10.96.0.10.53: 32140+ A? backendsvc.default.svc.cluster.local. (54)
12:27:02.980269 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 96: 10.211.1.14.38861 > 10.96.0.10.53: 32464+ AAAA? backendsvc.default.svc.cluster.local. (54)
12:27:02.981453 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 189: 10.96.0.10.53 > 10.211.1.14.38861: 32464*- 0/1/0 (147)
12:27:02.981470 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 148: 10.96.0.10.53 > 10.211.1.14.38861: 32140*- 1/0/0 A 10.101.216.214 (106)
12:27:02.981770 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 74: 10.211.1.14.48078 > 10.101.216.214.80: Flags [S], seq 2351183631, win 28200, options [mss 1410,sackOK,TS val 251618365 ecr 0,nop,wscale 7], length 0
12:27:02.982979 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 74: 10.101.216.214.80 > 10.211.1.14.48078: Flags [S.], seq 343862012, ack 2351183632, win 27960, options [mss 1410,sackOK,TS val 251618366 ecr 251618365,nop,wscale 7], length 0
12:27:02.983054 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 66: 10.211.1.14.48078 > 10.101.216.214.80: Flags [.], ack 1, win 221, options [nop,nop,TS val 251618367 ecr 251618366], length 0
12:27:02.983158 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 140: 10.211.1.14.48078 > 10.101.216.214.80: Flags [P.], seq 1:75, ack 1, win 221, options [nop,nop,TS val 251618367 ecr 251618366], length 74: HTTP: GET / HTTP/1.1
12:27:02.983201 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 66: 10.101.216.214.80 > 10.211.1.14.48078: Flags [.], ack 75, win 219, options [nop,nop,TS val 251618367 ecr 251618367], length 0
12:27:02.984074 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 302: 10.101.216.214.80 > 10.211.1.14.48078: Flags [P.], seq 1:237, ack 75, win 219, options [nop,nop,TS val 251618368 ecr 251618367], length 236: HTTP: HTTP/1.1 200 OK
12:27:02.984085 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 66: 10.211.1.14.48078 > 10.101.216.214.80: Flags [.], ack 237, win 229, options [nop,nop,TS val 251618368 ecr 251618368], length 0
12:27:02.984117 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 129: 10.101.216.214.80 > 10.211.1.14.48078: Flags [P.], seq 237:300, ack 75, win 219, options [nop,nop,TS val 251618368 ecr 251618368], length 63: HTTP
12:27:02.984122 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 66: 10.211.1.14.48078 > 10.101.216.214.80: Flags [.], ack 300, win 229, options [nop,nop,TS val 251618368 ecr 251618368], length 0
12:27:02.984374 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 66: 10.211.1.14.48078 > 10.101.216.214.80: Flags [F.], seq 75, ack 300, win 229, options [nop,nop,TS val 251618368 ecr 251618368], length 0
12:27:02.984488 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 66: 10.101.216.214.80 > 10.211.1.14.48078: Flags [F.], seq 300, ack 76, win 219, options [nop,nop,TS val 251618368 ecr 251618368], length 0
12:27:02.984500 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 66: 10.211.1.14.48078 > 10.101.216.214.80: Flags [.], ack 301, win 229, options [nop,nop,TS val 251618368 ecr 251618368], length 0
12:27:07.992440 86:5c:0a:d2:c8:c3 > 2a:42:47:c6:9b:15, ethertype ARP (0x0806), length 42: Request who-has 10.211.1.14 tell 10.211.1.15, length 28
12:27:07.992453 2a:42:47:c6:9b:15 > 86:5c:0a:d2:c8:c3, ethertype ARP (0x0806), length 42: Reply 10.211.1.14 is-at 2a:42:47:c6:9b:15, length 28
- 12:27:02.979220 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 96: 10.211.1.14.38861 > 10.96.0.10.53: 32140+ A? backendsvc.default.svc.cluster.local. (54)
12:27:02.980269 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 96: 10.211.1.14.38861 > 10.96.0.10.53: 32464+ AAAA? backendsvc.default.svc.cluster.local. (54)
Pod frontend asks CoreDNS for the address of backendsvc.default.svc.cluster.local.
From the Pod's point of view the DNS server is 10.96.0.10:
/ # nslookup backendsvc
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: backendsvc.default.svc.cluster.local
Address: 10.101.216.214
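That server address comes straight from the pod's resolv.conf, which kubelet points at the cluster DNS service; it can be confirmed from inside the pod:
/ # cat /etc/resolv.conf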
Inside OVS this traffic flows 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, i.e. veth(15) -> gw0(2).
- 12:27:02.981470 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, ethertype IPv4 (0x0800), length 148: 10.96.0.10.53 > 10.211.1.14.38861: 32140*- 1/0/0 A 10.101.216.214 (106)
CoreDNS answers pod frontend with the A record 10.101.216.214. Inside OVS this traffic flows 36:ab:6b:31:0c:75 > 2a:42:47:c6:9b:15, i.e. gw0(2) -> veth(15).
- 12:27:02.981770 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, ethertype IPv4 (0x0800), length 74: 10.211.1.14.48078 > 10.101.216.214.80: Flags [S], seq 2351183631, win 28200, options [mss 1410,sackOK,TS val 251618365 ecr 0,nop,wscale 7], length 0
Pod frontend then opens the TCP connection to service backendsvc. Inside OVS this traffic flows 2a:42:47:c6:9b:15 > 36:ab:6b:31:0c:75, i.e. veth(15) -> gw0(2).
Flow Table Analysis
Table#0
Table#0 on worker-01 can be dumped with:
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=0 --no-stats
cookie=0x1000000000000, priority=200,in_port="antrea-gw0" actions=load:0x1->NXM_NX_REG0[0..15],resubmit(,10)
cookie=0x1000000000000, priority=200,in_port="antrea-tun0" actions=load:0->NXM_NX_REG0[0..15],load:0x1->NXM_NX_REG0[19],resubmit(,30)
cookie=0x1030000000000, priority=190,in_port="coredns--c4f3d4" actions=load:0x2->NXM_NX_REG0[0..15],resubmit(,10)
cookie=0x1030000000000, priority=190,in_port="backend1-911dea" actions=load:0x2->NXM_NX_REG0[0..15],resubmit(,10)
cookie=0x1030000000000, priority=190,in_port="frontend-fbd015" actions=load:0x2->NXM_NX_REG0[0..15],resubmit(,10)
cookie=0x1000000000000, priority=0 actions=drop
This table classifies the incoming flow: 0 means the packet arrived from the tunnel, 1 from the local gateway, and 2 from a local pod. In our example the matching entry is:
cookie=0x1030000000000, priority=190,in_port="frontend-fbd015" actions=load:0x2->NXM_NX_REG0[0..15],resubmit(,10)
The first action in this entry sets register reg0[0..15] to 0x2, meaning the flow comes from a local pod. The second action hands the flow to the next table, Table#10 (resubmit(,10)); see the Pipeline in the previous section.
Table#10
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=10 --no-stats
cookie=0x1000000000000, table=10, priority=200,ipv6,ipv6_src=fe80::/10 actions=resubmit(,21)
cookie=0x1000000000000, table=10, priority=200,ip,in_port="antrea-gw0" actions=resubmit(,29)
cookie=0x1000000000000, table=10, priority=200,arp,in_port="antrea-gw0",arp_spa=10.211.1.1,arp_sha=36:ab:6b:31:0c:75 actions=resubmit(,20)
cookie=0x1030000000000, table=10, priority=200,arp,in_port="coredns--c4f3d4",arp_spa=10.211.1.4,arp_sha=92:d8:7a:ad:79:c9 actions=resubmit(,20)
cookie=0x1030000000000, table=10, priority=200,arp,in_port="backend1-911dea",arp_spa=10.211.1.15,arp_sha=86:5c:0a:d2:c8:c3 actions=resubmit(,20)
cookie=0x1030000000000, table=10, priority=200,arp,in_port="frontend-fbd015",arp_spa=10.211.1.14,arp_sha=2a:42:47:c6:9b:15 actions=resubmit(,20)
cookie=0x1030000000000, table=10, priority=200,ip,in_port="coredns--c4f3d4",dl_src=92:d8:7a:ad:79:c9,nw_src=10.211.1.4 actions=resubmit(,29)
cookie=0x1030000000000, table=10, priority=200,ip,in_port="backend1-911dea",dl_src=86:5c:0a:d2:c8:c3,nw_src=10.211.1.15 actions=resubmit(,29)
cookie=0x1030000000000, table=10, priority=200,ip,in_port="frontend-fbd015",dl_src=2a:42:47:c6:9b:15,nw_src=10.211.1.14 actions=resubmit(,29)
cookie=0x1000000000000, table=10, priority=0 actions=drop
The two entries relevant to frontend:
- cookie=0x1030000000000, table=10, priority=200,arp,in_port="frontend-fbd015",arp_spa=10.211.1.14,arp_sha=2a:42:47:c6:9b:15 actions=resubmit(,20)
  ARP packets whose source IP is 10.211.1.14 (and whose source MAC matches the pod) go to table#20.
- cookie=0x1030000000000, table=10, priority=200,ip,in_port="frontend-fbd015",dl_src=2a:42:47:c6:9b:15,nw_src=10.211.1.14 actions=resubmit(,29)
  IP packets whose source IP is 10.211.1.14 go to table#29.
In effect this table is a spoof guard: only traffic carrying the pod's own MAC and IP addresses is forwarded.
Table#30
table#29 simply hands over to table#30, so we look at table#30 directly:
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=30 --no-stats
cookie=0x1000000000000, table=30, priority=200,ip actions=ct(table=31,zone=65520,nat)
cookie=0x1000000000000, table=30, priority=200,ipv6 actions=ct(table=31,zone=65510,nat)
The job of the Conntrack table is to start tracking all traffic (the ct action stands for connection tracking). Any flow passing through this table ends up in the tracked (trk) state. In general network security terms, this is what makes OVS a stateful, connection-aware component.
The flow is then handed to the next table, table#31, as indicated by actions=ct(table=31,...).
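The conntrack entries created in zone 65520 can be inspected from the antrea-ovs container; a sketch, filtering on the service IP used in this lab (the exact output format depends on the OVS version):
[root@master-01 ~]# kubectl exec -n kube-system antrea-agent-bj42h -c antrea-ovs -- ovs-appctl dpctl/dump-conntrack | grep 10.101.216.214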
Table#31
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=31 --no-stats
cookie=0x1040000000000, table=31, priority=200,ct_state=-new+trk,ct_mark=0x21,ip actions=load:0x1->NXM_NX_REG0[19],resubmit(,50)
cookie=0x1000000000000, table=31, priority=190,ct_state=+inv+trk,ip actions=drop
cookie=0x1040000000000, table=31, priority=190,ct_state=-new+trk,ip actions=resubmit(,50)
cookie=0x1000000000000, table=31, priority=0 actions=resubmit(,40),resubmit(,41)
Table#31 is the ConntrackState table; it handles all flows that are already in a tracked state (essentially everything handed over by the Conntrack table, table#30). The first and third entries above match flows that are tracked but not new (ct_state=-new means not new, +trk means tracked). The second entry matches flows whose conntrack state is INVALID and TRACKED and simply drops them.
Our test flow, from pod frontend to the backendsvc service, is brand new, so it matches the last entry in the table. That entry's actions hand the flow to the next tables, resubmit(,40),resubmit(,41), i.e. table#40/41.
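One way to confirm which entry the test flow actually hits is to dump table 31 without --no-stats and watch the n_packets counters increase while repeating the curl from frontend:
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=31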
Table#40/41
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=40 --no-stats
cookie=0x1040000000000, table=40, priority=0 actions=load:0x1->NXM_NX_REG4[16..18]
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl dump-flows br-int table=41 --no-stats
cookie=0x1040000000000, table=41, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.96.0.1,tp_dst=443 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:2
cookie=0x1040000000000, table=41, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.96.119.154,tp_dst=443 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:3
cookie=0x1040000000000, table=41, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.96.0.10,tp_dst=53 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:5
cookie=0x1040000000000, table=41, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.96.0.10,tp_dst=9153 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:6
cookie=0x1040000000000, table=41, priority=200,udp,reg4=0x10000/0x70000,nw_dst=10.96.0.10,tp_dst=53 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:4
cookie=0x1040000000000, table=41, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.101.216.214,tp_dst=80 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:1
cookie=0x1000000000000, table=41, priority=0 actions=resubmit(,42)
The entry that matches our flow to 10.101.216.214:80 is:
cookie=0x1040000000000, table=41, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.101.216.214,tp_dst=80 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[19],group:1
From here the service-bound traffic is handed over towards gw0, which corresponds to the following part of the Pipeline.
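As a side check, the buckets behind group:1 can be listed as well (what they contain depends on the Antrea version and whether AntreaProxy is enabled, so treat this as a sketch):
[root@master-01 ~]# kubectl exec -n kube-system -it antrea-agent-bj42h -c antrea-ovs -- ovs-ofctl -O OpenFlow13 dump-groups br-int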
IPtables
The kernel IP stack on node worker-01 receives the flow from the antrea-gw0 interface, where it is processed by the iptables NAT rules managed by kube-proxy. The iptables NAT rules on worker-01 are listed below.
- "-t nat" selects the NAT table
- "-L" lists all the rules in the given table
- "-n" displays IP addresses and port numbers (rather than hostnames and service names)
[root@worker-01 ~]# iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
KUBE-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
RETURN all -- 192.168.122.0/24 224.0.0.0/24
RETURN all -- 192.168.122.0/24 255.255.255.255
MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24
ANTREA-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* Antrea: jump to Antrea postrouting rules */
Chain ANTREA-POSTROUTING (1 references)
target prot opt source destination
MASQUERADE all -- 10.211.1.0/24 0.0.0.0/0 /* Antrea: masquerade pod to external packets */ ! match-set ANTREA-POD-IP dst
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-MARK-DROP (0 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x8000
Chain KUBE-MARK-MASQ (16 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0 mark match ! 0x4000/0x4000
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xor 0x4000
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */
Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination
Chain KUBE-SEP-73HF7GS7WMEQK5M6 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.1.4 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ tcp to:10.211.1.4:53
Chain KUBE-SEP-AB7YO7A3CB2CZWK5 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.1.4 0.0.0.0/0 /* kube-system/kube-dns:dns */
DNAT udp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ udp to:10.211.1.4:53
Chain KUBE-SEP-BQS327LSMHNPOPC6 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.2.30 0.0.0.0/0 /* default/backendsvc */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/backendsvc */ tcp to:10.211.2.30:80
Chain KUBE-SEP-DFFSKWBTAX72RCQG (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 192.168.110.66 0.0.0.0/0 /* kube-system/antrea */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/antrea */ tcp to:192.168.110.66:10349
Chain KUBE-SEP-EQY7OT4FTLCGYYAP (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.2.21 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ tcp to:10.211.2.21:53
Chain KUBE-SEP-GH55AAVWO6AS4CDN (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.2.21 0.0.0.0/0 /* kube-system/kube-dns:metrics */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */ tcp to:10.211.2.21:9153
Chain KUBE-SEP-GNMIPSQVBZ6K56T3 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.1.15 0.0.0.0/0 /* default/backendsvc */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/backendsvc */ tcp to:10.211.1.15:80
Chain KUBE-SEP-JBDJ3O64IUA4B6TG (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.2.21 0.0.0.0/0 /* kube-system/kube-dns:dns */
DNAT udp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ udp to:10.211.2.21:53
Chain KUBE-SEP-N3D6XPNX4VV2UWVN (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 192.168.110.61 0.0.0.0/0 /* default/kubernetes:https */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */ tcp to:192.168.110.61:6443
Chain KUBE-SEP-VCCKD2O5C6GIOK2X (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.1.4 0.0.0.0/0 /* kube-system/kube-dns:metrics */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */ tcp to:10.211.1.4:9153
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-MARK-MASQ tcp -- !10.211.0.0/16 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-MARK-MASQ tcp -- !10.211.0.0/16 10.96.119.154 /* kube-system/antrea cluster IP */ tcp dpt:443
KUBE-SVC-QDWG4LJGNBTOT5ED tcp -- 0.0.0.0/0 10.96.119.154 /* kube-system/antrea cluster IP */ tcp dpt:443
KUBE-MARK-MASQ udp -- !10.211.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-MARK-MASQ tcp -- !10.211.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-MARK-MASQ tcp -- !10.211.0.0/16 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-SVC-JD5MR3NA4I4DYORP tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-MARK-MASQ tcp -- !10.211.0.0/16 10.101.216.214 /* default/backendsvc cluster IP */ tcp dpt:80
KUBE-SVC-SQRBXEDPJTJWMLH3 tcp -- 0.0.0.0/0 10.101.216.214 /* default/backendsvc cluster IP */ tcp dpt:80
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target prot opt source destination
KUBE-SEP-73HF7GS7WMEQK5M6 all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ statistic mode random probability 0.50000000000
KUBE-SEP-EQY7OT4FTLCGYYAP all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
Chain KUBE-SVC-JD5MR3NA4I4DYORP (1 references)
target prot opt source destination
KUBE-SEP-VCCKD2O5C6GIOK2X all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */ statistic mode random probability 0.50000000000
KUBE-SEP-GH55AAVWO6AS4CDN all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:metrics */
Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
target prot opt source destination
KUBE-SEP-N3D6XPNX4VV2UWVN all -- 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */
Chain KUBE-SVC-QDWG4LJGNBTOT5ED (1 references)
target prot opt source destination
KUBE-SEP-DFFSKWBTAX72RCQG all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/antrea */
Chain KUBE-SVC-SQRBXEDPJTJWMLH3 (1 references)
target prot opt source destination
KUBE-SEP-GNMIPSQVBZ6K56T3 all -- 0.0.0.0/0 0.0.0.0/0 /* default/backendsvc */ statistic mode random probability 0.50000000000
KUBE-SEP-BQS327LSMHNPOPC6 all -- 0.0.0.0/0 0.0.0.0/0 /* default/backendsvc */
Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
target prot opt source destination
KUBE-SEP-AB7YO7A3CB2CZWK5 all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ statistic mode random probability 0.50000000000
KUBE-SEP-JBDJ3O64IUA4B6TG all -- 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */
The relevant entries, extracted:
Chain KUBE-SEP-BQS327LSMHNPOPC6 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.2.30 0.0.0.0/0 /* default/backendsvc */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/backendsvc */ tcp to:10.211.2.30:80
Chain KUBE-SEP-GNMIPSQVBZ6K56T3 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.211.1.15 0.0.0.0/0 /* default/backendsvc */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/backendsvc */ tcp to:10.211.1.15:80
KUBE-MARK-MASQ tcp -- !10.211.0.0/16 10.101.216.214 /* default/backendsvc cluster IP */ tcp dpt:80
KUBE-SVC-SQRBXEDPJTJWMLH3 tcp -- 0.0.0.0/0 10.101.216.214 /* default/backendsvc cluster IP */ tcp dpt:80
DNAT is applied to flows destined for the backendsvc service IP, and either backend1 (10.211.1.15) or backend2 (10.211.2.30) is chosen as the destination.
This is the classic kube-proxy-managed, iptables-based ClusterIP service behavior, which provides distributed load balancing for flows inside the Kubernetes cluster.
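Instead of reading the full dump, the backendsvc-related NAT rules can also be pulled out directly on the node (the chain name KUBE-SVC-SQRBXEDPJTJWMLH3 is the one from the dump above; it will differ in another cluster):
[root@worker-01 ~]# iptables -t nat -S | grep backendsvc
[root@worker-01 ~]# iptables -t nat -L KUBE-SVC-SQRBXEDPJTJWMLH3 -n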
Checking from the Antrea agent pod:
# antctl trace-packet -S default/frontend -D default/backendsvc -f "tcp,tcp_dst=80"
Flow: tcp,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=36:ab:6b:31:0c:75,nw_src=10.211.1.14,nw_dst=10.101.216.214,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
bridge("br-int")
----------------
0. in_port=15, priority 190, cookie 0x1030000000000
load:0x2->NXM_NX_REG0[0..15]
goto_table:10
10. ip,in_port=15,dl_src=2a:42:47:c6:9b:15,nw_src=10.211.1.14, priority 200, cookie 0x1030000000000
goto_table:29
29. priority 0, cookie 0x1000000000000
goto_table:30
30. ip, priority 200, cookie 0x1000000000000
ct(table=31,zone=65520,nat)
nat
-> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 31.
-> Sets the packet to an untracked state, and clears all the conntrack fields.
Final flow: tcp,reg0=0x2,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=36:ab:6b:31:0c:75,nw_src=10.211.1.14,nw_dst=10.101.216.214,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
Megaflow: recirc_id=0,eth,ip,in_port=15,dl_src=2a:42:47:c6:9b:15,nw_src=10.211.1.14,nw_dst=0.0.0.0/1,nw_frag=no
Datapath actions: ct(zone=65520,nat),recirc(0x2f1e)
===============================================================================
recirc(0x2f1e) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
Initial flow:
Modified flow:
===============================================================================
Flow: recirc_id=0x2f1e,ct_state=new|trk,ct_zone=65520,eth,tcp,reg0=0x2,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=36:ab:6b:31:0c:75,nw_src=10.211.1.14,nw_dst=10.101.216.214,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
bridge("br-int")
----------------
thaw
Resuming from table 31
31. priority 0, cookie 0x1000000000000
resubmit(,40)
40. priority 0, cookie 0x1040000000000
load:0x1->NXM_NX_REG4[16..18]
resubmit(,41)
41. tcp,reg4=0x10000/0x70000,nw_dst=10.101.216.214,tp_dst=80, priority 200, cookie 0x1040000000000
load:0x2->NXM_NX_REG4[16..18]
load:0x1->NXM_NX_REG0[19]
group:1
-> no live bucket
Final flow: recirc_id=0x2f1e,ct_state=new|trk,ct_zone=65520,eth,tcp,reg0=0x80002,reg4=0x20000,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=36:ab:6b:31:0c:75,nw_src=10.211.1.14,nw_dst=10.101.216.214,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
Megaflow: recirc_id=0x2f1e,ct_state=+new-inv+trk,ct_mark=0,eth,tcp,in_port=15,nw_dst=10.101.216.214,nw_frag=no,tp_dst=80
Datapath actions: hash(l4(0)),recirc(0x2f1f)
This matches the analysis above.
Now the full path from frontend to backend1:
# antctl trace-packet -S default/frontend -D default/backend1 -f "tcp,tcp_src=80"
Flow: tcp,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=86:5c:0a:d2:c8:c3,nw_src=10.211.1.14,nw_dst=10.211.1.15,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=0,tcp_flags=0
bridge("br-int")
----------------
0. in_port=15, priority 190, cookie 0x1030000000000
load:0x2->NXM_NX_REG0[0..15]
goto_table:10
10. ip,in_port=15,dl_src=2a:42:47:c6:9b:15,nw_src=10.211.1.14, priority 200, cookie 0x1030000000000
goto_table:29
29. priority 0, cookie 0x1000000000000
goto_table:30
30. ip, priority 200, cookie 0x1000000000000
ct(table=31,zone=65520,nat)
nat
-> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 31.
-> Sets the packet to an untracked state, and clears all the conntrack fields.
Final flow: tcp,reg0=0x2,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=86:5c:0a:d2:c8:c3,nw_src=10.211.1.14,nw_dst=10.211.1.15,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=0,tcp_flags=0
Megaflow: recirc_id=0,eth,ip,in_port=15,dl_src=2a:42:47:c6:9b:15,nw_src=10.211.1.14,nw_dst=0.0.0.0/1,nw_frag=no
Datapath actions: ct(zone=65520,nat),recirc(0x2f3e)
===============================================================================
recirc(0x2f3e) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
Initial flow:
Modified flow:
===============================================================================
Flow: recirc_id=0x2f3e,ct_state=new|trk,ct_zone=65520,eth,tcp,reg0=0x2,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=86:5c:0a:d2:c8:c3,nw_src=10.211.1.14,nw_dst=10.211.1.15,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=0,tcp_flags=0
bridge("br-int")
----------------
thaw
Resuming from table 31
31. priority 0, cookie 0x1000000000000
resubmit(,40)
40. priority 0, cookie 0x1040000000000
load:0x1->NXM_NX_REG4[16..18]
resubmit(,41)
41. priority 0, cookie 0x1000000000000
goto_table:42
42. priority 0, cookie 0x1000000000000
goto_table:50
50. priority 0, cookie 0x1000000000000
goto_table:60
60. priority 0, cookie 0x1000000000000
goto_table:61
61. priority 0, cookie 0x1000000000000
goto_table:70
70. priority 0, cookie 0x1000000000000
goto_table:80
80. dl_dst=86:5c:0a:d2:c8:c3, priority 200, cookie 0x1030000000000
load:0x10->NXM_NX_REG1[]
load:0x1->NXM_NX_REG0[16]
goto_table:90
90. priority 0, cookie 0x1000000000000
goto_table:100
100. priority 0, cookie 0x1000000000000
goto_table:101
101. priority 0, cookie 0x1000000000000
goto_table:105
105. ct_state=+new+trk,ip, priority 190, cookie 0x1000000000000
ct(commit,table=106,zone=65520)
drop
-> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 106.
-> Sets the packet to an untracked state, and clears all the conntrack fields.
Final flow: recirc_id=0x2f3e,eth,tcp,reg0=0x10002,reg1=0x10,reg4=0x10000,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=86:5c:0a:d2:c8:c3,nw_src=10.211.1.14,nw_dst=10.211.1.15,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=0,tcp_flags=0
Megaflow: pkt_mark=0/0x1,recirc_id=0x2f3e,ct_state=+new-est-rpl-inv+trk,ct_mark=0,eth,ip,in_port=15,dl_dst=86:5c:0a:d2:c8:c3,nw_dst=10.211.1.0/24,nw_frag=no
Datapath actions: ct(commit,zone=65520),recirc(0x2f3f)
===============================================================================
recirc(0x2f3f) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
===============================================================================
Flow: recirc_id=0x2f3f,ct_state=new|trk,ct_zone=65520,eth,tcp,reg0=0x10002,reg1=0x10,reg4=0x10000,in_port=15,vlan_tci=0x0000,dl_src=2a:42:47:c6:9b:15,dl_dst=86:5c:0a:d2:c8:c3,nw_src=10.211.1.14,nw_dst=10.211.1.15,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=0,tcp_flags=0
bridge("br-int")
----------------
thaw
Resuming from table 106
106. priority 0, cookie 0x1000000000000
goto_table:110
110. ip,reg0=0x10000/0x10000, priority 200, cookie 0x1000000000000
output:NXM_NX_REG1[]
-> output port is 16
Final flow: unchanged
Megaflow: recirc_id=0x2f3f,eth,ip,in_port=15,nw_src=10.211.1.14,nw_frag=no
Datapath actions: 4
That's all.