Kubernetes inter-pod networking

no network address translation (NAT)

Each pod gets its own unique IP address and can communicate with all other pods through a flat, NAT-less network.
When pod A connects to (sends a network packet to) pod B, the source IP pod B sees must be the same IP that pod A sees as its own. No network address translation (NAT) should be performed in between: the packet sent by pod A must reach pod B with both the source and destination addresses unchanged.

Figure: Kubernetes mandates pods are connected through a NAT-less network

The requirement for NAT-less communication between pods also extends to pod-to-node and node-to-pod communication. But when a pod communicates with services out on the internet, the source IP of the packets the pod sends does need to be changed, because the pod’s IP is private. The source IP of outbound packets is therefore changed to the host worker node’s IP address.
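
The rule above (no NAT inside the cluster, source rewriting only for outbound traffic) can be sketched as a small decision function. This is only an illustration: the address ranges, the node IP, and the function name are assumptions made up for the example, not values from any real cluster.

```python
import ipaddress

# Hypothetical values, chosen only for illustration.
CLUSTER_NETWORKS = [
    ipaddress.ip_network("10.1.0.0/16"),      # assumed pod address space
    ipaddress.ip_network("192.168.64.0/24"),  # assumed node network
]
NODE_IP = "192.168.64.10"  # assumed IP of the worker node hosting the pod


def source_seen_by_destination(pod_ip: str, dest_ip: str) -> str:
    """Return the source address the destination observes."""
    dest = ipaddress.ip_address(dest_ip)
    if any(dest in net for net in CLUSTER_NETWORKS):
        # Pod-to-pod and pod-to-node traffic: no NAT, source is unchanged.
        return pod_ip
    # Outbound traffic: the private pod IP is replaced by the node's IP.
    return NODE_IP


print(source_seen_by_destination("10.1.1.5", "10.1.2.7"))
print(source_seen_by_destination("10.1.1.5", "93.184.216.34"))
```

The first call models pod-to-pod traffic and keeps the pod's own address; the second models a destination outside the cluster and returns the node's address instead.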

A pod’s network interface is thus whatever is set up in the infrastructure container.

ENABLING COMMUNICATION BETWEEN PODS ON THE SAME NODE

Before the infrastructure container is started, a virtual ethernet interface pair (a veth pair) is created for the container. One interface of the pair remains in the host’s namespace (you’ll see it listed as vethXXX when you run ifconfig on the node), whereas the other is moved into the container’s network namespace and renamed to eth0. The two virtual interfaces are like two ends of a pipe (or like two network devices connected by an ethernet cable): what goes in on one side comes out on the other, and vice versa.

# ip link
...
16: vethweplefb56da@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether 82:f9:87:8a:29:73 brd ff:ff:ff:ff:ff:ff link-netnsid 0
68: vethwepl24ec8fc@if67: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether 66:b1:cd:5a:7d:e3 brd ff:ff:ff:ff:ff:ff link-netnsid 1
76: vethwepl9bc7f84@if75: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether 42:ed:26:7c:98:2f brd ff:ff:ff:ff:ff:ff link-netnsid 2
	
	
(After running kubectl scale rc kubia --replicas=3, two more interfaces appeared:)
78: vethwepl2c5420e@if77: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether ea:bd:17:0b:7e:f9 brd ff:ff:ff:ff:ff:ff link-netnsid 3
80: vethwepl9220b8d@if79: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether ae:50:fc:66:0e:4e brd ff:ff:ff:ff:ff:ff link-netnsid 4

Figure: Pods on a node are connected to the same bridge through virtual ethernet interface pairs

The interface in the host’s network namespace is attached to a network bridge that the container runtime is configured to use. The eth0 interface in the container is assigned an IP address from the bridge’s address range. Anything that an application running inside the container sends to the eth0 network interface (the one in the container’s namespace), comes out at the other veth interface in the host’s namespace and is sent to the bridge. This means it can be received by any network interface that is connected to the bridge.
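
To make the address assignment concrete, the following sketch hands out addresses from a bridge's range using Python's standard ipaddress module. The range 10.1.1.0/24 and the convention that the bridge itself takes the first usable address are assumptions for the example; real container runtimes and network plugins manage this allocation themselves.

```python
import ipaddress

# Assumed address range of the bridge on one node.
bridge_range = ipaddress.ip_network("10.1.1.0/24")

# hosts() yields the usable addresses in order, skipping the network
# and broadcast addresses.
usable = bridge_range.hosts()

bridge_ip = next(usable)                       # assumed: bridge takes the first address
pod_ips = [next(usable) for _ in range(3)]     # eth0 addresses for three pods

print("bridge:", bridge_ip)
for ip in pod_ips:
    print("pod eth0:", ip)
```

Every address handed out this way falls inside the bridge's range, which is what lets the bridge deliver traffic between the containers attached to it.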

If pod A sends a network packet to pod B, the packet

  • first goes through pod A’s veth pair,
  • then to the bridge, and
  • finally through pod B’s veth pair.

All containers on a node are connected to the same bridge, so they can all communicate with each other like this.

But to enable communication between containers running on different nodes, the bridges on those nodes need to be connected somehow.

ENABLING COMMUNICATION BETWEEN PODS ON DIFFERENT NODES

We know pod IP addresses must be unique across the whole cluster, so the bridges across the nodes must use non-overlapping address ranges to prevent pods on different nodes from getting the same IP. In the example shown in figure 11.16, the bridge on node A is using the 10.1.1.0/24 IP range and the bridge on node B is using 10.1.2.0/24, which ensures there is never an IP address conflict.
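
One way to see why these ranges can never conflict is to carve them out of a single larger range and check them for overlap. The sketch below assumes a cluster-wide range of 10.1.0.0/16 split into /24 blocks; that parent range and the node names are illustrative assumptions, and only the two /24 ranges come from the example above.

```python
import ipaddress
from itertools import combinations

# Assumed cluster-wide pod address space.
cluster_range = ipaddress.ip_network("10.1.0.0/16")

# Split it into /24 blocks: 10.1.0.0/24, 10.1.1.0/24, 10.1.2.0/24, ...
blocks = cluster_range.subnets(new_prefix=24)
next(blocks)  # skip 10.1.0.0/24 so the numbering matches the example

bridge_ranges = {
    "node A": next(blocks),  # 10.1.1.0/24
    "node B": next(blocks),  # 10.1.2.0/24
}

# No two bridges may share any address.
for (name_a, net_a), (name_b, net_b) in combinations(bridge_ranges.items(), 2):
    assert not net_a.overlaps(net_b), f"{name_a} and {name_b} overlap"

for name, net in bridge_ranges.items():
    print(name, net)
```

Because each block is a distinct slice of the parent range, any address a bridge assigns is unique across the cluster without the nodes having to coordinate individual pod IPs.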

Figure: For pods on different nodes to communicate, the bridges need to be connected somehow

The figure shows that to enable communication between pods across two nodes with plain layer 3 networking, the node’s physical network interface needs to be connected to the bridge as well. Routing tables on node A need to be configured so all packets destined for 10.1.2.0/24 are routed to node B, whereas node B’s routing tables need to be configured so packets sent to 10.1.1.0/24 are routed to node A.
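
The routing decision on node A can be sketched as a longest-prefix-match lookup. The table below is a hypothetical model of node A's routes built from the two ranges in the text; the default route and the next-hop labels are added assumptions, and a real node would hold these entries in the kernel routing table rather than in application code.

```python
import ipaddress

# Hypothetical routing table for node A: (destination network, next hop).
ROUTES_NODE_A = [
    (ipaddress.ip_network("10.1.1.0/24"), "local bridge"),
    (ipaddress.ip_network("10.1.2.0/24"), "node B"),
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway"),  # assumed
]


def next_hop(dest_ip: str, routes) -> str:
    """Pick the matching route with the longest prefix."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in routes if dest in net]
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(next_hop("10.1.1.3", ROUTES_NODE_A))
print(next_hop("10.1.2.7", ROUTES_NODE_A))
```

Node B would hold the mirror image of this table, with 10.1.2.0/24 local and 10.1.1.0/24 pointing at node A.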

With this type of setup, when a packet is sent by a container on one of the nodes to a container on the other node, the packet first goes through the veth pair, then through the bridge to the node’s physical adapter, then over the wire to the other node’s physical adapter, through the other node’s bridge and finally through the veth pair of the destination container.

This works only when nodes are connected to the same network switch, without any routers in between; otherwise those routers would drop the packets because they refer to pod IPs, which are private. Sure, the routers in between could be configured to route packets between the nodes, but this becomes increasingly difficult and error-prone as the number of routers between the nodes increases. Because of this, it’s easier to use a Software Defined Network (SDN), which makes the nodes appear as though they are connected to the same network switch, regardless of the actual underlying network topology, no matter how complex it is. Packets sent from the pod are encapsulated, sent over the network to the node running the other pod, where they are de-encapsulated and delivered to the pod in their original form.
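
The encapsulation step can be pictured as wrapping the pod's packet inside another packet addressed between the nodes. The sketch below is a toy model only: the Packet type, the function names, and all addresses are assumptions for illustration, and it does not reproduce the wire format of any particular SDN or overlay protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object


def encapsulate(inner: Packet, src_node: str, dst_node: str) -> Packet:
    """Wrap the pod's packet in an outer packet addressed node to node."""
    return Packet(src=src_node, dst=dst_node, payload=inner)


def decapsulate(outer: Packet) -> Packet:
    """Unwrap on the receiving node, recovering the original packet."""
    return outer.payload


# Pod on node A sends to a pod on node B (all addresses assumed).
original = Packet(src="10.1.1.5", dst="10.1.2.7", payload=b"hello")
on_the_wire = encapsulate(original, "192.168.64.10", "192.168.64.11")

# Routers in between only see node addresses, never the private pod IPs.
print(on_the_wire.src, "->", on_the_wire.dst)

delivered = decapsulate(on_the_wire)
print(delivered == original)
```

The point of the model is that the inner packet is untouched in transit, so the destination pod still sees the sending pod's real IP as the source.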

kube-proxy

Everything related to services is handled by the kube-proxy process running on each node. Initially, the kube-proxy was an actual proxy waiting for connections and, for each incoming connection, opening a new connection to one of the pods.

Each service gets its own stable IP address and port. Clients (usually pods) use the service by connecting to this IP address and port. The IP address is completely virtual – it’s not assigned to any network interfaces and is never listed as either the source or the destination IP address in a network packet when the packet leaves the node. A key detail of services is that they consist of an IP and port pair (or multiple IP and port pairs in the case of multi-port services), so the service IP by itself doesn’t represent anything. That’s why you can’t ping them.
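
The reason the IP alone means nothing can be shown with a lookup table keyed by the IP and port pair. This is a conceptual sketch, not kube-proxy's implementation: the table, the addresses, and the pick_endpoint function are hypothetical, and choosing a pod at random is an assumption made for the example.

```python
import random

# Hypothetical service table: (service IP, port) -> backing pod endpoints.
SERVICES = {
    ("10.96.0.10", 80): [("10.1.1.5", 8080), ("10.1.2.7", 8080)],
}


def pick_endpoint(service_ip: str, port: int, rng=random):
    """Return one backing pod for the pair, or None if nothing matches."""
    endpoints = SERVICES.get((service_ip, port))
    if not endpoints:
        return None
    return rng.choice(endpoints)


# Matching IP and port: the connection is directed to one of the pods.
print(pick_endpoint("10.96.0.10", 80))

# Same IP but a port the service does not define: nothing matches.
print(pick_endpoint("10.96.0.10", 443))
```

A ping carries no port at all, so in this model it could never match an entry, which mirrors why pinging a service IP gets no reply.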
