1. Requirements
As our business services migrate to the Kubernetes cluster, pods must be directly reachable from our IDC network, so that workloads can register with the ZooKeeper ensemble running outside the cluster.
2. Overall network design
Each rack peers with its own ToR switch (each ToR has its own peer asNumber and peer IP), and the ToR switches exchange routes with one another through the network fabric.
3. Calico BGP setup: prerequisites and workflow
1) Start from a freshly initialized Kubernetes cluster.
2) Install the tigera-operator manifest, per the Calico documentation:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/tigera-operator.yaml
3) Download the custom resources manifest:
curl https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/custom-resources.yaml -O
4) Edit the custom resources file custom-resources.yaml:
[root@l-shahe-k8s-master1 calico]$ cat custom-resources.yaml
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  registry: harbor-sh.xc.com/
  imagePath: calico
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 27  # size of the pre-allocated IP block each node claims
      cidr: 172.21.0.0/16
      encapsulation: IPIPCrossSubnet
      natOutgoing: Disabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
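With the edits in place, the manifest is applied the same way as in the standard operator install flow (a sketch; tigerastatus is the operator's readiness report):

```shell
# Create the edited Installation/APIServer resources; the operator then deploys Calico.
kubectl create -f custom-resources.yaml

# Watch until all components report Available.
kubectl get tigerastatus -w
```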
5) Delete the default IPPool.
6) Create custom IPPools (we need to separate internal and external networks). Also note that the IPPools created initially should ideally cover only half of the cluster's overall CIDR, keeping the other half in reserve for later expansion.
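A sketch of the deletion, assuming the operator created the pool under its usual name default-ipv4-ippool (verify with the get first):

```shell
# Confirm the name of the operator-created default pool.
calicoctl get ippool -o wide

# Delete it so newly scheduled pods stop drawing from the full /16.
calicoctl delete ippool default-ipv4-ippool
```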
[root@l-shahe-k8s-master1 calico]$ cat ippool.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: outbound-ippool
spec:
  blockSize: 27
  cidr: 172.21.0.0/18
  ipipMode: "Never"
  natOutgoing: false
  disabled: false
  nodeSelector: networker == "outbound"
  allowedUses:
  - Workload
  - Tunnel
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: internal-ippool
spec:
  blockSize: 27
  cidr: 172.21.64.0/18
  ipipMode: "Never"
  natOutgoing: false
  disabled: false
  nodeSelector: networker == "internal"
  allowedUses:
  - Workload
  - Tunnel
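As a quick arithmetic check on the sizing above: each /18 pool yields 512 node blocks of /27 (32 addresses each), and the two pools together cover exactly half of the /16, leaving 172.21.128.0/17 in reserve:

```shell
# Node blocks per /18 pool: 2^(27-18).
echo $(( 1 << (27 - 18) ))        # 512 blocks per pool
# Addresses per /27 block: 2^(32-27).
echo $(( 1 << (32 - 27) ))        # 32 addresses per block
# Two /18 pools out of the /16: 2 * 2^(32-18) of 2^(32-16) addresses.
echo $(( 2 * (1 << (32 - 18)) ))  # 32768 of 65536, i.e. exactly half
```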
7) Label the master node with the label matching one of the IPPools, so that it allocates IP blocks from the custom pool.
8) Restart the pods on the master that were allocated from the default pool, so they release their old IPs and get new ones from the custom pool.
9) Verify that pod networking on the node uses plain BGP routing (no tunnel interface):
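A sketch of steps 7 and 8 (the label key and values follow the nodeSelectors in the pools above; the pod name is a placeholder):

```shell
# Step 7: bind the master to internal-ippool
# (use networker=outbound instead for nodes serving the external pool).
kubectl label node l-shahe-k8s-master1 networker=internal

# Step 8: delete any pod still holding an address from the removed default
# pool; its controller recreates it with an IP from the custom pool.
kubectl delete pod <pod-name>
```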
[root@l-shahe-k8s-master1 calico]$ ifconfig
cali0b46ffa7faa: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

cali421ddd347c1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
The ifconfig output shows that the pod interfaces have an MTU of 1500, with no tunnel encapsulation overhead deducted, which matches our expectation;
10) Now join a new node to the cluster, deliberately one under a different ToR, because we next verify routing between ToRs. E.g., if the master's (192.168.1.1) ToR has asNumber 64531 and peer IP 192.168.1.254, then the new node 192.168.2.1 sits under a ToR with asNumber 64532 and peer IP 192.168.2.254. Be sure to give the new node the IPPool label as well, so it allocates its IP block from the custom pool.
11) Start test pods and let them spread across the nodes:
[root@l-shahe-k8s-master1 calico]$ kubectl get pod -owide |grep Running
nginx4-f8c76d5f8-4qrgs 1/1 Running 0 15h 172.21.26.109 l-shahe-k8s-master1 <none> <none>
nginx4-f8c76d5f8-cfdnw 1/1 Running 0 15h 172.21.26.110 l-shahe-k8s-master1 <none> <none>
nginx4-f8c76d5f8-rcntv 1/1 Running 0 15h 172.21.25.108 192.168.2.1 <none> <none>
12) Verify by pinging pod IPs: on the master, pods running on the master itself are reachable, and likewise on 192.168.2.1; but pinging a pod on node 192.168.2.1 from master1 fails, as expected, since no routes have been exchanged yet.
13) Establish the BGP peers:
[root@l-shahe-k8s-master1 calico]$ cat bgp-peer.yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: 10-120-128
spec:
  peerIP: '192.168.1.254'
  keepOriginalNextHop: true
  asNumber: 64531
  nodeSelector: rack == '192.168.1'
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: 10-120-129
spec:
  peerIP: '192.168.2.254'
  keepOriginalNextHop: true
  asNumber: 64532
  nodeSelector: rack == '192.168.2'
14) Apply the matching rack labels to master1 and node 192.168.2.1. Peering still cannot be established at this point, because by default every node tries to peer with its ToR using the global asNumber.
15) Create the BGPConfiguration:
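A sketch of applying the peers and the rack labels that their nodeSelectors match (label values are taken from the selectors above):

```shell
# Create the two per-rack BGPPeer resources.
calicoctl apply -f bgp-peer.yaml

# Label each node with its rack so the matching BGPPeer selects it.
kubectl label node l-shahe-k8s-master1 rack=192.168.1
kubectl label node 192.168.2.1 rack=192.168.2
```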
[root@l-shahe-k8s-master1 calico]$ cat bgp-config.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
  serviceClusterIPs:
  - cidr: 10.96.0.0/12
  listenPort: 178
  bindMode: NodeIP
Note that this disables the global node-to-node mesh outright.
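The configuration is applied with calicoctl like the other Calico resources (a sketch):

```shell
# Apply the global BGP configuration; this also tears down the node-to-node mesh.
calicoctl apply -f bgp-config.yaml
```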
16) Give each node its own asNumber, overriding the global one, so it peers with its ToR using its own AS:
calicoctl patch node l-shahe-k8s-master1 -p '{"spec": {"bgp": {"asNumber": "64531"}}}'
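The same override is needed on the second node, with its own ToR's AS (the node name here is assumed to match how it registered in the cluster):

```shell
calicoctl patch node 192.168.2.1 -p '{"spec": {"bgp": {"asNumber": "64532"}}}'
```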
17) Verify that the node's BGP peering succeeded:
[root@l-shahe-k8s-master1 calico]$ calicoctl node status
Calico process is running.
IPv4 BGP status
+----------------+---------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+---------------+-------+----------+-------------+
| 192.168.1.254 | node specific | up | 08:22:57 | Established |
+----------------+---------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found
The state shows Established, so peering is complete.
18) Pinging pods on node 192.168.2.1 from master1 now works, which completes the ToR BGP peering exercise.