Kubernetes is great, but it is not perfect; networking in particular is a real pain point. Pod connectivity is wired up through the various CNI plugins, and the requirement here is for the external network to reach pods directly. I carved some time out of a busy week and decided to tinker with it.
Flannel came to mind first. Its overlay has no trouble connecting pods to each other, and pod-to-external traffic gets NATed from the pod range (192.168.0.0/16) to the node's eth0 address, which is also fine. But letting the outside world actively initiate connections to pod addresses is another story. In theory we could point static routes for the pod subnets at one or a few nodes, but if a node dies, even if failover is possible, static-route convergence in the underlay is slow. Calico, by contrast, is a pure layer-3 plugin that speaks BGP, so I decided to give it a try.
The k8s cluster is already built, so let's dive in.
Install calico --- run on the master:
Calico can be installed in several ways; since k8s is already up, we install it straight into the cluster.
Official docs: https://docs.projectcalico.org/getting-started/kubernetes/quickstart
This cluster was deployed with kubeadm, so before installing it is well worth looking at the running etcd process and noting where the certificates and keys are kept:
[root@vm-lw-k8ssxc-t01 ~]# ps -ef |grep etcd
root 4986 4947 0 14:15 ? 00:04:42 etcd --advertise-client-urls=https://10.213.10.50:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls=https://10.213.10.50:2380 --initial-cluster=vm-lw-k8ssxc-t01=https://10.213.10.50:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://10.213.10.50:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://10.213.10.50:2380 --name=vm-lw-k8ssxc-t01 --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
root 4990 4939 1 14:15 ? 00:09:29 kube-apiserver --advertise-address=10.213.10.50 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
root 31609 30756 0 22:20 pts/0 00:00:00 grep --color=auto etcd
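The cert and key paths can be read straight off the etcd command line above. A quick sanity check that they exist before going further:

ls -l /etc/kubernetes/pki/etcd/ca.crt \
      /etc/kubernetes/pki/etcd/server.crt \
      /etc/kubernetes/pki/etcd/server.key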
Download the calico.yaml manifest:
curl https://docs.projectcalico.org/manifests/calico-etcd.yaml -o calico.yaml
Edit calico.yaml as described below.
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
name: calico-etcd-secrets
namespace: kube-system
data:
# Populate the following with etcd TLS configuration if desired, but leave blank if
# not using TLS for etcd.
# The keys below should be uncommented and the values populated with the base64
# encoded contents of each file that would be associated with the TLS data.
# Example command for encoding a file contents: cat <file> | base64 -w 0
etcd-key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBeGcvc3M0QmZ4dW5ON2VJZUZvRjhncXltdjArL2F4RVRtWFpGdnNaTzJ4ZEp1bFh0CjBsQStEZGJ
aemNRNU9yWGxG0Nsd1BncEE5S3FEcE04jIvcmhkOUkwN0JIQkE9PQo0EgUFJJVkFURSBLRVktLS0tLQo........
#Obtained via: cat /etc/kubernetes/pki/etcd/server.key | base64 | tr -d '\n'
etcd-cert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURKakNDQWc2Z0F3SUJBZ0lJYkdISkpHV1ZmWHd3RFFZSktvWklodmNOQVFFTEJRQXdFakVRTUE0R0ExVUUKQXhNSFpYUmpaQz
FqWVRBZUZ3MH......
#Obtained via: cat /etc/kubernetes/pki/etcd/server.crt | base64 | tr -d '\n'
etcd-ca: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN3akNDQWFxZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFTTVJBd0RnWURWUVFERXdkbGRHTmsKTFdOaE1CNFhEVEl3
LS0tLS0K......
#Obtained via: cat /etc/kubernetes/pki/etcd/ca.crt | base64 | tr -d '\n'
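Rather than encoding and pasting the three values by hand, they can be generated and patched into the manifest in one pass. A sketch, assuming the stock v3.14 manifest's commented placeholders (# etcd-key: null and friends); check your calico.yaml and adjust the sed patterns if they differ:

# Base64-encode the etcd TLS material as single lines (no wrapping)
ETCD_KEY=$(base64 -w 0 < /etc/kubernetes/pki/etcd/server.key)
ETCD_CERT=$(base64 -w 0 < /etc/kubernetes/pki/etcd/server.crt)
ETCD_CA=$(base64 -w 0 < /etc/kubernetes/pki/etcd/ca.crt)
# Uncomment the Secret keys and fill in the values
sed -i "s|# etcd-key: null|etcd-key: ${ETCD_KEY}|" calico.yaml
sed -i "s|# etcd-cert: null|etcd-cert: ${ETCD_CERT}|" calico.yaml
sed -i "s|# etcd-ca: null|etcd-ca: ${ETCD_CA}|" calico.yaml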
Also edit this section:
---
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-config
namespace: kube-system
data:
# Configure this with the location of your etcd cluster.
#Fill in the etcd endpoint address
etcd_endpoints: "https://10.213.10.50:2379"
# If you're using TLS enabled etcd uncomment the following.
# You must also populate the Secret below with these files.
## Just uncomment the following three lines
etcd_ca: "/calico-secrets/etcd-ca"
etcd_cert: "/calico-secrets/etcd-cert"
etcd_key: "/calico-secrets/etcd-key"
# Typha is disabled.
typha_service_name: "none"
# Configure the backend to use.
calico_backend: "bird"
# Configure the MTU to use for workload interfaces and the
# tunnels. For IPIP, set to your network MTU - 20; for VXLAN
# set to your network MTU - 50.
veth_mtu: "1440"
.......................
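About veth_mtu: per the comment above, size it from the node uplink's MTU minus the tunnel overhead (20 for IPIP, 50 for VXLAN). A quick way to read the interface MTU (eth0 is this cluster's uplink; substitute your own interface):

# Prints something like "mtu 1500"; for IPIP, veth_mtu = that value minus 20
ip -o link show eth0 | grep -o 'mtu [0-9]*'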
Then install:
kubectl apply -f calico.yaml
After the install completes, check with kubectl get pod --all-namespaces -o wide:
[root@vm-lw-k8ssxc-t01 ~]# kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
biz-test k8s-demo-76ffd4c6cb-sd66l 1/1 Running 0 9h 192.168.139.65 vm-lw-k8ssxc-t02 <none> <none>
kube-system calico-kube-controllers-849d4459c9-q54gz 1/1 Running 0 28h 10.213.10.52 vm-lw-k8ssxc-t03 <none> <none>
kube-system calico-node-9q8qp 1/1 Running 0 28h 10.213.10.51 vm-lw-k8ssxc-t02 <none> <none>
kube-system calico-node-d2l77 1/1 Running 0 28h 10.213.10.52 vm-lw-k8ssxc-t03 <none> <none>
kube-system calico-node-gjclj 1/1 Running 0 28h 10.213.10.53 vm-lw-k8ssxc-t04 <none> <none>
kube-system calico-node-qh2mm 1/1 Running 2 28h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system coredns-546565776c-25vg8 1/1 Running 186 2d7h 192.168.130.194 vm-lw-k8ssxc-t04 <none> <none>
kube-system coredns-546565776c-rqshx 1/1 Running 114 2d7h 192.168.130.193 vm-lw-k8ssxc-t04 <none> <none>
kube-system etcd-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-apiserver-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-controller-manager-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-proxy-894rs 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-proxy-csqph 1/1 Running 2 2d7h 10.213.10.52 vm-lw-k8ssxc-t03 <none> <none>
kube-system kube-proxy-g8fd5 1/1 Running 2 2d7h 10.213.10.51 vm-lw-k8ssxc-t02 <none> <none>
kube-system kube-proxy-s7fqw 1/1 Running 2 2d7h 10.213.10.53 vm-lw-k8ssxc-t04 <none> <none>
kube-system kube-scheduler-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
[root@vm-lw-k8ssxc-t01 ~]#
The calico-node-xxx pods on all four nodes are in Running state; installation is complete.
Install the management tool calicoctl --- still on the master:
Download calicoctl:
curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.14.1/calicoctl
chmod +x calicoctl
cp calicoctl /usr/bin/calicoctl
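A quick smoke test that the binary runs; with no datastore config in place yet, only the client version is guaranteed to print:

calicoctl version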
Configure calicoctl
calicoctl looks for its configuration at /etc/calico/calicoctl.cfg by default. If the file exists, edit it; if not, create it. It needs the following content:
[root@vm-lw-k8ssxc-t01 ~]# cat /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
datastoreType: etcdv3
#Fill in the etcd endpoints; this can be a cluster, but here there is only a single node.
etcdEndpoints: "https://10.213.10.50:2379"
etcdCACert: |
#Fill in the certificate and key contents. The files are the same ones used for the calico install; unlike that install, the contents here are the plain PEM text obtained with cat, not base64.
-----BEGIN CERTIFICATE-----
MIICwjCCAaqgAwIBAgIBADANBgkqhkiG9w0BAQsFADASMRAwDgYDVQQDEwdldGNk
LWNhMB4XDTIwMDYwMTA3MDM0MloXDTMwMDUzMDA3MDM0MlowEjEQMA4GA1UEAxMH
-----END CERTIFICATE-----
etcdCert: |
-----BEGIN CERTIFICATE-----
MIIDJjCCAg6gAwIBAgIIbGHJJGWVfXwwDQYJKoZIhvcNAQELBQAwEjEQMA4GA1UE
AxMHZXRjZC1jYTAeFw0yMDA2MDEwNzAzNDJaFw0yMTA2MDEwNzAzNDJaMBsxGTAX
-----END CERTIFICATE-----
etcdKey: |
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAxg/ss4BfxunN7eIeFoF8gqymv0+/axETmXZFvsZO2xdJulXt
0lA+DdbZzcQ5OrXlFgR7x6L7ram1pwKQpMiBgdaIKbXCs4nM6xypFVgqNPYFztHe
-----END RSA PRIVATE KEY-----
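Since the PEM blocks have to sit under the `|` literals with consistent indentation, hand-editing is fiddly. A sketch that generates the whole file from the kubeadm cert paths (the four-space indent is arbitrary; it just has to be uniform and deeper than the key it belongs to):

mkdir -p /etc/calico
cat <<EOF > /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: etcdv3
  etcdEndpoints: "https://10.213.10.50:2379"
  etcdCACert: |
$(sed 's/^/    /' /etc/kubernetes/pki/etcd/ca.crt)
  etcdCert: |
$(sed 's/^/    /' /etc/kubernetes/pki/etcd/server.crt)
  etcdKey: |
$(sed 's/^/    /' /etc/kubernetes/pki/etcd/server.key)
EOF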
Verify the installation; output like the following means it succeeded:
[root@vm-lw-k8ssxc-t01 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 10.213.10.51 | node-to-node mesh | up | 06:16:37 | Established |
| 10.213.10.52 | node-to-node mesh | up | 06:16:38 | Established |
| 10.213.10.53 | node-to-node mesh | up | 06:16:38 | Established |
| 10.213.10.254 | global | up | 06:16:39 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
#If some nodes fail to run properly, install calicoctl on that node and run:
calicoctl node run --ip=10.213.10.50 #(the node's own IP)
Initialize the Calico BGP configuration:
Create bgpconfig_default.yaml to hold the initial BGP configuration; contents:
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
name: default
spec:
logSeverityScreen: Info
nodeToNodeMeshEnabled: true
asNumber: 64567 #set the cluster's BGP AS number to 64567
serviceClusterIPs:
- cidr: 192.168.0.0/16 #advertise the cluster address range
Apply the initial configuration:
calicoctl apply -f bgpconfig_default.yaml
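A note on semantics: per the Calico docs, serviceClusterIPs is meant for advertising the Service CIDR; pod-network routes are already exported over BGP from the IP pool, so listing the pod CIDR here advertises it as an extra aggregate from every node. Either way, confirm the settings took effect with:

calicoctl get bgpconfiguration default -o yaml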
Configure a BGP peer for the nodes. With only four nodes here, we give all four one shared global peer: the switch at 10.213.10.254, AS 64568.
Create bgppeerconfig.yaml with the following contents:
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
name: my-global-peer
spec:
peerIP: 10.213.10.254
asNumber: 64568
Apply the BGP peer configuration:
calicoctl apply -f bgppeerconfig.yaml
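Confirm the peer resource was created (the peering itself will only come up once the switch side is configured):

calicoctl get bgppeer -o wide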
Configure BGP on the switch
To keep the experiment small, only the .50 and .51 peers are configured here:
bgp 64568
peer 10.213.10.50 as-number 64567
peer 10.213.10.51 as-number 64567
#
peer 10.213.10.50 enable
peer 10.213.10.51 enable
After a moment, check the BGP peer state:
display bgp peer ipv4
BGP local router ID: 172.21.2.1
Local AS number: 64568
Total number of peers: 5 Peers in established state: 5
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
10.213.10.50 64567 600 688 0 5 00:00:36 Established
10.213.10.51 64567 904 854 0 5 00:00:17 Established
The output shows the peerings are up. Now look at the BGP routing table:
display bgp routing-table ipv4
Total number of routes: 52
BGP local router ID is 172.21.2.1
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* >e 192.168.33.0/26 10.213.10.50 0 64567i
* e 10.213.10.51 0 64567i
* >e 192.168.36.128/26 10.213.10.50 0 64567i
* e 10.213.10.51 0 64567i
* >e 192.168.130.192/26 10.213.10.50 0 64567i
* e 10.213.10.51 0 64567i
* >e 192.168.139.64/26 10.213.10.50 0 64567i
* e 10.213.10.51 0 64567i
* >e 192.168.255.0 10.213.10.50 0 64567i
* e 10.213.10.51 0 64567i
Route learning works: the 192.168.0.0/16 space is learned via both .50 and .51, as /26 subnets rather than one aggregate. These /26s are Calico's per-node IPAM blocks (the default block size is /26), and the lack of aggregation does no harm.
Checking the k8s side
Routing table on a k8s node:
route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.213.10.254 0.0.0.0 UG 100 0 0 eth0
10.213.10.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.33.0 10.213.10.52 255.255.255.192 UG 0 0 0 eth0
192.168.36.128 0.0.0.0 255.255.255.192 U 0 0 0 *
192.168.130.192 10.213.10.53 255.255.255.192 UG 0 0 0 eth0
192.168.139.64 10.213.10.51 255.255.255.192 UG 0 0 0 eth0
Note that once Calico installs cleanly, pod-to-pod connectivity inside the cluster is already fully taken care of. (The 192.168.36.128/26 entry with interface * is the blackhole route Calico installs for this node's own IPAM block; routes to local pods show up as /32s via cali* veth interfaces.)
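As a quick spot check, a pod-to-pod ping using the pods from the earlier listing (assuming the demo image ships a ping binary):

kubectl -n biz-test exec k8s-demo-76ffd4c6cb-sd66l -- ping -c 3 192.168.130.194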
Now go to the master (the host ending in .50) and look at its BGP peers:
calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 10.213.10.51 | node-to-node mesh | up | 06:16:37 | Established |
| 10.213.10.52 | node-to-node mesh | up | 06:16:38 | Established |
| 10.213.10.53 | node-to-node mesh | up | 06:16:38 | Established |
| 10.213.10.254 | global | up | 06:16:39 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
Three node-to-node mesh peers plus one global peer, which is our switch, are all established. Checking node1 shows the same. On the other two nodes, the peering with the switch is down, because we never configured those peers on the switch; it looks like this:
calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+--------------------------------+
| 10.213.10.254 | global | start | 01:46:35 | Connect Socket: Connection |
| | | | | refused |
| 10.213.10.50 | node-to-node mesh | up | 06:16:38 | Established |
| 10.213.10.51 | node-to-node mesh | up | 01:46:37 | Established |
| 10.213.10.53 | node-to-node mesh | up | 01:46:37 | Established |
+---------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
Connectivity test
The routes are in place, but can traffic actually flow end to end? Not yet; two things remain: IP forwarding on the hosts and Calico's outbound NAT setting.
Enable IPv4 forwarding on every machine in the cluster:
# vim /etc/sysctl.conf
Append at the end of the file:
net.ipv4.ip_forward = 1
#sysctl -p
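The sysctl.conf edit makes the setting persistent; to flip it on all four nodes immediately in one pass (assumes root SSH access to each node):

for h in 10.213.10.50 10.213.10.51 10.213.10.52 10.213.10.53; do
  ssh root@$h 'sysctl -w net.ipv4.ip_forward=1'
done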
Set the pods' outbound policy to no-NAT. (Enabling NAT also works and traffic still flows, so it is not a big deal either way.)
Create disable_all_nat.yaml with the following contents:
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
name: default-ipv4-ippool
spec:
cidr: 192.168.0.0/16
natOutgoing: false
#to enable outbound NAT, set this to true
Apply it:
calicoctl apply -f disable_all_nat.yaml
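Two things worth checking: calicoctl apply replaces the whole resource, so if your existing pool sets fields such as ipipMode, include them in the file too (inspect the current pool first); and afterwards verify that natOutgoing really reads false:

calicoctl get ippool default-ipv4-ippool -o yaml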
Start the test:
Pick a few pods with 192.168.x.x addresses:
[root@vm-lw-k8ssxc-t01 ~]# kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
biz-test k8s-demo-76ffd4c6cb-sd66l 1/1 Running 0 9h 192.168.139.65 vm-lw-k8ssxc-t02 <none> <none>
kube-system calico-kube-controllers-849d4459c9-q54gz 1/1 Running 0 29h 10.213.10.52 vm-lw-k8ssxc-t03 <none> <none>
kube-system calico-node-9q8qp 1/1 Running 0 29h 10.213.10.51 vm-lw-k8ssxc-t02 <none> <none>
kube-system calico-node-d2l77 1/1 Running 0 29h 10.213.10.52 vm-lw-k8ssxc-t03 <none> <none>
kube-system calico-node-gjclj 1/1 Running 0 29h 10.213.10.53 vm-lw-k8ssxc-t04 <none> <none>
kube-system calico-node-qh2mm 1/1 Running 2 29h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system coredns-546565776c-25vg8 1/1 Running 186 2d7h 192.168.130.194 vm-lw-k8ssxc-t04 <none> <none>
kube-system coredns-546565776c-rqshx 1/1 Running 114 2d7h 192.168.130.193 vm-lw-k8ssxc-t04 <none> <none>
kube-system etcd-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-apiserver-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-controller-manager-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-proxy-894rs 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none> <none>
kube-system kube-proxy-csqph 1/1 Running 2 2d7h 10.213.10.52 vm-lw-k8ssxc-t03 <none> <none>
kube-system kube-proxy-g8fd5 1/1 Running 2 2d7h 10.213.10.51 vm-lw-k8ssxc-t02 <none> <none>
kube-system kube-proxy-s7fqw 1/1 Running 2 2d7h 10.213.10.53 vm-lw-k8ssxc-t04 <none> <none>
kube-system kube-scheduler-vm-lw-k8ssxc-t01 1/1 Running 5 2d7h 10.213.10.50 vm-lw-k8ssxc-t01 <none>
Ping them straight from the switch:
<LWCO-N6/7-U41-SW-6800-POC>ping 192.168.139.65
Ping 192.168.139.65 (192.168.139.65): 56 data bytes, press CTRL_C to break
56 bytes from 192.168.139.65: icmp_seq=0 ttl=63 time=1.343 ms
56 bytes from 192.168.139.65: icmp_seq=1 ttl=63 time=2.202 ms
56 bytes from 192.168.139.65: icmp_seq=2 ttl=63 time=1.343 ms
56 bytes from 192.168.139.65: icmp_seq=3 ttl=63 time=1.855 ms
56 bytes from 192.168.139.65: icmp_seq=4 ttl=63 time=1.302 ms
--- Ping statistics for 192.168.139.65 ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.302/1.609/2.202/0.360 ms
<LWCO-N6/7-U41-SW-6800-POC>ping 192.168.130.194
Ping 192.168.130.194 (192.168.130.194): 56 data bytes, press CTRL_C to break
56 bytes from 192.168.130.194: icmp_seq=0 ttl=63 time=0.870 ms
56 bytes from 192.168.130.194: icmp_seq=1 ttl=63 time=0.835 ms
56 bytes from 192.168.130.194: icmp_seq=2 ttl=63 time=0.682 ms
56 bytes from 192.168.130.194: icmp_seq=3 ttl=63 time=0.754 ms
56 bytes from 192.168.130.194: icmp_seq=4 ttl=63 time=0.790 ms
--- Ping statistics for 192.168.130.194 ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.682/0.786/0.870/0.065 ms
<LWCO-N6/7-U41-SW-6800-POC>ping 192.168.130.193
Ping 192.168.130.193 (192.168.130.193): 56 data bytes, press CTRL_C to break
56 bytes from 192.168.130.193: icmp_seq=0 ttl=63 time=0.870 ms
56 bytes from 192.168.130.193: icmp_seq=1 ttl=63 time=0.722 ms
56 bytes from 192.168.130.193: icmp_seq=2 ttl=63 time=0.743 ms
56 bytes from 192.168.130.193: icmp_seq=3 ttl=63 time=0.740 ms
56 bytes from 192.168.130.193: icmp_seq=4 ttl=63 time=0.745 ms
--- Ping statistics for 192.168.130.193 ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.722/0.764/0.870/0.054 ms
All reachable.
BGP has plenty of rich features that I won't try one by one here; I'll keep exploring and share anything interesting later.
That's it for today.