Preface
In the previous post we finished building a highly available Kubernetes cluster, but all of its nodes showed NotReady. The reason is that the cluster network had not been set up yet; the control-plane components themselves are healthy, as the kubectl get cs output below shows. In this post we pick up where we left off and install and deploy the cluster's network plugin. A quick way to check why the nodes report NotReady is sketched after the output.
➜ kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
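Before a network plugin is installed, the kubelet on every node reports that the CNI is not ready, which is what keeps the node in NotReady. A small sketch for looking at this yourself; node-g001 is just one of the node names used in this series, substitute your own:
kubectl get nodes
# the Ready condition's message normally points at the missing or uninitialized CNI plugin
kubectl describe node node-g001 | grep -i ready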
Network plugin
The three networks in Kubernetes
Note: the CIDR ranges of these three networks must not overlap.
Node Network
The Node network is simply the local network of your hosts, i.e. the LAN they use to talk to each other. These addresses are called Node IPs; in this series the CIDR is 192.168.0.0/24.
Pod Network
The Pod network provides pod-to-pod communication inside the cluster. Whether two pods on the same host or pods on different hosts are talking to each other, that traffic belongs to the Pod network, and its addresses are called Pod IPs (if you are familiar with virtual machines, you can think of a Pod IP as the IP address of each VM). It is maintained and updated by the Kubernetes network plugin; in this article its CIDR is 10.244.0.0/12.
Service Network (VIP)
The Service network is really a set of VIPs (virtual IPs), exactly what you would expect from the term: the addresses are virtual and act as traffic entry points, and they are called Cluster IPs. Since we use ipvs as the Service implementation, a virtual NIC named kube-ipvs0 is created on each host and kube-proxy maintains and updates its rules; in this article the Service CIDR is 10.96.0.0/12.
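A quick way to see all three address ranges on a running cluster. This is only a sketch; the interface name eth0 matches the --iface flag used later in the flannel manifest, so adjust it to your own environment:
# Node IPs: the addresses on the hosts' physical interface
ip addr show eth0
# Pod IPs: assigned to each pod by the network plugin
kubectl get pods --all-namespaces -owide
# Cluster IPs: virtual Service addresses from the 10.96.0.0/12 range
kubectl get svc --all-namespaces
# with the ipvs proxy mode, the Cluster IPs are also bound to the kube-ipvs0 dummy device
ip addr show kube-ipvs0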
Deploying the flannel network plugin
You can find more information about flannel on GitHub.
kubectl apply -f flannel.yaml
flannel.yaml
---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
    - configMap
    - secret
    - emptyDir
    - hostPath
  allowedHostPaths:
    - pathPrefix: "/etc/cni/net.d"
    - pathPrefix: "/etc/kube-flannel"
    - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
    - min: 0
      max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: ['psp.flannel.unprivileged']
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  # net-conf.json defines the Pod network CIDR that flanneld hands out per node
  net-conf.json: |
    {
      "Network": "172.224.0.0/12",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: freemanliu/flannel:v0.11.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: freemanliu/flannel:v0.11.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth0
        resources:
          requests:
            cpu: "1000m"
            memory: "500Mi"
          limits:
            cpu: "1000m"
            memory: "500Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: host-time
          mountPath: /etc/localtime
          readOnly: true
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: host-time
          hostPath:
            path: /etc/localtime
        - name: run
          hostPath:
            path: /run/flannel
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
Checking the deployment status
Use kubectl get pods -nkube-system -owide | grep flannel to see how the flannel pods are doing.
kubectl get pods -nkube-system -owide | grep flannel
kube-flannel-ds-amd64-5fn9l 1/1 Running 0 23h 192.168.2.10 node-i002 <none> <none>
kube-flannel-ds-amd64-5gj5d 1/1 Running 0 23h 192.168.2.13 node-i004 <none> <none>
kube-flannel-ds-amd64-9qrjk 1/1 Running 0 23h 192.168.0.112 node-g001 <none> <none>
kube-flannel-ds-amd64-bbkd5 1/1 Running 0 23h 192.168.1.139 node-h001 <none> <none>
kube-flannel-ds-amd64-c9l74 1/1 Running 0 23h 192.168.1.1 server2.k8s.local <none> <none>
kube-flannel-ds-amd64-kdvw5 1/1 Running 0 23h 192.168.1.137 node-h003
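Besides checking the pods, you can log on to any node and confirm that flanneld wrote its per-node subnet lease and that the init container copied the CNI configuration into place. A small sketch; the flannel.1 device name assumes the vxlan backend configured in the manifest above:
cat /run/flannel/subnet.env            # the subnet flanneld leased for this node
cat /etc/cni/net.d/10-flannel.conflist # the CNI config copied by the init container
ip -d link show flannel.1              # the vxlan device created by flanneld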
Troubleshooting
If the pods are not running successfully, troubleshoot along the following lines; the main tool is the log output.
1. Run kubectl logs -f ${NAME} -nkube-system, where NAME is the pod name kube-flannel-ds-amd64-xxxx, and inspect the log output.
2. If that command is not usable, you can read the container logs with docker on any node: find the flannel container ID with docker ps -a | grep flanneld, then follow its logs with docker logs -f <container id>. A couple of extra checks are sketched after this list.
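The extra checks; the pod name below is a placeholder:
# is the DaemonSet creating a pod on every node?
kubectl get ds kube-flannel-ds-amd64 -nkube-system
# scheduling problems and image pull failures show up as events here
kubectl describe pod kube-flannel-ds-amd64-xxxx -nkube-system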
Checking the cluster status
Once the network plugin has been deployed successfully, every node shows Ready and we can go on to deploy the other add-ons. A quick check of the Pod CIDRs assigned to each node follows the output below.
kubectl get node
NAME STATUS ROLES AGE VERSION
node-g001 Ready node 24h v1.14.2
node-g002 Ready node 24h v1.14.2
node-g003 Ready node 24h v1.14.2
node-g004 Ready node 24h v1.14.2
....
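Because flanneld runs with --kube-subnet-mgr, every node should also have a Pod CIDR recorded in its spec; listing them is a quick sanity check (a jsonpath sketch):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'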
Deploying metrics-server
Once it is deployed you can use kubectl top node / kubectl top pod to look at the cluster's resource usage.
kubectl apply -f metrics-server.yaml
metrics-server.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      - name: host-time
        hostPath:
          path: /etc/localtime
      containers:
      - name: metrics-server
        image: freemanliu/metrics-server-amd64:v0.3.3
        imagePullPolicy: IfNotPresent
        args:
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
        - mountPath: /etc/localtime
          name: host-time
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: 443
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
error: metrics not available yet
If kubectl top node prints the error above, the plugin has not taken effect yet; wait a few minutes and try again.
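metrics-server is plugged in through the API aggregation layer, so another way to see whether it is ready is to look at the APIService registered above; once the backing pod is serving, its AVAILABLE column turns True:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get pods -nkube-system -l k8s-app=metrics-server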
Checking cluster resource usage
# show node resource usage
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node-g001 484m 6% 15846Mi 24%
node-g002 366m 4% 9218Mi 14%
node-g003 382m 4% 13197Mi 20%
...
# show per-pod resource usage
kubectl top pod
NAME CPU(cores) MEMORY(bytes)
activity-service-7d946795cf-2stmq 11m 443Mi
yyy-service-7d946795cf-gnckg 59m 446Mi
xxx-service-7d946795cf-tf6wj 9m 439Mi
xxx-service-f4758dd65-jjh2j 2m 402Mi
...
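kubectl top works per namespace, so the same command can also be pointed at the system components, for example:
kubectl top pod -nkube-system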
Replacing CoreDNS
The CoreDNS version installed by kubeadm is fairly old and triggers a known bug; you can search Google for the details.
Here we deploy CoreDNS 1.5.0 (the version used in the manifest below) to replace the older CoreDNS that kubeadm installed.
First delete the old CoreDNS resources, then recreate them.
kubectl delete -f coredns.yaml
kubectl apply -f coredns.yaml
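kubectl delete -f coredns.yaml removes the kubeadm-installed CoreDNS because the names and namespace in this manifest (the coredns Deployment and ConfigMap, the kube-dns Service, and so on) match what kubeadm created. If you want to confirm which image is currently running before replacing it, something like this works:
kubectl get deployment coredns -nkube-system -o jsonpath='{.spec.template.spec.containers[0].image}'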
coredns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - services
  - pods
  - namespaces
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 114.114.114.114 223.5.5.5
        cache 30
        reload
        loop
        loadbalance
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      containers:
      - name: coredns
        image: coredns/coredns:1.5.0
        imagePullPolicy: IfNotPresent
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: host-time
          mountPath: /etc/localtime
          readOnly: true
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
      dnsPolicy: Default
      volumes:
      - name: host-time
        hostPath:
          path: /etc/localtime
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP
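After applying the manifest, you can wait for the rollout to finish before testing name resolution:
kubectl rollout status deployment/coredns -nkube-system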
Checking the plugin status
If a pod shows an error, use its logs to track down the problem.
kubectl get pods -nkube-system | grep coredns
coredns-5d4f7b9674-6wltm 1/1 Running 0 18d
coredns-5d4f7b9674-84szj 1/1 Running 0 18d
Testing that CoreDNS works
# kubectl apply -f busybox.yaml
pod/busybox created
# test DNS resolution
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
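You can also check that names outside the cluster resolve through the forward servers configured in the Corefile (the output depends on your network, so it is not shown here):
kubectl exec -ti busybox -- nslookup kubernetes.io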
# clean up
kubectl delete -f busybox.yaml
busybox.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent