Introduction
This article walks through how to set up a k8s (Kubernetes) cluster.
Planning the k8s environment
Single-master cluster
In a single-master cluster, if the master goes down, you are in trouble.
Multi-master cluster
In a multi-master cluster, if one master goes down, the remaining masters keep serving, so availability is clearly better (high availability).
Ways to deploy a k8s cluster
There are currently three main approaches.
kubeadm
kubeadm is a k8s deployment tool that provides the kubeadm init and kubeadm join commands for quickly standing up a k8s cluster.
How to install it? See the installation guide.
Binary packages
Download the release binaries from GitHub and deploy each component by hand to assemble the k8s cluster.
kubeadm lowers the barrier to entry but hides a lot of detail, which makes problems hard to troubleshoot. If you want more control and transparency, deploying from binary packages is recommended.
Below, we build a cluster with each of these approaches.
RKE
RKE is a CNCF-certified open-source Kubernetes distribution that runs entirely inside Docker containers. It solves the most common installation pain points of Kubernetes by removing most host dependencies and providing a stable path for deployment, upgrades, and rollbacks.
Installing with it is remarkably simple.
Installing the virtual machines
Building the cluster with kubeadm
With three CentOS 7 virtual machines ready, the first task is system initialization, such as permanently disabling the firewall.
System initialization
Disable the firewall
systemctl disable firewalld
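Note that systemctl disable only keeps firewalld from starting on the next boot; assuming you also want it off right now, stop the running service as well:
systemctl stop firewalld # stop the firewall for the current session too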
Disable swap
sed -ri 's/.*swap.*/#&/' /etc/fstab
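The sed command only comments out the swap entry in /etc/fstab, which takes effect after a reboot. Since kubeadm's preflight checks fail while swap is active, a quick sketch to also turn it off immediately and verify:
swapoff -a # disable swap for the current session
free -m    # the Swap line should now show 0 total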
Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config
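Likewise, editing /etc/selinux/config only applies after a reboot; assuming you want SELinux relaxed right away:
setenforce 0 # switch SELinux to permissive mode for the current session
getenforce   # should print Permissive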
Set the hostnames
hostnamectl set-hostname centos1 # run one of these on each of the three machines
hostnamectl set-hostname centos2
hostnamectl set-hostname centos3
Then run the following on all three machines:
cat >> /etc/hosts <<EOF
172.20.10.2 centos1
172.20.10.13 centos2
172.20.10.14 centos3
EOF
Synchronize the clocks
yum install ntpdate -y
ntpdate time.windows.com
Install Docker
yum install wget -y # install wget
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo # add the Docker yum repo
yum -y install docker-ce-18.06.1.ce-3.el7 # install Docker
systemctl enable docker && systemctl start docker # start Docker and enable it at boot
docker --version # check the version to verify the installation
Output:
[root@centos3 ~]# docker --version
Docker version 18.06.1-ce, build e68fc7a
Once installed, switch to a China-local registry mirror so image pulls work reliably:
cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://b9pmyelo.mirror.aliyuncs.com"]
}
EOF
Restart Docker for the change to take effect:
systemctl restart docker
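A quick way to confirm the mirror took effect (the exact docker info output format may vary slightly by Docker version):
docker info | grep -A1 "Registry Mirrors" # should list the Aliyun mirror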
Install kubeadm, kubelet, and kubectl
Before installing, configure a China-local yum repo:
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install:
yum install -y kubelet-1.20.4 kubeadm-1.20.4 kubectl-1.20.4
The versions being installed:
================================================================================
 Package         Arch        Version        Repository        Size
================================================================================
Installing:
 kubeadm         x86_64      1.20.4-0       kubernetes        8.3 M
 kubectl         x86_64      1.20.4-0       kubernetes        8.5 M
 kubelet         x86_64      1.20.4-0       kubernetes        20 M
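A quick way to double-check what actually landed on the machine:
kubeadm version -o short # should print v1.20.4
kubectl version --client --short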
Then set kubelet to start at boot:
systemctl enable kubelet
List the images kubeadm needs:
[root@centos3 ~]# kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.20.5
k8s.gcr.io/kube-controller-manager:v1.20.5
k8s.gcr.io/kube-scheduler:v1.20.5
k8s.gcr.io/kube-proxy:v1.20.5
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
As you can see, upstream has already moved to 1.20.5, but the China mirrors have not caught up yet, so we can only install 1.20.4.
Pull the required images
First, generate a default init configuration:
kubeadm config print init-defaults >init.default.yaml
Then edit mainly the following lines:
advertiseAddress: 172.20.10.2 # change to the master's IP address
imageRepository: registry.aliyuncs.com/google_containers # switch to the China mirror
kubernetesVersion: v1.20.4 # pin the version
Then run:
kubeadm config images pull --config=init.default.yaml
This pulls the required images according to the config file, so the pulls don't fail during the init step below.
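To confirm the images arrived (image names here follow the aliyuncs mirror configured above):
docker images | grep registry.aliyuncs.com # kube-apiserver, etcd, coredns, etc. should be listed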
Now we can deploy and start the master.
Deploy the master
On the machine that will be the master, run the command below. Flag notes: --apiserver-advertise-address pins the master's IP (needed on multi-NIC hosts); --image-repository switches to the China mirror; --kubernetes-version pins the version, which matters because otherwise kubeadm pulls the latest release, and if the China mirror hasn't caught up the deployment fails; --service-cidr and --pod-network-cidr are used by the network plugin installed later. (Don't append comments after the trailing backslashes; they would break the line continuation.)
kubeadm init \
  --apiserver-advertise-address=172.20.10.2 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version=1.20.4 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
The run looks like this:
[root@centos3 ~]# kubeadm init \
> --apiserver-advertise-address=172.20.10.2 \
> --image-repository registry.aliyuncs.com/google_containers \
> --kubernetes-version=1.20.4 \
> --service-cidr=10.96.0.0/12 \
> --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.4
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [centos3 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.20.10.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [centos3 localhost] and IPs [172.20.10.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [centos3 localhost] and IPs [172.20.10.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 70.005199 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node centos3 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node centos3 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: h445j1.egcjfuzsap4onq5g
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.20.10.2:6443 --token h445j1.egcjfuzsap4onq5g \
--discovery-token-ca-cert-hash sha256:13405f02dd37fabccbedd202924329291da1948f0ad8cb4cfe448f454f2104f2
Following the prompt, run:
[root@centos3 ~]# mkdir -p $HOME/.kube
[root@centos3 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@centos3 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or, if you are root, simply run:
export KUBECONFIG=/etc/kubernetes/admin.conf
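A bare export only lasts for the current shell; assuming you want it to survive logins, you can persist it:
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bash_profile # persist for root across logins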
The output also includes the command for other nodes to join the cluster:
kubeadm join 172.20.10.2:6443 --token h445j1.egcjfuzsap4onq5g \
--discovery-token-ca-cert-hash sha256:13405f02dd37fabccbedd202924329291da1948f0ad8cb4cfe448f454f2104f2
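Keep in mind that the bootstrap token in this command expires after 24 hours by default; if it has expired by the time a node joins, you can mint a fresh join command on the master:
kubeadm token create --print-join-command # prints a new kubeadm join command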
But before joining anything, let's first check the status:
[root@centos3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
centos3 NotReady control-plane,master 5m31s v1.20.4
The STATUS is NotReady rather than Ready, which means something is wrong.
The cause is that some components aren't up yet; one fix is to install a CNI network plugin.
[root@centos3 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The connection to the server raw.githubusercontent.com was refused - did you specify the right host or port?
The URL is unreachable because it's hosted outside China, so you'll have to get creative here.
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.13.1-rc2
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.13.1-rc2
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
Just kidding, it's actually not that long, so I've pasted it in full above. Copy it into a file named kube-flannel.yml and apply it:
[root@centos3 ~]# kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
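You can watch the flannel pods come up (they live in the kube-system namespace in this manifest):
kubectl get pods -n kube-system | grep flannel # wait until the kube-flannel-ds pods are Running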
Barring surprises, after a little while, run it again:
[root@centos3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
centos3 Ready control-plane,master 17m v1.20.4
The node is now Ready. Done.
Join the cluster
On the other two machines, run:
[root@centos1 ~]# kubeadm join 172.20.10.2:6443 --token h445j1.egcjfuzsap4onq5g \
> --discovery-token-ca-cert-hash sha256:13405f02dd37fabccbedd202924329291da1948f0ad8cb4cfe448f454f2104f2
This is exactly the command printed at the end of the master's output.
Output:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Wait a moment, then on the master run:
[root@centos3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
centos1 Ready <none> 3m18s v1.20.4
centos2 Ready <none> 3m1s v1.20.4
centos3 Ready control-plane,master 22m v1.20.4
All nodes are Ready, so the cluster is usable.
Testing
Create a pod in the cluster to verify everything runs correctly:
kubectl create deployment nginx --image=nginx
kubectl get pods # watch the pod status
Once the status becomes Running, run:
kubectl expose deployment nginx --port=80 --type=NodePort
Check the final state:
[root@centos3 ~]# kubectl get pod,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-6799fc88d8-dslgr 1/1 Running 0 2m45s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35m
service/nginx NodePort 10.99.100.132 <none> 80:30362/TCP 2m45s
Access URL: http://<node IP>:<port>
Note the port is the 30362 shown above; the service can be reached via any node's IP plus that port.
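For example, with the node IP and NodePort from this walkthrough (your assigned port will differ):
curl http://172.20.10.2:30362 # should return the "Welcome to nginx!" page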
All right, next let's cover building a cluster from binaries.
Setting up a cluster from binaries
I'll fill in this section when I get time.
RKE
This is the easiest deployment method of all.
Since my work computer runs Ubuntu, the VM environment here switches to Ubuntu 20.04.2.
On both Ubuntu VMs, add the following to /etc/hosts:
192.168.1.6 rancher1
192.168.1.7 rancher2
192.168.1.6 rancher.my.com
Download the tools
You can grab the latest download URLs from http://mirror.cnrancher.com
wget http://rancher-mirror.cnrancher.com/helm/v3.5.3/helm-v3.5.3-linux-amd64.tar.gz
wget http://rancher-mirror.cnrancher.com/kubectl/v1.19.6/linux-amd64-v1.19.6-kubectl
wget http://rancher-mirror.cnrancher.com/rke/v1.2.7/rke_linux-amd64
If the download hosts fail to resolve, adjust DNS as follows:
vim /etc/NetworkManager/NetworkManager.conf
Add dns=none under the [main] section, then save and exit.
vim /etc/resolv.conf
Add two nameserver lines:
nameserver 114.114.114.114
nameserver 8.8.8.8
mv rke_linux-amd64 /usr/bin/rke
mv linux-amd64-v1.19.6-kubectl /usr/bin/kubectl
tar -zxvf helm-v3.5.3-linux-amd64.tar.gz
mv linux-amd64/helm /usr/bin/helm
chmod +x /usr/bin/rke /usr/bin/kubectl /usr/bin/helm
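A quick sanity check that all three binaries are on PATH and executable:
rke --version # e.g. rke version v1.2.7
kubectl version --client --short
helm version --short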
Generate an SSH key pair
Before running this, make sure the hosts configuration is OK.
Also note that RKE cannot register nodes as the root user.
On CentOS 7, for example, you would first add a new user, switch to it, and then run the commands below.
For instance, to add a rancher user:
adduser rancher
passwd rancher
usermod -aG docker rancher
newgrp docker
ssh-keygen -t rsa -C "xxxx@qq.com"
ssh-copy-id rancher1
ssh-copy-id rancher2
# push the key to every cluster machine, including this one
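Before moving on, it's worth confirming that the SSH user can log in without a password and can talk to Docker on each node, since RKE needs both (host names here follow this setup):
ssh rancher1 docker ps # should list containers without a password prompt
ssh rancher2 docker ps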
Create the Rancher k8s cluster with RKE
Create a rancher-cluster.yml file, which rke uses to push the cluster configuration:
cluster_name: rancher
nodes:
  - address: 192.168.1.6
    user: yjw # a user in the docker group; every cluster VM must have this user
    role: [controlplane,worker,etcd]
  - address: 192.168.1.7
    user: yjw # a user in the docker group
    role:
      - worker
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
network:
  plugin: weave
ingress:
  provider: nginx
Run the cluster deployment:
rke up --config rancher-cluster.yml
Wait for it to finish; on success it prints:
INFO[0138] Finished building Kubernetes cluster successfully
Use kubectl
Verify cluster health:
mkdir -p ~/.kube
cp kube_config_rancher-cluster.yml ~/.kube/config
kubectl get nodes
kubectl get cs
Check the k8s cluster:
yjw@rancher1:~/temp$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.1.6 Ready controlplane,etcd,worker 4m37s v1.20.5
192.168.1.7 Ready worker 4m34s v1.20.5
yjw@rancher1:~/temp$ kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
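As a final check, mirroring the test we ran on the kubeadm cluster, you can deploy and expose nginx here too:
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pod,svc # note the assigned NodePort, then browse to http://<node IP>:<NodePort>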