Deploying a Kubernetes Cluster with kubeadm
This article walks through deploying a Kubernetes cluster with the kubeadm tool, using the latest release. By following it you should be able to stand up a self-managed Kubernetes cluster.
Kubernetes Overview
The version installed is the latest release. A highly available Kubernetes cluster can be architected in one of two ways:
HA option 1: etcd runs as an internal Kubernetes component (stacked etcd).
HA option 2: etcd runs as an external cluster.
Starting the Deployment
Host Planning
If you demo this cluster on virtual machines, each VM needs at least 2 CPUs and 2 GB of RAM.
192.168.56.101 master
192.168.56.102 node01
192.168.56.103 node02
Three virtual machines were prepared, each running CentOS 7.7 x64. Flannel is used as the network solution.
Network Planning
Network type | CIDR | Purpose |
---|---|---|
Node network | 192.168.56.0/24 | internal communication between physical nodes |
Pod network | 10.244.0.0/16 | Flannel's default Pod subnet |
Service network | 10.96.0.0/12 | cluster-internal virtual IPs for Services |
Deployment Planning
For containerd 1.0, a daemon called cri-containerd was required to operate between Kubelet and containerd. Cri-containerd handled the Container Runtime Interface (CRI) service requests from Kubelet and used containerd to manage containers and container images correspondingly. Compared to the Docker CRI implementation (dockershim), this eliminated one extra hop in the stack.
However, cri-containerd and containerd 1.0 were still 2 different daemons which interacted via grpc. The extra daemon in the loop made it more complex for users to understand and deploy, and introduced unnecessary communication overhead.
In containerd 1.1, the cri-containerd daemon is now refactored to be a containerd CRI plugin. The CRI plugin is built into containerd 1.1, and enabled by default. Unlike cri-containerd, the CRI plugin interacts with containerd through direct function calls. This new architecture makes the integration more stable and efficient, and eliminates another grpc hop in the stack. Users can now use Kubernetes with containerd 1.1 directly. The cri-containerd daemon is no longer needed.
Common ways to deploy Kubernetes:
- Run each component as a system daemon (tedious and error-prone: configuration, certificates, and so on must be handled by hand)
- Deploy every node with kubeadm:
- Install containerd on every node
- Install kubelet on every node
- Initialize the first node as the Master
- Join the remaining nodes as worker Nodes
- After initialization, the services on the Master node run as Pods
- kube-proxy on the worker Nodes also runs as a Pod
- Flannel runs as a Pod on every node (a regular Pod; the components above run as static Pods)
Deployment with kubeadm:
- Install kubelet, kubeadm, and containerd on the master and all nodes
- On the master: kubeadm init
- On the nodes: kubeadm join
System Settings
Hostname Resolution
Configure hostname resolution on every node:
[root@linux-node01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 node01.lavenliu.cn node01
192.168.56.102 node02.lavenliu.cn node02
192.168.56.103 node03.lavenliu.cn node03
[root@linux-node01 ~]# scp /etc/hosts node02:/etc/
[root@linux-node01 ~]# scp /etc/hosts node03:/etc/
NTP Time Synchronization
Synchronize clocks via NTP so that all nodes agree on the time:
yum install -y ntpdate
ntpdate ntp.aliyun.com
Swap Settings
Disable swap. Since Kubernetes 1.8, swap must be turned off; under the default configuration, kubelet will not start while swap is enabled. Alternatively, keep swap on and start kubelet with the flag --fail-swap-on=false:
swapoff -a
sed -i 's/.*swap.*/#&/' /etc/fstab
Next, set the kernel swappiness parameter to 0. Then apply the following OS settings on every node:
modprobe br_netfilter
# On RHEL/CentOS 7, routing may fail without the following settings
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
Apply the configuration:
sudo sysctl -p /etc/sysctl.d/k8s.conf
sudo sysctl --system
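The swappiness change mentioned earlier can be persisted under the same sysctl.d directory. A minimal sketch; the helper function and its directory argument are illustrative only (on a real node, pass /etc/sysctl.d and run as root):

```shell
# Persist vm.swappiness=0 so it survives reboots.
persist_swappiness() {
  # $1: target sysctl.d directory (normally /etc/sysctl.d)
  echo "vm.swappiness = 0" > "$1/99-swappiness.conf"
}
# On each node (as root), then apply immediately without a reboot:
# persist_swappiness /etc/sysctl.d && sysctl -w vm.swappiness=0
```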
sshd Settings
Configure sshd keep-alives:
echo "ClientAliveInterval 10" >> /etc/ssh/sshd_config
echo "TCPKeepAlive yes" >> /etc/ssh/sshd_config
systemctl restart sshd
Deploying the Kubernetes Master
Start the installation: remove any legacy Docker packages, add the Docker CE repository, and install containerd:
[root@linux-node01 ~]# sudo yum -y remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
sudo yum install -y yum-utils
sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
sudo yum -y install containerd.io
# rpm -ql containerd.io
/etc/containerd
/etc/containerd/config.toml
/usr/bin/containerd
/usr/bin/containerd-shim
/usr/bin/containerd-shim-runc-v1
/usr/bin/containerd-shim-runc-v2
/usr/bin/ctr
/usr/bin/runc
/usr/lib/systemd/system/containerd.service
/usr/share/doc/containerd.io-1.4.8
/usr/share/doc/containerd.io-1.4.8/README.md
/usr/share/licenses/containerd.io-1.4.8
/usr/share/licenses/containerd.io-1.4.8/LICENSE
/usr/share/man/man5/containerd-config.toml.5
/usr/share/man/man8/containerd-config.8
/usr/share/man/man8/containerd.8
/usr/share/man/man8/ctr.8
Generate a default containerd configuration file:
containerd config default |sudo tee /etc/containerd/config.toml
Start and enable the containerd service:
sudo systemctl restart containerd
sudo systemctl enable containerd
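The kubelet/kubeadm/kubectl install that follows assumes a Kubernetes yum repository is already configured. A sketch using the Aliyun mirror; the helper function, its directory argument, and the mirror URL are assumptions, so substitute your preferred repository if needed:

```shell
# Write a Kubernetes yum repo definition (sketch; Aliyun mirror assumed).
write_k8s_repo() {
  # $1: target repo directory (normally /etc/yum.repos.d)
  cat <<'EOF' > "$1/kubernetes.repo"
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
}
# On each node (as root):
# write_k8s_repo /etc/yum.repos.d
```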
Install the Kubernetes packages; run this on all nodes:
# Install the latest version
[root@linux-node01 ~]# sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
[root@linux-node01 ~]# sudo systemctl enable --now kubelet
# Inspect the installed packages
[root@node01 ~]# rpm -qa |grep kube
kubectl-1.26.3-0.x86_64
kubelet-1.26.3-0.x86_64
kubernetes-cni-1.2.0-0.x86_64
kubeadm-1.26.3-0.x86_64
Adjust the kubelet configuration; run this on every node:
cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd
EOF
# Point crictl at the containerd CRI endpoint
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
vi /etc/containerd/config.toml
# Under [plugins."io.containerd.grpc.v1.cri"], change sandbox_image
# to an image source you can actually pull from:
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
# In addition, under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# add the following:
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
# Restart containerd
sudo systemctl restart containerd
# Restart kubelet
sudo systemctl restart kubelet
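The manual config.toml edits above can also be scripted. A sketch; the helper function is illustrative, and the sed patterns should be verified against your containerd version, since the generated config layout can vary between releases:

```shell
# Script the two containerd config edits: sandbox image and systemd cgroups.
patch_containerd_config() {
  # $1: path to config.toml (normally /etc/containerd/config.toml)
  sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' "$1"
  sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$1"
}
# On each node (as root):
# patch_containerd_config /etc/containerd/config.toml && systemctl restart containerd
```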
Enable the kubelet on all nodes; then, on the master node only, initialize the control plane:
[root@linux-node01 ~]# systemctl enable kubelet
[root@linux-node01 ~]# kubeadm init \
--apiserver-advertise-address=192.168.56.145 \
--kubernetes-version=v1.26.0 \
--image-repository=registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/12
[init] Using Kubernetes version: v1.26.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local node01.lavenliu.cn] and IPs [10.96.0.1 192.168.56.145]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost node01.lavenliu.cn] and IPs [192.168.56.145 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost node01.lavenliu.cn] and IPs [192.168.56.145 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 10.508593 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node node01.lavenliu.cn as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node node01.lavenliu.cn as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: xaib2m.sur21ugc1gtv20n6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.56.145:6443 --token xaib2m.sur21ugc1gtv20n6 \
--discovery-token-ca-cert-hash sha256:fb013945c83922934bb1264bd8c74322c24d2e323c4bee8091e81e8594506d04
# The kubeadm join command at the end of the output is important; record it for later
[root@linux-node01 ~]# echo $?
0
[root@linux-node01 ~]# systemctl enable kubelet
The process above prints detailed output; the phases are as follows:
- [init]: initialize using the specified version.
- [preflight]: pre-flight checks and pulling the container images required by the cluster. This takes a while, so configure an image mirror in advance. The pulled images look like this:
(venv36) [root@node01 ~]# crictl images ls
IMAGE TAG IMAGE ID SIZE
registry.aliyuncs.com/google_containers/coredns v1.9.3 5185b96f0becf 14.8MB
registry.aliyuncs.com/google_containers/etcd 3.5.6-0 fce326961ae2d 103MB
registry.aliyuncs.com/google_containers/kube-apiserver v1.26.0 a31e1d84401e6 35.3MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.26.0 5d7c5dfd3ba18 32.2MB
registry.aliyuncs.com/google_containers/kube-proxy v1.26.0 556768f31eb1d 21.5MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.26.0 dafd8ad70b156 17.5MB
registry.aliyuncs.com/google_containers/pause 3.9 e6f1816883972 322kB
- [kubelet-start]: writes the kubelet configuration file /var/lib/kubelet/config.yaml; kubelet cannot start without it, which is why kubelet fails to run before initialization.
- [certificates]: generates the certificates Kubernetes uses, stored under /etc/kubernetes/pki.
- [kubeconfig]: generates the kubeconfig files under /etc/kubernetes; the components use them to communicate with each other.
- [control-plane]: installs the Master components from the static Pod manifests under /etc/kubernetes/manifests.
- [etcd]: installs the etcd service from /etc/kubernetes/manifests/etcd.yaml.
- [wait-control-plane]: waits for the control-plane components to start.
- [apiclient]: checks the health of the Master components.
- [uploadconfig]: uploads the configuration used for the init.
- [kubelet]: configures the kubelets through a ConfigMap.
- [patchnode]: records CNI information on the Node via annotations.
- [mark-control-plane]: labels the current node with the control-plane role and taints it unschedulable, so by default no regular Pods are scheduled on the Master.
- [bootstrap-token]: generates the bootstrap token; record it, since kubeadm join uses it later to add nodes to the cluster.
- [addons]: installs the CoreDNS and kube-proxy add-ons.
Following the hints in the output, create the kubeconfig:
[root@linux-node01 ~]# mkdir -p $HOME/.kube
[root@linux-node01 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@linux-node01 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
(venv36) [root@node01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01.lavenliu.cn NotReady control-plane 3m11s v1.26.3
(venv36) [root@node01 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node01.lavenliu.cn NotReady control-plane 3m28s v1.26.3 192.168.56.145 <none> CentOS Linux 7 (Core) 3.10.0-1160.88.1.el7.x86_64 containerd://1.6.20
We see that the node is in the NotReady state. That is because no network plugin has been installed yet, so the Node cannot communicate properly with the Master. Next we install a network plugin. The most popular Kubernetes network plugins are Flannel, Calico, and Canal; Canal and Flannel are both listed here, and we can deploy either one.
Let's also check the status of the cluster components:
(venv36) [root@node01 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
An error may be reported here for the scheduler and controller-manager; it does not affect normal cluster operation. If the output bothers you, fix it as follows:
[root@linux-node01 ~]# cd /etc/kubernetes/manifests/
[root@linux-node01 manifests]# ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
[root@linux-node01 manifests]# sed -n '/--port/p' kube-scheduler.yaml kube-controller-manager.yaml
- --port=0
- --port=0
# Comment out the two --port=0 settings, then restart the kubelet service
[root@linux-node01 manifests]# sed -i 's@- --port@# &@g' kube-scheduler.yaml kube-controller-manager.yaml
# Verify that the lines were commented out
[root@linux-node01 manifests]# sed -n '/--port/p' kube-scheduler.yaml kube-controller-manager.yaml
# - --port=0
# - --port=0
Then restart the kubelet service:
[root@linux-node01 manifests]# systemctl restart kubelet.service
kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
Installing a Pod network add-on
Caution:
This section contains important information about networking setup and deployment order. Read all of this advice carefully before proceeding.
You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed.
- Take care that your Pod network must not overlap with any of the host networks: you are likely to see problems if there is any overlap. (If you find a collision between your network plugin's preferred Pod network and some of your host networks, you should think of a suitable CIDR block to use instead, then use that during kubeadm init with --pod-network-cidr and as a replacement in your network plugin's YAML).
- By default, kubeadm sets up your cluster to use and enforce use of RBAC (role based access control). Make sure that your Pod network plugin supports RBAC, and so do any manifests that you use to deploy it.
- If you want to use IPv6 (either dual-stack, or single-stack IPv6 only networking) for your cluster, make sure that your Pod network plugin supports IPv6. IPv6 support was added to CNI in v0.6.0.
Note: Kubeadm should be CNI agnostic and the validation of CNI providers is out of the scope of our current e2e testing. If you find an issue related to a CNI plugin you should log a ticket in its respective issue tracker instead of the kubeadm or kubernetes issue trackers.
Installing the Pod Network Add-on (CNI)
Flannel can be added to any existing Kubernetes cluster though it’s simplest to add flannel
before any pods using the pod network have been started.
For Kubernetes v1.17+:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
(venv36) [root@node01 ~]# kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
namespace/kube-flannel created
serviceaccount/flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Check the status again:
(venv36) [root@node01 ~]# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5bbd96d687-gxps6 1/1 Running 0 10m
coredns-5bbd96d687-nz84v 1/1 Running 0 10m
etcd-node01.lavenliu.cn 1/1 Running 0 10m
kube-apiserver-node01.lavenliu.cn 1/1 Running 0 10m
kube-controller-manager-node01.lavenliu.cn 1/1 Running 0 10m
kube-proxy-zxxzq 1/1 Running 0 10m
kube-scheduler-node01.lavenliu.cn 1/1 Running 0 10m
(venv36) [root@node01 ~]# kubectl get po -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-98qst 1/1 Running 0 66s
If Pods are stuck in the Init state for network reasons, edit the YAML file to use a reachable (e.g. domestic mirror) image source.
Now check the Master node's status again:
(venv36) [root@node01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01.lavenliu.cn Ready control-plane 11m v1.26.3
Joining Nodes to the Cluster
Take the kubeadm join command from the end of the Master initialization output and run it on the Node, as follows:
[root@node02 ~]# kubeadm join 192.168.56.145:6443 --token xaib2m.sur21ugc1gtv20n6 \
> --discovery-token-ca-cert-hash sha256:fb013945c83922934bb1264bd8c74322c24d2e323c4bee8091e81e8594506d04
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
NOTE: a fresh join command (with a new token) can be printed at any time with:
kubeadm token create --print-join-command
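The --discovery-token-ca-cert-hash value can also be recomputed from the cluster CA certificate itself, following the procedure described in the kubeadm documentation. A sketch; the helper function is illustrative:

```shell
# Compute the sha256 discovery hash of the cluster CA's public key.
compute_ca_hash() {
  # $1: path to the CA cert (on the control plane: /etc/kubernetes/pki/ca.crt)
  openssl x509 -pubkey -in "$1" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}
# On the control-plane node:
# compute_ca_hash /etc/kubernetes/pki/ca.crt
```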
Once the node appears in the kubectl get nodes output, it has joined the cluster successfully.
(venv36) [root@node01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01.lavenliu.cn Ready control-plane 22m v1.26.3
node02.lavenliu.cn Ready <none> 112s v1.26.3
(venv36) [root@node01 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node01.lavenliu.cn Ready control-plane 21m v1.26.3 192.168.56.145 <none> CentOS Linux 7 (Core) 3.10.0-1160.88.1.el7.x86_64 containerd://1.6.20
node02.lavenliu.cn Ready <none> 94s v1.26.3 192.168.56.146 <none> CentOS Linux 7 (Core) 3.10.0-1160.88.1.el7.x86_64 containerd://1.6.20
The output of the kubeadm join command is also useful. For example, it reports "[kubelet-start] Writing kubelet configuration to file /var/lib/kubelet/config.yaml"; let's look at that file's content:
[root@node02 ~]# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
anonymous:
enabled: false
webhook:
cacheTTL: 0s
enabled: true
x509:
clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: 0s
cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
flushFrequency: 0
options:
json:
infoBufferSize: "0"
verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Let's also look at the kubelet environment file on the node:
[root@node02 ~]# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"
Next, join the second worker node (node03) to the cluster:
[root@node03 ~]# systemctl enable containerd --now
Created symlink from /etc/systemd/system/multi-user.target.wants/containerd.service to /usr/lib/systemd/system/containerd.service.
[root@node03 ~]# kubeadm join 192.168.56.145:6443 --token dff6se.6wy3zr0oqsnb8yln --discovery-token-ca-cert-hash sha256:fb013945c83922934bb1264bd8c74322c24d2e323c4bee8091e81e8594506d04
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Check the cluster's nodes again:
(venv36) [root@node01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01.lavenliu.cn Ready control-plane 24m v1.26.3
node02.lavenliu.cn Ready <none> 4m47s v1.26.3
node03.lavenliu.cn NotReady <none> 24s v1.26.3
(venv36) [root@node01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5bbd96d687-gxps6 1/1 Running 0 26m
coredns-5bbd96d687-nz84v 1/1 Running 0 26m
etcd-node01.lavenliu.cn 1/1 Running 0 26m
kube-apiserver-node01.lavenliu.cn 1/1 Running 0 26m
kube-controller-manager-node01.lavenliu.cn 1/1 Running 0 26m
kube-proxy-n8gl9 1/1 Running 0 6m31s
kube-proxy-r5zvr 1/1 Running 0 2m8s
kube-proxy-zxxzq 1/1 Running 0 26m
kube-scheduler-node01.lavenliu.cn 1/1 Running 0 26m
(venv36) [root@node01 ~]# kubectl get pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-98qst 1/1 Running 0 17m
kube-flannel-ds-9pcfh 1/1 Running 0 6m51s
kube-flannel-ds-chq8t 1/1 Running 0 2m28s
At this point Kubernetes uses DaemonSets to run flannel and kube-proxy on every node. Once those Pods are up, the nodes are fully deployed.
(venv36) [root@node01 ~]# kubectl get ds --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-flannel kube-flannel-ds 3 3 3 3 3 <none> 17m
kube-system kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 27m
How do we add a ROLES label to a Node? First list the Nodes' labels:
(venv36) [root@node01 ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node01.lavenliu.cn Ready control-plane 27m v1.26.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node01.lavenliu.cn,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
node02.lavenliu.cn Ready <none> 7m23s v1.26.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02.lavenliu.cn,kubernetes.io/os=linux
node03.lavenliu.cn Ready <none> 3m v1.26.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node03.lavenliu.cn,kubernetes.io/os=linux
Then add the labels:
[root@linux-node01 ~]# kubectl label nodes node02.lavenliu.cn node-role.kubernetes.io/node=
node/node02.lavenliu.cn labeled
[root@linux-node01 ~]# kubectl label nodes node03.lavenliu.cn node-role.kubernetes.io/node=
node/node03.lavenliu.cn labeled
Check again:
(venv36) [root@node01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01.lavenliu.cn Ready control-plane 28m v1.26.3
node02.lavenliu.cn Ready node 8m21s v1.26.3
node03.lavenliu.cn Ready node 3m58s v1.26.3
Verifying the Cluster
With the cluster deployed, let's verify that it is usable: create an Nginx Pod in the cluster and check that the Nginx service can be accessed. (In the transcripts below, k is a shell alias for kubectl.) As follows:
[root@linux-node01 ~]# kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
[root@linux-node01 ~]# k get po
NAME READY STATUS RESTARTS AGE
nginx-6799fc88d8-4fxbx 0/1 ContainerCreating 0 26s
[root@linux-node01 ~]# kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed
[root@linux-node01 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-6799fc88d8-qm2hh 1/1 Running 0 78s 10.244.2.2 linux-node03.lavenliu.com <none> <none>
[root@linux-node01 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 68m
nginx NodePort 10.108.218.229 <none> 80:31557/TCP 36s
# The two queries can also be run together
[root@linux-node01 ~]# kubectl get pod,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-6799fc88d8-qm2hh 1/1 Running 0 2m53s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 69m
service/nginx NodePort 10.108.218.229 <none> 80:31557/TCP 102s
Let's verify the service:
[root@linux-node01 ~]# curl 10.108.218.229
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
The 10.108.218.229 address above is a cluster-internal IP that outside users cannot reach. However, port 80 of that Service is mapped to NodePort 31557 on every node, so the service can be accessed at <NodeIP>:31557 (for example 192.168.56.145:31557).
We can also easily scale the Nginx deployment for high availability. First check the current number of replicas:
[root@linux-node01 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-6799fc88d8-qm2hh 1/1 Running 0 4m41s
Now scale it up:
[root@linux-node01 ~]# kubectl scale deployment nginx --replicas=3
deployment.extensions/nginx scaled
# Check again; the other two replicas are being created
[root@linux-node01 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-6799fc88d8-7h8z8 0/1 ContainerCreating 0 18s
nginx-6799fc88d8-qm2hh 1/1 Running 0 5m13s
nginx-6799fc88d8-s9f28 0/1 ContainerCreating 0 18s
The other two replicas are in the ContainerCreating state; after a short wait they should all be Running.
Deploying the Dashboard
[root@linux-node01 ~]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
Check the Pod status:
[root@linux-node01 ~]# k get po -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-856586f554-dlndc 0/1 ContainerCreating 0 65s
kubernetes-dashboard-78c79f97b4-56cxr 1/1 Running 0 65s
[root@linux-node01 ~]# k get po -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-856586f554-dlndc 1/1 Running 0 88s
kubernetes-dashboard-78c79f97b4-56cxr 1/1 Running 0 88s
Check the Dashboard Services:
[root@linux-node01 ~]# kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.104.76.155 <none> 8000/TCP 6m51s
kubernetes-dashboard ClusterIP 10.96.47.180 <none> 443/TCP 6m51s
Create a ServiceAccount and bind it to the built-in cluster-admin ClusterRole:
[root@linux-node01 ~]# cat dashboard-adminuser.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kubernetes-dashboard
Apply the manifest:
[root@linux-node01 ~]# kubectl apply -f dashboard-adminuser.yaml
On success, you should see:
serviceaccount/admin-user created
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
Next, retrieve the login token:
[root@linux-node01 ~]# kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
On success, output like the following is printed:
eyJhbGciOiJSUzI1NiIsImtpZCI6IjVMTVlhbHZnUHByVm9xVnZBOU8zLU44TC1fQTFRYnp2bzdoR1ZfWVZGR1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLTVwZ2toIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4MjFmOTljMS0xZjJiLTQ5OWYtYmQyZC0xNWIzMjQzYTYwODciLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.2QNUGHYAR3BBVk8n-WMweVbEExtpNBbhlr-La6R3dSDEGv_ADP422n5_gFHb0npnajEgdEcNzOiU1VKKXs5kZsuZRvb-3LB4XEgncovvjEhoCpD_i2j638VKY-ZfazDdrtu3LSDTd91sSbhiuwLtk6tveeiUVM1uBXSJ-iWS7dPyVJKrxlw-zSBZCWVEUBCJx52VHX-NDH61w8qMzEOhpIbQmVaVh1X2Gm2hNsNEQkxQtfIDf8pAHmvWU8ZXdoaPUqhdZnM7b9jcVnFWxGLYdRxz5OURPksuZhqRlNSuaChC27TipHHKglrQ9qSnuJszzKZ45WQ_len60EU8RFhnfA
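Note: on Kubernetes 1.24 and later, a ServiceAccount no longer gets a long-lived token Secret created automatically, so the "get secret" pipeline above may return nothing. A short-lived bound token can be requested on demand instead; a sketch (the wrapper function is illustrative):

```shell
# Request a bound token for the admin-user ServiceAccount (k8s 1.24+).
dashboard_token() {
  kubectl -n kubernetes-dashboard create token admin-user
}
# Run on a machine with cluster access:
# dashboard_token
```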
Enable external access by switching the Service to NodePort:
[root@linux-node01 ~]# kubectl patch svc kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}' -n kubernetes-dashboard
service/kubernetes-dashboard patched
Check the NodePort that was assigned:
[root@linux-node01 ~]# kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.104.76.155 <none> 8000/TCP 50m
kubernetes-dashboard NodePort 10.96.47.180 <none> 443:32296/TCP 50m
Log in to the Dashboard with the token printed above. In Firefox, open https://<NodeIP>:32296, accept the security exception, paste the token, and click Sign in.
Deploying an Ingress Controller
On every node, pull the controller image in advance, because the deployment step pulls from a registry that may be unreachable from your network (alternatively, configure an outbound proxy for containerd). The commands below use docker; on a containerd-only node, ctr images pull works similarly.
[root@linux-node01 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/kubernetes-fan/ingress-nginx:v0.48.1
[root@linux-node01 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/kubernetes-fan/ingress-nginx:v0.48.1 k8s.gcr.io/ingress-nginx/controller:v0.48.1
[root@linux-node01 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.3 3d174f00aa39 2 weeks ago 126MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.3 6be0dc1302e3 2 weeks ago 50.6MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.3 bc2bb319a703 2 weeks ago 120MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.3 adb2816ea823 2 weeks ago 103MB
k8s.gcr.io/ingress-nginx/controller v0.48.1 ac0e4fe3e6b0 2 weeks ago 279MB
registry.cn-hangzhou.aliyuncs.com/kubernetes-fan/ingress-nginx v0.48.1 ac0e4fe3e6b0 2 weeks ago 279MB
quay.io/coreos/flannel v0.14.0 8522d622299c 2 months ago 67.9MB
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.0 4d217480042e 3 months ago 126MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.0 38ddd85fe90e 3 months ago 122MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.0 09708983cc37 3 months ago 120MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.0 62ad3129eca8 3 months ago 50.6MB
registry.aliyuncs.com/google_containers/pause 3.4.1 0f8457a4c2ec 6 months ago 683kB
coredns/coredns 1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/coredns/coredns v1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/coredns v1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/etcd 3.4.13-0 0369cf4303ff 11 months ago 253MB
Install a version compatible with your cluster; the current support matrix is:
Ingress-nginx version | k8s supported version | Alpine Version | Nginx Version |
---|---|---|---|
v0.48.1 | 1.21, 1.20, 1.19 | 3.13.5 | 1.20.1 |
v0.47.0 | 1.21, 1.20, 1.19 | 3.13.5 | 1.20.1 |
v0.46.0 | 1.21, 1.20, 1.19 | 3.13.2 | 1.19.6 |
Then download the Ingress NGINX Controller manifest:
[root@linux-node01 ~]# wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.3/deploy/static/provider/baremetal/deploy.yaml
Then apply the manifest:
[root@linux-node01 ~]# k apply -f deploy.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
接着查看 ingress-nginx 名称空间下面的 pod 及 service,
[root@linux-node01 ~]# k get rs -n ingress-nginx
NAME DESIRED CURRENT READY AGE
ingress-nginx-controller-57c7885b48 1 1 1 3h14m
[root@node01 ingress]# k -n ingress-nginx get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
ingress-nginx-controller 1/1 1 1 69d
[root@node01 ingress]# k -n ingress-nginx get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx-admission-create--1-plmgd 0/1 Completed 0 69d <none> node03.lavenliu.cn <none> <none>
ingress-nginx-admission-patch--1-nphx5 0/1 Completed 3 69d <none> node02.lavenliu.cn <none> <none>
ingress-nginx-controller-644555766d-tdj4w 1/1 Running 9 (13m ago) 69d 10.244.1.123 node02.lavenliu.cn <none> <none>
[root@linux-node01 ~]# k get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --watch
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-mcggz 0/1 Completed 0 5m49s
ingress-nginx-admission-patch-pg2c4 0/1 Completed 0 5m49s
ingress-nginx-controller-57c7885b48-d9mbn 1/1 Running 0 5m49s
[root@linux-node01 ~]# k get svc -n ingress-nginx
NAME TYPE CLUSTER-IP PORT(S) AGE
ingress-nginx-controller NodePort 10.107.9.236 80:31724/TCP,443:31789/TCP 16m
ingress-nginx-controller-admission ClusterIP 10.102.147.105 443/TCP 16m
验证 Ingress Nginx 版本:
[root@node01 ~]# POD_NAMESPACE=ingress-nginx
[root@node01 ~]# POD_NAME=$(kubectl get pods -n $POD_NAMESPACE -l app.kubernetes.io/name=ingress-nginx --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}')
[root@node01 ~]#
[root@node01 ~]# kubectl exec -it $POD_NAME -n $POD_NAMESPACE -- /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.0.4
Build: 9b78b6c197b48116243922170875af4aa752ee59
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9
-------------------------------------------------------------------------------
创建 Ingress 规则
这里我们使用《Kubernetes In Action》一书上的示例,如下:
[root@linux-node01 ~]# cat app.js
const http = require('http');
const os = require('os');
console.log("Kubia server starting...");
var handler = function(request, response) {
console.log("Received request from " + request.connection.remoteAddress);
response.writeHead(200);
response.end("You've hit " + os.hostname() + "\n");
};
var www = http.createServer(handler);
www.listen(80);
Dockerfile
内容为:
FROM node:7
ADD app.js /app.js
ENTRYPOINT ["node", "app.js"]
构建上述镜像,
[root@linux-node01 ~]# docker build -t lavenliu.cn/kubia:latest .
接下来就准备 Deployment 及 Service,其配置文件为 dep-svc-nginx-http-v1.yaml,
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-dm
spec:
replicas: 2
selector:
matchLabels:
name: nginx-ingress-v1
template:
metadata:
labels:
name: nginx-ingress-v1
spec:
containers:
- name: kubia
image: lavenliu.cn/kubia:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: nginx-ingress-v1-svc
spec:
ports:
- port: 80
targetPort: 80
protocol: TCP
selector:
name: nginx-ingress-v1
最后准备 Ingress 规则文件 ingress-nginx-http-v1.yaml,
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: nginx-ingress-http
spec:
ingressClassName: nginx
rules:
- host: web.lavenliu.cn
http:
paths:
- path: "/"
pathType: Prefix
backend:
service:
name: nginx-ingress-v1-svc
port:
number: 80
使用 apply 来应用上述文件(注意每个文件前都需要 -f 参数),
[root@linux-node01 ~]# k apply -f dep-svc-nginx-http-v1.yaml -f ingress-nginx-http-v1.yaml
验证 Ingress 规则
我们前面创建了 Ingress 相关的资源,接下来就一一验证一下,
[root@linux-node01 ~]# k get deployments -n ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE
ingress-nginx-controller 1/1 1 1 4h13m
[root@linux-node01 ~]# k get svc -n ingress-nginx
NAME TYPE CLUSTER-IP PORT(S) AGE
ingress-nginx-controller NodePort 10.107.9.236 80:31724/TCP,443:31789/TCP 16m
ingress-nginx-controller-admission ClusterIP 10.102.147.105 443/TCP 16m
# 注意上述 NodePort 端口
[root@linux-node01 ~]# k get pod -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create--1-bfw7m 0/1 Completed 0 6m4s
ingress-nginx-admission-patch--1-h7j76 0/1 Completed 0 6m4s
ingress-nginx-controller-6c68f5b657-hjkxt 1/1 Running 0 6m5s
[root@linux-node01 ~]# k get ingress -A
NAME CLASS HOSTS ADDRESS PORTS AGE
nginx-ingress-http nginx web.lavenliu.cn 192.168.56.102 80 23m # 注意这一行
[root@linux-node01 ~]# k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-6799fc88d8-2mjkn 1/1 Running 0 24h 10.244.1.8 linux-node02.lavenliu.com <none> <none>
nginx-6799fc88d8-njhkt 1/1 Running 0 24h 10.244.1.6 linux-node02.lavenliu.com <none> <none>
nginx-6799fc88d8-xvhzg 1/1 Running 0 24h 10.244.1.7 linux-node02.lavenliu.com <none> <none>
nginx-dm-6979f66f9b-fd76r 1/1 Running 0 15m 10.244.2.14 linux-node03.lavenliu.com <none> <none>
nginx-dm-6979f66f9b-phz8g 1/1 Running 0 15m 10.244.2.15 linux-node03.lavenliu.com <none> <none>
接下来就在 /etc/hosts 里设置静态解析,如下:
[root@linux-node01 ~]# echo "192.168.56.102 web.lavenliu.cn" >> /etc/hosts
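直接用 echo 追加,重复执行会在 hosts 文件里产生重复记录。下面是一个带判重的小脚本示意(为便于演示写入当前目录的 hosts.txt,实际使用时把 HOSTS_FILE 换成 /etc/hosts):

```shell
# 实际使用时将 HOSTS_FILE 指向 /etc/hosts;这里默认写入当前目录便于演示
HOSTS_FILE=${HOSTS_FILE:-./hosts.txt}
touch "$HOSTS_FILE"
# 仅当记录不存在时才追加,重复执行不会产生重复行
grep -q 'web\.lavenliu\.cn' "$HOSTS_FILE" || echo "192.168.56.102 web.lavenliu.cn" >> "$HOSTS_FILE"
```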
使用 curl 验证是否可以访问:
[root@linux-node01 ~]# curl http://web.lavenliu.cn:31724/
You've hit nginx-dm-6979f66f9b-fd76r
[root@linux-node01 ~]# curl http://web.lavenliu.cn:31724/
You've hit nginx-dm-6979f66f9b-phz8g
附录
Kubernetes 常用命令
[root@linux-node01 ~]# kubectl version --short
Client Version: v1.22.2
Server Version: v1.22.0
[root@linux-node01 ~]# kubectl cluster-info
Kubernetes control plane is running at https://192.168.56.101:6443
CoreDNS is running at https://192.168.56.101:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# 也可以通过 proxy 登录 dashboard
kubectl proxy --address='192.168.56.101' --accept-hosts='你本机IP或其他IP'
Containerd 安装
每个节点上都进行操作。
[root@linux-node01 ~]# sudo yum -y remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
sudo yum install -y yum-utils
sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
sudo yum -y install containerd.io
生产环境建议使用 Kubernetes 推荐的 Containerd 版本。
Docker 配置代理:
# http
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.example.com,.corp"
# https
[Service]
Environment="HTTPS_PROXY=https://proxy.example.com:443/" "NO_PROXY=localhost,127.0.0.1,docker-registry.example.com,.corp"
sudo systemctl daemon-reload
sudo systemctl restart docker
systemctl show --property=Environment docker
Containerd 配置代理:
# cd /etc/systemd/system
# mkdir containerd.service.d
# cat http_proxy.conf
[Service]
Environment="HTTPS_PROXY=http://192.168.26.1:7890"
# systemctl daemon-reload
# systemctl restart containerd
cd /etc/systemd/system
mkdir containerd.service.d
cd containerd.service.d
cat >> http_proxy.conf <<EOF
[Service]
Environment="HTTPS_PROXY=http://192.168.26.1:7890"
EOF
systemctl daemon-reload
systemctl restart containerd
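上面的步骤也可以写成一个小脚本统一执行。下面是一个示意(DROPIN_DIR 默认写入当前目录便于演示,实际环境应为 /etc/systemd/system/containerd.service.d;代理地址沿用正文示例,NO_PROXY 的集群网段是笔者建议的假设值,按自己的网络规划调整):

```shell
# 生成 containerd 的 systemd 代理配置;实际环境 DROPIN_DIR 应为
# /etc/systemd/system/containerd.service.d
DROPIN_DIR=${DROPIN_DIR:-./containerd.service.d}
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/http_proxy.conf" <<'EOF'
[Service]
Environment="HTTPS_PROXY=http://192.168.26.1:7890"
Environment="NO_PROXY=localhost,127.0.0.1,10.244.0.0/16,10.96.0.0/12"
EOF
```

写入后仍需执行 systemctl daemon-reload && systemctl restart containerd 才能生效;NO_PROXY 中排除 Pod 与 Service 网段,可以避免集群内部流量被错误地送往代理。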
Kubernetes yum 源
每个节点都进行操作。
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
查看 yum 源,
# yum repolist
repo id repo name status
base/7/x86_64 CentOS-7 - Base 10,097
docker-ce-stable/x86_64 Docker CE Stable - x86_64 63
epel/x86_64 Extra Packages for Enterprise Linux 7 - x86_64 13,524
extras/7/x86_64 CentOS-7 - Extras 323
kubernetes Kubernetes 469
updates/7/x86_64 CentOS-7 - Updates 1,446
repolist: 26,059
# 可以看到每个仓库都是有软件包的
三台主机的网络接口情况
Master 节点,
[root@linux-node01 ~]# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 10.244.0.255
inet6 fe80::9065:73ff:fed4:7d84 prefixlen 64 scopeid 0x20<link>
ether 92:65:73:d4:7d:84 txqueuelen 1000 (Ethernet)
RX packets 368873 bytes 29680901 (28.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 371933 bytes 33909051 (32.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:66:ba:29:34 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.101 netmask 255.255.252.0 broadcast 10.20.23.255
inet6 fe80::f816:3eff:fe65:f9f0 prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:65:f9:f0 txqueuelen 1000 (Ethernet)
RX packets 68882167 bytes 9327511897 (8.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 65375269 bytes 9824709459 (9.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.0 netmask 255.255.255.255 broadcast 10.244.0.0
inet6 fe80::f4f7:f3ff:fe03:cb05 prefixlen 64 scopeid 0x20<link>
ether f6:f7:f3:03:cb:05 txqueuelen 0 (Ethernet)
RX packets 29526 bytes 6328985 (6.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 36190 bytes 18577459 (17.7 MiB)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 44051910 bytes 10815496721 (10.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 44051910 bytes 10815496721 (10.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth215a55f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::a479:b2ff:fe96:e058 prefixlen 64 scopeid 0x20<link>
ether a6:79:b2:96:e0:58 txqueuelen 0 (Ethernet)
RX packets 99664 bytes 9416583 (8.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 100486 bytes 9153824 (8.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth44b060c3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::4c19:48ff:fe2f:be93 prefixlen 64 scopeid 0x20<link>
ether 4e:19:48:2f:be:93 txqueuelen 0 (Ethernet)
RX packets 99620 bytes 9411937 (8.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 100549 bytes 9159048 (8.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Node01 节点,
[root@linux-node02 ~]# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.1.1 netmask 255.255.255.0 broadcast 10.244.1.255
inet6 fe80::98f8:f2ff:fecd:971c prefixlen 64 scopeid 0x20<link>
ether 9a:f8:f2:cd:97:1c txqueuelen 1000 (Ethernet)
RX packets 172231 bytes 32573109 (31.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 194263 bytes 32407589 (30.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
inet6 fe80::42:42ff:fead:33ac prefixlen 64 scopeid 0x20<link>
ether 02:42:42:ad:33:ac txqueuelen 0 (Ethernet)
RX packets 550 bytes 1016421 (992.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 712 bytes 74451 (72.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.102 netmask 255.255.252.0 broadcast 10.20.23.255
inet6 fe80::f816:3eff:fee8:fe6c prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:e8:fe:6c txqueuelen 1000 (Ethernet)
RX packets 72955598 bytes 11375346445 (10.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 69181348 bytes 7153485998 (6.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.1.0 netmask 255.255.255.255 broadcast 10.244.1.0
inet6 fe80::f48f:f5ff:fec4:b08b prefixlen 64 scopeid 0x20<link>
ether f6:8f:f5:c4:b0:8b txqueuelen 0 (Ethernet)
RX packets 18113 bytes 7243641 (6.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 12127 bytes 2071014 (1.9 MiB)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1 (Local Loopback)
RX packets 13651253 bytes 1499817454 (1.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 13651253 bytes 1499817454 (1.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth21cda578: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::c846:4ff:fe19:f345 prefixlen 64 scopeid 0x20<link>
ether ca:46:04:19:f3:45 txqueuelen 0 (Ethernet)
RX packets 57760 bytes 5686013 (5.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 61576 bytes 6248851 (5.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth613644be: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::fc1e:d5ff:fe71:a71d prefixlen 64 scopeid 0x20<link>
ether fe:1e:d5:71:a7:1d txqueuelen 0 (Ethernet)
RX packets 114437 bytes 29293468 (27.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 132669 bytes 26156699 (24.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth78f261f4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::1879:ecff:fec7:946f prefixlen 64 scopeid 0x20<link>
ether 1a:79:ec:c7:94:6f txqueuelen 0 (Ethernet)
RX packets 17 bytes 2192 (2.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 29 bytes 2435 (2.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth7cf52e29: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::58bf:a4ff:fe24:ed84 prefixlen 64 scopeid 0x20<link>
ether 5a:bf:a4:24:ed:84 txqueuelen 0 (Ethernet)
RX packets 1 bytes 42 (42.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 816 (816.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth8d40caf9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::fcaf:1fff:fe80:534 prefixlen 64 scopeid 0x20<link>
ether fe:af:1f:80:05:34 txqueuelen 0 (Ethernet)
RX packets 12 bytes 2460 (2.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 27 bytes 1884 (1.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Node02 节点,
[root@linux-node01 ~]# ssh 192.168.56.103 -- ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.2.1 netmask 255.255.255.0 broadcast 10.244.2.255
inet6 fe80::a8e9:8bff:fef9:5a6e prefixlen 64 scopeid 0x20<link>
ether aa:e9:8b:f9:5a:6e txqueuelen 1000 (Ethernet)
RX packets 204273 bytes 35880327 (34.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 230490 bytes 80225311 (76.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:78:09:3b:bd txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.103 netmask 255.255.252.0 broadcast 10.20.23.255
inet6 fe80::f816:3eff:feb8:f638 prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:b8:f6:38 txqueuelen 1000 (Ethernet)
RX packets 68392625 bytes 12186539270 (11.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 58419609 bytes 6423160525 (5.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.2.0 netmask 255.255.255.255 broadcast 10.244.2.0
inet6 fe80::846d:aff:febc:4a0b prefixlen 64 scopeid 0x20<link>
ether 86:6d:0a:bc:4a:0b txqueuelen 0 (Ethernet)
RX packets 25565 bytes 11723480 (11.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16594 bytes 4216241 (4.0 MiB)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1 (Local Loopback)
RX packets 16151762 bytes 1915650922 (1.7 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16151762 bytes 1915650922 (1.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth127cc3b8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::34ea:b2ff:fee9:3b8 prefixlen 64 scopeid 0x20<link>
ether 36:ea:b2:e9:03:b8 txqueuelen 0 (Ethernet)
RX packets 46619 bytes 6431933 (6.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 51052 bytes 16480192 (15.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth6dd91e0b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::f6:b6ff:fe61:f5d0 prefixlen 64 scopeid 0x20<link>
ether 02:f6:b6:61:f5:d0 txqueuelen 0 (Ethernet)
RX packets 18 bytes 1796 (1.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 34 bytes 3556 (3.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth815d5023: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::b085:c0ff:fead:2cad prefixlen 64 scopeid 0x20<link>
ether b2:85:c0:ad:2c:ad txqueuelen 0 (Ethernet)
RX packets 18 bytes 1796 (1.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 33 bytes 3514 (3.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
部署 metrics server
[root@node01 ~]# kubectl \
apply \
-f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
[root@node01 ~]# k get apiservice
NAME SERVICE AVAILABLE AGE
v1. Local True 7d19h
v1.admissionregistration.k8s.io Local True 7d19h
v1.apiextensions.k8s.io Local True 7d19h
v1.apps Local True 7d19h
v1.authentication.k8s.io Local True 7d19h
v1.authorization.k8s.io Local True 7d19h
v1.autoscaling Local True 7d19h
v1.batch Local True 7d19h
v1.certificates.k8s.io Local True 7d19h
v1.coordination.k8s.io Local True 7d19h
v1.discovery.k8s.io Local True 7d19h
v1.events.k8s.io Local True 7d19h
v1.networking.k8s.io Local True 7d19h
v1.node.k8s.io Local True 7d19h
v1.policy Local True 7d19h
v1.rbac.authorization.k8s.io Local True 7d19h
v1.scheduling.k8s.io Local True 7d19h
v1.storage.k8s.io Local True 7d19h
v1beta1.batch Local True 7d19h
v1beta1.discovery.k8s.io Local True 7d19h
v1beta1.events.k8s.io Local True 7d19h
v1beta1.flowcontrol.apiserver.k8s.io Local True 7d19h
v1beta1.metrics.k8s.io kube-system/metrics-server False (MissingEndpoints) 25s
v1beta1.node.k8s.io Local True 7d19h
v1beta1.policy Local True 7d19h
v1beta1.storage.k8s.io Local True 7d19h
v2beta1.autoscaling Local True 7d19h
v2beta2.autoscaling Local True 7d19h
查看一下 metrics-server POD 的信息,
[root@node01 ~]# kubectl get pod -n kube-system|grep metrics-server
metrics-server-7b9c4d7fd9-wkwv2 0/1 ImagePullBackOff 0 2m30s
没有创建成功是因为镜像没有下载下来。修改上述配置文件,换用国内可访问的镜像仓库(这里使用腾讯云上的同步镜像),并增加 --kubelet-insecure-tls 参数,
[root@node01 ~]# grep image components.yaml
- --kubelet-insecure-tls # 新增这一行,不使用 TLS
image: ccr.ccs.tencentyun.com/mirrors/metrics-server:v0.5.0 # 修改为这个镜像地址
imagePullPolicy: IfNotPresent
再次查看 POD 状态,
[root@node01 ~]# k -n kube-system get po |grep metrics
metrics-server-6b9759d4b-ss2ng 1/1 Running 0 2m18s
之后就可以查看集群的资源使用情况了,
[root@node01 ~]# k top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node01.lavenliu.cn 422m 21% 993Mi 26%
node02.lavenliu.cn 112m 5% 578Mi 15%
node03.lavenliu.cn 131m 6% 535Mi 14%
[root@node01 ~]# k top po
NAME CPU(cores) MEMORY(bytes)
nginx-6799fc88d8-zrtd5 0m 3Mi
参考文档:https://particule.io/en/blog/kubeadm-metrics-server/。
镜像加速
我们在做实验或工作中经常会遇到网络问题,导致拉取镜像速度慢或者无处拉取。遇到这种情况其实是非常有挫败感的。对于不能直接拉取的镜像我们采取间接的方式,以下是给出的解决方案,供大家参考使用。
另外可以使用代理上网。
docker.io 加速
如果使用阿里云的镜像加速服务,可以在阿里云控制台查看自己的加速地址,然后写进配置文件即可。如下:
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://hsdlmjc3.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
gcr.io 加速
k8s.gcr.io 加速
docker pull k8s.gcr.io/<image_name>:<version> 相当于 docker pull gcr.io/google-containers/<image_name>:<version>。如果使用 Azure 中国镜像,应该这样拉取:
docker pull gcr.azk8s.cn/google-containers/<image_name>:<version>
如果不行,就使用下面的方案,
# 原始命令
docker pull k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
# 使用国内第三方(网友)同步仓库
docker pull gcrxio/kubernetes-dashboard-amd64:v1.10.0
docker pull anjia0532/kubernetes-dashboard-amd64:v1.10.0
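"原始镜像名 → 同步仓库镜像名"的对应关系可以用一个小函数来表达,方便批量改写拉取命令(仓库前缀 anjia0532 沿用上文示例,换成实际可用的同步仓库即可):

```shell
# 把 k8s.gcr.io 镜像名映射为第三方同步仓库镜像名
# 前缀 anjia0532 仅为示例,按实际使用的同步仓库调整
mirror_image() {
  echo "$1" | sed 's#^k8s\.gcr\.io/#anjia0532/#'
}

mirror_image k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
# 输出: anjia0532/kubernetes-dashboard-amd64:v1.10.0
```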
quay 加速
docker pull quay.azk8s.cn/coreos/flannel:v0.11.0-amd64
命令行补全
如果有需要,请先安装 bash-completion
包。
yum install bash-completion -y
然后在 ~/.bashrc 中设置:
echo 'source <(kubectl completion bash)' >>~/.bashrc
kubectl completion bash >/etc/bash_completion.d/kubectl
echo 'alias k=kubectl' >>~/.bashrc
echo 'complete -F __start_kubectl k' >>~/.bashrc
设置完成,退出当前 Bash,重新登陆即可。
Zsh 的设置如下:
source <(kubectl completion zsh)
echo 'alias k=kubectl' >>~/.zshrc
echo 'complete -F __start_kubectl k' >>~/.zshrc
排错
如果遇到问题,请及时查看 /var/log/messages
日志文件,或者在命令行使用如下命令查看,
[root@linux-node01 ~]# journalctl -xe
在配置稍高的虚拟机(2c8g 或 4c8g)上部署会非常顺利。
CoreDNS 镜像问题
使用新版本的 Kubernetes 部署时,提示 CoreDNS 找不到镜像,那是因为 CoreDNS 改名字了。如何解决?
[root@linux-node01 ~]# docker pull coredns/coredns:1.8.0
[root@linux-node01 ~]# docker tag coredns/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
[root@linux-node01 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.3 3d174f00aa39 12 days ago 126MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.3 6be0dc1302e3 12 days ago 50.6MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.3 adb2816ea823 12 days ago 103MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.3 bc2bb319a703 12 days ago 120MB
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.0 4d217480042e 3 months ago 126MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.0 38ddd85fe90e 3 months ago 122MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.0 09708983cc37 3 months ago 120MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.0 62ad3129eca8 3 months ago 50.6MB
registry.aliyuncs.com/google_containers/pause 3.4.1 0f8457a4c2ec 6 months ago 683kB
coredns/coredns 1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/coredns v1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/etcd 3.4.13-0 0369cf4303ff 11 months ago 253MB
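上述 pull + tag 的改名规律可以写成一个小函数,重点是 tag 前要补一个 v(coredns/coredns 的 tag 是 1.8.0,而 kubeadm 期望的是 v1.8.0):

```shell
# 根据 CoreDNS 版本号生成 kubeadm 期望的阿里云镜像名(tag 前补 v)
aliyun_coredns_image() {
  echo "registry.aliyuncs.com/google_containers/coredns:v$1"
}

# 实际操作即:
#   docker pull coredns/coredns:1.8.0
#   docker tag coredns/coredns:1.8.0 "$(aliyun_coredns_image 1.8.0)"
aliyun_coredns_image 1.8.0
# 输出: registry.aliyuncs.com/google_containers/coredns:v1.8.0
```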
Migrating to the systemd driver
To change the cgroup driver of an existing kubeadm cluster to systemd in-place, a similar procedure to a kubelet upgrade is required. This must include both steps outlined below.
Note: Alternatively, it is possible to replace the old nodes in the cluster with new ones that use the systemd driver. This requires executing only the first step below before joining the new nodes and ensuring the workloads can safely move to the new nodes before deleting the old nodes.
Modify the kubelet ConfigMap
- Find the kubelet ConfigMap name using kubectl get cm -n kube-system | grep kubelet-config.
[root@linux-node01 ~]# kubectl get cm -n kube-system | grep kubelet-config
kubelet-config-1.21   1      22h
- Call kubectl edit cm kubelet-config-x.yy -n kube-system (replace x.yy with the Kubernetes version).
[root@linux-node01 ~]# kubectl edit cm kubelet-config-1.21 -n kube-system
- Either modify the existing cgroupDriver value or add a new field that looks like this:
cgroupDriver: systemd
This field must be present under the kubelet: section of the ConfigMap.
Update the cgroup driver on all nodes
For each node in the cluster:
- Drain the node using kubectl drain <node-name> --ignore-daemonsets
[root@linux-node01 ~]# kubectl drain linux-node02.lavenliu.com --ignore-daemonsets
node/linux-node02.lavenliu.com cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-8dbv6, kube-system/kube-proxy-mtxpb
evicting pod default/nginx-6799fc88d8-s9f28
evicting pod default/nginx-6799fc88d8-7h8z8
pod/nginx-6799fc88d8-7h8z8 evicted
pod/nginx-6799fc88d8-s9f28 evicted
node/linux-node02.lavenliu.com evicted
[root@linux-node01 ~]# k get nodes
NAME                        STATUS                     ROLES                  AGE   VERSION
linux-node01.lavenliu.com   Ready                      control-plane,master   22h   v1.21.3
linux-node02.lavenliu.com   Ready,SchedulingDisabled   node                   21h   v1.21.3
linux-node03.lavenliu.com   Ready                      node                   21h   v1.21.3
[root@linux-node01 ~]# k get po -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE                        NOMINATED NODE   READINESS GATES
nginx-6799fc88d8-pnvlt   1/1     Running   0          5m38s   10.244.2.5   linux-node03.lavenliu.com   <none>           <none>
nginx-6799fc88d8-qm2hh   1/1     Running   0          21h     10.244.2.2   linux-node03.lavenliu.com   <none>           <none>
nginx-6799fc88d8-wg72c   1/1     Running   0          5m38s   10.244.2.6   linux-node03.lavenliu.com   <none>           <none>
# 可以看到原来的 Pods 都已经调度到其他节点上了
- Stop the kubelet using systemctl stop kubelet
- Stop the container runtime
- Modify the container runtime cgroup driver to systemd
[root@linux-node02 ~]# cat /etc/docker/daemon.json
{
    "registry-mirrors": ["https://hsdlmjc3.mirror.aliyuncs.com"],
    "exec-opts": ["native.cgroupdriver=systemd"]
}
# 其中 exec-opts 为新增的一行
- Set cgroupDriver: systemd in /var/lib/kubelet/config.yaml
- Start the container runtime
[root@linux-node02 ~]# systemctl start docker
[root@linux-node02 ~]# docker info |grep Cgroup
 Cgroup Driver: systemd
 Cgroup Version: 1
- Start the kubelet using systemctl start kubelet
- Uncordon the node using kubectl uncordon <node-name>
[root@linux-node01 ~]# kubectl uncordon linux-node02.lavenliu.com
node/linux-node02.lavenliu.com uncordoned
[root@linux-node01 ~]# k get nodes
NAME                        STATUS   ROLES                  AGE   VERSION
linux-node01.lavenliu.com   Ready    control-plane,master   22h   v1.21.3
linux-node02.lavenliu.com   Ready    node                   21h   v1.21.3
linux-node03.lavenliu.com   Ready    node                   21h   v1.21.3
Execute these steps on nodes one at a time to ensure workloads have sufficient time to schedule on different nodes.
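每台节点上修改 /etc/docker/daemon.json 的那一步,可以写成如下脚本(DOCKER_ETC 默认写入当前目录便于演示,实际环境应为 /etc/docker;镜像加速地址沿用正文示例):

```shell
# 实际环境 DOCKER_ETC 应为 /etc/docker
DOCKER_ETC=${DOCKER_ETC:-./docker-etc}
mkdir -p "$DOCKER_ETC"
# 同时写入镜像加速与 systemd cgroup driver 配置
cat > "$DOCKER_ETC/daemon.json" <<'EOF'
{
  "registry-mirrors": ["https://hsdlmjc3.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# 写完后校验 JSON 格式,避免 docker 因配置文件语法错误起不来
python3 -m json.tool "$DOCKER_ETC/daemon.json" >/dev/null && echo "daemon.json OK"
```

daemon.json 不支持注释,手工编辑时不要把 # 注释写进文件里;写完建议像上面一样先做一次 JSON 格式校验再重启 docker。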
Once the process is complete ensure that all nodes and workloads are healthy. 查看集群组件状态,
[root@linux-node01 ~]# k get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}