Installing Kubernetes v1.26 on Ubuntu 20.04: The Complete Guide


This article walks through installing Kubernetes 1.26.3-00 on Ubuntu 20.04: preparing the environment, configuring the hosts, installing kubeadm, kubectl, and kubelet, and configuring containerd. It also covers cluster initialization, joining a worker node, installing Helm, deploying the Calico network plugin, and troubleshooting a CoreDNS problem, providing a complete end-to-end K8s cluster setup.

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.5 LTS
Release:	20.04
Codename:	focal

The two nodes are named node01 and node02.
Add hostname resolution on both node01 and node02 by writing the following entries into /etc/hosts:

192.168.30.4 node01
192.168.30.5 node02
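
For example, the entries can be appended on each node like this (a sketch; adjust the IPs and hostnames to your environment):

cat <<EOF | sudo tee -a /etc/hosts
192.168.30.4 node01
192.168.30.5 node02
EOF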

Planning

node01 is the control-plane (master) node
node02 is the worker node

Host configuration

Note: run the following on every node.
Load the required kernel modules and set the sysctl parameters so that iptables can see bridged traffic:

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
ip_vs
ip_vs_wrr
ip_vs_sh
ip_vs_rr
EOF

sudo modprobe overlay
sudo modprobe br_netfilter
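
A quick, optional check that the modules are loaded:

lsmod | grep -E 'overlay|br_netfilter'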

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

Apply the kernel parameters:

sudo sysctl --system
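
Optionally verify that the settings took effect (all three values should be 1):

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward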

Install kubeadm, kubectl, and kubelet
Note: run on all nodes.
Configure the Aliyun Kubernetes apt source

# /etc/apt/keyrings may not exist by default on Ubuntu 20.04
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes.gpg

Write the repository definition to the apt sources list:

echo "deb [signed-by=/etc/apt/keyrings/kubernetes.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Install the packages:

sudo apt update
sudo apt install -y kubelet=1.26.3-00 kubeadm=1.26.3-00 kubectl=1.26.3-00
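
Optionally (not in the original article, but common practice), hold the packages so a routine apt upgrade does not bump the Kubernetes version unexpectedly:

sudo apt-mark hold kubelet kubeadm kubectl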

Note: the following command requires containerd to be installed and running (containerd setup is covered below).
Check the version of the pause image that gets pulled and make sure it matches the sandbox_image version in /etc/containerd/config.toml.

sudo kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers
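
To see which image versions kubeadm expects (including the pause image) without pulling, and to check what has already been pulled once crictl is configured (next section), something like this can be used:

kubeadm config images list --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version v1.26.3
crictl images | grep pause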

Configure the crictl tool
Note: doing this only on the master node (node01) is enough; run it on all nodes if needed.
crictl works much like the docker CLI and can be used to inspect and manage containerd images and containers.

cat <<EOF> /etc/crictl.yaml 
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
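
Once containerd is installed and running (next section), a quick check that crictl can reach it:

crictl info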

Install and configure containerd
Note: run on all nodes.

Troubleshooting reference:

"k8s执行crictl images报错" (CSDN blog post about crictl images errors in k8s)


Install containerd:

sudo apt-get install -y containerd

The configuration below follows the official documentation. The link was valid when this article was written; if it no longer works, just follow the steps here.
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
Generate the default configuration
Note: this file does not exist by default, so generate a default config first and then edit it.

mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

Set the cgroup driver
In /etc/containerd/config.toml, find the following section and set SystemdCgroup to true:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

Make sure the disabled_plugins list in /etc/containerd/config.toml does not contain cri; remove it if it does. It should look like this:

disabled_plugins = []

Set sandbox_image in /etc/containerd/config.toml to the pause image pulled by
kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers (see above).

sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
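
The two edits above can also be applied non-interactively; a sketch against the default config generated by containerd config default (double-check the resulting file, and adjust the pause tag if kubeadm reports a different one):

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml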

Enable containerd at boot and start it:

systemctl enable containerd
systemctl start containerd
systemctl restart containerd
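
Optionally confirm containerd is up before continuing:

systemctl status containerd --no-pager
containerd --version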

Initialize the cluster

Note: run on the master node (node01) only.

kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.26.3 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all
root@ubuntu:~# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.26.3 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all
[init] Using Kubernetes version: v1.26.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ubuntu] and IPs [10.96.0.1 192.168.39.6]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost ubuntu] and IPs [192.168.39.6 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost ubuntu] and IPs [192.168.39.6 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s

If an error like the following appears:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

Solution:

See "kubernetes新版本使用kubeadm init的相关问题和解决方法" (51CTO blog post on kubeadm init issues with newer Kubernetes versions).

Print the join command
Note: run only on the master node (node01).
After kubeadm init succeeds it prints the join command. If you lose it, regenerate it with:

kubeadm token create --print-join-command
kubeadm join 192.168.30.4:6443 --token ftb7lz.919ch4z4h6yqihqp --discovery-token-ca-cert-hash sha256:7a9fc662d5bfb999a6551235f30f1f76f277abc40a0a9bf7fd3381670fe1fc98

Configure KUBECONFIG
Note: set this on whichever node you want to manage the cluster from; this article sets it on the master (node01).

echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bashrc
source ~/.bashrc
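
An alternative (and what the kubeadm init output itself suggests for non-root use) is to copy the admin kubeconfig into your home directory:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config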

Join the cluster

Run this only on node02. If more machines are added later, first run all of the "every node" steps above on them, then join the cluster.
On the master node (node01), run kubeadm token create --print-join-command to get the join command.
Then run it to join the cluster:

kubeadm join 192.168.30.4:6443 --token ftb7lz.919ch4z4h6yqihqp --discovery-token-ca-cert-hash sha256:7a9fc662d5bfb999a6551235f30f1f76f277abc40a0a9bf7fd3381670fe1fc98

On success, the output looks similar to this:

[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
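
Back on node01 you should now see both nodes (they may stay NotReady until the network plugin below is installed):

kubectl get nodes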

Install Helm

Note: run on the master node (node01), or on any node that can manage the cluster.
See the Helm documentation:
https://helm.sh/docs/intro/install/
Download the binary release:

wget https://get.helm.sh/helm-v3.11.3-linux-amd64.tar.gz

Extract the archive:

tar zxvf helm-v3.11.3-linux-amd64.tar.gz

Copy the helm binary onto the PATH:

cp linux-amd64/helm /usr/local/bin/

Confirm the installation:

helm version

Output similar to the following means the install succeeded:

version.BuildInfo{Version:"v3.11.3", GitCommit:"323249351482b3bbfc9f5004f65d400aa70f9ae7", GitTreeState:"clean", GoVersion:"go1.20.3"}

Deploy the network plugin
Note: run on the master node (node01).
This article installs Calico via Helm.

Add the Helm repo first:

helm repo add projectcalico https://projectcalico.docs.tigera.io/charts

After adding the repo, update it and search for the chart:

helm repo update 

helm search repo projectcalico

Output:

NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
projectcalico/tigera-operator	v3.25.1      	v3.25.1    	Installs the Tigera operator for Calico

Download Calico
Note: the version you pull does not have to match the latest one in the repo, but Kubernetes and Calico do have a version compatibility matrix; see the official requirements page for details:
https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements

helm pull projectcalico/tigera-operator --version v3.25.1

Extract the Calico chart:

tar zxvf tigera-operator-v3.25.1.tgz

Install Calico:

helm install calico -n kube-system --create-namespace -f tigera-operator/values.yaml tigera-operator

If the install fails with the following error, KUBECONFIG has not been set:

Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp 127.0.0.1:8080: connect: connection refused

Fix:

export KUBECONFIG=/etc/kubernetes/admin.conf

Run the install again:

helm install calico -n kube-system --create-namespace -f tigera-operator/values.yaml tigera-operator

Output like the following means the Helm release was installed successfully:

NAME: calico
LAST DEPLOYED: Sat May 13 19:26:37 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

By default Calico is installed into the calico-system namespace. Check that the pods in that namespace are healthy; if something goes wrong, see my separate post on troubleshooting Calico networking issues.

kubectl get pod -n calico-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-6bb86c78b4-p4hmv   1/1     Running            0          86m
calico-node-gzwwd                          1/1     Running            0          86m
calico-node-k2vkc                          1/1     Running            0          86m
calico-typha-674597d59d-4dknd              1/1     Running            0          86m
csi-node-driver-cwwf2                      0/2     ImagePullBackOff   0          86m
csi-node-driver-k9lkh                      1/2     ImagePullBackOff   0          86m

Install calicoctl
Note: install it on every node.
If the Ubuntu hosts cannot reach GitHub, the download may fail. One workaround: on a machine that can resolve the names (e.g. a Windows cmd prompt), run ping github.com and ping objects.githubusercontent.com and note the IP addresses returned (the original post shows a screenshot of this output).

Then add the resolved addresses to /etc/hosts on the Ubuntu nodes:

20.205.243.166 github.com
185.199.111.133 objects.githubusercontent.com

Download the calicoctl client. Its version should match the server side; the server version here is v3.25.1, so the command below pins that release instead of using latest.
Download command (run on every node):

curl -L https://github.com/projectcalico/calico/releases/download/v3.25.1/calicoctl-linux-amd64 -o calicoctl-v3.25.1

Copy the binary onto the PATH:

cp calicoctl-v3.25.1 /usr/local/bin/calicoctl

Make it executable:

chmod +x /usr/local/bin/calicoctl

Run the following on node01 and node02 to confirm that Calico is working.
Note that PEER ADDRESS must show the peer node's IP address; otherwise calico-node will sit at 0/1 Running (unhealthy).

calicoctl node status

Output on node01:

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.30.5 | node-to-node mesh | up    | 11:29:01 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Output on node02:

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.30.4 | node-to-node mesh | up    | 11:29:00 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Troubleshooting an unhealthy CoreDNS

kubectl logs -f  coredns-5bbd96d687-zcz4q -n kube-system
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.9.3
linux/amd64, go1.18.2, 45b0a11
[FATAL] plugin/loop: Loop (127.0.0.1:35209 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 5731636628710972604.1610418445198542983."

The log already points to the page that explains this problem:
https://coredns.io/plugins/loop/#troubleshooting
According to that page:
A CoreDNS log containing "Loop ... detected ..." means the loop detection plugin has found an infinite forwarding loop in one of the upstream DNS servers. This is a fatal error, because operating with an infinite loop will consume memory and CPU until the host eventually runs out of memory and dies. A forwarding loop is usually caused by: most commonly, CoreDNS forwarding requests directly to itself, for example via a loopback address such as 127.0.0.1, ::1 or 127.0.0.53; or CoreDNS forwarding to an upstream server that in turn forwards requests back to CoreDNS. To fix the problem, look in your Corefile for any forward directives for the zone in which the loop was detected and make sure they do not forward to a local address or to another DNS server that forwards back to CoreDNS. If forward is using a file (e.g. /etc/resolv.conf), make sure that file does not contain local addresses.
When a CoreDNS Pod deployed in Kubernetes detects a loop, the Pod goes into CrashLoopBackOff, because Kubernetes keeps restarting the Pod every time CoreDNS detects the loop and exits.

A common cause of forwarding loops in Kubernetes clusters is interaction with a local DNS cache on the host node (such as systemd-resolved). For example, in certain configurations systemd-resolved puts the loopback address 127.0.0.53 into /etc/resolv.conf as the nameserver. By default Kubernetes (via kubelet) passes this /etc/resolv.conf to all Pods using the default DNS policy, including the CoreDNS Pods, leaving them unable to do DNS lookups. CoreDNS uses /etc/resolv.conf as the list of upstreams to forward requests to; since it contains a loopback address, CoreDNS ends up forwarding requests to itself.

There are several ways to work around this; some of them are listed below (a sketch of the first option follows this list):

Add resolvConf: <path-to-your-real-resolv-conf-file> to the kubelet configuration yaml (or use the --resolv-conf command-line flag, deprecated in 1.10). The "real" resolv.conf contains the actual IP addresses of the upstream servers and no local/loopback addresses. This tells kubelet to pass an alternative resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is usually the location of the "real" resolv.conf, although this can vary by distribution.
Disable the local DNS cache on the host nodes and restore /etc/resolv.conf to its original state.
A quick and dirty fix is to edit the Corefile, replacing forward . /etc/resolv.conf with the IP address of your upstream DNS, for example forward . 8.8.8.8. But this only fixes CoreDNS itself; kubelet will keep passing the invalid resolv.conf to all Pods with the default dnsPolicy, and those Pods will still be unable to resolve DNS.
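
A minimal sketch of the first workaround on a node running systemd-resolved, assuming kubeadm's default kubelet config path /var/lib/kubelet/config.yaml (if the file has no resolvConf field yet, add the line by hand instead of using sed):

# point kubelet at the resolv.conf written by systemd-resolved instead of the 127.0.0.53 stub
sudo sed -i 's#^resolvConf:.*#resolvConf: /run/systemd/resolve/resolv.conf#' /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet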

Fixing the CoreDNS problem

cat /etc/netplan/00-installer-config.yaml

Add the following nameservers block:

nameservers:
addresses: [114.114.114.114]
The modified /etc/netplan/00-installer-config.yaml:

network:
  ethernets:
    ens33:
      addresses: [192.168.30.4/24]
      routes:
      - to: "default"
        via: "192.168.30.2"
      nameservers:
        addresses: [114.114.114.114]

Apply the configuration:

netplan apply
Then delete the CoreDNS pods and wait for them to be recreated (see the command after the steps below).
Note: editing the netplan file and running sudo netplan apply alone is not enough; /etc/resolv.conf also needs to be a symlink to the resolv.conf generated by systemd-resolved (see the link in the original post).
The concrete steps (not performed in this article):
1. sudo rm -rf /etc/resolv.conf
2. sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
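
The CoreDNS pod deletion mentioned above, as a concrete command (the CoreDNS pods carry the k8s-app=kube-dns label):

kubectl -n kube-system delete pod -l k8s-app=kube-dns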
Confirm the cluster status
All pods should be Running:

kubectl get pod -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE
calico-apiserver   calico-apiserver-565d577889-blchx          1/1     Running   0          105m
calico-apiserver   calico-apiserver-565d577889-qwkwz          1/1     Running   0          105m
calico-system      calico-kube-controllers-6bb86c78b4-p4hmv   1/1     Running   0          110m
calico-system      calico-node-gzwwd                          1/1     Running   0          110m
calico-system      calico-node-k2vkc                          1/1     Running   0          110m
calico-system      calico-typha-674597d59d-4dknd              1/1     Running   0          110m
calico-system      csi-node-driver-cwwf2                      2/2     Running   0          110m
calico-system      csi-node-driver-k9lkh                      2/2     Running   0          110m
kube-system        coredns-5bbd96d687-pxm2v                   1/1     Running   0          6m1s
kube-system        coredns-5bbd96d687-vqjfr                   1/1     Running   0          3m23s
kube-system        etcd-node01                                1/1     Running   0          162m
kube-system        kube-apiserver-node01                      1/1     Running   0          162m
kube-system        kube-controller-manager-node01             1/1     Running   0          162m
kube-system        kube-proxy-gw294                           1/1     Running   0          130m
kube-system        kube-proxy-pkq42                           1/1     Running   0          162m
kube-system        kube-scheduler-node01                      1/1     Running   0          162m
kube-system        tigera-operator-5d6845b496-n6cq4           1/1     Running   0          110m
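
Finally, both nodes should report Ready (output will reflect your own hostnames and versions):

kubectl get nodes -o wide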
