Kubernetes Installation and Usage: Pitfalls and Fixes [Detailed Steps]

马尔科夫司机

Last modified 2023-06-10 12:35:12



First published 2021-12-23 12:29:29

Article link: https://blog.csdn.net/marlinlm/article/details/122095760


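This article records five common problems encountered while deploying Kubernetes, together with their solutions: kubeadm init failing to run, failing to connect to the control plane, failing to create pods, a kubeadm join pre-flight check warning, and CoreDNS misbehaving after flannel is installed.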


Pitfall 1: kubeadm init fails because it cannot reach the k8s.gcr.io registry

Run the kubeadm init command:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

The following errors appear:

error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.23.1: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.23.1: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.23.1: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.23.1: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.6: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.5.1-0: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns/coredns:v1.8.6: output: Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1

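Solution:

Add the --image-repository registry.aliyuncs.com/google_containers flag to the kubeadm init command so that the images are pulled from the Aliyun mirror instead of k8s.gcr.io: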

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --image-repository registry.aliyuncs.com/google_containers
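
Before running the full init again, it can help to confirm that the mirror actually serves every image kubeadm needs. The sketch below uses the kubeadm config images list and kubeadm config images pull subcommands with the same mirror. The run helper and the DRY_RUN variable are illustrative names introduced here, not part of kubeadm; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Mirror registry taken from the article; swap in another mirror if needed.
REPO="registry.aliyuncs.com/google_containers"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it for real when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# List the images kubeadm expects, then pull them through the mirror.
prepull_images() {
    run kubeadm config images list --image-repository "$REPO"
    run sudo kubeadm config images pull --image-repository "$REPO"
}

prepull_images
```

Running the script with DRY_RUN=0 in the environment performs the pull. If it succeeds, the ImagePull pre-flight errors above should no longer occur when kubeadm init is given the same --image-repository flag.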

Pitfall 2: kubeadm init times out waiting for the control plane

Run the command:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --image-repository registry.aliyuncs.com/google_containers

The following errors appear:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

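Solution: the log shows that the kubelet is not running. A likely cause is a cgroup driver mismatch: Docker defaults to cgroupfs, while recent kubeadm versions configure the kubelet to use systemd. Switching Docker to the systemd driver fixes it.

1. Edit daemon.json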

sudo vim /etc/docker/daemon.json

{
"exec-opts": ["native.cgroupdriver=systemd"]
}

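2. Restart Docker

sudo systemctl daemon-reload
sudo systemctl restart docker

Done. Run kubeadm init again afterwards; if the failed attempt left state behind, sudo kubeadm reset may be needed first.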
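
To confirm that the two cgroup drivers now agree, compare what Docker reports with what the kubelet is configured to use. The sketch below assumes the kubelet configuration lives at /var/lib/kubelet/config.yaml (where kubeadm normally writes it) and that it contains a top-level cgroupDriver key. The function names are hypothetical helpers written for this example.

```shell
#!/bin/sh
# Extract the cgroupDriver value from a kubelet config file.
# $1: path to the config, usually /var/lib/kubelet/config.yaml (assumption)
kubelet_cgroup_driver() {
    sed -n 's/^cgroupDriver:[[:space:]]*//p' "$1"
}

# Compare the two drivers and report the result.
# $1: driver reported by Docker, $2: driver configured for the kubelet
check_cgroup_drivers() {
    if [ "$1" = "$2" ]; then
        echo "OK: both use $1"
    else
        echo "MISMATCH: docker=$1 kubelet=$2"
        return 1
    fi
}

# On a real node (left commented out so the sketch has no side effects):
# check_cgroup_drivers "$(docker info --format '{{.CgroupDriver}}')" \
#     "$(kubelet_cgroup_driver /var/lib/kubelet/config.yaml)"
```

A MISMATCH result after restarting Docker suggests that daemon.json was not picked up, for example because of a JSON syntax error or a second configuration source overriding it.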

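Pitfall 3: After kubeadm init, creating a new pod fails with: failed to set bridge addr: "cni0" already has an IP address different from XXXX

After kubeadm init, I deployed the flannel network plugin, but the pods never started successfully. Running kubectl describe pod on one of them shows the following events: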

Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Warning  FailedScheduling        12m                    default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled               12m                    default-scheduler  Successfully assigned kube-system/coredns-6d8c4cb4d-6pd2b to debian-1
  Warning  FailedCreatePodSandBox  12m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f9b5c6a4da4db3b43e00ae71baa2388f8f45c2001e5615aae861c6208ad85de1" network for pod "coredns-6d8c4cb4d-6pd2b": networkPlugin cni failed to set up pod "coredns-6d8c4cb4d-6pd2b_kube-system" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.0.0.1/24
  Warning  FailedCreatePodSandBox  12m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "014675b219aa5c6abaa173ccfd5edb23258b2af3ce43ac8a186cac3db040046d" network for pod "coredns-6d8c4cb4d-6pd2b": networkPlugin cni failed to set up pod "coredns-6d8c4cb4d-6pd2b_kube-system" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.0.0.1/24

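Solution: delete the cni0 network interface. It is a leftover from an earlier setup with a stale address; the CNI plugin recreates it with the expected address the next time a pod is created.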

sudo ifconfig cni0 down
sudo ip link delete cni0
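
To check whether cni0 really carries a stale address before deleting it, compare the address on the bridge with the subnet flannel has assigned to the node. The sketch below assumes flannel's usual layout, where the node subnet is written to /run/flannel/subnet.env as FLANNEL_SUBNET; the function names are hypothetical helpers for this example.

```shell
#!/bin/sh
# Read the bridge address flannel expects from its subnet file.
# $1: path to the file, usually /run/flannel/subnet.env (assumption)
expected_bridge_addr() {
    sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}

# Extract the IPv4 address/prefix from `ip -4 -o addr show <dev>` output
# read on stdin: it is the field that follows the word "inet".
bridge_addr_from_ip_output() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { print $(i + 1); exit } }'
}

# On a real node (left commented out so the sketch has no side effects):
# want=$(expected_bridge_addr /run/flannel/subnet.env)
# have=$(ip -4 -o addr show cni0 | bridge_addr_from_ip_output)
# [ "$want" = "$have" ] || echo "cni0 has $have but flannel expects $want"
```

If the two values differ, deleting cni0 as shown above and letting the next pod creation rebuild it is the usual way out.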

Pitfall 4: kubeadm join pre-flight check warning: [WARNING FileExisting-ebtables]: ebtables not found in system path

[WARNING FileExisting-ebtables]: ebtables not found in system path

Yet when I try to install ebtables, apt reports that the package is already installed:

Reading package lists... Done
Building dependency tree
Reading state information... Done
ebtables is already the newest version (2.0.10.4+snapshot20181205-3).
ebtables set to manually installed.
ethtool is already the newest version (1:4.19-1).
ethtool set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.

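Solution: run the command with sudo. ebtables is installed in a system directory (such as /usr/sbin) that is probably not on a regular user's PATH, so kubeadm only finds it when run as root.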

sudo kubeadm join
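
To see why the binary is found under sudo but not as a regular user, look it up against each PATH separately. The in_path function below is a hypothetical helper written for this example; it walks a colon-separated directory list and prints the first executable match.

```shell
#!/bin/sh
# Search a PATH-style directory list for an executable.
# $1: command name, $2: colon-separated list of directories
# Prints the full path and returns 0 on success, returns 1 otherwise.
in_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            IFS=$old_ifs
            echo "$dir/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# On a real node (left commented out so the sketch has no side effects):
# in_path ebtables "$PATH" || echo "not on the current user's PATH"
# in_path ebtables "$(sudo sh -c 'echo "$PATH"')" || echo "not on root's PATH"
```

If only the second lookup succeeds, the warning is purely a PATH issue and the package itself is fine.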

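Pitfall 5: After installing the flannel network plugin, coredns keeps restarting and never becomes ready. Pods also cannot communicate with each other over the network.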

linmao@debian-1:~/kubernetes$ sudo kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE
kube-system   coredns-6d8c4cb4d-fjnzt            0/1     Running   2 (56s ago)   4m46s
kube-system   coredns-6d8c4cb4d-vl54c            0/1     Running   2 (58s ago)   4m46s
kube-system   etcd-debian-1                      1/1     Running   6             4m57s
kube-system   kube-apiserver-debian-1            1/1     Running   6             4m57s
kube-system   kube-controller-manager-debian-1   1/1     Running   2             4m57s
kube-system   kube-flannel-ds-btn8q              1/1     Running   0             15s
kube-system   kube-proxy-9cz5g                   1/1     Running   0             4m46s
kube-system   kube-scheduler-debian-1            1/1     Running   6             4m57s
linmao@debian-1:~/kubernetes$ sudo kubectl logs -f coredns-6d8c4cb4d-fjnzt -n kube-system
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:36120->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:44455->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:47435->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:38028->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:47276->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:54208->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:58839->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:35322->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:52081->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 2333562686986131856.1223269804745438122. HINFO: read udp 10.0.0.2:33026->192.168.1.1:53: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

Solution:

Simply delete the two coredns pods with kubectl delete pod. They are recreated automatically and then start normally.
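
Rather than copying the pod names by hand, the unready coredns pods can be picked out of the kubectl get pods -A listing. The filter below is a hypothetical helper for this example; it assumes the default column layout shown above (NAMESPACE, NAME, READY, ...).

```shell
#!/bin/sh
# Read `kubectl get pods -A` output on stdin and print the names of
# kube-system coredns pods whose READY column starts with "0/".
unready_coredns_pods() {
    awk '$1 == "kube-system" && index($2, "coredns-") == 1 && index($3, "0/") == 1 { print $2 }'
}

# On a real cluster (left commented out so the sketch has no side effects):
# for pod in $(sudo kubectl get pods -A | unready_coredns_pods); do
#     sudo kubectl -n kube-system delete pod "$pod"
# done
```

After the deletion, running kubectl get pods -A again should show freshly created coredns pods; if they stay at 0/1, the underlying network problem has not been resolved.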

Links to other pitfalls:

Fixing kubectl describe pod showing no events

Fixing "Unable to connect to the server: x509: certificate is valid for xxx, not xxx" when accessing a k8s cluster

Error when adding a node to a k8s cluster with kubeadm join: couldn't validate the identity of the API Server

Fixing "incompatible CNI versions; config is "1.0.0", plugin supports ["0.1.0" "0.2.0"...]" when starting a container