Machines
master0:172.16.142.131
work0:172.16.142.132
System versions
Details may differ slightly in other environments.
ubuntu:22.04.1
docker:20.10.21
kubeadm:1.27.3
kubelet:1.27.3
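Environment preparation
Every step in this section must be done on all nodes, master and worker alike. These are the basic requirements Kubernetes places on the environment.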
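1.1 Configure the hostname and the hosts file
Give each machine its own hostname, e.g. on the master:
$ sudo hostnamectl set-hostname master0
Then add every node to /etc/hosts:
$ sudo vim /etc/hosts
For example, insert:
172.16.142.131 master0
172.16.142.132 work0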
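If you would rather script the hosts entries than edit the file by hand, a minimal idempotent sketch could look like this. add_host_entry is a hypothetical helper, not part of any tool. It appends a line only when no uncommented entry for that hostname exists yet. Writing to /etc/hosts needs root, so paste it into a root shell (sudo -i) before calling it.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
# Usage: add_host_entry IP NAME [FILE]   (FILE defaults to /etc/hosts, which needs root)
add_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  # Exit 0 if NAME appears as a hostname (field 2 onward) on a non-comment line.
  if awk -v n="$name" '
      !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file"; then
    return 0   # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Usage on each node: add_host_entry 172.16.142.131 master0 and add_host_entry 172.16.142.132 work0. Running it a second time changes nothing.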
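1.2 Enable kernel forwarding and bridge filtering
The net.bridge.* keys only exist after the br_netfilter module is loaded. Load it first (together with overlay, as the official docs do) and make it persistent:
$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
$ sudo modprobe overlay
$ sudo modprobe br_netfilter
Then set the kernel parameters: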
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
$ sudo sysctl --system
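To confirm that the settings took effect, you can read the live values back. The sketch below is my own check, not an official one. It relies on the fact that each sysctl key maps to a file under /proc/sys, with the dots replaced by slashes. The optional root argument exists only so the function can be tried against a fake directory. The function name is hypothetical.

```shell
# Hypothetical check: print any required kernel setting that is missing or not 1.
# ROOT defaults to /proc/sys; pass another directory only to test the function.
verify_k8s_sysctls() {
  local root="${1:-/proc/sys}" key val rc=0
  # e.g. the key net.ipv4.ip_forward lives at /proc/sys/net/ipv4/ip_forward
  for key in net/bridge/bridge-nf-call-iptables \
             net/bridge/bridge-nf-call-ip6tables \
             net/ipv4/ip_forward; do
    if [ ! -r "$root/$key" ]; then
      echo "missing: $key (is br_netfilter loaded?)"
      rc=1
      continue
    fi
    val=$(cat "$root/$key")
    if [ "$val" != "1" ]; then
      echo "not enabled: $key=$val"
      rc=1
    fi
  done
  return "$rc"
}
```

On a node, verify_k8s_sysctls && echo ok should print only "ok".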
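1.3 Disable swap
Swap must be disabled for the kubelet to work properly.
sudo swapoff -a disables swap only until the next reboot. To keep it off after a reboot, also disable swap wherever your system configures it, e.g. /etc/fstab or systemd.swap. The sed command below comments out the swap entries in /etc/fstab: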
$ sudo swapoff -a
$ sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab
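The sketch below checks both halves of this step: no swap is active now, and none is left enabled in fstab. The function names are hypothetical. swapon --show prints nothing when no swap area is in use. The awk filter lists uncommented fstab lines whose filesystem type (third field) is swap.

```shell
# Hypothetical helper: print fstab lines that still enable swap, i.e. uncommented
# lines whose third field (filesystem type) is "swap".
active_swap_entries() {
  awk '!/^[ \t]*#/ && $3 == "swap"' "${1:-/etc/fstab}"
}

# Succeeds only if no swap is active right now and none is configured in fstab.
swap_fully_disabled() {
  # swapon --show prints nothing when no swap area is in use
  [ -z "$(swapon --show --noheadings 2>/dev/null)" ] &&
    [ -z "$(active_swap_entries "${1:-/etc/fstab}")" ]
}
```

After the two commands above, swap_fully_disabled && echo ok should print "ok".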
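- Install Docker
2.1 Install Docker using the Aliyun mirror
Ubuntu's own packages come from the system's default source (http://cn.archive.ubuntu.com/ubuntu/), so no mirror needs to be configured for them. Only the Docker CE repository is added from Aliyun. apt-key is deprecated on 22.04 and prints a warning, but it still works: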
$ sudo apt update
$ sudo apt -y install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt -y update
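$ sudo apt install -y docker.io
Note that docker.io is Ubuntu's own Docker package from the default archive. It is not Docker CE from the Aliyun repository added above. To install Docker CE instead, use sudo apt install -y docker-ce docker-ce-cli containerd.io.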
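2.2 Set Docker's cgroup driver to systemd
The kubelet uses the systemd cgroup driver by default (kubeadm has set it that way since v1.22), so Docker has to use the same driver. Write the config file and restart Docker so the change takes effect:
$ cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
$ sudo systemctl restart docker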
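After restarting Docker, you can confirm the change with docker info. Its --format template exposes the CgroupDriver field. The wrapper below is a hypothetical convenience, not part of Docker.

```shell
# Hypothetical check: succeed only if Docker reports the systemd cgroup driver.
check_docker_cgroup_driver() {
  local driver
  driver=$(docker info --format '{{.CgroupDriver}}') || return 1
  if [ "$driver" = "systemd" ]; then
    echo "ok: cgroup driver is systemd"
  else
    echo "cgroup driver is '$driver', expected systemd (was docker restarted?)" >&2
    return 1
  fi
}
```

Run it as a user that can reach the Docker daemon (root, or a member of the docker group).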
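- Install cri-dockerd
Since v1.24, Kubernetes no longer includes dockershim, so Docker Engine cannot be used as the container runtime directly. The cri-dockerd adapter has to be installed and configured separately. Here it is installed from the .deb package: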
$ wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.3/cri-dockerd_0.3.3.3-0.ubuntu-jammy_amd64.deb
$ sudo dpkg -i cri-dockerd_0.3.3.3-0.ubuntu-jammy_amd64.deb
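Edit the ExecStart line in the unit file so that the pause (sandbox) image is pulled from the Aliyun registry instead of registry.k8s.io. Do this on every node, workers included: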
$ sudo vim /lib/systemd/system/cri-docker.service
ExecStart=/usr/bin/cri-dockerd --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 --container-runtime-endpoint fd://
$ sudo systemctl daemon-reload
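$ sudo systemctl restart cri-docker
$ sudo systemctl enable cri-docker
Use restart rather than start. If the package has already started the service, start leaves the old process running with its default flags.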
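This flag is easy to forget on one of the nodes, so it can help to print which sandbox image each node's cri-docker unit actually requests. The sketch below reads systemctl cat cri-docker.service, which includes any drop-in overrides, or reads a unit file passed as an argument. Empty output means the flag is absent and cri-dockerd falls back to its built-in default image. The function name is hypothetical.

```shell
# Hypothetical helper: print the --pod-infra-container-image value from the
# cri-docker unit. With no argument it reads `systemctl cat cri-docker.service`
# (unit file plus drop-ins); with an argument it reads that file instead.
# Empty output means the flag is not set, so cri-dockerd uses its built-in default.
cri_dockerd_sandbox_image() {
  if [ -n "${1:-}" ]; then
    cat "$1"
  else
    systemctl cat cri-docker.service
  fi | grep '^ExecStart=' \
     | grep -o -- '--pod-infra-container-image=[^ ]*' \
     | tail -n 1 \
     | cut -d= -f2-
}
```

On every node the output should be registry.aliyuncs.com/google_containers/pause:3.9.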
- Install kubeadm, kubelet and kubectl
$ curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
$ sudo apt update
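$ sudo apt install -y kubeadm kubelet kubectl
To get exactly the 1.27.3 versions listed at the top, name them explicitly, e.g. sudo apt install -y kubeadm=1.27.3-00 kubelet=1.27.3-00 kubectl=1.27.3-00. Running apt-cache madison kubeadm lists the versions the repository offers.
Hold the packages, as the official docs recommend. Otherwise a routine upgrade can move them and break the cluster: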
$ sudo apt-mark hold kubeadm kubelet kubectl
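apt-mark showhold lists held packages one per line. A small hypothetical check that reads that list and reports which of the three packages are not held could look like this:

```shell
# Hypothetical check: read held package names (one per line) on stdin and
# print each Kubernetes package that is missing from that list.
# Usage: apt-mark showhold | missing_holds
missing_holds() {
  local held pkg
  held=$(cat)
  for pkg in kubeadm kubelet kubectl; do
    printf '%s\n' "$held" | grep -qx "$pkg" || echo "$pkg"
  done
}
```

Empty output means all three are held.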
Initialize the cluster on the master node
$ sudo kubeadm init --apiserver-advertise-address=172.16.142.131 \
    --pod-network-cidr=10.10.0.0/16 \
    --image-repository=registry.aliyuncs.com/google_containers \
    --cri-socket=unix:///run/cri-dockerd.sock
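--apiserver-advertise-address: the IP address of this node's network interface, which the API server advertises
--pod-network-cidr: the address range that Pod IPs are allocated from. The network plugin installed later must use the same range
--image-repository=registry.aliyuncs.com/google_containers: pull the control-plane images from the Aliyun mirror
--cri-socket: the container runtime (CRI) endpoint to use. On Ubuntu /var/run is a symlink to /run, so unix:///run/cri-dockerd.sock is the same socket as the Docker Engine entry in the table. Common endpoints: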
Runtime | Unix domain socket |
---|---|
containerd | unix:///var/run/containerd/containerd.sock |
CRI-O | unix:///var/run/crio/crio.sock |
Docker Engine (with cri-dockerd) | unix:///var/run/cri-dockerd.sock |
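You can also keep the same settings in a kubeadm configuration file instead of passing them as flags. That file is also where the criSocket field named in a later error message lives. The sketch below assumes the kubeadm.k8s.io/v1beta3 config API used by kubeadm 1.27 and copies the values from the command above. The KubeletConfiguration part only restates kubeadm's default systemd cgroup driver. write_kubeadm_config and the file name kubeadm-init.yaml are hypothetical.

```shell
# Hypothetical helper: write a kubeadm config equivalent to the flags above.
# Assumes kubeadm 1.27 (config API kubeadm.k8s.io/v1beta3).
write_kubeadm_config() {
  cat > "${1:-kubeadm-init.yaml}" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.16.142.131        # --apiserver-advertise-address
nodeRegistration:
  criSocket: unix:///run/cri-dockerd.sock # --cri-socket
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.3
imageRepository: registry.aliyuncs.com/google_containers   # --image-repository
networking:
  podSubnet: 10.10.0.0/16                 # --pod-network-cidr
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd                     # kubeadm's default; must match Docker
EOF
}
```

Use --config instead of the flags, not together with them:
$ write_kubeadm_config kubeadm-init.yaml
$ sudo kubeadm config images pull --config kubeadm-init.yaml
$ sudo kubeadm init --config kubeadm-init.yaml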
After a successful init, read the log printed to the console. It asks you to perform a few more steps, for example:
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
It also prints a join command for the worker nodes. Save it:
kubeadm join 172.16.142.131:6443 --token d6mdoq.jbajudxpjyqof1a9 \
--discovery-token-ca-cert-hash sha256:dee5a5dd23209d7d24788210725838c5227c4a38a04e63efb7bd553d619ebfcc
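The bootstrap token in this join command expires after 24 hours by default. If it has expired or been lost, you can generate a new command on master0 with kubeadm token create --print-join-command. The hypothetical wrapper below also appends the --cri-socket flag that this cluster's workers need (see the worker section). It points kubeadm at the kubeconfig copied to $HOME/.kube/config above, so it should not need sudo.

```shell
# Hypothetical wrapper, run on master0: mint a new bootstrap token and print a
# complete join command for a worker that uses cri-dockerd.
print_worker_join_command() {
  local join_cmd
  join_cmd=$(kubeadm token create --print-join-command \
               --kubeconfig "$HOME/.kube/config") || return 1
  printf 'sudo %s --cri-socket=unix:///run/cri-dockerd.sock\n' "$join_cmd"
}
```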
At this point, kubectl get pod -n kube-system shows the state of each system Pod:
NAME                              READY   STATUS    RESTARTS       AGE
coredns-7bdc4cb885-ngf86          0/1     Pending   0              21h
coredns-7bdc4cb885-zsjhx          0/1     Pending   0              21h
etcd-master0                      1/1     Running   0              21h
kube-apiserver-master0            1/1     Running   0              21h
kube-controller-manager-master0   1/1     Running   35 (84m ago)   21h
kube-proxy-2q9hd                  1/1     Running   0              21h
kube-scheduler-master0            1/1     Running   45 (15m ago)   21h
coredns stays Pending because no network plugin has been installed yet.
Worker node
On work0, run the join command that master0 printed:
sudo kubeadm join 172.16.142.131:6443 --token d6mdoq.jbajudxpjyqof1a9 --discovery-token-ca-cert-hash sha256:dee5a5dd23209d7d24788210725838c5227c4a38a04e63efb7bd553d619ebfcc
It fails with:
Found multiple CRI endpoints on the host. Please define which one do you wish to use by setting the 'criSocket' field in the kubeadm configuration file: unix:///var/run/containerd/containerd.sock, unix:///var/run/cri-dockerd.sock
To see the stack trace of this error execute with --v=5 or higher
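This happens because the worker has more than one CRI endpoint: containerd, which the docker.io package pulls in as a dependency, and cri-dockerd. The join command therefore has to point at cri-dockerd explicitly:
$ sudo kubeadm join 172.16.142.131:6443 \
    --token d6mdoq.jbajudxpjyqof1a9 \
    --discovery-token-ca-cert-hash sha256:dee5a5dd23209d7d24788210725838c5227c4a38a04e63efb7bd553d619ebfcc \
    --cri-socket=unix:///run/cri-dockerd.sock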
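To see which of the well-known runtime sockets from the table in the master section exist on a node, you can test each path for a Unix socket. This hypothetical helper only approximates kubeadm's own detection and is not its actual code. If it prints more than one line, kubeadm needs --cri-socket.

```shell
# Hypothetical helper: print the well-known CRI sockets that exist on this node
# (paths from the table in the master section). Pass paths to check others.
list_cri_sockets() {
  local sock
  [ "$#" -gt 0 ] || set -- /var/run/containerd/containerd.sock \
                          /var/run/crio/crio.sock \
                          /var/run/cri-dockerd.sock
  for sock in "$@"; do
    # -S is true only for an existing Unix domain socket
    if [ -S "$sock" ]; then
      echo "unix://$sock"
    fi
  done
}
```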
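If something goes wrong, reset the node with sudo kubeadm reset. When several CRI endpoints exist, pass the socket as well:
$ sudo kubeadm reset --cri-socket=unix:///run/cri-dockerd.sock
On success, the output looks like this: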
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
As the last line says, you can now see the new node by running kubectl get nodes on the master.
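Nodes report NotReady until a network plugin is running. The hypothetical wrapper below lists every node whose STATUS column is anything other than plain Ready:

```shell
# Hypothetical helper: print the names of nodes whose STATUS is not exactly "Ready"
# (for example NotReady before a network plugin is running).
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}
```

Empty output after installing flannel means every node is Ready.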
Install the flannel network plugin
- Download kube-flannel.yml from the release page: https://github.com/flannel-io/flannel/releases/tag/v0.22.0
Then edit it:
$ vim kube-flannel.yml
1. In the net-conf.json section, set Network to the pod-network-cidr value given to kubeadm init, i.e. 10.10.0.0/16:
  net-conf.json: |
    {
      "Network": "10.10.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
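2. Here flannel is deployed into kube-system instead of its own kube-flannel namespace, so delete or comment out the Namespace object at the top of the file:
# apiVersion: v1
# kind: Namespace
# metadata:
#   labels:
#     k8s-app: flannel
#     pod-security.kubernetes.io/enforce: privileged
#   name: kube-flannel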
3. Change every remaining namespace: kube-flannel to kube-system, for example in the ClusterRoleBinding:
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system   # was kube-flannel
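The three edits can also be scripted. The sketch below is my own assumption of how to do this with awk and sed; it is not part of flannel. It drops every YAML document whose top-level kind is Namespace, rewrites the Network value, and replaces namespace: kube-flannel with namespace: kube-system. patch_flannel_manifest and kube-flannel.patched.yml are hypothetical names. Diff the result against the original before applying it.

```shell
# Hypothetical helper: print a multi-document YAML file, skipping every document
# whose top-level "kind:" is Namespace. Documents are separated by "---" lines.
drop_namespace_docs() {
  awk '
    function flush() { if (!is_ns) printf "%s", buf; buf = ""; is_ns = 0 }
    /^---[ \t]*$/ { flush(); buf = $0 "\n"; next }
    { buf = buf $0 "\n" }
    /^kind:[ \t]*Namespace[ \t]*$/ { is_ns = 1 }
    END { flush() }
  ' "$1"
}

# Hypothetical helper: apply the three manual edits to kube-flannel.yml.
# Usage: patch_flannel_manifest IN OUT POD_CIDR
patch_flannel_manifest() {
  drop_namespace_docs "$1" |
    sed -e "s#\"Network\":[ ]*\"[0-9./]*\"#\"Network\": \"$3\"#" \
        -e 's/namespace: kube-flannel/namespace: kube-system/' > "$2"
}
```

Usage:
$ patch_flannel_manifest kube-flannel.yml kube-flannel.patched.yml 10.10.0.0/16
$ diff kube-flannel.yml kube-flannel.patched.yml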
- Create the resources from the manifest
$ kubectl apply -f kube-flannel.yml
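Instead of re-running kubectl get pod, you can block until flannel's DaemonSet has rolled out and the coredns Pods are Ready. In the v0.22.0 manifest the DaemonSet is named kube-flannel-ds, and the coredns Pods carry the label k8s-app=kube-dns. wait_for_network is a hypothetical wrapper. Both kubectl subcommands return non-zero on timeout, which is the cue to inspect the Pods with kubectl describe.

```shell
# Hypothetical wrapper: block until flannel and coredns are up, or fail on timeout.
wait_for_network() {
  local wait_timeout="${1:-180s}"
  kubectl -n kube-system rollout status daemonset/kube-flannel-ds --timeout="$wait_timeout" &&
    kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=kube-dns --timeout="$wait_timeout"
}
```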
Listing the Pods again shows two flannel Pods, one per node. One of them is stuck in Init:0/2. Describe it to see why:
$ kubectl describe pod kube-flannel-ds-bswzr -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/kube-flannel-ds-bswzr to work0
Warning FailedCreatePodSandBox 16m (x8 over 21m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "registry.k8s.io/pause:3.6": Error response from daemon: Head "https://asia-northeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6": dial tcp 74.125.204.82:443: i/o timeout
Warning FailedCreatePodSandBox 6m20s (x14 over 15m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "registry.k8s.io/pause:3.6": Error response from daemon: Head "https://asia-northeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6": dial tcp 108.177.97.82:443: i/o timeout
Warning FailedCreatePodSandBox 4m34s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "registry.k8s.io/pause:3.6": Error response from daemon: Head "https://asia-northeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6": dial tcp 142.251.170.82:443: connect: connection refused
Warning FailedCreatePodSandBox 4m19s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "registry.k8s.io/pause:3.6": Error response from daemon: Get "https://registry.k8s.io/v2/": dial tcp: lookup registry.k8s.io: Temporary failure in name resolution
Warning FailedCreatePodSandBox 82s (x6 over 5m38s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "registry.k8s.io/pause:3.6": Error response from daemon: Head "https://asia-northeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6": dial tcp 142.251.170.82:443: i/o timeout
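The sandbox image being pulled is registry.k8s.io/pause:3.6, not the pause:3.9 configured in cri-docker.service, and registry.k8s.io is unreachable from this network. The most likely explanation is that the ExecStart change did not take effect on work0. Either the edit was not made there, or cri-docker was not restarted afterwards. As a result, cri-dockerd fell back to its built-in default sandbox image, which in this version appears to be registry.k8s.io/pause:3.6.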
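Workaround: on work0, pull pause:3.6 from the Aliyun mirror with Docker and tag it with the exact name in the error message (registry.k8s.io, not k8s.gcr.io):
$ sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
$ sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 registry.k8s.io/pause:3.6
A cleaner fix is to make the same ExecStart change on work0 and then run sudo systemctl daemon-reload && sudo systemctl restart cri-docker. cri-dockerd then uses pause:3.9 from the Aliyun registry.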
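If other images hit the same registry.k8s.io timeout, the pull-and-retag workaround generalizes. The helper below is hypothetical. The MIRROR_REPO variable is my own addition and defaults to the Aliyun Hangzhou repository used above. Run it as a user that can reach the Docker daemon (root, or a member of the docker group).

```shell
# Hypothetical helper: pull IMAGE (name:tag as published under registry.k8s.io)
# from a mirror and retag it so the node finds it locally under the original name.
# MIRROR_REPO is an assumption of this sketch, defaulting to the repository used above.
mirror_k8s_image() {
  local image="$1"
  local mirror="${MIRROR_REPO:-registry.cn-hangzhou.aliyuncs.com/google_containers}"
  docker pull "$mirror/$image" &&
    docker tag "$mirror/$image" "registry.k8s.io/$image"
}
```

For the error above: mirror_k8s_image pause:3.6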