Deploying Kubernetes on CentOS 7
Deployment layout: one master + one node
OS information:
[root@zf zhangfeng]# uname -a
Linux zf.master 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@zf zhangfeng]# cat /proc/version
Linux version 3.10.0-1062.1.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Sep 13 22:55:44 UTC 2019
[root@zf zhangfeng]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
1. System configuration
1.1 Set hostnames
On the master node:
hostnamectl set-hostname zf.master
On the worker node:
hostnamectl set-hostname zf.node1
1.2 Configure name resolution (required on both master and node)
Run the following on both master and node:
cat <<EOF >>/etc/hosts
192.168.1.4 zf.master
192.168.1.143 zf.node1
EOF
1.3 Disable the firewall, SELinux, and swap
systemctl disable firewalld --now
setenforce 0
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
swapoff -a
echo "vm.swappiness = 0">> /etc/sysctl.conf
sed -i 's/.*swap.*/#&/' /etc/fstab
sysctl -p
1.4 Kernel parameters: pass bridged IPv4 traffic to the iptables chains
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
1.5 Configure package repositories
1) Point the base and epel yum repos at the Aliyun mirrors
cd /etc/yum.repos.d
mv CentOS-Base.repo CentOS-Base.repo.bak
mv epel.repo epel.repo.bak
curl https://mirrors.aliyun.com/repo/Centos-7.repo -o CentOS-Base.repo
sed -i 's/gpgcheck=1/gpgcheck=0/g' /etc/yum.repos.d/CentOS-Base.repo
curl https://mirrors.aliyun.com/repo/epel-7.repo -o epel.repo
2) Configure the Kubernetes yum repo to use the Aliyun mirror
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64 points the Kubernetes repo at the Aliyun mirror.
gpgcheck=0: RPM packages downloaded from this repo are not GPG-verified.
repo_gpgcheck=0: some hardened setups enable repo_gpgcheck globally in /etc/yum.conf so that the repo's metadata signature is verified.
With gpgcheck=1, verification fails against this mirror, so both checks are disabled here.
3) Rebuild the yum cache
yum clean all && yum makecache && yum repolist
2. Install Docker
Docker must be installed on both master and node; after installation, enable it so it starts on boot:
systemctl enable docker && systemctl start docker
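The document does not show the Docker installation step itself. A minimal sketch using the Aliyun docker-ce mirror follows; the repo URL and package names are assumptions to verify against your environment, not taken from this document:

```shell
# Sketch: install Docker CE from the Aliyun mirror (run on both master and node).
# The repo URL below is an assumption, not part of the original document.
yum install -y yum-utils
yum-config-manager --add-repo \
  https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
```

Then enable and start the service as shown above.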
3. Install kubeadm, kubelet, and kubectl
3.1 Install kubelet, kubeadm, and kubectl (run on both master and node)
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
kubelet runs on every node, talks to the cluster control plane, and manages the lifecycle of the Pods and containers on its node.
kubeadm is the Kubernetes bootstrap tool; it automates deployment and lowers the barrier to setting up a cluster.
kubectl is the Kubernetes cluster management CLI.
Finally, enable and start kubelet:
systemctl enable kubelet --now
Check the kubeadm version:
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:27:49Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
3.2 Kubernetes images (run on both master and node)
Before initializing the cluster, you can list the images initialization will need:
kubeadm config images list
The required images are:
k8s.gcr.io/kube-apiserver:v1.17.2
k8s.gcr.io/kube-controller-manager:v1.17.2
k8s.gcr.io/kube-scheduler:v1.17.2
k8s.gcr.io/kube-proxy:v1.17.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5
You can pre-pull the required images on each node with kubeadm config images pull; they are used when the master is initialized and when nodes join the cluster.
If you skip the pre-pull, kubeadm init on the master and kubeadm join on the nodes will pull the images automatically at that point.
In this deployment no images were pulled in advance; everything was fetched during kubeadm init and kubeadm join.
kubeadm config images pull
3.3 Initialize the cluster with kubeadm
1) Run the initialization
(The inverse operation, should you need to start over, is kubeadm reset.)
kubeadm init --kubernetes-version=1.17.2 \
--apiserver-advertise-address=192.168.1.4 \
--image-repository=registry.aliyuncs.com/google_containers \
--service-cidr=10.1.0.0/16 \
--pod-network-cidr=10.244.0.0/16
--kubernetes-version: the Kubernetes version to deploy.
--apiserver-advertise-address: the IP address the kube-apiserver listens on and advertises; use the master's own IP.
--pod-network-cidr: the Pod network range; 10.244.0.0/16 matches flannel's default configuration.
--service-cidr: the Service (SVC) network range.
--image-repository: the container registry to pull from; here, the Aliyun mirror.
The last flag is critical: kubeadm pulls from k8s.gcr.io by default, which is unreachable from mainland China, so --image-repository points it at the Aliyun mirror instead.
Initialization takes a while (roughly a minute). Output like the following means it completed (completed, not that the cluster is healthy yet):
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.43.88:6443 --token m16ado.6ne248sk47nln0jj \
--discovery-token-ca-cert-hash sha256:09cda974fb18e716219bf08ef9d7a4eaa76bfe59ec91d0930b4ccfbd111276de
2) Run the commands from the init output
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
3) Deploy the Pod network (flannel) to the cluster
Download the kube-flannel.yml manifest:
curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Applying this manifest requires the quay.io/coreos/flannel:v0.11.0-amd64 image.
Pull the flannel image on all machines, or pull and export it on the master and then import it on the other nodes.
# Pull a mirrored copy of the flannel image
docker pull easzlab/flannel:v0.11.0-amd64
# Retag it to the name the manifest expects
docker tag easzlab/flannel:v0.11.0-amd64 quay.io/coreos/flannel:v0.11.0-amd64
Install the flannel network add-on:
kubectl apply -f ./kube-flannel.yml
Expected output:
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
To remove flannel later:
kubectl delete -f ./kube-flannel.yml
4) Check node and Pod status
Nodes typically show NotReady at first. Check Pod status with kubectl get pod -n kube-system;
common symptoms are the flannel Pod stuck in ImagePullBackOff and coredns stuck in Pending.
In that case, check whether docker image ls lists quay.io/coreos/flannel:v0.11.0-amd64; if not, try:
docker pull quay.io/coreos/flannel:v0.11.0-amd64
The flannel image must be present before kube-flannel-ds-amd64 reaches the Running state.
In my environment the quay.io image could not be pulled at all, so it was pulled elsewhere, exported with docker save, and imported into the VM's Docker with docker load.
Note: both the master and every node must have the quay.io/coreos/flannel:v0.11.0-amd64 image.
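The save/load workaround described above can be sketched as follows (the tarball filename is arbitrary):

```shell
# On a machine that CAN pull the image:
docker pull quay.io/coreos/flannel:v0.11.0-amd64
docker save quay.io/coreos/flannel:v0.11.0-amd64 -o flannel-v0.11.0-amd64.tar
# Copy flannel-v0.11.0-amd64.tar to each cluster node (e.g. with scp), then run:
docker load -i flannel-v0.11.0-amd64.tar
```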
3.4 Join the worker node to the cluster
1) First make sure the node has the flannel image quay.io/coreos/flannel:v0.11.0-amd64.
2) Run the kubeadm join command printed by kubeadm init:
kubeadm join 192.168.43.88:6443 --token ep9bne.6at6gds2o05dgutd \
--discovery-token-ca-cert-hash sha256:b2f75a6e5a49e66e467392d7d237548664ba8a28aafe98bdb18a7dd63ecc4aa8
On the master, check that both nodes show Ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
zf.master Ready master 33h v1.17.2
zf.node1 Ready <none> 70m v1.17.2
A node may fail to join because the token has expired. The error looks like:
kubeadm error: error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
Generate a fresh join command on the master (tokens are valid for 24 hours by default; use --ttl to extend this, or --ttl 0 for a token that never expires):
kubeadm token create --print-join-command
kubeadm join k8smaster.com:6443 --token pdas2m.fkgn8q7mz5u96jm6 --discovery-token-ca-cert-hash sha256:6fd9b1bf2d593d2d4f550cd9f1f596865f117fef462db42860228311c2712b8b
3.5 Node management
1) Remove a node
On the node itself, run:
kubeadm reset
This undoes kubeadm join; afterwards, manually rm the configuration directories listed in its output.
2) On the master (kubectl get nodes shows the node names):
kubectl delete node <node-name>
kubectl delete node zf.node1
4. Verify Kubernetes
kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed
kubectl get pods,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-86c57db685-ljzhp 0/1 ContainerCreating 0 15s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 91m
service/nginx NodePort 10.1.136.233 <none> 80:32387/TCP 9s
5. Deploy the Dashboard
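This section has no body in the original. As a hedged sketch, the upstream Dashboard manifest of the generation matching Kubernetes 1.17 can be applied like this; the version tag in the URL is an assumption, so check the kubernetes/dashboard releases for the one matching your cluster:

```shell
# Assumed manifest URL -- verify the dashboard version against your k8s version.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml
# The Dashboard components run in their own namespace:
kubectl get pods -n kubernetes-dashboard
```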
6. Single-node k8s configuration
By default the master node does not schedule workloads. To let the master run Pods (i.e. a single-node cluster), follow the two steps below.
Check the current taint:
kubectl describe node zf.master | grep Taints
or, for all nodes:
kubectl describe nodes | grep Taints
Result:
Taints: node-role.kubernetes.io/master:NoSchedule
Remove the taint so the single-node cluster can schedule ordinary workloads:
kubectl taint nodes --all node-role.kubernetes.io/master-
or
kubectl taint nodes zf.master node-role.kubernetes.io/master-
Check again:
kubectl describe node zf.master | grep Taints
or, for all nodes:
kubectl describe nodes | grep Taints
Result:
Taints: <none>
7. Deploy an application to test Kubernetes
7.1 Kubernetes system Pods
kubectl get pod --namespace=kube-system
or
kubectl get pod -n kube-system
or
kubectl get pod -A
Result:
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-9d85f5447-54x8h 1/1 Running 0 33h
coredns-9d85f5447-jhnc7 1/1 Running 0 33h
etcd-zf.master 1/1 Running 0 33h
kube-apiserver-zf.master 1/1 Running 0 33h
kube-controller-manager-zf.master 1/1 Running 1 33h
kube-flannel-ds-amd64-4cfjl 1/1 Running 0 33h
kube-flannel-ds-amd64-wwk7n 1/1 Running 0 96m
kube-proxy-cxfqx 1/1 Running 0 96m
kube-proxy-nmg7h 1/1 Running 0 33h
kube-scheduler-zf.master 1/1 Running 1 33h
7.2 Viewing a Pod's error log
Suppose kube-flannel-ds-amd64-wwk7n is stuck with STATUS Pending; inspect it with:
kubectl describe pod kube-flannel-ds-amd64-wwk7n --namespace=kube-system
or
kubectl describe pod kube-flannel-ds-amd64-wwk7n -n kube-system
The output includes the Pod's events and error messages.
7.3 Non-system Pods
Pods in the default namespace do not need the -n/--namespace flag.
Related topics: node labels, cluster namespaces.
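For example, the nginx deployment created in the verification section above lives in the default namespace, so these two commands are equivalent:

```shell
kubectl get pods              # default namespace, no flag needed
kubectl get pods -n default   # explicit equivalent
```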
Common problems
kubeadm join errors and fixes
1) Error:
kubeadm join reports:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Cause: Kubernetes defaults to the cgroupfs cgroup driver, but the yum-installed kubelet is configured for systemd, while docker info shows Docker using cgroupfs. The two drivers must match; there are two fixes.
Option 1: switch kubelet to cgroupfs
#vim /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
#systemctl enable docker
#systemctl enable kubelet
#kubeadm join --token c04f89.b781cdb55d83c1ef 10.10.3.4:63 --discovery-token-ca-cert-hash sha256:986e83a9cb948368ad0552b95232e31d3b76e2476b595bd1d905d5242ace29af --ignore-preflight-errors=Swap
Option 2: switch Docker's cgroup driver to systemd
mkdir /etc/docker
Set up the daemon configuration:
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
Restart Docker
systemctl daemon-reload
systemctl restart docker
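To confirm the change took effect, docker info should now report systemd as the cgroup driver:

```shell
# After restarting Docker, check the active cgroup driver:
docker info 2>/dev/null | grep -i 'cgroup driver'
# A line reading "Cgroup Driver: systemd" confirms the fix.
```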
Another error:
Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Cause: a newly joined node needs the pause:3.1 image, and the default registry k8s.gcr.io is blocked by the GFW; use the Aliyun mirror via --image-repository or pre-load the image as described above.