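I. Installing with kubeadm
Each server needs at least 2 GB of RAM and 2 CPU cores. If a machine has fewer cores, append --ignore-preflight-errors=NumCPU to the cluster initialization command (kubeadm init).
1. Prepare the environment
1) Software and system requirements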
Software | Version |
---|---|
CentOS | CentOS Linux release 7.5 or later |
Docker | 19.03.12 |
Kubernetes | v1.20.2 |
Flannel | v0.13.1-rc1 |
kernel-lt | kernel-lt-5.4.107-1.el7.elrepo.x86_64.rpm |
kernel-lt-devel | kernel-lt-devel-5.4.107-1.el7.elrepo.x86_64.rpm |
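2) Node planning
Use addresses in the 192.168.x.x range for the nodes so they do not collide with the cluster-internal networks (this guide uses 10.96.0.0/12 for services and 10.244.0.0/16 for pods).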
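A planned node address can be checked against the cluster-internal ranges before anything is installed. The following is a minimal sketch in plain POSIX shell; the helper names ip_to_int and ip_in_cidr are made up for this illustration and are not part of any tool used in this guide.

```shell
# Convert a dotted IPv4 address to a single integer.
# Called inside $(...) so the IFS change and "set --" stay local.
ip_to_int() {
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when address $1 falls inside the CIDR block $2 (for example 10.96.0.0/12).
ip_in_cidr() {
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example: warn when a planned node IP overlaps the service or pod network.
for cidr in 10.96.0.0/12 10.244.0.0/16; do
    if ip_in_cidr 192.168.12.11 "$cidr"; then
        echo "192.168.12.11 overlaps $cidr"
    fi
done
```

With the addresses used in this guide the loop prints nothing, because 192.168.12.11 lies outside both ranges.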
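II. Deploying Kubernetes
1. System tuning (all nodes)
1) Disable swap
#1. Once the system starts swapping, performance drops sharply, so Kubernetes normally requires swap to be turned off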
vim /etc/fstab
Comment out the swap partition line (the UUID entry of type swap) with #
swapoff -a
echo 'KUBELET_EXTRA_ARGS="--fail-swap-on=false"' > /etc/sysconfig/kubelet
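The manual fstab edit above can also be scripted, which helps when the same change has to be made on every node. This is only a sketch: comment_out_swap is a hypothetical helper name, and it assumes swap entries have the usual whitespace-separated "swap" field.

```shell
# Comment out every active swap entry in an fstab-style file.
# sed keeps a backup copy next to the file with a .bak suffix.
comment_out_swap() {
    sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Intended use on a node (as root):
#   comment_out_swap /etc/fstab
#   swapoff -a
```

Lines that are already commented out are left alone, so running the helper twice is harmless.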
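2) Disable SELinux and firewalld
sed -i 's#^SELINUX=enforcing#SELINUX=disabled#' /etc/selinux/config # disable SELinux permanently (takes effect after a reboot)
setenforce 0 # turn SELinux enforcement off for the current boot
systemctl disable --now firewalld # stop firewalld and keep it disabled at boot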
3) Set hostnames and name resolution
#1. Set the hostname (run the matching command on each node)
hostnamectl set-hostname k8s-master1
hostnamectl set-hostname k8s-node1
hostnamectl set-hostname k8s-node2
#2. Edit the hosts file (master node)
vim /etc/hosts
192.168.12.11 k8s-master1 m1
192.168.12.12 k8s-node1 n1
192.168.12.13 k8s-node2 n2
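When the hosts entries are added by script rather than in an editor, it helps to make the step repeatable so a second run does not create duplicates. A small sketch, assuming a helper named add_host_entry (a name invented here) that appends an entry only when the IP is not already the first field of some line:

```shell
# Append "IP name [alias...]" to a hosts file unless the IP is already listed.
add_host_entry() {
    hosts_file=$1
    ip=$2
    shift 2
    if ! awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts_file"; then
        printf '%s %s\n' "$ip" "$*" >> "$hosts_file"
    fi
}

# Intended use on the master (as root):
#   add_host_entry /etc/hosts 192.168.12.11 k8s-master1 m1
#   add_host_entry /etc/hosts 192.168.12.12 k8s-node1 n1
#   add_host_entry /etc/hosts 192.168.12.13 k8s-node2 n2
```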
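4) Set up passwordless SSH and distribute the public key (master node)
ssh-keygen -t rsa
for i in m1 n1 n2;do ssh-copy-id -i ~/.ssh/id_rsa.pub root@$i;done
5) Synchronize cluster time
Time matters in a cluster: if one machine's clock disagrees with the others, the cluster can run into many problems. Synchronize the clocks of all machines before deploying.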
yum install ntpdate -y
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo 'Asia/Shanghai' > /etc/timezone
ntpdate time2.aliyun.com
# Add a cron job (crontab -e) so the clock is resynchronized every minute
*/1 * * * * ntpdate time2.aliyun.com > /dev/null 2>&1
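The cron entry can also be installed without opening an editor. The sketch below is one possible approach, not the only one: with_cron_line is a hypothetical filter that reads an existing crontab on standard input and prints it back with the given line present exactly once.

```shell
# Read a crontab on stdin and print it with line $1 appended exactly once.
with_cron_line() {
    grep -vxF -- "$1" || true   # drop any existing copy; grep exits 1 when nothing is left
    printf '%s\n' "$1"
}

# Intended use (as root):
#   crontab -l 2>/dev/null \
#     | with_cron_line '*/1 * * * * ntpdate time2.aliyun.com > /dev/null 2>&1' \
#     | crontab -
```

Because the filter removes an existing copy before appending, rerunning the pipeline leaves the crontab unchanged.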
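6) Configure package mirrors
#1. By default CentOS uses the official yum repositories, which are usually very slow from inside China, so switch to a well-established domestic mirror such as the Tsinghua University or NetEase mirror. The command below uses the Huawei Cloud mirror.
curl -o /etc/yum.repos.d/CentOS-Base.repo https://repo.huaweicloud.com/repository/conf/CentOS-7-reg.repo
#2. Refresh the cache
yum clean all
yum makecache
7) Update the system
yum update -y --exclude='kernel*'
8) Install common base packages
yum install wget expect vim net-tools ntp bash-completion ipvsadm ipset jq iptables conntrack sysstat libseccomp -y
9) Upgrade the kernel (Docker is demanding about the kernel version; 4.4 or later is recommended)
# Not needed on CentOS 8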
wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-lt-5.4.107-1.el7.elrepo.x86_64.rpm
wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-lt-devel-5.4.107-1.el7.elrepo.x86_64.rpm
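10) Install the kernel
yum localinstall -y kernel-lt*
grub2-set-default 0 && grub2-mkconfig -o /etc/grub2.cfg # make the new kernel the default boot entry
grubby --default-kernel # show the kernel that will boot by default
reboot # reboot into the new kernel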
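After the reboot it is worth confirming that the running kernel really meets the 4.4+ recommendation. A minimal sketch, assuming GNU sort (for its -V version ordering) and a helper name, version_at_least, chosen for this example:

```shell
# Succeed when version $2 is greater than or equal to the required version $1.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# Intended use on a node:
#   if version_at_least 4.4 "$(uname -r)"; then
#       echo "kernel is new enough"
#   else
#       echo "kernel is older than 4.4"
#   fi
```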
11) Install IPVS
# IPVS is a kernel module with very high network forwarding performance, so it is usually the first choice
yum install -y conntrack-tools ipvsadm ipset conntrack libseccomp
vim /etc/sysconfig/modules/ipvs.modules # create the script that loads the IPVS modules
#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_fo ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack"
for kernel_module in ${ipvs_modules}; do
    # load a module only if the running kernel actually ships it
    /sbin/modinfo -F filename ${kernel_module} > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        /sbin/modprobe ${kernel_module}
    fi
done
chmod 755 /etc/sysconfig/modules/ipvs.modules # make the script executable
bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs # load the modules and confirm they are present
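Reading the lsmod output by eye makes it easy to miss one module. As a rough sketch, the hypothetical helper below (missing_modules is not a standard command) reads lsmod-style output on standard input and prints whichever of the requested modules are not loaded:

```shell
# Read lsmod output on stdin and print each requested module that is not loaded.
missing_modules() {
    loaded=$(awk 'NR > 1 { print $1 }')   # skip the "Module Size Used by" header
    for module in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF -- "$module" || printf '%s\n' "$module"
    done
}

# Intended use on a node:
#   lsmod | missing_modules ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
```

No output means every listed module is loaded.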
12) Tune kernel parameters
# The goal of this tuning is to make the kernel better suited to running Kubernetes
vim /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.netfilter.nf_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
modprobe br_netfilter # the net.bridge.* keys exist only after this module is loaded
sysctl --system # apply immediately
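A misspelled key in a sysctl file is easy to overlook, since the other settings still apply. One way to catch it, sketched here under the assumption that each key maps to a file under /proc/sys with dots replaced by slashes, is a small checker; unknown_sysctl_keys is a name invented for this example and the second argument exists only so the check can be pointed at another directory.

```shell
# Print every key in a sysctl config file that has no matching entry
# under /proc/sys (or under the directory given as the second argument).
unknown_sysctl_keys() {
    conf_path=$1
    proc_root=${2:-/proc/sys}
    while IFS='=' read -r key _value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        case $key in
            '' | '#'* | ';'*) continue ;;   # skip blank lines and comments
        esac
        [ -e "$proc_root/$(printf '%s' "$key" | tr . /)" ] || printf '%s\n' "$key"
    done < "$conf_path"
}

# Intended use on a node:
#   unknown_sysctl_keys /etc/sysctl.d/k8s.conf
```

A key can also be reported simply because its kernel module is not loaded yet, so treat the output as a list of things to double-check rather than as definite errors.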
2. Install Docker (all nodes)
Docker is one of the container tools most commonly managed by Kubernetes.
#1. Remove any previously installed Docker packages
sudo yum remove docker docker-common docker-selinux docker-engine
#2. Install the packages Docker depends on
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
#3. Add the Docker yum repository
wget -O /etc/yum.repos.d/docker-ce.repo https://repo.huaweicloud.com/docker-ce/linux/centos/docker-ce.repo
#4. Install Docker
yum install docker-ce -y
#5. Start Docker now and enable it at boot
systemctl enable --now docker.service
3. Install kubelet (all nodes)
#1. Add the Kubernetes yum repository
vim /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
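Since this step runs on every node, the repository file can be written from a script instead of an editor. The sketch below reuses the exact contents shown above; write_kubernetes_repo is a hypothetical helper name, and the quoted heredoc delimiter keeps the shell from expanding anything inside the file body.

```shell
# Write the Kubernetes yum repository definition to the path given as $1.
write_kubernetes_repo() {
    cat > "$1" <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
}

# Intended use on each node (as root):
#   write_kubernetes_repo /etc/yum.repos.d/kubernetes.repo
```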
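#2. Install kubelet, kubeadm, and kubectl
yum install kubectl-1.20.2 kubeadm-1.20.2 kubelet-1.20.2 -y # pin the version so it matches the one passed to kubeadm init below
systemctl enable --now kubelet
4. Initialize the master node (run on the master only)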
kubeadm init \
--image-repository=registry.cn-hangzhou.aliyuncs.com/k8sos \
--kubernetes-version=v1.20.2 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
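Note: watch the versions when initializing. Without a pinned version yum installs the latest release, so kubeadm init must be given the version that was actually installed.
All nodes must run matching versions; a worker node whose version differs from the master will fail to join the cluster.
Note: pinning the version at install time avoids the problem: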
yum install kubectl-1.20.2 kubeadm-1.20.2 kubelet-1.20.2 -y
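The version match described above can be checked on each node before joining. This is a sketch only: extract_version and same_version are helper names made up here, and they simply compare the first x.y.z number found in each string.

```shell
# Print the first x.y.z version number found in the given text.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed when both strings carry the same x.y.z version.
same_version() {
    [ "$(extract_version "$1")" = "$(extract_version "$2")" ]
}

# Intended use on a node:
#   same_version "$(kubelet --version)" "$(kubeadm version -o short)" \
#     || echo "kubelet and kubeadm versions differ"
```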
5. Post-initialization steps (run on the master only)
#1. Set up cluster access for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# As root you can instead run: export KUBECONFIG=/etc/kubernetes/admin.conf
#2. Install the cluster network plugin (flannel.yaml)
vi /root/flannel.yaml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: registry.cn-hangzhou.aliyuncs.com/alvinos/flanned:v0.13.1-rc1
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: registry.cn-hangzhou.aliyuncs.com/alvinos/flanned:v0.13.1-rc1
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
kubectl apply -f flannel.yaml
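#3. Join the worker nodes to the cluster
[root@gdx1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.15.31:6443 --token s6svmh.lw88lchyl6m24tts --discovery-token-ca-cert-hash sha256:4d7e3e37e73176a97322e26fe501d2c27830a7bf3550df56f3a55b68395b507b
Note: copy the generated join command (including the token) and run it on both worker nodes
#4. List existing tokens
[root@gdx1 ~]# kubeadm token list
Note: kubeadm token list prints a table of the current tokens and their expiry times; a join command that uses one of them has this form:
kubeadm join 192.168.12.11:6443 --token fm0387.iqixomz5jmsukwsi --discovery-token-ca-cert-hash sha256:d8ff83ffed5967000034d07b3da738ae4f1f0254e8417bb30c21f3ed15ac5d18
Note: every generated token is different, and by default a token is valid for 24 hours
# Extra: create a token that never expires (useful when nodes join later)
kubeadm token create --ttl 0 --print-join-command
`kubeadm join 192.168.233.3:6443 --token rpi151.qx3660ytx2ixq8jk --discovery-token-ca-cert-hash sha256:5cf4e801c903257b50523af245f2af16a88e78dc00be3f2acc154491ad4f32a4` # sample output: the join command to run on a node; the backticks only mark it as output and have no other purpose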
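When a join command is copied between terminals, a truncated token is a common mistake. kubeadm bootstrap tokens have the form of 6 lowercase alphanumeric characters, a dot, then 16 more. The sketch below pulls the token out of a join command and prints nothing when it does not have that shape; join_token is a hypothetical helper name.

```shell
# Print the bootstrap token from a "kubeadm join ..." command line,
# or nothing when the token does not match the 6.16 character format.
join_token() {
    printf '%s\n' "$1" |
        sed -nE 's/.*--token[ =]([a-z0-9]{6}\.[a-z0-9]{16})( .*)?$/\1/p'
}

# Example with the sample join command shown earlier in this guide:
join_token 'kubeadm join 192.168.12.11:6443 --token fm0387.iqixomz5jmsukwsi --discovery-token-ca-cert-hash sha256:d8ff83ffed5967000034d07b3da738ae4f1f0254e8417bb30c21f3ed15ac5d18'
```

The example call prints fm0387.iqixomz5jmsukwsi, which can then be compared with the output of kubeadm token list on the master.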
6. Check cluster status (master node)
#1. Method one
kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
k8s-m-01   Ready    control-plane,master   13m   v1.20.5
k8s-n-01   Ready    <none>                 35s   v1.20.5
k8s-n-02   Ready    <none>                 39s   v1.20.5
Note: the cluster is up when every node shows Ready
#2. Method two
kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
coredns-f68b4c98f-5t7wm            1/1     Running   0          5m54s
coredns-f68b4c98f-5xqjs            1/1     Running   0          5m54s
etcd-k8s-m-01                      1/1     Running   0          6m3s
kube-apiserver-k8s-m-01            1/1     Running   0          6m3s
kube-controller-manager-k8s-m-01   1/1     Running   0          6m3s
kube-flannel-ds-7bcwl              1/1     Running   0          104s
kube-proxy-ntpjx                   1/1     Running   0          5m54s
kube-scheduler-k8s-m-01            1/1     Running   0          6m3s
Note: the cluster is healthy when every pod shows 1/1 under READY and Running under STATUS
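On a larger cluster the pod list is long, so it can help to print only the pods that fail the check above. A minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...); unhealthy_pods is a name invented for this example.

```shell
# Read "kubectl get pods" output on stdin and print the name of every pod
# whose READY count is incomplete or whose STATUS is not Running.
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}

# Intended use on the master:
#   kubectl get pods -n kube-system | unhealthy_pods
```

No output means every pod passed. Pods from finished jobs show a Completed status and would be listed as well, which is usually harmless.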
#3. Method three: verify cluster DNS directly
kubectl run test -it --rm --image=busybox:1.28.3
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes # run this inside the container; getting an answer back means DNS works
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
====================================================================================
Troubleshooting: a node does not join the cluster properly and stays NotReady
# Cause: the node and master versions do not match
[root@gdx1 ~]# kubectl get node
NAME   STATUS     ROLES                  AGE   VERSION
gdx1   Ready      control-plane,master   73m   v1.20.2
gdx2   NotReady   <none>                 10m   v1.21.0
gdx3   NotReady   <none>                 26s   v1.21.0
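The mismatch visible in the listing above can be detected automatically. The sketch below assumes the default kubectl get node columns, with ROLES in the third field and VERSION in the last; version_skewed_nodes is a hypothetical helper that prints every node whose version differs from the control-plane node's.

```shell
# Read "kubectl get node" output on stdin and print "name version" for each
# node whose VERSION differs from that of the control-plane/master node.
version_skewed_nodes() {
    awk 'NR > 1 {
        name[NR] = $1
        ver[NR] = $NF
        if ($3 ~ /master|control-plane/) want = $NF
    }
    END {
        for (i = 2; i <= NR; i++)
            if (ver[i] != want) print name[i], ver[i]
    }'
}

# Intended use on the master:
#   kubectl get node | version_skewed_nodes
```

For the listing above this would report gdx2 and gdx3, both at v1.21.0.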
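Fix:
# yum installs the latest version by default, which easily leads to mismatched versions, so pin the same version on every node
#1. On the worker node, remove the installed packages and reinstall the pinned version:
yum remove kubectl kubeadm kubelet -y
yum install kubectl-1.20.2 kubeadm-1.20.2 kubelet-1.20.2 -y
#2. Start kubelet now and enable it at boot
systemctl enable --now kubelet
#3. Reset the worker node's configuration (the node already joined once, so a new join reports that the certificate, config file, and port already exist; the old state has to be wiped first)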
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
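[root@gdx2 ~]# kubeadm reset # run this on the worker node when the errors above appear; it wipes the node's kubeadm state
#4. Remove the NotReady node from the cluster
[root@gdx1 ~]# kubectl delete node gdx3
#5. Join the node to the cluster again. Make sure the token is current: if several tokens were generated, use the most recent one
Note: it is best to generate a fresh token on the master for the node to use when joining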
[root@gdx1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.12.11:6443 --token fm0387.iqixomz5jmsukwsi --discovery-token-ca-cert-hash sha256:d8ff83ffed5967000034d07b3da738ae4f1f0254e8417bb30c21f3ed15ac5d18
Note: run the generated command on the worker node
#6. Rejoin the node to the cluster
[root@gdx2 ~]# kubeadm join 192.168.12.11:6443 --token fm0387.iqixomz5jmsukwsi --discovery-token-ca-cert-hash sha256:d8ff83ffed5967000034d07b3da738ae4f1f0254e8417bb30c21f3ed15ac5d18