Table of Contents
7. Install kubeadm, kubelet, and kubectl [all nodes]
8. Deploy the Kubernetes master [master]
Get a dashboard token: create a service account and bind it to the default cluster-admin cluster role
I. Deploying Kubernetes 1.18.0 with kubeadm
1. Requirements
3 clean CentOS virtual machines, version 7.x or later
Machine specs: 2 CPU cores / 2 GB RAM or more, x3 machines
Network connectivity between all servers
Swap disabled
2. Environment preparation
# 1. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# 2. Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
# 3. Disable swap
swapoff -a  # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab  # permanent
# 4. Host planning
cat > /etc/hosts << EOF
192.168.0.121 k8s-master1
192.168.0.122 k8s-node1
192.168.0.123 k8s-node2
EOF
# 5. Set the hostname:
hostnamectl set-hostname k8s-master1
bash
# 6. Time synchronization
yum install -y ntpdate
ntpdate time.windows.com
# Let bridged traffic pass through iptables (the br_netfilter module must be loaded)
modprobe br_netfilter
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
# 7. Periodic time sync via cron
echo '*/5 * * * * /usr/sbin/ntpdate -u ntp.api.bz' >>/var/spool/cron/root
systemctl restart crond.service
crontab -l
# Everything above can be copied and run as-is, except that the hostname must be set individually on each node
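As an aside, the `sed -ri 's/.*swap.*/#&/' /etc/fstab` expression above comments out every line mentioning swap by prefixing it with `#` (the `&` re-inserts the matched line). You can try the mechanics safely against a throwaway copy instead of the real fstab; the sample content below is made up:

```shell
# Build a sample fstab and comment out its swap line, exactly as the permanent-disable step does
tmpfstab=$(mktemp)
cat > "$tmpfstab" << 'EOF'
/dev/mapper/centos-root /    xfs  defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
EOF
sed -ri 's/.*swap.*/#&/' "$tmpfstab"
grep '^#' "$tmpfstab"   # the swap line is now commented; the root line is untouched
rm -f "$tmpfstab"
```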
3. Install Docker [all nodes]
# Add the repos
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -P /etc/yum.repos.d/ http://mirrors.aliyun.com/repo/epel-7.repo
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum clean all
yum install -y bash-completion.noarch
# Install a specific version
yum -y install docker-ce-18.09.9-3.el7
# Or list the available versions first
yum list docker-ce --showduplicates | sort -r
# Start Docker
systemctl enable docker
systemctl start docker
systemctl status docker
4. Configure the Docker cgroup driver [all nodes]
rm -f /etc/docker/*
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://ajvcw8qn.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
systemctl enable docker.service
Pull the flannel image:
docker pull lizhenliang/flannel:v0.11.0-amd64
5. Registry mirror acceleration [all nodes]
curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://f1361db2.m.daocloud.io
systemctl restart docker
# If too many mirrors are configured, pulls can fail; if so, rename one repo file to .bak and retry
# Keep only: curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://f1361db2.m.daocloud.io
Alternatively, use an Alibaba Cloud accelerator: just add your personal accelerator endpoint from
https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors
6. Configure the Kubernetes yum repository [all nodes]
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
7. Install kubeadm, kubelet, and kubectl [all nodes]
yum install -y kubelet-1.18.0 kubeadm-1.18.0 kubectl-1.18.0
systemctl enable kubelet
8. Deploy the Kubernetes master [master]
kubeadm init \
--apiserver-advertise-address=192.168.0.121 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.18.0 \
--service-cidr=10.1.0.0/16 \
--pod-network-cidr=10.244.0.0/16
On success, kubeadm prints the following output; be sure to save it:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.121:6443 --token ih5sha.ceeo1v982dh5tygc \
--discovery-token-ca-cert-hash sha256:3fddd061e713790e9cd4e8300e73087fcce3e31871280116b2df75c847b6beed
Error handling
Error 1:
W0507 00:43:52.681429 3118 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.0
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Fix: switch the Docker cgroup driver to systemd by adding "exec-opts": ["native.cgroupdriver=systemd"] to /etc/docker/daemon.json, then restart Docker.
Error 2:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
This error means the machine has too few CPUs; give each node at least 2 cores (e.g. 2 cores / 4 GB).
Error 3:
Joining the cluster fails with:
W0507 01:19:49.406337 26642 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@k8s-master2 yum.repos.d]# kubeadm join 10.0.0.63:6443 --token q8bfij.fipmsxdgv8sgcyq4 \
> --discovery-token-ca-cert-hash sha256:26fc15b6e52385074810fdbbd53d1ba23269b39ca2e3ec3bac9376ed807b595c
W0507 01:20:26.246981 26853 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Fix:
Run kubeadm reset on the node, then run the join command again
9. Configure the kubectl command-line tool [master]
Run on the master to set up credentials:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run the join command on each node:
kubeadm join 192.168.0.121:6443 --token ih5sha.ceeo1v982dh5tygc \
--discovery-token-ca-cert-hash sha256:3fddd061e711122333300e73087fcccxddxxxx280116b2df75c847b6beed
# Get node information
[root@k8s-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 NotReady master 2m59s v1.18.0
k8s-node1 NotReady <none> 86s v1.18.0
k8s-node2 NotReady <none> 85s v1.18.0
# All hosts report status, so the cluster is formed; STATUS becomes Ready only after the CNI plugin starts
10. Install the network add-on (flannel) [master]
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
labels:
tier: node
app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan",
"Directrouting": true
}
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.11.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-arm64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - arm64
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.11.0-arm64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-arm64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-arm
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - arm
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.11.0-arm
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-arm
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-ppc64le
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - ppc64le
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.11.0-ppc64le
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-ppc64le
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-s390x
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - s390x
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.11.0-s390x
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-s390x
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
[Run on all nodes]:
docker pull lizhenliang/flannel:v0.11.0-amd64
[On the master] upload kube-flannel.yaml and apply it:
(the kube-flannel.yaml file is hard to download directly, so its full content is included in this section)
kubectl apply -f kube-flannel.yaml
kubectl get pods -n kube-system
[All pods must reach Running, otherwise something is wrong.]
[root@k8s-master1 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7ff77c879f-5dq4s 1/1 Running 0 13m
coredns-7ff77c879f-v68pc 1/1 Running 0 13m
etcd-k8s-master1 1/1 Running 0 13m
kube-apiserver-k8s-master1 1/1 Running 0 13m
kube-controller-manager-k8s-master1 1/1 Running 0 13m
kube-flannel-ds-amd64-2ktxw 1/1 Running 0 3m45s
kube-flannel-ds-amd64-fd2cb 1/1 Running 0 3m45s
kube-flannel-ds-amd64-hb2zr 1/1 Running 0 3m45s
kube-proxy-4vt8f 1/1 Running 0 13m
kube-proxy-5nv5t 1/1 Running 0 12m
kube-proxy-9fgzh 1/1 Running 0 12m
kube-scheduler-k8s-master1 1/1 Running 0 13m
[root@k8s-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 Ready master 14m v1.18.0
k8s-node1 Ready <none> 12m v1.18.0
k8s-node2 Ready <none> 12m v1.18.0
At this point the k8s cluster deployment is complete; the following sections list some common operations.
II. Post-deployment operations
1. Creating and querying tokens
By default a join token is valid for 24 hours, after which it can no longer be used. To generate a new one, run on the master:
kubeadm token create
kubeadm token list
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
Result:
3d847b858ed649244b4110d4d60ffd57f43856f42ca9c22e12ca33946673ccb4
Join the cluster with the new token:
kubeadm join 10.0.0.63:6443 --token nuja6n.o3jrhsffiqs9swnu --discovery-token-ca-cert-hash sha256:3d847b858ed649244b4110d4d60ffd57f43856f42ca9c22e12ca33946673ccb4
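The long openssl pipeline above derives the discovery hash from the cluster's CA certificate: it extracts the public key, DER-encodes it, and takes its SHA-256 digest. You can see the mechanics locally against a throwaway self-signed certificate (generated here purely for illustration; on a real cluster you would point at /etc/kubernetes/pki/ca.crt):

```shell
# Generate a throwaway CA-style certificate, then compute the discovery-hash the same way
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-ca.key \
  -out /tmp/demo-ca.crt -days 1 -subj "/CN=demo-ca" 2>/dev/null
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //')
# kubeadm expects the value prefixed with "sha256:" in --discovery-token-ca-cert-hash
echo "sha256:${hash}"
```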
2. Install metrics-server and the dashboard
Note: metrics-server supplies container metrics (used by kubectl top); the dashboard is the official web UI. Neither is strictly required, but Prometheus setups commonly depend on metrics-server, so new users are advised to install both.
Deployment YAML download:
The bundle referenced here contains a full Prometheus deployment; if you don't need that, search for standalone metrics-server and dashboard YAMLs instead.
[root@k8s-master ~]# kubectl get po -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-694557449d-h4v7z 1/1 Running 0 167m
kubernetes-dashboard-86c788c4d7-k6972 1/1 Running 0 167m
Access: https://<any-node-IP>:30001
Get a dashboard token: create a service account and bind it to the default cluster-admin cluster role
# kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
# kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
# kubectl describe secrets -n kubernetes-dashboard $(kubectl -n kubernetes-dashboard get secret | awk '/dashboard-admin/{print $1}')
Paste the copied token into the token field of the dashboard login page and choose token-based login.
(1) Handling token expiry:
Every 24 hours the previously created join token expires and can no longer be used to add nodes; generate a new one:
Commands:
kubeadm token create
kubeadm token list
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
Query the CA certificate hash:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
3d847b858ed649244b4110d4d60ffd57f43856f42ca9c22e12ca33946673ccb4
Then join a new server using the new token:
kubeadm join 10.0.0.63:6443 --token 0dr1pw.ejybkufnjpalb8k6 --discovery-token-ca-cert-hash sha256:3d847b858ed649244b4110d4d60ffd57f43856f42ca9c22e12ca33946673ccb4
(2) Retrieving the dashboard login token
kubectl describe secrets -n kubernetes-dashboard $(kubectl -n kubernetes-dashboard get secret | awk '/dashboard-admin/{print $1}')
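The `awk '/dashboard-admin/{print $1}'` part of the pipeline above simply picks the first column of the line that names the dashboard-admin secret. Demonstrated here on canned `kubectl get secret`-style output (the secret names below are made up):

```shell
# Simulate 'kubectl -n kubernetes-dashboard get secret' output and extract the admin secret name
printf '%s\n' \
  'NAME                          TYPE                                  DATA   AGE' \
  'dashboard-admin-token-x7k2p   kubernetes.io/service-account-token   3      5m' \
  'default-token-9qwrt           kubernetes.io/service-account-token   3      10m' \
  | awk '/dashboard-admin/{print $1}'
# prints: dashboard-admin-token-x7k2p
```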
3. Verifying that the cluster works
Cluster health is verified in three ways:
1. Can applications be deployed?
2. Does the cluster network work?
3. Does in-cluster DNS resolution work?
3.1 Deploying an application and checking logs
# Create an nginx deployment
kubectl create deployment k8s-status-checke --image=nginx
# Expose port 80
kubectl expose deployment k8s-status-checke --port=80 --target-port=80 --type=NodePort
# Delete the deployment
kubectl delete deployment k8s-status-checke
# Check the logs:
[root@k8s-master1 ~]# kubectl logs -f nginx-f89759699-m5k5z
3.2 Verifying the cluster network
1. Get a pod's IP address
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED READINESS
pod/nginx 1/1 Running 0 25h 10.244.2.18 k8s-node2 <none> <none>
2. Ping that pod IP from any node
[root@k8s-node1 ~]# ping 10.244.2.18
PING 10.244.2.18 (10.244.2.18) 56(84) bytes of data.
64 bytes from 10.244.2.18: icmp_seq=1 ttl=63 time=2.63 ms
64 bytes from 10.244.2.18: icmp_seq=2 ttl=63 time=0.515 ms
3. curl the pod
[root@k8s-master1 ~]# curl -I 10.244.2.18
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sun, 10 May 2020 13:19:02 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 14 Apr 2020 14:19:26 GMT
Connection: keep-alive
ETag: "5e95c66e-264"
Accept-Ranges: bytes
4. Check the access logs
[root@k8s-master1 ~]# kubectl logs -f nginx
10.244.1.0 - - [10/May/2020:13:14:25 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" "-"
3.3 Verifying in-cluster DNS resolution
Check DNS:
[root@k8s-master1 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7ff77c879f-5dq4s 1/1 Running 1 4d # DNS occasionally breaks
coredns-7ff77c879f-v68pc 1/1 Running 1 4d # DNS occasionally breaks
etcd-k8s-master1 1/1 Running 4 4d
kube-apiserver-k8s-master1 1/1 Running 3 4d
kube-controller-manager-k8s-master1 1/1 Running 3 4d
kube-flannel-ds-amd64-2ktxw 1/1 Running 1 4d
kube-flannel-ds-amd64-fd2cb 1/1 Running 1 4d
kube-flannel-ds-amd64-hb2zr 1/1 Running 4 4d
kube-proxy-4vt8f 1/1 Running 4 4d
kube-proxy-5nv5t 1/1 Running 2 4d
kube-proxy-9fgzh 1/1 Running 2 4d
kube-scheduler-k8s-master1 1/1 Running 4 4d
# If DNS breaks, the fix is to recreate CoreDNS:
1. Export the yaml
kubectl get deploy coredns -n kube-system -o yaml >coredns.yaml
2. Delete coredns
kubectl delete -f coredns.yaml
Check:
[root@k8s-master1 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
etcd-k8s-master1 1/1 Running 4 4d
kube-apiserver-k8s-master1 1/1 Running 3 4d
kube-controller-manager-k8s-master1 1/1 Running 3 4d
kube-flannel-ds-amd64-2ktxw 1/1 Running 1 4d
kube-flannel-ds-amd64-fd2cb 1/1 Running 1 4d
kube-flannel-ds-amd64-hb2zr 1/1 Running 4 4d
kube-proxy-4vt8f 1/1 Running 4 4d
kube-proxy-5nv5t 1/1 Running 2 4d
kube-proxy-9fgzh 1/1 Running 2 4d
kube-scheduler-k8s-master1 1/1 Running 4 4d
coredns is now gone
3. Recreate coredns
kubectl apply -f coredns.yaml
[root@k8s-master1 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7ff77c879f-5mmjg 1/1 Running 0 13s
coredns-7ff77c879f-t74th 1/1 Running 0 13s
etcd-k8s-master1 1/1 Running 4 4d
kube-apiserver-k8s-master1 1/1 Running 3 4d
kube-controller-manager-k8s-master1 1/1 Running 3 4d
kube-flannel-ds-amd64-2ktxw 1/1 Running 1 4d
kube-flannel-ds-amd64-fd2cb 1/1 Running 1 4d
kube-flannel-ds-amd64-hb2zr 1/1 Running 4 4d
kube-proxy-4vt8f 1/1 Running 4 4d
kube-proxy-5nv5t 1/1 Running 2 4d
kube-proxy-9fgzh 1/1 Running 2 4d
kube-scheduler-k8s-master1 1/1 Running 4 4d
Re-check the logs:
coredns-7ff77c879f-5mmjg:
[root@k8s-master1 ~]# kubectl logs coredns-7ff77c879f-5mmjg -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
coredns-7ff77c879f-t74th:
[root@k8s-master1 ~]# kubectl logs coredns-7ff77c879f-t74th -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
# Start a throwaway container in the cluster to verify DNS
[root@k8s-master1 ~]# kubectl run -it --rm --image=busybox:1.28.4 sh
/ # nslookup kubernetes
Server: 10.1.0.10
Address 1: 10.1.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes
Address 1: 10.1.0.1 kubernetes.default.svc.cluster.local
# nslookup resolves kubernetes, so cluster DNS is working
4. Deploying an nginx on the cluster
[root@k8s-master ~]# kubectl create deployment nginx --image=nginx
[root@k8s-master ~]# kubectl expose deployment nginx --port=80 --target-port=80 --type=NodePort
[root@k8s-master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx NodePort 10.1.161.50 <none> 80:30499/TCP 152m
[root@k8s-master ~]# curl http://10.1.161.50
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
5. Using a temporary busybox pod to access in-cluster services
List the running services:
kubectl get svc
Start busybox:
kubectl run busybox --rm=true --image=busybox --restart=Never -it
Services are then reachable by name:
wget http://servername
III. Common etcd operations
1. kubectl auto-completion:
Tab-completes common resource names, which saves a lot of typing
yum install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
2. Copying out the etcdctl command-line tool:
etcdctl is similar to the redis CLI; both are key-value store clients with much-alike usage
$ kubectl -n kube-system exec etcd-k8s-master which etcdctl
$ kubectl -n kube-system cp etcd-k8s-master:/usr/local/bin/etcdctl /usr/bin/etcdctl
3. Common etcdctl operations:
(1) List the etcd cluster members:
# The first run prints this warning; switch to the v3 API
WARNING:
Environment variable ETCDCTL_API is not set; defaults to etcdctl v2.
Set environment variable ETCDCTL_API=3 to use v3 API or ETCDCTL_API=2 to use v2 API.
$ export ETCDCTL_API=3
# Every etcdctl call needs the certificates attached, so define an alias
$ alias etcdctl='etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key'
$ etcdctl member list -w table
[root@k8s-master ~]# etcdctl member list -w table
+------------------+---------+------------+----------------------------+----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+------------+----------------------------+----------------------------+
| 49c374033081590d | started | k8s-master | https://192.168.0.121:2380 | https://192.168.0.121:2379 |
+------------------+---------+------------+----------------------------+----------------------------+
(2) Check etcd endpoint status and health:
$ etcdctl endpoint status -w table
$ etcdctl endpoint health -w table
(3) Set a key:
# Like redis: set a key-value pair by hand
$ etcdctl put luffy 1
$ etcdctl get luffy
List all keys:
$ etcdctl get / --prefix --keys-only
# etcd watches the whole cluster: every resource change is synced into etcd immediately. All resources live under keys of the form /registry/<resource-type>/<namespace>/<object-name>. For a pod, for example, <etcdctl get /registry/pods/kube-system/coredns-5644d7b6d9-7gw6t --prefix> returns the registry path as the key and the pod's serialized object as the value.
$ etcdctl watch <key-or-prefix>
$ etcdctl get /registry/pods/kube-system/coredns-5644d7b6d9-7gw6t --prefix
Read the value of a specific key:
$ etcdctl get /registry/pods/jenkins/sonar-postgres-7fc5d748b6-gtmsb
(4) etcd snapshots and restore
Take a snapshot (wrap it in a cron job for regular backups):
$ etcdctl snapshot save `hostname`-etcd_`date +%Y%m%d%H%M`.db
Restore a snapshot:
Stop etcd and the apiserver
Move the current data directory out of the way:
$ mv /var/lib/etcd/ /tmp
Restore from the saved snapshot file (pass the actual filename written by snapshot save):
$ etcdctl snapshot restore <snapshot-file>.db --data-dir=/var/lib/etcd/
Cluster recovery:
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md
(5) A production-grade etcd backup scheme
# In production, etcd is backed up on a schedule with a CronJob
# Paths below come from a self-managed installer; adjust them to your environment:
# backup dir: /var/lib/etcd_backup
# etcd certs: /etc/etcd/pki
# etcd binary (inside the pod): /usr/local/bin/etcd
# firewalld: /usr/lib/firewalld/services/etcd-client.xml
# yaml: /home/install/k8s-self/template/master/k8s-etcd-backup.yaml
# shell: /home/install/k8s-self/scripts/etcd/afterInstall.sh
# The CronJobs below back up etcd data on a schedule
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: k8s-etcd-backup-0
  namespace: kube-system
spec:
  # timezone is the same as the controller manager's, default is UTC
  # 18:00 UTC is 02:00 Beijing time
  schedule: "12 18 * * *"
  concurrencyPolicy: Replace        # Allow = concurrent runs; Forbid = no concurrency; Replace = replace the previous run
  failedJobsHistoryLimit: 2         # history kept for failed runs, default 1
  successfulJobsHistoryLimit: 2     # history kept for successful runs, default 3; hence up to 6 completed pods across the 3 CronJobs
  startingDeadlineSeconds: 3600     # a run that cannot start within this window is recorded as failed
  jobTemplate:                      # Job template from which the CronJob creates Job objects
    spec:
      template:
        metadata:
          labels:
            app: k8s-etcd-backup
        spec:
          tolerations:              # taints are declared on nodes, tolerations on pods; this tolerates the master's NoSchedule taint
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity: a mandatory rule; the pod stays Pending if unmet
                nodeSelectorTerms:                              # several terms may be listed; matching any one suffices
                - matchExpressions:                             # all expressions within one term must match
                  - key: kubernetes.io/hostname                 # pin to the node labeled kubernetes.io/hostname=k8s-hostname-node1
                    operator: In                                # In: the label value is in the list
                    values:
                    - k8s-hostname-node1                        # each job is pinned one-to-one to a node
          containers:
          - name: k8s-etcd-backup
            image: harborIP/kubernetes/etcd:3.4.3-0
            imagePullPolicy: IfNotPresent
            resources:
              requests:
                cpu: "0"
                memory: "0"
              limits:
                cpu: 1000m
                memory: 1Gi
            env:
            - name: ENDPOINTS
              value: "https://k8s-node1:2379"
            command:
            - /bin/sh
            - -c
            - |
              set -ex   # -e: abort on the first failing command; -x: trace commands (debug)
              rm -rf /data/backup/tmp
              mkdir -p /data/backup/tmp && test -d /data/backup/tmp || exit 1
              export backupfilename=`date +"%Y%m%d%H%M%S"`
              # make sure the certificate files exist
              test -f /certs/ca.pem || (rm -rf /data/backup/tmp && exit 1)
              test -f /certs/client.pem || (rm -rf /data/backup/tmp && exit 1)
              test -f /certs/client-key.pem || (rm -rf /data/backup/tmp && exit 1)
              # take the etcd snapshot
              ETCDCTL_API=3 /usr/local/bin/etcdctl \
                --endpoints=$ENDPOINTS \
                --cacert=/certs/ca.pem \
                --cert=/certs/client.pem \
                --key=/certs/client-key.pem \
                --command-timeout=1800s \
                snapshot save /data/backup/tmp/etcd-snapshot.db && \
              cd /data/backup/tmp; tar -czf /data/backup/etcd-snapshot-${backupfilename}.tar.gz * && \
              cd -; rm -rf /data/backup/tmp
              if [ $? -ne 0 ]; then   # exit non-zero on failure
                exit 1
              fi
              # delete old files beyond the newest 7
              count=0
              for file in `ls -t /data/backup/*tar.gz`
              do
                count=`expr $count + 1`
                if [ $count -gt 7 ]; then
                  rm -rf $file
                fi
              done
            volumeMounts:            # container mount points
            - name: master-backup
              mountPath: /data/backup
            - name: etcd-certs
              mountPath: /certs
            - name: timezone
              mountPath: /etc/localtime
              readOnly: true
          volumes:                   # host directories
          - name: master-backup      # backup destination
            hostPath:
              path: /var/lib/etcd_backup
          - name: etcd-certs
            hostPath:
              path: /etc/etcd/pki    # cert directory
          - name: timezone
            hostPath:
              path: /etc/localtime   # host timezone file
          restartPolicy: Never       # job pods exit when done; no restart needed
          hostNetwork: true
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: k8s-etcd-backup-1
  namespace: kube-system
spec:
  # timezone is same as controller manager, default is UTC
  schedule: "12 19 * * *"
  concurrencyPolicy: Replace
  failedJobsHistoryLimit: 2
  successfulJobsHistoryLimit: 2
  startingDeadlineSeconds: 3600
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: k8s-etcd-backup
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                    - k8s-hostname-master
          containers:
          - name: k8s-etcd-backup
            image: harborIP/kubernetes/etcd:3.4.3-0
            imagePullPolicy: IfNotPresent
            resources:
              requests:
                cpu: "0"
                memory: "0"
              limits:
                cpu: 1000m
                memory: 1Gi
            env:
            - name: ENDPOINTS
              value: "https://k8s-master:2379"
            command:
            - /bin/sh
            - -c
            - |
              set -ex
              rm -rf /data/backup/tmp
              mkdir -p /data/backup/tmp && test -d /data/backup/tmp || exit 1
              export backupfilename=`date +"%Y%m%d%H%M%S"`
              test -f /certs/ca.pem || (rm -rf /data/backup/tmp && exit 1)
              test -f /certs/client.pem || (rm -rf /data/backup/tmp && exit 1)
              test -f /certs/client-key.pem || (rm -rf /data/backup/tmp && exit 1)
              ETCDCTL_API=3 /usr/local/bin/etcdctl \
                --endpoints=$ENDPOINTS \
                --cacert=/certs/ca.pem \
                --cert=/certs/client.pem \
                --key=/certs/client-key.pem \
                --command-timeout=1800s \
                snapshot save /data/backup/tmp/etcd-snapshot.db && \
              cd /data/backup/tmp; tar -czf /data/backup/etcd-snapshot-${backupfilename}.tar.gz * && \
              cd -; rm -rf /data/backup/tmp
              if [ $? -ne 0 ]; then
                exit 1
              fi
              # delete old file more than 7
              count=0
              for file in `ls -t /data/backup/*tar.gz`
              do
                count=`expr $count + 1`
                if [ $count -gt 7 ]; then
                  rm -rf $file
                fi
              done
            volumeMounts:
            - name: master-backup
              mountPath: /data/backup
            - name: etcd-certs
              mountPath: /certs
            - name: timezone
              mountPath: /etc/localtime
              readOnly: true
          volumes:
          - name: master-backup
            hostPath:
              path: /var/lib/etcd_backup
          - name: etcd-certs
            hostPath:
              path: /etc/etcd/pki
          - name: timezone
            hostPath:
              path: /etc/localtime
          restartPolicy: Never
          hostNetwork: true
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: k8s-etcd-backup-2
  namespace: kube-system
spec:
  # timezone is same as controller manager, default is UTC
  schedule: "12 20 * * *"
  concurrencyPolicy: Replace
  failedJobsHistoryLimit: 2
  successfulJobsHistoryLimit: 2
  startingDeadlineSeconds: 3600
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: k8s-etcd-backup
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                    - k8s-hostname-node2
          containers:
          - name: k8s-etcd-backup
            image: harborIP/kubernetes/etcd:3.4.3-0
            imagePullPolicy: IfNotPresent
            resources:
              requests:
                cpu: "0"
                memory: "0"
              limits:
                cpu: 1000m
                memory: 1Gi
            env:
            - name: ENDPOINTS
              value: "https://k8s-node2:2379"
            command:
            - /bin/sh
            - -c
            - |
              set -ex
              rm -rf /data/backup/tmp
              mkdir -p /data/backup/tmp && test -d /data/backup/tmp || exit 1
              export backupfilename=`date +"%Y%m%d%H%M%S"`
              test -f /certs/ca.pem || (rm -rf /data/backup/tmp && exit 1)
              test -f /certs/client.pem || (rm -rf /data/backup/tmp && exit 1)
              test -f /certs/client-key.pem || (rm -rf /data/backup/tmp && exit 1)
              ETCDCTL_API=3 /usr/local/bin/etcdctl \
                --endpoints=$ENDPOINTS \
                --cacert=/certs/ca.pem \
                --cert=/certs/client.pem \
                --key=/certs/client-key.pem \
                --command-timeout=1800s \
                snapshot save /data/backup/tmp/etcd-snapshot.db && \
              cd /data/backup/tmp; tar -czf /data/backup/etcd-snapshot-${backupfilename}.tar.gz * && \
              cd -; rm -rf /data/backup/tmp
              if [ $? -ne 0 ]; then
                exit 1
              fi
              # delete old file more than 7
              count=0
              for file in `ls -t /data/backup/*tar.gz`
              do
                count=`expr $count + 1`
                if [ $count -gt 7 ]; then
                  rm -rf $file
                fi
              done
            volumeMounts:
            - name: master-backup
              mountPath: /data/backup
            - name: etcd-certs
              mountPath: /certs
            - name: timezone
              mountPath: /etc/localtime
              readOnly: true
          volumes:
          - name: master-backup
            hostPath:
              path: /var/lib/etcd_backup
          - name: etcd-certs
            hostPath:
              path: /etc/etcd/pki
          - name: timezone
            hostPath:
              path: /etc/localtime
          restartPolicy: Never
          hostNetwork: true
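The retention loop in the CronJob scripts above (keep the 7 newest archives, delete the rest) can be exercised locally without a cluster; the sketch below uses a temporary directory in place of /data/backup:

```shell
# Create 10 dummy backup archives, then prune to the newest 7 the same way the CronJob script does
backupdir=$(mktemp -d)
for i in 0 1 2 3 4 5 6 7 8 9; do
  touch "$backupdir/etcd-snapshot-$i.tar.gz"
done
count=0
for file in `ls -t "$backupdir"/*tar.gz`
do
  count=`expr $count + 1`
  if [ $count -gt 7 ]; then   # anything past the 7 newest entries gets removed
    rm -rf "$file"
  fi
done
ls "$backupdir" | wc -l   # 7 archives remain
rm -rf "$backupdir"
```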