环境规划
类型 | 服务器IP地址 |
---|---|
k8s-deploy | 172.21.90.211 |
k8s-harbor | 172.21.90.219 |
k8s-master(3台) | 172.21.90.212/213/220 |
k8s-node(3台) | 172.21.90.214/215/221 |
k8s-etcd(3台) | 172.21.90.216/217/218 |
阿里云SLB | 47.122.7.18 |
一、harbor证书签发
将harbor离线安装包解压后,进入harbor/目录,创建certs/目录并进入
# pwd
/apps/harbor/certs
步骤参考官网Harbor docs | Configure HTTPS Access to Harbor (goharbor.io)
1、自签名CA机构
# openssl genrsa -out ca.key 4096 #私有CA key
Generating RSA private key, 4096 bit long modulus (2 primes)
..................++++
..........................................................................................................................................................................................................................................++++
e is 65537 (0x010001)
# openssl req -x509 -new -nodes -sha512 -days 3650 \
-subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=y73.harbor.com" \
-key ca.key \
-out ca.crt #自签发CA crt证书
# ll
total 20
drwxr-xr-x 2 root root 4096 Nov 20 16:52 ./
drwxr-xr-x 3 root root 4096 Nov 20 16:41 ../
-rw-r--r-- 1 root root 2053 Nov 20 16:51 ca.crt
-rw------- 1 root root 3243 Nov 20 16:45 ca.key
# touch /root/.rnd #记录证书签发信息
2、客户端域名证书申请
# openssl genrsa -out y73.harbor.com.key 4096 #harbor服务器私钥
# ll
total 20
drwxr-xr-x 2 root root 4096 Nov 20 16:52 ./
drwxr-xr-x 3 root root 4096 Nov 20 16:41 ../
-rw-r--r-- 1 root root 2053 Nov 20 16:51 ca.crt
-rw------- 1 root root 3243 Nov 20 16:45 ca.key
-rw------- 1 root root 3243 Nov 20 16:52 y73.harbor.com.key #harbor私钥key
# openssl req -sha512 -new \
-subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=y73.harbor.com" \
-key y73.harbor.com.key \
-out y73.harbor.com.csr #harbor服务器csr文件
# ll
total 24
drwxr-xr-x 2 root root 4096 Nov 20 16:55 ./
drwxr-xr-x 3 root root 4096 Nov 20 16:41 ../
-rw-r--r-- 1 root root 2053 Nov 20 16:51 ca.crt
-rw------- 1 root root 3243 Nov 20 16:45 ca.key
-rw-r--r-- 1 root root 1708 Nov 20 16:55 y73.harbor.com.csr
-rw------- 1 root root 3243 Nov 20 16:52 y73.harbor.com.key
3、准备签发环境
# cat > v3.ext <<-EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1=y73.harbor.com
DNS.2=y73.harbor
DNS.3=k8s-harbor1
EOF
#以上生成的v3.ext为证书签发SAN文件(注意:EOF结束行不能追加注释,否则heredoc不会结束)
# ll
total 28
drwxr-xr-x 2 root root 4096 Nov 20 16:58 ./
drwxr-xr-x 3 root root 4096 Nov 20 16:41 ../
-rw-r--r-- 1 root root 2053 Nov 20 16:51 ca.crt
-rw------- 1 root root 3243 Nov 20 16:45 ca.key
-rw-r--r-- 1 root root 268 Nov 20 16:58 v3.ext
-rw-r--r-- 1 root root 1708 Nov 20 16:55 y73.harbor.com.csr
-rw------- 1 root root 3243 Nov 20 16:52 y73.harbor.com.key
4、使用自签名CA签发证书
# openssl x509 -req -sha512 -days 3650 \
-extfile v3.ext \
-CA ca.crt -CAkey ca.key -CAcreateserial \
-in y73.harbor.com.csr \
-out y73.harbor.com.crt #自签发harbor证书
# ll
total 36
drwxr-xr-x 2 root root 4096 Nov 20 17:01 ./
drwxr-xr-x 3 root root 4096 Nov 20 16:41 ../
-rw-r--r-- 1 root root 2053 Nov 20 16:51 ca.crt
-rw------- 1 root root 3243 Nov 20 16:45 ca.key
-rw-r--r-- 1 root root 41 Nov 20 17:01 ca.srl
-rw-r--r-- 1 root root 268 Nov 20 16:58 v3.ext
-rw-r--r-- 1 root root 2122 Nov 20 17:01 y73.harbor.com.crt
-rw-r--r-- 1 root root 1708 Nov 20 16:55 y73.harbor.com.csr
-rw------- 1 root root 3243 Nov 20 16:52 y73.harbor.com.key
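可选:签发完成后,用openssl核对一下证书的SAN和有效期是否符合预期(示例命令):
# openssl x509 -in y73.harbor.com.crt -noout -text | grep -A1 "Subject Alternative Name"
# openssl x509 -in y73.harbor.com.crt -noout -dates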
5、安装harbor
修改配置文件引用证书
/apps/harbor/certs/y73.harbor.com.crt
/apps/harbor/certs/y73.harbor.com.key
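harbor.yml中需要修改的相关字段大致如下(以harbor离线安装包自带的harbor.yml模板为参考的示意,hostname与证书路径按本文环境填写,其余字段保持默认即可):
# vim /apps/harbor/harbor.yml
hostname: y73.harbor.com
https:
  port: 443
  certificate: /apps/harbor/certs/y73.harbor.com.crt
  private_key: /apps/harbor/certs/y73.harbor.com.key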
# ./install.sh --help
Note: Please set hostname and other necessary attributes in harbor.yml first. DO NOT use localhost or 127.0.0.1 for hostname, because Harbor needs to be accessed by external clients.
Please set --with-notary if needs enable Notary in Harbor, and set ui_url_protocol/ssl_cert/ssl_cert_key in harbor.yml bacause notary must run under https.
Please set --with-trivy if needs enable Trivy in Harbor
Please set --with-chartmuseum if needs enable Chartmuseum in Harbor
# ./install.sh --with-trivy --with-chartmuseum
安装完成后,通过浏览器访问 https://y73.harbor.com 验证harbor页面(客户端需能解析该域名)。
6、部署节点安装docker并同步harbor crt证书
在部署节点k8s-deploy创建目录
# mkdir /etc/docker/certs.d/y73.harbor.com -p
在harbor服务器上把自签发的crt证书拷贝到客户端(docker所在节点)
# pwd
/apps/harbor/certs
# scp y73.harbor.com.crt 172.21.90.189:/etc/docker/certs.d/y73.harbor.com
# systemctl restart docker    #重启docker
之后在客户端才能成功登录harbor仓库
# docker login y73.harbor.com
Username: admin
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
测试push镜像到harbor
# docker pull alpine
# docker tag alpine:latest y73.harbor.com/baseimages/alpine:latest #要先创建baseimages项目
# docker push y73.harbor.com/baseimages/alpine:latest
The push refers to repository [y73.harbor.com/baseimages/alpine]
8d3ac3489996: Pushed
latest: digest: sha256:e7d88de73db3d3fd9b2d63aa7f447a10fd0220b7cbf39803c803f2af9ba256b3 size: 528
二、ansible部署k8s集群
1、基础环境准备
root@k8s-deploy:~# apt install ansible -y
root@k8s-deploy:~# ssh-keygen
root@k8s-deploy:~# apt install sshpass #安装sshpass命令用于同步公钥到各k8s服务器
root@k8s-deploy:~# vim key.sh
#!/bin/bash
IP="
172.21.90.204
172.21.90.203
172.21.90.202
172.21.90.206
172.21.90.207
172.21.90.209
172.21.90.208
"
for node in ${IP};do
sshpass -p Abcd1234 ssh-copy-id ${node} -o StrictHostKeyChecking=no
echo "${node} 密钥copy完成"
ssh ${node} ln -sv /usr/bin/python3 /usr/bin/python #ansible需要python环境,为每个节点创建软连接
echo "${node} /usr/bin/python3 软连接创建完成"
done
root@k8s-deploy:~# bash key.sh #执行脚本同步
root@k8s-deploy:~# ln -sv /usr/bin/python3 /usr/bin/python #ansible需要python环境,为每个节点创建软连接
'/usr/bin/python' -> '/usr/bin/python3'
验证可以免密钥登录其它服务器
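例如可以抽查一台节点,确认免密登录和python软连接都已生效(IP仅为示意):
root@k8s-deploy:~# ssh 172.21.90.204 'hostname && python --version'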
2、下载kubeasz项目及组件
步骤参考github:kubeasz/00-planning_and_overall_intro.md at master · easzlab/kubeasz (github.com)
root@k8s-deploy:~# apt install git
root@k8s-deploy:~# export release=3.3.1
root@k8s-deploy:~# wget https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown
root@k8s-deploy:~# chmod +x ./ezdown
root@k8s-deploy:~# ./ezdown -D #下载kubeasz代码、二进制、默认容器镜像
上述脚本运行成功后,所有文件(kubeasz代码、二进制、离线镜像)均已整理好放入目录/etc/kubeasz
3、生成并自定义hosts文件
root@k8s-deploy:~# cd /etc/kubeasz/
root@k8s-deploy:/etc/kubeasz# ./ezctl new k8s-cluster01
2022-11-25 15:48:10 DEBUG generate custom cluster files in /etc/kubeasz/clusters/k8s-cluster01
2022-11-25 15:48:10 DEBUG set versions
2022-11-25 15:48:11 DEBUG cluster k8s-cluster01: files successfully created.
2022-11-25 15:48:11 INFO next steps 1: to config '/etc/kubeasz/clusters/k8s-cluster01/hosts'
2022-11-25 15:48:11 INFO next steps 2: to config '/etc/kubeasz/clusters/k8s-cluster01/config.yml'
然后根据提示配置 '/etc/kubeasz/clusters/k8s-cluster01/hosts' 和 '/etc/kubeasz/clusters/k8s-cluster01/config.yml':根据前面的节点规划修改hosts文件和其他集群层面的主要配置选项;其他集群组件等配置项可以在config.yml文件中修改。
1、编辑hosts文件
指定etcd节点、master节点、node节点、VIP、运行时、网络组件类型、service IP和pod IP范围等配置信息
root@k8s-deploy:~# cd /etc/kubeasz/clusters/k8s-cluster01
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster01# pwd
/etc/kubeasz/clusters/k8s-cluster01
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster01# vim hosts
以下为hosts文件
# 'etcd' cluster should have odd member(s) (1,3,5,...)
[etcd]
172.21.90.204
172.21.90.203
172.21.90.202
# master node(s)
[kube_master]
172.21.90.209
172.21.90.208
# work node(s)
[kube_node]
172.21.90.206
172.21.90.207
# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'true' to install a harbor server; 'false' to integrate with existed one
[harbor]
#192.168.1.8 NEW_INSTALL=false
# [optional] loadbalance for accessing k8s from outside
[ex_lb]
#192.168.1.6 LB_ROLE=backup EX_APISERVER_VIP=192.168.1.250 EX_APISERVER_PORT=8443
#192.168.1.7 LB_ROLE=master EX_APISERVER_VIP=192.168.1.250 EX_APISERVER_PORT=8443
# [optional] ntp server for the cluster
[chrony]
#192.168.1.1
[all:vars]
# --------- Main Variables ---------------
# Secure port for apiservers
SECURE_PORT="6443"
# Cluster container-runtime supported: docker, containerd
# if k8s version >= 1.24, docker is not supported
CONTAINER_RUNTIME="containerd"
# Network plugins supported: calico, flannel, kube-router, cilium, kube-ovn
CLUSTER_NETWORK="calico"
# Service proxy mode of kube-proxy: 'iptables' or 'ipvs'
PROXY_MODE="ipvs"
# K8S Service CIDR, not overlap with node(host) networking
SERVICE_CIDR="10.100.0.0/16"
# Cluster CIDR (Pod CIDR), not overlap with node(host) networking
CLUSTER_CIDR="10.200.0.0/16"
# NodePort Range
NODE_PORT_RANGE="30000-60000"
# Cluster DNS Domain
CLUSTER_DNS_DOMAIN="y73.local"
# -------- Additional Variables (don't change the default value right now) ---
# Binaries Directory
bin_dir="/usr/local/bin"
# Deploy Directory (kubeasz workspace)
base_dir="/etc/kubeasz"
# Directory for a specific cluster
cluster_dir="{{ base_dir }}/clusters/k8s-cluster01"
# CA and other components cert/key Directory
ca_dir="/etc/kubernetes/ssl"
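hosts文件改好后,可以先对该inventory做一次连通性测试(可选的检查示例,依赖前面已打通的免密登录):
root@k8s-deploy:/etc/kubeasz# ansible -i clusters/k8s-cluster01/hosts all -m ping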
2、编辑config.yml文件
root@k8s-deploy:~# vim /etc/kubeasz/clusters/k8s-cluster01/config.yml
############################
# prepare
############################
# 可选离线安装系统软件包 (offline|online)
INSTALL_SOURCE: "online"
# 可选进行系统安全加固 github.com/dev-sec/ansible-collection-hardening
OS_HARDEN: false
############################
# role:deploy
############################
# default: ca will expire in 100 years
# default: certs issued by the ca will expire in 50 years
CA_EXPIRY: "876000h"
CERT_EXPIRY: "438000h"
# kubeconfig 配置参数
CLUSTER_NAME: "cluster1"
CONTEXT_NAME: "context-{{ CLUSTER_NAME }}"
# k8s version
K8S_VER: "1.24.2"
############################
# role:etcd
############################
# 设置不同的wal目录,可以避免磁盘io竞争,提高性能
ETCD_DATA_DIR: "/var/lib/etcd"
ETCD_WAL_DIR: ""
############################
# role:runtime [containerd,docker]
############################
# ------------------------------------------- containerd
# [.]启用容器仓库镜像
ENABLE_MIRROR_REGISTRY: true
# [containerd]基础容器镜像
#SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.7"
SANDBOX_IMAGE: "y73.harbor.com/baseimages/pause:3.7"
# [containerd]容器持久化存储目录
CONTAINERD_STORAGE_DIR: "/var/lib/containerd"
# ------------------------------------------- docker
# [docker]容器存储目录
DOCKER_STORAGE_DIR: "/var/lib/docker"
# [docker]开启Restful API
ENABLE_REMOTE_API: false
# [docker]信任的HTTP仓库
INSECURE_REG: '["http://easzlab.io.local:5000"]'
############################
# role:kube-master
############################
# k8s 集群 master 节点证书配置,可以添加多个ip和域名(比如增加公网ip和域名)
MASTER_CERT_HOSTS:
- "10.1.1.1"
- "k8s.easzlab.io"
#- "www.test.com"
# node 节点上 pod 网段掩码长度(决定每个节点最多能分配的pod ip地址)
# 如果flannel 使用 --kube-subnet-mgr 参数,那么它将读取该设置为每个节点分配pod网段
# https://github.com/coreos/flannel/issues/847
NODE_CIDR_LEN: 24
############################
# role:kube-node
############################
# Kubelet 根目录
KUBELET_ROOT_DIR: "/var/lib/kubelet"
# node节点最大pod 数
MAX_PODS: 500
# 配置为kube组件(kubelet,kube-proxy,dockerd等)预留的资源量
# 数值设置详见templates/kubelet-config.yaml.j2
KUBE_RESERVED_ENABLED: "no"
# k8s 官方不建议草率开启 system-reserved, 除非你基于长期监控,了解系统的资源占用状况;
# 并且随着系统运行时间,需要适当增加资源预留,数值设置详见templates/kubelet-config.yaml.j2
# 系统预留设置基于 4c/8g 虚机,最小化安装系统服务,如果使用高性能物理机可以适当增加预留
# 另外,集群安装时候apiserver等资源占用会短时较大,建议至少预留1g内存
SYS_RESERVED_ENABLED: "no"
############################
# role:network [flannel,calico,cilium,kube-ovn,kube-router]
############################
# ------------------------------------------- flannel
# [flannel]设置flannel 后端"host-gw","vxlan"等
FLANNEL_BACKEND: "vxlan"
DIRECT_ROUTING: false
# [flannel] flanneld_image: "quay.io/coreos/flannel:v0.10.0-amd64"
flannelVer: "v0.15.1"
flanneld_image: "easzlab.io.local:5000/easzlab/flannel:{{ flannelVer }}"
# ------------------------------------------- calico
# [calico]设置 CALICO_IPV4POOL_IPIP=“off”,可以提高网络性能,条件限制详见 docs/setup/calico.md
CALICO_IPV4POOL_IPIP: "Always" #若改为off可提高网络性能,但要求node(node1、node2、node3……)节点都在同一个子网
# [calico]设置 calico-node使用的host IP,bgp邻居通过该地址建立,可手工指定也可以自动发现
IP_AUTODETECTION_METHOD: "can-reach={{ groups['kube_master'][0] }}"
# [calico]设置calico 网络 backend: brid, vxlan, none
CALICO_NETWORKING_BACKEND: "brid"
# [calico]设置calico 是否使用route reflectors
# 如果集群规模超过50个节点,建议启用该特性
CALICO_RR_ENABLED: false
# CALICO_RR_NODES 配置route reflectors的节点,如果未设置默认使用集群master节点
# CALICO_RR_NODES: ["192.168.1.1", "192.168.1.2"]
CALICO_RR_NODES: []
# [calico]更新支持calico 版本: [v3.3.x] [v3.4.x] [v3.8.x] [v3.15.x]
calico_ver: "v3.19.4"
# [calico]calico 主版本
calico_ver_main: "{{ calico_ver.split('.')[0] }}.{{ calico_ver.split('.')[1] }}"
# ------------------------------------------- cilium
# [cilium]镜像版本
cilium_ver: "1.11.6"
cilium_connectivity_check: true
cilium_hubble_enabled: false
cilium_hubble_ui_enabled: false
# ------------------------------------------- kube-ovn
# [kube-ovn]选择 OVN DB and OVN Control Plane 节点,默认为第一个master节点
OVN_DB_NODE: "{{ groups['kube_master'][0] }}"
# [kube-ovn]离线镜像tar包
kube_ovn_ver: "v1.5.3"
# ------------------------------------------- kube-router
# [kube-router]公有云上存在限制,一般需要始终开启 ipinip;自有环境可以设置为 "subnet"
OVERLAY_TYPE: "full"
# [kube-router]NetworkPolicy 支持开关
FIREWALL_ENABLE: true
# [kube-router]kube-router 镜像版本
kube_router_ver: "v0.3.1"
busybox_ver: "1.28.4"
############################
# role:cluster-addon
############################
# coredns 自动安装
dns_install: "no"
corednsVer: "1.9.3"
ENABLE_LOCAL_DNS_CACHE: false #在node本地开启缓存,node会找它做域名解析,有缓存就返回,没有缓存就转给coredns
dnsNodeCacheVer: "1.21.1"
# 设置 local dns cache 地址
LOCAL_DNS_CACHE: "169.254.20.10"
# metric server 自动安装
metricsserver_install: "no"
metricsVer: "v0.5.2"
# dashboard 自动安装
dashboard_install: "no"
dashboardVer: "v2.5.1"
dashboardMetricsScraperVer: "v1.0.8"
# prometheus 自动安装
prom_install: "no"
prom_namespace: "monitor"
prom_chart_ver: "35.5.1"
# nfs-provisioner 自动安装
nfs_provisioner_install: "no"
nfs_provisioner_namespace: "kube-system"
nfs_provisioner_ver: "v4.0.2"
nfs_storage_class: "managed-nfs-storage"
nfs_server: "192.168.1.10"
nfs_path: "/data/nfs"
# network-check 自动安装
network_check_enabled: false
network_check_schedule: "*/5 * * * *"
############################
# role:harbor
############################
# harbor version,完整版本号
HARBOR_VER: "v2.1.3"
HARBOR_DOMAIN: "harbor.easzlab.io.local"
HARBOR_TLS_PORT: 8443
# if set 'false', you need to put certs named harbor.pem and harbor-key.pem in directory 'down'
HARBOR_SELF_SIGNED_CERT: true
# install extra component
HARBOR_WITH_NOTARY: false
HARBOR_WITH_TRIVY: false
HARBOR_WITH_CLAIR: false
HARBOR_WITH_CHARTMUSEUM: true
4、部署k8s集群
通过ansible脚本初始化环境及部署k8s高可用集群
1、环境初始化
root@k8s-deploy:/etc/kubeasz# ./ezctl --help
Usage: ezctl COMMAND [args]
-------------------------------------------------------------------------------------
Cluster setups:
list to list all of the managed clusters
checkout <cluster> to switch default kubeconfig of the cluster
new <cluster> to start a new k8s deploy with name 'cluster'
setup <cluster> <step> to setup a cluster, also supporting a step-by-step way
start <cluster> to start all of the k8s services stopped by 'ezctl stop'
stop <cluster> to stop all of the k8s services temporarily
upgrade <cluster> to upgrade the k8s cluster
destroy <cluster> to destroy the k8s cluster
backup <cluster> to backup the cluster state (etcd snapshot)
restore <cluster> to restore the cluster state from backups
start-aio to quickly setup an all-in-one cluster with 'default' settings
Cluster ops:
add-etcd <cluster> <ip> to add a etcd-node to the etcd cluster
add-master <cluster> <ip> to add a master node to the k8s cluster
add-node <cluster> <ip> to add a work node to the k8s cluster
del-etcd <cluster> <ip> to delete a etcd-node from the etcd cluster
del-master <cluster> <ip> to delete a master node from the k8s cluster
del-node <cluster> <ip> to delete a work node from the k8s cluster
Extra operation:
kcfg-adm <cluster> <args> to manage client kubeconfig of the k8s cluster
Use "ezctl help <command>" for more information about a given command.
准备CA和基础环境设置
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster01 01
2、部署etcd集群
可更改启动脚本路径及版本等自定义配置
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster01 02
各etcd服务器验证etcd服务
root@k8s-etcd1:~# export NODE_IPS="192.168.0.116 192.168.0.117 192.168.0.118"
root@k8s-etcd1:~# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health; done
https://192.168.0.116:2379 is healthy: successfully committed proposal: took = 11.389437ms
https://192.168.0.117:2379 is healthy: successfully committed proposal: took = 11.717108ms
https://192.168.0.118:2379 is healthy: successfully committed proposal: took = 13.212932ms
3、部署运行时
master和node节点都要安装运行时(docker或者containerd)。可以由部署工具统一安装,也可以自行用yum或二进制方式安装,因此该步骤为可选步骤。
# 验证基础容器镜像
root@k8s-deploy:/etc/kubeasz# grep SANDBOX_IMAGE ./clusters/* -R
./clusters/k8s-cluster01/config.yml:#SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.7"
./clusters/k8s-cluster01/config.yml:SANDBOX_IMAGE: "y73.harbor.com/baseimages/pause:3.7"
./clusters/k8s-cluster02/config.yml:SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.7"
使用containerd需要修改镜像源为部署的harbor,修改ansible里的containerd的配置文件模板
root@k8s-deploy:/etc/kubeasz# vim roles/containerd/templates/config.toml.j2
#在147行下面添加
158 [plugins."io.containerd.grpc.v1.cri".registry.mirrors."y73.harbor.com"]
159 endpoint = ["https://y73.harbor.com"]
160 [plugins."io.containerd.grpc.v1.cri".registry.configs."y73.harbor.com".tls]
161 insecure_skip_verify = true
162 [plugins."io.containerd.grpc.v1.cri".registry.configs."y73.harbor.com".auth]
163 username = "admin"
164 password = "123456"
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster01 03
在master和node节点验证containerd能否从harbor下载镜像
root@k8s-master1:~# systemctl status containerd.service
# 查看配置文件也改好了
root@k8s-master1:~# cat /etc/containerd/config.toml | grep "y73.harbor.com"
sandbox_image = "y73.harbor.com/baseimages/pause:3.7"
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."y73.harbor.com"]
endpoint = ["https://y73.harbor.com"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."y73.harbor.com".tls]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."y73.harbor.com".auth]
# 下载镜像看看
root@k8s-master1:~# crictl pull y73.harbor.com/baseimages/alpine@sha256:c0d488a800e4127c334ad20d61d7bc21b4097540327217dfab52262adc02380c
Image is up to date for sha256:49176f190c7e9cdb51ac85ab6c6d5e4512352218190cd69b08e6fd803ffbf3da
root@k8s-master1:~# crictl images
IMAGE TAG IMAGE ID SIZE
y73.harbor.com/baseimages/alpine <none> 49176f190c7e9 3.37MB
4、部署master
可选更改启动脚本参数及路径等自定义功能
# 可自定义配置
root@k8s-deploy:/etc/kubeasz# vim roles/kube-master/tasks/main.yml
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster01 04
# 查看node。Ready说明master是就绪的,SchedulingDisabled表示它的调度被关闭了,这是因为master作为管理节点,不运行业务,pod也不会被调度到master上
root@k8s-deploy:/etc/kubeasz# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
192.168.0.110 Ready,SchedulingDisabled master 6h24m v1.24.2 192.168.0.110 <none> Ubuntu 20.04.3 LTS 5.4.0-81-generic containerd://1.6.4
192.168.0.111 Ready,SchedulingDisabled master 6h24m v1.24.2 192.168.0.111 <none> Ubuntu 20.04.3 LTS 5.4.0-81-generic containerd://1.6.4
# 查看pod
root@k8s-deploy:/etc/kubeasz# kubectl get pod -A -o wide
No resources found
5、部署node
可选更改启动脚本参数及路径等自定义功能
# 可自定义配置
root@k8s-deploy:/etc/kubeasz# vim roles/kube-node/tasks/
create-kubelet-kubeconfig.yml main.yml
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster01 05
# 查看node
root@k8s-deploy:/etc/kubeasz# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.0.110 Ready,SchedulingDisabled master 6h46m v1.24.2
192.168.0.111 Ready,SchedulingDisabled master 6h46m v1.24.2
192.168.0.113 Ready node 9m46s v1.24.2
192.168.0.114 Ready node 9m46s v1.24.2
6、部署网络服务calico
将calico用到的4个镜像上传到harbor,并把roles/calico/templates/calico-v3.19.yaml.j2文件里的镜像源改为harbor地址,实现标准化方便管理。
# 修改镜像源
root@k8s-deploy:/etc/kubeasz# cat roles/calico/templates/calico-v3.19.yaml.j2 | grep image
image: y73.harbor.com/baseimages/calico-cni:v3.19.4
image: y73.harbor.com/baseimages/calico-pod2daemon-flexvol:v3.19.4
image: y73.harbor.com/baseimages/calico-node:v3.19.4
image: y73.harbor.com/baseimages/calico-kube-controllers:v3.19.4
# 上传镜像到harbor
root@k8s-deploy:/etc/kubeasz# docker tag easzlab.io.local:5000/calico/node:v3.19.4 y73.harbor.com/baseimages/calico-node:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push y73.harbor.com/baseimages/calico-node:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker tag calico/pod2daemon-flexvol:v3.19.4 y73.harbor.com/baseimages/calico-pod2daemon-flexvol:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push y73.harbor.com/baseimages/calico-pod2daemon-flexvol:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker tag calico/cni:v3.19.4 y73.harbor.com/baseimages/calico-cni:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push y73.harbor.com/baseimages/calico-cni:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker tag calico/kube-controllers:v3.19.4 y73.harbor.com/baseimages/calico-kube-controllers:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push y73.harbor.com/baseimages/calico-kube-controllers:v3.19.4
# 安装
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster01 06
# 验证节点都为running
root@k8s-deploy:/etc/kubeasz# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-77c59f4758-sg7hl 1/1 Running 11 (7m49s ago) 51m
kube-system calico-node-g56wb 1/1 Running 0 51m
kube-system calico-node-lf95c 1/1 Running 4 (39m ago) 51m
kube-system calico-node-t2bfx 1/1 Running 11 (5m37s ago) 51m
kube-system calico-node-v2dg7 1/1 Running 2 (48m ago) 51m
在master或者node节点验证calico
root@k8s-master1:~# calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 192.168.0.111 | node-to-node mesh | start | 16:40:32 | Passive |
| 192.168.0.113 | node-to-node mesh | up | 16:40:49 | Established |
| 192.168.0.114 | node-to-node mesh | up | 16:40:32 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
7、验证网络
创建pod验证跨主机通信
# 创建命名空间
root@k8s-deploy:~# kubectl create ns myserver
namespace/myserver created
# 创建pod
root@k8s-deploy:~# kubectl run net-test1 --image=alpine sleep 1000000 -n myserver
pod/net-test1 created
root@k8s-deploy:~# kubectl run net-test2 --image=alpine sleep 1000000 -n myserver
pod/net-test2 created
root@k8s-deploy:~# kubectl get pod -o wide -n myserver
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
net-test1 1/1 Running 0 32s 172.20.36.65 192.168.0.113 <none> <none>
net-test2 1/1 Running 0 25s 172.20.169.129 192.168.0.114 <none> <none>
#进入pod进行验证
root@k8s-deploy:~# kubectl -n myserver exec -it net-test1 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ #
/ # ping 110.242.68.66 #因为没有DNS解析,所以直接ping外网地址
PING 110.242.68.66 (110.242.68.66): 56 data bytes
64 bytes from 110.242.68.66: seq=0 ttl=50 time=29.780 ms
64 bytes from 110.242.68.66: seq=1 ttl=50 time=30.113 ms
64 bytes from 110.242.68.66: seq=2 ttl=50 time=28.620 ms
^C
/ # ping 172.20.169.129 #ping内网地址
PING 172.20.169.129 (172.20.169.129): 56 data bytes
64 bytes from 172.20.169.129: seq=0 ttl=62 time=0.913 ms
64 bytes from 172.20.169.129: seq=1 ttl=62 time=0.962 ms
64 bytes from 172.20.169.129: seq=2 ttl=62 time=1.071 ms
^C
8、负载均衡配置
Kubernetes kubeconfig配置文件详细解读_富士康质检员张全蛋的博客-CSDN博客_kubeconfig
kubectl使用kubeconfig认证文件连接k8s集群。在部署节点k8s-deploy上编辑/root/.kube/config文件,把其中认证地址(server字段)的IP改成负载均衡的地址。
这里的负载均衡只负责转发请求到apiserver服务器,而且它也连不上etcd,由apiserver服务器来进行认证、鉴权、准入、执行操作等。
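下面是把server地址改成SLB地址的示意(47.122.7.18为前文环境规划中的阿里云SLB,6443端口与kubeconfig里的集群名cluster1均为本文配置的假设,实际以自己的kubeconfig内容为准):
root@k8s-deploy:~# vim /root/.kube/config
server: https://47.122.7.18:6443    #只改server这一行
# 或者用kubectl config直接修改
root@k8s-deploy:~# kubectl config set-cluster cluster1 --server=https://47.122.7.18:6443
root@k8s-deploy:~# kubectl get nodes    #验证通过负载均衡仍能正常访问apiserver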
9、命令补全
# 当前环境生效
root@k8s-deploy:~# source <(kubectl completion bash)
# 开机生效,在末尾添加
root@k8s-deploy:~# vim /etc/profile
source <(kubectl completion bash)
5、集群节点伸缩管理
主要是添加master、添加node、删除master和删除node等节点管理和监控。
node节点上有一个负载均衡服务kube-lb,其配置文件是自动生成的,内容如下;添加master节点时会推送新的配置文件。
root@k8s-node2:~# systemctl status kube-lb.service
root@k8s-node1:~# vim /etc/kube-lb/conf/kube-lb.conf
user root;
worker_processes 1;
error_log /etc/kube-lb/logs/error.log warn;
events {
worker_connections 3000;
}
stream {
upstream backend {
server 192.168.0.110:6443 max_fails=2 fail_timeout=3s;
server 192.168.0.111:6443 max_fails=2 fail_timeout=3s;
}
server {
listen 127.0.0.1:6443;
proxy_connect_timeout 1s;
proxy_pass backend;
}
}
当前集群状态:
root@k8s-deploy:~# kubectl get node
NAME STATUS ROLES AGE VERSION
192.168.0.110 Ready,SchedulingDisabled master 22h v1.24.2
192.168.0.111 Ready,SchedulingDisabled master 22h v1.24.2
192.168.0.113 Ready node 15h v1.24.2
192.168.0.114 Ready node 15h v1.24.2
1、添加master节点
# 因为新添加的节点在“基础环境准备”步骤没做免密钥登录,所以需要把密钥拷过去。
root@k8s-deploy:/etc/kubeasz# sshpass -p 123456 ssh-copy-id 192.168.0.112 -o StrictHostKeyChecking=no
# 要在新节点修改hosts文件,因为使用的镜像源是自建harbor
root@k8s-master003:~# echo "172.21.90.219 y73.harbor.com" >> /etc/hosts
# 添加master节点
root@k8s-deploy:/etc/kubeasz# ./ezctl add-master k8s-cluster01 192.168.0.112
2、添加node节点
当前集群状态
root@k8s-deploy:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 29m v1.24.2
172.21.90.213 Ready,SchedulingDisabled master 29m v1.24.2
172.21.90.214 Ready node 27m v1.24.2
172.21.90.215 Ready node 27m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 12m v1.24.2
# 因为新添加的节点在“基础环境准备”步骤没做免密钥登录,所以需要把密钥拷过去。
root@k8s-deploy:/etc/kubeasz# sshpass -p Abcd1234 ssh-copy-id 172.21.90.221 -o StrictHostKeyChecking=no
# 要在新节点修改hosts文件,因为使用的镜像源是自建harbor
root@k8s-node003:~# echo "172.21.90.219 y73.harbor.com" >> /etc/hosts
# 添加node节点
root@k8s-deploy:/etc/kubeasz# ./ezctl add-node k8s-cluster01 172.21.90.221
3、验证当前节点
root@k8s-deploy:/etc/kubeasz# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 35m v1.24.2
172.21.90.213 Ready,SchedulingDisabled master 35m v1.24.2
172.21.90.214 Ready node 33m v1.24.2
172.21.90.215 Ready node 33m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 18m v1.24.2
172.21.90.221 Ready node 3m4s v1.24.2
4、验证calico状态
# 在master或者node节点操作
root@k8s-master002:~# calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 172.21.90.212 | node-to-node mesh | up | 09:04:02 | Established |
| 172.21.90.220 | node-to-node mesh | up | 09:04:01 | Established |
| 172.21.90.214 | node-to-node mesh | up | 09:04:01 | Established |
| 172.21.90.215 | node-to-node mesh | up | 09:04:12 | Established |
| 172.21.90.221 | node-to-node mesh | up | 09:16:13 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
5、验证node节点路由
root@k8s-master002:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.21.255.253 0.0.0.0 UG 100 0 0 eth0
10.200.6.128 172.21.90.215 255.255.255.192 UG 0 0 0 tunl0
10.200.8.64 172.21.90.220 255.255.255.192 UG 0 0 0 tunl0
10.200.121.64 172.21.90.212 255.255.255.192 UG 0 0 0 tunl0
10.200.150.192 172.21.90.214 255.255.255.192 UG 0 0 0 tunl0
10.200.167.128 0.0.0.0 255.255.255.192 U 0 0 0 *
10.200.196.192 172.21.90.221 255.255.255.192 UG 0 0 0 tunl0
172.21.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
172.21.255.253 0.0.0.0 255.255.255.255 UH 100 0 0 eth0
6、升级k8s
从GitHub下载4个二进制包(参考 kubernetes/CHANGELOG-1.24.md at master · kubernetes/kubernetes (github.com)),解压后得到kubernetes/目录。
1、升级master节点
1、修改kube-lb.conf
在每个node节点修改负载均衡配置文件/etc/kube-lb/conf/kube-lb.conf:先升级master001,就把对应的server行注释掉,这样node上的请求就不会再转发到master001的apiserver,master001相当于下线,不会影响业务。
# 修改配置文件,注释掉master001
root@k8s-node001:~# vim /etc/kube-lb/conf/kube-lb.conf
# 改好后重新加载配置文件
root@k8s-node001:~# systemctl reload kube-lb.service
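注释后的upstream段大致如下(以本文172.21.90.x环境的三台master为例,仅作示意):
stream {
    upstream backend {
        #server 172.21.90.212:6443 max_fails=2 fail_timeout=3s;    #注释掉待升级的master001
        server 172.21.90.213:6443 max_fails=2 fail_timeout=3s;
        server 172.21.90.220:6443 max_fails=2 fail_timeout=3s;
    }
}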
2、master节点停止服务
在每个node节点修改完配置文件后,要在master001上停止以下服务,因为这些服务此时还在调用当前版本k8s的二进制文件。需要关闭的进程如下:
- kube-apiserver.service
- kube-controller-manager.service
- kubelet.service
- kube-scheduler.service
- kube-proxy.service
root@k8s-master001:~# systemctl stop kube-apiserver.service kube-controller-manager.service kubelet.service kube-scheduler.service kube-proxy.service
3、替换二进制文件
在k8s-deploy端,把刚刚上传的4个包解压后,进入kubernetes/server/bin/目录,把对应的二进制文件拷贝到master001的/usr/local/bin/目录下。
# 部署端拷贝二进制文件
root@k8s-deploy:/usr/local/src# cd kubernetes/server/bin/
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# scp kube-apiserver kube-controller-manager kubelet kube-proxy kube-scheduler kubectl 172.21.90.212:/usr/local/bin/
kube-apiserver 100% 120MB 188.7MB/s 00:00
kube-controller-manager 100% 110MB 199.1MB/s 00:00
kubelet 100% 111MB 190.7MB/s 00:00
kube-proxy 100% 40MB 189.9MB/s 00:00
kube-scheduler 100% 45MB 208.0MB/s 00:00
kubectl 100% 44MB 201.5MB/s 00:00
# 拷贝好后,在master001查看版本是否已更新,确认无误后启动这几个服务即可
root@k8s-master001:~# /usr/local/bin/kube-apiserver --version
Kubernetes v1.24.8
# 启动服务
root@k8s-master001:~# systemctl start kube-apiserver.service kube-controller-manager.service kubelet.service kube-scheduler.service kube-proxy.service
如果还要修改service文件,比如修改kube-proxy的service文件,要在"拷贝二进制文件后、启动服务前"这段时间进行,改好后再启动服务。
4、验证版本
启动好服务后,进行验证,发现master001版本已经变为1.24.8
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 5h20m v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 5h20m v1.24.2
172.21.90.214 Ready node 5h18m v1.24.2
172.21.90.215 Ready node 5h18m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 5h2m v1.24.2
172.21.90.221 Ready node 4h47m v1.24.2
注:“VERSION”列里的v1.24.8是谁来决定的呢?是kubelet来决定的。node节点没有api server,但是只要把kubelet更新后,node节点的版本也会变成v1.24.8。
5、剩余master节点升级
在每个node节点修改负载均衡配置文件/etc/kube-lb/conf/kube-lb.conf,把master001取消注释,剩余的master节点进行注释,一次性把剩余节点升级。
# 修改配置文件,取消注释master001,把剩余节点注释
root@k8s-node002:~# vim /etc/kube-lb/conf/kube-lb.conf
# 改好后重新加载配置文件
root@k8s-node002:~# systemctl reload kube-lb.service
# 剩余master节点关掉服务
root@k8s-master002:~# systemctl stop kube-apiserver.service kube-controller-manager.service kubelet.service kube-scheduler.service kube-proxy.service
# 把二进制文件拷贝过去
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# scp kube-apiserver kube-controller-manager kubelet kube-proxy kube-scheduler kubectl 172.21.90.213:/usr/local/bin/
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# scp kube-apiserver kube-controller-manager kubelet kube-proxy kube-scheduler kubectl 172.21.90.220:/usr/local/bin/
#在剩余节点验证版本
root@k8s-master002:~# /usr/local/bin/kube-apiserver --version
Kubernetes v1.24.8
root@k8s-master003:~# /usr/local/bin/kube-apiserver --version
Kubernetes v1.24.8
# 剩余节点启动服务
root@k8s-master002:~# systemctl start kube-apiserver.service kube-controller-manager.service kubelet.service kube-scheduler.service kube-proxy.service
root@k8s-master003:~# systemctl start kube-apiserver.service kube-controller-manager.service kubelet.service kube-scheduler.service kube-proxy.service
# 验证集群状态,发现master版本都已经更新,但是node版本还是旧的
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 5h39m v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 5h39m v1.24.8
172.21.90.214 Ready node 5h37m v1.24.2
172.21.90.215 Ready node 5h37m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 5h22m v1.24.8
172.21.90.221 Ready node 5h6m v1.24.2
2、升级node节点
1、驱逐node节点的业务pod
上一步更新好后发现master版本都已经更新,但是node版本还是旧的。因为master节点不运行pod,业务pod都运行在node节点上,所以要先把业务pod驱逐才可以升级node节点
# 驱逐pod
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.214
node/172.21.90.214 cordoned
error: unable to drain node "172.21.90.214" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-lx27k, continuing command...
There are pending nodes to be drained:
172.21.90.214
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-lx27k
root@k8s-deploy:/usr/local/src/kubernetes/server/bin#
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.214 --ignore-daemonsets
node/172.21.90.214 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-lx27k
evicting pod kube-system/calico-kube-controllers-77c59f4758-wmmqg
pod/calico-kube-controllers-77c59f4758-wmmqg evicted
node/172.21.90.214 drained
root@k8s-deploy:/usr/local/src/kubernetes/server/bin#
# 查看集群状态,发现node 172.21.90.214的状态变成"SchedulingDisabled",不会再往上调度pod了
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 5h49m v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 5h49m v1.24.8
172.21.90.214 Ready,SchedulingDisabled node 5h47m v1.24.2
172.21.90.215 Ready node 5h47m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 5h32m v1.24.8
172.21.90.221 Ready node 5h17m v1.24.2
2、node节点停止服务
node001节点停止两个服务
- kubelet.service
- kube-proxy.service
root@k8s-node001:~# systemctl stop kubelet.service kube-proxy.service
3、替换二进制文件
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# scp kubelet kube-proxy kubectl 172.21.90.214:/usr/local/bin/
kubelet 100% 111MB 189.3MB/s 00:00
kube-proxy 100% 40MB 189.9MB/s 00:00
kubectl 100% 44MB 189.6MB/s 00:00
root@k8s-deploy:/usr/local/src/kubernetes/server/bin#
4、启动服务
root@k8s-node001:~# systemctl start kubelet.service kube-proxy.service
5、验证版本
发现node172.21.90.214版本也变为v1.24.8。但是状态还是“SchedulingDisabled”
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 6h3m v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 6h3m v1.24.8
172.21.90.214 Ready,SchedulingDisabled node 6h1m v1.24.8
172.21.90.215 Ready node 6h1m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 5h46m v1.24.8
172.21.90.221 Ready node 5h31m v1.24.2
root@k8s-deploy:/usr/local/src/kubernetes/server/bin#
6、取消SchedulingDisabled状态
在部署节点取消node节点的“SchedulingDisabled”状态
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl uncordon 172.21.90.214
node/172.21.90.214 uncordoned
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 6h7m v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 6h7m v1.24.8
172.21.90.214 Ready node 6h5m v1.24.8
172.21.90.215 Ready node 6h5m v1.24.2
172.21.90.220 Ready,SchedulingDisabled master 5h50m v1.24.8
172.21.90.221 Ready node 5h35m v1.24.2
7、剩余node节点升级
# 驱逐node节点172.21.90.215
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.215
node/172.21.90.215 cordoned
error: unable to drain node "172.21.90.215" due to error:[cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-zt6w8, cannot delete Pods declare no controller (use --force to override): myserver/net-test1, myserver/net-test2], continuing command...
There are pending nodes to be drained:
172.21.90.215
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-zt6w8
cannot delete Pods declare no controller (use --force to override): myserver/net-test1, myserver/net-test2
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.215 --ignore-daemonsets
node/172.21.90.215 already cordoned
error: unable to drain node "172.21.90.215" due to error:cannot delete Pods declare no controller (use --force to override): myserver/net-test1, myserver/net-test2, continuing command...
There are pending nodes to be drained:
172.21.90.215
cannot delete Pods declare no controller (use --force to override): myserver/net-test1, myserver/net-test2
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.215 --ignore-daemonsets --force
node/172.21.90.215 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-zt6w8; deleting Pods that declare no controller: myserver/net-test1, myserver/net-test2
evicting pod myserver/net-test2
evicting pod myserver/net-test1
pod/net-test2 evicted
pod/net-test1 evicted
node/172.21.90.215 drained
# 驱逐node节点
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.221
node/172.21.90.221 cordoned
error: unable to drain node "172.21.90.221" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-twp6j, continuing command...
There are pending nodes to be drained:
172.21.90.221
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-twp6j
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl drain 172.21.90.221 --ignore-daemonsets
node/172.21.90.221 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-twp6j
evicting pod kube-system/calico-kube-controllers-77c59f4758-whcv8
pod/calico-kube-controllers-77c59f4758-whcv8 evicted
node/172.21.90.221 drained
# 两个节点停止服务
root@k8s-node002:~# systemctl stop kubelet.service kube-proxy.service
root@k8s-node003:~# systemctl stop kubelet.service kube-proxy.service
# 替换二进制文件
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# scp kubelet kube-proxy kubectl 172.21.90.215:/usr/local/bin/
kubelet 100% 111MB 183.9MB/s 00:00
kube-proxy 100% 40MB 206.1MB/s 00:00
kubectl 100% 44MB 179.2MB/s 00:00
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# scp kubelet kube-proxy kubectl 172.21.90.221:/usr/local/bin/
kubelet 100% 111MB 186.2MB/s 00:00
kube-proxy 100% 40MB 179.8MB/s 00:00
kubectl 100% 44MB 189.8MB/s 00:00
root@k8s-deploy:/usr/local/src/kubernetes/server/bin#
# 启动服务
root@k8s-node002:~# systemctl start kubelet.service kube-proxy.service
root@k8s-node003:~# systemctl start kubelet.service kube-proxy.service
# 在部署节点取消node节点的“SchedulingDisabled”状态
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl uncordon 172.21.90.215
node/172.21.90.215 uncordoned
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl uncordon 172.21.90.221
node/172.21.90.221 uncordoned
# 验证版本
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 6h19m v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 6h19m v1.24.8
172.21.90.214 Ready node 6h17m v1.24.8
172.21.90.215 Ready node 6h17m v1.24.8
172.21.90.220 Ready,SchedulingDisabled master 6h2m v1.24.8
172.21.90.221 Ready node 5h47m v1.24.8
3、修改kube-lb.conf
至此,当前集群的master节点和node节点都已经升级好了。在每个node节点修改负载均衡配置文件/etc/kube-lb/conf/kube-lb.conf,把注释全部取消,否则请求不会转发给其他apiserver服务器。
# 在每个node节点执行
root@k8s-node001:~# vim /etc/kube-lb/conf/kube-lb.conf
root@k8s-node001:~# systemctl reload kube-lb.service
4、拷贝二进制文件
此时k8s已经升级完成,但还要把新的二进制文件拷贝到/etc/kubeasz/bin/目录,这样以后添加节点用的就都是新版本。
# 此时/etc/kubeasz/bin/目录里的二进制文件还是旧的
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# /etc/kubeasz/bin/kube-apiserver --version
Kubernetes v1.24.2
# 把新的二进制文件拷贝过去
root@k8s-deploy:/usr/local/src/kubernetes/server/bin# \cp kube-apiserver kube-controller-manager kubelet kube-proxy kube-scheduler kubectl /etc/kubeasz/bin/
# 在cp指令前面加反斜杠可以不弹出是否覆盖的询问而直接覆盖!
7、升级containerd
1、下载二进制包
root@k8s-deploy:/usr/local/src# wget https://github.com/containerd/containerd/releases/download/v1.6.10/containerd-1.6.10-linux-amd64.tar.gz
2、查看kubeasz里containerd二进制文件存放路径
root@k8s-deploy:/usr/local/src# vim /etc/kubeasz/roles/containerd/tasks/main.yml
发现二进制文件存放在/etc/kubeasz/bin/containerd-bin/目录下,而且还有runc和客户端命令行工具ctr、crictl,那么就把它们也一并升级。
root@k8s-deploy:/usr/local/src# ll /etc/kubeasz/bin/containerd-bin/
total 180436
drwxr-xr-x 2 root root 4096 Jun 22 22:56 ./
drwxr-xr-x 3 root root 4096 Nov 27 16:09 ../
-rwxr-xr-x 1 root root 59584224 May 3 2022 containerd*
-rwxr-xr-x 1 root root 7389184 May 3 2022 containerd-shim*
-rwxr-xr-x 1 root root 9555968 May 3 2022 containerd-shim-runc-v1*
-rwxr-xr-x 1 root root 9580544 May 3 2022 containerd-shim-runc-v2*
-rwxr-xr-x 1 root root 25735456 May 3 2022 containerd-stress*
-rwxr-xr-x 1 root root 33228961 May 27 2022 crictl*
-rwxr-xr-x 1 root root 30256896 May 3 2022 ctr*
-rwxr-xr-x 1 root root 9419136 Jun 22 22:55 runc*
# 查看kubeasz里的runc版本为旧版本需要升级
root@k8s-deploy:/usr/local/src# /etc/kubeasz/bin/containerd-bin/runc --version
runc version 1.1.2
commit: v1.1.2-0-ga916309f
spec: 1.0.2-dev
go: go1.17.10
libseccomp: 2.5.3
3、解压containerd并上传runc和nerdctl
把containerd压缩包解压后得到bin/目录,进入并上传runc二进制包和nerdctl客户端命令行二进制文件
# 解压containerd
root@k8s-deploy:/usr/local/src# tar xvf containerd-1.6.10-linux-amd64.tar.gz
bin/
bin/ctr
bin/containerd
bin/containerd-shim
bin/containerd-stress
bin/containerd-shim-runc-v2
bin/containerd-shim-runc-v1
# 上传runc二进制文件,并给执行权限
root@k8s-deploy:/usr/local/src# cd bin/
root@k8s-deploy:/usr/local/src/bin# mv runc.amd64 runc
root@k8s-deploy:/usr/local/src/bin# chmod a+x runc
# 解压nerdctl二进制包,解压后包含3个二进制文件
root@k8s-deploy:/usr/local/src/bin# tar xvf nerdctl-1.0.0-linux-amd64.tar.gz
nerdctl
containerd-rootless-setuptool.sh
containerd-rootless.sh
root@k8s-deploy:/usr/local/src/bin# rm nerdctl-1.0.0-linux-amd64.tar.gz
# 解压crictl二进制包
root@k8s-deploy:/usr/local/src/bin# tar xvf crictl-v1.25.0-linux-amd64.tar.gz
crictl
root@k8s-deploy:/usr/local/src/bin# rm crictl-v1.25.0-linux-amd64.tar.gz
root@k8s-deploy:/usr/local/src/bin# ll
total 209360
drwxr-xr-x 2 root root 4096 Nov 27 23:56 ./
drwxr-xr-x 5 root root 4096 Nov 27 23:36 ../
-rwxr-xr-x 1 root root 51521464 Nov 15 02:20 containerd*
-rwxr-xr-x 1 root root 21562 Oct 21 21:10 containerd-rootless-setuptool.sh*
-rwxr-xr-x 1 root root 7032 Oct 21 21:10 containerd-rootless.sh*
-rwxr-xr-x 1 root root 7254016 Nov 15 02:20 containerd-shim*
-rwxr-xr-x 1 root root 9355264 Nov 15 02:20 containerd-shim-runc-v1*
-rwxr-xr-x 1 root root 9375744 Nov 15 02:20 containerd-shim-runc-v2*
-rwxr-xr-x 1 root root 22735192 Nov 15 02:20 containerd-stress*
-rwxr-xr-x 1 1001 docker 50311268 Aug 26 15:18 crictl*
-rwxr-xr-x 1 root root 26708024 Nov 15 02:20 ctr*
-rwxr-xr-x 1 root root 27639808 Oct 21 21:11 nerdctl*
-rwxr-xr-x 1 root root 9431456 Nov 27 23:40 runc*
# 查看版本是否正确
root@k8s-deploy:/usr/local/src/bin# ./crictl -v
crictl version v1.25.0
root@k8s-deploy:/usr/local/src/bin# ./nerdctl -v
nerdctl version 1.0.0
root@k8s-deploy:/usr/local/src/bin# ./runc -v
runc version 1.1.4
commit: v1.1.4-0-g5fd4c4d1
spec: 1.0.2-dev
go: go1.17.10
libseccomp: 2.5.4
4、替换kubeasz二进制文件
把kubeasz里的旧版本containerd和runc二进制文件替换掉
root@k8s-deploy:/usr/local/src/bin# \cp ./* /etc/kubeasz/bin/containerd-bin/
需要注意的是nerdctl压缩包里的3个文件是我们后加的,kubeasz的containerd角色tasks/main.yml里并没有包含它们,所以要把这3个文件追加到tasks/main.yml里。
修改/etc/kubeasz/roles/containerd/tasks/main.yml,添加nerdctl的3个二进制文件:
- nerdctl
- containerd-rootless-setuptool.sh
- containerd-rootless.sh
root@k8s-deploy:~# vim /etc/kubeasz/roles/containerd/tasks/main.yml
这样以后添加node节点,使用的就是新版本containerd、crictl和runc,还能额外使用nerdctl命令了
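改动思路示意如下(任务名和字段写法以kubeasz实际的main.yml为准,这里只演示把3个文件追加进下发列表):
- name: 下发 containerd 相关二进制文件
  copy: src={{ base_dir }}/bin/containerd-bin/{{ item }} dest={{ bin_dir }}/{{ item }} mode=0755
  with_items:
  - containerd
  - containerd-shim
  - containerd-shim-runc-v1
  - containerd-shim-runc-v2
  - crictl
  - ctr
  - runc
  - nerdctl                              # 以下3个为新追加的文件
  - containerd-rootless-setuptool.sh
  - containerd-rootless.sh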
5、替换node节点二进制文件
查看node节点的containerd的service文件调用的是哪个目录的二进制文件,发现放在/usr/local/bin/目录下:
root@k8s-node003:~# find / -name containerd.service
root@k8s-node003:~# cat /etc/systemd/system/containerd.service | grep ExecStart
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
所以要把二进制文件拷贝到node节点的/usr/local/bin/目录下,所有node节点都要拷贝。
root@k8s-deploy:/usr/local/src/bin# scp ./* 172.21.90.214:/usr/local/bin
scp: /usr/local/bin/containerd: Text file busy
containerd-rootless-setuptool.sh 100% 21KB 29.9MB/s 00:00
containerd-rootless.sh 100% 7032 24.5MB/s 00:00
containerd-shim 100% 7084KB 148.4MB/s 00:00
containerd-shim-runc-v1 100% 9136KB 174.6MB/s 00:00
scp: /usr/local/bin/containerd-shim-runc-v2: Text file busy
containerd-stress 100% 22MB 188.6MB/s 00:00
crictl 100% 48MB 183.1MB/s 00:00
ctr 100% 25MB 205.6MB/s 00:00
nerdctl 100% 26MB 201.2MB/s 00:00
runc 100% 9210KB 178.4MB/s 00:00
发现containerd和containerd-shim-runc-v2提示"Text file busy"无法覆盖,因为node节点上这些二进制文件正在被运行中的容器调用,在node节点查看对应的shim进程:
root@k8s-node003:~# ps -ef | grep shim
root 4913 1 0 Nov27 ? 00:00:28 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 3b3aa3415a0e68c69ca05c212bbaa554130bce6e478e1dbc7f64d998ae3890b5 -address /run/containerd/containerd.sock
root 219527 123018 0 00:21 pts/0 00:00:00 grep --color=auto shim
要想覆盖这些被占用的二进制文件,就要在node节点把对应的容器停掉。但是停掉后又会被k8s重新拉起:
root@k8s-node003:~# crictl stop a9faef1529a6d
a9faef1529a6d
root@k8s-node003:~# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
c45e1edafd879 172a034f72979 4 seconds ago Running calico-node 1 3b3aa3415a0e6 calico-node-twp6j
这时就要把node节点的kubelet停掉。kubelet停掉后,node就接收不到k8s的指令,也就无法再启动容器了。
root@k8s-node003:~# systemctl stop kubelet.service
root@k8s-node003:~# crictl ps
c45e1edafd879 172a034f72979 5 minutes ago Running calico-node 1 3b3aa3415a0e6 calico-node-twp6j
root@k8s-node003:~# crictl stop c45e1edafd879
c45e1edafd879
# 发现容器已经停掉了
root@k8s-node003:~# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
这时再次拷贝二进制文件,发现还是在被调用
root@k8s-deploy:/usr/local/src/bin# scp ./* 172.21.90.221:/usr/local/bin
那么现在就直接禁用这3个服务的开机自启,再重启服务器,得到一个二进制文件没有被占用的纯净环境
root@k8s-node003:~# systemctl disable kubelet.service kube-proxy.service containerd.service
Removed /etc/systemd/system/multi-user.target.wants/kube-proxy.service.
Removed /etc/systemd/system/multi-user.target.wants/containerd.service.
Removed /etc/systemd/system/multi-user.target.wants/kubelet.service.
root@k8s-node003:~# reboot
再次拷贝成功
root@k8s-deploy:/usr/local/src/bin# scp ./* 172.21.90.221:/usr/local/bin
最后把停止的3个服务重新启动
root@k8s-node003:~# systemctl start kubelet.service kube-proxy.service containerd.service
root@k8s-node003:~# systemctl enable kubelet.service kube-proxy.service containerd.service
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kube-proxy.service → /etc/systemd/system/kube-proxy.service.
Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /etc/systemd/system/containerd.service.
这样就完成了升级,在部署节点验证
6、验证版本
root@k8s-deploy:~# kubectl get node -o wide
7、剩余node节点升级
同样直接禁用这3个服务的开机自启,并重启服务器
root@k8s-node001:~# systemctl disable kubelet.service kube-proxy.service containerd.service
root@k8s-node001:~# reboot
# 拷贝二进制文件成功
root@k8s-deploy:/usr/local/src/bin# scp ./* 172.21.90.214:/usr/local/bin
root@k8s-deploy:/usr/local/src/bin# scp ./* 172.21.90.215:/usr/local/bin
在剩余node节点重启服务
root@k8s-node001:~# systemctl start kubelet.service kube-proxy.service containerd.service
root@k8s-node001:~# systemctl enable kubelet.service kube-proxy.service containerd.service
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kube-proxy.service → /etc/systemd/system/kube-proxy.service.
Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /etc/systemd/system/containerd.service.
查看发现所有node节点的containerd都已经升级,但是master节点还没升级
root@k8s-deploy:~# kubectl get node -o wide
8、master节点升级
步骤和node节点一样,停止服务并重启。
注意:在生产环境不能直接停止服务。要先驱逐pod,再将服务停止或者重启服务器,然后替换二进制再启动服务
如何驱逐pod,参考 “6、升级k8s -----> 2、升级node节点 -----> 1、驱逐node节点的业务pod”
三、DNS服务
DNS组件历史上有skydns、kube-dns和coredns三个。k8s 1.3版本之前使用skydns,1.3到1.17之间的版本使用kube-dns,目前主要使用coredns。DNS组件用于解析k8s集群中service name所对应的IP地址。
1、域名解析流程
nginx-pod接收到用户的请求后,想把动态请求转发给tomcat-pod。于是nginx-pod先找coredns的service,coredns-service把请求转给coredns-pod(可以是多副本)。coredns-pod负责域名解析,它去找apiserver(即kubernetes这个service),在内部通过SERVICE NAME(kubernetes)找到CLUSTER-IP(10.100.0.1),这个IP 10.100.0.1就是本实验的apiserver地址。
由apiserver再去etcd查询数据(数据都在etcd里面),将查询到的数据再返回到coredns-pod,coredns-pod本地做缓存后,再把数据返回给nginx-pod。这样nginx就知道tomcat的service地址了,然后tomcat-service再把请求转发给tomcat-pod。
**有一个问题:那么nginx怎么找到coredns的service地址呢?**其实在创建pod的时候已经内置进去了。进入一个pod查看/etc/resolv.conf
可以看到此处获取的nameserver IP和我的SERVICE_CIDR="10.100.0.0/16"不在一个网段,那是因为我的/etc/kubeasz/clusters/k8s-cluster01/config.yml文件里"ENABLE_LOCAL_DNS_CACHE: true"开启了本地dns缓存,将其改为false,并通过增删节点来更新配置:
# 删除node3节点
root@k8s-deploy:~# ezctl del-node k8s-cluster01 172.21.90.221
# 查看当前集群nodes,发现已经没有node3
root@k8s-deploy:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.90.212 Ready,SchedulingDisabled master 3d17h v1.24.8
172.21.90.213 Ready,SchedulingDisabled master 3d17h v1.24.8
172.21.90.214 Ready node 3d17h v1.24.8
172.21.90.215 Ready node 3d17h v1.24.8
172.21.90.220 Ready,SchedulingDisabled master 3d17h v1.24.8
# 查看当前集群pod,发现原来在node3上面的pod net-test1已经被删除
root@k8s-deploy:~# kubectl get pod -o wide -n myserver
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
linux73-nginx-deployment-cfdc998c-2kgrl 1/1 Running 4 (81m ago) 40h 10.200.150.201 172.21.90.214 <none> <none>
linux73-nginx-deployment-cfdc998c-2ldkp 1/1 Running 4 (81m ago) 40h 10.200.6.147 172.21.90.215 <none> <none>
linux73-nginx-deployment-cfdc998c-gtc97 1/1 Running 4 (81m ago) 40h 10.200.6.146 172.21.90.215 <none> <none>
linux73-nginx-deployment-cfdc998c-zjh8b 1/1 Running 0 4m39s 10.200.150.204 172.21.90.214 <none> <none>
net-test2 1/1 Running 4 (81m ago) 40h 10.200.6.148 172.21.90.215 <none> <none>
net-test3 1/1 Running 2 (81m ago) 17h 10.200.150.202 172.21.90.214 <none> <none>
net-test5 1/1 Running 0 21m 10.200.150.203 172.21.90.214 <none> <none>
# 再次添加node3节点
root@k8s-deploy:~# ezctl add-node k8s-cluster01 172.21.90.221
root@k8s-deploy:~# kubectl get pod -n myserver -o wide
#创建pod并进入查看dns
root@k8s-deploy:~# kubectl run net-test1 --image=alpine sleep 1000000 -n myserver
root@k8s-deploy:~# kubectl exec -it net-test1 sh -n myserver
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # cat /etc/resolv.conf
search myserver.svc.y73.local svc.y73.local y73.local
nameserver 10.100.0.2
options ndots:5
/ # ^C
之后按照同样方法把另外两个node节点删除再添加。
2、修改模板文件
使用模板安装(deployment/coredns.yaml.sed at master · coredns/deployment (github.com)),或者使用前面官方二进制升级包解压后的/usr/local/src/kubernetes/cluster/addons/dns/coredns/目录里的文件。
root@k8s-deploy:~# cp /usr/local/src/kubernetes/cluster/addons/dns/coredns/coredns.yaml.base /yaml/
root@k8s-deploy:~# cd /yaml/
root@k8s-deploy:/yaml# mv coredns.yaml.base coredns.yaml
- 修改77行为y73.local(对应文件/etc/kubeasz/clusters/k8s-cluster01/hosts里的CLUSTER_DNS_DOMAIN="y73.local")
- 修改146行的资源限制,并新增cpu限制为200毫核(生产环境memory为4Gi,cpu为2或者4核)
- 修改213行的clusterIP(即pod里/etc/resolv.conf中看到的nameserver地址,本文为10.100.0.2)
- 修改142行的镜像地址为harbor地址y73.harbor.com/baseimages/coredns:1.9.3
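改完后可以用grep粗略核对这些字段(示例,具体行号和字段以实际模板为准):
root@k8s-deploy:/yaml# grep -nE "y73.local|clusterIP|image:|memory|cpu" coredns.yaml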
# 先从docker官方下载镜像
root@k8s-deploy:~# docker pull coredns/coredns:1.9.3
1.9.3: Pulling from coredns/coredns
Digest: sha256:8e352a029d304ca7431c6507b56800636c321cb52289686a581ab70aaa8a2e2a
Status: Image is up to date for coredns/coredns:1.9.3
docker.io/coredns/coredns:1.9.3
# 上传镜像到harbor
root@k8s-deploy:~# docker tag docker.io/coredns/coredns:1.9.3 y73.harbor.com/baseimages/coredns:1.9.3
root@k8s-deploy:~# docker push y73.harbor.com/baseimages/coredns:1.9.3
The push refers to repository [y73.harbor.com/baseimages/coredns]
df1818f16337: Pushed
256bc5c338a6: Pushed
1.9.3: digest: sha256:bdb36ee882c13135669cfc2bb91c808a33926ad1a411fee07bd2dc344bb8f782 size: 739
- 当k8s集群里的业务出现访问慢、登录慢、查询慢、提交订单慢等问题时,往往是请求量很大、需要反复做域名解析,这时就要考虑调高coredns的资源限制并多开副本(优先多开副本,默认以轮询方式解析)
root@k8s-deploy:/yaml# vim coredns-v1.9.3.yaml
root@k8s-deploy:/yaml# kubectl apply -f coredns-v1.9.3.yaml
root@k8s-deploy:/yaml# kubectl get pod -A | grep core
kube-system coredns-79f868bf5c-bsf5p 1/1 Running 0 56m
kube-system coredns-79f868bf5c-qdq2t 1/1 Running 0 28s
3、安装coredns
# 安装
root@k8s-deploy:/yaml# kubectl apply -f coredns.yaml
serviceaccount/coredns created
clusterrole.rbac.authorization.k8s.io/system:coredns created
clusterrolebinding.rbac.authorization.k8s.io/system:coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
# 查看pod,发现coredns pod已经创建
root@k8s-deploy:/yaml# kubectl get pod -n kube-system -o wide | grep core
coredns-79f868bf5c-bsf5p 1/1 Running 0 70s 10.200.6.129 172.21.90.215 <none> <none>
# 查看service,发现coredns的service IP为10.100.0.2,它的名称为kube-dns
root@k8s-deploy:/yaml# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3d18h
kube-system kube-dns ClusterIP 10.100.0.2 <none> 53/UDP,53/TCP,9153/TCP 2m10s
myserver linux73-nginx-service NodePort 10.100.49.222 <none> 80:30004/TCP,443:30443/TCP 41h
进入pod发现外网不通。这些没有控制器的pod在经过备份恢复后会出现这种问题,此时重启这个pod即可。
新建一个pod测试
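例如可以这样新建一个测试pod并验证service名称解析(pod名net-test4为临时示例,域名后缀y73.local来自前面的CLUSTER_DNS_DOMAIN):
root@k8s-deploy:/yaml# kubectl run net-test4 --image=alpine -n myserver -- sleep 1000000
root@k8s-deploy:/yaml# kubectl -n myserver exec -it net-test4 -- nslookup kubernetes.default.svc.y73.local
root@k8s-deploy:/yaml# kubectl -n myserver exec -it net-test4 -- ping -c 2 www.baidu.com    #验证外网域名能否解析并连通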
4、核心配置文件
以下是官方提供的默认插件,如需开启其它插件可参考官网Plugins (coredns.io)
# 查看集群的configmap
root@k8s-deploy:/yaml# kubectl get configmaps -A
NAMESPACE NAME DATA AGE
default kube-root-ca.crt 1 3d22h
kube-node-lease kube-root-ca.crt 1 3d22h
kube-public kube-root-ca.crt 1 3d22h
kube-system calico-config 8 3d21h
kube-system coredns 1 3h49m
kube-system extension-apiserver-authentication 6 3d22h
kube-system kube-root-ca.crt 1 3d22h
myserver kube-root-ca.crt 1 3d21h
# 编辑configmap
root@k8s-deploy:/yaml# kubectl edit configmaps coredns -n kube-system
参数 | 描述 |
---|---|
errors | 错误信息到标准输出。 |
health | CoreDNS自身健康状态报告,默认监听端口8080,一般用来做健康检查。您可以通过http://localhost:8080/health 获取健康状态。 |
ready | CoreDNS插件状态报告,默认监听端口8181,一般用来做可读性检查。可以通过http://localhost:8181/ready 获取可读状态。当所有插件都运行后,ready状态为200。 |
kubernetes | CoreDNS Kubernetes插件,提供集群内服务解析能力。CoreDNS架构基于kubernetes service name 进行dns查询并返回查询记录给客户端。 |
prometheus | CoreDNS自身metrics数据接口。可以通过http://localhost:9153/metrics 获取prometheus格式(key-value)的监控数据。 |
forward(或proxy) | 将域名查询请求转到预定义的DNS服务器。默认配置中,不是kubernetes集群内的其它任何域名,都将转发到预定义的解析器(/etc/resolv.conf)中。默认使用宿主机的/etc/resolv.conf配置,可以直接使用8.8.8.8。 |
cache | 启用service解析缓存,单位为秒。 |
loop | 检测域名解析是否有死循环,如coredns转发给内网dns服务器,而内网dns服务器又转发给coredns,如果发现解析是死循环,则强制中止CoreDNS进程(kubernetes会重建)。 |
reload | 检测corefile是否更改,在重新编辑configmap配置后,默认2分钟后会自动加载生效。 |
loadbalance | 轮询dns域名解析,如果一个域名存在多个记录则轮询解析。 |
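Corefile(即上面configmap中的核心内容)大致如下,域名为本文的y73.local,具体以实际模板生成的内容为准:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes y73.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}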
四、etcd的备份和恢复-基于镜像
etcd是CoreOS团队于2013年6月发起的开源项目,它的目标是构建一个高可用的分布式键值(key-value)数据库。etcd内部采用raft协议作为一致性算法,基于Go语言实现。
etcd具有下面这些属性:
- 完全复制:集群中的每个节点都可以使用完整的存档
- 高可用性:Etcd可用于避免硬件的单点故障或网络问题
- 一致性:每次读取都会返回跨多主机的最新写入
- 简单:包括一个定义良好、面向用户的API (gRPC)
- 安全:实现了带有可选的客户端证书身份验证的自动化TLS
- 快速:每秒10000次写入的基准速度
- 可靠:使用Raft算法实现了存储的合理分布
Etcd的工作原理
1、service文件
root@k8s-etcd001:~# vim /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd #数据保存目录
ExecStart=/usr/local/bin/etcd \ #二进制文件路径
--name=etcd-172.21.90.216 \
--cert-file=/etc/kubernetes/ssl/etcd.pem \
--key-file=/etc/kubernetes/ssl/etcd-key.pem \
--peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
--peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--initial-advertise-peer-urls=https://172.21.90.216:2380 \ #通告自己的集群端口
--listen-peer-urls=https://172.21.90.216:2380 \ #集群之间通讯端口
--listen-client-urls=https://172.21.90.216:2379,http://127.0.0.1:2379 \ #客户端访问地址
--advertise-client-urls=https://172.21.90.216:2379 \ #通告自己的客户端端口
--initial-cluster-token=etcd-cluster-0 \ #创建集群使用的token,一个集群内的节点保持一致
--initial-cluster=etcd-172.21.90.216=https://172.21.90.216:2380,etcd-172.21.90.217=https://172.21.90.217:2380,etcd-172.21.90.218=https://172.21.90.218:2380 \ #集群所有节点信息
--initial-cluster-state=new \ #新建集群的时候值为new,如果是已存在的集群是existing
--data-dir=/var/lib/etcd \ #数据目录路径
--wal-dir= \
--snapshot-count=50000 \
--auto-compaction-retention=1 \
--auto-compaction-mode=periodic \
--max-request-bytes=10485760 \
--quota-backend-bytes=8589934592
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999
2、etcd参数优化
--max-request-bytes=10485760    #request size limit(请求的最大字节数,默认一个key最大1.5MiB,官方推荐最大10MiB)
--quota-backend-bytes=8589934592    #storage size limit(磁盘存储空间大小限制,默认为2G,此值超过8G启动会有警告信息)
数据压缩
--auto-compaction-mode=periodic    #周期性压缩
--auto-compaction-retention=10h    #第一次压缩等待10小时,以后每次10小时*10%=1小时压缩一次
集群碎片整理
root@k8s-etcd001:~# etcdctl defrag --cluster --endpoints=https://172.21.90.216:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem
Finished defragmenting etcd member[https://172.21.90.216:2379]
Finished defragmenting etcd member[https://172.21.90.218:2379]
Finished defragmenting etcd member[https://172.21.90.217:2379]
验证当前所有成员状态
root@k8s-etcd001:~# export NODE_IPS="172.21.90.216 172.21.90.217 172.21.90.218"
root@k8s-etcd001:~# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health; done
https://172.21.90.216:2379 is healthy: successfully committed proposal: took = 7.492484ms
https://172.21.90.217:2379 is healthy: successfully committed proposal: took = 7.508284ms
https://172.21.90.218:2379 is healthy: successfully committed proposal: took = 7.432369ms
以表格方式显示节点详细信息
root@k8s-etcd001:~# export NODE_IPS="172.21.90.216 172.21.90.217 172.21.90.218"
root@k8s-etcd001:~# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
3、Viewing data stored in etcd
# list all keys (paths only)
root@k8s-etcd001:~# etcdctl get / --prefix --keys-only
# pod data
root@k8s-etcd001:~# etcdctl get / --prefix --keys-only | grep pod
# namespace data
root@k8s-etcd001:~# etcdctl get / --prefix --keys-only | grep ns
# deployment (controller) data
root@k8s-etcd001:~# etcdctl get / --prefix --keys-only | grep deployment
# calico component data
root@k8s-etcd001:~# etcdctl get / --prefix --keys-only | grep calico
4、Create, read, update, and delete data in etcd
# add data
root@k8s-etcd001:~# etcdctl put /name "tom"
OK
# query data
root@k8s-etcd001:~# etcdctl get /name
/name
tom
# update data: putting to an existing key simply overwrites the old value
root@k8s-etcd001:~# etcdctl put /name "jack"
OK
root@k8s-etcd001:~# etcdctl get /name
/name
jack
# delete data
root@k8s-etcd001:~# etcdctl del /name
1
5、The etcd watch mechanism
A watch continuously monitors data and proactively notifies the client whenever it changes. The etcd v3 watch mechanism supports watching a single fixed key as well as watching a range of keys.
# watch a key on etcd node1; the key does not have to exist yet, it can be created later
root@k8s-etcd001:~# etcdctl watch /data
# modify the data on etcd node2 and verify that etcd node1 sees the change
root@k8s-etcd002:~# etcdctl put /data "v1"
OK
root@k8s-etcd002:~# etcdctl put /data "v2"
OK
Check etcd node1: the watch output shows the updates in real time.
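Watching a range works the same way; a minimal sketch using the --prefix flag (the key names are illustrative):
# on node1: watch every key under /data/
etcdctl watch --prefix /data/
# on node2: any write under the prefix triggers a notification on node1
etcdctl put /data/app1 "v1"
etcdctl put /data/app2 "v2"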
6、etcd data backup and restore via the API
WAL stands for write-ahead log: a log that is written before the actual write operation is performed.
wal: stores the write-ahead log. Its most important role is to record the entire history of data changes. In etcd, every data modification must be written to the WAL before it is committed.
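For reference, both the WAL and the snapshots live under the member/ subdirectory of the data directory configured in the unit file above:
# snap/ holds snapshot files, wal/ holds the write-ahead log segments
ls /var/lib/etcd/member/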
1、Back up the data
root@k8s-etcd001:~# mkdir /data/etcd{1,2} -pv
mkdir: created directory '/data'
mkdir: created directory '/data/etcd1'
mkdir: created directory '/data/etcd2'
# save a snapshot to the /data/etcd1/ directory
root@k8s-etcd001:~# etcdctl snapshot save /data/etcd1/snapshot.db
root@k8s-etcd001:~# ll /data/etcd1
total 2188
drwxr-xr-x 2 root root 4096 Nov 29 17:01 ./
drwxr-xr-x 4 root root 4096 Nov 29 16:54 ../
-rw------- 1 root root 2228256 Nov 29 17:01 snapshot.db
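The snapshot can be verified before it is ever needed; a minimal sketch (on newer etcd releases the same check is also offered by etcdutl):
# print the snapshot's hash, revision, total keys and size
etcdctl snapshot status /data/etcd1/snapshot.db --write-out=table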
2、Restore the data
The data must be restored into a new directory or one that contains no files:
root@k8s-etcd001:~# etcdctl snapshot restore /data/etcd1/snapshot.db --data-dir=/data/etcd1
Deprecated: Use `etcdutl snapshot restore` instead.
Error: data-dir "/data/etcd1" not empty or could not be read
root@k8s-etcd001:~#
root@k8s-etcd001:~#
root@k8s-etcd001:~# etcdctl snapshot restore /data/etcd1/snapshot.db --data-dir=/data/etcd2
After the data has been restored, edit the etcd service file and change the /var/lib/etcd directory in both WorkingDirectory=/var/lib/etcd
and --data-dir=/var/lib/etcd
to /data/etcd2, then restart the service. Alternatively, delete the contents of /var/lib/etcd, replace them with the data in /data/etcd2, and restart etcd (a minimal sketch of this second approach follows below).
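A minimal sketch of the second approach, which keeps the original paths from the unit file and swaps the data in place (stop etcd first so the directory is not written to while it is being replaced):
# stop the service, replace the data, start it again
systemctl stop etcd
rm -rf /var/lib/etcd/*
cp -a /data/etcd2/* /var/lib/etcd/
systemctl start etcd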
7、Cluster-wide etcd backup and restore
1、Create some pods
Create a few pods with the nginx.yaml file below:
root@k8s-deploy:/yaml/nginx-tomcat-case# vim nginx.yaml
kind: Deployment
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
metadata:
  labels:
    app: linux73-nginx-deployment-label
  name: linux73-nginx-deployment
  namespace: myserver
spec:
  replicas: 4
  selector:
    matchLabels:
      app: linux73-nginx-selector
  template:
    metadata:
      labels:
        app: linux73-nginx-selector
    spec:
      containers:
      - name: linux73-nginx-container
        image: nginx:1.20
        #command: ["/apps/tomcat/bin/run_tomcat.sh"]
        #imagePullPolicy: IfNotPresent
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP
          name: http
        - containerPort: 443
          protocol: TCP
          name: https
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
        # resources:
        #   limits:
        #     cpu: 2
        #     memory: 2Gi
        #   requests:
        #     cpu: 500m
        #     memory: 1Gi
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: linux73-nginx-service-label
  name: linux73-nginx-service
  namespace: myserver
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30004
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
    nodePort: 30443
  selector:
    app: linux73-nginx-selector
# create 4 pods from the yaml file
root@k8s-deploy:/yaml/nginx-tomcat-case# kubectl apply -f nginx.yaml
deployment.apps/linux73-nginx-deployment created
# manually create 2 more pods
root@k8s-deploy:/etc/kubeasz# kubectl run net-test1 --image=alpine sleep 1000000 -n myserver
pod/net-test1 created
root@k8s-deploy:/etc/kubeasz# kubectl run net-test2 --image=alpine sleep 1000000 -n myserver
pod/net-test2 created
# check the pods
root@k8s-deploy:/etc/kubeasz# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
linux73-nginx-deployment-cfdc998c-2kgrl 1/1 Running 0 19m
linux73-nginx-deployment-cfdc998c-2ldkp 1/1 Running 0 19m
linux73-nginx-deployment-cfdc998c-gtc97 1/1 Running 0 19m
linux73-nginx-deployment-cfdc998c-z9rjf 1/1 Running 0 19m
net-test1 1/1 Running 0 23s
net-test2 1/1 Running 0 14s
At this point 6 pods are running in the myserver namespace (4 nginx pods from the Deployment plus 2 test pods).
2、Back up with kubeasz
Use kubeasz's built-in backup command.
First, have a look at the playbook it runs (a sketch follows below).
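A minimal sketch of inspecting the playbook behind ezctl backup (the playbooks/94.backup.yml filename is an assumption based on a common kubeasz 3.x layout and may differ in your version):
# the backup playbook takes an etcd snapshot and stores it under the cluster's backup/ directory
cat /etc/kubeasz/playbooks/94.backup.yml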
# start the backup
root@k8s-deploy:/etc/kubeasz# ./ezctl backup k8s-cluster01
The backup files are stored in the /etc/kubeasz/clusters/k8s-cluster01/backup/ directory:
root@k8s-deploy:/etc/kubeasz# ll clusters/k8s-cluster01/backup/
total 4584
drwxr-xr-x 2 root root 4096 Nov 29 17:52 ./
drwxr-xr-x 5 root root 4096 Nov 27 17:14 ../
-rw------- 1 root root 2338848 Nov 29 17:52 snapshot_202211291752.db
-rw------- 1 root root 2338848 Nov 29 17:52 snapshot.db
Delete 3 pods, then back up again:
root@k8s-deploy:/etc/kubeasz# kubectl delete pod linux73-nginx-deployment-cfdc998c-2kgrl net-test1 net-test2 -n myserver
pod "linux73-nginx-deployment-cfdc998c-2kgrl" deleted
pod "net-test1" deleted
pod "net-test2" deleted
There were 6 pods and 3 were deleted, so only 3 should remain; why are there still 4? Because the 4 pods whose names start with "linux73" were created by the Deployment in the yaml file: when one of them is deleted, Kubernetes automatically recreates it to maintain the desired replica count. Now back up again:
root@k8s-deploy:/etc/kubeasz# ./ezctl backup k8s-cluster01
root@k8s-deploy:/etc/kubeasz# ll clusters/k8s-cluster01/backup/
total 6872
drwxr-xr-x 2 root root 4096 Nov 29 18:05 ./
drwxr-xr-x 5 root root 4096 Nov 27 17:14 ../
-rw------- 1 root root 2338848 Nov 29 17:52 snapshot_202211291752.db
-rw------- 1 root root 2338848 Nov 29 18:05 snapshot_202211291805.db
-rw------- 1 root root 2338848 Nov 29 18:05 snapshot.db
3、Restore the data
The restore uses the snapshot.db file by default; to roll back to an earlier backup, copy (or rename) that older snapshot to snapshot.db.
Here the cluster is restored to the point when it had 6 pods, so first replace snapshot.db:
root@k8s-deploy:/etc/kubeasz# cd clusters/k8s-cluster01/backup/
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster01/backup# ll
total 6872
drwxr-xr-x 2 root root 4096 Nov 29 18:05 ./
drwxr-xr-x 5 root root 4096 Nov 27 17:14 ../
-rw------- 1 root root 2338848 Nov 29 17:52 snapshot_202211291752.db
-rw------- 1 root root 2338848 Nov 29 18:05 snapshot_202211291805.db
-rw------- 1 root root 2338848 Nov 29 18:05 snapshot.db
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster01/backup# cp snapshot_202211291752.db snapshot.db
# restore
root@k8s-deploy:/etc/kubeasz# ./ezctl restore k8s-cluster01
# check
root@k8s-deploy:/etc/kubeasz# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
linux73-nginx-deployment-cfdc998c-2kgrl 1/1 Running 0 76m
linux73-nginx-deployment-cfdc998c-2ldkp 1/1 Running 0 76m
linux73-nginx-deployment-cfdc998c-gtc97 1/1 Running 0 76m
linux73-nginx-deployment-cfdc998c-z9rjf 1/1 Running 0 76m
net-test1 1/1 Running 0 57m
net-test2 1/1 Running 0 57m
4、etcd data recovery workflow
When more than half of the etcd members are down (for example, 2 out of 3 nodes), the whole cluster becomes unavailable and the data has to be recovered. The workflow is as follows:
- Recover the server operating systems
- Redeploy the etcd cluster
- On the master nodes, stop kube-apiserver, controller-manager, scheduler, kubelet, and kube-proxy
- Stop the etcd cluster
- Restore the same backup on every etcd node (see the sketch after this list)
- Start each node and verify the etcd cluster
- On the master nodes, start kube-apiserver, controller-manager, scheduler, kubelet, and kube-proxy
- Verify the master status and the pod data
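A minimal sketch of restoring the shared snapshot on one member; it is run once per etcd node with that node's own --name and peer URL (names, IPs and the token follow the unit file above, and the target --data-dir must be empty):
# on etcd-172.21.90.216: rebuild the data directory from the shared snapshot
etcdctl snapshot restore /data/etcd1/snapshot.db \
  --name=etcd-172.21.90.216 \
  --initial-advertise-peer-urls=https://172.21.90.216:2380 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-172.21.90.216=https://172.21.90.216:2380,etcd-172.21.90.217=https://172.21.90.217:2380,etcd-172.21.90.218=https://172.21.90.218:2380 \
  --data-dir=/var/lib/etcd
# repeat on 172.21.90.217 and 172.21.90.218 with their own --name and peer URL, then start etcd on all nodes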
五、Dashboard
Releases · kubernetes/dashboard (github.com)
1、Modify the yaml file
# download the yaml file
root@k8s-deploy:/yaml/dashboard-v2.6.1# pwd
/yaml/dashboard-v2.6.1
root@k8s-deploy:/yaml/dashboard-v2.6.1# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.1/aio/deploy/recommended.yaml
# rename it
root@k8s-deploy:/yaml/dashboard-v2.6.1# mv recommended.yaml dashboard-v2.6.1.yaml
1、Change the image source
# pull the images referenced in the file and push them to harbor
root@k8s-deploy:/yaml/dashboard-v2.6.1# cat dashboard-v2.6.1.yaml | grep image
image: kubernetesui/dashboard:v2.6.1
imagePullPolicy: Always
image: kubernetesui/metrics-scraper:v1.0.8
root@k8s-deploy:/yaml/dashboard-v2.6.1# docker pull kubernetesui/dashboard:v2.6.1
root@k8s-deploy:/yaml/dashboard-v2.6.1# docker pull kubernetesui/metrics-scraper:v1.0.8
root@k8s-deploy:/yaml/dashboard-v2.6.1# docker tag kubernetesui/dashboard:v2.6.1 y73.harbor.com/baseimages/dashboard:v2.6.1
root@k8s-deploy:/yaml/dashboard-v2.6.1# docker push y73.harbor.com/baseimages/dashboard:v2.6.1
root@k8s-deploy:/yaml/dashboard-v2.6.1# docker tag kubernetesui/metrics-scraper:v1.0.8 y73.harbor.com/baseimages/metrics-scraper:v1.0.8
root@k8s-deploy:/yaml/dashboard-v2.6.1# docker push y73.harbor.com/baseimages/metrics-scraper:v1.0.8
root@k8s-deploy:/yaml/dashboard-v2.6.1# vim dashboard-v2.6.1.yaml
After the images have been pushed to harbor, change the image references on lines 193 and 278 of the yaml file to point at harbor (see the sketch below).
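A minimal sketch of the change (the line numbers refer to the upstream recommended.yaml; a sed-based edit avoids counting lines):
# point both images at the local harbor project
sed -i 's#image: kubernetesui/dashboard:v2.6.1#image: y73.harbor.com/baseimages/dashboard:v2.6.1#' dashboard-v2.6.1.yaml
sed -i 's#image: kubernetesui/metrics-scraper:v1.0.8#image: y73.harbor.com/baseimages/metrics-scraper:v1.0.8#' dashboard-v2.6.1.yaml
# verify
grep 'image:' dashboard-v2.6.1.yaml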
2、Expose the port
Modify lines 39-42 of the yaml file to expose the service's port 443 for external access. By default port 443 is only reachable inside the k8s cluster, so it has to be exposed via a NodePort to allow access from outside (see the sketch below).
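A minimal sketch of the modified Service section (type: NodePort and the nodePort line are the additions; the nodePort value 30000 is an assumption, any free port in the NodePort range works, and targetPort 8443 matches the upstream recommended.yaml):
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30000
  selector:
    k8s-app: kubernetes-dashboard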
2、Install the dashboard
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl apply -f dashboard-v2.6.1.yaml
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl get pod -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-7d895cc848-5954j 1/1 Running 0 76s
kubernetes-dashboard-84fd997b5b-zxzlv 1/1 Running 0 76s
3、Create a user and token
1、Create the user
root@k8s-deploy:/yaml/dashboard-v2.6.1# vim admin-user.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
# create
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl apply -f admin-user.yaml
serviceaccount/admin-user created
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
2、Create the token
Before Kubernetes 1.24, creating the user (ServiceAccount) automatically created a token secret for it; that no longer happens here:
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl get secrets -A
NAMESPACE NAME TYPE DATA AGE
kube-system calico-etcd-secrets Opaque 3 3d23h
kubernetes-dashboard kubernetes-dashboard-certs Opaque 0 8m24s
kubernetes-dashboard kubernetes-dashboard-csrf Opaque 1 8m24s
kubernetes-dashboard kubernetes-dashboard-key-holder Opaque 2 8m24s
So the token now has to be created manually:
root@k8s-deploy:/yaml/dashboard-v2.6.1# vim admin-secret.yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: dashboard-admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"
# create the token
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl apply -f admin-secret.yaml
secret/dashboard-admin-user created
# check the token
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl get secrets -A | grep admin-user
kubernetes-dashboard dashboard-admin-user kubernetes.io/service-account-token 3 86s
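Alternatively, from Kubernetes 1.24 onward a short-lived token can be issued without creating a Secret at all; a minimal sketch:
# issue a time-limited token bound to the admin-user ServiceAccount
kubectl -n kubernetes-dashboard create token admin-user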
3、Retrieve the token from the secret
root@k8s-deploy:/yaml/dashboard-v2.6.1# kubectl describe secrets dashboard-admin-user -n kubernetes-dashboard
Name: dashboard-admin-user
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: kubernetes.io/service-account.name: admin-user
kubernetes.io/service-account.uid: 0efead4b-c411-4d57-9776-3d05aa163583
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1302 bytes
namespace: 20 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6InhGR1R0NHQxRlQyaDY5V3VFWFhqSlNiQUFKaEphTmFRZnJZSndlbnAtanMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdXNlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJhZG1pbi11c2VyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMGVmZWFkNGItYzQxMS00ZDU3LTk3NzYtM2QwNWFhMTYzNTgzIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmFkbWluLXVzZXIifQ.Bya6Y_s3pCI81U1AMV--gpvBrVOQyIvucrw_77OGUIfxF6B5Sat50CsmF_Kh2AnZnu8GWFucxTDO0uHt0-Z319sYGPEKlVlxxZTmHNSz0OCFlQC5kAbnzynRgHM_gB2jIyqW6LRcIynW2MNvwrZsh3x12UZ7No8y3GwEitZ2J0eGT-C4KqJu8kleyNY9WM_OykR89551hyfeUmPg2zBWQAUMiTq7Pbu7hIrTJ3by99S3GS8XbMt2mEoSvH6FIC2jTxxJHluo-ftSzsycQAuwd_XYaxOE-ji6tqGyCNJ41U50Kn_HNPDcgDSd0lsWY-VmJg9wm3t30dLEyqJ2weZXRA
4、Log in to the dashboard
Log in using the IP of any node; see the sketch below for finding the exposed port.
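A minimal sketch of finding the exposed port and building the login URL (the placeholders stand for a real node IP and the NodePort reported by kubectl):
# find the NodePort mapped to the dashboard's port 443
kubectl get svc -n kubernetes-dashboard kubernetes-dashboard
# then open https://<node-ip>:<node-port> in a browser and paste the token obtained above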