1 Introduction
As a k8s cluster grows larger and more diverse, managing scheduling becomes very important. Kubernetes schedules pods with kube-scheduler, which uses topology-aware algorithms to decide which nodes can run a given pod.
The scheduler tracks the set of nodes in the cluster, filters them against a number of predicates, and then uses priority functions to decide which node each pod should be scheduled onto.
k8s offers several ways to control pod scheduling, including labels, taints, podAffinity, and podAntiAffinity. This article introduces the two most common mechanisms (labels and taints) and verifies them experimentally.
2 Controlling scheduling with labels and taints
2.1 Assigning pods with labels
In k8s you can attach labels to nodes and then use nodeSelector in a pod spec to decide which nodes the pod may be scheduled onto. In this section we control pod placement by labeling nodes.
- View the current nodes
$ kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
kmaster   Ready    master   5h      v1.19.4
knode01   Ready    <none>   4h47m   v1.19.4
knode02   Ready    <none>   4h45m   v1.19.4
knode03   Ready    <none>   4h42m   v1.19.4
- View the node labels
$ kubectl describe nodes|grep -A5 -i label
Labels: beta.kubernetes.io/arch=amd64
        beta.kubernetes.io/os=linux
        kubernetes.io/arch=amd64
        kubernetes.io/hostname=kmaster
        kubernetes.io/os=linux
        node-role.kubernetes.io/master=
--
Labels: beta.kubernetes.io/arch=amd64
        beta.kubernetes.io/os=linux
        kubernetes.io/arch=amd64
        kubernetes.io/hostname=knode01
        kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
......
$ kubectl describe nodes|grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: <none>
Taints: <none>
Taints: <none>
The master node carries an extra node-role.kubernetes.io/master label, and its taint effect is NoSchedule.
- Check the container count on each node
$ docker ps -a|grep Up|wc -l
kmaster: 20
knode01: 8
knode02: 8
knode03: 8
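As an aside, pod placement can also be checked from the control plane instead of running docker on every node; a sketch using the built-in `spec.nodeName` field selector (knode01 as the example node):

```shell
# List every pod scheduled on knode01, across all namespaces;
# spec.nodeName is a standard pod field selector supported by kubectl.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=knode01
```

Note that this counts pods, not containers: each pod additionally runs a pause container, which is why the docker counts above change in steps of (app containers + 1) per pod.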
- Label the nodes
$ kubectl label nodes knode01 status=vip
$ kubectl label nodes knode02 status=other
$ kubectl get nodes --show-labels
NAME      STATUS   ROLES    AGE     VERSION   LABELS
kmaster   Ready    master   5h8m    v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=kmaster,kubernetes.io/os=linux,node-role.kubernetes.io/master=
knode01   Ready    <none>   4h55m   v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=knode01,kubernetes.io/os=linux,status=vip
knode02   Ready    <none>   4h53m   v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=knode02,kubernetes.io/os=linux,status=other
knode03   Ready    <none>   4h49m   v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=knode03,kubernetes.io/os=linux
- Use nodeSelector to deploy 3 containers on the vip node and 2 containers on the other node
Create the vip pod:
$ vim vip.yaml
apiVersion: v1
kind: Pod
metadata:
  name: vip
spec:
  nodeSelector:
    status: vip
  containers:
  - name: vip1
    image: busybox:1.31
    args:
    - sleep
    - "1000000"
  - name: vip2
    image: busybox:1.31
    args:
    - sleep
    - "1000000"
  - name: vip3
    image: busybox:1.31
    args:
    - sleep
    - "1000000"
$ kubectl apply -f vip.yaml

Create the other pod:
$ vim other.yaml
apiVersion: v1
kind: Pod
metadata:
  name: other
spec:
  nodeSelector:
    status: other
  containers:
  - name: other1
    image: busybox:1.31
    args:
    - sleep
    - "1000000"
  - name: other2
    image: busybox:1.31
    args:
    - sleep
    - "1000000"
$ kubectl apply -f other.yaml
pod/other created
- Check the container count on each node again
$ docker ps -a|grep Up|wc -l
kmaster: 20
knode01: 12 (3 busybox containers + 1 pause container added)
knode02: 11 (2 busybox containers + 1 pause container added)
knode03: 8
2.2 Controlling pods with taints
k8s currently supports three taint effects: NoSchedule, PreferNoSchedule, and NoExecute. Each is tested below.
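A taint only repels pods that do not tolerate it; a pod declares tolerations in its spec. A minimal sketch matching the status=vip taint used later in this section (the pod name is hypothetical and is not part of the experiments below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vip-tolerant   # hypothetical example pod
spec:
  tolerations:
  - key: "status"       # matches the taint key set on knode01 below
    operator: "Equal"
    value: "vip"
    effect: "NoSchedule"
  containers:
  - name: busybox
    image: busybox:1.31
    args: [sleep, "1000000"]
```

With this toleration, the pod may still be scheduled onto a node tainted status=vip:NoSchedule, while untolerated pods are kept off it.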
- Keep the NoSchedule taint
$ kubectl delete -f vip.yaml
$ kubectl delete -f other.yaml
Container counts: kmaster 20, knode01 8, knode02 8, knode03 8
$ vim common.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: common
  name: common
spec:
  replicas: 4
  selector:
    matchLabels:
      app: common
  template:
    metadata:
      labels:
        app: common
    spec:
      containers:
      - image: busybox:1.31
        name: busybox
        command: [sh, -c, 'sleep 10000']
$ kubectl apply -f common.yaml
deployment.apps/common created
Container counts are now kmaster 20, knode01 10, knode02 10, knode03 12: no pod landed on kmaster, while knode01/02/03 received 1, 1, and 2 pods respectively.
- Remove the NoSchedule taint
$ kubectl delete -f common.yaml
deployment.apps "common" deleted
$ kubectl taint node kmaster node-role.kubernetes.io/master:NoSchedule-
node/kmaster untainted
$ kubectl apply -f common.yaml
$ docker ps -a|grep Up|wc -l
Container counts are now kmaster 22, knode01 10, knode02 10, knode03 10: every node, including kmaster, received one pod.
- Restore kmaster to unschedulable
$ kubectl taint node kmaster node-role.kubernetes.io/master:NoSchedule
$ kubectl delete -f common.yaml
- Set the vip node to PreferNoSchedule
$ kubectl taint node knode01 status=vip:PreferNoSchedule
node/knode01 tainted
$ kubectl apply -f common.yaml
$ docker ps -a|grep Up|wc -l
Container counts: kmaster 20, knode01 10, knode02 10, knode03 12; knode01/02/03 received 1/1/2 pods. Apart from kmaster (NoSchedule), each worker got one pod, and the extra pod preferentially avoided knode01.
$ kubectl scale deployment common --replicas=8
Container counts: kmaster 20, knode01 12, knode02 14, knode03 14; knode01/02/03 now hold 2/3/3 pods. Each worker got two pods, and the two extra pods again preferentially avoided knode01.
$ kubectl delete -f common.yaml
- Set knode01 to NoExecute
$ kubectl taint node knode01 status=vip:PreferNoSchedule-
node/knode01 untainted
$ kubectl taint node knode01 status=vip:NoExecute
node/knode01 tainted
- Check whether the pods on knode01 are evicted
$ kubectl apply -f common.yaml
deployment.apps/common created
$ docker ps -a|grep Up|wc -l
Container counts: kmaster 20, knode01 4, knode02 12, knode03 14: knode01 lost 2 pods, while knode02 and knode03 gained 2 and 3 pods respectively.
On knode01, every container except k8s_calico-node_calico-node and k8s_kube-proxy_kube-proxy was removed.
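The calico-node and kube-proxy pods survive the NoExecute taint because their DaemonSets carry broad tolerations. A toleration with operator: Exists and no key matches every taint; kube-proxy ships with essentially this (a sketch of the relevant spec fragment, not a full manifest):

```yaml
tolerations:
- operator: "Exists"   # empty key + Exists tolerates all taints, including NoExecute
```

This is why tainting a node does not evict critical node-level system pods, only ordinary workloads without a matching toleration.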
- Restore the node to its normal state
$ kubectl taint node knode01 status=vip:NoExecute-
node/knode01 untainted
$ kubectl label nodes knode01 status-
$ kubectl label nodes knode02 status-
After a while, the non-critical DaemonSet pod node-exporter is scheduled back onto knode01.
Clean up the test resources:
$ kubectl delete -f common.yaml
3 Caveats
To be added.
4 References
1. Concepts -> Scheduling and Eviction -> Assigning Pods to Nodes
2. Concepts -> Scheduling and Eviction -> Taints and Tolerations