Kubernetes Cluster Scheduling

开化者

Published 2023-08-23 14:19:10


Article link: https://blog.csdn.net/li175325/article/details/132451397


The Scheduler

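The scheduler is the Kubernetes component that assigns the pods you define to nodes in the cluster. That sounds simple, but it has to balance several concerns:
   Fairness: how to ensure that every node is allocated resources
   Efficient resource use: use all cluster resources as fully as possible
   Efficiency: scheduling itself must be fast, so that large batches of pods are placed quickly
   Flexibility: let users control the scheduling logic according to their own needs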

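The scheduler runs as a separate program. Once started, it keeps watching the API server for pods whose PodSpec.NodeName is empty, and for each such pod it creates a binding that records which node the pod should be placed on.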
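
The watch-and-bind loop described above can be modelled in a few lines of Python. This is only a sketch over plain dictionaries, not a client for a real API server: the helper names `pending_pods` and `make_binding` are made up for illustration, and the node choice is hard-coded.

```python
def pending_pods(pods):
    """Pods the scheduler still has to place: spec.nodeName is empty."""
    return [p for p in pods if not p["spec"].get("nodeName")]


def make_binding(pod_name, node_name, namespace="default"):
    """Build a Binding manifest that assigns the pod to the node."""
    return {
        "apiVersion": "v1",
        "kind": "Binding",
        "metadata": {"name": pod_name, "namespace": namespace},
        "target": {"apiVersion": "v1", "kind": "Node", "name": node_name},
    }


pods = [
    {"metadata": {"name": "web"}, "spec": {}},                      # not scheduled yet
    {"metadata": {"name": "db"}, "spec": {"nodeName": "node02"}},   # already placed
]

# Hypothetical decision: put every pending pod on node01.
bindings = [make_binding(p["metadata"]["name"], "node01") for p in pending_pods(pods)]
print(bindings)
```

Only the pod without a nodeName produces a binding; the pod that already has one is left alone.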

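I. Scheduling Process

Scheduling proceeds in several steps:
   First, nodes that do not satisfy the pod's requirements are filtered out; this step is called predicate.
   Next, the nodes that passed are ranked by priority; this step is called priority.
   Finally, the node with the highest priority is selected. If any step fails, an error is returned immediately.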
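
The filter, rank, and select steps can be sketched as follows. This is a toy model, not the scheduler's real code: the `Node` and `Pod` classes, their fields, and the single scoring rule are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float          # free CPU cores (hypothetical field)
    free_mem: float          # free memory in GiB (hypothetical field)
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    cpu: float               # requested CPU cores
    mem: float               # requested memory in GiB
    node_selector: dict = field(default_factory=dict)


def fits_resources(pod, node):
    """Predicate: the node has enough free CPU and memory for the pod."""
    return node.free_cpu >= pod.cpu and node.free_mem >= pod.mem


def matches_selector(pod, node):
    """Predicate: every label the pod asks for is present on the node."""
    return all(node.labels.get(k) == v for k, v in pod.node_selector.items())


def score(pod, node):
    """Priority: prefer the node with the most CPU left after placement."""
    return node.free_cpu - pod.cpu


def schedule(pod, nodes):
    # Step 1 (predicate): drop nodes that fail any filter.
    feasible = [n for n in nodes if fits_resources(pod, n) and matches_selector(pod, n)]
    if not feasible:
        return None          # the pod would stay Pending and be retried
    # Steps 2 and 3 (priority, selection): pick the highest-scoring node.
    return max(feasible, key=lambda n: score(pod, n)).name


nodes = [
    Node("node01", free_cpu=1.0, free_mem=2.0, labels={"disk": "ssd"}),
    Node("node02", free_cpu=4.0, free_mem=8.0, labels={"disk": "hdd"}),
    Node("node03", free_cpu=3.0, free_mem=8.0, labels={"disk": "ssd"}),
]
pod = Pod("web", cpu=2.0, mem=1.0, node_selector={"disk": "ssd"})
print(schedule(pod, nodes))  # node03
```

Here node01 is filtered out for lack of CPU and node02 for its label, so node03 is the only candidate left to score.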

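Predicate algorithms:
   PodFitsResources: whether the node's remaining resources are enough for the resources the pod requests
   PodFitsHost: if the pod specifies a nodeName, whether the node's name matches it
   PodFitsHostPorts: whether the ports already in use on the node conflict with the ports the pod requests
   PodSelectorMatches: filters out nodes that do not match the labels the pod specifies
   NoDiskConflict: the volumes already mounted on the node must not conflict with the volumes the pod specifies, unless both are read-only
   Note: in short, these check resources, nodeName match, port conflicts, label match, and volume conflicts.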

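If no node passes the predicate step, the pod stays in the Pending state and scheduling is retried until some node satisfies the conditions. If more than one node passes, scheduling continues with the priorities step, which ranks the nodes by priority.
Priorities are expressed as a set of key-value pairs, where the key is the name of a priority item and the value is its weight. The priority items include:
   LeastRequestedPriority: scores a node by its CPU and memory utilization; the lower the utilization, the higher the score
   BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the score; normally used together with the item above
   ImageLocalityPriority: prefers nodes that already hold the images the pod needs; the larger the total size of those images, the higher the score
The scheduler combines all priority items and their weights to compute the final result.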
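
To make the weighting concrete, here is a small sketch of how two of these items could be combined. The formulas follow the 0 to 10 scale commonly described for the legacy scheduler; treat them as an assumption, since the exact arithmetic differs between Kubernetes versions. The function names are illustrative.

```python
def least_requested(requested, capacity):
    """Score 0-10: the lower the utilization, the higher the score."""
    if capacity <= 0 or requested > capacity:
        return 0.0
    return (capacity - requested) * 10 / capacity


def balanced_allocation(cpu_fraction, mem_fraction):
    """Score 0-10: the closer CPU and memory utilization are, the higher."""
    return 10 - abs(cpu_fraction - mem_fraction) * 10


def node_score(cpu_req, cpu_cap, mem_req, mem_cap, weights):
    # LeastRequested averages the CPU score and the memory score.
    lr = (least_requested(cpu_req, cpu_cap) + least_requested(mem_req, mem_cap)) / 2
    ba = balanced_allocation(cpu_req / cpu_cap, mem_req / mem_cap)
    # Final score: weighted sum over the priority items.
    return (weights["LeastRequestedPriority"] * lr
            + weights["BalancedResourceAllocation"] * ba)


weights = {"LeastRequestedPriority": 1, "BalancedResourceAllocation": 1}
# Node A: 2 of 4 cores and 4 of 8 GiB requested (50% / 50%).
score_a = node_score(2, 4, 4, 8, weights)
# Node B: 1 of 4 cores and 6 of 8 GiB requested (25% / 75%).
score_b = node_score(1, 4, 6, 8, weights)
print(score_a, score_b)  # 15.0 10.0
```

Both nodes get the same LeastRequested score, so the balanced node A wins on the second item.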

Besides the scheduler that ships with Kubernetes, a pod can select a custom scheduler through the spec.schedulerName field.

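II. Node Affinity (between pods and nodes)

pod.spec.affinity.nodeAffinity
preferredDuringSchedulingIgnoredDuringExecution: soft policy
requiredDuringSchedulingIgnoredDuringExecution: hard policy

Hard policy: the pod is scheduled only onto a node that satisfies the conditions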
---yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - node02

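(The key comes from the node's labels: kubectl get node --show-labels)

Operators for matching keys and values:
   In: the label's value is in a given list
   NotIn: the label's value is not in a given list
   Gt: the label's value is greater than a given value
   Lt: the label's value is less than a given value
   Exists: the label exists
   DoesNotExist: the label does not exist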
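
The six operators can be modelled with one small function. This is a sketch of the matching rules as I understand them, not code taken from Kubernetes; `match_expression` and the `cpu-count` label are hypothetical names used for illustration.

```python
def match_expression(labels, key, operator, values=()):
    """Evaluate one matchExpressions entry against a node's labels."""
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A node that lacks the label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if operator in ("Gt", "Lt"):
        # Gt/Lt compare integers; a missing or non-numeric label never matches.
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (KeyError, ValueError, IndexError):
            return False
        return actual > bound if operator == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {operator}")


node = {"kubernetes.io/hostname": "node01", "disk": "ssd", "cpu-count": "8"}
print(match_expression(node, "kubernetes.io/hostname", "NotIn", ["node02"]))  # True
print(match_expression(node, "cpu-count", "Gt", ["4"]))                        # True
print(match_expression(node, "disk", "In", ["hdd"]))                           # False
```

Label values are strings, so Gt and Lt only make sense for labels whose values parse as integers.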

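Soft policy: the scheduler prefers a node that satisfies the conditions; if none does, the condition is dropped and the pod is scheduled onto another node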
---yaml

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node03
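
The weight field matters once there is more than one preferred term: a node's preference score is the sum of the weights of the terms it matches. The sketch below shows that idea under simplifying assumptions: only the In operator is modelled, and the second term (on a `disk` label) is hypothetical.

```python
def preferred_score(node_labels, preferred_terms):
    """Sum the weights of all preferred terms whose expressions match."""
    total = 0
    for term in preferred_terms:
        exprs = term["preference"]["matchExpressions"]
        # Only the In operator is modelled to keep the sketch short.
        if all(node_labels.get(e["key"]) in e["values"] for e in exprs):
            total += term["weight"]
    return total


terms = [
    # The term from the soft-policy manifest.
    {"weight": 1, "preference": {"matchExpressions": [
        {"key": "kubernetes.io/hostname", "operator": "In", "values": ["node03"]}]}},
    # Hypothetical second term, added only to show weights adding up.
    {"weight": 5, "preference": {"matchExpressions": [
        {"key": "disk", "operator": "In", "values": ["ssd"]}]}},
]
nodes = {
    "node01": {"kubernetes.io/hostname": "node01", "disk": "ssd"},
    "node03": {"kubernetes.io/hostname": "node03", "disk": "hdd"},
}
scores = {name: preferred_score(labels, terms) for name, labels in nodes.items()}
print(scores)  # {'node01': 5, 'node03': 1}
```

A node that matches no term scores 0 but remains eligible, which is what makes the policy soft.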

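III. Pod Affinity (between pods)

pod.spec.affinity.podAffinity/podAntiAffinity
preferredDuringSchedulingIgnoredDuringExecution: soft policy
requiredDuringSchedulingIgnoredDuringExecution: hard policy

Hard policy: the pod must run in the same topology domain (here, the same host) as a pod that matches the conditions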

vim pod1.yaml

apiVersion: v1
kind: Pod
metadata:
  name: node1
  labels:
    app: node1
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node01

vim pod2.yaml

apiVersion: v1
kind: Pod
metadata:
  name: pod2
  labels:
    app: pod2
spec:
  containers:
  - name: pod2
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - node1
        topologyKey: kubernetes.io/hostname

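---------------------------------------------
Comparison of the affinity and anti-affinity scheduling policies:

Policy            Matches labels of   Topology domains   Scheduling target
nodeAffinity      node                no                 the specified nodes
podAffinity       pod                 yes                same topology domain as the specified pod
podAntiAffinity   pod                 yes                different topology domain from the specified pod

---------------------------------------------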
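
The role of the topology domain is easier to see in code. The sketch below models a required podAffinity or podAntiAffinity term under two simplifications: the label selector is a plain equality map instead of matchExpressions, and the `zone` label is a hypothetical second topology key.

```python
def domains_with_match(pods, nodes, selector, topology_key):
    """Topology values of the nodes that run a pod matching the selector."""
    domains = set()
    for pod in pods:
        if all(pod["labels"].get(k) == v for k, v in selector.items()):
            node_labels = nodes[pod["node"]]
            if topology_key in node_labels:
                domains.add(node_labels[topology_key])
    return domains


def feasible_nodes(pods, nodes, selector, topology_key, anti=False):
    """Nodes allowed by a required podAffinity (or podAntiAffinity) term."""
    domains = domains_with_match(pods, nodes, selector, topology_key)
    result = []
    for name, labels in nodes.items():
        in_domain = labels.get(topology_key) in domains
        # Affinity keeps nodes inside a matching domain, anti-affinity the rest.
        if in_domain != anti:
            result.append(name)
    return sorted(result)


nodes = {
    "node01": {"kubernetes.io/hostname": "node01", "zone": "a"},
    "node02": {"kubernetes.io/hostname": "node02", "zone": "a"},
    "node03": {"kubernetes.io/hostname": "node03", "zone": "b"},
}
pods = [{"name": "node1", "labels": {"app": "node1"}, "node": "node01"}]
sel = {"app": "node1"}

print(feasible_nodes(pods, nodes, sel, "kubernetes.io/hostname"))             # ['node01']
print(feasible_nodes(pods, nodes, sel, "zone"))                               # ['node01', 'node02']
print(feasible_nodes(pods, nodes, sel, "kubernetes.io/hostname", anti=True))  # ['node02', 'node03']
```

With the hostname as topology key each node is its own domain; a wider key such as a zone label lets the pod land on any node in the same zone.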

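IV. Taints and Tolerations

Node affinity is a property of a pod (a preference or a hard requirement) that attracts the pod to a particular class of nodes. A taint does the opposite: it lets a node repel a particular class of pods.

Taints and tolerations work together to keep pods off nodes that are unsuitable for them. One or more taints can be applied to a node, which means the node will not accept any pod that does not tolerate those taints. A toleration applied to a pod means the pod may (but is not required to) be scheduled onto nodes with matching taints.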

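(1) Taints

1. Composition of a taint
The kubectl taint command sets a taint on a node. A tainted node repels pods: it can refuse to have pods scheduled onto it, and can even evict pods that are already running on it.

Each taint has the form:
key=value:effect

Each taint has a key and a value that act as its label, where the value may be empty, and an effect that describes what the taint does. The taint effect currently supports three options:
   NoSchedule: Kubernetes will not schedule pods onto a node with this taint
   PreferNoSchedule: Kubernetes will try to avoid scheduling pods onto a node with this taint
   NoExecute: Kubernetes will not schedule pods onto a node with this taint, and will also evict pods already running on it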
2. Setting, viewing, and removing taints

Set a taint:
kubectl taint nodes node01 check=lhy:NoExecute

View taints:
kubectl describe nodes node01 | grep Taint

Remove a taint:
kubectl taint nodes node01 check:NoExecute-

(2) Tolerations

Depending on the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), a tainted node repels pods, so pods are to some degree kept off that node. A toleration set on a pod lets the pod tolerate the taint, so it can be scheduled onto the tainted node.

1. Set it in the pod's YAML:
pod.spec.tolerations

spec:
  tolerations:
  - key: check
    operator: Equal
    value: lhy
    effect: NoExecute
    tolerationSeconds: 3600

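Notes:
The key, value, and effect must match the taint on the node
When operator is Exists, the value is ignored
tolerationSeconds is how long the pod may keep running on the node once it is due to be evicted

2. When no key is specified (with operator Exists), taints with any key are tolerated:
tolerations:
- operator: Exists

3. When no effect is specified, every effect of taints with that key is tolerated:
tolerations:
- key: key1
  operator: Exists

4. When there are multiple masters, they can be tainted as follows so that their resources are not wasted; with PreferNoSchedule, pods are placed there only when no other node fits:
kubectl taint nodes master node-role.kubernetes.io/master=:PreferNoSchedule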
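
The matching rules in items 1 to 3 can be collected into one function. This is a sketch of the rules as described here, using plain dictionaries; `tolerates` and `can_schedule` are illustrative names, and eviction timing (tolerationSeconds) is not modelled.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # A toleration without an effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if operator == "Exists":
        # Exists ignores the value; without a key it matches every taint.
        return key is None or key == taint["key"]
    # Equal: key and value must both match.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(tolerations, taints):
    """A pod may be placed if every NoSchedule/NoExecute taint is tolerated."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


taint = {"key": "check", "value": "lhy", "effect": "NoExecute"}
exact = {"key": "check", "operator": "Equal", "value": "lhy", "effect": "NoExecute"}

print(tolerates(exact, taint))                                   # True
print(tolerates({"operator": "Exists"}, taint))                  # True
print(tolerates({"key": "key1", "operator": "Exists"}, taint))   # False
print(can_schedule([], [taint]))                                 # False
```

PreferNoSchedule taints are left out of `can_schedule` on purpose: they lower a node's ranking but never block placement.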

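V. Scheduling to a Specified Node

1. pod.spec.nodeName
Schedules the pod directly onto the node with the given name. The match is mandatory and the scheduler is bypassed.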

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bdqn1
spec:
  selector:
    matchLabels:
      app: bdqn1
  replicas: 5
  template:
    metadata:
      labels:
        app: bdqn1
    spec:
      nodeName: node02
      containers:
      - name: bdqn1
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

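2. pod.spec.nodeSelector
Selects nodes through the label-selector mechanism: the scheduler matches the labels and then schedules the pod onto a matching node. This is a hard constraint.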

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bdqn2
spec:
  selector:
    matchLabels:
      app: bdqn2
  replicas: 3
  template:
    metadata:
      labels:
        app: bdqn2
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - name: bdqn2
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

View labels:
kubectl get nodes --show-labels

Set a node label:
kubectl label node node01 disk=ssd

Remove a node label:
kubectl label node node01 disk-
