Affinity and Anti-Affinity in Kubernetes
The default Kubernetes scheduler binds each new Pod to a target node through a filter, score, and bind sequence (historically called predicates, priorities, and selection). It is only the default scheduler: out of the box it checks that a node has sufficient resources and tries to spread load evenly.
In practice, users can also run a custom scheduler and select it for a Pod by setting spec.schedulerName in the Pod's manifest.
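As a minimal sketch, a Pod opts into a custom scheduler like this (the scheduler name my-scheduler is hypothetical; it must match the name the custom scheduler binary registers itself under):

```yaml
# Sketch: a Pod that asks to be scheduled by a custom scheduler.
# "my-scheduler" is an assumed name, not a real component.
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-scheduler   # defaults to "default-scheduler" when omitted
  containers:
  - name: app
    image: nginx:latest
```

If no scheduler with that name is running, the Pod simply stays Pending, since no scheduler claims it.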
1. Node affinity
nodeAffinity is a node-affinity scheduling policy, introduced as a more expressive replacement for nodeSelector.
There are two types of node affinity rules: hard affinity (required) and soft affinity (preferred). A hard affinity rule is mandatory: it must be satisfied for the Pod to be scheduled, and if no node satisfies it, the Pod stays in the Pending state. A soft affinity rule is a flexible preference: it says the Pod should run on a certain class of nodes, and the scheduler tries to honor that, but when it cannot, it falls back to a node that does not match the rule.
1.1 nodeSelector
Originally, Kubernetes pinned Pods to specific nodes with nodeSelector, which matches node labels against a label selector defined in the Pod spec. A concrete example:
Add a label to a node:
[root@node1 ~]# kubectl label node node2 app=web
node/node2 labeled
Define a Deployment to launch the Pods:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 13
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        app: web        # schedule only onto nodes labeled app=web
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
[root@node1 ~]# kubectl apply -f deploy-pod.yaml
deployment.apps/deploy created
[root@node1 ~]#
Check which node the Pods landed on:
[root@node1 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-6cb97b569b-292kw 1/1 Running 0 11s 172.25.104.62 node2 <none> <none>
deploy-6cb97b569b-2qbfm 1/1 Running 0 11s 172.25.104.52 node2 <none> <none>
deploy-6cb97b569b-58px4 1/1 Running 0 11s 172.25.104.54 node2 <none> <none>
deploy-6cb97b569b-7cmqv 1/1 Running 0 11s 172.25.104.56 node2 <none> <none>
deploy-6cb97b569b-cmq74 1/1 Running 0 11s 172.25.104.57 node2 <none> <none>
deploy-6cb97b569b-cpv8x 1/1 Running 0 11s 172.25.104.59 node2 <none> <none>
deploy-6cb97b569b-d9hwz 1/1 Running 0 11s 172.25.104.63 node2 <none> <none>
deploy-6cb97b569b-f2zwf 1/1 Running 0 11s 172.25.104.60 node2 <none> <none>
deploy-6cb97b569b-f6hbl 1/1 Running 0 11s 172.25.104.61 node2 <none> <none>
deploy-6cb97b569b-kz46f 1/1 Running 0 11s 172.25.104.58 node2 <none> <none>
deploy-6cb97b569b-mjmnv 1/1 Running 0 11s 172.25.104.55 node2 <none> <none>
deploy-6cb97b569b-nkdwm 1/1 Running 0 11s 172.25.104.51 node2 <none> <none>
deploy-6cb97b569b-tg7qc 1/1 Running 0 11s 172.25.104.53 node2 <none> <none>
1.2 Node hard affinity
Using nodeSelector forces a Pod onto nodes matching the given labels, but it only supports simple equality matching. nodeAffinity supports the matchExpressions property, which allows building much richer label-selection logic.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard rule
            nodeSelectorTerms:
            - matchExpressions:
              - key: app
                operator: In
                values:
                - web
Label the node:
[root@node1 ~]# kubectl label node node2 app=web
node/node2 labeled
Launch the Pods; all of them land on node2:
[root@node1 ~]# kubectl apply -f deploy-pod.yaml
deployment.apps/deploy created
[root@node1 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-66747445f7-25pwb 1/1 Running 0 14s 172.25.104.5 node2 <none> <none>
deploy-66747445f7-4qdf7 1/1 Running 0 14s 172.25.104.10 node2 <none> <none>
deploy-66747445f7-f24bq 1/1 Running 0 14s 172.25.104.20 node2 <none> <none>
deploy-66747445f7-f9vbq 1/1 Running 0 14s 172.25.104.34 node2 <none> <none>
deploy-66747445f7-gx4mq 1/1 Running 0 14s 172.25.104.28 node2 <none> <none>
deploy-66747445f7-zwtc8 1/1 Running 0 14s 172.25.104.33 node2 <none> <none>
When defining hard node affinity, the requiredDuringSchedulingIgnoredDuringExecution field takes a nodeSelectorTerms list made up of one or more nodeSelectorTerm objects. The terms are logically ORed: during the match check, a node passes as long as it satisfies any one of the terms.
As the IgnoredDuringExecution suffix in both preferredDuringSchedulingIgnoredDuringExecution and requiredDuringSchedulingIgnoredDuringExecution implies, the rules apply only at scheduling time. If a node's labels change after a Pod has been placed there, so that the Pod no longer satisfies the affinity rule, the scheduler will not move the Pod off that node; the rules only take effect for newly created Pods.
A nodeSelectorTerm defines one node-selector entry. Its value is an object that can hold one or more matchExpressions rules, and within a single term those rules are logically ANDed: a node's labels must satisfy every matchExpressions rule under the same nodeSelectorTerm to pass that entry's check. Each matchExpressions entry is itself composed of one or more label selectors (key, operator, values), which are again ANDed together.
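To make the OR/AND semantics concrete, here is an illustrative fragment (the disk=ssd label is an assumption, not part of the demo above). It admits a node that carries both app=web and disk=ssd, or a node that carries app=server:

```yaml
# Illustrative fragment: the two nodeSelectorTerms are ORed;
# the two matchExpressions inside the first term are ANDed.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:          # term 1: app=web AND disk=ssd
        - key: app
          operator: In
          values:
          - web
        - key: disk
          operator: In
          values:
          - ssd
      - matchExpressions:          # term 2: app=server
        - key: app
          operator: In
          values:
          - server
```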
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard rule
            nodeSelectorTerms:
            - matchExpressions:
              - key: app
                operator: In
                values:
                - server
                - web
Label node2 and node3:
[root@node1 ~]# kubectl label node node2 app=web
node/node2 labeled
[root@node1 ~]# kubectl label node node3 app=server
node/node3 labeled
Launch the Pods; they get scheduled onto both node2 and node3:
[root@node1 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-d78b4d4d9-5hb4k 1/1 Running 0 2m17s 172.25.104.47 node2 <none> <none>
deploy-d78b4d4d9-l8tjk 1/1 Running 0 2m17s 172.25.135.61 node3 <none> <none>
deploy-d78b4d4d9-mcvsk 1/1 Running 0 2m17s 172.25.135.60 node3 <none> <none>
deploy-d78b4d4d9-mj7gk 1/1 Running 0 2m17s 172.25.104.43 node2 <none> <none>
deploy-d78b4d4d9-r5xqn 1/1 Running 0 2m17s 172.25.104.45 node2 <none> <none>
deploy-d78b4d4d9-zl684 1/1 Running 0 2m17s 172.25.104.46 node2 <none> <none>
The operators supported in label-selector expressions are In, NotIn, Exists, DoesNotExist, Gt, and Lt:
In: the label's value is in the given list
NotIn: the label's value is not in the given list
Gt: the label's value, parsed as an integer, is greater than the given value
Lt: the label's value, parsed as an integer, is less than the given value
Exists: the label key is present on the node (values must be left empty)
DoesNotExist: the label key is not present on the node (values must be left empty)
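A short sketch using Exists and Gt (the gpu and cpu-count label names are illustrative, not from the demos above):

```yaml
# Illustrative fragment: match nodes that carry a "gpu" label (any value)
# and whose "cpu-count" label value is an integer greater than 8.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu
          operator: Exists       # values must be left empty with Exists
        - key: cpu-count
          operator: Gt
          values:
          - "8"                  # Gt/Lt take exactly one integer string
```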
1.3 Node soft affinity
Node soft affinity provides a flexible control over node selection: the Pod should, rather than must, be placed on certain nodes, and when the preference cannot be satisfied the Pod can still be scheduled onto other, non-matching nodes. Each preference also carries a weight field that expresses its priority, with a value from 1 to 100; the higher the number, the higher the priority.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 60        # prefer nodes labeled app=web, weight 60
            preference:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
          - weight: 40        # prefer nodes labeled app=server, weight 40
            preference:
              matchExpressions:
              - key: app
                operator: In
                values:
                - server
Launch the Pods; most of them end up on the node labeled app=web:
[root@node1 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-55bf777f76-5466z 1/1 Running 0 9m9s 172.25.104.50 node2 <none> <none>
deploy-55bf777f76-62rrz 1/1 Running 0 9m9s 172.25.104.49 node2 <none> <none>
deploy-55bf777f76-bf9bn 1/1 Running 0 9m9s 172.25.104.48 node2 <none> <none>
deploy-55bf777f76-lx5pz 1/1 Running 0 9m9s 172.25.104.53 node2 <none> <none>
deploy-55bf777f76-s78v5 1/1 Running 0 9m9s 172.25.135.62 node3 <none> <none>
deploy-55bf777f76-t9cw9 1/1 Running 0 9m9s 172.25.104.63 node2 <none> <none>
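Hard and soft rules can also be combined under the same nodeAffinity block. The sketch below (reusing the labels from the demos above) first restricts candidates to nodes labeled app=web or app=server, then prefers app=web among those that qualify:

```yaml
# Illustrative fragment: required narrows the candidate set,
# preferred then ranks the remaining nodes by weight.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: app
          operator: In
          values:
          - web
          - server
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 60
      preference:
        matchExpressions:
        - key: app
          operator: In
          values:
          - web
```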