Kubernetes task scheduling
Node affinity
pod.spec.affinity.nodeAffinity
- preferredDuringSchedulingIgnoredDuringExecution: soft policy (preference)
- requiredDuringSchedulingIgnoredDuringExecution: hard policy (requirement)
Key-value operators
- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Gt: the label's value is greater than the given value (compared as integers)
- Lt: the label's value is less than the given value (compared as integers)
- Exists: the label exists (no values list is needed)
- DoesNotExist: the label does not exist
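As an illustration (the `gpu` and `gpu-count` label keys here are hypothetical), the operators above are used inside a `matchExpressions` list; `Exists`/`DoesNotExist` take no `values`, while `Gt`/`Lt` compare the label value as an integer:

```yaml
# Hypothetical sketch: require nodes that carry a "gpu" label (any value)
# and whose "gpu-count" label value is greater than 2.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu            # Exists: only the key is checked, no values
          operator: Exists
        - key: gpu-count      # Gt: value is compared as an integer
          operator: Gt
          values:
          - "2"
```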
Hard policy: create a Pod that must not run on the node whose hostname is k8s-node02.
cat > pod-requiredDuringSchedulingIgnoredDuringExecution.yml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hdj.com/library/nginx:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
EOF
kubectl create -f pod-requiredDuringSchedulingIgnoredDuringExecution.yml
Result:
After creating the Pod, checking it shows that it is running on k8s-node01.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME       READY   STATUS              RESTARTS   AGE   IP       NODE         NOMINATED NODE   READINESS GATES
affinity   0/1     ContainerCreating   0          4s    <none>   k8s-node01   <none>           <none>
Soft policy: create a Pod that prefers a node whose hostname is k8s-node03. Since no k8s-node03 exists, the Pod can be scheduled onto any node.
cat > pod-preferredDuringSchedulingIgnoredDuringExecution.yml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.hdj.com/library/nginx:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8s-node03
EOF
kubectl create -f pod-preferredDuringSchedulingIgnoredDuringExecution.yml
Result:
Because k8s-node03 does not exist, the scheduler picked another node; here the Pod was created on k8s-node02.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
affinity   1/1     Running   0          8s    10.244.2.82   k8s-node02   <none>           <none>
Pod affinity
pod.spec.affinity.podAffinity/podAntiAffinity
Scheduling policy | Match target | Operators | Topology domain support | Scheduling goal
---|---|---|---|---
nodeAffinity | node labels | In, NotIn, Exists, DoesNotExist, Gt, Lt | No | a specific node
podAffinity | Pod labels | In, NotIn, Exists, DoesNotExist | Yes | same topology domain as the specified Pod
podAntiAffinity | Pod labels | In, NotIn, Exists, DoesNotExist | Yes | a different topology domain from the specified Pod
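The "topology domain" in the table is defined by the term's `topologyKey`: all nodes sharing the same value for that label form one domain. With `kubernetes.io/hostname` every node is its own domain, so pod affinity reduces to "same node"; a zone label (the exact key varies with the Kubernetes version) would instead treat a whole zone as one domain. A sketch, assuming a Pod labeled `app: pod-1` already exists:

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: pod-1
      # one domain per node; swap in a zone label to widen the domain
      topologyKey: kubernetes.io/hostname
```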
podAffinity: schedule the Pod into the same topology domain as the specified Pod.
Create pod-1 first, then create pod-3 with an affinity so that pod-3 runs in the same topology domain as pod-1.
cat > pod1.yml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: pod-1
spec:
  containers:
  - name: pod-1
    image: hub.hdj.com/library/nginx:v1
EOF
kubectl create -f pod1.yml
cat > pod-podAffinityTerm.yml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: hub.hdj.com/library/nginx:v1
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-1
          topologyKey: kubernetes.io/hostname
EOF
kubectl create -f pod-podAffinityTerm.yml
Check the result: pod-3 and pod-1 are running on the same node.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
pod-1   1/1     Running   0          4m27s   10.244.1.84   k8s-node01   <none>           <none>
pod-3   1/1     Running   0          11s     10.244.1.85   k8s-node01   <none>           <none>
podAntiAffinity: schedule the Pod into a different topology domain from the specified Pod.
Create pod-3 so that it does not run in the same topology domain as pod-1.
cat > pod-podAntiAffinityTerm.yml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: hub.hdj.com/library/nginx:v1
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-1
          topologyKey: kubernetes.io/hostname
EOF
kubectl create -f pod-podAntiAffinityTerm.yml
Check the result: pod-3 and pod-1 are not on the same node.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-1   1/1     Running   0          86m   10.244.1.84   k8s-node01   <none>           <none>
pod-3   1/1     Running   0          68s   10.244.2.83   k8s-node02   <none>           <none>
Taints
- NoSchedule: Kubernetes will not schedule Pods onto a node with this taint
- PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a node with this taint
- NoExecute: Kubernetes will not schedule Pods onto a node with this taint, and will also evict the Pods already running on it
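A taint is a key=value:effect triple stored on the node object (`node.spec.taints`); the value part is optional. Roughly, a tainted node looks like this (illustrative fragment only; the `maintenance` taint is hypothetical):

```yaml
# Fragment of a Node object with two taints
spec:
  taints:
  - key: check
    value: hdj
    effect: NoExecute
  - key: maintenance          # hypothetical second taint; value omitted
    effect: PreferNoSchedule
```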
Applying taints
Check the taints on the master: because of the NoSchedule taint, Pods will not be scheduled onto the master.
[root@k8s-master01 affinity]# kubectl describe node k8s-master01|grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Now check the current Pods:
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE    IP            NODE         NOMINATED NODE   READINESS GATES
pod-1   1/1     Running   0          107m   10.244.1.84   k8s-node01   <none>           <none>
pod-3   1/1     Running   0          22m    10.244.2.83   k8s-node02   <none>           <none>
Apply a check=hdj:NoExecute taint to k8s-node01:
kubectl taint nodes k8s-node01 check=hdj:NoExecute
Check the result: the Pod on k8s-node01 has been evicted.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-3   1/1     Running   0          23m   10.244.2.83   k8s-node02   <none>           <none>
Remove the taint from k8s-node01:
kubectl taint nodes k8s-node01 check=hdj:NoExecute-
Tolerating taints
A tainted node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), so to some degree Pods will not be scheduled onto it. We can, however, set a toleration on a Pod: a Pod with a matching toleration can tolerate the taint and be scheduled onto the tainted node.
cat > tolerations.yml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: hub.hdj.com/library/nginx:v1
  tolerations:
  - key: "check"
    operator: "Equal"
    value: "hdj"
    effect: "NoExecute"
EOF
Step-by-step demonstration:
[root@k8s-master01 affinity]# kubectl taint nodes k8s-node02 check=hdj:NoExecute   # taint k8s-node02 with check=hdj:NoExecute as well
node/k8s-node02 tainted
[root@k8s-master01 affinity]# kubectl get pod -o wide   # check the Pods: none are left
No resources found.
[root@k8s-master01 affinity]# kubectl apply -f tolerations.yml   # create a Pod that tolerates the taint check=hdj:NoExecute
pod/pod-3 created
[root@k8s-master01 affinity]# kubectl get pod   # check again: the Pod is running
NAME READY STATUS RESTARTS AGE
pod-3 1/1 Running 0 6s
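Beyond `operator: "Equal"`, a toleration can use `operator: "Exists"` to match a taint key regardless of its value, and for NoExecute taints `tolerationSeconds` limits how long the Pod may remain once the taint appears. A sketch, not part of the demo above:

```yaml
tolerations:
- key: "check"
  operator: "Exists"        # matches check=<anything>:NoExecute
  effect: "NoExecute"
  tolerationSeconds: 3600   # evicted 3600s after the taint is applied
```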
When multiple masters exist, to avoid wasting resources you can downgrade the master taint so scheduling there is discouraged rather than forbidden:
kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:PreferNoSchedule
Pinning Pods to nodes
nodeName
Pod.spec.nodeName schedules the Pod directly onto the named node, bypassing the scheduler entirely; the match is mandatory.
cat > nodeName.yml <<EOF
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-node01
      containers:
      - name: myweb
        image: hub.hdj.com/library/nginx:v1
        ports:
        - containerPort: 80
EOF
kubectl apply -f nodeName.yml
After it runs, check the Pods: all of them are on k8s-node01.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
myweb-5c986876d6-b5gx9   1/1     Running   0          4m49s   10.244.1.88   k8s-node01   <none>           <none>
myweb-5c986876d6-d4mb6   1/1     Running   0          4m49s   10.244.1.87   k8s-node01   <none>           <none>
myweb-5c986876d6-lffg7   1/1     Running   0          4m49s   10.244.1.86   k8s-node01   <none>           <none>
myweb-5c986876d6-rprx5   1/1     Running   0          4m49s   10.244.1.91   k8s-node01   <none>           <none>
myweb-5c986876d6-vxllm   1/1     Running   0          4m49s   10.244.1.89   k8s-node01   <none>           <none>
myweb-5c986876d6-w5pht   1/1     Running   0          4m49s   10.244.1.90   k8s-node01   <none>           <none>
myweb-5c986876d6-xhm8w   1/1     Running   0          4m49s   10.244.1.92   k8s-node01   <none>           <none>
nodeSelector
Pod.spec.nodeSelector: selects nodes through the Kubernetes label-selector mechanism; the scheduler matches the labels and then places the Pod on a matching node. This is a hard constraint.
Create a Deployment that requires the label disk=ssd on a node.
cat > nodeSelector.yml <<EOF
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - name: myweb
        image: hub.hdj.com/library/nginx:v1
        ports:
        - containerPort: 80
EOF
kubectl apply -f nodeSelector.yml
At this point both Pods are stuck in Pending state, since no node carries the label yet.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
myweb-7fb8c95bb4-k2r2t   0/1     Pending   0          14s   <none>   <none>   <none>           <none>
myweb-7fb8c95bb4-kftlm   0/1     Pending   0          14s   <none>   <none>   <none>           <none>
Add the label disk=ssd to k8s-node01:
kubectl label node k8s-node01 disk=ssd
Check again: the Pods are now running.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
myweb-7fb8c95bb4-k2r2t   1/1     Running   0          3m17s   10.244.1.94   k8s-node01   <none>           <none>
myweb-7fb8c95bb4-kftlm   1/1     Running   0          3m17s   10.244.1.93   k8s-node01   <none>           <none>
Now label k8s-node02 with disk=ssd as well:
kubectl label node k8s-node02 disk=ssd
Edit the Deployment and change replicas to 8:
kubectl edit deployment myweb
Check again: Pods now run on both k8s-node01 and k8s-node02.
[root@k8s-master01 affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
myweb-7fb8c95bb4-7n8ln   1/1     Running   0          65m   10.244.2.86   k8s-node02   <none>           <none>
myweb-7fb8c95bb4-k2r2t   1/1     Running   0          72m   10.244.1.94   k8s-node01   <none>           <none>
myweb-7fb8c95bb4-kftlm   1/1     Running   0          72m   10.244.1.93   k8s-node01   <none>           <none>
myweb-7fb8c95bb4-kl7rm   1/1     Running   0          65m   10.244.1.95   k8s-node01   <none>           <none>
myweb-7fb8c95bb4-nc5w5   1/1     Running   0          65m   10.244.1.96   k8s-node01   <none>           <none>
myweb-7fb8c95bb4-qn2kn   1/1     Running   0          65m   10.244.2.87   k8s-node02   <none>           <none>
myweb-7fb8c95bb4-t6ql9   1/1     Running   0          65m   10.244.2.88   k8s-node02   <none>           <none>
myweb-7fb8c95bb4-xjzf5   1/1     Running   0          65m   10.244.2.85   k8s-node02   <none>           <none>