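What the Kubernetes scheduler does
When a Pod needs to be bound to a node, kube-scheduler picks a suitable node for the newly created Pod and binds the Pod to it. The kubelet on that node then calls the corresponding CRI, CSI and CNI implementations to set up the container runtime, storage and networking for the Pod.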
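Workflow
The scheduler first runs the filtering (predicate) phase, which keeps only the nodes that satisfy all of the Pod's requirements. It then scores the remaining nodes and binds the Pod to the node with the best score.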
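The two phases above can be sketched in a few lines of Python. This is only a toy model, not the kube-scheduler implementation: the `Node` and `Pod` classes, their field names, and the "most CPU left" scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millis: int       # allocatable CPU minus existing requests (hypothetical field)
    free_memory_mib: int       # allocatable memory minus existing requests (hypothetical field)
    used_host_ports: set = field(default_factory=set)
    unschedulable: bool = False


@dataclass
class Pod:
    name: str
    cpu_request_millis: int
    memory_request_mib: int
    host_ports: set = field(default_factory=set)


def fits(pod, node):
    """Filter step: hard constraints only, the answer is yes or no."""
    if node.unschedulable:
        return False
    if pod.host_ports & node.used_host_ports:   # host port conflict
        return False
    return (pod.cpu_request_millis <= node.free_cpu_millis
            and pod.memory_request_mib <= node.free_memory_mib)


def score(pod, node):
    """Score step: toy rule, more CPU left after placement is better."""
    return node.free_cpu_millis - pod.cpu_request_millis


def schedule(pod, nodes):
    """Return the chosen node name, or None if no node passes the filter."""
    feasible = [n for n in nodes if fits(pod, n)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name


nodes = [
    Node("node-a", 500, 1024),                          # too little CPU
    Node("node-b", 2000, 4096, used_host_ports={80}),   # port 80 already taken
    Node("node-c", 1500, 2048),
]
pod = Pod("web", 1000, 512, host_ports={80})
print(schedule(pod, nodes))  # prints "node-c"
```

Filtering never ranks nodes and scoring never rejects them, which is why a Pod stays pending when the feasible list is empty, no matter how the scores would have come out.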
Filtering:
1. Nodes that carry a taint or are marked unschedulable are excluded, for example:
kubectl taint nodes master key1=value1:NoSchedule
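2. Pod port conflicts: the host ports the Pod asks for must not already be in use on the node.
3. The resources the Pod requests, in particular incompressible ones such as memory and disk: the node must have enough left to satisfy them.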
4. podAffinity
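apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - s1
        # topologyKey is required for podAffinity; this zone label is an assumed example
        topologyKey: topology.kubernetes.io/zone
  # a Pod needs at least one container; this one is a placeholder
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0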
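5. PodToleratesNodeTaints: checks whether the Pod tolerates the node's taints.
The taint effects currently supported are:
NoSchedule: new Pods are not scheduled onto the node; Pods already running there are not affected.
PreferNoSchedule: the soft version of NoSchedule; the scheduler tries to avoid the node but may still use it.
NoExecute: new Pods are not scheduled onto the node, and Pods already running there that do not tolerate the taint are evicted. A Pod can set tolerationSeconds on its toleration so that a node that only looks dead for a moment does not trigger eviction right away; the Pod is evicted only once that time has elapsed.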
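How a toleration is matched against a taint can be sketched as follows. This is a simplified Python model of the matching rules, not the scheduler's code: the dictionary layout simply mirrors the YAML fields, and the function names `tolerates` and `node_is_feasible` are hypothetical.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # operator "Equal": key and value must both match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def node_is_feasible(tolerations, taints):
    """Filter check: every NoSchedule or NoExecute taint must be tolerated."""
    for taint in taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference: affects scoring, never filters a node out
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# The taint created by: kubectl taint nodes master key1=value1:NoSchedule
master_taints = [{"key": "key1", "value": "value1", "effect": "NoSchedule"}]

matching = [{"key": "key1", "operator": "Equal",
             "value": "value1", "effect": "NoSchedule"}]
unrelated = [{"key": "example-key", "operator": "Exists",
              "effect": "NoSchedule"}]

print(node_is_feasible(matching, master_taints))   # prints True
print(node_is_feasible(unrelated, master_taints))  # prints False
```

A Pod without any tolerations passes this check only on nodes whose taints are all PreferNoSchedule, or on nodes with no taints at all.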
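Scoring
1. Prefer nodes that run fewer Pods belonging to the same Service or ReplicationController, so that replicas are spread across different nodes.
2. Prefer placing Pods in the same zone, so that access latency does not vary between zones.
3. Prefer nodes with fewer requested resources.
4. Prefer nodes whose resource usage stays balanced.
5. Prefer nodes that match the Pod's nodeAffinity, and nodes whose taints the Pod tolerates (TaintToleration), for example: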
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"
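6. Prefer nodes that already have the Pod's image, especially when the image is large.
7. PriorityClass: Pods with a higher priority are scheduled ahead of lower-priority Pods. A Pod selects its class through priorityClassName, for example: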
# PriorityClass belongs to the scheduling.k8s.io API group, not the core v1 group
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 10000000
globalDefault: false
description: "priority"
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority
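Going back to scoring rules 3 and 4 (fewer requested resources, balanced usage), the arithmetic behind them can be sketched as below. The formulas are modeled loosely on the classic LeastRequested and BalancedResourceAllocation priorities; the 0 to 10 scale, the equal weighting, and all function names are assumptions for illustration, and the real scheduler's numbers may differ between versions.

```python
def least_requested_score(requested, capacity):
    """Higher when a smaller share of the resource is requested (0 to 10)."""
    if capacity <= 0 or requested > capacity:
        return 0.0
    return (capacity - requested) * 10 / capacity


def balanced_allocation_score(cpu_fraction, mem_fraction):
    """Higher when CPU and memory utilization are close to each other (0 to 10)."""
    if cpu_fraction >= 1 or mem_fraction >= 1:
        return 0.0
    return 10 - abs(cpu_fraction - mem_fraction) * 10


def node_score(cpu_req, cpu_cap, mem_req, mem_cap):
    """Sum of both rules with equal weight.

    The *_req values are the node's total requests including the incoming Pod.
    """
    if cpu_cap <= 0 or mem_cap <= 0:
        return 0.0
    least = (least_requested_score(cpu_req, cpu_cap)
             + least_requested_score(mem_req, mem_cap)) / 2
    balanced = balanced_allocation_score(cpu_req / cpu_cap, mem_req / mem_cap)
    return least + balanced


# Node X: 25% CPU and 25% memory requested, lightly loaded and balanced.
print(node_score(1000, 4000, 2048, 8192))  # prints 17.5
# Node Y: 75% CPU but only 12.5% memory requested, lopsided.
print(node_score(3000, 4000, 1024, 8192))  # prints 9.375
```

In this model node X wins on both rules: it has more headroom, and its CPU and memory usage grow in step, so neither resource is left stranded.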