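The scheduler holds a watch connection to the apiserver and schedules Pods whose spec.nodeName is empty. Scheduling has to weigh many factors: fairness, efficient use of resources, Pod QoS, and affinity and anti-affinity.
The scheduler works in two phases. The predicate phase filters out nodes that do not satisfy the Pod's requirements. The priority phase ranks the remaining nodes and selects the one with the highest score.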
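There are many predicate policies:
PodFitsHostPorts: checks for host port conflicts.
PodFitsResources: checks that the node has enough resources, including the allowed number of Pods, CPU, memory, and GPU.
HostName: checks whether pod.spec.nodeName matches the candidate node. (pod.spec.nodeSelector is checked by a separate predicate, MatchNodeSelector.)
NoVolumeZoneConflict: checks for volume zone conflicts.
Check whether the node matches the Pod's affinity rules.
Check whether the Pod tolerates the node's taints.
Check whether the Pod may be scheduled onto a node under memory pressure.
Check whether the node satisfies the conditions of the volumes the Pod references.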
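To make the taint check above concrete, here is a minimal sketch of a Pod that tolerates a taint. The taint key `dedicated`, the value `gpu`, the node name `node1`, and the Pod name are all hypothetical and do not come from these notes.

```yaml
# Assumes a node was tainted beforehand, for example:
#   kubectl taint nodes node1 dedicated=gpu:NoSchedule
# Without the toleration below, the predicate phase filters that node out for this Pod.
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo        # hypothetical name
spec:
  tolerations:
  - key: dedicated             # must match the taint key
    operator: Equal
    value: gpu                 # must match the taint value
    effect: NoSchedule         # must match the taint effect
  containers:
  - name: nginx
    image: nginx:latest
    imagePullPolicy: IfNotPresent
```

A toleration only allows the Pod onto the tainted node. It does not force the Pod there, so it is usually combined with nodeSelector or node affinity when the Pod must land on that node.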
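Priority policies:
Prefer nodes that run fewer Pods belonging to the same Service or ReplicationController.
Prefer placing Pods in the same topology domain.
Prefer nodes with fewer requested resources.
Prefer balancing resource usage across nodes.
Score by the node's preferAvoidPods annotation. Its weight is 10000, so the other priority policies cannot outweigh it.
Prefer nodes that match the Pod's node affinity, and nodes whose taints the Pod tolerates.
Spread Pods of the same Service across different nodes where possible.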
Set resource requests and limits on a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ng-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: docker.io/nginx:latest
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 250m
            memory: 256Mi
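Running docker inspect on this container shows that the resource limits are enforced through cgroups.
A LimitRange object applies default resource limits to Pods that do not declare any: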
apiVersion: v1
kind: LimitRange
metadata:
  name: limitrange
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container
A LimitRange cannot target containers by name, so it also applies these defaults to init containers.
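As a sketch of the effect, a container created in the same namespace without a resources section would be stored with the defaults from the LimitRange above filled in. The container name and image here are illustrative only.

```yaml
# Container spec after LimitRange defaulting (a sketch, not captured output)
containers:
- name: nginx
  image: nginx:latest
  resources:
    limits:
      memory: 512Mi      # filled in from limits[].default
    requests:
      memory: 256Mi      # filled in from limits[].defaultRequest
```

The defaults are applied when a Pod is admitted, so Pods that already existed before the LimitRange was created are not changed.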
nodeSelector
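A minimal nodeSelector sketch, assuming a node has been labeled `disktype=ssd`. The label and the Pod name are hypothetical and chosen only for illustration.

```yaml
# Assumes a node was labeled beforehand, for example:
#   kubectl label nodes <node-name> disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: with-node-selector     # hypothetical name
spec:
  nodeSelector:
    disktype: ssd              # the node must carry exactly this label
  containers:
  - name: nginx
    image: nginx:latest
    imagePullPolicy: IfNotPresent
```

nodeSelector supports only exact matches, and all listed labels must match. It has no operators such as In or NotIn and no soft preferences.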
nodeSelector is not very flexible, so node affinity is generally used instead:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: nginx
    image: nginx:latest
    imagePullPolicy: IfNotPresent
Next, pod affinity and anti-affinity:
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx2
    image: nginx:latest
    imagePullPolicy: IfNotPresent
Define a PriorityClass:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: hight-priority
value: 1000000
globalDefault: false
description: "this priority class"
Reference this PriorityClass from a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ng-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      priorityClassName: hight-priority
      containers:
      - name: nginx
        image: docker.io/nginx:latest
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 250m
            memory: 256Mi