http://blog.csdn.net/WaltonWang/article/details/55005453
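Abstract: This article introduces Kubernetes Resource QoS, explains how the mechanism works, and briefly walks through the relevant source code.
Introduction to Kubernetes Resource QoS Classes
Kubernetes defines a Pod's QoS class from the request and limit values of the resources declared by the Pod's containers.
For each resource, containers can be divided into three QoS classes: Guaranteed, Burstable, and Best-Effort, in decreasing order of QoS level.
- Guaranteed: if the limit and request of every resource in every container of the Pod are equal and non-zero, the Pod's QoS class is Guaranteed (limit = request).
Note: if a container specifies only a limit and no request, the request is taken to be equal to the limit.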
- Best-Effort: if no container in the Pod sets a request or a limit for any resource, the Pod's QoS class is Best-Effort.
- Burstable: every Pod that matches neither the Guaranteed nor the Best-Effort case belongs to the Burstable QoS class.
When a limit is not specified, its effective value is the node's capacity for that resource.
Examples of Burstable Pods:
- Container bar does not specify any resources.
- Containers foo and bar specify different resources.
- Container foo does not specify a limit, and container bar specifies neither a request nor a limit.
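The classification rules above can be sketched in Python as follows. This is a minimal illustration rather than the kubelet's implementation: the names `effective_requests` and `qos_class` are hypothetical, only CPU and memory are considered, and quantities are assumed to be plain integers (millicores and MiB) instead of Kubernetes quantity strings.

```python
RESOURCES = ("cpu", "memory")  # assumption: only these two resources matter


def effective_requests(container):
    """Apply the defaulting rule: a missing request takes the limit's value."""
    requests = dict(container.get("requests", {}))
    for name, value in container.get("limits", {}).items():
        requests.setdefault(name, value)
    return requests


def qos_class(containers):
    """Return the QoS class for a list of container resource specs."""
    # Best-Effort: no container sets any request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "Best-Effort"
    # Guaranteed: every resource of every container has limit == request != 0.
    for c in containers:
        limits = c.get("limits", {})
        requests = effective_requests(c)
        for name in RESOURCES:
            limit = limits.get(name, 0)
            if limit == 0 or requests.get(name, 0) != limit:
                return "Burstable"
    return "Guaranteed"


# Only limits are given, so requests default to the limits.
guaranteed = [
    {"limits": {"cpu": 10, "memory": 1024}},
    {"limits": {"cpu": 100, "memory": 100}},
]
# Nothing is specified anywhere.
best_effort = [{}, {}]
# foo is fully specified, bar specifies nothing.
bar_unspecified = [
    {"requests": {"cpu": 10, "memory": 1024}, "limits": {"cpu": 10, "memory": 1024}},
    {},
]
# foo and bar limit different resources.
different_resources = [{"limits": {"memory": 1024}}, {"limits": {"cpu": 100}}]
# foo has requests but no limit, bar specifies nothing.
foo_without_limit = [{"requests": {"cpu": 10, "memory": 1024}}, {}]

for pod in (guaranteed, best_effort, bar_unspecified,
            different_resources, foo_without_limit):
    print(qos_class(pod))
```

The last three pods correspond to the three Burstable cases listed above.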
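Compressible vs. Incompressible Resources
kube-scheduler selects a node for a Pod based on the Pod's request values. A Pod and all of its containers are not allowed to consume more than the effective value specified by their limit (if one is set).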
How the request and limit are enforced depends on whether the resource is compressible or incompressible.
Compressible Resource Guarantees
- For now, we are only supporting CPU.
- Pods are guaranteed to get the amount of CPU they request; they may or may not get additional CPU time (depending on the other jobs running). This isn’t fully guaranteed today because CPU isolation is at the container level. Pod-level cgroups will be introduced soon to achieve this goal.
- Excess CPU resources will be distributed based on the amount of CPU requested. For example, suppose container A requests 600 milli CPUs and container B requests 300 milli CPUs, and that both containers are trying to use as much CPU as they can. Then the extra 10 milli CPUs will be distributed to A and B in a 2:1 ratio (implementation discussed in later sections).
- Pods will be throttled if they exceed their limit. If limit is unspecified, then the pods can use excess CPU when available.
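The proportional split of excess CPU can be checked with a few lines of arithmetic. The sketch below is only an illustration of the 2:1 example above; the function name `share_excess_cpu` is hypothetical, and the real distribution is performed by the kernel scheduler, not by code like this.

```python
def share_excess_cpu(requests_milli, excess_milli):
    """Split excess CPU among containers in proportion to their CPU requests."""
    total = sum(requests_milli.values())
    if total == 0:
        # No container requested CPU, so there is no ratio to apply.
        return {name: 0.0 for name in requests_milli}
    return {
        name: excess_milli * request / total
        for name, request in requests_milli.items()
    }


# Container A requests 600m, container B requests 300m, 10m is left over.
shares = share_excess_cpu({"A": 600, "B": 300}, 10)
print(shares)  # A receives twice as much of the excess as B
```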
Incompressible Resource Guarantees
- For now, we are only supporting memory.
- Pods will get the amount of memory they request. If they exceed their memory request, they could be killed (if some other pod needs memory), but if pods consume less memory than requested, they will not be killed (except in cases where system tasks or daemons need more memory).
- When Pods use more memory than their limit, the process using the most memory inside one of the pod’s containers will be killed by the kernel.
Admission/Scheduling Policy
- Pods will be admitted by Kubelet & scheduled by the scheduler based on the sum of requests of its containers. The scheduler & kubelet will ensure that sum of requests of all containers is within the node’s allocatable capacity (for both memory and CPU).
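The admission rule amounts to a per-resource sum and comparison. The following is a minimal sketch under stated assumptions: `total_requests` and `fits_on_node` are hypothetical names, each container is reduced to a dict of integer requests (millicores and MiB), and nothing other than CPU and memory is checked.

```python
RESOURCES = ("cpu", "memory")  # assumption: only CPU and memory are checked


def total_requests(containers):
    """Sum the requests of a list of containers, resource by resource."""
    return {r: sum(c.get(r, 0) for c in containers) for r in RESOURCES}


def fits_on_node(new_pod, admitted, allocatable):
    """True if the new pod's requests fit next to the containers already admitted."""
    used = total_requests(admitted)
    wanted = total_requests(new_pod)
    return all(used[r] + wanted[r] <= allocatable[r] for r in RESOURCES)


allocatable = {"cpu": 2000, "memory": 4096}
admitted = [{"cpu": 1000, "memory": 2048}, {"cpu": 500, "memory": 1024}]

small_pod = [{"cpu": 400, "memory": 512}, {"cpu": 100, "memory": 512}]
large_pod = [{"cpu": 600, "memory": 256}]

print(fits_on_node(small_pod, admitted, allocatable))  # requests exactly fill the node
print(fits_on_node(large_pod, admitted, allocatable))  # CPU requests would exceed it
```

Note that limits play no part in this check; only requests are summed.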
How Resources Are Reclaimed for Each QoS Class
- CPU: Pods will not be killed if CPU guarantees cannot be met (for example, if system tasks or daemons take up lots of CPU); they will be temporarily throttled.
- Memory: Memory is an incompressible resource, so let’s discuss the semantics of memory management a bit.
  - Best-Effort pods will be treated as lowest priority. Processes in these pods are the first to get killed if the system runs out of memory. These containers can use any amount of free memory in the node though.
  - Guaranteed pods are considered top-priority and are guaranteed to not be killed until they exceed their limits, or if the system is under memory pressure and there are no lower priority containers that can be evicted.
  - Burstable pods have some form of minimal resource guarantee, but can use more resources when available. Under system memory pressure, these containers are more likely to be killed once they exceed their requests and no Best-Effort pods exist.
OOM Score configuration at the Nodes
Pod OOM score configuration
- Note that the OOM score of a process is 10 times the % of memory the process consumes, adjusted by OOM_SCORE_ADJ, barring exceptions (e.g. process is launched by root). Processes with higher OOM scores are killed.
- The base OOM score is between 0 and 1000, so if process A’s OOM_SCORE_ADJ - process B’s OOM_SCORE_ADJ is over a 1000, then process A will always be OOM killed before B.
- The final OOM score of a process is also between 0 and 1000
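The scoring model described in these notes can be written down directly. This is a sketch of the simplified model only (ten times the percentage of memory used, plus OOM_SCORE_ADJ, clamped to 0..1000); the function name `oom_score` is hypothetical, and the kernel's actual calculation has more cases, such as the exceptions mentioned above.

```python
def oom_score(memory_used, memory_total, oom_score_adj):
    """Simplified OOM score: 10 * (% of memory used) + adj, clamped to [0, 1000]."""
    base = (1000 * memory_used) // memory_total  # 10 times the percentage
    return max(0, min(1000, base + oom_score_adj))


# A process using 5% of memory with the maximum adjustment saturates at 1000.
print(oom_score(50, 1000, 1000))
# A process using 50% of memory with an adjustment of -998 bottoms out at 0.
print(oom_score(500, 1000, -998))
# Only a process using nearly all memory climbs above 0 with that adjustment.
print(oom_score(999, 1000, -998))
```

Because the base score never exceeds 1000, a large enough gap between two adjustments fixes the kill order regardless of actual memory use.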
Best-effort
- Set OOM_SCORE_ADJ: 1000
- So processes in best-effort containers will have an OOM_SCORE of 1000
Guaranteed
- Set OOM_SCORE_ADJ: -998
- So processes in guaranteed containers will have an OOM_SCORE of 0 or 1
Burstable
- If total memory request > 99.8% of available memory, OOM_SCORE_ADJ: 2
- Otherwise, set OOM_SCORE_ADJ to 1000 - 10 * (% of memory requested)
- This ensures that the OOM_SCORE of burstable pod is > 1
- If memory request is 0, OOM_SCORE_ADJ is set to 999.
- So burstable pods will be killed if they conflict with guaranteed pods
- If a burstable pod uses less memory than requested, its OOM_SCORE < 1000
- So best-effort pods will be killed if they conflict with burstable pods using less than requested memory
- If a process in burstable pod’s container uses more memory than what the container had requested, its OOM_SCORE will be 1000, if not its OOM_SCORE will be < 1000
- Assuming that a container typically has a single big process, if a burstable pod’s container that uses more memory than requested conflicts with another burstable pod’s container using less memory than requested, the former will be killed
- If burstable pod’s containers with multiple processes conflict, then the formula for OOM scores is a heuristic, it will not ensure “Request and Limit” guarantees.
Pod infra containers or Special Pod init process
- OOM_SCORE_ADJ: -998
Kubelet, Docker
- OOM_SCORE_ADJ: -999 (won’t be OOM killed)
- Hack, because these critical tasks might die if they conflict with guaranteed containers. In the future, we should place all user-pods into a separate cgroup, and set a limit on the memory they can consume.
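The per-class rules for user containers can be condensed into one function. The sketch below follows the rules as listed in this section and is not copied from the kubelet source; `oom_score_adj_for` is a hypothetical name, and memory values are assumed to be integers in the same unit.

```python
GUARANTEED_ADJ = -998
BEST_EFFORT_ADJ = 1000


def oom_score_adj_for(qos, memory_request, memory_capacity):
    """Pick OOM_SCORE_ADJ for a container from its pod's QoS class."""
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "Best-Effort":
        return BEST_EFFORT_ADJ
    # Burstable: 1000 - 10 * (% of memory requested), using integer arithmetic.
    adj = 1000 - (1000 * memory_request) // memory_capacity
    # Keep the result between 2 (request close to capacity) and 999 (no request),
    # so burstable containers rank strictly between the other two classes.
    return max(2, min(999, adj))


capacity = 4096
print(oom_score_adj_for("Guaranteed", 1024, capacity))
print(oom_score_adj_for("Best-Effort", 0, capacity))
print(oom_score_adj_for("Burstable", 2048, capacity))  # requests half the memory
print(oom_score_adj_for("Burstable", 0, capacity))     # no memory request
print(oom_score_adj_for("Burstable", 4095, capacity))  # requests nearly everything
```

The clamp keeps every burstable adjustment strictly between the guaranteed value and the best-effort value, which is what produces the kill ordering described above.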
Source Code Analysis
The QoS source code lives in pkg/kubelet/qos. It is very simple and consists mainly of two files: pkg/kubelet/qos/policy.go and pkg/kubelet/qos/qos.go.
The OOM_SCORE_ADJ values for the QoS classes discussed above are defined as constants in pkg/kubelet/qos.
The method that computes a container's OOM_SCORE_ADJ is defined in the same package, pkg/kubelet/qos.
The method that returns a Pod's QoS class is likewise defined in pkg/kubelet/qos.
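PodQoS is called by the eviction_manager and in the scheduler's Predicates phase. In other words, Kubernetes uses it when handling overcommitted nodes and during the filtering stage of scheduling.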
