Taints and Tolerations (taint && toleration)
Taints and tolerations enable exclusive scheduling: a tainted node only accepts Pods that carry a matching toleration.
kubectl taint node k8s-master01 master-test=test:NoSchedule
# View node details; the Taints field shows the taint applied above
kubectl describe node k8s-master01
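To check only the taints without reading the whole describe output, either of these also works:
# Filter the Taints field out of the describe output
kubectl describe node k8s-master01 | grep -i taints
# Or read the taints straight from the node spec
kubectl get node k8s-master01 -o jsonpath='{.spec.taints}'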
Add a nodeSelector to the Pod spec:
nodeSelector:
  kubernetes.io/hostname: k8s-master01
At this point the Pod stays Pending and is not scheduled (with multiple replicas, some of the Pods will sit in Pending).
Check why it is Pending:
kubectl describe po nginx-deployment-745cccf67b-wbbjz
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 36s (x3 over 117s) default-scheduler 0/5 nodes are available: 1 node(s) had taint {master-test: test}, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector.
Adding a toleration allows the Pods to be scheduled:
tolerations:
- key: master-test
  value: test
  effect: NoSchedule
  operator: Equal
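For context, a minimal Deployment sketch combining the nodeSelector and the toleration might look like this (name, labels, and image are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        kubernetes.io/hostname: k8s-master01
      tolerations:
      - key: master-test
        value: test
        effect: NoSchedule
        operator: Equal
      containers:
      - name: nginx
        image: nginx:1.14.2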
All three Pods are scheduled onto k8s-master01:
nginx-deployment-54df5b659b-9zcf4 1/1 Running 0 36s 172.25.244.222 k8s-master01 <none> <none>
nginx-deployment-54df5b659b-fwtg6 1/1 Running 0 31s 172.25.244.224 k8s-master01 <none> <none>
nginx-deployment-54df5b659b-lpgs7 1/1 Running 0 33s 172.25.244.223 k8s-master01 <none> <none>
Taint effects:
NoSchedule: new Pods that do not tolerate the taint are not scheduled onto the node.
NoExecute: Pods that do not tolerate the taint are evicted from the node immediately.
PreferNoSchedule: the scheduler tries not to place intolerant Pods on the node, but may if it has to (a "soft" taint; a sketch of applying it follows below).
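A soft taint uses the same kubectl taint syntax as the other effects; the node name, key, and value below are placeholders:
kubectl taint node <node-name> dedicated=gpu:PreferNoSchedule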
Remove a taint:
kubectl taint node k8s-master01 master-test:NoSchedule-
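To remove every effect of a key at once, the key alone followed by a dash also works:
kubectl taint node k8s-master01 master-test-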
When a node carries multiple taints, a Pod must tolerate all of them to be scheduled there. Tolerations can be written in several ways:
tolerations:
- key: master-test
  effect: NoSchedule
  operator: Exists
This tolerates a taint with key master-test and effect NoSchedule regardless of its value (with Exists the value field must be left empty).
tolerations:
- operator: Exists
This tolerates every taint; rarely needed in practice.
tolerations:
- key: master-test
  operator: Exists
This tolerates any taint with key master-test, regardless of value and effect, so a single entry can cover several taints that share the key (see the control-plane example below).
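The same pattern is how workloads are commonly allowed onto control-plane nodes, which kubeadm taints with node-role.kubernetes.io/control-plane:NoSchedule (older clusters use node-role.kubernetes.io/master:NoSchedule):
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule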
Apply two taints to k8s-master01:
kubectl taint node k8s-master01 master-test=test:NoSchedule
kubectl taint node k8s-master01 master-test=test:NoExecute
The Pod now needs to tolerate both:
tolerations:
- key: master-test
  value: test
  effect: NoSchedule
  operator: Equal
- key: master-test
  value: test
  effect: NoExecute
  operator: Equal
  tolerationSeconds: 60
Once a NoExecute taint lands on a node, Pods that do not tolerate it are evicted immediately. tolerationSeconds bounds how long a tolerating Pod may keep running on the tainted node, which is useful for transient issues such as network flapping: you can wait a few minutes and remove the taint once the node recovers; otherwise the Pod is evicted and rescheduled when the time expires.
Note that value must be left empty when operator is Exists; with Equal (the default) a value is required. Supplying a value together with Exists is rejected:
# deployments.apps "nginx-deployment" was not valid:
# * spec.template.spec.tolerations[1].operator: Invalid value: core.Toleration{Key:"master-test", Operator:"Exists", Value:"test", Effect:"NoExecute", TolerationSeconds:(*int64)(nil)}: value must be empty when `operator` is 'Exists'
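If operator: Exists is what you want for the NoExecute taint, drop the value and keep only key, effect, and (optionally) tolerationSeconds; a minimal sketch:
tolerations:
- key: master-test
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60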
With the Equal-based tolerations applied, describing the rescheduled Pod shows them:
kubectl describe pod nginx-deployment-6cb78fcb79-96n7v
Node-Selectors: <none>
Tolerations: master-test=test:NoSchedule
master-test=test:NoExecute for 60s
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
The last two entries mean the Pod tolerates the node being not-ready or unreachable for 300s before being evicted.
The default value of operator is Equal.
Other taints that Kubernetes applies automatically: see the reference link.
initContainers: usage and notes
Init containers run to completion before the app containers start; a postStart hook is not guaranteed to run before the entrypoint, but an init container is.
initContainers:
- image: alpine:3.6
  command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
  name: init-es-log
  securityContext:
    privileged: true
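For context, a complete Pod sketch using this init container to raise vm.max_map_count before an Elasticsearch container starts (the Elasticsearch image tag is only an example):
apiVersion: v1
kind: Pod
metadata:
  name: es-log
spec:
  initContainers:
  - name: init-es-log
    image: alpine:3.6
    command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0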
Node and Pod Affinity
Affinity:
nodeAffinity: node affinity
  requiredDuringSchedulingIgnoredDuringExecution: hard affinity; the Pod must be placed on (or, with a negative operator, must be kept off) the matching nodes
  preferredDuringSchedulingIgnoredDuringExecution: soft affinity; the scheduler tries to place the Pod on (or keep it off) the matching nodes
podAffinity: pod affinity
  requiredDuringSchedulingIgnoredDuringExecution: applications A, B, C must be co-located
  preferredDuringSchedulingIgnoredDuringExecution: applications A, B, C should preferably be co-located
podAntiAffinity: pod anti-affinity (commonly used, e.g. cluster members that must not run on the same node)
  requiredDuringSchedulingIgnoredDuringExecution: applications A, B, C must not be co-located (hard requirement)
  preferredDuringSchedulingIgnoredDuringExecution: applications A, B, C should preferably not be co-located (soft requirement)
Official docs:
1. Label a node:
kubectl label nodes k8s-master02 disktype=ssd
2. Node affinity with a hard In requirement: the Pod must be scheduled onto a node labeled disktype=ssd
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: disktype
operator: In
values:
- ssd
containers:
- name: nginx
image: nginx:1.14.2
imagePullPolicy: IfNotPresent
Warning FailedScheduling 18s default-scheduler 0/5 nodes are available: 5 node(s) didn't match Pod's node affinity/selector.
Warning FailedScheduling 17s default-scheduler 0/5 nodes are available: 5 node(s) didn't match Pod's node affinity/selector.
When no node satisfies the hard requirement, the Pod cannot be scheduled at all; in that case soft (preferred) affinity can be used instead:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: disktype
operator: In
values:
- ssd
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
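After applying, kubectl get pod -o wide shows which node the scheduler actually picked; with only a preference, the Pod still runs even when no node carries the disktype=ssd label:
kubectl get pod nginx -o wide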
Note: In the preceding types, IgnoredDuringExecution means that if the node labels change after Kubernetes schedules the Pod, the Pod continues to run.
Affinity rules can be combined: the hard requirement must be satisfied first, and the soft preference is then used to rank the remaining candidate nodes.
apiVersion: v1
kind: Pod
metadata:
name: with-node-affinity
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- antarctica-east1
- antarctica-west1
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
containers:
- name: with-node-affinity
image: registry.k8s.io/pause:2.0
In this example, the following rules apply:
- The node must have a label with the key topology.kubernetes.io/zone and the value of that label must be either antarctica-east1 or antarctica-west1.
- The node preferably has a label with the key another-node-label-key and the value another-node-label-value.
You can use the operator field to specify a logical operator for Kubernetes to use when interpreting the rules. You can use In, NotIn, Exists, DoesNotExist, Gt and Lt.
In: schedule onto nodes whose label value is one of the listed values
NotIn: do not schedule onto nodes whose label value is one of the listed values
Exists: schedule onto nodes that have the label key (no values may be given)
DoesNotExist: schedule onto nodes that do not have the label key
Gt: the label value must be greater than the given value; it must be an integer, not an arbitrary string
Lt: the label value must be less than the given value; it must be an integer, not an arbitrary string
A sketch using Gt follows below.
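Assuming nodes carry a hypothetical numeric label such as cpu-cores, Gt can express "more than 8 cores" (label name and threshold are invented for illustration):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cpu-cores
          operator: Gt
          values:
          - "8"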
Pod affinity and anti-affinity: podAffinity and podAntiAffinity
Background: wanting several Pods to run together in the same topology domain (e.g. the same node) is pod affinity; wanting them spread across different nodes is pod anti-affinity.
apiVersion: v1
kind: Pod
metadata:
name: with-pod-affinity
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: security
operator: In
values:
- S1
        # If namespaces is omitted or empty, the term matches Pods in this Pod's own
        # namespace; listing namespaces here restricts matching to those namespaces
        # (an empty namespaceSelector {} matches all namespaces).
        namespaces:
        - kube-system   # match Pods in the kube-system namespace
topologyKey: topology.kubernetes.io/zone
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: security
operator: In
values:
- S2
topologyKey: topology.kubernetes.io/zone
containers:
- name: with-pod-affinity
image: nginx:1.14.2
Check which Pods carry the matching label:
kubectl get pod -n kube-system -l security=S2
With podAffinity, the Pod is placed in the same topology domain as the matched Pods (here, the same zone); with podAntiAffinity, the scheduler keeps it out of that domain.
Namespace selector
FEATURE STATE: Kubernetes v1.24 [stable]
You can also select matching namespaces using namespaceSelector, which is a label query over the set of namespaces. The affinity term is applied to namespaces selected by both namespaceSelector and the namespaces field. Note that an empty namespaceSelector ({}) matches all namespaces, while a null or empty namespaces list and null namespaceSelector matches the namespace of the Pod where the rule is defined.
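As a sketch of namespaceSelector (the label team=payments is invented for illustration), the term below only matches Pods in namespaces that carry that label:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: store
    namespaceSelector:
      matchLabels:
        team: payments
    topologyKey: kubernetes.io/hostname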
More practical use-cases
Inter-pod affinity and anti-affinity can be even more useful when they are used with higher level collections such as ReplicaSets, StatefulSets, Deployments, etc. These rules allow you to configure that a set of workloads should be co-located in the same defined topology; for example, preferring to place two related Pods onto the same node.
For example: imagine a three-node cluster. You use the cluster to run a web application and also an in-memory cache (such as Redis). For this example, also assume that latency between the web application and the memory cache should be as low as is practical. You could use inter-pod affinity and anti-affinity to co-locate the web servers with the cache as much as possible.
In the following example Deployment for the Redis cache, the replicas get the label app=store. The podAntiAffinity rule tells the scheduler to avoid placing multiple replicas with the app=store label on a single node. This creates each cache in a separate node.
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis-cache
spec:
selector:
matchLabels:
app: store
replicas: 3
template:
metadata:
labels:
app: store
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- store
topologyKey: "kubernetes.io/hostname"
containers:
- name: redis-server
image: redis:3.2-alpine
The following example Deployment for the web servers creates replicas with the label app=web-store. The Pod affinity rule tells the scheduler to place each replica on a node that has a Pod with the label app=store. The Pod anti-affinity rule tells the scheduler never to place multiple app=web-store servers on a single node.
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-server
spec:
selector:
matchLabels:
app: web-store
replicas: 3
template:
metadata:
labels:
app: web-store
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-store
topologyKey: "kubernetes.io/hostname"
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- store
topologyKey: "kubernetes.io/hostname"
containers:
- name: web-app
image: nginx:1.16-alpine
Creating the two preceding Deployments results in the following cluster layout, where each web server is co-located with a cache, on three separate nodes.
node-1 | node-2 | node-3 |
webserver-1 | webserver-2 | webserver-3 |
cache-1 | cache-2 | cache-3 |
The overall effect is that each cache instance is likely to be accessed by a single client, that is running on the same node. This approach aims to minimize both skew (imbalanced load) and latency.
You might have other reasons to use Pod anti-affinity. See the ZooKeeper tutorial in the Kubernetes documentation for an example of a StatefulSet configured with anti-affinity for high availability, using the same technique as this example.
Topology domains:
topologyKey: "kubernetes.io/hostname"
Nodes that share the topology key but have different values for it form different topology domains; only the same key with the same value counts as one domain.
With kubernetes.io/hostname, each domain contains exactly one node. So even if many Pods matching the label selector exist in the selected namespace (spread across several nodes), an affinity rule keyed on hostname confines the new Pod, and all of its replicas, to the single node that forms the chosen domain.
Matching is evaluated roughly in this order: namespaces --> topologyKey --> Pod labels --> node.
Under anti-affinity, once every topology domain already hosts a Pod matching the labels, additional replicas stay Pending and cannot be scheduled.
Summary: how the topology domains are cut determines whether replicas go Pending (anti-affinity) or pile onto one node (affinity).
Example: four replicas spread over three topology domains; under required anti-affinity one Pod stays Pending:
kubectl label node k8s-master01 jigui=jigui1
kubectl label node k8s-master02 jigui=jigui2
kubectl label node k8s-master03 jigui=jigui2
kubectl label node k8s-node01 jigui=jigui3
kubectl label node k8s-node02 jigui=jigui3
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-server
spec:
selector:
matchLabels:
app: web-store
replicas: 4
template:
metadata:
labels:
app: web-store
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-store
topologyKey: jigui
containers:
- name: web-app
image: nginx:1.16-alpine
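After applying, the spread can be checked with kubectl get pod -o wide: with three jigui domains and required anti-affinity, three replicas land in different domains and the fourth stays Pending (output is illustrative):
kubectl get pod -l app=web-store -o wide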