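Kubernetes: Pod Resource Scheduling

Kubernetes nodes are divided by role into master nodes and worker nodes (shown as "node" below); the worker nodes are the ones that run Pods.

When a Pod is created, the scheduler considers available resources (CPU, memory, and so on), and a number of other scheduling policies are also available.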

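Labels

Labels are one of the core concepts in Kubernetes. They tag k8s resources with simple key-value pairs, and resources such as Pods, Services, Deployments, and Nodes can all carry labels.

View the labels on the nodes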

[root@k8s-node1 ~]# kubectl get node --show-labels 
NAME        STATUS   ROLES         AGE   VERSION   LABELS
k8s-node1   Ready    master,node   8d    v1.13.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,cputype=intel-xeon-e5-2620-v4,gputype=nvidia-geforce-gtx-1080-ti,kubernetes.io/hostname=k8s-node1,node-role.kubernetes.io/master=k8s-node1,node-role.kubernetes.io/node=k8s-node1,pooltype=shared
k8s-node2   Ready    node          8d    v1.13.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,cputype=intel-xeon-e5-2620-v4,gputype=nvidia-geforce-gtx-1080-ti,kubernetes.io/hostname=k8s-node2,node-role.kubernetes.io/node=k8s-node2,pooltype=shared

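In the output above, for example, the nodes have cputype set to intel-xeon-e5-2620-v4 and gputype set to nvidia-geforce-gtx-1080-ti.

Set labels on nodes

# k8s-node1 uses an ordinary (mechanical) hard disk, so set disk=hdd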
[root@k8s-node1 ~]# kubectl label node k8s-node1 disk=hdd
node/k8s-node1 labeled

# k8s-node2 uses a solid-state drive, so set disk=ssd
[root@k8s-node1 ~]# kubectl label node k8s-node2 disk=ssd
node/k8s-node2 labeled

List nodes by label

[root@k8s-node1 example]# kubectl get node -l 'disk=ssd'
NAME        STATUS   ROLES   AGE   VERSION
k8s-node2   Ready    node    8d    v1.13.4

# Query with a combination of multiple labels
[root@k8s-node1 example]# kubectl get node -l 'disk=hdd, pooltype!=unshared'
NAME        STATUS   ROLES         AGE   VERSION
k8s-node1   Ready    master,node   8d    v1.13.4

View labels on other resources

[root@k8s-node1 ~]# kubectl --namespace=admin get pod --show-labels 
NAME                     READY   STATUS    RESTARTS   AGE     LABELS
enp183pm-session-8tf2d   1/1     Running   0          19h     app=enp183pm-session,controller-uid=a0758fe2-81e8-11e9-a660-88d7f6ae9c94,job-name=enp183pm-session,taskname=enp183pm,uuid=03670773-e16f-4886-9306-de13da7c9958
v4exp-bigdata            1/1     Running   0          7d20h   volume=bigdata
[root@k8s-node1 ~]# 
[root@k8s-node1 ~]# kubectl --namespace=admin get service --show-labels 
NAME                                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE     LABELS
gputask-session                                          NodePort    10.10.124.20    <none>        8888:30043/TCP   2d1h    app=gputask-session,taskname=gputask
v4exp-bigdata                                            NodePort    10.10.142.143   <none>        80:30379/TCP     7d20h   volume=bigdata

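Pod Scheduling

nodeSelector

This is the simplest scheduling method. Node affinity can express the same constraints and more, and older Kubernetes documentation suggested that nodeSelector would eventually be deprecated, although it is still supported. It works as follows:

  1. The user defines labels on nodes;
  2. When creating a Pod, the user specifies a nodeSelector that picks matching nodes by label.

View the current nodes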

[root@k8s-node1 example]# kubectl get node --label-columns=disk
NAME        STATUS   ROLES         AGE   VERSION   DISK
k8s-node1   Ready    master,node   8d    v1.13.4   hdd
k8s-node2   Ready    node          8d    v1.13.4   ssd

Now create a Pod that must be scheduled onto a node labeled disk=ssd

[root@k8s-node1 example]# cat nginx-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  # Labels on the Pod itself
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  # Schedule this Pod only onto nodes matching the nodeSelector disk=ssd
  nodeSelector:
    disk: ssd
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP

[root@k8s-node1 example]# kubectl create -f nginx-pod.yaml 
pod/nginx-pod created

# nginx-pod was scheduled onto k8s-node2, the node labeled disk=ssd
[root@k8s-node1 example]# kubectl get pod nginx-pod -owide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
nginx-pod   1/1     Running   0          14s   172.17.76.16   k8s-node2   <none>           <none>
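
nodeSelector can also list more than one label, in which case a node must carry all of them to be eligible. A minimal sketch, reusing the disk and pooltype labels shown on the nodes earlier (this particular combination is only an illustration and was not part of the original walkthrough):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  # Every entry must match: the node needs disk=ssd AND pooltype=shared
  nodeSelector:
    disk: ssd
    pooltype: shared
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP
```

nodeSelector only supports exact equality on each key; anything beyond that (for example "not equal" or "one of several values") needs node affinity.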

If nodeSelector specifies a label that no node carries

[root@k8s-node1 example]# cat nginx-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  # nodeSelector is set to disk=ceph (no node in the cluster actually has this label)
  nodeSelector:
    disk: ceph
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP

[root@k8s-node1 example]# kubectl create -f nginx-pod.yaml 
pod/nginx-pod created

# The status stays Pending
[root@k8s-node1 example]# kubectl get pod nginx-pod -owide
NAME        READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-pod   0/1     Pending   0          7s    <none>   <none>   <none>           <none>

# Check the detailed description: neither of the 2 nodes matches the node selector
[root@k8s-node1 example]# kubectl describe pod nginx-pod
...
...
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  14s (x2 over 14s)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.

nodeAffinity

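As the name suggests, nodeAffinity means node affinity, and anti-affinity is its opposite. Node affinity controls which nodes a Pod may be scheduled onto. It is more flexible than nodeSelector and can express simple logical combinations.

nodeAffinity comes in the following forms

  1. requiredDuringSchedulingIgnoredDuringExecution   Must be satisfied. If no node meets the conditions, the Pod cannot be scheduled and stays Pending.
  2. preferredDuringSchedulingIgnoredDuringExecution  Satisfied on a best-effort basis. If no node meets the conditions, the Pod is still scheduled onto some node.

IgnoredDuringExecution means that both rules apply only while the Pod is being scheduled. If a node's labels are changed after the Pod is already running, so that the node no longer meets the Pod's conditions, the running Pod is not affected.

Now use nodeAffinity to create a Pod that is scheduled onto a node labeled disk=ssd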

[root@k8s-node1 example]# cat nginx-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disk
            operator: In
            values:
            - ssd
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP

[root@k8s-node1 example]# kubectl create -f nginx-pod.yaml 
pod/nginx-pod created

[root@k8s-node1 example]# kubectl get pod nginx-pod -owide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
nginx-pod   1/1     Running   0          4s    172.17.76.16   k8s-node2   <none>           <none>

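To see what happens with requiredDuringSchedulingIgnoredDuringExecution when the condition is not met, change the disk value to ceph: the Pod cannot be scheduled and stays Pending (output omitted; the behavior is the same as in the nodeSelector case).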
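
For reference, a sketch of the modified manifest for this case. It is the earlier required nodeAffinity example with only the value changed to ceph:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: with no node labeled disk=ceph, the Pod stays Pending
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disk
            operator: In
            values:
            - ceph
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP
```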

To see what happens with preferredDuringSchedulingIgnoredDuringExecution when the condition is not met, change the disk value to ceph

[root@k8s-node1 example]# cat nginx-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disk
            operator: In
            values:
            - ceph
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP

[root@k8s-node1 example]# kubectl create -f nginx-pod.yaml 
pod/nginx-pod created

# Even though no node has disk=ceph, the Pod is still created and scheduled
[root@k8s-node1 example]# kubectl get pod nginx-pod -owide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
nginx-pod   1/1     Running   0          2s    172.17.76.16   k8s-node2   <none>           <none>

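As the YAML shows, nodeAffinity supports combined queries and a more flexible set of operators. The operators are:

- In: the label's value is in a given list
- NotIn: the label's value is not in a given list
- Gt: the label's value is greater than a given value (compared as integers)
- Lt: the label's value is less than a given value (compared as integers)
- Exists: the label exists
- DoesNotExist: the label does not exist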
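
To illustrate how these operators combine, here is a hedged sketch rather than a manifest from the original walkthrough. Expressions inside one matchExpressions list must all match (AND), while separate entries under nodeSelectorTerms are alternatives (OR). The disk, pooltype, and gputype labels come from the nodes above; the cpu-cores label is hypothetical and would have to be set on the nodes by hand.

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: disk=ssd AND pooltype is not "unshared" AND a gputype label is present
        - matchExpressions:
          - key: disk
            operator: In
            values:
            - ssd
          - key: pooltype
            operator: NotIn
            values:
            - unshared
          - key: gputype
            operator: Exists
        # Term 2 (alternative to term 1): hypothetical cpu-cores label greater than 8
        - matchExpressions:
          - key: cpu-cores
            operator: Gt
            values:
            - "8"   # a single value, written as a string and compared as an integer
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP
```

Exists and DoesNotExist take no values list, and Gt and Lt take exactly one value.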

podAffinity

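podAffinity is similar to nodeAffinity. The difference is that nodeAffinity selects by the labels on nodes, while podAffinity selects by the labels on Pods that are already running. For example:

First create a Pod labeled k8s-app=nginx-pod. With no scheduling constraints, the scheduler is free to place it on any node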

[root@k8s-node1 example]# cat nginx-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP

[root@k8s-node1 example]# kubectl create -f nginx-pod.yaml 
pod/nginx-pod created

# The Pod happened to be scheduled onto k8s-node2
[root@k8s-node1 example]# kubectl get pod nginx-pod -owide --show-labels 
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES   LABELS
nginx-pod   1/1     Running   0          39s   172.17.76.16   k8s-node2   <none>           <none>            k8s-app=nginx-pod

Then create another Pod that should not be scheduled onto the same node as Pods labeled k8s-app=nginx-pod

[root@k8s-node1 example]# cat mysql-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: mysql-pod
  name: mysql-pod
spec:
  affinity:
    # podAntiAffinity means anti-affinity: avoid being placed with the selected Pods
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          # This selects k8s-app=nginx-pod: avoid sharing a node with Pods that carry this label
          labelSelector:
            matchExpressions:
            - key: k8s-app
              operator: In
              values:
              - nginx-pod
          topologyKey: kubernetes.io/hostname
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP

[root@k8s-node1 example]# kubectl create -f mysql-pod.yaml 
pod/mysql-pod created

# mysql-pod was scheduled onto k8s-node1
[root@k8s-node1 example]# kubectl get pod mysql-pod -owide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
mysql-pod   1/1     Running   0          7s    172.17.86.12   k8s-node1   <none>           <none>

podAffinity and podAntiAffinity can also be used together, meaning the Pod wants to be placed with A and does not want to be placed with B.
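
A minimal sketch of using both together, assuming the nginx-pod and mysql-pod from the examples above are still running. The Pod name web-pod and its label are hypothetical; the Pod is required to share a node with k8s-app=nginx-pod and prefers to avoid nodes running k8s-app=mysql-pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: web-pod   # hypothetical name, for illustration only
  name: web-pod
spec:
  affinity:
    # Hard rule: run on a node that already hosts a Pod labeled k8s-app=nginx-pod
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - nginx-pod
        topologyKey: kubernetes.io/hostname
    # Soft rule: try to avoid nodes that host a Pod labeled k8s-app=mysql-pod
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: k8s-app
              operator: In
              values:
              - mysql-pod
          topologyKey: kubernetes.io/hostname
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
```

Note the structural difference: the required form lists affinity terms directly, while the preferred form wraps each term in podAffinityTerm together with a weight.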

A common use case: a Deployment can specify several replicas, and if each replica should land on a different node, podAntiAffinity can be used.
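
A hedged sketch of such a Deployment. The name nginx-deployment and its label are hypothetical; the anti-affinity rule selects the Deployment's own Pod label, so no two replicas may share a node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment   # hypothetical name, for illustration only
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: nginx-deployment
  template:
    metadata:
      labels:
        k8s-app: nginx-deployment
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place two Pods carrying this label on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: nginx-deployment
            topologyKey: kubernetes.io/hostname
      containers:
      - image: nginx:latest
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
```

With the required form, any replicas beyond the number of eligible nodes stay Pending. Switching to preferredDuringSchedulingIgnoredDuringExecution spreads replicas where possible but still allows them to share a node.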

Taints

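With node affinity, whether as a hard constraint or as a preference, the goal is to attract Pods to the intended nodes. Taints do the opposite: once a node is tainted, no Pod is scheduled onto it unless the Pod declares that it tolerates the taint.

Typical uses for taints: a node is reserved for a special purpose and ordinary Pods should not land on it, or a node needs maintenance and should not receive new Pods.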

[root@k8s-node1 example]# kubectl taint nodes k8s-node1 key=value:NoSchedule
node "k8s-node1" tainted
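
For a Pod to remain eligible for the tainted node, it needs a matching toleration. A minimal sketch, assuming the key=value:NoSchedule taint set by the command above and reusing the nginx Pod from the earlier examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: nginx-pod
  name: nginx-pod
spec:
  # Tolerate the taint key=value with effect NoSchedule
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      name: nginx
      protocol: TCP
```

A toleration only permits scheduling onto the tainted node; it does not force the Pod there. To pin the Pod to that node as well, combine the toleration with nodeSelector or node affinity.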

 
