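Advanced Pod Scheduling Policies
Taint policies:
What is a taint:
A taint is a rule that makes a node repel Pods.
How taints are implemented: a taint is declared as an effect attached to a key-value pair.
Taint effects:
Avoid scheduling if possible: PreferNoSchedule
Never schedule: NoSchedule
Evict: NoExecute (new Pods are not scheduled, and running Pods that do not tolerate the taint are evicted)
Overview:
Managing taints:
A taint is written as key=value:[effect]; the value is optional (key:[effect])
View taints: kubectl describe nodes [node name]
Set a taint: kubectl taint node [node name] key=value:[effect]
Remove a taint: kubectl taint node [node name] key=value:[effect]- (note the trailing "-")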
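On the Node object, each taint set with `kubectl taint` is stored as an entry under `spec.taints` with `key`, `value` and `effect` fields. The excerpt below is a sketch of what `kubectl get node node-0001 -o yaml` would be expected to contain after the `k=v1:PreferNoSchedule` taint from the example below is applied; all other fields are omitted. NoExecute taints additionally record a `timeAdded` timestamp.

```yaml
# Excerpt of a Node object (other fields omitted); assumes the taint
# k=v1:PreferNoSchedule from the example has been applied
apiVersion: v1
kind: Node
metadata:
  name: node-0001
spec:
  taints:
  - key: k                    # taint key
    value: v1                 # taint value (optional)
    effect: PreferNoSchedule  # PreferNoSchedule | NoSchedule | NoExecute
```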
Example:
```
# View the taints on all nodes
[root@master ~]# kubectl describe nodes|grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: <none>
Taints: <none>
Taints: <none>
Taints: <none>
Taints: <none>
# Set a PreferNoSchedule taint on node-0001
[root@master ~]# kubectl taint node node-0001 k=v1:PreferNoSchedule
node/node-0001 tainted
# Set a NoSchedule taint on node-0002
[root@master ~]# kubectl taint node node-0002 k=v2:NoSchedule
node/node-0002 tainted
[root@master ~]# kubectl describe nodes |grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: k=v1:PreferNoSchedule
Taints: k=v2:NoSchedule
Taints: <none>
Taints: <none>
Taints: <none>
```
#### Pod manifest
```
[root@master ~]# vim myphp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myphp
spec:
  containers:
  - name: php
    image: myos:php-fpm
    resources:
      requests:
        cpu: 1500m
```
#### Verifying the taint policy
```
# Nodes without taints are used first
[root@master ~]# sed "s,myphp,php1," myphp.yaml |kubectl apply -f -
pod/php1 created
[root@master ~]# sed "s,myphp,php2," myphp.yaml |kubectl apply -f -
pod/php2 created
[root@master ~]# sed "s,myphp,php3," myphp.yaml |kubectl apply -f -
pod/php3 created
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 13s 10.244.3.35 node-0003
php2 1/1 Running 0 5s 10.244.4.32 node-0004
php3 1/1 Running 0 5s 10.244.5.34 node-0005
# A PreferNoSchedule node is used only when no untainted node has room
[root@master ~]# sed 's,myphp,php4,' myphp.yaml |kubectl apply -f -
pod/php4 created
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 13s 10.244.3.35 node-0003
php2 1/1 Running 0 5s 10.244.4.32 node-0004
php3 1/1 Running 0 5s 10.244.5.34 node-0005
php4 1/1 Running 0 80s 10.244.1.33 node-0001
# A NoSchedule node is never used, so php5 stays Pending
[root@master ~]# sed 's,myphp,php5,' myphp.yaml |kubectl apply -f -
pod/php5 created
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 53s 10.244.3.35 node-0003
php2 1/1 Running 0 65s 10.244.4.32 node-0004
php3 1/1 Running 0 75s 10.244.5.34 node-0005
php4 1/1 Running 0 80s 10.244.1.33 node-0001
php5 0/1 Pending 0 5s <none> <none>
```
#### Verifying the effect on running Pods
```
# NoSchedule does not affect Pods that are already running on the node
[root@master ~]# kubectl taint node node-0003 k=v3:NoSchedule
node/node-0003 tainted
[root@master ~]# kubectl describe nodes |grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: k=v1:PreferNoSchedule
Taints: k=v2:NoSchedule
Taints: k=v3:NoSchedule
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 53s 10.244.3.35 node-0003
php2 1/1 Running 0 65s 10.244.4.32 node-0004
php3 1/1 Running 0 75s 10.244.5.34 node-0005
php4 1/1 Running 0 80s 10.244.1.33 node-0001
php5 0/1 Pending 0 5s <none> <none>
# NoExecute evicts (deletes) the Pods running on the node, so php2 disappears
[root@master ~]# kubectl taint node node-0004 k=v4:NoExecute
node/node-0004 tainted
[root@master ~]# kubectl describe nodes |grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: k=v1:PreferNoSchedule
Taints: k=v2:NoSchedule
Taints: k=v3:NoSchedule
Taints: k=v4:NoExecute
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 53s 10.244.3.35 node-0003
php3 1/1 Running 0 75s 10.244.5.34 node-0005
php4 1/1 Running 0 80s 10.244.1.33 node-0001
php5 0/1 Pending 0 5s <none> <none>
```
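What is a toleration:
Sometimes we need to run Pods on tainted nodes. A Pod that declares a toleration matching a node's taint can be scheduled onto that node despite the taint; this is called a toleration.
Fuzzy matching (operator: Exists):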
```
# Tolerate k=*:NoSchedule taints (any value)
[root@master ~]# vim myphp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myphp
spec:
  tolerations:
  - operator: Exists     # partial match: the key only has to exist
    key: k               # key
    effect: NoSchedule   # taint effect
  containers:
  - name: php
    image: myos:php-fpm
    resources:
      requests:
        cpu: 1500m
[root@master ~]# for i in php{1..5};do sed "s,myphp,${i}," myphp.yaml ;done|kubectl apply -f -
pod/php1 created
pod/php2 created
pod/php3 created
pod/php4 created
pod/php5 created
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 6s 10.244.1.12 node-0001
php2 1/1 Running 0 6s 10.244.2.21 node-0002
php3 1/1 Running 0 6s 10.244.3.18 node-0003
php4 1/1 Running 0 6s 10.244.4.24 node-0004
php5 0/1 Pending 0 6s <none> <none>
```
Exact matching (operator: Equal):
```
# Tolerate only the k=v1:NoSchedule taint
[root@master ~]# vim myphp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myphp
spec:
  tolerations:
  - operator: Equal      # exact match on both key and value
    key: k               # key
    value: v1            # value
    effect: NoSchedule   # taint effect
  containers:
  - name: php
    image: myos:php-fpm
    resources:
      requests:
        cpu: 1500m
[root@master ~]# for i in php{1..3};do sed "s,myphp,${i}," myphp.yaml ;done|kubectl apply -f -
pod/php1 created
pod/php2 created
pod/php3 created
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
php1 1/1 Running 0 6s 10.244.1.10 node-0001
php2 1/1 Running 0 6s 10.244.2.11 node-0002
php3 0/1 Pending 0 6s <none> <none>
```
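The examples above only tolerate NoSchedule taints. A toleration can also target NoExecute, and the optional `tolerationSeconds` field then limits how long a Pod that is already running on the node stays bound after the taint is added before it is evicted; without `tolerationSeconds` it stays indefinitely. The fragment below is a sketch that would tolerate the `k=v4:NoExecute` taint from the eviction example for 60 seconds; the 60-second value is an arbitrary assumption.

```yaml
# Toleration fragment for a Pod spec; matches the k=v4:NoExecute taint
spec:
  tolerations:
  - operator: Equal
    key: k
    value: v4
    effect: NoExecute
    tolerationSeconds: 60   # illustrative value: evicted 60s after the taint is added
```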
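Priority overview:
What is priority: priority indicates how important a Pod is relative to other Pods.
What it is for: it ensures that important Pods get scheduled and run.
How to use priority and preemption: create a PriorityClass, then set the matching priorityClassName when creating a Pod.
A PriorityClass is a cluster-wide (non-namespaced) object that maps a priority class name to an integer priority value. The priority is set in the value field; user-defined classes may use any integer up to 1 billion (larger values are reserved for system classes such as system-cluster-critical). The larger the value, the higher the priority.
Besides value, a PriorityClass has these optional fields (preemptionPolicy, used below, is also optional and defaults to PreemptLowerPriority):
globalDefault: if true, this class's value is used for Pods that do not set priorityClassName; only one PriorityClass may set it. If no class sets it, such Pods get priority 0.
description: free-form text that tells users what the class is for.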
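The two optional fields can be sketched with a hypothetical default class. The name `default-normal` and the value 100 are illustrative assumptions, not part of the lab. A globalDefault class only applies to Pods created after it exists; existing Pods keep their priority.

```yaml
# Hypothetical default PriorityClass; name and value are illustrative only
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-normal        # hypothetical name
value: 100                    # user-defined classes must stay at or below 1000000000
globalDefault: true           # used by Pods that do not set priorityClassName
description: "Default priority for ordinary workloads"
```

Applying it in this lab would change the priority of the Pods without a priority class in the examples below from 0 to 100.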
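Priority policies:
Non-preemptive priority: the Pod is placed ahead of lower-priority Pods in the scheduling queue, but it never evicts Pods that are already running; if resources are insufficient, it waits.
Example: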
### Non-preemptive priority
```
# Define priority classes (queue ordering only, no preemption)
[root@master ~]# vim mypriority.yaml
---
kind: PriorityClass
apiVersion: scheduling.k8s.io/v1
metadata:
  name: high-non
preemptionPolicy: Never
value: 1000
---
kind: PriorityClass
apiVersion: scheduling.k8s.io/v1
metadata:
  name: low-non
preemptionPolicy: Never
value: 500
[root@master ~]# kubectl apply -f mypriority.yaml
priorityclass.scheduling.k8s.io/high-non created
priorityclass.scheduling.k8s.io/low-non created
[root@master ~]# kubectl get priorityclasses.scheduling.k8s.io
NAME VALUE GLOBAL-DEFAULT AGE
high-non 1000 false 12s
low-non 500 false 12s
system-cluster-critical 2000000000 false 45h
system-node-critical 2000001000 false 45h
# Pod without a priority class
[root@master ~]# cat php1.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: php1
spec:
  nodeSelector:
    kubernetes.io/hostname: node-0004
  containers:
  - name: php
    image: myos:php-fpm
    resources:
      requests:
        cpu: "1500m"
# Low-priority Pod
[root@master ~]# cat php2.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: php2
spec:
  nodeSelector:
    kubernetes.io/hostname: node-0004
  priorityClassName: low-non    # priority class name
  containers:
  - name: php
    image: myos:php-fpm
    resources:
      requests:
        cpu: "1500m"
# High-priority Pod
[root@master ~]# cat php3.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: php3
spec:
  nodeSelector:
    kubernetes.io/hostname: node-0004
  priorityClassName: high-non   # priority class name
  containers:
  - name: php
    image: myos:php-fpm
    resources:
      requests:
        cpu: "1500m"
```
### Verifying non-preemptive priority
```
[root@master ~]# kubectl apply -f php1.yaml
pod/php1 created
[root@master ~]# kubectl apply -f php2.yaml
pod/php2 created
[root@master ~]# kubectl apply -f php3.yaml
pod/php3 created
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
php1 1/1 Running 0 9s
php2 0/1 Pending 0 6s
php3 0/1 Pending 0 4s
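# Free the node: php3 (higher priority) is scheduled before php2, even though php2 was queued first
[root@master ~]# kubectl delete pod php1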
pod "php1" deleted
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
php2 0/1 Pending 0 20s
php3 1/1 Running 0 18s
# Clean up the test Pods
[root@master ~]# kubectl delete pod php2 php3
```
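Preemptive priority: if a Pod cannot be scheduled because resources are insufficient, the scheduler preempts (evicts) lower-priority Pods to free their resources so that the higher-priority Pod can run.
Example:
### Preemption policy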
```
[root@master ~]# vim mypriority.yaml
---
kind: PriorityClass
apiVersion: scheduling.k8s.io/v1
metadata:
  name: high-non
preemptionPolicy: Never
value: 1000
---
kind: PriorityClass
apiVersion: scheduling.k8s.io/v1
metadata:
  name: low-non
preemptionPolicy: Never
value: 500
---
kind: PriorityClass
apiVersion: scheduling.k8s.io/v1
metadata:
  name: high
preemptionPolicy: PreemptLowerPriority
value: 1000
---
kind: PriorityClass
apiVersion: scheduling.k8s.io/v1
metadata:
  name: low
preemptionPolicy: PreemptLowerPriority
value: 500
[root@master ~]# kubectl apply -f mypriority.yaml
priorityclass.scheduling.k8s.io/high created
priorityclass.scheduling.k8s.io/low created
[root@master ~]# kubectl get priorityclasses.scheduling.k8s.io
NAME VALUE GLOBAL-DEFAULT AGE
high 1000 false 4s
high-non 1000 false 2h
low 500 false 4s
low-non 500 false 2h
system-cluster-critical 2000000000 false 21d
system-node-critical 2000001000 false 21d
```
### Verifying preemptive priority
```
# Switch the Pods to the preempting classes (low-non -> low, high-non -> high)
[root@master ~]# sed 's,-non,,' -i php?.yaml
# Pod with the default priority (0)
[root@master ~]# kubectl apply -f php1.yaml
pod/php1 created
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
php1 1/1 Running 0 6s
# High-priority Pod: it preempts php1
[root@master ~]# kubectl apply -f php3.yaml
pod/php3 created
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
php3 1/1 Running 0 9s
# Low-priority Pod: it cannot preempt the higher-priority php3, so it stays Pending
[root@master ~]# kubectl apply -f php2.yaml
pod/php2 created
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
php2 0/1 Pending 0 3s
php3 1/1 Running 0 9s
# Clean up the test Pods and priority classes
[root@master ~]# kubectl delete pod --all
pod "php2" deleted
pod "php3" deleted
[root@master ~]# kubectl delete -f mypriority.yaml
priorityclass.scheduling.k8s.io "high-non" deleted
priorityclass.scheduling.k8s.io "low-non" deleted
priorityclass.scheduling.k8s.io "high" deleted
priorityclass.scheduling.k8s.io "low" deleted
```
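Privileged containers:
What is a privileged container: containers are normally isolated from the host by Linux namespaces. Some applications need to use or modify sensitive system information, so the container has to break through this isolation and obtain higher privileges; such containers are called privileged containers.
Running privileged containers carries security risks: in this mode the container has root access to the host and can bypass isolation to control the host's resources directly.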
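The contents of `root.yaml`, used in the example below, are not shown. The sketch here is a hypothetical reconstruction built only from the fields named in the admission messages (hostNetwork, hostPID and a privileged container named `linux`); the image is an assumption borrowed from the earlier examples.

```yaml
# Hypothetical reconstruction of root.yaml (the original file is not shown)
apiVersion: v1
kind: Pod
metadata:
  name: root
spec:
  hostNetwork: true          # use the node's network namespace
  hostPID: true              # see every process on the node
  containers:
  - name: linux
    image: myos:php-fpm      # assumption: reused from the earlier examples
    securityContext:
      privileged: true       # full access to the host's devices and capabilities
```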
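Pod Security
What is Pod security policy: cluster-level rules that control how Pods run and what they are allowed to access.
How to use it: the Kubernetes server must be v1.22 or later with the PodSecurity feature gate enabled (the gate is on by default from v1.23, and the feature is GA since v1.25).
Overview:
PodSecurity provides a built-in Pod Security admission controller, the successor to the PodSecurityPolicy feature (which was removed in v1.25). Pod security restrictions are applied at the namespace level when Pods are created.
The Pod Security Standards define three policies (privileged, baseline and restricted) to broadly cover security use cases. The policies are cumulative and range from highly permissive to highly restrictive.
Kubernetes defines a set of namespace labels, in the form pod-security.kubernetes.io/<mode>=<level>, that select the Pod Security Standard level for that namespace. The mode you choose determines what happens when a potential violation is detected:
enforce: violations cause the Pod to be rejected
audit: violations are recorded in the audit log, but the Pod is still accepted
warn: violations trigger a user-facing warning, but the Pod is still accepted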
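The same labels can be set declaratively on a Namespace manifest, and different modes may use different levels. The sketch below uses a hypothetical namespace `mystage` that rejects baseline violations while only warning about and auditing restricted violations; the `enforce-version` label pins which version of the standard is checked.

```yaml
# Hypothetical namespace combining modes at different levels
apiVersion: v1
kind: Namespace
metadata:
  name: mystage                                   # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline  # reject baseline violations
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted   # warn on restricted violations
    pod-security.kubernetes.io/audit: restricted  # audit-log restricted violations
```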
Example:
```
# Production: enforce strict admission control
[root@master ~]# kubectl create namespace myprod
namespace/myprod created
[root@master ~]# kubectl label namespaces myprod pod-security.kubernetes.io/enforce=restricted
namespace/myprod labeled
# Test environment: only show warnings
[root@master ~]# kubectl create namespace mytest
namespace/mytest created
[root@master ~]# kubectl label namespaces mytest pod-security.kubernetes.io/warn=baseline
namespace/mytest labeled
# Create the privileged container in each namespace
[root@master ~]# kubectl -n myprod apply -f root.yaml
Error from server (Failure): error when creating "root.yaml": host namespaces (hostNetwork=true, hostPID=true), privileged (container "linux" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "linux" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "linux" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "linux" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "linux" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
[root@master ~]# kubectl -n myprod get pods
No resources found in myprod namespace.
[root@master ~]# kubectl -n mytest apply -f root.yaml
Warning: would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), privileged (container "linux" must not set securityContext.privileged=true)
pod/root created
[root@master ~]# kubectl -n mytest get pods
NAME READY STATUS RESTARTS AGE
root 1/1 Running 0 7s
```
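The rejection message in `myprod` lists every field the restricted profile checks. A Pod that would be expected to pass admission there sets those fields explicitly, as in the sketch below. The Pod name `nonroot` and `runAsUser: 1000` are assumptions, and whether the `myos:php-fpm` image can run as an unprivileged user has not been verified; if it cannot, admission still succeeds but the container may fail to start.

```yaml
# Hypothetical Pod written to satisfy the "restricted" Pod Security Standard
apiVersion: v1
kind: Pod
metadata:
  name: nonroot                 # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true          # required by restricted
    runAsUser: 1000             # assumption: a non-zero UID the image supports
    seccompProfile:
      type: RuntimeDefault      # restricted requires RuntimeDefault or Localhost
  containers:
  - name: linux
    image: myos:php-fpm         # assumption: reused from the earlier examples
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]           # restricted requires dropping all capabilities
```

Saved as a hypothetical `nonroot.yaml`, it could be tested with `kubectl -n myprod apply -f nonroot.yaml`.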