1 Taints and Tolerations
1.1 Taints
The scheduling approaches covered so far all work from the Pod's point of view: attributes are added to the Pod to decide which Node it should be scheduled onto. We can also work from the Node's point of view, adding taints to a Node to decide whether Pods may be scheduled onto it.
Once a Node is tainted, a repelling relationship exists between it and Pods: the Node can refuse to have Pods scheduled onto it, and can even evict Pods that are already running there.
A taint has the format key=value:effect, where key and value are the taint's label and effect describes what the taint does. Three effects are supported:
- PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a Node with this taint, unless no other node is schedulable.
- NoSchedule: Kubernetes will not schedule new Pods onto a Node with this taint, but Pods already running on it are unaffected.
- NoExecute: Kubernetes will not schedule new Pods onto a Node with this taint, and will also evict Pods already running on it.
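The difference between the three effects can be summarized in a small sketch. This is hypothetical Python illustrating the semantics for a Pod that does NOT tolerate the taint, not scheduler source code:

```python
# Hypothetical sketch (not real scheduler code): what each taint effect
# means for a pod that does not tolerate the taint.
def untolerated_taint_behavior(effect: str) -> dict:
    """Return the consequences of an untolerated taint for each effect."""
    table = {
        # Scheduler avoids the node, but may still use it if nothing else fits.
        "PreferNoSchedule": {"new_pods_blocked": "soft", "running_pods_evicted": False},
        # Scheduler never places new pods; already-running pods stay.
        "NoSchedule":       {"new_pods_blocked": "hard", "running_pods_evicted": False},
        # New pods blocked AND already-running pods are evicted.
        "NoExecute":        {"new_pods_blocked": "hard", "running_pods_evicted": True},
    }
    return table[effect]

print(untolerated_taint_behavior("NoExecute"))
```

Only NoExecute affects Pods that are already running; the other two effects act purely at scheduling time.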
Use the kubectl taint command to add a taint to a node.
Help output for kubectl taint:
[root@master affinity]# kubectl taint --help
Update the taints on one or more nodes.
* A taint consists of a key, value, and effect. As an argument here, it is expressed as
key=value:effect.
* The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots,
and underscores, up to 253 characters.
* Optionally, the key can begin with a DNS subdomain prefix and a single '/', like
example.com/my-app.
* The value is optional. If given, it must begin with a letter or number, and may contain
letters, numbers, hyphens, dots, and underscores, up to 63 characters.
* The effect must be NoSchedule, PreferNoSchedule or NoExecute.
* Currently taint can only apply to node.
Examples:
# Update node 'foo' with a taint with key 'dedicated' and value 'special-user' and effect
'NoSchedule'
# If a taint with that key and effect already exists, its value is replaced as specified
kubectl taint nodes foo dedicated=special-user:NoSchedule
# Remove from node 'foo' the taint with key 'dedicated' and effect 'NoSchedule' if one exists
kubectl taint nodes foo dedicated:NoSchedule-
# Remove from node 'foo' all the taints with key 'dedicated'
kubectl taint nodes foo dedicated-
# Add a taint with key 'dedicated' on nodes having label mylabel=X
kubectl taint node -l myLabel=X dedicated=foo:PreferNoSchedule
# Add to node 'foo' a taint with key 'bar' and no value
kubectl taint nodes foo bar:NoSchedule
Options:
--all=false:
Select all nodes in the cluster
--allow-missing-template-keys=true:
If true, ignore any errors in templates when a field or map key is missing in the
template. Only applies to golang and jsonpath output formats.
--dry-run='none':
Must be "none", "server", or "client". If client strategy, only print the object that
would be sent, without sending it. If server strategy, submit server-side request without
persisting the resource.
--field-manager='kubectl-taint':
Name of the manager used to track field ownership.
-o, --output='':
Output format. One of: (json, yaml, name, go-template, go-template-file, template,
templatefile, jsonpath, jsonpath-as-json, jsonpath-file).
--overwrite=false:
If true, allow taints to be overwritten, otherwise reject taint updates that overwrite
existing taints.
-l, --selector='':
Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l
key1=value1,key2=value2). Matching objects must satisfy all of the specified label
constraints.
--show-managed-fields=false:
If true, keep the managedFields when printing objects in JSON or YAML format.
--template='':
Template string or path to template file to use when -o=go-template, -o=go-template-file.
The template format is golang templates
[http://golang.org/pkg/text/template/#pkg-overview].
--validate='strict':
Must be one of: strict (or true), warn, ignore (or false). "true" or "strict" will use a
schema to validate the input and fail the request if invalid. It will perform server side
validation if ServerSideFieldValidation is enabled on the api-server, but will fall back
to less reliable client-side validation if not. "warn" will warn about unknown or
duplicate fields without blocking the request if server-side field validation is enabled
on the API server, and behave as "ignore" otherwise. "false" or "ignore" will not
perform any schema validation, silently dropping any unknown or duplicate fields.
Usage:
kubectl taint NODE NAME KEY_1=VAL_1:TAINT_EFFECT_1 ... KEY_N=VAL_N:TAINT_EFFECT_N [options]
Use "kubectl options" for a list of global command-line options (applies to all commands).
Example 1: dedicate node2 to production and leave the other nodes for testing
[root@master affinity]# kubectl taint node node2 node-type=production:NoSchedule
node/node2 tainted
[root@master affinity]# kubectl describe node node2|grep Taints
Taints: node-type=production:NoSchedule
Example 2: with node2 dedicated to production and tainted, a Pod without a matching toleration will not be scheduled onto it
[root@master affinity]# vim pod-taint.yaml
[root@master affinity]# cat pod-taint.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-taint
  namespace: default
  labels:
    app: tomcat-pod
spec:
  containers:
  - name: pod-taint
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent
[root@master affinity]# kubectl apply -f pod-taint.yaml
pod/pod-taint created
[root@master affinity]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-taint 1/1 Running 0 7s 10.244.166.133 node1 <none> <none>
The Pod landed on node1: node2 carries a taint and the Pod was created without any toleration, so no Pod is scheduled onto node2.
If node1 is also tainted, recreating the Pod leaves it Pending, unable to schedule:
[root@master affinity]# kubectl taint node node1 node-type=dev:NoSchedule
node/node1 tainted
[root@master affinity]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-taint 1/1 Running 0 2m42s 10.244.166.133 node1 <none> <none>
[root@master affinity]# kubectl delete pod pod-taint
pod "pod-taint" deleted
[root@master affinity]# kubectl apply -f pod-taint.yaml
pod/pod-taint created
[root@master affinity]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-taint 0/1 Pending 0 2s <none> <none> <none> <none>
[root@master affinity]# kubectl describe pod pod-taint
Name: pod-taint
Namespace: default
Priority: 0
Service Account: default
Node: <none>
Labels: app=tomcat-pod
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
pod-taint:
Image: tomcat:8.5-jre8-alpine
Port: 8080/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cvq47 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-cvq47:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 109s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had untolerated taint {node-type: dev}, 1 node(s) had untolerated taint {node-type: production}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
1.2 Tolerations
Tolerations are applied to Pods. A toleration allows the scheduler to place the Pod on a node with a matching taint. Tolerations allow scheduling but do not guarantee it: the scheduler also evaluates other parameters as part of its decision.
[root@master ~]# kubectl explain pod.spec.tolerations
KIND: Pod
VERSION: v1
RESOURCE: tolerations <[]Object>
DESCRIPTION:
If specified, the pod's tolerations.
The pod this Toleration is attached to tolerates any taint that matches the
triple <key,value,effect> using the matching operator <operator>.
FIELDS:
effect <string>
Effect indicates the taint effect to match. Empty means match all taint
effects. When specified, allowed values are NoSchedule, PreferNoSchedule
and NoExecute.
Possible enum values:
- `"NoExecute"` Evict any already-running pods that do not tolerate the
taint. Currently enforced by NodeController.
- `"NoSchedule"` Do not allow new pods to schedule onto the node unless
they tolerate the taint, but allow all pods submitted to Kubelet without
going through the scheduler to start, and allow all already-running pods to
continue running. Enforced by the scheduler.
- `"PreferNoSchedule"` Like TaintEffectNoSchedule, but the scheduler tries
not to schedule new pods onto the node, rather than prohibiting new pods
from scheduling onto the node entirely. Enforced by the scheduler.
key <string>
Key is the taint key that the toleration applies to. Empty means match all
taint keys. If the key is empty, operator must be Exists; this combination
means to match all values and all keys.
operator <string>
Operator represents a key's relationship to the value. Valid operators are
Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for
value, so that a pod can tolerate all taints of a particular category.
Possible enum values:
- `"Equal"`
- `"Exists"`
tolerationSeconds <integer>
TolerationSeconds represents the period of time the toleration (which must
be of effect NoExecute, otherwise this field is ignored) tolerates the
taint. By default, it is not set, which means tolerate the taint forever
(do not evict). Zero and negative values will be treated as 0 (evict
immediately) by the system.
value <string>
Value is the taint value the toleration matches to. If the operator is
Exists, the value should be empty, otherwise just a regular string.
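The matching rules described by these fields can be sketched as follows. This is a hypothetical Python model of the <key,value,effect> triple matching, with field names mirroring pod.spec.tolerations; it is not the actual kubelet or scheduler code:

```python
# Hypothetical model of toleration/taint matching (not real Kubernetes code).
def toleration_matches(toleration: dict, taint: dict) -> bool:
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key with operator Exists matches all taints.
    if not toleration.get("key"):
        return toleration.get("operator") == "Exists"
    if toleration["key"] != taint["key"]:
        return False
    op = toleration.get("operator", "Equal")  # Equal is the default
    if op == "Exists":
        return True                           # Exists is a wildcard for value
    return toleration.get("value") == taint.get("value")

# A node repels a pod if any of its taints is untolerated.
def pod_tolerates_node(tolerations: list, taints: list) -> bool:
    return all(any(toleration_matches(t, taint) for t in tolerations)
               for taint in taints)

taint = {"key": "node-type", "value": "production", "effect": "NoSchedule"}
print(pod_tolerates_node([{"operator": "Exists", "effect": "NoSchedule"}], [taint]))  # True
```

This is why the toleration used in the example below (operator Exists, effect NoSchedule, no key) lets the Pod land on either tainted node: it tolerates every NoSchedule taint regardless of key and value.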
Example:
[root@master affinity]# vim pod-demo-1.yaml
[root@master affinity]# cat pod-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-deploy
  namespace: default
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
  tolerations:
  - effect: "NoSchedule"
    operator: "Exists"
[root@master affinity]# kubectl apply -f pod-demo-1.yaml
pod/myapp-deploy created
[root@master affinity]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myapp-deploy 1/1 Running 0 8s 10.244.166.134 node1 <none> <none>
pod-taint 0/1 Pending 0 20m <none> <none> <none> <none>
2 Common Pod States and Restart Policies
2.1 Common Pod States
Phase one:
Pending:
1. The Pod is being created but not all of its containers exist yet. For a Pod in this state, check whether the storage it depends on can be mounted with the right permissions, whether its images can be pulled, and whether scheduling is working normally.
2. The Pod has been created but no node satisfies the scheduling constraints, so scheduling has not completed; a Pod that exists but has no suitable node to run on is pending.
Failed: All containers in the Pod have terminated, and at least one container terminated in failure, i.e. exited with a non-zero status or was killed by the system.
Unknown: The Pod's state is obtained by the apiserver communicating with the kubelet on the Pod's node; if that kubelet itself fails, the apiserver cannot reach it and gets no status, so the Pod shows Unknown. This usually indicates a communication error with the Pod's node.
Error: An error occurred while the Pod was starting.
Succeeded: All containers in the Pod terminated successfully, i.e. every container in the Pod is terminated.
Phase two:
Unschedulable: The Pod cannot be scheduled; the scheduler found no suitable node.
PodScheduled: The Pod is in the middle of scheduling; once the scheduler has filtered out a suitable node, it updates etcd and assigns the Pod to that node.
Initialized: All of the Pod's init containers have completed.
ImagePullBackOff: The node hosting the Pod failed to pull the image.
Running: The Pod's containers have been created and started.
Other states you may encounter:
Evicted: Usually seen when the node is short on memory or disk. Run df -h against the directory backing container storage; if usage exceeds 85%, clean up promptly, especially large files and unused images.
CrashLoopBackOff: The container started but keeps exiting abnormally.
2.2 Pod Restart Policy
A Pod's restart policy (restartPolicy) applies to all containers in the Pod. When a container exits abnormally or fails a health check, the kubelet acts on it according to the restart policy.
The Pod spec contains a restartPolicy field whose possible values are Always, OnFailure, and Never. The default is Always.
- Always: the kubelet automatically restarts the container whenever it terminates, regardless of exit code. (This is the default policy.)
- OnFailure: the kubelet restarts the container only when it terminates with a non-zero exit code.
- Never: the kubelet never restarts the container, no matter how it exited.
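The three policies reduce to a simple decision on the container's exit code. A minimal sketch in hypothetical Python (not kubelet source code):

```python
# Hypothetical sketch of the per-container restart decision the kubelet
# makes, based on restartPolicy and the container's exit code.
def should_restart(restart_policy: str, exit_code: int) -> bool:
    if restart_policy == "Always":
        return True              # restart on any termination, even exit code 0
    if restart_policy == "OnFailure":
        return exit_code != 0    # restart only on failure
    return False                 # Never: leave the container terminated

# Matches the experiments below: shutdown.sh exits 0, `kill 1` exits non-zero.
for policy in ("Always", "OnFailure", "Never"):
    print(policy, should_restart(policy, 0), should_restart(policy, 137))
```

The tests that follow exercise exactly these cases: a graceful tomcat shutdown (exit code 0) and killing PID 1 (non-zero exit code) under each policy.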
Testing the Always restart policy:
# Create a Pod with the restart policy configured
[root@master1 pod]# vim pod.yaml
[root@master1 pod]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
spec:
  restartPolicy: Always
  containers:
  - name: tomcat-pod-java
    ports:
    - containerPort: 8080
    image: xianchao/tomcat-8.5-jre8:v1
    imagePullPolicy: IfNotPresent
[root@master1 pod]# kubectl apply -f pod.yaml
pod/pod-demo created
# Watch the Pod's status from another terminal
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 10s
# Gracefully stop the tomcat service inside the container
[root@master1 pod]# kubectl exec -it pod-demo -- /bin/bash
bash-4.4# /usr/local/tomcat/bin/shutdown.sh
Using CATALINA_BASE: /usr/local/tomcat
Using CATALINA_HOME: /usr/local/tomcat
Using CATALINA_TMPDIR: /usr/local/tomcat/temp
Using JRE_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
Using CLASSPATH: /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar
bash-4.4# command terminated with exit code 137
# Watch the Pod's status
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 10s
pod-demo 0/1 Completed 0 3m4s
pod-demo 1/1 Running 1 (2s ago) 3m5s
After gracefully stopping tomcat inside the container, the container restarted once and the Pod returned to Running.
# Stop the tomcat service in the container abnormally
[root@master1 pod]# kubectl exec -it pod-demo -- /bin/bash
bash-4.4# ps -ef|grep tomcat
1 root 0:02 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
60 root 0:00 grep tomcat
bash-4.4# kill 1
bash-4.4# command terminated with exit code 137
# Watch the Pod's status
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 10s
pod-demo 0/1 Completed 0 3m4s
pod-demo 1/1 Running 1 (2s ago) 3m5s
pod-demo 0/1 Error 1 (2m36s ago) 5m39s
pod-demo 0/1 CrashLoopBackOff 1 (14s ago) 5m52s
pod-demo 1/1 Running 2 (15s ago) 5m53s
The container terminated and was restarted again; the restart count went up by one.
Testing the Never restart policy
# Create a Pod configured with the Never restart policy
[root@master1 pod]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
spec:
  restartPolicy: Never
  containers:
  - name: tomcat-pod-java
    ports:
    - containerPort: 8080
    image: xianchao/tomcat-8.5-jre8:v1
    imagePullPolicy: IfNotPresent
[root@master1 pod]# kubectl apply -f pod.yaml
pod/pod-demo created
# Gracefully stop the tomcat service inside the container
[root@master1 pod]# kubectl exec -it pod-demo -- /bin/bash
bash-4.4# /usr/local/tomcat/bin/shutdown.sh
Using CATALINA_BASE: /usr/local/tomcat
Using CATALINA_HOME: /usr/local/tomcat
Using CATALINA_TMPDIR: /usr/local/tomcat/temp
Using JRE_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
Using CLASSPATH: /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar
bash-4.4# command terminated with exit code 137
# Watch the Pod's status
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 6s
pod-demo 0/1 Completed 0 76s
pod-demo 0/1 Completed 0 78s
[root@master1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-demo 0/1 Completed 0 3m42s
After gracefully stopping tomcat, the Pod moved to Completed and the container was not restarted.
# Stop the tomcat service in the container abnormally
[root@master1 pod]# kubectl exec -it pod-demo -- /bin/bash
bash-4.4# ps -ef|grep tomcat
1 root 0:02 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
60 root 0:00 grep tomcat
bash-4.4# kill 1
bash-4.4# command terminated with exit code 137
# Watch the Pod's status
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 0/1 Error 0 40s
The container's status is Error and it was not restarted: with the Never policy the container is never restarted, however its service terminates.
Testing the OnFailure restart policy
# Create a Pod configured with the OnFailure restart policy
[root@master1 pod]# vim pod.yaml
[root@master1 pod]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
spec:
  restartPolicy: OnFailure
  containers:
  - name: tomcat-pod-java
    ports:
    - containerPort: 8080
    image: xianchao/tomcat-8.5-jre8:v1
    imagePullPolicy: IfNotPresent
[root@master1 pod]# kubectl apply -f pod.yaml
pod/pod-demo created
# Gracefully stop the tomcat service inside the container
[root@master1 pod]# kubectl exec -it pod-demo -- /bin/bash
bash-4.4# /usr/local/tomcat/bin/shutdown.sh
Using CATALINA_BASE: /usr/local/tomcat
Using CATALINA_HOME: /usr/local/tomcat
Using CATALINA_TMPDIR: /usr/local/tomcat/temp
Using JRE_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
Using CLASSPATH: /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar
bash-4.4# command terminated with exit code 137
# Watch the Pod's status
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 11s
pod-demo 0/1 Completed 0 90s
pod-demo 0/1 Completed 0 91s
pod-demo 0/1 Completed 0 92s
After gracefully stopping tomcat, the exit code is 0 and the container is not restarted.
# Stop the tomcat service in the container abnormally
[root@master1 pod]# kubectl exec -it pod-demo -- /bin/bash
bash-4.4# ps -ef|grep tomcat
1 root 0:03 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
60 root 0:00 grep tomcat
bash-4.4# kill 1
bash-4.4# command terminated with exit code 137
# Watch the Pod's status
[root@master1 ~]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 2s
pod-demo 0/1 Error 0 14s
pod-demo 0/1 Error 0 14s
pod-demo 1/1 Running 1 (2s ago) 15s
When the container is stopped abnormally, its exit code is non-zero, so the container is restarted.