I. Introduction
1. What is a Pod health check?
In a k8s cluster, every node runs the kubelet. Container health checks are performed periodically by the kubelet, which executes the probes defined in the Pod spec.
2. Probe types
liveness probe
Determines whether the container is still running by checking whether the application inside it is alive. If the check succeeds the container keeps running; if it fails, the container is restarted.
readiness probe
Determines whether the container is ready to receive traffic by checking whether the application inside it can serve requests. If the probe fails, the Pod is removed from the Service's endpoints and no traffic is forwarded to it.
3. Probe handler types
ExecAction
Runs a command inside the container; an exit code of 0 means success.
TCPSocketAction
Attempts a TCP connection to a port the container exposes; a successful connection means success.
HTTPGetAction
Sends an HTTP GET request to the service inside the container; a status code greater than or equal to 200 and less than 400 means success.
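The three handler types map to three different fields under livenessProbe (or readinessProbe). A minimal sketch of each, separated as standalone fragments (the command, port, and path values are placeholders, not taken from a real manifest):

```yaml
livenessProbe:
  exec:                  # ExecAction: run a command, exit code 0 = healthy
    command: ["cat", "/tmp/healthy"]
---
livenessProbe:
  tcpSocket:             # TCPSocketAction: TCP connect succeeds = healthy
    port: 80
---
livenessProbe:
  httpGet:               # HTTPGetAction: status in [200, 400) = healthy
    path: /index.html
    port: 80
```

A Pod uses exactly one handler per probe; the sections below walk through each in turn.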
II. Using ExecAction
Create the following YAML file:
[root@master liveness]# vi liveness_exec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    app: liveness
spec:
  containers:
  - name: httpd
    image: httpd:latest
    ports:
    - containerPort: 80
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600;
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5   # wait 5 seconds after the container starts before the first check
      periodSeconds: 5         # run the check every 5 seconds
When the container starts, it runs the following command:
touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600;
The probe checks whether the /tmp/healthy file exists. For the first 30 seconds the file is there, so the container is considered healthy.
After 30 seconds the file is removed, the check fails, and the container is restarted.
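The exit-code semantics can be simulated outside the cluster with the same `cat` command the probe uses (a local sketch only, not something the kubelet runs this way):

```shell
# Simulate the exec probe: `cat` exits 0 while the file exists, non-zero after
# it is removed, which is exactly what the kubelet checks.
touch /tmp/healthy
if cat /tmp/healthy >/dev/null 2>&1; then first=success; else first=failure; fi
rm -f /tmp/healthy
if cat /tmp/healthy >/dev/null 2>&1; then second=success; else second=failure; fi
echo "first probe: $first, second probe: $second"
```

The first check passes and the second fails, mirroring the 30-second window in the Pod above.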
The events below show the restart process:
[root@master liveness]# kubectl describe pods liveness-exec
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m1s default-scheduler Successfully assigned default/liveness-exec to master
Normal Killing 89s (x2 over 2m59s) kubelet, master Container httpd failed liveness probe, will be restarted
Normal Pulling 59s (x3 over 3m58s) kubelet, master Pulling image "httpd:latest"
Normal Pulled 43s (x3 over 3m42s) kubelet, master Successfully pulled image "httpd:latest"
Normal Created 43s (x3 over 3m42s) kubelet, master Created container httpd
Normal Started 43s (x3 over 3m42s) kubelet, master Started container httpd
Warning Unhealthy 4s (x8 over 3m9s) kubelet, master Liveness probe failed: cat: /tmp/healthy: No such file or directory
Listing the Pod also shows the restart count:
[root@master liveness]# kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 2 4m47s
III. Using TCPSocketAction
Create the following YAML file:
[root@master liveness]# vi liveness_tcp.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
  labels:
    app: liveness-tcp
spec:
  containers:
  - name: httpd-tcp
    image: httpd:latest
    ports:
    - containerPort: 80
    readinessProbe:      # readiness check: ready for traffic if a connection to port 80 succeeds
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:       # liveness check: alive if port 80 accepts the connection
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: svc-1
  labels:
    app: svc
spec:
  type: NodePort
  selector:
    app: liveness-tcp
  ports:
  - port: 80
The normal case, where both the readiness and liveness checks succeed:
[root@master liveness]# kubectl get pod,svc,ep
NAME READY STATUS RESTARTS AGE
pod/liveness-tcp 1/1 Running 0 90s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 11d
service/svc-1 NodePort 10.100.53.142 <none> 80:30930/TCP 23s
NAME ENDPOINTS AGE
endpoints/kubernetes 192.168.100.10:6443 11d
endpoints/svc-1 10.244.0.18:80 23s
Another case is when the readiness probe fails while the liveness probe succeeds, which looks like this:
The Pod's status is Running, but it shows 0/1 Ready, and the Service's endpoint list no longer includes the Pod, because the readiness check failed.
The container is not restarted here, because the liveness check is still succeeding.
[root@master liveness]# kubectl get pods,svc,ep
NAME READY STATUS RESTARTS AGE
pod/liveness-tcp 0/1 Running 0 105s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 11d
service/svc-1 NodePort 10.111.6.129 <none> 80:30759/TCP 105s
NAME ENDPOINTS AGE
endpoints/kubernetes 192.168.100.10:6443 11d
endpoints/svc-1 105s
In the last case, both probes fail:
[root@master liveness]# kubectl get pod,svc,ep
NAME READY STATUS RESTARTS AGE
pod/liveness-tcp 0/1 CrashLoopBackOff 4 2m49s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 11d
service/svc-1 NodePort 10.108.204.64 <none> 80:32499/TCP 2m49s
NAME ENDPOINTS AGE
endpoints/kubernetes 192.168.100.10:6443 11d
endpoints/svc-1 2m49s
The events show the failed checks and the resulting restarts:
Warning Unhealthy 2m9s (x9 over 2m54s) kubelet, master Readiness probe failed: dial tcp 10.244.0.21:81: connect: connection refused
Warning Unhealthy 2m9s (x6 over 2m54s) kubelet, master Liveness probe failed: dial tcp 10.244.0.21:81: connect: connection refused
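Besides initialDelaySeconds and periodSeconds, probes accept a few more tuning fields that control how quickly a Pod is marked unhealthy or healthy again. A minimal sketch (the values here are illustrative, not taken from the manifests above):

```yaml
readinessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1      # how long a single check may take before it counts as failed
  failureThreshold: 3    # consecutive failures before the Pod is marked unhealthy
  successThreshold: 1    # consecutive successes needed to be marked healthy again
```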
IV. Using HTTPGetAction
Create the following YAML file:
[root@master liveness]# vi liveness_http.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    app: liveness-http
spec:
  containers:
  - name: liveness-http
    image: httpd:latest
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:               # HTTP request settings
        path: /index.html    # the file to request
        port: 80             # the port to request
        httpHeaders:         # custom request headers
        - name: X-Custom-Header
          value: test
      initialDelaySeconds: 5
      periodSeconds: 5
If the GET request for index.html inside the container succeeds, the server returns a status code in the 200–399 range and the probe passes.
The httpd access log shows the probe requests:
...........
10.244.0.1 - - [23/Nov/2021:06:58:10 +0000] "GET /index.html HTTP/1.1" 200 45
..........
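The success rule for httpGet can be sketched as a tiny shell helper (`probe_ok` is a hypothetical name, used here only to illustrate the 200 ≤ code < 400 rule):

```shell
# Any status greater than or equal to 200 and less than 400 counts as success.
probe_ok() { [ "$1" -ge 200 ] && [ "$1" -lt 400 ]; }
probe_ok 200 && r200=pass || r200=fail   # 200 OK passes
probe_ok 302 && r302=pass || r302=fail   # redirects pass too
probe_ok 404 && r404=pass || r404=fail   # client/server errors fail
echo "200:$r200 302:$r302 404:$r404"
```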
When the request fails, the probe keeps failing and the container is restarted over and over:
[root@master liveness]# kubectl get pods    # the restart count keeps climbing
NAME READY STATUS RESTARTS AGE
liveness-http 1/1 Running 3 2m29s
Warning Unhealthy 19s (x9 over 99s) kubelet, master Liveness probe failed: Get http://10.244.0.23:88/index.html: dial tcp 10.244.0.23:88: connect: connection refused
Normal Killing 19s (x3 over 89s) kubelet, master Container liveness-http failed liveness probe, will be restarted