probe: checks whether a Pod is running and healthy --> a form of monitoring
Probes are performed at different stages of the Pod's startup and lifecycle.
Probe methods
exec
Run a command inside the container; if it succeeds (exit status 0), the Pod is considered alive.
grpc
Perform a remote procedure call using gRPC. The target should implement gRPC health checks.
httpGet
Perform an HTTP GET request against the container's IP address on a specified port and path.
tcpSocket
Try to connect to a port opened by the Pod's main program --> same idea as nc, nmap, telnet, etc.
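As a sketch of how the four methods appear in a Pod spec (the field names are from the Kubernetes API; the image name, port, path, and command below are placeholders, not from a real application):

```yaml
# Illustrative only: the four probe mechanisms as livenessProbe stanzas.
# Exactly one mechanism may be used per probe.
apiVersion: v1
kind: Pod
metadata:
  name: probe-methods-demo
spec:
  containers:
  - name: app
    image: nginx               # placeholder image
    livenessProbe:
      exec:                    # run a command inside the container
        command: ["cat", "/tmp/healthy"]
      # httpGet:               # HTTP GET on the container's IP
      #   path: /healthz
      #   port: 8080
      # tcpSocket:             # try to open a TCP connection
      #   port: 8080
      # grpc:                  # gRPC health check (stable in Kubernetes v1.27)
      #   port: 2379
```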
Install telnet
[root@k8smaster ~]# yum install telnet -y
[root@k8smaster ~]# telnet www.baidu.com 80
Trying 14.119.104.189...
Connected to www.baidu.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
[root@k8smaster ~]# echo $?
0
Install nc
[root@k8smaster ~]# yum install nc -y
[root@k8smaster ~]# nc -z www.baidu.com 80
[root@k8smaster ~]# echo $?
0
[root@k8smaster ~]# nc -z www.baidu.com 8080
[root@k8smaster ~]# echo $?
1
Probe results
Success
Failure
Unknown
Probe types
livenessProbe
Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container's future is determined by its restart policy.
readinessProbe
Indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod.
startupProbe
Indicates whether the application inside the container has started. If a startup probe is provided, all other probes are disabled until it succeeds. If the startup probe fails, the kubelet kills the container, and the container is restarted according to its restart policy.
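A startup probe is typically combined with a liveness probe to protect a slow-starting application. A hedged sketch (the image name, port, and numbers are placeholders):

```yaml
# Illustrative sketch: startupProbe guards a slow-starting app.
# Until it succeeds, the livenessProbe is disabled.
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo
spec:
  containers:
  - name: app
    image: my-slow-app         # placeholder image
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30     # allow up to 30 * 10s = 300s for startup
      periodSeconds: 10
    livenessProbe:             # takes over once the startup probe succeeds
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
```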
Configuring a liveness probe
[root@k8smaster ~]# mkdir probe
[root@k8smaster ~]# cd probe
Define a liveness command
[root@k8smaster probe]# wget https://k8s.io/examples/pods/probe/exec-liveness.yaml
--2023-03-28 11:13:17-- https://k8s.io/examples/pods/probe/exec-liveness.yaml
Resolving k8s.io (k8s.io)... 34.107.204.206, 2600:1901:0:26f3::
Connecting to k8s.io (k8s.io)|34.107.204.206|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://kubernetes.io/examples/pods/probe/exec-liveness.yaml [following]
--2023-03-28 11:13:18-- https://kubernetes.io/examples/pods/probe/exec-liveness.yaml
Resolving kubernetes.io (kubernetes.io)... 147.75.40.148
Connecting to kubernetes.io (kubernetes.io)|147.75.40.148|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 396 [application/x-yaml]
Saving to: 'exec-liveness.yaml'
100%[===========================================================>] 396 --.-K/s in 0s
2023-03-28 11:13:19 (84.0 MB/s) - 'exec-liveness.yaml' saved [396/396]
[root@k8smaster probe]# ls
exec-liveness.yaml
[root@k8smaster probe]# cat exec-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
[root@k8smaster probe]# kubectl apply -f exec-liveness.yaml
pod/liveness-exec created
[root@k8smaster probe]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 ContainerCreating 0 28s
[root@k8smaster probe]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 ErrImagePull 0 33s
Here the Pod's status is ErrImagePull, meaning the image could not be pulled.
Fix: change the image.
[root@k8smaster probe]# vim exec-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox            # changed to image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
[root@k8smaster probe]# kubectl delete -f exec-liveness.yaml
pod "liveness-exec" deleted
[root@k8smaster probe]# kubectl apply -f exec-liveness.yaml
pod/liveness-exec created
[root@k8smaster probe]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 ContainerCreating 0 8s
[root@k8smaster probe]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 0 37s
[root@k8smaster probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 0 65s 10.244.249.26 k8snode1 <none> <none>
View the Pod's events
[root@k8smaster probe]# kubectl describe pod liveness-exec
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 119s default-scheduler Successfully assigned default/liveness-exec to k8snode1
Normal Pulled 101s kubelet Successfully pulled image "busybox" in 17.001214974s
Warning Unhealthy 57s (x3 over 67s) kubelet Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 57s kubelet Container liveness failed liveness probe, will be restarted
Normal Pulling 27s (x2 over 118s) kubelet Pulling image "busybox"
Normal Created 12s (x2 over 101s) kubelet Created container liveness
Normal Pulled 12s kubelet Successfully pulled image "busybox" in 15.5761624s
Normal Started 11s (x2 over 101s) kubelet Started container liveness
[root@k8smaster probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 2 3m40s 10.244.249.26 k8snode1 <none> <none>
Each time the failed container is killed and restarted, the RESTARTS counter is incremented by 1.
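The kill-and-restart behaviour seen in the events is governed by the probe's timing fields. An annotated summary (the values here are illustrative; the Kubernetes defaults are noted inline):

```yaml
# Probe timing fields, applicable to all probe mechanisms.
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
  initialDelaySeconds: 5   # wait 5s after container start before the first probe
  periodSeconds: 5         # probe every 5s (default: 10)
  timeoutSeconds: 1        # per-probe timeout (default: 1)
  failureThreshold: 3      # consecutive failures before the kubelet acts (default: 3)
  successThreshold: 1      # must be 1 for liveness and startup probes
```

With the example's settings (period 5s, failure threshold 3), the container is killed roughly 15 seconds after /tmp/healthy disappears.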
Define an HTTP liveness probe
[root@k8smaster probe]# wget https://k8s.io/examples/pods/probe/http-liveness.yaml
--2023-03-28 11:31:35-- https://k8s.io/examples/pods/probe/http-liveness.yaml
Resolving k8s.io (k8s.io)... 34.107.204.206, 2600:1901:0:26f3::
Connecting to k8s.io (k8s.io)|34.107.204.206|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://kubernetes.io/examples/pods/probe/http-liveness.yaml [following]
--2023-03-28 11:31:35-- https://kubernetes.io/examples/pods/probe/http-liveness.yaml
Resolving kubernetes.io (kubernetes.io)... 147.75.40.148
Connecting to kubernetes.io (kubernetes.io)|147.75.40.148|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 389 [application/x-yaml]
Saving to: 'http-liveness.yaml'
100%[======================================================================>] 389 --.-K/s in 0s
2023-03-28 11:31:36 (73.9 MB/s) - 'http-liveness.yaml' saved [389/389]
[root@k8smaster probe]# ls
exec-liveness.yaml http-liveness.yaml
[root@k8smaster probe]# cat http-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness
    imagePullPolicy: IfNotPresent
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8smaster probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-http 0/1 ImagePullBackOff 0 39s 10.244.249.29 k8snode1 <none> <none>
Here the Pod's status is ImagePullBackOff, meaning the Pod is stuck in an image-pull backoff loop.
Fix: if you have access to an overseas cloud server, pull the image there, save it to a tar file, download it to your local machine, and then upload it to the Linux node (e.g. with xftp).
[root@k8snode1 ~]# ls
liveness.tar
[root@k8snode1 ~]# docker load -i liveness.tar
5f70bf18a086: Loading layer [==================================================>] 1.024kB/1.024kB
6def2c1ad6bb: Loading layer [==================================================>] 4.389MB/4.389MB
Loaded image: registry.k8s.io/liveness:latest
[root@k8smaster probe]# kubectl explain pod.spec.containers.imagePullPolicy
KIND: Pod
VERSION: v1
FIELD: imagePullPolicy <string>
DESCRIPTION:
Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always
if :latest tag is specified, or IfNotPresent otherwise. Cannot be updated.
More info:
https://kubernetes.io/docs/concepts/containers/images#updating-images
[root@k8smaster probe]# cat http-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness
    imagePullPolicy: IfNotPresent  # if the image exists locally, it will not be pulled from the network
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8smaster probe]# kubectl delete -f http-liveness.yaml
pod "liveness-http" deleted
[root@k8smaster probe]# kubectl apply -f http-liveness.yaml
pod/liveness-http created
[root@k8smaster probe]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness-http 1/1 Running 0 12s
[root@k8smaster probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-http 1/1 Running 0 3s 10.244.249.31 k8snode1 <none> <none>
[root@k8smaster probe]# curl 10.244.249.31:8080/healthz
ok[root@k8smaster probe]# curl 10.244.249.31:8080/healthz
Note: this example server returns 200 ("ok") only for the first 10 seconds of the container's life and 500 afterwards, so the liveness probe eventually fails and the kubelet restarts the container.
Define a TCP liveness probe
[root@k8smaster probe]# wget https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml
--2023-03-28 11:52:00-- https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml
Resolving k8s.io (k8s.io)... 34.107.204.206, 2600:1901:0:26f3::
Connecting to k8s.io (k8s.io)|34.107.204.206|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://kubernetes.io/examples/pods/probe/tcp-liveness-readiness.yaml [following]
--2023-03-28 11:52:01-- https://kubernetes.io/examples/pods/probe/tcp-liveness-readiness.yaml
Resolving kubernetes.io (kubernetes.io)... 147.75.40.148
Connecting to kubernetes.io (kubernetes.io)|147.75.40.148|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 410 [application/x-yaml]
Saving to: 'tcp-liveness-readiness.yaml'
100%[======================================================================>] 410 --.-K/s in 0s
2023-03-28 11:52:02 (42.6 MB/s) - 'tcp-liveness-readiness.yaml' saved [410/410]
[root@k8smaster probe]# ls
exec-liveness.yaml http-liveness.yaml tcp-liveness-readiness.yaml
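The downloaded manifest is not reproduced in the transcript. Reconstructed here from the probe settings visible in the `kubectl describe` output (image goproxy:0.1, tcp-socket :8080, readiness delay=5s period=10s, liveness delay=15s period=20s); treat it as a sketch rather than the exact file:

```yaml
# Sketch of tcp-liveness-readiness.yaml, reconstructed from the
# describe output below. Not guaranteed byte-identical to the original.
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
```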
[root@k8smaster probe]# kubectl apply -f tcp-liveness-readiness.yaml
pod/goproxy created
[root@k8smaster probe]# kubectl get pod
NAME READY STATUS RESTARTS AGE
goproxy 0/1 Running 0 4s
# After 15 seconds, check the Pod's events to verify the liveness probe
[root@k8smaster probe]# kubectl describe pod goproxy
Name: goproxy
Namespace: default
Priority: 0
Node: k8snode2/192.168.102.138
Start Time: Wed, 29 Mar 2023 12:55:00 +0800
Labels: app=goproxy
Annotations: cni.projectcalico.org/podIP: 10.244.185.216/32
cni.projectcalico.org/podIPs: 10.244.185.216/32
Status: Running
IP: 10.244.185.216
IPs:
IP: 10.244.185.216
Containers:
goproxy:
Container ID: docker://622244978a3949b06a5682b2beda743d1e17e24a676aba0e77987d278a7cb8c5
Image: registry.k8s.io/goproxy:0.1
Image ID: docker://sha256:ca30c529755f98c53f660f86fe42f1b38f19ca6127c57aa6025afaf9a016742a
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 29 Mar 2023 12:55:01 +0800
Ready: True
Restart Count: 0
Liveness: tcp-socket :8080 delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: tcp-socket :8080 delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-r7hjj (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-r7hjj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-r7hjj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30s default-scheduler Successfully assigned default/goproxy to k8snode2
Normal Pulled 30s kubelet Container image "registry.k8s.io/goproxy:0.1" already present on machine
Normal Created 30s kubelet Created container goproxy
Normal Started 30s kubelet Started container goproxy
Note: you may also run into the Pod being stuck in an image-pull backoff loop here; you will need to solve the image download problem yourself (e.g. by loading the image manually as shown earlier).