Problem description:
When deploying the prometheus server Deployment in a k8s cluster, running
kubectl apply -f prometheus-server-deploy.yaml
leaves the pod unable to start.
Problem analysis:
The prometheus-server pod is in the following state:
[root@k8s-prd-master1 prometheus]# kubectl get pod -n monitor-sa
NAME                                 READY   STATUS      RESTARTS      AGE
node-exporter-565xb                  1/1     Running     1 (35m ago)   2d23h
node-exporter-fhss8                  1/1     Running     2 (35m ago)   2d23h
node-exporter-zzrdc                  1/1     Running     1 (37m ago)   2d23h
prometheus-server-68d79d4565-wkpkw   0/1     Completed   3 (40s ago)   59s
Next, look at the pod's startup events (kubectl describe pod), which show the following:
.........................
Events:
  Type     Reason   Age               From     Message
  ----     ------   ----              ----     -------
  Normal   Pulled   5s (x3 over 22s)  kubelet  Container image "prom/prometheus:v2.2.1" already present on machine
  Normal   Created  5s (x3 over 22s)  kubelet  Created container prometheus
  Normal   Started  5s (x3 over 22s)  kubelet  Started container prometheus
  Warning  BackOff  5s (x3 over 20s)  kubelet  Back-off restarting failed container
The events only say that the container keeps being restarted, not why, so check the pod's logs (kubectl logs):
level=info ts=2023-04-23T14:35:11.773380318Z caller=main.go:394 msg="Scrape discovery manager stopped"
level=info ts=2023-04-23T14:35:11.773388809Z caller=main.go:426 msg="Scrape manager stopped"
level=info ts=2023-04-23T14:35:11.773397666Z caller=main.go:573 msg="Notifier manager stopped"
level=info ts=2023-04-23T14:35:11.773413172Z caller=web.go:382 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2023-04-23T14:35:11.774951105Z caller=main.go:582 err="Opening storage failed open DB in /prometheus: open /prometheus/026500082: permission denied"
level=info ts=2023-04-23T14:35:11.774981922Z caller=main.go:584 msg="See you next time!"
One line stands out as an error:
level=error ts=2023-04-23T14:35:11.774951105Z caller=main.go:582 err="Opening storage failed open DB in /prometheus: open /prometheus/026500082: permission denied"
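The message shows that, while opening its storage, prometheus-server could not open /prometheus/026500082: the process was denied permission, i.e. it lacks sufficient rights on the /prometheus directory.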
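In a longer log the failing line is easy to miss. Since every line in this output starts with a level= field, one way to isolate the failure is to filter on level=error. The following is only a sketch: only_errors is a hypothetical helper name, and the pod name in the usage comment is the one from the listing above.

```shell
#!/bin/sh
# only_errors: keep only the log lines whose level field is "error".
# Reads log text on stdin and writes the matching lines to stdout.
only_errors() {
    # grep exits 1 when nothing matches; treat that as "no errors found"
    grep '^level=error' || true
}

# Usage against the failing pod (needs cluster access):
#   kubectl logs -n monitor-sa prometheus-server-68d79d4565-wkpkw | only_errors
# If the container has already been restarted, adding --previous reads
# the log of the instance that crashed.
```

An empty result means no error-level line was logged, in which case the full log still has to be read.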
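Solution:
Cause: a permissions problem. The prometheus image runs as the nobody user, and its data directory is mounted via hostPath from /data on the host, but /data is owned by root and writable only by its owner: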
[root@k8s-prd-work1 /]# ll
total 16
lrwxrwxrwx. 1 root root 7 Apr 18 22:16 bin -> usr/bin
dr-xr-xr-x. 5 root root 4096 Apr 23 22:00 boot
drwxr-xr-x 2 root root 6 Apr 20 23:17 data
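The listing shows /data as drwxr-xr-x owned by root, so any other UID can list the directory but cannot create files in it. That check can be sketched as a small shell function. Assumptions: can_uid_write is a hypothetical helper, it relies on GNU stat (the -c option), it looks only at the owner and "other" permission bits (group membership and ACLs are ignored), and 65534 is the UID that nobody usually maps to in busybox-based images, which should be verified for the image in use.

```shell
#!/bin/sh
# can_uid_write DIR UID
# Succeeds (exit 0) if UID may create files in DIR, judged only by the
# directory's owner and its owner/other permission bits.
can_uid_write() {
    dir=$1
    uid=$2
    # root bypasses ordinary permission checks
    [ "$uid" -eq 0 ] && return 0
    owner=$(stat -c '%u' "$dir")   # numeric UID of the owner
    mode=$(stat -c '%a' "$dir")    # octal mode, e.g. 755
    if [ "$uid" -eq "$owner" ]; then
        bits=$(( (mode / 100) % 10 ))   # owner digit
    else
        bits=$(( mode % 10 ))           # "other" digit
    fi
    # creating a file needs both write (2) and search (1) on the directory
    [ $(( bits & 3 )) -eq 3 ]
}

# Run on the worker node that hosts /data:
#   can_uid_write /data 65534 && echo writable || echo "not writable"
```

For the directory shown above (mode 755, owner root) the "other" digit is 5, so the write bit is missing for every non-root UID.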
To fix this, it is enough to add one setting that makes the container run as root (UID 0):
securityContext:
  runAsUser: 0
After the change, the relevant part of the manifest looks like this:
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      securityContext:
        runAsUser: 0
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: Directory
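Running as UID 0 works, but it also gives the Prometheus process root inside the container. An alternative that is not part of the fix described here is to leave the image's user alone and hand the host directory to that user instead. A minimal sketch, assuming nobody maps to UID/GID 65534 in this image (typical for busybox-based images, but worth verifying) and that the pod is only scheduled onto nodes where this has been run; give_dir_to is a hypothetical helper name.

```shell
#!/bin/sh
# give_dir_to DIR UID GID
# Create DIR if needed and make UID:GID its owner, so that a container
# running as that user can write to it through a hostPath mount.
give_dir_to() {
    dir=$1
    uid=$2
    gid=$3
    mkdir -p "$dir" &&
    chown -R "$uid:$gid" "$dir"
}

# Run as root on each worker node that may host the pod
# (65534 is an assumption about the image's nobody user):
#   give_dir_to /data 65534 65534
```

With the directory owned by the container's user, the runAsUser override should not be needed; which of the two approaches fits better depends on how the nodes are managed.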
Note: after adding the securityContext setting, re-apply the manifest:
kubectl apply -f prometheus-server-deploy.yaml
Then check the pod status: