Case source
1. Symptoms
- Now we can create the Prometheus resource objects:
[root@master1 p8s-example]#kubectl apply -f prometheus-deploy.yaml
deployment.apps/prometheus created
[root@master1 p8s-example]#kubectl get pods -n monitor
NAME READY STATUS RESTARTS AGE
prometheus-58f59fd485-7hncv 0/1 CrashLoopBackOff 2 (24s ago) 112s
[root@master1 p8s-example]#kubectl logs prometheus-58f59fd485-7hncv -nmonitor
ts=2022-04-29T04:42:09.982Z caller=main.go:516 level=info msg="Starting Prometheus" version="(version=2.34.0, branch=HEAD, revision=881111fec4332c33094a6fb2680c71fffc427275)"
ts=2022-04-29T04:42:09.982Z caller=main.go:521 level=info build_context="(go=go1.17.8, user=root@121ad7ea5487, date=20220315-15:18:00)"
ts=2022-04-29T04:42:09.982Z caller=main.go:522 level=info host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus-58f59fd485-7hncv (none))"
ts=2022-04-29T04:42:09.982Z caller=main.go:523 level=info fd_limits="(soft=65536, hard=65536)"
ts=2022-04-29T04:42:09.982Z caller=main.go:524 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-04-29T04:42:09.983Z caller=query_logger.go:90 level=error component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker({0x7ffe09113e00, 0xb}, 0x14, {0x3637a40, 0xc0002032c0})
/app/promql/query_logger.go:120 +0x3d7
main.main()
/app/cmd/prometheus/main.go:569 +0x6049
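After the Pod is created, we can see that it does not start successfully. The log shows the error open /prometheus/queries.active: permission denied. This happens because the Prometheus image runs as the nobody user, while the host directory that we mount through the LocalPV is owned by root: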
[root@node1 ~]#ls -la /data/k8s
total 0
drwxr-xr-x 3 root root 24 Apr 29 09:57 .
drwxr-xr-x 3 root root 17 Apr 29 09:57 ..
drwxr-xr-x 2 root root 6 Apr 29 09:57 prometheus
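2. Solution
Given that ownership mismatch, the permission error is expected. One fix is to use securityContext on the Pod and set runAsUser=0 so that the container runs as root. Another is to add an initContainer that changes the ownership of the data directory (this is the approach used here):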
[root@master1 p8s-example]#vim prometheus-deploy.yaml
......
initContainers:
- name: fix-permissions
  image: busybox
  command: [chown, -R, "nobody:nobody", /prometheus]
  volumeMounts:
  - name: data
    mountPath: /prometheus
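For comparison, the securityContext alternative mentioned above might look roughly like the fragment below. This is only a sketch and was not applied in this case: the container name prometheus is an assumption, the position under spec.template.spec assumes the usual Deployment layout, and all other fields of prometheus-deploy.yaml are left out.

```yaml
# Sketch of the securityContext alternative (not the approach used in this article).
# Only the relevant fields are shown; everything else in prometheus-deploy.yaml stays unchanged.
spec:
  template:
    spec:
      securityContext:
        runAsUser: 0          # run the containers in this Pod as root (UID 0)
      containers:
      - name: prometheus      # assumed container name
        volumeMounts:
        - name: data          # same volume name as in the initContainer above
          mountPath: /prometheus
```

With this variant Prometheus itself runs as root, whereas the initContainer approach only uses root for the one-off chown and leaves Prometheus running as nobody. The rest of this article continues with the initContainer approach.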
- Now re-apply the updated Prometheus Deployment:
[root@master1 p8s-example]#kubectl apply -f prometheus-deploy.yaml
deployment.apps/prometheus configured
[root@master1 p8s-example]#kubectl get pods -n monitor
NAME READY STATUS RESTARTS AGE
prometheus-849c8456c7-dz6rt 1/1 Running 0 54s
[root@master1 p8s-example]#kubectl logs prometheus-849c8456c7-dz6rt -nmonitor
……
ts=2022-04-29T05:00:09.184Z caller=main.go:1142 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2022-04-29T05:00:09.185Z caller=main.go:1179 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=1.128855ms db_storage=1.302µs remote_storage=5.18µs web_handler=721ns query_engine=2.705µs scrape=773.91µs scrape_sd=66.725µs notify=1.443µs notify_sd=2.966µs rules=4.859µs tracing=20.118µs
ts=2022-04-29T05:00:09.185Z caller=main.go:910 level=info msg="Server is ready to receive web requests."